==================================================Ascend
============================= test session starts ==============================
platform linux -- Python 3.9.19, pytest-6.2.5, py-1.11.0, pluggy-1.5.0
rootdir: /home/jenkins/mindspore/testcases/testcases/tests/st/mint, configfile: ../../../../../../sault/virtual_test/virtualenv_005/sault/config/pytest.ini
plugins: mock-3.14.0, hydra-core-1.3.2, forked-1.6.0, anyio-4.9.0, xdist-1.32.0
collected 300 items

test_functional_mul.py .
[hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-pynative]
tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-pynative],max_mem:2.0M
TotalTime = 4.02449, [24]
[bootstrap]: 0.00094873 [type_inference]: 0.116061 [event_method]: 1.493e-05 [auto_monad]: 0.00053237 [graph_reusing]: 4.30999e-06 [inline]: 3.01001e-06 [add_attr]: 0.127388, [1] [add_attr_with_inline]: 0.127181, [1] [Cycle 1]: 0.00010044, [2] [tag_attr]: 2.275e-05 [meta_addattr_fg_expand]: 4.70001e-06 [parallel-infer-symbol]: 3.6e-06 [pre_auto_parallel]: 4.833e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.78002e-06 [optimize]: 0.00617291, [53] [py_interpret_to_execute]: 3.036e-05 [rewriter_before_opt_a]: 7.516e-05 [opt_a]: 0.00356853, [2] [Cycle 1]: 0.00290084, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 3.59e-05 [loop_unroll]: 2.154e-05 [a_1]: 0.00101971 [with_stream_mark]: 2.271e-05 [recompute_prepare]: 1.112e-05 [updatestate_depend_eliminate]: 4.25e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 2.16e-06 [a_2]: 8.448e-05 [accelerated_algorithm]: 7.02002e-06 [shard]: 2.76e-06 [meta_shard_fg_expand]: 2.06998e-06 [shard_inline]: 8.33999e-06 [merge_send_recv]: 9.85002e-06 [auto_parallel]: 9.07001e-06 [parallel]: 5.677e-05 [flash_sp]: 1.3e-05 [merge_comm]: 1.799e-05 
[allreduce_fusion]: 4.19997e-06 [matmul_add_comm_reduction]: 1.038e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.286e-05 [virtual_dataset]: 6.59001e-06 [get_grad_eliminate_]: 6.44001e-06 [virtual_output]: 6.78e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 2.31e-06 [offload_activation]: 1.083e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.506e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 1.027e-05 [set_forward_comm_id_for_comm_node_pass]: 3.70998e-06 [meta_fg_expand]: 2.70002e-06 [flash_sp_send_recv_attached]: 2.90002e-06 [receive_attached]: 1.268e-05 [after_resolve]: 1.483e-05 [a_after_grad]: 9.74e-06 [renormalize]: 0.00079178 [add_forward_monad_depend]: 6.86001e-06 [auto_monad_grad]: 2.88e-06 [auto_monad_eliminator]: 1.684e-05 [cse]: 4.093e-05 [a_3]: 4.797e-05 [Cycle 2]: 0.00065524, [45] [expand_dump_flag]: 1.92999e-06 [switch_simplify]: 7.3e-06 [loop_unroll]: 5.87001e-06 [a_1]: 0.00013754 [with_stream_mark]: 1.265e-05 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.41998e-06 [a_2]: 7.012e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.91e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.41002e-06 [merge_send_recv]: 5.59e-06 [auto_parallel]: 7.78999e-06 [parallel]: 6.98998e-06 [flash_sp]: 7.1e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 6.74999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 5.57001e-06 [get_grad_eliminate_]: 5.80002e-06 [virtual_output]: 6.25002e-06 [merge_forward]: 3.33e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 8.47e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 8.91002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 
2.02999e-06 [flash_sp_send_recv_attached]: 1.18001e-06 [receive_attached]: 1.40999e-06 [after_resolve]: 1.111e-05 [a_after_grad]: 8.67998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 7.20003e-06 [cse]: 1.396e-05 [a_3]: 3.493e-05 [py_interpret_to_execute_after_opt_a]: 1.145e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 5.834e-05 [convert_after_rewriter]: 9.51998e-06 [order_py_execute_after_rewriter]: 5.38002e-06 [mutable_eliminate]: 0.00075371 [opt_b]: 0.00020484, [1] [Cycle 1]: 0.00019695, [7] [b_1]: 0.00011572 [b_2]: 7.93001e-06 [updatestate_depend_eliminate]: 7.78001e-06 [updatestate_assign_eliminate]: 2.60997e-06 [updatestate_loads_eliminate]: 2.35002e-06 [renormalize]: 5.29981e-07 [cse]: 2.148e-05 [optimize_parallel_all_gather_comm]: 1.847e-05 [overlap_param_gather]: 5.46e-06 [cconv]: 3.161e-05 [loop_unroll]: 0.00056533 [opt_after_cconv]: 0.00010793, [1] [Cycle 1]: 0.00010068, [7] [c_1]: 2.972e-05 [parameter_eliminate]: 4.53999e-06 [updatestate_depend_eliminate]: 6.07001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.958e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.484e-05 [tuple_transform]: 7.96e-05, [1] [Cycle 1]: 7.381e-05, [4] [d_1]: 4.632e-05 [none_parameter_eliminate]: 2.28998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.59001e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 5.606e-05 [cse_after_recomputation]: 2.282e-05, [1] [Cycle 1]: 1.741e-05, [1] [cse]: 1.175e-05 [environ_conv]: 1.948e-05 [swap_dp_allreduce_reducescatter]: 6.94999e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 5.80002e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.99999e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.7e-07 
[remove_cast_before_assign_add]: 1.34998e-06 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.94001e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.29003e-06 [interleave_parallel_branches]: 1.30999e-06 [overlap_opt_shard_in_pipeline]: 2.395e-05 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.42e-05 [grouped_pairwise_exchange_alltoall]: 2.16998e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 5.32001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.74e-06 [overlap_recompute_comm]: 2.33002e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 2.365e-05 [begin_end_overlap_inline]: 6.39993e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 2.36e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 0.00014806, [1] [Cycle 1]: 0.00014283, [6] [build]: 3.42002e-06 [elim_shapecalc]: 6.997e-05 [elim_not_effective]: 1.584e-05 [opt_reshape]: 7.59002e-06 [fold_const_symbol]: 9.91e-06 [renormalize]: 4.10015e-07 [detach_backward]: 2.43998e-06 [pipeline_parallel_scheduler]: 1.91998e-06 [auto_monad_reorder]: 2.611e-05 [get_jit_bprop_graph]: 1.81e-06 [rewriter_after_jit_bprop_graph]: 0.00016392 [opt_after_jit_grad]: 0.00053997 [validate]: 6.561e-05 [backend_pass]: 9.50007e-07 [task_emit]: 3.77153 [execute]: 0.00061986 Sums bootstrap : 0.000949s : 0.02% type_inference : 0.116061s : 2.98% event_method : 0.000015s : 0.00% auto_monad : 0.000532s : 0.01% graph_reusing : 0.000004s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000023s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000048s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% 
dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000030s : 0.00% optimize.rewriter_before_opt_a : 0.000075s : 0.00% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000043s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.001157s : 0.03% optimize.opt_a.with_stream_mark : 0.000035s : 0.00% optimize.opt_a.recompute_prepare : 0.000017s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.00% optimize.opt_a.merge_send_recv : 0.000015s : 0.00% optimize.opt_a.auto_parallel : 0.000017s : 0.00% optimize.opt_a.parallel : 0.000064s : 0.00% optimize.opt_a.flash_sp : 0.000020s : 0.00% optimize.opt_a.merge_comm : 0.000022s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000020s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 
0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000014s : 0.00% optimize.opt_a.after_resolve : 0.000026s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000792s : 0.02% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.00% optimize.opt_a.cse : 0.000055s : 0.00% optimize.opt_a.a_3 : 0.000083s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000058s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000754s : 0.02% optimize.opt_b.b_1 : 0.000116s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000005s : 0.00% optimize.cconv : 0.000032s : 0.00% optimize.loop_unroll : 0.000565s : 0.01% optimize.opt_after_cconv.c_1 : 0.000030s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000046s : 0.00% 
optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.00% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000019s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000024s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 
0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000070s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000164s : 0.00% opt_after_jit_grad : 0.000540s : 0.01% validate : 0.000066s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 3.771530s : 96.81% execute : 0.000620s : 0.02% Time group info: ------[substitution.] 0.000223 30 14.40% : 0.000032s : 5: substitution.arithmetic_simplify 0.89% : 0.000002s : 2: substitution.elim_not_effective 0.59% : 0.000001s : 2: substitution.fold_const_symbol 2.64% : 0.000006s : 4: substitution.graph_param_transform 67.23% : 0.000150s : 3: substitution.inline 1.78% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.21% : 0.000005s : 4: substitution.remove_not_recompute_node 3.96% : 0.000009s : 4: substitution.replace_old_param 6.29% : 0.000014s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.116003 2 98.56% : 0.114333s : 1: type_inference.infer 1.44% : 0.001670s : 1: type_inference.specialize ------[replace.] 0.000064 5 56.19% : 0.000036s : 3: replace.inline 43.81% : 0.000028s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000160 5 92.09% : 0.000147s : 3: match.inline 7.91% : 0.000013s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000484 1131 0.37% : 0.000002s : 11: predicate.accumulaten_eliminater 0.31% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.20% : 0.000001s : 8: predicate.addn_check_dump 0.35% : 0.000002s : 11: predicate.addn_zero_filter 0.27% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 0.91% : 0.000004s : 19: predicate.arithmetic_simplify 0.56% : 0.000003s : 11: predicate.cast_eliminate 0.37% : 0.000002s : 8: predicate.check_bprop_eliminate 0.20% : 0.000001s : 8: predicate.compare_switch_simplify 0.08% : 0.000000s : 4: predicate.const_output_eliminate 0.26% : 0.000001s : 8: predicate.depend_value_elim 0.34% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.37% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.31% : 0.000002s : 11: predicate.dict_set_item_eliminator 0.54% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 4: predicate.elim_not_effective 0.28% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 60.08% : 0.000291s : 15: predicate.environ_add_const_eliminate 0.43% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.41% : 0.000002s : 15: predicate.environ_get_depend_swap 0.84% : 0.000004s : 23: predicate.environ_get_eliminate 0.48% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.49% : 0.000002s : 16: predicate.exchange_switch_depend_value 0.89% : 0.000004s : 16: predicate.float_depend_g_call 0.22% : 0.000001s : 8: predicate.float_environ_get_switch 0.37% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.08% : 0.000000s : 4: predicate.fold_const_symbol 0.35% : 0.000002s : 8: predicate.get_grad_eliminate 0.09% : 0.000000s : 4: predicate.graph_param_transform 0.22% : 0.000001s : 8: predicate.incorporate_call 0.19% : 0.000001s : 8: predicate.incorporate_call_switch 2.29% : 0.000011s : 51: predicate.inline 0.31% : 0.000001s : 8: predicate.inline_without_move 0.15% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.39% : 0.000002s : 8: predicate.less_batch_normalization 0.72% : 0.000003s 
: 21: predicate.list_to_tuple_eliminator_ 0.93% : 0.000004s : 32: predicate.load_eliminater 0.51% : 0.000002s : 4: predicate.loop_unroll_after_grad 0.72% : 0.000003s : 26: predicate.loop_unroll_before_grad 0.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.20% : 0.000001s : 8: predicate.merge_addn 0.31% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.30% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.29% : 0.000001s : 11: predicate.minmaximum_grad 0.62% : 0.000003s : 4: predicate.mutable_eliminate 0.23% : 0.000001s : 4: predicate.opt_reshape 0.19% : 0.000001s : 4: predicate.parallel_virtual_node 0.89% : 0.000004s : 16: predicate.partial_defer_inline 0.50% : 0.000002s : 17: predicate.partial_eliminate 0.37% : 0.000002s : 11: predicate.print_const_string_wrapper 0.25% : 0.000001s : 8: predicate.reduce_all_const_elim 0.60% : 0.000003s : 11: predicate.reduce_eliminate 0.95% : 0.000005s : 32: predicate.redundant_stop_gradient_eliminater 0.19% : 0.000001s : 8: predicate.remove_not_recompute_node 0.55% : 0.000003s : 21: predicate.replace_applicator 0.27% : 0.000001s : 8: predicate.replace_old_param 0.12% : 0.000001s : 4: predicate.reset_defer_inline 0.33% : 0.000002s : 11: predicate.reshape_eliminate 0.30% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.22% : 0.000001s : 4: predicate.row_tensor_eliminate 0.46% : 0.000002s : 8: predicate.same_eliminate 0.19% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.50% : 0.000002s : 8: predicate.shard_identity_eliminate 0.33% : 0.000002s : 8: predicate.special_op_eliminate 0.25% : 0.000001s : 8: predicate.specialize_transform 0.56% : 0.000003s : 8: predicate.split_environ_get_set_with_tuple_value 0.36% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.49% : 0.000002s : 16: predicate.switch_defer_inline 0.76% : 0.000004s : 24: predicate.switch_layer_defer_inline 1.74% : 0.000008s : 54: predicate.switch_simplify 0.39% 
: 0.000002s : 11: predicate.tile_eliminate 0.34% : 0.000002s : 11: predicate.transpose_eliminate 0.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 0.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 0.49% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 1.22% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 0.56% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 0.83% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 0.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 0.85% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 1.15% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.21% : 0.000001s : 4: predicate.value_based_eliminate 0.28% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.44% : 0.000002s : 8: predicate.virtual_output_eliminate 0.15% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.25% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001141 8 34.99% : 0.000399s : 3: func_graph_cloner_run.FuncGraphClonerGraph 65.01% : 0.000742s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
4.160439 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.06% : 0.127394s : 1: add_attr 3.06% : 0.127187s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000544s : 1: auto_monad 0.00% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.02% : 0.001038s : 1: bootstrap 0.00% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000024s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.02% : 0.000634s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.01% : 0.000578s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.02% : 0.000768s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000027s : 1: opt.transform.mutable_eliminate 0.04% : 0.001568s : 78: opt.transform.opt_a 0.00% : 0.000028s : 1: opt.transform.opt_after_cconv 0.00% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000097s : 28: opt.transform.opt_b 0.00% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000099s : 4: opt.transform.symbol_engine_opt 0.09% : 0.003572s : 1: opt_a 0.00% : 0.000112s : 1: opt_after_cconv 0.01% : 0.000553s : 1: opt_after_jit_grad 0.01% : 0.000209s : 1: opt_b 0.15% : 0.006178s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000028s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000009s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000053s : 1: pre_auto_parallel 0.00% : 0.000035s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.00% : 0.000018s : 1: remove_dup_value 0.01% : 0.000397s : 1: renormalize.infer 0.01% : 0.000384s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000171s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000063s : 1: rewriter_after_opt_a 0.00% : 0.000079s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000151s : 1: symbol_engine_optimizer 90.65% : 3.771580s : 1: task_emit 0.00% : 0.000082s : 1: 
tuple_transform 2.79% : 0.116078s : 1: type_inference 0.00% : 0.000101s : 1: validate TotalTime = 0.169828, [24] [bootstrap]: 0.00062035 [type_inference]: 0.0102193 [event_method]: 1.441e-05 [auto_monad]: 5.99e-05 [graph_reusing]: 6.01e-06 [inline]: 2.68e-06 [add_attr]: 0.144451, [1] [add_attr_with_inline]: 0.144438, [1] [Cycle 1]: 6.437e-05, [2] [tag_attr]: 1.722e-05 [meta_addattr_fg_expand]: 3.71999e-06 [parallel-infer-symbol]: 4.2e-06 [pre_auto_parallel]: 3.871e-05 [insert-virtual-dataset]: 2.87002e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00486752, [53] [py_interpret_to_execute]: 2.188e-05 [rewriter_before_opt_a]: 5.391e-05 [opt_a]: 0.00257403, [2] [Cycle 1]: 0.00184469, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.73e-05 [loop_unroll]: 1.419e-05 [a_1]: 0.00034651 [with_stream_mark]: 2.085e-05 [recompute_prepare]: 9.24998e-06 [updatestate_depend_eliminate]: 3.96001e-06 [updatestate_assign_eliminate]: 3.56001e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 2.01e-06 [a_2]: 8.377e-05 [accelerated_algorithm]: 6.88e-06 [shard]: 2.83e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 6.46999e-06 [merge_send_recv]: 5.462e-05 [auto_parallel]: 7.01999e-06 [parallel]: 4.483e-05 [flash_sp]: 9.34e-06 [merge_comm]: 3.98999e-06 [allreduce_fusion]: 3.79002e-06 [matmul_add_comm_reduction]: 1.197e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 8.22e-06 [virtual_dataset]: 6.89999e-06 [get_grad_eliminate_]: 6.38e-06 [virtual_output]: 6.58e-06 [merge_forward]: 4.46002e-06 [cell_reuse_recompute_pass]: 1.59e-06 [offload_activation]: 1.271e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.334e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 1.127e-05 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.71999e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 3.16001e-06 [after_resolve]: 
1.261e-05 [a_after_grad]: 1.03e-05 [renormalize]: 0.00070251 [add_forward_monad_depend]: 6.04001e-06 [auto_monad_grad]: 2.68998e-06 [auto_monad_eliminator]: 1.948e-05 [cse]: 3.082e-05 [a_3]: 4.676e-05 [Cycle 2]: 0.00071646, [45] [expand_dump_flag]: 1.65001e-06 [switch_simplify]: 7.55e-06 [loop_unroll]: 6.08998e-06 [a_1]: 0.00013705 [with_stream_mark]: 1.375e-05 [recompute_prepare]: 6.49999e-06 [updatestate_depend_eliminate]: 3.38e-06 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.37e-06 [a_2]: 7.037e-05 [accelerated_algorithm]: 6.07001e-06 [shard]: 1.67001e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.54e-06 [merge_send_recv]: 6.06e-06 [auto_parallel]: 7.16999e-06 [parallel]: 6.53003e-06 [flash_sp]: 3.79002e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.86999e-06 [matmul_add_comm_reduction]: 7.46999e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 7.03e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.17e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 2.56e-06 [offload_activation]: 8.67e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 8.89995e-07 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 1.67001e-06 [receive_attached]: 2.14999e-06 [after_resolve]: 1.112e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.71e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 7.48999e-06 [cse]: 1.737e-05 [a_3]: 3.99e-05 [py_interpret_to_execute_after_opt_a]: 1.317e-05 [slice_cell_reuse_recomputed_activation]: 2.61e-06 [rewriter_after_opt_a]: 4.147e-05 [convert_after_rewriter]: 8.37e-06 [order_py_execute_after_rewriter]: 5.87999e-06 [mutable_eliminate]: 0.00067053 [opt_b]: 0.00020779, [1] [Cycle 1]: 0.0001991, [7] [b_1]: 0.00011599 [b_2]: 7.55e-06 
[updatestate_depend_eliminate]: 8.35999e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.23998e-06 [renormalize]: 5.69999e-07 [cse]: 2.276e-05 [optimize_parallel_all_gather_comm]: 1.952e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 3.209e-05 [loop_unroll]: 0.0004701 [opt_after_cconv]: 0.00010495, [1] [Cycle 1]: 9.878e-05, [7] [c_1]: 2.951e-05 [parameter_eliminate]: 4.18999e-06 [updatestate_depend_eliminate]: 6.13998e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.97e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.438e-05 [tuple_transform]: 7.6e-05, [1] [Cycle 1]: 7.145e-05, [4] [d_1]: 4.408e-05 [none_parameter_eliminate]: 1.84998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.54999e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 5.082e-05 [cse_after_recomputation]: 2.259e-05, [1] [Cycle 1]: 1.792e-05, [1] [cse]: 1.227e-05 [environ_conv]: 5.10001e-06 [swap_dp_allreduce_reducescatter]: 5.82001e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 5.20999e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.69998e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 9.60019e-07 [full_micro_interleaved_order_control]: 2.18002e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.25001e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.356e-05 [grouped_pairwise_exchange_alltoall]: 1.61998e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.82998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.53002e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.32999e-06 [overlap_recompute_comm]: 2.84999e-06 [overlap_grad_ring_attention]: 4.65001e-06 [overlap_grad_flash_sp]: 2.108e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.59001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.47999e-06 [symbol_engine_optimizer]: 7.869e-05, [1] [Cycle 1]: 7.424e-05, [6] [build]: 4.55001e-06 [elim_shapecalc]: 1.04e-05 [elim_not_effective]: 1.38e-05 [opt_reshape]: 6.89999e-06 [fold_const_symbol]: 9.81998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.46998e-06 [pipeline_parallel_scheduler]: 2.12999e-06 [auto_monad_reorder]: 1.817e-05 [get_jit_bprop_graph]: 1.99999e-06 [rewriter_after_jit_bprop_graph]: 5.89999e-06 [opt_after_jit_grad]: 0.0005149 [validate]: 4.087e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00866824 [execute]: 9.91998e-06 Sums bootstrap : 0.000620s : 2.57% type_inference : 0.010219s : 42.29% event_method : 0.000014s : 0.06% auto_monad : 0.000060s : 0.25% graph_reusing : 0.000006s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000039s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.09% optimize.rewriter_before_opt_a : 0.000054s : 0.22% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000035s : 0.14% optimize.opt_a.loop_unroll : 0.000020s : 0.08% optimize.opt_a.a_1 : 0.000484s : 2.00% optimize.opt_a.with_stream_mark : 0.000035s : 0.14% optimize.opt_a.recompute_prepare : 0.000016s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000154s : 0.64% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.05% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000015s : 0.06% optimize.opt_a.merge_send_recv : 0.000061s : 0.25% optimize.opt_a.auto_parallel : 0.000014s : 0.06% optimize.opt_a.parallel : 0.000051s : 0.21% optimize.opt_a.flash_sp : 0.000013s : 0.05% optimize.opt_a.merge_comm : 0.000008s : 0.03% optimize.opt_a.allreduce_fusion : 0.000008s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.06% optimize.opt_a.virtual_dataset : 0.000012s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.05% optimize.opt_a.virtual_output : 0.000012s : 0.05% optimize.opt_a.merge_forward : 0.000008s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000021s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000021s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000024s : 0.10% optimize.opt_a.a_after_grad : 0.000019s : 0.08% optimize.opt_a.renormalize : 0.000703s : 2.91% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.03% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.11% optimize.opt_a.cse : 0.000048s : 0.20% optimize.opt_a.a_3 : 0.000087s : 0.36% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000041s : 0.17% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000671s : 2.77% optimize.opt_b.b_1 : 0.000116s : 0.48% optimize.opt_b.b_2 : 0.000008s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000032s : 0.13% optimize.loop_unroll : 0.000470s : 1.95% optimize.opt_after_cconv.c_1 : 0.000030s : 0.12% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.06% optimize.tuple_transform.d_1 : 0.000044s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.21% optimize.cse_after_recomputation.cse : 0.000012s : 0.05% optimize.environ_conv : 0.000005s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000005s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000006s : 0.02% opt_after_jit_grad : 0.000515s : 2.13% validate : 0.000041s : 0.17% backend_pass : 0.000001s : 0.00% task_emit : 0.008668s : 35.87% execute : 0.000010s : 0.04% Time group info: ------[substitution.] 0.000162 26 17.74% : 0.000029s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 4.10% : 0.000007s : 4: substitution.graph_param_transform 66.85% : 0.000108s : 2: substitution.inline 2.36% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.25% : 0.000005s : 4: substitution.remove_not_recompute_node 3.52% : 0.000006s : 4: substitution.replace_old_param ------[type_inference.] 0.010159 2 95.64% : 0.009716s : 1: type_inference.infer 4.36% : 0.000443s : 1: type_inference.specialize ------[replace.] 0.000025 2 100.00% : 0.000025s : 2: replace.inline ------[match.] 0.000107 2 100.00% : 0.000107s : 2: match.inline ------[predicate.] 
0.000158 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.89% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.83% : 0.000004s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.84% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.03% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.61% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.03% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.94% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.29% : 0.000002s : 13: predicate.environ_get_depend_swap 1.79% : 0.000003s : 21: predicate.environ_get_eliminate 0.95% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.86% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.00% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.60% : 0.000001s : 8: predicate.incorporate_call_switch 6.34% : 0.000010s : 44: predicate.inline 1.18% : 0.000002s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 26: predicate.load_eliminater 1.56% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.98% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.65% : 0.000003s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.14% : 0.000002s : 11: predicate.partial_defer_inline 1.10% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 9: predicate.reduce_eliminate 1.95% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.22% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.49% : 0.000001s : 4: predicate.reset_defer_inline 0.70% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.38% : 0.000002s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000002s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.94% : 0.000001s : 11: predicate.switch_defer_inline 1.64% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.27% : 0.000007s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.75% : 0.000001s : 9: predicate.transpose_eliminate 1.42% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.97% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.85% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.83% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000351 6 40.12% : 0.000141s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.88% : 0.000210s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.320887 196 0.00% : 0.000004s : 1: ForceFp32Comm 45.02% : 0.144459s : 1: add_attr 45.01% : 0.144443s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000055s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.02% : 0.000065s : 1: auto_monad 0.01% : 0.000022s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.21% : 0.000660s : 1: bootstrap 0.01% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.01% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000022s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000005s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.15% : 0.000480s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.21% : 0.000683s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 0.27% : 0.000878s : 78: opt.transform.opt_a 0.01% : 0.000028s : 1: opt.transform.opt_after_cconv 0.01% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000097s : 28: opt.transform.opt_b 0.02% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000037s : 4: opt.transform.symbol_engine_opt 0.80% : 0.002577s : 1: opt_a 0.03% : 0.000109s : 1: opt_after_cconv 0.16% : 0.000527s : 1: opt_after_jit_grad 0.07% : 0.000212s : 1: opt_b 1.52% : 0.004873s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000043s : 1: pre_auto_parallel 0.01% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000018s : 1: remove_dup_value 0.13% : 0.000424s : 1: renormalize.infer 0.08% : 0.000268s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000046s : 1: rewriter_after_opt_a 0.02% : 0.000059s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000057s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.03% : 0.000081s : 1: symbol_engine_optimizer 2.71% : 0.008692s : 1: task_emit 0.02% : 0.000079s : 1: 
tuple_transform 3.19% : 0.010252s : 1: type_inference 0.02% : 0.000079s : 1: validate TotalTime = 0.163147, [24] [bootstrap]: 0.0005936 [type_inference]: 0.00730278 [event_method]: 1.812e-05 [auto_monad]: 6.276e-05 [graph_reusing]: 6.17999e-06 [inline]: 2.63998e-06 [add_attr]: 0.00444805, [1] [add_attr_with_inline]: 0.00443388, [1] [Cycle 1]: 6.41e-05, [2] [tag_attr]: 2.001e-05 [meta_addattr_fg_expand]: 4.56002e-06 [parallel-infer-symbol]: 3.86999e-06 [pre_auto_parallel]: 3.343e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 1.01002e-06 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.99e-06 [optimize]: 0.0057435, [53] [py_interpret_to_execute]: 2.917e-05 [rewriter_before_opt_a]: 0.00012452 [opt_a]: 0.00291074, [2] [Cycle 1]: 0.00219668, [45] [expand_dump_flag]: 3.75998e-06 [switch_simplify]: 3.578e-05 [loop_unroll]: 2.22e-05 [a_1]: 0.00058207 [with_stream_mark]: 2.377e-05 [recompute_prepare]: 1.198e-05 [updatestate_depend_eliminate]: 4.69998e-06 [updatestate_assign_eliminate]: 3.47002e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 2.49999e-06 [a_2]: 8.346e-05 [accelerated_algorithm]: 7.41001e-06 [shard]: 3.5e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 6.41e-06 [merge_send_recv]: 9.20999e-06 [auto_parallel]: 9.25999e-06 [parallel]: 2.163e-05 [flash_sp]: 1.063e-05 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.9e-06 [matmul_add_comm_reduction]: 1.074e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 9.71003e-06 [virtual_dataset]: 6.88998e-06 [get_grad_eliminate_]: 6.09001e-06 [virtual_output]: 5.79999e-06 [merge_forward]: 4.23999e-06 [cell_reuse_recompute_pass]: 1.41002e-06 [offload_activation]: 1.065e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.326e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.006e-05 [set_forward_comm_id_for_comm_node_pass]: 3.93999e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 3.16001e-06 [receive_attached]: 
2.36998e-06 [after_resolve]: 1.269e-05 [a_after_grad]: 9.92001e-06 [renormalize]: 0.00083032 [add_forward_monad_depend]: 7.46001e-06 [auto_monad_grad]: 2.91e-06 [auto_monad_eliminator]: 1.751e-05 [cse]: 3.206e-05 [a_3]: 5.366e-05 [Cycle 2]: 0.00070016, [45] [expand_dump_flag]: 2.11e-06 [switch_simplify]: 8.37e-06 [loop_unroll]: 6.43e-06 [a_1]: 0.00014705 [with_stream_mark]: 1.573e-05 [recompute_prepare]: 6.63e-06 [updatestate_depend_eliminate]: 3.36001e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.145e-05 [accelerated_algorithm]: 6.12001e-06 [shard]: 3.17002e-06 [meta_shard_fg_expand]: 2.32999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.13998e-06 [auto_parallel]: 9.25999e-06 [parallel]: 7.91001e-06 [flash_sp]: 4.3e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 9.25999e-06 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 7.9e-06 [virtual_dataset]: 6.64999e-06 [get_grad_eliminate_]: 6.13002e-06 [virtual_output]: 6.24001e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 3.21999e-06 [offload_activation]: 9.26002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.185e-05 [merge_recompute_call_nodes]: 1.13001e-06 [before_grad]: 9.31e-06 [set_forward_comm_id_for_comm_node_pass]: 4e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 1.29e-06 [receive_attached]: 3.19001e-06 [after_resolve]: 1.061e-05 [a_after_grad]: 8.44998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.49998e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 7.85998e-06 [cse]: 1.639e-05 [a_3]: 3.438e-05 [py_interpret_to_execute_after_opt_a]: 1.467e-05 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 0.00014515 [convert_after_rewriter]: 5.349e-05 [order_py_execute_after_rewriter]: 6.40997e-06 [mutable_eliminate]: 0.00077588 [opt_b]: 0.00021569, [1] [Cycle 1]: 0.00020741, [7] 
[b_1]: 0.00012422 [b_2]: 8.23001e-06 [updatestate_depend_eliminate]: 6.92002e-06 [updatestate_assign_eliminate]: 2.92002e-06 [updatestate_loads_eliminate]: 2.59999e-06 [renormalize]: 1.22e-06 [cse]: 2.435e-05 [optimize_parallel_all_gather_comm]: 2.556e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 3.618e-05 [loop_unroll]: 0.00051427 [opt_after_cconv]: 0.00010711, [1] [Cycle 1]: 0.00010062, [7] [c_1]: 3.083e-05 [parameter_eliminate]: 4.83001e-06 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.51e-06 [cse]: 1.893e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.556e-05 [tuple_transform]: 8.288e-05, [1] [Cycle 1]: 7.742e-05, [4] [d_1]: 4.835e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.71999e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 5.388e-05 [cse_after_recomputation]: 2.343e-05, [1] [Cycle 1]: 1.893e-05, [1] [cse]: 1.274e-05 [environ_conv]: 5.91e-06 [swap_dp_allreduce_reducescatter]: 5.55001e-06 [bias_add_comm_swap]: 3.15998e-06 [label_micro_interleaved_index]: 4.68999e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 3.11001e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.51998e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.54e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01998e-06 [control_data_broadcast_order]: 1.359e-05 [grouped_pairwise_exchange_alltoall]: 1.89999e-06 [offloading_packed_experts]: 4.02e-06 [overlap_recompute_and_grad_model_parallel]: 5.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.48002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.31998e-06 [overlap_grad_ring_attention]: 4.82e-06 [overlap_grad_flash_sp]: 0.00012031 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 2.46e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 8.155e-05, [1] [Cycle 1]: 7.632e-05, [6] [build]: 3.91001e-06 [elim_shapecalc]: 1.083e-05 [elim_not_effective]: 1.335e-05 [opt_reshape]: 7.77e-06 [fold_const_symbol]: 9.87001e-06 [renormalize]: 2.29978e-07 [detach_backward]: 2.53998e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.782e-05 [get_jit_bprop_graph]: 2.08998e-06 [rewriter_after_jit_bprop_graph]: 6.23e-06 [opt_after_jit_grad]: 0.136731 [validate]: 5.493e-05 [backend_pass]: 1.11002e-06 [task_emit]: 0.00781307 [execute]: 9.28002e-06 Sums bootstrap : 0.000594s : 0.38% type_inference : 0.007303s : 4.64% event_method : 0.000018s : 0.01% auto_monad : 0.000063s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000033s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.02% optimize.rewriter_before_opt_a : 0.000125s : 0.08% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000044s : 0.03% optimize.opt_a.loop_unroll : 0.000029s : 0.02% optimize.opt_a.a_1 : 0.000729s : 0.46% optimize.opt_a.with_stream_mark : 0.000039s : 0.03% optimize.opt_a.recompute_prepare : 0.000019s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.10% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000016s : 0.01% optimize.opt_a.auto_parallel : 0.000019s : 0.01% optimize.opt_a.parallel : 0.000030s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000008s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000018s : 0.01% optimize.opt_a.virtual_dataset : 0.000014s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.01% optimize.opt_a.virtual_output : 0.000012s : 0.01% optimize.opt_a.merge_forward : 0.000009s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000023s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000830s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.02% optimize.opt_a.cse : 0.000048s : 0.03% optimize.opt_a.a_3 : 0.000088s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000145s : 0.09% optimize.convert_after_rewriter : 0.000053s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000776s : 0.49% optimize.opt_b.b_1 : 0.000124s : 0.08% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000036s : 0.02% optimize.loop_unroll : 0.000514s : 0.33% optimize.opt_after_cconv.c_1 : 0.000031s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.01% optimize.tuple_transform.d_1 : 0.000048s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.03% optimize.cse_after_recomputation.cse : 0.000013s : 0.01% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000120s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.136731s : 86.84% validate : 0.000055s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.007813s : 4.96% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000279 30 13.00% : 0.000036s : 5: substitution.arithmetic_simplify 0.75% : 0.000002s : 2: substitution.elim_not_effective 0.51% : 0.000001s : 2: substitution.fold_const_symbol 2.36% : 0.000007s : 4: substitution.graph_param_transform 73.33% : 0.000204s : 3: substitution.inline 1.40% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.15% : 0.000006s : 4: substitution.remove_not_recompute_node 2.07% : 0.000006s : 4: substitution.replace_old_param 4.45% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007241 2 90.74% : 0.006571s : 1: type_inference.infer 9.26% : 0.000670s : 1: type_inference.specialize ------[replace.] 0.000047 5 70.57% : 0.000033s : 3: replace.inline 29.43% : 0.000014s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000213 5 94.73% : 0.000202s : 3: match.inline 5.27% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000197 1131 0.79% : 0.000002s : 11: predicate.accumulaten_eliminater 2.60% : 0.000005s : 4: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000002s : 11: predicate.addn_zero_filter 0.73% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000002s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.49% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.76% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 11: predicate.dict_set_item_eliminator 2.60% : 0.000005s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.04% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.98% : 0.000002s : 15: predicate.environ_get_depend_swap 1.57% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.16% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.11% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.81% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000002s : 8: predicate.get_grad_eliminate 0.21% : 0.000000s : 4: predicate.graph_param_transform 0.52% : 0.000001s : 8: predicate.incorporate_call 0.45% : 0.000001s : 8: predicate.incorporate_call_switch 5.31% : 0.000010s : 51: predicate.inline 0.85% : 0.000002s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.18% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.90% : 0.000004s : 26: predicate.loop_unroll_before_grad 2.16% : 0.000004s : 19: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 8: predicate.merge_addn 0.57% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.89% : 0.000002s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 11: predicate.minmaximum_grad 1.47% : 0.000003s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.90% : 0.000004s : 16: predicate.partial_defer_inline 1.30% : 0.000003s : 17: predicate.partial_eliminate 0.70% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.22% : 0.000002s : 11: predicate.reduce_eliminate 2.04% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000001s : 8: predicate.remove_not_recompute_node 1.21% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.53% : 0.000001s : 4: predicate.reset_defer_inline 1.17% : 0.000002s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.08% : 0.000002s : 8: predicate.same_eliminate 0.42% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000002s : 8: predicate.shard_identity_eliminate 0.97% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000002s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.28% : 0.000003s : 16: predicate.switch_defer_inline 1.87% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.54% : 0.000009s : 54: predicate.switch_simplify 0.82% : 
0.000002s : 11: predicate.tile_eliminate 0.79% : 0.000002s : 11: predicate.transpose_eliminate 1.51% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.44% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.83% : 0.000004s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.29% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.47% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.17% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.74% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.98% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000002s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000456 8 43.04% : 0.000196s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.96% : 0.000259s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.175446 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.54% : 0.004455s : 1: add_attr 2.53% : 0.004438s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000069s : 1: auto_monad 0.01% : 0.000023s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.36% : 0.000628s : 1: bootstrap 0.02% : 0.000040s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000058s : 1: convert_after_rewriter 0.02% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.01% : 0.000026s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000525s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000788s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 0.65% : 0.001144s : 78: opt.transform.opt_a 0.02% : 0.000029s : 1: opt.transform.opt_after_cconv 0.03% : 0.000051s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000102s : 28: opt.transform.opt_b 0.03% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000038s : 4: opt.transform.symbol_engine_opt 1.66% : 0.002914s : 1: opt_a 0.06% : 0.000111s : 1: opt_after_cconv 77.95% : 0.136758s : 1: opt_after_jit_grad 0.13% : 0.000219s : 1: opt_b 3.28% : 0.005749s : 1: optimize 0.02% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.07% : 0.000125s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000038s : 1: pre_auto_parallel 0.02% : 0.000037s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000019s : 1: remove_dup_value 0.26% : 0.000453s : 1: renormalize.infer 0.21% : 0.000366s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000151s : 1: rewriter_after_opt_a 0.08% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000084s : 1: symbol_engine_optimizer 4.47% : 0.007834s : 1: task_emit 0.05% : 0.000086s : 1: 
tuple_transform 4.18% : 0.007330s : 1: type_inference 0.06% : 0.000104s : 1: validate TotalTime = 0.91665, [24] [bootstrap]: 0.00068391 [type_inference]: 0.329955 [event_method]: 0.00011503 [auto_monad]: 0.00013499 [graph_reusing]: 9.26998e-06 [inline]: 2.74999e-06 [add_attr]: 0.015672, [1] [add_attr_with_inline]: 0.0156554, [1] [Cycle 1]: 0.00010793, [2] [tag_attr]: 5.37e-05 [meta_addattr_fg_expand]: 1.051e-05 [parallel-infer-symbol]: 4.50001e-06 [pre_auto_parallel]: 6.216e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 3.11001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.176289, [53] [py_interpret_to_execute]: 4.379e-05 [rewriter_before_opt_a]: 0.00017645 [opt_a]: 0.173525, [3] [Cycle 1]: 0.168943, [45] [expand_dump_flag]: 6.63e-06 [switch_simplify]: 8.057e-05 [loop_unroll]: 6.517e-05 [a_1]: 0.00201275 [with_stream_mark]: 3.663e-05 [recompute_prepare]: 3.112e-05 [updatestate_depend_eliminate]: 1.062e-05 [updatestate_assign_eliminate]: 8.38999e-06 [updatestate_loads_eliminate]: 7.59002e-06 [parameter_eliminate]: 3.77998e-06 [a_2]: 0.0002598 [accelerated_algorithm]: 4.08e-05 [shard]: 3.11001e-06 [meta_shard_fg_expand]: 5.24998e-06 [shard_inline]: 1.75e-05 [merge_send_recv]: 2.218e-05 [auto_parallel]: 1.803e-05 [parallel]: 2.286e-05 [flash_sp]: 1.634e-05 [merge_comm]: 1.104e-05 [allreduce_fusion]: 9.11998e-06 [matmul_add_comm_reduction]: 3.639e-05 [allreduce_slice_to_reducescatter]: 8.29983e-07 [virtual_shard_identity]: 2.344e-05 [virtual_dataset]: 1.674e-05 [get_grad_eliminate_]: 1.604e-05 [virtual_output]: 1.546e-05 [merge_forward]: 1.211e-05 [cell_reuse_recompute_pass]: 2.17001e-06 [offload_activation]: 2.05e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.368e-05 [merge_recompute_call_nodes]: 1.76e-06 [before_grad]: 4.369e-05 [set_forward_comm_id_for_comm_node_pass]: 1.404e-05 [meta_fg_expand]: 0.00275495 [flash_sp_send_recv_attached]: 6.01e-06 [receive_attached]: 2.71999e-06 [after_resolve]: 8.724e-05 
[a_after_grad]: 0.00010447 [renormalize]: 0.161922 [add_forward_monad_depend]: 1.187e-05 [auto_monad_grad]: 7.1e-06 [auto_monad_eliminator]: 5.918e-05 [cse]: 0.00028576 [a_3]: 0.00037462 [Cycle 2]: 0.00364517, [45] [expand_dump_flag]: 3.63e-06 [switch_simplify]: 5.165e-05 [loop_unroll]: 4.853e-05 [a_1]: 0.00170786 [with_stream_mark]: 1.728e-05 [recompute_prepare]: 1.451e-05 [updatestate_depend_eliminate]: 5.93002e-06 [updatestate_assign_eliminate]: 4.74e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 2.56e-06 [a_2]: 0.00012094 [accelerated_algorithm]: 1.462e-05 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 2.91e-06 [shard_inline]: 8.83001e-06 [merge_send_recv]: 1.129e-05 [auto_parallel]: 1.182e-05 [parallel]: 1.028e-05 [flash_sp]: 4.86002e-06 [merge_comm]: 5.35999e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 1.138e-05 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 1.226e-05 [virtual_dataset]: 8.47e-06 [get_grad_eliminate_]: 8.19002e-06 [virtual_output]: 7.83999e-06 [merge_forward]: 5.12999e-06 [cell_reuse_recompute_pass]: 1.94999e-06 [offload_activation]: 1.41e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.567e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 1.369e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07e-06 [meta_fg_expand]: 0.00013564 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 3.23e-06 [after_resolve]: 2.058e-05 [a_after_grad]: 1.364e-05 [renormalize]: 0.00085774 [add_forward_monad_depend]: 7.87e-06 [auto_monad_grad]: 2.88998e-06 [auto_monad_eliminator]: 2.074e-05 [cse]: 3.94e-05 [a_3]: 8.449e-05 [Cycle 3]: 0.00091339, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 1.139e-05 [loop_unroll]: 8.50001e-06 [a_1]: 0.00022861 [with_stream_mark]: 1.409e-05 [recompute_prepare]: 9.23002e-06 [updatestate_depend_eliminate]: 4.63001e-06 [updatestate_assign_eliminate]: 3.77002e-06 [updatestate_loads_eliminate]: 4.02998e-06 [parameter_eliminate]: 1.55999e-06 
[a_2]: 0.00011 [accelerated_algorithm]: 1.327e-05 [shard]: 1.94999e-06 [meta_shard_fg_expand]: 2.00002e-06 [shard_inline]: 8.66002e-06 [merge_send_recv]: 9.19e-06 [auto_parallel]: 1.073e-05 [parallel]: 9.94001e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 4.65001e-06 [allreduce_fusion]: 4.36002e-06 [matmul_add_comm_reduction]: 1.112e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 1.002e-05 [virtual_dataset]: 8.42e-06 [get_grad_eliminate_]: 8.22e-06 [virtual_output]: 7.36999e-06 [merge_forward]: 4.68999e-06 [cell_reuse_recompute_pass]: 2.23998e-06 [offload_activation]: 1.23e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.58e-05 [merge_recompute_call_nodes]: 1.15001e-06 [before_grad]: 1.336e-05 [set_forward_comm_id_for_comm_node_pass]: 4.98001e-06 [meta_fg_expand]: 3.13998e-06 [flash_sp_send_recv_attached]: 2.09999e-06 [receive_attached]: 2.20002e-06 [after_resolve]: 1.408e-05 [a_after_grad]: 1.355e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.45001e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.223e-05 [cse]: 2.782e-05 [a_3]: 5.274e-05 [py_interpret_to_execute_after_opt_a]: 1.851e-05 [slice_cell_reuse_recomputed_activation]: 2.51e-06 [rewriter_after_opt_a]: 5.361e-05 [convert_after_rewriter]: 9.51e-06 [order_py_execute_after_rewriter]: 6.83e-06 [mutable_eliminate]: 0.00076465 [opt_b]: 0.00027834, [1] [Cycle 1]: 0.0002699, [7] [b_1]: 0.00016981 [b_2]: 1.012e-05 [updatestate_depend_eliminate]: 1.008e-05 [updatestate_assign_eliminate]: 3.60998e-06 [updatestate_loads_eliminate]: 4.27998e-06 [renormalize]: 6.19999e-07 [cse]: 3.424e-05 [optimize_parallel_all_gather_comm]: 2.159e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 3.676e-05 [loop_unroll]: 0.00048763 [opt_after_cconv]: 0.00013528, [1] [Cycle 1]: 0.00012778, [7] [c_1]: 4.54e-05 [parameter_eliminate]: 5.64e-06 [updatestate_depend_eliminate]: 7.46001e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.18998e-06 [cse]: 2.734e-05 
[renormalize]: 5.3001e-07 [remove_dup_value]: 4.397e-05 [tuple_transform]: 0.000102, [1] [Cycle 1]: 9.724e-05, [4] [d_1]: 6.699e-05 [none_parameter_eliminate]: 1.99e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.29e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 7.053e-05 [cse_after_recomputation]: 3.018e-05, [1] [Cycle 1]: 2.482e-05, [1] [cse]: 1.888e-05 [environ_conv]: 1.137e-05 [swap_dp_allreduce_reducescatter]: 7.02002e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 5.34e-06 [label_fine_grained_interleaved_index]: 3.05998e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.66e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 5.22999e-06 [overlap_recompute_and_grad_model_parallel]: 5.96998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37e-06 [overlap_recompute_allgather_and_fa_grad]: 1.71e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.81e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 0.00010267, [1] [Cycle 1]: 9.819e-05, [6] [build]: 1.316e-05 [elim_shapecalc]: 1.406e-05 [elim_not_effective]: 1.722e-05 [opt_reshape]: 1.022e-05 [fold_const_symbol]: 1.365e-05 [renormalize]: 1.80007e-07 [detach_backward]: 2.61e-06 [pipeline_parallel_scheduler]: 
1.57999e-06 [auto_monad_reorder]: 2.444e-05 [get_jit_bprop_graph]: 2.28998e-06 [rewriter_after_jit_bprop_graph]: 5.82999e-06 [opt_after_jit_grad]: 0.00061963 [validate]: 0.00014514 [backend_pass]: 1.06002e-06 [task_emit]: 0.39217 [execute]: 9.22999e-06 Sums bootstrap : 0.000684s : 0.08% type_inference : 0.329955s : 36.70% event_method : 0.000115s : 0.01% auto_monad : 0.000135s : 0.02% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000054s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.000062s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000044s : 0.00% optimize.rewriter_before_opt_a : 0.000176s : 0.02% optimize.opt_a.expand_dump_flag : 0.000013s : 0.00% optimize.opt_a.switch_simplify : 0.000144s : 0.02% optimize.opt_a.loop_unroll : 0.000122s : 0.01% optimize.opt_a.a_1 : 0.003949s : 0.44% optimize.opt_a.with_stream_mark : 0.000068s : 0.01% optimize.opt_a.recompute_prepare : 0.000055s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000008s : 0.00% optimize.opt_a.a_2 : 0.000491s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000069s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000043s : 0.00% optimize.opt_a.auto_parallel : 0.000041s : 0.00% optimize.opt_a.parallel : 0.000043s : 0.00% optimize.opt_a.flash_sp : 0.000022s : 0.00% optimize.opt_a.merge_comm : 0.000021s : 0.00% optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% 
optimize.opt_a.matmul_add_comm_reduction : 0.000059s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000046s : 0.01% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.00% optimize.opt_a.virtual_output : 0.000031s : 0.00% optimize.opt_a.merge_forward : 0.000022s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000047s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000071s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000024s : 0.00% optimize.opt_a.meta_fg_expand : 0.002894s : 0.32% optimize.opt_a.flash_sp_send_recv_attached : 0.000010s : 0.00% optimize.opt_a.receive_attached : 0.000008s : 0.00% optimize.opt_a.after_resolve : 0.000122s : 0.01% optimize.opt_a.a_after_grad : 0.000132s : 0.01% optimize.opt_a.renormalize : 0.162780s : 18.11% optimize.opt_a.add_forward_monad_depend : 0.000021s : 0.00% optimize.opt_a.auto_monad_grad : 0.000012s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.01% optimize.opt_a.cse : 0.000353s : 0.04% optimize.opt_a.a_3 : 0.000512s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000054s : 0.01% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000765s : 0.09% optimize.opt_b.b_1 : 0.000170s : 0.02% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% 
optimize.opt_b.cse : 0.000034s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.00% optimize.loop_unroll : 0.000488s : 0.05% optimize.opt_after_cconv.c_1 : 0.000045s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000027s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000044s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000071s : 0.01% optimize.cse_after_recomputation.cse : 0.000019s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000620s : 0.07% validate : 0.000145s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.392170s : 43.62% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.001333 213 5.19% : 0.000069s : 12: substitution.arithmetic_simplify 0.19% : 0.000003s : 4: substitution.elim_not_effective 0.40% : 0.000005s : 5: substitution.float_depend_g_call 0.40% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.17% : 0.000002s : 4: substitution.fold_const_symbol 0.60% : 0.000008s : 7: substitution.graph_param_transform 0.26% : 0.000003s : 2: substitution.incorporate_call 0.16% : 0.000002s : 2: substitution.incorporate_call_switch 64.39% : 0.000858s : 17: substitution.inline 2.23% : 0.000030s : 2: substitution.inline_without_move 1.84% : 0.000024s : 18: substitution.j_node_and_user_rematch 1.65% : 0.000022s : 3: substitution.less_batch_normalization 1.22% : 0.000016s : 11: substitution.minmaximum_grad 2.71% : 0.000036s : 5: substitution.partial_eliminate 1.03% : 0.000014s : 18: substitution.remove_not_recompute_node 2.52% : 0.000034s : 10: substitution.replace_applicator 1.23% : 0.000016s : 15: substitution.replace_old_param 0.39% : 0.000005s : 1: substitution.set_cell_output_no_recompute 2.55% : 0.000034s : 11: substitution.tuple_list_convert_item_index_to_positive 1.12% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.59% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 6.57% : 0.000088s : 30: substitution.tuple_list_get_item_eliminator 1.60% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.329854 2 99.38% : 0.327807s : 1: type_inference.infer 0.62% : 0.002047s : 1: type_inference.specialize ------[replace.] 0.000291 33 61.72% : 0.000180s : 17: replace.inline 38.28% : 0.000111s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000892 33 94.91% : 0.000847s : 17: match.inline 5.09% : 0.000045s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000817 5530 1.17% : 0.000010s : 66: predicate.accumulaten_eliminater 0.35% : 0.000003s : 7: predicate.ad_related_special_op_eliminate 0.46% : 0.000004s : 30: predicate.addn_check_dump 1.10% : 0.000009s : 66: predicate.addn_zero_filter 1.02% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 2.12% : 0.000017s : 96: predicate.arithmetic_simplify 1.10% : 0.000009s : 66: predicate.cast_eliminate 1.08% : 0.000009s : 65: predicate.check_bprop_eliminate 0.57% : 0.000005s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.46% : 0.000004s : 30: predicate.depend_value_elim 1.13% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.16% : 0.000010s : 66: predicate.dict_get_item_eliminator 1.06% : 0.000009s : 66: predicate.dict_set_item_eliminator 0.43% : 0.000004s : 14: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 7: predicate.elim_not_effective 0.17% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000010s : 73: predicate.environ_add_const_eliminate 1.17% : 0.000010s : 73: predicate.environ_get_add_eliminate 1.09% : 0.000009s : 73: predicate.environ_get_depend_swap 3.67% : 0.000030s : 103: predicate.environ_get_eliminate 1.13% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.32% : 0.000019s : 99: predicate.float_depend_g_call 0.48% : 0.000004s : 30: predicate.float_environ_get_switch 0.58% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.06% : 0.000001s : 7: predicate.fold_const_symbol 0.56% : 0.000005s : 30: predicate.get_grad_eliminate 0.08% : 0.000001s : 7: predicate.graph_param_transform 0.49% : 0.000004s : 30: predicate.incorporate_call 0.41% : 0.000003s : 30: predicate.incorporate_call_switch 5.25% : 0.000043s : 239: predicate.inline 1.27% : 0.000010s : 53: predicate.inline_without_move 0.26% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.73% : 0.000006s : 30: predicate.less_batch_normalization 1.61% 
: 0.000013s : 96: predicate.list_to_tuple_eliminator_ 2.47% : 0.000020s : 162: predicate.load_eliminater 0.38% : 0.000003s : 7: predicate.loop_unroll_after_grad 2.25% : 0.000018s : 134: predicate.loop_unroll_before_grad 1.35% : 0.000011s : 80: predicate.make_slice_get_slice_eliminator 0.49% : 0.000004s : 30: predicate.merge_addn 1.05% : 0.000009s : 65: predicate.micro_step_allgather_replace 1.10% : 0.000009s : 65: predicate.mini_step_allgather_replace 1.10% : 0.000009s : 66: predicate.minmaximum_grad 0.44% : 0.000004s : 7: predicate.mutable_eliminate 0.19% : 0.000002s : 7: predicate.opt_reshape 0.15% : 0.000001s : 7: predicate.parallel_virtual_node 2.37% : 0.000019s : 99: predicate.partial_defer_inline 1.61% : 0.000013s : 89: predicate.partial_eliminate 1.11% : 0.000009s : 66: predicate.print_const_string_wrapper 0.56% : 0.000005s : 30: predicate.reduce_all_const_elim 1.34% : 0.000011s : 66: predicate.reduce_eliminate 2.49% : 0.000020s : 162: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 30: predicate.remove_not_recompute_node 1.87% : 0.000015s : 147: predicate.replace_applicator 0.68% : 0.000006s : 53: predicate.replace_old_param 0.16% : 0.000001s : 7: predicate.reset_defer_inline 1.19% : 0.000010s : 66: predicate.reshape_eliminate 1.13% : 0.000009s : 65: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 7: predicate.row_tensor_eliminate 1.32% : 0.000011s : 65: predicate.same_eliminate 0.35% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.71% : 0.000006s : 30: predicate.shard_identity_eliminate 0.26% : 0.000002s : 14: predicate.special_op_eliminate 0.52% : 0.000004s : 30: predicate.specialize_transform 1.40% : 0.000011s : 65: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000010s : 53: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.79% : 0.000015s : 99: predicate.switch_defer_inline 2.76% : 0.000023s : 164: predicate.switch_layer_defer_inline 4.84% : 0.000040s : 270: 
predicate.switch_simplify 1.10% : 0.000009s : 66: predicate.tile_eliminate 1.16% : 0.000009s : 66: predicate.transpose_eliminate 1.43% : 0.000012s : 80: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000013s : 80: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000011s : 80: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000024s : 126: predicate.tuple_list_get_item_eliminator 1.37% : 0.000011s : 80: predicate.tuple_list_get_set_item_eliminator 1.82% : 0.000015s : 110: predicate.tuple_list_set_item_eliminator 1.57% : 0.000013s : 96: predicate.tuple_to_list_eliminator_ 2.42% : 0.000020s : 162: predicate.updatestate_pure_node_eliminater 3.03% : 0.000025s : 192: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 7: predicate.value_based_eliminate 0.60% : 0.000005s : 30: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 30: predicate.virtual_output_eliminate 0.11% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002778 34 65.69% : 0.001825s : 13: func_graph_cloner_run.FuncGraphClonerGraph 34.31% : 0.000953s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.277116 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.23% : 0.015678s : 1: add_attr 1.23% : 0.015661s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000075s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000143s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000722s : 1: bootstrap 0.00% : 0.000041s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000033s : 1: cse_after_recomputation 0.00% : 0.000011s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.01% : 0.000126s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.04% : 0.000499s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000781s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000023s : 1: opt.transform.mutable_eliminate 0.45% : 0.005747s : 117: opt.transform.opt_a 0.00% : 0.000044s : 1: opt.transform.opt_after_cconv 0.00% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000152s : 28: opt.transform.opt_b 0.01% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000051s : 4: opt.transform.symbol_engine_opt 13.59% : 0.173528s : 1: opt_a 0.01% : 0.000139s : 1: opt_after_cconv 0.06% : 0.000781s : 1: opt_after_jit_grad 0.02% : 0.000282s : 1: opt_b 13.80% : 0.176294s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000068s : 1: pre_auto_parallel 0.00% : 0.000051s : 1: py_interpret_to_execute 0.00% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000048s : 1: remove_dup_value 12.59% : 0.160822s : 2: renormalize.infer 0.15% : 0.001936s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000059s : 1: rewriter_after_opt_a 0.01% : 0.000186s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000105s : 1: symbol_engine_optimizer 30.71% : 0.392194s : 1: task_emit 0.01% : 0.000105s : 1: 
tuple_transform 25.84% : 0.329979s : 1: type_inference 0.02% : 0.000197s : 1: validate TotalTime = 0.349734, [24] [bootstrap]: 0.000415 [type_inference]: 0.210725 [event_method]: 1.369e-05 [auto_monad]: 5.597e-05 [graph_reusing]: 5.62001e-06 [inline]: 2.34999e-06 [add_attr]: 0.00389255, [1] [add_attr_with_inline]: 0.00387989, [1] [Cycle 1]: 6.588e-05, [2] [tag_attr]: 1.786e-05 [meta_addattr_fg_expand]: 3.4e-06 [parallel-infer-symbol]: 4.20999e-06 [pre_auto_parallel]: 3.539e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 9.5999e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00495594, [53] [py_interpret_to_execute]: 2.39e-05 [rewriter_before_opt_a]: 5.438e-05 [opt_a]: 0.00258271, [2] [Cycle 1]: 0.00188232, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 2.581e-05 [loop_unroll]: 1.426e-05 [a_1]: 0.00034745 [with_stream_mark]: 2.055e-05 [recompute_prepare]: 1.004e-05 [updatestate_depend_eliminate]: 3.80998e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 8.276e-05 [accelerated_algorithm]: 6.76999e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.77999e-06 [auto_parallel]: 6.78e-06 [parallel]: 2.169e-05 [flash_sp]: 1.092e-05 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 1.143e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.8e-06 [virtual_dataset]: 6.36998e-06 [get_grad_eliminate_]: 5.96e-06 [virtual_output]: 6.01998e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 1.263e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.29e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 9.94999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 2.78003e-06 [receive_attached]: 3.06001e-06 [after_resolve]: 
1.163e-05 [a_after_grad]: 9.41998e-06 [renormalize]: 0.00080811 [add_forward_monad_depend]: 8.36002e-06 [auto_monad_grad]: 2.71999e-06 [auto_monad_eliminator]: 1.699e-05 [cse]: 3.377e-05 [a_3]: 5.343e-05 [Cycle 2]: 0.00068633, [45] [expand_dump_flag]: 2.11998e-06 [switch_simplify]: 7.56001e-06 [loop_unroll]: 5.97999e-06 [a_1]: 0.00014439 [with_stream_mark]: 1.696e-05 [recompute_prepare]: 7.05e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 2.84999e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.26002e-06 [a_2]: 7.225e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 8.16002e-06 [auto_parallel]: 8.97999e-06 [parallel]: 7.75e-06 [flash_sp]: 3.9e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.90998e-06 [matmul_add_comm_reduction]: 7.87003e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 5.74999e-06 [virtual_output]: 5.64998e-06 [merge_forward]: 4.60001e-06 [cell_reuse_recompute_pass]: 3.04999e-06 [offload_activation]: 9.32999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.179e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.33002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 1.62001e-06 [receive_attached]: 1.60001e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.82001e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 7.65998e-06 [cse]: 1.525e-05 [a_3]: 3.32e-05 [py_interpret_to_execute_after_opt_a]: 1.518e-05 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.887e-05 [convert_after_rewriter]: 8.03001e-06 [order_py_execute_after_rewriter]: 5.59998e-06 [mutable_eliminate]: 0.00075489 [opt_b]: 0.00020889, [1] [Cycle 1]: 0.00020008, [7] [b_1]: 0.00011682 
[b_2]: 7.84002e-06 [updatestate_depend_eliminate]: 9.50001e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 6.09987e-07 [cse]: 2.435e-05 [optimize_parallel_all_gather_comm]: 1.843e-05 [overlap_param_gather]: 2.29999e-06 [cconv]: 3.35e-05 [loop_unroll]: 0.00050102 [opt_after_cconv]: 0.00010518, [1] [Cycle 1]: 9.858e-05, [7] [c_1]: 3.034e-05 [parameter_eliminate]: 4.27e-06 [updatestate_depend_eliminate]: 5.98002e-06 [updatestate_assign_eliminate]: 2.60997e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.829e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.465e-05 [tuple_transform]: 7.983e-05, [1] [Cycle 1]: 7.5e-05, [4] [d_1]: 4.745e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.88e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 5.405e-05 [cse_after_recomputation]: 2.279e-05, [1] [Cycle 1]: 1.771e-05, [1] [cse]: 1.22e-05 [environ_conv]: 6.24001e-06 [swap_dp_allreduce_reducescatter]: 5.82999e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 5.52999e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.85002e-06 [micro_interleaved_order_control]: 2.62001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.60001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.389e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 4.64998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.57001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.3e-06 [overlap_grad_flash_sp]: 2.138e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 7.795e-05, [1] [Cycle 1]: 7.301e-05, [6] [build]: 3.55e-06 [elim_shapecalc]: 9.98002e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 6.96999e-06 [fold_const_symbol]: 9.51e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.29999e-06 [pipeline_parallel_scheduler]: 1.94999e-06 [auto_monad_reorder]: 1.828e-05 [get_jit_bprop_graph]: 2.80002e-06 [rewriter_after_jit_bprop_graph]: 6.27001e-06 [opt_after_jit_grad]: 0.00058012 [validate]: 4.651e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.128693 [execute]: 9.49e-06 Sums bootstrap : 0.000415s : 0.12% type_inference : 0.210725s : 61.13% event_method : 0.000014s : 0.00% auto_monad : 0.000056s : 0.02% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000035s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.01% optimize.rewriter_before_opt_a : 0.000054s : 0.02% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.01% optimize.opt_a.loop_unroll : 0.000020s : 0.01% optimize.opt_a.a_1 : 0.000492s : 0.14% optimize.opt_a.with_stream_mark : 0.000038s : 0.01% optimize.opt_a.recompute_prepare : 0.000017s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000016s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.01% optimize.opt_a.flash_sp : 0.000015s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000009s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000022s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000808s : 0.23% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.01% optimize.opt_a.cse : 0.000049s : 0.01% optimize.opt_a.a_3 : 0.000087s : 0.03% 
optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000755s : 0.22% optimize.opt_b.b_1 : 0.000117s : 0.03% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.01% optimize.loop_unroll : 0.000501s : 0.15% optimize.opt_after_cconv.c_1 : 0.000030s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000047s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.02% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.01% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000580s : 0.17% validate : 0.000047s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.128693s : 37.33% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000169 26 18.43% : 0.000031s : 4: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 4.05% : 0.000007s : 4: substitution.graph_param_transform 66.67% : 0.000113s : 2: substitution.inline 2.43% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.41% : 0.000006s : 4: substitution.remove_not_recompute_node 3.10% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.210669 2 99.81% : 0.210266s : 1: type_inference.infer 0.19% : 0.000403s : 1: type_inference.specialize ------[replace.] 0.000025 2 100.00% : 0.000025s : 2: replace.inline ------[match.] 0.000111 2 100.00% : 0.000111s : 2: match.inline ------[predicate.] 
0.000165 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.45% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 1.12% : 0.000002s : 9: predicate.addn_zero_filter 0.59% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000004s : 17: predicate.arithmetic_simplify 0.97% : 0.000002s : 9: predicate.cast_eliminate 0.90% : 0.000001s : 8: predicate.check_bprop_eliminate 0.83% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.77% : 0.000001s : 8: predicate.depend_value_elim 0.92% : 0.000002s : 9: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.59% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.65% : 0.000003s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_depend_swap 1.57% : 0.000003s : 21: predicate.environ_get_eliminate 0.90% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.88% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000003s : 11: predicate.float_depend_g_call 0.73% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.36% : 0.000001s : 4: predicate.fold_const_symbol 0.97% : 0.000002s : 8: predicate.get_grad_eliminate 0.40% : 0.000001s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.63% : 0.000009s : 44: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.28% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000004s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.47% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.60% : 0.000001s : 9: predicate.minmaximum_grad 1.66% : 0.000003s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.11% : 0.000002s : 11: predicate.partial_defer_inline 1.02% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.88% : 0.000001s : 8: predicate.reduce_all_const_elim 1.35% : 0.000002s : 9: predicate.reduce_eliminate 2.18% : 0.000004s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 17: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.56% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 4: predicate.row_tensor_eliminate 1.08% : 0.000002s : 8: predicate.same_eliminate 0.68% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.28% : 0.000002s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.45% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.91% : 0.000002s : 11: predicate.switch_defer_inline 1.55% : 0.000003s : 19: predicate.switch_layer_defer_inline 3.90% : 0.000006s : 41: predicate.switch_simplify 1.37% : 
0.000002s : 9: predicate.tile_eliminate 0.70% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000003s : 17: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.19% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 1.81% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.72% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 1.16% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000411 6 49.34% : 0.000203s : 2: func_graph_cloner_run.FuncGraphClonerGraph 50.66% : 0.000208s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.360440 196 0.00% : 0.000003s : 1: ForceFp32Comm 1.08% : 0.003900s : 1: add_attr 1.08% : 0.003884s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000061s : 1: auto_monad 0.01% : 0.000022s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.13% : 0.000451s : 1: bootstrap 0.01% : 0.000037s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000007s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.14% : 0.000513s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.21% : 0.000769s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000019s : 1: opt.transform.mutable_eliminate 0.24% : 0.000878s : 78: opt.transform.opt_a 0.01% : 0.000028s : 1: opt.transform.opt_after_cconv 0.01% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000096s : 28: opt.transform.opt_b 0.01% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.72% : 0.002586s : 1: opt_a 0.03% : 0.000109s : 1: opt_after_cconv 0.16% : 0.000593s : 1: opt_after_jit_grad 0.06% : 0.000212s : 1: opt_b 1.38% : 0.004961s : 1: optimize 0.01% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000040s : 1: pre_auto_parallel 0.01% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000019s : 1: remove_dup_value 0.14% : 0.000499s : 1: renormalize.infer 0.08% : 0.000299s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000043s : 1: rewriter_after_opt_a 0.02% : 0.000059s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000081s : 1: symbol_engine_optimizer 35.71% : 0.128716s : 1: task_emit 0.02% : 0.000083s : 1: 
tuple_transform 58.47% : 0.210749s : 1: type_inference 0.02% : 0.000089s : 1: validate TotalTime = 0.392556, [24] [bootstrap]: 0.00052917 [type_inference]: 0.113562 [event_method]: 4.994e-05 [auto_monad]: 0.00013342 [graph_reusing]: 8.79e-06 [inline]: 3.09001e-06 [add_attr]: 0.00389812, [1] [add_attr_with_inline]: 0.00388417, [1] [Cycle 1]: 8.9e-05, [2] [tag_attr]: 3.986e-05 [meta_addattr_fg_expand]: 8.84e-06 [parallel-infer-symbol]: 3.76999e-06 [pre_auto_parallel]: 5.432e-05 [insert-virtual-dataset]: 2.74999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.135962, [53] [py_interpret_to_execute]: 4.116e-05 [rewriter_before_opt_a]: 0.0001506 [opt_a]: 0.132, [3] [Cycle 1]: 0.127165, [45] [expand_dump_flag]: 5.10999e-06 [switch_simplify]: 6.926e-05 [loop_unroll]: 5.578e-05 [a_1]: 0.00148144 [with_stream_mark]: 3.572e-05 [recompute_prepare]: 3.276e-05 [updatestate_depend_eliminate]: 2.491e-05 [updatestate_assign_eliminate]: 8.57e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 3.92002e-06 [a_2]: 0.00028373 [accelerated_algorithm]: 3.634e-05 [shard]: 2.09e-06 [meta_shard_fg_expand]: 5.12e-06 [shard_inline]: 1.671e-05 [merge_send_recv]: 1.95e-05 [auto_parallel]: 1.507e-05 [parallel]: 2.179e-05 [flash_sp]: 1.677e-05 [merge_comm]: 1.017e-05 [allreduce_fusion]: 8.82999e-06 [matmul_add_comm_reduction]: 3.564e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 1.966e-05 [virtual_dataset]: 1.603e-05 [get_grad_eliminate_]: 1.579e-05 [virtual_output]: 1.57e-05 [merge_forward]: 1.104e-05 [cell_reuse_recompute_pass]: 2.23002e-06 [offload_activation]: 1.921e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.361e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 2.943e-05 [set_forward_comm_id_for_comm_node_pass]: 1.012e-05 [meta_fg_expand]: 0.00172667 [flash_sp_send_recv_attached]: 4.83001e-06 [receive_attached]: 2.89999e-06 [after_resolve]: 6.929e-05 
[a_after_grad]: 9.171e-05 [renormalize]: 0.121781 [add_forward_monad_depend]: 2.375e-05 [auto_monad_grad]: 9.07999e-06 [auto_monad_eliminator]: 7.564e-05 [cse]: 0.000223 [a_3]: 0.00040298 [Cycle 2]: 0.00389467, [45] [expand_dump_flag]: 3.43e-06 [switch_simplify]: 5.09e-05 [loop_unroll]: 4.413e-05 [a_1]: 0.00172031 [with_stream_mark]: 3.003e-05 [recompute_prepare]: 1.433e-05 [updatestate_depend_eliminate]: 5.84e-06 [updatestate_assign_eliminate]: 4.71002e-06 [updatestate_loads_eliminate]: 4.09997e-06 [parameter_eliminate]: 2.78e-06 [a_2]: 0.00011855 [accelerated_algorithm]: 1.301e-05 [shard]: 2.71e-06 [meta_shard_fg_expand]: 3.11999e-06 [shard_inline]: 8.85001e-06 [merge_send_recv]: 1.132e-05 [auto_parallel]: 1.248e-05 [parallel]: 1.247e-05 [flash_sp]: 4.60001e-06 [merge_comm]: 5.56e-06 [allreduce_fusion]: 4.61002e-06 [matmul_add_comm_reduction]: 1.27e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.05e-05 [virtual_dataset]: 8.66002e-06 [get_grad_eliminate_]: 7.83001e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 5.70001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.33e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.95e-05 [merge_recompute_call_nodes]: 1.79e-06 [before_grad]: 3.399e-05 [set_forward_comm_id_for_comm_node_pass]: 6.34001e-06 [meta_fg_expand]: 7.661e-05 [flash_sp_send_recv_attached]: 2.22001e-06 [receive_attached]: 3.35e-06 [after_resolve]: 1.748e-05 [a_after_grad]: 1.404e-05 [renormalize]: 0.001123 [add_forward_monad_depend]: 8.80001e-06 [auto_monad_grad]: 2.79001e-06 [auto_monad_eliminator]: 2.174e-05 [cse]: 4.25e-05 [a_3]: 6.92e-05 [Cycle 3]: 0.00091479, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 1.017e-05 [loop_unroll]: 8.35001e-06 [a_1]: 0.00022939 [with_stream_mark]: 1.505e-05 [recompute_prepare]: 8.72e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 3.73001e-06 [updatestate_loads_eliminate]: 3.96001e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 
0.0001101 [accelerated_algorithm]: 1.251e-05 [shard]: 2.14e-06 [meta_shard_fg_expand]: 2.37999e-06 [shard_inline]: 8.40001e-06 [merge_send_recv]: 9.33997e-06 [auto_parallel]: 1.071e-05 [parallel]: 9.74999e-06 [flash_sp]: 1.45999e-06 [merge_comm]: 5.10999e-06 [allreduce_fusion]: 4.33999e-06 [matmul_add_comm_reduction]: 1.024e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.044e-05 [virtual_dataset]: 7.98999e-06 [get_grad_eliminate_]: 7.83001e-06 [virtual_output]: 8.18999e-06 [merge_forward]: 5.46998e-06 [cell_reuse_recompute_pass]: 3.03e-06 [offload_activation]: 1.26e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.65e-05 [merge_recompute_call_nodes]: 1.27e-06 [before_grad]: 1.365e-05 [set_forward_comm_id_for_comm_node_pass]: 4.77998e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 1.67999e-06 [receive_attached]: 2.44001e-06 [after_resolve]: 1.537e-05 [a_after_grad]: 1.361e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.86e-06 [auto_monad_grad]: 2.49001e-06 [auto_monad_eliminator]: 1.254e-05 [cse]: 2.9e-05 [a_3]: 5.193e-05 [py_interpret_to_execute_after_opt_a]: 2.62e-05 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 5.925e-05 [convert_after_rewriter]: 1.003e-05 [order_py_execute_after_rewriter]: 7.68001e-06 [mutable_eliminate]: 0.00079303 [opt_b]: 0.00028454, [1] [Cycle 1]: 0.00027549, [7] [b_1]: 0.00017157 [b_2]: 1.028e-05 [updatestate_depend_eliminate]: 1.027e-05 [updatestate_assign_eliminate]: 4.42998e-06 [updatestate_loads_eliminate]: 3.31001e-06 [renormalize]: 5.19998e-07 [cse]: 3.777e-05 [optimize_parallel_all_gather_comm]: 2.396e-05 [overlap_param_gather]: 2.11e-06 [cconv]: 3.561e-05 [loop_unroll]: 0.00157691 [opt_after_cconv]: 0.00018279, [1] [Cycle 1]: 0.00015459, [7] [c_1]: 5.068e-05 [parameter_eliminate]: 6.56e-06 [updatestate_depend_eliminate]: 1.044e-05 [updatestate_assign_eliminate]: 4.02998e-06 [updatestate_loads_eliminate]: 3.6e-06 [cse]: 3.975e-05 
[renormalize]: 3.80009e-07 [remove_dup_value]: 4.989e-05 [tuple_transform]: 0.00010919, [1] [Cycle 1]: 0.00010374, [4] [d_1]: 7.3e-05 [none_parameter_eliminate]: 2.07001e-06 [renormalize]: 3.80009e-07 [switch_simplify]: 8.94e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 7.506e-05 [cse_after_recomputation]: 3.137e-05, [1] [Cycle 1]: 2.558e-05, [1] [cse]: 1.943e-05 [environ_conv]: 1.089e-05 [swap_dp_allreduce_reducescatter]: 7.36999e-06 [bias_add_comm_swap]: 3.78001e-06 [label_micro_interleaved_index]: 7.68999e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.75002e-06 [assign_add_opt]: 1.75001e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 9.30013e-07 [full_micro_interleaved_order_control]: 2.84001e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.59e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.89e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 4.78001e-06 [overlap_recompute_and_grad_model_parallel]: 5.32001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.90999e-06 [overlap_grad_flash_sp]: 2.718e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.45999e-06 [symbol_engine_optimizer]: 0.00010812, [1] [Cycle 1]: 0.00010326, [6] [build]: 1.397e-05 [elim_shapecalc]: 1.524e-05 [elim_not_effective]: 1.847e-05 [opt_reshape]: 9.82999e-06 [fold_const_symbol]: 1.415e-05 [renormalize]: 1.8999e-07 [detach_backward]: 2.89001e-06 
[pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 2.358e-05 [get_jit_bprop_graph]: 2.15002e-06 [rewriter_after_jit_bprop_graph]: 6.61e-06 [opt_after_jit_grad]: 0.00094099 [validate]: 5.813e-05 [backend_pass]: 1.22e-06 [task_emit]: 0.137016 [execute]: 1.015e-05 Sums bootstrap : 0.000529s : 0.14% type_inference : 0.113562s : 29.34% event_method : 0.000050s : 0.01% auto_monad : 0.000133s : 0.03% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000040s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000054s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.01% optimize.rewriter_before_opt_a : 0.000151s : 0.04% optimize.opt_a.expand_dump_flag : 0.000011s : 0.00% optimize.opt_a.switch_simplify : 0.000130s : 0.03% optimize.opt_a.loop_unroll : 0.000108s : 0.03% optimize.opt_a.a_1 : 0.003431s : 0.89% optimize.opt_a.with_stream_mark : 0.000081s : 0.02% optimize.opt_a.recompute_prepare : 0.000056s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000036s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.00% optimize.opt_a.parameter_eliminate : 0.000008s : 0.00% optimize.opt_a.a_2 : 0.000512s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000062s : 0.02% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.01% optimize.opt_a.merge_send_recv : 0.000040s : 0.01% optimize.opt_a.auto_parallel : 0.000038s : 0.01% optimize.opt_a.parallel : 0.000044s : 0.01% optimize.opt_a.flash_sp : 0.000023s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.01% optimize.opt_a.allreduce_fusion : 
0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000059s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.01% optimize.opt_a.virtual_dataset : 0.000033s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.01% optimize.opt_a.virtual_output : 0.000032s : 0.01% optimize.opt_a.merge_forward : 0.000022s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000045s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000070s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000077s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001806s : 0.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.00% optimize.opt_a.receive_attached : 0.000009s : 0.00% optimize.opt_a.after_resolve : 0.000102s : 0.03% optimize.opt_a.a_after_grad : 0.000119s : 0.03% optimize.opt_a.renormalize : 0.122904s : 31.75% optimize.opt_a.add_forward_monad_depend : 0.000034s : 0.01% optimize.opt_a.auto_monad_grad : 0.000014s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000110s : 0.03% optimize.opt_a.cse : 0.000295s : 0.08% optimize.opt_a.a_3 : 0.000524s : 0.14% optimize.py_interpret_to_execute_after_opt_a : 0.000026s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000059s : 0.02% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000008s : 0.00% optimize.mutable_eliminate : 0.000793s : 0.20% optimize.opt_b.b_1 : 0.000172s : 0.04% optimize.opt_b.b_2 : 0.000010s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% 
optimize.opt_b.cse : 0.000038s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000036s : 0.01% optimize.loop_unroll : 0.001577s : 0.41% optimize.opt_after_cconv.c_1 : 0.000051s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000040s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000050s : 0.01% optimize.tuple_transform.d_1 : 0.000073s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000075s : 0.02% optimize.cse_after_recomputation.cse : 0.000019s : 0.01% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000008s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.00% opt_after_jit_grad : 0.000941s : 0.24% validate : 0.000058s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.137016s : 35.40% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.001005 209 6.67% : 0.000067s : 11: substitution.arithmetic_simplify 0.28% : 0.000003s : 4: substitution.elim_not_effective 0.49% : 0.000005s : 5: substitution.float_depend_g_call 0.48% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.19% : 0.000002s : 4: substitution.fold_const_symbol 0.87% : 0.000009s : 7: substitution.graph_param_transform 0.29% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 59.17% : 0.000595s : 16: substitution.inline 2.20% : 0.000022s : 2: substitution.inline_without_move 1.23% : 0.000012s : 18: substitution.j_node_and_user_rematch 1.95% : 0.000020s : 3: substitution.less_batch_normalization 1.54% : 0.000015s : 11: substitution.minmaximum_grad 0.70% : 0.000007s : 5: substitution.partial_eliminate 1.44% : 0.000014s : 18: substitution.remove_not_recompute_node 3.63% : 0.000036s : 10: substitution.replace_applicator 1.48% : 0.000015s : 15: substitution.replace_old_param 0.49% : 0.000005s : 1: substitution.set_cell_output_no_recompute 3.35% : 0.000034s : 11: substitution.tuple_list_convert_item_index_to_positive 1.49% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.04% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 7.60% : 0.000076s : 28: substitution.tuple_list_get_item_eliminator 2.17% : 0.000022s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.113418 2 98.64% : 0.111880s : 1: type_inference.infer 1.36% : 0.001538s : 1: type_inference.specialize ------[replace.] 0.000247 30 61.72% : 0.000152s : 16: replace.inline 38.28% : 0.000094s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000620 30 94.26% : 0.000585s : 16: match.inline 5.74% : 0.000036s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000775 5429 1.07% : 0.000008s : 65: predicate.accumulaten_eliminater 0.34% : 0.000003s : 7: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 30: predicate.addn_check_dump 1.01% : 0.000008s : 65: predicate.addn_zero_filter 0.97% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.16% : 0.000017s : 95: predicate.arithmetic_simplify 1.07% : 0.000008s : 65: predicate.cast_eliminate 1.12% : 0.000009s : 65: predicate.check_bprop_eliminate 0.52% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.47% : 0.000004s : 30: predicate.depend_value_elim 1.09% : 0.000008s : 65: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 65: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 7: predicate.elim_not_effective 0.21% : 0.000002s : 7: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.12% : 0.000009s : 72: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 72: predicate.environ_get_depend_swap 1.59% : 0.000012s : 102: predicate.environ_get_eliminate 1.12% : 0.000009s : 72: predicate.environ_get_set_eliminate 1.65% : 0.000013s : 95: predicate.exchange_switch_depend_value 2.28% : 0.000018s : 95: predicate.float_depend_g_call 0.57% : 0.000004s : 30: predicate.float_environ_get_switch 0.66% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.51% : 0.000004s : 30: predicate.get_grad_eliminate 0.10% : 0.000001s : 7: predicate.graph_param_transform 0.53% : 0.000004s : 30: predicate.incorporate_call 0.45% : 0.000004s : 30: predicate.incorporate_call_switch 5.61% : 0.000043s : 234: predicate.inline 1.22% : 0.000009s : 53: predicate.inline_without_move 0.28% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 30: predicate.less_batch_normalization 1.58% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.65% : 0.000021s : 158: predicate.load_eliminater 0.65% : 0.000005s : 7: predicate.loop_unroll_after_grad 2.16% : 0.000017s : 126: predicate.loop_unroll_before_grad 1.49% : 0.000012s : 79: predicate.make_slice_get_slice_eliminator 0.50% : 0.000004s : 30: predicate.merge_addn 1.07% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.05% : 0.000008s : 65: predicate.minmaximum_grad 0.45% : 0.000003s : 7: predicate.mutable_eliminate 0.16% : 0.000001s : 7: predicate.opt_reshape 0.12% : 0.000001s : 7: predicate.parallel_virtual_node 2.16% : 0.000017s : 95: predicate.partial_defer_inline 1.55% : 0.000012s : 86: predicate.partial_eliminate 1.07% : 0.000008s : 65: predicate.print_const_string_wrapper 0.53% : 0.000004s : 30: predicate.reduce_all_const_elim 1.22% : 0.000009s : 65: predicate.reduce_eliminate 2.50% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000003s : 30: predicate.remove_not_recompute_node 1.98% : 0.000015s : 144: predicate.replace_applicator 0.61% : 0.000005s : 53: predicate.replace_old_param 0.15% : 0.000001s : 7: predicate.reset_defer_inline 1.08% : 0.000008s : 65: predicate.reshape_eliminate 1.20% : 0.000009s : 65: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 7: predicate.row_tensor_eliminate 1.47% : 0.000011s : 65: predicate.same_eliminate 0.38% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.76% : 0.000006s : 30: predicate.shard_identity_eliminate 0.26% : 0.000002s : 14: predicate.special_op_eliminate 0.56% : 0.000004s : 30: predicate.specialize_transform 1.32% : 0.000010s : 65: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 53: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 95: predicate.switch_defer_inline 2.75% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.62% : 0.000036s : 258: 
predicate.switch_simplify 1.01% : 0.000008s : 65: predicate.tile_eliminate 1.05% : 0.000008s : 65: predicate.transpose_eliminate 1.40% : 0.000011s : 79: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 79: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 79: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 123: predicate.tuple_list_get_item_eliminator 1.31% : 0.000010s : 79: predicate.tuple_list_get_set_item_eliminator 4.10% : 0.000032s : 109: predicate.tuple_list_set_item_eliminator 1.51% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.43% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.03% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 7: predicate.value_based_eliminate 0.56% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 30: predicate.virtual_output_eliminate 0.12% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002623 32 71.08% : 0.001864s : 12: func_graph_cloner_run.FuncGraphClonerGraph 28.92% : 0.000759s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.660818 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.59% : 0.003904s : 1: add_attr 0.59% : 0.003889s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000079s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000142s : 1: auto_monad 0.00% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000008s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.09% : 0.000562s : 1: bootstrap 0.01% : 0.000040s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000015s : 1: convert_after_rewriter 0.01% : 0.000034s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.01% : 0.000059s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000011s : 1: label_micro_interleaved_index 0.24% : 0.001594s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.12% : 0.000811s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000030s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000024s : 1: opt.transform.mutable_eliminate 0.79% : 0.005203s : 117: opt.transform.opt_a 0.01% : 0.000049s : 1: opt.transform.opt_after_cconv 0.01% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000154s : 28: opt.transform.opt_b 0.01% : 0.000079s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000054s : 4: opt.transform.symbol_engine_opt 19.98% : 0.132004s : 1: opt_a 0.03% : 0.000188s : 1: opt_after_cconv 0.14% : 0.000955s : 1: opt_after_jit_grad 0.04% : 0.000289s : 1: opt_b 20.58% : 0.135968s : 1: optimize 0.00% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000011s : 1: order_py_execute_after_rewriter 0.00% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000059s : 1: pre_auto_parallel 0.01% : 0.000045s : 1: py_interpret_to_execute 0.00% : 0.000030s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000055s : 1: remove_dup_value 0.45% : 0.002972s : 2: renormalize.infer 18.15% : 0.119906s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000065s : 1: rewriter_after_opt_a 0.02% : 0.000156s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000111s : 1: symbol_engine_optimizer 20.74% : 0.137040s : 1: task_emit 0.02% : 0.000113s : 1: 
tuple_transform 17.19% : 0.113593s : 1: type_inference 0.02% : 0.000104s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-kbk],max_mem:4.0M ..... TotalTime = 152.088, [24] [bootstrap]: 0.00072854 [type_inference]: 0.111475 [event_method]: 2.14e-05 [auto_monad]: 6.809e-05 [graph_reusing]: 6.23e-06 [inline]: 2.90998e-06 [add_attr]: 0.00567098, [1] [add_attr_with_inline]: 0.00565243, [1] [Cycle 1]: 7.507e-05, [2] [tag_attr]: 2.508e-05 [meta_addattr_fg_expand]: 4.63001e-06 [parallel-infer-symbol]: 4.48999e-06 [pre_auto_parallel]: 4.339e-05 [insert-virtual-dataset]: 2.67001e-06 [parallel-infer-symbol-second]: 1.25999e-06 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.00535235, [53] [py_interpret_to_execute]: 3.157e-05 [rewriter_before_opt_a]: 8.18e-05 [opt_a]: 0.00284703, [2] [Cycle 1]: 0.00215348, [45] [expand_dump_flag]: 2.93998e-06 [switch_simplify]: 3.831e-05 [loop_unroll]: 2.311e-05 [a_1]: 0.00057181 [with_stream_mark]: 2.203e-05 [recompute_prepare]: 9.00001e-06 [updatestate_depend_eliminate]: 4.94003e-06 [updatestate_assign_eliminate]: 3.48999e-06 [updatestate_loads_eliminate]: 3.16999e-06 [parameter_eliminate]: 1.89999e-06 [a_2]: 8.297e-05 [accelerated_algorithm]: 7.25e-06 [shard]: 2.63998e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.27001e-06 [merge_send_recv]: 9.64e-06 [auto_parallel]: 8.33999e-06 [parallel]: 3.5e-05 [flash_sp]: 9.67001e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.81e-06 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 9.09e-06 [virtual_dataset]: 6.59001e-06 [get_grad_eliminate_]: 6.02999e-06 [virtual_output]: 6.65002e-06 [merge_forward]: 4.32e-06 [cell_reuse_recompute_pass]: 1.86e-06 [offload_activation]: 1.12e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.179e-05 
[merge_recompute_call_nodes]: 1.73002e-06 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 3.67998e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 1.189e-05 [a_after_grad]: 9.97999e-06 [renormalize]: 0.00081088 [add_forward_monad_depend]: 7.85998e-06 [auto_monad_grad]: 2.94999e-06 [auto_monad_eliminator]: 1.859e-05 [cse]: 3.283e-05 [a_3]: 4.81e-05 [Cycle 2]: 0.00067937, [45] [expand_dump_flag]: 1.92999e-06 [switch_simplify]: 7.77e-06 [loop_unroll]: 5.87999e-06 [a_1]: 0.00014327 [with_stream_mark]: 1.419e-05 [recompute_prepare]: 6.86001e-06 [updatestate_depend_eliminate]: 3.31001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.351e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 1.44e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.08002e-06 [merge_send_recv]: 6.20002e-06 [auto_parallel]: 6.85998e-06 [parallel]: 6.63e-06 [flash_sp]: 3.87002e-06 [merge_comm]: 4.04997e-06 [allreduce_fusion]: 3.20998e-06 [matmul_add_comm_reduction]: 6.71e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 3.30003e-06 [cell_reuse_recompute_pass]: 2.89001e-06 [offload_activation]: 9.23002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 1.08001e-06 [before_grad]: 1.012e-05 [set_forward_comm_id_for_comm_node_pass]: 3.73999e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 1.61998e-06 [receive_attached]: 2.21e-06 [after_resolve]: 1.117e-05 [a_after_grad]: 8.86997e-06 [renormalize]: 5.9983e-08 [add_forward_monad_depend]: 1.62001e-06 [auto_monad_grad]: 1.50999e-06 [auto_monad_eliminator]: 8.60999e-06 [cse]: 1.592e-05 [a_3]: 3.442e-05 [py_interpret_to_execute_after_opt_a]: 1.442e-05 
[slice_cell_reuse_recomputed_activation]: 2.41e-06 [rewriter_after_opt_a]: 4.59e-05 [convert_after_rewriter]: 8.96998e-06 [order_py_execute_after_rewriter]: 6.24999e-06 [mutable_eliminate]: 0.00077881 [opt_b]: 0.00023596, [1] [Cycle 1]: 0.00019917, [7] [b_1]: 0.00011292 [b_2]: 7.51999e-06 [updatestate_depend_eliminate]: 9.39e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.57001e-06 [renormalize]: 7.79983e-07 [cse]: 2.506e-05 [optimize_parallel_all_gather_comm]: 2.154e-05 [overlap_param_gather]: 2.20002e-06 [cconv]: 3.289e-05 [loop_unroll]: 0.00051402 [opt_after_cconv]: 0.00012304, [1] [Cycle 1]: 0.00011511, [7] [c_1]: 4.284e-05 [parameter_eliminate]: 5.42999e-06 [updatestate_depend_eliminate]: 7.05e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.86e-05 [renormalize]: 7.7e-07 [remove_dup_value]: 1.392e-05 [tuple_transform]: 7.799e-05, [1] [Cycle 1]: 7.291e-05, [4] [d_1]: 4.691e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 5.274e-05 [cse_after_recomputation]: 2.23e-05, [1] [Cycle 1]: 1.774e-05, [1] [cse]: 1.164e-05 [environ_conv]: 6.11e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.51998e-06 [label_micro_interleaved_index]: 6.14999e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.74001e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 8.80013e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.36998e-06 [reorder_send_recv_between_fp_bp]: 3.13e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 
[control_data_broadcast_order]: 1.502e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 3.75998e-06 [overlap_recompute_and_grad_model_parallel]: 5.14e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.02998e-06 [overlap_grad_flash_sp]: 2.326e-05 [begin_end_overlap_inline]: 8.70001e-07 [split_matmul_comm_elemetwise]: 2.62001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.06002e-06 [symbol_engine_optimizer]: 7.815e-05, [1] [Cycle 1]: 7.337e-05, [6] [build]: 4.02e-06 [elim_shapecalc]: 1.104e-05 [elim_not_effective]: 1.153e-05 [opt_reshape]: 7.05e-06 [fold_const_symbol]: 9.13002e-06 [renormalize]: 1.90019e-07 [detach_backward]: 2.48998e-06 [pipeline_parallel_scheduler]: 1.87999e-06 [auto_monad_reorder]: 1.794e-05 [get_jit_bprop_graph]: 2.44001e-06 [rewriter_after_jit_bprop_graph]: 6.46e-06 [opt_after_jit_grad]: 0.000595 [validate]: 4.571e-05 [backend_pass]: 8.89995e-07 [task_emit]: 151.963 [execute]: 9.62999e-06 Sums bootstrap : 0.000729s : 0.00% type_inference : 0.111475s : 0.07% event_method : 0.000021s : 0.00% auto_monad : 0.000068s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000025s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000043s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000032s : 0.00% optimize.rewriter_before_opt_a : 0.000082s : 0.00% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000046s : 0.00% optimize.opt_a.loop_unroll : 0.000029s : 0.00% optimize.opt_a.a_1 : 0.000715s : 0.00% optimize.opt_a.with_stream_mark 
: 0.000036s : 0.00% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000156s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000016s : 0.00% optimize.opt_a.auto_parallel : 0.000015s : 0.00% optimize.opt_a.parallel : 0.000042s : 0.00% optimize.opt_a.flash_sp : 0.000014s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000023s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000811s : 0.00% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.00% optimize.opt_a.cse : 0.000049s : 0.00% optimize.opt_a.a_3 : 0.000083s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000779s : 0.00% optimize.opt_b.b_1 : 0.000113s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.00% optimize.loop_unroll : 0.000514s : 0.00% optimize.opt_after_cconv.c_1 : 0.000043s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000047s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.00% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000595s : 0.00% validate : 0.000046s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 151.963466s : 99.92% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000235 30 14.85% : 0.000035s : 5: substitution.arithmetic_simplify 0.75% : 0.000002s : 2: substitution.elim_not_effective 0.57% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000008s : 4: substitution.graph_param_transform 69.53% : 0.000164s : 3: substitution.inline 1.87% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.00% : 0.000005s : 4: substitution.remove_not_recompute_node 2.18% : 0.000005s : 4: substitution.replace_old_param 5.02% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.111393 2 99.18% : 0.110482s : 1: type_inference.infer 0.82% : 0.000912s : 1: type_inference.specialize ------[replace.] 0.000048 5 72.85% : 0.000035s : 3: replace.inline 27.15% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000172 5 93.75% : 0.000161s : 3: match.inline 6.25% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000187 1131 0.84% : 0.000002s : 11: predicate.accumulaten_eliminater 1.48% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000002s : 11: predicate.addn_zero_filter 0.82% : 0.000002s : 11: predicate.adjust_all_reduce_mul_add 2.42% : 0.000005s : 19: predicate.arithmetic_simplify 0.87% : 0.000002s : 11: predicate.cast_eliminate 0.58% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.98% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.40% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.98% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.96% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.82% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000002s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.46% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000011s : 51: predicate.inline 0.82% : 0.000002s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.12% : 0.000002s : 8: predicate.less_batch_normalization 1.58% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.29% : 0.000004s : 32: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.01% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.99% : 0.000004s : 19: predicate.make_slice_get_slice_eliminator 0.52% : 0.000001s : 8: predicate.merge_addn 0.58% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 11: predicate.minmaximum_grad 1.70% : 0.000003s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.54% : 0.000003s : 16: predicate.partial_defer_inline 1.24% : 0.000002s : 17: predicate.partial_eliminate 0.95% : 0.000002s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.24% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000003s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000002s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000002s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000002s : 8: predicate.shard_identity_eliminate 0.94% : 0.000002s : 8: predicate.special_op_eliminate 0.83% : 0.000002s : 8: predicate.specialize_transform 1.33% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.21% : 0.000002s : 16: predicate.switch_defer_inline 1.84% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.66% : 0.000009s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000002s : 11: predicate.transpose_eliminate 1.38% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.38% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.52% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.15% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.90% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000002s : 8: predicate.virtual_output_eliminate 0.30% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000597 8 40.66% : 0.000243s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.34% : 0.000354s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
152.101016 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.00% : 0.005678s : 1: add_attr 0.00% : 0.005657s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000074s : 1: auto_monad 0.00% : 0.000023s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.00% : 0.000780s : 1: bootstrap 0.00% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000019s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000030s : 1: event_method 0.00% : 0.000036s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.00% : 0.000526s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.00% : 0.000795s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000020s : 1: opt.transform.mutable_eliminate 0.00% : 0.001122s : 78: opt.transform.opt_a 0.00% : 0.000042s : 1: opt.transform.opt_after_cconv 0.00% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000094s : 28: opt.transform.opt_b 0.00% : 0.000051s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.00% : 0.002851s : 1: opt_a 0.00% : 0.000127s : 1: opt_after_cconv 0.00% : 0.000611s : 1: opt_after_jit_grad 0.00% : 0.000244s : 1: opt_b 0.00% : 0.005359s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000048s : 1: pre_auto_parallel 0.00% : 0.000036s : 1: py_interpret_to_execute 0.00% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000018s : 1: remove_dup_value 0.00% : 0.000460s : 1: renormalize.infer 0.00% : 0.000342s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000051s : 1: rewriter_after_opt_a 0.00% : 0.000086s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000081s : 1: symbol_engine_optimizer 99.91% : 151.963542s : 1: task_emit 0.00% : 0.000081s : 1: 
tuple_transform 0.07% : 0.111508s : 1: type_inference 0.00% : 0.000079s : 1: validate . TotalTime = 0.925736, [24] [bootstrap]: 0.0011049 [type_inference]: 0.00775258 [event_method]: 1.526e-05 [auto_monad]: 6.999e-05 [graph_reusing]: 5.66998e-06 [inline]: 2.88e-06 [add_attr]: 0.00782973, [1] [add_attr_with_inline]: 0.00781423, [1] [Cycle 1]: 0.00012953, [2] [tag_attr]: 1.679e-05 [meta_addattr_fg_expand]: 3.93999e-06 [parallel-infer-symbol]: 3.50998e-06 [pre_auto_parallel]: 6.03e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 9.39996e-07 [dataset_repeat_opt]: 2.58e-06 [pipeline_split]: 1.95001e-06 [optimize]: 0.0059083, [53] [py_interpret_to_execute]: 3.553e-05 [rewriter_before_opt_a]: 9.275e-05 [opt_a]: 0.0028951, [2] [Cycle 1]: 0.00213004, [45] [expand_dump_flag]: 3.24001e-06 [switch_simplify]: 3.206e-05 [loop_unroll]: 1.526e-05 [a_1]: 0.00037539 [with_stream_mark]: 2.604e-05 [recompute_prepare]: 1.016e-05 [updatestate_depend_eliminate]: 4.60999e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.43e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 8.845e-05 [accelerated_algorithm]: 9.67001e-06 [shard]: 3.25e-06 [meta_shard_fg_expand]: 2.29999e-06 [shard_inline]: 6.54001e-06 [merge_send_recv]: 1.059e-05 [auto_parallel]: 8.24998e-06 [parallel]: 0.00016645 [flash_sp]: 1.191e-05 [merge_comm]: 6.56e-06 [allreduce_fusion]: 3.61001e-06 [matmul_add_comm_reduction]: 1.256e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 1.373e-05 [virtual_dataset]: 7.78001e-06 [get_grad_eliminate_]: 6.12001e-06 [virtual_output]: 7.06001e-06 [merge_forward]: 5.86e-06 [cell_reuse_recompute_pass]: 1.99e-06 [offload_activation]: 1.292e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.626e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 1.145e-05 [set_forward_comm_id_for_comm_node_pass]: 4.07e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 2.90002e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.364e-05 [a_after_grad]: 9.99001e-06 [renormalize]: 0.00078714 [add_forward_monad_depend]: 8.20999e-06 [auto_monad_grad]: 2.29001e-06 [auto_monad_eliminator]: 1.925e-05 [cse]: 3.136e-05 [a_3]: 5.21e-05 [Cycle 2]: 0.00074983, [45] [expand_dump_flag]: 2.08002e-06 [switch_simplify]: 9.92001e-06 [loop_unroll]: 6.46e-06 [a_1]: 0.00014918 [with_stream_mark]: 1.611e-05 [recompute_prepare]: 6.62002e-06 [updatestate_depend_eliminate]: 4.12e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.27e-06 [a_2]: 7.705e-05 [accelerated_algorithm]: 7.5e-06 [shard]: 2.43e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 6.60002e-06 [merge_send_recv]: 9.38002e-06 [auto_parallel]: 9.20999e-06 [parallel]: 7.52002e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 4.80999e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 9.90025e-07 [virtual_shard_identity]: 8.13999e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.74999e-06 [virtual_output]: 5.96998e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 2.02001e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.348e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.59999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.82998e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 1.14003e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 1.193e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 2.94001e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.27e-05 [cse]: 2.181e-05 [a_3]: 3.549e-05 [py_interpret_to_execute_after_opt_a]: 1.529e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 4.296e-05 [convert_after_rewriter]: 1.191e-05 [order_py_execute_after_rewriter]: 6.04999e-06 [mutable_eliminate]: 0.00095771 [opt_b]: 0.00022413, [1] [Cycle 1]: 0.00021502, [7] 
[b_1]: 0.00012057 [b_2]: 8.35001e-06 [updatestate_depend_eliminate]: 1.058e-05 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.96999e-06 [renormalize]: 5.8001e-07 [cse]: 2.902e-05 [optimize_parallel_all_gather_comm]: 2.596e-05 [overlap_param_gather]: 7.591e-05 [cconv]: 3.37e-05 [loop_unroll]: 0.00060143 [opt_after_cconv]: 0.00012323, [1] [Cycle 1]: 0.00011494, [7] [c_1]: 3.153e-05 [parameter_eliminate]: 5.16998e-06 [updatestate_depend_eliminate]: 7.93001e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 2.65997e-06 [cse]: 2.588e-05 [renormalize]: 7.60017e-07 [remove_dup_value]: 1.486e-05 [tuple_transform]: 8.428e-05, [1] [Cycle 1]: 7.886e-05, [4] [d_1]: 4.796e-05 [none_parameter_eliminate]: 2.22999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 7.83001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.402e-05 [cse_after_recomputation]: 2.518e-05, [1] [Cycle 1]: 2.011e-05, [1] [cse]: 1.391e-05 [environ_conv]: 5.26002e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 3.18e-06 [label_micro_interleaved_index]: 6.85002e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.66e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.52001e-06 [assign_add_opt]: 1.80001e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 3.04999e-06 [comm_op_add_attrs]: 1.40001e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 8.697e-05 [overlap_opt_shard_grad_in_pipeline]: 2.99999e-06 [control_data_broadcast_order]: 1.894e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 4.62e-06 [overlap_recompute_and_grad_model_parallel]: 5.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.70001e-06 [overlap_recompute_comm]: 2.53003e-06 [overlap_grad_ring_attention]: 4.29002e-06 [overlap_grad_flash_sp]: 2.339e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.20002e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 0.00010186, [1] [Cycle 1]: 9.485e-05, [6] [build]: 4.95001e-06 [elim_shapecalc]: 1.776e-05 [elim_not_effective]: 1.568e-05 [opt_reshape]: 8.99998e-06 [fold_const_symbol]: 1.022e-05 [renormalize]: 2.30008e-07 [detach_backward]: 2.55997e-06 [pipeline_parallel_scheduler]: 2.19999e-06 [auto_monad_reorder]: 2.126e-05 [get_jit_bprop_graph]: 1.81e-06 [rewriter_after_jit_bprop_graph]: 7.48e-06 [opt_after_jit_grad]: 0.00067843 [validate]: 4.633e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.901912 [execute]: 1.073e-05 Sums bootstrap : 0.001105s : 0.12% type_inference : 0.007753s : 0.85% event_method : 0.000015s : 0.00% auto_monad : 0.000070s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000060s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.00% optimize.rewriter_before_opt_a : 0.000093s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.00% optimize.opt_a.loop_unroll : 0.000022s : 0.00% optimize.opt_a.a_1 : 0.000525s : 0.06% optimize.opt_a.with_stream_mark : 0.000042s : 0.00% optimize.opt_a.recompute_prepare : 0.000017s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000166s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000017s : 0.00% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000020s : 0.00% optimize.opt_a.auto_parallel : 0.000017s : 0.00% optimize.opt_a.parallel : 0.000174s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000011s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000021s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000022s : 0.00% optimize.opt_a.virtual_dataset : 0.000014s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000010s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000023s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000030s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000026s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000787s : 0.09% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000032s : 0.00% optimize.opt_a.cse : 0.000053s : 0.01% optimize.opt_a.a_3 : 0.000088s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.00% optimize.convert_after_rewriter : 0.000012s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000958s : 0.10% optimize.opt_b.b_1 : 0.000121s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000029s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.00% optimize.overlap_param_gather : 0.000076s : 0.01% optimize.cconv : 0.000034s : 0.00% optimize.loop_unroll : 0.000601s : 0.07% optimize.opt_after_cconv.c_1 : 0.000032s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000026s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000048s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000008s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.01% optimize.cse_after_recomputation.cse : 0.000014s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000087s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000003s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000018s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.00% opt_after_jit_grad : 0.000678s : 0.07% validate : 0.000046s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.901912s : 98.39% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000185 26 17.36% : 0.000032s : 4: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.67% : 0.000001s : 2: substitution.fold_const_symbol 3.57% : 0.000007s : 4: substitution.graph_param_transform 68.55% : 0.000127s : 2: substitution.inline 2.17% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.23% : 0.000006s : 4: substitution.remove_not_recompute_node 3.39% : 0.000006s : 4: substitution.replace_old_param ------[type_inference.] 0.007687 2 93.35% : 0.007176s : 1: type_inference.infer 6.65% : 0.000511s : 1: type_inference.specialize ------[replace.] 0.000024 2 100.00% : 0.000024s : 2: replace.inline ------[match.] 0.000125 2 100.00% : 0.000125s : 2: match.inline ------[predicate.] 
0.000173 984 0.74% : 0.000001s : 9: predicate.accumulaten_eliminater 1.47% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 1.04% : 0.000002s : 9: predicate.addn_zero_filter 0.58% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.78% : 0.000005s : 17: predicate.arithmetic_simplify 0.68% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.68% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.62% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000001s : 4: predicate.elim_not_effective 0.84% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.82% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.93% : 0.000002s : 13: predicate.environ_get_depend_swap 2.01% : 0.000003s : 21: predicate.environ_get_eliminate 0.82% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.88% : 0.000002s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.39% : 0.000001s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.61% : 0.000010s : 44: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 1.93% : 0.000003s : 26: predicate.load_eliminater 1.75% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000003s : 18: predicate.loop_unroll_before_grad 2.09% : 0.000004s : 17: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 2.53% : 0.000004s : 4: predicate.mutable_eliminate 0.65% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.43% : 0.000002s : 11: predicate.partial_defer_inline 0.99% : 0.000002s : 13: predicate.partial_eliminate 0.70% : 0.000001s : 9: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 9: predicate.reduce_eliminate 2.32% : 0.000004s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.10% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 1.14% : 0.000002s : 8: predicate.same_eliminate 0.47% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.27% : 0.000002s : 8: predicate.shard_identity_eliminate 1.01% : 0.000002s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 1.77% : 0.000003s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.92% : 0.000002s : 11: predicate.switch_defer_inline 1.48% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.29% : 0.000007s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.35% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.53% : 0.000003s : 17: predicate.tuple_list_get_item_depend_reorder 4.09% : 0.000007s : 25: predicate.tuple_list_get_item_eliminator 1.18% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.19% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.82% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.79% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.80% : 0.000001s : 4: predicate.value_based_eliminate 1.13% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.99% : 0.000002s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000382 6 36.40% : 0.000139s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.60% : 0.000243s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.941402 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.83% : 0.007839s : 1: add_attr 0.83% : 0.007819s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000077s : 1: auto_monad 0.00% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.12% : 0.001146s : 1: bootstrap 0.00% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000024s : 1: control_data_broadcast_order 0.00% : 0.000016s : 1: convert_after_rewriter 0.00% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.07% : 0.000617s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.10% : 0.000977s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000026s : 1: opt.transform.mutable_eliminate 0.10% : 0.000952s : 78: opt.transform.opt_a 0.00% : 0.000030s : 1: opt.transform.opt_after_cconv 0.00% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000101s : 28: opt.transform.opt_b 0.01% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000048s : 4: opt.transform.symbol_engine_opt 0.31% : 0.002899s : 1: opt_a 0.01% : 0.000127s : 1: opt_after_cconv 0.07% : 0.000695s : 1: opt_after_jit_grad 0.02% : 0.000228s : 1: opt_b 0.63% : 0.005915s : 1: optimize 0.00% : 0.000031s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000098s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000082s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000067s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 0.05% : 0.000470s : 1: renormalize.infer 0.03% : 0.000306s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000050s : 1: rewriter_after_opt_a 0.01% : 0.000100s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000105s : 1: symbol_engine_optimizer 95.81% : 0.901938s : 1: task_emit 0.01% : 0.000087s : 1: 
tuple_transform 0.83% : 0.007782s : 1: type_inference 0.01% : 0.000083s : 1: validate TotalTime = 0.882468, [24] [bootstrap]: 0.0327735 [type_inference]: 0.007417 [event_method]: 1.735e-05 [auto_monad]: 5.929e-05 [graph_reusing]: 7.156e-05 [inline]: 2.26e-06 [add_attr]: 0.00345411, [1] [add_attr_with_inline]: 0.0034426, [1] [Cycle 1]: 6.378e-05, [2] [tag_attr]: 1.907e-05 [meta_addattr_fg_expand]: 4.89e-06 [parallel-infer-symbol]: 3.58999e-06 [pre_auto_parallel]: 3.178e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 1.10001e-06 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 2.07999e-06 [optimize]: 0.0459972, [53] [py_interpret_to_execute]: 2.507e-05 [rewriter_before_opt_a]: 6.549e-05 [opt_a]: 0.043346, [2] [Cycle 1]: 0.0426251, [45] [expand_dump_flag]: 3.36001e-06 [switch_simplify]: 3.453e-05 [loop_unroll]: 2.209e-05 [a_1]: 0.0408295 [with_stream_mark]: 3.403e-05 [recompute_prepare]: 1.681e-05 [updatestate_depend_eliminate]: 5.76e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 2.29999e-06 [a_2]: 9.007e-05 [accelerated_algorithm]: 8.63001e-06 [shard]: 2.72001e-06 [meta_shard_fg_expand]: 3.38e-06 [shard_inline]: 6.76999e-06 [merge_send_recv]: 1.054e-05 [auto_parallel]: 1.098e-05 [parallel]: 2.039e-05 [flash_sp]: 1.176e-05 [merge_comm]: 4.44002e-06 [allreduce_fusion]: 4.12e-06 [matmul_add_comm_reduction]: 1.115e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.118e-05 [virtual_dataset]: 7.79002e-06 [get_grad_eliminate_]: 7.00002e-06 [virtual_output]: 7.00998e-06 [merge_forward]: 5.00001e-06 [cell_reuse_recompute_pass]: 1.79998e-06 [offload_activation]: 1.223e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.396e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 1.231e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30001e-06 [meta_fg_expand]: 3.83001e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.63e-06 
[after_resolve]: 1.475e-05 [a_after_grad]: 1.036e-05 [renormalize]: 0.00091171 [add_forward_monad_depend]: 6.38998e-06 [auto_monad_grad]: 2.51e-06 [auto_monad_eliminator]: 2.017e-05 [cse]: 3.226e-05 [a_3]: 5.602e-05 [Cycle 2]: 0.00070623, [45] [expand_dump_flag]: 2.46998e-06 [switch_simplify]: 8.94e-06 [loop_unroll]: 6.49001e-06 [a_1]: 0.0001427 [with_stream_mark]: 1.545e-05 [recompute_prepare]: 7.08e-06 [updatestate_depend_eliminate]: 3.98001e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.20999e-06 [a_2]: 7.576e-05 [accelerated_algorithm]: 6.77002e-06 [shard]: 1.72999e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 6.59999e-06 [merge_send_recv]: 6.69999e-06 [auto_parallel]: 7.95998e-06 [parallel]: 7.19001e-06 [flash_sp]: 3.51001e-06 [merge_comm]: 3.22002e-06 [allreduce_fusion]: 3.17002e-06 [matmul_add_comm_reduction]: 7.41001e-06 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 7.61001e-06 [virtual_dataset]: 5.68002e-06 [get_grad_eliminate_]: 5.73002e-06 [virtual_output]: 6.00002e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 2.32999e-06 [offload_activation]: 8.37e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.133e-05 [merge_recompute_call_nodes]: 1.05999e-06 [before_grad]: 9.47001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.23999e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 1.30001e-06 [receive_attached]: 1.89e-06 [after_resolve]: 1.155e-05 [a_after_grad]: 8.30999e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.87001e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.02e-05 [cse]: 1.849e-05 [a_3]: 3.511e-05 [py_interpret_to_execute_after_opt_a]: 1.472e-05 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 4.168e-05 [convert_after_rewriter]: 8.05e-06 [order_py_execute_after_rewriter]: 5.42999e-06 [mutable_eliminate]: 0.00082712 [opt_b]: 0.00022835, [1] [Cycle 1]: 0.00021915, [7] 
[b_1]: 0.00012316 [b_2]: 8.31002e-06 [updatestate_depend_eliminate]: 8.89998e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.67001e-06 [renormalize]: 3.9002e-07 [cse]: 2.873e-05 [optimize_parallel_all_gather_comm]: 2.249e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 3.145e-05 [loop_unroll]: 0.00055206 [opt_after_cconv]: 0.00012337, [1] [Cycle 1]: 0.00011538, [7] [c_1]: 3.36e-05 [parameter_eliminate]: 5.59998e-06 [updatestate_depend_eliminate]: 8.37e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.78998e-06 [cse]: 2.235e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.571e-05 [tuple_transform]: 8.267e-05, [1] [Cycle 1]: 7.732e-05, [4] [d_1]: 4.881e-05 [none_parameter_eliminate]: 2.19999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 7.21999e-06 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 5.672e-05 [cse_after_recomputation]: 2.614e-05, [1] [Cycle 1]: 2.095e-05, [1] [cse]: 1.395e-05 [environ_conv]: 6.53e-06 [swap_dp_allreduce_reducescatter]: 6.59001e-06 [bias_add_comm_swap]: 3.00998e-06 [label_micro_interleaved_index]: 6.29001e-06 [label_fine_grained_interleaved_index]: 3.11999e-06 [merge_cast_opt]: 1.67999e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.78003e-06 [assign_add_opt]: 1.89999e-06 [ForceFp32Comm]: 1.12999e-06 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.60999e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 2.05002e-06 [control_data_broadcast_order]: 1.603e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 4.49002e-06 [overlap_recompute_and_grad_model_parallel]: 4.95999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.32999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.81e-06 [overlap_grad_ring_attention]: 4.62998e-06 [overlap_grad_flash_sp]: 2.374e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.14e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 9.803e-05, [1] [Cycle 1]: 9.183e-05, [6] [build]: 3.75e-06 [elim_shapecalc]: 1.576e-05 [elim_not_effective]: 1.564e-05 [opt_reshape]: 8.15e-06 [fold_const_symbol]: 1.154e-05 [renormalize]: 2.10013e-07 [detach_backward]: 2.39001e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 2.059e-05 [get_jit_bprop_graph]: 2.15002e-06 [rewriter_after_jit_bprop_graph]: 7.61001e-06 [opt_after_jit_grad]: 0.00061575 [validate]: 4.953e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.79165 [execute]: 1.003e-05 Sums bootstrap : 0.032773s : 3.73% type_inference : 0.007417s : 0.85% event_method : 0.000017s : 0.00% auto_monad : 0.000059s : 0.01% graph_reusing : 0.000072s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000032s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000025s : 0.00% optimize.rewriter_before_opt_a : 0.000065s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000043s : 0.00% optimize.opt_a.loop_unroll : 0.000029s : 0.00% optimize.opt_a.a_1 : 0.040972s : 4.67% optimize.opt_a.with_stream_mark : 0.000049s : 0.01% optimize.opt_a.recompute_prepare : 0.000024s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000010s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000166s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000017s : 0.00% optimize.opt_a.auto_parallel : 0.000019s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000015s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000019s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000013s : 0.00% optimize.opt_a.virtual_output : 0.000013s : 0.00% optimize.opt_a.merge_forward : 0.000009s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000021s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000022s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000010s : 0.00% optimize.opt_a.meta_fg_expand : 0.000007s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000026s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000912s : 0.10% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000030s : 0.00% optimize.opt_a.cse : 0.000051s : 0.01% optimize.opt_a.a_3 : 0.000091s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.00% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000827s : 0.09% optimize.opt_b.b_1 : 0.000123s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000029s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000031s : 0.00% optimize.loop_unroll : 0.000552s : 0.06% optimize.opt_after_cconv.c_1 : 0.000034s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000016s : 0.00% optimize.tuple_transform.d_1 : 0.000049s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000014s : 0.00% optimize.environ_conv : 0.000007s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000012s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000021s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000008s : 0.00% opt_after_jit_grad : 0.000616s : 0.07% validate : 0.000050s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.791650s : 90.19% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.040350 30 0.09% : 0.000036s : 5: substitution.arithmetic_simplify 0.01% : 0.000002s : 2: substitution.elim_not_effective 0.00% : 0.000001s : 2: substitution.fold_const_symbol 0.02% : 0.000007s : 4: substitution.graph_param_transform 99.81% : 0.040272s : 3: substitution.inline 0.01% : 0.000004s : 4: substitution.j_node_and_user_rematch 0.01% : 0.000005s : 4: substitution.remove_not_recompute_node 0.02% : 0.000006s : 4: substitution.replace_old_param 0.04% : 0.000016s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007361 2 90.85% : 0.006687s : 1: type_inference.infer 9.15% : 0.000674s : 1: type_inference.specialize ------[replace.] 0.000122 5 84.77% : 0.000103s : 3: replace.inline 15.23% : 0.000019s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.040278 5 99.96% : 0.040263s : 3: match.inline 0.04% : 0.000014s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000217 1131 0.93% : 0.000002s : 11: predicate.accumulaten_eliminater 1.05% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 8: predicate.addn_check_dump 0.97% : 0.000002s : 11: predicate.addn_zero_filter 0.67% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.47% : 0.000005s : 19: predicate.arithmetic_simplify 1.13% : 0.000002s : 11: predicate.cast_eliminate 0.62% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.17% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.71% : 0.000004s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000001s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.05% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.24% : 0.000003s : 15: predicate.environ_get_add_eliminate 0.94% : 0.000002s : 15: predicate.environ_get_depend_swap 1.68% : 0.000004s : 23: predicate.environ_get_eliminate 0.90% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.08% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.39% : 0.000005s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000001s : 4: predicate.fold_const_symbol 0.58% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000001s : 4: predicate.graph_param_transform 0.56% : 0.000001s : 8: predicate.incorporate_call 0.41% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000013s : 51: predicate.inline 0.82% : 0.000002s : 8: predicate.inline_without_move 0.30% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000004s : 
21: predicate.list_to_tuple_eliminator_ 2.06% : 0.000004s : 32: predicate.load_eliminater 1.53% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.86% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000004s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.53% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000002s : 11: predicate.minmaximum_grad 2.06% : 0.000004s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 3.04% : 0.000007s : 16: predicate.partial_defer_inline 1.13% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000002s : 11: predicate.print_const_string_wrapper 0.55% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000005s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.09% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.23% : 0.000001s : 4: predicate.reset_defer_inline 1.04% : 0.000002s : 11: predicate.reshape_eliminate 0.64% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 4: predicate.row_tensor_eliminate 1.08% : 0.000002s : 8: predicate.same_eliminate 0.42% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.13% : 0.000002s : 8: predicate.shard_identity_eliminate 0.73% : 0.000002s : 8: predicate.special_op_eliminate 0.64% : 0.000001s : 8: predicate.specialize_transform 1.34% : 0.000003s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000003s : 16: predicate.switch_defer_inline 1.78% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.12% : 0.000009s : 54: predicate.switch_simplify 0.75% : 
0.000002s : 11: predicate.tile_eliminate 0.91% : 0.000002s : 11: predicate.transpose_eliminate 1.51% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000007s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000005s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000004s : 21: predicate.tuple_to_list_eliminator_ 2.02% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.79% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.96% : 0.000002s : 8: predicate.virtual_output_eliminate 0.29% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000492 8 39.42% : 0.000194s : 3: func_graph_cloner_run.FuncGraphClonerGraph 60.58% : 0.000298s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.974418 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.36% : 0.003460s : 1: add_attr 0.35% : 0.003447s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000065s : 1: auto_monad 0.00% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 3.37% : 0.032814s : 1: bootstrap 0.00% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000030s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000007s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.00% : 0.000024s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000076s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.06% : 0.000566s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.09% : 0.000881s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000022s : 1: opt.transform.mutable_eliminate 4.25% : 0.041405s : 78: opt.transform.opt_a 0.00% : 0.000032s : 1: opt.transform.opt_after_cconv 0.00% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000099s : 28: opt.transform.opt_b 0.01% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000046s : 4: opt.transform.symbol_engine_opt 4.45% : 0.043350s : 1: opt_a 0.01% : 0.000128s : 1: opt_after_cconv 0.06% : 0.000631s : 1: opt_after_jit_grad 0.02% : 0.000232s : 1: opt_b 4.72% : 0.046003s : 1: optimize 0.00% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000006s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000037s : 1: pre_auto_parallel 0.00% : 0.000029s : 1: py_interpret_to_execute 0.00% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000005s : 1: remove_cast_before_assign_add 0.00% : 0.000020s : 1: remove_dup_value 0.05% : 0.000481s : 1: renormalize.infer 0.04% : 0.000419s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000048s : 1: rewriter_after_opt_a 0.01% : 0.000071s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 81.25% : 0.791673s : 1: task_emit 0.01% : 0.000086s : 1: 
tuple_transform 0.76% : 0.007441s : 1: type_inference 0.01% : 0.000087s : 1: validate TotalTime = 1.12202, [24] [bootstrap]: 0.00052673 [type_inference]: 0.0754722 [event_method]: 6.181e-05 [auto_monad]: 0.00014015 [graph_reusing]: 8.70001e-06 [inline]: 2.44001e-06 [add_attr]: 0.00405272, [1] [add_attr_with_inline]: 0.0040415, [1] [Cycle 1]: 9.924e-05, [2] [tag_attr]: 4.12e-05 [meta_addattr_fg_expand]: 1.063e-05 [parallel-infer-symbol]: 3.86999e-06 [pre_auto_parallel]: 6.221e-05 [insert-virtual-dataset]: 2.77002e-06 [parallel-infer-symbol-second]: 1.02e-06 [dataset_repeat_opt]: 2.18998e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.0247778, [53] [py_interpret_to_execute]: 4.77e-05 [rewriter_before_opt_a]: 0.00016574 [opt_a]: 0.0215999, [3] [Cycle 1]: 0.00929201, [45] [expand_dump_flag]: 4.78001e-06 [switch_simplify]: 8e-05 [loop_unroll]: 6.574e-05 [a_1]: 0.00168201 [with_stream_mark]: 3.233e-05 [recompute_prepare]: 2.831e-05 [updatestate_depend_eliminate]: 1.083e-05 [updatestate_assign_eliminate]: 8.02e-06 [updatestate_loads_eliminate]: 7.47998e-06 [parameter_eliminate]: 3.04001e-06 [a_2]: 0.00027576 [accelerated_algorithm]: 4.18e-05 [shard]: 2.66999e-06 [meta_shard_fg_expand]: 5.07e-06 [shard_inline]: 1.802e-05 [merge_send_recv]: 2.189e-05 [auto_parallel]: 1.393e-05 [parallel]: 2.213e-05 [flash_sp]: 1.572e-05 [merge_comm]: 1.11e-05 [allreduce_fusion]: 9.81e-06 [matmul_add_comm_reduction]: 3.395e-05 [allreduce_slice_to_reducescatter]: 8.10018e-07 [virtual_shard_identity]: 2.093e-05 [virtual_dataset]: 1.615e-05 [get_grad_eliminate_]: 1.58e-05 [virtual_output]: 1.654e-05 [merge_forward]: 1.104e-05 [cell_reuse_recompute_pass]: 1.69e-06 [offload_activation]: 2.173e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.508e-05 [merge_recompute_call_nodes]: 1.99999e-06 [before_grad]: 4.286e-05 [set_forward_comm_id_for_comm_node_pass]: 1.404e-05 [meta_fg_expand]: 0.00199726 [flash_sp_send_recv_attached]: 6.07999e-06 [receive_attached]: 3.16001e-06 [after_resolve]: 7.126e-05 
[a_after_grad]: 9.119e-05 [renormalize]: 0.00340203 [add_forward_monad_depend]: 1.403e-05 [auto_monad_grad]: 6.63e-06 [auto_monad_eliminator]: 6.194e-05 [cse]: 0.00019623 [a_3]: 0.00039588 [Cycle 2]: 0.00536239, [45] [expand_dump_flag]: 4.12e-06 [switch_simplify]: 4.989e-05 [loop_unroll]: 4.537e-05 [a_1]: 0.00323079 [with_stream_mark]: 2.601e-05 [recompute_prepare]: 1.449e-05 [updatestate_depend_eliminate]: 6.98e-06 [updatestate_assign_eliminate]: 4.84003e-06 [updatestate_loads_eliminate]: 4.00998e-06 [parameter_eliminate]: 2.42001e-06 [a_2]: 0.00012447 [accelerated_algorithm]: 1.48e-05 [shard]: 2.73e-06 [meta_shard_fg_expand]: 3.48e-06 [shard_inline]: 9.23002e-06 [merge_send_recv]: 9.42001e-06 [auto_parallel]: 1.176e-05 [parallel]: 9.69999e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 5.54998e-06 [allreduce_fusion]: 4.63999e-06 [matmul_add_comm_reduction]: 1.047e-05 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 1.25e-05 [virtual_dataset]: 8.99e-06 [get_grad_eliminate_]: 4.234e-05 [virtual_output]: 1.025e-05 [merge_forward]: 6.95002e-06 [cell_reuse_recompute_pass]: 2.44001e-06 [offload_activation]: 1.362e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.869e-05 [merge_recompute_call_nodes]: 1.81e-06 [before_grad]: 1.522e-05 [set_forward_comm_id_for_comm_node_pass]: 6.21e-06 [meta_fg_expand]: 0.00013777 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 3.32002e-06 [after_resolve]: 2.012e-05 [a_after_grad]: 1.515e-05 [renormalize]: 0.00098256 [add_forward_monad_depend]: 7.55e-06 [auto_monad_grad]: 2.16e-06 [auto_monad_eliminator]: 2.022e-05 [cse]: 4.096e-05 [a_3]: 6.86e-05 [Cycle 3]: 0.00692373, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 9.47999e-06 [loop_unroll]: 8.27e-06 [a_1]: 0.00022903 [with_stream_mark]: 1.736e-05 [recompute_prepare]: 1.063e-05 [updatestate_depend_eliminate]: 5.52999e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.93999e-06 [parameter_eliminate]: 1.42e-06 [a_2]: 
0.00011498 [accelerated_algorithm]: 1.409e-05 [shard]: 1.74e-06 [meta_shard_fg_expand]: 2.67001e-06 [shard_inline]: 9.84001e-06 [merge_send_recv]: 8.84e-06 [auto_parallel]: 9.58997e-06 [parallel]: 7.55e-06 [flash_sp]: 9.5999e-07 [merge_comm]: 0.00584224 [allreduce_fusion]: 9.99001e-06 [matmul_add_comm_reduction]: 1.53e-05 [allreduce_slice_to_reducescatter]: 1.27e-06 [virtual_shard_identity]: 3.208e-05 [virtual_dataset]: 1.079e-05 [get_grad_eliminate_]: 9.96e-06 [virtual_output]: 7.96001e-06 [merge_forward]: 6.43e-06 [cell_reuse_recompute_pass]: 3.18e-06 [offload_activation]: 1.461e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.867e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 1.479e-05 [set_forward_comm_id_for_comm_node_pass]: 5.97999e-06 [meta_fg_expand]: 5.40001e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 2.46e-06 [after_resolve]: 1.704e-05 [a_after_grad]: 1.544e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 5.62001e-06 [auto_monad_grad]: 2.60002e-06 [auto_monad_eliminator]: 2.018e-05 [cse]: 4.598e-05 [a_3]: 5.947e-05 [py_interpret_to_execute_after_opt_a]: 2.299e-05 [slice_cell_reuse_recomputed_activation]: 2.72001e-06 [rewriter_after_opt_a]: 6.242e-05 [convert_after_rewriter]: 2.119e-05 [order_py_execute_after_rewriter]: 7.55e-06 [mutable_eliminate]: 0.00088231 [opt_b]: 0.00034691, [1] [Cycle 1]: 0.00033731, [7] [b_1]: 0.00021935 [b_2]: 1.37e-05 [updatestate_depend_eliminate]: 1.118e-05 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 4.31002e-06 [renormalize]: 1.04e-06 [cse]: 3.466e-05 [optimize_parallel_all_gather_comm]: 2.398e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 3.641e-05 [loop_unroll]: 0.00058381 [opt_after_cconv]: 0.00014754, [1] [Cycle 1]: 0.00013915, [7] [c_1]: 4.874e-05 [parameter_eliminate]: 4.86997e-06 [updatestate_depend_eliminate]: 8.60999e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 3.38e-06 [cse]: 3.12e-05 [renormalize]: 
6.60017e-07 [remove_dup_value]: 4.612e-05 [tuple_transform]: 0.00010957, [1] [Cycle 1]: 0.0001035, [4] [d_1]: 6.939e-05 [none_parameter_eliminate]: 2.07001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 9.77001e-06 [partial_unused_args_eliminate]: 1.90001e-06 [add_recomputation]: 7.472e-05 [cse_after_recomputation]: 3.422e-05, [1] [Cycle 1]: 2.877e-05, [1] [cse]: 2.134e-05 [environ_conv]: 1.112e-05 [swap_dp_allreduce_reducescatter]: 9.22001e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 5.44e-06 [label_fine_grained_interleaved_index]: 3.07002e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.11997e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.69e-06 [overlap_opt_shard_grad_in_pipeline]: 2.21998e-06 [control_data_broadcast_order]: 2.063e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 5.29998e-06 [overlap_recompute_and_grad_model_parallel]: 6.63e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 5.10001e-06 [overlap_grad_flash_sp]: 3.009e-05 [begin_end_overlap_inline]: 6.69999e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 0.00012243, [1] [Cycle 1]: 0.00011587, [6] [build]: 1.386e-05 [elim_shapecalc]: 1.849e-05 [elim_not_effective]: 2.134e-05 [opt_reshape]: 9.77999e-06 [fold_const_symbol]: 1.35e-05 [renormalize]: 3.10014e-07 [detach_backward]: 2.20002e-06 [pipeline_parallel_scheduler]: 
1.66e-06 [auto_monad_reorder]: 2.639e-05 [get_jit_bprop_graph]: 1.94999e-06 [rewriter_after_jit_bprop_graph]: 5.91998e-06 [opt_after_jit_grad]: 0.00057682 [validate]: 6.002e-05 [backend_pass]: 9.60019e-07 [task_emit]: 1.01589 [execute]: 9.52999e-06 Sums bootstrap : 0.000527s : 0.05% type_inference : 0.075472s : 6.76% event_method : 0.000062s : 0.01% auto_monad : 0.000140s : 0.01% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000041s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000062s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000048s : 0.00% optimize.rewriter_before_opt_a : 0.000166s : 0.01% optimize.opt_a.expand_dump_flag : 0.000012s : 0.00% optimize.opt_a.switch_simplify : 0.000139s : 0.01% optimize.opt_a.loop_unroll : 0.000119s : 0.01% optimize.opt_a.a_1 : 0.005142s : 0.46% optimize.opt_a.with_stream_mark : 0.000076s : 0.01% optimize.opt_a.recompute_prepare : 0.000053s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000023s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000515s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000071s : 0.01% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.00% optimize.opt_a.shard_inline : 0.000037s : 0.00% optimize.opt_a.merge_send_recv : 0.000040s : 0.00% optimize.opt_a.auto_parallel : 0.000035s : 0.00% optimize.opt_a.parallel : 0.000039s : 0.00% optimize.opt_a.flash_sp : 0.000020s : 0.00% optimize.opt_a.merge_comm : 0.005859s : 0.52% optimize.opt_a.allreduce_fusion : 0.000024s : 0.00% 
optimize.opt_a.matmul_add_comm_reduction : 0.000060s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000066s : 0.01% optimize.opt_a.virtual_dataset : 0.000036s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000068s : 0.01% optimize.opt_a.virtual_output : 0.000035s : 0.00% optimize.opt_a.merge_forward : 0.000024s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000007s : 0.00% optimize.opt_a.offload_activation : 0.000050s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000082s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000073s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000026s : 0.00% optimize.opt_a.meta_fg_expand : 0.002140s : 0.19% optimize.opt_a.flash_sp_send_recv_attached : 0.000011s : 0.00% optimize.opt_a.receive_attached : 0.000009s : 0.00% optimize.opt_a.after_resolve : 0.000108s : 0.01% optimize.opt_a.a_after_grad : 0.000122s : 0.01% optimize.opt_a.renormalize : 0.004385s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000027s : 0.00% optimize.opt_a.auto_monad_grad : 0.000011s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000102s : 0.01% optimize.opt_a.cse : 0.000283s : 0.03% optimize.opt_a.a_3 : 0.000524s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000023s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000062s : 0.01% optimize.convert_after_rewriter : 0.000021s : 0.00% optimize.order_py_execute_after_rewriter : 0.000008s : 0.00% optimize.mutable_eliminate : 0.000882s : 0.08% optimize.opt_b.b_1 : 0.000219s : 0.02% optimize.opt_b.b_2 : 0.000014s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse 
: 0.000035s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000036s : 0.00% optimize.loop_unroll : 0.000584s : 0.05% optimize.opt_after_cconv.c_1 : 0.000049s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000046s : 0.00% optimize.tuple_transform.d_1 : 0.000069s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000075s : 0.01% optimize.cse_after_recomputation.cse : 0.000021s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000021s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000030s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000018s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000577s : 0.05% validate : 0.000060s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 1.015889s : 91.02% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.001132 213 6.04% : 0.000068s : 12: substitution.arithmetic_simplify 0.30% : 0.000003s : 4: substitution.elim_not_effective 0.46% : 0.000005s : 5: substitution.float_depend_g_call 0.45% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.16% : 0.000002s : 4: substitution.fold_const_symbol 0.74% : 0.000008s : 7: substitution.graph_param_transform 0.28% : 0.000003s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 61.78% : 0.000699s : 17: substitution.inline 1.92% : 0.000022s : 2: substitution.inline_without_move 1.65% : 0.000019s : 18: substitution.j_node_and_user_rematch 1.85% : 0.000021s : 3: substitution.less_batch_normalization 1.39% : 0.000016s : 11: substitution.minmaximum_grad 0.71% : 0.000008s : 5: substitution.partial_eliminate 1.31% : 0.000015s : 18: substitution.remove_not_recompute_node 3.26% : 0.000037s : 10: substitution.replace_applicator 1.34% : 0.000015s : 15: substitution.replace_old_param 0.41% : 0.000005s : 1: substitution.set_cell_output_no_recompute 2.91% : 0.000033s : 11: substitution.tuple_list_convert_item_index_to_positive 1.44% : 0.000016s : 11: substitution.tuple_list_get_item_const_eliminator 1.87% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 7.70% : 0.000087s : 30: substitution.tuple_list_get_item_eliminator 1.82% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.075365 2 97.22% : 0.073268s : 1: type_inference.infer 2.78% : 0.002097s : 1: type_inference.specialize ------[replace.] 0.000290 33 58.93% : 0.000171s : 17: replace.inline 41.07% : 0.000119s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000733 33 93.78% : 0.000687s : 17: match.inline 6.22% : 0.000046s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000823 5530 1.06% : 0.000009s : 66: predicate.accumulaten_eliminater 0.36% : 0.000003s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.04% : 0.000009s : 66: predicate.addn_zero_filter 0.98% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 2.15% : 0.000018s : 96: predicate.arithmetic_simplify 1.06% : 0.000009s : 66: predicate.cast_eliminate 1.08% : 0.000009s : 65: predicate.check_bprop_eliminate 0.47% : 0.000004s : 30: predicate.compare_switch_simplify 0.10% : 0.000001s : 7: predicate.const_output_eliminate 0.48% : 0.000004s : 30: predicate.depend_value_elim 1.10% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.08% : 0.000009s : 66: predicate.dict_get_item_eliminator 1.02% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.49% : 0.000004s : 14: predicate.dumpgradient_eliminate 0.12% : 0.000001s : 7: predicate.elim_not_effective 0.25% : 0.000002s : 7: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000009s : 73: predicate.environ_add_const_eliminate 1.11% : 0.000009s : 73: predicate.environ_get_add_eliminate 1.12% : 0.000009s : 73: predicate.environ_get_depend_swap 1.60% : 0.000013s : 103: predicate.environ_get_eliminate 1.08% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.56% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.54% : 0.000021s : 99: predicate.float_depend_g_call 0.46% : 0.000004s : 30: predicate.float_environ_get_switch 0.58% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 7: predicate.fold_const_symbol 0.72% : 0.000006s : 30: predicate.get_grad_eliminate 0.10% : 0.000001s : 7: predicate.graph_param_transform 0.55% : 0.000005s : 30: predicate.incorporate_call 0.43% : 0.000004s : 30: predicate.incorporate_call_switch 5.55% : 0.000046s : 239: predicate.inline 1.35% : 0.000011s : 53: predicate.inline_without_move 0.26% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.73% : 0.000006s : 30: predicate.less_batch_normalization 1.48% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 2.43% : 0.000020s : 162: predicate.load_eliminater 0.48% : 0.000004s : 7: predicate.loop_unroll_after_grad 2.20% : 0.000018s : 134: predicate.loop_unroll_before_grad 1.36% : 0.000011s : 80: predicate.make_slice_get_slice_eliminator 0.49% : 0.000004s : 30: predicate.merge_addn 1.26% : 0.000010s : 65: predicate.micro_step_allgather_replace 1.11% : 0.000009s : 65: predicate.mini_step_allgather_replace 1.04% : 0.000009s : 66: predicate.minmaximum_grad 0.55% : 0.000004s : 7: predicate.mutable_eliminate 0.14% : 0.000001s : 7: predicate.opt_reshape 0.15% : 0.000001s : 7: predicate.parallel_virtual_node 2.38% : 0.000020s : 99: predicate.partial_defer_inline 1.56% : 0.000013s : 89: predicate.partial_eliminate 1.00% : 0.000008s : 66: predicate.print_const_string_wrapper 0.51% : 0.000004s : 30: predicate.reduce_all_const_elim 1.45% : 0.000012s : 66: predicate.reduce_eliminate 2.57% : 0.000021s : 162: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000003s : 30: predicate.remove_not_recompute_node 1.87% : 0.000015s : 147: predicate.replace_applicator 0.56% : 0.000005s : 53: predicate.replace_old_param 0.09% : 0.000001s : 7: predicate.reset_defer_inline 0.98% : 0.000008s : 66: predicate.reshape_eliminate 1.10% : 0.000009s : 65: predicate.row_tensor_add_zeros_like 0.23% : 0.000002s : 7: predicate.row_tensor_eliminate 1.34% : 0.000011s : 65: predicate.same_eliminate 0.40% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.90% : 0.000007s : 30: predicate.shard_identity_eliminate 0.28% : 0.000002s : 14: predicate.special_op_eliminate 0.59% : 0.000005s : 30: predicate.specialize_transform 1.40% : 0.000011s : 65: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000010s : 53: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.74% : 0.000014s : 99: predicate.switch_defer_inline 2.73% : 0.000023s : 164: predicate.switch_layer_defer_inline 6.76% : 0.000056s : 270: 
predicate.switch_simplify 0.98% : 0.000008s : 66: predicate.tile_eliminate 0.99% : 0.000008s : 66: predicate.transpose_eliminate 1.33% : 0.000011s : 80: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 80: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000011s : 80: predicate.tuple_list_get_item_depend_reorder 3.06% : 0.000025s : 126: predicate.tuple_list_get_item_eliminator 1.40% : 0.000012s : 80: predicate.tuple_list_get_set_item_eliminator 1.93% : 0.000016s : 110: predicate.tuple_list_set_item_eliminator 1.49% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 2.42% : 0.000020s : 162: predicate.updatestate_pure_node_eliminater 2.95% : 0.000024s : 192: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 7: predicate.value_based_eliminate 0.59% : 0.000005s : 30: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 30: predicate.virtual_output_eliminate 0.12% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.062189 34 98.25% : 0.061100s : 13: func_graph_cloner_run.FuncGraphClonerGraph 1.75% : 0.001088s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.162528 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.35% : 0.004060s : 1: add_attr 0.35% : 0.004046s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000081s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000149s : 1: auto_monad 0.00% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000036s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000561s : 1: bootstrap 0.00% : 0.000041s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000024s : 1: control_data_broadcast_order 0.00% : 0.000025s : 1: convert_after_rewriter 0.00% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.01% : 0.000071s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000036s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.05% : 0.000597s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.08% : 0.000897s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000027s : 1: opt.transform.mutable_eliminate 0.60% : 0.006995s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000192s : 28: opt.transform.opt_b 0.01% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000058s : 4: opt.transform.symbol_engine_opt 1.86% : 0.021605s : 1: opt_a 0.01% : 0.000151s : 1: opt_after_cconv 0.05% : 0.000590s : 1: opt_after_jit_grad 0.03% : 0.000351s : 1: opt_b 2.13% : 0.024785s : 1: optimize 0.00% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000011s : 1: order_py_execute_after_rewriter 0.00% : 0.000034s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000006s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000069s : 1: pre_auto_parallel 0.00% : 0.000053s : 1: py_interpret_to_execute 0.00% : 0.000028s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000051s : 1: remove_dup_value 0.20% : 0.002355s : 2: renormalize.infer 0.17% : 0.002009s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000068s : 1: rewriter_after_opt_a 0.01% : 0.000172s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000006s : 1: split_layernorm_comm 0.00% : 0.000007s : 1: split_matmul_comm_elemetwise 0.00% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000126s : 1: symbol_engine_optimizer 87.39% : 1.015914s : 1: task_emit 0.01% : 0.000113s : 1: 
tuple_transform 6.49% : 0.075500s : 1: type_inference 0.01% : 0.000094s : 1: validate TotalTime = 0.687148, [24] [bootstrap]: 0.00097155 [type_inference]: 0.0255683 [event_method]: 1.619e-05 [auto_monad]: 5.829e-05 [graph_reusing]: 5.62001e-06 [inline]: 2.99001e-06 [add_attr]: 0.00396148, [1] [add_attr_with_inline]: 0.00394883, [1] [Cycle 1]: 6.515e-05, [2] [tag_attr]: 1.788e-05 [meta_addattr_fg_expand]: 3.38e-06 [parallel-infer-symbol]: 4e-06 [pre_auto_parallel]: 3.314e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.39001e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00518009, [53] [py_interpret_to_execute]: 2.34e-05 [rewriter_before_opt_a]: 5.069e-05 [opt_a]: 0.0026739, [2] [Cycle 1]: 0.00184591, [45] [expand_dump_flag]: 3.19001e-06 [switch_simplify]: 2.921e-05 [loop_unroll]: 1.432e-05 [a_1]: 0.00034539 [with_stream_mark]: 2.316e-05 [recompute_prepare]: 1.203e-05 [updatestate_depend_eliminate]: 4.41002e-06 [updatestate_assign_eliminate]: 3.7e-06 [updatestate_loads_eliminate]: 3.56001e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 8.372e-05 [accelerated_algorithm]: 7.04001e-06 [shard]: 3.14999e-06 [meta_shard_fg_expand]: 1.95001e-06 [shard_inline]: 8.16002e-06 [merge_send_recv]: 1.059e-05 [auto_parallel]: 8.37998e-06 [parallel]: 2.15e-05 [flash_sp]: 1.152e-05 [merge_comm]: 4.17e-06 [allreduce_fusion]: 3.81999e-06 [matmul_add_comm_reduction]: 1.054e-05 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 1.074e-05 [virtual_dataset]: 6.68998e-06 [get_grad_eliminate_]: 6.14999e-06 [virtual_output]: 6.63e-06 [merge_forward]: 5.57999e-06 [cell_reuse_recompute_pass]: 2.88e-06 [offload_activation]: 1.214e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.466e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 1.175e-05 [set_forward_comm_id_for_comm_node_pass]: 4.57998e-06 [meta_fg_expand]: 2.54999e-06 [flash_sp_send_recv_attached]: 2.92002e-06 [receive_attached]: 2.65997e-06 
[after_resolve]: 1.432e-05 [a_after_grad]: 1.016e-05 [renormalize]: 0.00071119 [add_forward_monad_depend]: 8.33999e-06 [auto_monad_grad]: 2.73e-06 [auto_monad_eliminator]: 1.996e-05 [cse]: 3.243e-05 [a_3]: 4.987e-05 [Cycle 2]: 0.0008153, [45] [expand_dump_flag]: 2.11998e-06 [switch_simplify]: 8.69003e-06 [loop_unroll]: 5.91998e-06 [a_1]: 0.0001978 [with_stream_mark]: 2.03e-05 [recompute_prepare]: 8.72e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 8.009e-05 [accelerated_algorithm]: 7.54002e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 7.88999e-06 [auto_parallel]: 8.01001e-06 [parallel]: 8.64e-06 [flash_sp]: 4.30999e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 2.96999e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 8.67998e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.11997e-06 [virtual_output]: 5.04e-06 [merge_forward]: 4.53999e-06 [cell_reuse_recompute_pass]: 2.22001e-06 [offload_activation]: 9.96998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.276e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.62001e-06 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 2.37001e-06 [flash_sp_send_recv_attached]: 1.47001e-06 [receive_attached]: 2.07001e-06 [after_resolve]: 1.131e-05 [a_after_grad]: 9.22001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.25002e-06 [auto_monad_grad]: 1.52001e-06 [auto_monad_eliminator]: 1.156e-05 [cse]: 2.07e-05 [a_3]: 3.562e-05 [py_interpret_to_execute_after_opt_a]: 1.362e-05 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.972e-05 [convert_after_rewriter]: 8.3e-06 [order_py_execute_after_rewriter]: 6.65002e-06 [mutable_eliminate]: 0.00075904 [opt_b]: 0.00021357, [1] [Cycle 1]: 0.00020456, [7] [b_1]: 
0.00011766 [b_2]: 8.70999e-06 [updatestate_depend_eliminate]: 9.10001e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.91999e-06 [renormalize]: 5.00004e-07 [cse]: 2.475e-05 [optimize_parallel_all_gather_comm]: 2.108e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 3.036e-05 [loop_unroll]: 0.00059265 [opt_after_cconv]: 0.00011373, [1] [Cycle 1]: 0.00010644, [7] [c_1]: 3.106e-05 [parameter_eliminate]: 4.1e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 2.61999e-06 [cse]: 2.337e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.369e-05 [tuple_transform]: 7.867e-05, [1] [Cycle 1]: 7.357e-05, [4] [d_1]: 4.555e-05 [none_parameter_eliminate]: 1.79998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.79999e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 5.518e-05 [cse_after_recomputation]: 2.396e-05, [1] [Cycle 1]: 1.912e-05, [1] [cse]: 1.321e-05 [environ_conv]: 5.86e-06 [swap_dp_allreduce_reducescatter]: 5.50001e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 6.51e-06 [label_fine_grained_interleaved_index]: 3.32002e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.31002e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 1.14998e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.571e-05 [grouped_pairwise_exchange_alltoall]: 2.09999e-06 [offloading_packed_experts]: 4.50001e-06 [overlap_recompute_and_grad_model_parallel]: 5.77001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.69001e-06 [overlap_grad_ring_attention]: 4.74998e-06 [overlap_grad_flash_sp]: 2.082e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 8.487e-05, [1] [Cycle 1]: 7.942e-05, [6] [build]: 3.43999e-06 [elim_shapecalc]: 1.309e-05 [elim_not_effective]: 1.279e-05 [opt_reshape]: 6.68998e-06 [fold_const_symbol]: 9.47001e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 2.106e-05 [get_jit_bprop_graph]: 2.22001e-06 [rewriter_after_jit_bprop_graph]: 4.91002e-06 [opt_after_jit_grad]: 0.00067464 [validate]: 4.332e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.650315 [execute]: 1.009e-05 Sums bootstrap : 0.000972s : 0.14% type_inference : 0.025568s : 3.75% event_method : 0.000016s : 0.00% auto_monad : 0.000058s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000033s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000051s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.01% optimize.opt_a.loop_unroll : 0.000020s : 0.00% optimize.opt_a.a_1 : 0.000543s : 0.08% optimize.opt_a.with_stream_mark : 0.000043s : 0.01% optimize.opt_a.recompute_prepare : 0.000021s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000164s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.00% optimize.opt_a.merge_send_recv : 0.000018s : 0.00% optimize.opt_a.auto_parallel : 0.000016s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000019s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000010s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000022s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000027s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000010s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000026s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000711s : 0.10% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000032s : 0.00% optimize.opt_a.cse : 0.000053s : 0.01% optimize.opt_a.a_3 : 0.000085s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000759s : 0.11% optimize.opt_b.b_1 : 0.000118s : 0.02% optimize.opt_b.b_2 : 0.000009s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.00% optimize.loop_unroll : 0.000593s : 0.09% optimize.opt_after_cconv.c_1 : 0.000031s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000046s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.01% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000675s : 0.10% validate : 0.000043s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.650315s : 95.36% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000168 26 19.42% : 0.000033s : 4: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.73% : 0.000006s : 4: substitution.graph_param_transform 66.12% : 0.000111s : 2: substitution.inline 2.24% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.02% : 0.000005s : 4: substitution.remove_not_recompute_node 3.53% : 0.000006s : 4: substitution.replace_old_param ------[type_inference.] 0.025505 2 98.11% : 0.025022s : 1: type_inference.infer 1.89% : 0.000483s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000109 2 100.00% : 0.000109s : 2: match.inline ------[predicate.] 
0.000162 984 0.71% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.61% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.71% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.86% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.70% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.64% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.81% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.05% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.96% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 1.64% : 0.000003s : 21: predicate.environ_get_eliminate 0.94% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.70% : 0.000009s : 44: predicate.inline 0.94% : 0.000002s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.55% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 1.86% : 0.000003s : 26: predicate.load_eliminater 1.87% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.55% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 9: predicate.minmaximum_grad 2.26% : 0.000004s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.12% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.02% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.12% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.47% : 0.000001s : 4: predicate.reset_defer_inline 0.65% : 0.000001s : 9: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 4: predicate.row_tensor_eliminate 1.19% : 0.000002s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.54% : 0.000002s : 8: predicate.shard_identity_eliminate 1.06% : 0.000002s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.24% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.92% : 0.000001s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.91% : 0.000008s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000002s : 9: predicate.transpose_eliminate 1.60% : 0.000003s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.51% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.89% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.67% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000399 6 42.12% : 0.000168s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.88% : 0.000231s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.698128 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.57% : 0.003967s : 1: add_attr 0.57% : 0.003953s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000064s : 1: auto_monad 0.00% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.14% : 0.001005s : 1: bootstrap 0.00% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.00% : 0.000023s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000007s : 1: label_fine_grained_interleaved_index 0.00% : 0.000010s : 1: label_micro_interleaved_index 0.09% : 0.000608s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000775s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000022s : 1: opt.transform.mutable_eliminate 0.14% : 0.000959s : 78: opt.transform.opt_a 0.00% : 0.000030s : 1: opt.transform.opt_after_cconv 0.00% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000098s : 28: opt.transform.opt_b 0.01% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000038s : 4: opt.transform.symbol_engine_opt 0.38% : 0.002677s : 1: opt_a 0.02% : 0.000118s : 1: opt_after_cconv 0.10% : 0.000690s : 1: opt_after_jit_grad 0.03% : 0.000217s : 1: opt_b 0.74% : 0.005185s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000037s : 1: pre_auto_parallel 0.00% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000018s : 1: remove_dup_value 0.06% : 0.000385s : 1: renormalize.infer 0.04% : 0.000314s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000045s : 1: rewriter_after_opt_a 0.01% : 0.000056s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000088s : 1: symbol_engine_optimizer 93.15% : 0.650340s : 1: task_emit 0.01% : 0.000082s : 1: 
tuple_transform 3.67% : 0.025599s : 1: type_inference 0.01% : 0.000077s : 1: validate TotalTime = 0.941259, [24] [bootstrap]: 0.00051937 [type_inference]: 0.0764879 [event_method]: 5.472e-05 [auto_monad]: 0.00014024 [graph_reusing]: 9.02999e-06 [inline]: 2.73e-06 [add_attr]: 0.00391809, [1] [add_attr_with_inline]: 0.00390437, [1] [Cycle 1]: 8.798e-05, [2] [tag_attr]: 3.812e-05 [meta_addattr_fg_expand]: 9.08002e-06 [parallel-infer-symbol]: 3.66999e-06 [pre_auto_parallel]: 5.564e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0297745, [53] [py_interpret_to_execute]: 4.499e-05 [rewriter_before_opt_a]: 0.00014142 [opt_a]: 0.0271747, [3] [Cycle 1]: 0.00811131, [45] [expand_dump_flag]: 4.40999e-06 [switch_simplify]: 6.977e-05 [loop_unroll]: 5.653e-05 [a_1]: 0.00146288 [with_stream_mark]: 3.037e-05 [recompute_prepare]: 2.633e-05 [updatestate_depend_eliminate]: 9.90002e-06 [updatestate_assign_eliminate]: 8.32003e-06 [updatestate_loads_eliminate]: 8.07e-06 [parameter_eliminate]: 2.61999e-06 [a_2]: 0.00025411 [accelerated_algorithm]: 3.329e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 4.18001e-06 [shard_inline]: 1.636e-05 [merge_send_recv]: 1.898e-05 [auto_parallel]: 1.295e-05 [parallel]: 2.167e-05 [flash_sp]: 1.371e-05 [merge_comm]: 1.018e-05 [allreduce_fusion]: 9.54e-06 [matmul_add_comm_reduction]: 3.325e-05 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 1.878e-05 [virtual_dataset]: 1.635e-05 [get_grad_eliminate_]: 1.586e-05 [virtual_output]: 2.14e-05 [merge_forward]: 1.026e-05 [cell_reuse_recompute_pass]: 1.36998e-06 [offload_activation]: 2.004e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.129e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 2.863e-05 [set_forward_comm_id_for_comm_node_pass]: 1.116e-05 [meta_fg_expand]: 0.00170548 [flash_sp_send_recv_attached]: 4.60999e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 6.642e-05 [a_after_grad]: 8.721e-05 [renormalize]: 0.00302303 [add_forward_monad_depend]: 1.11e-05 [auto_monad_grad]: 6.37001e-06 [auto_monad_eliminator]: 6.017e-05 [cse]: 0.00018672 [a_3]: 0.00036006 [Cycle 2]: 0.0177522, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 4.918e-05 [loop_unroll]: 4.488e-05 [a_1]: 0.00164624 [with_stream_mark]: 1.765e-05 [recompute_prepare]: 1.166e-05 [updatestate_depend_eliminate]: 5.41002e-06 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 3.61999e-06 [parameter_eliminate]: 1.81998e-06 [a_2]: 0.00011746 [accelerated_algorithm]: 1.27e-05 [shard]: 2.11e-06 [meta_shard_fg_expand]: 2.77002e-06 [shard_inline]: 8.68001e-06 [merge_send_recv]: 9.60001e-06 [auto_parallel]: 1.107e-05 [parallel]: 9.41e-06 [flash_sp]: 4.04002e-06 [merge_comm]: 5.44998e-06 [allreduce_fusion]: 4.47998e-06 [matmul_add_comm_reduction]: 1.02e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 1.004e-05 [virtual_dataset]: 8.33999e-06 [get_grad_eliminate_]: 7.83999e-06 [virtual_output]: 7.85e-06 [merge_forward]: 6.53e-06 [cell_reuse_recompute_pass]: 1.64e-06 [offload_activation]: 1.294e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.725e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 1.37e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17e-06 [meta_fg_expand]: 5.213e-05 [flash_sp_send_recv_attached]: 1.50001e-06 [receive_attached]: 2.83e-06 [after_resolve]: 1.701e-05 [a_after_grad]: 1.417e-05 [renormalize]: 0.0151302 [add_forward_monad_depend]: 1.114e-05 [auto_monad_grad]: 2.82002e-06 [auto_monad_eliminator]: 2.605e-05 [cse]: 4.504e-05 [a_3]: 8.362e-05 [Cycle 3]: 0.00128723, [45] [expand_dump_flag]: 2.45002e-06 [switch_simplify]: 1.209e-05 [loop_unroll]: 9.93998e-06 [a_1]: 0.00025722 [with_stream_mark]: 1.521e-05 [recompute_prepare]: 1.019e-05 [updatestate_depend_eliminate]: 5.82001e-06 [updatestate_assign_eliminate]: 0.00029888 [updatestate_loads_eliminate]: 4.82998e-06 
[parameter_eliminate]: 2.48002e-06 [a_2]: 0.00012666 [accelerated_algorithm]: 1.513e-05 [shard]: 2.20002e-06 [meta_shard_fg_expand]: 2.59999e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 1.133e-05 [auto_parallel]: 1.168e-05 [parallel]: 1.082e-05 [flash_sp]: 1.90001e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 5.43002e-06 [matmul_add_comm_reduction]: 1.173e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 1.18e-05 [virtual_dataset]: 9.22999e-06 [get_grad_eliminate_]: 8.52e-06 [virtual_output]: 8.49002e-06 [merge_forward]: 5.72001e-06 [cell_reuse_recompute_pass]: 3.49001e-06 [offload_activation]: 1.305e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.762e-05 [merge_recompute_call_nodes]: 1.82001e-06 [before_grad]: 1.573e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34e-06 [meta_fg_expand]: 3.91001e-06 [flash_sp_send_recv_attached]: 2.06998e-06 [receive_attached]: 2.89001e-06 [after_resolve]: 1.745e-05 [a_after_grad]: 1.473e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.44998e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.035e-05 [cse]: 2.428e-05 [a_3]: 5.221e-05 [py_interpret_to_execute_after_opt_a]: 1.92e-05 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 5.716e-05 [convert_after_rewriter]: 9.19998e-06 [order_py_execute_after_rewriter]: 6.25002e-06 [mutable_eliminate]: 0.00071335 [opt_b]: 0.00028195, [1] [Cycle 1]: 0.00027402, [7] [b_1]: 0.00018496 [b_2]: 1.132e-05 [updatestate_depend_eliminate]: 6.93998e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.47002e-06 [renormalize]: 5.09986e-07 [cse]: 2.581e-05 [optimize_parallel_all_gather_comm]: 2.275e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.615e-05 [loop_unroll]: 0.00044309 [opt_after_cconv]: 0.00013001, [1] [Cycle 1]: 0.0001236, [7] [c_1]: 4.682e-05 [parameter_eliminate]: 2.42001e-06 [updatestate_depend_eliminate]: 6.33e-06 [updatestate_assign_eliminate]: 3.63999e-06 
[updatestate_loads_eliminate]: 3.48999e-06 [cse]: 2.533e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 3.741e-05 [tuple_transform]: 9.812e-05, [1] [Cycle 1]: 9.31e-05, [4] [d_1]: 6.276e-05 [none_parameter_eliminate]: 1.86998e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 9.39998e-06 [partial_unused_args_eliminate]: 2.04999e-06 [add_recomputation]: 6.439e-05 [cse_after_recomputation]: 2.836e-05, [1] [Cycle 1]: 2.355e-05, [1] [cse]: 1.787e-05 [environ_conv]: 9.56e-06 [swap_dp_allreduce_reducescatter]: 7.00998e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.18998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.22999e-06 [overlap_opt_shard_in_pipeline]: 1.29998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.577e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 4.78001e-06 [overlap_recompute_and_grad_model_parallel]: 5.33002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.92e-06 [overlap_grad_flash_sp]: 2.556e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.36998e-06 [split_layernorm_comm]: 1.81003e-06 [handle_group_info]: 1.05999e-06 [symbol_engine_optimizer]: 9.726e-05, [1] [Cycle 1]: 9.283e-05, [6] [build]: 1.241e-05 [elim_shapecalc]: 1.25e-05 [elim_not_effective]: 1.691e-05 [opt_reshape]: 9.32999e-06 [fold_const_symbol]: 1.33e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 2.43e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 2.237e-05 [get_jit_bprop_graph]: 2.35002e-06 [rewriter_after_jit_bprop_graph]: 3.86999e-06 [opt_after_jit_grad]: 0.00048226 [validate]: 4.891e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.829462 [execute]: 8.89e-06 Sums bootstrap : 0.000519s : 0.06% type_inference : 0.076488s : 8.17% event_method : 0.000055s : 0.01% auto_monad : 0.000140s : 0.01% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000056s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000045s : 0.00% optimize.rewriter_before_opt_a : 0.000141s : 0.02% optimize.opt_a.expand_dump_flag : 0.000010s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.01% optimize.opt_a.loop_unroll : 0.000111s : 0.01% optimize.opt_a.a_1 : 0.003366s : 0.36% optimize.opt_a.with_stream_mark : 0.000063s : 0.01% optimize.opt_a.recompute_prepare : 0.000048s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000312s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.00% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000498s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.00% optimize.opt_a.merge_send_recv : 0.000040s : 0.00% optimize.opt_a.auto_parallel : 0.000036s : 0.00% optimize.opt_a.parallel : 0.000042s : 0.00% optimize.opt_a.flash_sp : 0.000020s : 0.00% optimize.opt_a.merge_comm : 
0.000021s : 0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000055s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.00% optimize.opt_a.virtual_output : 0.000038s : 0.00% optimize.opt_a.merge_forward : 0.000023s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000046s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.00% optimize.opt_a.meta_fg_expand : 0.001762s : 0.19% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000008s : 0.00% optimize.opt_a.after_resolve : 0.000101s : 0.01% optimize.opt_a.a_after_grad : 0.000116s : 0.01% optimize.opt_a.renormalize : 0.018153s : 1.94% optimize.opt_a.add_forward_monad_depend : 0.000024s : 0.00% optimize.opt_a.auto_monad_grad : 0.000010s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000097s : 0.01% optimize.opt_a.cse : 0.000256s : 0.03% optimize.opt_a.a_3 : 0.000496s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000057s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000713s : 0.08% optimize.opt_b.b_1 : 0.000185s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000026s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000443s : 0.05% optimize.opt_after_cconv.c_1 : 0.000047s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000037s : 0.00% optimize.tuple_transform.d_1 : 0.000063s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000064s : 0.01% optimize.cse_after_recomputation.cse : 0.000018s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000022s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000482s : 0.05% validate : 0.000049s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.829462s : 88.63% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000888 209 6.90% : 0.000061s : 11: substitution.arithmetic_simplify 0.31% : 0.000003s : 4: substitution.elim_not_effective 0.54% : 0.000005s : 5: substitution.float_depend_g_call 0.52% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 4: substitution.fold_const_symbol 0.88% : 0.000008s : 7: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 57.27% : 0.000509s : 16: substitution.inline 2.10% : 0.000019s : 2: substitution.inline_without_move 1.31% : 0.000012s : 18: substitution.j_node_and_user_rematch 2.15% : 0.000019s : 3: substitution.less_batch_normalization 1.65% : 0.000015s : 11: substitution.minmaximum_grad 0.79% : 0.000007s : 5: substitution.partial_eliminate 1.64% : 0.000015s : 18: substitution.remove_not_recompute_node 3.42% : 0.000030s : 10: substitution.replace_applicator 1.63% : 0.000014s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.64% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 8.04% : 0.000071s : 28: substitution.tuple_list_get_item_eliminator 2.28% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.076387 2 97.72% : 0.074646s : 1: type_inference.infer 2.28% : 0.001741s : 1: type_inference.specialize ------[replace.] 0.000232 30 59.53% : 0.000138s : 16: replace.inline 40.47% : 0.000094s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000532 30 93.81% : 0.000499s : 16: match.inline 6.19% : 0.000033s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000774 5429 1.09% : 0.000008s : 65: predicate.accumulaten_eliminater 0.26% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 30: predicate.addn_check_dump 1.06% : 0.000008s : 65: predicate.addn_zero_filter 0.98% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.26% : 0.000017s : 95: predicate.arithmetic_simplify 1.12% : 0.000009s : 65: predicate.cast_eliminate 1.13% : 0.000009s : 65: predicate.check_bprop_eliminate 0.53% : 0.000004s : 30: predicate.compare_switch_simplify 0.07% : 0.000001s : 7: predicate.const_output_eliminate 0.54% : 0.000004s : 30: predicate.depend_value_elim 1.12% : 0.000009s : 65: predicate.dict_get_item_const_eliminator 1.24% : 0.000010s : 65: predicate.dict_get_item_eliminator 1.07% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.34% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 7: predicate.elim_not_effective 0.14% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 72: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 72: predicate.environ_get_depend_swap 1.65% : 0.000013s : 102: predicate.environ_get_eliminate 1.12% : 0.000009s : 72: predicate.environ_get_set_eliminate 1.62% : 0.000013s : 95: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 95: predicate.float_depend_g_call 0.54% : 0.000004s : 30: predicate.float_environ_get_switch 0.65% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.54% : 0.000004s : 30: predicate.get_grad_eliminate 0.08% : 0.000001s : 7: predicate.graph_param_transform 0.52% : 0.000004s : 30: predicate.incorporate_call 0.46% : 0.000004s : 30: predicate.incorporate_call_switch 5.50% : 0.000043s : 234: predicate.inline 1.23% : 0.000010s : 53: predicate.inline_without_move 0.30% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 30: predicate.less_batch_normalization 1.58% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.52% : 0.000019s : 158: predicate.load_eliminater 0.31% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.15% : 0.000017s : 126: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 79: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 30: predicate.merge_addn 1.14% : 0.000009s : 65: predicate.micro_step_allgather_replace 1.17% : 0.000009s : 65: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 65: predicate.minmaximum_grad 0.32% : 0.000002s : 7: predicate.mutable_eliminate 0.15% : 0.000001s : 7: predicate.opt_reshape 0.16% : 0.000001s : 7: predicate.parallel_virtual_node 2.02% : 0.000016s : 95: predicate.partial_defer_inline 1.63% : 0.000013s : 86: predicate.partial_eliminate 1.07% : 0.000008s : 65: predicate.print_const_string_wrapper 0.56% : 0.000004s : 30: predicate.reduce_all_const_elim 3.21% : 0.000025s : 65: predicate.reduce_eliminate 2.56% : 0.000020s : 158: predicate.redundant_stop_gradient_eliminater 0.41% : 0.000003s : 30: predicate.remove_not_recompute_node 1.90% : 0.000015s : 144: predicate.replace_applicator 0.64% : 0.000005s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.04% : 0.000008s : 65: predicate.reshape_eliminate 1.13% : 0.000009s : 65: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 7: predicate.row_tensor_eliminate 1.33% : 0.000010s : 65: predicate.same_eliminate 0.35% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 30: predicate.shard_identity_eliminate 0.27% : 0.000002s : 14: predicate.special_op_eliminate 0.63% : 0.000005s : 30: predicate.specialize_transform 1.47% : 0.000011s : 65: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 53: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 95: predicate.switch_defer_inline 2.75% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.72% : 0.000037s : 258: 
predicate.switch_simplify 1.04% : 0.000008s : 65: predicate.tile_eliminate 1.08% : 0.000008s : 65: predicate.transpose_eliminate 1.52% : 0.000012s : 79: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 79: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000011s : 79: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000022s : 123: predicate.tuple_list_get_item_eliminator 1.51% : 0.000012s : 79: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000016s : 109: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.50% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.02% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 7: predicate.value_based_eliminate 0.55% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 30: predicate.virtual_output_eliminate 0.11% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002026 32 53.98% : 0.001094s : 12: func_graph_cloner_run.FuncGraphClonerGraph 46.02% : 0.000932s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.998458 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.39% : 0.003924s : 1: add_attr 0.39% : 0.003909s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000069s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000150s : 1: auto_monad 0.00% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000548s : 1: bootstrap 0.00% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000019s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000031s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000064s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000471s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.07% : 0.000723s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.51% : 0.005082s : 117: opt.transform.opt_a 0.00% : 0.000045s : 1: opt.transform.opt_after_cconv 0.00% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000167s : 28: opt.transform.opt_b 0.01% : 0.000070s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000048s : 4: opt.transform.symbol_engine_opt 2.72% : 0.027178s : 1: opt_a 0.01% : 0.000134s : 1: opt_after_cconv 0.05% : 0.000492s : 1: opt_after_jit_grad 0.03% : 0.000285s : 1: opt_b 2.98% : 0.029780s : 1: optimize 0.00% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000060s : 1: pre_auto_parallel 0.00% : 0.000050s : 1: py_interpret_to_execute 0.00% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000042s : 1: remove_dup_value 1.62% : 0.016162s : 2: renormalize.infer 0.20% : 0.001969s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000061s : 1: rewriter_after_opt_a 0.01% : 0.000147s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000100s : 1: symbol_engine_optimizer 83.08% : 0.829487s : 1: task_emit 0.01% : 0.000101s : 1: 
tuple_transform 7.66% : 0.076515s : 1: type_inference 0.01% : 0.000082s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-ge],max_mem:4.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x1-pynative],max_mem:4.0M TotalTime = 0.176883, [24] [bootstrap]: 0.00071224 [type_inference]: 0.0731227 [event_method]: 1.763e-05 [auto_monad]: 6.135e-05 [graph_reusing]: 6.01e-06 [inline]: 2.26998e-06 [add_attr]: 0.00517723, [1] [add_attr_with_inline]: 0.00515991, [1] [Cycle 1]: 6.538e-05, [2] [tag_attr]: 2.115e-05 [meta_addattr_fg_expand]: 4.45e-06 [parallel-infer-symbol]: 3.71001e-06 [pre_auto_parallel]: 3.985e-05 [insert-virtual-dataset]: 2.85998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00498175, [53] [py_interpret_to_execute]: 2.793e-05 [rewriter_before_opt_a]: 7.064e-05 [opt_a]: 0.00266163, [2] [Cycle 1]: 0.00200676, [45] [expand_dump_flag]: 3.00002e-06 [switch_simplify]: 3.379e-05 [loop_unroll]: 2.204e-05 [a_1]: 0.0005174 [with_stream_mark]: 1.791e-05 [recompute_prepare]: 8.76997e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 8.11e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 5.98998e-06 [merge_send_recv]: 8.63001e-06 [auto_parallel]: 7.75e-06 [parallel]: 2.967e-05 [flash_sp]: 9.40001e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 1.016e-05 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 8.38001e-06 [virtual_dataset]: 6.24999e-06 
[get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 6.13002e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 1.046e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.214e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.84999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.90998e-06 [meta_fg_expand]: 2.62001e-06 [flash_sp_send_recv_attached]: 2.81999e-06 [receive_attached]: 2.73e-06 [after_resolve]: 1.061e-05 [a_after_grad]: 9.41e-06 [renormalize]: 0.00077262 [add_forward_monad_depend]: 5.64e-06 [auto_monad_grad]: 2.29999e-06 [auto_monad_eliminator]: 1.466e-05 [cse]: 3.259e-05 [a_3]: 4.644e-05 [Cycle 2]: 0.00064338, [45] [expand_dump_flag]: 1.46002e-06 [switch_simplify]: 7.55998e-06 [loop_unroll]: 5.89e-06 [a_1]: 0.00013342 [with_stream_mark]: 1.311e-05 [recompute_prepare]: 6.21e-06 [updatestate_depend_eliminate]: 3.09999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.22999e-06 [a_2]: 7.032e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.62001e-06 [meta_shard_fg_expand]: 1.86003e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 5.19e-06 [auto_parallel]: 6.33e-06 [parallel]: 6.44001e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 6.73998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.61e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.16998e-06 [virtual_output]: 5.04e-06 [merge_forward]: 3.13e-06 [cell_reuse_recompute_pass]: 2.29001e-06 [offload_activation]: 7.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.028e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.74e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 1.14e-06 [receive_attached]: 1.86e-06 [after_resolve]: 1.008e-05 [a_after_grad]: 8.20999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 
1.47001e-06 [auto_monad_grad]: 1.44998e-06 [auto_monad_eliminator]: 7.01001e-06 [cse]: 2.028e-05 [a_3]: 3.453e-05 [py_interpret_to_execute_after_opt_a]: 9.96e-06 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 3.817e-05 [convert_after_rewriter]: 7.60998e-06 [order_py_execute_after_rewriter]: 4.90999e-06 [mutable_eliminate]: 0.00064429 [opt_b]: 0.00020546, [1] [Cycle 1]: 0.00019688, [7] [b_1]: 0.00011587 [b_2]: 8.04002e-06 [updatestate_depend_eliminate]: 8.18001e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [renormalize]: 3.00002e-07 [cse]: 2.23e-05 [optimize_parallel_all_gather_comm]: 1.826e-05 [overlap_param_gather]: 2.61e-06 [cconv]: 3.002e-05 [loop_unroll]: 0.00050627 [opt_after_cconv]: 0.00011576, [1] [Cycle 1]: 0.00010788, [7] [c_1]: 3.112e-05 [parameter_eliminate]: 4.39002e-06 [updatestate_depend_eliminate]: 8.21002e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 2.53e-06 [cse]: 2.146e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.542e-05 [tuple_transform]: 8.342e-05, [1] [Cycle 1]: 7.856e-05, [4] [d_1]: 4.856e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 7.16999e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 6.365e-05 [cse_after_recomputation]: 2.509e-05, [1] [Cycle 1]: 1.917e-05, [1] [cse]: 1.294e-05 [environ_conv]: 5.74e-06 [swap_dp_allreduce_reducescatter]: 5.61998e-06 [bias_add_comm_swap]: 2.84001e-06 [label_micro_interleaved_index]: 5.12e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.59999e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.02e-06 
[interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.724e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.29997e-06 [overlap_recompute_and_grad_model_parallel]: 5.39e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.60999e-06 [overlap_grad_flash_sp]: 2.2e-05 [begin_end_overlap_inline]: 9.00007e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 8.912e-05, [1] [Cycle 1]: 8.33e-05, [6] [build]: 3.66999e-06 [elim_shapecalc]: 1.49e-05 [elim_not_effective]: 1.425e-05 [opt_reshape]: 7.11999e-06 [fold_const_symbol]: 9.91998e-06 [renormalize]: 1.50001e-07 [detach_backward]: 2.10002e-06 [pipeline_parallel_scheduler]: 1.92001e-06 [auto_monad_reorder]: 2.077e-05 [get_jit_bprop_graph]: 2.39999e-06 [rewriter_after_jit_bprop_graph]: 0.00017023 [opt_after_jit_grad]: 0.00061957 [validate]: 4.978e-05 [backend_pass]: 1.32e-06 [task_emit]: 0.091594 [execute]: 9.29e-06 Sums bootstrap : 0.000712s : 0.42% type_inference : 0.073123s : 42.87% event_method : 0.000018s : 0.01% auto_monad : 0.000061s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000040s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000028s : 0.02% optimize.rewriter_before_opt_a : 0.000071s : 0.04% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.02% optimize.opt_a.loop_unroll : 0.000028s : 0.02% optimize.opt_a.a_1 : 0.000651s : 0.38% optimize.opt_a.with_stream_mark : 0.000031s : 0.02% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.01% optimize.opt_a.parallel : 0.000036s : 0.02% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 
0.000021s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000773s : 0.45% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.01% optimize.opt_a.cse : 0.000053s : 0.03% optimize.opt_a.a_3 : 0.000081s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000038s : 0.02% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000644s : 0.38% optimize.opt_b.b_1 : 0.000116s : 0.07% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.01% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000030s : 0.02% optimize.loop_unroll : 0.000506s : 0.30% optimize.opt_after_cconv.c_1 : 0.000031s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.01% optimize.tuple_transform.d_1 : 0.000049s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000064s : 0.04% optimize.cse_after_recomputation.cse : 0.000013s : 0.01% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000021s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000170s : 0.10% opt_after_jit_grad : 0.000620s : 0.36% validate : 0.000050s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.091594s : 53.70% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000207 30 13.77% : 0.000028s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.68% : 0.000001s : 2: substitution.fold_const_symbol 3.06% : 0.000006s : 4: substitution.graph_param_transform 69.49% : 0.000144s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.12% : 0.000004s : 4: substitution.remove_not_recompute_node 2.21% : 0.000005s : 4: substitution.replace_old_param 5.92% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.073052 2 98.89% : 0.072241s : 1: type_inference.infer 1.11% : 0.000811s : 1: type_inference.specialize ------[replace.] 0.000044 5 70.64% : 0.000031s : 3: replace.inline 29.36% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000152 5 92.73% : 0.000141s : 3: match.inline 7.27% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000179 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 1.51% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 11: predicate.addn_zero_filter 0.73% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000002s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.51% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.75% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.72% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_depend_swap 1.58% : 0.000003s : 23: predicate.environ_get_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.03% : 0.000004s : 16: predicate.float_depend_g_call 0.53% : 0.000001s : 8: predicate.float_environ_get_switch 0.78% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.66% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000001s : 4: predicate.graph_param_transform 0.63% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000011s : 51: predicate.inline 0.94% : 0.000002s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.20% : 0.000004s : 32: predicate.load_eliminater 1.64% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.05% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.84% : 0.000002s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.77% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.19% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.59% : 0.000003s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.47% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000002s : 8: predicate.shard_identity_eliminate 0.93% : 0.000002s : 8: predicate.special_op_eliminate 0.90% : 0.000002s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 16: predicate.switch_defer_inline 1.86% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.86% : 0.000009s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000002s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.59% : 0.000005s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.10% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.24% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000628 8 44.65% : 0.000280s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.35% : 0.000347s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.189021 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.74% : 0.005183s : 1: add_attr 2.73% : 0.005164s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000070s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000067s : 1: auto_monad 0.01% : 0.000025s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.40% : 0.000755s : 1: bootstrap 0.02% : 0.000034s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000028s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.01% : 0.000024s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.27% : 0.000519s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000657s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 0.55% : 0.001041s : 78: opt.transform.opt_a 0.02% : 0.000030s : 1: opt.transform.opt_after_cconv 0.02% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000097s : 28: opt.transform.opt_b 0.03% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000042s : 4: opt.transform.symbol_engine_opt 1.41% : 0.002665s : 1: opt_a 0.06% : 0.000121s : 1: opt_after_cconv 0.34% : 0.000635s : 1: opt_after_jit_grad 0.11% : 0.000209s : 1: opt_b 2.64% : 0.004987s : 1: optimize 0.01% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000045s : 1: pre_auto_parallel 0.02% : 0.000032s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000019s : 1: remove_dup_value 0.22% : 0.000425s : 1: renormalize.infer 0.18% : 0.000338s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.09% : 0.000179s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000043s : 1: rewriter_after_opt_a 0.04% : 0.000075s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000092s : 1: symbol_engine_optimizer 48.47% : 0.091616s : 1: task_emit 0.05% : 0.000086s : 1: 
tuple_transform 38.70% : 0.073148s : 1: type_inference 0.05% : 0.000091s : 1: validate TotalTime = 0.0817916, [24] [bootstrap]: 0.00049011 [type_inference]: 0.0410938 [event_method]: 1.194e-05 [auto_monad]: 5.396e-05 [graph_reusing]: 5.34998e-06 [inline]: 2.29001e-06 [add_attr]: 0.00326613, [1] [add_attr_with_inline]: 0.00325591, [1] [Cycle 1]: 5.463e-05, [2] [tag_attr]: 1.561e-05 [meta_addattr_fg_expand]: 3.28998e-06 [parallel-infer-symbol]: 3.55e-06 [pre_auto_parallel]: 2.849e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 1.00001e-06 [dataset_repeat_opt]: 2.31998e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00461524, [53] [py_interpret_to_execute]: 1.915e-05 [rewriter_before_opt_a]: 4.538e-05 [opt_a]: 0.00239448, [2] [Cycle 1]: 0.00169741, [45] [expand_dump_flag]: 2.56998e-06 [switch_simplify]: 2.539e-05 [loop_unroll]: 1.398e-05 [a_1]: 0.00032749 [with_stream_mark]: 1.965e-05 [recompute_prepare]: 1.029e-05 [updatestate_depend_eliminate]: 4.25e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.99e-06 [a_2]: 8.192e-05 [accelerated_algorithm]: 7.87e-06 [shard]: 2.63e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 6.64999e-06 [merge_send_recv]: 8.79998e-06 [auto_parallel]: 8.38999e-06 [parallel]: 2.018e-05 [flash_sp]: 9.42001e-06 [merge_comm]: 4.06001e-06 [allreduce_fusion]: 3.42997e-06 [matmul_add_comm_reduction]: 9.69999e-06 [allreduce_slice_to_reducescatter]: 1.19e-06 [virtual_shard_identity]: 8.32998e-06 [virtual_dataset]: 6.46e-06 [get_grad_eliminate_]: 6.40002e-06 [virtual_output]: 5.96e-06 [merge_forward]: 4.77998e-06 [cell_reuse_recompute_pass]: 1.72001e-06 [offload_activation]: 1.081e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.313e-05 [merge_recompute_call_nodes]: 2.02001e-06 [before_grad]: 1.058e-05 [set_forward_comm_id_for_comm_node_pass]: 4.33001e-06 [meta_fg_expand]: 2.68e-06 [flash_sp_send_recv_attached]: 3.66999e-06 [receive_attached]: 2.98998e-06 
[after_resolve]: 1.205e-05 [a_after_grad]: 9.50001e-06 [renormalize]: 0.00064556 [add_forward_monad_depend]: 6.61999e-06 [auto_monad_grad]: 2.27001e-06 [auto_monad_eliminator]: 1.7e-05 [cse]: 2.931e-05 [a_3]: 4.678e-05 [Cycle 2]: 0.00068567, [45] [expand_dump_flag]: 1.76e-06 [switch_simplify]: 7.63999e-06 [loop_unroll]: 5.92001e-06 [a_1]: 0.00013767 [with_stream_mark]: 1.573e-05 [recompute_prepare]: 7.23999e-06 [updatestate_depend_eliminate]: 3.51001e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.29e-06 [a_2]: 7.183e-05 [accelerated_algorithm]: 6.43998e-06 [shard]: 1.34998e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 5.56e-06 [auto_parallel]: 6.86001e-06 [parallel]: 6.48e-06 [flash_sp]: 3.61999e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 6.88e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 8.43999e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 6.05002e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.66e-06 [offload_activation]: 8.36002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.268e-05 [merge_recompute_call_nodes]: 1.07e-06 [before_grad]: 8.30999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 1.19003e-06 [receive_attached]: 1.89e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.53001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.97002e-06 [auto_monad_grad]: 1.52001e-06 [auto_monad_eliminator]: 1.087e-05 [cse]: 1.606e-05 [a_3]: 3.492e-05 [py_interpret_to_execute_after_opt_a]: 1.205e-05 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 4.014e-05 [convert_after_rewriter]: 7.46999e-06 [order_py_execute_after_rewriter]: 4.84e-06 [mutable_eliminate]: 0.00060658 [opt_b]: 0.00020542, [1] [Cycle 1]: 0.00019816, [7] [b_1]: 
0.00011376 [b_2]: 8.08999e-06 [updatestate_depend_eliminate]: 8.15999e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.7998e-07 [cse]: 2.195e-05 [optimize_parallel_all_gather_comm]: 1.911e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.835e-05 [loop_unroll]: 0.00051274 [opt_after_cconv]: 0.00011122, [1] [Cycle 1]: 0.00010351, [7] [c_1]: 3.035e-05 [parameter_eliminate]: 4.79e-06 [updatestate_depend_eliminate]: 7.52998e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.63e-06 [cse]: 1.96e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.412e-05 [tuple_transform]: 7.799e-05, [1] [Cycle 1]: 7.332e-05, [4] [d_1]: 4.554e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.30007e-07 [switch_simplify]: 6.88998e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.029e-05 [cse_after_recomputation]: 2.412e-05, [1] [Cycle 1]: 1.868e-05, [1] [cse]: 1.15e-05 [environ_conv]: 5.71e-06 [swap_dp_allreduce_reducescatter]: 5.47999e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.88001e-06 [label_fine_grained_interleaved_index]: 3.37997e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.48002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.81999e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.77001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.596e-05 [grouped_pairwise_exchange_alltoall]: 1.77001e-06 [offloading_packed_experts]: 4.27998e-06 [overlap_recompute_and_grad_model_parallel]: 5.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.51998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 2.039e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.62001e-06 [symbol_engine_optimizer]: 8.63e-05, [1] [Cycle 1]: 8.072e-05, [6] [build]: 3.91999e-06 [elim_shapecalc]: 1.157e-05 [elim_not_effective]: 1.365e-05 [opt_reshape]: 7.82998e-06 [fold_const_symbol]: 1.006e-05 [renormalize]: 1.80007e-07 [detach_backward]: 2.48e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.026e-05 [get_jit_bprop_graph]: 1.72999e-06 [rewriter_after_jit_bprop_graph]: 5.48002e-06 [opt_after_jit_grad]: 0.00052825 [validate]: 4.369e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.0313478 [execute]: 9.22001e-06 Sums bootstrap : 0.000490s : 0.63% type_inference : 0.041094s : 53.10% event_method : 0.000012s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000045s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000033s : 0.04% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000465s : 0.60% optimize.opt_a.with_stream_mark : 0.000035s : 0.05% optimize.opt_a.recompute_prepare : 0.000018s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000015s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000013s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000026s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000022s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000646s : 0.83% optimize.opt_a.add_forward_monad_depend : 0.000010s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000028s : 0.04% optimize.opt_a.cse : 0.000045s : 0.06% optimize.opt_a.a_3 : 0.000082s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000607s : 0.78% optimize.opt_b.b_1 : 0.000114s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000028s : 0.04% optimize.loop_unroll : 0.000513s : 0.66% optimize.opt_after_cconv.c_1 : 0.000030s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000046s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000020s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000528s : 0.68% validate : 0.000044s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.031348s : 40.51% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000148 26 19.05% : 0.000028s : 4: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 4.46% : 0.000007s : 4: substitution.graph_param_transform 65.38% : 0.000097s : 2: substitution.inline 2.50% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.66% : 0.000005s : 4: substitution.remove_not_recompute_node 2.84% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.041041 2 99.01% : 0.040635s : 1: type_inference.infer 0.99% : 0.000406s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000094 2 100.00% : 0.000094s : 2: match.inline ------[predicate.] 
0.000157 984 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 1.64% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.64% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.56% : 0.000004s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.75% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.63% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.66% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 13: predicate.environ_get_depend_swap 1.73% : 0.000003s : 21: predicate.environ_get_eliminate 0.95% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.88% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 1.04% : 0.000002s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000010s : 44: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.19% : 0.000002s : 8: predicate.less_batch_normalization 1.41% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.95% : 0.000003s : 26: predicate.load_eliminater 2.10% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.61% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 9: predicate.minmaximum_grad 2.29% : 0.000004s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.34% : 0.000002s : 11: predicate.partial_defer_inline 1.11% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 9: predicate.reduce_eliminate 2.01% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.47% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000002s : 8: predicate.same_eliminate 0.79% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.24% : 0.000002s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.41% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.52% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.93% : 0.000001s : 11: predicate.switch_defer_inline 1.57% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000007s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.73% : 0.000001s : 9: predicate.transpose_eliminate 1.41% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.85% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.86% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000303 6 35.97% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.03% : 0.000194s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.091340 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.58% : 0.003272s : 1: add_attr 3.57% : 0.003260s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.03% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.57% : 0.000523s : 1: bootstrap 0.03% : 0.000032s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000005s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000007s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.57% : 0.000524s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.68% : 0.000620s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000018s : 1: opt.transform.mutable_eliminate 0.93% : 0.000851s : 78: opt.transform.opt_a 0.03% : 0.000029s : 1: opt.transform.opt_after_cconv 0.03% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000094s : 28: opt.transform.opt_b 0.06% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000039s : 4: opt.transform.symbol_engine_opt 2.63% : 0.002398s : 1: opt_a 0.13% : 0.000115s : 1: opt_after_cconv 0.59% : 0.000542s : 1: opt_after_jit_grad 0.23% : 0.000209s : 1: opt_b 5.06% : 0.004621s : 1: optimize 0.03% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.42% : 0.000381s : 1: renormalize.infer 0.28% : 0.000255s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000045s : 1: rewriter_after_opt_a 0.06% : 0.000051s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000089s : 1: symbol_engine_optimizer 34.34% : 0.031369s : 1: task_emit 0.09% : 0.000081s : 1: 
tuple_transform 45.01% : 0.041115s : 1: type_inference 0.09% : 0.000084s : 1: validate TotalTime = 0.0336612, [24] [bootstrap]: 0.00040061 [type_inference]: 0.00591183 [event_method]: 1.762e-05 [auto_monad]: 5.921e-05 [graph_reusing]: 6.79999e-06 [inline]: 2.66999e-06 [add_attr]: 0.00341665, [1] [add_attr_with_inline]: 0.00340727, [1] [Cycle 1]: 6.263e-05, [2] [tag_attr]: 2.007e-05 [meta_addattr_fg_expand]: 5.07999e-06 [parallel-infer-symbol]: 3.81001e-06 [pre_auto_parallel]: 3.561e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 1.16002e-06 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.525e-05 [optimize]: 0.0050212, [53] [py_interpret_to_execute]: 2.652e-05 [rewriter_before_opt_a]: 6.979e-05 [opt_a]: 0.00269761, [2] [Cycle 1]: 0.00198019, [45] [expand_dump_flag]: 3.3e-06 [switch_simplify]: 3.587e-05 [loop_unroll]: 2.471e-05 [a_1]: 0.00051539 [with_stream_mark]: 1.715e-05 [recompute_prepare]: 8.88002e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 3.17002e-06 [parameter_eliminate]: 2.05002e-06 [a_2]: 8.462e-05 [accelerated_algorithm]: 7.26999e-06 [shard]: 2.48e-06 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 6.77002e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 6.69001e-06 [parallel]: 1.92e-05 [flash_sp]: 8.48999e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 9.36e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.85e-06 [virtual_dataset]: 6.17001e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.76e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 9.37999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 1.01e-05 [set_forward_comm_id_for_comm_node_pass]: 3.30998e-06 [meta_fg_expand]: 2.78e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 1.078e-05 [a_after_grad]: 9.35001e-06 [renormalize]: 0.0007472 [add_forward_monad_depend]: 7.77e-06 [auto_monad_grad]: 2.31998e-06 [auto_monad_eliminator]: 1.757e-05 [cse]: 3.013e-05 [a_3]: 4.706e-05 [Cycle 2]: 0.00070409, [45] [expand_dump_flag]: 2.11e-06 [switch_simplify]: 8.2e-06 [loop_unroll]: 5.71998e-06 [a_1]: 0.00013649 [with_stream_mark]: 1.592e-05 [recompute_prepare]: 7.53e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 1.45001e-06 [a_2]: 7.132e-05 [accelerated_algorithm]: 6.26998e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 2.34001e-06 [shard_inline]: 6.23002e-06 [merge_send_recv]: 6.63003e-06 [auto_parallel]: 7.6e-06 [parallel]: 7.03e-06 [flash_sp]: 3.23e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 7.97e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.23002e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.52002e-06 [cell_reuse_recompute_pass]: 2.37999e-06 [offload_activation]: 8.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.268e-05 [merge_recompute_call_nodes]: 1.05001e-06 [before_grad]: 8.77e-06 [set_forward_comm_id_for_comm_node_pass]: 4.34997e-06 [meta_fg_expand]: 2.52001e-06 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 2.26998e-06 [after_resolve]: 1.062e-05 [a_after_grad]: 8.89e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 3.36001e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.228e-05 [cse]: 2.397e-05 [a_3]: 3.545e-05 [py_interpret_to_execute_after_opt_a]: 1.309e-05 [slice_cell_reuse_recomputed_activation]: 1.83002e-06 [rewriter_after_opt_a]: 4.02e-05 [convert_after_rewriter]: 8.01001e-06 [order_py_execute_after_rewriter]: 5.10001e-06 [mutable_eliminate]: 0.00063908 [opt_b]: 0.0002067, [1] [Cycle 1]: 0.00019855, [7] [b_1]: 0.00011543 [b_2]: 
7.35e-06 [updatestate_depend_eliminate]: 8.32e-06 [updatestate_assign_eliminate]: 2.80997e-06 [updatestate_loads_eliminate]: 3.22002e-06 [renormalize]: 4.19997e-07 [cse]: 2.365e-05 [optimize_parallel_all_gather_comm]: 2.085e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 3.353e-05 [loop_unroll]: 0.00052559 [opt_after_cconv]: 0.0001179, [1] [Cycle 1]: 0.00011031, [7] [c_1]: 3.03e-05 [parameter_eliminate]: 5.19e-06 [updatestate_depend_eliminate]: 8.97999e-06 [updatestate_assign_eliminate]: 2.89999e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 2.259e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.48e-05 [tuple_transform]: 8.305e-05, [1] [Cycle 1]: 7.785e-05, [4] [d_1]: 4.749e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 7.1e-06 [partial_unused_args_eliminate]: 1.98997e-06 [add_recomputation]: 5.313e-05 [cse_after_recomputation]: 2.406e-05, [1] [Cycle 1]: 1.88e-05, [1] [cse]: 1.235e-05 [environ_conv]: 6.09999e-06 [swap_dp_allreduce_reducescatter]: 5.86003e-06 [bias_add_comm_swap]: 2.98e-06 [label_micro_interleaved_index]: 5.51e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 1.02998e-06 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.43998e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.59998e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.5e-05 [grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 4.36002e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.37999e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.25e-06 [overlap_grad_flash_sp]: 2.091e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.58e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.12999e-06 [symbol_engine_optimizer]: 8.411e-05, [1] [Cycle 1]: 7.778e-05, [6] [build]: 3.71999e-06 [elim_shapecalc]: 1.214e-05 [elim_not_effective]: 1.33e-05 [opt_reshape]: 6.69001e-06 [fold_const_symbol]: 9.36998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.46e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 1.937e-05 [get_jit_bprop_graph]: 1.74e-06 [rewriter_after_jit_bprop_graph]: 5.96e-06 [opt_after_jit_grad]: 0.00056141 [validate]: 4.257e-05 [backend_pass]: 1.19e-06 [task_emit]: 0.0178722 [execute]: 9.82001e-06 Sums bootstrap : 0.000401s : 1.38% type_inference : 0.005912s : 20.30% event_method : 0.000018s : 0.06% auto_monad : 0.000059s : 0.20% graph_reusing : 0.000007s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000036s : 0.12% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000015s : 0.05% optimize.py_interpret_to_execute : 0.000027s : 0.09% optimize.rewriter_before_opt_a : 0.000070s : 0.24% optimize.opt_a.expand_dump_flag : 0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000044s : 0.15% optimize.opt_a.loop_unroll : 0.000030s : 0.10% optimize.opt_a.a_1 : 0.000652s : 2.24% optimize.opt_a.with_stream_mark : 0.000033s : 0.11% optimize.opt_a.recompute_prepare : 0.000016s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% 
optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000156s : 0.54% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.05% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.04% optimize.opt_a.merge_send_recv : 0.000015s : 0.05% optimize.opt_a.auto_parallel : 0.000014s : 0.05% optimize.opt_a.parallel : 0.000026s : 0.09% optimize.opt_a.flash_sp : 0.000012s : 0.04% optimize.opt_a.merge_comm : 0.000008s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.05% optimize.opt_a.virtual_dataset : 0.000012s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.04% optimize.opt_a.virtual_output : 0.000011s : 0.04% optimize.opt_a.merge_forward : 0.000007s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.06% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.07% optimize.opt_a.a_after_grad : 0.000018s : 0.06% optimize.opt_a.renormalize : 0.000747s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000030s : 0.10% optimize.opt_a.cse : 0.000054s : 0.19% optimize.opt_a.a_3 : 0.000083s : 0.28% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.04% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.14% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000639s : 2.19% optimize.opt_b.b_1 : 0.000115s : 0.40% optimize.opt_b.b_2 : 0.000007s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000024s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000034s : 0.12% optimize.loop_unroll : 0.000526s : 1.80% optimize.opt_after_cconv.c_1 : 0.000030s : 0.10% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000023s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.05% optimize.tuple_transform.d_1 : 0.000047s : 0.16% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.18% optimize.cse_after_recomputation.cse : 0.000012s : 0.04% optimize.environ_conv : 0.000006s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000019s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000006s : 0.02% opt_after_jit_grad : 0.000561s : 1.93% validate : 0.000043s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.017872s : 61.37% execute : 0.000010s : 0.03% Time group info: ------[substitution.] 0.000197 30 14.78% : 0.000029s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000002s : 2: substitution.fold_const_symbol 3.61% : 0.000007s : 4: substitution.graph_param_transform 67.39% : 0.000133s : 3: substitution.inline 1.79% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.32% : 0.000005s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 5.90% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005864 2 89.39% : 0.005241s : 1: type_inference.infer 10.61% : 0.000622s : 1: type_inference.specialize ------[replace.] 0.000044 5 68.75% : 0.000030s : 3: replace.inline 31.25% : 0.000014s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000141 5 92.55% : 0.000131s : 3: match.inline 7.45% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000182 1131 0.87% : 0.000002s : 11: predicate.accumulaten_eliminater 1.05% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.94% : 0.000002s : 11: predicate.addn_zero_filter 0.74% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000002s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.67% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.62% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.98% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.06% : 0.000004s : 16: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.82% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.64% : 0.000001s : 8: predicate.get_grad_eliminate 0.21% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000011s : 51: predicate.inline 1.03% : 0.000002s : 8: predicate.inline_without_move 0.34% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.25% : 0.000004s : 32: predicate.load_eliminater 1.87% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.54% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 1.03% : 0.000002s : 11: predicate.minmaximum_grad 2.31% : 0.000004s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.50% : 0.000003s : 16: predicate.partial_defer_inline 1.38% : 0.000003s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000002s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000002s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000002s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.78% : 0.000009s : 54: predicate.switch_simplify 0.86% : 
0.000002s : 11: predicate.tile_eliminate 0.80% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.46% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.15% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.94% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.65% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.27% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.41% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000420 8 42.79% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.21% : 0.000240s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.044076 196 0.01% : 0.000004s : 1: ForceFp32Comm 7.76% : 0.003422s : 1: add_attr 7.74% : 0.003411s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.13% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.15% : 0.000064s : 1: auto_monad 0.05% : 0.000024s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.97% : 0.000429s : 1: bootstrap 0.09% : 0.000038s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000019s : 1: control_data_broadcast_order 0.03% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000009s : 1: environ_conv 0.06% : 0.000025s : 1: event_method 0.04% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000009s : 1: label_micro_interleaved_index 1.22% : 0.000539s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 1.48% : 0.000652s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000019s : 1: opt.transform.mutable_eliminate 2.40% : 0.001056s : 78: opt.transform.opt_a 0.07% : 0.000029s : 1: opt.transform.opt_after_cconv 0.07% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.21% : 0.000094s : 28: opt.transform.opt_b 0.12% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000037s : 4: opt.transform.symbol_engine_opt 6.13% : 0.002701s : 1: opt_a 0.28% : 0.000122s : 1: opt_after_cconv 1.31% : 0.000576s : 1: opt_after_jit_grad 0.48% : 0.000210s : 1: opt_b 11.40% : 0.005027s : 1: optimize 0.06% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.04% : 0.000019s : 1: pipeline_split 0.09% : 0.000040s : 1: pre_auto_parallel 0.07% : 0.000031s : 1: py_interpret_to_execute 0.04% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.04% : 0.000018s : 1: remove_dup_value 0.93% : 0.000411s : 1: renormalize.infer 0.74% : 0.000327s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000045s : 1: rewriter_after_opt_a 0.17% : 0.000075s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000087s : 1: symbol_engine_optimizer 40.59% : 0.017892s : 1: task_emit 0.20% : 0.000086s : 1: 
tuple_transform 13.45% : 0.005930s : 1: type_inference 0.19% : 0.000082s : 1: validate TotalTime = 0.104332, [24] [bootstrap]: 0.00049933 [type_inference]: 0.0341186 [event_method]: 5.985e-05 [auto_monad]: 0.00013281 [graph_reusing]: 8.85001e-06 [inline]: 2.69999e-06 [add_attr]: 0.003614, [1] [add_attr_with_inline]: 0.00360223, [1] [Cycle 1]: 9.334e-05, [2] [tag_attr]: 4.132e-05 [meta_addattr_fg_expand]: 1.158e-05 [parallel-infer-symbol]: 3.73001e-06 [pre_auto_parallel]: 5.981e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 9.30013e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.0419531, [53] [py_interpret_to_execute]: 5.264e-05 [rewriter_before_opt_a]: 0.00016535 [opt_a]: 0.0389259, [3] [Cycle 1]: 0.0214337, [45] [expand_dump_flag]: 4.39002e-06 [switch_simplify]: 8.183e-05 [loop_unroll]: 6.46e-05 [a_1]: 0.013312 [with_stream_mark]: 5.209e-05 [recompute_prepare]: 3.408e-05 [updatestate_depend_eliminate]: 1.187e-05 [updatestate_assign_eliminate]: 8.05999e-06 [updatestate_loads_eliminate]: 7.48e-06 [parameter_eliminate]: 3.66001e-06 [a_2]: 0.00026528 [accelerated_algorithm]: 4.048e-05 [shard]: 2.64001e-06 [meta_shard_fg_expand]: 8.79e-06 [shard_inline]: 1.755e-05 [merge_send_recv]: 2.024e-05 [auto_parallel]: 1.736e-05 [parallel]: 2.194e-05 [flash_sp]: 1.594e-05 [merge_comm]: 1.079e-05 [allreduce_fusion]: 9.18002e-06 [matmul_add_comm_reduction]: 3.543e-05 [allreduce_slice_to_reducescatter]: 1.49e-06 [virtual_shard_identity]: 2.212e-05 [virtual_dataset]: 1.693e-05 [get_grad_eliminate_]: 1.631e-05 [virtual_output]: 1.663e-05 [merge_forward]: 1.185e-05 [cell_reuse_recompute_pass]: 1.66e-06 [offload_activation]: 2.167e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.331e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.994e-05 [set_forward_comm_id_for_comm_node_pass]: 1.21e-05 [meta_fg_expand]: 0.00211561 [flash_sp_send_recv_attached]: 5.72999e-06 [receive_attached]: 3.08e-06 [after_resolve]: 
8.648e-05 [a_after_grad]: 9.663e-05 [renormalize]: 0.00381936 [add_forward_monad_depend]: 1.466e-05 [auto_monad_grad]: 7.33e-06 [auto_monad_eliminator]: 6.462e-05 [cse]: 0.00018706 [a_3]: 0.00036162 [Cycle 2]: 0.0163732, [45] [expand_dump_flag]: 2.89001e-06 [switch_simplify]: 5.052e-05 [loop_unroll]: 4.532e-05 [a_1]: 0.00166167 [with_stream_mark]: 2.006e-05 [recompute_prepare]: 1.258e-05 [updatestate_depend_eliminate]: 6.28e-06 [updatestate_assign_eliminate]: 5.50001e-06 [updatestate_loads_eliminate]: 4.76002e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 0.00013314 [accelerated_algorithm]: 1.46e-05 [shard]: 2.58998e-06 [meta_shard_fg_expand]: 3.04001e-06 [shard_inline]: 9.78002e-06 [merge_send_recv]: 1.101e-05 [auto_parallel]: 1.157e-05 [parallel]: 9.59e-06 [flash_sp]: 4.11001e-06 [merge_comm]: 6.99001e-06 [allreduce_fusion]: 5.61998e-06 [matmul_add_comm_reduction]: 1.318e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.242e-05 [virtual_dataset]: 9.31e-06 [get_grad_eliminate_]: 9.20001e-06 [virtual_output]: 8.65001e-06 [merge_forward]: 5.70001e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 1.405e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.838e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 1.555e-05 [set_forward_comm_id_for_comm_node_pass]: 5.82001e-06 [meta_fg_expand]: 0.00011041 [flash_sp_send_recv_attached]: 1.82001e-06 [receive_attached]: 2.53e-06 [after_resolve]: 2.02e-05 [a_after_grad]: 1.559e-05 [renormalize]: 0.013563 [add_forward_monad_depend]: 1.244e-05 [auto_monad_grad]: 3.57997e-06 [auto_monad_eliminator]: 3.007e-05 [cse]: 8.031e-05 [a_3]: 9.773e-05 [Cycle 3]: 0.00109611, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 1.367e-05 [loop_unroll]: 1.101e-05 [a_1]: 0.00030735 [with_stream_mark]: 1.842e-05 [recompute_prepare]: 1.1e-05 [updatestate_depend_eliminate]: 7.61999e-06 [updatestate_assign_eliminate]: 5.12e-06 [updatestate_loads_eliminate]: 5.16002e-06 [parameter_eliminate]: 
1.90001e-06 [a_2]: 0.00013569 [accelerated_algorithm]: 1.461e-05 [shard]: 3.18e-06 [meta_shard_fg_expand]: 3.06999e-06 [shard_inline]: 9.41e-06 [merge_send_recv]: 1.163e-05 [auto_parallel]: 1.332e-05 [parallel]: 1.049e-05 [flash_sp]: 1.86e-06 [merge_comm]: 5.65001e-06 [allreduce_fusion]: 5.62001e-06 [matmul_add_comm_reduction]: 1.344e-05 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 1.136e-05 [virtual_dataset]: 9.27001e-06 [get_grad_eliminate_]: 9.39e-06 [virtual_output]: 9.31998e-06 [merge_forward]: 6.17999e-06 [cell_reuse_recompute_pass]: 3.14999e-06 [offload_activation]: 1.434e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.835e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 1.642e-05 [set_forward_comm_id_for_comm_node_pass]: 6.02001e-06 [meta_fg_expand]: 4.42e-06 [flash_sp_send_recv_attached]: 1.84e-06 [receive_attached]: 2.67001e-06 [after_resolve]: 1.853e-05 [a_after_grad]: 1.693e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.71e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.528e-05 [cse]: 3.277e-05 [a_3]: 6.358e-05 [py_interpret_to_execute_after_opt_a]: 2.416e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 5.929e-05 [convert_after_rewriter]: 9.84999e-06 [order_py_execute_after_rewriter]: 7.46999e-06 [mutable_eliminate]: 0.00078694 [opt_b]: 0.00033352, [1] [Cycle 1]: 0.00032398, [7] [b_1]: 0.00020568 [b_2]: 1.22e-05 [updatestate_depend_eliminate]: 1.046e-05 [updatestate_assign_eliminate]: 4.72e-06 [updatestate_loads_eliminate]: 4.62998e-06 [renormalize]: 6.09987e-07 [cse]: 4.577e-05 [optimize_parallel_all_gather_comm]: 3.457e-05 [overlap_param_gather]: 2.52001e-06 [cconv]: 3.38e-05 [loop_unroll]: 0.00055181 [opt_after_cconv]: 0.00016341, [1] [Cycle 1]: 0.00015494, [7] [c_1]: 5.659e-05 [parameter_eliminate]: 4.23999e-06 [updatestate_depend_eliminate]: 8.84003e-06 [updatestate_assign_eliminate]: 5.10999e-06 [updatestate_loads_eliminate]: 4.86002e-06 [cse]: 
3.649e-05 [renormalize]: 9.89996e-07 [remove_dup_value]: 5.015e-05 [tuple_transform]: 0.00012079, [1] [Cycle 1]: 0.00011475, [4] [d_1]: 7.955e-05 [none_parameter_eliminate]: 2.06e-06 [renormalize]: 3.09985e-07 [switch_simplify]: 1.17e-05 [partial_unused_args_eliminate]: 2.19001e-06 [add_recomputation]: 7.341e-05 [cse_after_recomputation]: 3.682e-05, [1] [Cycle 1]: 3.117e-05, [1] [cse]: 2.471e-05 [environ_conv]: 1.312e-05 [swap_dp_allreduce_reducescatter]: 8.52e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 5.07e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.72001e-06 [slice_recompute_activation]: 2.68003e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.49998e-06 [ForceFp32Comm]: 1.25001e-06 [remove_cast_before_assign_add]: 9.90025e-07 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.24003e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 2.025e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 5.86998e-06 [overlap_recompute_and_grad_model_parallel]: 6.65002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.80997e-06 [overlap_grad_ring_attention]: 6.34999e-06 [overlap_grad_flash_sp]: 3.026e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.57999e-06 [symbol_engine_optimizer]: 0.00012108, [1] [Cycle 1]: 0.00011575, [6] [build]: 1.416e-05 [elim_shapecalc]: 1.8e-05 [elim_not_effective]: 2.245e-05 [opt_reshape]: 1.224e-05 [fold_const_symbol]: 1.669e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.35002e-06 
[pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 2.778e-05 [get_jit_bprop_graph]: 2.40002e-06 [rewriter_after_jit_bprop_graph]: 5.94e-06 [opt_after_jit_grad]: 0.00058035 [validate]: 6.35e-05 [backend_pass]: 1.12999e-06 [task_emit]: 0.0229135 [execute]: 8.90999e-06 Sums bootstrap : 0.000499s : 0.50% type_inference : 0.034119s : 34.43% event_method : 0.000060s : 0.06% auto_monad : 0.000133s : 0.13% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000041s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000012s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000060s : 0.06% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000053s : 0.05% optimize.rewriter_before_opt_a : 0.000165s : 0.17% optimize.opt_a.expand_dump_flag : 0.000010s : 0.01% optimize.opt_a.switch_simplify : 0.000146s : 0.15% optimize.opt_a.loop_unroll : 0.000121s : 0.12% optimize.opt_a.a_1 : 0.015281s : 15.42% optimize.opt_a.with_stream_mark : 0.000091s : 0.09% optimize.opt_a.recompute_prepare : 0.000058s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000026s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.02% optimize.opt_a.parameter_eliminate : 0.000007s : 0.01% optimize.opt_a.a_2 : 0.000534s : 0.54% optimize.opt_a.accelerated_algorithm : 0.000070s : 0.07% optimize.opt_a.shard : 0.000008s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000015s : 0.02% optimize.opt_a.shard_inline : 0.000037s : 0.04% optimize.opt_a.merge_send_recv : 0.000043s : 0.04% optimize.opt_a.auto_parallel : 0.000042s : 0.04% optimize.opt_a.parallel : 0.000042s : 0.04% optimize.opt_a.flash_sp : 0.000022s : 0.02% optimize.opt_a.merge_comm : 0.000023s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000062s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000046s : 0.05% optimize.opt_a.virtual_dataset : 0.000036s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.04% optimize.opt_a.virtual_output : 0.000035s : 0.03% optimize.opt_a.merge_forward : 0.000024s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000007s : 0.01% optimize.opt_a.offload_activation : 0.000050s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000070s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000062s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000024s : 0.02% optimize.opt_a.meta_fg_expand : 0.002230s : 2.25% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.01% optimize.opt_a.receive_attached : 0.000008s : 0.01% optimize.opt_a.after_resolve : 0.000125s : 0.13% optimize.opt_a.a_after_grad : 0.000129s : 0.13% optimize.opt_a.renormalize : 0.017382s : 17.54% optimize.opt_a.add_forward_monad_depend : 0.000029s : 0.03% optimize.opt_a.auto_monad_grad : 0.000013s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000110s : 0.11% optimize.opt_a.cse : 0.000300s : 0.30% optimize.opt_a.a_3 : 0.000523s : 0.53% optimize.py_interpret_to_execute_after_opt_a : 0.000024s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000059s : 0.06% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000787s : 0.79% optimize.opt_b.b_1 : 0.000206s : 0.21% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000046s : 0.05% optimize.optimize_parallel_all_gather_comm : 0.000035s : 0.03% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000034s : 0.03% optimize.loop_unroll : 0.000552s : 0.56% optimize.opt_after_cconv.c_1 : 0.000057s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.cse : 0.000036s : 0.04% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000050s : 0.05% optimize.tuple_transform.d_1 : 0.000080s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000012s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000073s : 0.07% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000013s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000030s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000002s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000018s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000022s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000028s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000580s : 0.59% validate : 0.000063s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.022914s : 23.12% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.012485 222 0.54% : 0.000068s : 12: substitution.arithmetic_simplify 0.19% : 0.000024s : 2: substitution.cast_eliminate 0.02% : 0.000003s : 5: substitution.elim_not_effective 0.04% : 0.000005s : 5: substitution.float_depend_g_call 0.04% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.02% : 0.000002s : 5: substitution.fold_const_symbol 0.07% : 0.000009s : 8: substitution.graph_param_transform 0.03% : 0.000004s : 2: substitution.incorporate_call 0.02% : 0.000002s : 2: substitution.incorporate_call_switch 5.50% : 0.000686s : 17: substitution.inline 0.19% : 0.000024s : 2: substitution.inline_without_move 0.10% : 0.000012s : 20: substitution.j_node_and_user_rematch 0.18% : 0.000023s : 3: substitution.less_batch_normalization 0.16% : 0.000020s : 11: substitution.minmaximum_grad 0.05% : 0.000007s : 5: substitution.partial_eliminate 0.13% : 0.000016s : 20: substitution.remove_not_recompute_node 0.28% : 0.000035s : 10: substitution.replace_applicator 0.13% : 0.000016s : 15: substitution.replace_old_param 0.03% : 0.000004s : 1: substitution.set_cell_output_no_recompute 91.10% : 0.011374s : 11: substitution.tuple_list_convert_item_index_to_positive 0.13% : 0.000016s : 11: substitution.tuple_list_get_item_const_eliminator 0.19% : 0.000024s : 11: substitution.tuple_list_get_item_depend_reorder 0.68% : 0.000085s : 30: substitution.tuple_list_get_item_eliminator 0.17% : 0.000022s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.034026 2 94.59% : 0.032185s : 1: type_inference.infer 5.41% : 0.001842s : 1: type_inference.specialize ------[replace.] 0.000283 33 60.46% : 0.000171s : 17: replace.inline 39.54% : 0.000112s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000714 33 94.39% : 0.000674s : 17: match.inline 5.61% : 0.000040s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000837 5764 1.18% : 0.000010s : 68: predicate.accumulaten_eliminater 0.32% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000009s : 68: predicate.addn_zero_filter 0.97% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.27% : 0.000019s : 100: predicate.arithmetic_simplify 1.21% : 0.000010s : 68: predicate.cast_eliminate 1.06% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.56% : 0.000005s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.38% : 0.000012s : 68: predicate.dict_get_item_eliminator 1.03% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.17% : 0.000010s : 76: predicate.environ_get_depend_swap 1.65% : 0.000014s : 108: predicate.environ_get_eliminate 1.15% : 0.000010s : 76: predicate.environ_get_set_eliminate 1.61% : 0.000014s : 101: predicate.exchange_switch_depend_value 2.35% : 0.000020s : 101: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.62% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000005s : 32: predicate.incorporate_call 0.45% : 0.000004s : 32: predicate.incorporate_call_switch 5.96% : 0.000050s : 249: predicate.inline 1.26% : 0.000011s : 55: predicate.inline_without_move 0.30% : 0.000003s : 32: predicate.j_node_and_user_rematch 0.71% : 0.000006s : 32: predicate.less_batch_normalization 
1.58% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.56% : 0.000021s : 168: predicate.load_eliminater 0.46% : 0.000004s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.36% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.02% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.03% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 68: predicate.minmaximum_grad 0.54% : 0.000005s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.29% : 0.000019s : 101: predicate.partial_defer_inline 1.62% : 0.000014s : 92: predicate.partial_eliminate 1.07% : 0.000009s : 68: predicate.print_const_string_wrapper 0.49% : 0.000004s : 32: predicate.reduce_all_const_elim 1.61% : 0.000013s : 68: predicate.reduce_eliminate 2.53% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000015s : 152: predicate.replace_applicator 0.69% : 0.000006s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 68: predicate.reshape_eliminate 1.11% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000011s : 68: predicate.same_eliminate 0.41% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.70% : 0.000006s : 32: predicate.shard_identity_eliminate 0.31% : 0.000003s : 16: predicate.special_op_eliminate 0.58% : 0.000005s : 32: predicate.specialize_transform 1.39% : 0.000012s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.19% : 0.000002s : 8: predicate.switch_call_monad_eliminater 1.75% : 0.000015s : 101: predicate.switch_defer_inline 2.73% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.92% : 
0.000041s : 277: predicate.switch_simplify 1.06% : 0.000009s : 68: predicate.tile_eliminate 1.06% : 0.000009s : 68: predicate.transpose_eliminate 1.46% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000012s : 84: predicate.tuple_list_get_item_depend_reorder 2.94% : 0.000025s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000018s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000014s : 100: predicate.tuple_to_list_eliminator_ 2.43% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.03% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.52% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002415 34 50.17% : 0.001212s : 13: func_graph_cloner_run.FuncGraphClonerGraph 49.83% : 0.001203s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.184768 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.96% : 0.003620s : 1: add_attr 1.95% : 0.003607s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.04% : 0.000078s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000141s : 1: auto_monad 0.02% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.29% : 0.000529s : 1: bootstrap 0.02% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000024s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000040s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000018s : 1: environ_conv 0.04% : 0.000070s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000562s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.43% : 0.000799s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000022s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000023s : 1: opt.transform.mutable_eliminate 9.28% : 0.017140s : 117: opt.transform.opt_a 0.03% : 0.000055s : 1: opt.transform.opt_after_cconv 0.02% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000187s : 28: opt.transform.opt_b 0.05% : 0.000089s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000065s : 4: opt.transform.symbol_engine_opt 21.07% : 0.038930s : 1: opt_a 0.09% : 0.000167s : 1: opt_after_cconv 0.32% : 0.000592s : 1: opt_after_jit_grad 0.18% : 0.000338s : 1: opt_b 22.71% : 0.041959s : 1: optimize 0.02% : 0.000039s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000034s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000010s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000011s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000066s : 1: pre_auto_parallel 0.03% : 0.000058s : 1: py_interpret_to_execute 0.02% : 0.000028s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000055s : 1: remove_dup_value 8.13% : 0.015019s : 2: renormalize.infer 1.26% : 0.002336s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000064s : 1: rewriter_after_opt_a 0.09% : 0.000172s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000124s : 1: symbol_engine_optimizer 12.41% : 0.022933s : 1: task_emit 0.07% : 0.000124s : 1: 
tuple_transform 18.48% : 0.034143s : 1: type_inference 0.06% : 0.000109s : 1: validate TotalTime = 0.0241993, [24] [bootstrap]: 0.00044933 [type_inference]: 0.00530553 [event_method]: 1.545e-05 [auto_monad]: 5.703e-05 [graph_reusing]: 5.37001e-06 [inline]: 2.58003e-06 [add_attr]: 0.00388282, [1] [add_attr_with_inline]: 0.00386984, [1] [Cycle 1]: 6.53e-05, [2] [tag_attr]: 1.563e-05 [meta_addattr_fg_expand]: 3.53999e-06 [parallel-infer-symbol]: 4.03001e-06 [pre_auto_parallel]: 3.413e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0048179, [53] [py_interpret_to_execute]: 2.242e-05 [rewriter_before_opt_a]: 4.934e-05 [opt_a]: 0.00252213, [2] [Cycle 1]: 0.0017917, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 2.654e-05 [loop_unroll]: 1.456e-05 [a_1]: 0.00035197 [with_stream_mark]: 1.928e-05 [recompute_prepare]: 8.70999e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 8.25e-05 [accelerated_algorithm]: 7.7e-06 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 2.14e-06 [shard_inline]: 6.08998e-06 [merge_send_recv]: 8.40999e-06 [auto_parallel]: 7.15e-06 [parallel]: 2.096e-05 [flash_sp]: 1.034e-05 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 1.032e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 8.90001e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 5.91e-06 [virtual_output]: 5.84999e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.48002e-06 [offload_activation]: 1.095e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.277e-05 [merge_recompute_call_nodes]: 1.68002e-06 [before_grad]: 1.148e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.69001e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 2.65002e-06 
[after_resolve]: 1.188e-05 [a_after_grad]: 1.047e-05 [renormalize]: 0.00072929 [add_forward_monad_depend]: 6.74001e-06 [auto_monad_grad]: 2.89999e-06 [auto_monad_eliminator]: 1.724e-05 [cse]: 3.189e-05 [a_3]: 5.016e-05 [Cycle 2]: 0.00071887, [45] [expand_dump_flag]: 1.99e-06 [switch_simplify]: 7.71999e-06 [loop_unroll]: 6.33998e-06 [a_1]: 0.00014292 [with_stream_mark]: 2.015e-05 [recompute_prepare]: 6.83998e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 3.24001e-06 [parameter_eliminate]: 1.37e-06 [a_2]: 7.512e-05 [accelerated_algorithm]: 5.99e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 6.41e-06 [auto_parallel]: 7.23999e-06 [parallel]: 7.3e-06 [flash_sp]: 3.75e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 8.29002e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.74999e-06 [virtual_output]: 5.22e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.95001e-06 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.158e-05 [merge_recompute_call_nodes]: 1.24e-06 [before_grad]: 8.70001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.3e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 1.13001e-06 [receive_attached]: 1.41998e-06 [after_resolve]: 1.102e-05 [a_after_grad]: 8.67998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.74998e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 8.69e-06 [cse]: 1.714e-05 [a_3]: 3.514e-05 [py_interpret_to_execute_after_opt_a]: 1.348e-05 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 [rewriter_after_opt_a]: 3.898e-05 [convert_after_rewriter]: 7.18998e-06 [order_py_execute_after_rewriter]: 6.66999e-06 [mutable_eliminate]: 0.00069915 [opt_b]: 0.00021815, [1] [Cycle 1]: 0.00021011, [7] [b_1]: 0.0001302 
[b_2]: 8.04002e-06 [updatestate_depend_eliminate]: 6.75998e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.48e-06 [renormalize]: 5.59987e-07 [cse]: 2.068e-05 [optimize_parallel_all_gather_comm]: 1.845e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.951e-05 [loop_unroll]: 0.00050264 [opt_after_cconv]: 0.00010354, [1] [Cycle 1]: 9.689e-05, [7] [c_1]: 3.054e-05 [parameter_eliminate]: 2.82002e-06 [updatestate_depend_eliminate]: 5.65001e-06 [updatestate_assign_eliminate]: 2.79999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.823e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.369e-05 [tuple_transform]: 7.732e-05, [1] [Cycle 1]: 7.257e-05, [4] [d_1]: 4.558e-05 [none_parameter_eliminate]: 1.78002e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.79999e-06 [partial_unused_args_eliminate]: 2.11e-06 [add_recomputation]: 5.025e-05 [cse_after_recomputation]: 2.088e-05, [1] [Cycle 1]: 1.615e-05, [1] [cse]: 1.089e-05 [environ_conv]: 5.81e-06 [swap_dp_allreduce_reducescatter]: 5.44998e-06 [bias_add_comm_swap]: 2.85998e-06 [label_micro_interleaved_index]: 5.42001e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.62001e-06 [reorder_send_recv_between_fp_bp]: 3.25e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.11997e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.35e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 4.1e-06 [overlap_recompute_and_grad_model_parallel]: 5.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.73997e-06 [overlap_recompute_comm]: 2.69001e-06 [overlap_grad_ring_attention]: 4.58999e-06 [overlap_grad_flash_sp]: 1.93e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 2.20002e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 7.581e-05, [1] [Cycle 1]: 7.144e-05, [6] [build]: 3.18998e-06 [elim_shapecalc]: 1e-05 [elim_not_effective]: 1.243e-05 [opt_reshape]: 6.91999e-06 [fold_const_symbol]: 9.74999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.59999e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.752e-05 [get_jit_bprop_graph]: 1.67999e-06 [rewriter_after_jit_bprop_graph]: 4.89998e-06 [opt_after_jit_grad]: 0.00053416 [validate]: 4.236e-05 [backend_pass]: 1.10999e-06 [task_emit]: 0.0087504 [execute]: 8.16002e-06 Sums bootstrap : 0.000449s : 2.34% type_inference : 0.005306s : 27.64% event_method : 0.000015s : 0.08% auto_monad : 0.000057s : 0.30% graph_reusing : 0.000005s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000034s : 0.18% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000049s : 0.26% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000034s : 0.18% optimize.opt_a.loop_unroll : 0.000021s : 0.11% optimize.opt_a.a_1 : 0.000495s : 2.58% optimize.opt_a.with_stream_mark : 0.000039s : 0.21% optimize.opt_a.recompute_prepare : 0.000016s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000158s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.07% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000015s : 0.08% optimize.opt_a.auto_parallel : 0.000014s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.15% optimize.opt_a.flash_sp : 0.000014s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000020s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000023s : 0.12% optimize.opt_a.a_after_grad : 0.000019s : 0.10% optimize.opt_a.renormalize : 0.000729s : 3.80% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.14% optimize.opt_a.cse : 0.000049s : 0.26% optimize.opt_a.a_3 : 0.000085s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000039s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000007s : 0.03% optimize.mutable_eliminate : 0.000699s : 3.64% optimize.opt_b.b_1 : 0.000130s : 0.68% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000030s : 0.15% optimize.loop_unroll : 0.000503s : 2.62% optimize.opt_after_cconv.c_1 : 0.000031s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.07% optimize.tuple_transform.d_1 : 0.000046s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.26% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000534s : 2.78% validate : 0.000042s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.008750s : 45.59% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000166 26 17.68% : 0.000029s : 4: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 4.05% : 0.000007s : 4: substitution.graph_param_transform 67.99% : 0.000113s : 2: substitution.inline 2.25% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.90% : 0.000005s : 4: substitution.remove_not_recompute_node 3.13% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.005248 2 91.31% : 0.004792s : 1: type_inference.infer 8.69% : 0.000456s : 1: type_inference.specialize ------[replace.] 0.000024 2 100.00% : 0.000024s : 2: replace.inline ------[match.] 0.000111 2 100.00% : 0.000111s : 2: match.inline ------[predicate.] 
0.000161 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 1.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.80% : 0.000005s : 17: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.83% : 0.000001s : 8: predicate.depend_value_elim 0.68% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.11% : 0.000002s : 9: predicate.dict_get_item_eliminator 1.19% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.46% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.00% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 1.71% : 0.000003s : 21: predicate.environ_get_eliminate 0.98% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.86% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000003s : 11: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.62% : 0.000009s : 44: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.30% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.01% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.57% : 0.000003s : 18: predicate.loop_unroll_before_grad 2.15% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.01% : 0.000002s : 8: predicate.mini_step_allgather_replace 0.65% : 0.000001s : 9: predicate.minmaximum_grad 2.43% : 0.000004s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.05% : 0.000002s : 11: predicate.partial_defer_inline 1.04% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 9: predicate.reduce_eliminate 2.06% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.69% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000002s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.11% : 0.000002s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000002s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000002s : 11: predicate.switch_defer_inline 1.51% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000007s : 41: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.99% : 0.000002s : 9: predicate.transpose_eliminate 1.55% : 0.000003s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.58% : 0.000003s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 1.86% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.81% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000366 6 34.78% : 0.000127s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.22% : 0.000238s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.034672 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.22% : 0.003891s : 1: add_attr 11.17% : 0.003874s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000063s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.39% : 0.000481s : 1: bootstrap 0.10% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000022s : 1: event_method 0.05% : 0.000016s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.48% : 0.000513s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.05% : 0.000711s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000017s : 1: opt.transform.mutable_eliminate 2.54% : 0.000882s : 78: opt.transform.opt_a 0.08% : 0.000029s : 1: opt.transform.opt_after_cconv 0.07% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000099s : 28: opt.transform.opt_b 0.14% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000036s : 4: opt.transform.symbol_engine_opt 7.29% : 0.002526s : 1: opt_a 0.31% : 0.000107s : 1: opt_after_cconv 1.57% : 0.000545s : 1: opt_after_jit_grad 0.64% : 0.000222s : 1: opt_b 13.91% : 0.004823s : 1: optimize 0.06% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000010s : 1: order_py_execute_after_rewriter 0.06% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000038s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.05% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000018s : 1: remove_dup_value 1.23% : 0.000425s : 1: renormalize.infer 0.85% : 0.000295s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000043s : 1: rewriter_after_opt_a 0.16% : 0.000054s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000079s : 1: symbol_engine_optimizer 25.30% : 0.008771s : 1: task_emit 0.23% : 0.000080s : 1: 
tuple_transform 15.38% : 0.005332s : 1: type_inference 0.24% : 0.000082s : 1: validate TotalTime = 0.105698, [24] [bootstrap]: 0.00052693 [type_inference]: 0.027835 [event_method]: 4.921e-05 [auto_monad]: 0.00012428 [graph_reusing]: 8.94e-06 [inline]: 2.17999e-06 [add_attr]: 0.00338275, [1] [add_attr_with_inline]: 0.00337108, [1] [Cycle 1]: 9.647e-05, [2] [tag_attr]: 4.296e-05 [meta_addattr_fg_expand]: 9.82001e-06 [parallel-infer-symbol]: 4.12e-06 [pre_auto_parallel]: 5.697e-05 [insert-virtual-dataset]: 2.73998e-06 [parallel-infer-symbol-second]: 1.02998e-06 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.91998e-06 [optimize]: 0.0186619, [53] [py_interpret_to_execute]: 5.537e-05 [rewriter_before_opt_a]: 0.00015228 [opt_a]: 0.0157033, [3] [Cycle 1]: 0.00998525, [45] [expand_dump_flag]: 5.65001e-06 [switch_simplify]: 7.152e-05 [loop_unroll]: 5.783e-05 [a_1]: 0.0015579 [with_stream_mark]: 3.985e-05 [recompute_prepare]: 3.202e-05 [updatestate_depend_eliminate]: 1.178e-05 [updatestate_assign_eliminate]: 8.73001e-06 [updatestate_loads_eliminate]: 7.58999e-06 [parameter_eliminate]: 3.98999e-06 [a_2]: 0.00026949 [accelerated_algorithm]: 4.393e-05 [shard]: 3.18e-06 [meta_shard_fg_expand]: 6.14001e-06 [shard_inline]: 2.081e-05 [merge_send_recv]: 2.257e-05 [auto_parallel]: 1.794e-05 [parallel]: 2.497e-05 [flash_sp]: 1.662e-05 [merge_comm]: 1.222e-05 [allreduce_fusion]: 9.64999e-06 [matmul_add_comm_reduction]: 3.965e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 2.367e-05 [virtual_dataset]: 1.762e-05 [get_grad_eliminate_]: 1.859e-05 [virtual_output]: 1.784e-05 [merge_forward]: 1.338e-05 [cell_reuse_recompute_pass]: 3.18e-06 [offload_activation]: 2.188e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.951e-05 [merge_recompute_call_nodes]: 1.79e-06 [before_grad]: 3.239e-05 [set_forward_comm_id_for_comm_node_pass]: 1.447e-05 [meta_fg_expand]: 0.00238818 [flash_sp_send_recv_attached]: 5.99999e-06 [receive_attached]: 2.60002e-06 [after_resolve]: 
9.051e-05 [a_after_grad]: 0.00010242 [renormalize]: 0.00374197 [add_forward_monad_depend]: 1.65e-05 [auto_monad_grad]: 7.33999e-06 [auto_monad_eliminator]: 7.524e-05 [cse]: 0.00020978 [a_3]: 0.00038104 [Cycle 2]: 0.00449998, [45] [expand_dump_flag]: 3.86999e-06 [switch_simplify]: 5.37e-05 [loop_unroll]: 4.713e-05 [a_1]: 0.00198754 [with_stream_mark]: 3.492e-05 [recompute_prepare]: 2.158e-05 [updatestate_depend_eliminate]: 7.61999e-06 [updatestate_assign_eliminate]: 5.54e-06 [updatestate_loads_eliminate]: 4.31002e-06 [parameter_eliminate]: 2.63e-06 [a_2]: 0.00014586 [accelerated_algorithm]: 1.722e-05 [shard]: 2.78e-06 [meta_shard_fg_expand]: 4.44002e-06 [shard_inline]: 1.322e-05 [merge_send_recv]: 1.299e-05 [auto_parallel]: 1.399e-05 [parallel]: 1.131e-05 [flash_sp]: 4.60001e-06 [merge_comm]: 9.03002e-06 [allreduce_fusion]: 6.17999e-06 [matmul_add_comm_reduction]: 1.483e-05 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 1.354e-05 [virtual_dataset]: 1.03e-05 [get_grad_eliminate_]: 1.036e-05 [virtual_output]: 1.019e-05 [merge_forward]: 7.25e-06 [cell_reuse_recompute_pass]: 2.61999e-06 [offload_activation]: 1.605e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.2e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 1.63e-05 [set_forward_comm_id_for_comm_node_pass]: 6.26998e-06 [meta_fg_expand]: 9.184e-05 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 3.35998e-06 [after_resolve]: 2.58e-05 [a_after_grad]: 1.757e-05 [renormalize]: 0.00124824 [add_forward_monad_depend]: 9.07001e-06 [auto_monad_grad]: 2.95002e-06 [auto_monad_eliminator]: 2.386e-05 [cse]: 8.082e-05 [a_3]: 8.361e-05 [Cycle 3]: 0.00119275, [45] [expand_dump_flag]: 3.41001e-06 [switch_simplify]: 1.279e-05 [loop_unroll]: 9.77001e-06 [a_1]: 0.00037659 [with_stream_mark]: 2.156e-05 [recompute_prepare]: 1.356e-05 [updatestate_depend_eliminate]: 6.33e-06 [updatestate_assign_eliminate]: 4.95999e-06 [updatestate_loads_eliminate]: 4.68999e-06 [parameter_eliminate]: 
1.56998e-06 [a_2]: 0.00013801 [accelerated_algorithm]: 1.524e-05 [shard]: 3.15002e-06 [meta_shard_fg_expand]: 3.04999e-06 [shard_inline]: 1.022e-05 [merge_send_recv]: 1.101e-05 [auto_parallel]: 1.356e-05 [parallel]: 9.41003e-06 [flash_sp]: 1.69e-06 [merge_comm]: 5.71e-06 [allreduce_fusion]: 5.66e-06 [matmul_add_comm_reduction]: 1.314e-05 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 1.313e-05 [virtual_dataset]: 9.52999e-06 [get_grad_eliminate_]: 9.21998e-06 [virtual_output]: 8.92999e-06 [merge_forward]: 6.29999e-06 [cell_reuse_recompute_pass]: 3.23998e-06 [offload_activation]: 1.428e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.901e-05 [merge_recompute_call_nodes]: 1.19e-06 [before_grad]: 1.694e-05 [set_forward_comm_id_for_comm_node_pass]: 5.99999e-06 [meta_fg_expand]: 4.18999e-06 [flash_sp_send_recv_attached]: 1.38002e-06 [receive_attached]: 1.92001e-06 [after_resolve]: 1.741e-05 [a_after_grad]: 1.635e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 2.81e-06 [auto_monad_grad]: 1.86003e-06 [auto_monad_eliminator]: 1.487e-05 [cse]: 3.423e-05 [a_3]: 6.173e-05 [py_interpret_to_execute_after_opt_a]: 2.05e-05 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 5.892e-05 [convert_after_rewriter]: 1.019e-05 [order_py_execute_after_rewriter]: 8.38999e-06 [mutable_eliminate]: 0.00078033 [opt_b]: 0.00033106, [1] [Cycle 1]: 0.00032157, [7] [b_1]: 0.00020386 [b_2]: 1.23e-05 [updatestate_depend_eliminate]: 1.095e-05 [updatestate_assign_eliminate]: 4.70999e-06 [updatestate_loads_eliminate]: 4.03001e-06 [renormalize]: 6.09987e-07 [cse]: 4.298e-05 [optimize_parallel_all_gather_comm]: 2.619e-05 [overlap_param_gather]: 1.93002e-06 [cconv]: 3.401e-05 [loop_unroll]: 0.00050654 [opt_after_cconv]: 0.00016232, [1] [Cycle 1]: 0.00015381, [7] [c_1]: 5.276e-05 [parameter_eliminate]: 5.25999e-06 [updatestate_depend_eliminate]: 9.77001e-06 [updatestate_assign_eliminate]: 4.53999e-06 [updatestate_loads_eliminate]: 4.35e-06 
[cse]: 3.786e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 4.887e-05 [tuple_transform]: 0.0001171, [1] [Cycle 1]: 0.00011119, [4] [d_1]: 7.672e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 1.107e-05 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 7.502e-05 [cse_after_recomputation]: 3.967e-05, [1] [Cycle 1]: 3.365e-05, [1] [cse]: 2.637e-05 [environ_conv]: 1.192e-05 [swap_dp_allreduce_reducescatter]: 9.16998e-06 [bias_add_comm_swap]: 2.92002e-06 [label_micro_interleaved_index]: 5.05001e-06 [label_fine_grained_interleaved_index]: 2.65002e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.33002e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 2.188e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 5.40999e-06 [overlap_recompute_and_grad_model_parallel]: 6.09001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.78998e-06 [overlap_grad_ring_attention]: 6.05002e-06 [overlap_grad_flash_sp]: 3.104e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 0.00011486, [1] [Cycle 1]: 0.0001098, [6] [build]: 1.255e-05 [elim_shapecalc]: 1.775e-05 [elim_not_effective]: 2.02e-05 [opt_reshape]: 1.095e-05 [fold_const_symbol]: 1.534e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.44999e-06 
[pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.84e-05 [get_jit_bprop_graph]: 2.09e-06 [rewriter_after_jit_bprop_graph]: 5.07999e-06 [opt_after_jit_grad]: 0.00052662 [validate]: 5.62e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0541501 [execute]: 9.32999e-06 Sums bootstrap : 0.000527s : 0.52% type_inference : 0.027835s : 27.68% event_method : 0.000049s : 0.05% auto_monad : 0.000124s : 0.12% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000043s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000057s : 0.06% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000055s : 0.06% optimize.rewriter_before_opt_a : 0.000152s : 0.15% optimize.opt_a.expand_dump_flag : 0.000013s : 0.01% optimize.opt_a.switch_simplify : 0.000138s : 0.14% optimize.opt_a.loop_unroll : 0.000115s : 0.11% optimize.opt_a.a_1 : 0.003922s : 3.90% optimize.opt_a.with_stream_mark : 0.000096s : 0.10% optimize.opt_a.recompute_prepare : 0.000067s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000026s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.02% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000553s : 0.55% optimize.opt_a.accelerated_algorithm : 0.000076s : 0.08% optimize.opt_a.shard : 0.000009s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000014s : 0.01% optimize.opt_a.shard_inline : 0.000044s : 0.04% optimize.opt_a.merge_send_recv : 0.000047s : 0.05% optimize.opt_a.auto_parallel : 0.000045s : 0.05% optimize.opt_a.parallel : 0.000046s : 0.05% optimize.opt_a.flash_sp : 0.000023s : 0.02% optimize.opt_a.merge_comm : 0.000027s : 0.03% 
optimize.opt_a.allreduce_fusion : 0.000021s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000068s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000050s : 0.05% optimize.opt_a.virtual_dataset : 0.000037s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000038s : 0.04% optimize.opt_a.virtual_output : 0.000037s : 0.04% optimize.opt_a.merge_forward : 0.000027s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000009s : 0.01% optimize.opt_a.offload_activation : 0.000052s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000081s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000066s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000027s : 0.03% optimize.opt_a.meta_fg_expand : 0.002484s : 2.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000010s : 0.01% optimize.opt_a.receive_attached : 0.000008s : 0.01% optimize.opt_a.after_resolve : 0.000134s : 0.13% optimize.opt_a.a_after_grad : 0.000136s : 0.14% optimize.opt_a.renormalize : 0.004990s : 4.96% optimize.opt_a.add_forward_monad_depend : 0.000028s : 0.03% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000114s : 0.11% optimize.opt_a.cse : 0.000325s : 0.32% optimize.opt_a.a_3 : 0.000526s : 0.52% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000059s : 0.06% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000008s : 0.01% optimize.mutable_eliminate : 0.000780s : 0.78% optimize.opt_b.b_1 : 0.000204s : 0.20% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000043s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000034s : 0.03% optimize.loop_unroll : 0.000507s : 0.50% optimize.opt_after_cconv.c_1 : 0.000053s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000038s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000049s : 0.05% optimize.tuple_transform.d_1 : 0.000077s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000075s : 0.07% optimize.cse_after_recomputation.cse : 0.000026s : 0.03% optimize.environ_conv : 0.000012s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000022s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000031s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000018s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000028s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000527s : 0.52% validate : 0.000056s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.054150s : 53.85% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.001236 218 6.29% : 0.000078s : 11: substitution.arithmetic_simplify 2.32% : 0.000029s : 2: substitution.cast_eliminate 0.22% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000006s : 5: substitution.float_depend_g_call 0.44% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.18% : 0.000002s : 5: substitution.fold_const_symbol 0.74% : 0.000009s : 8: substitution.graph_param_transform 0.30% : 0.000004s : 2: substitution.incorporate_call 0.17% : 0.000002s : 2: substitution.incorporate_call_switch 56.94% : 0.000704s : 16: substitution.inline 2.32% : 0.000029s : 2: substitution.inline_without_move 1.07% : 0.000013s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000024s : 3: substitution.less_batch_normalization 1.26% : 0.000016s : 11: substitution.minmaximum_grad 0.64% : 0.000008s : 5: substitution.partial_eliminate 1.33% : 0.000016s : 20: substitution.remove_not_recompute_node 3.38% : 0.000042s : 10: substitution.replace_applicator 1.33% : 0.000016s : 15: substitution.replace_old_param 0.37% : 0.000005s : 1: substitution.set_cell_output_no_recompute 2.70% : 0.000033s : 11: substitution.tuple_list_convert_item_index_to_positive 1.28% : 0.000016s : 11: substitution.tuple_list_get_item_const_eliminator 1.88% : 0.000023s : 11: substitution.tuple_list_get_item_depend_reorder 10.70% : 0.000132s : 28: substitution.tuple_list_get_item_eliminator 1.81% : 0.000022s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.027749 2 94.99% : 0.026358s : 1: type_inference.infer 5.01% : 0.001391s : 1: type_inference.specialize ------[replace.] 0.000279 30 60.36% : 0.000168s : 16: replace.inline 39.64% : 0.000110s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000735 30 94.16% : 0.000692s : 16: match.inline 5.84% : 0.000043s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000829 5663 1.03% : 0.000009s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 67: predicate.addn_zero_filter 0.99% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.45% : 0.000020s : 99: predicate.arithmetic_simplify 1.11% : 0.000009s : 67: predicate.cast_eliminate 1.17% : 0.000010s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.09% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.13% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.21% : 0.000002s : 8: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.08% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000010s : 75: predicate.environ_get_depend_swap 1.56% : 0.000013s : 107: predicate.environ_get_eliminate 1.27% : 0.000011s : 75: predicate.environ_get_set_eliminate 1.54% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.29% : 0.000019s : 97: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.74% : 0.000006s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.60% : 0.000005s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000005s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.86% : 0.000049s : 244: predicate.inline 1.46% : 0.000012s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.77% : 0.000006s : 32: predicate.less_batch_normalization 1.53% 
: 0.000013s : 97: predicate.list_to_tuple_eliminator_ 2.41% : 0.000020s : 164: predicate.load_eliminater 0.45% : 0.000004s : 8: predicate.loop_unroll_after_grad 2.11% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000012s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000009s : 67: predicate.minmaximum_grad 0.58% : 0.000005s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.15% : 0.000018s : 97: predicate.partial_defer_inline 1.57% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000009s : 67: predicate.print_const_string_wrapper 0.55% : 0.000005s : 32: predicate.reduce_all_const_elim 1.34% : 0.000011s : 67: predicate.reduce_eliminate 2.44% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000004s : 32: predicate.remove_not_recompute_node 1.88% : 0.000016s : 149: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 67: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000011s : 68: predicate.same_eliminate 0.47% : 0.000004s : 32: predicate.set_cell_output_no_recompute 0.77% : 0.000006s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.44% : 0.000012s : 68: predicate.split_environ_get_set_with_tuple_value 1.41% : 0.000012s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.66% : 0.000014s : 97: predicate.switch_defer_inline 2.74% : 0.000023s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000040s : 265: 
predicate.switch_simplify 0.99% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000009s : 67: predicate.transpose_eliminate 1.55% : 0.000013s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000013s : 83: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000012s : 83: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000026s : 129: predicate.tuple_list_get_item_eliminator 1.38% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000017s : 115: predicate.tuple_list_set_item_eliminator 1.53% : 0.000013s : 97: predicate.tuple_to_list_eliminator_ 2.45% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.05% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.63% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.63% : 0.000005s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002143 32 58.60% : 0.001256s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.40% : 0.000887s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.138908 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.44% : 0.003389s : 1: add_attr 2.43% : 0.003375s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000081s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000132s : 1: auto_monad 0.02% : 0.000033s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.40% : 0.000559s : 1: bootstrap 0.03% : 0.000039s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000026s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.03% : 0.000043s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000015s : 1: environ_conv 0.04% : 0.000057s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.37% : 0.000519s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000797s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.02% : 0.000022s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000026s : 1: opt.transform.mutable_eliminate 4.20% : 0.005839s : 117: opt.transform.opt_a 0.04% : 0.000051s : 1: opt.transform.opt_after_cconv 0.03% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000185s : 28: opt.transform.opt_b 0.06% : 0.000085s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000060s : 4: opt.transform.symbol_engine_opt 11.31% : 0.015707s : 1: opt_a 0.12% : 0.000166s : 1: opt_after_cconv 0.39% : 0.000538s : 1: opt_after_jit_grad 0.24% : 0.000335s : 1: opt_b 13.44% : 0.018668s : 1: optimize 0.02% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000012s : 1: order_py_execute_after_rewriter 0.03% : 0.000035s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000016s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000062s : 1: pre_auto_parallel 0.04% : 0.000060s : 1: py_interpret_to_execute 0.02% : 0.000025s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.04% : 0.000054s : 1: remove_dup_value 1.97% : 0.002734s : 2: renormalize.infer 1.61% : 0.002229s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000064s : 1: rewriter_after_opt_a 0.11% : 0.000158s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000118s : 1: symbol_engine_optimizer 39.00% : 0.054171s : 1: task_emit 0.09% : 0.000120s : 1: 
tuple_transform 20.05% : 0.027856s : 1: type_inference 0.07% : 0.000099s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x1-kbk],max_mem:4.0M ......... TotalTime = 82.8838, [24] [bootstrap]: 0.0007013 [type_inference]: 0.0811565 [event_method]: 1.855e-05 [auto_monad]: 6.359e-05 [graph_reusing]: 6.11e-06 [inline]: 3.01001e-06 [add_attr]: 0.00448483, [1] [add_attr_with_inline]: 0.00447037, [1] [Cycle 1]: 6.088e-05, [2] [tag_attr]: 1.844e-05 [meta_addattr_fg_expand]: 4.58001e-06 [parallel-infer-symbol]: 3.7e-06 [pre_auto_parallel]: 3.735e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 6.99976e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00516147, [53] [py_interpret_to_execute]: 2.575e-05 [rewriter_before_opt_a]: 6.565e-05 [opt_a]: 0.0029283, [2] [Cycle 1]: 0.00196084, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.511e-05 [loop_unroll]: 2.173e-05 [a_1]: 0.00053762 [with_stream_mark]: 1.765e-05 [recompute_prepare]: 9.96e-06 [updatestate_depend_eliminate]: 4.37e-06 [updatestate_assign_eliminate]: 3.97998e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.92001e-06 [a_2]: 8.051e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 8.23001e-06 [auto_parallel]: 6.64999e-06 [parallel]: 2.688e-05 [flash_sp]: 9.39e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.35003e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 8.60999e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.98002e-06 [virtual_output]: 6.49999e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.039e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.151e-05 
[merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 1.378e-05 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.67001e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 1.113e-05 [a_after_grad]: 8.97999e-06 [renormalize]: 0.00071475 [add_forward_monad_depend]: 5.37999e-06 [auto_monad_grad]: 2.23998e-06 [auto_monad_eliminator]: 1.591e-05 [cse]: 2.898e-05 [a_3]: 4.621e-05 [Cycle 2]: 0.00095521, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 7.25998e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00013374 [with_stream_mark]: 1.171e-05 [recompute_prepare]: 6.36e-06 [updatestate_depend_eliminate]: 3.13e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 1.50999e-06 [a_2]: 7.223e-05 [accelerated_algorithm]: 6.36998e-06 [shard]: 1.22999e-06 [meta_shard_fg_expand]: 2.04e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 5.10001e-06 [auto_parallel]: 6.81999e-06 [parallel]: 6.33998e-06 [flash_sp]: 3.57002e-06 [merge_comm]: 3.21999e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 7.36001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.46e-06 [virtual_dataset]: 5.41002e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.87002e-06 [cell_reuse_recompute_pass]: 2.27999e-06 [offload_activation]: 7.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 8.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 1.05999e-06 [receive_attached]: 1.77001e-06 [after_resolve]: 1.06e-05 [a_after_grad]: 8.77999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.93002e-06 [auto_monad_grad]: 1.31002e-06 [auto_monad_eliminator]: 8.08001e-06 [cse]: 1.636e-05 [a_3]: 0.00033237 [py_interpret_to_execute_after_opt_a]: 1.481e-05 
[slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 4.543e-05 [convert_after_rewriter]: 8.60001e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00060397 [opt_b]: 0.0002111, [1] [Cycle 1]: 0.0002036, [7] [b_1]: 0.00012389 [b_2]: 7.82e-06 [updatestate_depend_eliminate]: 6.49001e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.72001e-06 [renormalize]: 3.50003e-07 [cse]: 2.268e-05 [optimize_parallel_all_gather_comm]: 1.86e-05 [overlap_param_gather]: 2.10002e-06 [cconv]: 3.368e-05 [loop_unroll]: 0.00049768 [opt_after_cconv]: 0.00010613, [1] [Cycle 1]: 9.883e-05, [7] [c_1]: 3.149e-05 [parameter_eliminate]: 2.88998e-06 [updatestate_depend_eliminate]: 5.36998e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.36998e-06 [cse]: 1.837e-05 [renormalize]: 6.80011e-07 [remove_dup_value]: 1.474e-05 [tuple_transform]: 7.719e-05, [1] [Cycle 1]: 7.24e-05, [4] [d_1]: 4.332e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 7.01001e-06 [partial_unused_args_eliminate]: 2.16998e-06 [add_recomputation]: 5.602e-05 [cse_after_recomputation]: 2.228e-05, [1] [Cycle 1]: 1.742e-05, [1] [cse]: 1.198e-05 [environ_conv]: 5.12999e-06 [swap_dp_allreduce_reducescatter]: 5.53002e-06 [bias_add_comm_swap]: 3.09001e-06 [label_micro_interleaved_index]: 4.76002e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 3.11999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.299e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 5.30001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.42001e-06 [overlap_grad_ring_attention]: 4.50999e-06 [overlap_grad_flash_sp]: 1.964e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 2.21e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 7.753e-05, [1] [Cycle 1]: 7.299e-05, [6] [build]: 3.85e-06 [elim_shapecalc]: 1.019e-05 [elim_not_effective]: 1.297e-05 [opt_reshape]: 7.28999e-06 [fold_const_symbol]: 9.27001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.10002e-06 [pipeline_parallel_scheduler]: 1.75001e-06 [auto_monad_reorder]: 1.746e-05 [get_jit_bprop_graph]: 2.37999e-06 [rewriter_after_jit_bprop_graph]: 4.76002e-06 [opt_after_jit_grad]: 0.00050151 [validate]: 3.929e-05 [backend_pass]: 1.02e-06 [task_emit]: 82.7909 [execute]: 1.15e-05 Sums bootstrap : 0.000701s : 0.00% type_inference : 0.081156s : 0.10% event_method : 0.000019s : 0.00% auto_monad : 0.000064s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000037s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.00% optimize.rewriter_before_opt_a : 0.000066s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 
: 0.000671s : 0.00% optimize.opt_a.with_stream_mark : 0.000029s : 0.00% optimize.opt_a.recompute_prepare : 0.000016s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000033s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000023s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000715s : 0.00% 
optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.00% optimize.opt_a.cse : 0.000045s : 0.00% optimize.opt_a.a_3 : 0.000379s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000604s : 0.00% optimize.opt_b.b_1 : 0.000124s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000023s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000034s : 0.00% optimize.loop_unroll : 0.000498s : 0.00% optimize.opt_after_cconv.c_1 : 0.000031s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000043s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.00% optimize.cse_after_recomputation.cse 
: 0.000012s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000502s : 0.00% validate : 0.000039s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 82.790892s : 99.90% execute : 0.000012s : 0.00% Time group info: ------[substitution.] 0.000205 30 13.76% : 0.000028s : 5: substitution.arithmetic_simplify 0.88% : 0.000002s : 2: substitution.elim_not_effective 0.63% : 0.000001s : 2: substitution.fold_const_symbol 2.84% : 0.000006s : 4: substitution.graph_param_transform 69.84% : 0.000143s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.29% : 0.000005s : 4: substitution.remove_not_recompute_node 2.38% : 0.000005s : 4: substitution.replace_old_param 5.71% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.081073 2 98.83% : 0.080128s : 1: type_inference.infer 1.17% : 0.000945s : 1: type_inference.specialize ------[replace.] 0.000046 5 71.47% : 0.000033s : 3: replace.inline 28.53% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000151 5 93.02% : 0.000141s : 3: match.inline 6.98% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000172 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_depend_swap 1.71% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.32% : 0.000011s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000002s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.44% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.74% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.79% : 0.000001s : 11: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.25% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.60% : 0.000003s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000002s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.72% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 2.69% : 0.000005s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000009s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.19% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000604 8 41.16% : 0.000249s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.84% : 0.000356s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
82.895442 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.01% : 0.004490s : 1: add_attr 0.01% : 0.004475s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000070s : 1: auto_monad 0.00% : 0.000022s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.00% : 0.000761s : 1: bootstrap 0.00% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000025s : 1: event_method 0.00% : 0.000043s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.00% : 0.000508s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.00% : 0.000614s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000016s : 1: opt.transform.mutable_eliminate 0.00% : 0.001361s : 78: opt.transform.opt_a 0.00% : 0.000030s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000104s : 28: opt.transform.opt_b 0.00% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000036s : 4: opt.transform.symbol_engine_opt 0.00% : 0.002932s : 1: opt_a 0.00% : 0.000110s : 1: opt_after_cconv 0.00% : 0.000513s : 1: opt_after_jit_grad 0.00% : 0.000215s : 1: opt_b 0.01% : 0.005166s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000042s : 1: pre_auto_parallel 0.00% : 0.000030s : 1: py_interpret_to_execute 0.00% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000018s : 1: remove_dup_value 0.00% : 0.000394s : 1: renormalize.infer 0.00% : 0.000313s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000050s : 1: rewriter_after_opt_a 0.00% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000080s : 1: symbol_engine_optimizer 99.87% : 82.791047s : 1: task_emit 0.00% : 0.000080s : 1: 
tuple_transform 0.10% : 0.081190s : 1: type_inference 0.00% : 0.000074s : 1: validate TotalTime = 1.09299, [24] [bootstrap]: 0.00044659 [type_inference]: 0.00474784 [event_method]: 1.171e-05 [auto_monad]: 5.366e-05 [graph_reusing]: 5.10001e-06 [inline]: 2.20002e-06 [add_attr]: 0.00355726, [1] [add_attr_with_inline]: 0.0035475, [1] [Cycle 1]: 4.141e-05, [2] [tag_attr]: 1.249e-05 [meta_addattr_fg_expand]: 3.26999e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.326e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 9.49978e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00433459, [53] [py_interpret_to_execute]: 1.551e-05 [rewriter_before_opt_a]: 3.998e-05 [opt_a]: 0.00220828, [2] [Cycle 1]: 0.00149364, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.209e-05 [loop_unroll]: 1.588e-05 [a_1]: 0.00030981 [with_stream_mark]: 1.225e-05 [recompute_prepare]: 7.74002e-06 [updatestate_depend_eliminate]: 3.13e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 7.906e-05 [accelerated_algorithm]: 6.74001e-06 [shard]: 2.68e-06 [meta_shard_fg_expand]: 1.92999e-06 [shard_inline]: 6.58e-06 [merge_send_recv]: 8.60001e-06 [auto_parallel]: 6.78998e-06 [parallel]: 2.772e-05 [flash_sp]: 9.22001e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 7.88999e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.80998e-06 [virtual_dataset]: 6.02999e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.88998e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.53002e-06 [offload_activation]: 1.021e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.159e-05 [merge_recompute_call_nodes]: 1.73002e-06 [before_grad]: 1.065e-05 [set_forward_comm_id_for_comm_node_pass]: 4.18001e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 
2.79999e-06 [after_resolve]: 1.136e-05 [a_after_grad]: 9.49e-06 [renormalize]: 0.00051781 [add_forward_monad_depend]: 4.76997e-06 [auto_monad_grad]: 2.64999e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.25e-05 [a_3]: 4.411e-05 [Cycle 2]: 0.00070483, [45] [expand_dump_flag]: 1.53002e-06 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.34998e-06 [a_1]: 0.00018583 [with_stream_mark]: 1.365e-05 [recompute_prepare]: 6.58e-06 [updatestate_depend_eliminate]: 3.23e-06 [updatestate_assign_eliminate]: 2.76999e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.32e-06 [a_2]: 7.135e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.45999e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 6.04001e-06 [auto_parallel]: 6.49001e-06 [parallel]: 6.71e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 6.71e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.66999e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.14003e-06 [merge_forward]: 3.23e-06 [cell_reuse_recompute_pass]: 1.82001e-06 [offload_activation]: 7.9e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.02e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 8.79e-06 [set_forward_comm_id_for_comm_node_pass]: 3.89002e-06 [meta_fg_expand]: 2.04e-06 [flash_sp_send_recv_attached]: 1.24e-06 [receive_attached]: 1.45001e-06 [after_resolve]: 9.60001e-06 [a_after_grad]: 8.92e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.81003e-06 [auto_monad_grad]: 1.35001e-06 [auto_monad_eliminator]: 7.35998e-06 [cse]: 1.546e-05 [a_3]: 3.394e-05 [py_interpret_to_execute_after_opt_a]: 1.209e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.458e-05 [convert_after_rewriter]: 7.41999e-06 [order_py_execute_after_rewriter]: 6.19999e-06 [mutable_eliminate]: 0.00061089 [opt_b]: 0.00020469, [1] [Cycle 1]: 0.00019725, [7] [b_1]: 
0.00011745 [b_2]: 7.75e-06 [updatestate_depend_eliminate]: 7.15003e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 4.39992e-07 [cse]: 2.126e-05 [optimize_parallel_all_gather_comm]: 1.854e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.745e-05 [loop_unroll]: 0.00046313 [opt_after_cconv]: 0.0001029, [1] [Cycle 1]: 9.676e-05, [7] [c_1]: 3.032e-05 [parameter_eliminate]: 3.33998e-06 [updatestate_depend_eliminate]: 5.72001e-06 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.807e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 7.589e-05, [1] [Cycle 1]: 7.098e-05, [4] [d_1]: 4.38e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 6.78e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.36e-05 [cse_after_recomputation]: 2.225e-05, [1] [Cycle 1]: 1.774e-05, [1] [cse]: 1.226e-05 [environ_conv]: 5.77999e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 3.6e-06 [label_micro_interleaved_index]: 5.21002e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.54998e-06 [full_micro_interleaved_order_control]: 2.70002e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.29003e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.239e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.61997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.995e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 7.689e-05, [1] [Cycle 1]: 7.238e-05, [6] [build]: 2.74001e-06 [elim_shapecalc]: 1.064e-05 [elim_not_effective]: 1.31e-05 [opt_reshape]: 6.57002e-06 [fold_const_symbol]: 9.03002e-06 [renormalize]: 3.39991e-07 [detach_backward]: 2.37999e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 2.26998e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.00050365 [validate]: 3.94e-05 [backend_pass]: 7.2e-07 [task_emit]: 1.07901 [execute]: 6.74001e-06 Sums bootstrap : 0.000447s : 0.04% type_inference : 0.004748s : 0.44% event_method : 0.000012s : 0.00% auto_monad : 0.000054s : 0.00% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.00% optimize.rewriter_before_opt_a : 0.000040s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000029s : 0.00% optimize.opt_a.loop_unroll : 0.000021s : 0.00% optimize.opt_a.a_1 : 0.000496s : 0.05% optimize.opt_a.with_stream_mark : 0.000026s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000015s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000034s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000518s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.00% optimize.opt_a.cse : 0.000038s : 0.00% optimize.opt_a.a_3 : 0.000078s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000611s : 0.06% optimize.opt_b.b_1 : 0.000117s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.00% optimize.loop_unroll : 0.000463s : 0.04% optimize.opt_after_cconv.c_1 : 0.000030s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000044s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.00% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000504s : 0.05% validate : 0.000039s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.079012s : 99.14% execute : 0.000007s : 0.00% Time group info: ------[substitution.] 0.000132 26 20.03% : 0.000026s : 4: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 4.83% : 0.000006s : 4: substitution.graph_param_transform 63.52% : 0.000084s : 2: substitution.inline 2.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.22% : 0.000004s : 4: substitution.remove_not_recompute_node 3.61% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004700 2 91.45% : 0.004298s : 1: type_inference.infer 8.55% : 0.000402s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000083 2 100.00% : 0.000083s : 2: match.inline ------[predicate.] 
0.000147 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.59% : 0.000004s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.77% : 0.000001s : 8: predicate.depend_value_elim 0.74% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.42% : 0.000001s : 4: predicate.elim_not_effective 0.68% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.61% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000009s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.69% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000002s : 8: predicate.less_batch_normalization 1.50% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.08% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.84% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 1.44% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.68% : 0.000001s : 4: predicate.parallel_virtual_node 1.14% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.48% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000002s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 1.04% : 0.000002s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.53% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000002s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.25% : 0.000006s : 41: predicate.switch_simplify 0.83% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.70% : 0.000003s : 17: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000302 6 40.70% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.30% : 0.000179s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.102438 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.32% : 0.003563s : 1: add_attr 0.32% : 0.003551s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.04% : 0.000473s : 1: bootstrap 0.00% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000018s : 1: event_method 0.00% : 0.000013s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.04% : 0.000473s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000624s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.08% : 0.000866s : 78: opt.transform.opt_a 0.00% : 0.000029s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000097s : 28: opt.transform.opt_b 0.00% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000036s : 4: opt.transform.symbol_engine_opt 0.20% : 0.002212s : 1: opt_a 0.01% : 0.000106s : 1: opt_after_cconv 0.05% : 0.000515s : 1: opt_after_jit_grad 0.02% : 0.000208s : 1: opt_b 0.39% : 0.004339s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000027s : 1: pre_auto_parallel 0.00% : 0.000020s : 1: py_interpret_to_execute 0.00% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.03% : 0.000286s : 1: renormalize.infer 0.02% : 0.000225s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000039s : 1: rewriter_after_opt_a 0.00% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000080s : 1: symbol_engine_optimizer 97.88% : 1.079029s : 1: task_emit 0.01% : 0.000079s : 1: 
tuple_transform 0.43% : 0.004765s : 1: type_inference 0.01% : 0.000065s : 1: validate TotalTime = 0.786335, [24] [bootstrap]: 0.0005002 [type_inference]: 0.0061772 [event_method]: 1.535e-05 [auto_monad]: 5.703e-05 [graph_reusing]: 6.42001e-06 [inline]: 2.47001e-06 [add_attr]: 0.0032976, [1] [add_attr_with_inline]: 0.00328704, [1] [Cycle 1]: 5.541e-05, [2] [tag_attr]: 1.871e-05 [meta_addattr_fg_expand]: 4.82998e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 2.973e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 1.09e-06 [dataset_repeat_opt]: 2.13002e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0137049, [53] [py_interpret_to_execute]: 2.358e-05 [rewriter_before_opt_a]: 6.863e-05 [opt_a]: 0.0114302, [2] [Cycle 1]: 0.0107174, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 3.461e-05 [loop_unroll]: 2.213e-05 [a_1]: 0.00051599 [with_stream_mark]: 1.422e-05 [recompute_prepare]: 7.82002e-06 [updatestate_depend_eliminate]: 4.03001e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.891e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 9.01002e-06 [auto_parallel]: 6.40997e-06 [parallel]: 1.886e-05 [flash_sp]: 8.32e-06 [merge_comm]: 3.97998e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 1.018e-05 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 6.63e-06 [get_grad_eliminate_]: 5.97001e-06 [virtual_output]: 5.89e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 9.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.112e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.61e-06 
[after_resolve]: 1.052e-05 [a_after_grad]: 9.61e-06 [renormalize]: 0.00947047 [add_forward_monad_depend]: 1.101e-05 [auto_monad_grad]: 3.05998e-06 [auto_monad_eliminator]: 2.235e-05 [cse]: 2.952e-05 [a_3]: 6.149e-05 [Cycle 2]: 0.00069958, [45] [expand_dump_flag]: 2.68003e-06 [switch_simplify]: 8.74e-06 [loop_unroll]: 6.44001e-06 [a_1]: 0.00015382 [with_stream_mark]: 1.644e-05 [recompute_prepare]: 6.19001e-06 [updatestate_depend_eliminate]: 4.01001e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 3.30003e-06 [parameter_eliminate]: 2.08998e-06 [a_2]: 7.081e-05 [accelerated_algorithm]: 6.38003e-06 [shard]: 3.18e-06 [meta_shard_fg_expand]: 2.16e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 9.05999e-06 [auto_parallel]: 1.067e-05 [parallel]: 9.20001e-06 [flash_sp]: 4.57e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.23002e-06 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 7.53999e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.37001e-06 [merge_forward]: 4.85001e-06 [cell_reuse_recompute_pass]: 3.08e-06 [offload_activation]: 9.72001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 8.93002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 1.62999e-06 [receive_attached]: 2.72001e-06 [after_resolve]: 1.129e-05 [a_after_grad]: 9.42999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 2.19001e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 7.18e-06 [cse]: 1.383e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 1.629e-05 [slice_cell_reuse_recomputed_activation]: 2.52001e-06 [rewriter_after_opt_a]: 3.748e-05 [convert_after_rewriter]: 7.95998e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00073964 [opt_b]: 0.00020058, [1] [Cycle 1]: 0.0001922, [7] [b_1]: 0.0001169 
[b_2]: 7.60998e-06 [updatestate_depend_eliminate]: 6.09999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.40002e-06 [renormalize]: 1.02e-06 [cse]: 1.974e-05 [optimize_parallel_all_gather_comm]: 1.883e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.976e-05 [loop_unroll]: 0.00044724 [opt_after_cconv]: 0.00010349, [1] [Cycle 1]: 9.705e-05, [7] [c_1]: 2.905e-05 [parameter_eliminate]: 3.55998e-06 [updatestate_depend_eliminate]: 7.08e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.716e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.484e-05 [tuple_transform]: 7.614e-05, [1] [Cycle 1]: 7.113e-05, [4] [d_1]: 4.393e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.54999e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.628e-05 [cse_after_recomputation]: 2.18e-05, [1] [Cycle 1]: 1.709e-05, [1] [cse]: 1.163e-05 [environ_conv]: 6.04999e-06 [swap_dp_allreduce_reducescatter]: 5.49998e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 3.02002e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.59001e-06 [micro_interleaved_order_control]: 2.62001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 3.19001e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.45001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07001e-06 [control_data_broadcast_order]: 1.287e-05 [grouped_pairwise_exchange_alltoall]: 1.69998e-06 [offloading_packed_experts]: 4.08999e-06 [overlap_recompute_and_grad_model_parallel]: 5.18002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.72999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.79e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 4.29002e-06 [overlap_grad_flash_sp]: 2.036e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.45001e-06 [symbol_engine_optimizer]: 7.45e-05, [1] [Cycle 1]: 6.947e-05, [6] [build]: 2.64999e-06 [elim_shapecalc]: 1.014e-05 [elim_not_effective]: 1.308e-05 [opt_reshape]: 6.64999e-06 [fold_const_symbol]: 9.05999e-06 [renormalize]: 3.00002e-07 [detach_backward]: 2.46e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.695e-05 [get_jit_bprop_graph]: 1.94e-06 [rewriter_after_jit_bprop_graph]: 4.72998e-06 [opt_after_jit_grad]: 0.00053566 [validate]: 4.036e-05 [backend_pass]: 9.19972e-07 [task_emit]: 0.761679 [execute]: 9.96998e-06 Sums bootstrap : 0.000500s : 0.06% type_inference : 0.006177s : 0.79% event_method : 0.000015s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000030s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.00% optimize.rewriter_before_opt_a : 0.000069s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000043s : 0.01% optimize.opt_a.loop_unroll : 0.000029s : 0.00% optimize.opt_a.a_1 : 0.000670s : 0.09% optimize.opt_a.with_stream_mark : 0.000031s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000018s : 0.00% optimize.opt_a.auto_parallel : 0.000017s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000009s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.009471s : 1.21% optimize.opt_a.add_forward_monad_depend : 0.000013s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000030s : 0.00% optimize.opt_a.cse : 0.000043s : 0.01% optimize.opt_a.a_3 : 0.000094s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.00% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000740s : 0.09% optimize.opt_b.b_1 : 0.000117s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.00% optimize.loop_unroll : 0.000447s : 0.06% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.00% optimize.tuple_transform.d_1 : 0.000044s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000536s : 0.07% validate : 0.000040s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.761679s : 97.41% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000206 30 17.93% : 0.000037s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.63% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000007s : 4: substitution.graph_param_transform 64.33% : 0.000132s : 3: substitution.inline 1.79% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000005s : 4: substitution.remove_not_recompute_node 2.76% : 0.000006s : 4: substitution.replace_old_param 5.86% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006128 2 88.99% : 0.005454s : 1: type_inference.infer 11.01% : 0.000674s : 1: type_inference.specialize ------[replace.] 0.000044 5 69.03% : 0.000030s : 3: replace.inline 30.97% : 0.000014s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000141 5 92.34% : 0.000130s : 3: match.inline 7.66% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000182 1131 0.83% : 0.000002s : 11: predicate.accumulaten_eliminater 0.83% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.99% : 0.000002s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000002s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.50% : 0.000001s : 8: predicate.compare_switch_simplify 0.20% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 1.17% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.05% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.77% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000001s : 4: predicate.graph_param_transform 0.58% : 0.000001s : 8: predicate.incorporate_call 0.49% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000011s : 51: predicate.inline 0.78% : 0.000001s : 8: predicate.inline_without_move 0.34% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000005s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.35% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000003s : 21: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000002s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000002s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000002s : 8: predicate.special_op_eliminate 0.72% : 0.000001s : 8: predicate.specialize_transform 1.22% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.29% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.89% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.58% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000002s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.26% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000507 8 35.30% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 64.70% : 0.000328s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.814025 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.41% : 0.003303s : 1: add_attr 0.40% : 0.003292s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000063s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000529s : 1: bootstrap 0.00% : 0.000034s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000022s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.06% : 0.000456s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.09% : 0.000751s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.13% : 0.001066s : 78: opt.transform.opt_a 0.00% : 0.000028s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000094s : 28: opt.transform.opt_b 0.01% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000035s : 4: opt.transform.symbol_engine_opt 1.40% : 0.011434s : 1: opt_a 0.01% : 0.000107s : 1: opt_after_cconv 0.07% : 0.000548s : 1: opt_after_jit_grad 0.03% : 0.000204s : 1: opt_b 1.68% : 0.013711s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000028s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000019s : 1: remove_dup_value 1.10% : 0.008982s : 1: renormalize.infer 0.06% : 0.000475s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000042s : 1: rewriter_after_opt_a 0.01% : 0.000073s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000077s : 1: symbol_engine_optimizer 93.57% : 0.761704s : 1: task_emit 0.01% : 0.000079s : 1: 
tuple_transform 0.76% : 0.006198s : 1: type_inference 0.01% : 0.000071s : 1: validate .. TotalTime = 17.9334, [24] [bootstrap]: 0.00049723 [type_inference]: 0.013249 [event_method]: 6.105e-05 [auto_monad]: 0.0001276 [graph_reusing]: 8.02998e-06 [inline]: 2.27001e-06 [add_attr]: 0.00371624, [1] [add_attr_with_inline]: 0.00370246, [1] [Cycle 1]: 9.276e-05, [2] [tag_attr]: 5.233e-05 [meta_addattr_fg_expand]: 1.006e-05 [parallel-infer-symbol]: 4.00998e-06 [pre_auto_parallel]: 5.665e-05 [insert-virtual-dataset]: 2.78e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 2.13002e-06 [optimize]: 0.0162618, [53] [py_interpret_to_execute]: 4.196e-05 [rewriter_before_opt_a]: 0.00017254 [opt_a]: 0.0135545, [3] [Cycle 1]: 0.00882777, [45] [expand_dump_flag]: 5.91e-06 [switch_simplify]: 8.185e-05 [loop_unroll]: 6.741e-05 [a_1]: 0.00171934 [with_stream_mark]: 3.86e-05 [recompute_prepare]: 2.864e-05 [updatestate_depend_eliminate]: 1.059e-05 [updatestate_assign_eliminate]: 7.95e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.53003e-06 [a_2]: 0.000264 [accelerated_algorithm]: 3.625e-05 [shard]: 2.79001e-06 [meta_shard_fg_expand]: 4.43001e-06 [shard_inline]: 1.904e-05 [merge_send_recv]: 1.988e-05 [auto_parallel]: 1.601e-05 [parallel]: 2.178e-05 [flash_sp]: 1.689e-05 [merge_comm]: 1.179e-05 [allreduce_fusion]: 1.046e-05 [matmul_add_comm_reduction]: 2.57e-05 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 2.177e-05 [virtual_dataset]: 1.802e-05 [get_grad_eliminate_]: 1.574e-05 [virtual_output]: 1.549e-05 [merge_forward]: 1.114e-05 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 1.944e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.151e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 2.801e-05 [set_forward_comm_id_for_comm_node_pass]: 1.191e-05 [meta_fg_expand]: 0.00180064 [flash_sp_send_recv_attached]: 4.79e-06 [receive_attached]: 2.94001e-06 [after_resolve]: 
6.998e-05 [a_after_grad]: 9.369e-05 [renormalize]: 0.0033059 [add_forward_monad_depend]: 1.251e-05 [auto_monad_grad]: 7.31001e-06 [auto_monad_eliminator]: 6.099e-05 [cse]: 0.00017625 [a_3]: 0.00036062 [Cycle 2]: 0.00368598, [45] [expand_dump_flag]: 3.41999e-06 [switch_simplify]: 5.036e-05 [loop_unroll]: 4.555e-05 [a_1]: 0.00167571 [with_stream_mark]: 1.821e-05 [recompute_prepare]: 1.339e-05 [updatestate_depend_eliminate]: 6.59999e-06 [updatestate_assign_eliminate]: 5.74e-06 [updatestate_loads_eliminate]: 4.95999e-06 [parameter_eliminate]: 1.99999e-06 [a_2]: 0.00013581 [accelerated_algorithm]: 1.522e-05 [shard]: 2.64001e-06 [meta_shard_fg_expand]: 3.21999e-06 [shard_inline]: 9.73002e-06 [merge_send_recv]: 1.241e-05 [auto_parallel]: 1.241e-05 [parallel]: 1.113e-05 [flash_sp]: 4.52e-06 [merge_comm]: 7.03e-06 [allreduce_fusion]: 6.76999e-06 [matmul_add_comm_reduction]: 1.275e-05 [allreduce_slice_to_reducescatter]: 9.49978e-07 [virtual_shard_identity]: 1.251e-05 [virtual_dataset]: 9.39e-06 [get_grad_eliminate_]: 9.02e-06 [virtual_output]: 9.09e-06 [merge_forward]: 6.44001e-06 [cell_reuse_recompute_pass]: 2.27001e-06 [offload_activation]: 1.46e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.882e-05 [merge_recompute_call_nodes]: 1.82001e-06 [before_grad]: 1.688e-05 [set_forward_comm_id_for_comm_node_pass]: 5.84e-06 [meta_fg_expand]: 0.00013416 [flash_sp_send_recv_attached]: 2.05002e-06 [receive_attached]: 2.98998e-06 [after_resolve]: 2.069e-05 [a_after_grad]: 1.702e-05 [renormalize]: 0.00088608 [add_forward_monad_depend]: 5.32999e-06 [auto_monad_grad]: 2.58998e-06 [auto_monad_eliminator]: 2.169e-05 [cse]: 7.277e-05 [a_3]: 7.712e-05 [Cycle 3]: 0.00102147, [45] [expand_dump_flag]: 2.21998e-06 [switch_simplify]: 1.192e-05 [loop_unroll]: 9.50001e-06 [a_1]: 0.00026951 [with_stream_mark]: 1.414e-05 [recompute_prepare]: 9.95002e-06 [updatestate_depend_eliminate]: 5.19003e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 4.13001e-06 
[parameter_eliminate]: 1.50001e-06 [a_2]: 0.00016666 [accelerated_algorithm]: 1.376e-05 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 2.12001e-06 [shard_inline]: 9.37999e-06 [merge_send_recv]: 9.91e-06 [auto_parallel]: 9.66e-06 [parallel]: 7.00002e-06 [flash_sp]: 1.73002e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 5.58002e-06 [matmul_add_comm_reduction]: 9.69999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.071e-05 [virtual_dataset]: 9.05001e-06 [get_grad_eliminate_]: 8.87999e-06 [virtual_output]: 8.64e-06 [merge_forward]: 6.01998e-06 [cell_reuse_recompute_pass]: 2.64001e-06 [offload_activation]: 1.035e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.726e-05 [merge_recompute_call_nodes]: 9.80013e-07 [before_grad]: 1.474e-05 [set_forward_comm_id_for_comm_node_pass]: 5.72999e-06 [meta_fg_expand]: 4.03001e-06 [flash_sp_send_recv_attached]: 1.24e-06 [receive_attached]: 2.06e-06 [after_resolve]: 1.497e-05 [a_after_grad]: 1.531e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.99e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.205e-05 [cse]: 2.973e-05 [a_3]: 6.137e-05 [py_interpret_to_execute_after_opt_a]: 1.533e-05 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 6.182e-05 [convert_after_rewriter]: 9.98998e-06 [order_py_execute_after_rewriter]: 7.18e-06 [mutable_eliminate]: 0.00066846 [opt_b]: 0.00031876, [1] [Cycle 1]: 0.00031072, [7] [b_1]: 0.00020039 [b_2]: 1.204e-05 [updatestate_depend_eliminate]: 7.71999e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 4.50999e-06 [renormalize]: 7.59988e-07 [cse]: 4.26e-05 [optimize_parallel_all_gather_comm]: 2.778e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 3.016e-05 [loop_unroll]: 0.00046656 [opt_after_cconv]: 0.00014734, [1] [Cycle 1]: 0.00014041, [7] [c_1]: 5.091e-05 [parameter_eliminate]: 2.69999e-06 [updatestate_depend_eliminate]: 7.65e-06 [updatestate_assign_eliminate]: 4.45e-06 
[updatestate_loads_eliminate]: 4.22e-06 [cse]: 3.433e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 4.65e-05 [tuple_transform]: 0.00010929, [1] [Cycle 1]: 0.00010382, [4] [d_1]: 7.163e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 1.033e-05 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 6.569e-05 [cse_after_recomputation]: 3.654e-05, [1] [Cycle 1]: 3.141e-05, [1] [cse]: 2.47e-05 [environ_conv]: 9.52001e-06 [swap_dp_allreduce_reducescatter]: 8.68001e-06 [bias_add_comm_swap]: 2.69999e-06 [label_micro_interleaved_index]: 4.41002e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.56e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.78003e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.32999e-06 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.29998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.947e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 5.24003e-06 [overlap_recompute_and_grad_model_parallel]: 5.99999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.53e-06 [overlap_grad_ring_attention]: 5.30999e-06 [overlap_grad_flash_sp]: 2.718e-05 [begin_end_overlap_inline]: 8.99978e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 0.00010766, [1] [Cycle 1]: 0.00010258, [6] [build]: 8.79e-06 [elim_shapecalc]: 1.5e-05 [elim_not_effective]: 1.958e-05 [opt_reshape]: 1.164e-05 [fold_const_symbol]: 1.612e-05 [renormalize]: 
1.80007e-07 [detach_backward]: 2.66e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 2.844e-05 [get_jit_bprop_graph]: 2.11e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00049411 [validate]: 5.009e-05 [backend_pass]: 1.04e-06 [task_emit]: 17.8986 [execute]: 1.068e-05 Sums bootstrap : 0.000497s : 0.00% type_inference : 0.013249s : 0.07% event_method : 0.000061s : 0.00% auto_monad : 0.000128s : 0.00% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000052s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000057s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.00% optimize.rewriter_before_opt_a : 0.000173s : 0.00% optimize.opt_a.expand_dump_flag : 0.000012s : 0.00% optimize.opt_a.switch_simplify : 0.000144s : 0.00% optimize.opt_a.loop_unroll : 0.000122s : 0.00% optimize.opt_a.a_1 : 0.003665s : 0.02% optimize.opt_a.with_stream_mark : 0.000071s : 0.00% optimize.opt_a.recompute_prepare : 0.000052s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.00% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000566s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000065s : 0.00% optimize.opt_a.shard : 0.000008s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.00% optimize.opt_a.shard_inline : 0.000038s : 0.00% optimize.opt_a.merge_send_recv : 0.000042s : 0.00% optimize.opt_a.auto_parallel : 0.000038s : 0.00% optimize.opt_a.parallel : 0.000040s : 0.00% optimize.opt_a.flash_sp : 0.000023s : 0.00% optimize.opt_a.merge_comm : 0.000024s 
: 0.00% optimize.opt_a.allreduce_fusion : 0.000023s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000045s : 0.00% optimize.opt_a.virtual_dataset : 0.000036s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.00% optimize.opt_a.virtual_output : 0.000033s : 0.00% optimize.opt_a.merge_forward : 0.000024s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000060s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.00% optimize.opt_a.meta_fg_expand : 0.001939s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000008s : 0.00% optimize.opt_a.after_resolve : 0.000106s : 0.00% optimize.opt_a.a_after_grad : 0.000126s : 0.00% optimize.opt_a.renormalize : 0.004192s : 0.02% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.00% optimize.opt_a.auto_monad_grad : 0.000012s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000095s : 0.00% optimize.opt_a.cse : 0.000279s : 0.00% optimize.opt_a.a_3 : 0.000499s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000062s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000668s : 0.00% optimize.opt_b.b_1 : 0.000200s : 0.00% optimize.opt_b.b_2 : 0.000012s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000043s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000028s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.00% optimize.loop_unroll : 0.000467s : 0.00% optimize.opt_after_cconv.c_1 : 0.000051s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000047s : 0.00% optimize.tuple_transform.d_1 : 0.000072s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000066s : 0.00% optimize.cse_after_recomputation.cse : 0.000025s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000028s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000494s : 0.00% validate : 0.000050s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 17.898577s : 99.83% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 
0.001049 222 6.53% : 0.000069s : 12: substitution.arithmetic_simplify 1.96% : 0.000021s : 2: substitution.cast_eliminate 0.26% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000005s : 5: substitution.float_depend_g_call 0.48% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.21% : 0.000002s : 5: substitution.fold_const_symbol 0.87% : 0.000009s : 8: substitution.graph_param_transform 0.28% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 58.87% : 0.000617s : 17: substitution.inline 2.27% : 0.000024s : 2: substitution.inline_without_move 1.08% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000020s : 3: substitution.less_batch_normalization 1.48% : 0.000015s : 11: substitution.minmaximum_grad 0.63% : 0.000007s : 5: substitution.partial_eliminate 1.41% : 0.000015s : 20: substitution.remove_not_recompute_node 3.33% : 0.000035s : 10: substitution.replace_applicator 1.11% : 0.000012s : 15: substitution.replace_old_param 0.44% : 0.000005s : 1: substitution.set_cell_output_no_recompute 3.04% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.42% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.08% : 0.000022s : 11: substitution.tuple_list_get_item_depend_reorder 7.72% : 0.000081s : 30: substitution.tuple_list_get_item_eliminator 1.95% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013159 2 84.42% : 0.011110s : 1: type_inference.infer 15.58% : 0.002050s : 1: type_inference.specialize ------[replace.] 0.000268 33 58.98% : 0.000158s : 17: replace.inline 41.02% : 0.000110s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000647 33 93.76% : 0.000606s : 17: match.inline 6.24% : 0.000040s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000818 5764 1.05% : 0.000009s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 68: predicate.addn_zero_filter 0.99% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.31% : 0.000019s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.11% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.11% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.06% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.11% : 0.000009s : 76: predicate.environ_get_depend_swap 1.65% : 0.000013s : 108: predicate.environ_get_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000019s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000006s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.51% : 0.000045s : 249: predicate.inline 1.29% : 0.000011s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.74% : 0.000006s : 32: predicate.less_batch_normalization 
1.58% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.50% : 0.000020s : 168: predicate.load_eliminater 0.27% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.36% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.06% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.04% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000009s : 68: predicate.minmaximum_grad 0.31% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.10% : 0.000017s : 101: predicate.partial_defer_inline 1.64% : 0.000013s : 92: predicate.partial_eliminate 1.11% : 0.000009s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000011s : 68: predicate.reduce_eliminate 2.48% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.78% : 0.000015s : 152: predicate.replace_applicator 0.67% : 0.000006s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000009s : 68: predicate.reshape_eliminate 1.09% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000011s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.35% : 0.000003s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.39% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 101: predicate.switch_defer_inline 2.80% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.93% : 
0.000040s : 277: predicate.switch_simplify 1.04% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000009s : 68: predicate.transpose_eliminate 1.46% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.93% : 0.000024s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.52% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.49% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 5.03% : 0.000041s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002186 34 53.28% : 0.001165s : 13: func_graph_cloner_run.FuncGraphClonerGraph 46.72% : 0.001021s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
17.963416 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.02% : 0.003723s : 1: add_attr 0.02% : 0.003707s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000070s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000135s : 1: auto_monad 0.00% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.00% : 0.000519s : 1: bootstrap 0.00% : 0.000034s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000023s : 1: control_data_broadcast_order 0.00% : 0.000014s : 1: convert_after_rewriter 0.00% : 0.000040s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000069s : 1: event_method 0.00% : 0.000021s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.00% : 0.000476s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.00% : 0.000677s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000020s : 1: opt.transform.mutable_eliminate 0.03% : 0.005520s : 117: opt.transform.opt_a 0.00% : 0.000049s : 1: opt.transform.opt_after_cconv 0.00% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000183s : 28: opt.transform.opt_b 0.00% : 0.000080s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000059s : 4: opt.transform.symbol_engine_opt 0.08% : 0.013559s : 1: opt_a 0.00% : 0.000151s : 1: opt_after_cconv 0.00% : 0.000504s : 1: opt_after_jit_grad 0.00% : 0.000323s : 1: opt_b 0.09% : 0.016267s : 1: optimize 0.00% : 0.000032s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000062s : 1: pre_auto_parallel 0.00% : 0.000047s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000051s : 1: remove_dup_value 0.01% : 0.002250s : 2: renormalize.infer 0.01% : 0.001924s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000067s : 1: rewriter_after_opt_a 0.00% : 0.000178s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000111s : 1: symbol_engine_optimizer 99.64% : 17.898610s : 1: task_emit 0.00% : 0.000112s : 1: 
tuple_transform 0.07% : 0.013271s : 1: type_inference 0.00% : 0.000076s : 1: validate TotalTime = 0.103781, [24] [bootstrap]: 0.00046358 [type_inference]: 0.00451442 [event_method]: 1.19e-05 [auto_monad]: 5.743e-05 [graph_reusing]: 5.38002e-06 [inline]: 2.14999e-06 [add_attr]: 0.00350083, [1] [add_attr_with_inline]: 0.00349005, [1] [Cycle 1]: 5.991e-05, [2] [tag_attr]: 1.417e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 3.46999e-06 [pre_auto_parallel]: 2.731e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.73002e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.00434533, [53] [py_interpret_to_execute]: 1.724e-05 [rewriter_before_opt_a]: 4.509e-05 [opt_a]: 0.00238541, [2] [Cycle 1]: 0.00174573, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 2.786e-05 [loop_unroll]: 1.5e-05 [a_1]: 0.00032843 [with_stream_mark]: 1.824e-05 [recompute_prepare]: 8.16002e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 8.685e-05 [accelerated_algorithm]: 7.04001e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 6.03002e-06 [merge_send_recv]: 8.70999e-06 [auto_parallel]: 6.49999e-06 [parallel]: 2.349e-05 [flash_sp]: 9.46e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 1.107e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.9e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 4.998e-05 [virtual_output]: 6.25002e-06 [merge_forward]: 4.21001e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 1.097e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.233e-05 [merge_recompute_call_nodes]: 2.042e-05 [before_grad]: 1.025e-05 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 3.21001e-06 [flash_sp_send_recv_attached]: 3.16999e-06 [receive_attached]: 2.68998e-06 
[after_resolve]: 1.277e-05 [a_after_grad]: 2.233e-05 [renormalize]: 0.00062208 [add_forward_monad_depend]: 5.09e-06 [auto_monad_grad]: 2.29001e-06 [auto_monad_eliminator]: 1.564e-05 [cse]: 3.179e-05 [a_3]: 4.489e-05 [Cycle 2]: 0.000629, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 7.44002e-06 [loop_unroll]: 6.09999e-06 [a_1]: 0.00013201 [with_stream_mark]: 1.154e-05 [recompute_prepare]: 6.20002e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.28002e-06 [a_2]: 7.045e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.25001e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.81003e-06 [merge_send_recv]: 5.04e-06 [auto_parallel]: 5.52999e-06 [parallel]: 5.98998e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.91999e-06 [matmul_add_comm_reduction]: 6.57002e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 2.94999e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.27e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72001e-06 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 8.38001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.20998e-06 [meta_fg_expand]: 2.04999e-06 [flash_sp_send_recv_attached]: 6.50005e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 9.34998e-06 [a_after_grad]: 8.75999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09003e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.48e-06 [cse]: 1.519e-05 [a_3]: 3.299e-05 [py_interpret_to_execute_after_opt_a]: 8.87e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 3.593e-05 [convert_after_rewriter]: 7.53999e-06 [order_py_execute_after_rewriter]: 5.59998e-06 [mutable_eliminate]: 0.00049931 [opt_b]: 0.00019319, [1] [Cycle 1]: 0.00018656, 
[7] [b_1]: 0.0001134 [b_2]: 8.07998e-06 [updatestate_depend_eliminate]: 6.20002e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.42001e-06 [renormalize]: 6.19999e-07 [cse]: 1.812e-05 [optimize_parallel_all_gather_comm]: 1.681e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.432e-05 [loop_unroll]: 0.00043542 [opt_after_cconv]: 0.00010086, [1] [Cycle 1]: 9.517e-05, [7] [c_1]: 3.003e-05 [parameter_eliminate]: 3.35998e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.751e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.427e-05 [tuple_transform]: 7.265e-05, [1] [Cycle 1]: 6.814e-05, [4] [d_1]: 4.147e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.44999e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 5.615e-05 [cse_after_recomputation]: 2.209e-05, [1] [Cycle 1]: 1.736e-05, [1] [cse]: 1.205e-05 [environ_conv]: 5.25001e-06 [swap_dp_allreduce_reducescatter]: 5.83002e-06 [bias_add_comm_swap]: 2.82002e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.94999e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.83e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.13001e-06 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.268e-05 [grouped_pairwise_exchange_alltoall]: 1.81998e-06 [offloading_packed_experts]: 4.37998e-06 [overlap_recompute_and_grad_model_parallel]: 4.74998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.924e-05 [begin_end_overlap_inline]: 9.09989e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 2.17001e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 7.222e-05, [1] [Cycle 1]: 6.776e-05, [6] [build]: 2.50002e-06 [elim_shapecalc]: 9.17001e-06 [elim_not_effective]: 1.285e-05 [opt_reshape]: 6.56e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.715e-05 [get_jit_bprop_graph]: 1.99e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00049411 [validate]: 3.602e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.090048 [execute]: 8.59e-06 Sums bootstrap : 0.000464s : 0.47% type_inference : 0.004514s : 4.55% event_method : 0.000012s : 0.01% auto_monad : 0.000057s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000045s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000035s : 0.04% optimize.opt_a.loop_unroll : 0.000021s : 0.02% optimize.opt_a.a_1 : 0.000460s : 0.46% optimize.opt_a.with_stream_mark : 0.000030s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000157s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000055s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000021s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000031s : 0.03% optimize.opt_a.renormalize : 0.000622s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000047s : 0.05% optimize.opt_a.a_3 : 0.000078s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000499s : 0.50% optimize.opt_b.b_1 : 0.000113s : 0.11% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000435s : 0.44% optimize.opt_after_cconv.c_1 : 0.000030s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000494s : 0.50% validate : 0.000036s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.090048s : 90.75% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000142 26 17.36% : 0.000025s : 4: substitution.arithmetic_simplify 1.32% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000006s : 4: substitution.graph_param_transform 67.38% : 0.000096s : 2: substitution.inline 2.57% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.10% : 0.000004s : 4: substitution.remove_not_recompute_node 2.85% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004466 2 90.44% : 0.004040s : 1: type_inference.infer 9.56% : 0.000427s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000094 2 100.00% : 0.000094s : 2: match.inline ------[predicate.] 
0.000147 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.56% : 0.000004s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.75% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.44% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.95% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.87% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000009s : 44: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.83% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.30% : 0.000002s : 11: predicate.partial_defer_inline 1.16% : 0.000002s : 13: predicate.partial_eliminate 1.03% : 0.000002s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.68% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 1.34% : 0.000002s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.29% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.68% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.24% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.55% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.97% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000298 6 36.56% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.44% : 0.000189s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.113253 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.10% : 0.003506s : 1: add_attr 3.09% : 0.003494s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000062s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.44% : 0.000494s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.39% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.45% : 0.000509s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.75% : 0.000850s : 78: opt.transform.opt_a 0.03% : 0.000029s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000095s : 28: opt.transform.opt_b 0.04% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 2.11% : 0.002389s : 1: opt_a 0.09% : 0.000105s : 1: opt_after_cconv 0.45% : 0.000505s : 1: opt_after_jit_grad 0.17% : 0.000197s : 1: opt_b 3.84% : 0.004350s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000021s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.33% : 0.000377s : 1: renormalize.infer 0.21% : 0.000237s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000040s : 1: rewriter_after_opt_a 0.04% : 0.000049s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000075s : 1: symbol_engine_optimizer 79.53% : 0.090069s : 1: task_emit 0.07% : 0.000076s : 1: 
tuple_transform 4.00% : 0.004533s : 1: type_inference 0.06% : 0.000067s : 1: validate TotalTime = 0.827308, [24] [bootstrap]: 0.00048561 [type_inference]: 0.0110155 [event_method]: 4.731e-05 [auto_monad]: 0.0001215 [graph_reusing]: 8.25e-06 [inline]: 2.14999e-06 [add_attr]: 0.0031252, [1] [add_attr_with_inline]: 0.00311673, [1] [Cycle 1]: 6.741e-05, [2] [tag_attr]: 3.162e-05 [meta_addattr_fg_expand]: 9.12999e-06 [parallel-infer-symbol]: 2.90002e-06 [pre_auto_parallel]: 4.621e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0136521, [53] [py_interpret_to_execute]: 3.656e-05 [rewriter_before_opt_a]: 0.00013013 [opt_a]: 0.0113223, [3] [Cycle 1]: 0.00722595, [45] [expand_dump_flag]: 3.3e-06 [switch_simplify]: 6.819e-05 [loop_unroll]: 5.647e-05 [a_1]: 0.00136823 [with_stream_mark]: 2.385e-05 [recompute_prepare]: 2.214e-05 [updatestate_depend_eliminate]: 9.29e-06 [updatestate_assign_eliminate]: 7.6e-06 [updatestate_loads_eliminate]: 7.31001e-06 [parameter_eliminate]: 3.01001e-06 [a_2]: 0.00024759 [accelerated_algorithm]: 3.191e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.53e-06 [shard_inline]: 1.654e-05 [merge_send_recv]: 1.681e-05 [auto_parallel]: 1.074e-05 [parallel]: 2.256e-05 [flash_sp]: 1.271e-05 [merge_comm]: 9.80002e-06 [allreduce_fusion]: 9.16002e-06 [matmul_add_comm_reduction]: 2.715e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.837e-05 [virtual_dataset]: 1.606e-05 [get_grad_eliminate_]: 1.585e-05 [virtual_output]: 1.581e-05 [merge_forward]: 9.37999e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 1.767e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.88e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 2.709e-05 [set_forward_comm_id_for_comm_node_pass]: 9.56e-06 [meta_fg_expand]: 0.00143856 [flash_sp_send_recv_attached]: 4.09002e-06 [receive_attached]: 2.40002e-06 [after_resolve]: 
6.022e-05 [a_after_grad]: 8.401e-05 [renormalize]: 0.00260921 [add_forward_monad_depend]: 1.707e-05 [auto_monad_grad]: 6.24001e-06 [auto_monad_eliminator]: 5.793e-05 [cse]: 0.0001791 [a_3]: 0.00034195 [Cycle 2]: 0.00307231, [45] [expand_dump_flag]: 1.59e-06 [switch_simplify]: 4.83e-05 [loop_unroll]: 4.478e-05 [a_1]: 0.00157169 [with_stream_mark]: 1.278e-05 [recompute_prepare]: 1.172e-05 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 4.63001e-06 [updatestate_loads_eliminate]: 3.94002e-06 [parameter_eliminate]: 1.22e-06 [a_2]: 0.00012799 [accelerated_algorithm]: 1.219e-05 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.96003e-06 [shard_inline]: 9.46998e-06 [merge_send_recv]: 6.88998e-06 [auto_parallel]: 8.19002e-06 [parallel]: 5.06002e-06 [flash_sp]: 3.28998e-06 [merge_comm]: 5.15999e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 8.72998e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 9.10999e-06 [get_grad_eliminate_]: 9.12999e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.21001e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 1.02e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.657e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 3.62e-05 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.30001e-06 [after_resolve]: 1.544e-05 [a_after_grad]: 1.462e-05 [renormalize]: 0.00063529 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.35999e-06 [auto_monad_eliminator]: 1.476e-05 [cse]: 4.979e-05 [a_3]: 6.733e-05 [Cycle 3]: 0.00100912, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.09e-05 [loop_unroll]: 9.24e-06 [a_1]: 0.00032266 [with_stream_mark]: 1.19e-05 [recompute_prepare]: 1.074e-05 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 4.18001e-06 
[parameter_eliminate]: 1.12e-06 [a_2]: 0.00012758 [accelerated_algorithm]: 1.208e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.49e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 7.22002e-06 [parallel]: 4.65001e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.04e-06 [allreduce_fusion]: 5.00999e-06 [matmul_add_comm_reduction]: 7.7e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 9.04e-06 [get_grad_eliminate_]: 8.69998e-06 [virtual_output]: 8.68001e-06 [merge_forward]: 4.44002e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 8.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.695e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.446e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 3.24001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 1.37e-05 [a_after_grad]: 1.397e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.45999e-06 [auto_monad_grad]: 1.18001e-06 [auto_monad_eliminator]: 1.182e-05 [cse]: 2.996e-05 [a_3]: 6.145e-05 [py_interpret_to_execute_after_opt_a]: 1.251e-05 [slice_cell_reuse_recomputed_activation]: 2.29999e-06 [rewriter_after_opt_a]: 4.932e-05 [convert_after_rewriter]: 9.20001e-06 [order_py_execute_after_rewriter]: 7.26999e-06 [mutable_eliminate]: 0.00048218 [opt_b]: 0.00029792, [1] [Cycle 1]: 0.00029099, [7] [b_1]: 0.00019415 [b_2]: 1.144e-05 [updatestate_depend_eliminate]: 7.47998e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 4.17998e-06 [renormalize]: 4.30009e-07 [cse]: 3.353e-05 [optimize_parallel_all_gather_comm]: 2.191e-05 [overlap_param_gather]: 2.20002e-06 [cconv]: 2.126e-05 [loop_unroll]: 0.0004375 [opt_after_cconv]: 0.00014141, [1] [Cycle 1]: 0.00013517, [7] [c_1]: 5.037e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 7.55e-06 [updatestate_assign_eliminate]: 4.32998e-06 
[updatestate_loads_eliminate]: 4.25e-06 [cse]: 3.122e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 2.937e-05 [tuple_transform]: 0.00010518, [1] [Cycle 1]: 0.00010034, [4] [d_1]: 6.908e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 1.009e-05 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 5.936e-05 [cse_after_recomputation]: 3.534e-05, [1] [Cycle 1]: 3.036e-05, [1] [cse]: 2.454e-05 [environ_conv]: 9.69e-06 [swap_dp_allreduce_reducescatter]: 8.28999e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.67998e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.29003e-06 [slice_recompute_activation]: 2.57001e-06 [micro_interleaved_order_control]: 2.53998e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.44e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.74e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 5.15001e-06 [overlap_recompute_and_grad_model_parallel]: 6.66999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.39999e-06 [overlap_grad_ring_attention]: 5.30999e-06 [overlap_grad_flash_sp]: 2.538e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 0.00010179, [1] [Cycle 1]: 9.742e-05, [6] [build]: 1e-05 [elim_shapecalc]: 1.44e-05 [elim_not_effective]: 1.867e-05 [opt_reshape]: 1.061e-05 [fold_const_symbol]: 1.466e-05 [renormalize]: 2.09984e-07 
[detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 2.674e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00047972 [validate]: 4.919e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.797993 [execute]: 8.79e-06 Sums bootstrap : 0.000486s : 0.06% type_inference : 0.011015s : 1.34% event_method : 0.000047s : 0.01% auto_monad : 0.000121s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.00% optimize.rewriter_before_opt_a : 0.000130s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000127s : 0.02% optimize.opt_a.loop_unroll : 0.000110s : 0.01% optimize.opt_a.a_1 : 0.003263s : 0.40% optimize.opt_a.with_stream_mark : 0.000049s : 0.01% optimize.opt_a.recompute_prepare : 0.000045s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000503s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000031s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000017s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.00% optimize.opt_a.virtual_output : 0.000033s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001478s : 0.18% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.01% optimize.opt_a.a_after_grad : 0.000113s : 0.01% optimize.opt_a.renormalize : 0.003245s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000023s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.01% optimize.opt_a.cse : 0.000259s : 0.03% optimize.opt_a.a_3 : 0.000471s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000482s : 0.06% optimize.opt_b.b_1 : 0.000194s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.00% optimize.loop_unroll : 0.000438s : 0.05% optimize.opt_after_cconv.c_1 : 0.000050s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.00% optimize.tuple_transform.d_1 : 0.000069s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.01% optimize.cse_after_recomputation.cse : 0.000025s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000480s : 0.06% validate : 0.000049s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.797993s : 96.98% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000763 218 6.16% : 0.000047s : 11: substitution.arithmetic_simplify 1.89% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.36% : 0.000422s : 16: substitution.inline 2.17% : 0.000017s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000013s : 20: substitution.remove_not_recompute_node 3.23% : 0.000025s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.77% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.33% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010943 2 87.39% : 0.009563s : 1: type_inference.infer 12.61% : 0.001379s : 1: type_inference.specialize ------[replace.] 0.000208 30 59.23% : 0.000123s : 16: replace.inline 40.77% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 30 93.03% : 0.000414s : 16: match.inline 6.97% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000812 5663 1.00% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.02% : 0.000008s : 67: predicate.addn_zero_filter 0.95% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000017s : 99: predicate.arithmetic_simplify 1.04% : 0.000008s : 67: predicate.cast_eliminate 1.05% : 0.000009s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.09% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.03% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.09% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_depend_swap 1.62% : 0.000013s : 107: predicate.environ_get_eliminate 1.08% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.55% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.10% : 0.000017s : 97: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.52% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.45% : 0.000004s : 32: predicate.incorporate_call_switch 5.30% : 0.000043s : 244: predicate.inline 1.18% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.46% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.40% : 0.000019s : 164: predicate.load_eliminater 0.31% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.03% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.30% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.03% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.04% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.02% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.13% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.81% : 0.000015s : 97: predicate.partial_defer_inline 1.59% : 0.000013s : 89: predicate.partial_eliminate 0.98% : 0.000008s : 67: predicate.print_const_string_wrapper 0.48% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000010s : 67: predicate.reduce_eliminate 2.45% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.74% : 0.000014s : 149: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.00% : 0.000008s : 67: predicate.reshape_eliminate 1.09% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.22% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.58% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.14% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.71% : 0.000014s : 97: predicate.switch_defer_inline 2.67% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.51% : 0.000037s : 265: 
predicate.switch_simplify 0.98% : 0.000008s : 67: predicate.tile_eliminate 0.99% : 0.000008s : 67: predicate.transpose_eliminate 8.60% : 0.000070s : 83: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.62% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.38% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.92% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.47% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.41% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 2.98% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.53% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001626 32 57.56% : 0.000936s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.44% : 0.000690s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.852603 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.37% : 0.003130s : 1: add_attr 0.37% : 0.003121s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000129s : 1: auto_monad 0.00% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000521s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000055s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000446s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000492s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.58% : 0.004944s : 117: opt.transform.opt_a 0.01% : 0.000049s : 1: opt.transform.opt_after_cconv 0.00% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000179s : 28: opt.transform.opt_b 0.01% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000055s : 4: opt.transform.symbol_engine_opt 1.33% : 0.011325s : 1: opt_a 0.02% : 0.000145s : 1: opt_after_cconv 0.06% : 0.000490s : 1: opt_after_jit_grad 0.04% : 0.000302s : 1: opt_b 1.60% : 0.013656s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000051s : 1: pre_auto_parallel 0.00% : 0.000041s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 0.20% : 0.001710s : 2: renormalize.infer 0.18% : 0.001521s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000053s : 1: rewriter_after_opt_a 0.02% : 0.000135s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000105s : 1: symbol_engine_optimizer 93.60% : 0.798014s : 1: task_emit 0.01% : 0.000108s : 1: 
tuple_transform 1.29% : 0.011033s : 1: type_inference 0.01% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x1-ge],max_mem:4.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x2-pynative],max_mem:4.0M TotalTime = 0.0292265, [24] [bootstrap]: 0.00061052 [type_inference]: 0.0076697 [event_method]: 1.483e-05 [auto_monad]: 5.822e-05 [graph_reusing]: 5.97001e-06 [inline]: 2.43e-06 [add_attr]: 0.00442562, [1] [add_attr_with_inline]: 0.00441288, [1] [Cycle 1]: 5.673e-05, [2] [tag_attr]: 2.145e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 2.95002e-06 [pre_auto_parallel]: 3.48e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00532211, [53] [py_interpret_to_execute]: 2.451e-05 [rewriter_before_opt_a]: 7.026e-05 [opt_a]: 0.00297755, [2] [Cycle 1]: 0.00229547, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 3.362e-05 [loop_unroll]: 2.095e-05 [a_1]: 0.00049608 [with_stream_mark]: 1.553e-05 [recompute_prepare]: 8.59e-06 [updatestate_depend_eliminate]: 4.17e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 8.054e-05 [accelerated_algorithm]: 6.64001e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 7.53e-06 [auto_parallel]: 6.57002e-06 [parallel]: 2.926e-05 [flash_sp]: 1.025e-05 [merge_comm]: 4.58001e-06 [allreduce_fusion]: 3.74002e-06 [matmul_add_comm_reduction]: 8.64e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 8.11002e-06 [virtual_dataset]: 6.72002e-06 
[get_grad_eliminate_]: 5.87001e-06 [virtual_output]: 5.94e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 1.107e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.193e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 1.004e-05 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.68003e-06 [receive_attached]: 3.01001e-06 [after_resolve]: 1.115e-05 [a_after_grad]: 9.71998e-06 [renormalize]: 0.00108886 [add_forward_monad_depend]: 5.75001e-06 [auto_monad_grad]: 2.91e-06 [auto_monad_eliminator]: 1.726e-05 [cse]: 2.855e-05 [a_3]: 4.816e-05 [Cycle 2]: 0.00066911, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 7.7e-06 [loop_unroll]: 5.97001e-06 [a_1]: 0.00013997 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 6.02001e-06 [updatestate_depend_eliminate]: 3.29001e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.16002e-06 [a_2]: 7.095e-05 [accelerated_algorithm]: 5.79e-06 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 5.54998e-06 [auto_parallel]: 7.29001e-06 [parallel]: 5.79e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 7.73999e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.34001e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.35999e-06 [merge_forward]: 3.13e-06 [cell_reuse_recompute_pass]: 2.01998e-06 [offload_activation]: 8.80001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.257e-05 [merge_recompute_call_nodes]: 8.89995e-07 [before_grad]: 9.97001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.82998e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.49e-06 [after_resolve]: 1.083e-05 [a_after_grad]: 9.31e-06 [renormalize]: 7.99773e-08 
[add_forward_monad_depend]: 1.57001e-06 [auto_monad_grad]: 9.99979e-07 [auto_monad_eliminator]: 7.43e-06 [cse]: 1.598e-05 [a_3]: 3.396e-05 [py_interpret_to_execute_after_opt_a]: 1.29e-05 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 3.359e-05 [convert_after_rewriter]: 7.86001e-06 [order_py_execute_after_rewriter]: 5.11002e-06 [mutable_eliminate]: 0.00071085 [opt_b]: 0.00022189, [1] [Cycle 1]: 0.00021399, [7] [b_1]: 0.00013291 [b_2]: 8.25999e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 2.65002e-06 [renormalize]: 6.89994e-07 [cse]: 2.258e-05 [optimize_parallel_all_gather_comm]: 1.878e-05 [overlap_param_gather]: 2.54999e-06 [cconv]: 2.714e-05 [loop_unroll]: 0.00049908 [opt_after_cconv]: 0.0001076, [1] [Cycle 1]: 0.00010101, [7] [c_1]: 3.109e-05 [parameter_eliminate]: 3.78001e-06 [updatestate_depend_eliminate]: 6.31998e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.46998e-06 [cse]: 1.951e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 1.395e-05 [tuple_transform]: 7.954e-05, [1] [Cycle 1]: 7.442e-05, [4] [d_1]: 4.541e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.96999e-06 [partial_unused_args_eliminate]: 2.25002e-06 [add_recomputation]: 5.921e-05 [cse_after_recomputation]: 2.296e-05, [1] [Cycle 1]: 1.782e-05, [1] [cse]: 1.218e-05 [environ_conv]: 5.96998e-06 [swap_dp_allreduce_reducescatter]: 5.52999e-06 [bias_add_comm_swap]: 2.88e-06 [label_micro_interleaved_index]: 4.85001e-06 [label_fine_grained_interleaved_index]: 3.46999e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.39999e-06 [assign_add_opt]: 1.29003e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 7.89994e-07 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 3.13e-06 [comm_op_add_attrs]: 1.39998e-06 
[add_comm_op_reuse_tag]: 1.34e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.26002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.27e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 5.45001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.70002e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 2.114e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 8.90024e-07 [symbol_engine_optimizer]: 8.132e-05, [1] [Cycle 1]: 7.658e-05, [6] [build]: 3.91999e-06 [elim_shapecalc]: 1.15e-05 [elim_not_effective]: 1.336e-05 [opt_reshape]: 7.05e-06 [fold_const_symbol]: 1.015e-05 [renormalize]: 2.89991e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 1.89e-05 [get_jit_bprop_graph]: 2.11998e-06 [rewriter_after_jit_bprop_graph]: 0.00019675 [opt_after_jit_grad]: 0.00067075 [validate]: 4.502e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.0098823 [execute]: 8.89e-06 Sums bootstrap : 0.000611s : 2.57% type_inference : 0.007670s : 32.34% event_method : 0.000015s : 0.06% auto_monad : 0.000058s : 0.25% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000035s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000025s : 0.10% optimize.rewriter_before_opt_a : 0.000070s : 
0.30% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.17% optimize.opt_a.loop_unroll : 0.000027s : 0.11% optimize.opt_a.a_1 : 0.000636s : 2.68% optimize.opt_a.with_stream_mark : 0.000029s : 0.12% optimize.opt_a.recompute_prepare : 0.000015s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000151s : 0.64% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.05% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.05% optimize.opt_a.merge_send_recv : 0.000013s : 0.06% optimize.opt_a.auto_parallel : 0.000014s : 0.06% optimize.opt_a.parallel : 0.000035s : 0.15% optimize.opt_a.flash_sp : 0.000014s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.06% optimize.opt_a.virtual_dataset : 0.000012s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000007s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000020s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 
0.02% optimize.opt_a.after_resolve : 0.000022s : 0.09% optimize.opt_a.a_after_grad : 0.000019s : 0.08% optimize.opt_a.renormalize : 0.001089s : 4.59% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.03% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.10% optimize.opt_a.cse : 0.000045s : 0.19% optimize.opt_a.a_3 : 0.000082s : 0.35% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.14% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000711s : 3.00% optimize.opt_b.b_1 : 0.000133s : 0.56% optimize.opt_b.b_2 : 0.000008s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.08% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000027s : 0.11% optimize.loop_unroll : 0.000499s : 2.10% optimize.opt_after_cconv.c_1 : 0.000031s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.06% optimize.tuple_transform.d_1 : 0.000045s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s 
: 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.25% optimize.cse_after_recomputation.cse : 0.000012s : 0.05% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000019s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000197s : 0.83% opt_after_jit_grad : 0.000671s : 2.83% validate : 0.000045s : 0.19% backend_pass : 0.000001s : 0.00% task_emit : 0.009882s : 41.67% execute : 0.000009s : 0.04% Time group info: ------[substitution.] 0.000206 30 15.10% : 0.000031s : 5: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000002s : 2: substitution.fold_const_symbol 3.19% : 0.000007s : 4: substitution.graph_param_transform 66.60% : 0.000137s : 3: substitution.inline 1.93% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.11% : 0.000006s : 4: substitution.remove_not_recompute_node 2.43% : 0.000005s : 4: substitution.replace_old_param 5.72% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007603 2 90.57% : 0.006886s : 1: type_inference.infer 9.43% : 0.000717s : 1: type_inference.specialize ------[replace.] 0.000040 5 68.55% : 0.000028s : 3: replace.inline 31.45% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000146 5 92.69% : 0.000135s : 3: match.inline 7.31% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000179 1131 0.81% : 0.000001s : 11: predicate.accumulaten_eliminater 1.38% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.84% : 0.000002s : 11: predicate.adjust_all_reduce_mul_add 2.40% : 0.000004s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.51% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.98% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.48% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.95% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.21% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 16: predicate.float_depend_g_call 0.51% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.58% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 5.79% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000002s : 8: predicate.less_batch_normalization 2.20% : 0.000004s : 
21: predicate.list_to_tuple_eliminator_ 2.25% : 0.000004s : 32: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.02% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 11: predicate.minmaximum_grad 1.46% : 0.000003s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.53% : 0.000003s : 16: predicate.partial_defer_inline 1.34% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000002s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.16% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.96% : 0.000002s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000002s : 8: predicate.same_eliminate 0.42% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000002s : 8: predicate.special_op_eliminate 0.70% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.80% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.73% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000002s : 11: predicate.transpose_eliminate 1.53% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.44% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.64% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.49% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.22% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.76% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000518 8 41.15% : 0.000213s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.85% : 0.000305s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.041261 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.74% : 0.004431s : 1: add_attr 10.70% : 0.004417s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.15% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.15% : 0.000064s : 1: auto_monad 0.05% : 0.000023s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.54% : 0.000637s : 1: bootstrap 0.08% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.06% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000009s : 1: environ_conv 0.05% : 0.000021s : 1: event_method 0.04% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000007s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.23% : 0.000509s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000722s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000015s : 1: opt.transform.mutable_eliminate 2.49% : 0.001027s : 78: opt.transform.opt_a 0.07% : 0.000029s : 1: opt.transform.opt_after_cconv 0.07% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000105s : 28: opt.transform.opt_b 0.12% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000038s : 4: opt.transform.symbol_engine_opt 7.22% : 0.002981s : 1: opt_a 0.27% : 0.000111s : 1: opt_after_cconv 1.66% : 0.000683s : 1: opt_after_jit_grad 0.55% : 0.000226s : 1: opt_b 12.91% : 0.005327s : 1: optimize 0.05% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000039s : 1: pre_auto_parallel 0.07% : 0.000028s : 1: py_interpret_to_execute 0.04% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.04% : 0.000018s : 1: remove_dup_value 1.66% : 0.000685s : 1: renormalize.infer 0.96% : 0.000394s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.49% : 0.000203s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000038s : 1: rewriter_after_opt_a 0.18% : 0.000074s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000084s : 1: symbol_engine_optimizer 24.00% : 0.009903s : 1: task_emit 0.20% : 0.000083s : 1: 
tuple_transform 18.63% : 0.007688s : 1: type_inference 0.22% : 0.000090s : 1: validate TotalTime = 0.0407988, [24] [bootstrap]: 0.00049778 [type_inference]: 0.00497777 [event_method]: 1.135e-05 [auto_monad]: 5.77e-05 [graph_reusing]: 5.66e-06 [inline]: 2.50002e-06 [add_attr]: 0.00325547, [1] [add_attr_with_inline]: 0.00324577, [1] [Cycle 1]: 5.74e-05, [2] [tag_attr]: 1.361e-05 [meta_addattr_fg_expand]: 3.91999e-06 [parallel-infer-symbol]: 3.68999e-06 [pre_auto_parallel]: 2.476e-05 [insert-virtual-dataset]: 2.91999e-06 [parallel-infer-symbol-second]: 6.99976e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00404738, [53] [py_interpret_to_execute]: 1.807e-05 [rewriter_before_opt_a]: 4.268e-05 [opt_a]: 0.00209812, [2] [Cycle 1]: 0.00143169, [45] [expand_dump_flag]: 3.16001e-06 [switch_simplify]: 2.572e-05 [loop_unroll]: 1.367e-05 [a_1]: 0.00031567 [with_stream_mark]: 1.497e-05 [recompute_prepare]: 7.97e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.85001e-06 [a_2]: 7.974e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 8.18001e-06 [auto_parallel]: 7.48e-06 [parallel]: 1.964e-05 [flash_sp]: 8.35999e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.53999e-06 [matmul_add_comm_reduction]: 9.72001e-06 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 5.76003e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.92001e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.18002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.159e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 9.63997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.71001e-06 [meta_fg_expand]: 2.54001e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 
2.65002e-06 [after_resolve]: 1.112e-05 [a_after_grad]: 9.12999e-06 [renormalize]: 0.00046211 [add_forward_monad_depend]: 5.40999e-06 [auto_monad_grad]: 2.39999e-06 [auto_monad_eliminator]: 1.52e-05 [cse]: 3.012e-05 [a_3]: 4.268e-05 [Cycle 2]: 0.00065659, [45] [expand_dump_flag]: 1.14998e-06 [switch_simplify]: 7.26001e-06 [loop_unroll]: 5.77001e-06 [a_1]: 0.0001295 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 0.00010306 [accelerated_algorithm]: 6.08998e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.61998e-06 [merge_send_recv]: 5.05999e-06 [auto_parallel]: 5.45001e-06 [parallel]: 4.71002e-06 [flash_sp]: 3.39001e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 6.06e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.56e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.20001e-06 [merge_forward]: 3.06001e-06 [cell_reuse_recompute_pass]: 1.67001e-06 [offload_activation]: 6.58e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.019e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.3e-06 [set_forward_comm_id_for_comm_node_pass]: 3.67002e-06 [meta_fg_expand]: 1.86003e-06 [flash_sp_send_recv_attached]: 1.28002e-06 [receive_attached]: 1.37e-06 [after_resolve]: 9.62001e-06 [a_after_grad]: 8.47e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.37e-06 [auto_monad_grad]: 1.29998e-06 [auto_monad_eliminator]: 6.91001e-06 [cse]: 1.417e-05 [a_3]: 3.28e-05 [py_interpret_to_execute_after_opt_a]: 8.32998e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.613e-05 [convert_after_rewriter]: 6.94999e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00050984 [opt_b]: 0.00019258, [1] [Cycle 1]: 
0.00018586, [7] [b_1]: 0.0001127 [b_2]: 8.00999e-06 [updatestate_depend_eliminate]: 6.48e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.44999e-06 [renormalize]: 3.50003e-07 [cse]: 1.686e-05 [optimize_parallel_all_gather_comm]: 1.661e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.691e-05 [loop_unroll]: 0.00043241 [opt_after_cconv]: 0.00010063, [1] [Cycle 1]: 9.487e-05, [7] [c_1]: 2.956e-05 [parameter_eliminate]: 2.69999e-06 [updatestate_depend_eliminate]: 5.54998e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.726e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.33e-05 [tuple_transform]: 7.328e-05, [1] [Cycle 1]: 6.892e-05, [4] [d_1]: 4.185e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.56999e-06 [partial_unused_args_eliminate]: 2.25002e-06 [add_recomputation]: 4.77e-05 [cse_after_recomputation]: 2.074e-05, [1] [Cycle 1]: 1.612e-05, [1] [cse]: 1.089e-05 [environ_conv]: 4.94e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4.53999e-06 [label_fine_grained_interleaved_index]: 3.09001e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.77002e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.79002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.50002e-06 [overlap_grad_ring_attention]: 4.14002e-06 [overlap_grad_flash_sp]: 1.859e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.78e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 7.227e-05, [1] [Cycle 1]: 6.82e-05, [6] [build]: 3.13e-06 [elim_shapecalc]: 8.97999e-06 [elim_not_effective]: 1.247e-05 [opt_reshape]: 6.53998e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.686e-05 [get_jit_bprop_graph]: 2.24001e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00047692 [validate]: 3.707e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0271271 [execute]: 8.54002e-06 Sums bootstrap : 0.000498s : 1.36% type_inference : 0.004978s : 13.63% event_method : 0.000011s : 0.03% auto_monad : 0.000058s : 0.16% graph_reusing : 0.000006s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000025s : 0.07% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.05% optimize.rewriter_before_opt_a : 0.000043s : 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000033s : 0.09% optimize.opt_a.loop_unroll : 0.000019s : 0.05% optimize.opt_a.a_1 : 0.000445s : 1.22% optimize.opt_a.with_stream_mark : 0.000025s : 0.07% optimize.opt_a.recompute_prepare : 0.000014s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.02% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000183s : 0.50% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.04% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.03% optimize.opt_a.merge_send_recv : 0.000013s : 0.04% optimize.opt_a.auto_parallel : 0.000013s : 0.04% optimize.opt_a.parallel : 0.000024s : 0.07% optimize.opt_a.flash_sp : 0.000012s : 0.03% optimize.opt_a.merge_comm : 0.000007s : 0.02% optimize.opt_a.allreduce_fusion : 0.000006s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.04% optimize.opt_a.virtual_dataset : 0.000011s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.03% optimize.opt_a.virtual_output : 0.000011s : 0.03% optimize.opt_a.merge_forward : 0.000007s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.02% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.06% optimize.opt_a.a_after_grad : 0.000018s : 0.05% optimize.opt_a.renormalize : 0.000462s : 1.27% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.02% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.06% optimize.opt_a.cse : 0.000044s : 0.12% optimize.opt_a.a_3 : 0.000075s : 0.21% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.10% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000510s : 1.40% optimize.opt_b.b_1 : 0.000113s : 0.31% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.05% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.07% optimize.loop_unroll : 0.000432s : 1.18% optimize.opt_after_cconv.c_1 : 0.000030s : 0.08% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.05% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.04% optimize.tuple_transform.d_1 : 0.000042s : 0.11% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.13% optimize.cse_after_recomputation.cse : 0.000011s : 0.03% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.05% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.05% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000477s : 1.31% validate : 0.000037s : 0.10% backend_pass : 0.000001s : 0.00% task_emit : 0.027127s : 74.28% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000137 26 17.37% : 0.000024s : 4: substitution.arithmetic_simplify 1.36% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 4.06% : 0.000006s : 4: substitution.graph_param_transform 67.43% : 0.000092s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.38% : 0.000005s : 4: substitution.remove_not_recompute_node 3.27% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004933 2 91.99% : 0.004538s : 1: type_inference.infer 8.01% : 0.000395s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000091 2 100.00% : 0.000091s : 2: match.inline ------[predicate.] 
0.000144 984 0.95% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.51% : 0.000004s : 17: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.07% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.40% : 0.000001s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.95% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 6.23% : 0.000009s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.50% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.50% : 0.000004s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.32% : 0.000002s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.62% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.30% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.48% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000279 6 41.40% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.60% : 0.000164s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.049529 196 0.01% : 0.000004s : 1: ForceFp32Comm 6.58% : 0.003260s : 1: add_attr 6.56% : 0.003249s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.13% : 0.000063s : 1: auto_monad 0.04% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 1.08% : 0.000535s : 1: bootstrap 0.06% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.05% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.03% : 0.000017s : 1: event_method 0.03% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 0.89% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.05% : 0.000520s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000014s : 1: opt.transform.mutable_eliminate 1.63% : 0.000809s : 78: opt.transform.opt_a 0.06% : 0.000028s : 1: opt.transform.opt_after_cconv 0.05% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.19% : 0.000095s : 28: opt.transform.opt_b 0.09% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000034s : 4: opt.transform.symbol_engine_opt 4.24% : 0.002101s : 1: opt_a 0.21% : 0.000104s : 1: opt_after_cconv 0.99% : 0.000488s : 1: opt_after_jit_grad 0.40% : 0.000196s : 1: opt_b 8.18% : 0.004051s : 1: optimize 0.04% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.04% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.06% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000022s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000017s : 1: remove_dup_value 0.53% : 0.000260s : 1: renormalize.infer 0.39% : 0.000194s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000040s : 1: rewriter_after_opt_a 0.09% : 0.000046s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000075s : 1: symbol_engine_optimizer 54.81% : 0.027146s : 1: task_emit 0.15% : 0.000076s : 1: 
tuple_transform 10.09% : 0.004996s : 1: type_inference 0.14% : 0.000068s : 1: validate TotalTime = 0.0657091, [24] [bootstrap]: 0.00052694 [type_inference]: 0.0312678 [event_method]: 1.74e-05 [auto_monad]: 6.218e-05 [graph_reusing]: 6.34001e-06 [inline]: 2.43002e-06 [add_attr]: 0.00379116, [1] [add_attr_with_inline]: 0.00377894, [1] [Cycle 1]: 7.914e-05, [2] [tag_attr]: 2.074e-05 [meta_addattr_fg_expand]: 5.13002e-06 [parallel-infer-symbol]: 3.73999e-06 [pre_auto_parallel]: 3.448e-05 [insert-virtual-dataset]: 3.08e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.15002e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.0214744, [53] [py_interpret_to_execute]: 2.73e-05 [rewriter_before_opt_a]: 6.686e-05 [opt_a]: 0.0190029, [2] [Cycle 1]: 0.0182648, [45] [expand_dump_flag]: 3.11999e-06 [switch_simplify]: 3.615e-05 [loop_unroll]: 2.245e-05 [a_1]: 0.00049843 [with_stream_mark]: 1.862e-05 [recompute_prepare]: 9.16002e-06 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.82999e-06 [a_2]: 7.908e-05 [accelerated_algorithm]: 7.3e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 6.36998e-06 [merge_send_recv]: 8.31002e-06 [auto_parallel]: 7.5e-06 [parallel]: 1.976e-05 [flash_sp]: 8.37e-06 [merge_comm]: 3.41001e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 1.027e-05 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 7.86001e-06 [virtual_dataset]: 5.90002e-06 [get_grad_eliminate_]: 5.39998e-06 [virtual_output]: 5.46e-06 [merge_forward]: 4.11001e-06 [cell_reuse_recompute_pass]: 1.84e-06 [offload_activation]: 1.029e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.194e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 2.82002e-06 
[after_resolve]: 1.087e-05 [a_after_grad]: 9.27001e-06 [renormalize]: 0.0169742 [add_forward_monad_depend]: 1.229e-05 [auto_monad_grad]: 2.65997e-06 [auto_monad_eliminator]: 2.677e-05 [cse]: 3.419e-05 [a_3]: 6.054e-05 [Cycle 2]: 0.00072231, [45] [expand_dump_flag]: 3.21999e-06 [switch_simplify]: 9.42999e-06 [loop_unroll]: 6.18998e-06 [a_1]: 0.00014908 [with_stream_mark]: 1.794e-05 [recompute_prepare]: 6.33e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.45003e-06 [updatestate_loads_eliminate]: 3.44001e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.126e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 3.04999e-06 [meta_shard_fg_expand]: 2.38998e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 8.95999e-06 [auto_parallel]: 1.001e-05 [parallel]: 9.67001e-06 [flash_sp]: 4.38999e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 1.149e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 6.88e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.76998e-06 [virtual_output]: 5.12e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 3.26999e-06 [offload_activation]: 1.073e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.258e-05 [merge_recompute_call_nodes]: 1.84998e-06 [before_grad]: 9.07001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.12e-06 [meta_fg_expand]: 2.53e-06 [flash_sp_send_recv_attached]: 1.89999e-06 [receive_attached]: 3.11001e-06 [after_resolve]: 1.273e-05 [a_after_grad]: 8.25e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.45002e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 8.36002e-06 [cse]: 1.68e-05 [a_3]: 3.409e-05 [py_interpret_to_execute_after_opt_a]: 2.084e-05 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 4.407e-05 [convert_after_rewriter]: 8.04002e-06 [order_py_execute_after_rewriter]: 5.35001e-06 [mutable_eliminate]: 0.00083534 [opt_b]: 0.000205, [1] [Cycle 1]: 0.00019694, [7] [b_1]: 
0.00011323 [b_2]: 6.94001e-06 [updatestate_depend_eliminate]: 8.77999e-06 [updatestate_assign_eliminate]: 2.73998e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 7.99977e-07 [cse]: 2.45e-05 [optimize_parallel_all_gather_comm]: 2.209e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 3.429e-05 [loop_unroll]: 0.00050095 [opt_after_cconv]: 0.00010405, [1] [Cycle 1]: 9.746e-05, [7] [c_1]: 2.928e-05 [parameter_eliminate]: 4.3e-06 [updatestate_depend_eliminate]: 5.93002e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.08998e-06 [cse]: 1.855e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.485e-05 [tuple_transform]: 7.56e-05, [1] [Cycle 1]: 7.047e-05, [4] [d_1]: 4.312e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.83998e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 5.315e-05 [cse_after_recomputation]: 2.328e-05, [1] [Cycle 1]: 1.776e-05, [1] [cse]: 1.174e-05 [environ_conv]: 5.61e-06 [swap_dp_allreduce_reducescatter]: 5.59998e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 5.61e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.65001e-06 [slice_recompute_activation]: 2.35997e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 3.13e-06 [reorder_send_recv_between_fp_bp]: 2.94001e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 1.16002e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.324e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.42001e-06 [overlap_grad_ring_attention]: 4.4e-06 [overlap_grad_flash_sp]: 2.288e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 7.719e-05, [1] [Cycle 1]: 7.192e-05, [6] [build]: 3.85e-06 [elim_shapecalc]: 1.014e-05 [elim_not_effective]: 1.186e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 9.11002e-06 [renormalize]: 1.60013e-07 [detach_backward]: 2.78e-06 [pipeline_parallel_scheduler]: 1.89e-06 [auto_monad_reorder]: 1.686e-05 [get_jit_bprop_graph]: 2.12999e-06 [rewriter_after_jit_bprop_graph]: 5.32001e-06 [opt_after_jit_grad]: 0.00061753 [validate]: 4.346e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00731153 [execute]: 1.167e-05 Sums bootstrap : 0.000527s : 0.87% type_inference : 0.031268s : 51.67% event_method : 0.000017s : 0.03% auto_monad : 0.000062s : 0.10% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000034s : 0.06% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000027s : 0.05% optimize.rewriter_before_opt_a : 0.000067s : 0.11% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000046s : 0.08% optimize.opt_a.loop_unroll : 0.000029s : 0.05% optimize.opt_a.a_1 : 0.000648s : 1.07% optimize.opt_a.with_stream_mark : 0.000037s : 0.06% optimize.opt_a.recompute_prepare : 0.000015s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000150s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.02% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000017s : 0.03% optimize.opt_a.auto_parallel : 0.000018s : 0.03% optimize.opt_a.parallel : 0.000029s : 0.05% optimize.opt_a.flash_sp : 0.000013s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000022s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000009s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000021s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000024s : 0.04% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.016974s : 28.05% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000035s : 0.06% optimize.opt_a.cse : 0.000051s : 0.08% optimize.opt_a.a_3 : 0.000095s : 0.16% optimize.py_interpret_to_execute_after_opt_a : 0.000021s : 0.03% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000044s : 0.07% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000835s : 1.38% optimize.opt_b.b_1 : 0.000113s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.04% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000034s : 0.06% optimize.loop_unroll : 0.000501s : 0.83% optimize.opt_after_cconv.c_1 : 0.000029s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000043s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00%
auto_monad_reorder : 0.000017s : 0.03%
get_jit_bprop_graph : 0.000002s : 0.00%
rewriter_after_jit_bprop_graph : 0.000005s : 0.01%
opt_after_jit_grad : 0.000618s : 1.02%
validate : 0.000043s : 0.07%
backend_pass : 0.000001s : 0.00%
task_emit : 0.007312s : 12.08%
execute : 0.000012s : 0.02%

Time group info:
------[substitution.] 0.000213 30
    17.16% : 0.000037s : 5: substitution.arithmetic_simplify
    0.82% : 0.000002s : 2: substitution.elim_not_effective
    0.68% : 0.000001s : 2: substitution.fold_const_symbol
    2.81% : 0.000006s : 4: substitution.graph_param_transform
    65.65% : 0.000140s : 3: substitution.inline
    1.81% : 0.000004s : 4: substitution.j_node_and_user_rematch
    2.70% : 0.000006s : 4: substitution.remove_not_recompute_node
    2.91% : 0.000006s : 4: substitution.replace_old_param
    5.46% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.031209 2
    97.99% : 0.030582s : 1: type_inference.infer
    2.01% : 0.000627s : 1: type_inference.specialize
------[replace.] 0.000043 5
    70.90% : 0.000031s : 3: replace.inline
    29.10% : 0.000013s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000148 5
    92.89% : 0.000138s : 3: match.inline
    7.11% : 0.000011s : 2: match.tuple_list_get_item_eliminator
------[predicate.] 
0.000181 1131 0.87% : 0.000002s : 11: predicate.accumulaten_eliminater 1.04% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.50% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000002s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.44% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.56% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.79% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000001s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.04% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.23% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.22% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 0.95% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.06% : 0.000004s : 16: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.77% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.60% : 0.000001s : 8: predicate.incorporate_call 0.50% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000011s : 51: predicate.inline 0.89% : 0.000002s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000002s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.19% : 0.000004s : 32: predicate.load_eliminater 1.75% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.06% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.59% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 11: predicate.minmaximum_grad 2.44% : 0.000004s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.69% : 0.000003s : 16: predicate.partial_defer_inline 1.27% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000002s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.30% : 0.000002s : 11: predicate.reduce_eliminate 2.19% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 8: predicate.remove_not_recompute_node 1.58% : 0.000003s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.98% : 0.000002s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 1.13% : 0.000002s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000002s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 1.24% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.22% : 0.000002s : 16: predicate.switch_defer_inline 1.84% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.75% : 0.000009s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000002s : 11: predicate.transpose_eliminate 1.44% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.08% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.82% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.016567 8 1.13% : 0.000188s : 3: func_graph_cloner_run.FuncGraphClonerGraph 98.87% : 0.016379s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109154 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.48% : 0.003798s : 1: add_attr 3.47% : 0.003784s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000068s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000565s : 1: bootstrap 0.04% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000025s : 1: event_method 0.02% : 0.000021s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.47% : 0.000511s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000850s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000021s : 1: opt.transform.mutable_eliminate 0.96% : 0.001050s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 17.41% : 0.019007s : 1: opt_a 0.10% : 0.000108s : 1: opt_after_cconv 0.58% : 0.000630s : 1: opt_after_jit_grad 0.19% : 0.000209s : 1: opt_b 19.68% : 0.021481s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000039s : 1: pre_auto_parallel 0.03% : 0.000031s : 1: py_interpret_to_execute 0.02% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.35% : 0.000378s : 1: renormalize.infer 15.19% : 0.016583s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000049s : 1: rewriter_after_opt_a 0.07% : 0.000073s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000080s : 1: symbol_engine_optimizer 6.94% : 0.007573s : 1: task_emit 0.07% : 0.000079s : 1: 
tuple_transform 28.67% : 0.031290s : 1: type_inference 0.08% : 0.000082s : 1: validate TotalTime = 0.0441622, [24] [bootstrap]: 0.00050899 [type_inference]: 0.0129234 [event_method]: 5.645e-05 [auto_monad]: 0.00013233 [graph_reusing]: 8.79e-06 [inline]: 2.25002e-06 [add_attr]: 0.00354905, [1] [add_attr_with_inline]: 0.00353956, [1] [Cycle 1]: 8.215e-05, [2] [tag_attr]: 3.666e-05 [meta_addattr_fg_expand]: 9.83002e-06 [parallel-infer-symbol]: 3.22002e-06 [pre_auto_parallel]: 5.326e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.23998e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.0164202, [53] [py_interpret_to_execute]: 4.072e-05 [rewriter_before_opt_a]: 0.00015571 [opt_a]: 0.0133324, [3] [Cycle 1]: 0.0087224, [45] [expand_dump_flag]: 4.22e-06 [switch_simplify]: 7.319e-05 [loop_unroll]: 6.275e-05 [a_1]: 0.00146406 [with_stream_mark]: 2.56e-05 [recompute_prepare]: 2.31e-05 [updatestate_depend_eliminate]: 1.028e-05 [updatestate_assign_eliminate]: 8.00999e-06 [updatestate_loads_eliminate]: 7.58001e-06 [parameter_eliminate]: 2.81e-06 [a_2]: 0.00024726 [accelerated_algorithm]: 3.203e-05 [shard]: 1.95001e-06 [meta_shard_fg_expand]: 3.76001e-06 [shard_inline]: 1.636e-05 [merge_send_recv]: 1.85e-05 [auto_parallel]: 1.231e-05 [parallel]: 2.099e-05 [flash_sp]: 1.325e-05 [merge_comm]: 1.038e-05 [allreduce_fusion]: 9.37001e-06 [matmul_add_comm_reduction]: 3.162e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 1.923e-05 [virtual_dataset]: 1.598e-05 [get_grad_eliminate_]: 1.557e-05 [virtual_output]: 1.556e-05 [merge_forward]: 1.011e-05 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.967e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.005e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 7.156e-05 [set_forward_comm_id_for_comm_node_pass]: 1.063e-05 [meta_fg_expand]: 0.00186368 [flash_sp_send_recv_attached]: 5.01002e-06 [receive_attached]: 2.41998e-06 [after_resolve]: 
6.405e-05 [a_after_grad]: 8.64e-05 [renormalize]: 0.00347584 [add_forward_monad_depend]: 1.159e-05 [auto_monad_grad]: 7.24001e-06 [auto_monad_eliminator]: 6.009e-05 [cse]: 0.00018065 [a_3]: 0.0003506 [Cycle 2]: 0.00362779, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 4.854e-05 [loop_unroll]: 4.516e-05 [a_1]: 0.0016739 [with_stream_mark]: 1.654e-05 [recompute_prepare]: 1.172e-05 [updatestate_depend_eliminate]: 6.44999e-06 [updatestate_assign_eliminate]: 5.10001e-06 [updatestate_loads_eliminate]: 4.20999e-06 [parameter_eliminate]: 2.03002e-06 [a_2]: 0.00013794 [accelerated_algorithm]: 1.332e-05 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 2.79999e-06 [shard_inline]: 9.70002e-06 [merge_send_recv]: 8.20999e-06 [auto_parallel]: 9.24e-06 [parallel]: 6.80998e-06 [flash_sp]: 3.12002e-06 [merge_comm]: 5.82999e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 8.23999e-06 [allreduce_slice_to_reducescatter]: 7.40023e-07 [virtual_shard_identity]: 1.067e-05 [virtual_dataset]: 9.55001e-06 [get_grad_eliminate_]: 9.79e-06 [virtual_output]: 8.77e-06 [merge_forward]: 5.24998e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.101e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.679e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.571e-05 [set_forward_comm_id_for_comm_node_pass]: 6.29001e-06 [meta_fg_expand]: 0.00011455 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.49e-06 [after_resolve]: 1.821e-05 [a_after_grad]: 1.548e-05 [renormalize]: 0.00093652 [add_forward_monad_depend]: 5.82999e-06 [auto_monad_grad]: 2.81e-06 [auto_monad_eliminator]: 1.905e-05 [cse]: 5.994e-05 [a_3]: 6.946e-05 [Cycle 3]: 0.0009615, [45] [expand_dump_flag]: 1.89e-06 [switch_simplify]: 1.091e-05 [loop_unroll]: 9.47999e-06 [a_1]: 0.00025609 [with_stream_mark]: 1.238e-05 [recompute_prepare]: 1.022e-05 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 4.44002e-06 [updatestate_loads_eliminate]: 4.10998e-06 
[parameter_eliminate]: 1.24e-06 [a_2]: 0.00013726 [accelerated_algorithm]: 1.322e-05 [shard]: 1.27999e-06 [meta_shard_fg_expand]: 1.99e-06 [shard_inline]: 9.33002e-06 [merge_send_recv]: 8.04002e-06 [auto_parallel]: 8.21002e-06 [parallel]: 6.12001e-06 [flash_sp]: 1.22e-06 [merge_comm]: 5.29998e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 8.86002e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.031e-05 [virtual_dataset]: 8.89e-06 [get_grad_eliminate_]: 8.74e-06 [virtual_output]: 8.55001e-06 [merge_forward]: 4.79e-06 [cell_reuse_recompute_pass]: 2.14e-06 [offload_activation]: 9.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.613e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 1.489e-05 [set_forward_comm_id_for_comm_node_pass]: 5.81998e-06 [meta_fg_expand]: 4.33999e-06 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.502e-05 [a_after_grad]: 1.53e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.89e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 3.044e-05 [a_3]: 6.131e-05 [py_interpret_to_execute_after_opt_a]: 1.768e-05 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 5.595e-05 [convert_after_rewriter]: 9.74999e-06 [order_py_execute_after_rewriter]: 7.05e-06 [mutable_eliminate]: 0.00076676 [opt_b]: 0.00032261, [1] [Cycle 1]: 0.00031433, [7] [b_1]: 0.00021243 [b_2]: 1.181e-05 [updatestate_depend_eliminate]: 8.3e-06 [updatestate_assign_eliminate]: 4.37998e-06 [updatestate_loads_eliminate]: 4.3e-06 [renormalize]: 4.19997e-07 [cse]: 3.471e-05 [optimize_parallel_all_gather_comm]: 2.827e-05 [overlap_param_gather]: 2.33002e-06 [cconv]: 3.122e-05 [loop_unroll]: 0.00048689 [opt_after_cconv]: 0.00014473, [1] [Cycle 1]: 0.00013768, [7] [c_1]: 4.983e-05 [parameter_eliminate]: 3.25e-06 [updatestate_depend_eliminate]: 7.45e-06 [updatestate_assign_eliminate]: 4.26001e-06 [updatestate_loads_eliminate]: 
3.89002e-06 [cse]: 3.32e-05 [renormalize]: 9.09989e-07 [remove_dup_value]: 4.359e-05 [tuple_transform]: 0.00010658, [1] [Cycle 1]: 0.00010161, [4] [d_1]: 7.059e-05 [none_parameter_eliminate]: 1.85001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.004e-05 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 6.79e-05 [cse_after_recomputation]: 3.698e-05, [1] [Cycle 1]: 3.172e-05, [1] [cse]: 2.516e-05 [environ_conv]: 1.019e-05 [swap_dp_allreduce_reducescatter]: 9.34e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.92e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.55002e-06 [reorder_send_recv_between_fp_bp]: 3.28998e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.35999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.99e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 5.91e-06 [overlap_recompute_and_grad_model_parallel]: 5.71998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 5.51e-06 [overlap_grad_flash_sp]: 2.956e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 0.00010384, [1] [Cycle 1]: 9.917e-05, [6] [build]: 7.89002e-06 [elim_shapecalc]: 1.542e-05 [elim_not_effective]: 1.875e-05 [opt_reshape]: 1.167e-05 [fold_const_symbol]: 1.559e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.58e-06 
[pipeline_parallel_scheduler]: 1.74998e-06 [auto_monad_reorder]: 2.796e-05 [get_jit_bprop_graph]: 1.86003e-06 [rewriter_after_jit_bprop_graph]: 4.00998e-06 [opt_after_jit_grad]: 0.00049756 [validate]: 5.008e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.00966747 [execute]: 1.005e-05 Sums bootstrap : 0.000509s : 1.31% type_inference : 0.012923s : 33.17% event_method : 0.000056s : 0.14% auto_monad : 0.000132s : 0.34% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000053s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.10% optimize.rewriter_before_opt_a : 0.000156s : 0.40% optimize.opt_a.expand_dump_flag : 0.000009s : 0.02% optimize.opt_a.switch_simplify : 0.000133s : 0.34% optimize.opt_a.loop_unroll : 0.000117s : 0.30% optimize.opt_a.a_1 : 0.003394s : 8.71% optimize.opt_a.with_stream_mark : 0.000055s : 0.14% optimize.opt_a.recompute_prepare : 0.000045s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.04% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000522s : 1.34% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.15% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.09% optimize.opt_a.merge_send_recv : 0.000035s : 0.09% optimize.opt_a.auto_parallel : 0.000030s : 0.08% optimize.opt_a.parallel : 0.000034s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000022s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.10% optimize.opt_a.virtual_dataset : 0.000034s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.09% optimize.opt_a.virtual_output : 0.000033s : 0.08% optimize.opt_a.merge_forward : 0.000020s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000041s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000102s : 0.26% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.06% optimize.opt_a.meta_fg_expand : 0.001983s : 5.09% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000097s : 0.25% optimize.opt_a.a_after_grad : 0.000117s : 0.30% optimize.opt_a.renormalize : 0.004412s : 11.33% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.05% optimize.opt_a.auto_monad_grad : 0.000012s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.24% optimize.opt_a.cse : 0.000271s : 0.70% optimize.opt_a.a_3 : 0.000481s : 1.24% optimize.py_interpret_to_execute_after_opt_a : 0.000018s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000056s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000767s : 1.97% optimize.opt_b.b_1 : 0.000212s : 0.55% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000035s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000028s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000031s : 0.08% optimize.loop_unroll : 0.000487s : 1.25% optimize.opt_after_cconv.c_1 : 0.000050s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000033s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000044s : 0.11% optimize.tuple_transform.d_1 : 0.000071s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000068s : 0.17% optimize.cse_after_recomputation.cse : 0.000025s : 0.06% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000030s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000028s : 0.07% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000498s : 1.28% validate : 0.000050s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.009667s : 24.81% execute : 0.000010s : 0.03% Time group info: ------[substitution.] 
0.000835 222 5.90% : 0.000049s : 12: substitution.arithmetic_simplify 2.00% : 0.000017s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000003s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 57.09% : 0.000477s : 17: substitution.inline 2.17% : 0.000018s : 2: substitution.inline_without_move 1.34% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000017s : 3: substitution.less_batch_normalization 1.73% : 0.000014s : 11: substitution.minmaximum_grad 0.65% : 0.000005s : 5: substitution.partial_eliminate 1.62% : 0.000014s : 20: substitution.remove_not_recompute_node 3.13% : 0.000026s : 10: substitution.replace_applicator 1.39% : 0.000012s : 15: substitution.replace_old_param 0.32% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.43% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.59% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.11% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.10% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.13% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012834 2 80.29% : 0.010305s : 1: type_inference.infer 19.71% : 0.002529s : 1: type_inference.specialize ------[replace.] 0.000234 33 57.50% : 0.000134s : 17: replace.inline 42.50% : 0.000099s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000500 33 93.53% : 0.000468s : 17: match.inline 6.47% : 0.000032s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000832 5764 1.00% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.45% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000009s : 68: predicate.addn_zero_filter 0.99% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000017s : 100: predicate.arithmetic_simplify 1.09% : 0.000009s : 68: predicate.cast_eliminate 1.06% : 0.000009s : 68: predicate.check_bprop_eliminate 0.47% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.09% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.10% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.03% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.08% : 0.000009s : 76: predicate.environ_get_depend_swap 1.65% : 0.000014s : 108: predicate.environ_get_eliminate 1.08% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.58% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.12% : 0.000018s : 101: predicate.float_depend_g_call 0.47% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.50% : 0.000004s : 32: predicate.incorporate_call 0.45% : 0.000004s : 32: predicate.incorporate_call_switch 5.19% : 0.000043s : 249: predicate.inline 1.20% : 0.000010s : 55: predicate.inline_without_move 5.41% : 0.000045s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.48% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.46% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.15% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.03% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.05% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.00% : 0.000008s : 68: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.93% : 0.000016s : 101: predicate.partial_defer_inline 1.60% : 0.000013s : 92: predicate.partial_eliminate 0.97% : 0.000008s : 68: predicate.print_const_string_wrapper 0.50% : 0.000004s : 32: predicate.reduce_all_const_elim 1.25% : 0.000010s : 68: predicate.reduce_eliminate 2.45% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000003s : 32: predicate.remove_not_recompute_node 1.75% : 0.000015s : 152: predicate.replace_applicator 0.57% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.02% : 0.000008s : 68: predicate.reshape_eliminate 1.04% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000011s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000003s : 16: predicate.special_op_eliminate 0.57% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.71% : 0.000014s : 101: predicate.switch_defer_inline 2.72% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.58% : 
0.000038s : 277: predicate.switch_simplify 0.99% : 0.000008s : 68: predicate.tile_eliminate 0.99% : 0.000008s : 68: predicate.transpose_eliminate 1.40% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.40% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 3.23% : 0.000027s : 116: predicate.tuple_list_set_item_eliminator 1.51% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.42% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 2.99% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.52% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 32: predicate.virtual_output_eliminate 0.12% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002333 34 46.53% : 0.001086s : 13: func_graph_cloner_run.FuncGraphClonerGraph 53.47% : 0.001248s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073786 237 0.00% : 0.000004s : 1: ForceFp32Comm 4.82% : 0.003554s : 1: add_attr 4.80% : 0.003544s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000072s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000140s : 1: auto_monad 0.04% : 0.000032s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.72% : 0.000533s : 1: bootstrap 0.05% : 0.000035s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000023s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000040s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.09% : 0.000065s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.67% : 0.000496s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 1.05% : 0.000777s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.02% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000020s : 1: opt.transform.mutable_eliminate 7.02% : 0.005177s : 117: opt.transform.opt_a 0.07% : 0.000049s : 1: opt.transform.opt_after_cconv 0.05% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000196s : 28: opt.transform.opt_b 0.11% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000058s : 4: opt.transform.symbol_engine_opt 18.07% : 0.013336s : 1: opt_a 0.20% : 0.000148s : 1: opt_after_cconv 0.69% : 0.000507s : 1: opt_after_jit_grad 0.44% : 0.000326s : 1: opt_b 22.26% : 0.016426s : 1: optimize 0.05% : 0.000038s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000033s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000058s : 1: pre_auto_parallel 0.06% : 0.000045s : 1: py_interpret_to_execute 0.03% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000048s : 1: remove_dup_value 3.08% : 0.002275s : 2: renormalize.infer 2.87% : 0.002121s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000060s : 1: rewriter_after_opt_a 0.22% : 0.000161s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.14% : 0.000106s : 1: symbol_engine_optimizer 13.12% : 0.009682s : 1: task_emit 0.15% : 0.000110s : 1: 
tuple_transform 17.55% : 0.012946s : 1: type_inference 0.13% : 0.000093s : 1: validate TotalTime = 0.0204753, [24] [bootstrap]: 0.00048083 [type_inference]: 0.00478704 [event_method]: 1.122e-05 [auto_monad]: 5.449e-05 [graph_reusing]: 5.75001e-06 [inline]: 2.31e-06 [add_attr]: 0.00322911, [1] [add_attr_with_inline]: 0.00321897, [1] [Cycle 1]: 4.888e-05, [2] [tag_attr]: 1.349e-05 [meta_addattr_fg_expand]: 3.06999e-06 [parallel-infer-symbol]: 3.10998e-06 [pre_auto_parallel]: 2.79e-05 [insert-virtual-dataset]: 2.77002e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 2.27999e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00403488, [53] [py_interpret_to_execute]: 1.699e-05 [rewriter_before_opt_a]: 4.028e-05 [opt_a]: 0.00206947, [2] [Cycle 1]: 0.00144363, [45] [expand_dump_flag]: 3.36001e-06 [switch_simplify]: 2.534e-05 [loop_unroll]: 1.417e-05 [a_1]: 0.00031411 [with_stream_mark]: 1.6e-05 [recompute_prepare]: 7.87998e-06 [updatestate_depend_eliminate]: 4.73001e-06 [updatestate_assign_eliminate]: 3.47002e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 2.22999e-06 [a_2]: 8.099e-05 [accelerated_algorithm]: 6.73e-06 [shard]: 2.94999e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 6.08998e-06 [merge_send_recv]: 8.27003e-06 [auto_parallel]: 7.11999e-06 [parallel]: 1.985e-05 [flash_sp]: 8.33999e-06 [merge_comm]: 4.08001e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 9.47001e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.85998e-06 [virtual_dataset]: 6.01998e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.87001e-06 [merge_forward]: 4.53001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 1.012e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.193e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 1.019e-05 [set_forward_comm_id_for_comm_node_pass]: 3.83999e-06 [meta_fg_expand]: 2.52001e-06 [flash_sp_send_recv_attached]: 2.68e-06 [receive_attached]: 
2.82002e-06 [after_resolve]: 1.019e-05 [a_after_grad]: 9.31e-06 [renormalize]: 0.00046949 [add_forward_monad_depend]: 5.32999e-06 [auto_monad_grad]: 2.10002e-06 [auto_monad_eliminator]: 1.52e-05 [cse]: 2.923e-05 [a_3]: 4.208e-05 [Cycle 2]: 0.00061599, [45] [expand_dump_flag]: 1.60999e-06 [switch_simplify]: 7.44002e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.0001283 [with_stream_mark]: 1.317e-05 [recompute_prepare]: 5.90002e-06 [updatestate_depend_eliminate]: 3.03998e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.04998e-06 [a_2]: 6.924e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.11997e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 4.57998e-06 [auto_parallel]: 6.24001e-06 [parallel]: 5.00999e-06 [flash_sp]: 3.84002e-06 [merge_comm]: 2.93998e-06 [allreduce_fusion]: 2.89999e-06 [matmul_add_comm_reduction]: 5.52001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.35002e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 5.06002e-06 [virtual_output]: 4.82e-06 [merge_forward]: 2.51998e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 7.13998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.015e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.23999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.76998e-06 [after_resolve]: 9.49999e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.72001e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 7.15e-06 [cse]: 1.385e-05 [a_3]: 3.248e-05 [py_interpret_to_execute_after_opt_a]: 1.002e-05 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.375e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.69e-06 [mutable_eliminate]: 0.00053856 [opt_b]: 0.00018992, [1] [Cycle 1]: 
0.0001836, [7] [b_1]: 0.00010909 [b_2]: 8.02e-06 [updatestate_depend_eliminate]: 6.04001e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 2.80002e-06 [renormalize]: 3.89991e-07 [cse]: 1.836e-05 [optimize_parallel_all_gather_comm]: 1.852e-05 [overlap_param_gather]: 2.16998e-06 [cconv]: 2.692e-05 [loop_unroll]: 0.00042855 [opt_after_cconv]: 9.768e-05, [1] [Cycle 1]: 9.215e-05, [7] [c_1]: 2.847e-05 [parameter_eliminate]: 3.35e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.652e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.378e-05 [tuple_transform]: 7.216e-05, [1] [Cycle 1]: 6.785e-05, [4] [d_1]: 4.156e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.43998e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.772e-05 [cse_after_recomputation]: 2.04e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.08e-05 [environ_conv]: 5.76998e-06 [swap_dp_allreduce_reducescatter]: 5.12999e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.68001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.79999e-06 [reorder_send_recv_between_fp_bp]: 2.66999e-06 [comm_op_add_attrs]: 1.42999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.273e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55999e-06 [overlap_recompute_comm]: 2.67001e-06 [overlap_grad_ring_attention]: 4.06001e-06 [overlap_grad_flash_sp]: 2.001e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.81e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 7.033e-05, [1] [Cycle 1]: 6.613e-05, [6] [build]: 2.70002e-06 [elim_shapecalc]: 9.05001e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 6.33e-06 [fold_const_symbol]: 9.06002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.23002e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.638e-05 [get_jit_bprop_graph]: 1.54e-06 [rewriter_after_jit_bprop_graph]: 4.00998e-06 [opt_after_jit_grad]: 0.00048177 [validate]: 3.659e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00706167 [execute]: 8.16002e-06 Sums bootstrap : 0.000481s : 2.96% type_inference : 0.004787s : 29.47% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.25% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.20% optimize.opt_a.loop_unroll : 0.000020s : 0.12% optimize.opt_a.a_1 : 0.000442s : 2.72% optimize.opt_a.with_stream_mark : 0.000029s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000150s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000025s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000470s : 2.89% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.14% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000539s : 3.32% optimize.opt_b.b_1 : 0.000109s : 0.67% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.17% optimize.loop_unroll : 0.000429s : 2.64% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000042s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000482s : 2.97% validate : 0.000037s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.007062s : 43.47% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000137 26 17.39% : 0.000024s : 4: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000006s : 4: substitution.graph_param_transform 66.86% : 0.000092s : 2: substitution.inline 2.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000005s : 4: substitution.remove_not_recompute_node 2.90% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004737 2 91.79% : 0.004348s : 1: type_inference.infer 8.21% : 0.000389s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000090 2 100.00% : 0.000090s : 2: match.inline ------[predicate.] 
0.000144 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.45% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.40% : 0.000001s : 4: predicate.elim_not_effective 0.82% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.89% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 1.00% : 0.000001s : 8: predicate.get_grad_eliminate 0.46% : 0.000001s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000009s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.55% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.80% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 2.24% : 0.000003s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.15% : 0.000002s : 13: predicate.partial_eliminate 0.73% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.29% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 8: predicate.remove_not_recompute_node 1.26% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.51% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.67% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.10% : 0.000002s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.58% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.70% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.29% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.74% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.92% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000276 6 39.83% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.17% : 0.000166s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029173 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.08% : 0.003234s : 1: add_attr 11.05% : 0.003223s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.74% : 0.000509s : 1: bootstrap 0.11% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.50% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.88% : 0.000549s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.06% : 0.000016s : 1: opt.transform.mutable_eliminate 2.76% : 0.000805s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002073s : 1: opt_a 0.35% : 0.000101s : 1: opt_after_cconv 1.68% : 0.000492s : 1: opt_after_jit_grad 0.66% : 0.000193s : 1: opt_b 13.84% : 0.004039s : 1: optimize 0.08% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.07% : 0.000021s : 1: py_interpret_to_execute 0.05% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.90% : 0.000263s : 1: renormalize.infer 0.68% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.15% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000006s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000073s : 1: symbol_engine_optimizer 24.27% : 0.007080s : 1: task_emit 0.26% : 0.000075s : 1: 
tuple_transform 16.47% : 0.004806s : 1: type_inference 0.24% : 0.000069s : 1: validate TotalTime = 0.0434726, [24] [bootstrap]: 0.00047292 [type_inference]: 0.0113498 [event_method]: 4.323e-05 [auto_monad]: 0.0001252 [graph_reusing]: 9.16998e-06 [inline]: 2.01998e-06 [add_attr]: 0.00330066, [1] [add_attr_with_inline]: 0.00329099, [1] [Cycle 1]: 7.503e-05, [2] [tag_attr]: 3.467e-05 [meta_addattr_fg_expand]: 9.42001e-06 [parallel-infer-symbol]: 3.85e-06 [pre_auto_parallel]: 4.909e-05 [insert-virtual-dataset]: 3.25998e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.91998e-06 [pipeline_split]: 1.78997e-06 [optimize]: 0.0167222, [53] [py_interpret_to_execute]: 3.683e-05 [rewriter_before_opt_a]: 0.00013277 [opt_a]: 0.0140866, [3] [Cycle 1]: 0.0089246, [45] [expand_dump_flag]: 4.72e-06 [switch_simplify]: 6.864e-05 [loop_unroll]: 5.59e-05 [a_1]: 0.00165452 [with_stream_mark]: 3.067e-05 [recompute_prepare]: 8.627e-05 [updatestate_depend_eliminate]: 9.90002e-06 [updatestate_assign_eliminate]: 8.08999e-06 [updatestate_loads_eliminate]: 7.67998e-06 [parameter_eliminate]: 3.71999e-06 [a_2]: 0.00025255 [accelerated_algorithm]: 3.315e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 4.25e-06 [shard_inline]: 1.685e-05 [merge_send_recv]: 1.801e-05 [auto_parallel]: 1.231e-05 [parallel]: 2.069e-05 [flash_sp]: 1.353e-05 [merge_comm]: 1.017e-05 [allreduce_fusion]: 9.52001e-06 [matmul_add_comm_reduction]: 2.872e-05 [allreduce_slice_to_reducescatter]: 1.29e-06 [virtual_shard_identity]: 2.013e-05 [virtual_dataset]: 1.576e-05 [get_grad_eliminate_]: 1.575e-05 [virtual_output]: 1.59e-05 [merge_forward]: 1.131e-05 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 1.886e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.123e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 2.925e-05 [set_forward_comm_id_for_comm_node_pass]: 1.054e-05 [meta_fg_expand]: 0.00188962 [flash_sp_send_recv_attached]: 4.95001e-06 [receive_attached]: 2.76999e-06 [after_resolve]: 
6.612e-05 [a_after_grad]: 8.871e-05 [renormalize]: 0.00341355 [add_forward_monad_depend]: 1.123e-05 [auto_monad_grad]: 6.65998e-06 [auto_monad_eliminator]: 5.967e-05 [cse]: 0.00017816 [a_3]: 0.00036924 [Cycle 2]: 0.00413698, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 4.939e-05 [loop_unroll]: 4.593e-05 [a_1]: 0.00231984 [with_stream_mark]: 1.9e-05 [recompute_prepare]: 1.609e-05 [updatestate_depend_eliminate]: 6.26998e-06 [updatestate_assign_eliminate]: 5.63002e-06 [updatestate_loads_eliminate]: 5.20999e-06 [parameter_eliminate]: 2.17001e-06 [a_2]: 0.00013805 [accelerated_algorithm]: 1.398e-05 [shard]: 2.69999e-06 [meta_shard_fg_expand]: 3.20002e-06 [shard_inline]: 1.09e-05 [merge_send_recv]: 1.038e-05 [auto_parallel]: 1.1e-05 [parallel]: 9.91998e-06 [flash_sp]: 3.88999e-06 [merge_comm]: 5.92999e-06 [allreduce_fusion]: 5.29e-06 [matmul_add_comm_reduction]: 1.102e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.304e-05 [virtual_dataset]: 1.009e-05 [get_grad_eliminate_]: 9.68002e-06 [virtual_output]: 8.54e-06 [merge_forward]: 6.34001e-06 [cell_reuse_recompute_pass]: 2.32001e-06 [offload_activation]: 1.449e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.874e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 1.709e-05 [set_forward_comm_id_for_comm_node_pass]: 5.86e-06 [meta_fg_expand]: 5.023e-05 [flash_sp_send_recv_attached]: 1.29998e-06 [receive_attached]: 2.16e-06 [after_resolve]: 1.849e-05 [a_after_grad]: 1.523e-05 [renormalize]: 0.00080612 [add_forward_monad_depend]: 5.10999e-06 [auto_monad_grad]: 2.32001e-06 [auto_monad_eliminator]: 1.819e-05 [cse]: 6.08e-05 [a_3]: 7.065e-05 [Cycle 3]: 0.00100731, [45] [expand_dump_flag]: 1.84998e-06 [switch_simplify]: 1.126e-05 [loop_unroll]: 9.20999e-06 [a_1]: 0.00025824 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 9.77999e-06 [updatestate_depend_eliminate]: 5.64998e-06 [updatestate_assign_eliminate]: 4.62e-06 [updatestate_loads_eliminate]: 4.40999e-06 
[parameter_eliminate]: 1.19e-06 [a_2]: 0.00012801 [accelerated_algorithm]: 1.282e-05 [shard]: 1.34e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 8.23999e-06 [auto_parallel]: 8.73001e-06 [parallel]: 5.74999e-06 [flash_sp]: 1.04003e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 5.20001e-06 [matmul_add_comm_reduction]: 9.66e-06 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 1.16e-05 [virtual_dataset]: 8.92999e-06 [get_grad_eliminate_]: 8.94e-06 [virtual_output]: 8.60999e-06 [merge_forward]: 4.79e-06 [cell_reuse_recompute_pass]: 1.68997e-06 [offload_activation]: 1.166e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.827e-05 [merge_recompute_call_nodes]: 1.12999e-06 [before_grad]: 1.545e-05 [set_forward_comm_id_for_comm_node_pass]: 5.70001e-06 [meta_fg_expand]: 3.56999e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.72001e-06 [after_resolve]: 1.409e-05 [a_after_grad]: 1.532e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.19001e-06 [auto_monad_grad]: 1.60001e-06 [auto_monad_eliminator]: 1.297e-05 [cse]: 3.014e-05 [a_3]: 6.283e-05 [py_interpret_to_execute_after_opt_a]: 1.619e-05 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 5.521e-05 [convert_after_rewriter]: 9.72001e-06 [order_py_execute_after_rewriter]: 7.61001e-06 [mutable_eliminate]: 0.00066786 [opt_b]: 0.00031144, [1] [Cycle 1]: 0.00030355, [7] [b_1]: 0.00019901 [b_2]: 1.193e-05 [updatestate_depend_eliminate]: 8.77999e-06 [updatestate_assign_eliminate]: 4.68001e-06 [updatestate_loads_eliminate]: 4.32e-06 [renormalize]: 2.60014e-07 [cse]: 3.762e-05 [optimize_parallel_all_gather_comm]: 2.491e-05 [overlap_param_gather]: 1.98002e-06 [cconv]: 2.768e-05 [loop_unroll]: 0.00048365 [opt_after_cconv]: 0.00014387, [1] [Cycle 1]: 0.00013676, [7] [c_1]: 4.926e-05 [parameter_eliminate]: 2.46998e-06 [updatestate_depend_eliminate]: 8.08999e-06 [updatestate_assign_eliminate]: 4.73001e-06 
[updatestate_loads_eliminate]: 4.37998e-06 [cse]: 3.225e-05 [renormalize]: 4.59986e-07 [remove_dup_value]: 3.152e-05 [tuple_transform]: 0.00010657, [1] [Cycle 1]: 0.00010158, [4] [d_1]: 7.054e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.042e-05 [partial_unused_args_eliminate]: 2.26998e-06 [add_recomputation]: 6.663e-05 [cse_after_recomputation]: 3.395e-05, [1] [Cycle 1]: 2.933e-05, [1] [cse]: 2.35e-05 [environ_conv]: 1.072e-05 [swap_dp_allreduce_reducescatter]: 8.54e-06 [bias_add_comm_swap]: 2.63003e-06 [label_micro_interleaved_index]: 5.24998e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.33002e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 1.25001e-06 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.76999e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.01997e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.24998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.943e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 5.57001e-06 [overlap_recompute_and_grad_model_parallel]: 6.14001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.57001e-06 [overlap_recompute_comm]: 2.40997e-06 [overlap_grad_ring_attention]: 5.27001e-06 [overlap_grad_flash_sp]: 2.823e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010796, [1] [Cycle 1]: 0.00010318, [6] [build]: 1.262e-05 [elim_shapecalc]: 1.484e-05 [elim_not_effective]: 1.9e-05 [opt_reshape]: 1.113e-05 [fold_const_symbol]: 1.564e-05 [renormalize]: 
3.00002e-07 [detach_backward]: 2.45002e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 2.674e-05 [get_jit_bprop_graph]: 1.81e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00052804 [validate]: 4.956e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.0105255 [execute]: 9.05001e-06 Sums bootstrap : 0.000473s : 1.22% type_inference : 0.011350s : 29.30% event_method : 0.000043s : 0.11% auto_monad : 0.000125s : 0.32% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000049s : 0.13% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.10% optimize.rewriter_before_opt_a : 0.000133s : 0.34% optimize.opt_a.expand_dump_flag : 0.000009s : 0.02% optimize.opt_a.switch_simplify : 0.000129s : 0.33% optimize.opt_a.loop_unroll : 0.000111s : 0.29% optimize.opt_a.a_1 : 0.004233s : 10.93% optimize.opt_a.with_stream_mark : 0.000063s : 0.16% optimize.opt_a.recompute_prepare : 0.000112s : 0.29% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.04% optimize.opt_a.parameter_eliminate : 0.000007s : 0.02% optimize.opt_a.a_2 : 0.000519s : 1.34% optimize.opt_a.accelerated_algorithm : 0.000060s : 0.15% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000037s : 0.10% optimize.opt_a.merge_send_recv : 0.000037s : 0.09% optimize.opt_a.auto_parallel : 0.000032s : 0.08% optimize.opt_a.parallel : 0.000036s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 
0.000021s : 0.05% optimize.opt_a.allreduce_fusion : 0.000020s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000045s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.09% optimize.opt_a.virtual_output : 0.000033s : 0.09% optimize.opt_a.merge_forward : 0.000022s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.01% optimize.opt_a.offload_activation : 0.000045s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.01% optimize.opt_a.before_grad : 0.000062s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.06% optimize.opt_a.meta_fg_expand : 0.001943s : 5.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000007s : 0.02% optimize.opt_a.after_resolve : 0.000099s : 0.25% optimize.opt_a.a_after_grad : 0.000119s : 0.31% optimize.opt_a.renormalize : 0.004220s : 10.89% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.05% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000091s : 0.23% optimize.opt_a.cse : 0.000269s : 0.69% optimize.opt_a.a_3 : 0.000503s : 1.30% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000055s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000008s : 0.02% optimize.mutable_eliminate : 0.000668s : 1.72% optimize.opt_b.b_1 : 0.000199s : 0.51% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000038s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.07% optimize.loop_unroll : 0.000484s : 1.25% optimize.opt_after_cconv.c_1 : 0.000049s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.08% optimize.tuple_transform.d_1 : 0.000071s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000067s : 0.17% optimize.cse_after_recomputation.cse : 0.000024s : 0.06% optimize.environ_conv : 0.000011s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000028s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.07% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000528s : 1.36% validate : 0.000050s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.010526s : 27.17% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000954 218 6.17% : 0.000059s : 11: substitution.arithmetic_simplify 1.81% : 0.000017s : 2: substitution.cast_eliminate 0.29% : 0.000003s : 5: substitution.elim_not_effective 0.59% : 0.000006s : 5: substitution.float_depend_g_call 0.49% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.93% : 0.000009s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 58.34% : 0.000557s : 16: substitution.inline 2.02% : 0.000019s : 2: substitution.inline_without_move 1.29% : 0.000012s : 20: substitution.j_node_and_user_rematch 1.85% : 0.000018s : 3: substitution.less_batch_normalization 1.63% : 0.000016s : 11: substitution.minmaximum_grad 0.77% : 0.000007s : 5: substitution.partial_eliminate 1.54% : 0.000015s : 20: substitution.remove_not_recompute_node 3.15% : 0.000030s : 10: substitution.replace_applicator 1.25% : 0.000012s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.19% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.50% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.04% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.88% : 0.000075s : 28: substitution.tuple_list_get_item_eliminator 2.14% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011264 2 87.59% : 0.009866s : 1: type_inference.infer 12.41% : 0.001398s : 1: type_inference.specialize ------[replace.] 0.000240 30 60.65% : 0.000146s : 16: replace.inline 39.35% : 0.000095s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000583 30 93.70% : 0.000546s : 16: match.inline 6.30% : 0.000037s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000793 5663 1.37% : 0.000011s : 67: predicate.accumulaten_eliminater 0.34% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 67: predicate.addn_zero_filter 1.00% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.18% : 0.000017s : 99: predicate.arithmetic_simplify 1.12% : 0.000009s : 67: predicate.cast_eliminate 1.22% : 0.000010s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.13% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.13% : 0.000001s : 8: predicate.elim_not_effective 0.22% : 0.000002s : 8: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000010s : 75: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.69% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.59% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.43% : 0.000019s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000005s : 32: predicate.get_grad_eliminate 0.12% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.80% : 0.000046s : 244: predicate.inline 1.33% : 0.000011s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000006s : 32: predicate.less_batch_normalization 1.56% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 164: predicate.load_eliminater 0.37% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.13% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.34% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 67: predicate.minmaximum_grad 0.45% : 0.000004s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.24% : 0.000018s : 97: predicate.partial_defer_inline 1.63% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.38% : 0.000011s : 67: predicate.reduce_eliminate 2.52% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000015s : 149: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.14% : 0.000001s : 8: predicate.reset_defer_inline 1.04% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.34% : 0.000011s : 68: predicate.same_eliminate 0.40% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.72% : 0.000006s : 32: predicate.shard_identity_eliminate 0.35% : 0.000003s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.37% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.73% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000023s : 165: predicate.switch_layer_defer_inline 4.71% : 0.000037s : 265: 
predicate.switch_simplify 1.01% : 0.000008s : 67: predicate.tile_eliminate 1.04% : 0.000008s : 67: predicate.transpose_eliminate 1.53% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000023s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000012s : 83: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000017s : 115: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.47% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.11% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.22% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002011 32 53.77% : 0.001081s : 12: func_graph_cloner_run.FuncGraphClonerGraph 46.23% : 0.000930s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.074106 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.46% : 0.003305s : 1: add_attr 4.45% : 0.003295s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000072s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000133s : 1: auto_monad 0.04% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.68% : 0.000501s : 1: bootstrap 0.04% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000023s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.07% : 0.000050s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.66% : 0.000493s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.92% : 0.000678s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000022s : 1: opt.transform.mutable_eliminate 8.18% : 0.006059s : 117: opt.transform.opt_a 0.06% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000183s : 28: opt.transform.opt_b 0.11% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000057s : 4: opt.transform.symbol_engine_opt 19.01% : 0.014091s : 1: opt_a 0.20% : 0.000147s : 1: opt_after_cconv 0.73% : 0.000538s : 1: opt_after_jit_grad 0.43% : 0.000315s : 1: opt_b 22.57% : 0.016728s : 1: optimize 0.04% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000032s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000054s : 1: pre_auto_parallel 0.06% : 0.000041s : 1: py_interpret_to_execute 0.03% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000036s : 1: remove_dup_value 2.88% : 0.002132s : 2: renormalize.infer 2.79% : 0.002070s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000059s : 1: rewriter_after_opt_a 0.19% : 0.000138s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000111s : 1: symbol_engine_optimizer 14.23% : 0.010546s : 1: task_emit 0.15% : 0.000110s : 1: 
tuple_transform 15.34% : 0.011370s : 1: type_inference 0.12% : 0.000087s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x2-kbk],max_mem:4.0M TotalTime = 0.531962, [24] [bootstrap]: 0.00067922 [type_inference]: 0.00694871 [event_method]: 1.481e-05 [auto_monad]: 5.859e-05 [graph_reusing]: 6.01998e-06 [inline]: 2.61e-06 [add_attr]: 0.00392926, [1] [add_attr_with_inline]: 0.00391831, [1] [Cycle 1]: 5.527e-05, [2] [tag_attr]: 1.776e-05 [meta_addattr_fg_expand]: 4.98001e-06 [parallel-infer-symbol]: 3.36999e-06 [pre_auto_parallel]: 3.074e-05 [insert-virtual-dataset]: 2.74999e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.27999e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00434872, [53] [py_interpret_to_execute]: 2.116e-05 [rewriter_before_opt_a]: 6.299e-05 [opt_a]: 0.0024138, [2] [Cycle 1]: 0.00179811, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.357e-05 [loop_unroll]: 2.139e-05 [a_1]: 0.00054437 [with_stream_mark]: 1.57e-05 [recompute_prepare]: 8.94e-06 [updatestate_depend_eliminate]: 4.46002e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 0.00010633 [accelerated_algorithm]: 7.16001e-06 [shard]: 2.63e-06 [meta_shard_fg_expand]: 2.09e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 9.53002e-06 [auto_parallel]: 6.89999e-06 [parallel]: 2.609e-05 [flash_sp]: 8.42e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.54e-06 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 7.90998e-06 [virtual_dataset]: 6.49999e-06 [get_grad_eliminate_]: 6.41e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 3.84002e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 1.016e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.215e-05 
[merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.038e-05 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.46998e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.31e-06 [after_resolve]: 1.112e-05 [a_after_grad]: 9.24e-06 [renormalize]: 0.00050288 [add_forward_monad_depend]: 5.38002e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.453e-05 [cse]: 3.027e-05 [a_3]: 4.273e-05 [Cycle 2]: 0.00060614, [45] [expand_dump_flag]: 1.16002e-06 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012973 [with_stream_mark]: 1.11e-05 [recompute_prepare]: 5.99999e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.107e-05 [accelerated_algorithm]: 6.09999e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 4.55001e-06 [auto_parallel]: 5.69999e-06 [parallel]: 4.56002e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.27999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.99001e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.97999e-06 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 8.12998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.94e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.30001e-06 [a_after_grad]: 8.64e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.228e-05 [a_3]: 3.248e-05 [py_interpret_to_execute_after_opt_a]: 8.38999e-06 
[slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 3.287e-05 [convert_after_rewriter]: 7.6e-06 [order_py_execute_after_rewriter]: 5.30001e-06 [mutable_eliminate]: 0.00047953 [opt_b]: 0.00019276, [1] [Cycle 1]: 0.00018672, [7] [b_1]: 0.00011251 [b_2]: 7.41001e-06 [updatestate_depend_eliminate]: 5.04998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.09999e-06 [renormalize]: 4.00003e-07 [cse]: 2.217e-05 [optimize_parallel_all_gather_comm]: 1.684e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.366e-05 [loop_unroll]: 0.00043203 [opt_after_cconv]: 9.726e-05, [1] [Cycle 1]: 9.158e-05, [7] [c_1]: 2.85e-05 [parameter_eliminate]: 2.41998e-06 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.654e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.373e-05 [tuple_transform]: 7.215e-05, [1] [Cycle 1]: 6.788e-05, [4] [d_1]: 4.091e-05 [none_parameter_eliminate]: 1.97999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.40002e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 5.233e-05 [cse_after_recomputation]: 2.077e-05, [1] [Cycle 1]: 1.634e-05, [1] [cse]: 1.096e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 6.53003e-06 [bias_add_comm_swap]: 2.49001e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.59999e-06 [micro_interleaved_order_control]: 2.69001e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.65997e-06 [reorder_send_recv_between_fp_bp]: 2.93998e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.31998e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.57999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.87999e-06 [control_data_broadcast_order]: 1.248e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.76e-06 [overlap_grad_ring_attention]: 4.04002e-06 [overlap_grad_flash_sp]: 1.808e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 7.022e-05, [1] [Cycle 1]: 6.62e-05, [6] [build]: 2.59001e-06 [elim_shapecalc]: 9.19998e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.59999e-06 [fold_const_symbol]: 9.41e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.61002e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.65e-05 [get_jit_bprop_graph]: 1.54e-06 [rewriter_after_jit_bprop_graph]: 3.42002e-06 [opt_after_jit_grad]: 0.00094279 [validate]: 3.402e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.51469 [execute]: 9.42001e-06 Sums bootstrap : 0.000679s : 0.13% type_inference : 0.006949s : 1.32% event_method : 0.000015s : 0.00% auto_monad : 0.000059s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000031s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000063s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.01% optimize.opt_a.loop_unroll : 0.000027s : 0.01% optimize.opt_a.a_1 : 0.000674s : 0.13% 
optimize.opt_a.with_stream_mark : 0.000027s : 0.01% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000177s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.01% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000503s : 0.10% optimize.opt_a.add_forward_monad_depend : 
0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.00% optimize.opt_a.cse : 0.000043s : 0.01% optimize.opt_a.a_3 : 0.000075s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.01% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000480s : 0.09% optimize.opt_b.b_1 : 0.000113s : 0.02% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000432s : 0.08% optimize.opt_after_cconv.c_1 : 0.000029s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000943s : 0.18% validate : 0.000034s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.514690s : 97.67% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000176 30 15.66% : 0.000028s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.85% : 0.000001s : 2: substitution.fold_const_symbol 3.39% : 0.000006s : 4: substitution.graph_param_transform 64.98% : 0.000114s : 3: substitution.inline 2.09% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.53% : 0.000004s : 4: substitution.replace_old_param 6.98% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006895 2 90.99% : 0.006273s : 1: type_inference.infer 9.01% : 0.000621s : 1: type_inference.specialize ------[replace.] 0.000042 5 66.31% : 0.000028s : 3: replace.inline 33.69% : 0.000014s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 90.97% : 0.000112s : 3: match.inline 9.03% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000167 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_depend_swap 1.96% : 0.000003s : 23: predicate.environ_get_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 1.00% : 0.000002s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.23% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.78% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.22% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000393 8 45.72% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.28% : 0.000214s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.541977 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.73% : 0.003934s : 1: add_attr 0.72% : 0.003922s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000064s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.13% : 0.000718s : 1: bootstrap 0.01% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.08% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.09% : 0.000489s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.20% : 0.001083s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000093s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.45% : 0.002417s : 1: opt_a 0.02% : 0.000101s : 1: opt_after_cconv 0.18% : 0.000954s : 1: opt_after_jit_grad 0.04% : 0.000196s : 1: opt_b 0.80% : 0.004353s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000035s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.05% : 0.000262s : 1: renormalize.infer 0.04% : 0.000233s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000037s : 1: rewriter_after_opt_a 0.01% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000073s : 1: symbol_engine_optimizer 94.97% : 0.514712s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 1.29% : 0.006965s : 1: type_inference 0.01% : 0.000062s : 1: validate TotalTime = 0.242143, [24] [bootstrap]: 0.00062369 [type_inference]: 0.0067004 [event_method]: 1.155e-05 [auto_monad]: 5.741e-05 [graph_reusing]: 5.91e-06 [inline]: 1.81998e-06 [add_attr]: 0.00438476, [1] [add_attr_with_inline]: 0.00435993, [1] [Cycle 1]: 5.842e-05, [2] [tag_attr]: 1.295e-05 [meta_addattr_fg_expand]: 3.8e-06 [parallel-infer-symbol]: 3.22002e-06 [pre_auto_parallel]: 2.291e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.64999e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00423594, [53] [py_interpret_to_execute]: 1.727e-05 [rewriter_before_opt_a]: 4.292e-05 [opt_a]: 0.00219923, [2] [Cycle 1]: 0.00144572, [45] [expand_dump_flag]: 2.89001e-06 [switch_simplify]: 2.625e-05 [loop_unroll]: 1.417e-05 [a_1]: 0.00031463 [with_stream_mark]: 3.993e-05 [recompute_prepare]: 9.14998e-06 [updatestate_depend_eliminate]: 4.45e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 8.038e-05 [accelerated_algorithm]: 6.74999e-06 [shard]: 2.76999e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 6.29999e-06 [merge_send_recv]: 8.46002e-06 [auto_parallel]: 6.21998e-06 [parallel]: 1.937e-05 [flash_sp]: 8.2e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.71001e-06 [matmul_add_comm_reduction]: 9.77999e-06 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 7.64002e-06 [virtual_dataset]: 5.86998e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.85998e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.006e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.258e-05 [merge_recompute_call_nodes]: 2.13998e-06 [before_grad]: 1.005e-05 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.53e-06 [flash_sp_send_recv_attached]: 2.52001e-06 [receive_attached]: 
2.79999e-06 [after_resolve]: 1.119e-05 [a_after_grad]: 9.20999e-06 [renormalize]: 0.00044567 [add_forward_monad_depend]: 4.57998e-06 [auto_monad_grad]: 2.19001e-06 [auto_monad_eliminator]: 1.445e-05 [cse]: 3.086e-05 [a_3]: 4.228e-05 [Cycle 2]: 0.00074322, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.77999e-06 [a_1]: 0.00012975 [with_stream_mark]: 1.094e-05 [recompute_prepare]: 6.19999e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 2.45997e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 0.00011713 [accelerated_algorithm]: 5.81003e-06 [shard]: 1.16997e-06 [meta_shard_fg_expand]: 1.31998e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.58001e-06 [auto_parallel]: 5.49998e-06 [parallel]: 4.41002e-06 [flash_sp]: 3.53e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.65001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.19999e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 1.256e-05 [virtual_output]: 6.14999e-06 [merge_forward]: 3.52002e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 7.25998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.155e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 8.64998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.77999e-06 [a_after_grad]: 8.67e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.61998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 7.87998e-06 [cse]: 1.558e-05 [a_3]: 3.475e-05 [py_interpret_to_execute_after_opt_a]: 9.12999e-06 [slice_cell_reuse_recomputed_activation]: 2.53e-06 [rewriter_after_opt_a]: 6.928e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 5.51e-06 [mutable_eliminate]: 0.00049652 [opt_b]: 0.00018723, [1] [Cycle 
1]: 0.00018026, [7] [b_1]: 0.00011044 [b_2]: 7.38999e-06 [updatestate_depend_eliminate]: 5.48002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 2.69996e-07 [cse]: 1.69e-05 [optimize_parallel_all_gather_comm]: 4.938e-05 [overlap_param_gather]: 2.37999e-06 [cconv]: 2.416e-05 [loop_unroll]: 0.0004348 [opt_after_cconv]: 9.818e-05, [1] [Cycle 1]: 9.218e-05, [7] [c_1]: 2.798e-05 [parameter_eliminate]: 3.04999e-06 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 2.58003e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.661e-05 [renormalize]: 2.29978e-07 [remove_dup_value]: 1.318e-05 [tuple_transform]: 7.288e-05, [1] [Cycle 1]: 6.827e-05, [4] [d_1]: 4.158e-05 [none_parameter_eliminate]: 1.93002e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.86998e-06 [add_recomputation]: 7.286e-05 [cse_after_recomputation]: 2.243e-05, [1] [Cycle 1]: 1.794e-05, [1] [cse]: 1.257e-05 [environ_conv]: 5.56002e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.69001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 8.09989e-07 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 1.34e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.45999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 2.17999e-06 [control_data_broadcast_order]: 1.227e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.27e-06 [overlap_grad_flash_sp]: 1.878e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 1.94999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 7.09e-05, [1] [Cycle 1]: 6.672e-05, [6] [build]: 2.94999e-06 [elim_shapecalc]: 8.98002e-06 [elim_not_effective]: 1.23e-05 [opt_reshape]: 6.12999e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.634e-05 [get_jit_bprop_graph]: 1.86e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00047477 [validate]: 3.703e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.225178 [execute]: 1.121e-05 Sums bootstrap : 0.000624s : 0.26% type_inference : 0.006700s : 2.83% event_method : 0.000012s : 0.00% auto_monad : 0.000057s : 0.02% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.01% optimize.rewriter_before_opt_a : 0.000043s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.01% optimize.opt_a.loop_unroll : 0.000020s : 0.01% optimize.opt_a.a_1 : 0.000444s : 0.19% optimize.opt_a.with_stream_mark : 0.000051s : 0.02% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000198s : 0.08% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000024s : 0.01% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000018s : 0.01% optimize.opt_a.virtual_output : 0.000012s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000446s : 0.19% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.01% optimize.opt_a.cse : 0.000046s : 0.02% optimize.opt_a.a_3 : 0.000077s : 0.03% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000069s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000497s : 0.21% optimize.opt_b.b_1 : 0.000110s : 0.05% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000049s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.01% optimize.loop_unroll : 0.000435s : 0.18% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000042s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000073s : 0.03% optimize.cse_after_recomputation.cse : 0.000013s : 0.01% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.20% validate : 0.000037s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.225178s : 95.20% execute : 0.000011s : 0.00% Time group info: ------[substitution.] 0.000184 26 37.73% : 0.000069s : 4: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000002s : 2: substitution.fold_const_symbol 3.24% : 0.000006s : 4: substitution.graph_param_transform 50.56% : 0.000093s : 2: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.006587 2 91.95% : 0.006057s : 1: type_inference.infer 8.05% : 0.000530s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000091 2 100.00% : 0.000091s : 2: match.inline ------[predicate.] 
0.000145 984 0.75% : 0.000001s : 9: predicate.accumulaten_eliminater 1.51% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.38% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.45% : 0.000001s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.74% : 0.000003s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 1.09% : 0.000002s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000009s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.77% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.33% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.67% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.97% : 0.000001s : 8: predicate.remove_not_recompute_node 1.21% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.94% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.90% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000406 6 44.29% : 0.000180s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.71% : 0.000226s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.252124 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.74% : 0.004390s : 1: add_attr 1.73% : 0.004364s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000077s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000063s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.26% : 0.000668s : 1: bootstrap 0.01% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.01% : 0.000018s : 1: event_method 0.01% : 0.000031s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.18% : 0.000443s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.20% : 0.000506s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.34% : 0.000864s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.04% : 0.000093s : 28: opt.transform.opt_b 0.02% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.87% : 0.002203s : 1: opt_a 0.04% : 0.000102s : 1: opt_after_cconv 0.19% : 0.000485s : 1: opt_after_jit_grad 0.08% : 0.000191s : 1: opt_b 1.68% : 0.004240s : 1: optimize 0.02% : 0.000054s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000027s : 1: pre_auto_parallel 0.01% : 0.000022s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.10% : 0.000241s : 1: renormalize.infer 0.08% : 0.000197s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000074s : 1: rewriter_after_opt_a 0.02% : 0.000047s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.03% : 0.000074s : 1: symbol_engine_optimizer 89.34% : 0.225238s : 1: task_emit 0.03% : 0.000076s : 1: 
tuple_transform 2.67% : 0.006721s : 1: type_inference 0.02% : 0.000062s : 1: validate TotalTime = 0.171365, [24] [bootstrap]: 0.0004984 [type_inference]: 0.00620431 [event_method]: 1.558e-05 [auto_monad]: 5.952e-05 [graph_reusing]: 6.17001e-06 [inline]: 1.89e-06 [add_attr]: 0.00335601, [1] [add_attr_with_inline]: 0.00334641, [1] [Cycle 1]: 5.603e-05, [2] [tag_attr]: 1.698e-05 [meta_addattr_fg_expand]: 4.43999e-06 [parallel-infer-symbol]: 3.57002e-06 [pre_auto_parallel]: 2.94e-05 [insert-virtual-dataset]: 2.96001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.90001e-06 [optimize]: 0.00443143, [53] [py_interpret_to_execute]: 2.234e-05 [rewriter_before_opt_a]: 6.399e-05 [opt_a]: 0.00235796, [2] [Cycle 1]: 0.00171273, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 3.378e-05 [loop_unroll]: 2.159e-05 [a_1]: 0.00047167 [with_stream_mark]: 1.469e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.35003e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.99e-06 [a_2]: 7.793e-05 [accelerated_algorithm]: 7e-06 [shard]: 2.73e-06 [meta_shard_fg_expand]: 2.48e-06 [shard_inline]: 6.62002e-06 [merge_send_recv]: 9.08002e-06 [auto_parallel]: 7.61999e-06 [parallel]: 1.857e-05 [flash_sp]: 7.94002e-06 [merge_comm]: 4.08999e-06 [allreduce_fusion]: 3.56001e-06 [matmul_add_comm_reduction]: 9.56e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.87e-06 [virtual_dataset]: 6.55997e-06 [get_grad_eliminate_]: 5.58002e-06 [virtual_output]: 5.76e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 1.065e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.05e-06 [meta_fg_expand]: 2.43002e-06 [flash_sp_send_recv_attached]: 2.84001e-06 [receive_attached]: 3.18e-06 
[after_resolve]: 1.014e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00056471 [add_forward_monad_depend]: 5.27999e-06 [auto_monad_grad]: 2.45002e-06 [auto_monad_eliminator]: 1.578e-05 [cse]: 2.939e-05 [a_3]: 4.214e-05 [Cycle 2]: 0.00063512, [45] [expand_dump_flag]: 1.14e-06 [switch_simplify]: 7.08998e-06 [loop_unroll]: 5.90002e-06 [a_1]: 0.00012778 [with_stream_mark]: 1.053e-05 [recompute_prepare]: 6.43e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 6.923e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.29998e-06 [shard_inline]: 5.35999e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 5.35999e-06 [parallel]: 5.24e-06 [flash_sp]: 4.01001e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 6.12001e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 2.95998e-06 [cell_reuse_recompute_pass]: 1.81e-06 [offload_activation]: 6.53e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.001e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.50999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.57002e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 9.25999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.26002e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 6.73998e-06 [cse]: 1.66e-05 [a_3]: 3.608e-05 [py_interpret_to_execute_after_opt_a]: 9.22999e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.862e-05 [convert_after_rewriter]: 9.03002e-06 [order_py_execute_after_rewriter]: 6.96001e-06 [mutable_eliminate]: 0.00052744 [opt_b]: 0.00018898, [1] [Cycle 1]: 0.00018248, 
[7] [b_1]: 0.00011144 [b_2]: 7.4e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [renormalize]: 2.70025e-07 [cse]: 1.784e-05 [optimize_parallel_all_gather_comm]: 1.632e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.492e-05 [loop_unroll]: 0.0004353 [opt_after_cconv]: 0.00017101, [1] [Cycle 1]: 9.568e-05, [7] [c_1]: 2.989e-05 [parameter_eliminate]: 2.61e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.753e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.333e-05 [tuple_transform]: 7.136e-05, [1] [Cycle 1]: 6.661e-05, [4] [d_1]: 4.012e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 4.668e-05 [cse_after_recomputation]: 2.197e-05, [1] [Cycle 1]: 1.74e-05, [1] [cse]: 1.184e-05 [environ_conv]: 4.94e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.60001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.62001e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.69999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.41998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.25e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.81999e-06 [overlap_recompute_and_grad_model_parallel]: 4.78001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.62001e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4.20999e-06 [overlap_grad_flash_sp]: 2.058e-05 [begin_end_overlap_inline]: 7.89994e-07 [split_matmul_comm_elemetwise]: 2.50002e-06 [split_layernorm_comm]: 2.11e-06 [handle_group_info]: 1.45999e-06 [symbol_engine_optimizer]: 7.111e-05, [1] [Cycle 1]: 6.661e-05, [6] [build]: 3.44001e-06 [elim_shapecalc]: 8.03001e-06 [elim_not_effective]: 1.197e-05 [opt_reshape]: 6.21998e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.723e-05 [get_jit_bprop_graph]: 1.42999e-06 [rewriter_after_jit_bprop_graph]: 3.76999e-06 [opt_after_jit_grad]: 0.00048851 [validate]: 3.698e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.155969 [execute]: 9.71998e-06 Sums bootstrap : 0.000498s : 0.30% type_inference : 0.006204s : 3.72% event_method : 0.000016s : 0.01% auto_monad : 0.000060s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.01% optimize.rewriter_before_opt_a : 0.000064s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.02% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000599s : 0.36% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.01% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000013s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000565s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.01% optimize.opt_a.cse : 0.000046s : 0.03% optimize.opt_a.a_3 : 0.000078s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.02% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000527s : 0.32% optimize.opt_b.b_1 : 0.000111s : 0.07% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.01% optimize.loop_unroll : 0.000435s : 0.26% optimize.opt_after_cconv.c_1 : 0.000030s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.03% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000489s : 0.29% validate : 0.000037s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.155969s : 93.44% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000180 30 14.32% : 0.000026s : 5: substitution.arithmetic_simplify 1.18% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000002s : 2: substitution.fold_const_symbol 3.13% : 0.000006s : 4: substitution.graph_param_transform 67.27% : 0.000121s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.34% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.79% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006156 2 88.84% : 0.005468s : 1: type_inference.infer 11.16% : 0.000687s : 1: type_inference.specialize ------[replace.] 0.000042 5 71.10% : 0.000030s : 3: replace.inline 28.90% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000130 5 91.41% : 0.000119s : 3: match.inline 8.59% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 1131 1.04% : 0.000002s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 1.02% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.42% : 0.000001s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.94% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.60% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.04% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.38% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.17% : 0.000002s : 11: predicate.reduce_eliminate 2.25% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.50% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000002s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.12% : 0.000002s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.95% : 
0.000002s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.18% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.92% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000375 8 40.71% : 0.000153s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.29% : 0.000222s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.180850 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.86% : 0.003361s : 1: add_attr 1.85% : 0.003350s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000065s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.29% : 0.000528s : 1: bootstrap 0.02% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000022s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000445s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.30% : 0.000538s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.54% : 0.000981s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000092s : 28: opt.transform.opt_b 0.02% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.31% : 0.002361s : 1: opt_a 0.10% : 0.000175s : 1: opt_after_cconv 0.28% : 0.000499s : 1: opt_after_jit_grad 0.11% : 0.000192s : 1: opt_b 2.45% : 0.004436s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000034s : 1: pre_auto_parallel 0.01% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000293s : 1: renormalize.infer 0.15% : 0.000264s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000043s : 1: rewriter_after_opt_a 0.04% : 0.000069s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000074s : 1: symbol_engine_optimizer 86.25% : 0.155991s : 1: task_emit 0.04% : 0.000074s : 1: 
tuple_transform 3.44% : 0.006223s : 1: type_inference 0.03% : 0.000062s : 1: validate TotalTime = 0.377404, [24] [bootstrap]: 0.00057552 [type_inference]: 0.0129061 [event_method]: 6.218e-05 [auto_monad]: 0.00011739 [graph_reusing]: 6.98998e-06 [inline]: 2.01e-06 [add_attr]: 0.00346848, [1] [add_attr_with_inline]: 0.00345952, [1] [Cycle 1]: 7.62e-05, [2] [tag_attr]: 3.755e-05 [meta_addattr_fg_expand]: 9.47001e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 5.229e-05 [insert-virtual-dataset]: 1.40001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.34e-06 [pipeline_split]: 1.24003e-06 [optimize]: 0.0162887, [53] [py_interpret_to_execute]: 3.954e-05 [rewriter_before_opt_a]: 0.00015419 [opt_a]: 0.0134112, [3] [Cycle 1]: 0.00865759, [45] [expand_dump_flag]: 3.88001e-06 [switch_simplify]: 7.174e-05 [loop_unroll]: 6.35e-05 [a_1]: 0.00152887 [with_stream_mark]: 2.505e-05 [recompute_prepare]: 2.323e-05 [updatestate_depend_eliminate]: 9.36998e-06 [updatestate_assign_eliminate]: 7.78999e-06 [updatestate_loads_eliminate]: 7.13998e-06 [parameter_eliminate]: 1.96e-06 [a_2]: 0.00025567 [accelerated_algorithm]: 3.3e-05 [shard]: 1.28002e-06 [meta_shard_fg_expand]: 4.58999e-06 [shard_inline]: 1.679e-05 [merge_send_recv]: 1.438e-05 [auto_parallel]: 1.235e-05 [parallel]: 1.422e-05 [flash_sp]: 9.93002e-06 [merge_comm]: 9.68002e-06 [allreduce_fusion]: 8.77999e-06 [matmul_add_comm_reduction]: 2.307e-05 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 2.002e-05 [virtual_dataset]: 1.697e-05 [get_grad_eliminate_]: 1.638e-05 [virtual_output]: 1.6e-05 [merge_forward]: 9.76e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 1.632e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.991e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 2.881e-05 [set_forward_comm_id_for_comm_node_pass]: 9.87001e-06 [meta_fg_expand]: 0.00205533 [flash_sp_send_recv_attached]: 3.60998e-06 [receive_attached]: 1.81e-06 
[after_resolve]: 6.79e-05 [a_after_grad]: 8.869e-05 [renormalize]: 0.00316622 [add_forward_monad_depend]: 1.107e-05 [auto_monad_grad]: 6.80998e-06 [auto_monad_eliminator]: 6.502e-05 [cse]: 0.00019887 [a_3]: 0.00035883 [Cycle 2]: 0.00376934, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 4.927e-05 [loop_unroll]: 4.506e-05 [a_1]: 0.00165021 [with_stream_mark]: 1.942e-05 [recompute_prepare]: 1.224e-05 [updatestate_depend_eliminate]: 6.15002e-06 [updatestate_assign_eliminate]: 5.24998e-06 [updatestate_loads_eliminate]: 4.08001e-06 [parameter_eliminate]: 2.29999e-06 [a_2]: 0.00013381 [accelerated_algorithm]: 1.44e-05 [shard]: 1.99e-06 [meta_shard_fg_expand]: 3.71999e-06 [shard_inline]: 9.34e-06 [merge_send_recv]: 1.061e-05 [auto_parallel]: 8.261e-05 [parallel]: 8.42e-06 [flash_sp]: 4.37e-06 [merge_comm]: 6.40002e-06 [allreduce_fusion]: 5.35999e-06 [matmul_add_comm_reduction]: 1.002e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 1.154e-05 [virtual_dataset]: 9.69999e-06 [get_grad_eliminate_]: 5.763e-05 [virtual_output]: 9.91e-06 [merge_forward]: 7.06001e-06 [cell_reuse_recompute_pass]: 1.65001e-06 [offload_activation]: 1.578e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.21e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 1.604e-05 [set_forward_comm_id_for_comm_node_pass]: 6.74001e-06 [meta_fg_expand]: 0.00011988 [flash_sp_send_recv_attached]: 1.90001e-06 [receive_attached]: 1.97999e-06 [after_resolve]: 1.933e-05 [a_after_grad]: 1.627e-05 [renormalize]: 0.00093136 [add_forward_monad_depend]: 5.47001e-06 [auto_monad_grad]: 2.50002e-06 [auto_monad_eliminator]: 1.815e-05 [cse]: 6.449e-05 [a_3]: 7.255e-05 [Cycle 3]: 0.00096568, [45] [expand_dump_flag]: 1.60001e-06 [switch_simplify]: 1.133e-05 [loop_unroll]: 9.65002e-06 [a_1]: 0.00026434 [with_stream_mark]: 1.267e-05 [recompute_prepare]: 1.011e-05 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 4.03999e-06 
[parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012845 [accelerated_algorithm]: 1.311e-05 [shard]: 1.22e-06 [meta_shard_fg_expand]: 2.53e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 8.33001e-06 [auto_parallel]: 8.84e-06 [parallel]: 5.69e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 5.20001e-06 [allreduce_fusion]: 5.46e-06 [matmul_add_comm_reduction]: 8.75999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.105e-05 [virtual_dataset]: 9.24998e-06 [get_grad_eliminate_]: 9.00001e-06 [virtual_output]: 8.74e-06 [merge_forward]: 4.77e-06 [cell_reuse_recompute_pass]: 1.97999e-06 [offload_activation]: 1.057e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.73e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 1.487e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35999e-06 [meta_fg_expand]: 3.40998e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.19003e-06 [after_resolve]: 1.456e-05 [a_after_grad]: 1.437e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.42e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.248e-05 [cse]: 3.244e-05 [a_3]: 6.239e-05 [py_interpret_to_execute_after_opt_a]: 1.962e-05 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 5.595e-05 [convert_after_rewriter]: 1.025e-05 [order_py_execute_after_rewriter]: 7.46999e-06 [mutable_eliminate]: 0.00071765 [opt_b]: 0.000318, [1] [Cycle 1]: 0.00031019, [7] [b_1]: 0.00020149 [b_2]: 1.211e-05 [updatestate_depend_eliminate]: 8.64998e-06 [updatestate_assign_eliminate]: 4.86997e-06 [updatestate_loads_eliminate]: 4.67998e-06 [renormalize]: 4.80009e-07 [cse]: 4.165e-05 [optimize_parallel_all_gather_comm]: 2.353e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.749e-05 [loop_unroll]: 0.00061763 [opt_after_cconv]: 0.00015292, [1] [Cycle 1]: 0.00014581, [7] [c_1]: 5.325e-05 [parameter_eliminate]: 3.26001e-06 [updatestate_depend_eliminate]: 8.03001e-06 [updatestate_assign_eliminate]: 4.57998e-06 
[updatestate_loads_eliminate]: 4.15e-06 [cse]: 3.698e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 3.72e-05 [tuple_transform]: 0.00011101, [1] [Cycle 1]: 0.00010582, [4] [d_1]: 7.376e-05 [none_parameter_eliminate]: 2.16e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 1.074e-05 [partial_unused_args_eliminate]: 1.89999e-06 [add_recomputation]: 6.497e-05 [cse_after_recomputation]: 3.602e-05, [1] [Cycle 1]: 3.101e-05, [1] [cse]: 2.497e-05 [environ_conv]: 1.105e-05 [swap_dp_allreduce_reducescatter]: 9.17001e-06 [bias_add_comm_swap]: 3.00998e-06 [label_micro_interleaved_index]: 5.31998e-06 [label_fine_grained_interleaved_index]: 2.65002e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 8.30012e-07 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 1.44e-06 [add_comm_op_reuse_tag]: 8.09989e-07 [interleave_split_concat_branches]: 1.54e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.982e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 5.57999e-06 [overlap_recompute_and_grad_model_parallel]: 6.14999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.50997e-06 [overlap_grad_ring_attention]: 5.72999e-06 [overlap_grad_flash_sp]: 2.854e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.55002e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 0.00010973, [1] [Cycle 1]: 0.00010528, [6] [build]: 1.178e-05 [elim_shapecalc]: 1.466e-05 [elim_not_effective]: 1.969e-05 [opt_reshape]: 1.215e-05 [fold_const_symbol]: 1.636e-05 
[renormalize]: 2.10013e-07 [detach_backward]: 2.57001e-06 [pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 2.778e-05 [get_jit_bprop_graph]: 2.06998e-06 [rewriter_after_jit_bprop_graph]: 4.58001e-06 [opt_after_jit_grad]: 0.00056897 [validate]: 5.663e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.342989 [execute]: 8.68001e-06 Sums bootstrap : 0.000576s : 0.15% type_inference : 0.012906s : 3.46% event_method : 0.000062s : 0.02% auto_monad : 0.000117s : 0.03% graph_reusing : 0.000007s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000052s : 0.01% insert-virtual-dataset : 0.000001s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000001s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.01% optimize.rewriter_before_opt_a : 0.000154s : 0.04% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.04% optimize.opt_a.loop_unroll : 0.000118s : 0.03% optimize.opt_a.a_1 : 0.003443s : 0.92% optimize.opt_a.with_stream_mark : 0.000057s : 0.02% optimize.opt_a.recompute_prepare : 0.000046s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000518s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.01% optimize.opt_a.merge_send_recv : 0.000033s : 0.01% optimize.opt_a.auto_parallel : 0.000104s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.01% optimize.opt_a.flash_sp : 0.000015s : 0.00% 
optimize.opt_a.merge_comm : 0.000021s : 0.01% optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.01% optimize.opt_a.virtual_dataset : 0.000036s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000083s : 0.02% optimize.opt_a.virtual_output : 0.000035s : 0.01% optimize.opt_a.merge_forward : 0.000022s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000043s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000069s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000060s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.01% optimize.opt_a.meta_fg_expand : 0.002179s : 0.58% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000102s : 0.03% optimize.opt_a.a_after_grad : 0.000119s : 0.03% optimize.opt_a.renormalize : 0.004098s : 1.10% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.00% optimize.opt_a.auto_monad_grad : 0.000011s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000096s : 0.03% optimize.opt_a.cse : 0.000296s : 0.08% optimize.opt_a.a_3 : 0.000494s : 0.13% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000056s : 0.02% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000718s : 0.19% optimize.opt_b.b_1 : 0.000201s : 0.05% optimize.opt_b.b_2 : 0.000012s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000042s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.01% optimize.loop_unroll : 0.000618s : 0.17% optimize.opt_after_cconv.c_1 : 0.000053s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000037s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000037s : 0.01% optimize.tuple_transform.d_1 : 0.000074s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000065s : 0.02% optimize.cse_after_recomputation.cse : 0.000025s : 0.01% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000028s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000569s : 0.15% validate : 0.000057s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.342989s : 92.07% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000890 222 5.97% : 0.000053s : 12: substitution.arithmetic_simplify 1.97% : 0.000018s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000005s : 5: substitution.float_depend_g_call 0.56% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 56.54% : 0.000503s : 17: substitution.inline 2.21% : 0.000020s : 2: substitution.inline_without_move 1.24% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.03% : 0.000018s : 3: substitution.less_batch_normalization 1.63% : 0.000014s : 11: substitution.minmaximum_grad 0.76% : 0.000007s : 5: substitution.partial_eliminate 1.78% : 0.000016s : 20: substitution.remove_not_recompute_node 3.24% : 0.000029s : 10: substitution.replace_applicator 1.42% : 0.000013s : 15: substitution.replace_old_param 0.28% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.50% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.88% : 0.000070s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012819 2 86.17% : 0.011046s : 1: type_inference.infer 13.83% : 0.001773s : 1: type_inference.specialize ------[replace.] 0.000247 33 56.66% : 0.000140s : 17: replace.inline 43.34% : 0.000107s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000527 33 93.65% : 0.000494s : 17: match.inline 6.35% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000788 5764 1.16% : 0.000009s : 68: predicate.accumulaten_eliminater 0.34% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000009s : 68: predicate.addn_zero_filter 1.00% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.22% : 0.000017s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.69% : 0.000013s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.20% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.69% : 0.000005s : 32: predicate.get_grad_eliminate 0.13% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.54% : 0.000044s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000006s : 32: predicate.less_batch_normalization 
1.68% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.56% : 0.000020s : 168: predicate.load_eliminater 0.42% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000009s : 68: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.08% : 0.000016s : 101: predicate.partial_defer_inline 1.76% : 0.000014s : 92: predicate.partial_eliminate 1.03% : 0.000008s : 68: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.58% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.81% : 0.000014s : 152: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000009s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.35% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 101: predicate.switch_defer_inline 2.86% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.82% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.53% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.94% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.52% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.53% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.11% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.60% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001858 34 53.78% : 0.000999s : 13: func_graph_cloner_run.FuncGraphClonerGraph 46.22% : 0.000859s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.406855 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.85% : 0.003474s : 1: add_attr 0.85% : 0.003463s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000069s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.03% : 0.000125s : 1: auto_monad 0.01% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.15% : 0.000615s : 1: bootstrap 0.01% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000023s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.01% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000004s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000015s : 1: environ_conv 0.02% : 0.000070s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.15% : 0.000628s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.18% : 0.000728s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 1.29% : 0.005255s : 117: opt.transform.opt_a 0.01% : 0.000051s : 1: opt.transform.opt_after_cconv 0.01% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000183s : 28: opt.transform.opt_b 0.02% : 0.000082s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000059s : 4: opt.transform.symbol_engine_opt 3.30% : 0.013415s : 1: opt_a 0.04% : 0.000157s : 1: opt_after_cconv 0.14% : 0.000580s : 1: opt_after_jit_grad 0.08% : 0.000322s : 1: opt_b 4.00% : 0.016294s : 1: optimize 0.01% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000011s : 1: order_py_execute_after_rewriter 0.01% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000057s : 1: pre_auto_parallel 0.01% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000042s : 1: remove_dup_value 0.55% : 0.002222s : 2: renormalize.infer 0.46% : 0.001858s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000060s : 1: rewriter_after_opt_a 0.04% : 0.000159s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.03% : 0.000113s : 1: symbol_engine_optimizer 84.31% : 0.343009s : 1: task_emit 0.03% : 0.000114s : 1: 
tuple_transform 3.18% : 0.012927s : 1: type_inference 0.02% : 0.000092s : 1: validate TotalTime = 0.44424, [24] [bootstrap]: 0.00122276 [type_inference]: 0.00592078 [event_method]: 1.197e-05 [auto_monad]: 5.66e-05 [graph_reusing]: 6.11e-06 [inline]: 2.20002e-06 [add_attr]: 0.00360853, [1] [add_attr_with_inline]: 0.00359874, [1] [Cycle 1]: 0.00027757, [2] [tag_attr]: 0.00022788 [meta_addattr_fg_expand]: 4.65999e-06 [parallel-infer-symbol]: 3.95e-06 [pre_auto_parallel]: 2.884e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.44001e-06 [pipeline_split]: 1.66002e-06 [optimize]: 0.00422964, [53] [py_interpret_to_execute]: 2.014e-05 [rewriter_before_opt_a]: 4.423e-05 [opt_a]: 0.00216626, [2] [Cycle 1]: 0.00153008, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 2.732e-05 [loop_unroll]: 1.495e-05 [a_1]: 0.00031843 [with_stream_mark]: 1.554e-05 [recompute_prepare]: 8.88002e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 1.92999e-06 [a_2]: 7.951e-05 [accelerated_algorithm]: 6.84999e-06 [shard]: 2.79999e-06 [meta_shard_fg_expand]: 1.90001e-06 [shard_inline]: 6.42001e-06 [merge_send_recv]: 8.07998e-06 [auto_parallel]: 6.63e-06 [parallel]: 7.401e-05 [flash_sp]: 8.23999e-06 [merge_comm]: 4.18001e-06 [allreduce_fusion]: 4.03001e-06 [matmul_add_comm_reduction]: 9.71998e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 8.58001e-06 [virtual_dataset]: 6.36e-06 [get_grad_eliminate_]: 5.71003e-06 [virtual_output]: 6.23998e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 1.09e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.234e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 1.001e-05 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.61999e-06 [flash_sp_send_recv_attached]: 2.46998e-06 [receive_attached]: 
2.59001e-06 [after_resolve]: 1.075e-05 [a_after_grad]: 9.32001e-06 [renormalize]: 0.0004769 [add_forward_monad_depend]: 5.04e-06 [auto_monad_grad]: 2.66e-06 [auto_monad_eliminator]: 1.554e-05 [cse]: 3.806e-05 [a_3]: 4.293e-05 [Cycle 2]: 0.00062477, [45] [expand_dump_flag]: 1.34e-06 [switch_simplify]: 7.23999e-06 [loop_unroll]: 5.62999e-06 [a_1]: 0.00012851 [with_stream_mark]: 1.318e-05 [recompute_prepare]: 6.17999e-06 [updatestate_depend_eliminate]: 2.94999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 7.057e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.25001e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 4.90999e-06 [auto_parallel]: 6.14999e-06 [parallel]: 4.90999e-06 [flash_sp]: 3.55e-06 [merge_comm]: 3.27002e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 6.29999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.54999e-06 [virtual_dataset]: 5.60001e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 7.49002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.95001e-06 [flash_sp_send_recv_attached]: 1.09998e-06 [receive_attached]: 1.25001e-06 [after_resolve]: 9.31e-06 [a_after_grad]: 8.35001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 7.68999e-06 [cse]: 1.515e-05 [a_3]: 3.315e-05 [py_interpret_to_execute_after_opt_a]: 9.09998e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.414e-05 [convert_after_rewriter]: 7.5e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00054638 [opt_b]: 0.00019156, [1] [Cycle 1]: 0.00018511, 
[7] [b_1]: 0.00011204 [b_2]: 7.51001e-06 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [renormalize]: 4.09986e-07 [cse]: 1.838e-05 [optimize_parallel_all_gather_comm]: 1.756e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 2.734e-05 [loop_unroll]: 0.00049727 [opt_after_cconv]: 9.938e-05, [1] [Cycle 1]: 9.351e-05, [7] [c_1]: 2.883e-05 [parameter_eliminate]: 2.79001e-06 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 1.697e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.364e-05 [tuple_transform]: 7.441e-05, [1] [Cycle 1]: 6.985e-05, [4] [d_1]: 4.298e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.80998e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 4.884e-05 [cse_after_recomputation]: 2.18e-05, [1] [Cycle 1]: 1.698e-05, [1] [cse]: 1.173e-05 [environ_conv]: 5.82001e-06 [swap_dp_allreduce_reducescatter]: 5.56e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.97e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.68998e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.32999e-06 [interleave_split_concat_branches]: 1.39e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09e-06 [control_data_broadcast_order]: 1.223e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 3.93999e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.16001e-06 [overlap_grad_flash_sp]: 1.962e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.39001e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 7.318e-05, [1] [Cycle 1]: 6.871e-05, [6] [build]: 3.87002e-06 [elim_shapecalc]: 8.75001e-06 [elim_not_effective]: 1.179e-05 [opt_reshape]: 6.53e-06 [fold_const_symbol]: 9.01002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.717e-05 [get_jit_bprop_graph]: 1.22e-06 [rewriter_after_jit_bprop_graph]: 4.16001e-06 [opt_after_jit_grad]: 0.00050684 [validate]: 3.681e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.428297 [execute]: 1.017e-05 Sums bootstrap : 0.001223s : 0.28% type_inference : 0.005921s : 1.35% event_method : 0.000012s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000228s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000044s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000035s : 0.01% optimize.opt_a.loop_unroll : 0.000021s : 0.00% optimize.opt_a.a_1 : 0.000447s : 0.10% optimize.opt_a.with_stream_mark : 0.000029s : 0.01% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000007s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000079s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000477s : 0.11% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.01% optimize.opt_a.cse : 0.000053s : 0.01% optimize.opt_a.a_3 : 0.000076s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000546s : 0.12% optimize.opt_b.b_1 : 0.000112s : 0.03% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.01% optimize.loop_unroll : 0.000497s : 0.11% optimize.opt_after_cconv.c_1 : 0.000029s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000043s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000507s : 0.12% validate : 0.000037s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.428297s : 97.39% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000137 26 17.63% : 0.000024s : 4: substitution.arithmetic_simplify 1.63% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 4.15% : 0.000006s : 4: substitution.graph_param_transform 67.14% : 0.000092s : 2: substitution.inline 2.10% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.39% : 0.000005s : 4: substitution.remove_not_recompute_node 3.05% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.005871 2 92.99% : 0.005459s : 1: type_inference.infer 7.01% : 0.000412s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000090 2 100.00% : 0.000090s : 2: match.inline ------[predicate.] 
0.000144 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.31% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 1.19% : 0.000002s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.75% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.74% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.43% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000008s : 44: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 1.98% : 0.000003s : 26: predicate.load_eliminater 1.41% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.98% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.92% : 0.000003s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 1.09% : 0.000002s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.94% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.87% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000284 6 40.58% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.42% : 0.000169s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.453531 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.80% : 0.003614s : 1: add_attr 0.79% : 0.003602s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.28% : 0.001287s : 1: bootstrap 0.01% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000018s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.11% : 0.000507s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.12% : 0.000556s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.18% : 0.000817s : 78: opt.transform.opt_a 0.01% : 0.000028s : 1: opt.transform.opt_after_cconv 0.01% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000094s : 28: opt.transform.opt_b 0.01% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.48% : 0.002170s : 1: opt_a 0.02% : 0.000103s : 1: opt_after_cconv 0.11% : 0.000519s : 1: opt_after_jit_grad 0.04% : 0.000195s : 1: opt_b 0.93% : 0.004234s : 1: optimize 0.00% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000033s : 1: pre_auto_parallel 0.01% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.06% : 0.000270s : 1: renormalize.infer 0.04% : 0.000199s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000039s : 1: rewriter_after_opt_a 0.01% : 0.000048s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000076s : 1: symbol_engine_optimizer 94.44% : 0.428322s : 1: task_emit 0.02% : 0.000077s : 1: 
tuple_transform 1.31% : 0.005940s : 1: type_inference 0.01% : 0.000066s : 1: validate TotalTime = 0.862567, [24] [bootstrap]: 0.00049411 [type_inference]: 0.026427 [event_method]: 4.918e-05 [auto_monad]: 0.00012294 [graph_reusing]: 8.47e-06 [inline]: 2.12001e-06 [add_attr]: 0.00319629, [1] [add_attr_with_inline]: 0.00318767, [1] [Cycle 1]: 7.332e-05, [2] [tag_attr]: 3.467e-05 [meta_addattr_fg_expand]: 8.75001e-06 [parallel-infer-symbol]: 3.55e-06 [pre_auto_parallel]: 4.932e-05 [insert-virtual-dataset]: 2.91e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0278988, [53] [py_interpret_to_execute]: 3.8e-05 [rewriter_before_opt_a]: 0.00013002 [opt_a]: 0.0254911, [3] [Cycle 1]: 0.0214674, [45] [expand_dump_flag]: 3.79002e-06 [switch_simplify]: 6.721e-05 [loop_unroll]: 5.547e-05 [a_1]: 0.00137892 [with_stream_mark]: 2.518e-05 [recompute_prepare]: 2.217e-05 [updatestate_depend_eliminate]: 9.66e-06 [updatestate_assign_eliminate]: 8.11002e-06 [updatestate_loads_eliminate]: 7.75e-06 [parameter_eliminate]: 3.36001e-06 [a_2]: 0.00024815 [accelerated_algorithm]: 3.113e-05 [shard]: 1.93997e-06 [meta_shard_fg_expand]: 3.70998e-06 [shard_inline]: 1.6e-05 [merge_send_recv]: 1.717e-05 [auto_parallel]: 1.086e-05 [parallel]: 1.891e-05 [flash_sp]: 1.228e-05 [merge_comm]: 9.87001e-06 [allreduce_fusion]: 9.56e-06 [matmul_add_comm_reduction]: 2.83e-05 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 1.892e-05 [virtual_dataset]: 1.57e-05 [get_grad_eliminate_]: 1.541e-05 [virtual_output]: 1.546e-05 [merge_forward]: 9.82001e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.894e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.928e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 2.823e-05 [set_forward_comm_id_for_comm_node_pass]: 9.53002e-06 [meta_fg_expand]: 0.001593 [flash_sp_send_recv_attached]: 3.98999e-06 [receive_attached]: 2.63e-06 [after_resolve]: 
5.998e-05 [a_after_grad]: 8.253e-05 [renormalize]: 0.0166953 [add_forward_monad_depend]: 1.053e-05 [auto_monad_grad]: 6.28e-06 [auto_monad_eliminator]: 5.703e-05 [cse]: 0.0001812 [a_3]: 0.0003371 [Cycle 2]: 0.00309725, [45] [expand_dump_flag]: 2.04e-06 [switch_simplify]: 4.696e-05 [loop_unroll]: 4.395e-05 [a_1]: 0.00156018 [with_stream_mark]: 1.389e-05 [recompute_prepare]: 1.197e-05 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 4.48999e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 1.22e-06 [a_2]: 0.00012737 [accelerated_algorithm]: 1.249e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 2.42001e-06 [shard_inline]: 9.42999e-06 [merge_send_recv]: 8.08001e-06 [auto_parallel]: 7.20998e-06 [parallel]: 5.37001e-06 [flash_sp]: 3.2e-06 [merge_comm]: 5.16002e-06 [allreduce_fusion]: 4.55001e-06 [matmul_add_comm_reduction]: 9.64999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.071e-05 [virtual_dataset]: 9.31998e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.65001e-06 [merge_forward]: 4.47998e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 2.913e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.75e-05 [merge_recompute_call_nodes]: 1.04998e-06 [before_grad]: 1.499e-05 [set_forward_comm_id_for_comm_node_pass]: 5.77999e-06 [meta_fg_expand]: 3.846e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.22999e-06 [after_resolve]: 1.583e-05 [a_after_grad]: 1.493e-05 [renormalize]: 0.00064653 [add_forward_monad_depend]: 4.17e-06 [auto_monad_grad]: 1.42e-06 [auto_monad_eliminator]: 1.471e-05 [cse]: 4.928e-05 [a_3]: 6.654e-05 [Cycle 3]: 0.00091205, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.082e-05 [loop_unroll]: 9.15999e-06 [a_1]: 0.00025249 [with_stream_mark]: 1.015e-05 [recompute_prepare]: 9.57999e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 4.25e-06 [parameter_eliminate]: 9.39996e-07 
[a_2]: 0.00012404 [accelerated_algorithm]: 1.169e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.08002e-06 [merge_send_recv]: 7.14001e-06 [auto_parallel]: 6.94001e-06 [parallel]: 4.94e-06 [flash_sp]: 1.00001e-06 [merge_comm]: 4.92999e-06 [allreduce_fusion]: 5.21002e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.60001e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.42003e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.55999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.608e-05 [merge_recompute_call_nodes]: 1.25001e-06 [before_grad]: 1.386e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.15998e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.325e-05 [a_after_grad]: 1.392e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.50001e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 1.087e-05 [cse]: 2.84e-05 [a_3]: 6.002e-05 [py_interpret_to_execute_after_opt_a]: 1.201e-05 [slice_cell_reuse_recomputed_activation]: 2.56998e-06 [rewriter_after_opt_a]: 4.884e-05 [convert_after_rewriter]: 9.77001e-06 [order_py_execute_after_rewriter]: 7.06001e-06 [mutable_eliminate]: 0.00051035 [opt_b]: 0.00029568, [1] [Cycle 1]: 0.00028847, [7] [b_1]: 0.00019242 [b_2]: 1.106e-05 [updatestate_depend_eliminate]: 6.98e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 4.08999e-06 [renormalize]: 3.19997e-07 [cse]: 3.423e-05 [optimize_parallel_all_gather_comm]: 2.179e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00044277 [opt_after_cconv]: 0.00014178, [1] [Cycle 1]: 0.00013558, [7] [c_1]: 4.979e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 7.31001e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 4.02e-06 [cse]: 
3.291e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.985e-05 [tuple_transform]: 0.00010417, [1] [Cycle 1]: 9.931e-05, [4] [d_1]: 6.841e-05 [none_parameter_eliminate]: 1.69998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.024e-05 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 6.142e-05 [cse_after_recomputation]: 3.499e-05, [1] [Cycle 1]: 3.006e-05, [1] [cse]: 2.415e-05 [environ_conv]: 9.36e-06 [swap_dp_allreduce_reducescatter]: 8.33001e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.72001e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.81e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.729e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 5.35001e-06 [overlap_recompute_and_grad_model_parallel]: 6.21e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 5.10999e-06 [overlap_grad_flash_sp]: 2.388e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.23002e-06 [split_layernorm_comm]: 2.15002e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 0.00010361, [1] [Cycle 1]: 9.903e-05, [6] [build]: 1.092e-05 [elim_shapecalc]: 1.427e-05 [elim_not_effective]: 1.951e-05 [opt_reshape]: 1.022e-05 [fold_const_symbol]: 1.523e-05 [renormalize]: 2.60014e-07 [detach_backward]: 2.10002e-06 
[pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.673e-05 [get_jit_bprop_graph]: 1.27999e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00049014 [validate]: 4.89e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.803493 [execute]: 9.50001e-06 Sums bootstrap : 0.000494s : 0.06% type_inference : 0.026427s : 3.08% event_method : 0.000049s : 0.01% auto_monad : 0.000123s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000049s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000130s : 0.02% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.01% optimize.opt_a.loop_unroll : 0.000109s : 0.01% optimize.opt_a.a_1 : 0.003192s : 0.37% optimize.opt_a.with_stream_mark : 0.000049s : 0.01% optimize.opt_a.recompute_prepare : 0.000044s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.00% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000500s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000032s : 0.00% optimize.opt_a.auto_parallel : 0.000025s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000046s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000033s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000057s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001635s : 0.19% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.01% optimize.opt_a.a_after_grad : 0.000111s : 0.01% optimize.opt_a.renormalize : 0.017342s : 2.02% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.01% optimize.opt_a.cse : 0.000259s : 0.03% optimize.opt_a.a_3 : 0.000464s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.01% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000510s : 0.06% optimize.opt_b.b_1 : 0.000192s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000443s : 0.05% optimize.opt_after_cconv.c_1 : 0.000050s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.00% optimize.tuple_transform.d_1 : 0.000068s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.01% optimize.cse_after_recomputation.cse : 0.000024s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000490s : 0.06% validate : 0.000049s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.803493s : 93.64% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000781 218 5.93% : 0.000046s : 11: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.64% : 0.000005s : 5: substitution.float_depend_g_call 0.62% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.31% : 0.000002s : 2: substitution.incorporate_call_switch 55.58% : 0.000434s : 16: substitution.inline 2.09% : 0.000016s : 2: substitution.inline_without_move 1.40% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000016s : 3: substitution.less_batch_normalization 1.83% : 0.000014s : 11: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.14% : 0.000025s : 10: substitution.replace_applicator 1.37% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.58% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.42% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.20% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.026346 2 94.46% : 0.024886s : 1: type_inference.infer 5.54% : 0.001460s : 1: type_inference.specialize ------[replace.] 0.000208 30 59.50% : 0.000124s : 16: replace.inline 40.50% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000457 30 93.21% : 0.000426s : 16: match.inline 6.79% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.32% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.21% : 0.000017s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 244: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 164: predicate.load_eliminater 0.36% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 97: predicate.partial_defer_inline 1.69% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.60% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.45% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.70% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.51% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.16% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001671 32 58.35% : 0.000975s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.65% : 0.000696s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.916187 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.35% : 0.003201s : 1: add_attr 0.35% : 0.003192s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000523s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000057s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000452s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000519s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.53% : 0.004857s : 117: opt.transform.opt_a 0.01% : 0.000049s : 1: opt.transform.opt_after_cconv 0.00% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000178s : 28: opt.transform.opt_b 0.01% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000056s : 4: opt.transform.symbol_engine_opt 2.78% : 0.025494s : 1: opt_a 0.02% : 0.000145s : 1: opt_after_cconv 0.05% : 0.000499s : 1: opt_after_jit_grad 0.03% : 0.000299s : 1: opt_b 3.05% : 0.027904s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000054s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 1.72% : 0.015804s : 2: renormalize.infer 0.17% : 0.001523s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000053s : 1: rewriter_after_opt_a 0.01% : 0.000134s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000050s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000106s : 1: symbol_engine_optimizer 87.70% : 0.803517s : 1: task_emit 0.01% : 0.000107s : 1: 
tuple_transform 2.89% : 0.026448s : 1: type_inference 0.01% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x2-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x3-pynative],max_mem:6.0M TotalTime = 0.0247535, [24] [bootstrap]: 0.0005717 [type_inference]: 0.00710288 [event_method]: 1.551e-05 [auto_monad]: 6.535e-05 [graph_reusing]: 6.17001e-06 [inline]: 2.20002e-06 [add_attr]: 0.00413927, [1] [add_attr_with_inline]: 0.0041264, [1] [Cycle 1]: 5.44e-05, [2] [tag_attr]: 1.732e-05 [meta_addattr_fg_expand]: 5.00999e-06 [parallel-infer-symbol]: 3.46001e-06 [pre_auto_parallel]: 3.402e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.94999e-06 [optimize]: 0.00450088, [53] [py_interpret_to_execute]: 2.402e-05 [rewriter_before_opt_a]: 6.544e-05 [opt_a]: 0.00237679, [2] [Cycle 1]: 0.00175468, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.395e-05 [loop_unroll]: 2.14e-05 [a_1]: 0.00048315 [with_stream_mark]: 1.79e-05 [recompute_prepare]: 8.37e-06 [updatestate_depend_eliminate]: 4.37e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 1.71998e-06 [a_2]: 7.814e-05 [accelerated_algorithm]: 6.59001e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 8.47998e-06 [auto_parallel]: 6.44001e-06 [parallel]: 2.976e-05 [flash_sp]: 7.95e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 9.83998e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 8.13001e-06 [virtual_dataset]: 6.39001e-06 
[get_grad_eliminate_]: 6.18002e-06 [virtual_output]: 6.19999e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.137e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 1.003e-05 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.34999e-06 [receive_attached]: 2.51e-06 [after_resolve]: 1.072e-05 [a_after_grad]: 9.22001e-06 [renormalize]: 0.00058627 [add_forward_monad_depend]: 4.92e-06 [auto_monad_grad]: 2.22001e-06 [auto_monad_eliminator]: 1.472e-05 [cse]: 3.101e-05 [a_3]: 4.377e-05 [Cycle 2]: 0.00061244, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 7.23e-06 [loop_unroll]: 5.96e-06 [a_1]: 0.00012946 [with_stream_mark]: 1.059e-05 [recompute_prepare]: 6.02999e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.841e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.49998e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.29998e-06 [parallel]: 4.85999e-06 [flash_sp]: 3.53e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.58998e-06 [matmul_add_comm_reduction]: 5.71003e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.30002e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.73998e-06 [cell_reuse_recompute_pass]: 1.61998e-06 [offload_activation]: 6.33e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.81998e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18998e-06 [meta_fg_expand]: 1.80001e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.45001e-06 [renormalize]: 7.00238e-08 
[add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.87002e-06 [cse]: 1.516e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 8.18001e-06 [slice_cell_reuse_recomputed_activation]: 2.51e-06 [rewriter_after_opt_a]: 3.466e-05 [convert_after_rewriter]: 7.65998e-06 [order_py_execute_after_rewriter]: 5.04998e-06 [mutable_eliminate]: 0.00060108 [opt_b]: 0.00019404, [1] [Cycle 1]: 0.0001877, [7] [b_1]: 0.00011458 [b_2]: 8.08999e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.67001e-06 [renormalize]: 4.50003e-07 [cse]: 1.873e-05 [optimize_parallel_all_gather_comm]: 2.015e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.579e-05 [loop_unroll]: 0.00045853 [opt_after_cconv]: 0.00010106, [1] [Cycle 1]: 9.526e-05, [7] [c_1]: 2.957e-05 [parameter_eliminate]: 2.86999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.865e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.297e-05 [tuple_transform]: 7.65e-05, [1] [Cycle 1]: 7.136e-05, [4] [d_1]: 4.361e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.76999e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 6.186e-05 [cse_after_recomputation]: 2.387e-05, [1] [Cycle 1]: 1.895e-05, [1] [cse]: 1.329e-05 [environ_conv]: 6.01e-06 [swap_dp_allreduce_reducescatter]: 5.66003e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.56e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.04998e-06 
[add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.59e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.237e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 3.91999e-06 [overlap_recompute_and_grad_model_parallel]: 4.58999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 3.95998e-06 [overlap_grad_flash_sp]: 1.85e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.75997e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 7.261e-05, [1] [Cycle 1]: 6.809e-05, [6] [build]: 3.63e-06 [elim_shapecalc]: 9.03002e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 6.78e-06 [fold_const_symbol]: 9.15001e-06 [renormalize]: 1.90019e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 1.712e-05 [get_jit_bprop_graph]: 1.23002e-06 [rewriter_after_jit_bprop_graph]: 0.00013147 [opt_after_jit_grad]: 0.00049639 [validate]: 3.838e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.00738231 [execute]: 8.70001e-06 Sums bootstrap : 0.000572s : 2.92% type_inference : 0.007103s : 36.24% event_method : 0.000016s : 0.08% auto_monad : 0.000065s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000034s : 0.17% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.12% optimize.rewriter_before_opt_a : 0.000065s : 
0.33% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.21% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000613s : 3.13% optimize.opt_a.with_stream_mark : 0.000028s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.75% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000035s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.03% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.10% optimize.opt_a.a_after_grad : 0.000018s : 0.09% optimize.opt_a.renormalize : 0.000586s : 2.99% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.11% optimize.opt_a.cse : 0.000046s : 0.24% optimize.opt_a.a_3 : 0.000076s : 0.39% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.18% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000601s : 3.07% optimize.opt_b.b_1 : 0.000115s : 0.58% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.13% optimize.loop_unroll : 0.000459s : 2.34% optimize.opt_after_cconv.c_1 : 0.000030s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000044s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s 
: 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000062s : 0.32% optimize.cse_after_recomputation.cse : 0.000013s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000131s : 0.67% opt_after_jit_grad : 0.000496s : 2.53% validate : 0.000038s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.007382s : 37.66% execute : 0.000009s : 0.04% Time group info: ------[substitution.] 0.000186 30 14.42% : 0.000027s : 5: substitution.arithmetic_simplify 0.94% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 3.41% : 0.000006s : 4: substitution.graph_param_transform 68.22% : 0.000127s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000005s : 4: substitution.remove_not_recompute_node 2.08% : 0.000004s : 4: substitution.replace_old_param 6.08% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007039 2 90.36% : 0.006361s : 1: type_inference.infer 9.64% : 0.000679s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.25% : 0.000027s : 3: replace.inline 30.75% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000135 5 92.48% : 0.000125s : 3: match.inline 7.52% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000165 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.74% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.94% : 0.000002s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.06% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.96% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.01% : 0.000003s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.36% : 0.000001s : 4: predicate.graph_param_transform 0.61% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.28% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.83% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.51% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.36% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.28% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.96% : 0.000002s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000002s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 16: predicate.switch_defer_inline 1.90% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.88% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.80% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.57% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000431 8 43.19% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.81% : 0.000245s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.035124 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.80% : 0.004144s : 1: add_attr 11.76% : 0.004131s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000070s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.74% : 0.000613s : 1: bootstrap 0.08% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.33% : 0.000468s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000611s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.82% : 0.000991s : 78: opt.transform.opt_a 0.08% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.78% : 0.002380s : 1: opt_a 0.30% : 0.000105s : 1: opt_after_cconv 1.44% : 0.000507s : 1: opt_after_jit_grad 0.56% : 0.000198s : 1: opt_b 12.83% : 0.004505s : 1: optimize 0.07% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000038s : 1: pre_auto_parallel 0.08% : 0.000028s : 1: py_interpret_to_execute 0.03% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.87% : 0.000306s : 1: renormalize.infer 0.78% : 0.000273s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.39% : 0.000138s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000039s : 1: rewriter_after_opt_a 0.20% : 0.000070s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.21% : 0.000075s : 1: symbol_engine_optimizer 21.06% : 0.007396s : 1: task_emit 0.23% : 0.000079s : 1: 
tuple_transform 20.27% : 0.007120s : 1: type_inference 0.19% : 0.000068s : 1: validate TotalTime = 0.0213702, [24] [bootstrap]: 0.00083741 [type_inference]: 0.00565201 [event_method]: 1.162e-05 [auto_monad]: 5.396e-05 [graph_reusing]: 5.56e-06 [inline]: 1.96003e-06 [add_attr]: 0.00330436, [1] [add_attr_with_inline]: 0.00329423, [1] [Cycle 1]: 5.452e-05, [2] [tag_attr]: 1.435e-05 [meta_addattr_fg_expand]: 3.43999e-06 [parallel-infer-symbol]: 3.29001e-06 [pre_auto_parallel]: 2.731e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00409932, [53] [py_interpret_to_execute]: 1.762e-05 [rewriter_before_opt_a]: 4.336e-05 [opt_a]: 0.00210018, [2] [Cycle 1]: 0.00143041, [45] [expand_dump_flag]: 3.26999e-06 [switch_simplify]: 2.511e-05 [loop_unroll]: 1.344e-05 [a_1]: 0.00031329 [with_stream_mark]: 1.641e-05 [recompute_prepare]: 7.56001e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.50998e-06 [updatestate_loads_eliminate]: 3.23998e-06 [parameter_eliminate]: 1.92999e-06 [a_2]: 7.649e-05 [accelerated_algorithm]: 7.09001e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.28001e-06 [auto_parallel]: 6.04999e-06 [parallel]: 2.464e-05 [flash_sp]: 8.55999e-06 [merge_comm]: 3.77998e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 7.88001e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.41998e-06 [virtual_output]: 5.94e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 1.016e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.17e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.89001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.57001e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 
2.99999e-06 [after_resolve]: 1.105e-05 [a_after_grad]: 8.67e-06 [renormalize]: 0.00045861 [add_forward_monad_depend]: 5.12e-06 [auto_monad_grad]: 2.58e-06 [auto_monad_eliminator]: 1.403e-05 [cse]: 2.903e-05 [a_3]: 4.261e-05 [Cycle 2]: 0.00065934, [45] [expand_dump_flag]: 1.29e-06 [switch_simplify]: 7.4e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012719 [with_stream_mark]: 1.13e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.966e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.54e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.68002e-06 [merge_send_recv]: 4.74e-06 [auto_parallel]: 5.43002e-06 [parallel]: 4.92999e-06 [flash_sp]: 3.61999e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 6.95998e-06 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 8.04002e-06 [virtual_dataset]: 6.48e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.24001e-06 [cell_reuse_recompute_pass]: 2.11e-06 [offload_activation]: 7.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.173e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.77999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.44e-06 [after_resolve]: 1.001e-05 [a_after_grad]: 8.09002e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.49e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 6.64001e-06 [cse]: 1.355e-05 [a_3]: 3.142e-05 [py_interpret_to_execute_after_opt_a]: 9.63002e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 [rewriter_after_opt_a]: 3.946e-05 [convert_after_rewriter]: 7.93001e-06 [order_py_execute_after_rewriter]: 5.94999e-06 [mutable_eliminate]: 0.00052623 [opt_b]: 0.00022801, [1] [Cycle 1]: 0.00022182, [7] 
[b_1]: 0.00014741 [b_2]: 7.51999e-06 [updatestate_depend_eliminate]: 6.63998e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 4.19997e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.77e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.635e-05 [loop_unroll]: 0.00043086 [opt_after_cconv]: 9.689e-05, [1] [Cycle 1]: 9.111e-05, [7] [c_1]: 2.865e-05 [parameter_eliminate]: 2.69001e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.623e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.365e-05 [tuple_transform]: 7.041e-05, [1] [Cycle 1]: 6.593e-05, [4] [d_1]: 4.007e-05 [none_parameter_eliminate]: 1.51002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 2.58003e-06 [add_recomputation]: 4.785e-05 [cse_after_recomputation]: 1.948e-05, [1] [Cycle 1]: 1.504e-05, [1] [cse]: 1.015e-05 [environ_conv]: 4.70001e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.29003e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.239e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.961e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.979e-05, [1] [Cycle 1]: 6.566e-05, [6] [build]: 2.74999e-06 [elim_shapecalc]: 8.63001e-06 [elim_not_effective]: 1.214e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.25002e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.7e-05 [get_jit_bprop_graph]: 1.60999e-06 [rewriter_after_jit_bprop_graph]: 3.99002e-06 [opt_after_jit_grad]: 0.00046409 [validate]: 3.61e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00660993 [execute]: 8.15e-06 Sums bootstrap : 0.000837s : 4.92% type_inference : 0.005652s : 33.22% event_method : 0.000012s : 0.07% auto_monad : 0.000054s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.10% optimize.rewriter_before_opt_a : 0.000043s : 0.25% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.19% optimize.opt_a.loop_unroll : 0.000019s : 0.11% optimize.opt_a.a_1 : 0.000440s : 2.59% optimize.opt_a.with_stream_mark : 0.000028s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000008s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000459s : 2.70% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000039s : 0.23% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000526s : 3.09% optimize.opt_b.b_1 : 0.000147s : 0.87% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.15% optimize.loop_unroll : 0.000431s : 2.53% optimize.opt_after_cconv.c_1 : 0.000029s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000003s : 0.02% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000464s : 2.73% validate : 0.000036s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006610s : 38.85% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000138 26 17.89% : 0.000025s : 4: substitution.arithmetic_simplify 1.62% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000006s : 4: substitution.graph_param_transform 66.25% : 0.000091s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.47% : 0.000005s : 4: substitution.remove_not_recompute_node 3.27% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.005605 2 93.20% : 0.005224s : 1: type_inference.infer 6.80% : 0.000381s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000090 2 100.00% : 0.000090s : 2: match.inline ------[predicate.] 
0.000175 984 0.73% : 0.000001s : 9: predicate.accumulaten_eliminater 0.90% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.58% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.99% : 0.000003s : 17: predicate.arithmetic_simplify 0.67% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.49% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.65% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.72% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.63% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.86% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.82% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.86% : 0.000002s : 13: predicate.environ_get_depend_swap 1.40% : 0.000002s : 21: predicate.environ_get_eliminate 0.83% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.76% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.54% : 0.000003s : 11: predicate.float_depend_g_call 0.51% : 0.000001s : 8: predicate.float_environ_get_switch 0.79% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.65% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 5.17% : 0.000009s : 44: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 8: predicate.less_batch_normalization 1.36% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.67% : 0.000003s : 26: predicate.load_eliminater 1.43% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.34% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.47% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 8: predicate.merge_addn 0.57% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.57% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.55% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.12% : 0.000002s : 11: predicate.partial_defer_inline 0.95% : 0.000002s : 13: predicate.partial_eliminate 0.61% : 0.000001s : 9: predicate.print_const_string_wrapper 0.60% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000002s : 9: predicate.reduce_eliminate 1.67% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.12% : 0.000002s : 17: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000001s : 4: predicate.reset_defer_inline 0.59% : 0.000001s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.34% : 0.000001s : 4: predicate.row_tensor_eliminate 0.71% : 0.000001s : 8: predicate.same_eliminate 0.45% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000002s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 1.27% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.79% : 0.000001s : 11: predicate.switch_defer_inline 1.32% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.73% : 0.000007s : 41: predicate.switch_simplify 0.61% : 
0.000001s : 9: predicate.tile_eliminate 0.64% : 0.000001s : 9: predicate.transpose_eliminate 1.28% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.16% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 20.36% : 0.000036s : 25: predicate.tuple_list_get_item_eliminator 1.16% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.84% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.22% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.66% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.38% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000279 6 42.44% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.56% : 0.000161s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030228 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.95% : 0.003309s : 1: add_attr 10.91% : 0.003298s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000059s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.91% : 0.000880s : 1: bootstrap 0.10% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.46% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000536s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.66% : 0.000805s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.42% : 0.000127s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002103s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.57% : 0.000474s : 1: opt_after_jit_grad 0.77% : 0.000232s : 1: opt_b 13.57% : 0.004103s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000006s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.07% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.87% : 0.000263s : 1: renormalize.infer 0.62% : 0.000188s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.15% : 0.000044s : 1: rewriter_after_opt_a 0.16% : 0.000048s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 21.91% : 0.006623s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 18.76% : 0.005670s : 1: type_inference 0.22% : 0.000067s : 1: validate TotalTime = 0.0413886, [24] [bootstrap]: 0.000499 [type_inference]: 0.0260512 [event_method]: 1.433e-05 [auto_monad]: 5.549e-05 [graph_reusing]: 5.74999e-06 [inline]: 1.96e-06 [add_attr]: 0.00335304, [1] [add_attr_with_inline]: 0.0033422, [1] [Cycle 1]: 5.809e-05, [2] [tag_attr]: 1.656e-05 [meta_addattr_fg_expand]: 4.74998e-06 [parallel-infer-symbol]: 3.27002e-06 [pre_auto_parallel]: 3.021e-05 [insert-virtual-dataset]: 2.97002e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0042889, [53] [py_interpret_to_execute]: 2.331e-05 [rewriter_before_opt_a]: 6.235e-05 [opt_a]: 0.00230372, [2] [Cycle 1]: 0.00166394, [45] [expand_dump_flag]: 2.65002e-06 [switch_simplify]: 3.315e-05 [loop_unroll]: 2.122e-05 [a_1]: 0.00046849 [with_stream_mark]: 1.527e-05 [recompute_prepare]: 7.45998e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.681e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 6.46e-06 [parallel]: 1.778e-05 [flash_sp]: 8.17998e-06 [merge_comm]: 3.41999e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 9.57999e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.10998e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.78997e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 1.011e-05 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.32001e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 9.19e-06 [renormalize]: 0.00052904 [add_forward_monad_depend]: 5.14e-06 [auto_monad_grad]: 2.69999e-06 [auto_monad_eliminator]: 1.458e-05 [cse]: 2.92e-05 [a_3]: 4.321e-05 [Cycle 2]: 0.00062859, [45] [expand_dump_flag]: 1.67999e-06 [switch_simplify]: 7.25e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00012664 [with_stream_mark]: 1.222e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 3.05998e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.781e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 6.13998e-06 [merge_send_recv]: 4.60001e-06 [auto_parallel]: 5.89999e-06 [parallel]: 4.89e-06 [flash_sp]: 3.16999e-06 [merge_comm]: 3.23998e-06 [allreduce_fusion]: 2.89999e-06 [matmul_add_comm_reduction]: 1.856e-05 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.78e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.86e-06 [cell_reuse_recompute_pass]: 1.68002e-06 [offload_activation]: 6.77002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.091e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 2.04e-06 [flash_sp_send_recv_attached]: 9.99979e-07 [receive_attached]: 1.67999e-06 [after_resolve]: 9.57001e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.77001e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 6.69999e-06 [cse]: 1.731e-05 [a_3]: 3.273e-05 [py_interpret_to_execute_after_opt_a]: 9.35001e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.422e-05 [convert_after_rewriter]: 7.13998e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00053725 [opt_b]: 0.00018443, [1] [Cycle 1]: 0.00017776, [7] 
[b_1]: 0.00010808 [b_2]: 6.78e-06 [updatestate_depend_eliminate]: 5.69999e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.14e-06 [renormalize]: 2.50002e-07 [cse]: 1.696e-05 [optimize_parallel_all_gather_comm]: 1.811e-05 [overlap_param_gather]: 2.55002e-06 [cconv]: 2.49e-05 [loop_unroll]: 0.00043735 [opt_after_cconv]: 9.52e-05, [1] [Cycle 1]: 8.96e-05, [7] [c_1]: 2.824e-05 [parameter_eliminate]: 2.43002e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.621e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.307e-05 [tuple_transform]: 7.027e-05, [1] [Cycle 1]: 6.596e-05, [4] [d_1]: 3.981e-05 [none_parameter_eliminate]: 1.69998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.755e-05 [cse_after_recomputation]: 2.078e-05, [1] [Cycle 1]: 1.632e-05, [1] [cse]: 1.119e-05 [environ_conv]: 4.66002e-06 [swap_dp_allreduce_reducescatter]: 5.79e-06 [bias_add_comm_swap]: 2.80002e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 9.10019e-07 [remove_cast_before_assign_add]: 1.23002e-06 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.27999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09999e-06 [control_data_broadcast_order]: 1.178e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.37e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.952e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 7.069e-05, [1] [Cycle 1]: 6.605e-05, [6] [build]: 3.26999e-06 [elim_shapecalc]: 8.90999e-06 [elim_not_effective]: 1.241e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.71002e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 1.602e-05 [get_jit_bprop_graph]: 1.27e-06 [rewriter_after_jit_bprop_graph]: 4.12003e-06 [opt_after_jit_grad]: 0.00047151 [validate]: 3.467e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00632392 [execute]: 8.05e-06 Sums bootstrap : 0.000499s : 1.35% type_inference : 0.026051s : 70.34% event_method : 0.000014s : 0.04% auto_monad : 0.000055s : 0.15% graph_reusing : 0.000006s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000030s : 0.08% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.06% optimize.rewriter_before_opt_a : 0.000062s : 0.17% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.11% optimize.opt_a.loop_unroll : 0.000027s : 0.07% optimize.opt_a.a_1 : 0.000595s : 1.61% optimize.opt_a.with_stream_mark : 0.000027s : 0.07% optimize.opt_a.recompute_prepare : 0.000013s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.02% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.39% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.03% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.03% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.03% optimize.opt_a.parallel : 0.000023s : 0.06% optimize.opt_a.flash_sp : 0.000011s : 0.03% optimize.opt_a.merge_comm : 0.000007s : 0.02% optimize.opt_a.allreduce_fusion : 0.000006s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000028s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.04% optimize.opt_a.virtual_dataset : 0.000011s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.03% optimize.opt_a.virtual_output : 0.000011s : 0.03% optimize.opt_a.merge_forward : 0.000007s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.02% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.05% optimize.opt_a.a_after_grad : 0.000017s : 0.05% optimize.opt_a.renormalize : 0.000529s : 1.43% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.02% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.06% optimize.opt_a.cse : 0.000047s : 0.13% optimize.opt_a.a_3 : 0.000076s : 0.21% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.09% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000537s : 1.45% optimize.opt_b.b_1 : 0.000108s : 0.29% optimize.opt_b.b_2 : 0.000007s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.05% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000025s : 0.07% optimize.loop_unroll : 0.000437s : 1.18% optimize.opt_after_cconv.c_1 : 0.000028s : 0.08% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.04% optimize.tuple_transform.d_1 : 0.000040s : 0.11% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.13% optimize.cse_after_recomputation.cse : 0.000011s : 0.03% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000020s : 0.05% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.04% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000472s : 1.27% validate : 0.000035s : 0.09% backend_pass : 0.000001s : 0.00% task_emit : 0.006324s : 17.07% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 0.000181 30 15.18% : 0.000027s : 5: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000006s : 4: substitution.graph_param_transform 66.74% : 0.000121s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000005s : 4: substitution.remove_not_recompute_node 2.34% : 0.000004s : 4: substitution.replace_old_param 6.30% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.026001 2 97.73% : 0.025410s : 1: type_inference.infer 2.27% : 0.000591s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.39% : 0.000029s : 3: replace.inline 29.61% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000129 5 92.00% : 0.000119s : 3: match.inline 8.00% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 1.02% : 0.000002s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.24% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.66% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.49% : 0.000011s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.29% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.60% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.28% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000002s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000002s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.86% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.54% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.21% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000360 8 41.04% : 0.000148s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.96% : 0.000212s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.050679 196 0.01% : 0.000004s : 1: ForceFp32Comm 6.62% : 0.003357s : 1: add_attr 6.60% : 0.003346s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.12% : 0.000061s : 1: auto_monad 0.04% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.05% : 0.000533s : 1: bootstrap 0.06% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.05% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.04% : 0.000021s : 1: event_method 0.03% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.88% : 0.000446s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.08% : 0.000547s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000014s : 1: opt.transform.mutable_eliminate 1.91% : 0.000969s : 78: opt.transform.opt_a 0.05% : 0.000027s : 1: opt.transform.opt_after_cconv 0.04% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.18% : 0.000090s : 28: opt.transform.opt_b 0.09% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000033s : 4: opt.transform.symbol_engine_opt 4.55% : 0.002307s : 1: opt_a 0.19% : 0.000099s : 1: opt_after_cconv 0.95% : 0.000482s : 1: opt_after_jit_grad 0.37% : 0.000188s : 1: opt_b 8.47% : 0.004293s : 1: optimize 0.04% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.04% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000035s : 1: pre_auto_parallel 0.05% : 0.000027s : 1: py_interpret_to_execute 0.03% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000017s : 1: remove_dup_value 0.56% : 0.000282s : 1: renormalize.infer 0.47% : 0.000238s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000038s : 1: rewriter_after_opt_a 0.13% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.14% : 0.000073s : 1: symbol_engine_optimizer 12.50% : 0.006336s : 1: task_emit 0.14% : 0.000073s : 1: 
tuple_transform 51.44% : 0.026070s : 1: type_inference 0.13% : 0.000065s : 1: validate TotalTime = 0.0766386, [24] [bootstrap]: 0.00058956 [type_inference]: 0.0294992 [event_method]: 5.002e-05 [auto_monad]: 0.00013292 [graph_reusing]: 8.64e-06 [inline]: 2.19001e-06 [add_attr]: 0.00339465, [1] [add_attr_with_inline]: 0.00338262, [1] [Cycle 1]: 8.374e-05, [2] [tag_attr]: 3.694e-05 [meta_addattr_fg_expand]: 9.75002e-06 [parallel-infer-symbol]: 4.33001e-06 [pre_auto_parallel]: 5.574e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.0318773, [53] [py_interpret_to_execute]: 4.401e-05 [rewriter_before_opt_a]: 0.00014963 [opt_a]: 0.0124882, [3] [Cycle 1]: 0.00793331, [45] [expand_dump_flag]: 3.55e-06 [switch_simplify]: 7.577e-05 [loop_unroll]: 6.16e-05 [a_1]: 0.0015078 [with_stream_mark]: 3.322e-05 [recompute_prepare]: 2.502e-05 [updatestate_depend_eliminate]: 9.64999e-06 [updatestate_assign_eliminate]: 8.67e-06 [updatestate_loads_eliminate]: 7.21999e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.00024856 [accelerated_algorithm]: 3.334e-05 [shard]: 2.43e-06 [meta_shard_fg_expand]: 3.88999e-06 [shard_inline]: 1.698e-05 [merge_send_recv]: 1.723e-05 [auto_parallel]: 1.237e-05 [parallel]: 1.965e-05 [flash_sp]: 1.352e-05 [merge_comm]: 1.014e-05 [allreduce_fusion]: 8.81002e-06 [matmul_add_comm_reduction]: 3.135e-05 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 1.831e-05 [virtual_dataset]: 1.565e-05 [get_grad_eliminate_]: 1.61e-05 [virtual_output]: 1.51e-05 [merge_forward]: 1.004e-05 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 1.85e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.981e-05 [merge_recompute_call_nodes]: 1.87001e-06 [before_grad]: 2.805e-05 [set_forward_comm_id_for_comm_node_pass]: 1.035e-05 [meta_fg_expand]: 0.00162175 [flash_sp_send_recv_attached]: 4.17998e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 
6.439e-05 [a_after_grad]: 8.515e-05 [renormalize]: 0.00291027 [add_forward_monad_depend]: 1.247e-05 [auto_monad_grad]: 6.80998e-06 [auto_monad_eliminator]: 6.16e-05 [cse]: 0.00017395 [a_3]: 0.00035605 [Cycle 2]: 0.00352766, [45] [expand_dump_flag]: 2.35002e-06 [switch_simplify]: 4.952e-05 [loop_unroll]: 4.528e-05 [a_1]: 0.00170332 [with_stream_mark]: 2.051e-05 [recompute_prepare]: 1.363e-05 [updatestate_depend_eliminate]: 6.61e-06 [updatestate_assign_eliminate]: 5.61e-06 [updatestate_loads_eliminate]: 4.45e-06 [parameter_eliminate]: 1.97999e-06 [a_2]: 0.00014177 [accelerated_algorithm]: 1.356e-05 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 2.52001e-06 [shard_inline]: 1.026e-05 [merge_send_recv]: 9.82001e-06 [auto_parallel]: 1.139e-05 [parallel]: 8.26002e-06 [flash_sp]: 3.35e-06 [merge_comm]: 6.69999e-06 [allreduce_fusion]: 6.63998e-06 [matmul_add_comm_reduction]: 1.226e-05 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 1.152e-05 [virtual_dataset]: 9.80002e-06 [get_grad_eliminate_]: 9.52001e-06 [virtual_output]: 9.31e-06 [merge_forward]: 5.76e-06 [cell_reuse_recompute_pass]: 1.60001e-06 [offload_activation]: 1.271e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.954e-05 [merge_recompute_call_nodes]: 1.18001e-06 [before_grad]: 1.686e-05 [set_forward_comm_id_for_comm_node_pass]: 6.24999e-06 [meta_fg_expand]: 8.91e-05 [flash_sp_send_recv_attached]: 1.81e-06 [receive_attached]: 2.76999e-06 [after_resolve]: 1.839e-05 [a_after_grad]: 1.648e-05 [renormalize]: 0.00078777 [add_forward_monad_depend]: 5.68002e-06 [auto_monad_grad]: 2.56e-06 [auto_monad_eliminator]: 1.868e-05 [cse]: 5.761e-05 [a_3]: 7.398e-05 [Cycle 3]: 0.00100854, [45] [expand_dump_flag]: 2.14e-06 [switch_simplify]: 1.13e-05 [loop_unroll]: 9.77999e-06 [a_1]: 0.00028664 [with_stream_mark]: 1.33e-05 [recompute_prepare]: 1.009e-05 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 4.79e-06 [updatestate_loads_eliminate]: 4.63999e-06 [parameter_eliminate]: 
1.30999e-06 [a_2]: 0.00013684 [accelerated_algorithm]: 1.279e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 2.41e-06 [shard_inline]: 1.038e-05 [merge_send_recv]: 7.83001e-06 [auto_parallel]: 8.97e-06 [parallel]: 6.73e-06 [flash_sp]: 9.80013e-07 [merge_comm]: 6.06998e-06 [allreduce_fusion]: 5.30999e-06 [matmul_add_comm_reduction]: 9.98002e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.172e-05 [virtual_dataset]: 9.97999e-06 [get_grad_eliminate_]: 9.47999e-06 [virtual_output]: 9.22999e-06 [merge_forward]: 5.53002e-06 [cell_reuse_recompute_pass]: 2.14e-06 [offload_activation]: 1.044e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.851e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 1.61e-05 [set_forward_comm_id_for_comm_node_pass]: 6.03998e-06 [meta_fg_expand]: 3.51999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.56002e-06 [after_resolve]: 1.467e-05 [a_after_grad]: 1.571e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.50999e-06 [auto_monad_grad]: 1.60001e-06 [auto_monad_eliminator]: 1.247e-05 [cse]: 3.089e-05 [a_3]: 6.751e-05 [py_interpret_to_execute_after_opt_a]: 1.662e-05 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 6.622e-05 [convert_after_rewriter]: 1.096e-05 [order_py_execute_after_rewriter]: 7.61001e-06 [mutable_eliminate]: 0.0167753 [opt_b]: 0.00040929, [1] [Cycle 1]: 0.00039563, [7] [b_1]: 0.00022998 [b_2]: 1.323e-05 [updatestate_depend_eliminate]: 1.967e-05 [updatestate_assign_eliminate]: 5.61e-06 [updatestate_loads_eliminate]: 6.25002e-06 [renormalize]: 1.00001e-06 [cse]: 7.924e-05 [optimize_parallel_all_gather_comm]: 4.285e-05 [overlap_param_gather]: 2.41998e-06 [cconv]: 4.209e-05 [loop_unroll]: 0.00083408 [opt_after_cconv]: 0.00016181, [1] [Cycle 1]: 0.00015478, [7] [c_1]: 5.469e-05 [parameter_eliminate]: 5.87001e-06 [updatestate_depend_eliminate]: 9.14e-06 [updatestate_assign_eliminate]: 4.87998e-06 [updatestate_loads_eliminate]: 4.57e-06 
[cse]: 3.925e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 6.106e-05 [tuple_transform]: 0.00012115, [1] [Cycle 1]: 0.00011584, [4] [d_1]: 8.219e-05 [none_parameter_eliminate]: 1.94999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.115e-05 [partial_unused_args_eliminate]: 2.34999e-06 [add_recomputation]: 8.383e-05 [cse_after_recomputation]: 3.569e-05, [1] [Cycle 1]: 3.066e-05, [1] [cse]: 2.487e-05 [environ_conv]: 1.245e-05 [swap_dp_allreduce_reducescatter]: 9.15999e-06 [bias_add_comm_swap]: 3.55998e-06 [label_micro_interleaved_index]: 5.26998e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.43998e-06 [micro_interleaved_order_control]: 2.48998e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 1.04003e-06 [remove_cast_before_assign_add]: 1.25001e-06 [full_micro_interleaved_order_control]: 2.55002e-06 [reorder_send_recv_between_fp_bp]: 3.13998e-06 [comm_op_add_attrs]: 1.19998e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.20001e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.993e-05 [grouped_pairwise_exchange_alltoall]: 1.81998e-06 [offloading_packed_experts]: 5.97999e-06 [overlap_recompute_and_grad_model_parallel]: 6.28e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 5.87999e-06 [overlap_grad_flash_sp]: 3.21e-05 [begin_end_overlap_inline]: 5.90022e-07 [split_matmul_comm_elemetwise]: 2.54001e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00011201, [1] [Cycle 1]: 0.0001074, [6] [build]: 1.433e-05 [elim_shapecalc]: 1.54e-05 [elim_not_effective]: 2.137e-05 [opt_reshape]: 1.092e-05 [fold_const_symbol]: 1.651e-05 [renormalize]: 1.8999e-07 [detach_backward]: 
2.61999e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.697e-05 [get_jit_bprop_graph]: 2.36998e-06 [rewriter_after_jit_bprop_graph]: 6.91001e-06 [opt_after_jit_grad]: 0.00054488 [validate]: 6.243e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0101042 [execute]: 9.39e-06 Sums bootstrap : 0.000590s : 0.82% type_inference : 0.029499s : 41.08% event_method : 0.000050s : 0.07% auto_monad : 0.000133s : 0.19% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000056s : 0.08% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000044s : 0.06% optimize.rewriter_before_opt_a : 0.000150s : 0.21% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000137s : 0.19% optimize.opt_a.loop_unroll : 0.000117s : 0.16% optimize.opt_a.a_1 : 0.003498s : 4.87% optimize.opt_a.with_stream_mark : 0.000067s : 0.09% optimize.opt_a.recompute_prepare : 0.000049s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000019s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.02% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000527s : 0.73% optimize.opt_a.accelerated_algorithm : 0.000060s : 0.08% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.01% optimize.opt_a.shard_inline : 0.000038s : 0.05% optimize.opt_a.merge_send_recv : 0.000035s : 0.05% optimize.opt_a.auto_parallel : 0.000033s : 0.05% optimize.opt_a.parallel : 0.000035s : 0.05% optimize.opt_a.flash_sp : 0.000018s : 0.02% optimize.opt_a.merge_comm : 0.000023s : 0.03% 
optimize.opt_a.allreduce_fusion : 0.000021s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000054s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000042s : 0.06% optimize.opt_a.virtual_dataset : 0.000035s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.05% optimize.opt_a.virtual_output : 0.000034s : 0.05% optimize.opt_a.merge_forward : 0.000021s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000042s : 0.06% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000068s : 0.09% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000061s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.03% optimize.opt_a.meta_fg_expand : 0.001714s : 2.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000007s : 0.01% optimize.opt_a.after_resolve : 0.000097s : 0.14% optimize.opt_a.a_after_grad : 0.000117s : 0.16% optimize.opt_a.renormalize : 0.003698s : 5.15% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.03% optimize.opt_a.auto_monad_grad : 0.000011s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000093s : 0.13% optimize.opt_a.cse : 0.000262s : 0.37% optimize.opt_a.a_3 : 0.000498s : 0.69% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000066s : 0.09% optimize.convert_after_rewriter : 0.000011s : 0.02% optimize.order_py_execute_after_rewriter : 0.000008s : 0.01% optimize.mutable_eliminate : 0.016775s : 23.36% optimize.opt_b.b_1 : 0.000230s : 0.32% optimize.opt_b.b_2 : 0.000013s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000020s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000006s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000079s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000043s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000042s : 0.06% optimize.loop_unroll : 0.000834s : 1.16% optimize.opt_after_cconv.c_1 : 0.000055s : 0.08% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.cse : 0.000039s : 0.05% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000061s : 0.09% optimize.tuple_transform.d_1 : 0.000082s : 0.11% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000084s : 0.12% optimize.cse_after_recomputation.cse : 0.000025s : 0.03% optimize.environ_conv : 0.000012s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000032s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.04% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.01% opt_after_jit_grad : 0.000545s : 0.76% validate : 0.000062s : 0.09% backend_pass : 0.000001s : 0.00% task_emit : 0.010104s : 14.07% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000948 231 5.90% : 0.000056s : 12: substitution.arithmetic_simplify 2.19% : 0.000021s : 4: substitution.cast_eliminate 0.32% : 0.000003s : 6: substitution.elim_not_effective 0.45% : 0.000004s : 5: substitution.float_depend_g_call 0.44% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000003s : 6: substitution.fold_const_symbol 1.02% : 0.000010s : 9: substitution.graph_param_transform 0.28% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 52.83% : 0.000501s : 17: substitution.inline 1.98% : 0.000019s : 2: substitution.inline_without_move 1.26% : 0.000012s : 22: substitution.j_node_and_user_rematch 1.92% : 0.000018s : 3: substitution.less_batch_normalization 1.49% : 0.000014s : 11: substitution.minmaximum_grad 0.64% : 0.000006s : 5: substitution.partial_eliminate 1.61% : 0.000015s : 22: substitution.remove_not_recompute_node 3.12% : 0.000030s : 10: substitution.replace_applicator 1.24% : 0.000012s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.16% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.56% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.93% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 13.74% : 0.000130s : 30: substitution.tuple_list_get_item_eliminator 2.12% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.029402 2 94.74% : 0.027856s : 1: type_inference.infer 5.26% : 0.001547s : 1: type_inference.specialize ------[replace.] 0.000229 33 57.87% : 0.000132s : 17: replace.inline 42.13% : 0.000096s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000585 33 83.88% : 0.000491s : 17: match.inline 16.12% : 0.000094s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000810 5998 1.03% : 0.000008s : 70: predicate.accumulaten_eliminater 0.33% : 0.000003s : 9: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 34: predicate.addn_check_dump 1.03% : 0.000008s : 70: predicate.addn_zero_filter 1.00% : 0.000008s : 70: predicate.adjust_all_reduce_mul_add 2.01% : 0.000016s : 104: predicate.arithmetic_simplify 1.19% : 0.000010s : 70: predicate.cast_eliminate 1.10% : 0.000009s : 71: predicate.check_bprop_eliminate 0.51% : 0.000004s : 34: predicate.compare_switch_simplify 0.09% : 0.000001s : 9: predicate.const_output_eliminate 0.51% : 0.000004s : 34: predicate.depend_value_elim 1.16% : 0.000009s : 70: predicate.dict_get_item_const_eliminator 1.18% : 0.000010s : 70: predicate.dict_get_item_eliminator 1.16% : 0.000009s : 70: predicate.dict_set_item_eliminator 0.51% : 0.000004s : 18: predicate.dumpgradient_eliminate 0.14% : 0.000001s : 9: predicate.elim_not_effective 0.18% : 0.000001s : 9: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000010s : 79: predicate.environ_add_const_eliminate 1.18% : 0.000010s : 79: predicate.environ_get_add_eliminate 1.19% : 0.000010s : 79: predicate.environ_get_depend_swap 1.79% : 0.000014s : 113: predicate.environ_get_eliminate 1.19% : 0.000010s : 79: predicate.environ_get_set_eliminate 1.66% : 0.000013s : 103: predicate.exchange_switch_depend_value 2.28% : 0.000018s : 103: predicate.float_depend_g_call 0.50% : 0.000004s : 34: predicate.float_environ_get_switch 0.69% : 0.000006s : 43: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 9: predicate.fold_const_symbol 0.55% : 0.000004s : 34: predicate.get_grad_eliminate 0.13% : 0.000001s : 9: predicate.graph_param_transform 0.54% : 0.000004s : 34: predicate.incorporate_call 0.49% : 0.000004s : 34: predicate.incorporate_call_switch 5.72% : 0.000046s : 259: predicate.inline 1.28% : 0.000010s : 57: predicate.inline_without_move 0.31% : 0.000003s : 34: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 34: predicate.less_batch_normalization 
1.67% : 0.000014s : 104: predicate.list_to_tuple_eliminator_ 2.65% : 0.000021s : 174: predicate.load_eliminater 0.42% : 0.000003s : 9: predicate.loop_unroll_after_grad 2.17% : 0.000018s : 138: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 88: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 34: predicate.merge_addn 1.08% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.07% : 0.000009s : 71: predicate.mini_step_allgather_replace 1.06% : 0.000009s : 70: predicate.minmaximum_grad 1.09% : 0.000009s : 9: predicate.mutable_eliminate 0.16% : 0.000001s : 9: predicate.opt_reshape 0.19% : 0.000002s : 9: predicate.parallel_virtual_node 2.02% : 0.000016s : 103: predicate.partial_defer_inline 1.67% : 0.000014s : 95: predicate.partial_eliminate 1.03% : 0.000008s : 70: predicate.print_const_string_wrapper 0.53% : 0.000004s : 34: predicate.reduce_all_const_elim 1.25% : 0.000010s : 70: predicate.reduce_eliminate 2.64% : 0.000021s : 174: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 34: predicate.remove_not_recompute_node 1.83% : 0.000015s : 157: predicate.replace_applicator 0.61% : 0.000005s : 57: predicate.replace_old_param 0.17% : 0.000001s : 9: predicate.reset_defer_inline 1.04% : 0.000008s : 70: predicate.reshape_eliminate 1.12% : 0.000009s : 71: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 9: predicate.row_tensor_eliminate 1.22% : 0.000010s : 71: predicate.same_eliminate 0.43% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 34: predicate.shard_identity_eliminate 0.32% : 0.000003s : 18: predicate.special_op_eliminate 0.63% : 0.000005s : 34: predicate.specialize_transform 1.25% : 0.000010s : 71: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 57: predicate.stack_unstack_eliminate 0.19% : 0.000002s : 9: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 103: predicate.switch_defer_inline 2.80% : 0.000023s : 174: predicate.switch_layer_defer_inline 4.84% : 
0.000039s : 284: predicate.switch_simplify 1.04% : 0.000008s : 70: predicate.tile_eliminate 1.01% : 0.000008s : 70: predicate.transpose_eliminate 1.51% : 0.000012s : 88: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000012s : 88: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000011s : 88: predicate.tuple_list_get_item_depend_reorder 3.00% : 0.000024s : 138: predicate.tuple_list_get_item_eliminator 1.45% : 0.000012s : 88: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000016s : 122: predicate.tuple_list_set_item_eliminator 1.61% : 0.000013s : 104: predicate.tuple_to_list_eliminator_ 2.55% : 0.000021s : 174: predicate.updatestate_pure_node_eliminater 3.18% : 0.000026s : 208: predicate.updatestate_useless_node_eliminater 0.20% : 0.000002s : 9: predicate.value_based_eliminate 0.56% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 34: predicate.virtual_output_eliminate 0.14% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.20% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001806 34 59.48% : 0.001074s : 13: func_graph_cloner_run.FuncGraphClonerGraph 40.52% : 0.000732s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.121301 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.80% : 0.003399s : 1: add_attr 2.79% : 0.003387s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000089s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000142s : 1: auto_monad 0.03% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000630s : 1: bootstrap 0.04% : 0.000046s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000023s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.03% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000016s : 1: environ_conv 0.05% : 0.000059s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000005s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.70% : 0.000845s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 13.86% : 0.016807s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.02% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000054s : 1: opt.transform.mutable_eliminate 4.35% : 0.005281s : 117: opt.transform.opt_a 0.04% : 0.000053s : 1: opt.transform.opt_after_cconv 0.03% : 0.000040s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000211s : 28: opt.transform.opt_b 0.08% : 0.000091s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000061s : 4: opt.transform.symbol_engine_opt 10.30% : 0.012491s : 1: opt_a 0.14% : 0.000165s : 1: opt_after_cconv 0.46% : 0.000557s : 1: opt_after_jit_grad 0.34% : 0.000414s : 1: opt_b 26.28% : 0.031882s : 1: optimize 0.04% : 0.000047s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.03% : 0.000035s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.05% : 0.000060s : 1: pre_auto_parallel 0.04% : 0.000048s : 1: py_interpret_to_execute 0.02% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000066s : 1: remove_dup_value 1.71% : 0.002078s : 2: renormalize.infer 1.32% : 0.001602s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000071s : 1: rewriter_after_opt_a 0.13% : 0.000155s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000115s : 1: symbol_engine_optimizer 8.34% : 0.010120s : 1: task_emit 0.10% : 0.000124s : 1: 
tuple_transform 24.34% : 0.029523s : 1: type_inference 0.09% : 0.000106s : 1: validate TotalTime = 0.0186656, [24] [bootstrap]: 0.0004047 [type_inference]: 0.00418823 [event_method]: 1.016e-05 [auto_monad]: 5.36e-05 [graph_reusing]: 5.92999e-06 [inline]: 2.06e-06 [add_attr]: 0.00307165, [1] [add_attr_with_inline]: 0.0030633, [1] [Cycle 1]: 4.205e-05, [2] [tag_attr]: 1.225e-05 [meta_addattr_fg_expand]: 3.4e-06 [parallel-infer-symbol]: 2.59999e-06 [pre_auto_parallel]: 2.203e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00373531, [53] [py_interpret_to_execute]: 1.542e-05 [rewriter_before_opt_a]: 3.889e-05 [opt_a]: 0.00189008, [2] [Cycle 1]: 0.00127288, [45] [expand_dump_flag]: 3.09999e-06 [switch_simplify]: 2.476e-05 [loop_unroll]: 1.364e-05 [a_1]: 0.00030045 [with_stream_mark]: 1.388e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.50998e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 2.07999e-06 [a_2]: 7.66e-05 [accelerated_algorithm]: 6.09999e-06 [shard]: 2.01998e-06 [meta_shard_fg_expand]: 1.76003e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 5.71e-06 [parallel]: 1.876e-05 [flash_sp]: 7.4e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.27999e-06 [allreduce_slice_to_reducescatter]: 1.02e-06 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.141e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.29998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 2.24999e-06 
[after_resolve]: 1.09e-05 [a_after_grad]: 8.52e-06 [renormalize]: 0.00034698 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.339e-05 [cse]: 2.754e-05 [a_3]: 3.999e-05 [Cycle 2]: 0.00060813, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 6.61e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012466 [with_stream_mark]: 2.035e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 7.79983e-07 [a_2]: 6.87e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 4.95999e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.05e-06 [flash_sp]: 3.13998e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 5.31998e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.11997e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.007e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.85998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.12001e-06 [a_after_grad]: 8e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 5.95002e-06 [cse]: 1.348e-05 [a_3]: 3.178e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 2.33998e-06 [rewriter_after_opt_a]: 3.386e-05 [convert_after_rewriter]: 6.52001e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00046134 [opt_b]: 0.00018218, [1] [Cycle 1]: 0.00017623, [7] [b_1]: 
0.00010756 [b_2]: 7.21001e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 4.10015e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.37e-05 [loop_unroll]: 0.00042463 [opt_after_cconv]: 9.539e-05, [1] [Cycle 1]: 8.948e-05, [7] [c_1]: 2.818e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.577e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.294e-05 [tuple_transform]: 6.97e-05, [1] [Cycle 1]: 6.512e-05, [4] [d_1]: 3.923e-05 [none_parameter_eliminate]: 1.94999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.02001e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.699e-05 [cse_after_recomputation]: 1.958e-05, [1] [Cycle 1]: 1.523e-05, [1] [cse]: 1.025e-05 [environ_conv]: 4.79e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.60002e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.47001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.49e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.219e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 3.81999e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.81998e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.908e-05, [1] [Cycle 1]: 6.486e-05, [6] [build]: 2.78e-06 [elim_shapecalc]: 8.64998e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.94e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.721e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.37002e-06 [opt_after_jit_grad]: 0.00045056 [validate]: 3.134e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00645717 [execute]: 7.03e-06 Sums bootstrap : 0.000405s : 2.77% type_inference : 0.004188s : 28.62% event_method : 0.000010s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000425s : 2.90% optimize.opt_a.with_stream_mark : 0.000034s : 0.23% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000347s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000034s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000461s : 3.15% optimize.opt_b.b_1 : 0.000108s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000425s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.32% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000017s : 0.12%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000451s : 3.08%
validate : 0.000031s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006457s : 44.12%
execute : 0.000007s : 0.05%
Time group info:
------[substitution.] 0.000127 26
    18.48% : 0.000023s : 4: substitution.arithmetic_simplify
    1.40% : 0.000002s : 2: substitution.elim_not_effective
    1.01% : 0.000001s : 2: substitution.fold_const_symbol
    4.40% : 0.000006s : 4: substitution.graph_param_transform
    65.97% : 0.000084s : 2: substitution.inline
    2.15% : 0.000003s : 4: substitution.j_node_and_user_rematch
    3.45% : 0.000004s : 4: substitution.remove_not_recompute_node
    3.16% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004147 2
    92.22% : 0.003825s : 1: type_inference.infer
    7.78% : 0.000323s : 1: type_inference.specialize
------[replace.] 0.000020 2
    100.00% : 0.000020s : 2: replace.inline
------[match.] 0.000082 2
    100.00% : 0.000082s : 2: match.inline
------[predicate.]
0.000138 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000003s : 17: predicate.arithmetic_simplify 0.72% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.35% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.51% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000229 6 37.87% : 0.000087s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.13% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026754 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.50% : 0.003076s : 1: add_attr 11.46% : 0.003067s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.08% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.62% : 0.000434s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000434s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000471s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.90% : 0.000777s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001893s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.72% : 0.000460s : 1: opt_after_jit_grad 0.69% : 0.000186s : 1: opt_b 13.98% : 0.003739s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000189s : 1: renormalize.infer 0.56% : 0.000151s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000037s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 24.18% : 0.006468s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 15.70% : 0.004201s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0440925, [24] [bootstrap]: 0.00057259 [type_inference]: 0.0135233 [event_method]: 5.014e-05 [auto_monad]: 0.00012363 [graph_reusing]: 8.28999e-06 [inline]: 2.02999e-06 [add_attr]: 0.00353879, [1] [add_attr_with_inline]: 0.00352844, [1] [Cycle 1]: 7.773e-05, [2] [tag_attr]: 3.392e-05 [meta_addattr_fg_expand]: 8.74998e-06 [parallel-infer-symbol]: 3.31001e-06 [pre_auto_parallel]: 5.115e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.0152927, [53] [py_interpret_to_execute]: 4.04e-05 [rewriter_before_opt_a]: 0.00013385 [opt_a]: 0.0125801, [3] [Cycle 1]: 0.00783486, [45] [expand_dump_flag]: 4.92e-06 [switch_simplify]: 6.86e-05 [loop_unroll]: 5.496e-05 [a_1]: 0.00138602 [with_stream_mark]: 2.707e-05 [recompute_prepare]: 2.256e-05 [updatestate_depend_eliminate]: 9.97999e-06 [updatestate_assign_eliminate]: 7.95998e-06 [updatestate_loads_eliminate]: 7.86001e-06 [parameter_eliminate]: 2.99001e-06 [a_2]: 0.00024825 [accelerated_algorithm]: 3.327e-05 [shard]: 2.68998e-06 [meta_shard_fg_expand]: 3.88999e-06 [shard_inline]: 1.641e-05 [merge_send_recv]: 1.764e-05 [auto_parallel]: 1.133e-05 [parallel]: 2.072e-05 [flash_sp]: 1.292e-05 [merge_comm]: 9.81e-06 [allreduce_fusion]: 8.88002e-06 [matmul_add_comm_reduction]: 2.924e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.914e-05 [virtual_dataset]: 1.614e-05 [get_grad_eliminate_]: 1.548e-05 [virtual_output]: 1.563e-05 [merge_forward]: 9.36e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.9e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.035e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 2.772e-05 [set_forward_comm_id_for_comm_node_pass]: 1.035e-05 [meta_fg_expand]: 0.00164523 [flash_sp_send_recv_attached]: 4.22e-06 [receive_attached]: 2.64001e-06 
[after_resolve]: 6.427e-05 [a_after_grad]: 8.368e-05 [renormalize]: 0.00292479 [add_forward_monad_depend]: 1.131e-05 [auto_monad_grad]: 6.23998e-06 [auto_monad_eliminator]: 6.074e-05 [cse]: 0.00018272 [a_3]: 0.00035546 [Cycle 2]: 0.00371053, [45] [expand_dump_flag]: 2.49001e-06 [switch_simplify]: 4.812e-05 [loop_unroll]: 4.483e-05 [a_1]: 0.0016295 [with_stream_mark]: 1.754e-05 [recompute_prepare]: 1.329e-05 [updatestate_depend_eliminate]: 6.16e-06 [updatestate_assign_eliminate]: 4.70999e-06 [updatestate_loads_eliminate]: 4.80001e-06 [parameter_eliminate]: 1.20001e-06 [a_2]: 0.00014157 [accelerated_algorithm]: 1.421e-05 [shard]: 1.64e-06 [meta_shard_fg_expand]: 3.14001e-06 [shard_inline]: 9.96e-06 [merge_send_recv]: 9.17999e-06 [auto_parallel]: 9.37001e-06 [parallel]: 7.88001e-06 [flash_sp]: 3.80998e-06 [merge_comm]: 5.81e-06 [allreduce_fusion]: 5.52001e-06 [matmul_add_comm_reduction]: 1.058e-05 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 1.199e-05 [virtual_dataset]: 9.95002e-06 [get_grad_eliminate_]: 9.71e-06 [virtual_output]: 9.74e-06 [merge_forward]: 5.62001e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 1.29e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.98e-05 [merge_recompute_call_nodes]: 1.12e-06 [before_grad]: 1.68e-05 [set_forward_comm_id_for_comm_node_pass]: 6.59999e-06 [meta_fg_expand]: 4.828e-05 [flash_sp_send_recv_attached]: 1.12e-06 [receive_attached]: 1.42999e-06 [after_resolve]: 1.711e-05 [a_after_grad]: 1.621e-05 [renormalize]: 0.00110406 [add_forward_monad_depend]: 5.62999e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.866e-05 [cse]: 5.678e-05 [a_3]: 7.749e-05 [Cycle 3]: 0.00101844, [45] [expand_dump_flag]: 1.19e-06 [switch_simplify]: 1.176e-05 [loop_unroll]: 1.009e-05 [a_1]: 0.00029022 [with_stream_mark]: 1.258e-05 [recompute_prepare]: 1.036e-05 [updatestate_depend_eliminate]: 5.61e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 4.84e-06 
[parameter_eliminate]: 9.70002e-07 [a_2]: 0.00013795 [accelerated_algorithm]: 1.344e-05 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 2.00002e-06 [shard_inline]: 1.045e-05 [merge_send_recv]: 8.32e-06 [auto_parallel]: 8.13001e-06 [parallel]: 5.93002e-06 [flash_sp]: 1.23002e-06 [merge_comm]: 5.99e-06 [allreduce_fusion]: 5.72001e-06 [matmul_add_comm_reduction]: 1.003e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.154e-05 [virtual_dataset]: 1.003e-05 [get_grad_eliminate_]: 9.87999e-06 [virtual_output]: 9.36002e-06 [merge_forward]: 5.52999e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 1.174e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.861e-05 [merge_recompute_call_nodes]: 9.80013e-07 [before_grad]: 1.639e-05 [set_forward_comm_id_for_comm_node_pass]: 6.49001e-06 [meta_fg_expand]: 3.71999e-06 [flash_sp_send_recv_attached]: 1.37e-06 [receive_attached]: 1.15999e-06 [after_resolve]: 1.522e-05 [a_after_grad]: 1.577e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 1.263e-05 [cse]: 3.092e-05 [a_3]: 6.611e-05 [py_interpret_to_execute_after_opt_a]: 1.702e-05 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 6.811e-05 [convert_after_rewriter]: 1.027e-05 [order_py_execute_after_rewriter]: 7.47998e-06 [mutable_eliminate]: 0.00063763 [opt_b]: 0.00034177, [1] [Cycle 1]: 0.00033423, [7] [b_1]: 0.00022772 [b_2]: 1.277e-05 [updatestate_depend_eliminate]: 8.75999e-06 [updatestate_assign_eliminate]: 4.54002e-06 [updatestate_loads_eliminate]: 4.47e-06 [renormalize]: 3.19997e-07 [cse]: 3.855e-05 [optimize_parallel_all_gather_comm]: 2.363e-05 [overlap_param_gather]: 2.29001e-06 [cconv]: 2.808e-05 [loop_unroll]: 0.00047007 [opt_after_cconv]: 0.0001526, [1] [Cycle 1]: 0.00014641, [7] [c_1]: 5.578e-05 [parameter_eliminate]: 2.79999e-06 [updatestate_depend_eliminate]: 7.94997e-06 [updatestate_assign_eliminate]: 4.68001e-06 
[updatestate_loads_eliminate]: 4.45e-06 [cse]: 3.56e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 4.364e-05 [tuple_transform]: 0.00011372, [1] [Cycle 1]: 0.00010847, [4] [d_1]: 7.615e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.122e-05 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 7.556e-05 [cse_after_recomputation]: 3.551e-05, [1] [Cycle 1]: 3.05e-05, [1] [cse]: 2.492e-05 [environ_conv]: 1.047e-05 [swap_dp_allreduce_reducescatter]: 8.47e-06 [bias_add_comm_swap]: 2.54001e-06 [label_micro_interleaved_index]: 4.008e-05 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.62001e-06 [micro_interleaved_order_control]: 2.86999e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.22999e-06 [interleave_split_concat_branches]: 1.45999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 2.12e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 5.37001e-06 [overlap_recompute_and_grad_model_parallel]: 5.97001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 5.51e-06 [overlap_grad_flash_sp]: 2.858e-05 [begin_end_overlap_inline]: 5.79981e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 2.20002e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 0.00010748, [1] [Cycle 1]: 0.00010229, [6] [build]: 1.154e-05 [elim_shapecalc]: 1.501e-05 [elim_not_effective]: 2.055e-05 [opt_reshape]: 1.075e-05 [fold_const_symbol]: 1.651e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 2.07999e-06 [pipeline_parallel_scheduler]: 1.91e-06 [auto_monad_reorder]: 2.684e-05 [get_jit_bprop_graph]: 1.80001e-06 [rewriter_after_jit_bprop_graph]: 4.65999e-06 [opt_after_jit_grad]: 0.00052933 [validate]: 6.062e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.010014 [execute]: 8.95999e-06 Sums bootstrap : 0.000573s : 1.46% type_inference : 0.013523s : 34.55% event_method : 0.000050s : 0.13% auto_monad : 0.000124s : 0.32% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.13% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.10% optimize.rewriter_before_opt_a : 0.000134s : 0.34% optimize.opt_a.expand_dump_flag : 0.000009s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.33% optimize.opt_a.loop_unroll : 0.000110s : 0.28% optimize.opt_a.a_1 : 0.003306s : 8.44% optimize.opt_a.with_stream_mark : 0.000057s : 0.15% optimize.opt_a.recompute_prepare : 0.000046s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000018s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000528s : 1.35% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.16% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000037s : 0.09% optimize.opt_a.merge_send_recv : 0.000035s : 0.09% optimize.opt_a.auto_parallel : 0.000029s : 0.07% optimize.opt_a.parallel : 0.000035s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000022s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000020s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000050s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.11% optimize.opt_a.virtual_dataset : 0.000036s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.09% optimize.opt_a.virtual_output : 0.000035s : 0.09% optimize.opt_a.merge_forward : 0.000021s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000044s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000069s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000061s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.06% optimize.opt_a.meta_fg_expand : 0.001697s : 4.34% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000097s : 0.25% optimize.opt_a.a_after_grad : 0.000116s : 0.30% optimize.opt_a.renormalize : 0.004029s : 10.29% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.24% optimize.opt_a.cse : 0.000270s : 0.69% optimize.opt_a.a_3 : 0.000499s : 1.27% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000068s : 0.17% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000638s : 1.63% optimize.opt_b.b_1 : 0.000228s : 0.58% optimize.opt_b.b_2 : 0.000013s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000039s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.07% optimize.loop_unroll : 0.000470s : 1.20% optimize.opt_after_cconv.c_1 : 0.000056s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000036s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000044s : 0.11% optimize.tuple_transform.d_1 : 0.000076s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000076s : 0.19% optimize.cse_after_recomputation.cse : 0.000025s : 0.06% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000040s : 0.10% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000021s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000029s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.07% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000529s : 1.35% validate : 0.000061s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.010014s : 25.58% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000843 227 5.95% : 0.000050s : 11: substitution.arithmetic_simplify 2.59% : 0.000022s : 4: substitution.cast_eliminate 0.43% : 0.000004s : 6: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 6: substitution.fold_const_symbol 1.01% : 0.000009s : 9: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.65% : 0.000469s : 16: substitution.inline 2.27% : 0.000019s : 2: substitution.inline_without_move 1.41% : 0.000012s : 22: substitution.j_node_and_user_rematch 2.11% : 0.000018s : 3: substitution.less_batch_normalization 1.73% : 0.000015s : 11: substitution.minmaximum_grad 0.70% : 0.000006s : 5: substitution.partial_eliminate 1.85% : 0.000016s : 22: substitution.remove_not_recompute_node 3.27% : 0.000028s : 10: substitution.replace_applicator 1.32% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.69% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.74% : 0.000065s : 28: substitution.tuple_list_get_item_eliminator 2.21% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013433 2 88.03% : 0.011825s : 1: type_inference.infer 11.97% : 0.001608s : 1: type_inference.specialize ------[replace.] 0.000215 30 59.74% : 0.000128s : 16: replace.inline 40.26% : 0.000086s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000493 30 93.48% : 0.000460s : 16: match.inline 6.52% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000785 5897 1.13% : 0.000009s : 69: predicate.accumulaten_eliminater 0.36% : 0.000003s : 9: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 34: predicate.addn_check_dump 1.08% : 0.000008s : 69: predicate.addn_zero_filter 1.06% : 0.000008s : 69: predicate.adjust_all_reduce_mul_add 2.04% : 0.000016s : 103: predicate.arithmetic_simplify 1.18% : 0.000009s : 69: predicate.cast_eliminate 1.18% : 0.000009s : 71: predicate.check_bprop_eliminate 0.56% : 0.000004s : 34: predicate.compare_switch_simplify 0.10% : 0.000001s : 9: predicate.const_output_eliminate 0.56% : 0.000004s : 34: predicate.depend_value_elim 1.16% : 0.000009s : 69: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 69: predicate.dict_get_item_eliminator 1.11% : 0.000009s : 69: predicate.dict_set_item_eliminator 0.45% : 0.000004s : 18: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 9: predicate.elim_not_effective 0.19% : 0.000001s : 9: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000010s : 78: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 78: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 78: predicate.environ_get_depend_swap 1.80% : 0.000014s : 112: predicate.environ_get_eliminate 1.20% : 0.000009s : 78: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.23% : 0.000017s : 99: predicate.float_depend_g_call 0.53% : 0.000004s : 34: predicate.float_environ_get_switch 0.71% : 0.000006s : 43: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 9: predicate.fold_const_symbol 0.60% : 0.000005s : 34: predicate.get_grad_eliminate 0.10% : 0.000001s : 9: predicate.graph_param_transform 0.55% : 0.000004s : 34: predicate.incorporate_call 0.50% : 0.000004s : 34: predicate.incorporate_call_switch 5.58% : 0.000044s : 254: predicate.inline 1.22% : 0.000010s : 57: predicate.inline_without_move 0.31% : 0.000002s : 34: predicate.j_node_and_user_rematch 0.70% : 0.000006s : 34: predicate.less_batch_normalization 
1.62% : 0.000013s : 101: predicate.list_to_tuple_eliminator_ 2.61% : 0.000020s : 170: predicate.load_eliminater 0.37% : 0.000003s : 9: predicate.loop_unroll_after_grad 2.09% : 0.000016s : 130: predicate.loop_unroll_before_grad 1.48% : 0.000012s : 87: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 34: predicate.merge_addn 1.14% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 71: predicate.mini_step_allgather_replace 1.10% : 0.000009s : 69: predicate.minmaximum_grad 0.46% : 0.000004s : 9: predicate.mutable_eliminate 0.18% : 0.000001s : 9: predicate.opt_reshape 0.18% : 0.000001s : 9: predicate.parallel_virtual_node 1.98% : 0.000016s : 99: predicate.partial_defer_inline 1.68% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 69: predicate.print_const_string_wrapper 0.56% : 0.000004s : 34: predicate.reduce_all_const_elim 1.34% : 0.000011s : 69: predicate.reduce_eliminate 2.62% : 0.000021s : 170: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 34: predicate.remove_not_recompute_node 1.84% : 0.000014s : 154: predicate.replace_applicator 0.62% : 0.000005s : 57: predicate.replace_old_param 0.12% : 0.000001s : 9: predicate.reset_defer_inline 1.08% : 0.000009s : 69: predicate.reshape_eliminate 1.15% : 0.000009s : 71: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 9: predicate.row_tensor_eliminate 1.36% : 0.000011s : 71: predicate.same_eliminate 0.36% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 34: predicate.shard_identity_eliminate 0.33% : 0.000003s : 18: predicate.special_op_eliminate 0.63% : 0.000005s : 34: predicate.specialize_transform 1.31% : 0.000010s : 71: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 57: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 9: predicate.switch_call_monad_eliminater 1.74% : 0.000014s : 99: predicate.switch_defer_inline 2.80% : 0.000022s : 170: predicate.switch_layer_defer_inline 4.76% : 0.000037s 
: 272: predicate.switch_simplify 1.10% : 0.000009s : 69: predicate.tile_eliminate 1.07% : 0.000008s : 69: predicate.transpose_eliminate 1.43% : 0.000011s : 87: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 87: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000011s : 87: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000022s : 135: predicate.tuple_list_get_item_eliminator 1.49% : 0.000012s : 87: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000016s : 121: predicate.tuple_list_set_item_eliminator 1.63% : 0.000013s : 101: predicate.tuple_to_list_eliminator_ 2.56% : 0.000020s : 170: predicate.updatestate_pure_node_eliminater 3.21% : 0.000025s : 204: predicate.updatestate_useless_node_eliminater 0.21% : 0.000002s : 9: predicate.value_based_eliminate 0.57% : 0.000004s : 34: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 34: predicate.virtual_output_eliminate 0.15% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001880 32 53.63% : 0.001008s : 12: func_graph_cloner_run.FuncGraphClonerGraph 46.37% : 0.000872s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072409 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.89% : 0.003544s : 1: add_attr 4.88% : 0.003533s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000080s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000132s : 1: auto_monad 0.04% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.85% : 0.000612s : 1: bootstrap 0.04% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000024s : 1: control_data_broadcast_order 0.02% : 0.000014s : 1: convert_after_rewriter 0.05% : 0.000039s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.08% : 0.000058s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.06% : 0.000044s : 1: label_micro_interleaved_index 0.66% : 0.000479s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.90% : 0.000649s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000022s : 1: opt.transform.mutable_eliminate 7.01% : 0.005076s : 117: opt.transform.opt_a 0.08% : 0.000055s : 1: opt.transform.opt_after_cconv 0.05% : 0.000040s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000212s : 28: opt.transform.opt_b 0.12% : 0.000085s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000059s : 4: opt.transform.symbol_engine_opt 17.38% : 0.012583s : 1: opt_a 0.22% : 0.000156s : 1: opt_after_cconv 0.75% : 0.000540s : 1: opt_after_jit_grad 0.48% : 0.000346s : 1: opt_b 21.13% : 0.015298s : 1: optimize 0.04% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000032s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000056s : 1: pre_auto_parallel 0.06% : 0.000045s : 1: py_interpret_to_execute 0.03% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000048s : 1: remove_dup_value 2.85% : 0.002060s : 2: renormalize.infer 2.70% : 0.001952s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000073s : 1: rewriter_after_opt_a 0.19% : 0.000139s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000110s : 1: symbol_engine_optimizer 13.85% : 0.010031s : 1: task_emit 0.16% : 0.000117s : 1: 
tuple_transform 18.72% : 0.013552s : 1: type_inference 0.14% : 0.000104s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x3-kbk],max_mem:6.0M ... TotalTime = 24.6061, [24] [bootstrap]: 0.00069566 [type_inference]: 0.00707241 [event_method]: 1.464e-05 [auto_monad]: 5.704e-05 [graph_reusing]: 5.34e-06 [inline]: 2.01e-06 [add_attr]: 0.00443715, [1] [add_attr_with_inline]: 0.00442707, [1] [Cycle 1]: 4.073e-05, [2] [tag_attr]: 1.572e-05 [meta_addattr_fg_expand]: 3.63999e-06 [parallel-infer-symbol]: 3.33998e-06 [pre_auto_parallel]: 2.591e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 2.29999e-06 [pipeline_split]: 1.87001e-06 [optimize]: 0.0041204, [53] [py_interpret_to_execute]: 5.406e-05 [rewriter_before_opt_a]: 5.578e-05 [opt_a]: 0.00220047, [2] [Cycle 1]: 0.00159569, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 2.78e-05 [loop_unroll]: 2.056e-05 [a_1]: 0.00045391 [with_stream_mark]: 1.087e-05 [recompute_prepare]: 8.32998e-06 [updatestate_depend_eliminate]: 3.45003e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 1.31002e-06 [a_2]: 7.446e-05 [accelerated_algorithm]: 6.71999e-06 [shard]: 1.19998e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 6.01003e-06 [merge_send_recv]: 6.66e-06 [auto_parallel]: 5.92001e-06 [parallel]: 2.544e-05 [flash_sp]: 6.91999e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 6.13998e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 7.04001e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.85002e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 8.80013e-07 [offload_activation]: 7.75998e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.038e-05 [merge_recompute_call_nodes]: 9.49978e-07 [before_grad]: 8.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.15002e-06 [meta_fg_expand]: 1.99999e-06 [flash_sp_send_recv_attached]: 2.16998e-06 [receive_attached]: 1.37999e-06 [after_resolve]: 9.72999e-06 [a_after_grad]: 8.84e-06 [renormalize]: 0.00053177 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.029e-05 [cse]: 1.95e-05 [a_3]: 3.863e-05 [Cycle 2]: 0.00059529, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.54999e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012623 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 6.24999e-06 [updatestate_depend_eliminate]: 2.58003e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.60002e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.783e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 5.07e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.41002e-06 [flash_sp]: 3.40998e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 8.27e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.66998e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.52001e-06 [a_after_grad]: 8.66002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.90025e-07 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.242e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 8.80999e-06 
[slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.178e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 5.52001e-06 [mutable_eliminate]: 0.00044815 [opt_b]: 0.00018531, [1] [Cycle 1]: 0.00018001, [7] [b_1]: 0.00010828 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 4.50999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.14999e-06 [renormalize]: 3.39991e-07 [cse]: 2.042e-05 [optimize_parallel_all_gather_comm]: 1.222e-05 [overlap_param_gather]: 1.27e-06 [cconv]: 1.624e-05 [loop_unroll]: 0.00043045 [opt_after_cconv]: 9.291e-05, [1] [Cycle 1]: 8.739e-05, [7] [c_1]: 2.805e-05 [parameter_eliminate]: 2.01e-06 [updatestate_depend_eliminate]: 4.43999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.533e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 6.45002e-06 [tuple_transform]: 6.807e-05, [1] [Cycle 1]: 6.381e-05, [4] [d_1]: 3.867e-05 [none_parameter_eliminate]: 7.99977e-07 [renormalize]: 2.00002e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.09998e-06 [add_recomputation]: 3.925e-05 [cse_after_recomputation]: 2.003e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.076e-05 [environ_conv]: 2.96999e-06 [swap_dp_allreduce_reducescatter]: 4.14002e-06 [bias_add_comm_swap]: 1.19998e-06 [label_micro_interleaved_index]: 2.49001e-06 [label_fine_grained_interleaved_index]: 1.33002e-06 [merge_cast_opt]: 4.7998e-07 [slice_recompute_activation]: 9.60019e-07 [micro_interleaved_order_control]: 1.14e-06 [assign_add_opt]: 5.60016e-07 [ForceFp32Comm]: 5.50004e-07 [remove_cast_before_assign_add]: 3.29979e-07 [full_micro_interleaved_order_control]: 1.00999e-06 [reorder_send_recv_between_fp_bp]: 1.00001e-06 [comm_op_add_attrs]: 3.69997e-07 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.39e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.77999e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.37002e-06 [overlap_recompute_and_grad_model_parallel]: 4.89998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.65002e-06 [overlap_grad_ring_attention]: 4.12e-06 [overlap_grad_flash_sp]: 1.906e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.56e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 0.00013942, [1] [Cycle 1]: 0.00013516, [6] [build]: 2.69001e-06 [elim_shapecalc]: 8.97e-06 [elim_not_effective]: 1.261e-05 [opt_reshape]: 7.164e-05 [fold_const_symbol]: 1.058e-05 [renormalize]: 2.3999e-07 [detach_backward]: 2.21998e-06 [pipeline_parallel_scheduler]: 1.71002e-06 [auto_monad_reorder]: 1.693e-05 [get_jit_bprop_graph]: 1.48002e-06 [rewriter_after_jit_bprop_graph]: 4.42e-06 [opt_after_jit_grad]: 0.00052274 [validate]: 3.443e-05 [backend_pass]: 1.17e-06 [task_emit]: 24.5889 [execute]: 9.84999e-06 Sums bootstrap : 0.000696s : 0.00% type_inference : 0.007072s : 0.03% event_method : 0.000015s : 0.00% auto_monad : 0.000057s : 0.00% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000054s : 0.00% optimize.rewriter_before_opt_a : 0.000056s : 0.00% optimize.opt_a.expand_dump_flag : 0.000002s : 0.00% optimize.opt_a.switch_simplify : 0.000034s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000580s : 0.00% 
optimize.opt_a.with_stream_mark : 0.000020s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000002s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000011s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000005s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000002s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000532s : 0.00% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000016s : 0.00% optimize.opt_a.cse : 0.000032s : 0.00% optimize.opt_a.a_3 : 0.000071s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000448s : 0.00% optimize.opt_b.b_1 : 0.000108s : 0.00% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000012s : 0.00% optimize.overlap_param_gather : 0.000001s : 0.00% optimize.cconv : 0.000016s : 0.00% optimize.loop_unroll : 0.000430s : 0.00% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000006s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000001s : 0.00% optimize.add_recomputation : 0.000039s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000003s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000004s : 0.00% optimize.bias_add_comm_swap : 0.000001s : 0.00% optimize.label_micro_interleaved_index : 0.000002s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000001s : 0.00% optimize.merge_cast_opt : 0.000000s : 0.00% optimize.slice_recompute_activation : 0.000001s : 0.00% optimize.micro_interleaved_order_control : 0.000001s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000000s : 0.00% optimize.full_micro_interleaved_order_control : 0.000001s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000001s : 0.00% optimize.comm_op_add_attrs : 0.000000s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000072s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000523s : 0.00% validate : 0.000034s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 24.588888s : 99.95% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000163 30 13.23% : 0.000021s : 5: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000002s : 2: substitution.fold_const_symbol 2.39% : 0.000004s : 4: substitution.graph_param_transform 69.49% : 0.000113s : 3: substitution.inline 1.59% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 4: substitution.replace_old_param 6.10% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007018 2 91.10% : 0.006394s : 1: type_inference.infer 8.90% : 0.000624s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.42% : 0.000028s : 3: replace.inline 28.58% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 92.58% : 0.000111s : 3: match.inline 7.42% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.48% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.00% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.29% : 0.000002s : 11: predicate.reduce_eliminate 2.62% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.41% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.91% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.69% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.79% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.43% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000396 8 43.42% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.58% : 0.000224s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
24.616384 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.02% : 0.004442s : 1: add_attr 0.02% : 0.004431s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000043s : 1: add_recomputation 0.00% : 0.000003s : 1: assign_add_opt 0.00% : 0.000063s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000004s : 1: bias_add_comm_swap 0.00% : 0.000723s : 1: bootstrap 0.00% : 0.000020s : 1: cconv 0.00% : 0.000003s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000006s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000004s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000004s : 1: label_fine_grained_interleaved_index 0.00% : 0.000005s : 1: label_micro_interleaved_index 0.00% : 0.000438s : 1: loop_unroll 0.00% : 0.000003s : 1: merge_cast_opt 0.00% : 0.000004s : 1: micro_interleaved_order_control 0.00% : 0.000456s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000012s : 1: opt.transform.mutable_eliminate 0.00% : 0.000941s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000089s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000100s : 4: opt.transform.symbol_engine_opt 0.01% : 0.002203s : 1: opt_a 0.00% : 0.000096s : 1: opt_after_cconv 0.00% : 0.000531s : 1: opt_after_jit_grad 0.00% : 0.000189s : 1: opt_b 0.02% : 0.004125s : 1: optimize 0.00% : 0.000016s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000004s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000030s : 1: pre_auto_parallel 0.00% : 0.000059s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.00% : 0.000010s : 1: remove_dup_value 0.00% : 0.000285s : 1: renormalize.infer 0.00% : 0.000241s : 1: renormalize.specialize 0.00% : 0.000004s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.00% : 0.000060s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000004s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000007s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000142s : 1: symbol_engine_optimizer 99.89% : 24.588911s : 1: task_emit 0.00% : 0.000071s : 1: 
tuple_transform 0.03% : 0.007089s : 1: type_inference 0.00% : 0.000063s : 1: validate TotalTime = 0.524219, [24] [bootstrap]: 0.0004886 [type_inference]: 0.0288357 [event_method]: 1.037e-05 [auto_monad]: 5.229e-05 [graph_reusing]: 4.94e-06 [inline]: 2.37001e-06 [add_attr]: 0.00311835, [1] [add_attr_with_inline]: 0.00311003, [1] [Cycle 1]: 5.091e-05, [2] [tag_attr]: 1.216e-05 [meta_addattr_fg_expand]: 3.14999e-06 [parallel-infer-symbol]: 2.76999e-06 [pre_auto_parallel]: 2.136e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00372831, [53] [py_interpret_to_execute]: 1.488e-05 [rewriter_before_opt_a]: 4.004e-05 [opt_a]: 0.00188585, [2] [Cycle 1]: 0.00128291, [45] [expand_dump_flag]: 2.44999e-06 [switch_simplify]: 2.368e-05 [loop_unroll]: 1.396e-05 [a_1]: 0.00029267 [with_stream_mark]: 1.336e-05 [recompute_prepare]: 7.56999e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 4.19002e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.662e-05 [accelerated_algorithm]: 6.89001e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.16002e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.829e-05 [flash_sp]: 6.85002e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.20998e-06 [matmul_add_comm_reduction]: 9.24e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.74999e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 9.46003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.46998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.39999e-06 [receive_attached]: 2.26e-06 
[after_resolve]: 1.121e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00036537 [add_forward_monad_depend]: 4.32998e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.3e-05 [cse]: 2.897e-05 [a_3]: 4.03e-05 [Cycle 2]: 0.00059349, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.0001254 [with_stream_mark]: 9.48002e-06 [recompute_prepare]: 5.40999e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.859e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.64e-06 [parallel]: 4.25e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.63997e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.24e-06 [a_after_grad]: 8.02e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.302e-05 [a_3]: 3.172e-05 [py_interpret_to_execute_after_opt_a]: 7.32002e-06 [slice_cell_reuse_recomputed_activation]: 2.17001e-06 [rewriter_after_opt_a]: 3.161e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 5.08002e-06 [mutable_eliminate]: 0.00047137 [opt_b]: 0.00018494, [1] [Cycle 1]: 0.00017887, [7] [b_1]: 0.00010861 
[b_2]: 7.53e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.59001e-06 [renormalize]: 4.69998e-07 [cse]: 1.7e-05 [optimize_parallel_all_gather_comm]: 1.616e-05 [overlap_param_gather]: 2.13002e-06 [cconv]: 2.221e-05 [loop_unroll]: 0.00041599 [opt_after_cconv]: 9.492e-05, [1] [Cycle 1]: 8.931e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.21003e-06 [cse]: 1.622e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 7.021e-05, [1] [Cycle 1]: 6.599e-05, [4] [d_1]: 4.046e-05 [none_parameter_eliminate]: 1.41002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.314e-05 [cse_after_recomputation]: 2.111e-05, [1] [Cycle 1]: 1.665e-05, [1] [cse]: 1.118e-05 [environ_conv]: 5.36002e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.48998e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.98e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.46998e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.87002e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50001e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 4.41002e-06 [overlap_recompute_and_grad_model_parallel]: 4.65999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.87001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 1.87999e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.734e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.55999e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.897e-05, [1] [Cycle 1]: 6.479e-05, [6] [build]: 2.49999e-06 [elim_shapecalc]: 8.39998e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 9.15999e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00044942 [validate]: 3.316e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.487212 [execute]: 9.29998e-06 Sums bootstrap : 0.000489s : 0.09% type_inference : 0.028836s : 5.54% event_method : 0.000010s : 0.00% auto_monad : 0.000052s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.00% optimize.rewriter_before_opt_a : 0.000040s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.01% optimize.opt_a.loop_unroll : 0.000020s : 0.00% optimize.opt_a.a_1 : 0.000418s : 0.08% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000023s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000365s : 0.07% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000042s : 0.01% optimize.opt_a.a_3 : 0.000072s : 0.01% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000471s : 0.09% optimize.opt_b.b_1 : 0.000109s : 0.02% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000416s : 0.08% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000449s : 0.09% validate : 0.000033s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.487212s : 93.67% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000121 26 18.53% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.19% : 0.000001s : 2: substitution.fold_const_symbol 4.59% : 0.000006s : 4: substitution.graph_param_transform 65.01% : 0.000078s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000004s : 4: substitution.remove_not_recompute_node 3.47% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.028787 2 98.66% : 0.028402s : 1: type_inference.infer 1.34% : 0.000385s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 1.03% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.83% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.75% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.82% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.17% : 0.000002s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.92% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.37% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.94% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.15% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000283 6 43.38% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.62% : 0.000160s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.532353 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.59% : 0.003123s : 1: add_attr 0.58% : 0.003113s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000058s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.10% : 0.000524s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000016s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.08% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.09% : 0.000480s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.14% : 0.000770s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000091s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.35% : 0.001889s : 1: opt_a 0.02% : 0.000098s : 1: opt_after_cconv 0.09% : 0.000459s : 1: opt_after_jit_grad 0.04% : 0.000188s : 1: opt_b 0.70% : 0.003732s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000025s : 1: pre_auto_parallel 0.00% : 0.000019s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.04% : 0.000200s : 1: renormalize.infer 0.03% : 0.000159s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000045s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 91.52% : 0.487236s : 1: task_emit 0.01% : 0.000073s : 1: 
tuple_transform 5.42% : 0.028854s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.578808, [24] [bootstrap]: 0.0004919 [type_inference]: 0.022146 [event_method]: 1.43e-05 [auto_monad]: 5.627e-05 [graph_reusing]: 5.74e-06 [inline]: 2.78998e-06 [add_attr]: 0.00310746, [1] [add_attr_with_inline]: 0.00309839, [1] [Cycle 1]: 4.883e-05, [2] [tag_attr]: 1.641e-05 [meta_addattr_fg_expand]: 3.8e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.653e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00419263, [53] [py_interpret_to_execute]: 2.056e-05 [rewriter_before_opt_a]: 5.904e-05 [opt_a]: 0.00223658, [2] [Cycle 1]: 0.0016197, [45] [expand_dump_flag]: 3.13998e-06 [switch_simplify]: 3.191e-05 [loop_unroll]: 2.137e-05 [a_1]: 0.00045701 [with_stream_mark]: 1.378e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.525e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 8.62e-06 [auto_parallel]: 5.56e-06 [parallel]: 2.194e-05 [flash_sp]: 7.53999e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.6e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 8.91002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 3.06001e-06 [receive_attached]: 2.55002e-06 
[after_resolve]: 1.074e-05 [a_after_grad]: 9.43002e-06 [renormalize]: 0.00051376 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 1.86998e-06 [auto_monad_eliminator]: 1.339e-05 [cse]: 2.751e-05 [a_3]: 4.161e-05 [Cycle 2]: 0.00060708, [45] [expand_dump_flag]: 1.17e-06 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012551 [with_stream_mark]: 1.009e-05 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 7.142e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.72e-06 [auto_parallel]: 5.20001e-06 [parallel]: 4.18999e-06 [flash_sp]: 3.53999e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.99001e-06 [matmul_add_comm_reduction]: 5.37001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.78002e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.18001e-06 [after_resolve]: 8.97e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 1.14998e-06 [auto_monad_eliminator]: 6.62002e-06 [cse]: 1.356e-05 [a_3]: 3.223e-05 [py_interpret_to_execute_after_opt_a]: 8.06001e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.088e-05 [convert_after_rewriter]: 6.99001e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00047071 [opt_b]: 0.00024761, [1] [Cycle 1]: 0.00024131, 
[7] [b_1]: 0.00016649 [b_2]: 7.83999e-06 [updatestate_depend_eliminate]: 5.52001e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.42001e-06 [renormalize]: 3.9002e-07 [cse]: 1.742e-05 [optimize_parallel_all_gather_comm]: 1.669e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.447e-05 [loop_unroll]: 0.00044092 [opt_after_cconv]: 9.588e-05, [1] [Cycle 1]: 9.017e-05, [7] [c_1]: 2.851e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.613e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.28e-05 [tuple_transform]: 6.901e-05, [1] [Cycle 1]: 6.454e-05, [4] [d_1]: 3.907e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.435e-05 [cse_after_recomputation]: 2.03e-05, [1] [Cycle 1]: 1.538e-05, [1] [cse]: 1.019e-05 [environ_conv]: 4.89e-06 [swap_dp_allreduce_reducescatter]: 5.25001e-06 [bias_add_comm_swap]: 2.78998e-06 [label_micro_interleaved_index]: 4.84003e-06 [label_fine_grained_interleaved_index]: 2.37999e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.38e-06 [overlap_recompute_and_grad_model_parallel]: 4.11001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.766e-05 [begin_end_overlap_inline]: 6.09987e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 6.934e-05, [1] [Cycle 1]: 6.532e-05, [6] [build]: 2.78e-06 [elim_shapecalc]: 8.58001e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.03998e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 3.39991e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.595e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 4.16001e-06 [opt_after_jit_grad]: 0.00048542 [validate]: 3.653e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.547971 [execute]: 9.22001e-06 Sums bootstrap : 0.000492s : 0.09% type_inference : 0.022146s : 3.85% event_method : 0.000014s : 0.00% auto_monad : 0.000056s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.01% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000583s : 0.10% optimize.opt_a.with_stream_mark : 0.000024s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000514s : 0.09% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000041s : 0.01% optimize.opt_a.a_3 : 0.000074s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000471s : 0.08% optimize.opt_b.b_1 : 0.000166s : 0.03% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000441s : 0.08% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000485s : 0.08% validate : 0.000037s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.547971s : 95.35% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000172 30 14.73% : 0.000025s : 5: substitution.arithmetic_simplify 0.99% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 67.03% : 0.000115s : 3: substitution.inline 1.95% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000005s : 4: substitution.remove_not_recompute_node 2.25% : 0.000004s : 4: substitution.replace_old_param 6.52% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.022099 2 97.35% : 0.021513s : 1: type_inference.infer 2.65% : 0.000586s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.16% : 0.000027s : 3: replace.inline 29.84% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.72% : 0.000113s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000210 1131 0.67% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.43% : 0.000001s : 8: predicate.addn_check_dump 0.58% : 0.000001s : 11: predicate.addn_zero_filter 0.58% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.78% : 0.000004s : 19: predicate.arithmetic_simplify 0.62% : 0.000001s : 11: predicate.cast_eliminate 0.50% : 0.000001s : 8: predicate.check_bprop_eliminate 0.43% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.50% : 0.000001s : 8: predicate.depend_value_elim 0.68% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.75% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.69% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.97% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 4: predicate.elim_not_effective 0.29% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.88% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.84% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.84% : 0.000002s : 15: predicate.environ_get_depend_swap 1.58% : 0.000003s : 23: predicate.environ_get_eliminate 0.90% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.98% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.70% : 0.000004s : 16: predicate.float_depend_g_call 0.43% : 0.000001s : 8: predicate.float_environ_get_switch 0.69% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.56% : 0.000001s : 8: predicate.get_grad_eliminate 0.20% : 0.000000s : 4: predicate.graph_param_transform 0.56% : 0.000001s : 8: predicate.incorporate_call 0.43% : 0.000001s : 8: predicate.incorporate_call_switch 4.69% : 0.000010s : 51: predicate.inline 0.66% : 0.000001s : 8: predicate.inline_without_move 0.28% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.67% : 0.000001s : 8: predicate.less_batch_normalization 1.31% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.86% : 0.000004s : 32: predicate.load_eliminater 0.91% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.64% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.30% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.47% : 0.000001s : 8: predicate.merge_addn 0.49% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.53% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.58% : 0.000001s : 11: predicate.minmaximum_grad 1.00% : 0.000002s : 4: predicate.mutable_eliminate 0.28% : 0.000001s : 4: predicate.opt_reshape 0.31% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 16: predicate.partial_defer_inline 1.10% : 0.000002s : 17: predicate.partial_eliminate 0.66% : 0.000001s : 11: predicate.print_const_string_wrapper 0.45% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000002s : 11: predicate.reduce_eliminate 1.86% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000001s : 8: predicate.remove_not_recompute_node 1.10% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.24% : 0.000001s : 4: predicate.reset_defer_inline 0.66% : 0.000001s : 11: predicate.reshape_eliminate 0.63% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.32% : 0.000001s : 4: predicate.row_tensor_eliminate 0.56% : 0.000001s : 8: predicate.same_eliminate 0.39% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.74% : 0.000002s : 8: predicate.shard_identity_eliminate 0.64% : 0.000001s : 8: predicate.special_op_eliminate 0.58% : 0.000001s : 8: predicate.specialize_transform 0.72% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.67% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000002s : 16: predicate.switch_defer_inline 1.50% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.65% : 0.000008s : 54: predicate.switch_simplify 0.64% : 
0.000001s : 11: predicate.tile_eliminate 0.66% : 0.000001s : 11: predicate.transpose_eliminate 1.12% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 24.02% : 0.000050s : 19: predicate.tuple_list_get_item_const_eliminator 1.07% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.43% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.03% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.27% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.77% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.50% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.57% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.55% : 0.000001s : 8: predicate.virtual_output_eliminate 0.27% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 45.10% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.90% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.587784 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.53% : 0.003112s : 1: add_attr 0.53% : 0.003102s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.09% : 0.000530s : 1: bootstrap 0.00% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.08% : 0.000450s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.08% : 0.000480s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.16% : 0.000953s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000145s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.38% : 0.002239s : 1: opt_a 0.02% : 0.000099s : 1: opt_after_cconv 0.08% : 0.000497s : 1: opt_after_jit_grad 0.04% : 0.000251s : 1: opt_b 0.71% : 0.004197s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000031s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.05% : 0.000267s : 1: renormalize.infer 0.04% : 0.000240s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 93.23% : 0.547993s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 3.77% : 0.022164s : 1: type_inference 0.01% : 0.000063s : 1: validate . TotalTime = 6.7333, [24] [bootstrap]: 0.00054186 [type_inference]: 0.0395895 [event_method]: 4.906e-05 [auto_monad]: 9.659e-05 [graph_reusing]: 6.88e-06 [inline]: 1.72999e-06 [add_attr]: 0.00288111, [1] [add_attr_with_inline]: 0.00287356, [1] [Cycle 1]: 6.507e-05, [2] [tag_attr]: 3.221e-05 [meta_addattr_fg_expand]: 8.15999e-06 [parallel-infer-symbol]: 2.31e-06 [pre_auto_parallel]: 4.554e-05 [insert-virtual-dataset]: 1.12e-06 [parallel-infer-symbol-second]: 6.59988e-07 [dataset_repeat_opt]: 1.19998e-06 [pipeline_split]: 1.04998e-06 [optimize]: 0.0378023, [53] [py_interpret_to_execute]: 3.538e-05 [rewriter_before_opt_a]: 0.0199921 [opt_a]: 0.0151874, [3] [Cycle 1]: 0.00778227, [45] [expand_dump_flag]: 5.42999e-06 [switch_simplify]: 0.00013859 [loop_unroll]: 6.391e-05 [a_1]: 0.00148098 [with_stream_mark]: 2.238e-05 [recompute_prepare]: 2.319e-05 [updatestate_depend_eliminate]: 9.47999e-06 [updatestate_assign_eliminate]: 6.75998e-06 [updatestate_loads_eliminate]: 6.76e-06 [parameter_eliminate]: 2.06998e-06 [a_2]: 0.000243 [accelerated_algorithm]: 3.056e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.92998e-06 [shard_inline]: 1.641e-05 [merge_send_recv]: 1.381e-05 [auto_parallel]: 1.285e-05 [parallel]: 1.551e-05 [flash_sp]: 1.084e-05 [merge_comm]: 9.79999e-06 [allreduce_fusion]: 8.54998e-06 [matmul_add_comm_reduction]: 2.309e-05 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 1.849e-05 [virtual_dataset]: 1.589e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.558e-05 [merge_forward]: 8.99998e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 1.69e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.796e-05 [merge_recompute_call_nodes]: 9.19972e-07 [before_grad]: 2.676e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46e-06 [meta_fg_expand]: 0.0016572 [flash_sp_send_recv_attached]: 3.65e-06 [receive_attached]: 2.11998e-06 
[after_resolve]: 6.163e-05 [a_after_grad]: 8.43e-05 [renormalize]: 0.00277757 [add_forward_monad_depend]: 9.13002e-06 [auto_monad_grad]: 6.04001e-06 [auto_monad_eliminator]: 5.713e-05 [cse]: 0.00016716 [a_3]: 0.0003507 [Cycle 2]: 0.00636652, [45] [expand_dump_flag]: 1.95001e-06 [switch_simplify]: 4.885e-05 [loop_unroll]: 4.511e-05 [a_1]: 0.00433852 [with_stream_mark]: 2.356e-05 [recompute_prepare]: 1.381e-05 [updatestate_depend_eliminate]: 6.47001e-06 [updatestate_assign_eliminate]: 5.22e-06 [updatestate_loads_eliminate]: 4.44002e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 0.00014362 [accelerated_algorithm]: 1.515e-05 [shard]: 1.71e-06 [meta_shard_fg_expand]: 2.82002e-06 [shard_inline]: 9.91e-06 [merge_send_recv]: 9.12001e-06 [auto_parallel]: 9.37001e-06 [parallel]: 7.61001e-06 [flash_sp]: 3.28e-06 [merge_comm]: 5.96998e-06 [allreduce_fusion]: 5.66e-06 [matmul_add_comm_reduction]: 1.094e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.261e-05 [virtual_dataset]: 9.82001e-06 [get_grad_eliminate_]: 9.62999e-06 [virtual_output]: 1.011e-05 [merge_forward]: 5.44e-06 [cell_reuse_recompute_pass]: 1.74e-06 [offload_activation]: 1.183e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.941e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.716e-05 [set_forward_comm_id_for_comm_node_pass]: 5.92999e-06 [meta_fg_expand]: 0.0001317 [flash_sp_send_recv_attached]: 1.64e-06 [receive_attached]: 1.99e-06 [after_resolve]: 1.807e-05 [a_after_grad]: 1.835e-05 [renormalize]: 0.00093858 [add_forward_monad_depend]: 4.70999e-06 [auto_monad_grad]: 1.96998e-06 [auto_monad_eliminator]: 1.878e-05 [cse]: 6.08e-05 [a_3]: 7.97e-05 [Cycle 3]: 0.00101999, [45] [expand_dump_flag]: 1.84e-06 [switch_simplify]: 1.231e-05 [loop_unroll]: 1.012e-05 [a_1]: 0.00028819 [with_stream_mark]: 1.166e-05 [recompute_prepare]: 1.073e-05 [updatestate_depend_eliminate]: 5.74e-06 [updatestate_assign_eliminate]: 4.63001e-06 [updatestate_loads_eliminate]: 4.35e-06 
[parameter_eliminate]: 1.30001e-06 [a_2]: 0.00013805 [accelerated_algorithm]: 1.297e-05 [shard]: 1.29998e-06 [meta_shard_fg_expand]: 2.31e-06 [shard_inline]: 1.02e-05 [merge_send_recv]: 8.48999e-06 [auto_parallel]: 8.06001e-06 [parallel]: 6.41998e-06 [flash_sp]: 1.00001e-06 [merge_comm]: 6.19001e-06 [allreduce_fusion]: 5.22999e-06 [matmul_add_comm_reduction]: 1.024e-05 [allreduce_slice_to_reducescatter]: 3.70026e-07 [virtual_shard_identity]: 1.167e-05 [virtual_dataset]: 1.031e-05 [get_grad_eliminate_]: 9.34e-06 [virtual_output]: 9.70002e-06 [merge_forward]: 5.27999e-06 [cell_reuse_recompute_pass]: 1.73002e-06 [offload_activation]: 1.162e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.872e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.665e-05 [set_forward_comm_id_for_comm_node_pass]: 5.86998e-06 [meta_fg_expand]: 3.72998e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.61998e-06 [after_resolve]: 1.633e-05 [a_after_grad]: 1.661e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.62001e-06 [auto_monad_grad]: 1.37e-06 [auto_monad_eliminator]: 1.266e-05 [cse]: 3.282e-05 [a_3]: 6.723e-05 [py_interpret_to_execute_after_opt_a]: 1.489e-05 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 5.651e-05 [convert_after_rewriter]: 1.012e-05 [order_py_execute_after_rewriter]: 7.93001e-06 [mutable_eliminate]: 0.00071187 [opt_b]: 0.00035126, [1] [Cycle 1]: 0.00034356, [7] [b_1]: 0.00022833 [b_2]: 1.28e-05 [updatestate_depend_eliminate]: 8.70001e-06 [updatestate_assign_eliminate]: 4.68999e-06 [updatestate_loads_eliminate]: 4.65999e-06 [renormalize]: 3.89991e-07 [cse]: 4.481e-05 [optimize_parallel_all_gather_comm]: 2.645e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.734e-05 [loop_unroll]: 0.00048113 [opt_after_cconv]: 0.00015402, [1] [Cycle 1]: 0.00014754, [7] [c_1]: 5.525e-05 [parameter_eliminate]: 2.86e-06 [updatestate_depend_eliminate]: 8.59998e-06 [updatestate_assign_eliminate]: 4.85001e-06 
[updatestate_loads_eliminate]: 4.4e-06 [cse]: 3.729e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 3.901e-05 [tuple_transform]: 0.00011304, [1] [Cycle 1]: 0.00010794, [4] [d_1]: 7.604e-05 [none_parameter_eliminate]: 2.14999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.093e-05 [partial_unused_args_eliminate]: 1.58002e-06 [add_recomputation]: 6.435e-05 [cse_after_recomputation]: 3.64e-05, [1] [Cycle 1]: 3.147e-05, [1] [cse]: 2.589e-05 [environ_conv]: 9.81998e-06 [swap_dp_allreduce_reducescatter]: 8.89e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.32998e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 7.99977e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.951e-05 [grouped_pairwise_exchange_alltoall]: 1.12999e-06 [offloading_packed_experts]: 5.50001e-06 [overlap_recompute_and_grad_model_parallel]: 6.49001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.25999e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 5.12999e-06 [overlap_grad_flash_sp]: 2.84e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 0.00010761, [1] [Cycle 1]: 0.00010322, [6] [build]: 1.069e-05 [elim_shapecalc]: 1.429e-05 [elim_not_effective]: 2.099e-05 [opt_reshape]: 1.098e-05 [fold_const_symbol]: 1.681e-05 
[renormalize]: 3.9002e-07 [detach_backward]: 2.22999e-06 [pipeline_parallel_scheduler]: 1.56998e-06 [auto_monad_reorder]: 2.549e-05 [get_jit_bprop_graph]: 2.11998e-06 [rewriter_after_jit_bprop_graph]: 4.12e-06 [opt_after_jit_grad]: 0.00052173 [validate]: 5.272e-05 [backend_pass]: 1.09e-06 [task_emit]: 6.65142 [execute]: 9.46998e-06 Sums bootstrap : 0.000542s : 0.01% type_inference : 0.039589s : 0.59% event_method : 0.000049s : 0.00% auto_monad : 0.000097s : 0.00% graph_reusing : 0.000007s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.00% parallel-infer-symbol : 0.000002s : 0.00% pre_auto_parallel : 0.000046s : 0.00% insert-virtual-dataset : 0.000001s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000001s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.00% optimize.rewriter_before_opt_a : 0.019992s : 0.30% optimize.opt_a.expand_dump_flag : 0.000009s : 0.00% optimize.opt_a.switch_simplify : 0.000200s : 0.00% optimize.opt_a.loop_unroll : 0.000119s : 0.00% optimize.opt_a.a_1 : 0.006108s : 0.09% optimize.opt_a.with_stream_mark : 0.000058s : 0.00% optimize.opt_a.recompute_prepare : 0.000048s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000525s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.00% optimize.opt_a.shard_inline : 0.000037s : 0.00% optimize.opt_a.merge_send_recv : 0.000031s : 0.00% optimize.opt_a.auto_parallel : 0.000030s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000015s : 0.00% 
optimize.opt_a.merge_comm : 0.000022s : 0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.00% optimize.opt_a.virtual_dataset : 0.000036s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.00% optimize.opt_a.virtual_output : 0.000035s : 0.00% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000040s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000061s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001793s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000096s : 0.00% optimize.opt_a.a_after_grad : 0.000119s : 0.00% optimize.opt_a.renormalize : 0.003716s : 0.06% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000089s : 0.00% optimize.opt_a.cse : 0.000261s : 0.00% optimize.opt_a.a_3 : 0.000498s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000057s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000008s : 0.00% optimize.mutable_eliminate : 0.000712s : 0.01% optimize.opt_b.b_1 : 0.000228s : 0.00% optimize.opt_b.b_2 : 0.000013s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000045s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.00% optimize.loop_unroll : 0.000481s : 0.01% optimize.opt_after_cconv.c_1 : 0.000055s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000037s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000039s : 0.00% optimize.tuple_transform.d_1 : 0.000076s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000064s : 0.00% optimize.cse_after_recomputation.cse : 0.000026s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000522s : 0.01% validate : 0.000053s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 6.651418s : 98.85% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.003425 231 1.50% : 0.000051s : 12: substitution.arithmetic_simplify 0.72% : 0.000024s : 4: substitution.cast_eliminate 0.10% : 0.000003s : 6: substitution.elim_not_effective 0.16% : 0.000005s : 5: substitution.float_depend_g_call 0.12% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.07% : 0.000002s : 6: substitution.fold_const_symbol 0.22% : 0.000008s : 9: substitution.graph_param_transform 0.08% : 0.000003s : 2: substitution.incorporate_call 0.06% : 0.000002s : 2: substitution.incorporate_call_switch 15.31% : 0.000524s : 17: substitution.inline 0.49% : 0.000017s : 2: substitution.inline_without_move 0.32% : 0.000011s : 22: substitution.j_node_and_user_rematch 0.46% : 0.000016s : 3: substitution.less_batch_normalization 0.45% : 0.000016s : 11: substitution.minmaximum_grad 0.17% : 0.000006s : 5: substitution.partial_eliminate 0.44% : 0.000015s : 22: substitution.remove_not_recompute_node 0.80% : 0.000027s : 10: substitution.replace_applicator 0.32% : 0.000011s : 15: substitution.replace_old_param 0.08% : 0.000003s : 1: substitution.set_cell_output_no_recompute 74.58% : 0.002554s : 11: substitution.tuple_list_convert_item_index_to_positive 0.41% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 0.55% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 2.04% : 0.000070s : 30: substitution.tuple_list_get_item_eliminator 0.56% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.039520 2 96.19% : 0.038013s : 1: type_inference.infer 3.81% : 0.001508s : 1: type_inference.specialize ------[replace.] 0.000231 33 57.49% : 0.000133s : 17: replace.inline 42.51% : 0.000098s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000548 33 93.80% : 0.000514s : 17: match.inline 6.20% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000834 5998 1.21% : 0.000010s : 70: predicate.accumulaten_eliminater 0.31% : 0.000003s : 9: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 34: predicate.addn_check_dump 1.08% : 0.000009s : 70: predicate.addn_zero_filter 0.99% : 0.000008s : 70: predicate.adjust_all_reduce_mul_add 2.58% : 0.000022s : 104: predicate.arithmetic_simplify 1.23% : 0.000010s : 70: predicate.cast_eliminate 1.10% : 0.000009s : 71: predicate.check_bprop_eliminate 0.49% : 0.000004s : 34: predicate.compare_switch_simplify 0.09% : 0.000001s : 9: predicate.const_output_eliminate 0.50% : 0.000004s : 34: predicate.depend_value_elim 1.14% : 0.000009s : 70: predicate.dict_get_item_const_eliminator 1.31% : 0.000011s : 70: predicate.dict_get_item_eliminator 1.07% : 0.000009s : 70: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 18: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 9: predicate.elim_not_effective 0.19% : 0.000002s : 9: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000010s : 79: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 79: predicate.environ_get_add_eliminate 1.16% : 0.000010s : 79: predicate.environ_get_depend_swap 1.77% : 0.000015s : 113: predicate.environ_get_eliminate 1.15% : 0.000010s : 79: predicate.environ_get_set_eliminate 1.59% : 0.000013s : 103: predicate.exchange_switch_depend_value 2.12% : 0.000018s : 103: predicate.float_depend_g_call 0.50% : 0.000004s : 34: predicate.float_environ_get_switch 0.67% : 0.000006s : 43: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 9: predicate.fold_const_symbol 0.56% : 0.000005s : 34: predicate.get_grad_eliminate 0.10% : 0.000001s : 9: predicate.graph_param_transform 0.51% : 0.000004s : 34: predicate.incorporate_call 0.48% : 0.000004s : 34: predicate.incorporate_call_switch 5.26% : 0.000044s : 259: predicate.inline 1.21% : 0.000010s : 57: predicate.inline_without_move 0.28% : 0.000002s : 34: predicate.j_node_and_user_rematch 0.78% : 0.000006s : 34: predicate.less_batch_normalization 
1.64% : 0.000014s : 104: predicate.list_to_tuple_eliminator_ 2.66% : 0.000022s : 174: predicate.load_eliminater 0.38% : 0.000003s : 9: predicate.loop_unroll_after_grad 2.21% : 0.000018s : 138: predicate.loop_unroll_before_grad 1.45% : 0.000012s : 88: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 34: predicate.merge_addn 1.11% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.17% : 0.000010s : 71: predicate.mini_step_allgather_replace 1.08% : 0.000009s : 70: predicate.minmaximum_grad 0.41% : 0.000003s : 9: predicate.mutable_eliminate 0.15% : 0.000001s : 9: predicate.opt_reshape 0.16% : 0.000001s : 9: predicate.parallel_virtual_node 1.97% : 0.000016s : 103: predicate.partial_defer_inline 1.64% : 0.000014s : 95: predicate.partial_eliminate 1.08% : 0.000009s : 70: predicate.print_const_string_wrapper 0.65% : 0.000005s : 34: predicate.reduce_all_const_elim 1.55% : 0.000013s : 70: predicate.reduce_eliminate 2.68% : 0.000022s : 174: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 34: predicate.remove_not_recompute_node 1.78% : 0.000015s : 157: predicate.replace_applicator 0.60% : 0.000005s : 57: predicate.replace_old_param 0.11% : 0.000001s : 9: predicate.reset_defer_inline 1.10% : 0.000009s : 70: predicate.reshape_eliminate 1.47% : 0.000012s : 71: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 9: predicate.row_tensor_eliminate 1.20% : 0.000010s : 71: predicate.same_eliminate 0.35% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.68% : 0.000006s : 34: predicate.shard_identity_eliminate 0.33% : 0.000003s : 18: predicate.special_op_eliminate 0.62% : 0.000005s : 34: predicate.specialize_transform 1.22% : 0.000010s : 71: predicate.split_environ_get_set_with_tuple_value 1.28% : 0.000011s : 57: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 9: predicate.switch_call_monad_eliminater 1.70% : 0.000014s : 103: predicate.switch_defer_inline 2.86% : 0.000024s : 174: predicate.switch_layer_defer_inline 5.21% : 
0.000043s : 284: predicate.switch_simplify 1.11% : 0.000009s : 70: predicate.tile_eliminate 1.07% : 0.000009s : 70: predicate.transpose_eliminate 1.44% : 0.000012s : 88: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000013s : 88: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000011s : 88: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000024s : 138: predicate.tuple_list_get_item_eliminator 1.41% : 0.000012s : 88: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000017s : 122: predicate.tuple_list_set_item_eliminator 1.66% : 0.000014s : 104: predicate.tuple_to_list_eliminator_ 2.47% : 0.000021s : 174: predicate.updatestate_pure_node_eliminater 3.20% : 0.000027s : 208: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 9: predicate.value_based_eliminate 0.54% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 34: predicate.virtual_output_eliminate 0.14% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.18% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001717 34 56.04% : 0.000962s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.96% : 0.000755s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
6.786027 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.04% : 0.002885s : 1: add_attr 0.04% : 0.002877s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.00% : 0.000069s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000104s : 1: auto_monad 0.00% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.01% : 0.000578s : 1: bootstrap 0.00% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000023s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000004s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000056s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000004s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.01% : 0.000491s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.01% : 0.000723s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000022s : 1: opt.transform.mutable_eliminate 0.12% : 0.007953s : 117: opt.transform.opt_a 0.00% : 0.000054s : 1: opt.transform.opt_after_cconv 0.00% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000201s : 28: opt.transform.opt_b 0.00% : 0.000085s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000060s : 4: opt.transform.symbol_engine_opt 0.22% : 0.015195s : 1: opt_a 0.00% : 0.000158s : 1: opt_after_cconv 0.01% : 0.000533s : 1: opt_after_jit_grad 0.01% : 0.000355s : 1: opt_b 0.56% : 0.037807s : 1: optimize 0.00% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000011s : 1: order_py_execute_after_rewriter 0.00% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000050s : 1: pre_auto_parallel 0.00% : 0.000040s : 1: py_interpret_to_execute 0.00% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000044s : 1: remove_dup_value 0.03% : 0.002067s : 2: renormalize.infer 0.02% : 0.001635s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000061s : 1: rewriter_after_opt_a 0.29% : 0.020015s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000110s : 1: symbol_engine_optimizer 98.02% : 6.651442s : 1: task_emit 0.00% : 0.000116s : 1: 
tuple_transform 0.58% : 0.039605s : 1: type_inference 0.00% : 0.000090s : 1: validate TotalTime = 0.095278, [24] [bootstrap]: 0.00047553 [type_inference]: 0.0045066 [event_method]: 1.051e-05 [auto_monad]: 5.278e-05 [graph_reusing]: 5.28002e-06 [inline]: 2.16e-06 [add_attr]: 0.00309035, [1] [add_attr_with_inline]: 0.00308205, [1] [Cycle 1]: 4.551e-05, [2] [tag_attr]: 1.367e-05 [meta_addattr_fg_expand]: 3.36001e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 2.258e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00391862, [53] [py_interpret_to_execute]: 1.534e-05 [rewriter_before_opt_a]: 4.038e-05 [opt_a]: 0.00197051, [2] [Cycle 1]: 0.00136139, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 2.61e-05 [loop_unroll]: 1.35e-05 [a_1]: 0.00029428 [with_stream_mark]: 1.428e-05 [recompute_prepare]: 8.03999e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.71e-05 [accelerated_algorithm]: 6.60002e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 6.21998e-06 [parallel]: 1.991e-05 [flash_sp]: 7.72002e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.71e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 9.94001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 9.82999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 2.94001e-06 [receive_attached]: 2.56e-06 
[after_resolve]: 1.064e-05 [a_after_grad]: 8.94998e-06 [renormalize]: 0.0004331 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 2.08002e-06 [auto_monad_eliminator]: 1.382e-05 [cse]: 2.863e-05 [a_3]: 4.058e-05 [Cycle 2]: 0.00059981, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012564 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.69999e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 9.99979e-07 [a_2]: 6.755e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.39998e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.35001e-06 [parallel]: 4.10998e-06 [flash_sp]: 3.38999e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.81999e-06 [matmul_add_comm_reduction]: 7.15e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.28998e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 3.01999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.21e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.01e-05 [merge_recompute_call_nodes]: 1.09998e-06 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.59998e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.41002e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 7.82998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 8.03999e-06 [cse]: 1.433e-05 [a_3]: 3.192e-05 [py_interpret_to_execute_after_opt_a]: 7.93001e-06 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 3.196e-05 [convert_after_rewriter]: 7.53e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00055545 [opt_b]: 0.00018383, [1] [Cycle 1]: 0.00017699, [7] [b_1]: 
0.00010811 [b_2]: 7.5e-06 [updatestate_depend_eliminate]: 5.92001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.69997e-07 [cse]: 1.686e-05 [optimize_parallel_all_gather_comm]: 1.691e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.423e-05 [loop_unroll]: 0.00042425 [opt_after_cconv]: 9.712e-05, [1] [Cycle 1]: 9.164e-05, [7] [c_1]: 2.809e-05 [parameter_eliminate]: 2.96001e-06 [updatestate_depend_eliminate]: 4.87998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.09e-06 [cse]: 1.77e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.266e-05 [tuple_transform]: 7.153e-05, [1] [Cycle 1]: 6.73e-05, [4] [d_1]: 4.089e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 4.667e-05 [cse_after_recomputation]: 1.974e-05, [1] [Cycle 1]: 1.538e-05, [1] [cse]: 1.043e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 4.39998e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.56e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.33998e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.38002e-06 [overlap_opt_shard_in_pipeline]: 1.39e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.227e-05 [grouped_pairwise_exchange_alltoall]: 1.90001e-06 [offloading_packed_experts]: 3.78999e-06 [overlap_recompute_and_grad_model_parallel]: 4.80999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.57e-06 [overlap_grad_flash_sp]: 1.716e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 7.035e-05, [1] [Cycle 1]: 6.607e-05, [6] [build]: 3.03998e-06 [elim_shapecalc]: 8.56002e-06 [elim_not_effective]: 1.19e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 9.00999e-06 [renormalize]: 1.70025e-07 [detach_backward]: 1.56998e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.637e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 4.05e-06 [opt_after_jit_grad]: 0.00047796 [validate]: 3.615e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0824217 [execute]: 1.052e-05 Sums bootstrap : 0.000476s : 0.52% type_inference : 0.004507s : 4.94% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000420s : 0.46% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000433s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000043s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000555s : 0.61% optimize.opt_b.b_1 : 0.000108s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000424s : 0.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.05% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000478s : 0.52% validate : 0.000036s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.082422s : 90.36% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.94% : 0.000024s : 4: substitution.arithmetic_simplify 1.63% : 0.000002s : 2: substitution.elim_not_effective 1.17% : 0.000001s : 2: substitution.fold_const_symbol 5.05% : 0.000006s : 4: substitution.graph_param_transform 63.98% : 0.000080s : 2: substitution.inline 2.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004463 2 91.74% : 0.004095s : 1: type_inference.infer 8.26% : 0.000369s : 1: type_inference.specialize ------[replace.] 0.000017 2 100.00% : 0.000017s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.22% : 0.000003s : 17: predicate.arithmetic_simplify 0.72% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.69% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.30% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.47% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.39% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.30% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 1.07% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.63% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.95% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 41.46% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.54% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.103651 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.99% : 0.003094s : 1: add_attr 2.98% : 0.003086s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.48% : 0.000502s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.42% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000565s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.75% : 0.000775s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.90% : 0.001973s : 1: opt_a 0.10% : 0.000101s : 1: opt_after_cconv 0.47% : 0.000488s : 1: opt_after_jit_grad 0.18% : 0.000187s : 1: opt_b 3.78% : 0.003923s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000254s : 1: renormalize.infer 0.17% : 0.000172s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000045s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000073s : 1: symbol_engine_optimizer 79.54% : 0.082446s : 1: task_emit 0.07% : 0.000074s : 1: 
tuple_transform 4.36% : 0.004522s : 1: type_inference 0.06% : 0.000063s : 1: validate TotalTime = 0.155544, [24] [bootstrap]: 0.00057506 [type_inference]: 0.0114201 [event_method]: 4.818e-05 [auto_monad]: 0.0001207 [graph_reusing]: 8.72e-06 [inline]: 2.68998e-06 [add_attr]: 0.0036035, [1] [add_attr_with_inline]: 0.00359331, [1] [Cycle 1]: 8.059e-05, [2] [tag_attr]: 3.57e-05 [meta_addattr_fg_expand]: 9.01998e-06 [parallel-infer-symbol]: 3.88001e-06 [pre_auto_parallel]: 5.289e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.94999e-06 [optimize]: 0.0150711, [53] [py_interpret_to_execute]: 3.974e-05 [rewriter_before_opt_a]: 0.0001376 [opt_a]: 0.0123505, [3] [Cycle 1]: 0.00780165, [45] [expand_dump_flag]: 4.43001e-06 [switch_simplify]: 6.871e-05 [loop_unroll]: 5.532e-05 [a_1]: 0.00138406 [with_stream_mark]: 2.652e-05 [recompute_prepare]: 2.156e-05 [updatestate_depend_eliminate]: 9.14e-06 [updatestate_assign_eliminate]: 8.38001e-06 [updatestate_loads_eliminate]: 7.26999e-06 [parameter_eliminate]: 2.83e-06 [a_2]: 0.0002483 [accelerated_algorithm]: 3.176e-05 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 3.53e-06 [shard_inline]: 1.663e-05 [merge_send_recv]: 1.704e-05 [auto_parallel]: 1.144e-05 [parallel]: 1.992e-05 [flash_sp]: 1.231e-05 [merge_comm]: 2.088e-05 [allreduce_fusion]: 9.57001e-06 [matmul_add_comm_reduction]: 2.874e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.872e-05 [virtual_dataset]: 1.642e-05 [get_grad_eliminate_]: 1.547e-05 [virtual_output]: 1.537e-05 [merge_forward]: 9.62001e-06 [cell_reuse_recompute_pass]: 1.77001e-06 [offload_activation]: 1.828e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.893e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 2.783e-05 [set_forward_comm_id_for_comm_node_pass]: 9.90002e-06 [meta_fg_expand]: 0.00161272 [flash_sp_send_recv_attached]: 3.97998e-06 [receive_attached]: 2.62001e-06 
[after_resolve]: 6.159e-05 [a_after_grad]: 8.4e-05 [renormalize]: 0.00295912 [add_forward_monad_depend]: 1.041e-05 [auto_monad_grad]: 6.89999e-06 [auto_monad_eliminator]: 6.02e-05 [cse]: 0.00018692 [a_3]: 0.00034978 [Cycle 2]: 0.00352214, [45] [expand_dump_flag]: 2.24001e-06 [switch_simplify]: 4.916e-05 [loop_unroll]: 4.579e-05 [a_1]: 0.00162389 [with_stream_mark]: 1.732e-05 [recompute_prepare]: 1.233e-05 [updatestate_depend_eliminate]: 6.68003e-06 [updatestate_assign_eliminate]: 5.30999e-06 [updatestate_loads_eliminate]: 4.67e-06 [parameter_eliminate]: 1.45001e-06 [a_2]: 0.00014133 [accelerated_algorithm]: 1.411e-05 [shard]: 1.66002e-06 [meta_shard_fg_expand]: 2.41998e-06 [shard_inline]: 1.011e-05 [merge_send_recv]: 9.59999e-06 [auto_parallel]: 1.058e-05 [parallel]: 6.70002e-06 [flash_sp]: 3.76001e-06 [merge_comm]: 6.07001e-06 [allreduce_fusion]: 5.77001e-06 [matmul_add_comm_reduction]: 1.071e-05 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 1.251e-05 [virtual_dataset]: 9.97999e-06 [get_grad_eliminate_]: 1.012e-05 [virtual_output]: 9.63997e-06 [merge_forward]: 5.43002e-06 [cell_reuse_recompute_pass]: 9.80013e-07 [offload_activation]: 1.384e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.925e-05 [merge_recompute_call_nodes]: 1.17e-06 [before_grad]: 1.65e-05 [set_forward_comm_id_for_comm_node_pass]: 4.652e-05 [meta_fg_expand]: 5.158e-05 [flash_sp_send_recv_attached]: 1.45001e-06 [receive_attached]: 1.69e-06 [after_resolve]: 1.795e-05 [a_after_grad]: 1.639e-05 [renormalize]: 0.00088084 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 2.11e-06 [auto_monad_eliminator]: 1.981e-05 [cse]: 6.223e-05 [a_3]: 7.537e-05 [Cycle 3]: 0.00100948, [45] [expand_dump_flag]: 1.16002e-06 [switch_simplify]: 1.18e-05 [loop_unroll]: 1.003e-05 [a_1]: 0.00029224 [with_stream_mark]: 1.217e-05 [recompute_prepare]: 1.05e-05 [updatestate_depend_eliminate]: 5.54998e-06 [updatestate_assign_eliminate]: 4.47998e-06 [updatestate_loads_eliminate]: 4.53999e-06 
[parameter_eliminate]: 9.5999e-07 [a_2]: 0.00013708 [accelerated_algorithm]: 1.3e-05 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.98997e-06 [shard_inline]: 1.04e-05 [merge_send_recv]: 7.72998e-06 [auto_parallel]: 7.98999e-06 [parallel]: 5.95002e-06 [flash_sp]: 1.02e-06 [merge_comm]: 5.99999e-06 [allreduce_fusion]: 5.36998e-06 [matmul_add_comm_reduction]: 9.40001e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 1.142e-05 [virtual_dataset]: 1.019e-05 [get_grad_eliminate_]: 9.47999e-06 [virtual_output]: 9.25999e-06 [merge_forward]: 4.75001e-06 [cell_reuse_recompute_pass]: 1.66e-06 [offload_activation]: 1.13e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.799e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.616e-05 [set_forward_comm_id_for_comm_node_pass]: 6.32001e-06 [meta_fg_expand]: 3.69002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 1.583e-05 [a_after_grad]: 1.745e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.54e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.316e-05 [cse]: 3.283e-05 [a_3]: 6.555e-05 [py_interpret_to_execute_after_opt_a]: 1.682e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 6.297e-05 [convert_after_rewriter]: 1.059e-05 [order_py_execute_after_rewriter]: 7.87e-06 [mutable_eliminate]: 0.00065683 [opt_b]: 0.00032776, [1] [Cycle 1]: 0.00031981, [7] [b_1]: 0.00021346 [b_2]: 1.234e-05 [updatestate_depend_eliminate]: 8.33001e-06 [updatestate_assign_eliminate]: 4.61002e-06 [updatestate_loads_eliminate]: 4.38001e-06 [renormalize]: 1.8999e-07 [cse]: 3.953e-05 [optimize_parallel_all_gather_comm]: 2.364e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.595e-05 [loop_unroll]: 0.00047427 [opt_after_cconv]: 0.00019425, [1] [Cycle 1]: 0.00018811, [7] [c_1]: 9.448e-05 [parameter_eliminate]: 2.49001e-06 [updatestate_depend_eliminate]: 8.33001e-06 [updatestate_assign_eliminate]: 4.75999e-06 
[updatestate_loads_eliminate]: 4.25e-06 [cse]: 3.73e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 4.485e-05 [tuple_transform]: 0.0001136, [1] [Cycle 1]: 0.00010862, [4] [d_1]: 7.636e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.157e-05 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 7.2e-05 [cse_after_recomputation]: 3.832e-05, [1] [Cycle 1]: 3.304e-05, [1] [cse]: 2.704e-05 [environ_conv]: 1.096e-05 [swap_dp_allreduce_reducescatter]: 8.51002e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 5.23002e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.30002e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.37999e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 2.034e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 6.14001e-06 [overlap_recompute_and_grad_model_parallel]: 6.51999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55999e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 5.97999e-06 [overlap_grad_flash_sp]: 3.002e-05 [begin_end_overlap_inline]: 7.00005e-07 [split_matmul_comm_elemetwise]: 2.29999e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 0.00010835, [1] [Cycle 1]: 0.00010369, [6] [build]: 1.274e-05 [elim_shapecalc]: 1.45e-05 [elim_not_effective]: 2.073e-05 [opt_reshape]: 1.091e-05 [fold_const_symbol]: 1.668e-05 [renormalize]: 2.29978e-07 
[detach_backward]: 2.32999e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.647e-05 [get_jit_bprop_graph]: 1.81e-06 [rewriter_after_jit_bprop_graph]: 4.14002e-06 [opt_after_jit_grad]: 0.00051894 [validate]: 6.255e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.123749 [execute]: 9.39e-06 Sums bootstrap : 0.000575s : 0.38% type_inference : 0.011420s : 7.58% event_method : 0.000048s : 0.03% auto_monad : 0.000121s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000053s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.03% optimize.rewriter_before_opt_a : 0.000138s : 0.09% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.09% optimize.opt_a.loop_unroll : 0.000111s : 0.07% optimize.opt_a.a_1 : 0.003300s : 2.19% optimize.opt_a.with_stream_mark : 0.000056s : 0.04% optimize.opt_a.recompute_prepare : 0.000044s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000527s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000037s : 0.02% optimize.opt_a.merge_send_recv : 0.000034s : 0.02% optimize.opt_a.auto_parallel : 0.000030s : 0.02% optimize.opt_a.parallel : 0.000033s : 0.02% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 0.000033s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000021s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.03% optimize.opt_a.virtual_dataset : 0.000037s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.02% optimize.opt_a.virtual_output : 0.000034s : 0.02% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000043s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000060s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000063s : 0.04% optimize.opt_a.meta_fg_expand : 0.001668s : 1.11% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000095s : 0.06% optimize.opt_a.a_after_grad : 0.000118s : 0.08% optimize.opt_a.renormalize : 0.003840s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000010s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000093s : 0.06% optimize.opt_a.cse : 0.000282s : 0.19% optimize.opt_a.a_3 : 0.000491s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000063s : 0.04% optimize.convert_after_rewriter : 0.000011s : 0.01% optimize.order_py_execute_after_rewriter : 0.000008s : 0.01% optimize.mutable_eliminate : 0.000657s : 0.44% optimize.opt_b.b_1 : 0.000213s : 0.14% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000040s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.02% optimize.loop_unroll : 0.000474s : 0.31% optimize.opt_after_cconv.c_1 : 0.000094s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000037s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000045s : 0.03% optimize.tuple_transform.d_1 : 0.000076s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000012s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000072s : 0.05% optimize.cse_after_recomputation.cse : 0.000027s : 0.02% optimize.environ_conv : 0.000011s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000030s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000519s : 0.34% validate : 0.000063s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.123749s : 82.17% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000834 227 5.91% : 0.000049s : 11: substitution.arithmetic_simplify 2.65% : 0.000022s : 4: substitution.cast_eliminate 0.43% : 0.000004s : 6: substitution.elim_not_effective 0.57% : 0.000005s : 5: substitution.float_depend_g_call 0.55% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 6: substitution.fold_const_symbol 1.04% : 0.000009s : 9: substitution.graph_param_transform 0.41% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.44% : 0.000462s : 16: substitution.inline 2.09% : 0.000017s : 2: substitution.inline_without_move 1.43% : 0.000012s : 22: substitution.j_node_and_user_rematch 2.07% : 0.000017s : 3: substitution.less_batch_normalization 1.63% : 0.000014s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.87% : 0.000016s : 22: substitution.remove_not_recompute_node 3.16% : 0.000026s : 10: substitution.replace_applicator 1.60% : 0.000013s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.52% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.79% : 0.000065s : 28: substitution.tuple_list_get_item_eliminator 2.29% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011341 2 87.06% : 0.009873s : 1: type_inference.infer 12.94% : 0.001467s : 1: type_inference.specialize ------[replace.] 0.000207 30 58.88% : 0.000122s : 16: replace.inline 41.12% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000485 30 93.59% : 0.000454s : 16: match.inline 6.41% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000786 5897 1.08% : 0.000008s : 69: predicate.accumulaten_eliminater 0.34% : 0.000003s : 9: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 34: predicate.addn_check_dump 1.09% : 0.000009s : 69: predicate.addn_zero_filter 1.05% : 0.000008s : 69: predicate.adjust_all_reduce_mul_add 2.13% : 0.000017s : 103: predicate.arithmetic_simplify 1.18% : 0.000009s : 69: predicate.cast_eliminate 1.16% : 0.000009s : 71: predicate.check_bprop_eliminate 0.56% : 0.000004s : 34: predicate.compare_switch_simplify 0.10% : 0.000001s : 9: predicate.const_output_eliminate 0.56% : 0.000004s : 34: predicate.depend_value_elim 1.16% : 0.000009s : 69: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 69: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 69: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 18: predicate.dumpgradient_eliminate 0.12% : 0.000001s : 9: predicate.elim_not_effective 0.22% : 0.000002s : 9: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000010s : 78: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 78: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 78: predicate.environ_get_depend_swap 1.77% : 0.000014s : 112: predicate.environ_get_eliminate 1.19% : 0.000009s : 78: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.17% : 0.000017s : 99: predicate.float_depend_g_call 0.53% : 0.000004s : 34: predicate.float_environ_get_switch 0.68% : 0.000005s : 43: predicate.float_tuple_getitem_switch 0.10% : 0.000001s : 9: predicate.fold_const_symbol 0.63% : 0.000005s : 34: predicate.get_grad_eliminate 0.12% : 0.000001s : 9: predicate.graph_param_transform 0.54% : 0.000004s : 34: predicate.incorporate_call 0.51% : 0.000004s : 34: predicate.incorporate_call_switch 5.43% : 0.000043s : 254: predicate.inline 1.28% : 0.000010s : 57: predicate.inline_without_move 0.32% : 0.000003s : 34: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 34: predicate.less_batch_normalization 
1.62% : 0.000013s : 101: predicate.list_to_tuple_eliminator_ 2.64% : 0.000021s : 170: predicate.load_eliminater 0.39% : 0.000003s : 9: predicate.loop_unroll_after_grad 2.13% : 0.000017s : 130: predicate.loop_unroll_before_grad 1.47% : 0.000012s : 87: predicate.make_slice_get_slice_eliminator 0.57% : 0.000005s : 34: predicate.merge_addn 1.11% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 71: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 69: predicate.minmaximum_grad 0.42% : 0.000003s : 9: predicate.mutable_eliminate 0.17% : 0.000001s : 9: predicate.opt_reshape 0.16% : 0.000001s : 9: predicate.parallel_virtual_node 1.94% : 0.000015s : 99: predicate.partial_defer_inline 1.67% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 69: predicate.print_const_string_wrapper 0.56% : 0.000004s : 34: predicate.reduce_all_const_elim 1.30% : 0.000010s : 69: predicate.reduce_eliminate 2.64% : 0.000021s : 170: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 34: predicate.remove_not_recompute_node 1.86% : 0.000015s : 154: predicate.replace_applicator 0.59% : 0.000005s : 57: predicate.replace_old_param 0.11% : 0.000001s : 9: predicate.reset_defer_inline 1.11% : 0.000009s : 69: predicate.reshape_eliminate 1.17% : 0.000009s : 71: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 9: predicate.row_tensor_eliminate 1.29% : 0.000010s : 71: predicate.same_eliminate 0.36% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.68% : 0.000005s : 34: predicate.shard_identity_eliminate 0.34% : 0.000003s : 18: predicate.special_op_eliminate 0.66% : 0.000005s : 34: predicate.specialize_transform 1.29% : 0.000010s : 71: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 57: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 9: predicate.switch_call_monad_eliminater 1.75% : 0.000014s : 99: predicate.switch_defer_inline 2.83% : 0.000022s : 170: predicate.switch_layer_defer_inline 4.80% : 0.000038s 
: 272: predicate.switch_simplify 1.08% : 0.000008s : 69: predicate.tile_eliminate 1.10% : 0.000009s : 69: predicate.transpose_eliminate 1.49% : 0.000012s : 87: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000012s : 87: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000011s : 87: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000022s : 135: predicate.tuple_list_get_item_eliminator 1.57% : 0.000012s : 87: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000016s : 121: predicate.tuple_list_set_item_eliminator 1.60% : 0.000013s : 101: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 170: predicate.updatestate_pure_node_eliminater 3.24% : 0.000025s : 204: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 9: predicate.value_based_eliminate 0.58% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.57% : 0.000005s : 34: predicate.virtual_output_eliminate 0.14% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001773 32 57.75% : 0.001024s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.25% : 0.000749s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.183524 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.97% : 0.003608s : 1: add_attr 1.96% : 0.003598s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000076s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000128s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.33% : 0.000609s : 1: bootstrap 0.02% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000023s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.02% : 0.000041s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.03% : 0.000056s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.26% : 0.000484s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000666s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 2.76% : 0.005061s : 117: opt.transform.opt_a 0.05% : 0.000093s : 1: opt.transform.opt_after_cconv 0.02% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000199s : 28: opt.transform.opt_b 0.05% : 0.000085s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000059s : 4: opt.transform.symbol_engine_opt 6.73% : 0.012354s : 1: opt_a 0.11% : 0.000198s : 1: opt_after_cconv 0.29% : 0.000529s : 1: opt_after_jit_grad 0.18% : 0.000332s : 1: opt_b 8.21% : 0.015076s : 1: optimize 0.02% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000058s : 1: pre_auto_parallel 0.02% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000049s : 1: remove_dup_value 1.14% : 0.002098s : 2: renormalize.infer 0.94% : 0.001725s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000067s : 1: rewriter_after_opt_a 0.08% : 0.000142s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000111s : 1: symbol_engine_optimizer 67.44% : 0.123773s : 1: task_emit 0.06% : 0.000117s : 1: 
tuple_transform 6.24% : 0.011445s : 1: type_inference 0.05% : 0.000099s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x3-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x4-pynative],max_mem:6.0M TotalTime = 0.0721835, [24] [bootstrap]: 0.00055565 [type_inference]: 0.0312655 [event_method]: 1.529e-05 [auto_monad]: 5.894e-05 [graph_reusing]: 5.58002e-06 [inline]: 2.21e-06 [add_attr]: 0.00367553, [1] [add_attr_with_inline]: 0.00366354, [1] [Cycle 1]: 5.127e-05, [2] [tag_attr]: 1.644e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 3.36999e-06 [pre_auto_parallel]: 2.863e-05 [insert-virtual-dataset]: 2.73998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0287246, [53] [py_interpret_to_execute]: 2.15e-05 [rewriter_before_opt_a]: 6.063e-05 [opt_a]: 0.0264744, [2] [Cycle 1]: 0.00164493, [45] [expand_dump_flag]: 3.33e-06 [switch_simplify]: 3.248e-05 [loop_unroll]: 2.049e-05 [a_1]: 0.00047395 [with_stream_mark]: 1.421e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.603e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.31002e-06 [auto_parallel]: 6.19999e-06 [parallel]: 2.518e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 9.69999e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 5.91e-06 
[get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.34e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 1.012e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 9.47999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.47001e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 2.71e-06 [after_resolve]: 1.02e-05 [a_after_grad]: 8.63001e-06 [renormalize]: 0.00051312 [add_forward_monad_depend]: 4.91997e-06 [auto_monad_grad]: 2.33998e-06 [auto_monad_eliminator]: 1.397e-05 [cse]: 2.927e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.0248183, [45] [expand_dump_flag]: 1.35999e-06 [switch_simplify]: 6.72002e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.00012563 [with_stream_mark]: 1.006e-05 [recompute_prepare]: 5.58002e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 6.768e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.06997e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.53002e-06 [merge_send_recv]: 4.37998e-06 [auto_parallel]: 4.97e-06 [parallel]: 4.53999e-06 [flash_sp]: 3.12002e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 3.11001e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 1.71998e-06 [virtual_shard_identity]: 3.144e-05 [virtual_dataset]: 6.68998e-06 [get_grad_eliminate_]: 6.04999e-06 [virtual_output]: 6.74001e-06 [merge_forward]: 1.369e-05 [cell_reuse_recompute_pass]: 3.68e-06 [offload_activation]: 2.023e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.442e-05 [merge_recompute_call_nodes]: 2.21e-06 [before_grad]: 9.60001e-06 [set_forward_comm_id_for_comm_node_pass]: 7.11999e-06 [meta_fg_expand]: 3.45e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 3.02002e-06 [after_resolve]: 1.369e-05 [a_after_grad]: 8.88002e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 6.51999e-06 [auto_monad_grad]: 2.95998e-06 [auto_monad_eliminator]: 1.691e-05 [cse]: 3.541e-05 [a_3]: 3.713e-05 [py_interpret_to_execute_after_opt_a]: 1.904e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 4.407e-05 [convert_after_rewriter]: 7.51999e-06 [order_py_execute_after_rewriter]: 5.36002e-06 [mutable_eliminate]: 0.00073481 [opt_b]: 0.00019759, [1] [Cycle 1]: 0.00018804, [7] [b_1]: 0.0001129 [b_2]: 7.93001e-06 [updatestate_depend_eliminate]: 6.06e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 2.96001e-06 [renormalize]: 7.29982e-07 [cse]: 1.849e-05 [optimize_parallel_all_gather_comm]: 1.687e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 3.382e-05 [loop_unroll]: 0.00043061 [opt_after_cconv]: 9.875e-05, [1] [Cycle 1]: 9.267e-05, [7] [c_1]: 2.802e-05 [parameter_eliminate]: 3.64002e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.739e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.484e-05 [tuple_transform]: 7.361e-05, [1] [Cycle 1]: 6.875e-05, [4] [d_1]: 4.215e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 6.79999e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 5.827e-05 [cse_after_recomputation]: 2.223e-05, [1] [Cycle 1]: 1.741e-05, [1] [cse]: 1.219e-05 [environ_conv]: 5.74e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.58001e-06 [label_fine_grained_interleaved_index]: 3.61999e-06 [merge_cast_opt]: 1.62001e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 1.36002e-06 
[add_comm_op_reuse_tag]: 9.90025e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06998e-06 [control_data_broadcast_order]: 1.162e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.78001e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.91e-06 [overlap_grad_ring_attention]: 4.17003e-06 [overlap_grad_flash_sp]: 2.148e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.78003e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 7.364e-05, [1] [Cycle 1]: 6.928e-05, [6] [build]: 3.41999e-06 [elim_shapecalc]: 9.02999e-06 [elim_not_effective]: 1.261e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.88998e-06 [renormalize]: 1.59984e-07 [detach_backward]: 2.31e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.702e-05 [get_jit_bprop_graph]: 1.75001e-06 [rewriter_after_jit_bprop_graph]: 0.00016253 [opt_after_jit_grad]: 0.00046068 [validate]: 3.979e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00691527 [execute]: 6.93e-06 Sums bootstrap : 0.000556s : 1.28% type_inference : 0.031265s : 72.04% event_method : 0.000015s : 0.04% auto_monad : 0.000059s : 0.14% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.07% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.05% optimize.rewriter_before_opt_a : 0.000061s : 0.14% 
optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.09% optimize.opt_a.loop_unroll : 0.000026s : 0.06% optimize.opt_a.a_1 : 0.000600s : 1.38% optimize.opt_a.with_stream_mark : 0.000024s : 0.06% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.33% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.03% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.03% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000011s : 0.03% optimize.opt_a.parallel : 0.000030s : 0.07% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.02% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.09% optimize.opt_a.virtual_dataset : 0.000013s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.03% optimize.opt_a.virtual_output : 0.000012s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000030s : 0.07% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000035s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000011s : 0.02% optimize.opt_a.meta_fg_expand : 0.000006s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% 
optimize.opt_a.after_resolve : 0.000024s : 0.06% optimize.opt_a.a_after_grad : 0.000018s : 0.04% optimize.opt_a.renormalize : 0.000513s : 1.18% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.03% optimize.opt_a.auto_monad_grad : 0.000005s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000031s : 0.07% optimize.opt_a.cse : 0.000065s : 0.15% optimize.opt_a.a_3 : 0.000078s : 0.18% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000044s : 0.10% optimize.convert_after_rewriter : 0.000008s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000735s : 1.69% optimize.opt_b.b_1 : 0.000113s : 0.26% optimize.opt_b.b_2 : 0.000008s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.04% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000034s : 0.08% optimize.loop_unroll : 0.000431s : 0.99% optimize.opt_after_cconv.c_1 : 0.000028s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.03% optimize.tuple_transform.d_1 : 0.000042s : 0.10% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.13% optimize.cse_after_recomputation.cse : 0.000012s : 0.03% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000004s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.05% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.04% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000163s : 0.37% opt_after_jit_grad : 0.000461s : 1.06% validate : 0.000040s : 0.09% backend_pass : 0.000001s : 0.00% task_emit : 0.006915s : 15.93% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 0.000184 30 13.94% : 0.000026s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.55% : 0.000007s : 4: substitution.graph_param_transform 65.23% : 0.000120s : 3: substitution.inline 2.21% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.13% : 0.000006s : 4: substitution.remove_not_recompute_node 3.95% : 0.000007s : 4: substitution.replace_old_param 6.05% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.031206 2 98.03% : 0.030593s : 1: type_inference.infer 1.97% : 0.000613s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.59% : 0.000028s : 3: replace.inline 29.41% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000128 5 92.12% : 0.000118s : 3: match.inline 7.88% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000167 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.10% : 0.000004s : 19: predicate.arithmetic_simplify 0.80% : 0.000001s : 11: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.56% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.01% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.80% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 5.76% : 0.000010s : 51: predicate.inline 0.99% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.02% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.85% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.50% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.59% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000002s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 2.49% : 0.000004s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 1.31% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.88% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.72% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.57% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.24% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.01% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000415 8 48.66% : 0.000202s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.34% : 0.000213s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.106256 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.46% : 0.003680s : 1: add_attr 3.45% : 0.003668s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000064s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.56% : 0.000596s : 1: bootstrap 0.04% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.01% : 0.000012s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.41% : 0.000439s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.70% : 0.000745s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.95% : 0.001009s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000093s : 28: opt.transform.opt_b 0.04% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 24.92% : 0.026478s : 1: opt_a 0.10% : 0.000102s : 1: opt_after_cconv 0.44% : 0.000470s : 1: opt_after_jit_grad 0.19% : 0.000201s : 1: opt_b 27.04% : 0.028729s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.26% : 0.000272s : 1: renormalize.infer 0.22% : 0.000233s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.16% : 0.000169s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000048s : 1: rewriter_after_opt_a 0.06% : 0.000065s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000076s : 1: symbol_engine_optimizer 6.52% : 0.006927s : 1: task_emit 0.07% : 0.000077s : 1: 
tuple_transform 29.45% : 0.031287s : 1: type_inference 0.07% : 0.000078s : 1: validate TotalTime = 0.0192074, [24] [bootstrap]: 0.00049147 [type_inference]: 0.00476929 [event_method]: 1.067e-05 [auto_monad]: 5.269e-05 [graph_reusing]: 5.20001e-06 [inline]: 1.62999e-06 [add_attr]: 0.0030382, [1] [add_attr_with_inline]: 0.00303, [1] [Cycle 1]: 5.053e-05, [2] [tag_attr]: 1.307e-05 [meta_addattr_fg_expand]: 3.08e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.219e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00396558, [53] [py_interpret_to_execute]: 1.585e-05 [rewriter_before_opt_a]: 4.024e-05 [opt_a]: 0.00187485, [2] [Cycle 1]: 0.00127868, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 2.46e-05 [loop_unroll]: 1.345e-05 [a_1]: 0.00029321 [with_stream_mark]: 1.386e-05 [recompute_prepare]: 7.15e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 2.83003e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.755e-05 [accelerated_algorithm]: 6.87002e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.83002e-06 [merge_send_recv]: 8.13001e-06 [auto_parallel]: 6.07999e-06 [parallel]: 2.04e-05 [flash_sp]: 7.82e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.26999e-06 [virtual_dataset]: 6.29999e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 9.77001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.75e-06 [meta_fg_expand]: 2.18002e-06 [flash_sp_send_recv_attached]: 2.53998e-06 [receive_attached]: 2.78e-06 
[after_resolve]: 1.09e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00035619 [add_forward_monad_depend]: 4.61002e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.789e-05 [a_3]: 3.95e-05 [Cycle 2]: 0.00058689, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.82002e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.00012461 [with_stream_mark]: 1.1e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.67001e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 7.30011e-07 [a_2]: 6.736e-05 [accelerated_algorithm]: 5.41998e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.55999e-06 [auto_parallel]: 5.14e-06 [parallel]: 3.88999e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.15002e-06 [allreduce_fusion]: 2.85002e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.83997e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.03001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 8.07998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.257e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.01001e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 3.219e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00044829 [opt_b]: 0.00019307, [1] [Cycle 1]: 0.00018706, 
[7] [b_1]: 0.0001191 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.17001e-06 [renormalize]: 5.39992e-07 [cse]: 1.583e-05 [optimize_parallel_all_gather_comm]: 1.624e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.206e-05 [loop_unroll]: 0.00068421 [opt_after_cconv]: 9.631e-05, [1] [Cycle 1]: 9.029e-05, [7] [c_1]: 2.849e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.627e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 6.89e-05, [1] [Cycle 1]: 6.474e-05, [4] [d_1]: 3.945e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 4.373e-05 [cse_after_recomputation]: 1.986e-05, [1] [Cycle 1]: 1.557e-05, [1] [cse]: 1.059e-05 [environ_conv]: 4.47e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.65001e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.138e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.92e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44998e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.759e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.758e-05, [1] [Cycle 1]: 6.313e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 7.93999e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.11998e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044784 [validate]: 3.156e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00613189 [execute]: 6.96999e-06 Sums bootstrap : 0.000491s : 3.23% type_inference : 0.004769s : 31.35% event_method : 0.000011s : 0.07% auto_monad : 0.000053s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000418s : 2.75% optimize.opt_a.with_stream_mark : 0.000025s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.95% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000356s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.95% optimize.opt_b.b_1 : 0.000119s : 0.78% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000684s : 4.50% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 2.94% validate : 0.000032s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006132s : 40.31% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.51% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.21% : 0.000001s : 2: substitution.fold_const_symbol 4.57% : 0.000006s : 4: substitution.graph_param_transform 64.70% : 0.000078s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.91% : 0.000005s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004725 2 92.42% : 0.004367s : 1: type_inference.infer 7.58% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.75% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 2.18% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.72% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.27% : 0.000009s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.32% : 0.000002s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.55% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.85% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 1.06% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.24% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.69% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.14% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.86% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027509 196 0.02% : 0.000004s : 1: ForceFp32Comm 11.06% : 0.003043s : 1: add_attr 11.03% : 0.003034s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000529s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 2.52% : 0.000694s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.66% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.80% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.37% : 0.000102s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.83% : 0.001878s : 1: opt_a 0.36% : 0.000100s : 1: opt_after_cconv 1.66% : 0.000457s : 1: opt_after_jit_grad 0.71% : 0.000197s : 1: opt_b 14.43% : 0.003969s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000197s : 1: renormalize.infer 0.56% : 0.000153s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 22.33% : 0.006142s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 17.39% : 0.004783s : 1: type_inference 0.21% : 0.000059s : 1: validate TotalTime = 0.0206744, [24] [bootstrap]: 0.00047583 [type_inference]: 0.00622765 [event_method]: 1.458e-05 [auto_monad]: 5.499e-05 [graph_reusing]: 5.75001e-06 [inline]: 1.96e-06 [add_attr]: 0.003049, [1] [add_attr_with_inline]: 0.00304145, [1] [Cycle 1]: 4.743e-05, [2] [tag_attr]: 1.643e-05 [meta_addattr_fg_expand]: 3.97e-06 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 2.486e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00398751, [53] [py_interpret_to_execute]: 1.885e-05 [rewriter_before_opt_a]: 5.799e-05 [opt_a]: 0.00213518, [2] [Cycle 1]: 0.00152633, [45] [expand_dump_flag]: 2.65002e-06 [switch_simplify]: 3.09e-05 [loop_unroll]: 2.073e-05 [a_1]: 0.00044439 [with_stream_mark]: 1.362e-05 [recompute_prepare]: 8.23001e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.897e-05 [accelerated_algorithm]: 6.75998e-06 [shard]: 2.47001e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 8.44002e-06 [auto_parallel]: 6.34001e-06 [parallel]: 1.743e-05 [flash_sp]: 7.28999e-06 [merge_comm]: 4.31002e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.24e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.64002e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 8.64998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.57001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 
2.88003e-06 [after_resolve]: 1.024e-05 [a_after_grad]: 9.11002e-06 [renormalize]: 0.00042113 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.385e-05 [cse]: 2.636e-05 [a_3]: 4.041e-05 [Cycle 2]: 0.00059939, [45] [expand_dump_flag]: 8.00006e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012515 [with_stream_mark]: 9.71998e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 7.90023e-07 [a_2]: 6.843e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.27999e-06 [parallel]: 4.27998e-06 [flash_sp]: 2.88e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.55001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61998e-06 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 8.14997e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 6.63e-06 [cse]: 1.66e-05 [a_3]: 3.486e-05 [py_interpret_to_execute_after_opt_a]: 8.12e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.206e-05 [convert_after_rewriter]: 6.68998e-06 [order_py_execute_after_rewriter]: 5.62001e-06 [mutable_eliminate]: 0.00046524 [opt_b]: 0.00018015, [1] [Cycle 1]: 0.00017373, [7] 
[b_1]: 0.00010604 [b_2]: 7.58001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 5.40022e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.624e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.232e-05 [loop_unroll]: 0.00041464 [opt_after_cconv]: 9.429e-05, [1] [Cycle 1]: 8.856e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.21002e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.585e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.332e-05 [tuple_transform]: 7.037e-05, [1] [Cycle 1]: 6.623e-05, [4] [d_1]: 3.973e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.63998e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.302e-05 [cse_after_recomputation]: 2.005e-05, [1] [Cycle 1]: 1.558e-05, [1] [cse]: 1.034e-05 [environ_conv]: 4.75001e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.70002e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.38998e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.18002e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.35001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 5.05001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.63998e-06 [overlap_grad_ring_attention]: 3.72998e-06 [overlap_grad_flash_sp]: 1.78e-05 [begin_end_overlap_inline]: 4.70027e-07 [split_matmul_comm_elemetwise]: 2.13002e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.21002e-06 [symbol_engine_optimizer]: 6.815e-05, [1] [Cycle 1]: 6.401e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.65001e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 8.85001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.34001e-06 [opt_after_jit_grad]: 0.00045912 [validate]: 3.063e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00605894 [execute]: 7.35e-06 Sums bootstrap : 0.000476s : 2.87% type_inference : 0.006228s : 37.51% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000058s : 0.35% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000570s : 3.43% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.89% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000014s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000421s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.26% optimize.opt_a.a_3 : 0.000075s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000465s : 2.80% optimize.opt_b.b_1 : 0.000106s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000415s : 2.50% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.26% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000459s : 2.77% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006059s : 36.49% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.71% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.45% : 0.000006s : 4: substitution.graph_param_transform 66.68% : 0.000109s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.62% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006186 2 91.09% : 0.005635s : 1: type_inference.infer 8.91% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.04% : 0.000027s : 3: replace.inline 29.96% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.67% : 0.000107s : 3: match.inline 8.33% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.31% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.22% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.85% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.20% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 46.37% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.63% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029230 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.45% : 0.003053s : 1: add_attr 10.42% : 0.003045s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.36% : 0.000104s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.75% : 0.000511s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.45% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.62% : 0.000475s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.22% : 0.000941s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.31% : 0.002138s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.60% : 0.000468s : 1: opt_after_jit_grad 0.63% : 0.000183s : 1: opt_b 13.65% : 0.003991s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.74% : 0.000215s : 1: renormalize.infer 0.68% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.76% : 0.006069s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 21.36% : 0.006242s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0424621, [24] [bootstrap]: 0.00056661 [type_inference]: 0.0151314 [event_method]: 4.778e-05 [auto_monad]: 0.00012182 [graph_reusing]: 8.27003e-06 [inline]: 2.09999e-06 [add_attr]: 0.003286, [1] [add_attr_with_inline]: 0.0032777, [1] [Cycle 1]: 7.204e-05, [2] [tag_attr]: 3.491e-05 [meta_addattr_fg_expand]: 9.37999e-06 [parallel-infer-symbol]: 3.38e-06 [pre_auto_parallel]: 5.127e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0139843, [53] [py_interpret_to_execute]: 3.904e-05 [rewriter_before_opt_a]: 0.00014764 [opt_a]: 0.01159, [3] [Cycle 1]: 0.00762475, [45] [expand_dump_flag]: 4.36002e-06 [switch_simplify]: 7.284e-05 [loop_unroll]: 6.124e-05 [a_1]: 0.00152297 [with_stream_mark]: 2.326e-05 [recompute_prepare]: 2.203e-05 [updatestate_depend_eliminate]: 8.99e-06 [updatestate_assign_eliminate]: 8.18999e-06 [updatestate_loads_eliminate]: 7.35e-06 [parameter_eliminate]: 2.44001e-06 [a_2]: 0.00024739 [accelerated_algorithm]: 3.078e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.45e-06 [shard_inline]: 1.623e-05 [merge_send_recv]: 1.59e-05 [auto_parallel]: 1.122e-05 [parallel]: 1.927e-05 [flash_sp]: 1.177e-05 [merge_comm]: 9.89001e-06 [allreduce_fusion]: 8.81997e-06 [matmul_add_comm_reduction]: 2.658e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.783e-05 [virtual_dataset]: 1.565e-05 [get_grad_eliminate_]: 1.529e-05 [virtual_output]: 1.522e-05 [merge_forward]: 9.52001e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 1.843e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.901e-05 [merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 2.8e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67001e-06 [meta_fg_expand]: 0.0017628 [flash_sp_send_recv_attached]: 4.32e-06 [receive_attached]: 2.51998e-06 [after_resolve]: 
6.25e-05 [a_after_grad]: 8.225e-05 [renormalize]: 0.0025642 [add_forward_monad_depend]: 9.52001e-06 [auto_monad_grad]: 5.66e-06 [auto_monad_eliminator]: 5.606e-05 [cse]: 0.0001627 [a_3]: 0.00033539 [Cycle 2]: 0.00304843, [45] [expand_dump_flag]: 1.69e-06 [switch_simplify]: 4.672e-05 [loop_unroll]: 4.412e-05 [a_1]: 0.00156081 [with_stream_mark]: 1.215e-05 [recompute_prepare]: 1.136e-05 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 3.72998e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012564 [accelerated_algorithm]: 1.166e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.41998e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 7.63001e-06 [parallel]: 6.71e-06 [flash_sp]: 2.94001e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 4.68001e-06 [matmul_add_comm_reduction]: 8.17e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 9.89001e-06 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.65999e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 5.19e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.618e-05 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 1.417e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 8.508e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.12999e-06 [after_resolve]: 1.712e-05 [a_after_grad]: 1.459e-05 [renormalize]: 0.00058581 [add_forward_monad_depend]: 4.00998e-06 [auto_monad_grad]: 1.17999e-06 [auto_monad_eliminator]: 1.469e-05 [cse]: 4.632e-05 [a_3]: 6.544e-05 [Cycle 3]: 0.00090218, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 1.07e-05 [loop_unroll]: 9.02e-06 [a_1]: 0.0002466 [with_stream_mark]: 9.91998e-06 [recompute_prepare]: 9.15999e-06 [updatestate_depend_eliminate]: 4.77998e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 4e-06 
[parameter_eliminate]: 9.10019e-07 [a_2]: 0.00012392 [accelerated_algorithm]: 1.19e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 7.19001e-06 [auto_parallel]: 7.07002e-06 [parallel]: 4.99e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 4.92999e-06 [allreduce_fusion]: 4.79e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 1.005e-05 [virtual_dataset]: 8.85999e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 8.57e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.591e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05001e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 1.514e-05 [a_after_grad]: 1.478e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 1.071e-05 [cse]: 2.623e-05 [a_3]: 5.98e-05 [py_interpret_to_execute_after_opt_a]: 1.161e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 4.696e-05 [convert_after_rewriter]: 9.00999e-06 [order_py_execute_after_rewriter]: 6.63e-06 [mutable_eliminate]: 0.00048592 [opt_b]: 0.00028674, [1] [Cycle 1]: 0.00028016, [7] [b_1]: 0.00018901 [b_2]: 1.078e-05 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 3.93001e-06 [renormalize]: 4.30009e-07 [cse]: 3.087e-05 [optimize_parallel_all_gather_comm]: 2.035e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.008e-05 [loop_unroll]: 0.00051942 [opt_after_cconv]: 0.00013656, [1] [Cycle 1]: 0.00013045, [7] [c_1]: 4.878e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 7.35003e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 
3.97e-06 [cse]: 3.012e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 3.017e-05 [tuple_transform]: 0.00010008, [1] [Cycle 1]: 9.569e-05, [4] [d_1]: 6.547e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.97001e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 5.907e-05 [cse_after_recomputation]: 3.252e-05, [1] [Cycle 1]: 2.79e-05, [1] [cse]: 2.195e-05 [environ_conv]: 8.13999e-06 [swap_dp_allreduce_reducescatter]: 7.38e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.80001e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.25001e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.85998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.696e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 5.39e-06 [overlap_recompute_and_grad_model_parallel]: 5.44998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.71e-06 [overlap_grad_ring_attention]: 5.19998e-06 [overlap_grad_flash_sp]: 2.536e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.43e-06 [split_layernorm_comm]: 1.54e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 9.884e-05, [1] [Cycle 1]: 9.451e-05, [6] [build]: 9.35001e-06 [elim_shapecalc]: 1.392e-05 [elim_not_effective]: 1.797e-05 [opt_reshape]: 1.01e-05 [fold_const_symbol]: 1.51e-05 [renormalize]: 1.69995e-07 [detach_backward]: 1.82001e-06 
[pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.527e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.79002e-06 [opt_after_jit_grad]: 0.0004743 [validate]: 4.496e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00847186 [execute]: 7.12997e-06 Sums bootstrap : 0.000567s : 1.49% type_inference : 0.015131s : 39.92% event_method : 0.000048s : 0.13% auto_monad : 0.000122s : 0.32% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.10% optimize.rewriter_before_opt_a : 0.000148s : 0.39% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.34% optimize.opt_a.loop_unroll : 0.000114s : 0.30% optimize.opt_a.a_1 : 0.003330s : 8.79% optimize.opt_a.with_stream_mark : 0.000045s : 0.12% optimize.opt_a.recompute_prepare : 0.000043s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000497s : 1.31% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.14% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.09% optimize.opt_a.merge_send_recv : 0.000030s : 0.08% optimize.opt_a.auto_parallel : 0.000026s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.04% optimize.opt_a.merge_comm : 0.000020s : 0.05% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.10% optimize.opt_a.virtual_dataset : 0.000033s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000032s : 0.08% optimize.opt_a.merge_forward : 0.000019s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.05% optimize.opt_a.meta_fg_expand : 0.001851s : 4.88% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.25% optimize.opt_a.a_after_grad : 0.000112s : 0.29% optimize.opt_a.renormalize : 0.003150s : 8.31% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.21% optimize.opt_a.cse : 0.000235s : 0.62% optimize.opt_a.a_3 : 0.000461s : 1.22% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.12% optimize.convert_after_rewriter : 0.000009s : 0.02% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000486s : 1.28% optimize.opt_b.b_1 : 0.000189s : 0.50% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.05% optimize.loop_unroll : 0.000519s : 1.37% optimize.opt_after_cconv.c_1 : 0.000049s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.08% optimize.tuple_transform.d_1 : 0.000065s : 0.17% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.16% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000474s : 1.25% validate : 0.000045s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.008472s : 22.35% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000871 222 8.60% : 0.000075s : 12: substitution.arithmetic_simplify 1.58% : 0.000014s : 2: substitution.cast_eliminate 0.30% : 0.000003s : 5: substitution.elim_not_effective 0.43% : 0.000004s : 5: substitution.float_depend_g_call 0.47% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.87% : 0.000008s : 8: substitution.graph_param_transform 0.29% : 0.000002s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 50.28% : 0.000438s : 17: substitution.inline 1.98% : 0.000017s : 2: substitution.inline_without_move 1.21% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.72% : 0.000015s : 3: substitution.less_batch_normalization 1.47% : 0.000013s : 11: substitution.minmaximum_grad 7.72% : 0.000067s : 5: substitution.partial_eliminate 1.56% : 0.000014s : 20: substitution.remove_not_recompute_node 2.80% : 0.000024s : 10: substitution.replace_applicator 1.27% : 0.000011s : 15: substitution.replace_old_param 0.27% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.22% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.58% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.05% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.73% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.14% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.014991 2 89.87% : 0.013471s : 1: type_inference.infer 10.13% : 0.001519s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.60% : 0.000126s : 17: replace.inline 42.40% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000464 33 92.53% : 0.000429s : 17: match.inline 7.47% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000754 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.20% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000043s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.37% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.70% : 0.000013s : 92: predicate.partial_eliminate 1.11% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 68: predicate.reduce_eliminate 2.65% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.97% : 0.000015s : 152: predicate.replace_applicator 0.58% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001601 34 56.82% : 0.000910s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.18% : 0.000691s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068209 237 0.00% : 0.000003s : 1: ForceFp32Comm 4.82% : 0.003290s : 1: add_attr 4.81% : 0.003282s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000129s : 1: auto_monad 0.04% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000603s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000055s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.78% : 0.000530s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000495s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.33% : 0.005003s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000174s : 28: opt.transform.opt_b 0.11% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.00% : 0.011593s : 1: opt_a 0.21% : 0.000140s : 1: opt_after_cconv 0.71% : 0.000484s : 1: opt_after_jit_grad 0.43% : 0.000291s : 1: opt_b 20.51% : 0.013988s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000056s : 1: pre_auto_parallel 0.06% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.53% : 0.001727s : 2: renormalize.infer 2.07% : 0.001410s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000153s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000101s : 1: symbol_engine_optimizer 12.44% : 0.008483s : 1: task_emit 0.15% : 0.000103s : 1: 
tuple_transform 22.21% : 0.015150s : 1: type_inference 0.12% : 0.000079s : 1: validate TotalTime = 0.0206974, [24] [bootstrap]: 0.00060564 [type_inference]: 0.00547632 [event_method]: 1.124e-05 [auto_monad]: 5.465e-05 [graph_reusing]: 5.04e-06 [inline]: 2.46998e-06 [add_attr]: 0.00325647, [1] [add_attr_with_inline]: 0.00324784, [1] [Cycle 1]: 4.688e-05, [2] [tag_attr]: 1.263e-05 [meta_addattr_fg_expand]: 3.63999e-06 [parallel-infer-symbol]: 3.58e-06 [pre_auto_parallel]: 2.37e-05 [insert-virtual-dataset]: 2.21998e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00389682, [53] [py_interpret_to_execute]: 1.704e-05 [rewriter_before_opt_a]: 4.106e-05 [opt_a]: 0.00198282, [2] [Cycle 1]: 0.0013578, [45] [expand_dump_flag]: 2.44999e-06 [switch_simplify]: 2.385e-05 [loop_unroll]: 1.386e-05 [a_1]: 0.00030057 [with_stream_mark]: 1.401e-05 [recompute_prepare]: 7.20998e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.646e-05 [accelerated_algorithm]: 6.06998e-06 [shard]: 2.69001e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 7.80998e-06 [auto_parallel]: 5.85002e-06 [parallel]: 1.731e-05 [flash_sp]: 7.86001e-06 [merge_comm]: 3.50003e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 9.38997e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 6.82002e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.028e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.128e-05 [merge_recompute_call_nodes]: 1.56998e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.33002e-06 [receive_attached]: 
2.17999e-06 [after_resolve]: 1.077e-05 [a_after_grad]: 8.48001e-06 [renormalize]: 0.00043147 [add_forward_monad_depend]: 4.47e-06 [auto_monad_grad]: 2.37999e-06 [auto_monad_eliminator]: 1.425e-05 [cse]: 2.881e-05 [a_3]: 3.984e-05 [Cycle 2]: 0.00061499, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.75002e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012322 [with_stream_mark]: 9.56998e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.684e-05 [accelerated_algorithm]: 5.33002e-06 [shard]: 1.33002e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.57999e-06 [parallel]: 4.22e-06 [flash_sp]: 2.96999e-06 [merge_comm]: 2.87002e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 7.00002e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.79999e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.21e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 7.49977e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 7.82e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 2.412e-05 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.64999e-06 [cse]: 1.451e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.78001e-06 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.148e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00053709 [opt_b]: 0.00018232, [1] [Cycle 1]: 
0.00017594, [7] [b_1]: 0.00010786 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.4002e-07 [cse]: 1.686e-05 [optimize_parallel_all_gather_comm]: 1.55e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.36e-05 [loop_unroll]: 0.00041811 [opt_after_cconv]: 9.472e-05, [1] [Cycle 1]: 8.879e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.617e-05 [renormalize]: 3.49974e-07 [remove_dup_value]: 1.211e-05 [tuple_transform]: 6.888e-05, [1] [Cycle 1]: 6.461e-05, [4] [d_1]: 3.866e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.49e-06 [add_recomputation]: 4.503e-05 [cse_after_recomputation]: 2.012e-05, [1] [Cycle 1]: 1.583e-05, [1] [cse]: 1.066e-05 [environ_conv]: 4.76002e-06 [swap_dp_allreduce_reducescatter]: 5.67999e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.30001e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.63002e-06 [control_data_broadcast_order]: 1.184e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.90001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 3.75e-06 [overlap_grad_flash_sp]: 1.883e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 6.934e-05, [1] [Cycle 1]: 6.509e-05, [6] [build]: 2.76999e-06 [elim_shapecalc]: 8.22e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.539e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.81999e-06 [opt_after_jit_grad]: 0.00045486 [validate]: 3.372e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0066302 [execute]: 7.93001e-06 Sums bootstrap : 0.000606s : 3.68% type_inference : 0.005476s : 33.26% event_method : 0.000011s : 0.07% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000024s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.10% optimize.rewriter_before_opt_a : 0.000041s : 0.25% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.19% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000424s : 2.57% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000432s : 2.62% optimize.opt_a.add_forward_monad_depend : 0.000029s : 0.17% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000043s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000537s : 3.26% optimize.opt_b.b_1 : 0.000108s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000418s : 2.54% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000001s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000455s : 2.76% validate : 0.000034s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006630s : 40.26% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000126 26 17.04% : 0.000021s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.05% : 0.000005s : 4: substitution.graph_param_transform 67.52% : 0.000085s : 2: substitution.inline 2.16% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.50% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.005426 2 92.76% : 0.005033s : 1: type_inference.infer 7.24% : 0.000393s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000084 2 100.00% : 0.000084s : 2: match.inline ------[predicate.] 
0.000137 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.46% : 0.000001s : 4: predicate.elim_not_effective 0.53% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.36% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.01% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.31% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.93% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.50% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.63% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.26% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000273 6 38.68% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.32% : 0.000167s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029207 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.16% : 0.003261s : 1: add_attr 11.13% : 0.003252s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.18% : 0.000636s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.46% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.87% : 0.000547s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.65% : 0.000773s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.80% : 0.001986s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.59% : 0.000465s : 1: opt_after_jit_grad 0.64% : 0.000186s : 1: opt_b 13.36% : 0.003901s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000021s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.85% : 0.000248s : 1: renormalize.infer 0.61% : 0.000177s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.15% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000072s : 1: symbol_engine_optimizer 22.74% : 0.006641s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 18.82% : 0.005497s : 1: type_inference 0.21% : 0.000062s : 1: validate TotalTime = 0.0376272, [24] [bootstrap]: 0.00054007 [type_inference]: 0.0105433 [event_method]: 4.191e-05 [auto_monad]: 0.00011308 [graph_reusing]: 7.76001e-06 [inline]: 2.02001e-06 [add_attr]: 0.00307328, [1] [add_attr_with_inline]: 0.00306401, [1] [Cycle 1]: 0.00012024, [2] [tag_attr]: 7.691e-05 [meta_addattr_fg_expand]: 9.10001e-06 [parallel-infer-symbol]: 3.04001e-06 [pre_auto_parallel]: 4.746e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0137819, [53] [py_interpret_to_execute]: 3.501e-05 [rewriter_before_opt_a]: 0.0001292 [opt_a]: 0.0111543, [3] [Cycle 1]: 0.00725014, [45] [expand_dump_flag]: 3.71999e-06 [switch_simplify]: 6.631e-05 [loop_unroll]: 5.517e-05 [a_1]: 0.00138198 [with_stream_mark]: 2.291e-05 [recompute_prepare]: 2.158e-05 [updatestate_depend_eliminate]: 9.14e-06 [updatestate_assign_eliminate]: 7.77e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 2.57001e-06 [a_2]: 0.00024673 [accelerated_algorithm]: 3.136e-05 [shard]: 2.11003e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.659e-05 [merge_send_recv]: 1.613e-05 [auto_parallel]: 1.106e-05 [parallel]: 1.974e-05 [flash_sp]: 1.16e-05 [merge_comm]: 9.95002e-06 [allreduce_fusion]: 9.09998e-06 [matmul_add_comm_reduction]: 2.791e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.803e-05 [virtual_dataset]: 1.604e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.504e-05 [merge_forward]: 9.29998e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 1.809e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.925e-05 [merge_recompute_call_nodes]: 1.66002e-06 [before_grad]: 2.729e-05 [set_forward_comm_id_for_comm_node_pass]: 9.56998e-06 [meta_fg_expand]: 0.00144411 [flash_sp_send_recv_attached]: 4.2e-06 [receive_attached]: 2.92002e-06 
[after_resolve]: 6.052e-05 [a_after_grad]: 8.106e-05 [renormalize]: 0.00265879 [add_forward_monad_depend]: 9.25001e-06 [auto_monad_grad]: 6.11e-06 [auto_monad_eliminator]: 5.53e-05 [cse]: 0.00016552 [a_3]: 0.00033588 [Cycle 2]: 0.00299636, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.713e-05 [loop_unroll]: 4.371e-05 [a_1]: 0.00155985 [with_stream_mark]: 1.242e-05 [recompute_prepare]: 1.149e-05 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.72002e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 0.00012632 [accelerated_algorithm]: 1.225e-05 [shard]: 1.43002e-06 [meta_shard_fg_expand]: 2.36e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 7.50998e-06 [auto_parallel]: 7.28e-06 [parallel]: 5.52001e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.74e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 9.78002e-06 [virtual_dataset]: 8.71002e-06 [get_grad_eliminate_]: 8.82e-06 [virtual_output]: 8.40999e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 8.09989e-07 [offload_activation]: 1.063e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.595e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99998e-06 [meta_fg_expand]: 3.742e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.496e-05 [a_after_grad]: 1.402e-05 [renormalize]: 0.00058685 [add_forward_monad_depend]: 3.78999e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.517e-05 [cse]: 4.652e-05 [a_3]: 6.571e-05 [Cycle 3]: 0.00089295, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 1.033e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.00024853 [with_stream_mark]: 9.66998e-06 [recompute_prepare]: 9.52001e-06 [updatestate_depend_eliminate]: 4.94003e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 3.85e-06 
[parameter_eliminate]: 8.79983e-07 [a_2]: 0.00012224 [accelerated_algorithm]: 1.18e-05 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 7.44002e-06 [parallel]: 4.77e-06 [flash_sp]: 1.14e-06 [merge_comm]: 5.05001e-06 [allreduce_fusion]: 5.04998e-06 [matmul_add_comm_reduction]: 7.64002e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.67e-06 [virtual_output]: 8.32998e-06 [merge_forward]: 4.34002e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.573e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.371e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 2.89001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.316e-05 [a_after_grad]: 1.446e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 1.063e-05 [cse]: 2.583e-05 [a_3]: 5.694e-05 [py_interpret_to_execute_after_opt_a]: 1.257e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 4.944e-05 [convert_after_rewriter]: 9.22001e-06 [order_py_execute_after_rewriter]: 7.05e-06 [mutable_eliminate]: 0.00048852 [opt_b]: 0.00028626, [1] [Cycle 1]: 0.00028001, [7] [b_1]: 0.0001881 [b_2]: 1.069e-05 [updatestate_depend_eliminate]: 7.27997e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 4.2e-06 [renormalize]: 4.50003e-07 [cse]: 3.114e-05 [optimize_parallel_all_gather_comm]: 1.943e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 2.01e-05 [loop_unroll]: 0.00077221 [opt_after_cconv]: 0.0001364, [1] [Cycle 1]: 0.00013008, [7] [c_1]: 4.758e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 7.26001e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 
3.93001e-06 [cse]: 3.058e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 2.906e-05 [tuple_transform]: 0.00010194, [1] [Cycle 1]: 9.735e-05, [4] [d_1]: 6.731e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.74e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 5.912e-05 [cse_after_recomputation]: 3.32e-05, [1] [Cycle 1]: 2.855e-05, [1] [cse]: 2.304e-05 [environ_conv]: 8.65001e-06 [swap_dp_allreduce_reducescatter]: 7.80998e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.80999e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.715e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 5.02999e-06 [overlap_recompute_and_grad_model_parallel]: 5.59e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.53e-06 [overlap_grad_ring_attention]: 5.20001e-06 [overlap_grad_flash_sp]: 2.438e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 9.74e-05, [1] [Cycle 1]: 9.316e-05, [6] [build]: 9.15999e-06 [elim_shapecalc]: 1.331e-05 [elim_not_effective]: 1.772e-05 [opt_reshape]: 1.014e-05 [fold_const_symbol]: 1.452e-05 [renormalize]: 2.29978e-07 [detach_backward]: 1.99e-06 
[pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.578e-05 [get_jit_bprop_graph]: 1.47999e-06 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.00047376 [validate]: 4.363e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.00868965 [execute]: 7.11999e-06 Sums bootstrap : 0.000540s : 1.62% type_inference : 0.010543s : 31.63% event_method : 0.000042s : 0.13% auto_monad : 0.000113s : 0.34% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000077s : 0.23% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000129s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.32% optimize.opt_a.a_1 : 0.003190s : 9.57% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000030s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000038s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001484s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003246s : 9.74% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000238s : 0.71% optimize.opt_a.a_3 : 0.000459s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000489s : 1.47% optimize.opt_b.b_1 : 0.000188s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000772s : 2.32% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.18% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000474s : 1.42% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008690s : 26.07% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000774 218 5.71% : 0.000044s : 11: substitution.arithmetic_simplify 1.80% : 0.000014s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.69% : 0.000439s : 16: substitution.inline 2.02% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.79% : 0.000014s : 20: substitution.remove_not_recompute_node 3.04% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.72% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.16% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010462 2 87.27% : 0.009131s : 1: type_inference.infer 12.73% : 0.001331s : 1: type_inference.specialize ------[replace.] 0.000205 30 59.29% : 0.000121s : 16: replace.inline 40.71% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000462 30 93.29% : 0.000431s : 16: match.inline 6.71% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.22% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.18% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.62% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 149: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.67% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.12% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.30% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001536 32 57.51% : 0.000884s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.49% : 0.000653s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062891 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.89% : 0.003078s : 1: add_attr 4.88% : 0.003068s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000120s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.92% : 0.000576s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 1.24% : 0.000782s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000497s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.70% : 0.004840s : 117: opt.transform.opt_a 0.07% : 0.000046s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.74% : 0.011157s : 1: opt_a 0.22% : 0.000140s : 1: opt_after_cconv 0.77% : 0.000484s : 1: opt_after_jit_grad 0.46% : 0.000290s : 1: opt_b 21.92% : 0.013787s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000052s : 1: pre_auto_parallel 0.06% : 0.000039s : 1: py_interpret_to_execute 0.03% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.90% : 0.001824s : 2: renormalize.infer 2.24% : 0.001408s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.21% : 0.000134s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.83% : 0.008701s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 16.79% : 0.010560s : 1: type_inference 0.13% : 0.000079s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x4-kbk],max_mem:6.0M . TotalTime = 2.41926, [24] [bootstrap]: 0.00056051 [type_inference]: 0.00659647 [event_method]: 1.355e-05 [auto_monad]: 5.321e-05 [graph_reusing]: 5.34e-06 [inline]: 1.84998e-06 [add_attr]: 0.00362477, [1] [add_attr_with_inline]: 0.00361373, [1] [Cycle 1]: 4.447e-05, [2] [tag_attr]: 1.534e-05 [meta_addattr_fg_expand]: 3.76999e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.841e-05 [insert-virtual-dataset]: 2.17001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.0040066, [53] [py_interpret_to_execute]: 2.003e-05 [rewriter_before_opt_a]: 5.931e-05 [opt_a]: 0.00212288, [2] [Cycle 1]: 0.0015291, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 3.24e-05 [loop_unroll]: 2.067e-05 [a_1]: 0.00045641 [with_stream_mark]: 1.264e-05 [recompute_prepare]: 7.74002e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.63e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.699e-05 [accelerated_algorithm]: 6.74001e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.57002e-06 [merge_send_recv]: 7.91001e-06 [auto_parallel]: 5.94999e-06 [parallel]: 2.692e-05 [flash_sp]: 7.03e-06 [merge_comm]: 3.32002e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.45001e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.63999e-06 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 5.38002e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 9.5999e-07 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 
[merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 8.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.006e-05 [a_after_grad]: 9.10999e-06 [renormalize]: 0.0004249 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.597e-05 [a_3]: 4.082e-05 [Cycle 2]: 0.00058441, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00012531 [with_stream_mark]: 9.13002e-06 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.60002e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.751e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.51e-06 [parallel]: 4.17e-06 [flash_sp]: 3.46001e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.21998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.08002e-06 [virtual_dataset]: 5.18002e-06 [get_grad_eliminate_]: 4.82e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.53002e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 8.25e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.32e-06 [after_resolve]: 9.57999e-06 [a_after_grad]: 8.08999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 5.96e-06 [cse]: 1.219e-05 [a_3]: 3.102e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 
1.71998e-06 [rewriter_after_opt_a]: 3.069e-05 [convert_after_rewriter]: 6.48998e-06 [order_py_execute_after_rewriter]: 4.78001e-06 [mutable_eliminate]: 0.00048729 [opt_b]: 0.00018621, [1] [Cycle 1]: 0.00018016, [7] [b_1]: 0.00011275 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.19997e-07 [cse]: 1.577e-05 [optimize_parallel_all_gather_comm]: 1.539e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.388e-05 [loop_unroll]: 0.00041426 [opt_after_cconv]: 9.406e-05, [1] [Cycle 1]: 8.814e-05, [7] [c_1]: 2.75e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.539e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.877e-05, [1] [Cycle 1]: 6.436e-05, [4] [d_1]: 3.897e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.23002e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.222e-05 [cse_after_recomputation]: 1.969e-05, [1] [Cycle 1]: 1.54e-05, [1] [cse]: 1.026e-05 [environ_conv]: 4.13999e-06 [swap_dp_allreduce_reducescatter]: 4.83001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.57998e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.10002e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.5999e-07 [remove_cast_before_assign_add]: 7.59988e-07 [full_micro_interleaved_order_control]: 2.44999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 
1.208e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.69998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.718e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.954e-05, [1] [Cycle 1]: 6.521e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.73001e-06 [elim_not_effective]: 1.153e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 2.59985e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.564e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.47997e-06 [opt_after_jit_grad]: 0.00045059 [validate]: 3.211e-05 [backend_pass]: 1.04998e-06 [task_emit]: 2.40363 [execute]: 1.009e-05 Sums bootstrap : 0.000561s : 0.02% type_inference : 0.006596s : 0.27% event_method : 0.000014s : 0.00% auto_monad : 0.000053s : 0.00% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000582s : 0.02% optimize.opt_a.with_stream_mark : 0.000022s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000425s : 0.02% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000038s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000487s : 0.02% optimize.opt_b.b_1 : 0.000113s : 0.00% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000414s : 0.02% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.00% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000004s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.02% validate : 0.000032s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 2.403628s : 99.54% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000167 30 14.68% : 0.000025s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000002s : 2: substitution.fold_const_symbol 3.16% : 0.000005s : 4: substitution.graph_param_transform 67.26% : 0.000113s : 3: substitution.inline 1.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.32% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006550 2 91.50% : 0.005994s : 1: type_inference.infer 8.50% : 0.000556s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.41% : 0.000027s : 3: replace.inline 29.59% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 92.07% : 0.000110s : 3: match.inline 7.93% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.36% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.89% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.76% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.89% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 48.15% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.85% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
2.428415 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.15% : 0.003629s : 1: add_attr 0.15% : 0.003617s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000059s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.02% : 0.000601s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.02% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.02% : 0.000497s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.04% : 0.000949s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.09% : 0.002126s : 1: opt_a 0.00% : 0.000097s : 1: opt_after_cconv 0.02% : 0.000460s : 1: opt_after_jit_grad 0.01% : 0.000190s : 1: opt_b 0.17% : 0.004010s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.01% : 0.000220s : 1: renormalize.infer 0.01% : 0.000198s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.00% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000072s : 1: symbol_engine_optimizer 98.98% : 2.403652s : 1: task_emit 0.00% : 0.000072s : 1: 
tuple_transform 0.27% : 0.006610s : 1: type_inference 0.00% : 0.000054s : 1: validate TotalTime = 0.443029, [24] [bootstrap]: 0.00051515 [type_inference]: 0.00454609 [event_method]: 1.029e-05 [auto_monad]: 5.122e-05 [graph_reusing]: 5.14998e-06 [inline]: 2.06998e-06 [add_attr]: 0.00300675, [1] [add_attr_with_inline]: 0.00299899, [1] [Cycle 1]: 4.416e-05, [2] [tag_attr]: 1.169e-05 [meta_addattr_fg_expand]: 3.13e-06 [parallel-infer-symbol]: 2.74001e-06 [pre_auto_parallel]: 2.086e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00372873, [53] [py_interpret_to_execute]: 1.468e-05 [rewriter_before_opt_a]: 3.699e-05 [opt_a]: 0.00184503, [2] [Cycle 1]: 0.00125268, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 2.493e-05 [loop_unroll]: 1.353e-05 [a_1]: 0.00029375 [with_stream_mark]: 1.47e-05 [recompute_prepare]: 7.21999e-06 [updatestate_depend_eliminate]: 3.50003e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.51e-05 [accelerated_algorithm]: 7.00998e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.88999e-06 [auto_parallel]: 5.77999e-06 [parallel]: 1.643e-05 [flash_sp]: 7.1e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.90001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.70001e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.109e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 8.92999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.01998e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.55997e-06 
[after_resolve]: 1.05e-05 [a_after_grad]: 8.48001e-06 [renormalize]: 0.00034571 [add_forward_monad_depend]: 4.17e-06 [auto_monad_grad]: 1.86003e-06 [auto_monad_eliminator]: 1.314e-05 [cse]: 2.704e-05 [a_3]: 3.929e-05 [Cycle 2]: 0.00058303, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.46999e-06 [loop_unroll]: 5.24e-06 [a_1]: 0.00012369 [with_stream_mark]: 9.93998e-06 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.33002e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.63e-05 [accelerated_algorithm]: 5.40001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.04998e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.16001e-06 [flash_sp]: 3.59002e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.10999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.69e-06 [virtual_dataset]: 5.01997e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.83001e-06 [merge_forward]: 2.35002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 5.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.38001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.10998e-06 [meta_fg_expand]: 1.55001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.21998e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.60019e-07 [auto_monad_grad]: 7.50006e-07 [auto_monad_eliminator]: 5.79999e-06 [cse]: 1.238e-05 [a_3]: 3.151e-05 [py_interpret_to_execute_after_opt_a]: 7.39002e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 3.033e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 5.52999e-06 [mutable_eliminate]: 0.00044722 [opt_b]: 0.00017958, [1] [Cycle 1]: 
0.00017368, [7] [b_1]: 0.00010717 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.14e-06 [renormalize]: 2.69996e-07 [cse]: 1.61e-05 [optimize_parallel_all_gather_comm]: 1.58e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.253e-05 [loop_unroll]: 0.00050369 [opt_after_cconv]: 9.487e-05, [1] [Cycle 1]: 8.918e-05, [7] [c_1]: 2.818e-05 [parameter_eliminate]: 2.04999e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.664e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.216e-05 [tuple_transform]: 6.792e-05, [1] [Cycle 1]: 6.378e-05, [4] [d_1]: 3.853e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.07001e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.128e-05 [cse_after_recomputation]: 2.049e-05, [1] [Cycle 1]: 1.611e-05, [1] [cse]: 1.115e-05 [environ_conv]: 4.90001e-06 [swap_dp_allreduce_reducescatter]: 4.92999e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.40999e-06 [label_fine_grained_interleaved_index]: 2.43002e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.13998e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.44998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.178e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.40001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.05002e-06 [overlap_grad_ring_attention]: 3.89002e-06 [overlap_grad_flash_sp]: 1.706e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 6.802e-05, [1] [Cycle 1]: 6.418e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.36002e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 8.60001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.51e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00045337 [validate]: 3.063e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.430414 [execute]: 8.50001e-06 Sums bootstrap : 0.000515s : 0.12% type_inference : 0.004546s : 1.04% event_method : 0.000010s : 0.00% auto_monad : 0.000051s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.00% optimize.rewriter_before_opt_a : 0.000037s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.01% optimize.opt_a.loop_unroll : 0.000019s : 0.00% optimize.opt_a.a_1 : 0.000417s : 0.10% optimize.opt_a.with_stream_mark : 0.000025s : 0.01% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000021s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000346s : 0.08% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000039s : 0.01% optimize.opt_a.a_3 : 0.000071s : 0.02% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.10% optimize.opt_b.b_1 : 0.000107s : 0.02% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.01% optimize.loop_unroll : 0.000504s : 0.11% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000041s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.10% validate : 0.000031s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.430414s : 98.03% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000122 26 18.70% : 0.000023s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000005s : 4: substitution.graph_param_transform 65.77% : 0.000080s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.38% : 0.000004s : 4: substitution.remove_not_recompute_node 3.10% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004506 2 91.94% : 0.004143s : 1: type_inference.infer 8.06% : 0.000363s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000003s : 26: predicate.load_eliminater 1.38% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.17% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.50% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.97% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.63% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.43% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.13% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000265 6 43.11% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.89% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.451027 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.67% : 0.003011s : 1: add_attr 0.67% : 0.003002s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000045s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000056s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.12% : 0.000549s : 1: bootstrap 0.01% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000015s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.11% : 0.000513s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.10% : 0.000456s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.17% : 0.000764s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000089s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.41% : 0.001848s : 1: opt_a 0.02% : 0.000098s : 1: opt_after_cconv 0.10% : 0.000463s : 1: opt_after_jit_grad 0.04% : 0.000183s : 1: opt_b 0.83% : 0.003732s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000025s : 1: pre_auto_parallel 0.00% : 0.000018s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.04% : 0.000185s : 1: renormalize.infer 0.03% : 0.000155s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000034s : 1: rewriter_after_opt_a 0.01% : 0.000041s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000071s : 1: symbol_engine_optimizer 95.43% : 0.430436s : 1: task_emit 0.02% : 0.000071s : 1: 
tuple_transform 1.01% : 0.004560s : 1: type_inference 0.01% : 0.000051s : 1: validate TotalTime = 0.118933, [24] [bootstrap]: 0.00048275 [type_inference]: 0.00568038 [event_method]: 1.48e-05 [auto_monad]: 5.371e-05 [graph_reusing]: 5.64e-06 [inline]: 1.94e-06 [add_attr]: 0.00302922, [1] [add_attr_with_inline]: 0.00302113, [1] [Cycle 1]: 4.456e-05, [2] [tag_attr]: 1.509e-05 [meta_addattr_fg_expand]: 4.08001e-06 [parallel-infer-symbol]: 2.67001e-06 [pre_auto_parallel]: 2.502e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.62001e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00395955, [53] [py_interpret_to_execute]: 2.089e-05 [rewriter_before_opt_a]: 5.811e-05 [opt_a]: 0.00212582, [2] [Cycle 1]: 0.00151917, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 3.063e-05 [loop_unroll]: 2.083e-05 [a_1]: 0.00044604 [with_stream_mark]: 1.297e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.11001e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.499e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 2.21998e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.66998e-06 [merge_send_recv]: 7.63001e-06 [auto_parallel]: 5.51e-06 [parallel]: 1.753e-05 [flash_sp]: 6.89001e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 8.59998e-06 [allreduce_slice_to_reducescatter]: 6.20028e-07 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 5.90002e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.94002e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 9.34998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.51998e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 
2.17001e-06 [after_resolve]: 1.08e-05 [a_after_grad]: 9.19e-06 [renormalize]: 0.00044012 [add_forward_monad_depend]: 4.74e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.625e-05 [a_3]: 4.064e-05 [Cycle 2]: 0.00059707, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.76e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012532 [with_stream_mark]: 9.96998e-06 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 7.49977e-07 [a_2]: 6.754e-05 [accelerated_algorithm]: 5.41002e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.05001e-06 [parallel]: 3.78001e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.48002e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 7.07002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41998e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.89998e-06 [a_after_grad]: 8.02e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.09998e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.38998e-06 [cse]: 1.351e-05 [a_3]: 3.326e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.07e-05 [convert_after_rewriter]: 7.35e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00045272 [opt_b]: 0.00018083, [1] [Cycle 1]: 0.00017477, [7] [b_1]: 
0.00010796 [b_2]: 7.01999e-06 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 4.30009e-07 [cse]: 1.587e-05 [optimize_parallel_all_gather_comm]: 1.574e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.144e-05 [loop_unroll]: 0.00041571 [opt_after_cconv]: 9.456e-05, [1] [Cycle 1]: 8.877e-05, [7] [c_1]: 2.74e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.575e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.285e-05 [tuple_transform]: 6.794e-05, [1] [Cycle 1]: 6.375e-05, [4] [d_1]: 3.838e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.53002e-06 [add_recomputation]: 4.417e-05 [cse_after_recomputation]: 2.065e-05, [1] [Cycle 1]: 1.65e-05, [1] [cse]: 1.13e-05 [environ_conv]: 4.99e-06 [swap_dp_allreduce_reducescatter]: 5.29998e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.75002e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 9.60019e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.77002e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.19003e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.61e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.86003e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.79e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.12e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.92999e-06 [auto_monad_reorder]: 1.514e-05 [get_jit_bprop_graph]: 9.09989e-07 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00045045 [validate]: 3.067e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.104948 [execute]: 9.22001e-06 Sums bootstrap : 0.000483s : 0.42% type_inference : 0.005680s : 4.94% event_method : 0.000015s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000037s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000571s : 0.50% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000440s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000074s : 0.06% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000453s : 0.39% optimize.opt_b.b_1 : 0.000108s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000416s : 0.36% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.39% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.104948s : 91.31% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000162 30 14.89% : 0.000024s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.10% : 0.000005s : 4: substitution.graph_param_transform 66.98% : 0.000108s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.29% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005637 2 89.94% : 0.005070s : 1: type_inference.infer 10.06% : 0.000567s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.54% : 0.000027s : 3: replace.inline 29.46% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 92.07% : 0.000106s : 3: match.inline 7.93% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 46.31% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.69% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.127448 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.38% : 0.003034s : 1: add_attr 2.37% : 0.003025s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000518s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000462s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.73% : 0.000935s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.67% : 0.002129s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.36% : 0.000460s : 1: opt_after_jit_grad 0.14% : 0.000184s : 1: opt_b 3.11% : 0.003963s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000215s : 1: renormalize.infer 0.17% : 0.000218s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.36% : 0.104970s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.47% : 0.005694s : 1: type_inference 0.04% : 0.000052s : 1: validate . TotalTime = 13.4268, [24] [bootstrap]: 0.00051503 [type_inference]: 0.011558 [event_method]: 4.786e-05 [auto_monad]: 0.00012085 [graph_reusing]: 7.65e-06 [inline]: 1.81e-06 [add_attr]: 0.0031486, [1] [add_attr_with_inline]: 0.00314037, [1] [Cycle 1]: 7.157e-05, [2] [tag_attr]: 3.497e-05 [meta_addattr_fg_expand]: 9.32999e-06 [parallel-infer-symbol]: 3.51001e-06 [pre_auto_parallel]: 5.002e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0137178, [53] [py_interpret_to_execute]: 3.783e-05 [rewriter_before_opt_a]: 0.00014739 [opt_a]: 0.0113886, [3] [Cycle 1]: 0.00736662, [45] [expand_dump_flag]: 3.56999e-06 [switch_simplify]: 7.338e-05 [loop_unroll]: 6.216e-05 [a_1]: 0.00150166 [with_stream_mark]: 2.28e-05 [recompute_prepare]: 2.204e-05 [updatestate_depend_eliminate]: 8.92e-06 [updatestate_assign_eliminate]: 7.75998e-06 [updatestate_loads_eliminate]: 7.1e-06 [parameter_eliminate]: 2.64001e-06 [a_2]: 0.00024938 [accelerated_algorithm]: 3.047e-05 [shard]: 1.81998e-06 [meta_shard_fg_expand]: 3.3e-06 [shard_inline]: 1.635e-05 [merge_send_recv]: 1.63e-05 [auto_parallel]: 1.137e-05 [parallel]: 1.892e-05 [flash_sp]: 1.118e-05 [merge_comm]: 9.92999e-06 [allreduce_fusion]: 8.89003e-06 [matmul_add_comm_reduction]: 2.747e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 1.805e-05 [virtual_dataset]: 1.613e-05 [get_grad_eliminate_]: 1.568e-05 [virtual_output]: 1.568e-05 [merge_forward]: 9.23002e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.837e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.903e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.832e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76998e-06 [meta_fg_expand]: 0.001474 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 2.78e-06 [after_resolve]: 
6.178e-05 [a_after_grad]: 8.312e-05 [renormalize]: 0.00259832 [add_forward_monad_depend]: 9.23002e-06 [auto_monad_grad]: 5.81e-06 [auto_monad_eliminator]: 5.649e-05 [cse]: 0.00016955 [a_3]: 0.00033875 [Cycle 2]: 0.003103, [45] [expand_dump_flag]: 1.71e-06 [switch_simplify]: 4.78e-05 [loop_unroll]: 4.4e-05 [a_1]: 0.00154642 [with_stream_mark]: 1.259e-05 [recompute_prepare]: 1.119e-05 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 4.60001e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 1.22e-06 [a_2]: 0.00012672 [accelerated_algorithm]: 1.224e-05 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 2.31e-06 [shard_inline]: 1.063e-05 [merge_send_recv]: 7.25998e-06 [auto_parallel]: 7.95e-06 [parallel]: 5.49e-06 [flash_sp]: 3.42002e-06 [merge_comm]: 5.24998e-06 [allreduce_fusion]: 5.34e-06 [matmul_add_comm_reduction]: 7.96001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 9.05999e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.34997e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 9.51998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.812e-05 [merge_recompute_call_nodes]: 8.59989e-07 [before_grad]: 1.47e-05 [set_forward_comm_id_for_comm_node_pass]: 5.53002e-06 [meta_fg_expand]: 7.128e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.32999e-06 [after_resolve]: 1.601e-05 [a_after_grad]: 1.436e-05 [renormalize]: 0.00060986 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.475e-05 [cse]: 4.881e-05 [a_3]: 6.538e-05 [Cycle 3]: 0.00090336, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 1.064e-05 [loop_unroll]: 9.02e-06 [a_1]: 0.00025186 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.81002e-06 [updatestate_assign_eliminate]: 3.78999e-06 [updatestate_loads_eliminate]: 4.05998e-06 [parameter_eliminate]: 
9.00007e-07 [a_2]: 0.00012317 [accelerated_algorithm]: 1.17e-05 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.76998e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 6.98998e-06 [auto_parallel]: 7.29001e-06 [parallel]: 5.07999e-06 [flash_sp]: 1.04003e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 7.65e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.77999e-06 [get_grad_eliminate_]: 8.45999e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.43001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.579e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.00999e-06 [meta_fg_expand]: 3.14999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.466e-05 [a_after_grad]: 1.498e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 1.067e-05 [cse]: 2.645e-05 [a_3]: 5.918e-05 [py_interpret_to_execute_after_opt_a]: 1.091e-05 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 4.867e-05 [convert_after_rewriter]: 9.60001e-06 [order_py_execute_after_rewriter]: 6.99001e-06 [mutable_eliminate]: 0.0004814 [opt_b]: 0.00029001, [1] [Cycle 1]: 0.00028381, [7] [b_1]: 0.00019129 [b_2]: 1.073e-05 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 6.89994e-07 [cse]: 3.154e-05 [optimize_parallel_all_gather_comm]: 2.056e-05 [overlap_param_gather]: 2.11e-06 [cconv]: 2.023e-05 [loop_unroll]: 0.00043224 [opt_after_cconv]: 0.00013607, [1] [Cycle 1]: 0.00013001, [7] [c_1]: 4.837e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 7.26999e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 
3.98001e-06 [cse]: 3.007e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 2.89e-05 [tuple_transform]: 0.00010316, [1] [Cycle 1]: 9.855e-05, [4] [d_1]: 6.798e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 9.81998e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5.714e-05 [cse_after_recomputation]: 3.256e-05, [1] [Cycle 1]: 2.779e-05, [1] [cse]: 2.216e-05 [environ_conv]: 8.94003e-06 [swap_dp_allreduce_reducescatter]: 8.27e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.41002e-06 [slice_recompute_activation]: 2.66e-06 [micro_interleaved_order_control]: 2.11003e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.967e-05 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.8e-05 [grouped_pairwise_exchange_alltoall]: 1.94999e-06 [offloading_packed_experts]: 5.08002e-06 [overlap_recompute_and_grad_model_parallel]: 5.77001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.59e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 5.42001e-06 [overlap_grad_flash_sp]: 2.494e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 0.00010202, [1] [Cycle 1]: 9.775e-05, [6] [build]: 1.048e-05 [elim_shapecalc]: 1.401e-05 [elim_not_effective]: 1.938e-05 [opt_reshape]: 1.039e-05 [fold_const_symbol]: 1.494e-05 [renormalize]: 2.50002e-07 [detach_backward]: 
1.90001e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 2.474e-05 [get_jit_bprop_graph]: 1.13001e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00047979 [validate]: 4.765e-05 [backend_pass]: 1.12e-06 [task_emit]: 13.3968 [execute]: 9.22001e-06 Sums bootstrap : 0.000515s : 0.00% type_inference : 0.011558s : 0.09% event_method : 0.000048s : 0.00% auto_monad : 0.000121s : 0.00% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000050s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000147s : 0.00% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.00% optimize.opt_a.loop_unroll : 0.000115s : 0.00% optimize.opt_a.a_1 : 0.003300s : 0.02% optimize.opt_a.with_stream_mark : 0.000045s : 0.00% optimize.opt_a.recompute_prepare : 0.000043s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000036s : 0.00% optimize.opt_a.merge_send_recv : 0.000031s : 0.00% optimize.opt_a.auto_parallel : 0.000027s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001548s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.00% optimize.opt_a.a_after_grad : 0.000112s : 0.00% optimize.opt_a.renormalize : 0.003208s : 0.02% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.00% optimize.opt_a.cse : 0.000245s : 0.00% optimize.opt_a.a_3 : 0.000463s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000481s : 0.00% optimize.opt_b.b_1 : 0.000191s : 0.00% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000432s : 0.00% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.00% optimize.tuple_transform.d_1 : 0.000068s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.00% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000020s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000480s : 0.00% validate : 0.000048s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 13.396798s : 99.81% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000780 222 6.02% : 0.000047s : 12: substitution.arithmetic_simplify 1.75% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.58% : 0.000434s : 17: substitution.inline 2.24% : 0.000017s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.85% : 0.000014s : 3: substitution.less_batch_normalization 1.77% : 0.000014s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.37% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.82% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.70% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011483 2 86.67% : 0.009952s : 1: type_inference.infer 13.33% : 0.001531s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.78% : 0.000127s : 17: replace.inline 42.22% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000459 33 92.57% : 0.000425s : 17: match.inline 7.43% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000759 5764 1.11% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.03% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000043s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.06% : 0.000016s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.64% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.66% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.96% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.67% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001623 34 56.54% : 0.000918s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.46% : 0.000705s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
13.452151 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.02% : 0.003153s : 1: add_attr 0.02% : 0.003144s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000128s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.00% : 0.000549s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000055s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.00% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.00% : 0.000491s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.04% : 0.004983s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000176s : 28: opt.transform.opt_b 0.00% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000055s : 4: opt.transform.symbol_engine_opt 0.08% : 0.011392s : 1: opt_a 0.00% : 0.000139s : 1: opt_after_cconv 0.00% : 0.000490s : 1: opt_after_jit_grad 0.00% : 0.000293s : 1: opt_b 0.10% : 0.013722s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000023s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000055s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000033s : 1: remove_dup_value 0.01% : 0.001745s : 2: renormalize.infer 0.01% : 0.001449s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000053s : 1: rewriter_after_opt_a 0.00% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000105s : 1: symbol_engine_optimizer 99.59% : 13.396820s : 1: task_emit 0.00% : 0.000106s : 1: 
tuple_transform 0.09% : 0.011572s : 1: type_inference 0.00% : 0.000073s : 1: validate TotalTime = 0.469426, [24] [bootstrap]: 0.00065703 [type_inference]: 0.00503441 [event_method]: 1.125e-05 [auto_monad]: 5.149e-05 [graph_reusing]: 5.25001e-06 [inline]: 2.14e-06 [add_attr]: 0.00343477, [1] [add_attr_with_inline]: 0.00342538, [1] [Cycle 1]: 5.18e-05, [2] [tag_attr]: 1.306e-05 [meta_addattr_fg_expand]: 3.21001e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 2.646e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00416719, [53] [py_interpret_to_execute]: 1.686e-05 [rewriter_before_opt_a]: 4.268e-05 [opt_a]: 0.00211114, [2] [Cycle 1]: 0.00149258, [45] [expand_dump_flag]: 2.46e-06 [switch_simplify]: 2.422e-05 [loop_unroll]: 1.403e-05 [a_1]: 0.00030639 [with_stream_mark]: 1.591e-05 [recompute_prepare]: 7.72002e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 9.685e-05 [accelerated_algorithm]: 6.43998e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.39998e-06 [auto_parallel]: 6.47001e-06 [parallel]: 6.017e-05 [flash_sp]: 8.12e-06 [merge_comm]: 4.07003e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.35001e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 8.1e-06 [virtual_dataset]: 6.59999e-06 [get_grad_eliminate_]: 5.79999e-06 [virtual_output]: 6.13998e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 1.023e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.183e-05 [merge_recompute_call_nodes]: 1.94e-06 [before_grad]: 9.97001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47002e-06 [meta_fg_expand]: 2.96001e-06 [flash_sp_send_recv_attached]: 2.39999e-06 [receive_attached]: 2.09e-06 
[after_resolve]: 1.096e-05 [a_after_grad]: 9.39998e-06 [renormalize]: 0.00047391 [add_forward_monad_depend]: 5.33002e-06 [auto_monad_grad]: 2.35002e-06 [auto_monad_eliminator]: 1.396e-05 [cse]: 2.931e-05 [a_3]: 4.326e-05 [Cycle 2]: 0.00060737, [45] [expand_dump_flag]: 9.40025e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012867 [with_stream_mark]: 9.82999e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 3.11001e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.83e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.64002e-06 [auto_parallel]: 5.54e-06 [parallel]: 5.46e-06 [flash_sp]: 3.48e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.35001e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.46e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.49001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 6.60017e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 8.87e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 7.07002e-06 [cse]: 1.341e-05 [a_3]: 3.165e-05 [py_interpret_to_execute_after_opt_a]: 9.59e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.381e-05 [convert_after_rewriter]: 7.28e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.00059012 [opt_b]: 0.00019102, [1] [Cycle 1]: 0.00018477, [7] [b_1]: 0.0001104 [b_2]: 
7.38e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.50997e-06 [updatestate_loads_eliminate]: 2.59999e-06 [renormalize]: 4.39992e-07 [cse]: 2.014e-05 [optimize_parallel_all_gather_comm]: 1.624e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.515e-05 [loop_unroll]: 0.00046272 [opt_after_cconv]: 0.0001032, [1] [Cycle 1]: 9.665e-05, [7] [c_1]: 2.886e-05 [parameter_eliminate]: 2.80002e-06 [updatestate_depend_eliminate]: 5.74e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.864e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.275e-05 [tuple_transform]: 7.198e-05, [1] [Cycle 1]: 6.734e-05, [4] [d_1]: 4.135e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.69999e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.801e-05 [cse_after_recomputation]: 2.217e-05, [1] [Cycle 1]: 1.732e-05, [1] [cse]: 1.176e-05 [environ_conv]: 5.51e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 3.71999e-06 [label_micro_interleaved_index]: 4.74e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.64e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.67001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.54e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.154e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.89998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.34e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.38999e-06 [overlap_grad_flash_sp]: 1.877e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 7.195e-05, [1] [Cycle 1]: 6.761e-05, [6] [build]: 3.13e-06 [elim_shapecalc]: 9.00001e-06 [elim_not_effective]: 1.193e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.31e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 2.19999e-06 [auto_monad_reorder]: 1.513e-05 [get_jit_bprop_graph]: 1.27999e-06 [rewriter_after_jit_bprop_graph]: 4.20999e-06 [opt_after_jit_grad]: 0.00057867 [validate]: 3.762e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.455133 [execute]: 9.32001e-06 Sums bootstrap : 0.000657s : 0.14% type_inference : 0.005034s : 1.08% event_method : 0.000011s : 0.00% auto_monad : 0.000051s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.00% optimize.rewriter_before_opt_a : 0.000043s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.01% optimize.opt_a.loop_unroll : 0.000020s : 0.00% optimize.opt_a.a_1 : 0.000435s : 0.09% optimize.opt_a.with_stream_mark : 0.000026s : 0.01% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000165s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000066s : 0.01% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000474s : 0.10% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.00% optimize.opt_a.cse : 0.000043s : 0.01% optimize.opt_a.a_3 : 0.000075s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000590s : 0.13% optimize.opt_b.b_1 : 0.000110s : 0.02% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.01% optimize.loop_unroll : 0.000463s : 0.10% optimize.opt_after_cconv.c_1 : 0.000029s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000004s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000579s : 0.12% validate : 0.000038s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.455133s : 97.89% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000134 26 18.25% : 0.000024s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.16% : 0.000002s : 2: substitution.fold_const_symbol 4.47% : 0.000006s : 4: substitution.graph_param_transform 65.64% : 0.000088s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.47% : 0.000005s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004989 2 91.74% : 0.004577s : 1: type_inference.infer 8.26% : 0.000412s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000086 2 100.00% : 0.000086s : 2: match.inline ------[predicate.] 
0.000147 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 1.49% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.91% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.62% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.74% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.99% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.87% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.72% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.60% : 0.000001s : 8: predicate.incorporate_call_switch 5.59% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.40% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.58% : 0.000002s : 4: predicate.mutable_eliminate 0.63% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.15% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 9: predicate.reshape_eliminate 1.12% : 0.000002s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.36% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.24% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.93% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.18% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.56% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000288 6 40.38% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.62% : 0.000172s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.478452 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.72% : 0.003440s : 1: add_attr 0.72% : 0.003430s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000056s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000007s : 1: bias_add_comm_swap 0.15% : 0.000706s : 1: bootstrap 0.01% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000017s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.10% : 0.000472s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.13% : 0.000600s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.17% : 0.000796s : 78: opt.transform.opt_a 0.01% : 0.000028s : 1: opt.transform.opt_after_cconv 0.00% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000092s : 28: opt.transform.opt_b 0.01% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.44% : 0.002114s : 1: opt_a 0.02% : 0.000107s : 1: opt_after_cconv 0.12% : 0.000589s : 1: opt_after_jit_grad 0.04% : 0.000194s : 1: opt_b 0.87% : 0.004172s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000031s : 1: pre_auto_parallel 0.00% : 0.000021s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.06% : 0.000270s : 1: renormalize.infer 0.04% : 0.000197s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000038s : 1: rewriter_after_opt_a 0.01% : 0.000047s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000075s : 1: symbol_engine_optimizer 95.13% : 0.455154s : 1: task_emit 0.02% : 0.000075s : 1: 
tuple_transform 1.06% : 0.005053s : 1: type_inference 0.01% : 0.000064s : 1: validate TotalTime = 0.724477, [24] [bootstrap]: 0.00049789 [type_inference]: 0.0429181 [event_method]: 4.559e-05 [auto_monad]: 0.00012951 [graph_reusing]: 8.01001e-06 [inline]: 1.94e-06 [add_attr]: 0.0030849, [1] [add_attr_with_inline]: 0.00307661, [1] [Cycle 1]: 6.74e-05, [2] [tag_attr]: 3.229e-05 [meta_addattr_fg_expand]: 8.67e-06 [parallel-infer-symbol]: 2.65997e-06 [pre_auto_parallel]: 4.583e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.0302597, [53] [py_interpret_to_execute]: 3.834e-05 [rewriter_before_opt_a]: 0.00012866 [opt_a]: 0.0278591, [3] [Cycle 1]: 0.0237438, [45] [expand_dump_flag]: 3.58999e-06 [switch_simplify]: 6.725e-05 [loop_unroll]: 5.627e-05 [a_1]: 0.00135299 [with_stream_mark]: 2.285e-05 [recompute_prepare]: 2.108e-05 [updatestate_depend_eliminate]: 8.94e-06 [updatestate_assign_eliminate]: 8.05e-06 [updatestate_loads_eliminate]: 7.36999e-06 [parameter_eliminate]: 2.41998e-06 [a_2]: 0.00024442 [accelerated_algorithm]: 3.038e-05 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 3.41001e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.575e-05 [auto_parallel]: 1.054e-05 [parallel]: 1.822e-05 [flash_sp]: 1.189e-05 [merge_comm]: 9.27999e-06 [allreduce_fusion]: 9.15999e-06 [matmul_add_comm_reduction]: 2.537e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.845e-05 [virtual_dataset]: 1.591e-05 [get_grad_eliminate_]: 1.496e-05 [virtual_output]: 1.498e-05 [merge_forward]: 9.69e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.76e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.853e-05 [merge_recompute_call_nodes]: 1.82001e-06 [before_grad]: 2.698e-05 [set_forward_comm_id_for_comm_node_pass]: 9.79e-06 [meta_fg_expand]: 0.00191652 [flash_sp_send_recv_attached]: 4.03999e-06 [receive_attached]: 2.42001e-06 [after_resolve]: 
6.196e-05 [a_after_grad]: 9.489e-05 [renormalize]: 0.00261192 [add_forward_monad_depend]: 9.42999e-06 [auto_monad_grad]: 5.54e-06 [auto_monad_eliminator]: 5.762e-05 [cse]: 0.0162284 [a_3]: 0.00034658 [Cycle 2]: 0.00314123, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 4.785e-05 [loop_unroll]: 4.453e-05 [a_1]: 0.00158805 [with_stream_mark]: 1.393e-05 [recompute_prepare]: 1.081e-05 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 4.62e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 1.15001e-06 [a_2]: 0.00012752 [accelerated_algorithm]: 1.268e-05 [shard]: 1.39e-06 [meta_shard_fg_expand]: 2.17999e-06 [shard_inline]: 9.66e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 8.57e-06 [parallel]: 6.11e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 5.39e-06 [allreduce_fusion]: 4.92999e-06 [matmul_add_comm_reduction]: 8.45001e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 1.079e-05 [virtual_dataset]: 9.04e-06 [get_grad_eliminate_]: 9.30001e-06 [virtual_output]: 8.63001e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 1.059e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.611e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 1.404e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40999e-06 [meta_fg_expand]: 4.3e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.532e-05 [a_after_grad]: 1.468e-05 [renormalize]: 0.00067632 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.55999e-06 [auto_monad_eliminator]: 1.555e-05 [cse]: 4.92e-05 [a_3]: 6.7e-05 [Cycle 3]: 0.00095883, [45] [expand_dump_flag]: 1.22e-06 [switch_simplify]: 1.08e-05 [loop_unroll]: 9.19e-06 [a_1]: 0.00025995 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 3.92998e-06 [updatestate_loads_eliminate]: 3.89002e-06 [parameter_eliminate]: 8.70001e-07 
[a_2]: 0.00016023 [accelerated_algorithm]: 1.218e-05 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 9.12001e-06 [merge_send_recv]: 7.13e-06 [auto_parallel]: 7.21001e-06 [parallel]: 4.86002e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.90001e-06 [allreduce_fusion]: 4.93001e-06 [matmul_add_comm_reduction]: 7.81001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.02e-05 [virtual_dataset]: 8.76997e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 9.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.695e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.506e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25001e-06 [meta_fg_expand]: 3.08e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.375e-05 [a_after_grad]: 1.442e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.06997e-06 [auto_monad_eliminator]: 1.131e-05 [cse]: 2.772e-05 [a_3]: 5.982e-05 [py_interpret_to_execute_after_opt_a]: 1.18e-05 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 4.786e-05 [convert_after_rewriter]: 8.95999e-06 [order_py_execute_after_rewriter]: 7.02002e-06 [mutable_eliminate]: 0.00055574 [opt_b]: 0.00029202, [1] [Cycle 1]: 0.00028538, [7] [b_1]: 0.00019177 [b_2]: 1.094e-05 [updatestate_depend_eliminate]: 7.33e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 4e-06 [renormalize]: 2.89991e-07 [cse]: 3.295e-05 [optimize_parallel_all_gather_comm]: 2.074e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.11e-05 [loop_unroll]: 0.00044272 [opt_after_cconv]: 0.00014981, [1] [Cycle 1]: 0.0001439, [7] [c_1]: 4.903e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 7.11999e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.80998e-06 [cse]: 4.223e-05 
[renormalize]: 2.69996e-07 [remove_dup_value]: 3.161e-05 [tuple_transform]: 0.00010537, [1] [Cycle 1]: 9.987e-05, [4] [d_1]: 6.913e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.02e-05 [partial_unused_args_eliminate]: 2.23002e-06 [add_recomputation]: 5.905e-05 [cse_after_recomputation]: 3.393e-05, [1] [Cycle 1]: 2.904e-05, [1] [cse]: 2.329e-05 [environ_conv]: 9.22999e-06 [swap_dp_allreduce_reducescatter]: 8.18001e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.21998e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.38002e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.703e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 5.47001e-06 [overlap_recompute_and_grad_model_parallel]: 5.57001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.68002e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 5.20001e-06 [overlap_grad_flash_sp]: 2.574e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.07998e-06 [symbol_engine_optimizer]: 9.876e-05, [1] [Cycle 1]: 9.441e-05, [6] [build]: 9.12001e-06 [elim_shapecalc]: 1.373e-05 [elim_not_effective]: 1.809e-05 [opt_reshape]: 1.015e-05 [fold_const_symbol]: 1.53e-05 [renormalize]: 1.80007e-07 [detach_backward]: 2.07001e-06 
[pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 2.518e-05 [get_jit_bprop_graph]: 1.23002e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00048972 [validate]: 4.797e-05 [backend_pass]: 1.14003e-06 [task_emit]: 0.646664 [execute]: 1.036e-05 Sums bootstrap : 0.000498s : 0.07% type_inference : 0.042918s : 5.96% event_method : 0.000046s : 0.01% auto_monad : 0.000130s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.01% optimize.rewriter_before_opt_a : 0.000129s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000126s : 0.02% optimize.opt_a.loop_unroll : 0.000110s : 0.02% optimize.opt_a.a_1 : 0.003201s : 0.44% optimize.opt_a.with_stream_mark : 0.000047s : 0.01% optimize.opt_a.recompute_prepare : 0.000041s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000532s : 0.07% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000031s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.01% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001963s : 0.27% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.01% optimize.opt_a.a_after_grad : 0.000124s : 0.02% optimize.opt_a.renormalize : 0.003288s : 0.46% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.01% optimize.opt_a.cse : 0.016305s : 2.26% optimize.opt_a.a_3 : 0.000473s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000556s : 0.08% optimize.opt_b.b_1 : 0.000192s : 0.03% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.00% optimize.loop_unroll : 0.000443s : 0.06% optimize.opt_after_cconv.c_1 : 0.000049s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000042s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.00% optimize.tuple_transform.d_1 : 0.000069s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000490s : 0.07% validate : 0.000048s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.646664s : 89.80% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000777 218 6.24% : 0.000049s : 11: substitution.arithmetic_simplify 1.94% : 0.000015s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.99% : 0.000427s : 16: substitution.inline 2.12% : 0.000016s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000014s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.45% : 0.000027s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.36% : 0.000065s : 28: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.042847 2 96.59% : 0.041387s : 1: type_inference.infer 3.41% : 0.001460s : 1: type_inference.specialize ------[replace.] 0.000218 30 59.96% : 0.000131s : 16: replace.inline 40.04% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 30 93.06% : 0.000419s : 16: match.inline 6.94% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000786 5663 1.04% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.03% : 0.000008s : 67: predicate.addn_zero_filter 1.01% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.97% : 0.000015s : 99: predicate.arithmetic_simplify 1.20% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_depend_swap 1.71% : 0.000013s : 107: predicate.environ_get_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.60% : 0.000013s : 97: predicate.exchange_switch_depend_value 3.35% : 0.000026s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.33% : 0.000042s : 244: predicate.inline 1.21% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.54% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.51% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.36% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000015s : 97: predicate.partial_defer_inline 1.61% : 0.000013s : 89: predicate.partial_eliminate 1.04% : 0.000008s : 67: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.25% : 0.000010s : 67: predicate.reduce_eliminate 2.55% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.83% : 0.000014s : 149: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.38% : 0.000011s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 2.60% : 0.000020s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.74% : 0.000014s : 97: predicate.switch_defer_inline 2.77% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.66% : 0.000037s : 265: 
predicate.switch_simplify 1.03% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.39% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 2.55% : 0.000020s : 83: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.69% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.37% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.93% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.50% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.15% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001608 32 55.92% : 0.000899s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.08% : 0.000709s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.766354 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.40% : 0.003089s : 1: add_attr 0.40% : 0.003081s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000137s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000532s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000053s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.06% : 0.000452s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.07% : 0.000566s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.64% : 0.004918s : 117: opt.transform.opt_a 0.01% : 0.000048s : 1: opt.transform.opt_after_cconv 0.00% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000177s : 28: opt.transform.opt_b 0.01% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000054s : 4: opt.transform.symbol_engine_opt 3.64% : 0.027862s : 1: opt_a 0.02% : 0.000153s : 1: opt_after_cconv 0.07% : 0.000500s : 1: opt_after_jit_grad 0.04% : 0.000296s : 1: opt_b 3.95% : 0.030265s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000050s : 1: pre_auto_parallel 0.01% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000036s : 1: remove_dup_value 0.23% : 0.001740s : 2: renormalize.infer 0.20% : 0.001534s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000052s : 1: rewriter_after_opt_a 0.02% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 84.38% : 0.646688s : 1: task_emit 0.01% : 0.000108s : 1: 
tuple_transform 5.60% : 0.042935s : 1: type_inference 0.01% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x4-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x5-pynative],max_mem:6.0M TotalTime = 0.0538062, [24] [bootstrap]: 0.00056161 [type_inference]: 0.00637609 [event_method]: 1.494e-05 [auto_monad]: 5.548e-05 [graph_reusing]: 6.12001e-06 [inline]: 1.92001e-06 [add_attr]: 0.00351464, [1] [add_attr_with_inline]: 0.00350338, [1] [Cycle 1]: 4.574e-05, [2] [tag_attr]: 1.553e-05 [meta_addattr_fg_expand]: 4.32998e-06 [parallel-infer-symbol]: 2.87002e-06 [pre_auto_parallel]: 2.914e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.0327704, [53] [py_interpret_to_execute]: 2.003e-05 [rewriter_before_opt_a]: 6.007e-05 [opt_a]: 0.0306807, [2] [Cycle 1]: 0.0300431, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.238e-05 [loop_unroll]: 2.12e-05 [a_1]: 0.00045887 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.93001e-06 [updatestate_depend_eliminate]: 3.60003e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 3.08998e-06 [parameter_eliminate]: 1.78002e-06 [a_2]: 7.567e-05 [accelerated_algorithm]: 6.38998e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.17999e-06 [merge_send_recv]: 8.18001e-06 [auto_parallel]: 5.92001e-06 [parallel]: 0.0280826 [flash_sp]: 2.376e-05 [merge_comm]: 1.383e-05 [allreduce_fusion]: 3.76001e-06 [matmul_add_comm_reduction]: 1.261e-05 [allreduce_slice_to_reducescatter]: 1.19998e-06 [virtual_shard_identity]: 2.836e-05 [virtual_dataset]: 
7.66999e-06 [get_grad_eliminate_]: 6.91999e-06 [virtual_output]: 6.42001e-06 [merge_forward]: 4.65001e-06 [cell_reuse_recompute_pass]: 3.58e-06 [offload_activation]: 1.047e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.208e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 1.031e-05 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 4.75999e-06 [flash_sp_send_recv_attached]: 2.78e-06 [receive_attached]: 2.31e-06 [after_resolve]: 1.322e-05 [a_after_grad]: 9.81e-06 [renormalize]: 0.00072479 [add_forward_monad_depend]: 5.16002e-06 [auto_monad_grad]: 2.39999e-06 [auto_monad_eliminator]: 1.504e-05 [cse]: 2.897e-05 [a_3]: 4.287e-05 [Cycle 2]: 0.00062645, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 6.94999e-06 [loop_unroll]: 6.29999e-06 [a_1]: 0.0001409 [with_stream_mark]: 1.351e-05 [recompute_prepare]: 6.09999e-06 [updatestate_depend_eliminate]: 3.13998e-06 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 2.19001e-06 [a_2]: 6.92e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 2.63e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.57998e-06 [auto_parallel]: 8.44002e-06 [parallel]: 5.65001e-06 [flash_sp]: 3.35003e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 3.04001e-06 [matmul_add_comm_reduction]: 5.05999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.09998e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.51998e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 5.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.31e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.24998e-06 [a_after_grad]: 8.68001e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 9.49978e-07 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.40002e-06 [cse]: 1.34e-05 [a_3]: 3.201e-05 [py_interpret_to_execute_after_opt_a]: 9.61e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 3.268e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00066647 [opt_b]: 0.00018644, [1] [Cycle 1]: 0.00017918, [7] [b_1]: 0.00011024 [b_2]: 7.06999e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 3.9002e-07 [cse]: 1.615e-05 [optimize_parallel_all_gather_comm]: 1.62e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.345e-05 [loop_unroll]: 0.00041697 [opt_after_cconv]: 9.688e-05, [1] [Cycle 1]: 9.041e-05, [7] [c_1]: 2.824e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.18002e-06 [cse]: 1.599e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 1.336e-05 [tuple_transform]: 7.018e-05, [1] [Cycle 1]: 6.567e-05, [4] [d_1]: 3.956e-05 [none_parameter_eliminate]: 1.66002e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 5.579e-05 [cse_after_recomputation]: 2.155e-05, [1] [Cycle 1]: 1.669e-05, [1] [cse]: 1.14e-05 [environ_conv]: 5.49e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.07003e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.22001e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.34998e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 3.04001e-06 [comm_op_add_attrs]: 1.05001e-06 
[add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.783e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 6.961e-05, [1] [Cycle 1]: 6.551e-05, [6] [build]: 2.67001e-06 [elim_shapecalc]: 8.45001e-06 [elim_not_effective]: 1.162e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.07001e-06 [renormalize]: 3.69997e-07 [detach_backward]: 2.04999e-06 [pipeline_parallel_scheduler]: 1.66998e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 1.36002e-06 [rewriter_after_jit_bprop_graph]: 0.00264666 [opt_after_jit_grad]: 0.00067551 [validate]: 4.165e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00684047 [execute]: 6.85998e-06 Sums bootstrap : 0.000562s : 1.14% type_inference : 0.006376s : 12.96% event_method : 0.000015s : 0.03% auto_monad : 0.000055s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000060s 
: 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.08% optimize.opt_a.loop_unroll : 0.000027s : 0.06% optimize.opt_a.a_1 : 0.000600s : 1.22% optimize.opt_a.with_stream_mark : 0.000028s : 0.06% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.03% optimize.opt_a.auto_parallel : 0.000014s : 0.03% optimize.opt_a.parallel : 0.028088s : 57.07% optimize.opt_a.flash_sp : 0.000027s : 0.06% optimize.opt_a.merge_comm : 0.000017s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000034s : 0.07% optimize.opt_a.virtual_dataset : 0.000013s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000031s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000006s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s 
: 0.01% optimize.opt_a.after_resolve : 0.000022s : 0.05% optimize.opt_a.a_after_grad : 0.000018s : 0.04% optimize.opt_a.renormalize : 0.000725s : 1.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000042s : 0.09% optimize.opt_a.a_3 : 0.000075s : 0.15% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000666s : 1.35% optimize.opt_b.b_1 : 0.000110s : 0.22% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.05% optimize.loop_unroll : 0.000417s : 0.85% optimize.opt_after_cconv.c_1 : 0.000028s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.03% optimize.tuple_transform.d_1 : 0.000040s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 
0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.11% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.002647s : 5.38% opt_after_jit_grad : 0.000676s : 1.37% validate : 0.000042s : 0.08% backend_pass : 0.000001s : 0.00% task_emit : 0.006840s : 13.90% execute : 0.000007s : 0.01% Time group info: ------[substitution.] 0.000180 30 18.02% : 0.000032s : 5: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.07% : 0.000006s : 4: substitution.graph_param_transform 62.77% : 0.000113s : 3: substitution.inline 1.83% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.67% : 0.000005s : 4: substitution.remove_not_recompute_node 3.35% : 0.000006s : 4: substitution.replace_old_param 6.36% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006328 2 90.58% : 0.005732s : 1: type_inference.infer 9.42% : 0.000596s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.43% : 0.000027s : 3: replace.inline 29.57% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.40% : 0.000111s : 3: match.inline 8.60% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000169 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.72% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.94% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.01% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.40% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.01% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.01% : 0.000002s : 15: predicate.environ_get_depend_swap 1.71% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 5.73% : 0.000010s : 51: predicate.inline 1.12% : 0.000002s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 8: predicate.less_batch_normalization 1.83% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.60% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.01% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.37% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.60% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.63% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.86% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.47% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.51% : 0.000003s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.30% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.88% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.74% : 0.000008s : 54: predicate.switch_simplify 0.77% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.20% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.29% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.40% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000417 8 43.19% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.81% : 0.000237s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.091964 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.83% : 0.003519s : 1: add_attr 3.81% : 0.003507s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.66% : 0.000603s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.46% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000675s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.09% : 0.001003s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 33.37% : 0.030684s : 1: opt_a 0.11% : 0.000100s : 1: opt_after_cconv 0.75% : 0.000686s : 1: opt_after_jit_grad 0.21% : 0.000190s : 1: opt_b 35.64% : 0.032775s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000034s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.43% : 0.000395s : 1: renormalize.infer 0.35% : 0.000319s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 2.89% : 0.002662s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 7.45% : 0.006850s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 6.95% : 0.006390s : 1: type_inference 0.08% : 0.000078s : 1: validate TotalTime = 0.0621769, [24] [bootstrap]: 0.00051456 [type_inference]: 0.00480045 [event_method]: 1.115e-05 [auto_monad]: 5.432e-05 [graph_reusing]: 5.66998e-06 [inline]: 2.07001e-06 [add_attr]: 0.00306153, [1] [add_attr_with_inline]: 0.003054, [1] [Cycle 1]: 4.645e-05, [2] [tag_attr]: 1.236e-05 [meta_addattr_fg_expand]: 3.12997e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.248e-05 [insert-virtual-dataset]: 3.02002e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.46998e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0466467, [53] [py_interpret_to_execute]: 1.513e-05 [rewriter_before_opt_a]: 3.942e-05 [opt_a]: 0.00187092, [2] [Cycle 1]: 0.00125864, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 2.493e-05 [loop_unroll]: 1.352e-05 [a_1]: 0.00029259 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.30998e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.75e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.794e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 8.09002e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.944e-05 [flash_sp]: 7.61999e-06 [merge_comm]: 3.41001e-06 [allreduce_fusion]: 3.28998e-06 [matmul_add_comm_reduction]: 9.27001e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.12002e-06 [virtual_dataset]: 5.90002e-06 [get_grad_eliminate_]: 5.77001e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.09998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.80001e-06 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 
[receive_attached]: 2.32999e-06 [after_resolve]: 1.08e-05 [a_after_grad]: 8.84998e-06 [renormalize]: 0.00034126 [add_forward_monad_depend]: 4.81002e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.36e-05 [cse]: 2.815e-05 [a_3]: 4.013e-05 [Cycle 2]: 0.0006033, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.88998e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012989 [with_stream_mark]: 8.73001e-06 [recompute_prepare]: 5.54998e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.858e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.28002e-06 [parallel]: 4.60001e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 4.87998e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.94999e-06 [virtual_dataset]: 5.05999e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 6.51999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.03e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.85001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.22999e-06 [after_resolve]: 1.049e-05 [a_after_grad]: 7.78001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.64001e-06 [cse]: 1.25e-05 [a_3]: 3.192e-05 [py_interpret_to_execute_after_opt_a]: 7.05002e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 3.067e-05 [convert_after_rewriter]: 7.08998e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.0004625 [opt_b]: 
0.0428823, [1] [Cycle 1]: 0.0428752, [7] [b_1]: 0.042763 [b_2]: 1.159e-05 [updatestate_depend_eliminate]: 7.59002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.70002e-06 [renormalize]: 7.40023e-07 [cse]: 2.505e-05 [optimize_parallel_all_gather_comm]: 2.118e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.711e-05 [loop_unroll]: 0.00058905 [opt_after_cconv]: 0.00010012, [1] [Cycle 1]: 9.318e-05, [7] [c_1]: 2.877e-05 [parameter_eliminate]: 3.21001e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.23002e-06 [cse]: 1.702e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.393e-05 [tuple_transform]: 7.327e-05, [1] [Cycle 1]: 6.862e-05, [4] [d_1]: 4.208e-05 [none_parameter_eliminate]: 1.79998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.39001e-06 [partial_unused_args_eliminate]: 2.14e-06 [add_recomputation]: 5.11e-05 [cse_after_recomputation]: 2.205e-05, [1] [Cycle 1]: 1.706e-05, [1] [cse]: 1.118e-05 [environ_conv]: 4.97e-06 [swap_dp_allreduce_reducescatter]: 5.42001e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.49002e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.41998e-06 [slice_recompute_activation]: 2.79999e-06 [micro_interleaved_order_control]: 2.64999e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.22999e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 2.98e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.98002e-06 [control_data_broadcast_order]: 1.196e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 4.18001e-06 [overlap_recompute_and_grad_model_parallel]: 5.51e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.65002e-06 [overlap_grad_ring_attention]: 2.177e-05 [overlap_grad_flash_sp]: 1.944e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.60997e-06 [split_layernorm_comm]: 1.98002e-06 [handle_group_info]: 1.48002e-06 [symbol_engine_optimizer]: 7.825e-05, [1] [Cycle 1]: 7.32e-05, [6] [build]: 3.45e-06 [elim_shapecalc]: 1.027e-05 [elim_not_effective]: 1.332e-05 [opt_reshape]: 6.57002e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 1.90019e-07 [detach_backward]: 2.66e-06 [pipeline_parallel_scheduler]: 2.16e-06 [auto_monad_reorder]: 1.844e-05 [get_jit_bprop_graph]: 1.81998e-06 [rewriter_after_jit_bprop_graph]: 5.52001e-06 [opt_after_jit_grad]: 0.00055562 [validate]: 3.931e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00620036 [execute]: 8.05999e-06 Sums bootstrap : 0.000515s : 0.89% type_inference : 0.004800s : 8.26% event_method : 0.000011s : 0.02% auto_monad : 0.000054s : 0.09% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000022s : 0.04% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000039s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000422s : 0.73% optimize.opt_a.with_stream_mark : 0.000022s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000341s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.07% optimize.opt_a.a_3 : 0.000072s : 0.12% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.80% optimize.opt_b.b_1 : 0.042763s : 73.62% optimize.opt_b.b_2 : 0.000012s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.04% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.05% optimize.loop_unroll : 0.000589s : 1.01% optimize.opt_after_cconv.c_1 : 0.000029s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000042s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000022s : 0.04% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000556s : 0.96% validate : 0.000039s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.006200s : 10.67% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000123 26 18.55% : 0.000023s : 4: substitution.arithmetic_simplify 1.86% : 0.000002s : 2: substitution.elim_not_effective 1.16% : 0.000001s : 2: substitution.fold_const_symbol 4.53% : 0.000006s : 4: substitution.graph_param_transform 64.72% : 0.000080s : 2: substitution.inline 2.49% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004756 2 92.16% : 0.004383s : 1: type_inference.infer 7.84% : 0.000373s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000149 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.79% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.75% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.66% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.39% : 0.000001s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000009s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.65% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.52% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.16% : 0.000002s : 13: predicate.partial_eliminate 0.72% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 17: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.54% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 1.04% : 0.000002s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.52% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.63% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.91% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.74% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.00% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.94% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 3.21% : 0.000005s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 43.22% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.78% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.113194 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.71% : 0.003066s : 1: add_attr 2.70% : 0.003057s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000023s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.49% : 0.000554s : 1: bootstrap 0.03% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.01% : 0.000013s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000598s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.42% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000778s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000118s : 28: opt.transform.opt_b 0.04% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000036s : 4: opt.transform.symbol_engine_opt 1.66% : 0.001874s : 1: opt_a 0.09% : 0.000104s : 1: opt_after_cconv 0.50% : 0.000566s : 1: opt_after_jit_grad 37.89% : 0.042887s : 1: opt_b 41.21% : 0.046652s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000025s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.17% : 0.000188s : 1: renormalize.infer 0.13% : 0.000146s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000081s : 1: symbol_engine_optimizer 5.49% : 0.006210s : 1: task_emit 0.07% : 0.000076s : 1: 
tuple_transform 4.25% : 0.004815s : 1: type_inference 0.06% : 0.000071s : 1: validate TotalTime = 0.0325104, [24] [bootstrap]: 0.00048369 [type_inference]: 0.0180817 [event_method]: 1.445e-05 [auto_monad]: 5.836e-05 [graph_reusing]: 5.38002e-06 [inline]: 2.54999e-06 [add_attr]: 0.00309743, [1] [add_attr_with_inline]: 0.00308969, [1] [Cycle 1]: 4.798e-05, [2] [tag_attr]: 1.645e-05 [meta_addattr_fg_expand]: 4.03999e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 2.618e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 2.01e-06 [optimize]: 0.00400736, [53] [py_interpret_to_execute]: 2.028e-05 [rewriter_before_opt_a]: 5.907e-05 [opt_a]: 0.00213509, [2] [Cycle 1]: 0.00153197, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 3.369e-05 [loop_unroll]: 2.064e-05 [a_1]: 0.00045439 [with_stream_mark]: 1.277e-05 [recompute_prepare]: 7.32002e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.06999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.634e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 5.59998e-06 [parallel]: 1.864e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 9.31998e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.03e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.83002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 9.56998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.41998e-06 
[after_resolve]: 1.058e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00043337 [add_forward_monad_depend]: 4.33001e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.402e-05 [cse]: 2.844e-05 [a_3]: 4.065e-05 [Cycle 2]: 0.00059411, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.03998e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.00012597 [with_stream_mark]: 8.90999e-06 [recompute_prepare]: 5.77001e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.915e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.28002e-06 [parallel]: 4.27998e-06 [flash_sp]: 3.56001e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.05002e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.61999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.22001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.33999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 8.21002e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.686e-05 [a_3]: 3.192e-05 [py_interpret_to_execute_after_opt_a]: 7.28e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.216e-05 [convert_after_rewriter]: 6.88998e-06 [order_py_execute_after_rewriter]: 5.08002e-06 [mutable_eliminate]: 0.00045102 [opt_b]: 0.00018127, [1] [Cycle 1]: 0.00017531, 
[7] [b_1]: 0.00010787 [b_2]: 7.15003e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [renormalize]: 2.59985e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.679e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.227e-05 [loop_unroll]: 0.00044736 [opt_after_cconv]: 9.475e-05, [1] [Cycle 1]: 8.892e-05, [7] [c_1]: 2.731e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.575e-05 [renormalize]: 8.09989e-07 [remove_dup_value]: 1.318e-05 [tuple_transform]: 6.894e-05, [1] [Cycle 1]: 6.482e-05, [4] [d_1]: 3.951e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.25002e-06 [partial_unused_args_eliminate]: 1.96998e-06 [add_recomputation]: 4.456e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.56e-05, [1] [cse]: 1.043e-05 [environ_conv]: 4.53001e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.99977e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.191e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.71998e-06 [overlap_recompute_comm]: 2.00002e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.74e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.33998e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 6.858e-05, [1] [Cycle 1]: 6.451e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.38e-06 [fold_const_symbol]: 8.92999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.45998e-06 [opt_after_jit_grad]: 0.00045426 [validate]: 3.142e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00600158 [execute]: 7.34002e-06 Sums bootstrap : 0.000484s : 1.70% type_inference : 0.018082s : 63.55% event_method : 0.000014s : 0.05% auto_monad : 0.000058s : 0.21% graph_reusing : 0.000005s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.06% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000026s : 0.09% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.07% optimize.rewriter_before_opt_a : 0.000059s : 0.21% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.14% optimize.opt_a.loop_unroll : 0.000026s : 0.09% optimize.opt_a.a_1 : 0.000580s : 2.04% optimize.opt_a.with_stream_mark : 0.000022s : 0.08% optimize.opt_a.recompute_prepare : 0.000013s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.02% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.51% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.04% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.04% optimize.opt_a.merge_send_recv : 0.000013s : 0.04% optimize.opt_a.auto_parallel : 0.000011s : 0.04% optimize.opt_a.parallel : 0.000023s : 0.08% optimize.opt_a.flash_sp : 0.000011s : 0.04% optimize.opt_a.merge_comm : 0.000007s : 0.02% optimize.opt_a.allreduce_fusion : 0.000006s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.05% optimize.opt_a.virtual_dataset : 0.000011s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.04% optimize.opt_a.virtual_output : 0.000011s : 0.04% optimize.opt_a.merge_forward : 0.000006s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.02% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.07% optimize.opt_a.a_after_grad : 0.000017s : 0.06% optimize.opt_a.renormalize : 0.000433s : 1.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.02% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.07% optimize.opt_a.cse : 0.000045s : 0.16% optimize.opt_a.a_3 : 0.000073s : 0.26% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.11% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000451s : 1.59% optimize.opt_b.b_1 : 0.000108s : 0.38% optimize.opt_b.b_2 : 0.000007s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.08% optimize.loop_unroll : 0.000447s : 1.57% optimize.opt_after_cconv.c_1 : 0.000027s : 0.10% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.06% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.05% optimize.tuple_transform.d_1 : 0.000040s : 0.14% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.16% optimize.cse_after_recomputation.cse : 0.000010s : 0.04% optimize.environ_conv : 0.000005s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000454s : 1.60% validate : 0.000031s : 0.11% backend_pass : 0.000001s : 0.00% task_emit : 0.006002s : 21.09% execute : 0.000007s : 0.03% Time group info: ------[substitution.] 0.000170 30 14.79% : 0.000025s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.37% : 0.000006s : 4: substitution.graph_param_transform 66.54% : 0.000113s : 3: substitution.inline 1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.40% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.72% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.018037 2 96.70% : 0.017443s : 1: type_inference.infer 3.30% : 0.000595s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.09% : 0.000027s : 3: replace.inline 29.91% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.49% : 0.000111s : 3: match.inline 8.51% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.16% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000002s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.18% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.79% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.84% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 44.91% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.09% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.041156 196 0.01% : 0.000003s : 1: ForceFp32Comm 7.54% : 0.003102s : 1: add_attr 7.52% : 0.003093s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.12% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.15% : 0.000064s : 1: auto_monad 0.05% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.27% : 0.000523s : 1: bootstrap 0.06% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.06% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.05% : 0.000020s : 1: event_method 0.03% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.11% : 0.000456s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.12% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000013s : 1: opt.transform.mutable_eliminate 2.31% : 0.000949s : 78: opt.transform.opt_a 0.06% : 0.000026s : 1: opt.transform.opt_after_cconv 0.05% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.22% : 0.000091s : 28: opt.transform.opt_b 0.11% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000032s : 4: opt.transform.symbol_engine_opt 5.19% : 0.002138s : 1: opt_a 0.24% : 0.000098s : 1: opt_after_cconv 1.13% : 0.000464s : 1: opt_after_jit_grad 0.45% : 0.000185s : 1: opt_b 9.75% : 0.004011s : 1: optimize 0.05% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.05% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.07% : 0.000030s : 1: pre_auto_parallel 0.06% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.04% : 0.000017s : 1: remove_dup_value 0.55% : 0.000228s : 1: renormalize.infer 0.48% : 0.000199s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000036s : 1: rewriter_after_opt_a 0.15% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000071s : 1: symbol_engine_optimizer 14.61% : 0.006012s : 1: task_emit 0.17% : 0.000072s : 1: 
tuple_transform 43.97% : 0.018097s : 1: type_inference 0.14% : 0.000058s : 1: validate TotalTime = 0.0872957, [24] [bootstrap]: 0.00054555 [type_inference]: 0.011679 [event_method]: 4.662e-05 [auto_monad]: 0.0001216 [graph_reusing]: 8.20999e-06 [inline]: 1.94999e-06 [add_attr]: 0.0031042, [1] [add_attr_with_inline]: 0.00309625, [1] [Cycle 1]: 7.17e-05, [2] [tag_attr]: 3.614e-05 [meta_addattr_fg_expand]: 9.48002e-06 [parallel-infer-symbol]: 3.32002e-06 [pre_auto_parallel]: 5.061e-05 [insert-virtual-dataset]: 2.43002e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0462867, [53] [py_interpret_to_execute]: 3.957e-05 [rewriter_before_opt_a]: 0.00014721 [opt_a]: 0.0439098, [3] [Cycle 1]: 0.0398991, [45] [expand_dump_flag]: 3.48e-06 [switch_simplify]: 7.458e-05 [loop_unroll]: 6.196e-05 [a_1]: 0.033649 [with_stream_mark]: 4.056e-05 [recompute_prepare]: 3.169e-05 [updatestate_depend_eliminate]: 1.042e-05 [updatestate_assign_eliminate]: 8.16002e-06 [updatestate_loads_eliminate]: 8.00999e-06 [parameter_eliminate]: 3.18e-06 [a_2]: 0.00025287 [accelerated_algorithm]: 3.254e-05 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 6.11e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.72e-05 [auto_parallel]: 1.208e-05 [parallel]: 1.991e-05 [flash_sp]: 1.219e-05 [merge_comm]: 9.37999e-06 [allreduce_fusion]: 8.95999e-06 [matmul_add_comm_reduction]: 2.889e-05 [allreduce_slice_to_reducescatter]: 7.99977e-07 [virtual_shard_identity]: 1.842e-05 [virtual_dataset]: 1.569e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.549e-05 [merge_forward]: 9.31e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 1.874e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.918e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 2.782e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66e-06 [meta_fg_expand]: 0.00161456 [flash_sp_send_recv_attached]: 3.68e-06 [receive_attached]: 2.91e-06 [after_resolve]: 
6.189e-05 [a_after_grad]: 8.267e-05 [renormalize]: 0.00278192 [add_forward_monad_depend]: 9.72999e-06 [auto_monad_grad]: 5.19998e-06 [auto_monad_eliminator]: 5.667e-05 [cse]: 0.00016892 [a_3]: 0.00033639 [Cycle 2]: 0.00308349, [45] [expand_dump_flag]: 1.86e-06 [switch_simplify]: 4.737e-05 [loop_unroll]: 4.453e-05 [a_1]: 0.00154067 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 1.154e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.20001e-06 [a_2]: 0.00012907 [accelerated_algorithm]: 1.235e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.28002e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.70998e-06 [parallel]: 4.73001e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 5.82001e-06 [allreduce_fusion]: 5.17e-06 [matmul_add_comm_reduction]: 7.92e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 8.82999e-06 [get_grad_eliminate_]: 8.57998e-06 [virtual_output]: 8.48999e-06 [merge_forward]: 4.60999e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.63e-05 [merge_recompute_call_nodes]: 1.08001e-06 [before_grad]: 1.395e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 9.945e-05 [flash_sp_send_recv_attached]: 1.05999e-06 [receive_attached]: 1.57999e-06 [after_resolve]: 1.859e-05 [a_after_grad]: 1.433e-05 [renormalize]: 0.00059587 [add_forward_monad_depend]: 4.22998e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.442e-05 [cse]: 4.663e-05 [a_3]: 6.655e-05 [Cycle 3]: 0.00091263, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 1.066e-05 [loop_unroll]: 9.07001e-06 [a_1]: 0.00025175 [with_stream_mark]: 1.009e-05 [recompute_prepare]: 9.31e-06 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.85e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012417 [accelerated_algorithm]: 1.194e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.17999e-06 [merge_send_recv]: 7.06999e-06 [auto_parallel]: 7.02997e-06 [parallel]: 4.68001e-06 [flash_sp]: 1.19e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 4.90001e-06 [matmul_add_comm_reduction]: 7.85e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 8.70999e-06 [get_grad_eliminate_]: 8.54002e-06 [virtual_output]: 8.34002e-06 [merge_forward]: 4.29002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 8.52998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.648e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.526e-05 [set_forward_comm_id_for_comm_node_pass]: 5.91e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.464e-05 [a_after_grad]: 1.465e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.088e-05 [cse]: 2.69e-05 [a_3]: 6.008e-05 [py_interpret_to_execute_after_opt_a]: 1.122e-05 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 5.174e-05 [convert_after_rewriter]: 9.22999e-06 [order_py_execute_after_rewriter]: 7.01001e-06 [mutable_eliminate]: 0.0004802 [opt_b]: 0.00029267, [1] [Cycle 1]: 0.00028584, [7] [b_1]: 0.00019157 [b_2]: 1.071e-05 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 4.29997e-06 [renormalize]: 4.19997e-07 [cse]: 3.182e-05 [optimize_parallel_all_gather_comm]: 2.076e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.148e-05 [loop_unroll]: 0.00043769 [opt_after_cconv]: 0.00013684, [1] [Cycle 1]: 0.00013112, [7] [c_1]: 4.81e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 7.40998e-06 [updatestate_assign_eliminate]: 4.20999e-06 
[updatestate_loads_eliminate]: 4e-06 [cse]: 3.116e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 2.986e-05 [tuple_transform]: 0.00015142, [1] [Cycle 1]: 0.00014663, [4] [d_1]: 6.606e-05 [none_parameter_eliminate]: 1.99999e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 1.039e-05 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 6.184e-05 [cse_after_recomputation]: 3.326e-05, [1] [Cycle 1]: 2.837e-05, [1] [cse]: 2.275e-05 [environ_conv]: 8.65999e-06 [swap_dp_allreduce_reducescatter]: 7.68001e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.56e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.18001e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.761e-05 [grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 5.70001e-06 [overlap_grad_flash_sp]: 2.421e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 0.00010026, [1] [Cycle 1]: 9.592e-05, [6] [build]: 9.75002e-06 [elim_shapecalc]: 1.362e-05 [elim_not_effective]: 1.827e-05 [opt_reshape]: 1.025e-05 [fold_const_symbol]: 1.532e-05 [renormalize]: 
2.69996e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.604e-05 [get_jit_bprop_graph]: 1.18001e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.00048388 [validate]: 4.72e-05 [backend_pass]: 1.30001e-06 [task_emit]: 0.0246516 [execute]: 7.29001e-06 Sums bootstrap : 0.000546s : 0.66% type_inference : 0.011679s : 14.10% event_method : 0.000047s : 0.06% auto_monad : 0.000122s : 0.15% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.05% optimize.rewriter_before_opt_a : 0.000147s : 0.18% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000133s : 0.16% optimize.opt_a.loop_unroll : 0.000116s : 0.14% optimize.opt_a.a_1 : 0.035441s : 42.79% optimize.opt_a.with_stream_mark : 0.000064s : 0.08% optimize.opt_a.recompute_prepare : 0.000053s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000506s : 0.61% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.07% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.04% optimize.opt_a.merge_send_recv : 0.000031s : 0.04% optimize.opt_a.auto_parallel : 0.000027s : 0.03% optimize.opt_a.parallel : 0.000029s : 0.04% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.05% optimize.opt_a.virtual_dataset : 0.000033s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.04% optimize.opt_a.virtual_output : 0.000032s : 0.04% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.03% optimize.opt_a.meta_fg_expand : 0.001717s : 2.07% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.11% optimize.opt_a.a_after_grad : 0.000112s : 0.13% optimize.opt_a.renormalize : 0.003378s : 4.08% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.10% optimize.opt_a.cse : 0.000242s : 0.29% optimize.opt_a.a_3 : 0.000463s : 0.56% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000052s : 0.06% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000480s : 0.58% optimize.opt_b.b_1 : 0.000192s : 0.23% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000438s : 0.53% optimize.opt_after_cconv.c_1 : 0.000048s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.04% optimize.tuple_transform.d_1 : 0.000066s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.07% optimize.cse_after_recomputation.cse : 0.000023s : 0.03% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000484s : 0.58% validate : 0.000047s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.024652s : 29.76% execute : 0.000007s : 0.01% Time group info: ------[substitution.] 
0.000789 222 7.26% : 0.000057s : 12: substitution.arithmetic_simplify 1.81% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000007s : 8: substitution.graph_param_transform 0.47% : 0.000004s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 54.06% : 0.000427s : 17: substitution.inline 2.14% : 0.000017s : 2: substitution.inline_without_move 1.39% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000016s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.70% : 0.000013s : 20: substitution.remove_not_recompute_node 3.02% : 0.000024s : 10: substitution.replace_applicator 1.43% : 0.000011s : 15: substitution.replace_old_param 0.61% : 0.000005s : 1: substitution.set_cell_output_no_recompute 3.53% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.72% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.81% : 0.000070s : 30: substitution.tuple_list_get_item_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011608 2 87.33% : 0.010137s : 1: type_inference.infer 12.67% : 0.001470s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.43% : 0.000126s : 17: replace.inline 42.57% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000452 33 92.33% : 0.000418s : 17: match.inline 7.67% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000770 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000009s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.22% : 0.000017s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.80% : 0.000014s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000043s : 249: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 
1.93% : 0.000015s : 100: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.98% : 0.000015s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000011s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 101: predicate.switch_defer_inline 2.87% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.90% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.42% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.60% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001603 34 57.05% : 0.000914s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.95% : 0.000688s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.177529 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.75% : 0.003108s : 1: add_attr 1.75% : 0.003100s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000129s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000584s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000054s : 1: event_method 0.01% : 0.000012s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000446s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000489s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 20.92% : 0.037139s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000176s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 24.74% : 0.043913s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.28% : 0.000493s : 1: opt_after_jit_grad 0.17% : 0.000296s : 1: opt_b 26.08% : 0.046291s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.02% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.06% : 0.001877s : 2: renormalize.infer 0.84% : 0.001487s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000056s : 1: rewriter_after_opt_a 0.09% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000103s : 1: symbol_engine_optimizer 13.89% : 0.024663s : 1: task_emit 0.09% : 0.000154s : 1: 
tuple_transform 6.59% : 0.011694s : 1: type_inference 0.04% : 0.000080s : 1: validate TotalTime = 0.0189074, [24] [bootstrap]: 0.00049136 [type_inference]: 0.00447258 [event_method]: 1.096e-05 [auto_monad]: 5.25e-05 [graph_reusing]: 5.74999e-06 [inline]: 1.74e-06 [add_attr]: 0.00308628, [1] [add_attr_with_inline]: 0.00307873, [1] [Cycle 1]: 4.393e-05, [2] [tag_attr]: 1.202e-05 [meta_addattr_fg_expand]: 3.32002e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.263e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.99999e-06 [optimize]: 0.00371399, [53] [py_interpret_to_execute]: 1.554e-05 [rewriter_before_opt_a]: 3.848e-05 [opt_a]: 0.00186432, [2] [Cycle 1]: 0.00126524, [45] [expand_dump_flag]: 2.98998e-06 [switch_simplify]: 2.423e-05 [loop_unroll]: 1.396e-05 [a_1]: 0.0002913 [with_stream_mark]: 1.292e-05 [recompute_prepare]: 7.03e-06 [updatestate_depend_eliminate]: 3.26001e-06 [updatestate_assign_eliminate]: 3.54002e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 7.563e-05 [accelerated_algorithm]: 6.33998e-06 [shard]: 2.56998e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 5.99999e-06 [parallel]: 1.785e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 9.87999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.99e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 3.75998e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 1.003e-05 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 
2.06998e-06 [after_resolve]: 1.143e-05 [a_after_grad]: 9.84001e-06 [renormalize]: 0.00034901 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.363e-05 [cse]: 2.878e-05 [a_3]: 3.964e-05 [Cycle 2]: 0.00058993, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.94001e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012593 [with_stream_mark]: 9.57999e-06 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.53003e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.796e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.47001e-06 [parallel]: 4.05998e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.19999e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 5.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.37999e-06 [a_after_grad]: 8.03999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.476e-05 [a_3]: 3.121e-05 [py_interpret_to_execute_after_opt_a]: 7.28999e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.19e-05 [convert_after_rewriter]: 7.66999e-06 [order_py_execute_after_rewriter]: 5.58002e-06 [mutable_eliminate]: 0.00044995 [opt_b]: 0.00018034, [1] [Cycle 1]: 0.00017428, 
[7] [b_1]: 0.00010698 [b_2]: 7.01999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 3.70026e-07 [cse]: 1.614e-05 [optimize_parallel_all_gather_comm]: 1.687e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.261e-05 [loop_unroll]: 0.00044635 [opt_after_cconv]: 9.435e-05, [1] [Cycle 1]: 8.863e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.582e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.354e-05 [tuple_transform]: 6.775e-05, [1] [Cycle 1]: 6.361e-05, [4] [d_1]: 3.866e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.924e-05 [cse_after_recomputation]: 2.145e-05, [1] [Cycle 1]: 1.7e-05, [1] [cse]: 1.172e-05 [environ_conv]: 5.21998e-06 [swap_dp_allreduce_reducescatter]: 5.77001e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.46998e-06 [slice_recompute_activation]: 2.68e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.04003e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.169e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 4.13001e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.11002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.76e-06 [overlap_recompute_comm]: 1.99999e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.737e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.61999e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.775e-05, [1] [Cycle 1]: 6.381e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.17e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.34001e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.49e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.545e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00045266 [validate]: 3.121e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00632959 [execute]: 6.81999e-06 Sums bootstrap : 0.000491s : 3.30% type_inference : 0.004473s : 30.08% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.81% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000349s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000044s : 0.29% optimize.opt_a.a_3 : 0.000071s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.03% optimize.opt_b.b_1 : 0.000107s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000446s : 3.00% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.33% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000453s : 3.04% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006330s : 42.57% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 17.91% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 65.76% : 0.000081s : 2: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.62% : 0.000004s : 4: substitution.remove_not_recompute_node 3.38% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004430 2 91.96% : 0.004074s : 1: type_inference.infer 8.04% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.75% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.73% : 0.000001s : 8: predicate.float_environ_get_switch 1.09% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 1.01% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.24% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 1.04% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 2.01% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.32% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.90% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 41.95% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.05% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026985 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.45% : 0.003090s : 1: add_attr 11.42% : 0.003082s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.20% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000528s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.69% : 0.000455s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.85% : 0.000770s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.92% : 0.001867s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000462s : 1: opt_after_jit_grad 0.68% : 0.000184s : 1: opt_b 13.78% : 0.003718s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.72% : 0.000194s : 1: renormalize.infer 0.55% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.49% : 0.006340s : 1: task_emit 0.26% : 0.000070s : 1: 
tuple_transform 16.63% : 0.004486s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0439477, [24] [bootstrap]: 0.00052496 [type_inference]: 0.0168758 [event_method]: 4.203e-05 [auto_monad]: 0.00011603 [graph_reusing]: 8.65001e-06 [inline]: 2.29001e-06 [add_attr]: 0.00354424, [1] [add_attr_with_inline]: 0.00353576, [1] [Cycle 1]: 6.989e-05, [2] [tag_attr]: 3.255e-05 [meta_addattr_fg_expand]: 8.57e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 4.729e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0136114, [53] [py_interpret_to_execute]: 3.746e-05 [rewriter_before_opt_a]: 0.00013155 [opt_a]: 0.0111613, [3] [Cycle 1]: 0.00725925, [45] [expand_dump_flag]: 3.63999e-06 [switch_simplify]: 6.702e-05 [loop_unroll]: 5.42e-05 [a_1]: 0.00138601 [with_stream_mark]: 2.434e-05 [recompute_prepare]: 2.135e-05 [updatestate_depend_eliminate]: 9.14e-06 [updatestate_assign_eliminate]: 7.72998e-06 [updatestate_loads_eliminate]: 7.15e-06 [parameter_eliminate]: 2.36e-06 [a_2]: 0.00024477 [accelerated_algorithm]: 3.077e-05 [shard]: 1.91003e-06 [meta_shard_fg_expand]: 3.62998e-06 [shard_inline]: 1.64e-05 [merge_send_recv]: 1.613e-05 [auto_parallel]: 1.138e-05 [parallel]: 1.801e-05 [flash_sp]: 1.146e-05 [merge_comm]: 9.82999e-06 [allreduce_fusion]: 8.73001e-06 [matmul_add_comm_reduction]: 2.554e-05 [allreduce_slice_to_reducescatter]: 8.69972e-07 [virtual_shard_identity]: 1.782e-05 [virtual_dataset]: 1.562e-05 [get_grad_eliminate_]: 1.504e-05 [virtual_output]: 1.521e-05 [merge_forward]: 9.37999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 1.772e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.925e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 2.708e-05 [set_forward_comm_id_for_comm_node_pass]: 1.007e-05 [meta_fg_expand]: 0.00150785 [flash_sp_send_recv_attached]: 3.51999e-06 [receive_attached]: 2.35002e-06 
[after_resolve]: 5.913e-05 [a_after_grad]: 8.075e-05 [renormalize]: 0.0026081 [add_forward_monad_depend]: 9.23002e-06 [auto_monad_grad]: 5.58002e-06 [auto_monad_eliminator]: 5.546e-05 [cse]: 0.00016629 [a_3]: 0.00033402 [Cycle 2]: 0.00298808, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.666e-05 [loop_unroll]: 4.342e-05 [a_1]: 0.00155931 [with_stream_mark]: 1.233e-05 [recompute_prepare]: 1.1e-05 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 3.61999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012677 [accelerated_algorithm]: 1.225e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.39002e-06 [parallel]: 4.76002e-06 [flash_sp]: 3.06999e-06 [merge_comm]: 5.34e-06 [allreduce_fusion]: 4.62e-06 [matmul_add_comm_reduction]: 7.73001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 1.025e-05 [virtual_dataset]: 8.80999e-06 [get_grad_eliminate_]: 8.91002e-06 [virtual_output]: 8.22e-06 [merge_forward]: 4.50001e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.618e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.504e-05 [set_forward_comm_id_for_comm_node_pass]: 5.86e-06 [meta_fg_expand]: 3.663e-05 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.508e-05 [a_after_grad]: 1.411e-05 [renormalize]: 0.0005804 [add_forward_monad_depend]: 4.30999e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.494e-05 [cse]: 4.602e-05 [a_3]: 6.517e-05 [Cycle 3]: 0.00090001, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 1.048e-05 [loop_unroll]: 8.84998e-06 [a_1]: 0.00024812 [with_stream_mark]: 9.67001e-06 [recompute_prepare]: 8.93002e-06 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 3.83001e-06 
[parameter_eliminate]: 1.01002e-06 [a_2]: 0.00012319 [accelerated_algorithm]: 1.166e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 8.69003e-06 [merge_send_recv]: 7.00998e-06 [auto_parallel]: 7.47002e-06 [parallel]: 4.60999e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 5.08002e-06 [matmul_add_comm_reduction]: 7.41999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.64999e-06 [virtual_dataset]: 8.64998e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.34002e-06 [merge_forward]: 4.52e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 8.57e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.623e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.511e-05 [set_forward_comm_id_for_comm_node_pass]: 5.96e-06 [meta_fg_expand]: 3.00998e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.429e-05 [a_after_grad]: 1.45e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.076e-05 [cse]: 2.522e-05 [a_3]: 5.932e-05 [py_interpret_to_execute_after_opt_a]: 1.056e-05 [slice_cell_reuse_recomputed_activation]: 2.17001e-06 [rewriter_after_opt_a]: 4.643e-05 [convert_after_rewriter]: 9.15999e-06 [order_py_execute_after_rewriter]: 6.61e-06 [mutable_eliminate]: 0.0004691 [opt_b]: 0.000287, [1] [Cycle 1]: 0.00028051, [7] [b_1]: 0.0001884 [b_2]: 1.086e-05 [updatestate_depend_eliminate]: 7.45e-06 [updatestate_assign_eliminate]: 4.24002e-06 [updatestate_loads_eliminate]: 3.98001e-06 [renormalize]: 3.9002e-07 [cse]: 3.074e-05 [optimize_parallel_all_gather_comm]: 2.078e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 1.999e-05 [loop_unroll]: 0.0004307 [opt_after_cconv]: 0.00013387, [1] [Cycle 1]: 0.00012768, [7] [c_1]: 4.774e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.12e-06 
[updatestate_loads_eliminate]: 4.05e-06 [cse]: 2.882e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 6.098e-05 [tuple_transform]: 0.00024308, [1] [Cycle 1]: 0.00023798, [4] [d_1]: 0.00018364 [none_parameter_eliminate]: 2.34001e-06 [renormalize]: 2.70025e-07 [switch_simplify]: 2.643e-05 [partial_unused_args_eliminate]: 2.23998e-06 [add_recomputation]: 6.012e-05 [cse_after_recomputation]: 3.615e-05, [1] [Cycle 1]: 3.038e-05, [1] [cse]: 2.436e-05 [environ_conv]: 8.60001e-06 [swap_dp_allreduce_reducescatter]: 8.16002e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.47e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.53002e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 9.90025e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.722e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.19e-06 [overlap_recompute_and_grad_model_parallel]: 5.71e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.24e-06 [overlap_grad_flash_sp]: 2.379e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 1.87001e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 0.00010174, [1] [Cycle 1]: 9.691e-05, [6] [build]: 9.08002e-06 [elim_shapecalc]: 1.431e-05 [elim_not_effective]: 1.918e-05 [opt_reshape]: 1.033e-05 [fold_const_symbol]: 1.518e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.59998e-06 [auto_monad_reorder]: 2.523e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.81999e-06 [opt_after_jit_grad]: 0.00048315 [validate]: 4.513e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00838118 [execute]: 6.99001e-06 Sums bootstrap : 0.000525s : 1.34% type_inference : 0.016876s : 43.14% event_method : 0.000042s : 0.11% auto_monad : 0.000116s : 0.30% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.12% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.10% optimize.rewriter_before_opt_a : 0.000132s : 0.34% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.32% optimize.opt_a.loop_unroll : 0.000106s : 0.27% optimize.opt_a.a_1 : 0.003193s : 8.16% optimize.opt_a.with_stream_mark : 0.000046s : 0.12% optimize.opt_a.recompute_prepare : 0.000041s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.26% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.14% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.09% optimize.opt_a.merge_send_recv : 0.000030s : 0.08% optimize.opt_a.auto_parallel : 0.000026s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.07% optimize.opt_a.flash_sp : 0.000016s : 0.04% optimize.opt_a.merge_comm 
: 0.000020s : 0.05% optimize.opt_a.allreduce_fusion : 0.000018s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.10% optimize.opt_a.virtual_dataset : 0.000033s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.08% optimize.opt_a.virtual_output : 0.000032s : 0.08% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.06% optimize.opt_a.meta_fg_expand : 0.001547s : 3.96% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.23% optimize.opt_a.a_after_grad : 0.000109s : 0.28% optimize.opt_a.renormalize : 0.003189s : 8.15% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.21% optimize.opt_a.cse : 0.000238s : 0.61% optimize.opt_a.a_3 : 0.000459s : 1.17% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.12% optimize.convert_after_rewriter : 0.000009s : 0.02% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000469s : 1.20% optimize.opt_b.b_1 : 0.000188s : 0.48% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.05% optimize.loop_unroll : 0.000431s : 1.10% optimize.opt_after_cconv.c_1 : 0.000048s : 0.12% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.07% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000061s : 0.16% optimize.tuple_transform.d_1 : 0.000184s : 0.47% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000026s : 0.07% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.15% optimize.cse_after_recomputation.cse : 0.000024s : 0.06% optimize.environ_conv : 0.000009s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000483s : 1.24% validate : 0.000045s : 0.12% backend_pass : 0.000001s : 0.00% task_emit : 0.008381s : 21.43% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000773 218 5.69% : 0.000044s : 11: substitution.arithmetic_simplify 1.77% : 0.000014s : 2: substitution.cast_eliminate 0.41% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.27% : 0.000010s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 52.10% : 0.000403s : 16: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000015s : 3: substitution.less_batch_normalization 1.65% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.76% : 0.000014s : 20: substitution.remove_not_recompute_node 3.10% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.48% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 12.59% : 0.000097s : 28: substitution.tuple_list_get_item_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.016801 2 92.35% : 0.015516s : 1: type_inference.infer 7.65% : 0.001286s : 1: type_inference.specialize ------[replace.] 0.000206 30 58.86% : 0.000121s : 16: replace.inline 41.14% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000426 30 92.69% : 0.000394s : 16: match.inline 7.31% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000742 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.55% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000017s : 97: predicate.float_depend_g_call 0.55% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 244: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.06% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.58% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.19% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.96% : 0.000037s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.31% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001505 32 57.88% : 0.000871s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.12% : 0.000634s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.069592 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.10% : 0.003549s : 1: add_attr 5.09% : 0.003540s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000123s : 1: auto_monad 0.04% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.81% : 0.000561s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000039s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.07% : 0.000049s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.63% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.69% : 0.000479s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 6.96% : 0.004840s : 117: opt.transform.opt_a 0.07% : 0.000046s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000174s : 28: opt.transform.opt_b 0.30% : 0.000207s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000056s : 4: opt.transform.symbol_engine_opt 16.04% : 0.011164s : 1: opt_a 0.20% : 0.000137s : 1: opt_after_cconv 0.71% : 0.000493s : 1: opt_after_jit_grad 0.42% : 0.000291s : 1: opt_b 19.56% : 0.013615s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.07% : 0.000052s : 1: pre_auto_parallel 0.06% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.09% : 0.000066s : 1: remove_dup_value 2.57% : 0.001788s : 2: renormalize.infer 1.99% : 0.001388s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000051s : 1: rewriter_after_opt_a 0.20% : 0.000136s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000104s : 1: symbol_engine_optimizer 12.06% : 0.008392s : 1: task_emit 0.35% : 0.000246s : 1: 
tuple_transform 24.27% : 0.016893s : 1: type_inference 0.11% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x5-kbk],max_mem:6.0M . TotalTime = 0.95509, [24] [bootstrap]: 0.00054822 [type_inference]: 0.00645169 [event_method]: 1.403e-05 [auto_monad]: 5.55e-05 [graph_reusing]: 4.85001e-06 [inline]: 1.77001e-06 [add_attr]: 0.00347178, [1] [add_attr_with_inline]: 0.00346086, [1] [Cycle 1]: 4.474e-05, [2] [tag_attr]: 1.482e-05 [meta_addattr_fg_expand]: 3.98999e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.738e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.10002e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00399455, [53] [py_interpret_to_execute]: 2.042e-05 [rewriter_before_opt_a]: 5.828e-05 [opt_a]: 0.0021193, [2] [Cycle 1]: 0.001521, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 3.216e-05 [loop_unroll]: 2.106e-05 [a_1]: 0.00045515 [with_stream_mark]: 1.332e-05 [recompute_prepare]: 8.97999e-06 [updatestate_depend_eliminate]: 4.29002e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 3.24001e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.731e-05 [accelerated_algorithm]: 6.44001e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 7.83001e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.474e-05 [flash_sp]: 7.30998e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.75003e-06 [matmul_add_comm_reduction]: 9.35001e-06 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 7.41001e-06 [virtual_dataset]: 6.27001e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 9.46003e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.116e-05 [merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.53e-06 [after_resolve]: 9.94001e-06 [a_after_grad]: 8.74e-06 [renormalize]: 0.00040945 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.307e-05 [cse]: 2.725e-05 [a_3]: 4.106e-05 [Cycle 2]: 0.00058867, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.75998e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.0001261 [with_stream_mark]: 9.29e-06 [recompute_prepare]: 5.46e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 6.808e-05 [accelerated_algorithm]: 5.67999e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.31002e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.27e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.03998e-06 [allreduce_fusion]: 2.80002e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.16998e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 4.99998e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 1.08001e-06 [receive_attached]: 1.10001e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 5.91e-06 [cse]: 1.241e-05 [a_3]: 3.202e-05 [py_interpret_to_execute_after_opt_a]: 7.42002e-06 
[slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.038e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 4.80999e-06 [mutable_eliminate]: 0.00045266 [opt_b]: 0.00018713, [1] [Cycle 1]: 0.00018116, [7] [b_1]: 0.00011395 [b_2]: 7.07002e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [renormalize]: 3.10014e-07 [cse]: 1.556e-05 [optimize_parallel_all_gather_comm]: 1.544e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.236e-05 [loop_unroll]: 0.00043861 [opt_after_cconv]: 9.479e-05, [1] [Cycle 1]: 8.886e-05, [7] [c_1]: 2.82e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.559e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.344e-05 [tuple_transform]: 6.856e-05, [1] [Cycle 1]: 6.419e-05, [4] [d_1]: 3.882e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 5.136e-05 [cse_after_recomputation]: 2.053e-05, [1] [Cycle 1]: 1.615e-05, [1] [cse]: 1.109e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.27001e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 3.99997e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.02999e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.06997e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07001e-06 
[control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.74999e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.651e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.906e-05, [1] [Cycle 1]: 6.502e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.209e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 9.29e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.619e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00045605 [validate]: 3.136e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.939774 [execute]: 1.004e-05 Sums bootstrap : 0.000548s : 0.06% type_inference : 0.006452s : 0.68% event_method : 0.000014s : 0.00% auto_monad : 0.000055s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000581s : 0.06% optimize.opt_a.with_stream_mark : 0.000023s : 
0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000410s : 0.04% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad 
: 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000040s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000453s : 0.05% optimize.opt_b.b_1 : 0.000114s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000439s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000456s : 0.05% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.939774s : 98.86% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000165 30 14.69% : 0.000024s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000002s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 66.33% : 0.000110s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.57% : 0.000004s : 4: substitution.replace_old_param 6.83% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006405 2 91.38% : 0.005853s : 1: type_inference.infer 8.62% : 0.000552s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.05% : 0.000027s : 3: replace.inline 28.95% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.33% : 0.000108s : 3: match.inline 8.67% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.84% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.83% : 0.000009s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.86% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.71% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.53% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.93% : 0.000002s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000002s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.05% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.93% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 46.35% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.65% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.964070 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.36% : 0.003477s : 1: add_attr 0.36% : 0.003464s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000061s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000585s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000448s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000461s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.10% : 0.000951s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000095s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.22% : 0.002122s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.05% : 0.000466s : 1: opt_after_jit_grad 0.02% : 0.000191s : 1: opt_b 0.41% : 0.003998s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.02% : 0.000209s : 1: renormalize.infer 0.02% : 0.000194s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000034s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.48% : 0.939798s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.67% : 0.006466s : 1: type_inference 0.01% : 0.000054s : 1: validate TotalTime = 0.157894, [24] [bootstrap]: 0.00049371 [type_inference]: 0.0287905 [event_method]: 1.089e-05 [auto_monad]: 5.069e-05 [graph_reusing]: 4.92e-06 [inline]: 1.67001e-06 [add_attr]: 0.00298547, [1] [add_attr_with_inline]: 0.00297745, [1] [Cycle 1]: 4.396e-05, [2] [tag_attr]: 1.208e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 3.02002e-06 [pre_auto_parallel]: 2.17e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00386227, [53] [py_interpret_to_execute]: 1.479e-05 [rewriter_before_opt_a]: 3.855e-05 [opt_a]: 0.00206409, [2] [Cycle 1]: 0.00128683, [45] [expand_dump_flag]: 2.42001e-06 [switch_simplify]: 2.423e-05 [loop_unroll]: 1.365e-05 [a_1]: 0.00029719 [with_stream_mark]: 1.325e-05 [recompute_prepare]: 7.46001e-06 [updatestate_depend_eliminate]: 4.00998e-06 [updatestate_assign_eliminate]: 3.09001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.73e-05 [accelerated_algorithm]: 6.73998e-06 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 5.72001e-06 [parallel]: 1.775e-05 [flash_sp]: 7.03e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 9.00999e-06 [allreduce_slice_to_reducescatter]: 8.29983e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 6.02999e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.64998e-06 [merge_forward]: 3.62002e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.25002e-06 [receive_attached]: 
2.66e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 8.71997e-06 [renormalize]: 0.00036944 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.708e-05 [a_3]: 3.951e-05 [Cycle 2]: 0.00076803, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00016191 [with_stream_mark]: 1.353e-05 [recompute_prepare]: 8.67e-06 [updatestate_depend_eliminate]: 3.21999e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.392e-05 [accelerated_algorithm]: 5.75001e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.59e-06 [parallel]: 4.13999e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.56998e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.61998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 8.97999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 7.21001e-06 [cse]: 1.516e-05 [a_3]: 3.249e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 3.109e-05 [convert_after_rewriter]: 6.43e-06 [order_py_execute_after_rewriter]: 4.65001e-06 [mutable_eliminate]: 0.00044658 [opt_b]: 0.00018163, [1] [Cycle 1]: 
0.00017567, [7] [b_1]: 0.00010855 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.60015e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.166e-05 [loop_unroll]: 0.00041175 [opt_after_cconv]: 9.458e-05, [1] [Cycle 1]: 8.88e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 2.12001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.627e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.257e-05 [tuple_transform]: 6.971e-05, [1] [Cycle 1]: 6.541e-05, [4] [d_1]: 3.931e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.221e-05 [cse_after_recomputation]: 2.02e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.063e-05 [environ_conv]: 5.04003e-06 [swap_dp_allreduce_reducescatter]: 5.43002e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 3.85e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.38998e-06 [micro_interleaved_order_control]: 2.06003e-06 [assign_add_opt]: 1.52999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.16002e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.00001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 4.12003e-06 [overlap_grad_flash_sp]: 1.649e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.84e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 5.85002e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.58002e-06 [pipeline_parallel_scheduler]: 1.27e-06 [auto_monad_reorder]: 1.531e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044637 [validate]: 2.987e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.120951 [execute]: 9.34998e-06 Sums bootstrap : 0.000494s : 0.32% type_inference : 0.028790s : 18.72% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.03% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.03% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.02% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000459s : 0.30% optimize.opt_a.with_stream_mark : 0.000027s : 0.02% optimize.opt_a.recompute_prepare : 0.000016s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.10% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.01% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000370s : 0.24% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.01% optimize.opt_a.cse : 0.000042s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.02% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.29% optimize.opt_b.b_1 : 0.000109s : 0.07% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.01% optimize.loop_unroll : 0.000412s : 0.27% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.29% validate : 0.000030s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.120951s : 78.63% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000124 26 19.04% : 0.000024s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.83% : 0.000006s : 4: substitution.graph_param_transform 65.16% : 0.000081s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.33% : 0.000004s : 4: substitution.remove_not_recompute_node 3.01% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.028750 2 98.51% : 0.028322s : 1: type_inference.infer 1.49% : 0.000428s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000141 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.80% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.06% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.38% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.80% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.16% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 0.88% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.22% : 0.000002s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.31% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 1.02% : 0.000001s : 9: predicate.transpose_eliminate 1.42% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.99% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000442 6 45.87% : 0.000202s : 2: func_graph_cloner_run.FuncGraphClonerGraph 54.13% : 0.000239s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.166084 196 0.00% : 0.000003s : 1: ForceFp32Comm 1.80% : 0.002990s : 1: add_attr 1.79% : 0.002981s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.03% : 0.000056s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.32% : 0.000529s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000455s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.49% : 0.000819s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.24% : 0.002067s : 1: opt_a 0.06% : 0.000098s : 1: opt_after_cconv 0.27% : 0.000456s : 1: opt_after_jit_grad 0.11% : 0.000185s : 1: opt_b 2.33% : 0.003866s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000007s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.01% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.11% : 0.000182s : 1: renormalize.infer 0.11% : 0.000180s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000035s : 1: rewriter_after_opt_a 0.03% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000071s : 1: symbol_engine_optimizer 72.84% : 0.120970s : 1: task_emit 0.04% : 0.000073s : 1: 
tuple_transform 17.34% : 0.028804s : 1: type_inference 0.03% : 0.000051s : 1: validate TotalTime = 0.0935631, [24] [bootstrap]: 0.00048103 [type_inference]: 0.00553945 [event_method]: 1.4e-05 [auto_monad]: 5.434e-05 [graph_reusing]: 5.86e-06 [inline]: 2.07001e-06 [add_attr]: 0.00304685, [1] [add_attr_with_inline]: 0.00303877, [1] [Cycle 1]: 4.447e-05, [2] [tag_attr]: 1.573e-05 [meta_addattr_fg_expand]: 4.08999e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.536e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00409819, [53] [py_interpret_to_execute]: 3.882e-05 [rewriter_before_opt_a]: 5.702e-05 [opt_a]: 0.00224504, [2] [Cycle 1]: 0.00164118, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 3.294e-05 [loop_unroll]: 2.096e-05 [a_1]: 0.00047586 [with_stream_mark]: 1.281e-05 [recompute_prepare]: 2.942e-05 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 3.26001e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.968e-05 [accelerated_algorithm]: 6.97002e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 6.27001e-06 [parallel]: 1.789e-05 [flash_sp]: 6.84999e-06 [merge_comm]: 3.81001e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.3e-06 [virtual_dataset]: 2.49e-05 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.73002e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.66e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65998e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.36998e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 1.085e-05 [a_after_grad]: 8.93002e-06 [renormalize]: 0.00046944 [add_forward_monad_depend]: 4.89998e-06 [auto_monad_grad]: 1.96003e-06 [auto_monad_eliminator]: 1.393e-05 [cse]: 2.845e-05 [a_3]: 4.159e-05 [Cycle 2]: 0.0005941, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012465 [with_stream_mark]: 9.34e-06 [recompute_prepare]: 5.63997e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.23002e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.914e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.04998e-06 [parallel]: 4.29002e-06 [flash_sp]: 3.91001e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.77001e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 7.31999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.30001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.08999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.82002e-06 [meta_fg_expand]: 1.50999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.26998e-06 [a_after_grad]: 7.97998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.329e-05 [a_3]: 3.126e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 [slice_cell_reuse_recomputed_activation]: 1.88997e-06 [rewriter_after_opt_a]: 3.223e-05 [convert_after_rewriter]: 7.25e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00045398 [opt_b]: 0.00018269, [1] [Cycle 1]: 0.0001764, 
[7] [b_1]: 0.00010882 [b_2]: 7.19001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 3.00002e-07 [cse]: 1.602e-05 [optimize_parallel_all_gather_comm]: 1.586e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.176e-05 [loop_unroll]: 0.00041448 [opt_after_cconv]: 9.393e-05, [1] [Cycle 1]: 8.829e-05, [7] [c_1]: 2.74e-05 [parameter_eliminate]: 2.13002e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.589e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.313e-05 [tuple_transform]: 6.929e-05, [1] [Cycle 1]: 6.492e-05, [4] [d_1]: 3.916e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.314e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.068e-05 [environ_conv]: 4.73001e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 3.31999e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.46998e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.122e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.47999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.91e-06 [overlap_grad_ring_attention]: 3.75e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.66e-05, [1] [Cycle 1]: 6.251e-05, [6] [build]: 2.28998e-06 [elim_shapecalc]: 8.25999e-06 [elim_not_effective]: 1.119e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.535e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.81999e-06 [opt_after_jit_grad]: 0.00044947 [validate]: 3.071e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0795729 [execute]: 8.94e-06 Sums bootstrap : 0.000481s : 0.54% type_inference : 0.005539s : 6.19% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.04% optimize.rewriter_before_opt_a : 0.000057s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000601s : 0.67% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000035s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000149s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000030s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000470s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.51% optimize.opt_b.b_1 : 0.000109s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000414s : 0.46% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000449s : 0.50% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.079573s : 88.86% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.81% : 0.000025s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.21% : 0.000005s : 4: substitution.graph_param_transform 66.86% : 0.000111s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.35% : 0.000004s : 4: substitution.remove_not_recompute_node 2.35% : 0.000004s : 4: substitution.replace_old_param 6.76% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005499 2 89.68% : 0.004931s : 1: type_inference.infer 10.32% : 0.000567s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.36% : 0.000027s : 3: replace.inline 29.64% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.40% : 0.000109s : 3: match.inline 8.60% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.74% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.46% : 0.000004s : 19: predicate.arithmetic_simplify 0.99% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.06% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.84% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 1.07% : 0.000002s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 1.01% : 0.000002s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000382 8 42.64% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.36% : 0.000219s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.102343 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.98% : 0.003051s : 1: add_attr 2.97% : 0.003043s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.50% : 0.000516s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.99% : 0.001011s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.20% : 0.002248s : 1: opt_a 0.10% : 0.000097s : 1: opt_after_cconv 0.45% : 0.000459s : 1: opt_after_jit_grad 0.18% : 0.000186s : 1: opt_b 4.01% : 0.004102s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000006s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.04% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.20% : 0.000201s : 1: renormalize.infer 0.26% : 0.000261s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.06% : 0.000061s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000069s : 1: symbol_engine_optimizer 77.77% : 0.079590s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 5.43% : 0.005553s : 1: type_inference 0.05% : 0.000052s : 1: validate TotalTime = 1.09871, [24] [bootstrap]: 0.00050203 [type_inference]: 0.0115363 [event_method]: 4.822e-05 [auto_monad]: 0.00011898 [graph_reusing]: 7.89002e-06 [inline]: 2.22999e-06 [add_attr]: 0.00307561, [1] [add_attr_with_inline]: 0.00306726, [1] [Cycle 1]: 7.165e-05, [2] [tag_attr]: 3.464e-05 [meta_addattr_fg_expand]: 1.009e-05 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 4.817e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.0146846, [53] [py_interpret_to_execute]: 3.763e-05 [rewriter_before_opt_a]: 0.000145 [opt_a]: 0.0123495, [3] [Cycle 1]: 0.00782664, [45] [expand_dump_flag]: 3.81001e-06 [switch_simplify]: 7.483e-05 [loop_unroll]: 6.203e-05 [a_1]: 0.00167113 [with_stream_mark]: 2.583e-05 [recompute_prepare]: 2.426e-05 [updatestate_depend_eliminate]: 9.05999e-06 [updatestate_assign_eliminate]: 7.77998e-06 [updatestate_loads_eliminate]: 7.28e-06 [parameter_eliminate]: 3.00998e-06 [a_2]: 0.00025551 [accelerated_algorithm]: 3.222e-05 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 3.94002e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.759e-05 [auto_parallel]: 1.085e-05 [parallel]: 1.972e-05 [flash_sp]: 1.143e-05 [merge_comm]: 9.09e-06 [allreduce_fusion]: 8.89e-06 [matmul_add_comm_reduction]: 2.716e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.776e-05 [virtual_dataset]: 1.567e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.526e-05 [merge_forward]: 9.32001e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 1.767e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.825e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.72e-05 [set_forward_comm_id_for_comm_node_pass]: 9.84001e-06 [meta_fg_expand]: 0.00146631 [flash_sp_send_recv_attached]: 3.99002e-06 [receive_attached]: 2.86e-06 
[after_resolve]: 5.959e-05 [a_after_grad]: 8.075e-05 [renormalize]: 0.00282045 [add_forward_monad_depend]: 9.75002e-06 [auto_monad_grad]: 6.15002e-06 [auto_monad_eliminator]: 5.963e-05 [cse]: 0.00019013 [a_3]: 0.00037788 [Cycle 2]: 0.00351008, [45] [expand_dump_flag]: 1.81e-06 [switch_simplify]: 5.323e-05 [loop_unroll]: 5.007e-05 [a_1]: 0.00185449 [with_stream_mark]: 1.357e-05 [recompute_prepare]: 1.258e-05 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 4.82e-06 [updatestate_loads_eliminate]: 3.91999e-06 [parameter_eliminate]: 1.08001e-06 [a_2]: 0.00014109 [accelerated_algorithm]: 1.397e-05 [shard]: 1.29e-06 [meta_shard_fg_expand]: 2.07001e-06 [shard_inline]: 1.04e-05 [merge_send_recv]: 6.95002e-06 [auto_parallel]: 7.59002e-06 [parallel]: 5.34e-06 [flash_sp]: 3.56001e-06 [merge_comm]: 5.42999e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 8.49998e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 1.153e-05 [virtual_dataset]: 1.012e-05 [get_grad_eliminate_]: 1.019e-05 [virtual_output]: 9.19e-06 [merge_forward]: 4.84998e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.91998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.898e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.617e-05 [set_forward_comm_id_for_comm_node_pass]: 5.63997e-06 [meta_fg_expand]: 8.296e-05 [flash_sp_send_recv_attached]: 1.22e-06 [receive_attached]: 1.45001e-06 [after_resolve]: 1.871e-05 [a_after_grad]: 1.622e-05 [renormalize]: 0.00066938 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.34998e-06 [auto_monad_eliminator]: 1.61e-05 [cse]: 5.05e-05 [a_3]: 7.448e-05 [Cycle 3]: 0.00099731, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 1.227e-05 [loop_unroll]: 1.045e-05 [a_1]: 0.00028298 [with_stream_mark]: 1.092e-05 [recompute_prepare]: 1.072e-05 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 4.24002e-06 [updatestate_loads_eliminate]: 3.98001e-06 
[parameter_eliminate]: 9.60019e-07 [a_2]: 0.00013962 [accelerated_algorithm]: 1.299e-05 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.88998e-06 [merge_send_recv]: 7.91001e-06 [auto_parallel]: 7.85e-06 [parallel]: 4.90001e-06 [flash_sp]: 1.22999e-06 [merge_comm]: 5.69e-06 [allreduce_fusion]: 5.17999e-06 [matmul_add_comm_reduction]: 7.92998e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.124e-05 [virtual_dataset]: 9.97001e-06 [get_grad_eliminate_]: 9.62001e-06 [virtual_output]: 9.36e-06 [merge_forward]: 4.68999e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 1.024e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.908e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.668e-05 [set_forward_comm_id_for_comm_node_pass]: 6.00002e-06 [meta_fg_expand]: 3.14999e-06 [flash_sp_send_recv_attached]: 9.79984e-07 [receive_attached]: 1.17e-06 [after_resolve]: 1.491e-05 [a_after_grad]: 1.605e-05 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 1.17999e-06 [auto_monad_eliminator]: 1.143e-05 [cse]: 2.719e-05 [a_3]: 5.879e-05 [py_interpret_to_execute_after_opt_a]: 1.079e-05 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 4.817e-05 [convert_after_rewriter]: 9.32001e-06 [order_py_execute_after_rewriter]: 6.72002e-06 [mutable_eliminate]: 0.00046684 [opt_b]: 0.0002912, [1] [Cycle 1]: 0.00028489, [7] [b_1]: 0.00019032 [b_2]: 1.058e-05 [updatestate_depend_eliminate]: 7.83999e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 4.12e-06 [renormalize]: 4.89992e-07 [cse]: 3.277e-05 [optimize_parallel_all_gather_comm]: 1.974e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 1.96e-05 [loop_unroll]: 0.00044726 [opt_after_cconv]: 0.00013805, [1] [Cycle 1]: 0.00013173, [7] [c_1]: 4.818e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 7.52998e-06 [updatestate_assign_eliminate]: 4.21001e-06 
[updatestate_loads_eliminate]: 3.88001e-06 [cse]: 3.102e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 2.94e-05 [tuple_transform]: 0.00010113, [1] [Cycle 1]: 9.651e-05, [4] [d_1]: 6.627e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.86e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 5.714e-05 [cse_after_recomputation]: 3.29e-05, [1] [Cycle 1]: 2.785e-05, [1] [cse]: 2.219e-05 [environ_conv]: 8.82e-06 [swap_dp_allreduce_reducescatter]: 8.08001e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.79e-06 [label_fine_grained_interleaved_index]: 3.13998e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.77002e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.01002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.72e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 5.56998e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.08002e-06 [overlap_grad_ring_attention]: 5.12e-06 [overlap_grad_flash_sp]: 2.524e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010172, [1] [Cycle 1]: 9.701e-05, [6] [build]: 9.74e-06 [elim_shapecalc]: 1.454e-05 [elim_not_effective]: 1.867e-05 [opt_reshape]: 1.008e-05 [fold_const_symbol]: 1.504e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 2.516e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00047569 [validate]: 4.662e-05 [backend_pass]: 9.20001e-07 [task_emit]: 1.06789 [execute]: 9.12001e-06 Sums bootstrap : 0.000502s : 0.05% type_inference : 0.011536s : 1.05% event_method : 0.000048s : 0.00% auto_monad : 0.000119s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000145s : 0.01% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000140s : 0.01% optimize.opt_a.loop_unroll : 0.000123s : 0.01% optimize.opt_a.a_1 : 0.003809s : 0.35% optimize.opt_a.with_stream_mark : 0.000050s : 0.00% optimize.opt_a.recompute_prepare : 0.000048s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000536s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000036s : 0.00% optimize.opt_a.merge_send_recv : 0.000032s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 
0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.00% optimize.opt_a.virtual_dataset : 0.000036s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.00% optimize.opt_a.virtual_output : 0.000034s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000060s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001552s : 0.14% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000093s : 0.01% optimize.opt_a.a_after_grad : 0.000113s : 0.01% optimize.opt_a.renormalize : 0.003490s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000087s : 0.01% optimize.opt_a.cse : 0.000268s : 0.02% optimize.opt_a.a_3 : 0.000511s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000467s : 0.04% optimize.opt_b.b_1 : 0.000190s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000447s : 0.04% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.00% optimize.tuple_transform.d_1 : 0.000066s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000476s : 0.04% validate : 0.000047s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.067889s : 97.59% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000818 222 6.65% : 0.000054s : 12: substitution.arithmetic_simplify 1.84% : 0.000015s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 1.22% : 0.000010s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.91% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.40% : 0.000445s : 17: substitution.inline 1.96% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000016s : 3: substitution.less_batch_normalization 1.70% : 0.000014s : 11: substitution.minmaximum_grad 0.64% : 0.000005s : 5: substitution.partial_eliminate 1.96% : 0.000016s : 20: substitution.remove_not_recompute_node 3.21% : 0.000026s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.49% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.77% : 0.000072s : 30: substitution.tuple_list_get_item_eliminator 2.35% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011462 2 86.78% : 0.009947s : 1: type_inference.infer 13.22% : 0.001515s : 1: type_inference.specialize ------[replace.] 0.000297 33 44.01% : 0.000131s : 17: replace.inline 55.99% : 0.000166s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000473 33 92.10% : 0.000436s : 17: match.inline 7.90% : 0.000037s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000816 5764 1.11% : 0.000009s : 68: predicate.accumulaten_eliminater 0.24% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000009s : 68: predicate.addn_zero_filter 1.06% : 0.000009s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000017s : 100: predicate.arithmetic_simplify 1.19% : 0.000010s : 68: predicate.cast_eliminate 1.21% : 0.000010s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.56% : 0.000005s : 32: predicate.depend_value_elim 1.19% : 0.000010s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.23% : 0.000010s : 68: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000010s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000010s : 76: predicate.environ_get_depend_swap 1.77% : 0.000014s : 108: predicate.environ_get_eliminate 1.20% : 0.000010s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000014s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000019s : 101: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000046s : 249: predicate.inline 1.23% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000006s : 32: predicate.less_batch_normalization 
1.61% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000022s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000019s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.21% : 0.000002s : 8: predicate.parallel_virtual_node 2.15% : 0.000018s : 101: predicate.partial_defer_inline 1.66% : 0.000014s : 92: predicate.partial_eliminate 1.06% : 0.000009s : 68: predicate.print_const_string_wrapper 0.56% : 0.000005s : 32: predicate.reduce_all_const_elim 1.31% : 0.000011s : 68: predicate.reduce_eliminate 2.68% : 0.000022s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000016s : 152: predicate.replace_applicator 0.56% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.13% : 0.000009s : 68: predicate.reshape_eliminate 1.20% : 0.000010s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.36% : 0.000011s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000015s : 101: predicate.switch_defer_inline 2.97% : 0.000024s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000041s : 277: predicate.switch_simplify 1.10% : 0.000009s : 68: predicate.tile_eliminate 1.12% : 0.000009s : 68: predicate.transpose_eliminate 1.39% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.42% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.68% : 0.000022s : 168: predicate.updatestate_pure_node_eliminater 3.24% : 0.000026s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.12% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001648 34 55.66% : 0.000917s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.34% : 0.000731s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.125888 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.27% : 0.003080s : 1: add_attr 0.27% : 0.003071s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000126s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000033s : 1: bias_add_comm_swap 0.05% : 0.000537s : 1: bootstrap 0.00% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.00% : 0.000055s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.04% : 0.000456s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000476s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.50% : 0.005606s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000175s : 28: opt.transform.opt_b 0.01% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000055s : 4: opt.transform.symbol_engine_opt 1.10% : 0.012353s : 1: opt_a 0.01% : 0.000141s : 1: opt_after_cconv 0.04% : 0.000485s : 1: opt_after_jit_grad 0.03% : 0.000295s : 1: opt_b 1.30% : 0.014689s : 1: optimize 0.00% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000053s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 0.17% : 0.001890s : 2: renormalize.infer 0.14% : 0.001586s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000052s : 1: rewriter_after_opt_a 0.01% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000104s : 1: symbol_engine_optimizer 94.85% : 1.067912s : 1: task_emit 0.01% : 0.000104s : 1: 
tuple_transform 1.03% : 0.011552s : 1: type_inference 0.01% : 0.000072s : 1: validate TotalTime = 0.0728, [24] [bootstrap]: 0.00048838 [type_inference]: 0.00439685 [event_method]: 1.047e-05 [auto_monad]: 5.193e-05 [graph_reusing]: 5.24003e-06 [inline]: 1.78997e-06 [add_attr]: 0.0030452, [1] [add_attr_with_inline]: 0.00303711, [1] [Cycle 1]: 4.576e-05, [2] [tag_attr]: 1.21e-05 [meta_addattr_fg_expand]: 3.12002e-06 [parallel-infer-symbol]: 2.68998e-06 [pre_auto_parallel]: 2.094e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00381108, [53] [py_interpret_to_execute]: 1.521e-05 [rewriter_before_opt_a]: 3.694e-05 [opt_a]: 0.00194995, [2] [Cycle 1]: 0.00132407, [45] [expand_dump_flag]: 3.08998e-06 [switch_simplify]: 2.435e-05 [loop_unroll]: 1.368e-05 [a_1]: 0.00029355 [with_stream_mark]: 1.294e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.31001e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 3.41001e-06 [parameter_eliminate]: 1.48002e-06 [a_2]: 7.676e-05 [accelerated_algorithm]: 6.81001e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.42999e-06 [shard_inline]: 5.93002e-06 [merge_send_recv]: 7.47002e-06 [auto_parallel]: 5.84e-06 [parallel]: 2.09e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.74002e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 8.43999e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.10998e-06 [virtual_dataset]: 5.79999e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 4.07e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.093e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.59999e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 
2.58e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 8.74e-06 [renormalize]: 0.00040801 [add_forward_monad_depend]: 4.16001e-06 [auto_monad_grad]: 1.95001e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.706e-05 [a_3]: 4.004e-05 [Cycle 2]: 0.00061661, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00014231 [with_stream_mark]: 9.51998e-06 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.907e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.24003e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.91999e-06 [matmul_add_comm_reduction]: 4.90999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87999e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.23999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 8.70999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.36998e-06 [cse]: 1.371e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 1.75001e-06 [rewriter_after_opt_a]: 3.1e-05 [convert_after_rewriter]: 6.74001e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00050319 [opt_b]: 0.0001836, [1] [Cycle 1]: 0.00017769, [7] 
[b_1]: 0.00010862 [b_2]: 7.39002e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 5.19998e-07 [cse]: 1.692e-05 [optimize_parallel_all_gather_comm]: 1.629e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 2.195e-05 [loop_unroll]: 0.00041463 [opt_after_cconv]: 9.594e-05, [1] [Cycle 1]: 8.988e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.685e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.287e-05 [tuple_transform]: 6.99e-05, [1] [Cycle 1]: 6.563e-05, [4] [d_1]: 4.028e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.369e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.639e-05, [1] [cse]: 1.142e-05 [environ_conv]: 4.2e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.46998e-06 [label_micro_interleaved_index]: 4.51002e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.08002e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.118e-05 [grouped_pairwise_exchange_alltoall]: 1.96998e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.737e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.678e-05, [1] [Cycle 1]: 6.28e-05, [6] [build]: 2.43998e-06 [elim_shapecalc]: 7.97e-06 [elim_not_effective]: 1.104e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.50999e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.599e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00044956 [validate]: 3.207e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0602387 [execute]: 8.55999e-06 Sums bootstrap : 0.000488s : 0.71% type_inference : 0.004397s : 6.39% event_method : 0.000010s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000037s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000436s : 0.63% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000025s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000408s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000503s : 0.73% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000450s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060239s : 87.58% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.09% : 0.000022s : 4: substitution.arithmetic_simplify 1.40% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000005s : 4: substitution.graph_param_transform 65.58% : 0.000080s : 2: substitution.inline 2.16% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.81% : 0.000005s : 4: substitution.remove_not_recompute_node 3.73% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004358 2 91.82% : 0.004001s : 1: type_inference.infer 8.18% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000140 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.34% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.99% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.43% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.15% : 0.000002s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.50% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.45% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.62% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 41.85% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.15% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081008 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.76% : 0.003049s : 1: add_attr 3.75% : 0.003040s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000525s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.63% : 0.000512s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.97% : 0.000788s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.41% : 0.001953s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.57% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000187s : 1: opt_b 4.71% : 0.003815s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.31% : 0.000248s : 1: renormalize.infer 0.19% : 0.000153s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000041s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000069s : 1: symbol_engine_optimizer 74.38% : 0.060257s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.44% : 0.004409s : 1: type_inference 0.07% : 0.000055s : 1: validate TotalTime = 0.120231, [24] [bootstrap]: 0.00052232 [type_inference]: 0.0106341 [event_method]: 4.456e-05 [auto_monad]: 0.00012047 [graph_reusing]: 7.41001e-06 [inline]: 2.06e-06 [add_attr]: 0.00332704, [1] [add_attr_with_inline]: 0.00331674, [1] [Cycle 1]: 8.288e-05, [2] [tag_attr]: 3.434e-05 [meta_addattr_fg_expand]: 8.70999e-06 [parallel-infer-symbol]: 3.69002e-06 [pre_auto_parallel]: 6.737e-05 [insert-virtual-dataset]: 2.74001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0152158, [53] [py_interpret_to_execute]: 4.093e-05 [rewriter_before_opt_a]: 0.00013418 [opt_a]: 0.0125279, [3] [Cycle 1]: 0.0080367, [45] [expand_dump_flag]: 3.51001e-06 [switch_simplify]: 6.849e-05 [loop_unroll]: 5.716e-05 [a_1]: 0.00144837 [with_stream_mark]: 3.03e-05 [recompute_prepare]: 2.373e-05 [updatestate_depend_eliminate]: 1.004e-05 [updatestate_assign_eliminate]: 7.8e-06 [updatestate_loads_eliminate]: 7.08e-06 [parameter_eliminate]: 2.88998e-06 [a_2]: 0.00024982 [accelerated_algorithm]: 3.548e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 3.95e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.785e-05 [auto_parallel]: 1.245e-05 [parallel]: 2.026e-05 [flash_sp]: 1.289e-05 [merge_comm]: 1.039e-05 [allreduce_fusion]: 8.98002e-06 [matmul_add_comm_reduction]: 3.005e-05 [allreduce_slice_to_reducescatter]: 1.03001e-06 [virtual_shard_identity]: 2.042e-05 [virtual_dataset]: 1.59e-05 [get_grad_eliminate_]: 1.51e-05 [virtual_output]: 1.577e-05 [merge_forward]: 1.08e-05 [cell_reuse_recompute_pass]: 1.82001e-06 [offload_activation]: 1.904e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.116e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.882e-05 [set_forward_comm_id_for_comm_node_pass]: 1.02e-05 [meta_fg_expand]: 0.00172114 [flash_sp_send_recv_attached]: 4.70999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 6.521e-05 
[a_after_grad]: 8.92e-05 [renormalize]: 0.00294197 [add_forward_monad_depend]: 1.154e-05 [auto_monad_grad]: 6.16e-06 [auto_monad_eliminator]: 5.815e-05 [cse]: 0.00017721 [a_3]: 0.00038155 [Cycle 2]: 0.00344092, [45] [expand_dump_flag]: 3.11001e-06 [switch_simplify]: 4.957e-05 [loop_unroll]: 4.503e-05 [a_1]: 0.0016622 [with_stream_mark]: 2.031e-05 [recompute_prepare]: 1.456e-05 [updatestate_depend_eliminate]: 6.24001e-06 [updatestate_assign_eliminate]: 5.27999e-06 [updatestate_loads_eliminate]: 4.43001e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 0.00013206 [accelerated_algorithm]: 1.292e-05 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 2.78e-06 [shard_inline]: 9.42999e-06 [merge_send_recv]: 9.94999e-06 [auto_parallel]: 1.186e-05 [parallel]: 7.92e-06 [flash_sp]: 3.58e-06 [merge_comm]: 5.86e-06 [allreduce_fusion]: 5.07e-06 [matmul_add_comm_reduction]: 1.118e-05 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 1.132e-05 [virtual_dataset]: 8.96998e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.42998e-06 [merge_forward]: 6.32001e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 1.367e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.928e-05 [merge_recompute_call_nodes]: 1.30001e-06 [before_grad]: 1.447e-05 [set_forward_comm_id_for_comm_node_pass]: 6.69001e-06 [meta_fg_expand]: 4.362e-05 [flash_sp_send_recv_attached]: 1.86e-06 [receive_attached]: 2.00002e-06 [after_resolve]: 1.821e-05 [a_after_grad]: 1.487e-05 [renormalize]: 0.00080134 [add_forward_monad_depend]: 5.95002e-06 [auto_monad_grad]: 1.71998e-06 [auto_monad_eliminator]: 1.955e-05 [cse]: 5.551e-05 [a_3]: 6.958e-05 [Cycle 3]: 0.00103268, [45] [expand_dump_flag]: 2.12999e-06 [switch_simplify]: 1.048e-05 [loop_unroll]: 9.15001e-06 [a_1]: 0.00029748 [with_stream_mark]: 1.499e-05 [recompute_prepare]: 1.088e-05 [updatestate_depend_eliminate]: 5.67999e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 4.07e-06 [parameter_eliminate]: 
1.14e-06 [a_2]: 0.0001284 [accelerated_algorithm]: 1.433e-05 [shard]: 1.57001e-06 [meta_shard_fg_expand]: 2.54001e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 8.57e-06 [auto_parallel]: 8.22e-06 [parallel]: 6.41e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 5.41002e-06 [allreduce_fusion]: 4.92e-06 [matmul_add_comm_reduction]: 8.69e-06 [allreduce_slice_to_reducescatter]: 9.10019e-07 [virtual_shard_identity]: 1.057e-05 [virtual_dataset]: 8.73001e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.34002e-06 [merge_forward]: 4.80001e-06 [cell_reuse_recompute_pass]: 2.21e-06 [offload_activation]: 1.101e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.861e-05 [merge_recompute_call_nodes]: 1.02e-06 [before_grad]: 1.54e-05 [set_forward_comm_id_for_comm_node_pass]: 6.04001e-06 [meta_fg_expand]: 3.48999e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.97999e-06 [after_resolve]: 1.552e-05 [a_after_grad]: 1.479e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 2.83e-06 [auto_monad_grad]: 1.35001e-06 [auto_monad_eliminator]: 1.45e-05 [cse]: 3.271e-05 [a_3]: 6.201e-05 [py_interpret_to_execute_after_opt_a]: 1.654e-05 [slice_cell_reuse_recomputed_activation]: 2.03002e-06 [rewriter_after_opt_a]: 5.395e-05 [convert_after_rewriter]: 1.027e-05 [order_py_execute_after_rewriter]: 6.79999e-06 [mutable_eliminate]: 0.00068664 [opt_b]: 0.00030708, [1] [Cycle 1]: 0.00029867, [7] [b_1]: 0.00019223 [b_2]: 1.227e-05 [updatestate_depend_eliminate]: 9.22001e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 5.90022e-07 [cse]: 3.694e-05 [optimize_parallel_all_gather_comm]: 2.445e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.779e-05 [loop_unroll]: 0.00050014 [opt_after_cconv]: 0.00014981, [1] [Cycle 1]: 0.00014241, [7] [c_1]: 4.984e-05 [parameter_eliminate]: 3.36999e-06 [updatestate_depend_eliminate]: 8.28999e-06 [updatestate_assign_eliminate]: 4.63001e-06 [updatestate_loads_eliminate]: 3.98999e-06 [cse]: 
3.59e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 4.136e-05 [tuple_transform]: 0.00010775, [1] [Cycle 1]: 0.00010284, [4] [d_1]: 7.016e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.061e-05 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 6.606e-05 [cse_after_recomputation]: 3.715e-05, [1] [Cycle 1]: 3.157e-05, [1] [cse]: 2.495e-05 [environ_conv]: 1.088e-05 [swap_dp_allreduce_reducescatter]: 8.35001e-06 [bias_add_comm_swap]: 2.67001e-06 [label_micro_interleaved_index]: 5.22e-06 [label_fine_grained_interleaved_index]: 2.43002e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 8.59989e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.833e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 4.92999e-06 [overlap_recompute_and_grad_model_parallel]: 5.59998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.48002e-06 [overlap_grad_ring_attention]: 5.02999e-06 [overlap_grad_flash_sp]: 2.951e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010781, [1] [Cycle 1]: 0.00010303, [6] [build]: 1.056e-05 [elim_shapecalc]: 1.604e-05 [elim_not_effective]: 1.88e-05 [opt_reshape]: 1.027e-05 [fold_const_symbol]: 1.484e-05 [renormalize]: 2.00002e-07 [detach_backward]: 2.14e-06 
[pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 2.732e-05 [get_jit_bprop_graph]: 1.71e-06 [rewriter_after_jit_bprop_graph]: 6.44999e-06 [opt_after_jit_grad]: 0.00053657 [validate]: 5.589e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0893885 [execute]: 9.14e-06 Sums bootstrap : 0.000522s : 0.45% type_inference : 0.010634s : 9.21% event_method : 0.000045s : 0.04% auto_monad : 0.000120s : 0.10% graph_reusing : 0.000007s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000067s : 0.06% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.04% optimize.rewriter_before_opt_a : 0.000134s : 0.12% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000129s : 0.11% optimize.opt_a.loop_unroll : 0.000111s : 0.10% optimize.opt_a.a_1 : 0.003408s : 2.95% optimize.opt_a.with_stream_mark : 0.000066s : 0.06% optimize.opt_a.recompute_prepare : 0.000049s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000510s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.05% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000036s : 0.03% optimize.opt_a.auto_parallel : 0.000033s : 0.03% optimize.opt_a.parallel : 0.000035s : 0.03% optimize.opt_a.flash_sp : 0.000018s : 0.02% optimize.opt_a.merge_comm : 0.000022s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000050s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000042s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000022s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000069s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000059s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.02% optimize.opt_a.meta_fg_expand : 0.001768s : 1.53% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000007s : 0.01% optimize.opt_a.after_resolve : 0.000099s : 0.09% optimize.opt_a.a_after_grad : 0.000119s : 0.10% optimize.opt_a.renormalize : 0.003743s : 3.24% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.08% optimize.opt_a.cse : 0.000265s : 0.23% optimize.opt_a.a_3 : 0.000513s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000054s : 0.05% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000687s : 0.59% optimize.opt_b.b_1 : 0.000192s : 0.17% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000037s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000028s : 0.02% optimize.loop_unroll : 0.000500s : 0.43% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000036s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000041s : 0.04% optimize.tuple_transform.d_1 : 0.000070s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000066s : 0.06% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000011s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000030s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000537s : 0.46% validate : 0.000056s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.089388s : 77.43% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000929 218 6.04% : 0.000056s : 11: substitution.arithmetic_simplify 1.99% : 0.000019s : 2: substitution.cast_eliminate 0.30% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000004s : 5: substitution.float_depend_g_call 0.47% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 5: substitution.fold_const_symbol 0.91% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 57.06% : 0.000530s : 16: substitution.inline 2.25% : 0.000021s : 2: substitution.inline_without_move 1.23% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000019s : 3: substitution.less_batch_normalization 1.68% : 0.000016s : 11: substitution.minmaximum_grad 0.74% : 0.000007s : 5: substitution.partial_eliminate 1.66% : 0.000015s : 20: substitution.remove_not_recompute_node 3.50% : 0.000032s : 10: substitution.replace_applicator 1.36% : 0.000013s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.34% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.59% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.10% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.94% : 0.000074s : 28: substitution.tuple_list_get_item_eliminator 2.27% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010554 2 85.82% : 0.009058s : 1: type_inference.infer 14.18% : 0.001496s : 1: type_inference.specialize ------[replace.] 0.000223 30 61.42% : 0.000137s : 16: replace.inline 38.58% : 0.000086s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000555 30 93.58% : 0.000520s : 16: match.inline 6.42% : 0.000036s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000765 5663 1.04% : 0.000008s : 67: predicate.accumulaten_eliminater 0.35% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.21% : 0.000017s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.11% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.51% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.23% : 0.000002s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.14% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.37% : 0.000018s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.63% : 0.000005s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.71% : 0.000044s : 244: predicate.inline 1.38% : 0.000011s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.55% : 0.000020s : 164: predicate.load_eliminater 0.44% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000009s : 67: predicate.minmaximum_grad 0.57% : 0.000004s : 8: predicate.mutable_eliminate 0.18% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000016s : 97: predicate.partial_defer_inline 1.64% : 0.000013s : 89: predicate.partial_eliminate 1.02% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.23% : 0.000009s : 67: predicate.reduce_eliminate 2.57% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.41% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000015s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.47% : 0.000004s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.36% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.77% : 0.000014s : 97: predicate.switch_defer_inline 2.85% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000038s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.41% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.55% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.55% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.18% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001704 32 56.09% : 0.000956s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.91% : 0.000748s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.148014 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.25% : 0.003332s : 1: add_attr 2.24% : 0.003321s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000072s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000129s : 1: auto_monad 0.02% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.38% : 0.000557s : 1: bootstrap 0.02% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.03% : 0.000040s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000015s : 1: environ_conv 0.04% : 0.000053s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.34% : 0.000511s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.47% : 0.000698s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 3.49% : 0.005166s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.03% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000178s : 28: opt.transform.opt_b 0.05% : 0.000079s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000056s : 4: opt.transform.symbol_engine_opt 8.47% : 0.012531s : 1: opt_a 0.10% : 0.000154s : 1: opt_after_cconv 0.37% : 0.000548s : 1: opt_after_jit_grad 0.21% : 0.000311s : 1: opt_b 10.28% : 0.015221s : 1: optimize 0.02% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000074s : 1: pre_auto_parallel 0.03% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000046s : 1: remove_dup_value 1.40% : 0.002077s : 2: renormalize.infer 1.11% : 0.001649s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000059s : 1: rewriter_after_opt_a 0.09% : 0.000140s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000111s : 1: symbol_engine_optimizer 60.41% : 0.089412s : 1: task_emit 0.07% : 0.000111s : 1: 
tuple_transform 7.20% : 0.010655s : 1: type_inference 0.06% : 0.000090s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x5-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x6-pynative],max_mem:6.0M TotalTime = 0.0223077, [24] [bootstrap]: 0.00056137 [type_inference]: 0.0064187 [event_method]: 1.47e-05 [auto_monad]: 5.723e-05 [graph_reusing]: 5.42001e-06 [inline]: 1.57999e-06 [add_attr]: 0.00350441, [1] [add_attr_with_inline]: 0.00349319, [1] [Cycle 1]: 4.838e-05, [2] [tag_attr]: 1.555e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 2.97002e-06 [pre_auto_parallel]: 2.927e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.32001e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0040806, [53] [py_interpret_to_execute]: 2.046e-05 [rewriter_before_opt_a]: 5.739e-05 [opt_a]: 0.00217706, [2] [Cycle 1]: 0.00157405, [45] [expand_dump_flag]: 2.73003e-06 [switch_simplify]: 3.122e-05 [loop_unroll]: 2.05e-05 [a_1]: 0.00045848 [with_stream_mark]: 1.445e-05 [recompute_prepare]: 7.53e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.411e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 7.87e-06 [auto_parallel]: 6.11e-06 [parallel]: 2.724e-05 [flash_sp]: 7.76001e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 8.67e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.53e-06 [virtual_dataset]: 5.87999e-06 
[get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.064e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.18998e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.19999e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 8.47998e-06 [renormalize]: 0.00047268 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 2.16998e-06 [auto_monad_eliminator]: 1.298e-05 [cse]: 2.737e-05 [a_3]: 3.988e-05 [Cycle 2]: 0.00059298, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012517 [with_stream_mark]: 1.021e-05 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.85998e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.752e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.27999e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.11997e-06 [parallel]: 4.31002e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.24998e-06 [get_grad_eliminate_]: 4.94998e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.62001e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 7.85e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.27999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.32001e-06 [cse]: 1.652e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 8.37998e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.123e-05 [convert_after_rewriter]: 6.81999e-06 [order_py_execute_after_rewriter]: 5.27001e-06 [mutable_eliminate]: 0.00050746 [opt_b]: 0.00018248, [1] [Cycle 1]: 0.00017628, [7] [b_1]: 0.00010793 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.50003e-07 [cse]: 1.639e-05 [optimize_parallel_all_gather_comm]: 1.537e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.34e-05 [loop_unroll]: 0.0004169 [opt_after_cconv]: 9.54e-05, [1] [Cycle 1]: 8.966e-05, [7] [c_1]: 2.85e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.72001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.599e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.297e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.414e-05, [4] [d_1]: 3.861e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.23002e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 5.099e-05 [cse_after_recomputation]: 2.036e-05, [1] [Cycle 1]: 1.612e-05, [1] [cse]: 1.123e-05 [environ_conv]: 4.49002e-06 [swap_dp_allreduce_reducescatter]: 5.54e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.67999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.24998e-06 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 9.30013e-07 
[interleave_split_concat_branches]: 1.42e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96998e-06 [control_data_broadcast_order]: 1.151e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 3.49001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.771e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.85001e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.73e-05, [1] [Cycle 1]: 6.325e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.135e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.82999e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 0.00011895 [opt_after_jit_grad]: 0.00045839 [validate]: 3.276e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00677799 [execute]: 7.05e-06 Sums bootstrap : 0.000561s : 3.15% type_inference : 0.006419s : 35.99% event_method : 0.000015s : 0.08% auto_monad : 0.000057s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000057s : 0.32% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.14% optimize.opt_a.a_1 : 0.000584s : 3.27% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000142s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000032s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000473s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000507s : 2.85% optimize.opt_b.b_1 : 0.000108s : 0.61% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000417s : 2.34% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000119s : 0.67% opt_after_jit_grad : 0.000458s : 2.57% validate : 0.000033s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006778s : 38.00% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000170 30 14.85% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 67.30% : 0.000114s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000005s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.20% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006372 2 90.76% : 0.005783s : 1: type_inference.infer 9.24% : 0.000589s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.81% : 0.000027s : 3: replace.inline 29.19% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.21% : 0.000112s : 3: match.inline 7.79% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.84% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.92% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 45.21% : 0.000173s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.79% : 0.000210s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031462 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.15% : 0.003508s : 1: add_attr 11.12% : 0.003497s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000598s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000517s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.00% : 0.000945s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.93% : 0.002180s : 1: opt_a 0.31% : 0.000099s : 1: opt_after_cconv 1.49% : 0.000468s : 1: opt_after_jit_grad 0.59% : 0.000186s : 1: opt_b 12.98% : 0.004084s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.78% : 0.000246s : 1: renormalize.infer 0.70% : 0.000220s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000125s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.19% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000070s : 1: symbol_engine_optimizer 21.58% : 0.006789s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.45% : 0.006433s : 1: type_inference 0.21% : 0.000065s : 1: validate TotalTime = 0.0182191, [24] [bootstrap]: 0.00046165 [type_inference]: 0.00435936 [event_method]: 1.053e-05 [auto_monad]: 5.063e-05 [graph_reusing]: 5.04e-06 [inline]: 2.02999e-06 [add_attr]: 0.00295267, [1] [add_attr_with_inline]: 0.00294448, [1] [Cycle 1]: 4.673e-05, [2] [tag_attr]: 1.134e-05 [meta_addattr_fg_expand]: 3.46001e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 7.959e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00369421, [53] [py_interpret_to_execute]: 1.563e-05 [rewriter_before_opt_a]: 3.936e-05 [opt_a]: 0.00187925, [2] [Cycle 1]: 0.00128316, [45] [expand_dump_flag]: 2.80002e-06 [switch_simplify]: 2.485e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.0002958 [with_stream_mark]: 1.354e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.665e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 5.74999e-06 [parallel]: 1.801e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.39001e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 9.22001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.20998e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.75001e-06 [virtual_output]: 5.66998e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.125e-05 [merge_recompute_call_nodes]: 1.41998e-06 [before_grad]: 1.017e-05 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.69999e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.065e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.0003643 [add_forward_monad_depend]: 5.24e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.267e-05 [cse]: 2.799e-05 [a_3]: 3.991e-05 [Cycle 2]: 0.0005867, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012479 [with_stream_mark]: 8.91997e-06 [recompute_prepare]: 5.43002e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.73e-05 [accelerated_algorithm]: 5.39998e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.44998e-06 [merge_send_recv]: 4.20999e-06 [auto_parallel]: 5.18002e-06 [parallel]: 4.70001e-06 [flash_sp]: 3.03003e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 5.42999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.83997e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 5.01002e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.43998e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.43001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.97002e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.22e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 5.87001e-06 [cse]: 1.232e-05 [a_3]: 3.133e-05 [py_interpret_to_execute_after_opt_a]: 7.60998e-06 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 3.292e-05 [convert_after_rewriter]: 6.85998e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00045833 [opt_b]: 0.00018029, [1] [Cycle 1]: 
0.00017465, [7] [b_1]: 0.00010774 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.38002e-06 [renormalize]: 3.7998e-07 [cse]: 1.574e-05 [optimize_parallel_all_gather_comm]: 1.586e-05 [overlap_param_gather]: 2.12999e-06 [cconv]: 2.27e-05 [loop_unroll]: 0.00041135 [opt_after_cconv]: 9.344e-05, [1] [Cycle 1]: 8.78e-05, [7] [c_1]: 2.795e-05 [parameter_eliminate]: 2.02001e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.07001e-06 [cse]: 1.624e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.221e-05 [tuple_transform]: 6.808e-05, [1] [Cycle 1]: 6.361e-05, [4] [d_1]: 3.852e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.58002e-06 [add_recomputation]: 4.222e-05 [cse_after_recomputation]: 1.968e-05, [1] [Cycle 1]: 1.511e-05, [1] [cse]: 1.023e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.24002e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.16002e-06 [slice_recompute_activation]: 2.36998e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 3.06999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.12e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.199e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.731e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.91998e-06 [split_layernorm_comm]: 1.97999e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.991e-05, [1] [Cycle 1]: 6.565e-05, [6] [build]: 2.56998e-06 [elim_shapecalc]: 9.07001e-06 [elim_not_effective]: 1.167e-05 [opt_reshape]: 6.36998e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.486e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.00048299 [validate]: 3.18e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00585197 [execute]: 7.46999e-06 Sums bootstrap : 0.000462s : 3.23% type_inference : 0.004359s : 30.46% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000080s : 0.56% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000421s : 2.94% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000364s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000458s : 3.20% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000411s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000483s : 3.37% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005852s : 40.89% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.33% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.25% : 0.000002s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 65.26% : 0.000079s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 3.31% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004319 2 91.92% : 0.003970s : 1: type_inference.infer 8.08% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.78% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.74% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.94% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 1.14% : 0.000002s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.84% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.99% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.65% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.16% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.84% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026164 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.30% : 0.002957s : 1: add_attr 11.27% : 0.002948s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000499s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000468s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000772s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.13% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.19% : 0.001882s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.88% : 0.000493s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.13% : 0.003698s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.32% : 0.000084s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.78% : 0.000204s : 1: renormalize.infer 0.59% : 0.000155s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000037s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000072s : 1: symbol_engine_optimizer 22.40% : 0.005862s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.71% : 0.004373s : 1: type_inference 0.23% : 0.000059s : 1: validate TotalTime = 0.0197692, [24] [bootstrap]: 0.00046443 [type_inference]: 0.00548746 [event_method]: 1.392e-05 [auto_monad]: 8.856e-05 [graph_reusing]: 5.96e-06 [inline]: 2.27999e-06 [add_attr]: 0.00292829, [1] [add_attr_with_inline]: 0.00292038, [1] [Cycle 1]: 4.583e-05, [2] [tag_attr]: 1.613e-05 [meta_addattr_fg_expand]: 4.03999e-06 [parallel-infer-symbol]: 2.64999e-06 [pre_auto_parallel]: 2.548e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.60999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00396117, [53] [py_interpret_to_execute]: 2.004e-05 [rewriter_before_opt_a]: 5.692e-05 [opt_a]: 0.00214389, [2] [Cycle 1]: 0.00154747, [45] [expand_dump_flag]: 2.57001e-06 [switch_simplify]: 3.226e-05 [loop_unroll]: 2.04e-05 [a_1]: 0.00044829 [with_stream_mark]: 1.277e-05 [recompute_prepare]: 7.87e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.588e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 7.43e-06 [auto_parallel]: 5.86e-06 [parallel]: 1.784e-05 [flash_sp]: 7.25e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.35998e-06 [virtual_dataset]: 6.10002e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.943e-05 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 9.43002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.185e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 9.47999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.51998e-06 [flash_sp_send_recv_attached]: 2.97002e-06 [receive_attached]: 2.41e-06 
[after_resolve]: 1.072e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00042117 [add_forward_monad_depend]: 4.89e-06 [auto_monad_grad]: 1.96e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 2.662e-05 [a_3]: 4.115e-05 [Cycle 2]: 0.00058728, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012461 [with_stream_mark]: 9.49e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.666e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.74e-06 [flash_sp]: 2.86e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 4.91997e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 6.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.07999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13998e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.35001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.01e-06 [cse]: 1.278e-05 [a_3]: 3.158e-05 [py_interpret_to_execute_after_opt_a]: 7.83999e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.094e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 5.27001e-06 [mutable_eliminate]: 0.00044508 [opt_b]: 0.00018328, [1] [Cycle 1]: 0.00017722, [7] [b_1]: 
0.00010968 [b_2]: 7.18998e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [renormalize]: 5.8001e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.55e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.25e-05 [loop_unroll]: 0.00040948 [opt_after_cconv]: 9.291e-05, [1] [Cycle 1]: 8.731e-05, [7] [c_1]: 2.739e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.18002e-06 [cse]: 1.559e-05 [renormalize]: 3.70026e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 6.867e-05, [1] [Cycle 1]: 6.405e-05, [4] [d_1]: 3.847e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.49973e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 4.262e-05 [cse_after_recomputation]: 1.951e-05, [1] [Cycle 1]: 1.513e-05, [1] [cse]: 1.018e-05 [environ_conv]: 4.86002e-06 [swap_dp_allreduce_reducescatter]: 5.81998e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.68998e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.141e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.827e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.848e-05, [1] [Cycle 1]: 6.44e-05, [6] [build]: 2.81e-06 [elim_shapecalc]: 8.72998e-06 [elim_not_effective]: 1.147e-05 [opt_reshape]: 6.15002e-06 [fold_const_symbol]: 8.55001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 9.49978e-07 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.00044372 [validate]: 2.99e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00607882 [execute]: 7.59002e-06 Sums bootstrap : 0.000464s : 2.92% type_inference : 0.005487s : 34.54% event_method : 0.000014s : 0.09% auto_monad : 0.000089s : 0.56% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000057s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000573s : 3.61% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000042s : 0.26% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000421s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000039s : 0.25% optimize.opt_a.a_3 : 0.000073s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.80% optimize.opt_b.b_1 : 0.000110s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000409s : 2.58% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000444s : 2.79% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006079s : 38.27% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000165 30 15.08% : 0.000025s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 66.39% : 0.000110s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.64% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005448 2 90.13% : 0.004910s : 1: type_inference.infer 9.87% : 0.000538s : 1: type_inference.specialize ------[replace.] 0.000039 5 68.36% : 0.000027s : 3: replace.inline 31.64% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.54% : 0.000108s : 3: match.inline 8.46% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.95% : 0.000002s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.55% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 47.46% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.54% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028177 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.41% : 0.002932s : 1: add_attr 10.38% : 0.002924s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.34% : 0.000094s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.77% : 0.000500s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.33% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.62% : 0.002147s : 1: opt_a 0.34% : 0.000096s : 1: opt_after_cconv 1.61% : 0.000454s : 1: opt_after_jit_grad 0.66% : 0.000187s : 1: opt_b 14.07% : 0.003965s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.79% : 0.000222s : 1: renormalize.infer 0.68% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.61% : 0.006090s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.53% : 0.005502s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0384331, [24] [bootstrap]: 0.00052457 [type_inference]: 0.0114972 [event_method]: 5.01e-05 [auto_monad]: 0.00011966 [graph_reusing]: 8.42e-06 [inline]: 2.15002e-06 [add_attr]: 0.00314279, [1] [add_attr_with_inline]: 0.00313431, [1] [Cycle 1]: 7.35e-05, [2] [tag_attr]: 3.537e-05 [meta_addattr_fg_expand]: 9.64e-06 [parallel-infer-symbol]: 2.78998e-06 [pre_auto_parallel]: 5.023e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 2.18998e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0140055, [53] [py_interpret_to_execute]: 3.941e-05 [rewriter_before_opt_a]: 0.00014527 [opt_a]: 0.0116706, [3] [Cycle 1]: 0.00769832, [45] [expand_dump_flag]: 4.1e-06 [switch_simplify]: 7.439e-05 [loop_unroll]: 6.244e-05 [a_1]: 0.00150596 [with_stream_mark]: 2.311e-05 [recompute_prepare]: 2.211e-05 [updatestate_depend_eliminate]: 9.25999e-06 [updatestate_assign_eliminate]: 8.17003e-06 [updatestate_loads_eliminate]: 7.18e-06 [parameter_eliminate]: 2.77002e-06 [a_2]: 0.00024495 [accelerated_algorithm]: 2.99e-05 [shard]: 1.81998e-06 [meta_shard_fg_expand]: 3.38999e-06 [shard_inline]: 1.622e-05 [merge_send_recv]: 1.55e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.937e-05 [flash_sp]: 1.183e-05 [merge_comm]: 1.01e-05 [allreduce_fusion]: 9.03002e-06 [matmul_add_comm_reduction]: 2.677e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 1.756e-05 [virtual_dataset]: 1.537e-05 [get_grad_eliminate_]: 1.513e-05 [virtual_output]: 1.512e-05 [merge_forward]: 9.20001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.82e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.919e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.775e-05 [set_forward_comm_id_for_comm_node_pass]: 9.63002e-06 [meta_fg_expand]: 0.00139832 [flash_sp_send_recv_attached]: 3.78999e-06 [receive_attached]: 2.02001e-06 [after_resolve]: 
6.001e-05 [a_after_grad]: 8.11e-05 [renormalize]: 0.00302409 [add_forward_monad_depend]: 9.30001e-06 [auto_monad_grad]: 5.06002e-06 [auto_monad_eliminator]: 5.505e-05 [cse]: 0.00016624 [a_3]: 0.00033389 [Cycle 2]: 0.00305551, [45] [expand_dump_flag]: 1.72001e-06 [switch_simplify]: 4.702e-05 [loop_unroll]: 4.342e-05 [a_1]: 0.00155038 [with_stream_mark]: 1.271e-05 [recompute_prepare]: 1.097e-05 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 4.01001e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012655 [accelerated_algorithm]: 1.203e-05 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.19e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.35e-06 [parallel]: 5.50001e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 5.02999e-06 [allreduce_fusion]: 5.10999e-06 [matmul_add_comm_reduction]: 7.58999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.64003e-06 [get_grad_eliminate_]: 8.81002e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.39002e-06 [cell_reuse_recompute_pass]: 9.5999e-07 [offload_activation]: 9.55001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.756e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.486e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 8.053e-05 [flash_sp_send_recv_attached]: 1.07998e-06 [receive_attached]: 1.17e-06 [after_resolve]: 1.603e-05 [a_after_grad]: 1.412e-05 [renormalize]: 0.00060922 [add_forward_monad_depend]: 4.42003e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 1.456e-05 [cse]: 4.748e-05 [a_3]: 6.492e-05 [Cycle 3]: 0.00090124, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 1.044e-05 [loop_unroll]: 8.66002e-06 [a_1]: 0.00024964 [with_stream_mark]: 9.89999e-06 [recompute_prepare]: 9.07999e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.80998e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012302 [accelerated_algorithm]: 1.177e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 2.17001e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 7.11001e-06 [auto_parallel]: 7.33999e-06 [parallel]: 4.54998e-06 [flash_sp]: 1.00001e-06 [merge_comm]: 4.89e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.85999e-06 [get_grad_eliminate_]: 8.58001e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.571e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.521e-05 [set_forward_comm_id_for_comm_node_pass]: 5.89e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 1.436e-05 [a_after_grad]: 1.415e-05 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 1.07e-05 [cse]: 2.62e-05 [a_3]: 5.887e-05 [py_interpret_to_execute_after_opt_a]: 1.093e-05 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 4.846e-05 [convert_after_rewriter]: 9.09003e-06 [order_py_execute_after_rewriter]: 6.94999e-06 [mutable_eliminate]: 0.00049289 [opt_b]: 0.00028765, [1] [Cycle 1]: 0.00028136, [7] [b_1]: 0.00019023 [b_2]: 1.043e-05 [updatestate_depend_eliminate]: 6.92002e-06 [updatestate_assign_eliminate]: 4.17998e-06 [updatestate_loads_eliminate]: 3.94997e-06 [renormalize]: 4.19997e-07 [cse]: 3.06e-05 [optimize_parallel_all_gather_comm]: 2.031e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.084e-05 [loop_unroll]: 0.00045757 [opt_after_cconv]: 0.00013632, [1] [Cycle 1]: 0.00013019, [7] [c_1]: 4.837e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 6.91001e-06 [updatestate_assign_eliminate]: 4.33999e-06 
[updatestate_loads_eliminate]: 3.93999e-06 [cse]: 2.974e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 2.876e-05 [tuple_transform]: 0.00010148, [1] [Cycle 1]: 9.685e-05, [4] [d_1]: 6.609e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 1.011e-05 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.702e-05 [cse_after_recomputation]: 3.176e-05, [1] [Cycle 1]: 2.696e-05, [1] [cse]: 2.158e-05 [environ_conv]: 8.01001e-06 [swap_dp_allreduce_reducescatter]: 7.96001e-06 [bias_add_comm_swap]: 2.83998e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.68e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.55001e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.44998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.696e-05 [grouped_pairwise_exchange_alltoall]: 2.06e-06 [offloading_packed_experts]: 4.94998e-06 [overlap_recompute_and_grad_model_parallel]: 5.78002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 5.25999e-06 [overlap_grad_flash_sp]: 2.494e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 9.812e-05, [1] [Cycle 1]: 9.363e-05, [6] [build]: 9.61998e-06 [elim_shapecalc]: 1.331e-05 [elim_not_effective]: 1.774e-05 [opt_reshape]: 1.037e-05 [fold_const_symbol]: 1.486e-05 
[renormalize]: 2.19996e-07 [detach_backward]: 1.45001e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 2.498e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 3.77002e-06 [opt_after_jit_grad]: 0.00047636 [validate]: 4.535e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.00824895 [execute]: 7.42002e-06 Sums bootstrap : 0.000525s : 1.54% type_inference : 0.011497s : 33.79% event_method : 0.000050s : 0.15% auto_monad : 0.000120s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000145s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.34% optimize.opt_a.a_1 : 0.003306s : 9.72% optimize.opt_a.with_stream_mark : 0.000046s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.45% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000058s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001482s : 4.35% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.32% optimize.opt_a.renormalize : 0.003633s : 10.68% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000240s : 0.71% optimize.opt_a.a_3 : 0.000458s : 1.34% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000493s : 1.45% optimize.opt_b.b_1 : 0.000190s : 0.56% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000458s : 1.34% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.08% optimize.tuple_transform.d_1 : 0.000066s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000476s : 1.40% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008249s : 24.24% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000779 222
    6.08% : 0.000047s : 12: substitution.arithmetic_simplify
    1.88% : 0.000015s : 2: substitution.cast_eliminate
    0.36% : 0.000003s : 5: substitution.elim_not_effective
    0.51% : 0.000004s : 5: substitution.float_depend_g_call
    0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch
    0.28% : 0.000002s : 5: substitution.fold_const_symbol
    0.92% : 0.000007s : 8: substitution.graph_param_transform
    0.40% : 0.000003s : 2: substitution.incorporate_call
    0.24% : 0.000002s : 2: substitution.incorporate_call_switch
    55.71% : 0.000434s : 17: substitution.inline
    2.13% : 0.000017s : 2: substitution.inline_without_move
    1.43% : 0.000011s : 20: substitution.j_node_and_user_rematch
    1.86% : 0.000014s : 3: substitution.less_batch_normalization
    1.70% : 0.000013s : 11: substitution.minmaximum_grad
    0.71% : 0.000006s : 5: substitution.partial_eliminate
    1.88% : 0.000015s : 20: substitution.remove_not_recompute_node
    3.22% : 0.000025s : 10: substitution.replace_applicator
    1.34% : 0.000010s : 15: substitution.replace_old_param
    0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute
    3.60% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive
    1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator
    2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder
    8.52% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator
    2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.011421 2
    87.07% : 0.009944s : 1: type_inference.infer
    12.93% : 0.001477s : 1: type_inference.specialize
------[replace.] 0.000223 33
    57.81% : 0.000129s : 17: replace.inline
    42.19% : 0.000094s : 16: replace.tuple_list_get_item_eliminator
------[match.] 0.000459 33
    92.55% : 0.000425s : 17: match.inline
    7.45% : 0.000034s : 16: match.tuple_list_get_item_eliminator
------[predicate.]
0.000764 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.24% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.49% : 0.000042s : 249: predicate.inline 1.22% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.03% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.25% : 0.000010s : 68: predicate.reduce_eliminate 2.64% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.11% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 4.33% : 0.000033s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.42% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.51% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002077 34 67.47% : 0.001401s : 13: func_graph_cloner_run.FuncGraphClonerGraph 32.53% : 0.000676s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064511 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.88% : 0.003147s : 1: add_attr 4.86% : 0.003138s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000126s : 1: auto_monad 0.04% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.87% : 0.000559s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000058s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000467s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.71% : 0.004971s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000175s : 28: opt.transform.opt_b 0.11% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 18.10% : 0.011674s : 1: opt_a 0.22% : 0.000140s : 1: opt_after_cconv 0.75% : 0.000486s : 1: opt_after_jit_grad 0.45% : 0.000291s : 1: opt_b 21.72% : 0.014010s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 3.40% : 0.002192s : 2: renormalize.infer 2.21% : 0.001429s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.23% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 12.80% : 0.008259s : 1: task_emit 0.16% : 0.000105s : 1: 
tuple_transform 17.85% : 0.011514s : 1: type_inference 0.12% : 0.000079s : 1: validate TotalTime = 0.0188487, [24] [bootstrap]: 0.00049151 [type_inference]: 0.00449391 [event_method]: 1.063e-05 [auto_monad]: 5.081e-05 [graph_reusing]: 5.14e-06 [inline]: 1.92999e-06 [add_attr]: 0.00303021, [1] [add_attr_with_inline]: 0.00302117, [1] [Cycle 1]: 4.803e-05, [2] [tag_attr]: 1.257e-05 [meta_addattr_fg_expand]: 3.16999e-06 [parallel-infer-symbol]: 2.83998e-06 [pre_auto_parallel]: 2.263e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.35997e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00374705, [53] [py_interpret_to_execute]: 1.519e-05 [rewriter_before_opt_a]: 3.961e-05 [opt_a]: 0.0019114, [2] [Cycle 1]: 0.00131594, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 2.42e-05 [loop_unroll]: 1.4e-05 [a_1]: 0.00029607 [with_stream_mark]: 1.325e-05 [recompute_prepare]: 7.37002e-06 [updatestate_depend_eliminate]: 3.58999e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 7.673e-05 [accelerated_algorithm]: 6.51999e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.42e-06 [auto_parallel]: 6.24999e-06 [parallel]: 1.773e-05 [flash_sp]: 7.12002e-06 [merge_comm]: 3.52997e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 5.75001e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.69002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.111e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.72999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62998e-06 [meta_fg_expand]: 1.98997e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 
2.91e-06 [after_resolve]: 1.085e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 0.00039599 [add_forward_monad_depend]: 4.97999e-06 [auto_monad_grad]: 1.98002e-06 [auto_monad_eliminator]: 1.233e-05 [cse]: 2.741e-05 [a_3]: 4.103e-05 [Cycle 2]: 0.00058637, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.26998e-06 [a_1]: 0.0001245 [with_stream_mark]: 9.61998e-06 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.714e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.39002e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.25999e-06 [flash_sp]: 3.14999e-06 [merge_comm]: 2.74999e-06 [allreduce_fusion]: 2.58e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 5.83002e-06 [virtual_dataset]: 5.09998e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.14e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.55001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 8.08999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.28002e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.08002e-06 [cse]: 1.223e-05 [a_3]: 3.141e-05 [py_interpret_to_execute_after_opt_a]: 7.25e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.088e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00047805 [opt_b]: 0.00018182, [1] [Cycle 1]: 
0.00017584, [7] [b_1]: 0.00010856 [b_2]: 7.26001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 4.09986e-07 [cse]: 1.563e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 2.236e-05 [loop_unroll]: 0.00041309 [opt_after_cconv]: 9.625e-05, [1] [Cycle 1]: 9.051e-05, [7] [c_1]: 2.816e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.645e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.203e-05 [tuple_transform]: 6.92e-05, [1] [Cycle 1]: 6.486e-05, [4] [d_1]: 3.916e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.39999e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.396e-05 [cse_after_recomputation]: 1.936e-05, [1] [Cycle 1]: 1.503e-05, [1] [cse]: 9.92999e-06 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.42001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.136e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.05002e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.798e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.696e-05, [1] [Cycle 1]: 6.285e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.16002e-06 [elim_not_effective]: 1.129e-05 [opt_reshape]: 5.84e-06 [fold_const_symbol]: 8.41002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.47e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044409 [validate]: 3.218e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00627807 [execute]: 7.56999e-06 Sums bootstrap : 0.000492s : 3.31% type_inference : 0.004494s : 30.25% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000421s : 2.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000396s : 2.67% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.12% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000478s : 3.22% optimize.opt_b.b_1 : 0.000109s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000413s : 2.78% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000015s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000444s : 2.99%
validate : 0.000032s : 0.22%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006278s : 42.26%
execute : 0.000008s : 0.05%

Time group info:
------[substitution.] 0.000125 26
17.81% : 0.000022s : 4: substitution.arithmetic_simplify
1.43% : 0.000002s : 2: substitution.elim_not_effective
0.97% : 0.000001s : 2: substitution.fold_const_symbol
4.28% : 0.000005s : 4: substitution.graph_param_transform
66.29% : 0.000083s : 2: substitution.inline
2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.25% : 0.000004s : 4: substitution.remove_not_recompute_node
3.71% : 0.000005s : 4: substitution.replace_old_param
------[type_inference.] 0.004451 2
92.10% : 0.004099s : 1: type_inference.infer
7.90% : 0.000352s : 1: type_inference.specialize
------[replace.] 0.000018 2
100.00% : 0.000018s : 2: replace.inline
------[match.] 0.000082 2
100.00% : 0.000082s : 2: match.inline
------[predicate.]
0.000137 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.74% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.87% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.20% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.77% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000251 6 40.80% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.20% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026948 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.003035s : 1: add_attr 11.22% : 0.003025s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000527s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.81% : 0.000487s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.86% : 0.000772s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001914s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.68% : 0.000454s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 13.92% : 0.003751s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.84% : 0.000225s : 1: renormalize.infer 0.61% : 0.000164s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.34% : 0.006289s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.73% : 0.004509s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0878188, [24] [bootstrap]: 0.00058718 [type_inference]: 0.0271669 [event_method]: 4.472e-05 [auto_monad]: 0.00011666 [graph_reusing]: 2.197e-05 [inline]: 2.27999e-06 [add_attr]: 0.0031904, [1] [add_attr_with_inline]: 0.00318145, [1] [Cycle 1]: 7.853e-05, [2] [tag_attr]: 3.291e-05 [meta_addattr_fg_expand]: 8.28999e-06 [parallel-infer-symbol]: 3.39001e-06 [pre_auto_parallel]: 4.816e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0305035, [53] [py_interpret_to_execute]: 3.562e-05 [rewriter_before_opt_a]: 0.00012672 [opt_a]: 0.0279312, [3] [Cycle 1]: 0.0238024, [45] [expand_dump_flag]: 3.58999e-06 [switch_simplify]: 6.703e-05 [loop_unroll]: 5.504e-05 [a_1]: 0.00136392 [with_stream_mark]: 2.382e-05 [recompute_prepare]: 2.187e-05 [updatestate_depend_eliminate]: 9.55001e-06 [updatestate_assign_eliminate]: 7.85e-06 [updatestate_loads_eliminate]: 7.71001e-06 [parameter_eliminate]: 2.69001e-06 [a_2]: 0.00024563 [accelerated_algorithm]: 3.105e-05 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 3.22002e-06 [shard_inline]: 1.636e-05 [merge_send_recv]: 1.601e-05 [auto_parallel]: 1.102e-05 [parallel]: 1.882e-05 [flash_sp]: 1.137e-05 [merge_comm]: 9.71e-06 [allreduce_fusion]: 8.67e-06 [matmul_add_comm_reduction]: 2.611e-05 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 1.812e-05 [virtual_dataset]: 1.569e-05 [get_grad_eliminate_]: 1.527e-05 [virtual_output]: 1.544e-05 [merge_forward]: 9.92999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.746e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.9e-05 [merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 2.74e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61998e-06 [meta_fg_expand]: 0.00143272 [flash_sp_send_recv_attached]: 3.73001e-06 [receive_attached]: 2.97002e-06 
[after_resolve]: 5.991e-05 [a_after_grad]: 8.031e-05 [renormalize]: 0.019185 [add_forward_monad_depend]: 1.268e-05 [auto_monad_grad]: 6.35997e-06 [auto_monad_eliminator]: 5.833e-05 [cse]: 0.00017041 [a_3]: 0.00035598 [Cycle 2]: 0.00320396, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 4.828e-05 [loop_unroll]: 4.493e-05 [a_1]: 0.00161439 [with_stream_mark]: 1.659e-05 [recompute_prepare]: 1.18e-05 [updatestate_depend_eliminate]: 5.82001e-06 [updatestate_assign_eliminate]: 5.19998e-06 [updatestate_loads_eliminate]: 4.15e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 0.0001274 [accelerated_algorithm]: 1.402e-05 [shard]: 2.56e-06 [meta_shard_fg_expand]: 2.54001e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 1.048e-05 [auto_parallel]: 1.096e-05 [parallel]: 9.53002e-06 [flash_sp]: 4.27998e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 5.48002e-06 [matmul_add_comm_reduction]: 1.064e-05 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 1.069e-05 [virtual_dataset]: 8.90001e-06 [get_grad_eliminate_]: 8.99998e-06 [virtual_output]: 8.69e-06 [merge_forward]: 5.22999e-06 [cell_reuse_recompute_pass]: 1.46002e-06 [offload_activation]: 1.252e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.733e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 1.585e-05 [set_forward_comm_id_for_comm_node_pass]: 6.01e-06 [meta_fg_expand]: 5.052e-05 [flash_sp_send_recv_attached]: 1.49e-06 [receive_attached]: 2.75997e-06 [after_resolve]: 1.583e-05 [a_after_grad]: 1.459e-05 [renormalize]: 0.00067496 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.502e-05 [cse]: 4.777e-05 [a_3]: 6.605e-05 [Cycle 3]: 0.00090648, [45] [expand_dump_flag]: 1.59998e-06 [switch_simplify]: 1.053e-05 [loop_unroll]: 9.05999e-06 [a_1]: 0.00025025 [with_stream_mark]: 1.026e-05 [recompute_prepare]: 9.46e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 3.73999e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012297 [accelerated_algorithm]: 1.192e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.88002e-06 [shard_inline]: 8.85001e-06 [merge_send_recv]: 7.13e-06 [auto_parallel]: 7.86001e-06 [parallel]: 4.66002e-06 [flash_sp]: 1.12e-06 [merge_comm]: 5.05001e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 7.87e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 8.87e-06 [get_grad_eliminate_]: 8.70001e-06 [virtual_output]: 8.37e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.93002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.676e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99998e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 1.10999e-06 [after_resolve]: 1.337e-05 [a_after_grad]: 1.431e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.35001e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 1.138e-05 [cse]: 2.676e-05 [a_3]: 5.982e-05 [py_interpret_to_execute_after_opt_a]: 1.463e-05 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 4.964e-05 [convert_after_rewriter]: 9.87999e-06 [order_py_execute_after_rewriter]: 7.1e-06 [mutable_eliminate]: 0.00075731 [opt_b]: 0.00028826, [1] [Cycle 1]: 0.00028097, [7] [b_1]: 0.00018876 [b_2]: 1.058e-05 [updatestate_depend_eliminate]: 7.46001e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 3.86001e-06 [renormalize]: 4.2998e-07 [cse]: 3.144e-05 [optimize_parallel_all_gather_comm]: 2.082e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.434e-05 [loop_unroll]: 0.0004295 [opt_after_cconv]: 0.00013749, [1] [Cycle 1]: 0.00013144, [7] [c_1]: 4.956e-05 [parameter_eliminate]: 2.80002e-06 [updatestate_depend_eliminate]: 7.01999e-06 [updatestate_assign_eliminate]: 4.24002e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 2.985e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 3.36e-05 [tuple_transform]: 0.00010082, [1] [Cycle 1]: 9.616e-05, [4] [d_1]: 6.595e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.76e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 6.122e-05 [cse_after_recomputation]: 3.136e-05, [1] [Cycle 1]: 2.671e-05, [1] [cse]: 2.127e-05 [environ_conv]: 9.20001e-06 [swap_dp_allreduce_reducescatter]: 7.95e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.58998e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.54998e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 1.96e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 8.49977e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 9.70002e-07 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.715e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 5.02e-06 [overlap_recompute_and_grad_model_parallel]: 5.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 5.22999e-06 [overlap_grad_flash_sp]: 2.688e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.16003e-06 [split_layernorm_comm]: 2.19001e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 0.00010002, [1] [Cycle 1]: 9.555e-05, [6] [build]: 1.018e-05 [elim_shapecalc]: 1.357e-05 [elim_not_effective]: 1.804e-05 [opt_reshape]: 1.026e-05 [fold_const_symbol]: 1.516e-05 [renormalize]: 
1.69995e-07 [detach_backward]: 2.22999e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 2.45e-05 [get_jit_bprop_graph]: 1.68002e-06 [rewriter_after_jit_bprop_graph]: 3.85998e-06 [opt_after_jit_grad]: 0.00046686 [validate]: 4.682e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0253207 [execute]: 8.37e-06 Sums bootstrap : 0.000587s : 0.71% type_inference : 0.027167s : 32.62% event_method : 0.000045s : 0.05% auto_monad : 0.000117s : 0.14% graph_reusing : 0.000022s : 0.03% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.04% optimize.rewriter_before_opt_a : 0.000127s : 0.15% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.15% optimize.opt_a.loop_unroll : 0.000109s : 0.13% optimize.opt_a.a_1 : 0.003229s : 3.88% optimize.opt_a.with_stream_mark : 0.000051s : 0.06% optimize.opt_a.recompute_prepare : 0.000043s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 0.60% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.07% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.04% optimize.opt_a.merge_send_recv : 0.000034s : 0.04% optimize.opt_a.auto_parallel : 0.000030s : 0.04% optimize.opt_a.parallel : 0.000033s : 0.04% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.05% optimize.opt_a.virtual_dataset : 0.000033s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.04% optimize.opt_a.virtual_output : 0.000032s : 0.04% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000039s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001486s : 1.78% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000007s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.11% optimize.opt_a.a_after_grad : 0.000109s : 0.13% optimize.opt_a.renormalize : 0.019860s : 23.85% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.02% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.10% optimize.opt_a.cse : 0.000245s : 0.29% optimize.opt_a.a_3 : 0.000482s : 0.58% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.06% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000757s : 0.91% optimize.opt_b.b_1 : 0.000189s : 0.23% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000430s : 0.52% optimize.opt_after_cconv.c_1 : 0.000050s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.04% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000034s : 0.04% optimize.tuple_transform.d_1 : 0.000066s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.07% optimize.cse_after_recomputation.cse : 0.000021s : 0.03% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000027s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.56% validate : 0.000047s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.025321s : 30.40% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000811 218 6.19% : 0.000050s : 11: substitution.arithmetic_simplify 1.90% : 0.000015s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.56% : 0.000005s : 5: substitution.float_depend_g_call 0.59% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 55.13% : 0.000447s : 16: substitution.inline 2.01% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000017s : 3: substitution.less_batch_normalization 1.74% : 0.000014s : 11: substitution.minmaximum_grad 0.81% : 0.000007s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.50% : 0.000028s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.80% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.20% : 0.000067s : 28: substitution.tuple_list_get_item_eliminator 2.40% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.027085 2 94.90% : 0.025703s : 1: type_inference.infer 5.10% : 0.001382s : 1: type_inference.specialize ------[replace.] 0.000218 30 61.56% : 0.000134s : 16: replace.inline 38.44% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000469 30 93.46% : 0.000439s : 16: match.inline 6.54% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000743 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 244: predicate.inline 1.35% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.67% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.35% : 0.000010s : 68: predicate.same_eliminate 0.40% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.70% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.37% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.18% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.91% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.10% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001687 32 58.88% : 0.000993s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.12% : 0.000694s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.146592 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.18% : 0.003195s : 1: add_attr 2.17% : 0.003185s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000623s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000034s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000052s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000027s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000438s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000767s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.35% : 0.004904s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 19.06% : 0.027935s : 1: opt_a 0.10% : 0.000141s : 1: opt_after_cconv 0.33% : 0.000477s : 1: opt_after_jit_grad 0.20% : 0.000292s : 1: opt_b 20.81% : 0.030508s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000038s : 1: remove_dup_value 12.45% : 0.018257s : 2: renormalize.infer 1.08% : 0.001587s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000053s : 1: rewriter_after_opt_a 0.09% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000103s : 1: symbol_engine_optimizer 17.29% : 0.025340s : 1: task_emit 0.07% : 0.000104s : 1: 
tuple_transform 18.55% : 0.027197s : 1: type_inference 0.06% : 0.000086s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x6-kbk],max_mem:6.0M TotalTime = 1.37096, [24] [bootstrap]: 0.00067278 [type_inference]: 0.00672134 [event_method]: 1.392e-05 [auto_monad]: 6.091e-05 [graph_reusing]: 5.85002e-06 [inline]: 1.57999e-06 [add_attr]: 0.00364303, [1] [add_attr_with_inline]: 0.0036314, [1] [Cycle 1]: 4.679e-05, [2] [tag_attr]: 1.551e-05 [meta_addattr_fg_expand]: 4.22e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.921e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.10002e-06 [pipeline_split]: 1.71002e-06 [optimize]: 0.0199942, [53] [py_interpret_to_execute]: 2.192e-05 [rewriter_before_opt_a]: 5.835e-05 [opt_a]: 0.0181147, [2] [Cycle 1]: 0.00151725, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 3.246e-05 [loop_unroll]: 2.134e-05 [a_1]: 0.00045759 [with_stream_mark]: 1.295e-05 [recompute_prepare]: 7.81001e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.65e-05 [accelerated_algorithm]: 6.62002e-06 [shard]: 2.00002e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 8.42e-06 [auto_parallel]: 5.44e-06 [parallel]: 2.445e-05 [flash_sp]: 7.21001e-06 [merge_comm]: 3.81999e-06 [allreduce_fusion]: 3.64002e-06 [matmul_add_comm_reduction]: 8.81002e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 7.53999e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.58002e-06 [virtual_output]: 5.69e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.035e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 
[merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.21998e-06 [after_resolve]: 1.017e-05 [a_after_grad]: 9.20001e-06 [renormalize]: 0.00040889 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.387e-05 [cse]: 2.862e-05 [a_3]: 3.95e-05 [Cycle 2]: 0.0165842, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.71e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00012432 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.48002e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.698e-05 [accelerated_algorithm]: 5.35001e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.22e-06 [parallel]: 4.52998e-06 [flash_sp]: 3.5e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 2.99973e-07 [virtual_shard_identity]: 6.02999e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.45002e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 5.68002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.05999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.94997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 7.75998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 5.79999e-06 [cse]: 1.271e-05 [a_3]: 9.139e-05 [py_interpret_to_execute_after_opt_a]: 1.103e-05 
[slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.635e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.00045994 [opt_b]: 0.00018858, [1] [Cycle 1]: 0.00018224, [7] [b_1]: 0.00011405 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.69998e-07 [cse]: 1.66e-05 [optimize_parallel_all_gather_comm]: 1.598e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 2.416e-05 [loop_unroll]: 0.00041681 [opt_after_cconv]: 9.414e-05, [1] [Cycle 1]: 8.884e-05, [7] [c_1]: 2.743e-05 [parameter_eliminate]: 2.54999e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.622e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.353e-05 [tuple_transform]: 6.835e-05, [1] [Cycle 1]: 6.404e-05, [4] [d_1]: 3.826e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 5.283e-05 [cse_after_recomputation]: 2.103e-05, [1] [Cycle 1]: 1.671e-05, [1] [cse]: 1.147e-05 [environ_conv]: 4.67998e-06 [swap_dp_allreduce_reducescatter]: 5.86e-06 [bias_add_comm_swap]: 2.50002e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.26997e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 7.80012e-07 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.38002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 
[control_data_broadcast_order]: 1.22e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 1.98997e-06 [overlap_grad_ring_attention]: 4.12e-06 [overlap_grad_flash_sp]: 1.785e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 6.916e-05, [1] [Cycle 1]: 6.516e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 8.57e-06 [elim_not_effective]: 1.218e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 9.32999e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.81998e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.667e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00045126 [validate]: 3.178e-05 [backend_pass]: 8.50006e-07 [task_emit]: 1.33906 [execute]: 1.031e-05 Sums bootstrap : 0.000673s : 0.05% type_inference : 0.006721s : 0.50% event_method : 0.000014s : 0.00% auto_monad : 0.000061s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000582s : 0.04% optimize.opt_a.with_stream_mark 
: 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000409s : 0.03% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000041s : 0.00% optimize.opt_a.a_3 : 0.000131s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000460s : 0.03% optimize.opt_b.b_1 : 0.000114s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000417s : 0.03% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.03% validate : 0.000032s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.339055s : 99.16% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000167 30 14.83% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000002s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 66.40% : 0.000111s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000004s : 4: substitution.remove_not_recompute_node 2.35% : 0.000004s : 4: substitution.replace_old_param 7.08% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006674 2 91.65% : 0.006117s : 1: type_inference.infer 8.35% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.04% : 0.000028s : 3: replace.inline 29.96% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.08% : 0.000109s : 3: match.inline 8.92% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.18% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.12% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.98% : 0.000002s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.61% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 48.29% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.71% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.396099 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.26% : 0.003647s : 1: add_attr 0.26% : 0.003635s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000067s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.05% : 0.000717s : 1: bootstrap 0.00% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.03% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.03% : 0.000469s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.07% : 0.000945s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000094s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.30% : 0.018119s : 1: opt_a 0.01% : 0.000097s : 1: opt_after_cconv 0.03% : 0.000461s : 1: opt_after_jit_grad 0.01% : 0.000192s : 1: opt_b 1.43% : 0.019998s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000026s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.02% : 0.000210s : 1: renormalize.infer 0.01% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000040s : 1: rewriter_after_opt_a 0.00% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 95.92% : 1.339080s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.48% : 0.006736s : 1: type_inference 0.00% : 0.000059s : 1: validate . TotalTime = 0.147548, [24] [bootstrap]: 0.00053201 [type_inference]: 0.00492868 [event_method]: 1.202e-05 [auto_monad]: 5.925e-05 [graph_reusing]: 5.39e-06 [inline]: 2.59001e-06 [add_attr]: 0.00337849, [1] [add_attr_with_inline]: 0.0033677, [1] [Cycle 1]: 5.819e-05, [2] [tag_attr]: 1.405e-05 [meta_addattr_fg_expand]: 2.96001e-06 [parallel-infer-symbol]: 3.31001e-06 [pre_auto_parallel]: 2.432e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00424682, [53] [py_interpret_to_execute]: 1.654e-05 [rewriter_before_opt_a]: 4.089e-05 [opt_a]: 0.00221986, [2] [Cycle 1]: 0.00157, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.523e-05 [loop_unroll]: 1.393e-05 [a_1]: 0.00029838 [with_stream_mark]: 2.115e-05 [recompute_prepare]: 9.66998e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.849e-05 [accelerated_algorithm]: 7.49002e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 8.43001e-06 [auto_parallel]: 7.11999e-06 [parallel]: 1.922e-05 [flash_sp]: 9.12001e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.76001e-06 [matmul_add_comm_reduction]: 9.86e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 5.62001e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.93002e-06 [merge_forward]: 4.71002e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 5.386e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.302e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 1.016e-05 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.64999e-06 [flash_sp_send_recv_attached]: 3.01001e-06 [receive_attached]: 2.66e-06 
[after_resolve]: 1.163e-05 [a_after_grad]: 9.19998e-06 [renormalize]: 0.00054276 [add_forward_monad_depend]: 5.62001e-06 [auto_monad_grad]: 2.26e-06 [auto_monad_eliminator]: 1.416e-05 [cse]: 2.927e-05 [a_3]: 4.277e-05 [Cycle 2]: 0.00063965, [45] [expand_dump_flag]: 1.62001e-06 [switch_simplify]: 7.2e-06 [loop_unroll]: 5.69999e-06 [a_1]: 0.00012707 [with_stream_mark]: 1.151e-05 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.39998e-06 [a_2]: 6.843e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.62001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 6.36e-06 [auto_parallel]: 5.69e-06 [parallel]: 6.03998e-06 [flash_sp]: 3.82002e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 2.86e-06 [matmul_add_comm_reduction]: 7.21001e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 7.10998e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 4.94e-06 [merge_forward]: 3.26999e-06 [cell_reuse_recompute_pass]: 1.59e-06 [offload_activation]: 6.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 8.52e-06 [set_forward_comm_id_for_comm_node_pass]: 4.2e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.74998e-06 [after_resolve]: 1.022e-05 [a_after_grad]: 7.76001e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.47001e-06 [auto_monad_grad]: 1.17999e-06 [auto_monad_eliminator]: 8.2e-06 [cse]: 1.742e-05 [a_3]: 3.218e-05 [py_interpret_to_execute_after_opt_a]: 8.92999e-06 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 4.091e-05 [convert_after_rewriter]: 7.82998e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00050993 [opt_b]: 0.00019214, [1] [Cycle 1]: 0.00018491, [7] [b_1]: 0.00010844 
[b_2]: 7.22002e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [renormalize]: 4.69998e-07 [cse]: 2.009e-05 [optimize_parallel_all_gather_comm]: 2.337e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 2.484e-05 [loop_unroll]: 0.00047049 [opt_after_cconv]: 0.00010585, [1] [Cycle 1]: 9.888e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 3.25e-06 [updatestate_depend_eliminate]: 6.58998e-06 [updatestate_assign_eliminate]: 2.91e-06 [updatestate_loads_eliminate]: 2.48e-06 [cse]: 2.054e-05 [renormalize]: 6.60017e-07 [remove_dup_value]: 1.238e-05 [tuple_transform]: 7.104e-05, [1] [Cycle 1]: 6.65e-05, [4] [d_1]: 4.025e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 2.05002e-06 [add_recomputation]: 5.341e-05 [cse_after_recomputation]: 2.24e-05, [1] [Cycle 1]: 1.749e-05, [1] [cse]: 1.168e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 5.40999e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.423e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.39998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 4.35e-06 [overlap_grad_flash_sp]: 1.927e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.969e-05, [1] [Cycle 1]: 7.403e-05, [6] [build]: 3.86001e-06 [elim_shapecalc]: 1.136e-05 [elim_not_effective]: 1.184e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.82e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.791e-05 [get_jit_bprop_graph]: 1.46002e-06 [rewriter_after_jit_bprop_graph]: 4.46002e-06 [opt_after_jit_grad]: 0.00051729 [validate]: 4.129e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.133518 [execute]: 8.82e-06 Sums bootstrap : 0.000532s : 0.37% type_inference : 0.004929s : 3.44% event_method : 0.000012s : 0.01% auto_monad : 0.000059s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.01% optimize.rewriter_before_opt_a : 0.000041s : 0.03% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.02% optimize.opt_a.loop_unroll : 0.000020s : 0.01% optimize.opt_a.a_1 : 0.000425s : 0.30% optimize.opt_a.with_stream_mark : 0.000033s : 0.02% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.10% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.01% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000025s : 0.02% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000061s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000543s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000047s : 0.03% optimize.opt_a.a_3 : 0.000075s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000510s : 0.36% optimize.opt_b.b_1 : 0.000108s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.02% optimize.loop_unroll : 0.000470s : 0.33% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000021s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.04% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000018s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000517s : 0.36% validate : 0.000041s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.133518s : 93.31% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000132 26 19.73% : 0.000026s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.34% : 0.000006s : 4: substitution.graph_param_transform 64.02% : 0.000084s : 2: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.65% : 0.000005s : 4: substitution.remove_not_recompute_node 3.62% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004874 2 91.86% : 0.004477s : 1: type_inference.infer 8.14% : 0.000397s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000083 2 100.00% : 0.000083s : 2: match.inline ------[predicate.] 
0.000142 984 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 1.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.93% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.59% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.79% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 5.75% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 2.03% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.84% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.93% : 0.000003s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.36% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.72% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.05% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.89% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.58% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.39% : 0.000002s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.48% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.41% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.92% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.68% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.02% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 6 41.56% : 0.000123s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.44% : 0.000172s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.156665 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.16% : 0.003384s : 1: add_attr 2.15% : 0.003371s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000065s : 1: auto_monad 0.01% : 0.000023s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.36% : 0.000570s : 1: bootstrap 0.02% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000018s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.31% : 0.000482s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000521s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 0.50% : 0.000790s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000091s : 28: opt.transform.opt_b 0.03% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000035s : 4: opt.transform.symbol_engine_opt 1.42% : 0.002223s : 1: opt_a 0.07% : 0.000110s : 1: opt_after_cconv 0.34% : 0.000532s : 1: opt_after_jit_grad 0.13% : 0.000196s : 1: opt_b 2.71% : 0.004251s : 1: optimize 0.02% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000028s : 1: pre_auto_parallel 0.01% : 0.000021s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.22% : 0.000341s : 1: renormalize.infer 0.12% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000046s : 1: rewriter_after_opt_a 0.03% : 0.000046s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000082s : 1: symbol_engine_optimizer 85.24% : 0.133540s : 1: task_emit 0.05% : 0.000074s : 1: 
tuple_transform 3.16% : 0.004948s : 1: type_inference 0.04% : 0.000069s : 1: validate TotalTime = 0.126936, [24] [bootstrap]: 0.00046573 [type_inference]: 0.00577964 [event_method]: 1.393e-05 [auto_monad]: 5.855e-05 [graph_reusing]: 5.87001e-06 [inline]: 2.07001e-06 [add_attr]: 0.00302638, [1] [add_attr_with_inline]: 0.00301798, [1] [Cycle 1]: 4.442e-05, [2] [tag_attr]: 1.565e-05 [meta_addattr_fg_expand]: 3.92998e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.621e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 6.30011e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0039772, [53] [py_interpret_to_execute]: 2.164e-05 [rewriter_before_opt_a]: 5.855e-05 [opt_a]: 0.00214612, [2] [Cycle 1]: 0.00153735, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 3.296e-05 [loop_unroll]: 2.079e-05 [a_1]: 0.00044555 [with_stream_mark]: 1.278e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.529e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.60999e-06 [auto_parallel]: 5.56998e-06 [parallel]: 1.778e-05 [flash_sp]: 6.86999e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.20998e-06 [matmul_add_comm_reduction]: 8.47998e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 6.91001e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.45998e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.045e-05 [merge_recompute_call_nodes]: 1.82001e-06 [before_grad]: 8.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.48e-06 [receive_attached]: 2.38998e-06 
[after_resolve]: 1.032e-05 [a_after_grad]: 8.87e-06 [renormalize]: 0.00043434 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 2.23002e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.85e-05 [a_3]: 4.074e-05 [Cycle 2]: 0.00059931, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.88998e-06 [loop_unroll]: 5.74e-06 [a_1]: 0.00012456 [with_stream_mark]: 1.087e-05 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 6.838e-05 [accelerated_algorithm]: 5.43002e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.35001e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.25001e-06 [parallel]: 4.1e-06 [flash_sp]: 3.71001e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 4.99003e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 4.50001e-06 [cell_reuse_recompute_pass]: 1.48002e-06 [offload_activation]: 7.45e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 7.95e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.61e-06 [cse]: 1.38e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 1.76998e-06 [rewriter_after_opt_a]: 3.216e-05 [convert_after_rewriter]: 7.12002e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00044623 [opt_b]: 0.00017883, [1] [Cycle 1]: 0.00017287, [7] [b_1]: 0.00010631 [b_2]: 
6.74001e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 3.59985e-07 [cse]: 1.66e-05 [optimize_parallel_all_gather_comm]: 1.523e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.329e-05 [loop_unroll]: 0.00041134 [opt_after_cconv]: 9.474e-05, [1] [Cycle 1]: 8.898e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 4.79002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.662e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 7.09e-05, [1] [Cycle 1]: 6.668e-05, [4] [d_1]: 3.985e-05 [none_parameter_eliminate]: 2.01998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 4.715e-05 [cse_after_recomputation]: 2.115e-05, [1] [Cycle 1]: 1.656e-05, [1] [cse]: 1.152e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.07999e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.45997e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.38998e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 8.99978e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 8.40024e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.70998e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.745e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 6.725e-05, [1] [Cycle 1]: 6.323e-05, [6] [build]: 2.79001e-06 [elim_shapecalc]: 7.80998e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.89998e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.484e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00044435 [validate]: 3.247e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.112853 [execute]: 9.02e-06 Sums bootstrap : 0.000466s : 0.38% type_inference : 0.005780s : 4.70% event_method : 0.000014s : 0.01% auto_monad : 0.000059s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000570s : 0.46% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000434s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.36% optimize.opt_b.b_1 : 0.000106s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000411s : 0.33% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.04% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.36% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.112853s : 91.81% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.33% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 3.68% : 0.000006s : 4: substitution.graph_param_transform 66.45% : 0.000109s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.67% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005738 2 90.20% : 0.005176s : 1: type_inference.infer 9.80% : 0.000562s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.90% : 0.000027s : 3: replace.inline 30.10% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.54% : 0.000107s : 3: match.inline 8.46% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.36% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.88% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.14% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.78% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.86% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.27% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 45.97% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.03% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.135459 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.24% : 0.003031s : 1: add_attr 2.23% : 0.003021s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000064s : 1: auto_monad 0.01% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.37% : 0.000501s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000456s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000934s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.59% : 0.002149s : 1: opt_a 0.07% : 0.000098s : 1: opt_after_cconv 0.33% : 0.000454s : 1: opt_after_jit_grad 0.13% : 0.000182s : 1: opt_b 2.94% : 0.003981s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000220s : 1: renormalize.infer 0.15% : 0.000207s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 83.33% : 0.112875s : 1: task_emit 0.05% : 0.000074s : 1: 
tuple_transform 4.28% : 0.005794s : 1: type_inference 0.04% : 0.000055s : 1: validate TotalTime = 3.53264, [24] [bootstrap]: 0.00051304 [type_inference]: 0.0119498 [event_method]: 5.017e-05 [auto_monad]: 0.00012248 [graph_reusing]: 8.85999e-06 [inline]: 1.92001e-06 [add_attr]: 0.0032519, [1] [add_attr_with_inline]: 0.00324196, [1] [Cycle 1]: 7.877e-05, [2] [tag_attr]: 3.694e-05 [meta_addattr_fg_expand]: 9.22999e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 5.261e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.96e-06 [optimize]: 0.0147402, [53] [py_interpret_to_execute]: 4.157e-05 [rewriter_before_opt_a]: 0.00014989 [opt_a]: 0.012199, [3] [Cycle 1]: 0.00783979, [45] [expand_dump_flag]: 3.98001e-06 [switch_simplify]: 7.603e-05 [loop_unroll]: 6.284e-05 [a_1]: 0.00156077 [with_stream_mark]: 2.675e-05 [recompute_prepare]: 2.578e-05 [updatestate_depend_eliminate]: 9.80002e-06 [updatestate_assign_eliminate]: 8.07998e-06 [updatestate_loads_eliminate]: 7.18e-06 [parameter_eliminate]: 2.83998e-06 [a_2]: 0.00024732 [accelerated_algorithm]: 3.297e-05 [shard]: 2.13002e-06 [meta_shard_fg_expand]: 3.20998e-06 [shard_inline]: 1.603e-05 [merge_send_recv]: 1.684e-05 [auto_parallel]: 1.276e-05 [parallel]: 2.029e-05 [flash_sp]: 1.216e-05 [merge_comm]: 9.92001e-06 [allreduce_fusion]: 8.97999e-06 [matmul_add_comm_reduction]: 2.882e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.812e-05 [virtual_dataset]: 1.569e-05 [get_grad_eliminate_]: 1.505e-05 [virtual_output]: 1.518e-05 [merge_forward]: 9.94001e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 1.837e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.903e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.828e-05 [set_forward_comm_id_for_comm_node_pass]: 9.93998e-06 [meta_fg_expand]: 0.00154741 [flash_sp_send_recv_attached]: 3.88001e-06 [receive_attached]: 2.51e-06 [after_resolve]: 
6.312e-05 [a_after_grad]: 8.333e-05 [renormalize]: 0.00287916 [add_forward_monad_depend]: 1.095e-05 [auto_monad_grad]: 5.97001e-06 [auto_monad_eliminator]: 5.872e-05 [cse]: 0.0001731 [a_3]: 0.00034174 [Cycle 2]: 0.00340527, [45] [expand_dump_flag]: 2.09e-06 [switch_simplify]: 4.751e-05 [loop_unroll]: 4.467e-05 [a_1]: 0.00170659 [with_stream_mark]: 1.884e-05 [recompute_prepare]: 1.405e-05 [updatestate_depend_eliminate]: 5.89999e-06 [updatestate_assign_eliminate]: 5.42001e-06 [updatestate_loads_eliminate]: 4.07998e-06 [parameter_eliminate]: 1.71002e-06 [a_2]: 0.00012763 [accelerated_algorithm]: 1.289e-05 [shard]: 1.61998e-06 [meta_shard_fg_expand]: 2.01e-06 [shard_inline]: 9.05001e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 9.29998e-06 [parallel]: 6.67002e-06 [flash_sp]: 3.87998e-06 [merge_comm]: 5.10999e-06 [allreduce_fusion]: 5.19e-06 [matmul_add_comm_reduction]: 9.96e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 1.048e-05 [virtual_dataset]: 8.89e-06 [get_grad_eliminate_]: 9.74999e-06 [virtual_output]: 9.04e-06 [merge_forward]: 5.12e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 1.139e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.696e-05 [merge_recompute_call_nodes]: 1.02998e-06 [before_grad]: 1.441e-05 [set_forward_comm_id_for_comm_node_pass]: 5.77001e-06 [meta_fg_expand]: 8.626e-05 [flash_sp_send_recv_attached]: 1.38002e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.742e-05 [a_after_grad]: 1.487e-05 [renormalize]: 0.00071264 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 2.11e-06 [auto_monad_eliminator]: 1.608e-05 [cse]: 5.314e-05 [a_3]: 6.766e-05 [Cycle 3]: 0.00093595, [45] [expand_dump_flag]: 1.65001e-06 [switch_simplify]: 1.109e-05 [loop_unroll]: 8.82999e-06 [a_1]: 0.00025287 [with_stream_mark]: 1.043e-05 [recompute_prepare]: 9.75002e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 4.03001e-06 
[parameter_eliminate]: 1.33002e-06 [a_2]: 0.00012414 [accelerated_algorithm]: 1.235e-05 [shard]: 1.29e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.20999e-06 [merge_send_recv]: 7.90998e-06 [auto_parallel]: 8.12e-06 [parallel]: 5.02e-06 [flash_sp]: 1.54e-06 [merge_comm]: 5.11002e-06 [allreduce_fusion]: 5.34e-06 [matmul_add_comm_reduction]: 8.40001e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 1.033e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.25e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 4.94e-06 [cell_reuse_recompute_pass]: 1.80001e-06 [offload_activation]: 9.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.6e-05 [merge_recompute_call_nodes]: 1.01997e-06 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.68002e-06 [meta_fg_expand]: 2.88003e-06 [flash_sp_send_recv_attached]: 1.49e-06 [receive_attached]: 1.71998e-06 [after_resolve]: 1.597e-05 [a_after_grad]: 1.501e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.59998e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 1.281e-05 [cse]: 2.989e-05 [a_3]: 5.899e-05 [py_interpret_to_execute_after_opt_a]: 1.592e-05 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 [rewriter_after_opt_a]: 5.161e-05 [convert_after_rewriter]: 9.79e-06 [order_py_execute_after_rewriter]: 7.14001e-06 [mutable_eliminate]: 0.00061454 [opt_b]: 0.00029873, [1] [Cycle 1]: 0.00029114, [7] [b_1]: 0.00018911 [b_2]: 1.105e-05 [updatestate_depend_eliminate]: 8.79e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 4.56002e-06 [renormalize]: 5.00004e-07 [cse]: 3.702e-05 [optimize_parallel_all_gather_comm]: 2.185e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.561e-05 [loop_unroll]: 0.00047414 [opt_after_cconv]: 0.00014285, [1] [Cycle 1]: 0.00013517, [7] [c_1]: 4.832e-05 [parameter_eliminate]: 2.76e-06 [updatestate_depend_eliminate]: 7.65e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 
4.05e-06 [cse]: 3.36e-05 [renormalize]: 4.49974e-07 [remove_dup_value]: 3.307e-05 [tuple_transform]: 0.00010151, [1] [Cycle 1]: 9.688e-05, [4] [d_1]: 6.619e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.006e-05 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 6.076e-05 [cse_after_recomputation]: 3.389e-05, [1] [Cycle 1]: 2.901e-05, [1] [cse]: 2.34e-05 [environ_conv]: 1.056e-05 [swap_dp_allreduce_reducescatter]: 7.77e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 5.29e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 1.35001e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.58003e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88997e-06 [control_data_broadcast_order]: 1.765e-05 [grouped_pairwise_exchange_alltoall]: 1.96e-06 [offloading_packed_experts]: 5.18002e-06 [overlap_recompute_and_grad_model_parallel]: 5.99999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.68002e-06 [overlap_recompute_comm]: 2.04999e-06 [overlap_grad_ring_attention]: 5.22999e-06 [overlap_grad_flash_sp]: 2.778e-05 [begin_end_overlap_inline]: 6.89994e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.16997e-06 [symbol_engine_optimizer]: 0.00010152, [1] [Cycle 1]: 9.695e-05, [6] [build]: 1.152e-05 [elim_shapecalc]: 1.354e-05 [elim_not_effective]: 1.828e-05 [opt_reshape]: 1.026e-05 [fold_const_symbol]: 1.447e-05 [renormalize]: 2.80008e-07 [detach_backward]: 2.21e-06 
[pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.528e-05 [get_jit_bprop_graph]: 1.52001e-06 [rewriter_after_jit_bprop_graph]: 4.26001e-06 [opt_after_jit_grad]: 0.00051297 [validate]: 5.108e-05 [backend_pass]: 1.17e-06 [task_emit]: 3.50109 [execute]: 9.64e-06 Sums bootstrap : 0.000513s : 0.01% type_inference : 0.011950s : 0.34% event_method : 0.000050s : 0.00% auto_monad : 0.000122s : 0.00% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000053s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.00% optimize.rewriter_before_opt_a : 0.000150s : 0.00% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000135s : 0.00% optimize.opt_a.loop_unroll : 0.000116s : 0.00% optimize.opt_a.a_1 : 0.003520s : 0.10% optimize.opt_a.with_stream_mark : 0.000056s : 0.00% optimize.opt_a.recompute_prepare : 0.000050s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.00% optimize.opt_a.merge_send_recv : 0.000033s : 0.00% optimize.opt_a.auto_parallel : 0.000030s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000018s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000033s : 0.00% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000039s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001637s : 0.05% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000097s : 0.00% optimize.opt_a.a_after_grad : 0.000113s : 0.00% optimize.opt_a.renormalize : 0.003592s : 0.10% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.00% optimize.opt_a.cse : 0.000256s : 0.01% optimize.opt_a.a_3 : 0.000468s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000052s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000615s : 0.02% optimize.opt_b.b_1 : 0.000189s : 0.01% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000037s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000474s : 0.01% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000033s : 0.00% optimize.tuple_transform.d_1 : 0.000066s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.00% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000513s : 0.01% validate : 0.000051s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 3.501089s : 99.24% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000968 222 5.85% : 0.000057s : 12: substitution.arithmetic_simplify 10.85% : 0.000105s : 2: substitution.cast_eliminate 0.28% : 0.000003s : 5: substitution.elim_not_effective 0.42% : 0.000004s : 5: substitution.float_depend_g_call 0.44% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.21% : 0.000002s : 5: substitution.fold_const_symbol 0.81% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.20% : 0.000002s : 2: substitution.incorporate_call_switch 51.82% : 0.000501s : 17: substitution.inline 1.82% : 0.000018s : 2: substitution.inline_without_move 1.11% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.80% : 0.000017s : 3: substitution.less_batch_normalization 1.43% : 0.000014s : 11: substitution.minmaximum_grad 0.61% : 0.000006s : 5: substitution.partial_eliminate 1.42% : 0.000014s : 20: substitution.remove_not_recompute_node 2.83% : 0.000027s : 10: substitution.replace_applicator 1.25% : 0.000012s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.11% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.47% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 1.93% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.59% : 0.000073s : 30: substitution.tuple_list_get_item_eliminator 2.06% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011867 2 86.93% : 0.010316s : 1: type_inference.infer 13.07% : 0.001551s : 1: type_inference.specialize ------[replace.] 0.000239 33 60.16% : 0.000144s : 17: replace.inline 39.84% : 0.000095s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000531 33 92.88% : 0.000493s : 17: match.inline 7.12% : 0.000038s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000764 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.03% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.46% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.23% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000014s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000043s : 249: predicate.inline 1.22% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.61% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.40% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.35% : 0.000018s : 101: predicate.partial_defer_inline 1.71% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.63% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000015s : 152: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.26% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.42% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.22% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001704 34 57.21% : 0.000975s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.79% : 0.000729s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
3.559759 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.09% : 0.003257s : 1: add_attr 0.09% : 0.003246s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.00% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000130s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.02% : 0.000552s : 1: bootstrap 0.00% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.00% : 0.000058s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.01% : 0.000484s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.02% : 0.000625s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000019s : 1: opt.transform.mutable_eliminate 0.15% : 0.005217s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000174s : 28: opt.transform.opt_b 0.00% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000053s : 4: opt.transform.symbol_engine_opt 0.34% : 0.012202s : 1: opt_a 0.00% : 0.000146s : 1: opt_after_cconv 0.01% : 0.000523s : 1: opt_after_jit_grad 0.01% : 0.000302s : 1: opt_b 0.41% : 0.014745s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000058s : 1: pre_auto_parallel 0.00% : 0.000046s : 1: py_interpret_to_execute 0.00% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000037s : 1: remove_dup_value 0.06% : 0.001998s : 2: renormalize.infer 0.04% : 0.001577s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000057s : 1: rewriter_after_opt_a 0.00% : 0.000156s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000104s : 1: symbol_engine_optimizer 98.35% : 3.501112s : 1: task_emit 0.00% : 0.000105s : 1: 
tuple_transform 0.34% : 0.011969s : 1: type_inference 0.00% : 0.000083s : 1: validate TotalTime = 0.081381, [24] [bootstrap]: 0.0005158 [type_inference]: 0.00473125 [event_method]: 1.101e-05 [auto_monad]: 5.169e-05 [graph_reusing]: 5.44e-06 [inline]: 1.76998e-06 [add_attr]: 0.00300347, [1] [add_attr_with_inline]: 0.00299479, [1] [Cycle 1]: 5.163e-05, [2] [tag_attr]: 1.22e-05 [meta_addattr_fg_expand]: 3.43e-06 [parallel-infer-symbol]: 2.97002e-06 [pre_auto_parallel]: 7.255e-05 [insert-virtual-dataset]: 2.83e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00366235, [53] [py_interpret_to_execute]: 1.582e-05 [rewriter_before_opt_a]: 4.004e-05 [opt_a]: 0.00186325, [2] [Cycle 1]: 0.00126355, [45] [expand_dump_flag]: 2.28998e-06 [switch_simplify]: 2.452e-05 [loop_unroll]: 1.335e-05 [a_1]: 0.00028915 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 7.46001e-06 [updatestate_depend_eliminate]: 3.56001e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.507e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 8.3e-06 [auto_parallel]: 5.82999e-06 [parallel]: 1.683e-05 [flash_sp]: 7.72998e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 7.43999e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 3.52002e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.54999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 9.24e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 1.97001e-06 
[after_resolve]: 1.033e-05 [a_after_grad]: 8.40999e-06 [renormalize]: 0.00035737 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.763e-05 [a_3]: 4.031e-05 [Cycle 2]: 0.00058991, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.66e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012494 [with_stream_mark]: 9.04998e-06 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.639e-05 [accelerated_algorithm]: 5.43997e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.42001e-06 [parallel]: 3.98999e-06 [flash_sp]: 3.23e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 4.97e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.22001e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.64999e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.14998e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.28999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.07001e-06 [cse]: 1.335e-05 [a_3]: 3.231e-05 [py_interpret_to_execute_after_opt_a]: 7.83999e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.008e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.14998e-06 [mutable_eliminate]: 0.00044866 [opt_b]: 0.00017899, [1] [Cycle 1]: 0.00017305, 
[7] [b_1]: 0.00010581 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.15002e-06 [renormalize]: 4.19997e-07 [cse]: 1.613e-05 [optimize_parallel_all_gather_comm]: 1.538e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.228e-05 [loop_unroll]: 0.00041172 [opt_after_cconv]: 9.456e-05, [1] [Cycle 1]: 8.884e-05, [7] [c_1]: 2.742e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.653e-05 [renormalize]: 5.29981e-07 [remove_dup_value]: 1.192e-05 [tuple_transform]: 6.879e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.9e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.361e-05 [cse_after_recomputation]: 1.991e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.83001e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 3.86999e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.59999e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 8.79983e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.21002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.92002e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.636e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.747e-05, [1] [Cycle 1]: 6.351e-05, [6] [build]: 2.53e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.099e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.60001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.95001e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.522e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.22997e-06 [opt_after_jit_grad]: 0.00052459 [validate]: 3.231e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.0685173 [execute]: 8.97e-06 Sums bootstrap : 0.000516s : 0.67% type_inference : 0.004731s : 6.11% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000073s : 0.09% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000414s : 0.53% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000357s : 0.46% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.58% optimize.opt_b.b_1 : 0.000106s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.53% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000525s : 0.68% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.068517s : 88.51% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.89% : 0.000022s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.35% : 0.000005s : 4: substitution.graph_param_transform 65.98% : 0.000079s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.35% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004690 2 92.30% : 0.004329s : 1: type_inference.infer 7.70% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.29% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.79% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.93% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.14% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.26% : 0.000002s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.83% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 41.37% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.63% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.089319 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.37% : 0.003008s : 1: add_attr 3.36% : 0.002998s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.62% : 0.000551s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.47% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.85% : 0.000763s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.09% : 0.001866s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.60% : 0.000535s : 1: opt_after_jit_grad 0.20% : 0.000182s : 1: opt_b 4.10% : 0.003666s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000077s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.22% : 0.000197s : 1: renormalize.infer 0.17% : 0.000153s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.73% : 0.068538s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 5.31% : 0.004745s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.247964, [24] [bootstrap]: 0.00051295 [type_inference]: 0.0112266 [event_method]: 4.425e-05 [auto_monad]: 0.00011758 [graph_reusing]: 8.28001e-06 [inline]: 2.16e-06 [add_attr]: 0.00318074, [1] [add_attr_with_inline]: 0.00317046, [1] [Cycle 1]: 8.145e-05, [2] [tag_attr]: 3.702e-05 [meta_addattr_fg_expand]: 8.59e-06 [parallel-infer-symbol]: 3.53999e-06 [pre_auto_parallel]: 5.079e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.0143906, [53] [py_interpret_to_execute]: 3.801e-05 [rewriter_before_opt_a]: 0.00018352 [opt_a]: 0.0118806, [3] [Cycle 1]: 0.00760454, [45] [expand_dump_flag]: 4.35999e-06 [switch_simplify]: 6.811e-05 [loop_unroll]: 5.431e-05 [a_1]: 0.00138613 [with_stream_mark]: 2.596e-05 [recompute_prepare]: 2.326e-05 [updatestate_depend_eliminate]: 9.20001e-06 [updatestate_assign_eliminate]: 8.23999e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.88e-06 [a_2]: 0.00025084 [accelerated_algorithm]: 3.246e-05 [shard]: 2.26e-06 [meta_shard_fg_expand]: 3.46001e-06 [shard_inline]: 1.621e-05 [merge_send_recv]: 1.663e-05 [auto_parallel]: 1.185e-05 [parallel]: 1.934e-05 [flash_sp]: 1.215e-05 [merge_comm]: 9.90002e-06 [allreduce_fusion]: 8.80001e-06 [matmul_add_comm_reduction]: 3.003e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.907e-05 [virtual_dataset]: 1.584e-05 [get_grad_eliminate_]: 1.516e-05 [virtual_output]: 1.513e-05 [merge_forward]: 1.066e-05 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.857e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.844e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.679e-05 [set_forward_comm_id_for_comm_node_pass]: 1.031e-05 [meta_fg_expand]: 0.00154631 [flash_sp_send_recv_attached]: 4.1e-06 [receive_attached]: 2.29001e-06 [after_resolve]: 
6.131e-05 [a_after_grad]: 8.238e-05 [renormalize]: 0.00278132 [add_forward_monad_depend]: 1.035e-05 [auto_monad_grad]: 6.12999e-06 [auto_monad_eliminator]: 5.792e-05 [cse]: 0.00023531 [a_3]: 0.00034171 [Cycle 2]: 0.00314959, [45] [expand_dump_flag]: 2.26e-06 [switch_simplify]: 4.733e-05 [loop_unroll]: 4.352e-05 [a_1]: 0.00157821 [with_stream_mark]: 1.531e-05 [recompute_prepare]: 1.187e-05 [updatestate_depend_eliminate]: 6.02999e-06 [updatestate_assign_eliminate]: 4.77998e-06 [updatestate_loads_eliminate]: 3.90998e-06 [parameter_eliminate]: 1.15001e-06 [a_2]: 0.00012927 [accelerated_algorithm]: 1.3e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.99e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.90998e-06 [auto_parallel]: 8.67998e-06 [parallel]: 6.16998e-06 [flash_sp]: 3.68e-06 [merge_comm]: 5.69e-06 [allreduce_fusion]: 5.04e-06 [matmul_add_comm_reduction]: 9.05001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.047e-05 [virtual_dataset]: 8.84e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 5.05001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.85002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.728e-05 [merge_recompute_call_nodes]: 1.02998e-06 [before_grad]: 1.471e-05 [set_forward_comm_id_for_comm_node_pass]: 5.45001e-06 [meta_fg_expand]: 4.431e-05 [flash_sp_send_recv_attached]: 1.20999e-06 [receive_attached]: 1.62001e-06 [after_resolve]: 1.628e-05 [a_after_grad]: 1.487e-05 [renormalize]: 0.00066827 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 2.17001e-06 [auto_monad_eliminator]: 1.698e-05 [cse]: 5.009e-05 [a_3]: 6.675e-05 [Cycle 3]: 0.0011088, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 1.101e-05 [loop_unroll]: 8.95001e-06 [a_1]: 0.00025379 [with_stream_mark]: 1.088e-05 [recompute_prepare]: 9.31e-06 [updatestate_depend_eliminate]: 4.87998e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 7.153e-05 
[parameter_eliminate]: 1.41002e-06 [a_2]: 0.00014777 [accelerated_algorithm]: 2.995e-05 [shard]: 1.46998e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.37001e-06 [merge_send_recv]: 8.67998e-06 [auto_parallel]: 8.37e-06 [parallel]: 1.95e-05 [flash_sp]: 1.39e-06 [merge_comm]: 5.29e-06 [allreduce_fusion]: 4.94998e-06 [matmul_add_comm_reduction]: 2.134e-05 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 1.094e-05 [virtual_dataset]: 9.12999e-06 [get_grad_eliminate_]: 8.58001e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.70999e-06 [cell_reuse_recompute_pass]: 1.67999e-06 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.778e-05 [merge_recompute_call_nodes]: 8.90024e-07 [before_grad]: 1.411e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37999e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.60999e-06 [after_resolve]: 1.566e-05 [a_after_grad]: 1.49e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.12999e-06 [auto_monad_grad]: 1.45001e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 3.243e-05 [a_3]: 6.033e-05 [py_interpret_to_execute_after_opt_a]: 1.469e-05 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 4.968e-05 [convert_after_rewriter]: 9.60001e-06 [order_py_execute_after_rewriter]: 7.03e-06 [mutable_eliminate]: 0.00053768 [opt_b]: 0.00029992, [1] [Cycle 1]: 0.00029214, [7] [b_1]: 0.00019067 [b_2]: 1.087e-05 [updatestate_depend_eliminate]: 8.54e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 4.42998e-06 [renormalize]: 2.69996e-07 [cse]: 3.744e-05 [optimize_parallel_all_gather_comm]: 2.339e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.331e-05 [loop_unroll]: 0.00046896 [opt_after_cconv]: 0.00014497, [1] [Cycle 1]: 0.00013727, [7] [c_1]: 4.836e-05 [parameter_eliminate]: 2.61e-06 [updatestate_depend_eliminate]: 8.25e-06 [updatestate_assign_eliminate]: 4.29002e-06 
[updatestate_loads_eliminate]: 3.97002e-06 [cse]: 3.521e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 3.28e-05 [tuple_transform]: 0.0001037, [1] [Cycle 1]: 9.891e-05, [4] [d_1]: 6.736e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.038e-05 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 6.355e-05 [cse_after_recomputation]: 3.535e-05, [1] [Cycle 1]: 3.032e-05, [1] [cse]: 2.414e-05 [environ_conv]: 1.042e-05 [swap_dp_allreduce_reducescatter]: 8.33001e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.72998e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.97002e-06 [comm_op_add_attrs]: 1.30999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.37999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49e-06 [control_data_broadcast_order]: 1.832e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 5.39e-06 [overlap_recompute_and_grad_model_parallel]: 5.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.60999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 5.22e-06 [overlap_grad_flash_sp]: 2.759e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.42001e-06 [split_layernorm_comm]: 1.61002e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 0.00010886, [1] [Cycle 1]: 0.00010379, [6] [build]: 1.156e-05 [elim_shapecalc]: 1.582e-05 [elim_not_effective]: 1.891e-05 [opt_reshape]: 1.045e-05 [fold_const_symbol]: 1.507e-05 [renormalize]: 
2.50002e-07 [detach_backward]: 2.34001e-06 [pipeline_parallel_scheduler]: 1.69998e-06 [auto_monad_reorder]: 2.607e-05 [get_jit_bprop_graph]: 1.76e-06 [rewriter_after_jit_bprop_graph]: 4.25e-06 [opt_after_jit_grad]: 0.00052451 [validate]: 4.943e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.217551 [execute]: 8.53001e-06 Sums bootstrap : 0.000513s : 0.21% type_inference : 0.011227s : 4.61% event_method : 0.000044s : 0.02% auto_monad : 0.000118s : 0.05% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000051s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.02% optimize.rewriter_before_opt_a : 0.000184s : 0.08% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000126s : 0.05% optimize.opt_a.loop_unroll : 0.000107s : 0.04% optimize.opt_a.a_1 : 0.003218s : 1.32% optimize.opt_a.with_stream_mark : 0.000052s : 0.02% optimize.opt_a.recompute_prepare : 0.000044s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000083s : 0.03% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000528s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000075s : 0.03% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.01% optimize.opt_a.merge_send_recv : 0.000033s : 0.01% optimize.opt_a.auto_parallel : 0.000029s : 0.01% optimize.opt_a.parallel : 0.000045s : 0.02% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 
0.000021s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000060s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.02% optimize.opt_a.virtual_dataset : 0.000034s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.01% optimize.opt_a.virtual_output : 0.000032s : 0.01% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001594s : 0.65% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000093s : 0.04% optimize.opt_a.a_after_grad : 0.000112s : 0.05% optimize.opt_a.renormalize : 0.003450s : 1.42% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000010s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.04% optimize.opt_a.cse : 0.000318s : 0.13% optimize.opt_a.a_3 : 0.000469s : 0.19% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.02% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000538s : 0.22% optimize.opt_b.b_1 : 0.000191s : 0.08% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000037s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.01% optimize.loop_unroll : 0.000469s : 0.19% optimize.opt_after_cconv.c_1 : 0.000048s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000035s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000033s : 0.01% optimize.tuple_transform.d_1 : 0.000067s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000064s : 0.03% optimize.cse_after_recomputation.cse : 0.000024s : 0.01% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000525s : 0.22% validate : 0.000049s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.217551s : 89.39% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000824 218 6.15% : 0.000051s : 11: substitution.arithmetic_simplify 1.92% : 0.000016s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 56.09% : 0.000462s : 16: substitution.inline 2.15% : 0.000018s : 2: substitution.inline_without_move 1.28% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000017s : 3: substitution.less_batch_normalization 1.63% : 0.000013s : 11: substitution.minmaximum_grad 0.63% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000014s : 20: substitution.remove_not_recompute_node 3.27% : 0.000027s : 10: substitution.replace_applicator 1.51% : 0.000012s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.70% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.66% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.11% : 0.000067s : 28: substitution.tuple_list_get_item_eliminator 2.35% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011148 2 87.22% : 0.009722s : 1: type_inference.infer 12.78% : 0.001425s : 1: type_inference.specialize ------[replace.] 0.000210 30 60.50% : 0.000127s : 16: replace.inline 39.50% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000486 30 93.32% : 0.000453s : 16: match.inline 6.68% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5663 1.12% : 0.000008s : 67: predicate.accumulaten_eliminater 0.32% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.14% : 0.000016s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.47% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.65% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.61% : 0.000019s : 164: predicate.load_eliminater 0.47% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.42% : 0.000003s : 8: predicate.mutable_eliminate 0.18% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.08% : 0.000015s : 97: predicate.partial_defer_inline 1.66% : 0.000012s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.57% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.62% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 149: predicate.replace_applicator 0.67% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.34% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.71% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.33% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.78% : 0.000013s : 97: predicate.switch_defer_inline 2.88% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000037s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.04% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.97% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002123 32 65.58% : 0.001393s : 12: func_graph_cloner_run.FuncGraphClonerGraph 34.42% : 0.000731s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.274237 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.16% : 0.003186s : 1: add_attr 1.16% : 0.003174s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000069s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000126s : 1: auto_monad 0.01% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.20% : 0.000551s : 1: bootstrap 0.01% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.01% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.02% : 0.000052s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.17% : 0.000479s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.20% : 0.000548s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 1.80% : 0.004927s : 117: opt.transform.opt_a 0.02% : 0.000047s : 1: opt.transform.opt_after_cconv 0.01% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000175s : 28: opt.transform.opt_b 0.03% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000057s : 4: opt.transform.symbol_engine_opt 4.33% : 0.011884s : 1: opt_a 0.05% : 0.000149s : 1: opt_after_cconv 0.20% : 0.000538s : 1: opt_after_jit_grad 0.11% : 0.000304s : 1: opt_b 5.25% : 0.014395s : 1: optimize 0.01% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000056s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000037s : 1: remove_dup_value 0.68% : 0.001861s : 2: renormalize.infer 0.57% : 0.001574s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000054s : 1: rewriter_after_opt_a 0.07% : 0.000191s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000112s : 1: symbol_engine_optimizer 79.34% : 0.217574s : 1: task_emit 0.04% : 0.000107s : 1: 
tuple_transform 4.10% : 0.011247s : 1: type_inference 0.03% : 0.000081s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x6-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x7-pynative],max_mem:6.0M TotalTime = 0.0218358, [24] [bootstrap]: 0.00051745 [type_inference]: 0.00638148 [event_method]: 1.394e-05 [auto_monad]: 5.523e-05 [graph_reusing]: 5.30001e-06 [inline]: 1.79e-06 [add_attr]: 0.00350987, [1] [add_attr_with_inline]: 0.00349864, [1] [Cycle 1]: 4.405e-05, [2] [tag_attr]: 1.561e-05 [meta_addattr_fg_expand]: 3.88001e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.962e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00399837, [53] [py_interpret_to_execute]: 2.006e-05 [rewriter_before_opt_a]: 5.921e-05 [opt_a]: 0.00212549, [2] [Cycle 1]: 0.00152753, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 3.299e-05 [loop_unroll]: 2.077e-05 [a_1]: 0.00045285 [with_stream_mark]: 1.33e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.472e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 1.79e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.30999e-06 [auto_parallel]: 5.82999e-06 [parallel]: 2.62e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 8.55001e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 6.87002e-06 [virtual_dataset]: 6.04001e-06 
[get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 8.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 [merge_recompute_call_nodes]: 1.56998e-06 [before_grad]: 9.91e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40998e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.36998e-06 [receive_attached]: 2.39999e-06 [after_resolve]: 1.039e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 0.00042951 [add_forward_monad_depend]: 5.09e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.343e-05 [cse]: 2.598e-05 [a_3]: 4.089e-05 [Cycle 2]: 0.00058864, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.25001e-06 [a_1]: 0.00012409 [with_stream_mark]: 9.32001e-06 [recompute_prepare]: 5.75001e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.48002e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.805e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.35e-06 [flash_sp]: 2.82002e-06 [merge_comm]: 2.88998e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.18002e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.73997e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 9.17001e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.71e-06 [cse]: 1.295e-05 [a_3]: 3.152e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 2.809e-05 [convert_after_rewriter]: 7.28999e-06 [order_py_execute_after_rewriter]: 5.26002e-06 [mutable_eliminate]: 0.0004518 [opt_b]: 0.00019523, [1] [Cycle 1]: 0.00018923, [7] [b_1]: 0.00010625 [b_2]: 1.987e-05 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 2.30008e-07 [cse]: 1.669e-05 [optimize_parallel_all_gather_comm]: 1.63e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 2.138e-05 [loop_unroll]: 0.00042696 [opt_after_cconv]: 9.725e-05, [1] [Cycle 1]: 9.124e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.50002e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.742e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 1.209e-05 [tuple_transform]: 6.873e-05, [1] [Cycle 1]: 6.423e-05, [4] [d_1]: 3.92e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 4.959e-05 [cse_after_recomputation]: 2.109e-05, [1] [Cycle 1]: 1.672e-05, [1] [cse]: 1.17e-05 [environ_conv]: 5.28002e-06 [swap_dp_allreduce_reducescatter]: 5.33002e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.45999e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.24998e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.25999e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 
1.03001e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.41998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.52998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.96e-06 [split_layernorm_comm]: 1.55001e-06 [handle_group_info]: 1.26002e-06 [symbol_engine_optimizer]: 6.95e-05, [1] [Cycle 1]: 6.517e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 8.82e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 8.70001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.43002e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.526e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 0.00011275 [opt_after_jit_grad]: 0.00045696 [validate]: 3.419e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00647537 [execute]: 6.33998e-06 Sums bootstrap : 0.000517s : 2.98% type_inference : 0.006381s : 36.78% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000577s : 3.33% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000031s : 0.18% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000014s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000430s : 2.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.22% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.60% optimize.opt_b.b_1 : 0.000106s : 0.61% optimize.opt_b.b_2 : 0.000020s : 0.11% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.12% optimize.loop_unroll : 0.000427s : 2.46% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000113s : 0.65% opt_after_jit_grad : 0.000457s : 2.63% validate : 0.000034s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006475s : 37.32% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.46% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.10% : 0.000005s : 4: substitution.graph_param_transform 67.11% : 0.000110s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.69% : 0.000004s : 4: substitution.replace_old_param 6.67% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006335 2 90.85% : 0.005756s : 1: type_inference.infer 9.15% : 0.000580s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.06% : 0.000027s : 3: replace.inline 29.94% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.64% : 0.000108s : 3: match.inline 8.36% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.57% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000002s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.19% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 47.41% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.59% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030879 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.38% : 0.003514s : 1: add_attr 11.34% : 0.003502s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.80% : 0.000555s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.41% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.05% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000102s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.89% : 0.002128s : 1: opt_a 0.33% : 0.000101s : 1: opt_after_cconv 1.51% : 0.000467s : 1: opt_after_jit_grad 0.64% : 0.000199s : 1: opt_b 12.96% : 0.004002s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.72% : 0.000221s : 1: renormalize.infer 0.65% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.38% : 0.000118s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 21.00% : 0.006486s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.71% : 0.006396s : 1: type_inference 0.21% : 0.000066s : 1: validate TotalTime = 0.0202721, [24] [bootstrap]: 0.00049654 [type_inference]: 0.00480876 [event_method]: 1.148e-05 [auto_monad]: 5.352e-05 [graph_reusing]: 5.30999e-06 [inline]: 2.01e-06 [add_attr]: 0.00326974, [1] [add_attr_with_inline]: 0.00326053, [1] [Cycle 1]: 6.354e-05, [2] [tag_attr]: 1.379e-05 [meta_addattr_fg_expand]: 3.71999e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 2.697e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00421213, [53] [py_interpret_to_execute]: 1.727e-05 [rewriter_before_opt_a]: 4.021e-05 [opt_a]: 0.00218207, [2] [Cycle 1]: 0.00146284, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 2.509e-05 [loop_unroll]: 1.433e-05 [a_1]: 0.00030532 [with_stream_mark]: 1.594e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.658e-05 [accelerated_algorithm]: 7.28999e-06 [shard]: 2.49001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8.56002e-06 [auto_parallel]: 6.32001e-06 [parallel]: 1.842e-05 [flash_sp]: 9.27001e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 9.99001e-06 [allreduce_slice_to_reducescatter]: 1.06997e-06 [virtual_shard_identity]: 8.23999e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.50001e-06 [offload_activation]: 9.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.109e-05 [merge_recompute_call_nodes]: 1.58002e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.93001e-06 [meta_fg_expand]: 2.07999e-06 [flash_sp_send_recv_attached]: 2.74999e-06 [receive_attached]: 2.51e-06 
[after_resolve]: 1.1e-05 [a_after_grad]: 9.66998e-06 [renormalize]: 0.00048914 [add_forward_monad_depend]: 5.69e-06 [auto_monad_grad]: 2.21e-06 [auto_monad_eliminator]: 1.606e-05 [cse]: 2.849e-05 [a_3]: 4.337e-05 [Cycle 2]: 0.00070869, [45] [expand_dump_flag]: 1.43002e-06 [switch_simplify]: 7.5e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012946 [with_stream_mark]: 1.202e-05 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.76999e-06 [parameter_eliminate]: 1.20001e-06 [a_2]: 6.895e-05 [accelerated_algorithm]: 5.91998e-06 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 6.21e-06 [auto_parallel]: 5.86e-06 [parallel]: 5.47999e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 3.22002e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 6.98e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 6.54999e-06 [virtual_dataset]: 5.07999e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.76002e-06 [merge_forward]: 2.94001e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 7.16001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.025e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 6.215e-05 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 9.60019e-07 [receive_attached]: 1.54998e-06 [after_resolve]: 2.219e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.89e-06 [auto_monad_grad]: 1.46998e-06 [auto_monad_eliminator]: 9.26998e-06 [cse]: 1.73e-05 [a_3]: 3.34e-05 [py_interpret_to_execute_after_opt_a]: 1.072e-05 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.57e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00054513 [opt_b]: 0.00019393, [1] [Cycle 1]: 0.00018655, [7] [b_1]: 0.00010863 [b_2]: 
7.58001e-06 [updatestate_depend_eliminate]: 7.5e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 4.60015e-07 [cse]: 2.146e-05 [optimize_parallel_all_gather_comm]: 1.842e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.731e-05 [loop_unroll]: 0.00046736 [opt_after_cconv]: 0.00010029, [1] [Cycle 1]: 9.404e-05, [7] [c_1]: 2.817e-05 [parameter_eliminate]: 3.3e-06 [updatestate_depend_eliminate]: 6.11e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.778e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.253e-05 [tuple_transform]: 7.079e-05, [1] [Cycle 1]: 6.651e-05, [4] [d_1]: 4.058e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 5.91e-06 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 4.827e-05 [cse_after_recomputation]: 2.174e-05, [1] [Cycle 1]: 1.729e-05, [1] [cse]: 1.125e-05 [environ_conv]: 4.77998e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 5.24998e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.35001e-06 [full_micro_interleaved_order_control]: 2.66e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 2.07999e-06 [control_data_broadcast_order]: 1.291e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 4.07003e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.64998e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.984e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.571e-05, [1] [Cycle 1]: 7.089e-05, [6] [build]: 3.66999e-06 [elim_shapecalc]: 1.049e-05 [elim_not_effective]: 1.171e-05 [opt_reshape]: 6.15002e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 1.70025e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.882e-05 [get_jit_bprop_graph]: 1.59e-06 [rewriter_after_jit_bprop_graph]: 4.97e-06 [opt_after_jit_grad]: 0.00052191 [validate]: 3.747e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.00654327 [execute]: 8.74e-06 Sums bootstrap : 0.000497s : 3.11% type_inference : 0.004809s : 30.16% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.25% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.20% optimize.opt_a.loop_unroll : 0.000020s : 0.12% optimize.opt_a.a_1 : 0.000435s : 2.73% optimize.opt_a.with_stream_mark : 0.000028s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000005s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000015s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000072s : 0.45% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000033s : 0.21% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000489s : 3.07% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.05% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.16% optimize.opt_a.cse : 0.000046s : 0.29% optimize.opt_a.a_3 : 0.000077s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.07% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000545s : 3.42% optimize.opt_b.b_1 : 0.000109s : 0.68% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.13% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.17% optimize.loop_unroll : 0.000467s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.07% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000019s : 0.12% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000522s : 3.27% validate : 0.000037s : 0.24% backend_pass : 0.000001s : 0.01% task_emit : 0.006543s : 41.04% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000135 26 19.19% : 0.000026s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.04% : 0.000005s : 4: substitution.graph_param_transform 65.07% : 0.000088s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.27% : 0.000004s : 4: substitution.remove_not_recompute_node 3.53% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004758 2 92.27% : 0.004390s : 1: type_inference.infer 7.73% : 0.000368s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000085 2 100.00% : 0.000085s : 2: match.inline ------[predicate.] 
0.000196 984 0.59% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.48% : 0.000001s : 8: predicate.addn_check_dump 0.54% : 0.000001s : 9: predicate.addn_zero_filter 0.49% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.79% : 0.000004s : 17: predicate.arithmetic_simplify 0.55% : 0.000001s : 9: predicate.cast_eliminate 0.56% : 0.000001s : 8: predicate.check_bprop_eliminate 0.46% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.49% : 0.000001s : 8: predicate.depend_value_elim 0.56% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.62% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.56% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.56% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.79% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.72% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.74% : 0.000001s : 13: predicate.environ_get_depend_swap 1.33% : 0.000003s : 21: predicate.environ_get_eliminate 0.73% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.66% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.23% : 0.000002s : 11: predicate.float_depend_g_call 0.46% : 0.000001s : 8: predicate.float_environ_get_switch 0.69% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.60% : 0.000001s : 8: predicate.get_grad_eliminate 0.19% : 0.000000s : 4: predicate.graph_param_transform 0.58% : 0.000001s : 8: predicate.incorporate_call 0.46% : 0.000001s : 8: predicate.incorporate_call_switch 4.37% : 0.000009s : 44: predicate.inline 0.72% : 0.000001s : 8: predicate.inline_without_move 27.19% : 0.000053s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000002s : 8: predicate.less_batch_normalization 1.17% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.49% : 0.000003s : 26: predicate.load_eliminater 1.50% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.23% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.17% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.49% : 0.000001s : 8: predicate.merge_addn 0.47% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.55% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.48% : 0.000001s : 9: predicate.minmaximum_grad 1.74% : 0.000003s : 4: predicate.mutable_eliminate 0.29% : 0.000001s : 4: predicate.opt_reshape 0.31% : 0.000001s : 4: predicate.parallel_virtual_node 0.87% : 0.000002s : 11: predicate.partial_defer_inline 0.83% : 0.000002s : 13: predicate.partial_eliminate 0.54% : 0.000001s : 9: predicate.print_const_string_wrapper 0.52% : 0.000001s : 8: predicate.reduce_all_const_elim 0.73% : 0.000001s : 9: predicate.reduce_eliminate 1.51% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 0.91% : 0.000002s : 17: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.26% : 0.000001s : 4: predicate.reset_defer_inline 0.52% : 0.000001s : 9: predicate.reshape_eliminate 0.61% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.34% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000002s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000002s : 8: predicate.shard_identity_eliminate 0.60% : 0.000001s : 8: predicate.special_op_eliminate 0.66% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.31% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.70% : 0.000001s : 11: predicate.switch_defer_inline 1.22% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.14% : 0.000006s : 41: predicate.switch_simplify 0.50% : 
0.000001s : 9: predicate.tile_eliminate 0.55% : 0.000001s : 9: predicate.transpose_eliminate 1.08% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.07% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 0.96% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.34% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 0.97% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.55% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.08% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.64% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.15% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.30% : 0.000001s : 4: predicate.value_based_eliminate 0.53% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.61% : 0.000001s : 8: predicate.virtual_output_eliminate 0.28% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000274 6 40.43% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.57% : 0.000164s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029272 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.19% : 0.003274s : 1: add_attr 11.15% : 0.003264s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000059s : 1: auto_monad 0.08% : 0.000023s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.83% : 0.000536s : 1: bootstrap 0.11% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.63% : 0.000478s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.90% : 0.000557s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.06% : 0.000018s : 1: opt.transform.mutable_eliminate 2.95% : 0.000864s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.47% : 0.002185s : 1: opt_a 0.35% : 0.000104s : 1: opt_after_cconv 1.83% : 0.000535s : 1: opt_after_jit_grad 0.68% : 0.000198s : 1: opt_b 14.40% : 0.004216s : 1: optimize 0.08% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.07% : 0.000021s : 1: py_interpret_to_execute 0.05% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.98% : 0.000287s : 1: renormalize.infer 0.66% : 0.000193s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000040s : 1: rewriter_after_opt_a 0.15% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000078s : 1: symbol_engine_optimizer 22.41% : 0.006559s : 1: task_emit 0.25% : 0.000074s : 1: 
tuple_transform 16.50% : 0.004831s : 1: type_inference 0.25% : 0.000073s : 1: validate TotalTime = 0.0224343, [24] [bootstrap]: 0.00050745 [type_inference]: 0.0064732 [event_method]: 1.704e-05 [auto_monad]: 5.79e-05 [graph_reusing]: 5.20001e-06 [inline]: 1.99e-06 [add_attr]: 0.00344511, [1] [add_attr_with_inline]: 0.00343435, [1] [Cycle 1]: 7.057e-05, [2] [tag_attr]: 1.826e-05 [meta_addattr_fg_expand]: 4.25999e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 3.117e-05 [insert-virtual-dataset]: 2.38998e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00449835, [53] [py_interpret_to_execute]: 2.464e-05 [rewriter_before_opt_a]: 6.071e-05 [opt_a]: 0.00240746, [2] [Cycle 1]: 0.0017675, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 3.365e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00047412 [with_stream_mark]: 1.706e-05 [recompute_prepare]: 9.57001e-06 [updatestate_depend_eliminate]: 4.27e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.21001e-06 [parameter_eliminate]: 1.53002e-06 [a_2]: 7.757e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 8.44998e-06 [auto_parallel]: 6.79001e-06 [parallel]: 1.732e-05 [flash_sp]: 9.51998e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.50003e-06 [matmul_add_comm_reduction]: 1.006e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 8.62998e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.086e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 1.148e-05 [a_after_grad]: 9.02e-06 [renormalize]: 0.00060407 [add_forward_monad_depend]: 5.52001e-06 [auto_monad_grad]: 2.54999e-06 [auto_monad_eliminator]: 1.549e-05 [cse]: 2.756e-05 [a_3]: 4.451e-05 [Cycle 2]: 0.00062868, [45] [expand_dump_flag]: 1.67999e-06 [switch_simplify]: 7.53e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012739 [with_stream_mark]: 1.267e-05 [recompute_prepare]: 6.29001e-06 [updatestate_depend_eliminate]: 3.07002e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 6.787e-05 [accelerated_algorithm]: 5.97999e-06 [shard]: 1.73002e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 5.96998e-06 [auto_parallel]: 5.75001e-06 [parallel]: 5.91e-06 [flash_sp]: 3.7e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 7.28e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 6.33998e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.89e-06 [merge_forward]: 3.36999e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 7.41001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.02e-06 [before_grad]: 8.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.59998e-06 [after_resolve]: 1.074e-05 [a_after_grad]: 8e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.45001e-06 [auto_monad_grad]: 1.57001e-06 [auto_monad_eliminator]: 8.95999e-06 [cse]: 1.866e-05 [a_3]: 3.291e-05 [py_interpret_to_execute_after_opt_a]: 1.033e-05 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.628e-05 [convert_after_rewriter]: 6.93998e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00055638 [opt_b]: 0.00023314, [1] [Cycle 1]: 0.00022525, [7] [b_1]: 0.00010773 
[b_2]: 7.25e-06 [updatestate_depend_eliminate]: 6.81999e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 2.80008e-07 [cse]: 2.2e-05 [optimize_parallel_all_gather_comm]: 1.866e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.669e-05 [loop_unroll]: 0.00045725 [opt_after_cconv]: 0.00010271, [1] [Cycle 1]: 9.554e-05, [7] [c_1]: 2.828e-05 [parameter_eliminate]: 3.07002e-06 [updatestate_depend_eliminate]: 5.81e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [cse]: 1.959e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.269e-05 [tuple_transform]: 7.088e-05, [1] [Cycle 1]: 6.66e-05, [4] [d_1]: 4.058e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.48e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 4.649e-05 [cse_after_recomputation]: 2.046e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.072e-05 [environ_conv]: 5.76998e-06 [swap_dp_allreduce_reducescatter]: 5.82001e-06 [bias_add_comm_swap]: 2.72001e-06 [label_micro_interleaved_index]: 5.41002e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.26997e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.269e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.45002e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.844e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 1.94999e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 7.31e-05, [1] [Cycle 1]: 6.893e-05, [6] [build]: 3.13e-06 [elim_shapecalc]: 9.49999e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 7.19001e-06 [fold_const_symbol]: 8.78001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 1.73e-05 [get_jit_bprop_graph]: 1.68002e-06 [rewriter_after_jit_bprop_graph]: 5.29e-06 [opt_after_jit_grad]: 0.00050763 [validate]: 3.74e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00656308 [execute]: 7.86001e-06 Sums bootstrap : 0.000507s : 2.84% type_inference : 0.006473s : 36.17% event_method : 0.000017s : 0.10% auto_monad : 0.000058s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000031s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000025s : 0.14% optimize.rewriter_before_opt_a : 0.000061s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.14% optimize.opt_a.a_1 : 0.000602s : 3.36% optimize.opt_a.with_stream_mark : 0.000030s : 0.17% optimize.opt_a.recompute_prepare : 0.000016s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.81% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.13% optimize.opt_a.flash_sp : 0.000013s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000022s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000604s : 3.38% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.14% optimize.opt_a.cse : 0.000046s : 0.26% optimize.opt_a.a_3 : 0.000077s : 0.43% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000556s : 3.11% optimize.opt_b.b_1 : 0.000108s : 0.60% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.15% optimize.loop_unroll : 0.000457s : 2.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.26% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000508s : 2.84% validate : 0.000037s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006563s : 36.67% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000188 30 14.62% : 0.000028s : 5: substitution.arithmetic_simplify 0.90% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.00% : 0.000006s : 4: substitution.graph_param_transform 67.15% : 0.000126s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.41% : 0.000005s : 4: substitution.remove_not_recompute_node 2.67% : 0.000005s : 4: substitution.replace_old_param 6.69% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006417 2 90.49% : 0.005807s : 1: type_inference.infer 9.51% : 0.000610s : 1: type_inference.specialize ------[replace.] 0.000041 5 71.11% : 0.000029s : 3: replace.inline 28.89% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000136 5 91.46% : 0.000124s : 3: match.inline 8.54% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.23% : 0.000010s : 51: predicate.inline 1.00% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.79% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.65% : 0.000003s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.80% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.23% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000002s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.84% : 0.000008s : 54: predicate.switch_simplify 0.77% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.38% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.24% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000411 8 44.67% : 0.000184s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.33% : 0.000227s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032124 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.74% : 0.003450s : 1: add_attr 10.71% : 0.003439s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000064s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.70% : 0.000547s : 1: bootstrap 0.09% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000024s : 1: event_method 0.04% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.46% : 0.000468s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000569s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000017s : 1: opt.transform.mutable_eliminate 3.05% : 0.000981s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.50% : 0.002411s : 1: opt_a 0.33% : 0.000106s : 1: opt_after_cconv 1.62% : 0.000521s : 1: opt_after_jit_grad 0.74% : 0.000237s : 1: opt_b 14.02% : 0.004503s : 1: optimize 0.07% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000036s : 1: pre_auto_parallel 0.09% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 1.04% : 0.000333s : 1: renormalize.infer 0.82% : 0.000263s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000040s : 1: rewriter_after_opt_a 0.20% : 0.000065s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 20.48% : 0.006580s : 1: task_emit 0.23% : 0.000074s : 1: 
tuple_transform 20.23% : 0.006498s : 1: type_inference 0.22% : 0.000072s : 1: validate TotalTime = 0.03965, [24] [bootstrap]: 0.00051613 [type_inference]: 0.0120053 [event_method]: 4.768e-05 [auto_monad]: 0.00012193 [graph_reusing]: 8.55001e-06 [inline]: 2.16e-06 [add_attr]: 0.00315606, [1] [add_attr_with_inline]: 0.00314631, [1] [Cycle 1]: 7.788e-05, [2] [tag_attr]: 3.656e-05 [meta_addattr_fg_expand]: 9.20999e-06 [parallel-infer-symbol]: 3.61001e-06 [pre_auto_parallel]: 5.205e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0142717, [53] [py_interpret_to_execute]: 4.001e-05 [rewriter_before_opt_a]: 0.00014906 [opt_a]: 0.0118375, [3] [Cycle 1]: 0.00766636, [45] [expand_dump_flag]: 3.56001e-06 [switch_simplify]: 7.591e-05 [loop_unroll]: 6.26e-05 [a_1]: 0.00148089 [with_stream_mark]: 2.516e-05 [recompute_prepare]: 2.363e-05 [updatestate_depend_eliminate]: 9.51e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.44002e-06 [parameter_eliminate]: 2.73e-06 [a_2]: 0.00024455 [accelerated_algorithm]: 3.196e-05 [shard]: 2.01e-06 [meta_shard_fg_expand]: 3.4e-06 [shard_inline]: 1.613e-05 [merge_send_recv]: 1.637e-05 [auto_parallel]: 1.203e-05 [parallel]: 1.986e-05 [flash_sp]: 1.157e-05 [merge_comm]: 1.055e-05 [allreduce_fusion]: 9.25001e-06 [matmul_add_comm_reduction]: 2.894e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 1.896e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.486e-05 [merge_forward]: 1.03e-05 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 1.866e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.008e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.741e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00158767 [flash_sp_send_recv_attached]: 3.78999e-06 [receive_attached]: 2.46e-06 [after_resolve]: 6.296e-05 
[a_after_grad]: 8.341e-05 [renormalize]: 0.00276571 [add_forward_monad_depend]: 1.091e-05 [auto_monad_grad]: 6.08002e-06 [auto_monad_eliminator]: 5.875e-05 [cse]: 0.00016541 [a_3]: 0.00033857 [Cycle 2]: 0.00322835, [45] [expand_dump_flag]: 2.08998e-06 [switch_simplify]: 4.79e-05 [loop_unroll]: 4.313e-05 [a_1]: 0.00156632 [with_stream_mark]: 1.531e-05 [recompute_prepare]: 1.261e-05 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 4.42e-06 [updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 1.14e-06 [a_2]: 0.00012689 [accelerated_algorithm]: 1.292e-05 [shard]: 1.58002e-06 [meta_shard_fg_expand]: 2.00002e-06 [shard_inline]: 9.50001e-06 [merge_send_recv]: 7.45998e-06 [auto_parallel]: 8.81002e-06 [parallel]: 6.33e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 5.27001e-06 [allreduce_fusion]: 5.09e-06 [matmul_add_comm_reduction]: 1.003e-05 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 1.07e-05 [virtual_dataset]: 9.08002e-06 [get_grad_eliminate_]: 9.49999e-06 [virtual_output]: 8.51002e-06 [merge_forward]: 5.55001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 1.054e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.731e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.443e-05 [set_forward_comm_id_for_comm_node_pass]: 5.54e-06 [meta_fg_expand]: 8.924e-05 [flash_sp_send_recv_attached]: 1.27e-06 [receive_attached]: 1.54e-06 [after_resolve]: 1.914e-05 [a_after_grad]: 1.507e-05 [renormalize]: 0.00067049 [add_forward_monad_depend]: 4.82998e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.552e-05 [cse]: 8.862e-05 [a_3]: 6.731e-05 [Cycle 3]: 0.00092609, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 1.018e-05 [loop_unroll]: 8.72e-06 [a_1]: 0.00025166 [with_stream_mark]: 1.087e-05 [recompute_prepare]: 9.51e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 4.15e-06 [parameter_eliminate]: 
1.23002e-06 [a_2]: 0.00012246 [accelerated_algorithm]: 1.2e-05 [shard]: 1.29998e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.06002e-06 [merge_send_recv]: 7.84002e-06 [auto_parallel]: 7.51999e-06 [parallel]: 5.47001e-06 [flash_sp]: 1.35001e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 5.02999e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.66997e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.39002e-06 [cell_reuse_recompute_pass]: 1.71998e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.645e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 1.467e-05 [set_forward_comm_id_for_comm_node_pass]: 5.46998e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 1.87999e-06 [after_resolve]: 1.41e-05 [a_after_grad]: 1.416e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.54998e-06 [auto_monad_grad]: 1.23002e-06 [auto_monad_eliminator]: 1.384e-05 [cse]: 2.937e-05 [a_3]: 6.044e-05 [py_interpret_to_execute_after_opt_a]: 1.392e-05 [slice_cell_reuse_recomputed_activation]: 2.18002e-06 [rewriter_after_opt_a]: 5.124e-05 [convert_after_rewriter]: 9.37999e-06 [order_py_execute_after_rewriter]: 6.67002e-06 [mutable_eliminate]: 0.00053961 [opt_b]: 0.00029632, [1] [Cycle 1]: 0.00028898, [7] [b_1]: 0.00018875 [b_2]: 1.105e-05 [updatestate_depend_eliminate]: 8.25e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 4.25e-06 [renormalize]: 5.09986e-07 [cse]: 3.637e-05 [optimize_parallel_all_gather_comm]: 2.207e-05 [overlap_param_gather]: 2.17001e-06 [cconv]: 2.382e-05 [loop_unroll]: 0.00045745 [opt_after_cconv]: 0.00014068, [1] [Cycle 1]: 0.0001339, [7] [c_1]: 4.848e-05 [parameter_eliminate]: 2.78998e-06 [updatestate_depend_eliminate]: 7.56001e-06 [updatestate_assign_eliminate]: 4.53999e-06 [updatestate_loads_eliminate]: 
3.97998e-06 [cse]: 3.183e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 3.092e-05 [tuple_transform]: 0.00010175, [1] [Cycle 1]: 9.709e-05, [4] [d_1]: 6.638e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.92999e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 6.177e-05 [cse_after_recomputation]: 3.205e-05, [1] [Cycle 1]: 2.72e-05, [1] [cse]: 2.168e-05 [environ_conv]: 8.92e-06 [swap_dp_allreduce_reducescatter]: 7.84002e-06 [bias_add_comm_swap]: 2.38998e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.59001e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.56e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.31002e-06 [overlap_opt_shard_in_pipeline]: 1.06002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.805e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 4.81002e-06 [overlap_recompute_and_grad_model_parallel]: 6.02001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 4.75001e-06 [overlap_grad_flash_sp]: 2.651e-05 [begin_end_overlap_inline]: 7.59988e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 2.03002e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 0.00010221, [1] [Cycle 1]: 9.728e-05, [6] [build]: 1.019e-05 [elim_shapecalc]: 1.41e-05 [elim_not_effective]: 1.86e-05 [opt_reshape]: 1.031e-05 [fold_const_symbol]: 1.458e-05 [renormalize]: 1.80007e-07 
[detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.549e-05 [get_jit_bprop_graph]: 1.51998e-06 [rewriter_after_jit_bprop_graph]: 4.51002e-06 [opt_after_jit_grad]: 0.0005081 [validate]: 4.798e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.00862766 [execute]: 7.23e-06 Sums bootstrap : 0.000516s : 1.47% type_inference : 0.012005s : 34.16% event_method : 0.000048s : 0.14% auto_monad : 0.000122s : 0.35% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000052s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000149s : 0.42% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000134s : 0.38% optimize.opt_a.loop_unroll : 0.000114s : 0.33% optimize.opt_a.a_1 : 0.003299s : 9.39% optimize.opt_a.with_stream_mark : 0.000051s : 0.15% optimize.opt_a.recompute_prepare : 0.000046s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.41% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.16% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000032s : 0.09% optimize.opt_a.auto_parallel : 0.000028s : 0.08% optimize.opt_a.parallel : 0.000032s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000020s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000038s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001680s : 4.78% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000096s : 0.27% optimize.opt_a.a_after_grad : 0.000113s : 0.32% optimize.opt_a.renormalize : 0.003436s : 9.78% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.25% optimize.opt_a.cse : 0.000283s : 0.81% optimize.opt_a.a_3 : 0.000466s : 1.33% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000540s : 1.54% optimize.opt_b.b_1 : 0.000189s : 0.54% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000036s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.07% optimize.loop_unroll : 0.000457s : 1.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000508s : 1.45% validate : 0.000048s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008628s : 24.55% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000831 222 5.91% : 0.000049s : 12: substitution.arithmetic_simplify 1.87% : 0.000016s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 56.64% : 0.000471s : 17: substitution.inline 2.16% : 0.000018s : 2: substitution.inline_without_move 1.33% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000016s : 3: substitution.less_batch_normalization 1.54% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000006s : 5: substitution.partial_eliminate 1.73% : 0.000014s : 20: substitution.remove_not_recompute_node 3.29% : 0.000027s : 10: substitution.replace_applicator 1.44% : 0.000012s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.37% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.65% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.14% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000071s : 30: substitution.tuple_list_get_item_eliminator 2.30% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011921 2 86.83% : 0.010351s : 1: type_inference.infer 13.17% : 0.001571s : 1: type_inference.specialize ------[replace.] 0.000232 33 58.66% : 0.000136s : 17: replace.inline 41.34% : 0.000096s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000499 33 92.55% : 0.000462s : 17: match.inline 7.45% : 0.000037s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000759 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.97% : 0.000015s : 100: predicate.arithmetic_simplify 1.11% : 0.000008s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.45% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000018s : 101: predicate.float_depend_g_call 0.55% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.62% : 0.000005s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000043s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 168: predicate.load_eliminater 0.39% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.24% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000009s : 68: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.12% : 0.000016s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000015s : 152: predicate.replace_applicator 0.68% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.20% : 0.000002s : 8: predicate.row_tensor_eliminate 1.36% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.35% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.90% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001701 34 58.31% : 0.000992s : 13: func_graph_cloner_run.FuncGraphClonerGraph 41.69% : 0.000709s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065823 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.80% : 0.003161s : 1: add_attr 4.79% : 0.003151s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000129s : 1: auto_monad 0.04% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000552s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000466s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.84% : 0.000550s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.58% : 0.004986s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000175s : 28: opt.transform.opt_b 0.11% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.99% : 0.011841s : 1: opt_a 0.22% : 0.000144s : 1: opt_after_cconv 0.79% : 0.000519s : 1: opt_after_jit_grad 0.46% : 0.000300s : 1: opt_b 21.69% : 0.014276s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000057s : 1: pre_auto_parallel 0.07% : 0.000045s : 1: py_interpret_to_execute 0.03% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000035s : 1: remove_dup_value 2.80% : 0.001844s : 2: renormalize.infer 2.39% : 0.001576s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000056s : 1: rewriter_after_opt_a 0.23% : 0.000155s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000105s : 1: symbol_engine_optimizer 13.13% : 0.008640s : 1: task_emit 0.16% : 0.000105s : 1: 
tuple_transform 18.27% : 0.012027s : 1: type_inference 0.13% : 0.000087s : 1: validate TotalTime = 0.0185529, [24] [bootstrap]: 0.00046348 [type_inference]: 0.00434252 [event_method]: 1.038e-05 [auto_monad]: 5.07e-05 [graph_reusing]: 5.30999e-06 [inline]: 1.70001e-06 [add_attr]: 0.00309578, [1] [add_attr_with_inline]: 0.00308813, [1] [Cycle 1]: 5.2e-05, [2] [tag_attr]: 1.241e-05 [meta_addattr_fg_expand]: 3.58e-06 [parallel-infer-symbol]: 3.38e-06 [pre_auto_parallel]: 2.165e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.75001e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00370753, [53] [py_interpret_to_execute]: 1.527e-05 [rewriter_before_opt_a]: 3.837e-05 [opt_a]: 0.00186204, [2] [Cycle 1]: 0.00126544, [45] [expand_dump_flag]: 2.49001e-06 [switch_simplify]: 2.387e-05 [loop_unroll]: 1.344e-05 [a_1]: 0.00029252 [with_stream_mark]: 1.36e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.517e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.66e-06 [parallel]: 1.84e-05 [flash_sp]: 7.27002e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.48999e-06 [matmul_add_comm_reduction]: 8.85001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 3.55998e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 9.42001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.098e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 1.99e-06 
[after_resolve]: 1.03e-05 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00035845 [add_forward_monad_depend]: 5.07999e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.296e-05 [cse]: 2.706e-05 [a_3]: 4.076e-05 [Cycle 2]: 0.00058735, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.86999e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012274 [with_stream_mark]: 1.056e-05 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.60002e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.729e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.64002e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.2e-06 [flash_sp]: 2.91e-06 [merge_comm]: 2.98998e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.86e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.20999e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.29999e-06 [cse]: 1.301e-05 [a_3]: 3.207e-05 [py_interpret_to_execute_after_opt_a]: 7.56001e-06 [slice_cell_reuse_recomputed_activation]: 1.69998e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 6.41998e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00044817 [opt_b]: 0.00017979, [1] [Cycle 1]: 0.00017391, [7] [b_1]: 
0.00010627 [b_2]: 7.15e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 2.89991e-07 [cse]: 1.596e-05 [optimize_parallel_all_gather_comm]: 1.591e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.106e-05 [loop_unroll]: 0.00045734 [opt_after_cconv]: 9.494e-05, [1] [Cycle 1]: 8.922e-05, [7] [c_1]: 2.803e-05 [parameter_eliminate]: 2.11998e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.601e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.282e-05 [tuple_transform]: 6.96e-05, [1] [Cycle 1]: 6.532e-05, [4] [d_1]: 3.916e-05 [none_parameter_eliminate]: 1.87999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.458e-05 [cse_after_recomputation]: 2.017e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.081e-05 [environ_conv]: 4.20999e-06 [swap_dp_allreduce_reducescatter]: 5.13002e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.39999e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.72001e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 1.25999e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.134e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.04999e-06 [overlap_grad_ring_attention]: 4.4e-06 [overlap_grad_flash_sp]: 1.63e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.856e-05, [1] [Cycle 1]: 6.428e-05, [6] [build]: 2.21998e-06 [elim_shapecalc]: 8.43001e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.84e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.12001e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.523e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00044629 [validate]: 3.125e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00614109 [execute]: 6.84001e-06 Sums bootstrap : 0.000463s : 3.19% type_inference : 0.004343s : 29.93% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.86% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000359s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.09% optimize.opt_b.b_1 : 0.000106s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000457s : 3.15% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.08% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006141s : 42.33% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000123 26 17.46% : 0.000022s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.69% : 0.000006s : 4: substitution.graph_param_transform 65.61% : 0.000081s : 2: substitution.inline 2.48% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.95% : 0.000005s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004302 2 91.91% : 0.003954s : 1: type_inference.infer 8.09% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.84% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.92% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 40.92% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.08% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026637 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.64% : 0.003101s : 1: add_attr 11.61% : 0.003092s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000499s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.75% : 0.000465s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.87% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.00% : 0.001865s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000456s : 1: opt_after_jit_grad 0.69% : 0.000183s : 1: opt_b 13.93% : 0.003711s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000198s : 1: renormalize.infer 0.58% : 0.000154s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.09% : 0.006151s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.35% : 0.004356s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0367717, [24] [bootstrap]: 0.00055435 [type_inference]: 0.0102957 [event_method]: 4.024e-05 [auto_monad]: 0.00011267 [graph_reusing]: 7.82e-06 [inline]: 1.92999e-06 [add_attr]: 0.00303225, [1] [add_attr_with_inline]: 0.00302427, [1] [Cycle 1]: 6.958e-05, [2] [tag_attr]: 3.164e-05 [meta_addattr_fg_expand]: 8.62e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 4.625e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0136012, [53] [py_interpret_to_execute]: 3.416e-05 [rewriter_before_opt_a]: 0.00012648 [opt_a]: 0.0113484, [3] [Cycle 1]: 0.00745538, [45] [expand_dump_flag]: 3.57997e-06 [switch_simplify]: 6.571e-05 [loop_unroll]: 6.013e-05 [a_1]: 0.0013295 [with_stream_mark]: 2.247e-05 [recompute_prepare]: 2.099e-05 [updatestate_depend_eliminate]: 8.95999e-06 [updatestate_assign_eliminate]: 7.77e-06 [updatestate_loads_eliminate]: 7.67998e-06 [parameter_eliminate]: 2.46e-06 [a_2]: 0.00024429 [accelerated_algorithm]: 3.065e-05 [shard]: 1.80001e-06 [meta_shard_fg_expand]: 3.36001e-06 [shard_inline]: 1.599e-05 [merge_send_recv]: 1.647e-05 [auto_parallel]: 1.068e-05 [parallel]: 2.321e-05 [flash_sp]: 1.391e-05 [merge_comm]: 1.4e-05 [allreduce_fusion]: 9.26998e-06 [matmul_add_comm_reduction]: 2.783e-05 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 2.319e-05 [virtual_dataset]: 1.638e-05 [get_grad_eliminate_]: 1.547e-05 [virtual_output]: 1.538e-05 [merge_forward]: 9.92001e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 1.751e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.126e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 2.872e-05 [set_forward_comm_id_for_comm_node_pass]: 1.019e-05 [meta_fg_expand]: 0.00142492 [flash_sp_send_recv_attached]: 3.64002e-06 [receive_attached]: 2.47001e-06 
[after_resolve]: 5.996e-05 [a_after_grad]: 8.027e-05 [renormalize]: 0.0025145 [add_forward_monad_depend]: 9.52999e-06 [auto_monad_grad]: 5.59e-06 [auto_monad_eliminator]: 5.621e-05 [cse]: 0.0001624 [a_3]: 0.00033509 [Cycle 2]: 0.0029876, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.682e-05 [loop_unroll]: 4.371e-05 [a_1]: 0.00152514 [with_stream_mark]: 1.196e-05 [recompute_prepare]: 1.112e-05 [updatestate_depend_eliminate]: 5.47999e-06 [updatestate_assign_eliminate]: 4.30999e-06 [updatestate_loads_eliminate]: 3.72002e-06 [parameter_eliminate]: 1.17e-06 [a_2]: 0.00012536 [accelerated_algorithm]: 1.181e-05 [shard]: 1.24998e-06 [meta_shard_fg_expand]: 1.83997e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.66001e-06 [parallel]: 5.52999e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.87998e-06 [matmul_add_comm_reduction]: 7.87998e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.90999e-06 [get_grad_eliminate_]: 8.83001e-06 [virtual_output]: 8.39998e-06 [merge_forward]: 4.45999e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.641e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.417e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.618e-05 [flash_sp_send_recv_attached]: 1.30999e-06 [receive_attached]: 1.15999e-06 [after_resolve]: 1.474e-05 [a_after_grad]: 1.425e-05 [renormalize]: 0.00061879 [add_forward_monad_depend]: 3.96001e-06 [auto_monad_grad]: 1.11002e-06 [auto_monad_eliminator]: 1.464e-05 [cse]: 4.534e-05 [a_3]: 6.521e-05 [Cycle 3]: 0.00088987, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.042e-05 [loop_unroll]: 8.95999e-06 [a_1]: 0.00024849 [with_stream_mark]: 9.81998e-06 [recompute_prepare]: 9.51998e-06 [updatestate_depend_eliminate]: 4.72e-06 [updatestate_assign_eliminate]: 3.80998e-06 
[updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012212 [accelerated_algorithm]: 1.179e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 6.96001e-06 [auto_parallel]: 7.36999e-06 [parallel]: 4.65999e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.97e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 7.69002e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.018e-05 [virtual_dataset]: 8.64998e-06 [get_grad_eliminate_]: 8.40999e-06 [virtual_output]: 8.23001e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 8.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.587e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.15001e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.307e-05 [a_after_grad]: 1.41e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 1.045e-05 [cse]: 2.481e-05 [a_3]: 5.685e-05 [py_interpret_to_execute_after_opt_a]: 1.054e-05 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 4.806e-05 [convert_after_rewriter]: 9.37999e-06 [order_py_execute_after_rewriter]: 6.73e-06 [mutable_eliminate]: 0.0004726 [opt_b]: 0.00028615, [1] [Cycle 1]: 0.00027967, [7] [b_1]: 0.00018854 [b_2]: 1.051e-05 [updatestate_depend_eliminate]: 7.18998e-06 [updatestate_assign_eliminate]: 4.02998e-06 [updatestate_loads_eliminate]: 4.01001e-06 [renormalize]: 5.09986e-07 [cse]: 3.037e-05 [optimize_parallel_all_gather_comm]: 2.081e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 1.967e-05 [loop_unroll]: 0.00042218 [opt_after_cconv]: 0.00013573, [1] [Cycle 1]: 0.00012958, [7] [c_1]: 4.811e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 7.16001e-06 
[updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 3.97002e-06 [cse]: 3.002e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 2.882e-05 [tuple_transform]: 0.00010108, [1] [Cycle 1]: 9.641e-05, [4] [d_1]: 6.633e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 9.76e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 5.799e-05 [cse_after_recomputation]: 3.182e-05, [1] [Cycle 1]: 2.708e-05, [1] [cse]: 2.125e-05 [environ_conv]: 9.06002e-06 [swap_dp_allreduce_reducescatter]: 8.28999e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.02002e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.00002e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.44001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.72e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 4.94e-06 [overlap_recompute_and_grad_model_parallel]: 5.57001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.26998e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.468e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.982e-05, [1] [Cycle 1]: 9.536e-05, [6] [build]: 9.67999e-06 [elim_shapecalc]: 1.345e-05 [elim_not_effective]: 1.853e-05 [opt_reshape]: 
1.014e-05 [fold_const_symbol]: 1.451e-05 [renormalize]: 1.8999e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 2.446e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00052251 [validate]: 4.659e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.00825121 [execute]: 6.92002e-06 Sums bootstrap : 0.000554s : 1.73% type_inference : 0.010296s : 32.11% event_method : 0.000040s : 0.13% auto_monad : 0.000113s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000126s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.38% optimize.opt_a.loop_unroll : 0.000113s : 0.35% optimize.opt_a.a_1 : 0.003103s : 9.68% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.53% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000033s : 0.10% 
optimize.opt_a.flash_sp : 0.000018s : 0.06% optimize.opt_a.merge_comm : 0.000024s : 0.07% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.14% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001464s : 4.57% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.003133s : 9.77% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000233s : 0.73% optimize.opt_a.a_3 : 0.000457s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000473s : 1.47% optimize.opt_b.b_1 : 0.000189s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 
0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000422s : 1.32% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000523s : 1.63% validate : 0.000047s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008251s : 25.73% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000734 218 5.77% : 0.000042s : 11: substitution.arithmetic_simplify 1.92% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.57% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.13% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 54.78% : 0.000402s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.45% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000005s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.26% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.59% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010226 2 87.45% : 0.008943s : 1: type_inference.infer 12.55% : 0.001284s : 1: type_inference.specialize ------[replace.] 0.000201 30 59.06% : 0.000119s : 16: replace.inline 40.94% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 92.79% : 0.000394s : 16: match.inline 7.21% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000732 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000016s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.17% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.69% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.13% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.71% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.91% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.41% : 0.000010s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001526 32 56.49% : 0.000862s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.51% : 0.000664s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061619 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.93% : 0.003037s : 1: add_attr 4.91% : 0.003028s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000119s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.96% : 0.000589s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.72% : 0.004757s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.42% : 0.011351s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.86% : 0.000532s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 22.08% : 0.013605s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.73% : 0.001684s : 2: renormalize.infer 2.33% : 0.001436s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.21% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.41% : 0.008261s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 16.73% : 0.010312s : 1: type_inference 0.13% : 0.000081s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x7-kbk],max_mem:6.0M . TotalTime = 2.41163, [24] [bootstrap]: 0.00055021 [type_inference]: 0.00630059 [event_method]: 1.426e-05 [auto_monad]: 5.846e-05 [graph_reusing]: 5.14e-06 [inline]: 2.04e-06 [add_attr]: 0.00350809, [1] [add_attr_with_inline]: 0.00349593, [1] [Cycle 1]: 4.838e-05, [2] [tag_attr]: 1.628e-05 [meta_addattr_fg_expand]: 4.08001e-06 [parallel-infer-symbol]: 3.02002e-06 [pre_auto_parallel]: 2.814e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.33998e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00397904, [53] [py_interpret_to_execute]: 2.054e-05 [rewriter_before_opt_a]: 5.835e-05 [opt_a]: 0.00211533, [2] [Cycle 1]: 0.00152408, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 3.23e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00045195 [with_stream_mark]: 1.348e-05 [recompute_prepare]: 7.31001e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.484e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.85998e-06 [auto_parallel]: 5.92999e-06 [parallel]: 2.361e-05 [flash_sp]: 7.05e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.58001e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 
[merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.18998e-06 [after_resolve]: 9.91998e-06 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00043015 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.94999e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.716e-05 [a_3]: 3.931e-05 [Cycle 2]: 0.00058166, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.0001249 [with_stream_mark]: 9.67999e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 6.719e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.65001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.81002e-06 [flash_sp]: 3.23e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 4.98001e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 5.84999e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 4.90999e-06 [virtual_output]: 4.80001e-06 [merge_forward]: 2.50997e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.35001e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16999e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.09e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.09999e-06 [cse]: 1.199e-05 [a_3]: 3.102e-05 [py_interpret_to_execute_after_opt_a]: 7.28e-06 
[slice_cell_reuse_recomputed_activation]: 2.21998e-06 [rewriter_after_opt_a]: 3.071e-05 [convert_after_rewriter]: 6.54999e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00045175 [opt_b]: 0.00018251, [1] [Cycle 1]: 0.00017671, [7] [b_1]: 0.00011083 [b_2]: 6.89999e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.12999e-06 [renormalize]: 3.39991e-07 [cse]: 1.536e-05 [optimize_parallel_all_gather_comm]: 1.597e-05 [overlap_param_gather]: 2.17001e-06 [cconv]: 2.206e-05 [loop_unroll]: 0.00043706 [opt_after_cconv]: 9.298e-05, [1] [Cycle 1]: 8.725e-05, [7] [c_1]: 2.735e-05 [parameter_eliminate]: 2.49001e-06 [updatestate_depend_eliminate]: 4.79002e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.555e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.306e-05 [tuple_transform]: 6.758e-05, [1] [Cycle 1]: 6.344e-05, [4] [d_1]: 3.791e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 5.292e-05 [cse_after_recomputation]: 1.942e-05, [1] [Cycle 1]: 1.523e-05, [1] [cse]: 1.026e-05 [environ_conv]: 4.45999e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 1.27e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.81003e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.90001e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 2.42001e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.674e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.815e-05, [1] [Cycle 1]: 6.411e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.08999e-06 [elim_not_effective]: 1.19e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.538e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.68999e-06 [opt_after_jit_grad]: 0.00044742 [validate]: 3.012e-05 [backend_pass]: 8.29983e-07 [task_emit]: 2.39646 [execute]: 9.39e-06 Sums bootstrap : 0.000550s : 0.02% type_inference : 0.006301s : 0.26% event_method : 0.000014s : 0.00% auto_monad : 0.000058s : 0.00% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000577s : 0.02% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000430s : 0.02% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000070s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000452s : 0.02% optimize.opt_b.b_1 : 0.000111s : 0.00% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000437s : 0.02% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.00% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.02% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 2.396455s : 99.56% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000166 30 15.01% : 0.000025s : 5: substitution.arithmetic_simplify 1.24% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000005s : 4: substitution.graph_param_transform 66.93% : 0.000111s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 6.25% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006253 2 91.11% : 0.005697s : 1: type_inference.infer 8.89% : 0.000556s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.49% : 0.000026s : 3: replace.inline 30.51% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 92.15% : 0.000109s : 3: match.inline 7.85% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.63% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.54% : 0.000002s : 17: predicate.partial_eliminate 1.04% : 0.000002s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.45% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 45.68% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.32% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
2.420639 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.15% : 0.003512s : 1: add_attr 0.14% : 0.003500s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000064s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.02% : 0.000588s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.02% : 0.000446s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.02% : 0.000461s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.04% : 0.000939s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000093s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.09% : 0.002118s : 1: opt_a 0.00% : 0.000096s : 1: opt_after_cconv 0.02% : 0.000456s : 1: opt_after_jit_grad 0.01% : 0.000186s : 1: opt_b 0.16% : 0.003983s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.01% : 0.000226s : 1: renormalize.infer 0.01% : 0.000198s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000034s : 1: rewriter_after_opt_a 0.00% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000071s : 1: symbol_engine_optimizer 99.00% : 2.396477s : 1: task_emit 0.00% : 0.000070s : 1: 
tuple_transform 0.26% : 0.006315s : 1: type_inference 0.00% : 0.000052s : 1: validate TotalTime = 0.137101, [24] [bootstrap]: 0.00049007 [type_inference]: 0.00445534 [event_method]: 1.13e-05 [auto_monad]: 5.245e-05 [graph_reusing]: 5.10001e-06 [inline]: 1.99999e-06 [add_attr]: 0.00299575, [1] [add_attr_with_inline]: 0.00298825, [1] [Cycle 1]: 4.698e-05, [2] [tag_attr]: 1.231e-05 [meta_addattr_fg_expand]: 3.03e-06 [parallel-infer-symbol]: 2.63003e-06 [pre_auto_parallel]: 2.151e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00374545, [53] [py_interpret_to_execute]: 1.584e-05 [rewriter_before_opt_a]: 4.025e-05 [opt_a]: 0.00188776, [2] [Cycle 1]: 0.00129018, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 2.425e-05 [loop_unroll]: 1.376e-05 [a_1]: 0.00029404 [with_stream_mark]: 1.322e-05 [recompute_prepare]: 7.28999e-06 [updatestate_depend_eliminate]: 3.47997e-06 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.648e-05 [accelerated_algorithm]: 6.63e-06 [shard]: 2.49001e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 7.96001e-06 [auto_parallel]: 6.07999e-06 [parallel]: 2.184e-05 [flash_sp]: 6.99001e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.28998e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.68002e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.091e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.51e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 2.43e-06 
[after_resolve]: 1.014e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00036917 [add_forward_monad_depend]: 4.36002e-06 [auto_monad_grad]: 1.90001e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 2.848e-05 [a_3]: 4.054e-05 [Cycle 2]: 0.00058821, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.34998e-06 [a_1]: 0.00012343 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.701e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.39e-06 [parallel]: 4e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.05002e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.43998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.94999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.97001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 7.85e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.384e-05 [a_3]: 3.146e-05 [py_interpret_to_execute_after_opt_a]: 7.47998e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.091e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 5.69999e-06 [mutable_eliminate]: 0.00045116 [opt_b]: 0.00022901, [1] [Cycle 1]: 0.00022287, [7] [b_1]: 
0.00015219 [b_2]: 7.73001e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 5.3001e-07 [cse]: 1.619e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.2e-05 [loop_unroll]: 0.00041549 [opt_after_cconv]: 9.521e-05, [1] [Cycle 1]: 8.934e-05, [7] [c_1]: 2.793e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.586e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.242e-05 [tuple_transform]: 6.937e-05, [1] [Cycle 1]: 6.497e-05, [4] [d_1]: 3.908e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.309e-05 [cse_after_recomputation]: 2.046e-05, [1] [Cycle 1]: 1.618e-05, [1] [cse]: 1.11e-05 [environ_conv]: 4.99e-06 [swap_dp_allreduce_reducescatter]: 4.95999e-06 [bias_add_comm_swap]: 2.21998e-06 [label_micro_interleaved_index]: 3.95998e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.14003e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.22001e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 1.98997e-06 [reorder_send_recv_between_fp_bp]: 2.65002e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49e-06 [control_data_broadcast_order]: 1.116e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.44002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.718e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.905e-05, [1] [Cycle 1]: 6.496e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.191e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00043962 [validate]: 3.16e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.124602 [execute]: 9.17001e-06 Sums bootstrap : 0.000490s : 0.37% type_inference : 0.004455s : 3.35% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.03% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.02% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000417s : 0.31% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000369s : 0.28% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.34% optimize.opt_b.b_1 : 0.000152s : 0.11% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000415s : 0.31% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000440s : 0.33% validate : 0.000032s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.124602s : 93.59% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.59% : 0.000023s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 65.72% : 0.000080s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.00% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004414 2 91.46% : 0.004037s : 1: type_inference.infer 8.54% : 0.000377s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.09% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.27% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.88% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.83% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.17% : 0.000002s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.32% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.94% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000262 6 40.58% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.42% : 0.000156s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.145140 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.07% : 0.003000s : 1: add_attr 2.06% : 0.002992s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000057s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.36% : 0.000525s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.29% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.32% : 0.000460s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.53% : 0.000767s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000097s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.30% : 0.001891s : 1: opt_a 0.07% : 0.000099s : 1: opt_after_cconv 0.31% : 0.000449s : 1: opt_after_jit_grad 0.16% : 0.000232s : 1: opt_b 2.58% : 0.003749s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.01% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.14% : 0.000206s : 1: renormalize.infer 0.11% : 0.000156s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000034s : 1: rewriter_after_opt_a 0.03% : 0.000045s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000072s : 1: symbol_engine_optimizer 85.86% : 0.124624s : 1: task_emit 0.05% : 0.000072s : 1: 
tuple_transform 3.08% : 0.004469s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.423644, [24] [bootstrap]: 0.00047905 [type_inference]: 0.00600493 [event_method]: 1.544e-05 [auto_monad]: 5.602e-05 [graph_reusing]: 5.56e-06 [inline]: 1.82001e-06 [add_attr]: 0.00313201, [1] [add_attr_with_inline]: 0.00312234, [1] [Cycle 1]: 5.561e-05, [2] [tag_attr]: 1.678e-05 [meta_addattr_fg_expand]: 3.56001e-06 [parallel-infer-symbol]: 3.5e-06 [pre_auto_parallel]: 6.727e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00430825, [53] [py_interpret_to_execute]: 2.264e-05 [rewriter_before_opt_a]: 5.957e-05 [opt_a]: 0.00229452, [2] [Cycle 1]: 0.00165541, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 3.272e-05 [loop_unroll]: 2.08e-05 [a_1]: 0.00046263 [with_stream_mark]: 1.547e-05 [recompute_prepare]: 9.58997e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.02002e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.677e-05 [accelerated_algorithm]: 6.96999e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.99e-06 [auto_parallel]: 6.46e-06 [parallel]: 1.858e-05 [flash_sp]: 8.03001e-06 [merge_comm]: 4.22e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 1.038e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.74002e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.79998e-06 [offload_activation]: 1.01e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.154e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 1.002e-05 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.42001e-06 [receive_attached]: 2.26e-06 [after_resolve]: 1.06e-05 
[a_after_grad]: 8.82e-06 [renormalize]: 0.00051315 [add_forward_monad_depend]: 5.25999e-06 [auto_monad_grad]: 2.38998e-06 [auto_monad_eliminator]: 1.482e-05 [cse]: 2.912e-05 [a_3]: 4.244e-05 [Cycle 2]: 0.00062848, [45] [expand_dump_flag]: 1.35001e-06 [switch_simplify]: 7.23e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.000129 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.36e-06 [parameter_eliminate]: 1.40001e-06 [a_2]: 6.818e-05 [accelerated_algorithm]: 5.57999e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 5.29998e-06 [auto_parallel]: 6.07001e-06 [parallel]: 4.71002e-06 [flash_sp]: 3.33e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 3.08e-06 [matmul_add_comm_reduction]: 6.33998e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 6.79001e-06 [virtual_dataset]: 6.65002e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.77002e-06 [cell_reuse_recompute_pass]: 1.89e-06 [offload_activation]: 6.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.01e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 8.28001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.52002e-06 [meta_fg_expand]: 1.80001e-06 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.63002e-06 [after_resolve]: 1.068e-05 [a_after_grad]: 8.43999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 8.49002e-06 [cse]: 1.581e-05 [a_3]: 3.394e-05 [py_interpret_to_execute_after_opt_a]: 9.25999e-06 [slice_cell_reuse_recomputed_activation]: 2.08998e-06 [rewriter_after_opt_a]: 3.319e-05 [convert_after_rewriter]: 7.16999e-06 [order_py_execute_after_rewriter]: 5.11997e-06 [mutable_eliminate]: 0.00050357 [opt_b]: 0.00018918, [1] [Cycle 1]: 0.0001821, [7] [b_1]: 0.0001077 [b_2]: 7.18e-06 
[updatestate_depend_eliminate]: 6.42001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 4.39992e-07 [cse]: 1.96e-05 [optimize_parallel_all_gather_comm]: 1.699e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.582e-05 [loop_unroll]: 0.00045335 [opt_after_cconv]: 0.0001006, [1] [Cycle 1]: 9.425e-05, [7] [c_1]: 2.718e-05 [parameter_eliminate]: 2.89001e-06 [updatestate_depend_eliminate]: 5.52001e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.71e-06 [cse]: 1.992e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.283e-05 [tuple_transform]: 7.241e-05, [1] [Cycle 1]: 6.762e-05, [4] [d_1]: 4.131e-05 [none_parameter_eliminate]: 2.21e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 7.412e-05 [cse_after_recomputation]: 2.256e-05, [1] [Cycle 1]: 1.797e-05, [1] [cse]: 1.258e-05 [environ_conv]: 5.23002e-06 [swap_dp_allreduce_reducescatter]: 5.78002e-06 [bias_add_comm_swap]: 2.85998e-06 [label_micro_interleaved_index]: 5.34998e-06 [label_fine_grained_interleaved_index]: 3.08998e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.85001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.274e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.85998e-06 [overlap_recompute_and_grad_model_parallel]: 4.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.58002e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.35001e-06 [overlap_recompute_comm]: 2.36998e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.935e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 7.408e-05, [1] [Cycle 1]: 6.932e-05, [6] [build]: 3.83001e-06 [elim_shapecalc]: 9.96e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 9.36998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.97001e-06 [pipeline_parallel_scheduler]: 1.77001e-06 [auto_monad_reorder]: 1.717e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.93001e-06 [opt_after_jit_grad]: 0.00049462 [validate]: 3.562e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.408763 [execute]: 8.69e-06 Sums bootstrap : 0.000479s : 0.11% type_inference : 0.006005s : 1.43% event_method : 0.000015s : 0.00% auto_monad : 0.000056s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000067s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.01% optimize.rewriter_before_opt_a : 0.000060s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.01% optimize.opt_a.a_1 : 0.000592s : 0.14% optimize.opt_a.with_stream_mark : 0.000026s : 0.01% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000023s : 0.01% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000513s : 0.12% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.01% optimize.opt_a.cse : 0.000045s : 0.01% optimize.opt_a.a_3 : 0.000076s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000504s : 0.12% optimize.opt_b.b_1 : 0.000108s : 0.03% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.01% optimize.loop_unroll : 0.000453s : 0.11% optimize.opt_after_cconv.c_1 : 0.000027s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000074s : 0.02% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00%
auto_monad_reorder : 0.000017s : 0.00%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.00%
opt_after_jit_grad : 0.000495s : 0.12%
validate : 0.000036s : 0.01%
backend_pass : 0.000001s : 0.00%
task_emit : 0.408763s : 97.45%
execute : 0.000009s : 0.00%

Time group info:
------[substitution.] 0.000180 30
    15.31% : 0.000028s : 5: substitution.arithmetic_simplify
    1.01% : 0.000002s : 2: substitution.elim_not_effective
    0.80% : 0.000001s : 2: substitution.fold_const_symbol
    3.12% : 0.000006s : 4: substitution.graph_param_transform
    65.90% : 0.000118s : 3: substitution.inline
    1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.43% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.75% : 0.000005s : 4: substitution.replace_old_param
    7.02% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005956 2
    90.07% : 0.005364s : 1: type_inference.infer
    9.93% : 0.000592s : 1: type_inference.specialize
------[replace.] 0.000040 5
    70.74% : 0.000028s : 3: replace.inline
    29.26% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000128 5
    90.97% : 0.000116s : 3: match.inline
    9.03% : 0.000012s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000165 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.73% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.88% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.04% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.67% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.19% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.29% : 0.000004s : 32: predicate.load_eliminater 1.49% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.57% : 0.000003s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.76% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000002s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000002s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.72% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.22% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000395 8 46.95% : 0.000185s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.05% : 0.000209s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.432723 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.72% : 0.003137s : 1: add_attr 0.72% : 0.003126s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000078s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000061s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.12% : 0.000518s : 1: bootstrap 0.01% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.01% : 0.000022s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.11% : 0.000463s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.12% : 0.000514s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000015s : 1: opt.transform.mutable_eliminate 0.22% : 0.000970s : 78: opt.transform.opt_a 0.01% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.53% : 0.002298s : 1: opt_a 0.02% : 0.000104s : 1: opt_after_cconv 0.12% : 0.000506s : 1: opt_after_jit_grad 0.04% : 0.000193s : 1: opt_b 1.00% : 0.004313s : 1: optimize 0.00% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000072s : 1: pre_auto_parallel 0.01% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.06% : 0.000275s : 1: renormalize.infer 0.05% : 0.000231s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000038s : 1: rewriter_after_opt_a 0.01% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000077s : 1: symbol_engine_optimizer 94.47% : 0.408785s : 1: task_emit 0.02% : 0.000075s : 1: 
tuple_transform 1.39% : 0.006025s : 1: type_inference 0.01% : 0.000063s : 1: validate TotalTime = 1.53934, [24] [bootstrap]: 0.00052132 [type_inference]: 0.0121824 [event_method]: 4.966e-05 [auto_monad]: 0.00012282 [graph_reusing]: 8.31002e-06 [inline]: 1.96e-06 [add_attr]: 0.00319199, [1] [add_attr_with_inline]: 0.00318214, [1] [Cycle 1]: 8.109e-05, [2] [tag_attr]: 3.743e-05 [meta_addattr_fg_expand]: 9.29998e-06 [parallel-infer-symbol]: 3.38e-06 [pre_auto_parallel]: 5.009e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.014457, [53] [py_interpret_to_execute]: 3.914e-05 [rewriter_before_opt_a]: 0.00014677 [opt_a]: 0.0120057, [3] [Cycle 1]: 0.00781785, [45] [expand_dump_flag]: 4.57e-06 [switch_simplify]: 7.531e-05 [loop_unroll]: 6.185e-05 [a_1]: 0.00149418 [with_stream_mark]: 2.529e-05 [recompute_prepare]: 2.23e-05 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 8.1e-06 [updatestate_loads_eliminate]: 7.17002e-06 [parameter_eliminate]: 2.64999e-06 [a_2]: 0.00024919 [accelerated_algorithm]: 3.242e-05 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 3.26999e-06 [shard_inline]: 1.654e-05 [merge_send_recv]: 1.687e-05 [auto_parallel]: 1.226e-05 [parallel]: 1.985e-05 [flash_sp]: 1.193e-05 [merge_comm]: 9.78998e-06 [allreduce_fusion]: 8.67998e-06 [matmul_add_comm_reduction]: 2.817e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.822e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.553e-05 [virtual_output]: 1.55e-05 [merge_forward]: 1.01e-05 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 1.91e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.022e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.728e-05 [set_forward_comm_id_for_comm_node_pass]: 1.051e-05 [meta_fg_expand]: 0.0016064 [flash_sp_send_recv_attached]: 3.78999e-06 [receive_attached]: 2.59999e-06 [after_resolve]: 
6.236e-05 [a_after_grad]: 8.332e-05 [renormalize]: 0.0028798 [add_forward_monad_depend]: 1.075e-05 [auto_monad_grad]: 5.89e-06 [auto_monad_eliminator]: 5.951e-05 [cse]: 0.00016917 [a_3]: 0.00034172 [Cycle 2]: 0.00323253, [45] [expand_dump_flag]: 2.33002e-06 [switch_simplify]: 4.782e-05 [loop_unroll]: 4.357e-05 [a_1]: 0.00156827 [with_stream_mark]: 1.592e-05 [recompute_prepare]: 1.196e-05 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 4.42998e-06 [updatestate_loads_eliminate]: 3.91999e-06 [parameter_eliminate]: 1.17999e-06 [a_2]: 0.00012795 [accelerated_algorithm]: 1.257e-05 [shard]: 1.84e-06 [meta_shard_fg_expand]: 2.06e-06 [shard_inline]: 9.18002e-06 [merge_send_recv]: 8.58001e-06 [auto_parallel]: 7.75e-06 [parallel]: 6.29999e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 5.19e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.075e-05 [virtual_dataset]: 8.95999e-06 [get_grad_eliminate_]: 8.67998e-06 [virtual_output]: 9.21002e-06 [merge_forward]: 5.74e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 1.053e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.648e-05 [merge_recompute_call_nodes]: 8.89995e-07 [before_grad]: 1.489e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49e-06 [meta_fg_expand]: 8.868e-05 [flash_sp_send_recv_attached]: 1.27999e-06 [receive_attached]: 2.22001e-06 [after_resolve]: 1.814e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.00067627 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.61998e-06 [auto_monad_eliminator]: 1.596e-05 [cse]: 8.904e-05 [a_3]: 6.766e-05 [Cycle 3]: 0.00093864, [45] [expand_dump_flag]: 1.57001e-06 [switch_simplify]: 1.078e-05 [loop_unroll]: 8.79e-06 [a_1]: 0.00025263 [with_stream_mark]: 1.138e-05 [recompute_prepare]: 9.59e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.39998e-06 [updatestate_loads_eliminate]: 4.28001e-06 [parameter_eliminate]: 
1.04e-06 [a_2]: 0.00012423 [accelerated_algorithm]: 1.208e-05 [shard]: 1.54e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 8.97999e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 7.56999e-06 [parallel]: 5.81998e-06 [flash_sp]: 1.25999e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 8.89998e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.069e-05 [virtual_dataset]: 8.50001e-06 [get_grad_eliminate_]: 8.63001e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.60999e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 8.93002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.85e-05 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 1.499e-05 [set_forward_comm_id_for_comm_node_pass]: 5.66e-06 [meta_fg_expand]: 2.98998e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.67999e-06 [after_resolve]: 1.462e-05 [a_after_grad]: 1.418e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.96998e-06 [auto_monad_grad]: 1.31998e-06 [auto_monad_eliminator]: 1.284e-05 [cse]: 3.065e-05 [a_3]: 6.14e-05 [py_interpret_to_execute_after_opt_a]: 1.436e-05 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 5.013e-05 [convert_after_rewriter]: 9.29e-06 [order_py_execute_after_rewriter]: 6.81001e-06 [mutable_eliminate]: 0.00055602 [opt_b]: 0.0002987, [1] [Cycle 1]: 0.00029077, [7] [b_1]: 0.00018948 [b_2]: 1.094e-05 [updatestate_depend_eliminate]: 8.77e-06 [updatestate_assign_eliminate]: 4.63999e-06 [updatestate_loads_eliminate]: 4.23999e-06 [renormalize]: 3.60014e-07 [cse]: 3.621e-05 [optimize_parallel_all_gather_comm]: 2.22e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.378e-05 [loop_unroll]: 0.00046041 [opt_after_cconv]: 0.00014058, [1] [Cycle 1]: 0.00013385, [7] [c_1]: 4.787e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 7.71999e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 4.2e-06 
[cse]: 3.261e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 3.17e-05 [tuple_transform]: 0.0001021, [1] [Cycle 1]: 9.76e-05, [4] [d_1]: 6.729e-05 [none_parameter_eliminate]: 1.66998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 1.007e-05 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 6.144e-05 [cse_after_recomputation]: 3.312e-05, [1] [Cycle 1]: 2.818e-05, [1] [cse]: 2.275e-05 [environ_conv]: 9.24e-06 [swap_dp_allreduce_reducescatter]: 7.87e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 5.07e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.38002e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91003e-06 [control_data_broadcast_order]: 1.781e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 5.10001e-06 [overlap_recompute_and_grad_model_parallel]: 6.16e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 5.15999e-06 [overlap_grad_flash_sp]: 2.737e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 1.87999e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.39e-06 [symbol_engine_optimizer]: 0.00010134, [1] [Cycle 1]: 9.697e-05, [6] [build]: 1.065e-05 [elim_shapecalc]: 1.453e-05 [elim_not_effective]: 1.811e-05 [opt_reshape]: 1.029e-05 [fold_const_symbol]: 1.465e-05 [renormalize]: 2.50002e-07 [detach_backward]: 2.29001e-06 
[pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 2.541e-05 [get_jit_bprop_graph]: 1.37e-06 [rewriter_after_jit_bprop_graph]: 4.10998e-06 [opt_after_jit_grad]: 0.00051363 [validate]: 5.066e-05 [backend_pass]: 9.40025e-07 [task_emit]: 1.50788 [execute]: 8.75001e-06 Sums bootstrap : 0.000521s : 0.03% type_inference : 0.012182s : 0.79% event_method : 0.000050s : 0.00% auto_monad : 0.000123s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.00% optimize.rewriter_before_opt_a : 0.000147s : 0.01% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000134s : 0.01% optimize.opt_a.loop_unroll : 0.000114s : 0.01% optimize.opt_a.a_1 : 0.003315s : 0.22% optimize.opt_a.with_stream_mark : 0.000053s : 0.00% optimize.opt_a.recompute_prepare : 0.000044s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000501s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000033s : 0.00% optimize.opt_a.auto_parallel : 0.000028s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000017s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000046s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000033s : 0.00% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000039s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.00% optimize.opt_a.meta_fg_expand : 0.001698s : 0.11% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000095s : 0.01% optimize.opt_a.a_after_grad : 0.000112s : 0.01% optimize.opt_a.renormalize : 0.003556s : 0.23% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.01% optimize.opt_a.cse : 0.000289s : 0.02% optimize.opt_a.a_3 : 0.000471s : 0.03% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000556s : 0.04% optimize.opt_b.b_1 : 0.000189s : 0.01% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000036s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000460s : 0.03% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.00% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000514s : 0.03% validate : 0.000051s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.507882s : 98.25% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000842 222 5.90% : 0.000050s : 12: substitution.arithmetic_simplify 1.89% : 0.000016s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.44% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 5: substitution.fold_const_symbol 0.93% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.64% : 0.000477s : 17: substitution.inline 2.11% : 0.000018s : 2: substitution.inline_without_move 1.29% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.88% : 0.000016s : 3: substitution.less_batch_normalization 1.53% : 0.000013s : 11: substitution.minmaximum_grad 0.65% : 0.000006s : 5: substitution.partial_eliminate 1.89% : 0.000016s : 20: substitution.remove_not_recompute_node 3.09% : 0.000026s : 10: substitution.replace_applicator 1.49% : 0.000013s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.49% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.63% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.20% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.62% : 0.000073s : 30: substitution.tuple_list_get_item_eliminator 2.30% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012099 2 86.08% : 0.010414s : 1: type_inference.infer 13.92% : 0.001684s : 1: type_inference.specialize ------[replace.] 0.000230 33 59.65% : 0.000137s : 17: replace.inline 40.35% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000506 33 92.42% : 0.000467s : 17: match.inline 7.58% : 0.000038s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000757 5764 1.06% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.03% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.51% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000043s : 249: predicate.inline 1.21% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.69% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.38% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.39% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.10% : 0.000016s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.62% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.68% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.11% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.35% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.33% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.90% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.12% : 
0.000039s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001750 34 56.43% : 0.000988s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.57% : 0.000763s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.565880 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.20% : 0.003197s : 1: add_attr 0.20% : 0.003186s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.04% : 0.000562s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000057s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.03% : 0.000470s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000567s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000021s : 1: opt.transform.mutable_eliminate 0.32% : 0.005016s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000175s : 28: opt.transform.opt_b 0.00% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000054s : 4: opt.transform.symbol_engine_opt 0.77% : 0.012009s : 1: opt_a 0.01% : 0.000144s : 1: opt_after_cconv 0.03% : 0.000525s : 1: opt_after_jit_grad 0.02% : 0.000303s : 1: opt_b 0.92% : 0.014461s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000055s : 1: pre_auto_parallel 0.00% : 0.000043s : 1: py_interpret_to_execute 0.00% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000037s : 1: remove_dup_value 0.12% : 0.001880s : 2: renormalize.infer 0.11% : 0.001658s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000055s : 1: rewriter_after_opt_a 0.01% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000104s : 1: symbol_engine_optimizer 96.30% : 1.507916s : 1: task_emit 0.01% : 0.000105s : 1: 
tuple_transform 0.78% : 0.012202s : 1: type_inference 0.01% : 0.000082s : 1: validate TotalTime = 0.500649, [24] [bootstrap]: 0.00051844 [type_inference]: 0.00525172 [event_method]: 1.167e-05 [auto_monad]: 5.501e-05 [graph_reusing]: 4.93001e-06 [inline]: 2.86e-06 [add_attr]: 0.00365809, [1] [add_attr_with_inline]: 0.00364756, [1] [Cycle 1]: 5.416e-05, [2] [tag_attr]: 1.449e-05 [meta_addattr_fg_expand]: 3.3e-06 [parallel-infer-symbol]: 3.68999e-06 [pre_auto_parallel]: 2.718e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.00467916, [53] [py_interpret_to_execute]: 1.914e-05 [rewriter_before_opt_a]: 4.654e-05 [opt_a]: 0.00238858, [2] [Cycle 1]: 0.00171342, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 2.474e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00032371 [with_stream_mark]: 1.525e-05 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 1.413e-05 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 2.04999e-06 [a_2]: 7.98e-05 [accelerated_algorithm]: 6.55002e-06 [shard]: 2.89001e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 5.59998e-06 [merge_send_recv]: 7.77998e-06 [auto_parallel]: 6.93e-06 [parallel]: 2.022e-05 [flash_sp]: 9.00001e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.00999e-06 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.98999e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 6.19001e-06 [merge_forward]: 4.10998e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.99001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.174e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.57002e-06 [meta_fg_expand]: 2.58e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.55997e-06 
[after_resolve]: 1.187e-05 [a_after_grad]: 9.29e-06 [renormalize]: 0.00070664 [add_forward_monad_depend]: 5.94999e-06 [auto_monad_grad]: 2.53003e-06 [auto_monad_eliminator]: 1.555e-05 [cse]: 2.965e-05 [a_3]: 4.619e-05 [Cycle 2]: 0.00066295, [45] [expand_dump_flag]: 1.25001e-06 [switch_simplify]: 7.82002e-06 [loop_unroll]: 6.26998e-06 [a_1]: 0.00013885 [with_stream_mark]: 1.236e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.889e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 5.24e-06 [auto_parallel]: 5.94e-06 [parallel]: 5.99e-06 [flash_sp]: 4.25e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 2.90002e-06 [matmul_add_comm_reduction]: 6.24999e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 7.25998e-06 [virtual_dataset]: 6.43e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 1.94e-06 [offload_activation]: 7.35998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.026e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 8.22998e-06 [set_forward_comm_id_for_comm_node_pass]: 4.02e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 1.06e-05 [a_after_grad]: 8.63001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.40999e-06 [auto_monad_grad]: 1.11002e-06 [auto_monad_eliminator]: 7.51999e-06 [cse]: 1.697e-05 [a_3]: 3.371e-05 [py_interpret_to_execute_after_opt_a]: 1.119e-05 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.596e-05 [convert_after_rewriter]: 7.08e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00063076 [opt_b]: 0.00020895, [1] [Cycle 1]: 0.00020146, [7] [b_1]: 0.00011921 
[b_2]: 8.07e-06 [updatestate_depend_eliminate]: 6.39001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 7.30011e-07 [cse]: 2.51e-05 [optimize_parallel_all_gather_comm]: 1.861e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.591e-05 [loop_unroll]: 0.00057639 [opt_after_cconv]: 0.00011456, [1] [Cycle 1]: 0.0001078, [7] [c_1]: 3.139e-05 [parameter_eliminate]: 3.47997e-06 [updatestate_depend_eliminate]: 7.53999e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.44999e-06 [cse]: 2.338e-05 [renormalize]: 6.79982e-07 [remove_dup_value]: 1.361e-05 [tuple_transform]: 7.688e-05, [1] [Cycle 1]: 7.163e-05, [4] [d_1]: 4.488e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 4.831e-05 [cse_after_recomputation]: 2.318e-05, [1] [Cycle 1]: 1.829e-05, [1] [cse]: 1.281e-05 [environ_conv]: 6.04001e-06 [swap_dp_allreduce_reducescatter]: 6.12999e-06 [bias_add_comm_swap]: 2.98e-06 [label_micro_interleaved_index]: 5.84e-06 [label_fine_grained_interleaved_index]: 2.55002e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.91e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.263e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 4.43001e-06 [overlap_recompute_and_grad_model_parallel]: 4.92e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.37999e-06 [overlap_recompute_comm]: 1.94999e-06 [overlap_grad_ring_attention]: 4.40999e-06 [overlap_grad_flash_sp]: 1.873e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.91003e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 7.704e-05, [1] [Cycle 1]: 7.213e-05, [6] [build]: 3.33998e-06 [elim_shapecalc]: 9.44e-06 [elim_not_effective]: 1.304e-05 [opt_reshape]: 7.26999e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.609e-05 [get_jit_bprop_graph]: 1.96e-06 [rewriter_after_jit_bprop_graph]: 4.83001e-06 [opt_after_jit_grad]: 0.00088583 [validate]: 4.799e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.485203 [execute]: 9.06998e-06 Sums bootstrap : 0.000518s : 0.10% type_inference : 0.005252s : 1.06% event_method : 0.000012s : 0.00% auto_monad : 0.000055s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000027s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.00% optimize.rewriter_before_opt_a : 0.000047s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.01% optimize.opt_a.loop_unroll : 0.000020s : 0.00% optimize.opt_a.a_1 : 0.000463s : 0.09% optimize.opt_a.with_stream_mark : 0.000028s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000159s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.00% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000013s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.01% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000008s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000707s : 0.14% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000047s : 0.01% optimize.opt_a.a_3 : 0.000080s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000631s : 0.13% optimize.opt_b.b_1 : 0.000119s : 0.02% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.01% optimize.loop_unroll : 0.000576s : 0.12% optimize.opt_after_cconv.c_1 : 0.000031s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000045s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.01% optimize.cse_after_recomputation.cse : 0.000013s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000886s : 0.18% validate : 0.000048s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.485203s : 97.84% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000147 26 19.11% : 0.000028s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.40% : 0.000006s : 4: substitution.graph_param_transform 64.50% : 0.000095s : 2: substitution.inline 2.51% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.32% : 0.000005s : 4: substitution.remove_not_recompute_node 3.75% : 0.000006s : 4: substitution.replace_old_param ------[type_inference.] 0.005200 2 91.83% : 0.004776s : 1: type_inference.infer 8.17% : 0.000425s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000093 2 100.00% : 0.000093s : 2: match.inline ------[predicate.] 
0.000158 984 1.08% : 0.000002s : 9: predicate.accumulaten_eliminater 1.18% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.95% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000004s : 17: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.89% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 13: predicate.environ_get_depend_swap 1.76% : 0.000003s : 21: predicate.environ_get_eliminate 1.00% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.66% : 0.000003s : 11: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.93% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000002s : 8: predicate.less_batch_normalization 1.81% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 2.00% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.96% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 1.00% : 0.000002s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.59% : 0.000002s : 4: predicate.mutable_eliminate 0.53% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.07% : 0.000002s : 13: predicate.partial_eliminate 1.10% : 0.000002s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.58% : 0.000004s : 26: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.96% : 0.000002s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 1.08% : 0.000002s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.89% : 0.000001s : 11: predicate.switch_defer_inline 1.57% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.79% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.64% : 0.000003s : 17: predicate.tuple_list_get_set_item_eliminator 2.78% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.92% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000375 6 35.30% : 0.000132s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.70% : 0.000243s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.510685 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.72% : 0.003663s : 1: add_attr 0.72% : 0.003652s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000052s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.01% : 0.000060s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.11% : 0.000558s : 1: bootstrap 0.01% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000026s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000010s : 1: environ_conv 0.00% : 0.000018s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.12% : 0.000588s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.13% : 0.000643s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000016s : 1: opt.transform.mutable_eliminate 0.16% : 0.000833s : 78: opt.transform.opt_a 0.01% : 0.000030s : 1: opt.transform.opt_after_cconv 0.01% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000099s : 28: opt.transform.opt_b 0.01% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.47% : 0.002392s : 1: opt_a 0.02% : 0.000118s : 1: opt_after_cconv 0.18% : 0.000902s : 1: opt_after_jit_grad 0.04% : 0.000213s : 1: opt_b 0.92% : 0.004684s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000023s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.07% : 0.000346s : 1: renormalize.infer 0.07% : 0.000353s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000040s : 1: rewriter_after_opt_a 0.01% : 0.000051s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000080s : 1: symbol_engine_optimizer 95.01% : 0.485226s : 1: task_emit 0.02% : 0.000080s : 1: 
tuple_transform 1.03% : 0.005275s : 1: type_inference 0.02% : 0.000085s : 1: validate TotalTime = 0.261346, [24] [bootstrap]: 0.00069168 [type_inference]: 0.0109056 [event_method]: 4.277e-05 [auto_monad]: 0.00011541 [graph_reusing]: 8.52998e-06 [inline]: 2.04999e-06 [add_attr]: 0.00327101, [1] [add_attr_with_inline]: 0.0032621, [1] [Cycle 1]: 7.93e-05, [2] [tag_attr]: 3.546e-05 [meta_addattr_fg_expand]: 8.95001e-06 [parallel-infer-symbol]: 3.59002e-06 [pre_auto_parallel]: 4.951e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0143072, [53] [py_interpret_to_execute]: 3.789e-05 [rewriter_before_opt_a]: 0.00012899 [opt_a]: 0.0118479, [3] [Cycle 1]: 0.00755316, [45] [expand_dump_flag]: 3.79002e-06 [switch_simplify]: 6.705e-05 [loop_unroll]: 5.488e-05 [a_1]: 0.00142738 [with_stream_mark]: 2.629e-05 [recompute_prepare]: 2.294e-05 [updatestate_depend_eliminate]: 9.37999e-06 [updatestate_assign_eliminate]: 8.23001e-06 [updatestate_loads_eliminate]: 7.55e-06 [parameter_eliminate]: 2.53e-06 [a_2]: 0.00024805 [accelerated_algorithm]: 3.192e-05 [shard]: 1.82999e-06 [meta_shard_fg_expand]: 3.79002e-06 [shard_inline]: 1.657e-05 [merge_send_recv]: 1.687e-05 [auto_parallel]: 1.139e-05 [parallel]: 1.899e-05 [flash_sp]: 1.217e-05 [merge_comm]: 1.007e-05 [allreduce_fusion]: 9.09e-06 [matmul_add_comm_reduction]: 3.02e-05 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 1.829e-05 [virtual_dataset]: 1.549e-05 [get_grad_eliminate_]: 1.541e-05 [virtual_output]: 1.554e-05 [merge_forward]: 9.34998e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.781e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.882e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 2.79e-05 [set_forward_comm_id_for_comm_node_pass]: 1.04e-05 [meta_fg_expand]: 0.0015144 [flash_sp_send_recv_attached]: 4.22998e-06 [receive_attached]: 2.29001e-06 
[after_resolve]: 6.25e-05 [a_after_grad]: 8.386e-05 [renormalize]: 0.00278695 [add_forward_monad_depend]: 1.027e-05 [auto_monad_grad]: 5.54e-06 [auto_monad_eliminator]: 5.791e-05 [cse]: 0.00017326 [a_3]: 0.00034535 [Cycle 2]: 0.00335002, [45] [expand_dump_flag]: 2.47001e-06 [switch_simplify]: 4.794e-05 [loop_unroll]: 4.454e-05 [a_1]: 0.0017646 [with_stream_mark]: 1.541e-05 [recompute_prepare]: 1.267e-05 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 4.08999e-06 [parameter_eliminate]: 1.14998e-06 [a_2]: 0.00012744 [accelerated_algorithm]: 1.22e-05 [shard]: 1.31998e-06 [meta_shard_fg_expand]: 2.02001e-06 [shard_inline]: 9.15001e-06 [merge_send_recv]: 8.01001e-06 [auto_parallel]: 8.13999e-06 [parallel]: 6.21e-06 [flash_sp]: 3.63e-06 [merge_comm]: 6.00002e-06 [allreduce_fusion]: 5.09e-06 [matmul_add_comm_reduction]: 9.07999e-06 [allreduce_slice_to_reducescatter]: 4.90021e-07 [virtual_shard_identity]: 1.068e-05 [virtual_dataset]: 8.89003e-06 [get_grad_eliminate_]: 8.65001e-06 [virtual_output]: 8.37998e-06 [merge_forward]: 4.67e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 1.06e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.756e-05 [merge_recompute_call_nodes]: 1.07998e-06 [before_grad]: 1.55e-05 [set_forward_comm_id_for_comm_node_pass]: 6.11e-06 [meta_fg_expand]: 4.179e-05 [flash_sp_send_recv_attached]: 1.17e-06 [receive_attached]: 1.57001e-06 [after_resolve]: 1.652e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.00066939 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 1.50999e-06 [auto_monad_eliminator]: 1.531e-05 [cse]: 5.1e-05 [a_3]: 6.646e-05 [Cycle 3]: 0.00092852, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 1.051e-05 [loop_unroll]: 8.95001e-06 [a_1]: 0.00025247 [with_stream_mark]: 1.092e-05 [recompute_prepare]: 9.35001e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.93001e-06 
[parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012392 [accelerated_algorithm]: 1.217e-05 [shard]: 1.55999e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 9.01002e-06 [merge_send_recv]: 7.95e-06 [auto_parallel]: 7.51999e-06 [parallel]: 5.03002e-06 [flash_sp]: 1.30999e-06 [merge_comm]: 4.82e-06 [allreduce_fusion]: 4.88001e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 9.99999e-06 [virtual_dataset]: 8.61002e-06 [get_grad_eliminate_]: 8.45999e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 1.83002e-06 [offload_activation]: 8.95001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.579e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.431e-05 [set_forward_comm_id_for_comm_node_pass]: 5.59998e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.58002e-06 [after_resolve]: 1.505e-05 [a_after_grad]: 1.476e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.64e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.245e-05 [cse]: 3.001e-05 [a_3]: 6.071e-05 [py_interpret_to_execute_after_opt_a]: 1.488e-05 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 5.047e-05 [convert_after_rewriter]: 9.49e-06 [order_py_execute_after_rewriter]: 6.90998e-06 [mutable_eliminate]: 0.00054768 [opt_b]: 0.00033547, [1] [Cycle 1]: 0.00032786, [7] [b_1]: 0.00018976 [b_2]: 1.066e-05 [updatestate_depend_eliminate]: 8.93002e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 4.13001e-06 [renormalize]: 4.69998e-07 [cse]: 7.286e-05 [optimize_parallel_all_gather_comm]: 2.389e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.337e-05 [loop_unroll]: 0.00045054 [opt_after_cconv]: 0.00014219, [1] [Cycle 1]: 0.00013523, [7] [c_1]: 4.936e-05 [parameter_eliminate]: 2.51998e-06 [updatestate_depend_eliminate]: 7.37002e-06 [updatestate_assign_eliminate]: 4.22998e-06 
[updatestate_loads_eliminate]: 4.03001e-06 [cse]: 3.3e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 3.214e-05 [tuple_transform]: 0.00010457, [1] [Cycle 1]: 9.965e-05, [4] [d_1]: 6.883e-05 [none_parameter_eliminate]: 1.87999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.87999e-06 [partial_unused_args_eliminate]: 1.91998e-06 [add_recomputation]: 6.215e-05 [cse_after_recomputation]: 3.359e-05, [1] [Cycle 1]: 2.865e-05, [1] [cse]: 2.274e-05 [environ_conv]: 9.25999e-06 [swap_dp_allreduce_reducescatter]: 7.84002e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.79998e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.747e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 5.44e-06 [overlap_recompute_and_grad_model_parallel]: 6.12001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.84998e-06 [overlap_grad_ring_attention]: 5.44998e-06 [overlap_grad_flash_sp]: 2.634e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 0.0001023, [1] [Cycle 1]: 9.752e-05, [6] [build]: 1.084e-05 [elim_shapecalc]: 1.441e-05 [elim_not_effective]: 1.829e-05 [opt_reshape]: 9.97001e-06 [fold_const_symbol]: 1.488e-05 [renormalize]: 
2.80008e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.41002e-06 [auto_monad_reorder]: 2.481e-05 [get_jit_bprop_graph]: 1.72999e-06 [rewriter_after_jit_bprop_graph]: 4.54998e-06 [opt_after_jit_grad]: 0.00050593 [validate]: 4.882e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.2311 [execute]: 9.73998e-06 Sums bootstrap : 0.000692s : 0.27% type_inference : 0.010906s : 4.25% event_method : 0.000043s : 0.02% auto_monad : 0.000115s : 0.04% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000050s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.01% optimize.rewriter_before_opt_a : 0.000129s : 0.05% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.05% optimize.opt_a.loop_unroll : 0.000108s : 0.04% optimize.opt_a.a_1 : 0.003444s : 1.34% optimize.opt_a.with_stream_mark : 0.000053s : 0.02% optimize.opt_a.recompute_prepare : 0.000045s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.02% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.01% optimize.opt_a.merge_send_recv : 0.000033s : 0.01% optimize.opt_a.auto_parallel : 0.000027s : 0.01% optimize.opt_a.parallel : 0.000030s : 0.01% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 
0.000021s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.02% optimize.opt_a.virtual_dataset : 0.000033s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.01% optimize.opt_a.virtual_output : 0.000032s : 0.01% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.01% optimize.opt_a.meta_fg_expand : 0.001559s : 0.61% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000094s : 0.04% optimize.opt_a.a_after_grad : 0.000113s : 0.04% optimize.opt_a.renormalize : 0.003456s : 1.35% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.03% optimize.opt_a.cse : 0.000254s : 0.10% optimize.opt_a.a_3 : 0.000473s : 0.18% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.02% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000548s : 0.21% optimize.opt_b.b_1 : 0.000190s : 0.07% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000073s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.01% optimize.loop_unroll : 0.000451s : 0.18% optimize.opt_after_cconv.c_1 : 0.000049s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.01% optimize.tuple_transform.d_1 : 0.000069s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.02% optimize.cse_after_recomputation.cse : 0.000023s : 0.01% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000506s : 0.20% validate : 0.000049s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.231100s : 90.03% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.001008 218 5.05% : 0.000051s : 11: substitution.arithmetic_simplify 1.60% : 0.000016s : 2: substitution.cast_eliminate 0.26% : 0.000003s : 5: substitution.elim_not_effective 0.38% : 0.000004s : 5: substitution.float_depend_g_call 0.41% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.21% : 0.000002s : 5: substitution.fold_const_symbol 0.81% : 0.000008s : 8: substitution.graph_param_transform 0.29% : 0.000003s : 2: substitution.incorporate_call 0.19% : 0.000002s : 2: substitution.incorporate_call_switch 50.13% : 0.000505s : 16: substitution.inline 1.78% : 0.000018s : 2: substitution.inline_without_move 1.05% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.56% : 0.000016s : 3: substitution.less_batch_normalization 1.34% : 0.000013s : 11: substitution.minmaximum_grad 0.57% : 0.000006s : 5: substitution.partial_eliminate 1.32% : 0.000013s : 20: substitution.remove_not_recompute_node 2.49% : 0.000025s : 10: substitution.replace_applicator 1.18% : 0.000012s : 15: substitution.replace_old_param 0.29% : 0.000003s : 1: substitution.set_cell_output_no_recompute 2.86% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.37% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 1.83% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 21.08% : 0.000212s : 28: substitution.tuple_list_get_item_eliminator 1.91% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010826 2 86.86% : 0.009403s : 1: type_inference.infer 13.14% : 0.001423s : 1: type_inference.specialize ------[replace.] 0.000223 30 59.38% : 0.000133s : 16: replace.inline 40.62% : 0.000091s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000674 30 73.71% : 0.000497s : 16: match.inline 26.29% : 0.000177s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000742 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 99: predicate.arithmetic_simplify 1.15% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000041s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.60% : 0.000019s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.14% : 0.000016s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.67% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.34% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.75% : 0.000006s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.91% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.12% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001632 32 57.37% : 0.000936s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.63% : 0.000696s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.287831 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.14% : 0.003276s : 1: add_attr 1.13% : 0.003266s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000067s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000123s : 1: auto_monad 0.01% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.25% : 0.000728s : 1: bootstrap 0.01% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.01% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.02% : 0.000050s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.16% : 0.000461s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.19% : 0.000558s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 1.78% : 0.005127s : 117: opt.transform.opt_a 0.02% : 0.000048s : 1: opt.transform.opt_after_cconv 0.01% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000175s : 28: opt.transform.opt_b 0.03% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000054s : 4: opt.transform.symbol_engine_opt 4.12% : 0.011851s : 1: opt_a 0.05% : 0.000146s : 1: opt_after_cconv 0.18% : 0.000518s : 1: opt_after_jit_grad 0.12% : 0.000339s : 1: opt_b 4.97% : 0.014312s : 1: optimize 0.01% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000055s : 1: pre_auto_parallel 0.01% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000037s : 1: remove_dup_value 0.65% : 0.001884s : 2: renormalize.infer 0.54% : 0.001557s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000055s : 1: rewriter_after_opt_a 0.05% : 0.000134s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000105s : 1: symbol_engine_optimizer 80.30% : 0.231122s : 1: task_emit 0.04% : 0.000108s : 1: 
tuple_transform 3.80% : 0.010926s : 1: type_inference 0.03% : 0.000080s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x7-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x8-pynative],max_mem:6.0M TotalTime = 0.0256807, [24] [bootstrap]: 0.00058603 [type_inference]: 0.00735868 [event_method]: 1.556e-05 [auto_monad]: 6.206e-05 [graph_reusing]: 6.00002e-06 [inline]: 2.48e-06 [add_attr]: 0.0040471, [1] [add_attr_with_inline]: 0.0040341, [1] [Cycle 1]: 5.671e-05, [2] [tag_attr]: 2.044e-05 [meta_addattr_fg_expand]: 3.94997e-06 [parallel-infer-symbol]: 3.28998e-06 [pre_auto_parallel]: 3.331e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 2.35002e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00498489, [53] [py_interpret_to_execute]: 2.574e-05 [rewriter_before_opt_a]: 7.071e-05 [opt_a]: 0.00258364, [2] [Cycle 1]: 0.00192681, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 3.356e-05 [loop_unroll]: 2.071e-05 [a_1]: 0.00050364 [with_stream_mark]: 1.638e-05 [recompute_prepare]: 8.47e-06 [updatestate_depend_eliminate]: 4.04997e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 3.55e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 7.778e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.46e-06 [meta_shard_fg_expand]: 1.81998e-06 [shard_inline]: 5.70001e-06 [merge_send_recv]: 8.15e-06 [auto_parallel]: 6.09001e-06 [parallel]: 2.536e-05 [flash_sp]: 8.33001e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 9.15999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.58001e-06 [virtual_dataset]: 6.05002e-06 
[get_grad_eliminate_]: 6.54001e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 4.09002e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.036e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 2.83998e-06 [flash_sp_send_recv_attached]: 2.92002e-06 [receive_attached]: 3.03e-06 [after_resolve]: 1.095e-05 [a_after_grad]: 9.01998e-06 [renormalize]: 0.00068493 [add_forward_monad_depend]: 5.81e-06 [auto_monad_grad]: 2.58e-06 [auto_monad_eliminator]: 1.721e-05 [cse]: 3.061e-05 [a_3]: 4.815e-05 [Cycle 2]: 0.00064491, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.00013457 [with_stream_mark]: 1.322e-05 [recompute_prepare]: 5.77001e-06 [updatestate_depend_eliminate]: 3.3e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.849e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.74e-06 [meta_shard_fg_expand]: 1.76998e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 6.06e-06 [auto_parallel]: 6.24999e-06 [parallel]: 5.96e-06 [flash_sp]: 3.7e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 6.12999e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 6.73998e-06 [virtual_dataset]: 5.71998e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.35001e-06 [merge_forward]: 3.14999e-06 [cell_reuse_recompute_pass]: 2.60997e-06 [offload_activation]: 7.63999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.63997e-06 [merge_recompute_call_nodes]: 9.10019e-07 [before_grad]: 9.29998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47997e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.29e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 9.01002e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.79e-06 [auto_monad_grad]: 1.47999e-06 [auto_monad_eliminator]: 8.43001e-06 [cse]: 2.335e-05 [a_3]: 3.268e-05 [py_interpret_to_execute_after_opt_a]: 1.221e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 3.6e-05 [convert_after_rewriter]: 8.33999e-06 [order_py_execute_after_rewriter]: 5.82001e-06 [mutable_eliminate]: 0.00066604 [opt_b]: 0.00020866, [1] [Cycle 1]: 0.00020119, [7] [b_1]: 0.00011955 [b_2]: 8.18999e-06 [updatestate_depend_eliminate]: 6.48003e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.70002e-06 [renormalize]: 6.60017e-07 [cse]: 2.511e-05 [optimize_parallel_all_gather_comm]: 1.814e-05 [overlap_param_gather]: 1.93002e-06 [cconv]: 2.916e-05 [loop_unroll]: 0.00057937 [opt_after_cconv]: 0.000116, [1] [Cycle 1]: 0.00010852, [7] [c_1]: 3.129e-05 [parameter_eliminate]: 4.36002e-06 [updatestate_depend_eliminate]: 7.33e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 2.487e-05 [renormalize]: 6.80011e-07 [remove_dup_value]: 1.364e-05 [tuple_transform]: 7.977e-05, [1] [Cycle 1]: 7.51e-05, [4] [d_1]: 4.75e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.61999e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 7.478e-05 [cse_after_recomputation]: 2.404e-05, [1] [Cycle 1]: 1.872e-05, [1] [cse]: 1.309e-05 [environ_conv]: 6.15002e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 3.30998e-06 [label_micro_interleaved_index]: 5.70001e-06 [label_fine_grained_interleaved_index]: 2.68998e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.36998e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 9.29984e-07 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.13001e-06 
[add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.253e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 4.12e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.32e-06 [overlap_grad_flash_sp]: 2.182e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.88e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 8.238e-05, [1] [Cycle 1]: 7.713e-05, [6] [build]: 3.38e-06 [elim_shapecalc]: 1.096e-05 [elim_not_effective]: 1.435e-05 [opt_reshape]: 7.4e-06 [fold_const_symbol]: 1.025e-05 [renormalize]: 2.59985e-07 [detach_backward]: 2.19999e-06 [pipeline_parallel_scheduler]: 1.63002e-06 [auto_monad_reorder]: 1.725e-05 [get_jit_bprop_graph]: 1.80001e-06 [rewriter_after_jit_bprop_graph]: 0.00022997 [opt_after_jit_grad]: 0.00068494 [validate]: 4.618e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00730201 [execute]: 8.37e-06 Sums bootstrap : 0.000586s : 2.86% type_inference : 0.007359s : 35.92% event_method : 0.000016s : 0.08% auto_monad : 0.000062s : 0.30% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000033s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000026s : 0.13% optimize.rewriter_before_opt_a : 0.000071s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.20% optimize.opt_a.loop_unroll : 0.000027s : 0.13% optimize.opt_a.a_1 : 0.000638s : 3.12% optimize.opt_a.with_stream_mark : 0.000030s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000146s : 0.71% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.05% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000031s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000021s : 0.10% optimize.opt_a.a_after_grad : 0.000018s : 0.09% optimize.opt_a.renormalize : 0.000685s : 3.34% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.13% optimize.opt_a.cse : 0.000054s : 0.26% optimize.opt_a.a_3 : 0.000081s : 0.39% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.18% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000666s : 3.25% optimize.opt_b.b_1 : 0.000120s : 0.58% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.14% optimize.loop_unroll : 0.000579s : 2.83% optimize.opt_after_cconv.c_1 : 0.000031s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000025s : 0.12% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.07% optimize.tuple_transform.d_1 : 0.000048s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000075s : 0.36% optimize.cse_after_recomputation.cse : 0.000013s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000006s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000022s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000230s : 1.12% opt_after_jit_grad : 0.000685s : 3.34% validate : 0.000046s : 0.23% backend_pass : 0.000001s : 0.00% task_emit : 0.007302s : 35.64% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000206 30 14.96% : 0.000031s : 5: substitution.arithmetic_simplify 1.38% : 0.000003s : 2: substitution.elim_not_effective 0.62% : 0.000001s : 2: substitution.fold_const_symbol 2.98% : 0.000006s : 4: substitution.graph_param_transform 67.39% : 0.000139s : 3: substitution.inline 1.89% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.24% : 0.000005s : 4: substitution.remove_not_recompute_node 2.29% : 0.000005s : 4: substitution.replace_old_param 6.24% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007301 2 90.67% : 0.006619s : 1: type_inference.infer 9.33% : 0.000681s : 1: type_inference.specialize ------[replace.] 0.000044 5 70.11% : 0.000031s : 3: replace.inline 29.89% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000149 5 92.00% : 0.000137s : 3: match.inline 8.00% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000223 1131 0.80% : 0.000002s : 11: predicate.accumulaten_eliminater 20.73% : 0.000046s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000002s : 11: predicate.addn_zero_filter 0.60% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.74% : 0.000004s : 19: predicate.arithmetic_simplify 0.75% : 0.000002s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.18% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000001s : 8: predicate.depend_value_elim 0.69% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.83% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.72% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.18% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.77% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_depend_swap 1.38% : 0.000003s : 23: predicate.environ_get_eliminate 0.81% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.94% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.59% : 0.000004s : 16: predicate.float_depend_g_call 0.40% : 0.000001s : 8: predicate.float_environ_get_switch 0.63% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.14% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000001s : 8: predicate.get_grad_eliminate 0.17% : 0.000000s : 4: predicate.graph_param_transform 0.49% : 0.000001s : 8: predicate.incorporate_call 0.41% : 0.000001s : 8: predicate.incorporate_call_switch 4.68% : 0.000010s : 51: predicate.inline 0.65% : 0.000001s : 8: predicate.inline_without_move 0.26% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.71% : 0.000002s : 8: predicate.less_batch_normalization 1.37% : 0.000003s 
: 21: predicate.list_to_tuple_eliminator_ 1.91% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.65% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.43% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.50% : 0.000001s : 8: predicate.merge_addn 0.51% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.55% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.58% : 0.000001s : 11: predicate.minmaximum_grad 1.32% : 0.000003s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.33% : 0.000001s : 4: predicate.parallel_virtual_node 1.32% : 0.000003s : 16: predicate.partial_defer_inline 1.04% : 0.000002s : 17: predicate.partial_eliminate 0.73% : 0.000002s : 11: predicate.print_const_string_wrapper 0.57% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000002s : 11: predicate.reduce_eliminate 1.91% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000001s : 8: predicate.remove_not_recompute_node 1.12% : 0.000002s : 21: predicate.replace_applicator 0.39% : 0.000001s : 8: predicate.replace_old_param 0.26% : 0.000001s : 4: predicate.reset_defer_inline 0.70% : 0.000002s : 11: predicate.reshape_eliminate 0.73% : 0.000002s : 8: predicate.row_tensor_add_zeros_like 0.26% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000002s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000002s : 8: predicate.shard_identity_eliminate 0.74% : 0.000002s : 8: predicate.special_op_eliminate 0.59% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.62% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.94% : 0.000002s : 16: predicate.switch_defer_inline 1.48% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.74% : 0.000008s : 54: predicate.switch_simplify 0.70% 
: 0.000002s : 11: predicate.tile_eliminate 0.72% : 0.000002s : 11: predicate.transpose_eliminate 1.32% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.25% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.18% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.11% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.35% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.70% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.48% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.62% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 8: predicate.virtual_output_eliminate 0.27% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.35% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000449 8 43.43% : 0.000195s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.57% : 0.000254s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.036632 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.06% : 0.004053s : 1: add_attr 11.02% : 0.004038s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.22% : 0.000079s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000068s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.71% : 0.000625s : 1: bootstrap 0.09% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000012s : 1: convert_after_rewriter 0.07% : 0.000027s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000022s : 1: event_method 0.04% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.05% : 0.000019s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000009s : 1: label_micro_interleaved_index 1.61% : 0.000591s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.85% : 0.000679s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000018s : 1: opt.transform.mutable_eliminate 2.78% : 0.001017s : 78: opt.transform.opt_a 0.08% : 0.000030s : 1: opt.transform.opt_after_cconv 0.20% : 0.000073s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000098s : 28: opt.transform.opt_b 0.14% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000039s : 4: opt.transform.symbol_engine_opt 7.06% : 0.002587s : 1: opt_a 0.33% : 0.000120s : 1: opt_after_cconv 1.91% : 0.000698s : 1: opt_after_jit_grad 0.58% : 0.000213s : 1: opt_b 13.62% : 0.004990s : 1: optimize 0.06% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000038s : 1: pre_auto_parallel 0.08% : 0.000030s : 1: py_interpret_to_execute 0.04% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 1.01% : 0.000372s : 1: renormalize.infer 0.83% : 0.000305s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.65% : 0.000239s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000041s : 1: rewriter_after_opt_a 0.20% : 0.000075s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000006s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000085s : 1: symbol_engine_optimizer 19.98% : 0.007319s : 1: task_emit 0.23% : 0.000083s : 1: 
tuple_transform 20.15% : 0.007380s : 1: type_inference 0.26% : 0.000095s : 1: validate TotalTime = 0.0591585, [24] [bootstrap]: 0.00046087 [type_inference]: 0.0206942 [event_method]: 1.043e-05 [auto_monad]: 5.086e-05 [graph_reusing]: 5.50001e-06 [inline]: 2.04e-06 [add_attr]: 0.00314241, [1] [add_attr_with_inline]: 0.00313433, [1] [Cycle 1]: 4.533e-05, [2] [tag_attr]: 1.179e-05 [meta_addattr_fg_expand]: 3.13e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 2.146e-05 [insert-virtual-dataset]: 2.49999e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00384864, [53] [py_interpret_to_execute]: 1.532e-05 [rewriter_before_opt_a]: 3.796e-05 [opt_a]: 0.00194033, [2] [Cycle 1]: 0.00128558, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 2.462e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00029424 [with_stream_mark]: 1.303e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.41001e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.5e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.74e-06 [parallel]: 1.793e-05 [flash_sp]: 6.84999e-06 [merge_comm]: 3.67998e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 9.72999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 6.89999e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.76998e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 8.81002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 8.67998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.21998e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.045e-05 [a_after_grad]: 8.60999e-06 [renormalize]: 0.00037452 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.4e-05 [cse]: 2.617e-05 [a_3]: 4.06e-05 [Cycle 2]: 0.00064481, [45] [expand_dump_flag]: 9.40025e-07 [switch_simplify]: 7.16001e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012163 [with_stream_mark]: 9.09e-06 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 1.04003e-06 [a_2]: 6.893e-05 [accelerated_algorithm]: 5.29998e-06 [shard]: 1.41002e-06 [meta_shard_fg_expand]: 1.33002e-06 [shard_inline]: 5.88002e-06 [merge_send_recv]: 5.24998e-06 [auto_parallel]: 6.01e-06 [parallel]: 4.63001e-06 [flash_sp]: 3.71001e-06 [merge_comm]: 3.27002e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 6.32001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.52001e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.30001e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.92002e-06 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 7.51999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.35999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 1.96e-06 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.49e-06 [after_resolve]: 9.81998e-06 [a_after_grad]: 7.95e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.74998e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 8.87e-06 [cse]: 1.604e-05 [a_3]: 3.211e-05 [py_interpret_to_execute_after_opt_a]: 9.19e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 [rewriter_after_opt_a]: 3.308e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.34e-06 [mutable_eliminate]: 0.00052185 [opt_b]: 0.00018045, [1] [Cycle 1]: 0.00017382, [7] 
[b_1]: 0.00010717 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 4.91002e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.14999e-06 [renormalize]: 2.89991e-07 [cse]: 1.587e-05 [optimize_parallel_all_gather_comm]: 1.614e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.418e-05 [loop_unroll]: 0.00042038 [opt_after_cconv]: 0.00010464, [1] [Cycle 1]: 9.869e-05, [7] [c_1]: 3.772e-05 [parameter_eliminate]: 2.10002e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.524e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.197e-05 [tuple_transform]: 6.919e-05, [1] [Cycle 1]: 6.437e-05, [4] [d_1]: 3.934e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.396e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.053e-05 [environ_conv]: 4.50999e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 2.55002e-06 [label_micro_interleaved_index]: 4.75001e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.02999e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.213e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.08002e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.758e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 2.10002e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.051e-05, [1] [Cycle 1]: 6.614e-05, [6] [build]: 2.97002e-06 [elim_shapecalc]: 9.23002e-06 [elim_not_effective]: 1.125e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.06e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.512e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.37002e-06 [opt_after_jit_grad]: 0.00045401 [validate]: 3.265e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0301793 [execute]: 8.07e-06 Sums bootstrap : 0.000461s : 0.84% type_inference : 0.020694s : 37.63% event_method : 0.000010s : 0.02% auto_monad : 0.000051s : 0.09% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000021s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000038s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.76% optimize.opt_a.with_stream_mark : 0.000022s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.26% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000375s : 0.68% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.04% optimize.opt_a.cse : 0.000042s : 0.08% optimize.opt_a.a_3 : 0.000073s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000522s : 0.95% optimize.opt_b.b_1 : 0.000107s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000420s : 0.76% optimize.opt_after_cconv.c_1 : 0.000038s : 0.07% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000454s : 0.83% validate : 0.000033s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.030179s : 54.88% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 19.08% : 0.000023s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.25% : 0.000005s : 4: substitution.graph_param_transform 65.10% : 0.000079s : 2: substitution.inline 2.12% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.64% : 0.000004s : 4: substitution.remove_not_recompute_node 3.40% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.020649 2 98.27% : 0.020291s : 1: type_inference.infer 1.73% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.40% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 2.11% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000008s : 44: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.62% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.96% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.98% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.96% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.07% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.17% : 0.000002s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 42.13% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.87% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.067457 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.66% : 0.003147s : 1: add_attr 4.65% : 0.003138s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000056s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.74% : 0.000497s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.64% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000531s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.14% : 0.000769s : 78: opt.transform.opt_a 0.05% : 0.000036s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.88% : 0.001944s : 1: opt_a 0.16% : 0.000108s : 1: opt_after_cconv 0.69% : 0.000464s : 1: opt_after_jit_grad 0.27% : 0.000184s : 1: opt_b 5.71% : 0.003853s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000026s : 1: pre_auto_parallel 0.03% : 0.000019s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.31% : 0.000210s : 1: renormalize.infer 0.23% : 0.000158s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000037s : 1: rewriter_after_opt_a 0.06% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000073s : 1: symbol_engine_optimizer 44.76% : 0.030193s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 30.70% : 0.020712s : 1: type_inference 0.09% : 0.000062s : 1: validate TotalTime = 0.0361732, [24] [bootstrap]: 0.00047936 [type_inference]: 0.00565039 [event_method]: 1.349e-05 [auto_monad]: 5.486e-05 [graph_reusing]: 5.45001e-06 [inline]: 1.69998e-06 [add_attr]: 0.0191223, [1] [add_attr_with_inline]: 0.019112, [1] [Cycle 1]: 5.47e-05, [2] [tag_attr]: 1.701e-05 [meta_addattr_fg_expand]: 3.72002e-06 [parallel-infer-symbol]: 3.43999e-06 [pre_auto_parallel]: 2.937e-05 [insert-virtual-dataset]: 2.92002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.44999e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.00414048, [53] [py_interpret_to_execute]: 2.145e-05 [rewriter_before_opt_a]: 6.156e-05 [opt_a]: 0.00222696, [2] [Cycle 1]: 0.00162883, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 3.197e-05 [loop_unroll]: 2.071e-05 [a_1]: 0.00046258 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.80998e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.772e-05 [accelerated_algorithm]: 6.14001e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 7.92e-06 [auto_parallel]: 6.51e-06 [parallel]: 1.892e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.66e-06 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 7.08998e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 3.53999e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 9.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 2.83e-06 
[after_resolve]: 1.064e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00052182 [add_forward_monad_depend]: 4.20999e-06 [auto_monad_grad]: 2.31998e-06 [auto_monad_eliminator]: 1.378e-05 [cse]: 2.9e-05 [a_3]: 4.019e-05 [Cycle 2]: 0.00058868, [45] [expand_dump_flag]: 8.30012e-07 [switch_simplify]: 6.65002e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012496 [with_stream_mark]: 9.27001e-06 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.718e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.04997e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.21998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.87001e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.17001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.08001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.95001e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.19001e-06 [cse]: 1.383e-05 [a_3]: 3.183e-05 [py_interpret_to_execute_after_opt_a]: 7.1e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.075e-05 [convert_after_rewriter]: 6.61e-06 [order_py_execute_after_rewriter]: 5.08002e-06 [mutable_eliminate]: 0.00048104 [opt_b]: 0.00018112, [1] [Cycle 1]: 0.00017533, [7] 
[b_1]: 0.00010813 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 3.59985e-07 [cse]: 1.598e-05 [optimize_parallel_all_gather_comm]: 1.523e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.158e-05 [loop_unroll]: 0.00046432 [opt_after_cconv]: 9.421e-05, [1] [Cycle 1]: 8.84e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.10002e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.569e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.273e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.488e-05, [4] [d_1]: 3.924e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.50999e-06 [add_recomputation]: 4.386e-05 [cse_after_recomputation]: 2.087e-05, [1] [Cycle 1]: 1.634e-05, [1] [cse]: 1.117e-05 [environ_conv]: 4.16001e-06 [swap_dp_allreduce_reducescatter]: 5.07999e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.209e-05 [grouped_pairwise_exchange_alltoall]: 1.91e-06 [offloading_packed_experts]: 3.80998e-06 [overlap_recompute_and_grad_model_parallel]: 4.44998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 1.91998e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.809e-05, [1] [Cycle 1]: 6.383e-05, [6] [build]: 2.53e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 5.96998e-06 [fold_const_symbol]: 8.55001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.585e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.42002e-06 [opt_after_jit_grad]: 0.00044995 [validate]: 3.266e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.00594921 [execute]: 6.91999e-06 Sums bootstrap : 0.000479s : 2.98% type_inference : 0.005650s : 35.12% event_method : 0.000013s : 0.08% auto_monad : 0.000055s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.18% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000062s : 0.38% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000588s : 3.65% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000522s : 3.24% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000481s : 2.99% optimize.opt_b.b_1 : 0.000108s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000464s : 2.89% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 2.80% validate : 0.000033s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005949s : 36.97% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000176 30 14.20% : 0.000025s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 3.00% : 0.000005s : 4: substitution.graph_param_transform 68.70% : 0.000121s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 4: substitution.replace_old_param 5.91% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005610 2 90.31% : 0.005066s : 1: type_inference.infer 9.69% : 0.000544s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.17% : 0.000027s : 3: replace.inline 29.83% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000128 5 92.75% : 0.000119s : 3: match.inline 7.25% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.93% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.50% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.65% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.95% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.59% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.79% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 44.89% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.11% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061062 196 0.01% : 0.000004s : 1: ForceFp32Comm 31.32% : 0.019127s : 1: add_attr 31.31% : 0.019116s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.10% : 0.000060s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000516s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.77% : 0.000473s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.80% : 0.000490s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.56% : 0.000952s : 78: opt.transform.opt_a 0.04% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000090s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 3.65% : 0.002230s : 1: opt_a 0.16% : 0.000098s : 1: opt_after_cconv 0.75% : 0.000459s : 1: opt_after_jit_grad 0.30% : 0.000184s : 1: opt_b 6.79% : 0.004145s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000033s : 1: pre_auto_parallel 0.04% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000016s : 1: remove_dup_value 0.47% : 0.000285s : 1: renormalize.infer 0.38% : 0.000230s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000035s : 1: rewriter_after_opt_a 0.11% : 0.000066s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000071s : 1: symbol_engine_optimizer 9.76% : 0.005959s : 1: task_emit 0.12% : 0.000072s : 1: 
tuple_transform 9.28% : 0.005665s : 1: type_inference 0.10% : 0.000061s : 1: validate TotalTime = 0.0700059, [24] [bootstrap]: 0.00051059 [type_inference]: 0.0114636 [event_method]: 4.743e-05 [auto_monad]: 0.00011908 [graph_reusing]: 8.35999e-06 [inline]: 2.43998e-06 [add_attr]: 0.019189, [1] [add_attr_with_inline]: 0.0191792, [1] [Cycle 1]: 8.15e-05, [2] [tag_attr]: 3.821e-05 [meta_addattr_fg_expand]: 9.20999e-06 [parallel-infer-symbol]: 3.31001e-06 [pre_auto_parallel]: 5.284e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.70001e-06 [pipeline_split]: 1.46002e-06 [optimize]: 0.0135784, [53] [py_interpret_to_execute]: 3.957e-05 [rewriter_before_opt_a]: 0.0001505 [opt_a]: 0.0112979, [3] [Cycle 1]: 0.0073475, [45] [expand_dump_flag]: 4e-06 [switch_simplify]: 7.396e-05 [loop_unroll]: 6.185e-05 [a_1]: 0.00146535 [with_stream_mark]: 2.463e-05 [recompute_prepare]: 2.242e-05 [updatestate_depend_eliminate]: 9.42001e-06 [updatestate_assign_eliminate]: 7.61999e-06 [updatestate_loads_eliminate]: 7.43e-06 [parameter_eliminate]: 2.74001e-06 [a_2]: 0.00024483 [accelerated_algorithm]: 3.035e-05 [shard]: 2.36e-06 [meta_shard_fg_expand]: 3.21999e-06 [shard_inline]: 1.634e-05 [merge_send_recv]: 5.793e-05 [auto_parallel]: 1.114e-05 [parallel]: 1.982e-05 [flash_sp]: 1.189e-05 [merge_comm]: 9.61e-06 [allreduce_fusion]: 8.82e-06 [matmul_add_comm_reduction]: 2.827e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.868e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.522e-05 [virtual_output]: 1.514e-05 [merge_forward]: 9.42999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.762e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.852e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.819e-05 [set_forward_comm_id_for_comm_node_pass]: 9.41e-06 [meta_fg_expand]: 0.00143027 [flash_sp_send_recv_attached]: 3.76001e-06 [receive_attached]: 2.21e-06 [after_resolve]: 6.171e-05 
[a_after_grad]: 8.148e-05 [renormalize]: 0.00263118 [add_forward_monad_depend]: 9.37001e-06 [auto_monad_grad]: 5.97999e-06 [auto_monad_eliminator]: 5.537e-05 [cse]: 0.00016269 [a_3]: 0.00033606 [Cycle 2]: 0.00302952, [45] [expand_dump_flag]: 1.53002e-06 [switch_simplify]: 4.66e-05 [loop_unroll]: 4.387e-05 [a_1]: 0.0015235 [with_stream_mark]: 1.176e-05 [recompute_prepare]: 1.061e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.16002e-06 [a_2]: 0.00012599 [accelerated_algorithm]: 1.197e-05 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 6.71e-06 [auto_parallel]: 7.6e-06 [parallel]: 5.22e-06 [flash_sp]: 3.33e-06 [merge_comm]: 5.35001e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 8.07e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 1.015e-05 [virtual_dataset]: 8.64998e-06 [get_grad_eliminate_]: 8.64e-06 [virtual_output]: 8.76002e-06 [merge_forward]: 5.07e-06 [cell_reuse_recompute_pass]: 8.79983e-07 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.669e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 8.007e-05 [flash_sp_send_recv_attached]: 1.17e-06 [receive_attached]: 1.52001e-06 [after_resolve]: 1.624e-05 [a_after_grad]: 1.424e-05 [renormalize]: 0.00061232 [add_forward_monad_depend]: 4.02002e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.507e-05 [cse]: 4.631e-05 [a_3]: 6.542e-05 [Cycle 3]: 0.00090597, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.064e-05 [loop_unroll]: 8.90999e-06 [a_1]: 0.00024963 [with_stream_mark]: 1.014e-05 [recompute_prepare]: 9.14e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 3.88999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 
0.00012608 [accelerated_algorithm]: 1.157e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 9.11002e-06 [merge_send_recv]: 6.61999e-06 [auto_parallel]: 7.15998e-06 [parallel]: 4.74998e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.95999e-06 [allreduce_fusion]: 4.97999e-06 [matmul_add_comm_reduction]: 7.41999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 9.85002e-06 [virtual_dataset]: 8.63001e-06 [get_grad_eliminate_]: 8.37998e-06 [virtual_output]: 8.33999e-06 [merge_forward]: 4.21001e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.672e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.439e-05 [set_forward_comm_id_for_comm_node_pass]: 5.85002e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.347e-05 [a_after_grad]: 1.452e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.047e-05 [cse]: 2.567e-05 [a_3]: 5.886e-05 [py_interpret_to_execute_after_opt_a]: 1.096e-05 [slice_cell_reuse_recomputed_activation]: 1.73002e-06 [rewriter_after_opt_a]: 4.816e-05 [convert_after_rewriter]: 9.32001e-06 [order_py_execute_after_rewriter]: 6.39001e-06 [mutable_eliminate]: 0.00047484 [opt_b]: 0.00028416, [1] [Cycle 1]: 0.00027796, [7] [b_1]: 0.00018685 [b_2]: 1.038e-05 [updatestate_depend_eliminate]: 7.31001e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 3.91001e-06 [renormalize]: 5.50004e-07 [cse]: 3.103e-05 [optimize_parallel_all_gather_comm]: 2.038e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 1.918e-05 [loop_unroll]: 0.0004242 [opt_after_cconv]: 0.00013488, [1] [Cycle 1]: 0.0001287, [7] [c_1]: 4.839e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 7.12002e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 3.92002e-06 [cse]: 
2.889e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 2.855e-05 [tuple_transform]: 0.00010013, [1] [Cycle 1]: 9.555e-05, [4] [d_1]: 6.561e-05 [none_parameter_eliminate]: 1.63002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.88998e-06 [partial_unused_args_eliminate]: 2.31998e-06 [add_recomputation]: 5.788e-05 [cse_after_recomputation]: 3.195e-05, [1] [Cycle 1]: 2.727e-05, [1] [cse]: 2.171e-05 [environ_conv]: 8.42e-06 [swap_dp_allreduce_reducescatter]: 7.71999e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.92002e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.60019e-07 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.30013e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.33002e-06 [overlap_opt_shard_in_pipeline]: 1.35999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.704e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 4.70999e-06 [overlap_recompute_and_grad_model_parallel]: 5.74999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 4.95001e-06 [overlap_grad_flash_sp]: 2.48e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.824e-05, [1] [Cycle 1]: 9.398e-05, [6] [build]: 9.89999e-06 [elim_shapecalc]: 1.299e-05 [elim_not_effective]: 1.814e-05 [opt_reshape]: 1.018e-05 [fold_const_symbol]: 1.508e-05 [renormalize]: 2.10013e-07 [detach_backward]: 2.00002e-06 
[pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.548e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.88999e-06 [opt_after_jit_grad]: 0.00046778 [validate]: 4.384e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.024253 [execute]: 7.5e-06 Sums bootstrap : 0.000511s : 1.03% type_inference : 0.011464s : 23.14% event_method : 0.000047s : 0.10% auto_monad : 0.000119s : 0.24% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000053s : 0.11% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.08% optimize.rewriter_before_opt_a : 0.000151s : 0.30% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.26% optimize.opt_a.loop_unroll : 0.000115s : 0.23% optimize.opt_a.a_1 : 0.003238s : 6.54% optimize.opt_a.with_stream_mark : 0.000047s : 0.09% optimize.opt_a.recompute_prepare : 0.000042s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.03% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000497s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.11% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.07% optimize.opt_a.merge_send_recv : 0.000071s : 0.14% optimize.opt_a.auto_parallel : 0.000026s : 0.05% optimize.opt_a.parallel : 0.000030s : 0.06% optimize.opt_a.flash_sp : 0.000016s : 0.03% optimize.opt_a.merge_comm : 0.000020s : 0.04% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.08% optimize.opt_a.virtual_dataset : 0.000033s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.07% optimize.opt_a.virtual_output : 0.000032s : 0.07% optimize.opt_a.merge_forward : 0.000019s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.07% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.04% optimize.opt_a.meta_fg_expand : 0.001513s : 3.05% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.18% optimize.opt_a.a_after_grad : 0.000110s : 0.22% optimize.opt_a.renormalize : 0.003244s : 6.55% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.03% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.16% optimize.opt_a.cse : 0.000235s : 0.47% optimize.opt_a.a_3 : 0.000460s : 0.93% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.10% optimize.convert_after_rewriter : 0.000009s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000475s : 0.96% optimize.opt_b.b_1 : 0.000187s : 0.38% optimize.opt_b.b_2 : 0.000010s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.04% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.04% optimize.loop_unroll : 0.000424s : 0.86% optimize.opt_after_cconv.c_1 : 0.000048s : 0.10% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.06% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.06% optimize.tuple_transform.d_1 : 0.000066s : 0.13% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.12% optimize.cse_after_recomputation.cse : 0.000022s : 0.04% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.05% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.05% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000468s : 0.94% validate : 0.000044s : 0.09% backend_pass : 0.000001s : 0.00% task_emit : 0.024253s : 48.95% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000774 222 5.91% : 0.000046s : 12: substitution.arithmetic_simplify 1.82% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.46% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000003s : 5: substitution.fold_const_symbol 0.95% : 0.000007s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.38% : 0.000436s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.63% : 0.000013s : 11: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000014s : 20: substitution.remove_not_recompute_node 3.02% : 0.000023s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.73% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.37% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011388 2 87.13% : 0.009922s : 1: type_inference.infer 12.87% : 0.001466s : 1: type_inference.specialize ------[replace.] 0.000219 33 58.28% : 0.000128s : 17: replace.inline 41.72% : 0.000091s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000460 33 92.87% : 0.000427s : 17: match.inline 7.13% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.09% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 249: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.61% : 0.000005s : 32: predicate.less_batch_normalization 
1.61% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.65% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.96% : 0.000015s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001633 34 55.21% : 0.000901s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.79% : 0.000731s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.111241 237 0.00% : 0.000004s : 1: ForceFp32Comm 17.25% : 0.019194s : 1: add_attr 17.24% : 0.019183s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.11% : 0.000126s : 1: auto_monad 0.03% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.49% : 0.000545s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.05% : 0.000054s : 1: event_method 0.01% : 0.000013s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.39% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.44% : 0.000484s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000017s : 1: opt.transform.mutable_eliminate 4.41% : 0.004908s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000172s : 28: opt.transform.opt_b 0.07% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000053s : 4: opt.transform.symbol_engine_opt 10.16% : 0.011301s : 1: opt_a 0.12% : 0.000138s : 1: opt_after_cconv 0.43% : 0.000478s : 1: opt_after_jit_grad 0.26% : 0.000288s : 1: opt_b 12.21% : 0.013582s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.05% : 0.000058s : 1: pre_auto_parallel 0.04% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.57% : 0.001745s : 2: renormalize.infer 1.33% : 0.001485s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000052s : 1: rewriter_after_opt_a 0.14% : 0.000155s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000101s : 1: symbol_engine_optimizer 21.81% : 0.024266s : 1: task_emit 0.09% : 0.000103s : 1: 
tuple_transform 10.32% : 0.011479s : 1: type_inference 0.07% : 0.000079s : 1: validate TotalTime = 0.0197283, [24] [bootstrap]: 0.00047603 [type_inference]: 0.00443714 [event_method]: 1.048e-05 [auto_monad]: 5.058e-05 [graph_reusing]: 4.61002e-06 [inline]: 2.09999e-06 [add_attr]: 0.00306563, [1] [add_attr_with_inline]: 0.00305666, [1] [Cycle 1]: 5.26e-05, [2] [tag_attr]: 1.368e-05 [meta_addattr_fg_expand]: 3.16001e-06 [parallel-infer-symbol]: 3.52002e-06 [pre_auto_parallel]: 2.362e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00399306, [53] [py_interpret_to_execute]: 1.639e-05 [rewriter_before_opt_a]: 4.027e-05 [opt_a]: 0.00203538, [2] [Cycle 1]: 0.00139565, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 2.507e-05 [loop_unroll]: 1.346e-05 [a_1]: 0.00030169 [with_stream_mark]: 1.541e-05 [recompute_prepare]: 8e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.655e-05 [accelerated_algorithm]: 6.96001e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8.2e-06 [auto_parallel]: 6.63e-06 [parallel]: 1.815e-05 [flash_sp]: 9.61e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.11999e-06 [matmul_add_comm_reduction]: 1.036e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 8.1e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 6.06e-06 [merge_forward]: 4.53999e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.03e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 2.371e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 3.11001e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 1.212e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 0.0004198 [add_forward_monad_depend]: 4.89e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 1.502e-05 [cse]: 2.869e-05 [a_3]: 4.231e-05 [Cycle 2]: 0.0006291, [45] [expand_dump_flag]: 1.77001e-06 [switch_simplify]: 7.58999e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012692 [with_stream_mark]: 1.177e-05 [recompute_prepare]: 5.73002e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.892e-05 [accelerated_algorithm]: 5.77999e-06 [shard]: 1.42999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.84e-06 [auto_parallel]: 5.87001e-06 [parallel]: 5.18002e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 6.81999e-06 [allreduce_slice_to_reducescatter]: 4.90021e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.21998e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 3.23e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.41998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.002e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 8.32e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 1.87999e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.96998e-06 [after_resolve]: 1.011e-05 [a_after_grad]: 7.87e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.83997e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 7.6e-06 [cse]: 1.631e-05 [a_3]: 3.258e-05 [py_interpret_to_execute_after_opt_a]: 9.44998e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.377e-05 [convert_after_rewriter]: 7.8e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00050312 [opt_b]: 0.00018903, [1] [Cycle 1]: 0.00018206, [7] [b_1]: 
0.000107 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 7.06999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.66999e-06 [renormalize]: 4.69998e-07 [cse]: 1.983e-05 [optimize_parallel_all_gather_comm]: 1.784e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.379e-05 [loop_unroll]: 0.00045579 [opt_after_cconv]: 0.000101, [1] [Cycle 1]: 9.451e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.40002e-06 [updatestate_depend_eliminate]: 6.22001e-06 [updatestate_assign_eliminate]: 2.79999e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.954e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 6.996e-05, [1] [Cycle 1]: 6.56e-05, [4] [d_1]: 3.974e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.08002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.676e-05 [cse_after_recomputation]: 2.083e-05, [1] [Cycle 1]: 1.653e-05, [1] [cse]: 1.153e-05 [environ_conv]: 4.78001e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 5.00001e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.272e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.944e-05 [begin_end_overlap_inline]: 6.29982e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 7.622e-05, [1] [Cycle 1]: 7.116e-05, [6] [build]: 3.5e-06 [elim_shapecalc]: 9.24e-06 [elim_not_effective]: 1.19e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 9.03002e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.86e-06 [auto_monad_reorder]: 1.554e-05 [get_jit_bprop_graph]: 1.58002e-06 [rewriter_after_jit_bprop_graph]: 3.83001e-06 [opt_after_jit_grad]: 0.00049669 [validate]: 3.589e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00683162 [execute]: 7.66001e-06 Sums bootstrap : 0.000476s : 3.05% type_inference : 0.004437s : 28.46% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.26% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000429s : 2.75% optimize.opt_a.with_stream_mark : 0.000027s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000013s : 0.09% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000008s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000032s : 0.21% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000022s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000420s : 2.69% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.15% optimize.opt_a.cse : 0.000045s : 0.29% optimize.opt_a.a_3 : 0.000075s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.22% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000503s : 3.23% optimize.opt_b.b_1 : 0.000107s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.13% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000456s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000020s : 0.13% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01%
auto_monad_reorder : 0.000016s : 0.10%
get_jit_bprop_graph : 0.000002s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.02%
opt_after_jit_grad : 0.000497s : 3.19%
validate : 0.000036s : 0.23%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006832s : 43.82%
execute : 0.000008s : 0.05%
Time group info:
------[substitution.] 0.000129 26
    19.24% : 0.000025s : 4: substitution.arithmetic_simplify
    1.36% : 0.000002s : 2: substitution.elim_not_effective
    1.21% : 0.000002s : 2: substitution.fold_const_symbol
    4.19% : 0.000005s : 4: substitution.graph_param_transform
    64.33% : 0.000083s : 2: substitution.inline
    2.46% : 0.000003s : 4: substitution.j_node_and_user_rematch
    3.20% : 0.000004s : 4: substitution.remove_not_recompute_node
    4.02% : 0.000005s : 4: substitution.replace_old_param
------[type_inference.] 0.004393 2
    92.03% : 0.004043s : 1: type_inference.infer
    7.97% : 0.000350s : 1: type_inference.specialize
------[replace.] 0.000020 2
    100.00% : 0.000020s : 2: replace.inline
------[match.] 0.000082 2
    100.00% : 0.000082s : 2: match.inline
------[predicate.]
0.000141 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.16% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.51% : 0.000004s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.76% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.93% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.58% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.99% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.05% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.43% : 0.000001s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.66% : 0.000008s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.93% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.65% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 2.04% : 0.000003s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.03% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.89% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.09% : 0.000002s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.15% : 0.000002s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.27% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.66% : 0.000007s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.50% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000251 6 41.72% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.28% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028173 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.90% : 0.003070s : 1: add_attr 10.86% : 0.003060s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.82% : 0.000512s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.16% : 0.000046s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.65% : 0.000466s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.82% : 0.000514s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.06% : 0.000016s : 1: opt.transform.mutable_eliminate 2.86% : 0.000806s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.09% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.24% : 0.002038s : 1: opt_a 0.37% : 0.000105s : 1: opt_after_cconv 1.80% : 0.000508s : 1: opt_after_jit_grad 0.68% : 0.000193s : 1: opt_b 14.19% : 0.003998s : 1: optimize 0.08% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.05% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.85% : 0.000241s : 1: renormalize.infer 0.61% : 0.000172s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000039s : 1: rewriter_after_opt_a 0.16% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000079s : 1: symbol_engine_optimizer 24.30% : 0.006847s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 15.80% : 0.004453s : 1: type_inference 0.24% : 0.000068s : 1: validate TotalTime = 0.0382431, [24] [bootstrap]: 0.00052178 [type_inference]: 0.0109743 [event_method]: 4.188e-05 [auto_monad]: 0.00013341 [graph_reusing]: 8.50001e-06 [inline]: 1.63002e-06 [add_attr]: 0.00308796, [1] [add_attr_with_inline]: 0.00307899, [1] [Cycle 1]: 7.475e-05, [2] [tag_attr]: 3.349e-05 [meta_addattr_fg_expand]: 8.64e-06 [parallel-infer-symbol]: 3.69002e-06 [pre_auto_parallel]: 4.859e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0140252, [53] [py_interpret_to_execute]: 3.591e-05 [rewriter_before_opt_a]: 0.00012882 [opt_a]: 0.0113957, [3] [Cycle 1]: 0.00733971, [45] [expand_dump_flag]: 3.93001e-06 [switch_simplify]: 6.651e-05 [loop_unroll]: 5.426e-05 [a_1]: 0.00141083 [with_stream_mark]: 2.644e-05 [recompute_prepare]: 2.356e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 8.03001e-06 [updatestate_loads_eliminate]: 7.61999e-06 [parameter_eliminate]: 2.77002e-06 [a_2]: 0.00024892 [accelerated_algorithm]: 3.262e-05 [shard]: 1.92001e-06 [meta_shard_fg_expand]: 3.73999e-06 [shard_inline]: 1.587e-05 [merge_send_recv]: 1.642e-05 [auto_parallel]: 1.163e-05 [parallel]: 1.94e-05 [flash_sp]: 1.244e-05 [merge_comm]: 9.72999e-06 [allreduce_fusion]: 8.79e-06 [matmul_add_comm_reduction]: 2.961e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.896e-05 [virtual_dataset]: 1.542e-05 [get_grad_eliminate_]: 1.531e-05 [virtual_output]: 1.497e-05 [merge_forward]: 9.69999e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 1.798e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.941e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 2.736e-05 [set_forward_comm_id_for_comm_node_pass]: 9.79e-06 [meta_fg_expand]: 0.00146573 [flash_sp_send_recv_attached]: 3.55998e-06 [receive_attached]: 2.23002e-06 
[after_resolve]: 6.093e-05 [a_after_grad]: 8.21e-05 [renormalize]: 0.00266189 [add_forward_monad_depend]: 1.096e-05 [auto_monad_grad]: 5.79999e-06 [auto_monad_eliminator]: 5.757e-05 [cse]: 0.00016162 [a_3]: 0.00033704 [Cycle 2]: 0.00312356, [45] [expand_dump_flag]: 2.16e-06 [switch_simplify]: 4.703e-05 [loop_unroll]: 4.358e-05 [a_1]: 0.00158939 [with_stream_mark]: 1.549e-05 [recompute_prepare]: 1.204e-05 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 4.51002e-06 [updatestate_loads_eliminate]: 3.88999e-06 [parameter_eliminate]: 1.45999e-06 [a_2]: 0.00012782 [accelerated_algorithm]: 1.26e-05 [shard]: 1.72001e-06 [meta_shard_fg_expand]: 2.04999e-06 [shard_inline]: 9.19998e-06 [merge_send_recv]: 8.37e-06 [auto_parallel]: 8.19998e-06 [parallel]: 6.85998e-06 [flash_sp]: 3.35e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.85001e-06 [matmul_add_comm_reduction]: 1.005e-05 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 1.052e-05 [virtual_dataset]: 9.07999e-06 [get_grad_eliminate_]: 8.84998e-06 [virtual_output]: 8.54e-06 [merge_forward]: 4.75001e-06 [cell_reuse_recompute_pass]: 9.09989e-07 [offload_activation]: 1.069e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.691e-05 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 1.421e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 4.433e-05 [flash_sp_send_recv_attached]: 1.19003e-06 [receive_attached]: 1.74e-06 [after_resolve]: 1.546e-05 [a_after_grad]: 1.52e-05 [renormalize]: 0.00064052 [add_forward_monad_depend]: 4.46002e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.486e-05 [cse]: 4.826e-05 [a_3]: 6.614e-05 [Cycle 3]: 0.00091598, [45] [expand_dump_flag]: 1.30999e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 9.12999e-06 [a_1]: 0.00025115 [with_stream_mark]: 1.101e-05 [recompute_prepare]: 9.51e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 
4.05998e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012315 [accelerated_algorithm]: 1.187e-05 [shard]: 1.38002e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 8.97e-06 [merge_send_recv]: 8.1e-06 [auto_parallel]: 7.53e-06 [parallel]: 5.02999e-06 [flash_sp]: 1.49e-06 [merge_comm]: 5.37999e-06 [allreduce_fusion]: 4.84003e-06 [matmul_add_comm_reduction]: 8.64e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 9.92999e-06 [virtual_dataset]: 8.60001e-06 [get_grad_eliminate_]: 8.38999e-06 [virtual_output]: 8.05e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 9.54999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.664e-05 [merge_recompute_call_nodes]: 8.40024e-07 [before_grad]: 1.395e-05 [set_forward_comm_id_for_comm_node_pass]: 5.02999e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 9.79984e-07 [receive_attached]: 1.72001e-06 [after_resolve]: 1.393e-05 [a_after_grad]: 1.482e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.50999e-06 [auto_monad_grad]: 1.42999e-06 [auto_monad_eliminator]: 1.21e-05 [cse]: 2.696e-05 [a_3]: 5.662e-05 [py_interpret_to_execute_after_opt_a]: 1.351e-05 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 5.186e-05 [convert_after_rewriter]: 9.07001e-06 [order_py_execute_after_rewriter]: 6.88998e-06 [mutable_eliminate]: 0.00052722 [opt_b]: 0.00029367, [1] [Cycle 1]: 0.00028569, [7] [b_1]: 0.00018801 [b_2]: 1.055e-05 [updatestate_depend_eliminate]: 8.44002e-06 [updatestate_assign_eliminate]: 4.09002e-06 [updatestate_loads_eliminate]: 4.23999e-06 [renormalize]: 4.80009e-07 [cse]: 3.408e-05 [optimize_parallel_all_gather_comm]: 2.1e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.29e-05 [loop_unroll]: 0.00069636 [opt_after_cconv]: 0.00013968, [1] [Cycle 1]: 0.00013254, [7] [c_1]: 4.859e-05 [parameter_eliminate]: 2.71e-06 [updatestate_depend_eliminate]: 7.46001e-06 [updatestate_assign_eliminate]: 4.27998e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 3.174e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 3.096e-05 [tuple_transform]: 0.00010337, [1] [Cycle 1]: 9.798e-05, [4] [d_1]: 6.77e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.78998e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 6.036e-05 [cse_after_recomputation]: 3.157e-05, [1] [Cycle 1]: 2.687e-05, [1] [cse]: 2.114e-05 [environ_conv]: 9.29e-06 [swap_dp_allreduce_reducescatter]: 7.87e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.13002e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.32e-06 [interleave_split_concat_branches]: 1.40999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92001e-06 [control_data_broadcast_order]: 1.762e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.57999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 5.17999e-06 [overlap_grad_flash_sp]: 2.628e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.04998e-06 [symbol_engine_optimizer]: 0.00010282, [1] [Cycle 1]: 9.838e-05, [6] [build]: 1.042e-05 [elim_shapecalc]: 1.482e-05 [elim_not_effective]: 1.86e-05 [opt_reshape]: 9.92001e-06 [fold_const_symbol]: 1.482e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 2.523e-05 [get_jit_bprop_graph]: 1.49e-06 [rewriter_after_jit_bprop_graph]: 4.18001e-06 [opt_after_jit_grad]: 0.00050924 [validate]: 5.895e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0085404 [execute]: 7.75e-06 Sums bootstrap : 0.000522s : 1.54% type_inference : 0.010974s : 32.46% event_method : 0.000042s : 0.12% auto_monad : 0.000133s : 0.39% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000129s : 0.38% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.37% optimize.opt_a.loop_unroll : 0.000107s : 0.32% optimize.opt_a.a_1 : 0.003251s : 9.62% optimize.opt_a.with_stream_mark : 0.000053s : 0.16% optimize.opt_a.recompute_prepare : 0.000045s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000500s : 1.48% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.17% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000033s : 0.10% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000038s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001513s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.33% optimize.opt_a.renormalize : 0.003302s : 9.77% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.25% optimize.opt_a.cse : 0.000237s : 0.70% optimize.opt_a.a_3 : 0.000460s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000052s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000527s : 1.56% optimize.opt_b.b_1 : 0.000188s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000696s : 2.06% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000026s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000509s : 1.51% validate : 0.000059s : 0.17% backend_pass : 0.000001s : 0.00% task_emit : 0.008540s : 25.26% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000793 218 5.92% : 0.000047s : 11: substitution.arithmetic_simplify 2.12% : 0.000017s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.71% : 0.000442s : 16: substitution.inline 2.22% : 0.000018s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000016s : 3: substitution.less_batch_normalization 1.72% : 0.000014s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.75% : 0.000014s : 20: substitution.remove_not_recompute_node 3.06% : 0.000024s : 10: substitution.replace_applicator 1.48% : 0.000012s : 15: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.48% : 0.000067s : 28: substitution.tuple_list_get_item_eliminator 2.37% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010895 2 87.90% : 0.009577s : 1: type_inference.infer 12.10% : 0.001318s : 1: type_inference.specialize ------[replace.] 0.000232 30 55.43% : 0.000129s : 16: replace.inline 44.57% : 0.000103s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000466 30 93.02% : 0.000433s : 16: match.inline 6.98% : 0.000033s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000740 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000008s : 67: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.51% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.19% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000016s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 164: predicate.load_eliminater 0.41% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000012s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.66% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.19% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.81% : 0.000006s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.89% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.38% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001553 32 58.15% : 0.000903s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.85% : 0.000650s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063895 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.84% : 0.003093s : 1: add_attr 4.82% : 0.003083s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000141s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.87% : 0.000559s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 1.11% : 0.000709s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.84% : 0.000537s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.69% : 0.004914s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000055s : 4: opt.transform.symbol_engine_opt 17.84% : 0.011399s : 1: opt_a 0.22% : 0.000144s : 1: opt_after_cconv 0.81% : 0.000520s : 1: opt_after_jit_grad 0.47% : 0.000297s : 1: opt_b 21.96% : 0.014030s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000053s : 1: pre_auto_parallel 0.06% : 0.000040s : 1: py_interpret_to_execute 0.03% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000035s : 1: remove_dup_value 2.88% : 0.001841s : 2: renormalize.infer 2.26% : 0.001445s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000058s : 1: rewriter_after_opt_a 0.21% : 0.000134s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000105s : 1: symbol_engine_optimizer 13.39% : 0.008554s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 17.21% : 0.010995s : 1: type_inference 0.16% : 0.000099s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x8-kbk],max_mem:6.0M . TotalTime = 3.0024, [24] [bootstrap]: 0.00055786 [type_inference]: 0.00623078 [event_method]: 1.455e-05 [auto_monad]: 5.491e-05 [graph_reusing]: 5.56e-06 [inline]: 1.71e-06 [add_attr]: 0.0035098, [1] [add_attr_with_inline]: 0.0034972, [1] [Cycle 1]: 5.175e-05, [2] [tag_attr]: 1.659e-05 [meta_addattr_fg_expand]: 3.95e-06 [parallel-infer-symbol]: 3.64002e-06 [pre_auto_parallel]: 2.833e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.32001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.004174, [53] [py_interpret_to_execute]: 2.225e-05 [rewriter_before_opt_a]: 6.015e-05 [opt_a]: 0.0022518, [2] [Cycle 1]: 0.00158905, [45] [expand_dump_flag]: 2.49001e-06 [switch_simplify]: 3.328e-05 [loop_unroll]: 2.098e-05 [a_1]: 0.00045742 [with_stream_mark]: 1.281e-05 [recompute_prepare]: 7.65998e-06 [updatestate_depend_eliminate]: 4.42e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.415e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 6.14001e-06 [parallel]: 2.407e-05 [flash_sp]: 7.61001e-06 [merge_comm]: 3.98999e-06 [allreduce_fusion]: 3.30003e-06 [matmul_add_comm_reduction]: 9.57999e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.62002e-06 [virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.5e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.99001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 
1.34e-06 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.36998e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.071e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00047029 [add_forward_monad_depend]: 5.10999e-06 [auto_monad_grad]: 2.27001e-06 [auto_monad_eliminator]: 1.569e-05 [cse]: 2.833e-05 [a_3]: 4.108e-05 [Cycle 2]: 0.00065206, [45] [expand_dump_flag]: 1.44e-06 [switch_simplify]: 7.8e-06 [loop_unroll]: 5.19e-06 [a_1]: 0.00017778 [with_stream_mark]: 1.067e-05 [recompute_prepare]: 6.08998e-06 [updatestate_depend_eliminate]: 2.99001e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 1.19e-06 [a_2]: 6.788e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.41002e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.99e-06 [parallel]: 4.52e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.40999e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 5.82001e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 6.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.14998e-06 [merge_recompute_call_nodes]: 8.90024e-07 [before_grad]: 8.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.28002e-06 [after_resolve]: 9.88002e-06 [a_after_grad]: 7.68001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.40001e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 7.39002e-06 [cse]: 1.436e-05 [a_3]: 3.151e-05 [py_interpret_to_execute_after_opt_a]: 8.77999e-06 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 
[rewriter_after_opt_a]: 3.269e-05 [convert_after_rewriter]: 6.41998e-06 [order_py_execute_after_rewriter]: 5.47999e-06 [mutable_eliminate]: 0.00048797 [opt_b]: 0.00019132, [1] [Cycle 1]: 0.00018461, [7] [b_1]: 0.00011265 [b_2]: 6.89999e-06 [updatestate_depend_eliminate]: 6.98998e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [renormalize]: 5.49975e-07 [cse]: 1.751e-05 [optimize_parallel_all_gather_comm]: 1.645e-05 [overlap_param_gather]: 1.78002e-06 [cconv]: 2.431e-05 [loop_unroll]: 0.0004258 [opt_after_cconv]: 9.708e-05, [1] [Cycle 1]: 9.044e-05, [7] [c_1]: 2.696e-05 [parameter_eliminate]: 2.79999e-06 [updatestate_depend_eliminate]: 5.78002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.697e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.946e-05, [1] [Cycle 1]: 6.492e-05, [4] [d_1]: 3.921e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.02001e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 5.206e-05 [cse_after_recomputation]: 2.009e-05, [1] [Cycle 1]: 1.574e-05, [1] [cse]: 1.074e-05 [environ_conv]: 4.59998e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.23998e-06 [label_micro_interleaved_index]: 4.66002e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.33002e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.93998e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 
1.216e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 3.3e-06 [overlap_recompute_and_grad_model_parallel]: 4.27998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 4.14002e-06 [overlap_grad_flash_sp]: 1.813e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.89e-06 [handle_group_info]: 9.69972e-07 [symbol_engine_optimizer]: 6.983e-05, [1] [Cycle 1]: 6.515e-05, [6] [build]: 2.93e-06 [elim_shapecalc]: 9.22001e-06 [elim_not_effective]: 1.118e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.11998e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 4.69002e-06 [opt_after_jit_grad]: 0.00047887 [validate]: 3.324e-05 [backend_pass]: 8.39995e-07 [task_emit]: 2.98704 [execute]: 8.55001e-06 Sums bootstrap : 0.000558s : 0.02% type_inference : 0.006231s : 0.21% event_method : 0.000015s : 0.00% auto_monad : 0.000055s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.00% optimize.rewriter_before_opt_a : 0.000060s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000635s : 0.02% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000470s : 0.02% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000043s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.00% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000488s : 0.02% optimize.opt_b.b_1 : 0.000113s : 0.00% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000426s : 0.01% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000479s : 0.02% validate : 0.000033s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 2.987038s : 99.64% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000172 30 15.68% : 0.000027s : 5: substitution.arithmetic_simplify 1.02% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000006s : 4: substitution.graph_param_transform 66.24% : 0.000114s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.35% : 0.000004s : 4: substitution.remove_not_recompute_node 2.60% : 0.000004s : 4: substitution.replace_old_param 6.37% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006180 2 90.96% : 0.005621s : 1: type_inference.infer 9.04% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.86% : 0.000027s : 3: replace.inline 29.14% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.84% : 0.000112s : 3: match.inline 8.16% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000210 1131 0.62% : 0.000001s : 11: predicate.accumulaten_eliminater 0.65% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.43% : 0.000001s : 8: predicate.addn_check_dump 0.60% : 0.000001s : 11: predicate.addn_zero_filter 0.59% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.63% : 0.000003s : 19: predicate.arithmetic_simplify 0.69% : 0.000001s : 11: predicate.cast_eliminate 0.53% : 0.000001s : 8: predicate.check_bprop_eliminate 0.42% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.45% : 0.000001s : 8: predicate.depend_value_elim 0.64% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.73% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 4: predicate.elim_not_effective 0.31% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.91% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_depend_swap 1.33% : 0.000003s : 23: predicate.environ_get_eliminate 0.82% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.96% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.68% : 0.000004s : 16: predicate.float_depend_g_call 0.42% : 0.000001s : 8: predicate.float_environ_get_switch 0.65% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000001s : 8: predicate.get_grad_eliminate 0.19% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000001s : 8: predicate.incorporate_call 0.42% : 0.000001s : 8: predicate.incorporate_call_switch 4.51% : 0.000009s : 51: predicate.inline 0.59% : 0.000001s : 8: predicate.inline_without_move 0.30% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.61% : 0.000001s : 8: predicate.less_batch_normalization 1.47% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.83% : 0.000004s : 32: predicate.load_eliminater 0.94% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.62% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.36% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.47% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.51% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.59% : 0.000001s : 11: predicate.minmaximum_grad 1.27% : 0.000003s : 4: predicate.mutable_eliminate 0.27% : 0.000001s : 4: predicate.opt_reshape 0.29% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000003s : 16: predicate.partial_defer_inline 24.91% : 0.000052s : 17: predicate.partial_eliminate 0.65% : 0.000001s : 11: predicate.print_const_string_wrapper 0.48% : 0.000001s : 8: predicate.reduce_all_const_elim 0.79% : 0.000002s : 11: predicate.reduce_eliminate 1.81% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.41% : 0.000001s : 8: predicate.remove_not_recompute_node 1.13% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.23% : 0.000000s : 4: predicate.reset_defer_inline 0.68% : 0.000001s : 11: predicate.reshape_eliminate 0.54% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.31% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000002s : 8: predicate.same_eliminate 0.40% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.74% : 0.000002s : 8: predicate.shard_identity_eliminate 0.58% : 0.000001s : 8: predicate.special_op_eliminate 0.57% : 0.000001s : 8: predicate.specialize_transform 0.80% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.59% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000002s : 16: predicate.switch_defer_inline 1.50% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.70% : 0.000008s : 54: predicate.switch_simplify 0.61% 
: 0.000001s : 11: predicate.tile_eliminate 0.74% : 0.000002s : 11: predicate.transpose_eliminate 1.14% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.25% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.00% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.60% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.04% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.58% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.22% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.74% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.38% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.29% : 0.000001s : 4: predicate.value_based_eliminate 0.54% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.56% : 0.000001s : 8: predicate.virtual_output_eliminate 0.27% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.40% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 46.71% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.29% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
3.011703 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.12% : 0.003515s : 1: add_attr 0.12% : 0.003501s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000060s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.02% : 0.000599s : 1: bootstrap 0.00% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000009s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.01% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.02% : 0.000498s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.03% : 0.001001s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000093s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.07% : 0.002255s : 1: opt_a 0.00% : 0.000100s : 1: opt_after_cconv 0.02% : 0.000490s : 1: opt_after_jit_grad 0.01% : 0.000195s : 1: opt_b 0.14% : 0.004178s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000026s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.01% : 0.000250s : 1: renormalize.infer 0.01% : 0.000213s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000037s : 1: rewriter_after_opt_a 0.00% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000072s : 1: symbol_engine_optimizer 99.18% : 2.987060s : 1: task_emit 0.00% : 0.000072s : 1: 
tuple_transform 0.21% : 0.006248s : 1: type_inference 0.00% : 0.000057s : 1: validate TotalTime = 0.376211, [24] [bootstrap]: 0.00049794 [type_inference]: 0.00442129 [event_method]: 1.076e-05 [auto_monad]: 4.86e-05 [graph_reusing]: 5.36998e-06 [inline]: 1.87999e-06 [add_attr]: 0.00301097, [1] [add_attr_with_inline]: 0.00300333, [1] [Cycle 1]: 4.727e-05, [2] [tag_attr]: 1.202e-05 [meta_addattr_fg_expand]: 3.36999e-06 [parallel-infer-symbol]: 3.01999e-06 [pre_auto_parallel]: 2.057e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 2.00002e-06 [optimize]: 0.00375812, [53] [py_interpret_to_execute]: 1.585e-05 [rewriter_before_opt_a]: 3.776e-05 [opt_a]: 0.00188398, [2] [Cycle 1]: 0.00128403, [45] [expand_dump_flag]: 2.41e-06 [switch_simplify]: 2.347e-05 [loop_unroll]: 1.338e-05 [a_1]: 0.0002915 [with_stream_mark]: 1.601e-05 [recompute_prepare]: 7.56999e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.739e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 1.90001e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 6.19001e-06 [parallel]: 1.82e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.47002e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 8.48999e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 6.17999e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.99002e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 8.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 1.90001e-06 [before_grad]: 9.48002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 
2.32999e-06 [after_resolve]: 1.054e-05 [a_after_grad]: 8.76002e-06 [renormalize]: 0.00037216 [add_forward_monad_depend]: 4.31002e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.704e-05 [a_3]: 4.029e-05 [Cycle 2]: 0.00059028, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012437 [with_stream_mark]: 8.75001e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.72001e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 6.701e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.14003e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.37998e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.28001e-06 [flash_sp]: 3.53e-06 [merge_comm]: 3.00998e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.04999e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.11997e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.35002e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.28002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 8.92999e-06 [a_after_grad]: 7.9e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.593e-05 [a_3]: 3.148e-05 [py_interpret_to_execute_after_opt_a]: 8.70001e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 3.218e-05 [convert_after_rewriter]: 6.93998e-06 [order_py_execute_after_rewriter]: 5.43002e-06 [mutable_eliminate]: 0.0004482 [opt_b]: 0.00018003, [1] 
[Cycle 1]: 0.00017407, [7] [b_1]: 0.00010678 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 4.85001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 3.30008e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.852e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.293e-05 [loop_unroll]: 0.00040949 [opt_after_cconv]: 9.395e-05, [1] [Cycle 1]: 8.813e-05, [7] [c_1]: 2.775e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.569e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.316e-05 [tuple_transform]: 6.904e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.906e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.371e-05 [cse_after_recomputation]: 1.936e-05, [1] [Cycle 1]: 1.504e-05, [1] [cse]: 1.03e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.53001e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.56998e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.145e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.65999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 4.63001e-06 [overlap_grad_flash_sp]: 1.74e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 0.00013703, [1] [Cycle 1]: 0.00013264, [6] [build]: 2.33998e-06 [elim_shapecalc]: 8.39998e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.82999e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00044384 [validate]: 3.217e-05 [backend_pass]: 8.09989e-07 [task_emit]: 0.363703 [execute]: 9.57001e-06 Sums bootstrap : 0.000498s : 0.13% type_inference : 0.004421s : 1.19% event_method : 0.000011s : 0.00% auto_monad : 0.000049s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.00% optimize.rewriter_before_opt_a : 0.000038s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.01% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000416s : 0.11% optimize.opt_a.with_stream_mark : 0.000025s : 0.01% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000022s : 0.01% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000372s : 0.10% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.01% optimize.opt_a.cse : 0.000043s : 0.01% optimize.opt_a.a_3 : 0.000072s : 0.02% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000448s : 0.12% optimize.opt_b.b_1 : 0.000107s : 0.03% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.01% optimize.loop_unroll : 0.000409s : 0.11% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.12% validate : 0.000032s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.363703s : 97.73% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000121 26 17.95% : 0.000022s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.68% : 0.000006s : 4: substitution.graph_param_transform 65.79% : 0.000080s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.46% : 0.000004s : 4: substitution.remove_not_recompute_node 3.33% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004379 2 91.59% : 0.004011s : 1: type_inference.infer 8.41% : 0.000368s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.72% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.74% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 1.12% : 0.000002s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.20% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.40% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.99% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.65% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.93% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000267 6 41.98% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.02% : 0.000155s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.384270 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.78% : 0.003015s : 1: add_attr 0.78% : 0.003007s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000054s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.14% : 0.000534s : 1: bootstrap 0.01% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000017s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.11% : 0.000418s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.12% : 0.000457s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.20% : 0.000766s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.49% : 0.001887s : 1: opt_a 0.03% : 0.000097s : 1: opt_after_cconv 0.12% : 0.000453s : 1: opt_after_jit_grad 0.05% : 0.000183s : 1: opt_b 0.98% : 0.003763s : 1: optimize 0.01% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000025s : 1: pre_auto_parallel 0.01% : 0.000020s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.05% : 0.000207s : 1: renormalize.infer 0.04% : 0.000159s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000140s : 1: symbol_engine_optimizer 94.65% : 0.363725s : 1: task_emit 0.02% : 0.000072s : 1: 
tuple_transform 1.15% : 0.004435s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.338225, [24] [bootstrap]: 0.00049494 [type_inference]: 0.00631363 [event_method]: 1.46e-05 [auto_monad]: 5.679e-05 [graph_reusing]: 5.90002e-06 [inline]: 3.25002e-06 [add_attr]: 0.00355759, [1] [add_attr_with_inline]: 0.00354832, [1] [Cycle 1]: 5.606e-05, [2] [tag_attr]: 1.836e-05 [meta_addattr_fg_expand]: 4.75999e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 2.96e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.00442692, [53] [py_interpret_to_execute]: 2.315e-05 [rewriter_before_opt_a]: 6.468e-05 [opt_a]: 0.00234119, [2] [Cycle 1]: 0.00172372, [45] [expand_dump_flag]: 2.77002e-06 [switch_simplify]: 3.307e-05 [loop_unroll]: 2.168e-05 [a_1]: 0.00046425 [with_stream_mark]: 1.492e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 1.538e-05 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 3.20998e-06 [parameter_eliminate]: 1.87001e-06 [a_2]: 7.812e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 6.49001e-06 [parallel]: 1.884e-05 [flash_sp]: 7.56001e-06 [merge_comm]: 3.74002e-06 [allreduce_fusion]: 3.52002e-06 [matmul_add_comm_reduction]: 9.62999e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.88998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.13e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.67001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 
2.05002e-06 [after_resolve]: 1.032e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.00058341 [add_forward_monad_depend]: 5.20999e-06 [auto_monad_grad]: 1.71998e-06 [auto_monad_eliminator]: 1.511e-05 [cse]: 2.88e-05 [a_3]: 4.229e-05 [Cycle 2]: 0.00060784, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.95998e-06 [loop_unroll]: 5.63002e-06 [a_1]: 0.000128 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 5.66998e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.859e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.46998e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.46998e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.40999e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 3.05998e-06 [matmul_add_comm_reduction]: 5.44e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.79999e-06 [virtual_dataset]: 5.10001e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 4.89998e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.89e-06 [offload_activation]: 6.44001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 8.41997e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.69999e-06 [cse]: 1.398e-05 [a_3]: 3.424e-05 [py_interpret_to_execute_after_opt_a]: 8.68001e-06 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 3.176e-05 [convert_after_rewriter]: 7.33e-06 [order_py_execute_after_rewriter]: 5.47999e-06 [mutable_eliminate]: 0.00059288 [opt_b]: 0.00018712, [1] [Cycle 1]: 0.00018071, 
[7] [b_1]: 0.00010907 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 6.10016e-07 [cse]: 1.849e-05 [optimize_parallel_all_gather_comm]: 1.594e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.39e-05 [loop_unroll]: 0.00045819 [opt_after_cconv]: 0.00010004, [1] [Cycle 1]: 9.393e-05, [7] [c_1]: 2.866e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.788e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.271e-05 [tuple_transform]: 7.11e-05, [1] [Cycle 1]: 6.669e-05, [4] [d_1]: 4.072e-05 [none_parameter_eliminate]: 1.82999e-06 [renormalize]: 3.19997e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.164e-05 [cse_after_recomputation]: 2.13e-05, [1] [Cycle 1]: 1.654e-05, [1] [cse]: 1.124e-05 [environ_conv]: 5.34e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 3.03e-06 [label_micro_interleaved_index]: 4.97e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.70001e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.66999e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.203e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.63001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.719e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 7.388e-05, [1] [Cycle 1]: 6.953e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 9.24e-06 [elim_not_effective]: 1.206e-05 [opt_reshape]: 6.64999e-06 [fold_const_symbol]: 9.86e-06 [renormalize]: 2.59985e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.87001e-06 [auto_monad_reorder]: 1.659e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 4.03001e-06 [opt_after_jit_grad]: 0.00049097 [validate]: 3.707e-05 [backend_pass]: 1.10999e-06 [task_emit]: 0.32252 [execute]: 1.006e-05 Sums bootstrap : 0.000495s : 0.15% type_inference : 0.006314s : 1.89% event_method : 0.000015s : 0.00% auto_monad : 0.000057s : 0.02% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000030s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.01% optimize.rewriter_before_opt_a : 0.000065s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.01% optimize.opt_a.loop_unroll : 0.000027s : 0.01% optimize.opt_a.a_1 : 0.000592s : 0.18% optimize.opt_a.with_stream_mark : 0.000025s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000023s : 0.01% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000583s : 0.17% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.01% optimize.opt_a.cse : 0.000043s : 0.01% optimize.opt_a.a_3 : 0.000077s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000593s : 0.18% optimize.opt_b.b_1 : 0.000109s : 0.03% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.01% optimize.loop_unroll : 0.000458s : 0.14% optimize.opt_after_cconv.c_1 : 0.000029s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.02% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000491s : 0.15% validate : 0.000037s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.322520s : 96.67% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000180 30 14.56% : 0.000026s : 5: substitution.arithmetic_simplify 1.28% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000006s : 4: substitution.graph_param_transform 67.58% : 0.000121s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000005s : 4: substitution.remove_not_recompute_node 2.27% : 0.000004s : 4: substitution.replace_old_param 6.06% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006269 2 89.65% : 0.005620s : 1: type_inference.infer 10.35% : 0.000649s : 1: type_inference.specialize ------[replace.] 0.000041 5 71.04% : 0.000029s : 3: replace.inline 28.96% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000129 5 92.37% : 0.000119s : 3: match.inline 7.63% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000165 1131 0.91% : 0.000002s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.37% : 0.000004s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.81% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000002s : 8: predicate.less_batch_normalization 1.92% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.49% : 0.000002s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 11: predicate.reduce_eliminate 2.26% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.61% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.80% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.22% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000430 8 48.00% : 0.000207s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.00% : 0.000224s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.347911 196 0.00% : 0.000004s : 1: ForceFp32Comm 1.02% : 0.003563s : 1: add_attr 1.02% : 0.003553s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000063s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.15% : 0.000531s : 1: bootstrap 0.01% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000027s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000021s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.13% : 0.000468s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.17% : 0.000604s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.28% : 0.000963s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000091s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000035s : 4: opt.transform.symbol_engine_opt 0.67% : 0.002345s : 1: opt_a 0.03% : 0.000103s : 1: opt_after_cconv 0.14% : 0.000503s : 1: opt_after_jit_grad 0.05% : 0.000191s : 1: opt_b 1.27% : 0.004432s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000034s : 1: pre_auto_parallel 0.01% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.09% : 0.000306s : 1: renormalize.infer 0.08% : 0.000269s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000035s : 1: rewriter_after_opt_a 0.02% : 0.000069s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000076s : 1: symbol_engine_optimizer 92.71% : 0.322543s : 1: task_emit 0.02% : 0.000074s : 1: 
tuple_transform 1.82% : 0.006330s : 1: type_inference 0.02% : 0.000062s : 1: validate TotalTime = 2.31857, [24] [bootstrap]: 0.00051044 [type_inference]: 0.0121718 [event_method]: 5.014e-05 [auto_monad]: 0.00012244 [graph_reusing]: 8.82e-06 [inline]: 1.94e-06 [add_attr]: 0.0492981, [1] [add_attr_with_inline]: 0.0492846, [1] [Cycle 1]: 0.00010675, [2] [tag_attr]: 4.928e-05 [meta_addattr_fg_expand]: 9.14e-06 [parallel-infer-symbol]: 3.73001e-06 [pre_auto_parallel]: 5.972e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0281927, [53] [py_interpret_to_execute]: 4.42e-05 [rewriter_before_opt_a]: 0.00016731 [opt_a]: 0.0253536, [3] [Cycle 1]: 0.0207172, [45] [expand_dump_flag]: 5.30001e-06 [switch_simplify]: 7.696e-05 [loop_unroll]: 6.423e-05 [a_1]: 0.00156553 [with_stream_mark]: 3.083e-05 [recompute_prepare]: 2.518e-05 [updatestate_depend_eliminate]: 9.64e-06 [updatestate_assign_eliminate]: 7.45e-06 [updatestate_loads_eliminate]: 7.88001e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 0.00024763 [accelerated_algorithm]: 3.415e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.21001e-06 [shard_inline]: 1.601e-05 [merge_send_recv]: 1.764e-05 [auto_parallel]: 1.272e-05 [parallel]: 1.974e-05 [flash_sp]: 1.295e-05 [merge_comm]: 1.003e-05 [allreduce_fusion]: 8.79e-06 [matmul_add_comm_reduction]: 3.22e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.922e-05 [virtual_dataset]: 1.741e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.626e-05 [merge_forward]: 1.111e-05 [cell_reuse_recompute_pass]: 1.57999e-06 [offload_activation]: 2.079e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.011e-05 [merge_recompute_call_nodes]: 1.70001e-06 [before_grad]: 2.797e-05 [set_forward_comm_id_for_comm_node_pass]: 1.05e-05 [meta_fg_expand]: 0.00170239 [flash_sp_send_recv_attached]: 4.65999e-06 [receive_attached]: 2.54999e-06 [after_resolve]: 
6.428e-05 [a_after_grad]: 8.34e-05 [renormalize]: 0.0154414 [add_forward_monad_depend]: 1.56e-05 [auto_monad_grad]: 6.66999e-06 [auto_monad_eliminator]: 6.894e-05 [cse]: 0.000181 [a_3]: 0.00036251 [Cycle 2]: 0.00364739, [45] [expand_dump_flag]: 3.5e-06 [switch_simplify]: 4.963e-05 [loop_unroll]: 4.578e-05 [a_1]: 0.00172218 [with_stream_mark]: 1.93e-05 [recompute_prepare]: 1.297e-05 [updatestate_depend_eliminate]: 6.09999e-06 [updatestate_assign_eliminate]: 5.21998e-06 [updatestate_loads_eliminate]: 4.67998e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 0.00013089 [accelerated_algorithm]: 1.48e-05 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 2.74999e-06 [shard_inline]: 9.72001e-06 [merge_send_recv]: 1.084e-05 [auto_parallel]: 1.179e-05 [parallel]: 1.048e-05 [flash_sp]: 4.22e-06 [merge_comm]: 5.49998e-06 [allreduce_fusion]: 5.25999e-06 [matmul_add_comm_reduction]: 1.135e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 1.099e-05 [virtual_dataset]: 8.87999e-06 [get_grad_eliminate_]: 9.17999e-06 [virtual_output]: 8.60001e-06 [merge_forward]: 6.09001e-06 [cell_reuse_recompute_pass]: 1.53002e-06 [offload_activation]: 1.402e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.761e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 1.603e-05 [set_forward_comm_id_for_comm_node_pass]: 7.33e-06 [meta_fg_expand]: 0.00012258 [flash_sp_send_recv_attached]: 1.69e-06 [receive_attached]: 2.92002e-06 [after_resolve]: 2.008e-05 [a_after_grad]: 1.477e-05 [renormalize]: 0.00086791 [add_forward_monad_depend]: 5.12999e-06 [auto_monad_grad]: 2.03002e-06 [auto_monad_eliminator]: 1.788e-05 [cse]: 5.204e-05 [a_3]: 6.852e-05 [Cycle 3]: 0.00096775, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 1.089e-05 [loop_unroll]: 9.16002e-06 [a_1]: 0.00025518 [with_stream_mark]: 1.113e-05 [recompute_prepare]: 1.04e-05 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 4.40999e-06 [updatestate_loads_eliminate]: 3.93999e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012536 [accelerated_algorithm]: 1.311e-05 [shard]: 1.66e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 8.94e-06 [merge_send_recv]: 8.97999e-06 [auto_parallel]: 7.51001e-06 [parallel]: 5.77001e-06 [flash_sp]: 1.38002e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 5.15001e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.018e-05 [virtual_dataset]: 8.89998e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.15e-06 [merge_forward]: 5.02e-06 [cell_reuse_recompute_pass]: 1.48002e-06 [offload_activation]: 9.20001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.656e-05 [merge_recompute_call_nodes]: 1.07e-06 [before_grad]: 1.505e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 3.09999e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.59e-06 [after_resolve]: 1.527e-05 [a_after_grad]: 1.552e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.92001e-06 [auto_monad_grad]: 1.39e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 3.157e-05 [a_3]: 7.148e-05 [py_interpret_to_execute_after_opt_a]: 1.596e-05 [slice_cell_reuse_recomputed_activation]: 1.91998e-06 [rewriter_after_opt_a]: 5.403e-05 [convert_after_rewriter]: 9.41e-06 [order_py_execute_after_rewriter]: 7.11999e-06 [mutable_eliminate]: 0.00073846 [opt_b]: 0.00034848, [1] [Cycle 1]: 0.00034013, [7] [b_1]: 0.00022016 [b_2]: 1.154e-05 [updatestate_depend_eliminate]: 8.63001e-06 [updatestate_assign_eliminate]: 4.28001e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 7.00005e-07 [cse]: 5.102e-05 [optimize_parallel_all_gather_comm]: 2.811e-05 [overlap_param_gather]: 2.31e-06 [cconv]: 2.727e-05 [loop_unroll]: 0.00051284 [opt_after_cconv]: 0.00014775, [1] [Cycle 1]: 0.00014033, [7] [c_1]: 4.897e-05 [parameter_eliminate]: 2.73998e-06 [updatestate_depend_eliminate]: 8.69003e-06 [updatestate_assign_eliminate]: 4.40999e-06 
[updatestate_loads_eliminate]: 4.29002e-06 [cse]: 3.601e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 3.394e-05 [tuple_transform]: 0.00010486, [1] [Cycle 1]: 9.955e-05, [4] [d_1]: 6.854e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.9002e-07 [switch_simplify]: 1.009e-05 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 6.322e-05 [cse_after_recomputation]: 3.714e-05, [1] [Cycle 1]: 3.177e-05, [1] [cse]: 2.461e-05 [environ_conv]: 9.99001e-06 [swap_dp_allreduce_reducescatter]: 8.64998e-06 [bias_add_comm_swap]: 2.73003e-06 [label_micro_interleaved_index]: 5.08002e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.13998e-06 [assign_add_opt]: 1.60001e-06 [ForceFp32Comm]: 1.02998e-06 [remove_cast_before_assign_add]: 1.04003e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.46998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.00001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.934e-05 [grouped_pairwise_exchange_alltoall]: 1.78002e-06 [offloading_packed_experts]: 5.68997e-06 [overlap_recompute_and_grad_model_parallel]: 5.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 4.90999e-06 [overlap_grad_flash_sp]: 2.873e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.43998e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 0.00011254, [1] [Cycle 1]: 0.00010739, [6] [build]: 1.252e-05 [elim_shapecalc]: 1.687e-05 [elim_not_effective]: 1.9e-05 [opt_reshape]: 1.081e-05 [fold_const_symbol]: 1.498e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 2.29001e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 2.78e-05 [get_jit_bprop_graph]: 1.74e-06 [rewriter_after_jit_bprop_graph]: 4.47998e-06 [opt_after_jit_grad]: 0.00051935 [validate]: 5.328e-05 [backend_pass]: 1.03001e-06 [task_emit]: 2.22727 [execute]: 8.68001e-06 Sums bootstrap : 0.000510s : 0.02% type_inference : 0.012172s : 0.54% event_method : 0.000050s : 0.00% auto_monad : 0.000122s : 0.01% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000049s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000060s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000044s : 0.00% optimize.rewriter_before_opt_a : 0.000167s : 0.01% optimize.opt_a.expand_dump_flag : 0.000010s : 0.00% optimize.opt_a.switch_simplify : 0.000137s : 0.01% optimize.opt_a.loop_unroll : 0.000119s : 0.01% optimize.opt_a.a_1 : 0.003543s : 0.16% optimize.opt_a.with_stream_mark : 0.000061s : 0.00% optimize.opt_a.recompute_prepare : 0.000049s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000504s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000062s : 0.00% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000037s : 0.00% optimize.opt_a.auto_parallel : 0.000032s : 0.00% optimize.opt_a.parallel : 0.000036s : 0.00% optimize.opt_a.flash_sp : 0.000019s : 0.00% optimize.opt_a.merge_comm : 
0.000020s : 0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000053s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.00% optimize.opt_a.virtual_dataset : 0.000035s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000033s : 0.00% optimize.opt_a.merge_forward : 0.000022s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000059s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.00% optimize.opt_a.meta_fg_expand : 0.001828s : 0.08% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000100s : 0.00% optimize.opt_a.a_after_grad : 0.000114s : 0.01% optimize.opt_a.renormalize : 0.016309s : 0.72% optimize.opt_a.add_forward_monad_depend : 0.000023s : 0.00% optimize.opt_a.auto_monad_grad : 0.000010s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000100s : 0.00% optimize.opt_a.cse : 0.000265s : 0.01% optimize.opt_a.a_3 : 0.000503s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000054s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000738s : 0.03% optimize.opt_b.b_1 : 0.000220s : 0.01% optimize.opt_b.b_2 : 0.000012s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000051s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000028s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.00% optimize.loop_unroll : 0.000513s : 0.02% optimize.opt_after_cconv.c_1 : 0.000049s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000036s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000034s : 0.00% optimize.tuple_transform.d_1 : 0.000069s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000063s : 0.00% optimize.cse_after_recomputation.cse : 0.000025s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000028s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000519s : 0.02% validate : 0.000053s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 2.227275s : 98.22% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.001032 222 11.81% : 0.000122s : 12: substitution.arithmetic_simplify 1.79% : 0.000018s : 2: substitution.cast_eliminate 0.27% : 0.000003s : 5: substitution.elim_not_effective 0.44% : 0.000005s : 5: substitution.float_depend_g_call 0.41% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 5: substitution.fold_const_symbol 0.81% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.18% : 0.000002s : 2: substitution.incorporate_call_switch 55.51% : 0.000573s : 17: substitution.inline 1.74% : 0.000018s : 2: substitution.inline_without_move 1.16% : 0.000012s : 20: substitution.j_node_and_user_rematch 1.78% : 0.000018s : 3: substitution.less_batch_normalization 1.35% : 0.000014s : 11: substitution.minmaximum_grad 0.68% : 0.000007s : 5: substitution.partial_eliminate 1.32% : 0.000014s : 20: substitution.remove_not_recompute_node 2.95% : 0.000030s : 10: substitution.replace_applicator 1.26% : 0.000013s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.02% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.37% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 1.89% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.45% : 0.000077s : 30: substitution.tuple_list_get_item_eliminator 1.95% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012086 2 86.81% : 0.010491s : 1: type_inference.infer 13.19% : 0.001595s : 1: type_inference.specialize ------[replace.] 0.000252 33 59.83% : 0.000151s : 17: replace.inline 40.17% : 0.000101s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000603 33 93.41% : 0.000564s : 17: match.inline 6.59% : 0.000040s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000773 5764 1.04% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.18% : 0.000009s : 68: predicate.addn_zero_filter 1.01% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000016s : 100: predicate.arithmetic_simplify 1.13% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.50% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.19% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.72% : 0.000013s : 108: predicate.environ_get_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000043s : 249: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000006s : 32: predicate.less_batch_normalization 
1.65% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.60% : 0.000020s : 168: predicate.load_eliminater 0.48% : 0.000004s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 68: predicate.minmaximum_grad 0.47% : 0.000004s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.19% : 0.000017s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.64% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 152: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.37% : 0.000011s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.72% : 0.000006s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.34% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 101: predicate.switch_defer_inline 2.88% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.95% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000024s : 132: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.56% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.20% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 8: predicate.value_based_eliminate 0.60% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.013914 34 7.24% : 0.001007s : 13: func_graph_cloner_run.FuncGraphClonerGraph 92.76% : 0.012907s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
2.418014 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.04% : 0.049305s : 1: add_attr 2.04% : 0.049290s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000068s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.02% : 0.000547s : 1: bootstrap 0.00% : 0.000045s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000023s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000040s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.00% : 0.000058s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.02% : 0.000524s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.03% : 0.000750s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.00% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000021s : 1: opt.transform.mutable_eliminate 0.22% : 0.005294s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000204s : 28: opt.transform.opt_b 0.00% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000058s : 4: opt.transform.symbol_engine_opt 1.05% : 0.025358s : 1: opt_a 0.01% : 0.000151s : 1: opt_after_cconv 0.02% : 0.000532s : 1: opt_after_jit_grad 0.01% : 0.000352s : 1: opt_b 1.17% : 0.028198s : 1: optimize 0.00% : 0.000033s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000065s : 1: pre_auto_parallel 0.00% : 0.000048s : 1: py_interpret_to_execute 0.00% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000039s : 1: remove_dup_value 0.09% : 0.002271s : 2: renormalize.infer 0.58% : 0.014018s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000059s : 1: rewriter_after_opt_a 0.01% : 0.000172s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000115s : 1: symbol_engine_optimizer 92.11% : 2.227297s : 1: task_emit 0.00% : 0.000108s : 1: 
tuple_transform 0.50% : 0.012191s : 1: type_inference 0.00% : 0.000085s : 1: validate TotalTime = 0.547799, [24] [bootstrap]: 0.00045042 [type_inference]: 0.0043581 [event_method]: 1.149e-05 [auto_monad]: 5.067e-05 [graph_reusing]: 4.58999e-06 [inline]: 1.91e-06 [add_attr]: 0.00301929, [1] [add_attr_with_inline]: 0.00301176, [1] [Cycle 1]: 4.692e-05, [2] [tag_attr]: 1.17e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 2.149e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 9.09989e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0681425, [53] [py_interpret_to_execute]: 1.582e-05 [rewriter_before_opt_a]: 3.944e-05 [opt_a]: 0.0660738, [2] [Cycle 1]: 0.00128305, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 2.453e-05 [loop_unroll]: 1.364e-05 [a_1]: 0.00029106 [with_stream_mark]: 1.35e-05 [recompute_prepare]: 7.62998e-06 [updatestate_depend_eliminate]: 3.81001e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.633e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.60002e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 7.53e-06 [auto_parallel]: 5.60001e-06 [parallel]: 1.834e-05 [flash_sp]: 7.30003e-06 [merge_comm]: 3.82998e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.03002e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.66003e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.78998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.086e-05 [merge_recompute_call_nodes]: 1.31002e-06 [before_grad]: 9.32999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 
2.40002e-06 [after_resolve]: 1.03e-05 [a_after_grad]: 8.88002e-06 [renormalize]: 0.00037443 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.255e-05 [cse]: 2.654e-05 [a_3]: 3.977e-05 [Cycle 2]: 0.0647803, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00012135 [with_stream_mark]: 9.54999e-06 [recompute_prepare]: 5.76003e-06 [updatestate_depend_eliminate]: 2.75002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.771e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 2.481e-05 [parallel]: 9.29e-06 [flash_sp]: 5.68997e-06 [merge_comm]: 6.81001e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 1.119e-05 [allreduce_slice_to_reducescatter]: 7.89994e-07 [virtual_shard_identity]: 2.061e-05 [virtual_dataset]: 7.00998e-06 [get_grad_eliminate_]: 6.05002e-06 [virtual_output]: 5.88002e-06 [merge_forward]: 3.62002e-06 [cell_reuse_recompute_pass]: 2.58e-06 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.728e-05 [merge_recompute_call_nodes]: 1.15999e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 3.24001e-06 [flash_sp_send_recv_attached]: 1.25999e-06 [receive_attached]: 1.69998e-06 [after_resolve]: 1.177e-05 [a_after_grad]: 1.051e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 3.93999e-06 [auto_monad_grad]: 2.59001e-06 [auto_monad_eliminator]: 1.376e-05 [cse]: 2.653e-05 [a_3]: 3.763e-05 [py_interpret_to_execute_after_opt_a]: 1.432e-05 [slice_cell_reuse_recomputed_activation]: 2.01003e-06 [rewriter_after_opt_a]: 3.581e-05 [convert_after_rewriter]: 7.08e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.00063917 [opt_b]: 0.00019483, [1] [Cycle 1]: 0.00018741, [7] 
[b_1]: 0.00011618 [b_2]: 7.51999e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.49001e-06 [renormalize]: 7.50006e-07 [cse]: 1.783e-05 [optimize_parallel_all_gather_comm]: 1.607e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.743e-05 [loop_unroll]: 0.00042909 [opt_after_cconv]: 9.658e-05, [1] [Cycle 1]: 9.084e-05, [7] [c_1]: 2.851e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.696e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 1.319e-05 [tuple_transform]: 7.179e-05, [1] [Cycle 1]: 6.711e-05, [4] [d_1]: 4.062e-05 [none_parameter_eliminate]: 1.36998e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 5.97999e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.637e-05 [cse_after_recomputation]: 2.041e-05, [1] [Cycle 1]: 1.545e-05, [1] [cse]: 1.026e-05 [environ_conv]: 5.44998e-06 [swap_dp_allreduce_reducescatter]: 5.16002e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.51002e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 4.4e-06 [overlap_grad_flash_sp]: 1.818e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 7.178e-05, [1] [Cycle 1]: 6.745e-05, [6] [build]: 3.29001e-06 [elim_shapecalc]: 8.92999e-06 [elim_not_effective]: 1.177e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 9.37999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.491e-05 [get_jit_bprop_graph]: 1.94e-06 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.00045276 [validate]: 3.692e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.470979 [execute]: 1.027e-05 Sums bootstrap : 0.000450s : 0.09% type_inference : 0.004358s : 0.91% event_method : 0.000011s : 0.00% auto_monad : 0.000051s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.00% optimize.rewriter_before_opt_a : 0.000039s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.01% optimize.opt_a.loop_unroll : 0.000019s : 0.00% optimize.opt_a.a_1 : 0.000412s : 0.09% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000030s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.01% optimize.opt_a.flash_sp : 0.000013s : 0.00% optimize.opt_a.merge_comm : 0.000011s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000028s : 0.01% optimize.opt_a.virtual_dataset : 0.000013s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000028s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.00% optimize.opt_a.a_after_grad : 0.000019s : 0.00% optimize.opt_a.renormalize : 0.000375s : 0.08% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.01% optimize.opt_a.cse : 0.000053s : 0.01% optimize.opt_a.a_3 : 0.000077s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000639s : 0.13% optimize.opt_b.b_1 : 0.000116s : 0.02% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.01% optimize.loop_unroll : 0.000429s : 0.09% optimize.opt_after_cconv.c_1 : 0.000029s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00%
auto_monad_reorder : 0.000015s : 0.00%
get_jit_bprop_graph : 0.000002s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.00%
opt_after_jit_grad : 0.000453s : 0.09%
validate : 0.000037s : 0.01%
backend_pass : 0.000001s : 0.00%
task_emit : 0.470979s : 98.18%
execute : 0.000010s : 0.00%

Time group info:
------[substitution.] 0.000123 26
    17.66% : 0.000022s : 4: substitution.arithmetic_simplify
    1.45% : 0.000002s : 2: substitution.elim_not_effective
    1.09% : 0.000001s : 2: substitution.fold_const_symbol
    4.75% : 0.000006s : 4: substitution.graph_param_transform
    63.64% : 0.000078s : 2: substitution.inline
    2.84% : 0.000003s : 4: substitution.j_node_and_user_rematch
    4.16% : 0.000005s : 4: substitution.remove_not_recompute_node
    4.41% : 0.000005s : 4: substitution.replace_old_param
------[type_inference.] 0.004317 2
    91.88% : 0.003967s : 1: type_inference.infer
    8.12% : 0.000351s : 1: type_inference.specialize
------[replace.] 0.000019 2
    100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000077 2
    100.00% : 0.000077s : 2: match.inline
------[predicate.]
0.000143 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.76% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.79% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.89% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.65% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.50% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.40% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.89% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.50% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.31% : 0.000002s : 11: predicate.partial_defer_inline 1.13% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.87% : 0.000001s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.94% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.94% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.13% : 0.000002s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.66% : 0.000002s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.60% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.67% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.23% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.91% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 42.77% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.23% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.620282 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.49% : 0.003024s : 1: add_attr 0.49% : 0.003015s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000056s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.08% : 0.000485s : 1: bootstrap 0.00% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000017s : 1: event_method 0.00% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.07% : 0.000438s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.10% : 0.000649s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.13% : 0.000793s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000096s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000033s : 4: opt.transform.symbol_engine_opt 10.65% : 0.066077s : 1: opt_a 0.02% : 0.000100s : 1: opt_after_cconv 0.07% : 0.000463s : 1: opt_after_jit_grad 0.03% : 0.000198s : 1: opt_b 10.99% : 0.068147s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000025s : 1: pre_auto_parallel 0.00% : 0.000019s : 1: py_interpret_to_execute 0.00% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.03% : 0.000208s : 1: renormalize.infer 0.03% : 0.000160s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000039s : 1: rewriter_after_opt_a 0.01% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000074s : 1: symbol_engine_optimizer 75.93% : 0.471003s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 0.70% : 0.004372s : 1: type_inference 0.01% : 0.000063s : 1: validate . TotalTime = 0.474364, [24] [bootstrap]: 0.00051534 [type_inference]: 0.0105334 [event_method]: 4.494e-05 [auto_monad]: 0.00011594 [graph_reusing]: 7.53e-06 [inline]: 2.19999e-06 [add_attr]: 0.00306555, [1] [add_attr_with_inline]: 0.00305666, [1] [Cycle 1]: 7.057e-05, [2] [tag_attr]: 3.256e-05 [meta_addattr_fg_expand]: 8.55001e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 4.807e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0302442, [53] [py_interpret_to_execute]: 3.632e-05 [rewriter_before_opt_a]: 0.00012841 [opt_a]: 0.0278313, [3] [Cycle 1]: 0.023789, [45] [expand_dump_flag]: 3.93001e-06 [switch_simplify]: 6.703e-05 [loop_unroll]: 5.463e-05 [a_1]: 0.0174875 [with_stream_mark]: 3.07e-05 [recompute_prepare]: 2.425e-05 [updatestate_depend_eliminate]: 9.87999e-06 [updatestate_assign_eliminate]: 7.85e-06 [updatestate_loads_eliminate]: 7.41001e-06 [parameter_eliminate]: 2.39999e-06 [a_2]: 0.00024692 [accelerated_algorithm]: 3.289e-05 [shard]: 2.14e-06 [meta_shard_fg_expand]: 4.35999e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.851e-05 [auto_parallel]: 1.385e-05 [parallel]: 1.971e-05 [flash_sp]: 1.428e-05 [merge_comm]: 9.72001e-06 [allreduce_fusion]: 9.36998e-06 [matmul_add_comm_reduction]: 3.196e-05 [allreduce_slice_to_reducescatter]: 1.10001e-06 [virtual_shard_identity]: 1.786e-05 [virtual_dataset]: 1.604e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.526e-05 [merge_forward]: 9.84001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.872e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.902e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 2.723e-05 [set_forward_comm_id_for_comm_node_pass]: 9.52999e-06 [meta_fg_expand]: 0.00167215 [flash_sp_send_recv_attached]: 4.51002e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 8.37e-05 [a_after_grad]: 8.245e-05 [renormalize]: 0.00277686 [add_forward_monad_depend]: 1.086e-05 [auto_monad_grad]: 6.02999e-06 [auto_monad_eliminator]: 5.775e-05 [cse]: 0.00017119 [a_3]: 0.00033672 [Cycle 2]: 0.0031181, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 4.687e-05 [loop_unroll]: 4.466e-05 [a_1]: 0.00156621 [with_stream_mark]: 1.41e-05 [recompute_prepare]: 1.145e-05 [updatestate_depend_eliminate]: 5.46998e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.16002e-06 [a_2]: 0.00012752 [accelerated_algorithm]: 1.269e-05 [shard]: 1.40001e-06 [meta_shard_fg_expand]: 2.19999e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 7.28e-06 [auto_parallel]: 8.48001e-06 [parallel]: 6.69999e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 5.79e-06 [allreduce_fusion]: 5.09e-06 [matmul_add_comm_reduction]: 8.17e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 1.039e-05 [virtual_dataset]: 9.11002e-06 [get_grad_eliminate_]: 8.71002e-06 [virtual_output]: 8.69e-06 [merge_forward]: 4.70001e-06 [cell_reuse_recompute_pass]: 9.00007e-07 [offload_activation]: 9.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.633e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.448e-05 [set_forward_comm_id_for_comm_node_pass]: 5.51002e-06 [meta_fg_expand]: 4.237e-05 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.44e-06 [after_resolve]: 1.614e-05 [a_after_grad]: 1.451e-05 [renormalize]: 0.00066921 [add_forward_monad_depend]: 5.00999e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.526e-05 [cse]: 5.052e-05 [a_3]: 6.702e-05 [Cycle 3]: 0.00090787, [45] [expand_dump_flag]: 1.29e-06 [switch_simplify]: 1.063e-05 [loop_unroll]: 9.19e-06 [a_1]: 0.00024941 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 9.52001e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 3.86999e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012369 [accelerated_algorithm]: 1.214e-05 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 7.18998e-06 [auto_parallel]: 7.35998e-06 [parallel]: 4.52e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.82e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.45998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.046e-05 [virtual_dataset]: 8.87e-06 [get_grad_eliminate_]: 8.74e-06 [virtual_output]: 8.27998e-06 [merge_forward]: 4.16001e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 8.45999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.614e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.476e-05 [a_after_grad]: 1.506e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.112e-05 [cse]: 2.689e-05 [a_3]: 5.956e-05 [py_interpret_to_execute_after_opt_a]: 1.506e-05 [slice_cell_reuse_recomputed_activation]: 1.74998e-06 [rewriter_after_opt_a]: 4.952e-05 [convert_after_rewriter]: 9.03002e-06 [order_py_execute_after_rewriter]: 6.57002e-06 [mutable_eliminate]: 0.00056557 [opt_b]: 0.00029652, [1] [Cycle 1]: 0.00028894, [7] [b_1]: 0.00019238 [b_2]: 1.081e-05 [updatestate_depend_eliminate]: 7.92e-06 [updatestate_assign_eliminate]: 4.06001e-06 [updatestate_loads_eliminate]: 4.12e-06 [renormalize]: 8.90024e-07 [cse]: 3.359e-05 [optimize_parallel_all_gather_comm]: 2.145e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.0004426 [opt_after_cconv]: 0.00013774, [1] [Cycle 1]: 0.00013177, [7] [c_1]: 4.781e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 7.22002e-06 [updatestate_assign_eliminate]: 4.23999e-06 
[updatestate_loads_eliminate]: 3.88001e-06 [cse]: 3.183e-05 [renormalize]: 7.60017e-07 [remove_dup_value]: 3.131e-05 [tuple_transform]: 0.00010343, [1] [Cycle 1]: 9.868e-05, [4] [d_1]: 6.822e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.014e-05 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 6.701e-05 [cse_after_recomputation]: 3.402e-05, [1] [Cycle 1]: 2.903e-05, [1] [cse]: 2.343e-05 [environ_conv]: 9.54e-06 [swap_dp_allreduce_reducescatter]: 7.61999e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.06997e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.50999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.739e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.44998e-06 [overlap_recompute_and_grad_model_parallel]: 5.86e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 5.40001e-06 [overlap_grad_flash_sp]: 2.605e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 9.928e-05, [1] [Cycle 1]: 9.482e-05, [6] [build]: 1.055e-05 [elim_shapecalc]: 1.312e-05 [elim_not_effective]: 1.827e-05 [opt_reshape]: 1.005e-05 [fold_const_symbol]: 1.514e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 2.37999e-06 [pipeline_parallel_scheduler]: 1.33002e-06 [auto_monad_reorder]: 2.553e-05 [get_jit_bprop_graph]: 1.74e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00048714 [validate]: 4.725e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.428966 [execute]: 8.59998e-06 Sums bootstrap : 0.000515s : 0.11% type_inference : 0.010533s : 2.24% event_method : 0.000045s : 0.01% auto_monad : 0.000116s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.01% optimize.rewriter_before_opt_a : 0.000128s : 0.03% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.03% optimize.opt_a.loop_unroll : 0.000108s : 0.02% optimize.opt_a.a_1 : 0.019303s : 4.11% optimize.opt_a.with_stream_mark : 0.000055s : 0.01% optimize.opt_a.recompute_prepare : 0.000045s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000498s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.01% optimize.opt_a.merge_send_recv : 0.000033s : 0.01% optimize.opt_a.auto_parallel : 0.000030s : 0.01% optimize.opt_a.parallel : 0.000031s : 0.01% optimize.opt_a.flash_sp : 0.000019s : 0.00% optimize.opt_a.merge_comm : 
0.000020s : 0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.01% optimize.opt_a.virtual_dataset : 0.000034s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.01% optimize.opt_a.virtual_output : 0.000032s : 0.01% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001718s : 0.37% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000115s : 0.02% optimize.opt_a.a_after_grad : 0.000112s : 0.02% optimize.opt_a.renormalize : 0.003446s : 0.73% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.02% optimize.opt_a.cse : 0.000249s : 0.05% optimize.opt_a.a_3 : 0.000463s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000566s : 0.12% optimize.opt_b.b_1 : 0.000192s : 0.04% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000034s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000443s : 0.09% optimize.opt_after_cconv.c_1 : 0.000048s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000031s : 0.01% optimize.tuple_transform.d_1 : 0.000068s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000067s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000487s : 0.10% validate : 0.000047s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.428966s : 91.27% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.016893 218 0.28% : 0.000048s : 11: substitution.arithmetic_simplify 0.09% : 0.000015s : 2: substitution.cast_eliminate 0.02% : 0.000003s : 5: substitution.elim_not_effective 0.02% : 0.000004s : 5: substitution.float_depend_g_call 0.03% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.01% : 0.000002s : 5: substitution.fold_const_symbol 0.05% : 0.000008s : 8: substitution.graph_param_transform 0.02% : 0.000003s : 2: substitution.incorporate_call 0.01% : 0.000002s : 2: substitution.incorporate_call_switch 97.92% : 0.016542s : 16: substitution.inline 0.10% : 0.000017s : 2: substitution.inline_without_move 0.06% : 0.000010s : 20: substitution.j_node_and_user_rematch 0.10% : 0.000017s : 3: substitution.less_batch_normalization 0.08% : 0.000013s : 11: substitution.minmaximum_grad 0.03% : 0.000006s : 5: substitution.partial_eliminate 0.08% : 0.000014s : 20: substitution.remove_not_recompute_node 0.15% : 0.000025s : 10: substitution.replace_applicator 0.07% : 0.000011s : 15: substitution.replace_old_param 0.02% : 0.000003s : 1: substitution.set_cell_output_no_recompute 0.17% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 0.08% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 0.11% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 0.39% : 0.000066s : 28: substitution.tuple_list_get_item_eliminator 0.11% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010460 2 86.71% : 0.009070s : 1: type_inference.infer 13.29% : 0.001390s : 1: type_inference.specialize ------[replace.] 0.000213 30 59.88% : 0.000127s : 16: replace.inline 40.12% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.016564 30 99.81% : 0.016532s : 16: match.inline 0.19% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000745 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.10% : 0.000016s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.48% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.14% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.69% : 0.000042s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.60% : 0.000019s : 164: predicate.load_eliminater 0.37% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000015s : 97: predicate.partial_defer_inline 1.67% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.87% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001589 32 57.53% : 0.000914s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.47% : 0.000675s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.532433 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.58% : 0.003070s : 1: add_attr 0.57% : 0.003061s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000071s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000123s : 1: auto_monad 0.01% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.10% : 0.000549s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.01% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000052s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.08% : 0.000452s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000575s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000019s : 1: opt.transform.mutable_eliminate 3.94% : 0.020991s : 117: opt.transform.opt_a 0.01% : 0.000047s : 1: opt.transform.opt_after_cconv 0.01% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000177s : 28: opt.transform.opt_b 0.01% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 5.23% : 0.027835s : 1: opt_a 0.03% : 0.000141s : 1: opt_after_cconv 0.09% : 0.000497s : 1: opt_after_jit_grad 0.06% : 0.000300s : 1: opt_b 5.68% : 0.030250s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000053s : 1: pre_auto_parallel 0.01% : 0.000040s : 1: py_interpret_to_execute 0.00% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000036s : 1: remove_dup_value 0.35% : 0.001882s : 2: renormalize.infer 0.29% : 0.001549s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000054s : 1: rewriter_after_opt_a 0.02% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000102s : 1: symbol_engine_optimizer 80.57% : 0.428989s : 1: task_emit 0.02% : 0.000107s : 1: 
tuple_transform 1.98% : 0.010551s : 1: type_inference 0.01% : 0.000079s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x8-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x9-pynative],max_mem:6.0M TotalTime = 0.0228167, [24] [bootstrap]: 0.00058489 [type_inference]: 0.00650648 [event_method]: 1.488e-05 [auto_monad]: 5.844e-05 [graph_reusing]: 5.59e-06 [inline]: 2.08002e-06 [add_attr]: 0.00366655, [1] [add_attr_with_inline]: 0.00365502, [1] [Cycle 1]: 5.358e-05, [2] [tag_attr]: 1.674e-05 [meta_addattr_fg_expand]: 4.37e-06 [parallel-infer-symbol]: 3.32002e-06 [pre_auto_parallel]: 2.957e-05 [insert-virtual-dataset]: 2.63998e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00418363, [53] [py_interpret_to_execute]: 2.058e-05 [rewriter_before_opt_a]: 6.072e-05 [opt_a]: 0.00227641, [2] [Cycle 1]: 0.00166786, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 3.255e-05 [loop_unroll]: 2.128e-05 [a_1]: 0.00051389 [with_stream_mark]: 1.409e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.90002e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.652e-05 [accelerated_algorithm]: 6.23002e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 8.1e-06 [auto_parallel]: 6.74001e-06 [parallel]: 2.526e-05 [flash_sp]: 7.92e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 8.54998e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 6.17001e-06 
[get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.002e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.136e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.62001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.19001e-06 [receive_attached]: 2.71999e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00049872 [add_forward_monad_depend]: 5.72001e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.384e-05 [cse]: 2.773e-05 [a_3]: 4.104e-05 [Cycle 2]: 0.00059833, [45] [expand_dump_flag]: 1.17999e-06 [switch_simplify]: 7.10998e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012485 [with_stream_mark]: 1.065e-05 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 3.06001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.716e-05 [accelerated_algorithm]: 5.46002e-06 [shard]: 1.30001e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.29997e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 2.86999e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 4.98001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.09998e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 5.01002e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.51998e-06 [offload_activation]: 6.01003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.76998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.15e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.64999e-06 [cse]: 1.7e-05 [a_3]: 3.359e-05 [py_interpret_to_execute_after_opt_a]: 8.15e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 3.018e-05 [convert_after_rewriter]: 6.94999e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00048803 [opt_b]: 0.00018048, [1] [Cycle 1]: 0.00017446, [7] [b_1]: 0.00010632 [b_2]: 6.64001e-06 [updatestate_depend_eliminate]: 5.52999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 3.29979e-07 [cse]: 1.682e-05 [optimize_parallel_all_gather_comm]: 1.534e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00042335 [opt_after_cconv]: 9.661e-05, [1] [Cycle 1]: 9.092e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.81e-06 [updatestate_depend_eliminate]: 5.46002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.68e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.303e-05 [tuple_transform]: 7.006e-05, [1] [Cycle 1]: 6.563e-05, [4] [d_1]: 3.94e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.425e-05 [cse_after_recomputation]: 2.13e-05, [1] [Cycle 1]: 1.668e-05, [1] [cse]: 1.156e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 5.66e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.27998e-06 [label_fine_grained_interleaved_index]: 3.03003e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 
[interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.58002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.242e-05 [grouped_pairwise_exchange_alltoall]: 1.89999e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.875e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.83997e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 6.896e-05, [1] [Cycle 1]: 6.471e-05, [6] [build]: 2.93e-06 [elim_shapecalc]: 8.45999e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 6.07001e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.634e-05 [get_jit_bprop_graph]: 1.30999e-06 [rewriter_after_jit_bprop_graph]: 0.00014131 [opt_after_jit_grad]: 0.0005175 [validate]: 3.422e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00682067 [execute]: 8.00999e-06 Sums bootstrap : 0.000585s : 3.22% type_inference : 0.006506s : 35.81% event_method : 0.000015s : 0.08% auto_monad : 0.000058s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.11% optimize.rewriter_before_opt_a : 0.000061s : 0.33% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000639s : 3.52% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.03% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000499s : 2.75% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000045s : 0.25% optimize.opt_a.a_3 : 0.000075s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000488s : 2.69% optimize.opt_b.b_1 : 0.000106s : 0.59% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.12% optimize.loop_unroll : 0.000423s : 2.33% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000054s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000002s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000141s : 0.78% opt_after_jit_grad : 0.000518s : 2.85% validate : 0.000034s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006821s : 37.54% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000174 30 14.80% : 0.000026s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.03% : 0.000005s : 4: substitution.graph_param_transform 67.45% : 0.000118s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000005s : 4: substitution.remove_not_recompute_node 2.12% : 0.000004s : 4: substitution.replace_old_param 6.50% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006458 2 90.87% : 0.005868s : 1: type_inference.infer 9.13% : 0.000589s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.18% : 0.000028s : 3: replace.inline 29.82% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000126 5 91.89% : 0.000116s : 3: match.inline 8.11% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 1.23% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.53% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000001s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.30% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.95% : 0.000002s : 11: predicate.minmaximum_grad 1.49% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.18% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.44% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.53% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.91% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000396 8 45.65% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.35% : 0.000215s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032322 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.36% : 0.003671s : 1: add_attr 11.32% : 0.003659s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000059s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000064s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000624s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.34% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000497s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 3.11% : 0.001005s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.05% : 0.002280s : 1: opt_a 0.31% : 0.000100s : 1: opt_after_cconv 1.63% : 0.000528s : 1: opt_after_jit_grad 0.57% : 0.000184s : 1: opt_b 12.96% : 0.004188s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.82% : 0.000264s : 1: renormalize.infer 0.71% : 0.000228s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.46% : 0.000147s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000072s : 1: symbol_engine_optimizer 21.14% : 0.006832s : 1: task_emit 0.23% : 0.000073s : 1: 
tuple_transform 20.18% : 0.006522s : 1: type_inference 0.20% : 0.000065s : 1: validate TotalTime = 0.0189528, [24] [bootstrap]: 0.00047454 [type_inference]: 0.00451624 [event_method]: 1.068e-05 [auto_monad]: 5.044e-05 [graph_reusing]: 4.70999e-06 [inline]: 1.99999e-06 [add_attr]: 0.00311319, [1] [add_attr_with_inline]: 0.00310442, [1] [Cycle 1]: 5.581e-05, [2] [tag_attr]: 1.261e-05 [meta_addattr_fg_expand]: 3.43e-06 [parallel-infer-symbol]: 2.99999e-06 [pre_auto_parallel]: 2.362e-05 [insert-virtual-dataset]: 2.79999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.96e-06 [optimize]: 0.00384272, [53] [py_interpret_to_execute]: 1.655e-05 [rewriter_before_opt_a]: 4.049e-05 [opt_a]: 0.00192906, [2] [Cycle 1]: 0.0013247, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 2.453e-05 [loop_unroll]: 1.364e-05 [a_1]: 0.00029694 [with_stream_mark]: 1.495e-05 [recompute_prepare]: 7.51001e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.05998e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.722e-05 [accelerated_algorithm]: 6.14999e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 5.93002e-06 [merge_send_recv]: 7.51001e-06 [auto_parallel]: 6.17999e-06 [parallel]: 1.77e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.46999e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 8.33999e-06 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.92998e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.61003e-06 [virtual_output]: 5.59e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.62999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.12999e-06 
[receive_attached]: 2.32001e-06 [after_resolve]: 1.037e-05 [a_after_grad]: 9.53002e-06 [renormalize]: 0.00040178 [add_forward_monad_depend]: 4.60001e-06 [auto_monad_grad]: 1.91998e-06 [auto_monad_eliminator]: 1.308e-05 [cse]: 2.695e-05 [a_3]: 4.128e-05 [Cycle 2]: 0.0005944, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.53002e-06 [a_1]: 0.00012559 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 3.14001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.752e-05 [accelerated_algorithm]: 5.72001e-06 [shard]: 1.22999e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.75001e-06 [auto_parallel]: 5.22999e-06 [parallel]: 4.58999e-06 [flash_sp]: 3.13998e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 5.99999e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.36998e-06 [merge_forward]: 2.55002e-06 [cell_reuse_recompute_pass]: 1.60001e-06 [offload_activation]: 6.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.79e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 6.43998e-06 [cse]: 1.235e-05 [a_3]: 3.236e-05 [py_interpret_to_execute_after_opt_a]: 7.93999e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.245e-05 [convert_after_rewriter]: 6.49999e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00048106 [opt_b]: 
0.0001833, [1] [Cycle 1]: 0.00017708, [7] [b_1]: 0.00010684 [b_2]: 7.31999e-06 [updatestate_depend_eliminate]: 6.36998e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.45002e-06 [renormalize]: 3.50003e-07 [cse]: 1.69e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 7.728e-05 [loop_unroll]: 0.00041993 [opt_after_cconv]: 9.516e-05, [1] [Cycle 1]: 8.908e-05, [7] [c_1]: 2.739e-05 [parameter_eliminate]: 2.38998e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.611e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.194e-05 [tuple_transform]: 6.984e-05, [1] [Cycle 1]: 6.545e-05, [4] [d_1]: 4.008e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 4.558e-05 [cse_after_recomputation]: 2.015e-05, [1] [Cycle 1]: 1.581e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.49e-06 [full_micro_interleaved_order_control]: 2.69999e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.195e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.35999e-06 [overlap_grad_flash_sp]: 1.947e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.875e-05, [1] [Cycle 1]: 6.434e-05, [6] [build]: 2.37999e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.99998e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 1.585e-05 [get_jit_bprop_graph]: 1.06997e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00045636 [validate]: 3.155e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00617984 [execute]: 7.90998e-06 Sums bootstrap : 0.000475s : 3.19% type_inference : 0.004516s : 30.37% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000423s : 2.84% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% 
optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000402s : 2.70% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000039s : 0.26% 
optimize.opt_a.a_3 : 0.000074s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000481s : 3.24% optimize.opt_b.b_1 : 0.000107s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000077s : 0.52% optimize.loop_unroll : 0.000420s : 2.82% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% 
optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.13% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000456s : 3.07% validate : 0.000032s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006180s : 41.56% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000125 26 17.98% : 0.000022s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.67% : 0.000006s : 4: substitution.graph_param_transform 66.19% : 0.000083s : 2: substitution.inline 2.12% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000004s : 4: substitution.remove_not_recompute_node 3.14% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004474 2 92.26% : 0.004128s : 1: type_inference.infer 7.74% : 0.000346s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000140 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.72% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.76% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.44% : 0.000001s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.48% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 1.08% : 0.000002s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.47% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 1.07% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.87% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000300 6 52.30% : 0.000157s : 2: func_graph_cloner_run.FuncGraphClonerGraph 47.70% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027241 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.44% : 0.003118s : 1: add_attr 11.41% : 0.003108s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000511s : 1: bootstrap 0.30% : 0.000081s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000429s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.80% : 0.000490s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.85% : 0.000776s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001932s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000466s : 1: opt_after_jit_grad 0.69% : 0.000187s : 1: opt_b 14.12% : 0.003847s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.83% : 0.000226s : 1: renormalize.infer 0.62% : 0.000168s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 22.72% : 0.006190s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.64% : 0.004532s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0211413, [24] [bootstrap]: 0.00047843 [type_inference]: 0.00588532 [event_method]: 7.511e-05 [auto_monad]: 6.017e-05 [graph_reusing]: 5.40999e-06 [inline]: 2.04e-06 [add_attr]: 0.00304675, [1] [add_attr_with_inline]: 0.00303746, [1] [Cycle 1]: 5.152e-05, [2] [tag_attr]: 1.63e-05 [meta_addattr_fg_expand]: 3.72998e-06 [parallel-infer-symbol]: 4.27e-06 [pre_auto_parallel]: 2.846e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.0043321, [53] [py_interpret_to_execute]: 2.083e-05 [rewriter_before_opt_a]: 5.985e-05 [opt_a]: 0.00234101, [2] [Cycle 1]: 0.00169596, [45] [expand_dump_flag]: 3.37997e-06 [switch_simplify]: 3.476e-05 [loop_unroll]: 2.069e-05 [a_1]: 0.00045341 [with_stream_mark]: 1.699e-05 [recompute_prepare]: 8.97999e-06 [updatestate_depend_eliminate]: 4.03001e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.55999e-06 [a_2]: 0.00013993 [accelerated_algorithm]: 8.02003e-06 [shard]: 1.84e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 9.39e-06 [auto_parallel]: 7.24001e-06 [parallel]: 1.795e-05 [flash_sp]: 7.91001e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 9.91e-06 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 7.67002e-06 [virtual_dataset]: 5.81003e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.71e-06 [offload_activation]: 9.59999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.231e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.26002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 3.23e-06 [receive_attached]: 2.36998e-06 
[after_resolve]: 1.127e-05 [a_after_grad]: 9.55001e-06 [renormalize]: 0.00048825 [add_forward_monad_depend]: 5.59998e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 1.602e-05 [cse]: 2.8e-05 [a_3]: 4.245e-05 [Cycle 2]: 0.00063414, [45] [expand_dump_flag]: 1.64e-06 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.58002e-06 [a_1]: 0.00012593 [with_stream_mark]: 1.307e-05 [recompute_prepare]: 6.48e-06 [updatestate_depend_eliminate]: 3.11001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.897e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.32999e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 4.85001e-06 [auto_parallel]: 5.85002e-06 [parallel]: 5.03002e-06 [flash_sp]: 3.65998e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.83999e-06 [matmul_add_comm_reduction]: 6.59001e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 7.63999e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 4.78001e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.65001e-06 [offload_activation]: 7.22997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.186e-05 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 8.75001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.67001e-06 [after_resolve]: 9.54999e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 9.74e-06 [cse]: 1.913e-05 [a_3]: 3.264e-05 [py_interpret_to_execute_after_opt_a]: 9.74e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.525e-05 [convert_after_rewriter]: 7.90998e-06 [order_py_execute_after_rewriter]: 5.25001e-06 [mutable_eliminate]: 0.00050454 [opt_b]: 0.00019093, [1] [Cycle 1]: 0.00018415, [7] 
[b_1]: 0.00010929 [b_2]: 7.46001e-06 [updatestate_depend_eliminate]: 6.91001e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.80009e-07 [cse]: 1.867e-05 [optimize_parallel_all_gather_comm]: 1.838e-05 [overlap_param_gather]: 2.24001e-06 [cconv]: 2.521e-05 [loop_unroll]: 0.0004563 [opt_after_cconv]: 0.00010218, [1] [Cycle 1]: 9.544e-05, [7] [c_1]: 2.76e-05 [parameter_eliminate]: 3.33998e-06 [updatestate_depend_eliminate]: 6.49999e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.35002e-06 [cse]: 1.852e-05 [renormalize]: 6.50005e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 7.177e-05, [1] [Cycle 1]: 6.732e-05, [4] [d_1]: 4.094e-05 [none_parameter_eliminate]: 1.32999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.811e-05 [cse_after_recomputation]: 2.089e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.095e-05 [environ_conv]: 4.71997e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.45997e-06 [label_micro_interleaved_index]: 5.35999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.79999e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.16002e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.308e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.82002e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.54999e-06 [overlap_grad_ring_attention]: 3.88999e-06 [overlap_grad_flash_sp]: 1.91e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 7.449e-05, [1] [Cycle 1]: 6.956e-05, [6] [build]: 2.98e-06 [elim_shapecalc]: 9.64e-06 [elim_not_effective]: 1.219e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.55999e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.76998e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 1.793e-05 [get_jit_bprop_graph]: 1.52999e-06 [rewriter_after_jit_bprop_graph]: 3.91999e-06 [opt_after_jit_grad]: 0.00057519 [validate]: 3.482e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00635399 [execute]: 7.87998e-06 Sums bootstrap : 0.000478s : 2.80% type_inference : 0.005885s : 34.50% event_method : 0.000075s : 0.44% auto_monad : 0.000060s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.03% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.35% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000042s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000579s : 3.40% optimize.opt_a.with_stream_mark : 0.000030s : 0.18% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000209s : 1.22% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.13% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000488s : 2.86% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.15% optimize.opt_a.cse : 0.000047s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000505s : 2.96% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000456s : 2.67% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.11% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000575s : 3.37% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006354s : 37.24% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000175 30 16.09% : 0.000028s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.68% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000006s : 4: substitution.graph_param_transform 66.05% : 0.000116s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.84% : 0.000005s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.15% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005838 2 90.27% : 0.005270s : 1: type_inference.infer 9.73% : 0.000568s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.60% : 0.000027s : 3: replace.inline 30.40% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 92.18% : 0.000114s : 3: match.inline 7.82% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000226 1131 0.63% : 0.000001s : 11: predicate.accumulaten_eliminater 0.72% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.41% : 0.000001s : 8: predicate.addn_check_dump 0.54% : 0.000001s : 11: predicate.addn_zero_filter 0.55% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.52% : 0.000003s : 19: predicate.arithmetic_simplify 0.62% : 0.000001s : 11: predicate.cast_eliminate 0.48% : 0.000001s : 8: predicate.check_bprop_eliminate 0.40% : 0.000001s : 8: predicate.compare_switch_simplify 0.15% : 0.000000s : 4: predicate.const_output_eliminate 0.41% : 0.000001s : 8: predicate.depend_value_elim 0.60% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.69% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.65% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.78% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.75% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.76% : 0.000002s : 15: predicate.environ_get_depend_swap 1.26% : 0.000003s : 23: predicate.environ_get_eliminate 0.75% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.88% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.56% : 0.000004s : 16: predicate.float_depend_g_call 0.40% : 0.000001s : 8: predicate.float_environ_get_switch 0.63% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.54% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000001s : 4: predicate.graph_param_transform 0.61% : 0.000001s : 8: predicate.incorporate_call 0.42% : 0.000001s : 8: predicate.incorporate_call_switch 4.70% : 0.000011s : 51: predicate.inline 0.80% : 0.000002s : 8: predicate.inline_without_move 0.27% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.61% : 0.000001s : 8: predicate.less_batch_normalization 1.26% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.60% : 0.000004s : 32: predicate.load_eliminater 1.24% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.57% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.22% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.46% : 0.000001s : 8: predicate.merge_addn 0.47% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.45% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.55% : 0.000001s : 11: predicate.minmaximum_grad 1.33% : 0.000003s : 4: predicate.mutable_eliminate 0.25% : 0.000001s : 4: predicate.opt_reshape 0.29% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000003s : 16: predicate.partial_defer_inline 0.99% : 0.000002s : 17: predicate.partial_eliminate 0.60% : 0.000001s : 11: predicate.print_const_string_wrapper 0.48% : 0.000001s : 8: predicate.reduce_all_const_elim 0.74% : 0.000002s : 11: predicate.reduce_eliminate 1.65% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.05% : 0.000002s : 21: predicate.replace_applicator 0.45% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.59% : 0.000001s : 11: predicate.reshape_eliminate 0.50% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.28% : 0.000001s : 4: predicate.row_tensor_eliminate 0.63% : 0.000001s : 8: predicate.same_eliminate 0.36% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000002s : 8: predicate.shard_identity_eliminate 0.56% : 0.000001s : 8: predicate.special_op_eliminate 27.07% : 0.000061s : 8: predicate.specialize_transform 0.73% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000002s : 16: predicate.switch_defer_inline 1.43% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.98% : 0.000009s : 54: predicate.switch_simplify 0.57% 
: 0.000001s : 11: predicate.tile_eliminate 0.61% : 0.000001s : 11: predicate.transpose_eliminate 1.11% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.05% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 0.93% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 0.97% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.57% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.15% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.59% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.28% : 0.000001s : 4: predicate.value_based_eliminate 0.57% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.51% : 0.000001s : 8: predicate.virtual_output_eliminate 0.22% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.33% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000359 8 45.68% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.32% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030192 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.11% : 0.003051s : 1: add_attr 10.07% : 0.003042s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000065s : 1: auto_monad 0.07% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.70% : 0.000515s : 1: bootstrap 0.10% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.27% : 0.000082s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.54% : 0.000466s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000515s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 3.39% : 0.001022s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.76% : 0.002344s : 1: opt_a 0.35% : 0.000106s : 1: opt_after_cconv 1.94% : 0.000587s : 1: opt_after_jit_grad 0.64% : 0.000194s : 1: opt_b 14.36% : 0.004336s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.87% : 0.000263s : 1: renormalize.infer 0.72% : 0.000217s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000077s : 1: symbol_engine_optimizer 21.09% : 0.006367s : 1: task_emit 0.25% : 0.000075s : 1: 
tuple_transform 19.55% : 0.005903s : 1: type_inference 0.22% : 0.000067s : 1: validate TotalTime = 0.0395803, [24] [bootstrap]: 0.00051978 [type_inference]: 0.0120084 [event_method]: 4.737e-05 [auto_monad]: 0.00012196 [graph_reusing]: 8.52998e-06 [inline]: 1.93002e-06 [add_attr]: 0.00320112, [1] [add_attr_with_inline]: 0.0031919, [1] [Cycle 1]: 7.948e-05, [2] [tag_attr]: 3.736e-05 [meta_addattr_fg_expand]: 9.15999e-06 [parallel-infer-symbol]: 3.88999e-06 [pre_auto_parallel]: 5.392e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.0141293, [53] [py_interpret_to_execute]: 3.905e-05 [rewriter_before_opt_a]: 0.0001482 [opt_a]: 0.0117333, [3] [Cycle 1]: 0.00755895, [45] [expand_dump_flag]: 3.81999e-06 [switch_simplify]: 7.384e-05 [loop_unroll]: 6.122e-05 [a_1]: 0.00148885 [with_stream_mark]: 2.615e-05 [recompute_prepare]: 2.334e-05 [updatestate_depend_eliminate]: 9.49999e-06 [updatestate_assign_eliminate]: 7.35e-06 [updatestate_loads_eliminate]: 7.13e-06 [parameter_eliminate]: 2.88998e-06 [a_2]: 0.00024623 [accelerated_algorithm]: 6.855e-05 [shard]: 2.18002e-06 [meta_shard_fg_expand]: 3.74002e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.746e-05 [auto_parallel]: 1.202e-05 [parallel]: 2.02e-05 [flash_sp]: 1.318e-05 [merge_comm]: 9.84001e-06 [allreduce_fusion]: 8.87999e-06 [matmul_add_comm_reduction]: 2.697e-05 [allreduce_slice_to_reducescatter]: 1.01002e-06 [virtual_shard_identity]: 1.833e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.49e-05 [virtual_output]: 1.533e-05 [merge_forward]: 1.039e-05 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 1.814e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.957e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 2.744e-05 [set_forward_comm_id_for_comm_node_pass]: 9.74e-06 [meta_fg_expand]: 0.00149699 [flash_sp_send_recv_attached]: 3.85e-06 [receive_attached]: 2.91999e-06 
[after_resolve]: 6.173e-05 [a_after_grad]: 8.277e-05 [renormalize]: 0.00271164 [add_forward_monad_depend]: 1.025e-05 [auto_monad_grad]: 6.44001e-06 [auto_monad_eliminator]: 5.648e-05 [cse]: 0.00016686 [a_3]: 0.00033969 [Cycle 2]: 0.00322558, [45] [expand_dump_flag]: 2.08002e-06 [switch_simplify]: 4.771e-05 [loop_unroll]: 4.378e-05 [a_1]: 0.00157372 [with_stream_mark]: 1.62e-05 [recompute_prepare]: 1.186e-05 [updatestate_depend_eliminate]: 6.17999e-06 [updatestate_assign_eliminate]: 4.58999e-06 [updatestate_loads_eliminate]: 3.9e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 0.00012864 [accelerated_algorithm]: 1.259e-05 [shard]: 1.61002e-06 [meta_shard_fg_expand]: 2.73e-06 [shard_inline]: 9.05001e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 8.33999e-06 [parallel]: 6.43e-06 [flash_sp]: 3.55003e-06 [merge_comm]: 5.39998e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 8.66002e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 1.079e-05 [virtual_dataset]: 8.81002e-06 [get_grad_eliminate_]: 9.14e-06 [virtual_output]: 8.69003e-06 [merge_forward]: 4.93001e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.027e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.769e-05 [merge_recompute_call_nodes]: 9.79984e-07 [before_grad]: 1.502e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40001e-06 [meta_fg_expand]: 8.393e-05 [flash_sp_send_recv_attached]: 1.12e-06 [receive_attached]: 1.72001e-06 [after_resolve]: 1.834e-05 [a_after_grad]: 1.537e-05 [renormalize]: 0.00070502 [add_forward_monad_depend]: 4.76997e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.63e-05 [cse]: 4.86e-05 [a_3]: 6.637e-05 [Cycle 3]: 0.0009334, [45] [expand_dump_flag]: 1.58002e-06 [switch_simplify]: 1.036e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.0002524 [with_stream_mark]: 1.234e-05 [recompute_prepare]: 9.05999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.88001e-06 
[parameter_eliminate]: 9.90025e-07 [a_2]: 0.0001241 [accelerated_algorithm]: 1.255e-05 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 2.09e-06 [shard_inline]: 8.95999e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 7.68999e-06 [parallel]: 5.03002e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.94e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 1.058e-05 [virtual_dataset]: 8.52e-06 [get_grad_eliminate_]: 8.64e-06 [virtual_output]: 8.35999e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 1.64998e-06 [offload_activation]: 1.051e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.874e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.46e-05 [set_forward_comm_id_for_comm_node_pass]: 5.92999e-06 [meta_fg_expand]: 3.23e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.34e-06 [after_resolve]: 1.398e-05 [a_after_grad]: 1.409e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 2.11e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.891e-05 [a_3]: 5.998e-05 [py_interpret_to_execute_after_opt_a]: 1.45e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 5.062e-05 [convert_after_rewriter]: 9.53002e-06 [order_py_execute_after_rewriter]: 7.00998e-06 [mutable_eliminate]: 0.00052333 [opt_b]: 0.00029589, [1] [Cycle 1]: 0.00028858, [7] [b_1]: 0.0001898 [b_2]: 1.103e-05 [updatestate_depend_eliminate]: 8.12e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.8e-06 [renormalize]: 4.99975e-07 [cse]: 3.423e-05 [optimize_parallel_all_gather_comm]: 2.14e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.288e-05 [loop_unroll]: 0.0004446 [opt_after_cconv]: 0.00014026, [1] [Cycle 1]: 0.00013406, [7] [c_1]: 4.857e-05 [parameter_eliminate]: 3.35e-06 [updatestate_depend_eliminate]: 7.41999e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 3.93001e-06 [cse]: 
3.165e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 3.228e-05 [tuple_transform]: 0.00010072, [1] [Cycle 1]: 9.639e-05, [4] [d_1]: 6.645e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.74e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 6.084e-05 [cse_after_recomputation]: 3.346e-05, [1] [Cycle 1]: 2.878e-05, [1] [cse]: 2.329e-05 [environ_conv]: 9.37001e-06 [swap_dp_allreduce_reducescatter]: 7.75e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.94003e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 9.60019e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.28002e-06 [add_comm_op_reuse_tag]: 1.34e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.743e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 5.42999e-06 [overlap_recompute_and_grad_model_parallel]: 6.02001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 5.41998e-06 [overlap_grad_flash_sp]: 2.536e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.42999e-06 [symbol_engine_optimizer]: 0.00010051, [1] [Cycle 1]: 9.581e-05, [6] [build]: 1.018e-05 [elim_shapecalc]: 1.367e-05 [elim_not_effective]: 1.777e-05 [opt_reshape]: 9.86998e-06 [fold_const_symbol]: 1.503e-05 [renormalize]: 2.29978e-07 [detach_backward]: 2.11e-06 
[pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 2.587e-05 [get_jit_bprop_graph]: 1.42e-06 [rewriter_after_jit_bprop_graph]: 4.34002e-06 [opt_after_jit_grad]: 0.00059705 [validate]: 4.837e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0085566 [execute]: 8.48999e-06 Sums bootstrap : 0.000520s : 1.48% type_inference : 0.012008s : 34.27% event_method : 0.000047s : 0.14% auto_monad : 0.000122s : 0.35% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000054s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000148s : 0.42% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.38% optimize.opt_a.loop_unroll : 0.000114s : 0.32% optimize.opt_a.a_1 : 0.003315s : 9.46% optimize.opt_a.with_stream_mark : 0.000055s : 0.16% optimize.opt_a.recompute_prepare : 0.000044s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000499s : 1.42% optimize.opt_a.accelerated_algorithm : 0.000094s : 0.27% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000033s : 0.09% optimize.opt_a.auto_parallel : 0.000028s : 0.08% optimize.opt_a.parallel : 0.000032s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000020s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000039s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001584s : 4.52% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000094s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.32% optimize.opt_a.renormalize : 0.003417s : 9.75% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.25% optimize.opt_a.cse : 0.000244s : 0.70% optimize.opt_a.a_3 : 0.000466s : 1.33% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000523s : 1.49% optimize.opt_b.b_1 : 0.000190s : 0.54% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000445s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.17% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000597s : 1.70% validate : 0.000048s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008557s : 24.42% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000835 222 6.20% : 0.000052s : 12: substitution.arithmetic_simplify 1.97% : 0.000016s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.50% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.91% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.47% : 0.000471s : 17: substitution.inline 2.06% : 0.000017s : 2: substitution.inline_without_move 1.28% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000016s : 3: substitution.less_batch_normalization 1.61% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000006s : 5: substitution.partial_eliminate 1.91% : 0.000016s : 20: substitution.remove_not_recompute_node 3.25% : 0.000027s : 10: substitution.replace_applicator 1.40% : 0.000012s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.34% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.68% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.17% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.38% : 0.000070s : 30: substitution.tuple_list_get_item_eliminator 2.25% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011923 2 87.36% : 0.010416s : 1: type_inference.infer 12.64% : 0.001507s : 1: type_inference.specialize ------[replace.] 0.000229 33 57.51% : 0.000132s : 17: replace.inline 42.49% : 0.000097s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000497 33 92.85% : 0.000461s : 17: match.inline 7.15% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000760 5764 1.12% : 0.000008s : 68: predicate.accumulaten_eliminater 0.33% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.44% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_depend_swap 1.71% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.72% : 0.000043s : 249: predicate.inline 1.34% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 
1.59% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.38% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.40% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.06% : 0.000016s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.97% : 0.000015s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.42% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.18% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.97% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.59% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001638 34 57.14% : 0.000936s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.86% : 0.000702s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065691 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.88% : 0.003206s : 1: add_attr 4.86% : 0.003196s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000130s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.85% : 0.000557s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.08% : 0.000055s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.69% : 0.000454s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.81% : 0.000533s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000020s : 1: opt.transform.mutable_eliminate 7.67% : 0.005040s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000174s : 28: opt.transform.opt_b 0.11% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.87% : 0.011737s : 1: opt_a 0.22% : 0.000144s : 1: opt_after_cconv 0.93% : 0.000609s : 1: opt_after_jit_grad 0.46% : 0.000299s : 1: opt_b 21.52% : 0.014135s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000058s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.03% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000037s : 1: remove_dup_value 2.86% : 0.001877s : 2: renormalize.infer 2.32% : 0.001524s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000055s : 1: rewriter_after_opt_a 0.23% : 0.000153s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000103s : 1: symbol_engine_optimizer 13.04% : 0.008569s : 1: task_emit 0.16% : 0.000104s : 1: 
tuple_transform 18.31% : 0.012028s : 1: type_inference 0.13% : 0.000086s : 1: validate TotalTime = 0.0202836, [24] [bootstrap]: 0.00049503 [type_inference]: 0.00463366 [event_method]: 1.165e-05 [auto_monad]: 5.299e-05 [graph_reusing]: 5.09e-06 [inline]: 2.06e-06 [add_attr]: 0.00325414, [1] [add_attr_with_inline]: 0.00324488, [1] [Cycle 1]: 5.193e-05, [2] [tag_attr]: 1.338e-05 [meta_addattr_fg_expand]: 3.08e-06 [parallel-infer-symbol]: 3.71999e-06 [pre_auto_parallel]: 2.513e-05 [insert-virtual-dataset]: 2.48002e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00405061, [53] [py_interpret_to_execute]: 1.691e-05 [rewriter_before_opt_a]: 4.071e-05 [opt_a]: 0.00208539, [2] [Cycle 1]: 0.0014199, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 2.586e-05 [loop_unroll]: 1.405e-05 [a_1]: 0.0003066 [with_stream_mark]: 1.641e-05 [recompute_prepare]: 8.72998e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.78001e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.97999e-06 [a_2]: 7.837e-05 [accelerated_algorithm]: 6.94999e-06 [shard]: 3.51999e-06 [meta_shard_fg_expand]: 1.42999e-06 [shard_inline]: 6.08002e-06 [merge_send_recv]: 8.15999e-06 [auto_parallel]: 7.17997e-06 [parallel]: 2.385e-05 [flash_sp]: 8.79e-06 [merge_comm]: 4.11001e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.47001e-06 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 8.72998e-06 [virtual_dataset]: 5.65001e-06 [get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 5.54e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 2.01e-06 [offload_activation]: 1.039e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.211e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.91001e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 
2.09999e-06 [after_resolve]: 1.107e-05 [a_after_grad]: 9.10999e-06 [renormalize]: 0.00043869 [add_forward_monad_depend]: 5.34998e-06 [auto_monad_grad]: 2.21998e-06 [auto_monad_eliminator]: 1.454e-05 [cse]: 2.764e-05 [a_3]: 4.238e-05 [Cycle 2]: 0.00065492, [45] [expand_dump_flag]: 1.64e-06 [switch_simplify]: 6.85998e-06 [loop_unroll]: 5.77999e-06 [a_1]: 0.00012772 [with_stream_mark]: 1.421e-05 [recompute_prepare]: 5.94e-06 [updatestate_depend_eliminate]: 3.41999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.838e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.30999e-06 [meta_shard_fg_expand]: 1.32e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 4.96002e-06 [auto_parallel]: 6.25002e-06 [parallel]: 5.57001e-06 [flash_sp]: 3.03e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.94e-06 [allreduce_slice_to_reducescatter]: 5.10016e-07 [virtual_shard_identity]: 6.56999e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.72001e-06 [offload_activation]: 7.32002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.04e-05 [merge_recompute_call_nodes]: 8.40024e-07 [before_grad]: 8.17998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.49e-06 [after_resolve]: 9.47999e-06 [a_after_grad]: 8.19002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 2.36998e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.03e-05 [cse]: 1.628e-05 [a_3]: 3.4e-05 [py_interpret_to_execute_after_opt_a]: 1.077e-05 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 3.663e-05 [convert_after_rewriter]: 7.87e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.0005116 [opt_b]: 0.00019004, [1] [Cycle 1]: 0.00018304, 
[7] [b_1]: 0.00010825 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 6.57002e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 5.50004e-07 [cse]: 1.926e-05 [optimize_parallel_all_gather_comm]: 1.816e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.467e-05 [loop_unroll]: 0.00044772 [opt_after_cconv]: 0.00010061, [1] [Cycle 1]: 9.446e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 3.31001e-06 [updatestate_depend_eliminate]: 6.23e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.06e-06 [cse]: 1.812e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.296e-05 [tuple_transform]: 7.101e-05, [1] [Cycle 1]: 6.676e-05, [4] [d_1]: 4.025e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.929e-05 [cse_after_recomputation]: 2.167e-05, [1] [Cycle 1]: 1.704e-05, [1] [cse]: 1.148e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 3.18e-06 [label_micro_interleaved_index]: 4.94998e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.66e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.353e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.915e-05 [begin_end_overlap_inline]: 7.79983e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.34998e-06 [symbol_engine_optimizer]: 7.289e-05, [1] [Cycle 1]: 6.812e-05, [6] [build]: 3.11001e-06 [elim_shapecalc]: 9.47001e-06 [elim_not_effective]: 1.223e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 8.55999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.58003e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.773e-05 [get_jit_bprop_graph]: 1.21002e-06 [rewriter_after_jit_bprop_graph]: 4.03001e-06 [opt_after_jit_grad]: 0.00049464 [validate]: 3.585e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00695424 [execute]: 8.10999e-06 Sums bootstrap : 0.000495s : 3.10% type_inference : 0.004634s : 29.01% event_method : 0.000012s : 0.07% auto_monad : 0.000053s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000041s : 0.25% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.20% optimize.opt_a.loop_unroll : 0.000020s : 0.12% optimize.opt_a.a_1 : 0.000434s : 2.72% optimize.opt_a.with_stream_mark : 0.000031s : 0.19% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000005s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.18% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000439s : 2.75% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.05% optimize.opt_a.auto_monad_grad : 0.000004s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.16% optimize.opt_a.cse : 0.000044s : 0.27% optimize.opt_a.a_3 : 0.000076s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.23% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000512s : 3.20% optimize.opt_b.b_1 : 0.000108s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.15% optimize.loop_unroll : 0.000448s : 2.80% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.02% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000018s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000495s : 3.10% validate : 0.000036s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006954s : 43.54% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000136 26 19.05% : 0.000026s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 4.39% : 0.000006s : 4: substitution.graph_param_transform 64.51% : 0.000088s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000005s : 4: substitution.remove_not_recompute_node 3.62% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004587 2 91.87% : 0.004214s : 1: type_inference.infer 8.13% : 0.000373s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000086 2 100.00% : 0.000086s : 2: match.inline ------[predicate.] 
0.000146 984 0.71% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.68% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.52% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.75% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 2.24% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.99% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_depend_swap 1.71% : 0.000002s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.88% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.05% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.38% : 0.000001s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000009s : 44: predicate.inline 1.21% : 0.000002s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.52% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 2.05% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.58% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.80% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 9: predicate.minmaximum_grad 2.01% : 0.000003s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.31% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.73% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 1.97% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.86% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.47% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 1.40% : 0.000002s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.05% : 0.000002s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.53% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000001s : 11: predicate.switch_defer_inline 1.66% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.69% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.41% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.76% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.28% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.98% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.85% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000267 6 41.08% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.92% : 0.000157s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028988 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.24% : 0.003259s : 1: add_attr 11.21% : 0.003249s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000059s : 1: auto_monad 0.08% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.84% : 0.000534s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000017s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.58% : 0.000458s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.80% : 0.000522s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 2.76% : 0.000799s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.09% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.20% : 0.002088s : 1: opt_a 0.36% : 0.000104s : 1: opt_after_cconv 1.74% : 0.000506s : 1: opt_after_jit_grad 0.67% : 0.000194s : 1: opt_b 13.99% : 0.004055s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.07% : 0.000021s : 1: py_interpret_to_execute 0.05% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.86% : 0.000249s : 1: renormalize.infer 0.63% : 0.000182s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000041s : 1: rewriter_after_opt_a 0.15% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 24.04% : 0.006969s : 1: task_emit 0.25% : 0.000074s : 1: 
tuple_transform 16.04% : 0.004651s : 1: type_inference 0.24% : 0.000068s : 1: validate TotalTime = 0.0405963, [24] [bootstrap]: 0.00052863 [type_inference]: 0.0114176 [event_method]: 4.213e-05 [auto_monad]: 0.00011846 [graph_reusing]: 7.69002e-06 [inline]: 2.21e-06 [add_attr]: 0.00325096, [1] [add_attr_with_inline]: 0.00324184, [1] [Cycle 1]: 8.566e-05, [2] [tag_attr]: 3.544e-05 [meta_addattr_fg_expand]: 8.62e-06 [parallel-infer-symbol]: 3.41001e-06 [pre_auto_parallel]: 5.024e-05 [insert-virtual-dataset]: 2.19999e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0146574, [53] [py_interpret_to_execute]: 3.952e-05 [rewriter_before_opt_a]: 0.00013087 [opt_a]: 0.012098, [3] [Cycle 1]: 0.00777642, [45] [expand_dump_flag]: 3.70998e-06 [switch_simplify]: 6.685e-05 [loop_unroll]: 5.6e-05 [a_1]: 0.00140902 [with_stream_mark]: 2.898e-05 [recompute_prepare]: 2.444e-05 [updatestate_depend_eliminate]: 9.87001e-06 [updatestate_assign_eliminate]: 7.88001e-06 [updatestate_loads_eliminate]: 8e-06 [parameter_eliminate]: 3.11001e-06 [a_2]: 0.00024977 [accelerated_algorithm]: 3.446e-05 [shard]: 1.84e-06 [meta_shard_fg_expand]: 3.61001e-06 [shard_inline]: 1.703e-05 [merge_send_recv]: 1.721e-05 [auto_parallel]: 1.253e-05 [parallel]: 2.069e-05 [flash_sp]: 1.349e-05 [merge_comm]: 1.14e-05 [allreduce_fusion]: 9.90002e-06 [matmul_add_comm_reduction]: 3.077e-05 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 2.021e-05 [virtual_dataset]: 1.558e-05 [get_grad_eliminate_]: 1.546e-05 [virtual_output]: 1.552e-05 [merge_forward]: 1.05e-05 [cell_reuse_recompute_pass]: 1.53002e-06 [offload_activation]: 1.941e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.094e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 2.74e-05 [set_forward_comm_id_for_comm_node_pass]: 1.097e-05 [meta_fg_expand]: 0.00165746 [flash_sp_send_recv_attached]: 4.22e-06 [receive_attached]: 3.04999e-06 [after_resolve]: 
6.28e-05 [a_after_grad]: 8.701e-05 [renormalize]: 0.00284474 [add_forward_monad_depend]: 1.215e-05 [auto_monad_grad]: 6.26e-06 [auto_monad_eliminator]: 5.918e-05 [cse]: 0.00016569 [a_3]: 0.00034162 [Cycle 2]: 0.0032728, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 4.865e-05 [loop_unroll]: 4.437e-05 [a_1]: 0.00162414 [with_stream_mark]: 1.871e-05 [recompute_prepare]: 1.36e-05 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 4.97999e-06 [updatestate_loads_eliminate]: 4.15e-06 [parameter_eliminate]: 1.51998e-06 [a_2]: 0.00013025 [accelerated_algorithm]: 1.395e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 2.32999e-06 [shard_inline]: 9.84001e-06 [merge_send_recv]: 9.55001e-06 [auto_parallel]: 1.048e-05 [parallel]: 8.37e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 5.47001e-06 [allreduce_fusion]: 4.98001e-06 [matmul_add_comm_reduction]: 1.073e-05 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 1.055e-05 [virtual_dataset]: 9.11002e-06 [get_grad_eliminate_]: 8.90999e-06 [virtual_output]: 8.42e-06 [merge_forward]: 5.59e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 1.181e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.889e-05 [merge_recompute_call_nodes]: 1.04e-06 [before_grad]: 1.507e-05 [set_forward_comm_id_for_comm_node_pass]: 5.50001e-06 [meta_fg_expand]: 4.508e-05 [flash_sp_send_recv_attached]: 1.34e-06 [receive_attached]: 1.60999e-06 [after_resolve]: 1.696e-05 [a_after_grad]: 1.529e-05 [renormalize]: 0.00070643 [add_forward_monad_depend]: 5.34998e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 1.794e-05 [cse]: 5.109e-05 [a_3]: 6.811e-05 [Cycle 3]: 0.0010309, [45] [expand_dump_flag]: 2.36998e-06 [switch_simplify]: 1.256e-05 [loop_unroll]: 8.92999e-06 [a_1]: 0.00025586 [with_stream_mark]: 1.442e-05 [recompute_prepare]: 1.018e-05 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 4.12e-06 [parameter_eliminate]: 
1.52999e-06 [a_2]: 0.00012594 [accelerated_algorithm]: 1.235e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 2.36e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 7.76001e-06 [auto_parallel]: 7.92e-06 [parallel]: 5.42001e-06 [flash_sp]: 9.61e-06 [merge_comm]: 6.16e-06 [allreduce_fusion]: 5.00999e-06 [matmul_add_comm_reduction]: 8.65001e-06 [allreduce_slice_to_reducescatter]: 5.10016e-07 [virtual_shard_identity]: 1.171e-05 [virtual_dataset]: 8.67998e-06 [get_grad_eliminate_]: 8.72998e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.70001e-06 [cell_reuse_recompute_pass]: 1.81e-06 [offload_activation]: 1.002e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.745e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 6.01e-06 [meta_fg_expand]: 3.28e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.35001e-06 [after_resolve]: 1.387e-05 [a_after_grad]: 1.436e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 2.45002e-06 [auto_monad_grad]: 1.49e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 3.235e-05 [a_3]: 6.13e-05 [py_interpret_to_execute_after_opt_a]: 1.58e-05 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 5.309e-05 [convert_after_rewriter]: 9.67001e-06 [order_py_execute_after_rewriter]: 6.74999e-06 [mutable_eliminate]: 0.00061913 [opt_b]: 0.00030363, [1] [Cycle 1]: 0.00029602, [7] [b_1]: 0.00019003 [b_2]: 1.149e-05 [updatestate_depend_eliminate]: 8.95999e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 4.50999e-06 [renormalize]: 6.80011e-07 [cse]: 3.682e-05 [optimize_parallel_all_gather_comm]: 2.258e-05 [overlap_param_gather]: 1.86003e-06 [cconv]: 2.55e-05 [loop_unroll]: 0.00047525 [opt_after_cconv]: 0.00014655, [1] [Cycle 1]: 0.00013938, [7] [c_1]: 4.903e-05 [parameter_eliminate]: 3.25998e-06 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 4.71002e-06 [updatestate_loads_eliminate]: 4.64002e-06 
[cse]: 3.384e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 3.447e-05 [tuple_transform]: 0.00010279, [1] [Cycle 1]: 9.812e-05, [4] [d_1]: 6.685e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 1.30007e-07 [switch_simplify]: 1.039e-05 [partial_unused_args_eliminate]: 1.89999e-06 [add_recomputation]: 6.292e-05 [cse_after_recomputation]: 3.296e-05, [1] [Cycle 1]: 2.812e-05, [1] [cse]: 2.227e-05 [environ_conv]: 9.76e-06 [swap_dp_allreduce_reducescatter]: 7.85998e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 5.24e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.55999e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.32999e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.27999e-06 [overlap_opt_shard_in_pipeline]: 1.66e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66002e-06 [control_data_broadcast_order]: 1.959e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 5.70001e-06 [overlap_recompute_and_grad_model_parallel]: 6.24001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.65002e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.806e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.31998e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 0.00010933, [1] [Cycle 1]: 0.00010394, [6] [build]: 1.066e-05 [elim_shapecalc]: 1.561e-05 [elim_not_effective]: 1.894e-05 [opt_reshape]: 1.029e-05 [fold_const_symbol]: 1.498e-05 [renormalize]: 1.90019e-07 [detach_backward]: 1.95001e-06 
[pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.731e-05 [get_jit_bprop_graph]: 1.95001e-06 [rewriter_after_jit_bprop_graph]: 4.52e-06 [opt_after_jit_grad]: 0.00116776 [validate]: 5.354e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.00899185 [execute]: 8.18001e-06 Sums bootstrap : 0.000529s : 1.47% type_inference : 0.011418s : 31.83% event_method : 0.000042s : 0.12% auto_monad : 0.000118s : 0.33% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000131s : 0.36% optimize.opt_a.expand_dump_flag : 0.000009s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.36% optimize.opt_a.loop_unroll : 0.000109s : 0.30% optimize.opt_a.a_1 : 0.003289s : 9.17% optimize.opt_a.with_stream_mark : 0.000062s : 0.17% optimize.opt_a.recompute_prepare : 0.000048s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000506s : 1.41% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.17% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.10% optimize.opt_a.merge_send_recv : 0.000035s : 0.10% optimize.opt_a.auto_parallel : 0.000031s : 0.09% optimize.opt_a.parallel : 0.000034s : 0.10% optimize.opt_a.flash_sp : 0.000027s : 0.07% optimize.opt_a.merge_comm : 0.000023s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000050s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000042s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000021s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000041s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000067s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.06% optimize.opt_a.meta_fg_expand : 0.001706s : 4.76% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000094s : 0.26% optimize.opt_a.a_after_grad : 0.000117s : 0.33% optimize.opt_a.renormalize : 0.003551s : 9.90% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.06% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000090s : 0.25% optimize.opt_a.cse : 0.000249s : 0.69% optimize.opt_a.a_3 : 0.000471s : 1.31% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000053s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000619s : 1.73% optimize.opt_b.b_1 : 0.000190s : 0.53% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000037s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.07% optimize.loop_unroll : 0.000475s : 1.33% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.cse : 0.000034s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000034s : 0.10% optimize.tuple_transform.d_1 : 0.000067s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000028s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.001168s : 3.26% validate : 0.000054s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008992s : 25.07% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000873 218 6.11% : 0.000053s : 11: substitution.arithmetic_simplify 1.99% : 0.000017s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 5: substitution.fold_const_symbol 0.89% : 0.000008s : 8: substitution.graph_param_transform 0.30% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 56.52% : 0.000493s : 16: substitution.inline 2.28% : 0.000020s : 2: substitution.inline_without_move 1.21% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.14% : 0.000019s : 3: substitution.less_batch_normalization 1.53% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000006s : 5: substitution.partial_eliminate 1.69% : 0.000015s : 20: substitution.remove_not_recompute_node 3.57% : 0.000031s : 10: substitution.replace_applicator 1.29% : 0.000011s : 15: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.35% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.65% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.11% : 0.000071s : 28: substitution.tuple_list_get_item_eliminator 2.25% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011332 2 87.20% : 0.009882s : 1: type_inference.infer 12.80% : 0.001450s : 1: type_inference.specialize ------[replace.] 0.000224 30 60.63% : 0.000136s : 16: replace.inline 39.37% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000517 30 93.44% : 0.000483s : 16: match.inline 6.56% : 0.000034s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5663 1.04% : 0.000008s : 67: predicate.accumulaten_eliminater 0.38% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 99: predicate.arithmetic_simplify 1.10% : 0.000008s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.52% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.21% : 0.000002s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.73% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.32% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.60% : 0.000005s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.84% : 0.000044s : 244: predicate.inline 1.36% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.59% : 0.000020s : 164: predicate.load_eliminater 0.47% : 0.000004s : 8: predicate.loop_unroll_after_grad 2.16% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 67: predicate.minmaximum_grad 0.48% : 0.000004s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.64% : 0.000012s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.57% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.40% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 149: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.11% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.41% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.71% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.18% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.77% : 0.000013s : 97: predicate.switch_defer_inline 2.85% : 0.000021s : 165: predicate.switch_layer_defer_inline 5.05% : 0.000038s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.41% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.55% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.58% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.19% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001692 32 59.49% : 0.001007s : 12: func_graph_cloner_run.FuncGraphClonerGraph 40.51% : 0.000686s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.067316 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.84% : 0.003256s : 1: add_attr 4.82% : 0.003246s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000069s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000127s : 1: auto_monad 0.05% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000567s : 1: bootstrap 0.04% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000023s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.07% : 0.000050s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.72% : 0.000485s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.94% : 0.000630s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000021s : 1: opt.transform.mutable_eliminate 7.41% : 0.004989s : 117: opt.transform.opt_a 0.07% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000175s : 28: opt.transform.opt_b 0.11% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000056s : 4: opt.transform.symbol_engine_opt 17.98% : 0.012101s : 1: opt_a 0.22% : 0.000150s : 1: opt_after_cconv 1.75% : 0.001181s : 1: opt_after_jit_grad 0.46% : 0.000307s : 1: opt_b 21.78% : 0.014663s : 1: optimize 0.04% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000032s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.03% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000039s : 1: remove_dup_value 2.89% : 0.001944s : 2: renormalize.infer 2.27% : 0.001531s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000058s : 1: rewriter_after_opt_a 0.20% : 0.000137s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000112s : 1: symbol_engine_optimizer 13.38% : 0.009008s : 1: task_emit 0.16% : 0.000106s : 1: 
tuple_transform 17.00% : 0.011442s : 1: type_inference 0.14% : 0.000096s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x9-kbk],max_mem:6.0M TotalTime = 3.54712, [24] [bootstrap]: 0.00055119 [type_inference]: 0.00667153 [event_method]: 1.542e-05 [auto_monad]: 5.573e-05 [graph_reusing]: 5.17e-06 [inline]: 2.09e-06 [add_attr]: 0.00370833, [1] [add_attr_with_inline]: 0.00369654, [1] [Cycle 1]: 5.307e-05, [2] [tag_attr]: 1.676e-05 [meta_addattr_fg_expand]: 4.32e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 3.262e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.82999e-06 [optimize]: 0.00430411, [53] [py_interpret_to_execute]: 2.208e-05 [rewriter_before_opt_a]: 6.051e-05 [opt_a]: 0.00230316, [2] [Cycle 1]: 0.00167956, [45] [expand_dump_flag]: 2.85002e-06 [switch_simplify]: 3.347e-05 [loop_unroll]: 2.066e-05 [a_1]: 0.0004694 [with_stream_mark]: 1.647e-05 [recompute_prepare]: 9.14e-06 [updatestate_depend_eliminate]: 3.87998e-06 [updatestate_assign_eliminate]: 3.20998e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 2.44999e-06 [a_2]: 7.851e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 8.65001e-06 [auto_parallel]: 6.69001e-06 [parallel]: 2.647e-05 [flash_sp]: 8.74998e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.66999e-06 [matmul_add_comm_reduction]: 8.94e-06 [allreduce_slice_to_reducescatter]: 1.07998e-06 [virtual_shard_identity]: 8.50001e-06 [virtual_dataset]: 6.34999e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.65001e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.032e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.226e-05 
[merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.048e-05 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.56998e-06 [after_resolve]: 1.144e-05 [a_after_grad]: 9.52999e-06 [renormalize]: 0.0005215 [add_forward_monad_depend]: 5.22999e-06 [auto_monad_grad]: 2.41e-06 [auto_monad_eliminator]: 1.492e-05 [cse]: 2.653e-05 [a_3]: 4.361e-05 [Cycle 2]: 0.00061362, [45] [expand_dump_flag]: 1.17e-06 [switch_simplify]: 7.42002e-06 [loop_unroll]: 5.39998e-06 [a_1]: 0.00012836 [with_stream_mark]: 1.088e-05 [recompute_prepare]: 5.78997e-06 [updatestate_depend_eliminate]: 3.33998e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.36002e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.68999e-06 [auto_parallel]: 5.76998e-06 [parallel]: 4.93001e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 2.78998e-06 [matmul_add_comm_reduction]: 6.23998e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.21001e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.00998e-06 [cell_reuse_recompute_pass]: 1.84998e-06 [offload_activation]: 7.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 8.28001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14001e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.39e-06 [after_resolve]: 9.54999e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.53002e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 8.43999e-06 [cse]: 1.361e-05 [a_3]: 3.304e-05 [py_interpret_to_execute_after_opt_a]: 1.018e-05 
[slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.49e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 5.38002e-06 [mutable_eliminate]: 0.00052712 [opt_b]: 0.00019235, [1] [Cycle 1]: 0.00018529, [7] [b_1]: 0.00011049 [b_2]: 7.71001e-06 [updatestate_depend_eliminate]: 6.59001e-06 [updatestate_assign_eliminate]: 2.97002e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.30008e-07 [cse]: 1.976e-05 [optimize_parallel_all_gather_comm]: 1.656e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.516e-05 [loop_unroll]: 0.00045134 [opt_after_cconv]: 9.938e-05, [1] [Cycle 1]: 9.363e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.97002e-06 [updatestate_depend_eliminate]: 6.14001e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.802e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.313e-05 [tuple_transform]: 6.889e-05, [1] [Cycle 1]: 6.458e-05, [4] [d_1]: 3.901e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 5.95002e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 5.533e-05 [cse_after_recomputation]: 2.133e-05, [1] [Cycle 1]: 1.661e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.62998e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 3.01001e-06 [merge_cast_opt]: 1.21997e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.14998e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.305e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.74002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.28002e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.885e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 7.259e-05, [1] [Cycle 1]: 6.714e-05, [6] [build]: 2.73e-06 [elim_shapecalc]: 9.59999e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 8.68001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.69998e-06 [auto_monad_reorder]: 1.575e-05 [get_jit_bprop_graph]: 1.20001e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.00049454 [validate]: 3.364e-05 [backend_pass]: 9.20001e-07 [task_emit]: 3.53098 [execute]: 8.47e-06 Sums bootstrap : 0.000551s : 0.02% type_inference : 0.006672s : 0.19% event_method : 0.000015s : 0.00% auto_monad : 0.000056s : 0.00% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000033s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.00% optimize.rewriter_before_opt_a : 0.000061s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 
: 0.000598s : 0.02% optimize.opt_a.with_stream_mark : 0.000027s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000522s : 0.01% 
optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.00% optimize.opt_a.cse : 0.000040s : 0.00% optimize.opt_a.a_3 : 0.000077s : 0.00% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000527s : 0.01% optimize.opt_b.b_1 : 0.000110s : 0.00% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.00% optimize.loop_unroll : 0.000451s : 0.01% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.00% optimize.cse_after_recomputation.cse 
: 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.00% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000495s : 0.01% validate : 0.000034s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 3.530979s : 99.68% execute : 0.000008s : 0.00% Time group info: ------[substitution.] 0.000179 30 15.37% : 0.000028s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.67% : 0.000001s : 2: substitution.fold_const_symbol 2.87% : 0.000005s : 4: substitution.graph_param_transform 66.44% : 0.000119s : 3: substitution.inline 1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000005s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006621 2 90.93% : 0.006020s : 1: type_inference.infer 9.07% : 0.000601s : 1: type_inference.specialize ------[replace.] 0.000041 5 72.17% : 0.000030s : 3: replace.inline 27.83% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000128 5 91.55% : 0.000117s : 3: match.inline 8.45% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000167 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 1.28% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 1.09% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.70% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.67% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000004s : 16: predicate.float_depend_g_call 0.53% : 0.000001s : 8: predicate.float_environ_get_switch 0.81% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000001s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.93% : 0.000002s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.46% : 0.000004s : 32: predicate.load_eliminater 1.46% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.94% : 0.000003s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000003s : 16: predicate.partial_defer_inline 1.37% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.22% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000003s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.94% : 0.000002s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.13% : 0.000002s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 16: predicate.switch_defer_inline 1.87% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000009s : 54: predicate.switch_simplify 0.77% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000002s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.15% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000385 8 46.04% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.96% : 0.000208s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
3.556794 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.10% : 0.003713s : 1: add_attr 0.10% : 0.003700s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000060s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.02% : 0.000590s : 1: bootstrap 0.00% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.01% : 0.000461s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.02% : 0.000537s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.03% : 0.000979s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.00% : 0.000094s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.06% : 0.002306s : 1: opt_a 0.00% : 0.000103s : 1: opt_after_cconv 0.01% : 0.000505s : 1: opt_after_jit_grad 0.01% : 0.000196s : 1: opt_b 0.12% : 0.004308s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000037s : 1: pre_auto_parallel 0.00% : 0.000026s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.01% : 0.000282s : 1: renormalize.infer 0.01% : 0.000231s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000039s : 1: rewriter_after_opt_a 0.00% : 0.000065s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.00% : 0.000075s : 1: symbol_engine_optimizer 99.27% : 3.531001s : 1: task_emit 0.00% : 0.000072s : 1: 
tuple_transform 0.19% : 0.006690s : 1: type_inference 0.00% : 0.000060s : 1: validate TotalTime = 0.0758873, [24] [bootstrap]: 0.00049677 [type_inference]: 0.00462796 [event_method]: 1.127e-05 [auto_monad]: 5.197e-05 [graph_reusing]: 4.90001e-06 [inline]: 1.96e-06 [add_attr]: 0.00315178, [1] [add_attr_with_inline]: 0.00314267, [1] [Cycle 1]: 5.281e-05, [2] [tag_attr]: 1.341e-05 [meta_addattr_fg_expand]: 3.76999e-06 [parallel-infer-symbol]: 3.53e-06 [pre_auto_parallel]: 2.493e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00393538, [53] [py_interpret_to_execute]: 1.577e-05 [rewriter_before_opt_a]: 4.074e-05 [opt_a]: 0.00203401, [2] [Cycle 1]: 0.00137292, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.425e-05 [loop_unroll]: 1.367e-05 [a_1]: 0.00030561 [with_stream_mark]: 1.517e-05 [recompute_prepare]: 7.88999e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.35998e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.78e-05 [accelerated_algorithm]: 6.77002e-06 [shard]: 2.75002e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.17998e-06 [auto_parallel]: 6.70998e-06 [parallel]: 1.92e-05 [flash_sp]: 8.77999e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.04998e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 5.69999e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.70998e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.51998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.12e-05 [merge_recompute_call_nodes]: 1.41998e-06 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.16e-06 [receive_attached]: 2.19999e-06 
[after_resolve]: 1.106e-05 [a_after_grad]: 8.48999e-06 [renormalize]: 0.00043236 [add_forward_monad_depend]: 4.38999e-06 [auto_monad_grad]: 2.12999e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.895e-05 [a_3]: 4.013e-05 [Cycle 2]: 0.00065108, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00016263 [with_stream_mark]: 1.167e-05 [recompute_prepare]: 6.20002e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.874e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 6.46e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 5.54998e-06 [parallel]: 4.47e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.10002e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.77999e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 6.32001e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.55997e-06 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.66e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.93001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 1.50001e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 7.9e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.31998e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 6.93e-06 [cse]: 1.375e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 8.92e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.563e-05 [convert_after_rewriter]: 7.24001e-06 [order_py_execute_after_rewriter]: 4.95999e-06 [mutable_eliminate]: 0.00050219 [opt_b]: 0.00018537, [1] [Cycle 1]: 0.00017884, [7] [b_1]: 0.0001086 
[b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.62001e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 8.09989e-07 [cse]: 1.754e-05 [optimize_parallel_all_gather_comm]: 1.652e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.21e-05 [loop_unroll]: 0.00041639 [opt_after_cconv]: 9.697e-05, [1] [Cycle 1]: 9.149e-05, [7] [c_1]: 2.801e-05 [parameter_eliminate]: 2.74001e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.749e-05 [renormalize]: 6.09987e-07 [remove_dup_value]: 1.248e-05 [tuple_transform]: 7.109e-05, [1] [Cycle 1]: 6.669e-05, [4] [d_1]: 4.07e-05 [none_parameter_eliminate]: 1.96998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.604e-05 [cse_after_recomputation]: 2.044e-05, [1] [Cycle 1]: 1.601e-05, [1] [cse]: 1.074e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 1.578e-05 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.10017e-07 [remove_cast_before_assign_add]: 1.26002e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.69972e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.197e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.96997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.28002e-06 [overlap_recompute_comm]: 2.59999e-06 [overlap_grad_ring_attention]: 4.22998e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 7.1e-05, [1] [Cycle 1]: 6.681e-05, [6] [build]: 2.78e-06 [elim_shapecalc]: 8.81997e-06 [elim_not_effective]: 1.188e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.12001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 1.55999e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00045962 [validate]: 3.474e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0628231 [execute]: 9.37999e-06 Sums bootstrap : 0.000497s : 0.69% type_inference : 0.004628s : 6.45% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000025s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000468s : 0.65% optimize.opt_a.with_stream_mark : 0.000027s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000432s : 0.60% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000043s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000502s : 0.70% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000416s : 0.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000041s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000016s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000002s : 0.00%
auto_monad_reorder : 0.000016s : 0.02%
get_jit_bprop_graph : 0.000002s : 0.00%
rewriter_after_jit_bprop_graph : 0.000003s : 0.00%
opt_after_jit_grad : 0.000460s : 0.64%
validate : 0.000035s : 0.05%
backend_pass : 0.000001s : 0.00%
task_emit : 0.062823s : 87.58%
execute : 0.000009s : 0.01%
Time group info:
------[substitution.] 0.000131 26
18.11% : 0.000024s : 4: substitution.arithmetic_simplify
1.32% : 0.000002s : 2: substitution.elim_not_effective
0.94% : 0.000001s : 2: substitution.fold_const_symbol
4.50% : 0.000006s : 4: substitution.graph_param_transform
66.32% : 0.000087s : 2: substitution.inline
2.51% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.12% : 0.000004s : 4: substitution.remove_not_recompute_node
3.18% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004584 2
91.86% : 0.004211s : 1: type_inference.infer
8.14% : 0.000373s : 1: type_inference.specialize
------[replace.] 0.000019 2
100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000086 2
100.00% : 0.000086s : 2: match.inline
------[predicate.]
0.000144 984 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 1.03% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.31% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000009s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.36% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.65% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.72% : 0.000001s : 9: predicate.print_const_string_wrapper 0.91% : 0.000001s : 8: predicate.reduce_all_const_elim 0.89% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.93% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.49% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 1.13% : 0.000002s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.56% : 0.000007s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.13% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.44% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.72% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000276 6 39.97% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.03% : 0.000166s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084386 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.74% : 0.003156s : 1: add_attr 3.73% : 0.003146s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000533s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000019s : 1: label_micro_interleaved_index 0.50% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.61% : 0.000511s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.97% : 0.000822s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.41% : 0.002037s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.56% : 0.000469s : 1: opt_after_jit_grad 0.22% : 0.000189s : 1: opt_b 4.67% : 0.003940s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.29% : 0.000243s : 1: renormalize.infer 0.22% : 0.000184s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000040s : 1: rewriter_after_opt_a 0.05% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000074s : 1: symbol_engine_optimizer 74.47% : 0.062845s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 5.50% : 0.004645s : 1: type_inference 0.07% : 0.000059s : 1: validate TotalTime = 0.0734307, [24] [bootstrap]: 0.00045597 [type_inference]: 0.00564386 [event_method]: 1.409e-05 [auto_monad]: 5.589e-05 [graph_reusing]: 5.32001e-06 [inline]: 1.77001e-06 [add_attr]: 0.00303753, [1] [add_attr_with_inline]: 0.00302964, [1] [Cycle 1]: 4.49e-05, [2] [tag_attr]: 1.487e-05 [meta_addattr_fg_expand]: 4.63001e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.418e-05 [insert-virtual-dataset]: 2.28998e-06 [parallel-infer-symbol-second]: 6.49976e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00403898, [53] [py_interpret_to_execute]: 2.062e-05 [rewriter_before_opt_a]: 5.713e-05 [opt_a]: 0.00215403, [2] [Cycle 1]: 0.00154954, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 3.189e-05 [loop_unroll]: 2.03e-05 [a_1]: 0.00045214 [with_stream_mark]: 1.326e-05 [recompute_prepare]: 7.61001e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 3.05002e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.559e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 1.84e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.93998e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.82001e-06 [parallel]: 1.849e-05 [flash_sp]: 7.8e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.59998e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.73001e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.99e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.34998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 
2.79001e-06 [after_resolve]: 9.81e-06 [a_after_grad]: 9.20999e-06 [renormalize]: 0.00045745 [add_forward_monad_depend]: 4.31002e-06 [auto_monad_grad]: 1.92001e-06 [auto_monad_eliminator]: 1.389e-05 [cse]: 2.77e-05 [a_3]: 4.157e-05 [Cycle 2]: 0.00059488, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.0001259 [with_stream_mark]: 9.86998e-06 [recompute_prepare]: 5.63002e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 6.777e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.35999e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.10999e-06 [parallel]: 4.12e-06 [flash_sp]: 2.98e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 4.08999e-06 [matmul_add_comm_reduction]: 4.96002e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.21998e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 6.94999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.21002e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.66001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.00999e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 6.58003e-06 [cse]: 1.33e-05 [a_3]: 3.366e-05 [py_interpret_to_execute_after_opt_a]: 8.23999e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 6.48e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00049302 [opt_b]: 0.00018408, [1] 
[Cycle 1]: 0.00017813, [7] [b_1]: 0.0001085 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.47999e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 4.39992e-07 [cse]: 1.742e-05 [optimize_parallel_all_gather_comm]: 1.671e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.381e-05 [loop_unroll]: 0.00041898 [opt_after_cconv]: 9.441e-05, [1] [Cycle 1]: 8.86e-05, [7] [c_1]: 2.749e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.618e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 1.239e-05 [tuple_transform]: 6.888e-05, [1] [Cycle 1]: 6.464e-05, [4] [d_1]: 3.933e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 5.94e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.35e-05 [cse_after_recomputation]: 2.028e-05, [1] [Cycle 1]: 1.591e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.75001e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.38998e-06 [micro_interleaved_order_control]: 2.28002e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.04003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.134e-05 [grouped_pairwise_exchange_alltoall]: 2.02001e-06 [offloading_packed_experts]: 3.47002e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.84e-06 [overlap_recompute_comm]: 2.18998e-06 [overlap_grad_ring_attention]: 4.21001e-06 [overlap_grad_flash_sp]: 1.683e-05 [begin_end_overlap_inline]: 8.59989e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.61998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.796e-05, [1] [Cycle 1]: 6.369e-05, [6] [build]: 2.32999e-06 [elim_shapecalc]: 8.48999e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 5.87001e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.474e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.45003e-06 [opt_after_jit_grad]: 0.00044382 [validate]: 3.183e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0594303 [execute]: 8.89003e-06 Sums bootstrap : 0.000456s : 0.66% type_inference : 0.005644s : 8.13% event_method : 0.000014s : 0.02% auto_monad : 0.000056s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000578s : 0.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000458s : 0.66% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% 
optimize.opt_a.a_3 : 0.000075s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000493s : 0.71% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000419s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% 
optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000002s : 0.00%
auto_monad_reorder : 0.000015s : 0.02%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000003s : 0.00%
opt_after_jit_grad : 0.000444s : 0.64%
validate : 0.000032s : 0.05%
backend_pass : 0.000001s : 0.00%
task_emit : 0.059430s : 85.60%
execute : 0.000009s : 0.01%
Time group info:
------[substitution.] 0.000168 30
14.27% : 0.000024s : 5: substitution.arithmetic_simplify
1.06% : 0.000002s : 2: substitution.elim_not_effective
0.72% : 0.000001s : 2: substitution.fold_const_symbol
3.24% : 0.000005s : 4: substitution.graph_param_transform
67.58% : 0.000113s : 3: substitution.inline
1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch
2.48% : 0.000004s : 4: substitution.remove_not_recompute_node
2.27% : 0.000004s : 4: substitution.replace_old_param
6.63% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005604 2
89.74% : 0.005029s : 1: type_inference.infer
10.26% : 0.000575s : 1: type_inference.specialize
------[replace.] 0.000039 5
69.63% : 0.000027s : 3: replace.inline
30.37% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000121 5
91.69% : 0.000111s : 3: match.inline
8.31% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.97% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.54% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.88% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.27% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.95% : 0.000002s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000360 8 46.28% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.72% : 0.000193s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.082062 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.71% : 0.003042s : 1: add_attr 3.70% : 0.003033s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000061s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.60% : 0.000491s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000428s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.61% : 0.000503s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.15% : 0.000943s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.63% : 0.002157s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.55% : 0.000454s : 1: opt_after_jit_grad 0.23% : 0.000188s : 1: opt_b 4.93% : 0.004043s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000028s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.28% : 0.000230s : 1: renormalize.infer 0.27% : 0.000221s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 72.45% : 0.059452s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.89% : 0.005657s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 1.07222, [24] [bootstrap]: 0.00051914 [type_inference]: 0.0118235 [event_method]: 4.919e-05 [auto_monad]: 0.00012248 [graph_reusing]: 7.98999e-06 [inline]: 1.99e-06 [add_attr]: 0.00315626, [1] [add_attr_with_inline]: 0.00314729, [1] [Cycle 1]: 7.478e-05, [2] [tag_attr]: 3.621e-05 [meta_addattr_fg_expand]: 9.55001e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 5.159e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.0141278, [53] [py_interpret_to_execute]: 3.984e-05 [rewriter_before_opt_a]: 0.00014652 [opt_a]: 0.0117348, [3] [Cycle 1]: 0.00763515, [45] [expand_dump_flag]: 3.88001e-06 [switch_simplify]: 7.453e-05 [loop_unroll]: 6.244e-05 [a_1]: 0.00146599 [with_stream_mark]: 2.592e-05 [recompute_prepare]: 2.252e-05 [updatestate_depend_eliminate]: 9.21998e-06 [updatestate_assign_eliminate]: 7.76001e-06 [updatestate_loads_eliminate]: 7.21999e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.00024712 [accelerated_algorithm]: 3.214e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.46001e-06 [shard_inline]: 1.64e-05 [merge_send_recv]: 1.573e-05 [auto_parallel]: 1.138e-05 [parallel]: 1.854e-05 [flash_sp]: 1.22e-05 [merge_comm]: 9.78002e-06 [allreduce_fusion]: 8.54e-06 [matmul_add_comm_reduction]: 2.68e-05 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 1.789e-05 [virtual_dataset]: 1.576e-05 [get_grad_eliminate_]: 1.571e-05 [virtual_output]: 1.518e-05 [merge_forward]: 9.37001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 1.801e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.943e-05 [merge_recompute_call_nodes]: 1.73002e-06 [before_grad]: 2.785e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66e-06 [meta_fg_expand]: 0.00156448 [flash_sp_send_recv_attached]: 4e-06 [receive_attached]: 2.71999e-06 
[after_resolve]: 6.08e-05 [a_after_grad]: 8.29e-05 [renormalize]: 0.00280263 [add_forward_monad_depend]: 9.42001e-06 [auto_monad_grad]: 6.41e-06 [auto_monad_eliminator]: 5.684e-05 [cse]: 0.00017247 [a_3]: 0.00033774 [Cycle 2]: 0.00316816, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 4.655e-05 [loop_unroll]: 4.391e-05 [a_1]: 0.00153862 [with_stream_mark]: 1.469e-05 [recompute_prepare]: 1.1e-05 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 4.55999e-06 [updatestate_loads_eliminate]: 3.59002e-06 [parameter_eliminate]: 1.34e-06 [a_2]: 0.00012717 [accelerated_algorithm]: 1.259e-05 [shard]: 1.47001e-06 [meta_shard_fg_expand]: 1.98997e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 7.53e-06 [auto_parallel]: 8.08001e-06 [parallel]: 6.76e-06 [flash_sp]: 3.61999e-06 [merge_comm]: 6.28e-06 [allreduce_fusion]: 4.79002e-06 [matmul_add_comm_reduction]: 8.62e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 9.91998e-06 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 8.84e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.52998e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 9.63997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.62e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 1.411e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42001e-06 [meta_fg_expand]: 9.678e-05 [flash_sp_send_recv_attached]: 1.10001e-06 [receive_attached]: 1.34e-06 [after_resolve]: 1.661e-05 [a_after_grad]: 1.49e-05 [renormalize]: 0.0006979 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.94999e-06 [auto_monad_eliminator]: 1.553e-05 [cse]: 4.975e-05 [a_3]: 6.525e-05 [Cycle 3]: 0.00091459, [45] [expand_dump_flag]: 1.06997e-06 [switch_simplify]: 1.069e-05 [loop_unroll]: 9.39e-06 [a_1]: 0.00025149 [with_stream_mark]: 1.062e-05 [recompute_prepare]: 9.51003e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 4.06001e-06 [updatestate_loads_eliminate]: 3.98001e-06 
[parameter_eliminate]: 9.40025e-07 [a_2]: 0.00012464 [accelerated_algorithm]: 1.258e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.81003e-06 [shard_inline]: 9.09998e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.26001e-06 [parallel]: 4.69002e-06 [flash_sp]: 1.06002e-06 [merge_comm]: 4.78001e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 8.01001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.004e-05 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.52e-06 [virtual_output]: 8.27998e-06 [merge_forward]: 4.23999e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 8.77e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.609e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.474e-05 [set_forward_comm_id_for_comm_node_pass]: 5.86e-06 [meta_fg_expand]: 2.89001e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.20999e-06 [after_resolve]: 1.462e-05 [a_after_grad]: 1.445e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 1.089e-05 [cse]: 2.885e-05 [a_3]: 5.996e-05 [py_interpret_to_execute_after_opt_a]: 1.209e-05 [slice_cell_reuse_recomputed_activation]: 2.84001e-06 [rewriter_after_opt_a]: 4.918e-05 [convert_after_rewriter]: 9.16002e-06 [order_py_execute_after_rewriter]: 6.47001e-06 [mutable_eliminate]: 0.00054693 [opt_b]: 0.00029304, [1] [Cycle 1]: 0.00028637, [7] [b_1]: 0.00019031 [b_2]: 1.098e-05 [updatestate_depend_eliminate]: 7.44002e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.00002e-07 [cse]: 3.32e-05 [optimize_parallel_all_gather_comm]: 2.052e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.2e-05 [loop_unroll]: 0.00043502 [opt_after_cconv]: 0.00013718, [1] [Cycle 1]: 0.00013103, [7] [c_1]: 4.842e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 6.88e-06 [updatestate_assign_eliminate]: 4.11001e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 3.123e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 3.17e-05 [tuple_transform]: 0.00010259, [1] [Cycle 1]: 9.783e-05, [4] [d_1]: 6.737e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 3.19997e-07 [switch_simplify]: 1.007e-05 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5.934e-05 [cse_after_recomputation]: 3.285e-05, [1] [Cycle 1]: 2.818e-05, [1] [cse]: 2.264e-05 [environ_conv]: 9.71e-06 [swap_dp_allreduce_reducescatter]: 8.18999e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.73001e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61002e-06 [control_data_broadcast_order]: 1.704e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 5.07e-06 [overlap_recompute_and_grad_model_parallel]: 5.67999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 5.20999e-06 [overlap_grad_flash_sp]: 2.536e-05 [begin_end_overlap_inline]: 9.29984e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 0.00010201, [1] [Cycle 1]: 9.754e-05, [6] [build]: 1.035e-05 [elim_shapecalc]: 1.382e-05 [elim_not_effective]: 1.865e-05 [opt_reshape]: 1.05e-05 [fold_const_symbol]: 1.484e-05 [renormalize]: 2.40019e-07 
[detach_backward]: 1.99999e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 2.483e-05 [get_jit_bprop_graph]: 1.14998e-06 [rewriter_after_jit_bprop_graph]: 3.85e-06 [opt_after_jit_grad]: 0.00047551 [validate]: 8.92e-05 [backend_pass]: 1.12e-06 [task_emit]: 1.04151 [execute]: 9.24998e-06 Sums bootstrap : 0.000519s : 0.05% type_inference : 0.011823s : 1.11% event_method : 0.000049s : 0.00% auto_monad : 0.000122s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000052s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.00% optimize.rewriter_before_opt_a : 0.000147s : 0.01% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.01% optimize.opt_a.loop_unroll : 0.000116s : 0.01% optimize.opt_a.a_1 : 0.003256s : 0.30% optimize.opt_a.with_stream_mark : 0.000051s : 0.00% optimize.opt_a.recompute_prepare : 0.000043s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000027s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000017s : 0.00% optimize.opt_a.merge_comm : 0.000021s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001664s : 0.16% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.01% optimize.opt_a.a_after_grad : 0.000112s : 0.01% optimize.opt_a.renormalize : 0.003501s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.01% optimize.opt_a.cse : 0.000251s : 0.02% optimize.opt_a.a_3 : 0.000463s : 0.04% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000547s : 0.05% optimize.opt_b.b_1 : 0.000190s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000435s : 0.04% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000476s : 0.04% validate : 0.000089s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 1.041508s : 97.54% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000797 222 6.31% : 0.000050s : 12: substitution.arithmetic_simplify 1.74% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.93% : 0.000007s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 56.17% : 0.000448s : 17: substitution.inline 2.21% : 0.000018s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000016s : 3: substitution.less_batch_normalization 1.65% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.72% : 0.000014s : 20: substitution.remove_not_recompute_node 3.12% : 0.000025s : 10: substitution.replace_applicator 1.33% : 0.000011s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.50% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.31% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011744 2 86.63% : 0.010175s : 1: type_inference.infer 13.37% : 0.001570s : 1: type_inference.specialize ------[replace.] 0.000220 33 58.02% : 0.000128s : 17: replace.inline 41.98% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000473 33 92.79% : 0.000439s : 17: match.inline 7.21% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000756 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.22% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000043s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 68: predicate.reduce_eliminate 2.65% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.20% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.21% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001701 34 56.98% : 0.000969s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.02% : 0.000732s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.098259 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.29% : 0.003161s : 1: add_attr 0.29% : 0.003151s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.05% : 0.000554s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000057s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.04% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000556s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000019s : 1: opt.transform.mutable_eliminate 0.45% : 0.004936s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000175s : 28: opt.transform.opt_b 0.01% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000054s : 4: opt.transform.symbol_engine_opt 1.07% : 0.011738s : 1: opt_a 0.01% : 0.000141s : 1: opt_after_cconv 0.04% : 0.000486s : 1: opt_after_jit_grad 0.03% : 0.000297s : 1: opt_b 1.29% : 0.014132s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000056s : 1: pre_auto_parallel 0.00% : 0.000044s : 1: py_interpret_to_execute 0.00% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000036s : 1: remove_dup_value 0.17% : 0.001849s : 2: renormalize.infer 0.15% : 0.001638s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000053s : 1: rewriter_after_opt_a 0.01% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000105s : 1: symbol_engine_optimizer 94.83% : 1.041529s : 1: task_emit 0.01% : 0.000105s : 1: 
tuple_transform 1.08% : 0.011841s : 1: type_inference 0.01% : 0.000118s : 1: validate TotalTime = 0.0727522, [24] [bootstrap]: 0.0004964 [type_inference]: 0.00439818 [event_method]: 1.098e-05 [auto_monad]: 5.214e-05 [graph_reusing]: 5.09e-06 [inline]: 1.77999e-06 [add_attr]: 0.00305354, [1] [add_attr_with_inline]: 0.00304533, [1] [Cycle 1]: 4.674e-05, [2] [tag_attr]: 1.195e-05 [meta_addattr_fg_expand]: 3.36999e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 2.249e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00370654, [53] [py_interpret_to_execute]: 1.575e-05 [rewriter_before_opt_a]: 3.873e-05 [opt_a]: 0.0018815, [2] [Cycle 1]: 0.00128156, [45] [expand_dump_flag]: 2.93998e-06 [switch_simplify]: 2.388e-05 [loop_unroll]: 1.39e-05 [a_1]: 0.00029572 [with_stream_mark]: 1.272e-05 [recompute_prepare]: 7.7e-06 [updatestate_depend_eliminate]: 3.39001e-06 [updatestate_assign_eliminate]: 3.69002e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.68e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 8.28999e-06 [auto_parallel]: 5.99999e-06 [parallel]: 1.83e-05 [flash_sp]: 7.80998e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.58999e-06 [matmul_add_comm_reduction]: 9.64999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.62002e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 9.20999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 
2.64999e-06 [after_resolve]: 1.043e-05 [a_after_grad]: 8.43999e-06 [renormalize]: 0.00036222 [add_forward_monad_depend]: 4.89e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.357e-05 [cse]: 2.741e-05 [a_3]: 3.947e-05 [Cycle 2]: 0.00059058, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.69001e-06 [loop_unroll]: 5.24998e-06 [a_1]: 0.00012464 [with_stream_mark]: 9.25999e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.64001e-06 [updatestate_assign_eliminate]: 2.09999e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.685e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.35001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.37e-06 [flash_sp]: 3.42002e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.27001e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 5.22999e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.93998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.77e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.349e-05 [a_3]: 3.157e-05 [py_interpret_to_execute_after_opt_a]: 7.75e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.115e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00044784 [opt_b]: 0.00017947, [1] [Cycle 1]: 
0.00017337, [7] [b_1]: 0.00010694 [b_2]: 6.84999e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 3.89991e-07 [cse]: 1.579e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.78002e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00041655 [opt_after_cconv]: 0.00011096, [1] [Cycle 1]: 0.00010527, [7] [c_1]: 4.293e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.679e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.27e-05 [tuple_transform]: 6.972e-05, [1] [Cycle 1]: 6.533e-05, [4] [d_1]: 3.973e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.28998e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.523e-05 [cse_after_recomputation]: 2.078e-05, [1] [Cycle 1]: 1.623e-05, [1] [cse]: 1.123e-05 [environ_conv]: 4.60001e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.15002e-06 [label_micro_interleaved_index]: 4.19002e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.41002e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.133e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 3.65998e-06 [overlap_recompute_and_grad_model_parallel]: 4.23001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.706e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.13998e-06 [split_layernorm_comm]: 1.99999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.826e-05, [1] [Cycle 1]: 6.424e-05, [6] [build]: 2.85998e-06 [elim_shapecalc]: 7.90998e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.47999e-06 [pipeline_parallel_scheduler]: 1.74998e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044949 [validate]: 3.211e-05 [backend_pass]: 8.10018e-07 [task_emit]: 0.0602673 [execute]: 1.014e-05 Sums bootstrap : 0.000496s : 0.72% type_inference : 0.004398s : 6.40% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.61% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000362s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% 
optimize.opt_a.a_3 : 0.000071s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.65% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000417s : 0.61% optimize.opt_after_cconv.c_1 : 0.000043s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% 
optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000449s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060267s : 87.70% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000122 26 19.18% : 0.000023s : 4: substitution.arithmetic_simplify 1.65% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.88% : 0.000006s : 4: substitution.graph_param_transform 64.38% : 0.000079s : 2: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.95% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004357 2 91.80% : 0.004000s : 1: type_inference.infer 8.20% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.95% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 1.01% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.38% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.26% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 42.53% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.47% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080813 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.78% : 0.003058s : 1: add_attr 3.77% : 0.003049s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000534s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000770s : 78: opt.transform.opt_a 0.05% : 0.000042s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.33% : 0.001884s : 1: opt_a 0.14% : 0.000114s : 1: opt_after_cconv 0.57% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.59% : 0.003710s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000200s : 1: renormalize.infer 0.19% : 0.000156s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.60% : 0.060290s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.46% : 0.004413s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.237529, [24] [bootstrap]: 0.00051281 [type_inference]: 0.0107013 [event_method]: 4.23e-05 [auto_monad]: 0.00011559 [graph_reusing]: 8.50001e-06 [inline]: 1.71002e-06 [add_attr]: 0.00306317, [1] [add_attr_with_inline]: 0.00305479, [1] [Cycle 1]: 0.00011108, [2] [tag_attr]: 7.256e-05 [meta_addattr_fg_expand]: 8.22998e-06 [parallel-infer-symbol]: 3.00998e-06 [pre_auto_parallel]: 4.714e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0131731, [53] [py_interpret_to_execute]: 3.576e-05 [rewriter_before_opt_a]: 0.00012788 [opt_a]: 0.0109478, [3] [Cycle 1]: 0.00702829, [45] [expand_dump_flag]: 3.75e-06 [switch_simplify]: 6.642e-05 [loop_unroll]: 5.511e-05 [a_1]: 0.00133289 [with_stream_mark]: 2.262e-05 [recompute_prepare]: 2.167e-05 [updatestate_depend_eliminate]: 8.96002e-06 [updatestate_assign_eliminate]: 7.46999e-06 [updatestate_loads_eliminate]: 7.45e-06 [parameter_eliminate]: 2.51e-06 [a_2]: 0.00024513 [accelerated_algorithm]: 3.068e-05 [shard]: 1.79e-06 [meta_shard_fg_expand]: 3.13e-06 [shard_inline]: 1.623e-05 [merge_send_recv]: 1.552e-05 [auto_parallel]: 1.041e-05 [parallel]: 1.863e-05 [flash_sp]: 1.165e-05 [merge_comm]: 9.72999e-06 [allreduce_fusion]: 8.58001e-06 [matmul_add_comm_reduction]: 2.743e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.776e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.522e-05 [virtual_output]: 1.535e-05 [merge_forward]: 9.49e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.81e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.929e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.789e-05 [set_forward_comm_id_for_comm_node_pass]: 9.55001e-06 [meta_fg_expand]: 0.00142405 [flash_sp_send_recv_attached]: 3.73001e-06 [receive_attached]: 2.19001e-06 [after_resolve]: 
5.957e-05 [a_after_grad]: 8.16e-05 [renormalize]: 0.00251554 [add_forward_monad_depend]: 9.30001e-06 [auto_monad_grad]: 4.88001e-06 [auto_monad_eliminator]: 5.71e-05 [cse]: 0.00016988 [a_3]: 0.00033803 [Cycle 2]: 0.00294502, [45] [expand_dump_flag]: 1.66e-06 [switch_simplify]: 4.744e-05 [loop_unroll]: 4.392e-05 [a_1]: 0.00153073 [with_stream_mark]: 1.207e-05 [recompute_prepare]: 1.135e-05 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.63999e-06 [parameter_eliminate]: 1.17e-06 [a_2]: 0.00012592 [accelerated_algorithm]: 1.191e-05 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.37001e-06 [merge_send_recv]: 6.73e-06 [auto_parallel]: 7.50003e-06 [parallel]: 5.13002e-06 [flash_sp]: 2.86e-06 [merge_comm]: 5.08002e-06 [allreduce_fusion]: 6.06998e-06 [matmul_add_comm_reduction]: 8.40001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 1.041e-05 [virtual_dataset]: 9.20999e-06 [get_grad_eliminate_]: 8.70001e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.68999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.20001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.702e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.433e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 3.654e-05 [flash_sp_send_recv_attached]: 9.79984e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.469e-05 [a_after_grad]: 1.437e-05 [renormalize]: 0.00057244 [add_forward_monad_depend]: 3.88001e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.447e-05 [cse]: 4.321e-05 [a_3]: 6.525e-05 [Cycle 3]: 0.00096042, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 1.051e-05 [loop_unroll]: 8.88002e-06 [a_1]: 0.00024982 [with_stream_mark]: 9.64999e-06 [recompute_prepare]: 9.68997e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 5.87999e-06 
[parameter_eliminate]: 1.19e-06 [a_2]: 0.00012779 [accelerated_algorithm]: 1.21e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 7.57002e-06 [parallel]: 4.4e-06 [flash_sp]: 1.04e-06 [merge_comm]: 5.05001e-06 [allreduce_fusion]: 4.89998e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.43999e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 8.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.633e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.365e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 1.325e-05 [a_after_grad]: 1.388e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.157e-05 [cse]: 2.771e-05 [a_3]: 5.916e-05 [py_interpret_to_execute_after_opt_a]: 1.02e-05 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 4.603e-05 [convert_after_rewriter]: 8.58001e-06 [order_py_execute_after_rewriter]: 6.56999e-06 [mutable_eliminate]: 0.00045641 [opt_b]: 0.00028624, [1] [Cycle 1]: 0.00028015, [7] [b_1]: 0.00018913 [b_2]: 1.078e-05 [updatestate_depend_eliminate]: 6.88e-06 [updatestate_assign_eliminate]: 4.24002e-06 [updatestate_loads_eliminate]: 4.11001e-06 [renormalize]: 5.19998e-07 [cse]: 3.077e-05 [optimize_parallel_all_gather_comm]: 1.978e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 1.903e-05 [loop_unroll]: 0.00042186 [opt_after_cconv]: 0.00013452, [1] [Cycle 1]: 0.0001286, [7] [c_1]: 4.8e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 6.95998e-06 [updatestate_assign_eliminate]: 4.18999e-06 
[updatestate_loads_eliminate]: 3.9e-06 [cse]: 2.999e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.904e-05 [tuple_transform]: 0.00010055, [1] [Cycle 1]: 9.612e-05, [4] [d_1]: 6.572e-05 [none_parameter_eliminate]: 1.96e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 5.814e-05 [cse_after_recomputation]: 3.199e-05, [1] [Cycle 1]: 2.737e-05, [1] [cse]: 2.176e-05 [environ_conv]: 9.29e-06 [swap_dp_allreduce_reducescatter]: 7.61001e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.60001e-06 [slice_recompute_activation]: 2.45002e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 8.39995e-07 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61998e-06 [control_data_broadcast_order]: 1.699e-05 [grouped_pairwise_exchange_alltoall]: 1.67001e-06 [offloading_packed_experts]: 4.95001e-06 [overlap_recompute_and_grad_model_parallel]: 5.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 4.91002e-06 [overlap_grad_flash_sp]: 2.383e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 9.752e-05, [1] [Cycle 1]: 9.337e-05, [6] [build]: 9.47999e-06 [elim_shapecalc]: 1.32e-05 [elim_not_effective]: 1.793e-05 [opt_reshape]: 9.77001e-06 [fold_const_symbol]: 1.508e-05 [renormalize]: 
1.80007e-07 [detach_backward]: 1.73002e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 2.478e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00046674 [validate]: 4.581e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.209078 [execute]: 9.07999e-06 Sums bootstrap : 0.000513s : 0.22% type_inference : 0.010701s : 4.59% event_method : 0.000042s : 0.02% auto_monad : 0.000116s : 0.05% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000073s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.02% optimize.rewriter_before_opt_a : 0.000128s : 0.05% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000124s : 0.05% optimize.opt_a.loop_unroll : 0.000108s : 0.05% optimize.opt_a.a_1 : 0.003113s : 1.34% optimize.opt_a.with_stream_mark : 0.000044s : 0.02% optimize.opt_a.recompute_prepare : 0.000043s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.01% optimize.opt_a.merge_send_recv : 0.000030s : 0.01% optimize.opt_a.auto_parallel : 0.000025s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.01% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.02% optimize.opt_a.virtual_dataset : 0.000033s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.01% optimize.opt_a.virtual_output : 0.000032s : 0.01% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001464s : 0.63% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.04% optimize.opt_a.a_after_grad : 0.000110s : 0.05% optimize.opt_a.renormalize : 0.003088s : 1.32% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.04% optimize.opt_a.cse : 0.000241s : 0.10% optimize.opt_a.a_3 : 0.000462s : 0.20% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.02% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000456s : 0.20% optimize.opt_b.b_1 : 0.000189s : 0.08% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.01% optimize.loop_unroll : 0.000422s : 0.18% optimize.opt_after_cconv.c_1 : 0.000048s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.01% optimize.tuple_transform.d_1 : 0.000066s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.02% optimize.cse_after_recomputation.cse : 0.000022s : 0.01% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.20% validate : 0.000046s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.209078s : 89.66% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000727 218 6.00% : 0.000044s : 11: substitution.arithmetic_simplify 1.91% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.31% : 0.000002s : 2: substitution.incorporate_call_switch 54.78% : 0.000398s : 16: substitution.inline 2.07% : 0.000015s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.79% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.89% : 0.000014s : 20: substitution.remove_not_recompute_node 3.29% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.80% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.40% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.45% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010632 2 87.31% : 0.009282s : 1: type_inference.infer 12.69% : 0.001349s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.32% : 0.000119s : 16: replace.inline 40.68% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000421 30 92.75% : 0.000390s : 16: match.inline 7.25% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 67: predicate.reduce_eliminate 2.70% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.33% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.83% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001553 32 57.21% : 0.000889s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.79% : 0.000665s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.261943 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.17% : 0.003068s : 1: add_attr 1.17% : 0.003059s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000123s : 1: auto_monad 0.01% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.21% : 0.000548s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.01% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.02% : 0.000049s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.16% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.18% : 0.000465s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 1.82% : 0.004771s : 117: opt.transform.opt_a 0.02% : 0.000047s : 1: opt.transform.opt_after_cconv 0.01% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000175s : 28: opt.transform.opt_b 0.03% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000053s : 4: opt.transform.symbol_engine_opt 4.18% : 0.010951s : 1: opt_a 0.05% : 0.000138s : 1: opt_after_cconv 0.18% : 0.000476s : 1: opt_after_jit_grad 0.11% : 0.000290s : 1: opt_b 5.03% : 0.013177s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000052s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000033s : 1: remove_dup_value 0.62% : 0.001622s : 2: renormalize.infer 0.55% : 0.001453s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000050s : 1: rewriter_after_opt_a 0.05% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000100s : 1: symbol_engine_optimizer 79.83% : 0.209102s : 1: task_emit 0.04% : 0.000104s : 1: 
tuple_transform 4.09% : 0.010716s : 1: type_inference 0.03% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x9-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x0-pynative],max_mem:6.0M TotalTime = 0.0221122, [24] [bootstrap]: 0.00052609 [type_inference]: 0.00619802 [event_method]: 1.444e-05 [auto_monad]: 5.277e-05 [graph_reusing]: 5.30001e-06 [inline]: 1.74998e-06 [add_attr]: 0.00392805, [1] [add_attr_with_inline]: 0.0039174, [1] [Cycle 1]: 4.715e-05, [2] [tag_attr]: 1.621e-05 [meta_addattr_fg_expand]: 4.35999e-06 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 2.808e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00419351, [53] [py_interpret_to_execute]: 2.121e-05 [rewriter_before_opt_a]: 6.041e-05 [opt_a]: 0.00220743, [2] [Cycle 1]: 0.00161146, [45] [expand_dump_flag]: 3.2e-06 [switch_simplify]: 3.147e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00048391 [with_stream_mark]: 1.426e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.48e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 7.28e-06 [auto_parallel]: 6.25997e-06 [parallel]: 2.402e-05 [flash_sp]: 7.00998e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 8.87e-06 [allreduce_slice_to_reducescatter]: 1.04998e-06 [virtual_shard_identity]: 7.42002e-06 [virtual_dataset]: 
6.17999e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.62999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73999e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.25002e-06 [after_resolve]: 1.023e-05 [a_after_grad]: 8.55001e-06 [renormalize]: 0.00048309 [add_forward_monad_depend]: 4.61002e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.672e-05 [a_3]: 4.11e-05 [Cycle 2]: 0.00058642, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.000125 [with_stream_mark]: 9.17999e-06 [recompute_prepare]: 5.56998e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.72e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.10998e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.07e-06 [flash_sp]: 2.94999e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.58998e-06 [matmul_add_comm_reduction]: 5.29e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.89999e-06 [virtual_dataset]: 5.10001e-06 [get_grad_eliminate_]: 4.87998e-06 [virtual_output]: 4.77998e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.78002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.35001e-06 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 7.83999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.35001e-06 [a_after_grad]: 8.20999e-06 
[renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.03998e-06 [cse]: 1.335e-05 [a_3]: 3.097e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 2.959e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00045425 [opt_b]: 0.00018092, [1] [Cycle 1]: 0.00017474, [7] [b_1]: 0.00010818 [b_2]: 7e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.00002e-07 [cse]: 1.598e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.324e-05 [loop_unroll]: 0.00041875 [opt_after_cconv]: 9.43e-05, [1] [Cycle 1]: 8.858e-05, [7] [c_1]: 2.769e-05 [parameter_eliminate]: 2.04e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.555e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.344e-05 [tuple_transform]: 6.962e-05, [1] [Cycle 1]: 6.52e-05, [4] [d_1]: 3.936e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.07001e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.67e-05 [cse_after_recomputation]: 2.093e-05, [1] [Cycle 1]: 1.65e-05, [1] [cse]: 1.125e-05 [environ_conv]: 5.02999e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.14002e-06 [label_fine_grained_interleaved_index]: 2.68998e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.02e-06 
[add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 0.00012573 [overlap_recompute_and_grad_model_parallel]: 5.89999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.70001e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.788e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 7.576e-05, [1] [Cycle 1]: 7.084e-05, [6] [build]: 2.68998e-06 [elim_shapecalc]: 1.029e-05 [elim_not_effective]: 1.282e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 9.44998e-06 [renormalize]: 3.09985e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.726e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 0.00017916 [opt_after_jit_grad]: 0.00046042 [validate]: 3.107e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00624946 [execute]: 7.05e-06 Sums bootstrap : 0.000526s : 3.06% type_inference : 0.006198s : 36.02% event_method : 0.000014s : 0.08% auto_monad : 0.000053s : 0.31% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 
0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000609s : 3.54% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000142s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000011s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000483s : 2.81% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000040s : 0.23% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000454s : 2.64% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000419s : 2.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000126s : 0.73% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000179s : 1.04% opt_after_jit_grad : 0.000460s : 2.68% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006249s : 36.32% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000193 30 12.68% : 0.000025s : 5: substitution.arithmetic_simplify 0.94% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000002s : 2: substitution.fold_const_symbol 2.68% : 0.000005s : 4: substitution.graph_param_transform 71.44% : 0.000138s : 3: substitution.inline 1.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.30% : 0.000004s : 4: substitution.remove_not_recompute_node 2.03% : 0.000004s : 4: substitution.replace_old_param 5.70% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006152 2 90.41% : 0.005562s : 1: type_inference.infer 9.59% : 0.000590s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.62% : 0.000028s : 3: replace.inline 28.38% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000146 5 93.12% : 0.000136s : 3: match.inline 6.88% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.33% : 0.000004s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.89% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.05% : 0.000002s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000384 8 45.91% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.09% : 0.000207s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031843 196 0.01% : 0.000004s : 1: ForceFp32Comm 12.35% : 0.003932s : 1: add_attr 12.31% : 0.003921s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000058s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000563s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.34% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.45% : 0.000463s : 1: mutable_eliminate 0.42% : 0.000132s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.05% : 0.000972s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000035s : 4: opt.transform.symbol_engine_opt 6.94% : 0.002210s : 1: opt_a 0.31% : 0.000098s : 1: opt_after_cconv 1.48% : 0.000470s : 1: opt_after_jit_grad 0.58% : 0.000184s : 1: opt_b 13.18% : 0.004197s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.80% : 0.000255s : 1: renormalize.infer 0.69% : 0.000221s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.58% : 0.000185s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000078s : 1: symbol_engine_optimizer 19.66% : 0.006260s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 19.51% : 0.006211s : 1: type_inference 0.19% : 0.000061s : 1: validate TotalTime = 0.066838, [24] [bootstrap]: 0.00045459 [type_inference]: 0.0043814 [event_method]: 1.059e-05 [auto_monad]: 5.068e-05 [graph_reusing]: 4.92e-06 [inline]: 1.90001e-06 [add_attr]: 0.0312396, [1] [add_attr_with_inline]: 0.03123, [1] [Cycle 1]: 5.634e-05, [2] [tag_attr]: 1.385e-05 [meta_addattr_fg_expand]: 3.31999e-06 [parallel-infer-symbol]: 3.48999e-06 [pre_auto_parallel]: 2.355e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00389314, [53] [py_interpret_to_execute]: 1.599e-05 [rewriter_before_opt_a]: 4.156e-05 [opt_a]: 0.00206139, [2] [Cycle 1]: 0.00139019, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 2.549e-05 [loop_unroll]: 1.353e-05 [a_1]: 0.00030501 [with_stream_mark]: 1.439e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.48002e-06 [a_2]: 7.588e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 2.25002e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 7.68999e-06 [auto_parallel]: 5.99e-06 [parallel]: 1.809e-05 [flash_sp]: 7.99002e-06 [merge_comm]: 3.33e-06 [allreduce_fusion]: 3.74002e-06 [matmul_add_comm_reduction]: 9.15001e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.70001e-06 [get_grad_eliminate_]: 5.76998e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.77001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.45002e-06 [receive_attached]: 2.41e-06 
[after_resolve]: 1.086e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00045846 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.287e-05 [cse]: 2.737e-05 [a_3]: 4.014e-05 [Cycle 2]: 0.00066155, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00018829 [with_stream_mark]: 9.89999e-06 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.917e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.39998e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.18999e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 3.70026e-07 [virtual_shard_identity]: 6.68e-06 [virtual_dataset]: 5.18002e-06 [get_grad_eliminate_]: 5.11997e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.94001e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.78002e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.13001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.63997e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.23002e-06 [after_resolve]: 9.28002e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.80013e-07 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 6.79999e-06 [cse]: 1.438e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.038e-05 [convert_after_rewriter]: 7.39002e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00046191 [opt_b]: 0.00018029, [1] [Cycle 1]: 0.00017397, [7] 
[b_1]: 0.00010679 [b_2]: 6.84999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.80009e-07 [cse]: 1.621e-05 [optimize_parallel_all_gather_comm]: 1.615e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.126e-05 [loop_unroll]: 0.00042301 [opt_after_cconv]: 9.538e-05, [1] [Cycle 1]: 8.937e-05, [7] [c_1]: 2.817e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.586e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.224e-05 [tuple_transform]: 6.838e-05, [1] [Cycle 1]: 6.408e-05, [4] [d_1]: 3.885e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.20002e-06 [partial_unused_args_eliminate]: 1.58002e-06 [add_recomputation]: 4.345e-05 [cse_after_recomputation]: 2.073e-05, [1] [Cycle 1]: 1.626e-05, [1] [cse]: 1.113e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.36002e-06 [bias_add_comm_swap]: 3.11999e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.26998e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.15e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.97e-06 [overlap_recompute_and_grad_model_parallel]: 4.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.66e-06 [overlap_grad_ring_attention]: 4.00998e-06 [overlap_grad_flash_sp]: 1.741e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.876e-05, [1] [Cycle 1]: 6.437e-05, [6] [build]: 2.54999e-06 [elim_shapecalc]: 8.01001e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 5.98002e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.66e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.45998e-06 [opt_after_jit_grad]: 0.00045546 [validate]: 3.135e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0260443 [execute]: 7.11999e-06 Sums bootstrap : 0.000455s : 1.31% type_inference : 0.004381s : 12.65% event_method : 0.000011s : 0.03% auto_monad : 0.000051s : 0.15% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000024s : 0.07% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.05% optimize.rewriter_before_opt_a : 0.000042s : 0.12% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.09% optimize.opt_a.loop_unroll : 0.000019s : 0.06% optimize.opt_a.a_1 : 0.000493s : 1.42% optimize.opt_a.with_stream_mark : 0.000024s : 0.07% optimize.opt_a.recompute_prepare : 0.000013s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.02% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.02% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.42% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.03% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.03% optimize.opt_a.merge_send_recv : 0.000012s : 0.03% optimize.opt_a.auto_parallel : 0.000011s : 0.03% optimize.opt_a.parallel : 0.000022s : 0.06% optimize.opt_a.flash_sp : 0.000011s : 0.03% optimize.opt_a.merge_comm : 0.000006s : 0.02% optimize.opt_a.allreduce_fusion : 0.000006s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.04% optimize.opt_a.virtual_dataset : 0.000011s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.03% optimize.opt_a.virtual_output : 0.000011s : 0.03% optimize.opt_a.merge_forward : 0.000007s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.02% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.06% optimize.opt_a.a_after_grad : 0.000017s : 0.05% optimize.opt_a.renormalize : 0.000459s : 1.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.02% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.06% optimize.opt_a.cse : 0.000042s : 0.12% optimize.opt_a.a_3 : 0.000072s : 0.21% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.09% optimize.convert_after_rewriter : 0.000007s : 0.02% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000462s : 1.33% optimize.opt_b.b_1 : 0.000107s : 0.31% optimize.opt_b.b_2 : 0.000007s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.05% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000423s : 1.22% optimize.opt_after_cconv.c_1 : 0.000028s : 0.08% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.05% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.04% optimize.tuple_transform.d_1 : 0.000039s : 0.11% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.13% optimize.cse_after_recomputation.cse : 0.000011s : 0.03% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.05% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.05% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000455s : 1.32% validate : 0.000031s : 0.09% backend_pass : 0.000001s : 0.00% task_emit : 0.026044s : 75.21% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 0.000129 26 18.13% : 0.000023s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 4.15% : 0.000005s : 4: substitution.graph_param_transform 67.05% : 0.000087s : 2: substitution.inline 2.10% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.35% : 0.000004s : 4: substitution.remove_not_recompute_node 2.90% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004341 2 92.00% : 0.003994s : 1: type_inference.infer 8.00% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000085 2 100.00% : 0.000085s : 2: match.inline ------[predicate.] 
0.000137 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.50% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 1.98% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 1.01% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.41% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.87% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.84% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 40.93% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.07% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.103430 196 0.00% : 0.000004s : 1: ForceFp32Comm 30.21% : 0.031245s : 1: add_attr 30.20% : 0.031234s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.47% : 0.000490s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.01% : 0.000013s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.42% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.82% : 0.000848s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.00% : 0.002064s : 1: opt_a 0.10% : 0.000099s : 1: opt_after_cconv 0.45% : 0.000465s : 1: opt_after_jit_grad 0.18% : 0.000184s : 1: opt_b 3.77% : 0.003897s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000266s : 1: renormalize.infer 0.18% : 0.000186s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000046s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000071s : 1: symbol_engine_optimizer 25.19% : 0.026056s : 1: task_emit 0.07% : 0.000071s : 1: 
tuple_transform 4.25% : 0.004395s : 1: type_inference 0.06% : 0.000059s : 1: validate TotalTime = 0.0553372, [24] [bootstrap]: 0.00051034 [type_inference]: 0.00555608 [event_method]: 1.368e-05 [auto_monad]: 5.638e-05 [graph_reusing]: 5.58997e-06 [inline]: 1.72001e-06 [add_attr]: 0.00299454, [1] [add_attr_with_inline]: 0.00298712, [1] [Cycle 1]: 4.427e-05, [2] [tag_attr]: 1.521e-05 [meta_addattr_fg_expand]: 3.7e-06 [parallel-infer-symbol]: 2.66999e-06 [pre_auto_parallel]: 2.494e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.0394181, [53] [py_interpret_to_execute]: 2.015e-05 [rewriter_before_opt_a]: 5.955e-05 [opt_a]: 0.0372158, [2] [Cycle 1]: 0.0366, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 3.129e-05 [loop_unroll]: 2.085e-05 [a_1]: 0.0004421 [with_stream_mark]: 1.284e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.50999e-06 [a_2]: 0.0346642 [accelerated_algorithm]: 1.429e-05 [shard]: 4.99e-06 [meta_shard_fg_expand]: 3.31001e-06 [shard_inline]: 7.63999e-06 [merge_send_recv]: 1.753e-05 [auto_parallel]: 1.497e-05 [parallel]: 2.227e-05 [flash_sp]: 1.378e-05 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 1.24e-05 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 8.65001e-06 [virtual_dataset]: 6.53e-06 [get_grad_eliminate_]: 5.85002e-06 [virtual_output]: 6.04001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 2.83998e-06 [offload_activation]: 1.025e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.305e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 1.04e-05 [set_forward_comm_id_for_comm_node_pass]: 3.28998e-06 [meta_fg_expand]: 3.61001e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.48998e-06 
[after_resolve]: 1.172e-05 [a_after_grad]: 9.32001e-06 [renormalize]: 0.00080306 [add_forward_monad_depend]: 6.64999e-06 [auto_monad_grad]: 2.34001e-06 [auto_monad_eliminator]: 1.465e-05 [cse]: 2.92e-05 [a_3]: 4.269e-05 [Cycle 2]: 0.00060281, [45] [expand_dump_flag]: 1.97001e-06 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00013053 [with_stream_mark]: 1.283e-05 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 2.84999e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 6.769e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.22999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.51002e-06 [merge_send_recv]: 4.23001e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.26001e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.65997e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 5.57999e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.88e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.38998e-06 [cse]: 1.555e-05 [a_3]: 3.221e-05 [py_interpret_to_execute_after_opt_a]: 1.16e-05 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 3.571e-05 [convert_after_rewriter]: 6.76999e-06 [order_py_execute_after_rewriter]: 4.63001e-06 [mutable_eliminate]: 0.00071733 [opt_b]: 0.00018492, [1] [Cycle 1]: 0.00017797, 
[7] [b_1]: 0.0001104 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 3.19997e-07 [cse]: 1.647e-05 [optimize_parallel_all_gather_comm]: 1.492e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.326e-05 [loop_unroll]: 0.00041719 [opt_after_cconv]: 9.577e-05, [1] [Cycle 1]: 8.995e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.06998e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.619e-05 [renormalize]: 6.30011e-07 [remove_dup_value]: 1.291e-05 [tuple_transform]: 0.00012625, [1] [Cycle 1]: 0.00012168, [4] [d_1]: 3.833e-05 [none_parameter_eliminate]: 5.688e-05 [renormalize]: 2.60014e-07 [switch_simplify]: 6.86999e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 6.264e-05 [cse_after_recomputation]: 2.173e-05, [1] [Cycle 1]: 1.709e-05, [1] [cse]: 1.174e-05 [environ_conv]: 5.47001e-06 [swap_dp_allreduce_reducescatter]: 5.22e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.12998e-06 [label_fine_grained_interleaved_index]: 2.63003e-06 [merge_cast_opt]: 1.34003e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.10002e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 8.60018e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.44999e-06 [reorder_send_recv_between_fp_bp]: 2.45002e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 1.25999e-06 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 9.90025e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 2.02999e-06 [offloading_packed_experts]: 4.12e-06 [overlap_recompute_and_grad_model_parallel]: 4.41002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.40999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 3.80998e-06 [overlap_grad_flash_sp]: 1.832e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 1.21997e-06 [symbol_engine_optimizer]: 7.133e-05, [1] [Cycle 1]: 6.7e-05, [6] [build]: 2.54999e-06 [elim_shapecalc]: 8.62998e-06 [elim_not_effective]: 1.209e-05 [opt_reshape]: 6.67002e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 2.50002e-07 [detach_backward]: 2.15002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.628e-05 [get_jit_bprop_graph]: 1.83002e-06 [rewriter_after_jit_bprop_graph]: 3.98001e-06 [opt_after_jit_grad]: 0.00045153 [validate]: 3.295e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.0060296 [execute]: 6.60002e-06 Sums bootstrap : 0.000510s : 0.99% type_inference : 0.005556s : 10.83% event_method : 0.000014s : 0.03% auto_monad : 0.000056s : 0.11% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000060s : 0.12% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000573s : 1.12% optimize.opt_a.with_stream_mark : 0.000026s : 0.05% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.034732s : 67.69% optimize.opt_a.accelerated_algorithm : 0.000020s : 0.04% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000013s : 0.03% optimize.opt_a.merge_send_recv : 0.000022s : 0.04% optimize.opt_a.auto_parallel : 0.000020s : 0.04% optimize.opt_a.parallel : 0.000027s : 0.05% optimize.opt_a.flash_sp : 0.000017s : 0.03% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000803s : 1.57% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.04% optimize.opt_a.cse : 0.000045s : 0.09% optimize.opt_a.a_3 : 0.000075s : 0.15% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000717s : 1.40% optimize.opt_b.b_1 : 0.000110s : 0.22% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.05% optimize.loop_unroll : 0.000417s : 0.81% optimize.opt_after_cconv.c_1 : 0.000028s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.03% optimize.tuple_transform.d_1 : 0.000038s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000057s : 0.11% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000063s : 0.12% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.04% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000001s : 0.00%
auto_monad_reorder : 0.000016s : 0.03%
get_jit_bprop_graph : 0.000002s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.01%
opt_after_jit_grad : 0.000452s : 0.88%
validate : 0.000033s : 0.06%
backend_pass : 0.000001s : 0.00%
task_emit : 0.006030s : 11.75%
execute : 0.000007s : 0.01%

Time group info:
------[substitution.] 0.000176 30
    20.67% : 0.000036s : 5: substitution.arithmetic_simplify
    1.15% : 0.000002s : 2: substitution.elim_not_effective
    0.80% : 0.000001s : 2: substitution.fold_const_symbol
    3.00% : 0.000005s : 4: substitution.graph_param_transform
    61.02% : 0.000107s : 3: substitution.inline
    1.94% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.68% : 0.000005s : 4: substitution.remove_not_recompute_node
    2.68% : 0.000005s : 4: substitution.replace_old_param
    6.07% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005515 2
    90.07% : 0.004967s : 1: type_inference.infer
    9.93% : 0.000548s : 1: type_inference.specialize
------[replace.] 0.000038 5
    70.68% : 0.000027s : 3: replace.inline
    29.32% : 0.000011s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000115 5
    91.62% : 0.000105s : 3: match.inline
    8.38% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000175 1131 0.91% : 0.000002s : 11: predicate.accumulaten_eliminater 0.80% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 11: predicate.addn_zero_filter 0.72% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 3.07% : 0.000005s : 19: predicate.arithmetic_simplify 0.91% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.00% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.97% : 0.000002s : 15: predicate.environ_get_depend_swap 1.67% : 0.000003s : 23: predicate.environ_get_eliminate 1.01% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.14% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000004s : 16: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 5.58% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.34% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.19% : 0.000004s : 32: predicate.load_eliminater 0.91% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 2.06% : 0.000004s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.00% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.34% : 0.000001s : 4: predicate.parallel_virtual_node 1.48% : 0.000003s : 16: predicate.partial_defer_inline 1.34% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.21% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000000s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.45% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.67% : 0.000001s : 8: predicate.special_op_eliminate 4.26% : 0.000007s : 8: predicate.specialize_transform 0.90% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.33% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 16: predicate.switch_defer_inline 1.83% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.62% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000002s : 11: predicate.transpose_eliminate 1.48% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.23% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.56% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.55% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.14% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000388 8 40.95% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.05% : 0.000229s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.099715 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.01% : 0.002999s : 1: add_attr 3.00% : 0.002991s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000067s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.55% : 0.000544s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.01% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.43% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000726s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.01% : 0.001008s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 37.33% : 0.037219s : 1: opt_a 0.10% : 0.000099s : 1: opt_after_cconv 0.46% : 0.000461s : 1: opt_after_jit_grad 0.19% : 0.000188s : 1: opt_b 39.54% : 0.039423s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.44% : 0.000441s : 1: renormalize.infer 0.35% : 0.000351s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000040s : 1: rewriter_after_opt_a 0.06% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000074s : 1: symbol_engine_optimizer 6.06% : 0.006040s : 1: task_emit 0.13% : 0.000129s : 1: 
tuple_transform 5.59% : 0.005570s : 1: type_inference 0.07% : 0.000065s : 1: validate TotalTime = 0.0724822, [24] [bootstrap]: 0.0004863 [type_inference]: 0.0282818 [event_method]: 5.12e-05 [auto_monad]: 0.00012388 [graph_reusing]: 8.95001e-06 [inline]: 2.29999e-06 [add_attr]: 0.00337543, [1] [add_attr_with_inline]: 0.00336601, [1] [Cycle 1]: 8.879e-05, [2] [tag_attr]: 4.436e-05 [meta_addattr_fg_expand]: 9.22999e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 5.261e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 9.30013e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0146761, [53] [py_interpret_to_execute]: 4.053e-05 [rewriter_before_opt_a]: 0.00016055 [opt_a]: 0.0121996, [3] [Cycle 1]: 0.00787418, [45] [expand_dump_flag]: 5.14e-06 [switch_simplify]: 7.421e-05 [loop_unroll]: 6.134e-05 [a_1]: 0.00149592 [with_stream_mark]: 2.392e-05 [recompute_prepare]: 2.191e-05 [updatestate_depend_eliminate]: 9.42001e-06 [updatestate_assign_eliminate]: 7.29001e-06 [updatestate_loads_eliminate]: 7.14001e-06 [parameter_eliminate]: 2.84999e-06 [a_2]: 0.00024574 [accelerated_algorithm]: 3.192e-05 [shard]: 1.86e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.622e-05 [merge_send_recv]: 1.591e-05 [auto_parallel]: 1.107e-05 [parallel]: 1.843e-05 [flash_sp]: 1.199e-05 [merge_comm]: 9.46998e-06 [allreduce_fusion]: 8.84998e-06 [matmul_add_comm_reduction]: 2.781e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 1.809e-05 [virtual_dataset]: 6.369e-05 [get_grad_eliminate_]: 1.581e-05 [virtual_output]: 1.567e-05 [merge_forward]: 9.57001e-06 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 1.89e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.963e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 2.807e-05 [set_forward_comm_id_for_comm_node_pass]: 9.81998e-06 [meta_fg_expand]: 0.00157425 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 6.236e-05 [a_after_grad]: 8.159e-05 [renormalize]: 0.0029485 [add_forward_monad_depend]: 1.089e-05 [auto_monad_grad]: 5.90002e-06 [auto_monad_eliminator]: 5.807e-05 [cse]: 0.00016767 [a_3]: 0.00034099 [Cycle 2]: 0.00339502, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 4.686e-05 [loop_unroll]: 4.42e-05 [a_1]: 0.00155462 [with_stream_mark]: 1.354e-05 [recompute_prepare]: 1.124e-05 [updatestate_depend_eliminate]: 5.60001e-06 [updatestate_assign_eliminate]: 4.46002e-06 [updatestate_loads_eliminate]: 3.78001e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 0.00012571 [accelerated_algorithm]: 1.363e-05 [shard]: 1.06997e-06 [meta_shard_fg_expand]: 2.41998e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 8.32e-06 [parallel]: 7.33e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 5.12e-06 [matmul_add_comm_reduction]: 9.57001e-06 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 1.049e-05 [virtual_dataset]: 9.22001e-06 [get_grad_eliminate_]: 9.14e-06 [virtual_output]: 8.75001e-06 [merge_forward]: 4.90001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.043e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.653e-05 [merge_recompute_call_nodes]: 8.30012e-07 [before_grad]: 1.459e-05 [set_forward_comm_id_for_comm_node_pass]: 5.56e-06 [meta_fg_expand]: 0.00013033 [flash_sp_send_recv_attached]: 1.20999e-06 [receive_attached]: 1.59e-06 [after_resolve]: 1.685e-05 [a_after_grad]: 1.46e-05 [renormalize]: 0.0008646 [add_forward_monad_depend]: 4.63001e-06 [auto_monad_grad]: 1.29998e-06 [auto_monad_eliminator]: 1.593e-05 [cse]: 5.091e-05 [a_3]: 6.681e-05 [Cycle 3]: 0.00091493, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 1.07e-05 [loop_unroll]: 9.31e-06 [a_1]: 0.00025085 [with_stream_mark]: 1.118e-05 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.74002e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 3.70998e-06 
[parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012308 [accelerated_algorithm]: 1.235e-05 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.20001e-06 [merge_send_recv]: 7.65e-06 [auto_parallel]: 7.43999e-06 [parallel]: 5.29e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 4.92999e-06 [allreduce_fusion]: 5.14998e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.073e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.62e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.34997e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 8.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25001e-06 [meta_fg_expand]: 3.24001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.24e-06 [after_resolve]: 1.548e-05 [a_after_grad]: 1.499e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 1.184e-05 [cse]: 2.697e-05 [a_3]: 5.951e-05 [py_interpret_to_execute_after_opt_a]: 1.406e-05 [slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 4.847e-05 [convert_after_rewriter]: 9.64999e-06 [order_py_execute_after_rewriter]: 7.07002e-06 [mutable_eliminate]: 0.00057499 [opt_b]: 0.00029209, [1] [Cycle 1]: 0.00028566, [7] [b_1]: 0.00018992 [b_2]: 1.102e-05 [updatestate_depend_eliminate]: 7.66001e-06 [updatestate_assign_eliminate]: 3.99002e-06 [updatestate_loads_eliminate]: 4.23001e-06 [renormalize]: 5.10016e-07 [cse]: 3.331e-05 [optimize_parallel_all_gather_comm]: 2.116e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.146e-05 [loop_unroll]: 0.00044604 [opt_after_cconv]: 0.00013857, [1] [Cycle 1]: 0.00013237, [7] [c_1]: 4.887e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 7.49002e-06 [updatestate_assign_eliminate]: 4.37e-06 
[updatestate_loads_eliminate]: 3.97e-06 [cse]: 3.041e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 3.117e-05 [tuple_transform]: 0.00010366, [1] [Cycle 1]: 9.802e-05, [4] [d_1]: 6.715e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 3.09985e-07 [switch_simplify]: 9.74e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 6.203e-05 [cse_after_recomputation]: 3.32e-05, [1] [Cycle 1]: 2.815e-05, [1] [cse]: 2.258e-05 [environ_conv]: 9.12001e-06 [swap_dp_allreduce_reducescatter]: 8.06001e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.38002e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.32999e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.31998e-06 [interleave_parallel_branches]: 9.90025e-07 [overlap_opt_shard_in_pipeline]: 1.06002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.781e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 5.25999e-06 [overlap_recompute_and_grad_model_parallel]: 5.67001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 1.146e-05 [overlap_grad_ring_attention]: 6.27001e-06 [overlap_grad_flash_sp]: 2.533e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 0.00010628, [1] [Cycle 1]: 0.00010164, [6] [build]: 1.078e-05 [elim_shapecalc]: 1.504e-05 [elim_not_effective]: 1.864e-05 [opt_reshape]: 1.134e-05 [fold_const_symbol]: 1.589e-05 [renormalize]: 
1.8999e-07 [detach_backward]: 2.19999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.568e-05 [get_jit_bprop_graph]: 1.82999e-06 [rewriter_after_jit_bprop_graph]: 3.75998e-06 [opt_after_jit_grad]: 0.00049501 [validate]: 4.807e-05 [backend_pass]: 9.90025e-07 [task_emit]: 0.0245917 [execute]: 8.57e-06 Sums bootstrap : 0.000486s : 0.72% type_inference : 0.028282s : 41.72% event_method : 0.000051s : 0.08% auto_monad : 0.000124s : 0.18% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000044s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000053s : 0.08% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.06% optimize.rewriter_before_opt_a : 0.000161s : 0.24% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.19% optimize.opt_a.loop_unroll : 0.000115s : 0.17% optimize.opt_a.a_1 : 0.003301s : 4.87% optimize.opt_a.with_stream_mark : 0.000049s : 0.07% optimize.opt_a.recompute_prepare : 0.000042s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.02% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000495s : 0.73% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.09% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.05% optimize.opt_a.merge_send_recv : 0.000032s : 0.05% optimize.opt_a.auto_parallel : 0.000027s : 0.04% optimize.opt_a.parallel : 0.000031s : 0.05% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.03% optimize.opt_a.allreduce_fusion : 0.000019s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.06% optimize.opt_a.virtual_dataset : 0.000082s : 0.12% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.05% optimize.opt_a.virtual_output : 0.000033s : 0.05% optimize.opt_a.merge_forward : 0.000019s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000038s : 0.06% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.09% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.03% optimize.opt_a.meta_fg_expand : 0.001708s : 2.52% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.14% optimize.opt_a.a_after_grad : 0.000111s : 0.16% optimize.opt_a.renormalize : 0.003813s : 5.62% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.13% optimize.opt_a.cse : 0.000246s : 0.36% optimize.opt_a.a_3 : 0.000467s : 0.69% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.07% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000575s : 0.85% optimize.opt_b.b_1 : 0.000190s : 0.28% optimize.opt_b.b_2 : 0.000011s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.05% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000446s : 0.66% optimize.opt_after_cconv.c_1 : 0.000049s : 0.07% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.04% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000031s : 0.05% optimize.tuple_transform.d_1 : 0.000067s : 0.10% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.09% optimize.cse_after_recomputation.cse : 0.000023s : 0.03% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.03% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000011s : 0.02% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.04% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.04% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000495s : 0.73% validate : 0.000048s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.024592s : 36.28% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000840 222 6.08% : 0.000051s : 12: substitution.arithmetic_simplify 1.94% : 0.000016s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000003s : 5: substitution.fold_const_symbol 0.89% : 0.000007s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 57.60% : 0.000484s : 17: substitution.inline 2.00% : 0.000017s : 2: substitution.inline_without_move 1.28% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000017s : 3: substitution.less_batch_normalization 1.65% : 0.000014s : 11: substitution.minmaximum_grad 0.72% : 0.000006s : 5: substitution.partial_eliminate 1.63% : 0.000014s : 20: substitution.remove_not_recompute_node 2.96% : 0.000025s : 10: substitution.replace_applicator 1.35% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.48% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.61% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.18% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.89% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.16% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.028195 2 36.97% : 0.010424s : 1: type_inference.infer 63.03% : 0.017771s : 1: type_inference.specialize ------[replace.] 0.000225 33 57.69% : 0.000130s : 17: replace.inline 42.31% : 0.000095s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000509 33 93.32% : 0.000475s : 17: match.inline 6.68% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000757 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000016s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.54% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.71% : 0.000005s : 32: predicate.less_batch_normalization 
1.69% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.73% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.09% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.96% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.11% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.72% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.60% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.017801 34 5.35% : 0.000953s : 13: func_graph_cloner_run.FuncGraphClonerGraph 94.65% : 0.016848s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.099699 237 0.00% : 0.000004s : 1: ForceFp32Comm 3.39% : 0.003380s : 1: add_attr 3.38% : 0.003370s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.13% : 0.000131s : 1: auto_monad 0.03% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000521s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.04% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.06% : 0.000060s : 1: event_method 0.01% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.46% : 0.000456s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000585s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000018s : 1: opt.transform.mutable_eliminate 5.05% : 0.005031s : 117: opt.transform.opt_a 0.05% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.18% : 0.000176s : 28: opt.transform.opt_b 0.08% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000057s : 4: opt.transform.symbol_engine_opt 12.24% : 0.012203s : 1: opt_a 0.14% : 0.000142s : 1: opt_after_cconv 0.51% : 0.000505s : 1: opt_after_jit_grad 0.30% : 0.000296s : 1: opt_b 14.72% : 0.014681s : 1: optimize 0.03% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.03% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000015s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.06% : 0.000058s : 1: pre_auto_parallel 0.04% : 0.000045s : 1: py_interpret_to_execute 0.02% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.04% : 0.000035s : 1: remove_dup_value 2.10% : 0.002096s : 2: renormalize.infer 1.71% : 0.001703s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000053s : 1: rewriter_after_opt_a 0.17% : 0.000165s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000109s : 1: symbol_engine_optimizer 24.68% : 0.024605s : 1: task_emit 0.11% : 0.000107s : 1: 
tuple_transform 28.39% : 0.028302s : 1: type_inference 0.09% : 0.000086s : 1: validate TotalTime = 0.0184879, [24] [bootstrap]: 0.00044344 [type_inference]: 0.00427634 [event_method]: 1.087e-05 [auto_monad]: 5.097e-05 [graph_reusing]: 5.27001e-06 [inline]: 2.10002e-06 [add_attr]: 0.00320791, [1] [add_attr_with_inline]: 0.00319828, [1] [Cycle 1]: 0.00028449, [2] [tag_attr]: 1.274e-05 [meta_addattr_fg_expand]: 0.0002275 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.26e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 1.07e-06 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00367709, [53] [py_interpret_to_execute]: 1.688e-05 [rewriter_before_opt_a]: 4.084e-05 [opt_a]: 0.00186929, [2] [Cycle 1]: 0.00127598, [45] [expand_dump_flag]: 2.53998e-06 [switch_simplify]: 2.477e-05 [loop_unroll]: 1.423e-05 [a_1]: 0.00029416 [with_stream_mark]: 1.384e-05 [recompute_prepare]: 7.33999e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.8e-05 [accelerated_algorithm]: 6.74999e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.71001e-06 [auto_parallel]: 5.92999e-06 [parallel]: 1.739e-05 [flash_sp]: 7.26001e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.71001e-06 [matmul_add_comm_reduction]: 9.40001e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 6.07001e-06 [virtual_output]: 6.47001e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.19003e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 9.25999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 
2.21e-06 [after_resolve]: 1.06e-05 [a_after_grad]: 8.67e-06 [renormalize]: 0.00035503 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.291e-05 [cse]: 2.733e-05 [a_3]: 3.963e-05 [Cycle 2]: 0.00058329, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.71e-06 [loop_unroll]: 5.22999e-06 [a_1]: 0.00012419 [with_stream_mark]: 8.89e-06 [recompute_prepare]: 5.66998e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.74e-05 [accelerated_algorithm]: 5.49998e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.06002e-06 [parallel]: 4.31002e-06 [flash_sp]: 3.9e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 4.83001e-06 [virtual_output]: 4.77e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 5.71003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36998e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 8.74003e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 5.92999e-06 [cse]: 1.281e-05 [a_3]: 3.094e-05 [py_interpret_to_execute_after_opt_a]: 7.18e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.065e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.0004575 [opt_b]: 0.00017813, [1] [Cycle 1]: 0.00017227, [7] [b_1]: 
0.00010687 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.16998e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.30008e-07 [cse]: 1.528e-05 [optimize_parallel_all_gather_comm]: 1.569e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.18e-05 [loop_unroll]: 0.00041126 [opt_after_cconv]: 9.44e-05, [1] [Cycle 1]: 8.857e-05, [7] [c_1]: 2.729e-05 [parameter_eliminate]: 2.12001e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.639e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.859e-05, [1] [Cycle 1]: 6.425e-05, [4] [d_1]: 3.86e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 2.17001e-06 [add_recomputation]: 4.396e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.626e-05, [1] [cse]: 1.111e-05 [environ_conv]: 4.27e-06 [swap_dp_allreduce_reducescatter]: 5.47999e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.32998e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.03997e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 5.02e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.71998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.45002e-06 [overlap_grad_ring_attention]: 3.85998e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 2.07001e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 6.712e-05, [1] [Cycle 1]: 6.312e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.08999e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.607e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.98999e-06 [opt_after_jit_grad]: 0.00044348 [validate]: 3.042e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00606747 [execute]: 6.87002e-06 Sums bootstrap : 0.000443s : 3.05% type_inference : 0.004276s : 29.41% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000227s : 1.56% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.12% optimize.rewriter_before_opt_a : 0.000041s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000418s : 2.88% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000355s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000457s : 3.15% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000411s : 2.83% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000443s : 3.05% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006067s : 41.73% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 17.88% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.40% : 0.000005s : 4: substitution.graph_param_transform 66.14% : 0.000080s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000004s : 4: substitution.remove_not_recompute_node 3.11% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004234 2 91.85% : 0.003889s : 1: type_inference.infer 8.15% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000141 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.74% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.79% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.99% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.97% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.95% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.67% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 1.08% : 0.000002s : 8: predicate.special_op_eliminate 1.07% : 0.000002s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.76% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.97% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.06% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 42.13% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.87% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026656 196 0.01% : 0.000003s : 1: ForceFp32Comm 12.05% : 0.003213s : 1: add_attr 12.01% : 0.003202s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.79% : 0.000477s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.08% : 0.000021s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.89% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.02% : 0.001872s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.70% : 0.000453s : 1: opt_after_jit_grad 0.68% : 0.000181s : 1: opt_b 13.81% : 0.003681s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000006s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000196s : 1: renormalize.infer 0.58% : 0.000153s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 22.80% : 0.006077s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.09% : 0.004290s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.0361568, [24] [bootstrap]: 0.00047476 [type_inference]: 0.0103722 [event_method]: 4.119e-05 [auto_monad]: 0.00011447 [graph_reusing]: 8.19002e-06 [inline]: 2.14e-06 [add_attr]: 0.00304144, [1] [add_attr_with_inline]: 0.00303265, [1] [Cycle 1]: 6.806e-05, [2] [tag_attr]: 3.223e-05 [meta_addattr_fg_expand]: 8.69e-06 [parallel-infer-symbol]: 3.06999e-06 [pre_auto_parallel]: 4.546e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.69e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.013102, [53] [py_interpret_to_execute]: 3.684e-05 [rewriter_before_opt_a]: 0.00012888 [opt_a]: 0.0108407, [3] [Cycle 1]: 0.00697347, [45] [expand_dump_flag]: 3.46001e-06 [switch_simplify]: 6.642e-05 [loop_unroll]: 5.473e-05 [a_1]: 0.00135003 [with_stream_mark]: 2.289e-05 [recompute_prepare]: 2.188e-05 [updatestate_depend_eliminate]: 8.74e-06 [updatestate_assign_eliminate]: 8.1e-06 [updatestate_loads_eliminate]: 7.23999e-06 [parameter_eliminate]: 2.51998e-06 [a_2]: 0.00024426 [accelerated_algorithm]: 3.043e-05 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 3.43999e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.572e-05 [auto_parallel]: 1.091e-05 [parallel]: 1.917e-05 [flash_sp]: 1.202e-05 [merge_comm]: 9.74999e-06 [allreduce_fusion]: 8.60999e-06 [matmul_add_comm_reduction]: 2.609e-05 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 1.8e-05 [virtual_dataset]: 1.626e-05 [get_grad_eliminate_]: 1.574e-05 [virtual_output]: 1.521e-05 [merge_forward]: 9.64e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.772e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.924e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.709e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.00145253 [flash_sp_send_recv_attached]: 3.65998e-06 [receive_attached]: 2.83998e-06 [after_resolve]: 
5.924e-05 [a_after_grad]: 8.001e-05 [renormalize]: 0.00242626 [add_forward_monad_depend]: 9.16998e-06 [auto_monad_grad]: 5.81e-06 [auto_monad_eliminator]: 5.563e-05 [cse]: 0.00016082 [a_3]: 0.00033617 [Cycle 2]: 0.0029641, [45] [expand_dump_flag]: 1.76e-06 [switch_simplify]: 4.669e-05 [loop_unroll]: 4.331e-05 [a_1]: 0.00155422 [with_stream_mark]: 1.227e-05 [recompute_prepare]: 1.055e-05 [updatestate_depend_eliminate]: 5.28002e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 1.14e-06 [a_2]: 0.00012513 [accelerated_algorithm]: 1.219e-05 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.83997e-06 [shard_inline]: 9.31002e-06 [merge_send_recv]: 6.93e-06 [auto_parallel]: 7.54002e-06 [parallel]: 4.84e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 5.56e-06 [allreduce_fusion]: 4.72e-06 [matmul_add_comm_reduction]: 8.06001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.94e-06 [get_grad_eliminate_]: 8.72e-06 [virtual_output]: 8.34998e-06 [merge_forward]: 4.37998e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.724e-05 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 1.407e-05 [set_forward_comm_id_for_comm_node_pass]: 5.52001e-06 [meta_fg_expand]: 3.656e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.24e-06 [after_resolve]: 1.449e-05 [a_after_grad]: 1.404e-05 [renormalize]: 0.00056928 [add_forward_monad_depend]: 3.76999e-06 [auto_monad_grad]: 1.29003e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 4.431e-05 [a_3]: 6.531e-05 [Cycle 3]: 0.00088908, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.053e-05 [loop_unroll]: 8.82e-06 [a_1]: 0.00024669 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 9.23002e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 3.83001e-06 [updatestate_loads_eliminate]: 4.00998e-06 [parameter_eliminate]: 
8.00006e-07 [a_2]: 0.00012226 [accelerated_algorithm]: 1.118e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 9.03002e-06 [merge_send_recv]: 6.82002e-06 [auto_parallel]: 7.15998e-06 [parallel]: 4.37e-06 [flash_sp]: 1.09998e-06 [merge_comm]: 4.89998e-06 [allreduce_fusion]: 5.06002e-06 [matmul_add_comm_reduction]: 7.73001e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 9.89999e-06 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.42e-06 [virtual_output]: 8.19002e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 8.47e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.548e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.379e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07999e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.362e-05 [a_after_grad]: 1.41e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24998e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 9.92999e-06 [cse]: 2.609e-05 [a_3]: 5.925e-05 [py_interpret_to_execute_after_opt_a]: 1.088e-05 [slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 4.587e-05 [convert_after_rewriter]: 9.46e-06 [order_py_execute_after_rewriter]: 6.63e-06 [mutable_eliminate]: 0.0004898 [opt_b]: 0.00028501, [1] [Cycle 1]: 0.00027833, [7] [b_1]: 0.00018727 [b_2]: 1.075e-05 [updatestate_depend_eliminate]: 7.08998e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 4.11001e-06 [renormalize]: 3.60014e-07 [cse]: 3.02e-05 [optimize_parallel_all_gather_comm]: 1.992e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 2.124e-05 [loop_unroll]: 0.00041757 [opt_after_cconv]: 0.00013342, [1] [Cycle 1]: 0.0001276, [7] [c_1]: 4.694e-05 [parameter_eliminate]: 2.66999e-06 [updatestate_depend_eliminate]: 7.15998e-06 [updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 
3.86999e-06 [cse]: 2.89e-05 [renormalize]: 1.79978e-07 [remove_dup_value]: 3.007e-05 [tuple_transform]: 0.00010049, [1] [Cycle 1]: 9.59e-05, [4] [d_1]: 6.618e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.84001e-06 [partial_unused_args_eliminate]: 2.11998e-06 [add_recomputation]: 5.754e-05 [cse_after_recomputation]: 3.138e-05, [1] [Cycle 1]: 2.676e-05, [1] [cse]: 2.115e-05 [environ_conv]: 8.43999e-06 [swap_dp_allreduce_reducescatter]: 7.76001e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 3.01001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 9.70002e-07 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.731e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 5.09e-06 [overlap_recompute_and_grad_model_parallel]: 5.76003e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 4.99998e-06 [overlap_grad_flash_sp]: 2.426e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.796e-05, [1] [Cycle 1]: 9.379e-05, [6] [build]: 1.018e-05 [elim_shapecalc]: 1.276e-05 [elim_not_effective]: 1.785e-05 [opt_reshape]: 9.90002e-06 [fold_const_symbol]: 1.509e-05 [renormalize]: 2.00002e-07 [detach_backward]: 
1.67999e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 2.536e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00045819 [validate]: 4.388e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0081943 [execute]: 7.32997e-06 Sums bootstrap : 0.000475s : 1.49% type_inference : 0.010372s : 32.55% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000129s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003151s : 9.89% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001492s : 4.68% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000087s : 0.27% optimize.opt_a.a_after_grad : 0.000108s : 0.34% optimize.opt_a.renormalize : 0.002996s : 9.40% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000231s : 0.73% optimize.opt_a.a_3 : 0.000461s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000490s : 1.54% optimize.opt_b.b_1 : 0.000187s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000418s : 1.31% optimize.opt_after_cconv.c_1 : 0.000047s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000458s : 1.44% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008194s : 25.71% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000767 218 5.67% : 0.000043s : 11: substitution.arithmetic_simplify 1.83% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 56.68% : 0.000435s : 16: substitution.inline 2.09% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000015s : 3: substitution.less_batch_normalization 1.64% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.88% : 0.000014s : 20: substitution.remove_not_recompute_node 3.13% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.72% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.15% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010304 2 87.27% : 0.008993s : 1: type_inference.infer 12.73% : 0.001311s : 1: type_inference.specialize ------[replace.] 0.000206 30 59.65% : 0.000123s : 16: replace.inline 40.35% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000456 30 93.35% : 0.000426s : 16: match.inline 6.65% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.10% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.22% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.64% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000019s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000012s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.34% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.91% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001496 32 57.05% : 0.000853s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.95% : 0.000643s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060412 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.04% : 0.003046s : 1: add_attr 5.03% : 0.003037s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.84% : 0.000509s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.83% : 0.000499s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.93% : 0.004793s : 117: opt.transform.opt_a 0.08% : 0.000046s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.95% : 0.010844s : 1: opt_a 0.23% : 0.000137s : 1: opt_after_cconv 0.77% : 0.000468s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.69% : 0.013106s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.64% : 0.001597s : 2: renormalize.infer 2.29% : 0.001386s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.58% : 0.008205s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.19% : 0.010387s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x0-kbk],max_mem:6.0M . TotalTime = 0.375137, [24] [bootstrap]: 0.00051539 [type_inference]: 0.00615828 [event_method]: 1.395e-05 [auto_monad]: 5.457e-05 [graph_reusing]: 5.51002e-06 [inline]: 1.96e-06 [add_attr]: 0.0036727, [1] [add_attr_with_inline]: 0.00366143, [1] [Cycle 1]: 4.511e-05, [2] [tag_attr]: 1.482e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 2.905e-05 [insert-virtual-dataset]: 2.22001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0039946, [53] [py_interpret_to_execute]: 2.055e-05 [rewriter_before_opt_a]: 5.998e-05 [opt_a]: 0.00212859, [2] [Cycle 1]: 0.00153534, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.195e-05 [loop_unroll]: 2.075e-05 [a_1]: 0.00045699 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.76001e-06 [updatestate_depend_eliminate]: 3.49001e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.90002e-06 [parameter_eliminate]: 1.66002e-06 [a_2]: 7.516e-05 [accelerated_algorithm]: 6.19001e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 7.95e-06 [auto_parallel]: 5.66e-06 [parallel]: 2.551e-05 [flash_sp]: 7.20998e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.13998e-06 [matmul_add_comm_reduction]: 8.69998e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.82001e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.072e-05 
[merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 3.04999e-06 [receive_attached]: 2.41e-06 [after_resolve]: 1.12e-05 [a_after_grad]: 9.09e-06 [renormalize]: 0.00043153 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.649e-05 [a_3]: 4.141e-05 [Cycle 2]: 0.00058347, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.00012475 [with_stream_mark]: 9.27999e-06 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.72e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 5.03002e-06 [parallel]: 4.35e-06 [flash_sp]: 2.93e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.09003e-06 [get_grad_eliminate_]: 4.82998e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.77999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89999e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.07999e-06 [a_after_grad]: 7.97e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 5.89999e-06 [cse]: 1.239e-05 [a_3]: 3.148e-05 [py_interpret_to_execute_after_opt_a]: 7.71001e-06 [slice_cell_reuse_recomputed_activation]: 
1.72999e-06 [rewriter_after_opt_a]: 3.106e-05 [convert_after_rewriter]: 6.61999e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.00045048 [opt_b]: 0.00018544, [1] [Cycle 1]: 0.00017893, [7] [b_1]: 0.00011042 [b_2]: 6.93e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.60997e-06 [updatestate_loads_eliminate]: 2.46998e-06 [renormalize]: 4.59986e-07 [cse]: 1.62e-05 [optimize_parallel_all_gather_comm]: 1.484e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.198e-05 [loop_unroll]: 0.00043603 [opt_after_cconv]: 9.479e-05, [1] [Cycle 1]: 8.896e-05, [7] [c_1]: 2.803e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.538e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.273e-05 [tuple_transform]: 6.895e-05, [1] [Cycle 1]: 6.432e-05, [4] [d_1]: 3.895e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 2.44001e-06 [add_recomputation]: 5.137e-05 [cse_after_recomputation]: 2.029e-05, [1] [Cycle 1]: 1.58e-05, [1] [cse]: 1.074e-05 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.26998e-06 [label_micro_interleaved_index]: 3.78999e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.24998e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.69999e-06 [assign_add_opt]: 1.19003e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.49978e-07 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 
[control_data_broadcast_order]: 1.199e-05 [grouped_pairwise_exchange_alltoall]: 2.04e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.688e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.705e-05, [1] [Cycle 1]: 6.298e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.36002e-06 [elim_not_effective]: 1.131e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.74998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.54e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 1.584e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.00044633 [validate]: 3.21e-05 [backend_pass]: 1.12999e-06 [task_emit]: 0.359953 [execute]: 8.77999e-06 Sums bootstrap : 0.000515s : 0.14% type_inference : 0.006158s : 1.66% event_method : 0.000014s : 0.00% auto_monad : 0.000055s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.01% optimize.rewriter_before_opt_a : 0.000060s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.01% optimize.opt_a.a_1 : 0.000582s : 0.16% optimize.opt_a.with_stream_mark 
: 0.000023s : 0.01% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.01% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000432s : 0.12% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.01% optimize.opt_a.cse : 0.000039s : 0.01% optimize.opt_a.a_3 : 0.000073s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000450s : 0.12% optimize.opt_b.b_1 : 0.000110s : 0.03% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.01% optimize.loop_unroll : 0.000436s : 0.12% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000446s : 0.12% validate : 0.000032s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.359953s : 97.16% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000168 30 14.98% : 0.000025s : 5: substitution.arithmetic_simplify 1.02% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.10% : 0.000111s : 3: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.71% : 0.000005s : 4: substitution.remove_not_recompute_node 2.57% : 0.000004s : 4: substitution.replace_old_param 6.78% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006112 2 90.86% : 0.005553s : 1: type_inference.infer 9.14% : 0.000559s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.65% : 0.000027s : 3: replace.inline 30.35% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.34% : 0.000109s : 3: match.inline 8.66% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.17% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000009s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.91% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.42% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.16% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.47% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 45.48% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.52% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.384336 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.96% : 0.003677s : 1: add_attr 0.95% : 0.003665s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000060s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.14% : 0.000554s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.12% : 0.000445s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.12% : 0.000459s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.25% : 0.000947s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000093s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.55% : 0.002131s : 1: opt_a 0.03% : 0.000098s : 1: opt_after_cconv 0.12% : 0.000456s : 1: opt_after_jit_grad 0.05% : 0.000189s : 1: opt_b 1.04% : 0.003998s : 1: optimize 0.00% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000033s : 1: pre_auto_parallel 0.01% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.06% : 0.000225s : 1: renormalize.infer 0.05% : 0.000200s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000035s : 1: rewriter_after_opt_a 0.02% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000070s : 1: symbol_engine_optimizer 93.66% : 0.359975s : 1: task_emit 0.02% : 0.000072s : 1: 
tuple_transform 1.61% : 0.006171s : 1: type_inference 0.02% : 0.000058s : 1: validate TotalTime = 0.413999, [24] [bootstrap]: 0.00053311 [type_inference]: 0.00452158 [event_method]: 1.065e-05 [auto_monad]: 4.978e-05 [graph_reusing]: 5.45001e-06 [inline]: 1.50001e-06 [add_attr]: 0.00295436, [1] [add_attr_with_inline]: 0.00294613, [1] [Cycle 1]: 4.616e-05, [2] [tag_attr]: 1.197e-05 [meta_addattr_fg_expand]: 3.41001e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.114e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.00367894, [53] [py_interpret_to_execute]: 1.462e-05 [rewriter_before_opt_a]: 3.779e-05 [opt_a]: 0.00187773, [2] [Cycle 1]: 0.00127803, [45] [expand_dump_flag]: 2.53998e-06 [switch_simplify]: 2.545e-05 [loop_unroll]: 1.339e-05 [a_1]: 0.00031758 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.66002e-06 [a_2]: 7.632e-05 [accelerated_algorithm]: 6.23e-06 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 1.41002e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.73999e-06 [auto_parallel]: 6.23e-06 [parallel]: 1.812e-05 [flash_sp]: 7.33e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 9.47999e-06 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 7.59002e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.00001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 2.23998e-06 [receive_attached]: 2.68998e-06 
[after_resolve]: 9.90002e-06 [a_after_grad]: 8.38001e-06 [renormalize]: 0.00034358 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.56002e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.748e-05 [a_3]: 3.991e-05 [Cycle 2]: 0.0005904, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.00012379 [with_stream_mark]: 9.69e-06 [recompute_prepare]: 5.49e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.18998e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.819e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.11997e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.56e-06 [parallel]: 4.34002e-06 [flash_sp]: 3.57997e-06 [merge_comm]: 3.24001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.63003e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.11998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.24e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.71999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 8.84998e-06 [a_after_grad]: 7.9e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.48998e-06 [cse]: 1.293e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 6.88e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.114e-05 [convert_after_rewriter]: 7.02002e-06 [order_py_execute_after_rewriter]: 4.84e-06 [mutable_eliminate]: 0.00044854 [opt_b]: 0.00017913, [1] [Cycle 1]: 0.00017305, [7] 
[b_1]: 0.00010628 [b_2]: 7.14001e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 3.59985e-07 [cse]: 1.595e-05 [optimize_parallel_all_gather_comm]: 1.644e-05 [overlap_param_gather]: 1.91998e-06 [cconv]: 2.162e-05 [loop_unroll]: 0.00041479 [opt_after_cconv]: 9.35e-05, [1] [Cycle 1]: 8.762e-05, [7] [c_1]: 2.721e-05 [parameter_eliminate]: 2.40002e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.533e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.863e-05, [1] [Cycle 1]: 6.428e-05, [4] [d_1]: 3.835e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.40019e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 2.21003e-06 [add_recomputation]: 4.518e-05 [cse_after_recomputation]: 2.053e-05, [1] [Cycle 1]: 1.604e-05, [1] [cse]: 1.097e-05 [environ_conv]: 4.43001e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 3.00002e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.14e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.5999e-07 [full_micro_interleaved_order_control]: 1.91e-06 [reorder_send_recv_between_fp_bp]: 2.73998e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.46002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.22998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.748e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.783e-05, [1] [Cycle 1]: 6.378e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.41002e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.90001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.98997e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.568e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00044746 [validate]: 3.259e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.401451 [execute]: 1.001e-05 Sums bootstrap : 0.000533s : 0.13% type_inference : 0.004522s : 1.10% event_method : 0.000011s : 0.00% auto_monad : 0.000050s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.00% optimize.rewriter_before_opt_a : 0.000038s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.01% optimize.opt_a.loop_unroll : 0.000019s : 0.00% optimize.opt_a.a_1 : 0.000441s : 0.11% optimize.opt_a.with_stream_mark : 0.000023s : 0.01% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000022s : 0.01% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000344s : 0.08% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000040s : 0.01% optimize.opt_a.a_3 : 0.000072s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.11% optimize.opt_b.b_1 : 0.000106s : 0.03% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.01% optimize.loop_unroll : 0.000415s : 0.10% optimize.opt_after_cconv.c_1 : 0.000027s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00%
auto_monad_reorder : 0.000016s : 0.00%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.00%
opt_after_jit_grad : 0.000447s : 0.11%
validate : 0.000033s : 0.01%
backend_pass : 0.000001s : 0.00%
task_emit : 0.401451s : 97.91%
execute : 0.000010s : 0.00%

Time group info:
------[substitution.] 0.000144 26
14.55% : 0.000021s : 4: substitution.arithmetic_simplify
1.25% : 0.000002s : 2: substitution.elim_not_effective
0.90% : 0.000001s : 2: substitution.fold_const_symbol
3.71% : 0.000005s : 4: substitution.graph_param_transform
72.25% : 0.000104s : 2: substitution.inline
1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch
2.90% : 0.000004s : 4: substitution.remove_not_recompute_node
2.58% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004480 2
91.95% : 0.004120s : 1: type_inference.infer
8.05% : 0.000361s : 1: type_inference.specialize
------[replace.] 0.000020 2
100.00% : 0.000020s : 2: replace.inline
------[match.] 0.000102 2
100.00% : 0.000102s : 2: match.inline
------[predicate.]
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.90% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.43% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.83% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.81% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.86% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.21% : 0.000002s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000267 6 44.56% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.44% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.421918 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.70% : 0.002959s : 1: add_attr 0.70% : 0.002950s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000055s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.13% : 0.000567s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000016s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.10% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000458s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.19% : 0.000791s : 78: opt.transform.opt_a 0.01% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.45% : 0.001881s : 1: opt_a 0.02% : 0.000097s : 1: opt_after_cconv 0.11% : 0.000457s : 1: opt_after_jit_grad 0.04% : 0.000182s : 1: opt_b 0.87% : 0.003683s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000025s : 1: pre_auto_parallel 0.00% : 0.000018s : 1: py_interpret_to_execute 0.00% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.04% : 0.000188s : 1: renormalize.infer 0.04% : 0.000149s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000070s : 1: symbol_engine_optimizer 95.15% : 0.401475s : 1: task_emit 0.02% : 0.000071s : 1: 
tuple_transform 1.08% : 0.004536s : 1: type_inference 0.02% : 0.000093s : 1: validate TotalTime = 0.459053, [24] [bootstrap]: 0.000473 [type_inference]: 0.00590926 [event_method]: 1.492e-05 [auto_monad]: 5.697e-05 [graph_reusing]: 5.27999e-06 [inline]: 2.19001e-06 [add_attr]: 0.00328671, [1] [add_attr_with_inline]: 0.00327727, [1] [Cycle 1]: 5.193e-05, [2] [tag_attr]: 1.814e-05 [meta_addattr_fg_expand]: 4.11001e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.84e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 1.12999e-06 [dataset_repeat_opt]: 2.24999e-06 [pipeline_split]: 2.07001e-06 [optimize]: 0.00423427, [53] [py_interpret_to_execute]: 2.112e-05 [rewriter_before_opt_a]: 6.537e-05 [opt_a]: 0.00229838, [2] [Cycle 1]: 0.00168938, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.142e-05 [loop_unroll]: 2.043e-05 [a_1]: 0.00046614 [with_stream_mark]: 1.303e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 3.17997e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.641e-05 [accelerated_algorithm]: 6.11998e-06 [shard]: 2.33998e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.42e-06 [auto_parallel]: 6.17999e-06 [parallel]: 1.849e-05 [flash_sp]: 7.13e-06 [merge_comm]: 3.81001e-06 [allreduce_fusion]: 3.37997e-06 [matmul_add_comm_reduction]: 8.90999e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.31001e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.132e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.54999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 2.37999e-06 
[after_resolve]: 1.088e-05 [a_after_grad]: 9.24e-06 [renormalize]: 0.00057168 [add_forward_monad_depend]: 4.87998e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.457e-05 [cse]: 3.084e-05 [a_3]: 4.121e-05 [Cycle 2]: 0.00059938, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012673 [with_stream_mark]: 9.90002e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.11e-06 [updatestate_loads_eliminate]: 2.27999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.848e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.82999e-06 [parallel]: 4.21001e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 4.99998e-06 [allreduce_slice_to_reducescatter]: 2.40019e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.19998e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.56998e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 8.19002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.40024e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 9.29e-06 [a_after_grad]: 8.30999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.342e-05 [a_3]: 3.217e-05 [py_interpret_to_execute_after_opt_a]: 7.63999e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 [rewriter_after_opt_a]: 3.133e-05 [convert_after_rewriter]: 7.33999e-06 [order_py_execute_after_rewriter]: 5.62001e-06 [mutable_eliminate]: 0.0004847 [opt_b]: 0.00018621, [1] [Cycle 1]: 0.0001799, [7] 
[b_1]: 0.00011075 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.68998e-06 [renormalize]: 3.69997e-07 [cse]: 1.715e-05 [optimize_parallel_all_gather_comm]: 1.601e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.356e-05 [loop_unroll]: 0.00043119 [opt_after_cconv]: 0.00011447, [1] [Cycle 1]: 9.145e-05, [7] [c_1]: 2.798e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.713e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.258e-05 [tuple_transform]: 7.226e-05, [1] [Cycle 1]: 6.73e-05, [4] [d_1]: 4.046e-05 [none_parameter_eliminate]: 1.87999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.48e-05 [cse_after_recomputation]: 2.197e-05, [1] [Cycle 1]: 1.74e-05, [1] [cse]: 1.203e-05 [environ_conv]: 5.59e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.51998e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.30013e-07 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.214e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.89002e-06 [overlap_recompute_and_grad_model_parallel]: 4.28999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.89e-06 [overlap_grad_ring_attention]: 4.47e-06 [overlap_grad_flash_sp]: 1.747e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.07001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 7.228e-05, [1] [Cycle 1]: 6.816e-05, [6] [build]: 2.88e-06 [elim_shapecalc]: 8.97e-06 [elim_not_effective]: 1.257e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.05001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.74002e-06 [opt_after_jit_grad]: 0.00047251 [validate]: 3.436e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.444264 [execute]: 9.96e-06 Sums bootstrap : 0.000473s : 0.10% type_inference : 0.005909s : 1.30% event_method : 0.000015s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000065s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.01% optimize.opt_a.a_1 : 0.000593s : 0.13% optimize.opt_a.with_stream_mark : 0.000023s : 0.01% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000023s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000572s : 0.13% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.00% optimize.opt_a.cse : 0.000044s : 0.01% optimize.opt_a.a_3 : 0.000073s : 0.02% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.01% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000485s : 0.11% optimize.opt_b.b_1 : 0.000111s : 0.02% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.01% optimize.loop_unroll : 0.000431s : 0.09% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00%
auto_monad_reorder : 0.000016s : 0.00%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.00%
opt_after_jit_grad : 0.000473s : 0.10%
validate : 0.000034s : 0.01%
backend_pass : 0.000001s : 0.00%
task_emit : 0.444264s : 97.69%
execute : 0.000010s : 0.00%
Time group info:
------[substitution.] 0.000181 30
    13.97% : 0.000025s : 5: substitution.arithmetic_simplify
    1.17% : 0.000002s : 2: substitution.elim_not_effective
    0.68% : 0.000001s : 2: substitution.fold_const_symbol
    3.57% : 0.000006s : 4: substitution.graph_param_transform
    67.98% : 0.000123s : 3: substitution.inline
    1.61% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.68% : 0.000005s : 4: substitution.remove_not_recompute_node
    2.30% : 0.000004s : 4: substitution.replace_old_param
    6.04% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005863 2
    87.22% : 0.005114s : 1: type_inference.infer
    12.78% : 0.000749s : 1: type_inference.specialize
------[replace.] 0.000042 5
    70.58% : 0.000030s : 3: replace.inline
    29.42% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000131 5
    92.41% : 0.000121s : 3: match.inline
    7.59% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000160 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.59% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.46% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.08% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.55% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.94% : 0.000002s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000525 8 31.43% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 68.57% : 0.000360s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.468258 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.70% : 0.003292s : 1: add_attr 0.70% : 0.003281s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000063s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.11% : 0.000511s : 1: bootstrap 0.01% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.01% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.09% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000495s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.20% : 0.000960s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000091s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000034s : 4: opt.transform.symbol_engine_opt 0.49% : 0.002301s : 1: opt_a 0.03% : 0.000119s : 1: opt_after_cconv 0.10% : 0.000483s : 1: opt_after_jit_grad 0.04% : 0.000190s : 1: opt_b 0.91% : 0.004239s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000033s : 1: pre_auto_parallel 0.01% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.07% : 0.000308s : 1: renormalize.infer 0.05% : 0.000256s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000075s : 1: symbol_engine_optimizer 94.88% : 0.444287s : 1: task_emit 0.02% : 0.000075s : 1: 
tuple_transform 1.27% : 0.005926s : 1: type_inference 0.01% : 0.000058s : 1: validate TotalTime = 0.796517, [24] [bootstrap]: 0.00051917 [type_inference]: 0.028466 [event_method]: 5.411e-05 [auto_monad]: 0.00012787 [graph_reusing]: 7.92e-06 [inline]: 2.09e-06 [add_attr]: 0.00343859, [1] [add_attr_with_inline]: 0.00342832, [1] [Cycle 1]: 8.243e-05, [2] [tag_attr]: 4.388e-05 [meta_addattr_fg_expand]: 9.22999e-06 [parallel-infer-symbol]: 3.07002e-06 [pre_auto_parallel]: 5.349e-05 [insert-virtual-dataset]: 2.64999e-06 [parallel-infer-symbol-second]: 9.09989e-07 [dataset_repeat_opt]: 1.91998e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.030829, [53] [py_interpret_to_execute]: 4.085e-05 [rewriter_before_opt_a]: 0.00016569 [opt_a]: 0.0282819, [3] [Cycle 1]: 0.023978, [45] [expand_dump_flag]: 4.85001e-06 [switch_simplify]: 7.419e-05 [loop_unroll]: 6.124e-05 [a_1]: 0.00148823 [with_stream_mark]: 2.687e-05 [recompute_prepare]: 2.248e-05 [updatestate_depend_eliminate]: 8.99998e-06 [updatestate_assign_eliminate]: 7.82e-06 [updatestate_loads_eliminate]: 7.56999e-06 [parameter_eliminate]: 2.51e-06 [a_2]: 0.00026516 [accelerated_algorithm]: 3.16e-05 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 3.75e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.656e-05 [auto_parallel]: 1.11e-05 [parallel]: 1.919e-05 [flash_sp]: 1.115e-05 [merge_comm]: 9.87001e-06 [allreduce_fusion]: 8.95001e-06 [matmul_add_comm_reduction]: 2.73e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.847e-05 [virtual_dataset]: 1.561e-05 [get_grad_eliminate_]: 1.506e-05 [virtual_output]: 1.55e-05 [merge_forward]: 9.91e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.77e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.913e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.755e-05 [set_forward_comm_id_for_comm_node_pass]: 9.81e-06 [meta_fg_expand]: 0.0015106 [flash_sp_send_recv_attached]: 4.03001e-06 [receive_attached]: 2.38002e-06 [after_resolve]: 6.099e-05 
[a_after_grad]: 8.255e-05 [renormalize]: 0.0191235 [add_forward_monad_depend]: 1.156e-05 [auto_monad_grad]: 5.96998e-06 [auto_monad_eliminator]: 6.154e-05 [cse]: 0.00018359 [a_3]: 0.00034652 [Cycle 2]: 0.00337713, [45] [expand_dump_flag]: 1.61002e-06 [switch_simplify]: 4.726e-05 [loop_unroll]: 4.425e-05 [a_1]: 0.00158148 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 1.117e-05 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 4.63999e-06 [updatestate_loads_eliminate]: 4.10998e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 0.00012742 [accelerated_algorithm]: 4.605e-05 [shard]: 1.59998e-06 [meta_shard_fg_expand]: 2.17999e-06 [shard_inline]: 9.71e-06 [merge_send_recv]: 7.83999e-06 [auto_parallel]: 9.12001e-06 [parallel]: 6.99001e-06 [flash_sp]: 4.30999e-06 [merge_comm]: 5.82001e-06 [allreduce_fusion]: 5.02999e-06 [matmul_add_comm_reduction]: 8.80999e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 9.20999e-06 [get_grad_eliminate_]: 9.22001e-06 [virtual_output]: 8.57e-06 [merge_forward]: 4.90999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.116e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.782e-05 [merge_recompute_call_nodes]: 1.13001e-06 [before_grad]: 1.439e-05 [set_forward_comm_id_for_comm_node_pass]: 5.54e-06 [meta_fg_expand]: 0.00012802 [flash_sp_send_recv_attached]: 1.35999e-06 [receive_attached]: 1.67001e-06 [after_resolve]: 1.638e-05 [a_after_grad]: 1.499e-05 [renormalize]: 0.00078424 [add_forward_monad_depend]: 4.38999e-06 [auto_monad_grad]: 1.49998e-06 [auto_monad_eliminator]: 1.584e-05 [cse]: 4.805e-05 [a_3]: 6.64e-05 [Cycle 3]: 0.00091124, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.062e-05 [loop_unroll]: 8.90999e-06 [a_1]: 0.00025012 [with_stream_mark]: 1.021e-05 [recompute_prepare]: 9.52001e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.79002e-06 
[parameter_eliminate]: 9.60019e-07 [a_2]: 0.00012372 [accelerated_algorithm]: 1.214e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.15001e-06 [merge_send_recv]: 7.33e-06 [auto_parallel]: 7.35e-06 [parallel]: 5.04e-06 [flash_sp]: 1.02e-06 [merge_comm]: 5.38002e-06 [allreduce_fusion]: 5.26002e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 1.021e-05 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.33999e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 1.66e-06 [offload_activation]: 8.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.606e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.461e-05 [set_forward_comm_id_for_comm_node_pass]: 5.59e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.433e-05 [a_after_grad]: 1.508e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.17e-05 [cse]: 2.736e-05 [a_3]: 5.918e-05 [py_interpret_to_execute_after_opt_a]: 1.191e-05 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 5.379e-05 [convert_after_rewriter]: 9.52001e-06 [order_py_execute_after_rewriter]: 6.60997e-06 [mutable_eliminate]: 0.00058327 [opt_b]: 0.00029798, [1] [Cycle 1]: 0.00029148, [7] [b_1]: 0.00019265 [b_2]: 1.132e-05 [updatestate_depend_eliminate]: 7.76001e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 6.29982e-07 [cse]: 3.478e-05 [optimize_parallel_all_gather_comm]: 2.488e-05 [overlap_param_gather]: 2.51e-06 [cconv]: 2.21e-05 [loop_unroll]: 0.00045079 [opt_after_cconv]: 0.00019169, [1] [Cycle 1]: 0.00018544, [7] [c_1]: 4.961e-05 [parameter_eliminate]: 2.15002e-06 [updatestate_depend_eliminate]: 7.2e-06 [updatestate_assign_eliminate]: 4.4e-06 
[updatestate_loads_eliminate]: 5.2e-05 [cse]: 3.308e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 3.048e-05 [tuple_transform]: 0.00010387, [1] [Cycle 1]: 9.869e-05, [4] [d_1]: 6.785e-05 [none_parameter_eliminate]: 1.85001e-06 [renormalize]: 2.89991e-07 [switch_simplify]: 1.006e-05 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.992e-05 [cse_after_recomputation]: 3.364e-05, [1] [Cycle 1]: 2.883e-05, [1] [cse]: 2.318e-05 [environ_conv]: 9.24e-06 [swap_dp_allreduce_reducescatter]: 8.25e-06 [bias_add_comm_swap]: 2.68003e-06 [label_micro_interleaved_index]: 4.46002e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.82002e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.58998e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.706e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 5.22999e-06 [overlap_recompute_and_grad_model_parallel]: 5.77999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 5.32999e-06 [overlap_grad_flash_sp]: 2.483e-05 [begin_end_overlap_inline]: 8.60018e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 0.00010488, [1] [Cycle 1]: 0.00010052, [6] [build]: 1.116e-05 [elim_shapecalc]: 1.461e-05 [elim_not_effective]: 1.889e-05 [opt_reshape]: 1.106e-05 [fold_const_symbol]: 1.502e-05 [renormalize]: 2.59985e-07 
[detach_backward]: 2.05002e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.516e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00049273 [validate]: 5.13e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.732187 [execute]: 9.39e-06 Sums bootstrap : 0.000519s : 0.07% type_inference : 0.028466s : 3.60% event_method : 0.000054s : 0.01% auto_monad : 0.000128s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000044s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000053s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.01% optimize.rewriter_before_opt_a : 0.000166s : 0.02% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.02% optimize.opt_a.loop_unroll : 0.000114s : 0.01% optimize.opt_a.a_1 : 0.003320s : 0.42% optimize.opt_a.with_stream_mark : 0.000051s : 0.01% optimize.opt_a.recompute_prepare : 0.000043s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000516s : 0.07% optimize.opt_a.accelerated_algorithm : 0.000090s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000032s : 0.00% optimize.opt_a.auto_parallel : 0.000028s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000021s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001641s : 0.21% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.01% optimize.opt_a.a_after_grad : 0.000113s : 0.01% optimize.opt_a.renormalize : 0.019908s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.00% optimize.opt_a.auto_monad_grad : 0.000009s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000089s : 0.01% optimize.opt_a.cse : 0.000259s : 0.03% optimize.opt_a.a_3 : 0.000472s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000054s : 0.01% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000583s : 0.07% optimize.opt_b.b_1 : 0.000193s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000035s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.00% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000451s : 0.06% optimize.opt_after_cconv.c_1 : 0.000050s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000052s : 0.01% optimize.opt_after_cconv.cse : 0.000033s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.00% optimize.tuple_transform.d_1 : 0.000068s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000493s : 0.06% validate : 0.000051s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.732187s : 92.48% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000840 222 6.11% : 0.000051s : 12: substitution.arithmetic_simplify 1.85% : 0.000016s : 2: substitution.cast_eliminate 0.31% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 57.48% : 0.000483s : 17: substitution.inline 2.02% : 0.000017s : 2: substitution.inline_without_move 1.30% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000016s : 3: substitution.less_batch_normalization 1.63% : 0.000014s : 11: substitution.minmaximum_grad 0.67% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000015s : 20: substitution.remove_not_recompute_node 3.00% : 0.000025s : 10: substitution.replace_applicator 1.27% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.47% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.69% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.11% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.01% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.25% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.028377 2 93.69% : 0.026585s : 1: type_inference.infer 6.31% : 0.001792s : 1: type_inference.specialize ------[replace.] 0.000229 33 57.91% : 0.000132s : 17: replace.inline 42.09% : 0.000096s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000509 33 93.21% : 0.000474s : 17: match.inline 6.79% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000762 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.09% : 0.000016s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000043s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.10% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.65% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000015s : 152: predicate.replace_applicator 0.58% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000038s : 277: predicate.switch_simplify
1.09% : 0.000008s : 68: predicate.tile_eliminate
1.15% : 0.000009s : 68: predicate.transpose_eliminate
1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive
1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator
1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder
2.80% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator
1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator
2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator
1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_
2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater
3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater
0.15% : 0.000001s : 8: predicate.value_based_eliminate
0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate
0.55% : 0.000004s : 32: predicate.virtual_output_eliminate
0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate
0.19% : 0.000001s : 8: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.001855 34
52.62% : 0.000976s : 13: func_graph_cloner_run.FuncGraphClonerGraph
47.38% : 0.000879s : 21: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.856070 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.40% : 0.003444s : 1: add_attr 0.40% : 0.003432s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000135s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000553s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000062s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000461s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.07% : 0.000594s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.59% : 0.005059s : 117: opt.transform.opt_a 0.01% : 0.000048s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000177s : 28: opt.transform.opt_b 0.01% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000056s : 4: opt.transform.symbol_engine_opt 3.30% : 0.028285s : 1: opt_a 0.02% : 0.000195s : 1: opt_after_cconv 0.06% : 0.000503s : 1: opt_after_jit_grad 0.04% : 0.000302s : 1: opt_b 3.60% : 0.030834s : 1: optimize 0.00% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000059s : 1: pre_auto_parallel 0.01% : 0.000045s : 1: py_interpret_to_execute 0.00% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000035s : 1: remove_dup_value 0.23% : 0.002010s : 2: renormalize.infer 2.09% : 0.017882s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000058s : 1: rewriter_after_opt_a 0.02% : 0.000171s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000108s : 1: symbol_engine_optimizer 85.53% : 0.732208s : 1: task_emit 0.01% : 0.000107s : 1: 
tuple_transform 3.33% : 0.028487s : 1: type_inference 0.01% : 0.000079s : 1: validate TotalTime = 0.171748, [24] [bootstrap]: 0.00052763 [type_inference]: 0.00459046 [event_method]: 1.116e-05 [auto_monad]: 7.961e-05 [graph_reusing]: 5.46002e-06 [inline]: 2.61e-06 [add_attr]: 0.00366519, [1] [add_attr_with_inline]: 0.00365578, [1] [Cycle 1]: 5e-05, [2] [tag_attr]: 1.215e-05 [meta_addattr_fg_expand]: 3.76999e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.278e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00392316, [53] [py_interpret_to_execute]: 1.619e-05 [rewriter_before_opt_a]: 4.154e-05 [opt_a]: 0.00200424, [2] [Cycle 1]: 0.00139808, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.492e-05 [loop_unroll]: 1.339e-05 [a_1]: 0.00032812 [with_stream_mark]: 1.495e-05 [recompute_prepare]: 7.71001e-06 [updatestate_depend_eliminate]: 3.43e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.822e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 6.71999e-06 [parallel]: 1.839e-05 [flash_sp]: 7.54002e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.83e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 8.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.51998e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.83e-06 [after_resolve]: 
1.05e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00044048 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.334e-05 [cse]: 2.78e-05 [a_3]: 4.106e-05 [Cycle 2]: 0.00059662, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.00012636 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 5.84999e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.733e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.65001e-06 [parallel]: 4.17998e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.47001e-06 [allreduce_slice_to_reducescatter]: 2.29978e-07 [virtual_shard_identity]: 6.18998e-06 [virtual_dataset]: 5.10999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.65002e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.38002e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 7.75e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.52001e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.06997e-06 [after_resolve]: 9.29e-06 [a_after_grad]: 8.33001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.51999e-06 [cse]: 1.345e-05 [a_3]: 3.148e-05 [py_interpret_to_execute_after_opt_a]: 8.04997e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.581e-05 [convert_after_rewriter]: 6.99001e-06 [order_py_execute_after_rewriter]: 5.02999e-06 [mutable_eliminate]: 0.00049308 [opt_b]: 0.00018714, [1] [Cycle 1]: 0.00018069, [7] [b_1]: 0.00011242 [b_2]: 
7.01001e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 5.59987e-07 [cse]: 1.765e-05 [optimize_parallel_all_gather_comm]: 1.663e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.331e-05 [loop_unroll]: 0.00044117 [opt_after_cconv]: 9.823e-05, [1] [Cycle 1]: 9.251e-05, [7] [c_1]: 2.837e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.53002e-06 [updatestate_assign_eliminate]: 2.78e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.74e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.305e-05 [tuple_transform]: 7.082e-05, [1] [Cycle 1]: 6.629e-05, [4] [d_1]: 4.028e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.76998e-06 [add_recomputation]: 4.951e-05 [cse_after_recomputation]: 2.1e-05, [1] [Cycle 1]: 1.644e-05, [1] [cse]: 1.117e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.16998e-06 [bias_add_comm_swap]: 3.00002e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.64001e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 1.04998e-06 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.85998e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.19e-06 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.95001e-06 [control_data_broadcast_order]: 1.119e-05 [grouped_pairwise_exchange_alltoall]: 1.83002e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.06998e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.781e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.53003e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 6.981e-05, [1] [Cycle 1]: 6.555e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.25999e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.29999e-06 [fold_const_symbol]: 9.39e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.577e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.44001e-06 [opt_after_jit_grad]: 0.0166035 [validate]: 4.287e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.142 [execute]: 9.92999e-06 Sums bootstrap : 0.000528s : 0.32% type_inference : 0.004590s : 2.75% event_method : 0.000011s : 0.01% auto_monad : 0.000080s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000042s : 0.02% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.02% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000454s : 0.27% optimize.opt_a.with_stream_mark : 0.000024s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.01% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000441s : 0.26% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.01% optimize.opt_a.cse : 0.000041s : 0.02% optimize.opt_a.a_3 : 0.000073s : 0.04% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000493s : 0.30% optimize.opt_b.b_1 : 0.000112s : 0.07% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.01% optimize.loop_unroll : 0.000441s : 0.26% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000002s : 0.00%
auto_monad_reorder : 0.000016s : 0.01%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000003s : 0.00%
opt_after_jit_grad : 0.016603s : 9.94%
validate : 0.000043s : 0.03%
backend_pass : 0.000001s : 0.00%
task_emit : 0.142000s : 84.99%
execute : 0.000010s : 0.01%
Time group info:
------[substitution.] 0.000127 26
18.18% : 0.000023s : 4: substitution.arithmetic_simplify
1.68% : 0.000002s : 2: substitution.elim_not_effective
1.19% : 0.000002s : 2: substitution.fold_const_symbol
4.43% : 0.000006s : 4: substitution.graph_param_transform
65.98% : 0.000084s : 2: substitution.inline
2.19% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.23% : 0.000004s : 4: substitution.remove_not_recompute_node
3.12% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004546 2
91.81% : 0.004174s : 1: type_inference.infer
8.19% : 0.000372s : 1: type_inference.specialize
------[replace.] 0.000020 2
100.00% : 0.000020s : 2: replace.inline
------[match.] 0.000082 2
100.00% : 0.000082s : 2: match.inline
------[predicate.]
0.000167 984 0.67% : 0.000001s : 9: predicate.accumulaten_eliminater 1.36% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.59% : 0.000001s : 9: predicate.addn_zero_filter 0.61% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 17: predicate.arithmetic_simplify 0.70% : 0.000001s : 9: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.64% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.74% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.95% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.86% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.94% : 0.000002s : 13: predicate.environ_get_depend_swap 1.52% : 0.000003s : 21: predicate.environ_get_eliminate 0.91% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.77% : 0.000001s : 11: predicate.exchange_switch_depend_value 16.78% : 0.000028s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.59% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 5.00% : 0.000008s : 44: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000002s : 8: predicate.less_batch_normalization 1.39% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.86% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.38% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.58% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.58% : 0.000001s : 9: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 0.94% : 0.000002s : 11: predicate.partial_defer_inline 1.00% : 0.000002s : 13: predicate.partial_eliminate 0.66% : 0.000001s : 9: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000002s : 9: predicate.reduce_eliminate 1.80% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 8: predicate.remove_not_recompute_node 1.11% : 0.000002s : 17: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.64% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000002s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.84% : 0.000001s : 11: predicate.switch_defer_inline 1.47% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.64% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate
0.66% : 0.000001s : 9: predicate.transpose_eliminate
1.30% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive
1.37% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator
1.17% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder
2.79% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator
1.16% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator
1.89% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator
1.37% : 0.000002s : 17: predicate.tuple_to_list_eliminator_
1.77% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater
2.62% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater
0.39% : 0.000001s : 4: predicate.value_based_eliminate
0.64% : 0.000001s : 8: predicate.virtual_dataset_eliminate
0.71% : 0.000001s : 8: predicate.virtual_output_eliminate
0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate
0.55% : 0.000001s : 4: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000277 6
42.63% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph
57.37% : 0.000159s : 4: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.180746 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.03% : 0.003671s : 1: add_attr 2.02% : 0.003660s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000087s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000564s : 1: bootstrap 0.01% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000451s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000504s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.45% : 0.000806s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000094s : 28: opt.transform.opt_b 0.02% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.11% : 0.002007s : 1: opt_a 0.06% : 0.000102s : 1: opt_after_cconv 9.19% : 0.016618s : 1: opt_after_jit_grad 0.11% : 0.000191s : 1: opt_b 2.17% : 0.003927s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.01% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.14% : 0.000251s : 1: renormalize.infer 0.10% : 0.000182s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000040s : 1: rewriter_after_opt_a 0.03% : 0.000046s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000073s : 1: symbol_engine_optimizer 78.58% : 0.142022s : 1: task_emit 0.04% : 0.000074s : 1: 
tuple_transform 2.55% : 0.004605s : 1: type_inference 0.04% : 0.000072s : 1: validate TotalTime = 0.625402, [24] [bootstrap]: 0.00056765 [type_inference]: 0.0107711 [event_method]: 4.454e-05 [auto_monad]: 0.00011607 [graph_reusing]: 8.2e-06 [inline]: 2.07999e-06 [add_attr]: 0.0032548, [1] [add_attr_with_inline]: 0.00324553, [1] [Cycle 1]: 7.142e-05, [2] [tag_attr]: 3.157e-05 [meta_addattr_fg_expand]: 8.38001e-06 [parallel-infer-symbol]: 3.12002e-06 [pre_auto_parallel]: 4.889e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0307044, [53] [py_interpret_to_execute]: 3.673e-05 [rewriter_before_opt_a]: 0.00012799 [opt_a]: 0.0280605, [3] [Cycle 1]: 0.0236474, [45] [expand_dump_flag]: 3.50998e-06 [switch_simplify]: 6.782e-05 [loop_unroll]: 5.521e-05 [a_1]: 0.00147557 [with_stream_mark]: 2.595e-05 [recompute_prepare]: 2.375e-05 [updatestate_depend_eliminate]: 8.72e-06 [updatestate_assign_eliminate]: 7.66999e-06 [updatestate_loads_eliminate]: 7.66999e-06 [parameter_eliminate]: 2.69001e-06 [a_2]: 0.00024555 [accelerated_algorithm]: 3.149e-05 [shard]: 2.49001e-06 [meta_shard_fg_expand]: 3.33998e-06 [shard_inline]: 1.649e-05 [merge_send_recv]: 1.544e-05 [auto_parallel]: 1.12e-05 [parallel]: 1.812e-05 [flash_sp]: 1.185e-05 [merge_comm]: 9.57001e-06 [allreduce_fusion]: 9.25999e-06 [matmul_add_comm_reduction]: 2.825e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.737e-05 [virtual_dataset]: 1.719e-05 [get_grad_eliminate_]: 1.523e-05 [virtual_output]: 1.503e-05 [merge_forward]: 9.36e-06 [cell_reuse_recompute_pass]: 1.41998e-06 [offload_activation]: 1.794e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.9e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 2.819e-05 [set_forward_comm_id_for_comm_node_pass]: 1.003e-05 [meta_fg_expand]: 0.00147473 [flash_sp_send_recv_attached]: 3.86001e-06 [receive_attached]: 3.28e-06 
[after_resolve]: 6.058e-05 [a_after_grad]: 8.34e-05 [renormalize]: 0.0188252 [add_forward_monad_depend]: 1.395e-05 [auto_monad_grad]: 6.56999e-06 [auto_monad_eliminator]: 6.647e-05 [cse]: 0.00018606 [a_3]: 0.00035544 [Cycle 2]: 0.00347786, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 4.901e-05 [loop_unroll]: 4.454e-05 [a_1]: 0.0016631 [with_stream_mark]: 1.826e-05 [recompute_prepare]: 1.18e-05 [updatestate_depend_eliminate]: 6.07999e-06 [updatestate_assign_eliminate]: 5.35999e-06 [updatestate_loads_eliminate]: 4.72e-06 [parameter_eliminate]: 2.19999e-06 [a_2]: 0.00013362 [accelerated_algorithm]: 1.513e-05 [shard]: 2.67001e-06 [meta_shard_fg_expand]: 2.91999e-06 [shard_inline]: 1.03e-05 [merge_send_recv]: 1.096e-05 [auto_parallel]: 1.279e-05 [parallel]: 1.047e-05 [flash_sp]: 3.98999e-06 [merge_comm]: 5.74999e-06 [allreduce_fusion]: 5.27001e-06 [matmul_add_comm_reduction]: 1.156e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.165e-05 [virtual_dataset]: 1.021e-05 [get_grad_eliminate_]: 9.54e-06 [virtual_output]: 9.31e-06 [merge_forward]: 6.29001e-06 [cell_reuse_recompute_pass]: 1.80001e-06 [offload_activation]: 1.392e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.795e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 1.483e-05 [set_forward_comm_id_for_comm_node_pass]: 5.84e-06 [meta_fg_expand]: 5.255e-05 [flash_sp_send_recv_attached]: 1.76003e-06 [receive_attached]: 2.74001e-06 [after_resolve]: 1.729e-05 [a_after_grad]: 1.458e-05 [renormalize]: 0.00085353 [add_forward_monad_depend]: 5.07999e-06 [auto_monad_grad]: 2.06998e-06 [auto_monad_eliminator]: 1.544e-05 [cse]: 5.215e-05 [a_3]: 6.709e-05 [Cycle 3]: 0.0009164, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 1.094e-05 [loop_unroll]: 9.22001e-06 [a_1]: 0.00025209 [with_stream_mark]: 1.061e-05 [recompute_prepare]: 8.97999e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 4.15999e-06 
[parameter_eliminate]: 1.14e-06 [a_2]: 0.00012406 [accelerated_algorithm]: 1.175e-05 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.49999e-06 [merge_send_recv]: 7.7e-06 [auto_parallel]: 7.31001e-06 [parallel]: 4.79e-06 [flash_sp]: 1.32999e-06 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 5.09003e-06 [matmul_add_comm_reduction]: 8.52998e-06 [allreduce_slice_to_reducescatter]: 5.09986e-07 [virtual_shard_identity]: 1.002e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.71002e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.38001e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 9.49e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.624e-05 [merge_recompute_call_nodes]: 9.89996e-07 [before_grad]: 1.438e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.31998e-06 [after_resolve]: 1.427e-05 [a_after_grad]: 1.531e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.107e-05 [cse]: 2.759e-05 [a_3]: 6.028e-05 [py_interpret_to_execute_after_opt_a]: 1.501e-05 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 5.001e-05 [convert_after_rewriter]: 9.34e-06 [order_py_execute_after_rewriter]: 7.19001e-06 [mutable_eliminate]: 0.00073443 [opt_b]: 0.00029629, [1] [Cycle 1]: 0.00028858, [7] [b_1]: 0.00019156 [b_2]: 1.084e-05 [updatestate_depend_eliminate]: 8.45999e-06 [updatestate_assign_eliminate]: 4.62e-06 [updatestate_loads_eliminate]: 4.12003e-06 [renormalize]: 3.80009e-07 [cse]: 3.422e-05 [optimize_parallel_all_gather_comm]: 2.211e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 7.718e-05 [loop_unroll]: 0.00044332 [opt_after_cconv]: 0.00014027, [1] [Cycle 1]: 0.00013356, [7] [c_1]: 4.877e-05 [parameter_eliminate]: 2.56998e-06 [updatestate_depend_eliminate]: 8.18001e-06 [updatestate_assign_eliminate]: 4.18001e-06 
[updatestate_loads_eliminate]: 3.95998e-06 [cse]: 3.086e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 3.254e-05 [tuple_transform]: 0.00010352, [1] [Cycle 1]: 9.883e-05, [4] [d_1]: 6.784e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.94999e-06 [partial_unused_args_eliminate]: 1.99e-06 [add_recomputation]: 6.199e-05 [cse_after_recomputation]: 3.232e-05, [1] [Cycle 1]: 2.752e-05, [1] [cse]: 2.218e-05 [environ_conv]: 1.055e-05 [swap_dp_allreduce_reducescatter]: 7.98999e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.44998e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.66002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 2.96999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.27e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.645e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.14e-06 [overlap_recompute_and_grad_model_parallel]: 5.76e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49998e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 5.07e-06 [overlap_grad_flash_sp]: 2.711e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.45002e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 0.0001029, [1] [Cycle 1]: 9.813e-05, [6] [build]: 1.051e-05 [elim_shapecalc]: 1.35e-05 [elim_not_effective]: 1.9e-05 [opt_reshape]: 1.039e-05 [fold_const_symbol]: 1.525e-05 [renormalize]: 1.8999e-07 
[detach_backward]: 2.14999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.488e-05 [get_jit_bprop_graph]: 1.99e-06 [rewriter_after_jit_bprop_graph]: 4.05e-06 [opt_after_jit_grad]: 0.00048596 [validate]: 5.084e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.579055 [execute]: 9.61e-06 Sums bootstrap : 0.000568s : 0.09% type_inference : 0.010771s : 1.74% event_method : 0.000045s : 0.01% auto_monad : 0.000116s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.01% optimize.rewriter_before_opt_a : 0.000128s : 0.02% optimize.opt_a.expand_dump_flag : 0.000008s : 0.00% optimize.opt_a.switch_simplify : 0.000128s : 0.02% optimize.opt_a.loop_unroll : 0.000109s : 0.02% optimize.opt_a.a_1 : 0.003391s : 0.55% optimize.opt_a.with_stream_mark : 0.000055s : 0.01% optimize.opt_a.recompute_prepare : 0.000045s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.00% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000503s : 0.08% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000036s : 0.01% optimize.opt_a.merge_send_recv : 0.000034s : 0.01% optimize.opt_a.auto_parallel : 0.000031s : 0.01% optimize.opt_a.parallel : 0.000033s : 0.01% optimize.opt_a.flash_sp : 0.000017s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.01% optimize.opt_a.virtual_dataset : 0.000036s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.01% optimize.opt_a.virtual_output : 0.000033s : 0.01% optimize.opt_a.merge_forward : 0.000020s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000041s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001530s : 0.25% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.01% optimize.opt_a.a_after_grad : 0.000113s : 0.02% optimize.opt_a.renormalize : 0.019679s : 3.17% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.00% optimize.opt_a.auto_monad_grad : 0.000010s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000093s : 0.01% optimize.opt_a.cse : 0.000266s : 0.04% optimize.opt_a.a_3 : 0.000483s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000734s : 0.12% optimize.opt_b.b_1 : 0.000192s : 0.03% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000077s : 0.01% optimize.loop_unroll : 0.000443s : 0.07% optimize.opt_after_cconv.c_1 : 0.000049s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000033s : 0.01% optimize.tuple_transform.d_1 : 0.000068s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000011s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000486s : 0.08% validate : 0.000051s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.579055s : 93.28% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000827 218 6.51% : 0.000054s : 11: substitution.arithmetic_simplify 2.18% : 0.000018s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.91% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 54.59% : 0.000452s : 16: substitution.inline 2.10% : 0.000017s : 2: substitution.inline_without_move 1.37% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000017s : 3: substitution.less_batch_normalization 1.68% : 0.000014s : 11: substitution.minmaximum_grad 0.79% : 0.000007s : 5: substitution.partial_eliminate 1.70% : 0.000014s : 20: substitution.remove_not_recompute_node 3.51% : 0.000029s : 10: substitution.replace_applicator 1.49% : 0.000012s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.71% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.38% : 0.000069s : 28: substitution.tuple_list_get_item_eliminator 2.49% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010695 2 87.44% : 0.009352s : 1: type_inference.infer 12.56% : 0.001344s : 1: type_inference.specialize ------[replace.] 0.000213 30 60.10% : 0.000128s : 16: replace.inline 39.90% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000476 30 93.18% : 0.000443s : 16: match.inline 6.82% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000812 5663 1.03% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.02% : 0.000008s : 67: predicate.addn_zero_filter 0.98% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.96% : 0.000016s : 99: predicate.arithmetic_simplify 1.10% : 0.000009s : 67: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.09% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000010s : 67: predicate.dict_get_item_eliminator 1.06% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.11% : 0.000009s : 75: predicate.environ_get_depend_swap 1.64% : 0.000013s : 107: predicate.environ_get_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.58% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.10% : 0.000017s : 97: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 8: predicate.fold_const_symbol 0.52% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.49% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.28% : 0.000043s : 244: predicate.inline 1.16% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.50% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.47% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.00% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.31% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.04% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.06% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.88% : 0.000015s : 97: predicate.partial_defer_inline 1.56% : 0.000013s : 89: predicate.partial_eliminate 1.01% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.17% : 0.000009s : 67: predicate.reduce_eliminate 2.48% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000003s : 32: predicate.remove_not_recompute_node 1.76% : 0.000014s : 149: predicate.replace_applicator 0.57% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.03% : 0.000008s : 67: predicate.reshape_eliminate 1.09% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.34% : 0.000011s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.26% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.68% : 0.000014s : 97: predicate.switch_defer_inline 2.70% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.56% : 0.000037s : 265: 
predicate.switch_simplify 1.00% : 0.000008s : 67: predicate.tile_eliminate 1.01% : 0.000008s : 67: predicate.transpose_eliminate 7.78% : 0.000063s : 83: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.69% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.34% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.92% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.42% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.10% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.53% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001745 32 55.21% : 0.000964s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.79% : 0.000782s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.684441 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.48% : 0.003260s : 1: add_attr 0.47% : 0.003249s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000123s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.09% : 0.000602s : 1: bootstrap 0.01% : 0.000081s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.01% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.01% : 0.000052s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.07% : 0.000452s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.11% : 0.000744s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.74% : 0.005085s : 117: opt.transform.opt_a 0.01% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.03% : 0.000176s : 28: opt.transform.opt_b 0.01% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000055s : 4: opt.transform.symbol_engine_opt 4.10% : 0.028064s : 1: opt_a 0.02% : 0.000144s : 1: opt_after_cconv 0.07% : 0.000496s : 1: opt_after_jit_grad 0.04% : 0.000300s : 1: opt_b 4.49% : 0.030710s : 1: optimize 0.00% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000054s : 1: pre_auto_parallel 0.01% : 0.000041s : 1: py_interpret_to_execute 0.00% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000037s : 1: remove_dup_value 2.59% : 0.017754s : 2: renormalize.infer 0.28% : 0.001904s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000054s : 1: rewriter_after_opt_a 0.02% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.02% : 0.000106s : 1: symbol_engine_optimizer 84.61% : 0.579078s : 1: task_emit 0.02% : 0.000106s : 1: 
tuple_transform 1.58% : 0.010790s : 1: type_inference 0.01% : 0.000083s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x0-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x1-pynative],max_mem:6.0M TotalTime = 0.0231191, [24] [bootstrap]: 0.00056242 [type_inference]: 0.00689131 [event_method]: 5.793e-05 [auto_monad]: 6.673e-05 [graph_reusing]: 5.87999e-06 [inline]: 2.54999e-06 [add_attr]: 0.00354784, [1] [add_attr_with_inline]: 0.00353646, [1] [Cycle 1]: 5.12e-05, [2] [tag_attr]: 1.589e-05 [meta_addattr_fg_expand]: 3.76999e-06 [parallel-infer-symbol]: 3.23998e-06 [pre_auto_parallel]: 2.959e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.49003e-06 [optimize]: 0.0042002, [53] [py_interpret_to_execute]: 2.116e-05 [rewriter_before_opt_a]: 5.889e-05 [opt_a]: 0.00228464, [2] [Cycle 1]: 0.00167152, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.312e-05 [loop_unroll]: 2.107e-05 [a_1]: 0.0005122 [with_stream_mark]: 1.321e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.667e-05 [accelerated_algorithm]: 6.62002e-06 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 6.20997e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.423e-05 [flash_sp]: 7.13e-06 [merge_comm]: 4.3e-06 [allreduce_fusion]: 3.23998e-06 [matmul_add_comm_reduction]: 1.001e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 
5.91e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.147e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.18998e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 8.99998e-06 [renormalize]: 0.00049362 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 2.39999e-06 [auto_monad_eliminator]: 1.467e-05 [cse]: 2.955e-05 [a_3]: 4.201e-05 [Cycle 2]: 0.00060295, [45] [expand_dump_flag]: 1.20999e-06 [switch_simplify]: 7.80998e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012909 [with_stream_mark]: 9.71e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 6.724e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.87e-06 [auto_parallel]: 5.84999e-06 [parallel]: 4.63001e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 6.17999e-06 [allreduce_slice_to_reducescatter]: 4.20026e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.58998e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 8.99978e-07 [before_grad]: 8.42e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.32e-06 [after_resolve]: 9.96e-06 [a_after_grad]: 7.97e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.58998e-06 [cse]: 1.647e-05 [a_3]: 3.195e-05 [py_interpret_to_execute_after_opt_a]: 8.19002e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 3.016e-05 [convert_after_rewriter]: 6.55997e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00049226 [opt_b]: 0.00018808, [1] [Cycle 1]: 0.00018097, [7] [b_1]: 0.00010959 [b_2]: 7.61001e-06 [updatestate_depend_eliminate]: 5.89999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 7.7e-07 [cse]: 1.781e-05 [optimize_parallel_all_gather_comm]: 1.671e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.468e-05 [loop_unroll]: 0.00042314 [opt_after_cconv]: 9.731e-05, [1] [Cycle 1]: 9.085e-05, [7] [c_1]: 2.766e-05 [parameter_eliminate]: 2.67001e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.762e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.186e-05 [tuple_transform]: 7.034e-05, [1] [Cycle 1]: 6.607e-05, [4] [d_1]: 4.065e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 5.195e-05 [cse_after_recomputation]: 2.01e-05, [1] [Cycle 1]: 1.562e-05, [1] [cse]: 1.059e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.04003e-06 [full_micro_interleaved_order_control]: 2.68e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 
9.00007e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.81e-06 [overlap_recompute_comm]: 1.91e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.869e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 1.26002e-06 [symbol_engine_optimizer]: 7.021e-05, [1] [Cycle 1]: 6.608e-05, [6] [build]: 2.88e-06 [elim_shapecalc]: 8.90999e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 6.51e-06 [fold_const_symbol]: 8.81002e-06 [renormalize]: 2.60014e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 7.391e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 0.00014153 [opt_after_jit_grad]: 0.00048227 [validate]: 3.305e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.00677737 [execute]: 7.49002e-06 Sums bootstrap : 0.000562s : 3.03% type_inference : 0.006891s : 37.11% event_method : 0.000058s : 0.31% auto_monad : 0.000067s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.32% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000641s : 3.45% optimize.opt_a.with_stream_mark : 0.000023s : 0.12% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.77% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000494s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000046s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000492s : 2.65% optimize.opt_b.b_1 : 0.000110s : 0.59% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.13% optimize.loop_unroll : 0.000423s : 2.28% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.06% optimize.tuple_transform.d_1 : 0.000041s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000074s : 0.40% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000142s : 0.76% opt_after_jit_grad : 0.000482s : 2.60% validate : 0.000033s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006777s : 36.49% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000224 30 11.68% : 0.000026s : 5: substitution.arithmetic_simplify 0.79% : 0.000002s : 2: substitution.elim_not_effective 0.55% : 0.000001s : 2: substitution.fold_const_symbol 2.52% : 0.000006s : 4: substitution.graph_param_transform 74.58% : 0.000167s : 3: substitution.inline 1.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 1.88% : 0.000004s : 4: substitution.remove_not_recompute_node 1.94% : 0.000004s : 4: substitution.replace_old_param 4.82% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006835 2 90.86% : 0.006211s : 1: type_inference.infer 9.14% : 0.000624s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.95% : 0.000028s : 3: replace.inline 29.05% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000174 5 94.40% : 0.000165s : 3: match.inline 5.60% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.95% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.91% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000009s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.39% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000002s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.10% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.83% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000401 8 46.03% : 0.000184s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.97% : 0.000216s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032525 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.92% : 0.003552s : 1: add_attr 10.88% : 0.003540s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000073s : 1: auto_monad 0.24% : 0.000078s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.85% : 0.000602s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.20% : 0.000065s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.33% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000502s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.11% : 0.001011s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.03% : 0.002287s : 1: opt_a 0.31% : 0.000101s : 1: opt_after_cconv 1.52% : 0.000493s : 1: opt_after_jit_grad 0.59% : 0.000192s : 1: opt_b 12.93% : 0.004204s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.81% : 0.000262s : 1: renormalize.infer 0.69% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000147s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.19% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000073s : 1: symbol_engine_optimizer 20.87% : 0.006789s : 1: task_emit 0.22% : 0.000073s : 1: 
tuple_transform 21.24% : 0.006908s : 1: type_inference 0.21% : 0.000067s : 1: validate TotalTime = 0.019851, [24] [bootstrap]: 0.00048154 [type_inference]: 0.00461776 [event_method]: 1.121e-05 [auto_monad]: 5.223e-05 [graph_reusing]: 5.15999e-06 [inline]: 2.06e-06 [add_attr]: 0.00320748, [1] [add_attr_with_inline]: 0.00319834, [1] [Cycle 1]: 5.154e-05, [2] [tag_attr]: 1.31e-05 [meta_addattr_fg_expand]: 3.41999e-06 [parallel-infer-symbol]: 3.58e-06 [pre_auto_parallel]: 2.405e-05 [insert-virtual-dataset]: 2.67001e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.0042596, [53] [py_interpret_to_execute]: 1.634e-05 [rewriter_before_opt_a]: 3.884e-05 [opt_a]: 0.00230312, [2] [Cycle 1]: 0.0016733, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 2.427e-05 [loop_unroll]: 1.341e-05 [a_1]: 0.00029741 [with_stream_mark]: 1.461e-05 [recompute_prepare]: 8.17998e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.56001e-06 [updatestate_loads_eliminate]: 3.41001e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.761e-05 [accelerated_algorithm]: 7.11999e-06 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 6.36e-06 [merge_send_recv]: 8.00999e-06 [auto_parallel]: 6.45997e-06 [parallel]: 1.866e-05 [flash_sp]: 8.36002e-06 [merge_comm]: 3.55998e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 1.001e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 8.17e-06 [virtual_dataset]: 5.75001e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 6.07001e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 9.47999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.182e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 9.72999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 2.48998e-06 
[after_resolve]: 1.099e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00071419 [add_forward_monad_depend]: 5.20999e-06 [auto_monad_grad]: 2.34999e-06 [auto_monad_eliminator]: 1.622e-05 [cse]: 2.686e-05 [a_3]: 4.32e-05 [Cycle 2]: 0.00061983, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 7.14001e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00012715 [with_stream_mark]: 1.03e-05 [recompute_prepare]: 5.69999e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 6.819e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.34998e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 4.80999e-06 [auto_parallel]: 5.83002e-06 [parallel]: 5.31002e-06 [flash_sp]: 3.75e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 6.61999e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 6.44001e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.86e-06 [cell_reuse_recompute_pass]: 1.74e-06 [offload_activation]: 7.66001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79999e-06 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 8.47e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.47999e-06 [after_resolve]: 9.92999e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.42999e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 8.40999e-06 [cse]: 1.488e-05 [a_3]: 3.232e-05 [py_interpret_to_execute_after_opt_a]: 8.82999e-06 [slice_cell_reuse_recomputed_activation]: 1.88002e-06 [rewriter_after_opt_a]: 3.417e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 4.85999e-06 [mutable_eliminate]: 0.00051406 [opt_b]: 0.00019071, [1] [Cycle 1]: 0.00018248, [7] 
[b_1]: 0.00010742 [b_2]: 8.01001e-06 [updatestate_depend_eliminate]: 6.87002e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.71e-06 [renormalize]: 2.50002e-07 [cse]: 1.928e-05 [optimize_parallel_all_gather_comm]: 1.836e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.354e-05 [loop_unroll]: 0.00045225 [opt_after_cconv]: 9.924e-05, [1] [Cycle 1]: 9.251e-05, [7] [c_1]: 2.815e-05 [parameter_eliminate]: 2.51e-06 [updatestate_depend_eliminate]: 5.74999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [cse]: 1.763e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.286e-05 [tuple_transform]: 7.152e-05, [1] [Cycle 1]: 6.707e-05, [4] [d_1]: 4.059e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 4.686e-05 [cse_after_recomputation]: 2.086e-05, [1] [Cycle 1]: 1.63e-05, [1] [cse]: 1.085e-05 [environ_conv]: 5.54e-06 [swap_dp_allreduce_reducescatter]: 5.52001e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 5.12e-06 [label_fine_grained_interleaved_index]: 2.80997e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.77002e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.838e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 7.161e-05, [1] [Cycle 1]: 6.739e-05, [6] [build]: 2.98e-06 [elim_shapecalc]: 8.87e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.49001e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.611e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 4.20999e-06 [opt_after_jit_grad]: 0.00050074 [validate]: 3.442e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00639639 [execute]: 6.59001e-06 Sums bootstrap : 0.000482s : 3.08% type_inference : 0.004618s : 29.56% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.25% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.20% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000425s : 2.72% optimize.opt_a.with_stream_mark : 0.000025s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000714s : 4.57% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.16% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000076s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000514s : 3.29% optimize.opt_b.b_1 : 0.000107s : 0.69% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000452s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000501s : 3.21% validate : 0.000034s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006396s : 40.95% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000127 26 18.71% : 0.000024s : 4: substitution.arithmetic_simplify 1.59% : 0.000002s : 2: substitution.elim_not_effective 1.14% : 0.000001s : 2: substitution.fold_const_symbol 4.14% : 0.000005s : 4: substitution.graph_param_transform 65.19% : 0.000083s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.50% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004573 2 91.80% : 0.004197s : 1: type_inference.infer 8.20% : 0.000375s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000141 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.82% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.74% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.32% : 0.000003s : 26: predicate.load_eliminater 1.77% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.71% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.63% : 0.000001s : 4: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.25% : 0.000002s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.24% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.40% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000562 6 19.07% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 80.93% : 0.000455s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028980 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.08% : 0.003212s : 1: add_attr 11.05% : 0.003202s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.79% : 0.000519s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.60% : 0.000463s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.81% : 0.000525s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.71% : 0.000785s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.96% : 0.002306s : 1: opt_a 0.35% : 0.000102s : 1: opt_after_cconv 1.77% : 0.000512s : 1: opt_after_jit_grad 0.67% : 0.000194s : 1: opt_b 14.71% : 0.004264s : 1: optimize 0.08% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.76% : 0.000220s : 1: renormalize.infer 1.68% : 0.000486s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.15% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000074s : 1: symbol_engine_optimizer 22.12% : 0.006409s : 1: task_emit 0.26% : 0.000074s : 1: 
tuple_transform 16.00% : 0.004637s : 1: type_inference 0.23% : 0.000066s : 1: validate TotalTime = 0.0208799, [24] [bootstrap]: 0.00045443 [type_inference]: 0.00569272 [event_method]: 1.427e-05 [auto_monad]: 5.583e-05 [graph_reusing]: 5.42001e-06 [inline]: 1.81998e-06 [add_attr]: 0.00310124, [1] [add_attr_with_inline]: 0.00309238, [1] [Cycle 1]: 5.195e-05, [2] [tag_attr]: 1.613e-05 [meta_addattr_fg_expand]: 4.63001e-06 [parallel-infer-symbol]: 3.2e-06 [pre_auto_parallel]: 2.824e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00454311, [53] [py_interpret_to_execute]: 2.097e-05 [rewriter_before_opt_a]: 5.916e-05 [opt_a]: 0.00264043, [2] [Cycle 1]: 0.00157607, [45] [expand_dump_flag]: 2.44999e-06 [switch_simplify]: 3.249e-05 [loop_unroll]: 2.103e-05 [a_1]: 0.00045746 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 7.98999e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.702e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 8.02998e-06 [auto_parallel]: 5.96998e-06 [parallel]: 1.854e-05 [flash_sp]: 8.47e-06 [merge_comm]: 3.80998e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 1.027e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.58001e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.66e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.133e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 1.019e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 2.20002e-06 
[after_resolve]: 1.024e-05 [a_after_grad]: 8.91002e-06 [renormalize]: 0.00046229 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 2.54001e-06 [auto_monad_eliminator]: 1.47e-05 [cse]: 2.701e-05 [a_3]: 4.149e-05 [Cycle 2]: 0.00105394, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 7.43e-06 [loop_unroll]: 0.00041608 [a_1]: 0.000137 [with_stream_mark]: 1.423e-05 [recompute_prepare]: 6.93e-06 [updatestate_depend_eliminate]: 3.13998e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.25999e-06 [a_2]: 6.93e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 5.22e-06 [auto_parallel]: 6.01e-06 [parallel]: 5.05999e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.75997e-06 [matmul_add_comm_reduction]: 6.78e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 6.13002e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.30001e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.57999e-06 [offload_activation]: 6.69999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.97999e-06 [merge_recompute_call_nodes]: 8.59989e-07 [before_grad]: 8.32998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00002e-06 [meta_fg_expand]: 1.61002e-06 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.53002e-06 [after_resolve]: 9.47001e-06 [a_after_grad]: 8.16002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 1.46002e-06 [auto_monad_eliminator]: 8.84e-06 [cse]: 1.627e-05 [a_3]: 3.286e-05 [py_interpret_to_execute_after_opt_a]: 9.01998e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.129e-05 [convert_after_rewriter]: 7.14001e-06 [order_py_execute_after_rewriter]: 4.80001e-06 [mutable_eliminate]: 0.00049474 [opt_b]: 0.00018517, [1] [Cycle 1]: 0.00017772, [7] 
[b_1]: 0.00010702 [b_2]: 6.61e-06 [updatestate_depend_eliminate]: 5.69e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.14e-06 [renormalize]: 4.19997e-07 [cse]: 1.863e-05 [optimize_parallel_all_gather_comm]: 1.722e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.435e-05 [loop_unroll]: 0.00042502 [opt_after_cconv]: 9.659e-05, [1] [Cycle 1]: 9.032e-05, [7] [c_1]: 2.749e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.36998e-06 [cse]: 1.642e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.263e-05 [tuple_transform]: 6.851e-05, [1] [Cycle 1]: 6.423e-05, [4] [d_1]: 3.903e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.95002e-06 [partial_unused_args_eliminate]: 2.27999e-06 [add_recomputation]: 4.46e-05 [cse_after_recomputation]: 2.028e-05, [1] [Cycle 1]: 1.571e-05, [1] [cse]: 1.058e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 5.16998e-06 [bias_add_comm_swap]: 2.22001e-06 [label_micro_interleaved_index]: 4.45999e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.46998e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.39003e-06 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.11998e-06 [overlap_grad_ring_attention]: 3.88999e-06 [overlap_grad_flash_sp]: 1.809e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.96e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 6.778e-05, [1] [Cycle 1]: 6.349e-05, [6] [build]: 3.14999e-06 [elim_shapecalc]: 7.95998e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 1.5e-05 [get_jit_bprop_graph]: 9.29984e-07 [rewriter_after_jit_bprop_graph]: 3.93001e-06 [opt_after_jit_grad]: 0.0004937 [validate]: 3.277e-05 [backend_pass]: 1.04003e-06 [task_emit]: 0.00620937 [execute]: 7.45998e-06 Sums bootstrap : 0.000454s : 2.71% type_inference : 0.005693s : 33.91% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000437s : 2.60% optimize.opt_a.a_1 : 0.000594s : 3.54% optimize.opt_a.with_stream_mark : 0.000028s : 0.17% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000462s : 2.75% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.14% optimize.opt_a.cse : 0.000043s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000495s : 2.95% optimize.opt_b.b_1 : 0.000107s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000425s : 2.53% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000494s : 2.94% validate : 0.000033s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006209s : 36.99% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000174 30 15.67% : 0.000027s : 5: substitution.arithmetic_simplify 0.97% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.20% : 0.000006s : 4: substitution.graph_param_transform 66.56% : 0.000116s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.46% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.25% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005648 2 90.15% : 0.005091s : 1: type_inference.infer 9.85% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.27% : 0.000028s : 3: replace.inline 29.73% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 92.07% : 0.000114s : 3: match.inline 7.93% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.40% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.90% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.89% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 46.01% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.99% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030517 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.18% : 0.003106s : 1: add_attr 10.15% : 0.003096s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.61% : 0.000492s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.65% : 0.000504s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 4.51% : 0.001377s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 8.66% : 0.002644s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.65% : 0.000504s : 1: opt_after_jit_grad 0.62% : 0.000188s : 1: opt_b 14.90% : 0.004547s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.79% : 0.000241s : 1: renormalize.infer 0.70% : 0.000214s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000070s : 1: symbol_engine_optimizer 20.38% : 0.006220s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 18.71% : 0.005710s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0385187, [24] [bootstrap]: 0.00050809 [type_inference]: 0.0121378 [event_method]: 4.696e-05 [auto_monad]: 0.00015603 [graph_reusing]: 8.27998e-06 [inline]: 1.92001e-06 [add_attr]: 0.00309417, [1] [add_attr_with_inline]: 0.00308521, [1] [Cycle 1]: 7.69e-05, [2] [tag_attr]: 3.61e-05 [meta_addattr_fg_expand]: 9.10001e-06 [parallel-infer-symbol]: 3.68e-06 [pre_auto_parallel]: 5.192e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0140662, [53] [py_interpret_to_execute]: 3.902e-05 [rewriter_before_opt_a]: 0.00014666 [opt_a]: 0.0115224, [3] [Cycle 1]: 0.00766155, [45] [expand_dump_flag]: 3.96001e-06 [switch_simplify]: 7.375e-05 [loop_unroll]: 6.274e-05 [a_1]: 0.00178643 [with_stream_mark]: 2.731e-05 [recompute_prepare]: 2.342e-05 [updatestate_depend_eliminate]: 9.54e-06 [updatestate_assign_eliminate]: 7.87e-06 [updatestate_loads_eliminate]: 7.16999e-06 [parameter_eliminate]: 3.04001e-06 [a_2]: 0.00024468 [accelerated_algorithm]: 3.306e-05 [shard]: 2.03002e-06 [meta_shard_fg_expand]: 3.61999e-06 [shard_inline]: 1.613e-05 [merge_send_recv]: 1.615e-05 [auto_parallel]: 1.152e-05 [parallel]: 1.853e-05 [flash_sp]: 1.264e-05 [merge_comm]: 9.62999e-06 [allreduce_fusion]: 8.80001e-06 [matmul_add_comm_reduction]: 2.93e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.55e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.536e-05 [merge_forward]: 9.51e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.811e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.921e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.744e-05 [set_forward_comm_id_for_comm_node_pass]: 1.006e-05 [meta_fg_expand]: 0.0015009 [flash_sp_send_recv_attached]: 4.12e-06 [receive_attached]: 2.55002e-06 [after_resolve]: 
6.36e-05 [a_after_grad]: 8.396e-05 [renormalize]: 0.00255895 [add_forward_monad_depend]: 1.056e-05 [auto_monad_grad]: 6.26e-06 [auto_monad_eliminator]: 5.53e-05 [cse]: 0.00016209 [a_3]: 0.00033571 [Cycle 2]: 0.00302936, [45] [expand_dump_flag]: 2.18998e-06 [switch_simplify]: 4.611e-05 [loop_unroll]: 4.336e-05 [a_1]: 0.00158717 [with_stream_mark]: 1.479e-05 [recompute_prepare]: 1.146e-05 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 3.21001e-06 [parameter_eliminate]: 1.38002e-06 [a_2]: 0.00011032 [accelerated_algorithm]: 1.118e-05 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 8.1e-06 [merge_send_recv]: 7.04001e-06 [auto_parallel]: 7.69002e-06 [parallel]: 6.20002e-06 [flash_sp]: 3.53999e-06 [merge_comm]: 4.41002e-06 [allreduce_fusion]: 4.22e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 9.44998e-06 [virtual_dataset]: 7.68001e-06 [get_grad_eliminate_]: 7.39002e-06 [virtual_output]: 7.43e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 9.29998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.559e-05 [merge_recompute_call_nodes]: 1.04e-06 [before_grad]: 1.276e-05 [set_forward_comm_id_for_comm_node_pass]: 4.63999e-06 [meta_fg_expand]: 8.633e-05 [flash_sp_send_recv_attached]: 1.52001e-06 [receive_attached]: 1.72999e-06 [after_resolve]: 1.544e-05 [a_after_grad]: 1.303e-05 [renormalize]: 0.00058196 [add_forward_monad_depend]: 4.24002e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.384e-05 [cse]: 2.525e-05 [a_3]: 5.823e-05 [Cycle 3]: 0.00081534, [45] [expand_dump_flag]: 1.40001e-06 [switch_simplify]: 9.81e-06 [loop_unroll]: 7.92e-06 [a_1]: 0.00021451 [with_stream_mark]: 9.24e-06 [recompute_prepare]: 8.32e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.61999e-06 [updatestate_loads_eliminate]: 3.61001e-06 
[parameter_eliminate]: 1.07e-06 [a_2]: 0.00010712 [accelerated_algorithm]: 1.047e-05 [shard]: 1.41002e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 8.32e-06 [merge_send_recv]: 6.56e-06 [auto_parallel]: 6.59999e-06 [parallel]: 5.00001e-06 [flash_sp]: 1.34998e-06 [merge_comm]: 4.39002e-06 [allreduce_fusion]: 4.03999e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 5.09986e-07 [virtual_shard_identity]: 8.87e-06 [virtual_dataset]: 7.61999e-06 [get_grad_eliminate_]: 7.55e-06 [virtual_output]: 7.27002e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 8.03999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.385e-05 [merge_recompute_call_nodes]: 8.89995e-07 [before_grad]: 1.219e-05 [set_forward_comm_id_for_comm_node_pass]: 4.45e-06 [meta_fg_expand]: 2.58003e-06 [flash_sp_send_recv_attached]: 9.40025e-07 [receive_attached]: 1.29e-06 [after_resolve]: 1.307e-05 [a_after_grad]: 1.222e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 9.97999e-06 [cse]: 2.055e-05 [a_3]: 4.933e-05 [py_interpret_to_execute_after_opt_a]: 1.1e-05 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 4.533e-05 [convert_after_rewriter]: 8.99998e-06 [order_py_execute_after_rewriter]: 6.35002e-06 [mutable_eliminate]: 0.00051762 [opt_b]: 0.00027288, [1] [Cycle 1]: 0.00026558, [7] [b_1]: 0.00017747 [b_2]: 1.029e-05 [updatestate_depend_eliminate]: 8.75001e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.41001e-06 [renormalize]: 4.00003e-07 [cse]: 2.618e-05 [optimize_parallel_all_gather_comm]: 1.977e-05 [overlap_param_gather]: 1.93002e-06 [cconv]: 2.184e-05 [loop_unroll]: 0.00068014 [opt_after_cconv]: 0.00012623, [1] [Cycle 1]: 0.00011972, [7] [c_1]: 4.333e-05 [parameter_eliminate]: 2.57001e-06 [updatestate_depend_eliminate]: 6.76e-06 [updatestate_assign_eliminate]: 3.64002e-06 
[updatestate_loads_eliminate]: 3.33e-06 [cse]: 2.487e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.863e-05 [tuple_transform]: 9.555e-05, [1] [Cycle 1]: 9.032e-05, [4] [d_1]: 6.065e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.16002e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 5.691e-05 [cse_after_recomputation]: 2.8e-05, [1] [Cycle 1]: 2.33e-05, [1] [cse]: 1.786e-05 [environ_conv]: 9.49999e-06 [swap_dp_allreduce_reducescatter]: 7.28999e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 5.41998e-06 [label_fine_grained_interleaved_index]: 2.43e-06 [merge_cast_opt]: 1.49998e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 7.50006e-07 [full_micro_interleaved_order_control]: 2.53003e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.22999e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.506e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 4.38001e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.82e-06 [overlap_grad_flash_sp]: 2.378e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.40002e-06 [split_layernorm_comm]: 2.34001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.249e-05, [1] [Cycle 1]: 8.806e-05, [6] [build]: 1.083e-05 [elim_shapecalc]: 1.192e-05 [elim_not_effective]: 1.576e-05 [opt_reshape]: 8.55999e-06 [fold_const_symbol]: 1.305e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 2.201e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 3.95998e-06 [opt_after_jit_grad]: 0.00049575 [validate]: 4.316e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00763253 [execute]: 7.13998e-06 Sums bootstrap : 0.000508s : 1.49% type_inference : 0.012138s : 35.60% event_method : 0.000047s : 0.14% auto_monad : 0.000156s : 0.46% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000052s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000147s : 0.43% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.38% optimize.opt_a.loop_unroll : 0.000114s : 0.33% optimize.opt_a.a_1 : 0.003588s : 10.52% optimize.opt_a.with_stream_mark : 0.000051s : 0.15% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000462s : 1.36% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.16% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000033s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000030s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm 
: 0.000018s : 0.05% optimize.opt_a.allreduce_fusion : 0.000017s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.11% optimize.opt_a.virtual_dataset : 0.000031s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.09% optimize.opt_a.virtual_output : 0.000030s : 0.09% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.17% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.06% optimize.opt_a.meta_fg_expand : 0.001590s : 4.66% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000092s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.32% optimize.opt_a.renormalize : 0.003141s : 9.21% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.23% optimize.opt_a.cse : 0.000208s : 0.61% optimize.opt_a.a_3 : 0.000443s : 1.30% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000045s : 0.13% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000518s : 1.52% optimize.opt_b.b_1 : 0.000177s : 0.52% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000026s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.06% optimize.loop_unroll : 0.000680s : 1.99% optimize.opt_after_cconv.c_1 : 0.000043s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000025s : 0.07% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.08% optimize.tuple_transform.d_1 : 0.000061s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000018s : 0.05% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000496s : 1.45% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.007633s : 22.38% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000887 213 5.78% : 0.000051s : 12: substitution.arithmetic_simplify 0.29% : 0.000003s : 4: substitution.elim_not_effective 0.45% : 0.000004s : 5: substitution.float_depend_g_call 0.47% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.21% : 0.000002s : 4: substitution.fold_const_symbol 0.79% : 0.000007s : 7: substitution.graph_param_transform 0.29% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.17% : 0.000489s : 17: substitution.inline 2.02% : 0.000018s : 2: substitution.inline_without_move 1.06% : 0.000009s : 18: substitution.j_node_and_user_rematch 1.88% : 0.000017s : 3: substitution.less_batch_normalization 7.75% : 0.000069s : 11: substitution.minmaximum_grad 0.65% : 0.000006s : 5: substitution.partial_eliminate 1.51% : 0.000013s : 18: substitution.remove_not_recompute_node 2.94% : 0.000026s : 10: substitution.replace_applicator 1.31% : 0.000012s : 15: substitution.replace_old_param 0.27% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.25% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.57% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.08% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.94% : 0.000070s : 30: substitution.tuple_list_get_item_eliminator 2.10% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012055 2 87.53% : 0.010552s : 1: type_inference.infer 12.47% : 0.001504s : 1: type_inference.specialize ------[replace.] 0.000231 33 59.27% : 0.000137s : 17: replace.inline 40.73% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000517 33 92.95% : 0.000481s : 17: match.inline 7.05% : 0.000036s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000986 5530 0.80% : 0.000008s : 66: predicate.accumulaten_eliminater 0.20% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.37% : 0.000004s : 30: predicate.addn_check_dump 0.78% : 0.000008s : 66: predicate.addn_zero_filter 0.79% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 1.48% : 0.000015s : 96: predicate.arithmetic_simplify 0.82% : 0.000008s : 66: predicate.cast_eliminate 0.84% : 0.000008s : 65: predicate.check_bprop_eliminate 0.37% : 0.000004s : 30: predicate.compare_switch_simplify 0.07% : 0.000001s : 7: predicate.const_output_eliminate 0.36% : 0.000004s : 30: predicate.depend_value_elim 0.88% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 0.87% : 0.000009s : 66: predicate.dict_get_item_eliminator 0.83% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.30% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.06% : 0.000001s : 7: predicate.elim_not_effective 0.10% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 0.92% : 0.000009s : 73: predicate.environ_add_const_eliminate 0.88% : 0.000009s : 73: predicate.environ_get_add_eliminate 0.89% : 0.000009s : 73: predicate.environ_get_depend_swap 1.29% : 0.000013s : 103: predicate.environ_get_eliminate 0.86% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.29% : 0.000013s : 99: predicate.exchange_switch_depend_value 1.78% : 0.000018s : 99: predicate.float_depend_g_call 0.37% : 0.000004s : 30: predicate.float_environ_get_switch 0.46% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.06% : 0.000001s : 7: predicate.fold_const_symbol 0.39% : 0.000004s : 30: predicate.get_grad_eliminate 0.06% : 0.000001s : 7: predicate.graph_param_transform 0.38% : 0.000004s : 30: predicate.incorporate_call 0.35% : 0.000003s : 30: predicate.incorporate_call_switch 4.14% : 0.000041s : 239: predicate.inline 0.93% : 0.000009s : 53: predicate.inline_without_move 0.21% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.45% : 0.000004s : 30: predicate.less_batch_normalization 1.20% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 1.96% : 0.000019s : 162: predicate.load_eliminater 0.30% : 0.000003s : 7: predicate.loop_unroll_after_grad 1.72% : 0.000017s : 134: predicate.loop_unroll_before_grad 0.99% : 0.000010s : 80: predicate.make_slice_get_slice_eliminator 0.39% : 0.000004s : 30: predicate.merge_addn 0.80% : 0.000008s : 65: predicate.micro_step_allgather_replace 0.82% : 0.000008s : 65: predicate.mini_step_allgather_replace 0.83% : 0.000008s : 66: predicate.minmaximum_grad 0.29% : 0.000003s : 7: predicate.mutable_eliminate 0.10% : 0.000001s : 7: predicate.opt_reshape 0.11% : 0.000001s : 7: predicate.parallel_virtual_node 1.62% : 0.000016s : 99: predicate.partial_defer_inline 1.29% : 0.000013s : 89: predicate.partial_eliminate 0.78% : 0.000008s : 66: predicate.print_const_string_wrapper 0.38% : 0.000004s : 30: predicate.reduce_all_const_elim 0.98% : 0.000010s : 66: predicate.reduce_eliminate 1.97% : 0.000019s : 162: predicate.redundant_stop_gradient_eliminater 0.24% : 0.000002s : 30: predicate.remove_not_recompute_node 1.48% : 0.000015s : 147: predicate.replace_applicator 0.51% : 0.000005s : 53: predicate.replace_old_param 0.08% : 0.000001s : 7: predicate.reset_defer_inline 0.79% : 0.000008s : 66: predicate.reshape_eliminate 0.84% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.10% : 0.000001s : 7: predicate.row_tensor_eliminate 0.94% : 0.000009s : 65: predicate.same_eliminate 0.26% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.47% : 0.000005s : 30: predicate.shard_identity_eliminate 0.20% : 0.000002s : 14: predicate.special_op_eliminate 0.45% : 0.000004s : 30: predicate.specialize_transform 0.97% : 0.000010s : 65: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.10% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.38% : 0.000014s : 99: predicate.switch_defer_inline 28.35% : 0.000280s : 164: predicate.switch_layer_defer_inline 3.75% : 0.000037s : 
270: predicate.switch_simplify 0.78% : 0.000008s : 66: predicate.tile_eliminate 0.79% : 0.000008s : 66: predicate.transpose_eliminate 1.06% : 0.000010s : 80: predicate.tuple_list_convert_item_index_to_positive 1.14% : 0.000011s : 80: predicate.tuple_list_get_item_const_eliminator 0.98% : 0.000010s : 80: predicate.tuple_list_get_item_depend_reorder 2.11% : 0.000021s : 126: predicate.tuple_list_get_item_eliminator 1.09% : 0.000011s : 80: predicate.tuple_list_get_set_item_eliminator 1.47% : 0.000014s : 110: predicate.tuple_list_set_item_eliminator 1.17% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 1.95% : 0.000019s : 162: predicate.updatestate_pure_node_eliminater 2.39% : 0.000024s : 192: predicate.updatestate_useless_node_eliminater 0.10% : 0.000001s : 7: predicate.value_based_eliminate 0.40% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.40% : 0.000004s : 30: predicate.virtual_output_eliminate 0.10% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.11% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001613 34 57.40% : 0.000926s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.60% : 0.000687s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064293 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.82% : 0.003099s : 1: add_attr 4.80% : 0.003089s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.26% : 0.000165s : 1: auto_monad 0.04% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.85% : 0.000545s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000031s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.08% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 1.07% : 0.000690s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000529s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.06% : 0.005185s : 117: opt.transform.opt_a 0.07% : 0.000042s : 1: opt.transform.opt_after_cconv 0.05% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000161s : 28: opt.transform.opt_b 0.11% : 0.000068s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000046s : 4: opt.transform.symbol_engine_opt 17.93% : 0.011526s : 1: opt_a 0.20% : 0.000130s : 1: opt_after_cconv 0.79% : 0.000507s : 1: opt_after_jit_grad 0.43% : 0.000276s : 1: opt_b 21.89% : 0.014071s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000057s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.63% : 0.001693s : 2: renormalize.infer 2.23% : 0.001433s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000152s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000095s : 1: symbol_engine_optimizer 11.89% : 0.007645s : 1: task_emit 0.15% : 0.000098s : 1: 
tuple_transform 18.91% : 0.012157s : 1: type_inference 0.12% : 0.000079s : 1: validate TotalTime = 0.0192211, [24] [bootstrap]: 0.00075581 [type_inference]: 0.00446152 [event_method]: 1.078e-05 [auto_monad]: 5.04e-05 [graph_reusing]: 5.51002e-06 [inline]: 2.00002e-06 [add_attr]: 0.00309504, [1] [add_attr_with_inline]: 0.00308682, [1] [Cycle 1]: 5.135e-05, [2] [tag_attr]: 1.289e-05 [meta_addattr_fg_expand]: 3.48e-06 [parallel-infer-symbol]: 3.23e-06 [pre_auto_parallel]: 2.32e-05 [insert-virtual-dataset]: 2.21998e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00384524, [53] [py_interpret_to_execute]: 1.658e-05 [rewriter_before_opt_a]: 3.851e-05 [opt_a]: 0.00191579, [2] [Cycle 1]: 0.00131061, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 2.417e-05 [loop_unroll]: 1.346e-05 [a_1]: 0.00029531 [with_stream_mark]: 1.37e-05 [recompute_prepare]: 7.64002e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.748e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 6.00002e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 7.11001e-06 [parallel]: 1.787e-05 [flash_sp]: 8.83001e-06 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.31999e-06 [virtual_dataset]: 5.84e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.37999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.128e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.76e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 1.97999e-06 [flash_sp_send_recv_attached]: 2.94999e-06 [receive_attached]: 2.29999e-06 
[after_resolve]: 1.015e-05 [a_after_grad]: 8.58001e-06 [renormalize]: 0.00038338 [add_forward_monad_depend]: 4.20999e-06 [auto_monad_grad]: 2.04999e-06 [auto_monad_eliminator]: 1.412e-05 [cse]: 2.695e-05 [a_3]: 4.183e-05 [Cycle 2]: 0.00059449, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 7.21001e-06 [loop_unroll]: 5.54998e-06 [a_1]: 0.00012466 [with_stream_mark]: 9.97001e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 6.766e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.20999e-06 [parallel]: 4.77998e-06 [flash_sp]: 3.76999e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.54e-06 [allreduce_slice_to_reducescatter]: 4.90021e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.56998e-06 [after_resolve]: 9.29e-06 [a_after_grad]: 7.87e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.309e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 7.31001e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.195e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 4.99998e-06 [mutable_eliminate]: 0.00048565 [opt_b]: 0.00018319, [1] [Cycle 1]: 0.00017651, [7] 
[b_1]: 0.00010784 [b_2]: 7.15998e-06 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.50002e-06 [renormalize]: 3.80009e-07 [cse]: 1.695e-05 [optimize_parallel_all_gather_comm]: 1.609e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.498e-05 [loop_unroll]: 0.00041546 [opt_after_cconv]: 9.501e-05, [1] [Cycle 1]: 8.923e-05, [7] [c_1]: 2.723e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.662e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.24e-05 [tuple_transform]: 6.997e-05, [1] [Cycle 1]: 6.536e-05, [4] [d_1]: 3.931e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 2.07001e-06 [add_recomputation]: 4.291e-05 [cse_after_recomputation]: 2.004e-05, [1] [Cycle 1]: 1.57e-05, [1] [cse]: 1.055e-05 [environ_conv]: 5.02e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 9.10019e-07 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.188e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.85e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 0.00014091, [1] [Cycle 1]: 0.00013693, [6] [build]: 2.70002e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 8.129e-05 [opt_reshape]: 6.45002e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.47001e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 4.05998e-06 [opt_after_jit_grad]: 0.00046261 [validate]: 3.246e-05 [backend_pass]: 1.19e-06 [task_emit]: 0.00623588 [execute]: 6.49999e-06 Sums bootstrap : 0.000756s : 4.99% type_inference : 0.004462s : 29.44% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.25% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000420s : 2.77% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.96% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000383s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000040s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000486s : 3.20% optimize.opt_b.b_1 : 0.000108s : 0.71% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.16% optimize.loop_unroll : 0.000415s : 2.74% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000081s : 0.54% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000463s : 3.05% validate : 0.000032s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006236s : 41.15% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000193 26 11.50% : 0.000022s : 4: substitution.arithmetic_simplify 36.28% : 0.000070s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 2.84% : 0.000005s : 4: substitution.graph_param_transform 42.89% : 0.000083s : 2: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.09% : 0.000004s : 4: substitution.remove_not_recompute_node 1.99% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004417 2 91.86% : 0.004057s : 1: type_inference.infer 8.14% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.47% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.60% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.82% : 0.000003s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.23% : 0.000002s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.15% : 0.000002s : 8: predicate.shard_identity_eliminate 1.02% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.96% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000255 6 40.29% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.71% : 0.000152s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027547 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.25% : 0.003099s : 1: add_attr 11.22% : 0.003090s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.87% : 0.000790s : 1: bootstrap 0.10% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.54% : 0.000425s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.80% : 0.000496s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.81% : 0.000773s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.37% : 0.000102s : 4: opt.transform.symbol_engine_opt 6.97% : 0.001919s : 1: opt_a 0.36% : 0.000099s : 1: opt_after_cconv 1.72% : 0.000473s : 1: opt_after_jit_grad 0.68% : 0.000187s : 1: opt_b 13.97% : 0.003849s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.78% : 0.000215s : 1: renormalize.infer 0.59% : 0.000162s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.15% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.52% : 0.000143s : 1: symbol_engine_optimizer 22.68% : 0.006247s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 16.25% : 0.004477s : 1: type_inference 0.21% : 0.000059s : 1: validate TotalTime = 0.0386939, [24] [bootstrap]: 0.00051468 [type_inference]: 0.0112239 [event_method]: 4.082e-05 [auto_monad]: 0.00011867 [graph_reusing]: 9.02e-06 [inline]: 2.22999e-06 [add_attr]: 0.0032186, [1] [add_attr_with_inline]: 0.00320892, [1] [Cycle 1]: 8.135e-05, [2] [tag_attr]: 3.582e-05 [meta_addattr_fg_expand]: 8.73001e-06 [parallel-infer-symbol]: 3.32002e-06 [pre_auto_parallel]: 4.881e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.0142502, [53] [py_interpret_to_execute]: 3.947e-05 [rewriter_before_opt_a]: 0.00013165 [opt_a]: 0.0117528, [3] [Cycle 1]: 0.00783169, [45] [expand_dump_flag]: 3.95998e-06 [switch_simplify]: 6.763e-05 [loop_unroll]: 5.569e-05 [a_1]: 0.00139228 [with_stream_mark]: 2.782e-05 [recompute_prepare]: 2.442e-05 [updatestate_depend_eliminate]: 9.51e-06 [updatestate_assign_eliminate]: 8.50001e-06 [updatestate_loads_eliminate]: 7.5e-06 [parameter_eliminate]: 2.69001e-06 [a_2]: 0.00024768 [accelerated_algorithm]: 3.304e-05 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 3.6e-06 [shard_inline]: 1.639e-05 [merge_send_recv]: 1.667e-05 [auto_parallel]: 1.248e-05 [parallel]: 1.939e-05 [flash_sp]: 1.194e-05 [merge_comm]: 1.006e-05 [allreduce_fusion]: 9.42001e-06 [matmul_add_comm_reduction]: 3.162e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.908e-05 [virtual_dataset]: 1.631e-05 [get_grad_eliminate_]: 1.552e-05 [virtual_output]: 1.546e-05 [merge_forward]: 9.89999e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 1.828e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.024e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.782e-05 [set_forward_comm_id_for_comm_node_pass]: 1.031e-05 [meta_fg_expand]: 0.0018025 [flash_sp_send_recv_attached]: 3.83001e-06 [receive_attached]: 2.27001e-06 
[after_resolve]: 6.376e-05 [a_after_grad]: 8.543e-05 [renormalize]: 0.0027901 [add_forward_monad_depend]: 1.119e-05 [auto_monad_grad]: 6.39999e-06 [auto_monad_eliminator]: 5.813e-05 [cse]: 0.00016497 [a_3]: 0.00033895 [Cycle 2]: 0.00302093, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 4.784e-05 [loop_unroll]: 4.479e-05 [a_1]: 0.00157258 [with_stream_mark]: 1.653e-05 [recompute_prepare]: 1.191e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 3.55998e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 0.00011196 [accelerated_algorithm]: 1.285e-05 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 2.51998e-06 [shard_inline]: 8.18001e-06 [merge_send_recv]: 8.95001e-06 [auto_parallel]: 1.028e-05 [parallel]: 6.88e-06 [flash_sp]: 3.77002e-06 [merge_comm]: 4.55999e-06 [allreduce_fusion]: 4.33999e-06 [matmul_add_comm_reduction]: 9.52999e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 1.04e-05 [virtual_dataset]: 7.77e-06 [get_grad_eliminate_]: 7.51001e-06 [virtual_output]: 7.73999e-06 [merge_forward]: 5.27001e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 1.102e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.488e-05 [merge_recompute_call_nodes]: 1.22e-06 [before_grad]: 1.316e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 4.306e-05 [flash_sp_send_recv_attached]: 1.44e-06 [receive_attached]: 1.97001e-06 [after_resolve]: 1.496e-05 [a_after_grad]: 1.244e-05 [renormalize]: 0.00059363 [add_forward_monad_depend]: 4.25999e-06 [auto_monad_grad]: 2.00002e-06 [auto_monad_eliminator]: 1.551e-05 [cse]: 2.81e-05 [a_3]: 5.883e-05 [Cycle 3]: 0.00088232, [45] [expand_dump_flag]: 1.42e-06 [switch_simplify]: 1.058e-05 [loop_unroll]: 7.79002e-06 [a_1]: 0.00025114 [with_stream_mark]: 1.079e-05 [recompute_prepare]: 8.27998e-06 [updatestate_depend_eliminate]: 4.32e-06 [updatestate_assign_eliminate]: 3.66999e-06 [updatestate_loads_eliminate]: 
3.63e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 0.00010875 [accelerated_algorithm]: 1.191e-05 [shard]: 1.59e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 8.52e-06 [merge_send_recv]: 7.4e-06 [auto_parallel]: 7.15e-06 [parallel]: 5.44998e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 5.14e-06 [allreduce_fusion]: 3.85998e-06 [matmul_add_comm_reduction]: 7.96001e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 9.25999e-06 [virtual_dataset]: 8.06001e-06 [get_grad_eliminate_]: 8.13001e-06 [virtual_output]: 7.35998e-06 [merge_forward]: 3.57997e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 8.1e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.503e-05 [merge_recompute_call_nodes]: 1.03001e-06 [before_grad]: 1.277e-05 [set_forward_comm_id_for_comm_node_pass]: 5.00001e-06 [meta_fg_expand]: 2.46998e-06 [flash_sp_send_recv_attached]: 9.49978e-07 [receive_attached]: 1.50999e-06 [after_resolve]: 1.293e-05 [a_after_grad]: 1.271e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.61998e-06 [auto_monad_grad]: 1.24998e-06 [auto_monad_eliminator]: 1.165e-05 [cse]: 2.402e-05 [a_3]: 4.978e-05 [py_interpret_to_execute_after_opt_a]: 1.375e-05 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 4.767e-05 [convert_after_rewriter]: 8.99e-06 [order_py_execute_after_rewriter]: 6.46e-06 [mutable_eliminate]: 0.00064859 [opt_b]: 0.00027184, [1] [Cycle 1]: 0.00026342, [7] [b_1]: 0.00016388 [b_2]: 1.581e-05 [updatestate_depend_eliminate]: 8.80001e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 3.86999e-06 [renormalize]: 7.09988e-07 [cse]: 2.848e-05 [optimize_parallel_all_gather_comm]: 2.051e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.47e-05 [loop_unroll]: 0.00047992 [opt_after_cconv]: 0.0001309, [1] [Cycle 1]: 0.00012352, [7] [c_1]: 4.281e-05 [parameter_eliminate]: 2.71e-06 [updatestate_depend_eliminate]: 8.52998e-06 [updatestate_assign_eliminate]: 3.86001e-06 
[updatestate_loads_eliminate]: 3.76001e-06 [cse]: 2.718e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 3.071e-05 [tuple_transform]: 9.659e-05, [1] [Cycle 1]: 9.123e-05, [4] [d_1]: 6.075e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 8.93002e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 5.922e-05 [cse_after_recomputation]: 2.948e-05, [1] [Cycle 1]: 2.437e-05, [1] [cse]: 1.798e-05 [environ_conv]: 8.73001e-06 [swap_dp_allreduce_reducescatter]: 6.69001e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 5.56e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.10999e-06 [slice_recompute_activation]: 2.62001e-06 [micro_interleaved_order_control]: 2.58998e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.35999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.47001e-06 [control_data_broadcast_order]: 1.644e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 4.77e-06 [overlap_recompute_and_grad_model_parallel]: 6.21998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.61999e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.66e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.802e-05, [1] [Cycle 1]: 9.301e-05, [6] [build]: 1.154e-05 [elim_shapecalc]: 1.415e-05 [elim_not_effective]: 1.583e-05 [opt_reshape]: 8.75001e-06 [fold_const_symbol]: 1.25e-05 [renormalize]: 1.80007e-07 
[detach_backward]: 2.09e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.29e-05 [get_jit_bprop_graph]: 1.67001e-06 [rewriter_after_jit_bprop_graph]: 4.47e-06 [opt_after_jit_grad]: 0.00051457 [validate]: 4.505e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00840889 [execute]: 7.9e-06 Sums bootstrap : 0.000515s : 1.51% type_inference : 0.011224s : 32.94% event_method : 0.000041s : 0.12% auto_monad : 0.000119s : 0.35% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000132s : 0.39% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000126s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.32% optimize.opt_a.a_1 : 0.003216s : 9.44% optimize.opt_a.with_stream_mark : 0.000055s : 0.16% optimize.opt_a.recompute_prepare : 0.000045s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000468s : 1.37% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.17% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000033s : 0.10% optimize.opt_a.merge_send_recv : 0.000033s : 0.10% optimize.opt_a.auto_parallel : 0.000030s : 0.09% optimize.opt_a.parallel : 0.000032s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.11% optimize.opt_a.virtual_dataset : 0.000032s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.09% optimize.opt_a.virtual_output : 0.000031s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000054s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001848s : 5.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000092s : 0.27% optimize.opt_a.a_after_grad : 0.000111s : 0.32% optimize.opt_a.renormalize : 0.003384s : 9.93% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.25% optimize.opt_a.cse : 0.000217s : 0.64% optimize.opt_a.a_3 : 0.000448s : 1.31% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000649s : 1.90% optimize.opt_b.b_1 : 0.000164s : 0.48% optimize.opt_b.b_2 : 0.000016s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000028s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.07% optimize.loop_unroll : 0.000480s : 1.41% optimize.opt_after_cconv.c_1 : 0.000043s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000027s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000061s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.17% optimize.cse_after_recomputation.cse : 0.000018s : 0.05% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000023s : 0.07% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000515s : 1.51% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008409s : 24.68% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000831 209 6.53% : 0.000054s : 11: substitution.arithmetic_simplify 0.30% : 0.000003s : 4: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 4: substitution.fold_const_symbol 0.93% : 0.000008s : 7: substitution.graph_param_transform 0.41% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.27% : 0.000476s : 16: substitution.inline 2.27% : 0.000019s : 2: substitution.inline_without_move 1.18% : 0.000010s : 18: substitution.j_node_and_user_rematch 2.10% : 0.000017s : 3: substitution.less_batch_normalization 1.62% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.56% : 0.000013s : 18: substitution.remove_not_recompute_node 3.32% : 0.000028s : 10: substitution.replace_applicator 1.37% : 0.000011s : 15: substitution.replace_old_param 0.41% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.67% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.64% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.52% : 0.000071s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011140 2 87.27% : 0.009722s : 1: type_inference.infer 12.73% : 0.001418s : 1: type_inference.specialize ------[replace.] 0.000225 30 60.65% : 0.000137s : 16: replace.inline 39.35% : 0.000089s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000504 30 92.91% : 0.000468s : 16: match.inline 7.09% : 0.000036s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000721 5429 1.08% : 0.000008s : 65: predicate.accumulaten_eliminater 0.27% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.08% : 0.000008s : 65: predicate.addn_zero_filter 1.05% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.11% : 0.000015s : 95: predicate.arithmetic_simplify 1.12% : 0.000008s : 65: predicate.cast_eliminate 1.13% : 0.000008s : 65: predicate.check_bprop_eliminate 0.50% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.50% : 0.000004s : 30: predicate.depend_value_elim 1.18% : 0.000009s : 65: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 65: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.48% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 7: predicate.elim_not_effective 0.22% : 0.000002s : 7: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.16% : 0.000008s : 72: predicate.environ_get_add_eliminate 1.17% : 0.000008s : 72: predicate.environ_get_depend_swap 1.71% : 0.000012s : 102: predicate.environ_get_eliminate 1.17% : 0.000008s : 72: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 95: predicate.exchange_switch_depend_value 2.25% : 0.000016s : 95: predicate.float_depend_g_call 0.49% : 0.000004s : 30: predicate.float_environ_get_switch 0.65% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.55% : 0.000004s : 30: predicate.get_grad_eliminate 0.08% : 0.000001s : 7: predicate.graph_param_transform 0.52% : 0.000004s : 30: predicate.incorporate_call 0.49% : 0.000004s : 30: predicate.incorporate_call_switch 5.66% : 0.000041s : 234: predicate.inline 1.30% : 0.000009s : 53: predicate.inline_without_move 0.31% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 30: predicate.less_batch_normalization 1.60% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.59% : 0.000019s : 158: predicate.load_eliminater 0.36% : 0.000003s : 7: predicate.loop_unroll_after_grad 2.29% : 0.000016s : 126: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 79: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 30: predicate.merge_addn 1.11% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 65: predicate.minmaximum_grad 0.44% : 0.000003s : 7: predicate.mutable_eliminate 0.15% : 0.000001s : 7: predicate.opt_reshape 0.16% : 0.000001s : 7: predicate.parallel_virtual_node 2.16% : 0.000016s : 95: predicate.partial_defer_inline 1.72% : 0.000012s : 86: predicate.partial_eliminate 1.07% : 0.000008s : 65: predicate.print_const_string_wrapper 0.51% : 0.000004s : 30: predicate.reduce_all_const_elim 1.29% : 0.000009s : 65: predicate.reduce_eliminate 2.65% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 30: predicate.remove_not_recompute_node 1.88% : 0.000014s : 144: predicate.replace_applicator 0.71% : 0.000005s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.06% : 0.000008s : 65: predicate.reshape_eliminate 1.14% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 7: predicate.row_tensor_eliminate 1.34% : 0.000010s : 65: predicate.same_eliminate 0.37% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 30: predicate.shard_identity_eliminate 0.26% : 0.000002s : 14: predicate.special_op_eliminate 0.61% : 0.000004s : 30: predicate.specialize_transform 1.41% : 0.000010s : 65: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 95: predicate.switch_defer_inline 2.87% : 0.000021s : 160: predicate.switch_layer_defer_inline 5.08% : 0.000037s : 258: 
predicate.switch_simplify 1.07% : 0.000008s : 65: predicate.tile_eliminate 1.07% : 0.000008s : 65: predicate.transpose_eliminate 1.43% : 0.000010s : 79: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 79: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 79: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000022s : 123: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 79: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000014s : 109: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.56% : 0.000018s : 158: predicate.updatestate_pure_node_eliminater 3.17% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 7: predicate.value_based_eliminate 0.58% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 30: predicate.virtual_output_eliminate 0.13% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001712 32 57.83% : 0.000990s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.17% : 0.000722s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064643 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.99% : 0.003223s : 1: add_attr 4.97% : 0.003213s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000127s : 1: auto_monad 0.04% : 0.000027s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.86% : 0.000555s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000032s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.76% : 0.000491s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.02% : 0.000660s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.46% : 0.004823s : 117: opt.transform.opt_a 0.06% : 0.000042s : 1: opt.transform.opt_after_cconv 0.05% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.23% : 0.000149s : 28: opt.transform.opt_b 0.10% : 0.000067s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000048s : 4: opt.transform.symbol_engine_opt 18.19% : 0.011756s : 1: opt_a 0.21% : 0.000135s : 1: opt_after_cconv 0.81% : 0.000527s : 1: opt_after_jit_grad 0.43% : 0.000275s : 1: opt_b 22.05% : 0.014255s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.03% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000035s : 1: remove_dup_value 2.67% : 0.001728s : 2: renormalize.infer 2.54% : 0.001639s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000053s : 1: rewriter_after_opt_a 0.21% : 0.000137s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.03% : 0.008426s : 1: task_emit 0.15% : 0.000100s : 1: 
tuple_transform 17.40% : 0.011247s : 1: type_inference 0.13% : 0.000083s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x1-kbk],max_mem:6.0M . TotalTime = 1.40671, [24] [bootstrap]: 0.00054792 [type_inference]: 0.00633495 [event_method]: 1.444e-05 [auto_monad]: 5.475e-05 [graph_reusing]: 5.57001e-06 [inline]: 1.69e-06 [add_attr]: 0.00349118, [1] [add_attr_with_inline]: 0.00347945, [1] [Cycle 1]: 5.166e-05, [2] [tag_attr]: 1.72e-05 [meta_addattr_fg_expand]: 4.33001e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 2.994e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.00412413, [53] [py_interpret_to_execute]: 2.166e-05 [rewriter_before_opt_a]: 5.943e-05 [opt_a]: 0.00221631, [2] [Cycle 1]: 0.00160876, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 3.303e-05 [loop_unroll]: 2.155e-05 [a_1]: 0.00048427 [with_stream_mark]: 1.391e-05 [recompute_prepare]: 8.11002e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.61998e-06 [a_2]: 7.556e-05 [accelerated_algorithm]: 6.36998e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 7.45e-06 [auto_parallel]: 6.17999e-06 [parallel]: 2.412e-05 [flash_sp]: 7.82e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 9.57001e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 6.19999e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.66e-06 [merge_forward]: 4.07e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 9.44998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 
[merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.29998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42997e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.91999e-06 [after_resolve]: 1.053e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 0.00046279 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.453e-05 [cse]: 2.79e-05 [a_3]: 4.078e-05 [Cycle 2]: 0.0005975, [45] [expand_dump_flag]: 1.30001e-06 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012705 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.851e-05 [accelerated_algorithm]: 5.56002e-06 [shard]: 1.26002e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.80999e-06 [auto_parallel]: 5.47001e-06 [parallel]: 4.12e-06 [flash_sp]: 3.69002e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.79999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.40001e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.37999e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 6.41e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.91e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.07998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.19998e-06 [after_resolve]: 9.74e-06 [a_after_grad]: 8.43001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.37e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 7.04001e-06 [cse]: 1.297e-05 [a_3]: 3.203e-05 [py_interpret_to_execute_after_opt_a]: 8.19002e-06 
[slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.2e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 4.63001e-06 [mutable_eliminate]: 0.00049263 [opt_b]: 0.0001851, [1] [Cycle 1]: 0.00017842, [7] [b_1]: 0.00010754 [b_2]: 6.83998e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 4.19997e-07 [cse]: 1.839e-05 [optimize_parallel_all_gather_comm]: 1.634e-05 [overlap_param_gather]: 2.39999e-06 [cconv]: 2.371e-05 [loop_unroll]: 0.00041988 [opt_after_cconv]: 9.497e-05, [1] [Cycle 1]: 8.884e-05, [7] [c_1]: 2.76e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.56002e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.596e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.262e-05 [tuple_transform]: 7.031e-05, [1] [Cycle 1]: 6.605e-05, [4] [d_1]: 4.037e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 5.051e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.608e-05, [1] [cse]: 1.09e-05 [environ_conv]: 4.84998e-06 [swap_dp_allreduce_reducescatter]: 5.51e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.96999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 1.32999e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 
[control_data_broadcast_order]: 1.236e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.96003e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.003e-05, [1] [Cycle 1]: 6.581e-05, [6] [build]: 2.89999e-06 [elim_shapecalc]: 9.20999e-06 [elim_not_effective]: 1.167e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 8.96002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.11e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 9.90025e-07 [rewriter_after_jit_bprop_graph]: 4.03999e-06 [opt_after_jit_grad]: 0.00051614 [validate]: 3.24e-05 [backend_pass]: 8.2e-07 [task_emit]: 1.39129 [execute]: 8.90999e-06 Sums bootstrap : 0.000548s : 0.04% type_inference : 0.006335s : 0.45% event_method : 0.000014s : 0.00% auto_monad : 0.000055s : 0.00% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000030s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.00% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000611s : 0.04% optimize.opt_a.with_stream_mark : 
0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000463s : 0.03% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.00% optimize.opt_a.cse : 0.000041s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000493s : 0.04% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000420s : 0.03% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000516s : 0.04% validate : 0.000032s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.391286s : 99.22% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000195 30 13.67% : 0.000027s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.64% : 0.000001s : 2: substitution.fold_const_symbol 2.93% : 0.000006s : 4: substitution.graph_param_transform 70.19% : 0.000137s : 3: substitution.inline 1.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.37% : 0.000005s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 4: substitution.replace_old_param 5.47% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006227 2 90.95% : 0.005663s : 1: type_inference.infer 9.05% : 0.000564s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.67% : 0.000028s : 3: replace.inline 29.33% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000144 5 93.34% : 0.000135s : 3: match.inline 6.66% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.65% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.46% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.50% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.64% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 1.03% : 0.000002s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 45.72% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.28% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.415916 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.25% : 0.003496s : 1: add_attr 0.25% : 0.003483s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.00% : 0.000060s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.04% : 0.000587s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000016s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.03% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.04% : 0.000502s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.07% : 0.000981s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.16% : 0.002219s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.04% : 0.000527s : 1: opt_after_jit_grad 0.01% : 0.000188s : 1: opt_b 0.29% : 0.004128s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000035s : 1: pre_auto_parallel 0.00% : 0.000026s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000246s : 1: renormalize.infer 0.01% : 0.000210s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.00% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000073s : 1: symbol_engine_optimizer 98.26% : 1.391311s : 1: task_emit 0.01% : 0.000073s : 1: 
tuple_transform 0.45% : 0.006352s : 1: type_inference 0.00% : 0.000055s : 1: validate TotalTime = 0.209841, [24] [bootstrap]: 0.00048514 [type_inference]: 0.00446588 [event_method]: 1.093e-05 [auto_monad]: 5.078e-05 [graph_reusing]: 5.08002e-06 [inline]: 1.64e-06 [add_attr]: 0.00304163, [1] [add_attr_with_inline]: 0.00303381, [1] [Cycle 1]: 4.636e-05, [2] [tag_attr]: 1.196e-05 [meta_addattr_fg_expand]: 3.5e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 2.164e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00386311, [53] [py_interpret_to_execute]: 1.52e-05 [rewriter_before_opt_a]: 4.037e-05 [opt_a]: 0.00205663, [2] [Cycle 1]: 0.00125578, [45] [expand_dump_flag]: 2.39001e-06 [switch_simplify]: 2.418e-05 [loop_unroll]: 1.344e-05 [a_1]: 0.00029528 [with_stream_mark]: 1.3e-05 [recompute_prepare]: 7.78999e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.825e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 2.63998e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 6.29999e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 5.92999e-06 [parallel]: 1.849e-05 [flash_sp]: 7.19001e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 3.2e-06 [matmul_add_comm_reduction]: 9.09998e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 6.17001e-06 [get_grad_eliminate_]: 5.61998e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.79002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.37001e-06 [receive_attached]: 2.21998e-06 
[after_resolve]: 1.069e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00034044 [add_forward_monad_depend]: 5.74e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.257e-05 [cse]: 2.644e-05 [a_3]: 4.007e-05 [Cycle 2]: 0.00079145, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.61999e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012722 [with_stream_mark]: 1.134e-05 [recompute_prepare]: 5.80002e-06 [updatestate_depend_eliminate]: 3.09999e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 0.00018275 [accelerated_algorithm]: 6.96999e-06 [shard]: 1.29e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 5.47999e-06 [auto_parallel]: 6.33e-06 [parallel]: 4.42e-06 [flash_sp]: 3.74002e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.74e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.39999e-06 [virtual_dataset]: 5.43002e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.61999e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 6.11e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.02e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 1.66998e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 9.18002e-06 [a_after_grad]: 8.27e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.53002e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 7.40998e-06 [cse]: 1.578e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 1.64e-06 [rewriter_after_opt_a]: 3.12e-05 [convert_after_rewriter]: 6.58e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00045117 [opt_b]: 0.00018233, [1] [Cycle 1]: 0.00017613, [7] [b_1]: 
0.00010906 [b_2]: 7.05998e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 2.3999e-07 [cse]: 1.596e-05 [optimize_parallel_all_gather_comm]: 1.538e-05 [overlap_param_gather]: 2.18002e-06 [cconv]: 2.213e-05 [loop_unroll]: 0.00041402 [opt_after_cconv]: 9.437e-05, [1] [Cycle 1]: 8.881e-05, [7] [c_1]: 2.793e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.537e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.21e-05 [tuple_transform]: 6.866e-05, [1] [Cycle 1]: 6.42e-05, [4] [d_1]: 3.889e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.10002e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.367e-05 [cse_after_recomputation]: 1.944e-05, [1] [Cycle 1]: 1.531e-05, [1] [cse]: 1.046e-05 [environ_conv]: 5.24e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 3.8e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.26998e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 3.84002e-06 [overlap_grad_flash_sp]: 1.652e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 6.823e-05, [1] [Cycle 1]: 6.411e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 9.05999e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.11002e-06 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00044798 [validate]: 3.263e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.19716 [execute]: 9.05999e-06 Sums bootstrap : 0.000485s : 0.24% type_inference : 0.004466s : 2.17% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.02% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.02% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.01% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000423s : 0.21% optimize.opt_a.with_stream_mark : 0.000024s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000261s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.01% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000341s : 0.17% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.01% optimize.opt_a.cse : 0.000042s : 0.02% optimize.opt_a.a_3 : 0.000073s : 0.04% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.22% optimize.opt_b.b_1 : 0.000109s : 0.05% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.01% optimize.loop_unroll : 0.000414s : 0.20% optimize.opt_after_cconv.c_1 : 0.000028s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.02% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000448s : 0.22% validate : 0.000033s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.197160s : 95.82% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000121 26 18.76% : 0.000023s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.18% : 0.000005s : 4: substitution.graph_param_transform 65.40% : 0.000079s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.36% : 0.000004s : 4: substitution.remove_not_recompute_node 3.46% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004426 2 91.82% : 0.004064s : 1: type_inference.infer 8.18% : 0.000362s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000239 984 0.48% : 0.000001s : 9: predicate.accumulaten_eliminater 0.55% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.38% : 0.000001s : 8: predicate.addn_check_dump 0.41% : 0.000001s : 9: predicate.addn_zero_filter 0.40% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.43% : 0.000001s : 9: predicate.cast_eliminate 0.46% : 0.000001s : 8: predicate.check_bprop_eliminate 0.40% : 0.000001s : 8: predicate.compare_switch_simplify 0.17% : 0.000000s : 4: predicate.const_output_eliminate 0.46% : 0.000001s : 8: predicate.depend_value_elim 0.45% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.52% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.48% : 0.000001s : 9: predicate.dict_set_item_eliminator 0.77% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.19% : 0.000000s : 4: predicate.elim_not_effective 0.28% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.68% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.62% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.59% : 0.000001s : 13: predicate.environ_get_depend_swap 1.21% : 0.000003s : 21: predicate.environ_get_eliminate 0.60% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.55% : 0.000001s : 11: predicate.exchange_switch_depend_value 0.99% : 0.000002s : 11: predicate.float_depend_g_call 0.38% : 0.000001s : 8: predicate.float_environ_get_switch 0.57% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.15% : 0.000000s : 4: predicate.fold_const_symbol 0.49% : 0.000001s : 8: predicate.get_grad_eliminate 0.16% : 0.000000s : 4: predicate.graph_param_transform 0.46% : 0.000001s : 8: predicate.incorporate_call 0.38% : 0.000001s : 8: predicate.incorporate_call_switch 3.43% : 0.000008s : 44: predicate.inline 0.57% : 0.000001s : 8: predicate.inline_without_move 0.26% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.64% : 0.000002s : 8: predicate.less_batch_normalization 0.90% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.26% : 0.000003s : 26: predicate.load_eliminater 0.66% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.03% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.09% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.56% : 0.000001s : 8: predicate.merge_addn 0.42% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.44% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.40% : 0.000001s : 9: predicate.minmaximum_grad 0.75% : 0.000002s : 4: predicate.mutable_eliminate 0.26% : 0.000001s : 4: predicate.opt_reshape 0.33% : 0.000001s : 4: predicate.parallel_virtual_node 0.71% : 0.000002s : 11: predicate.partial_defer_inline 0.72% : 0.000002s : 13: predicate.partial_eliminate 0.47% : 0.000001s : 9: predicate.print_const_string_wrapper 0.42% : 0.000001s : 8: predicate.reduce_all_const_elim 0.57% : 0.000001s : 9: predicate.reduce_eliminate 1.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.42% : 0.000001s : 8: predicate.remove_not_recompute_node 0.77% : 0.000002s : 17: predicate.replace_applicator 0.46% : 0.000001s : 8: predicate.replace_old_param 0.22% : 0.000001s : 4: predicate.reset_defer_inline 0.42% : 0.000001s : 9: predicate.reshape_eliminate 0.46% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.26% : 0.000001s : 4: predicate.row_tensor_eliminate 0.54% : 0.000001s : 8: predicate.same_eliminate 0.36% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.53% : 0.000001s : 8: predicate.shard_identity_eliminate 0.52% : 0.000001s : 8: predicate.special_op_eliminate 43.19% : 0.000103s : 8: predicate.specialize_transform 0.63% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.54% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.26% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.59% : 0.000001s : 11: predicate.switch_defer_inline 1.01% : 0.000002s : 19: predicate.switch_layer_defer_inline 2.60% : 0.000006s : 41: predicate.switch_simplify 0.42% : 
0.000001s : 9: predicate.tile_eliminate 0.46% : 0.000001s : 9: predicate.transpose_eliminate 0.88% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 0.84% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 0.78% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 1.70% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 0.80% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.40% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 0.83% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.18% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 1.74% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.32% : 0.000001s : 4: predicate.value_based_eliminate 0.62% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.48% : 0.000001s : 8: predicate.virtual_output_eliminate 0.22% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.34% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000260 6 43.73% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.27% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.218132 196 0.00% : 0.000003s : 1: ForceFp32Comm 1.40% : 0.003046s : 1: add_attr 1.39% : 0.003037s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.02% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.03% : 0.000056s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.24% : 0.000522s : 1: bootstrap 0.01% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.19% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.21% : 0.000460s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.41% : 0.000891s : 78: opt.transform.opt_a 0.01% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.04% : 0.000092s : 28: opt.transform.opt_b 0.02% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.94% : 0.002060s : 1: opt_a 0.04% : 0.000098s : 1: opt_after_cconv 0.21% : 0.000458s : 1: opt_after_jit_grad 0.09% : 0.000186s : 1: opt_b 1.77% : 0.003867s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000026s : 1: pre_auto_parallel 0.01% : 0.000019s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.08% : 0.000183s : 1: renormalize.infer 0.07% : 0.000151s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000035s : 1: rewriter_after_opt_a 0.02% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.03% : 0.000071s : 1: symbol_engine_optimizer 90.40% : 0.197183s : 1: task_emit 0.03% : 0.000071s : 1: 
tuple_transform 2.05% : 0.004480s : 1: type_inference 0.02% : 0.000054s : 1: validate TotalTime = 0.0592088, [24] [bootstrap]: 0.00049913 [type_inference]: 0.00567735 [event_method]: 1.45e-05 [auto_monad]: 5.694e-05 [graph_reusing]: 6.14999e-06 [inline]: 2.21e-06 [add_attr]: 0.00301059, [1] [add_attr_with_inline]: 0.00300258, [1] [Cycle 1]: 4.712e-05, [2] [tag_attr]: 1.575e-05 [meta_addattr_fg_expand]: 4.55001e-06 [parallel-infer-symbol]: 2.94001e-06 [pre_auto_parallel]: 2.898e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.89999e-06 [optimize]: 0.00445002, [53] [py_interpret_to_execute]: 2.36e-05 [rewriter_before_opt_a]: 6.015e-05 [opt_a]: 0.00258454, [2] [Cycle 1]: 0.00197611, [45] [expand_dump_flag]: 3.4e-06 [switch_simplify]: 3.299e-05 [loop_unroll]: 2.093e-05 [a_1]: 0.00045706 [with_stream_mark]: 1.39e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 3.95998e-06 [updatestate_assign_eliminate]: 3.14001e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 0.00047306 [accelerated_algorithm]: 7.31001e-06 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 2.09e-06 [shard_inline]: 6.15002e-06 [merge_send_recv]: 9.86998e-06 [auto_parallel]: 6.79999e-06 [parallel]: 1.917e-05 [flash_sp]: 7.68001e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.86001e-06 [matmul_add_comm_reduction]: 1.028e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.43e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.81003e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 9.65002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47997e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.36998e-06 
[after_resolve]: 1.076e-05 [a_after_grad]: 8.59e-06 [renormalize]: 0.00045535 [add_forward_monad_depend]: 4.77998e-06 [auto_monad_grad]: 2.09e-06 [auto_monad_eliminator]: 1.417e-05 [cse]: 3.057e-05 [a_3]: 4.155e-05 [Cycle 2]: 0.00059919, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.88998e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00012555 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 5.68002e-06 [updatestate_depend_eliminate]: 2.99001e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.26e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.84e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.67001e-06 [parallel]: 3.86999e-06 [flash_sp]: 3.32002e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 3.16001e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.55997e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 6.57002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.33999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.359e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 8.58001e-06 [slice_cell_reuse_recomputed_activation]: 2.71999e-06 [rewriter_after_opt_a]: 3.323e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.17999e-06 [mutable_eliminate]: 0.00045823 [opt_b]: 0.00018592, [1] [Cycle 1]: 0.00017996, [7] 
[b_1]: 0.00011027 [b_2]: 7.4e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 5.50004e-07 [cse]: 1.645e-05 [optimize_parallel_all_gather_comm]: 1.639e-05 [overlap_param_gather]: 2.27999e-06 [cconv]: 2.356e-05 [loop_unroll]: 0.00041703 [opt_after_cconv]: 9.653e-05, [1] [Cycle 1]: 9.086e-05, [7] [c_1]: 2.796e-05 [parameter_eliminate]: 2.66e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.668e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.242e-05 [tuple_transform]: 6.966e-05, [1] [Cycle 1]: 6.529e-05, [4] [d_1]: 3.945e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 4.449e-05 [cse_after_recomputation]: 2.056e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.108e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.00998e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.46998e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.29998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.42999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.04e-06 [control_data_broadcast_order]: 1.183e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.3e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.18002e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 6.79e-05, [1] [Cycle 1]: 6.397e-05, [6] [build]: 2.78e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00047475 [validate]: 3.197e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.0447034 [execute]: 9.04e-06 Sums bootstrap : 0.000499s : 0.90% type_inference : 0.005677s : 10.28% event_method : 0.000014s : 0.03% auto_monad : 0.000057s : 0.10% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.04% optimize.rewriter_before_opt_a : 0.000060s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000583s : 1.06% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000541s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.03% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000455s : 0.82% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000044s : 0.08% optimize.opt_a.a_3 : 0.000074s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.83% optimize.opt_b.b_1 : 0.000110s : 0.20% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000417s : 0.76% optimize.opt_after_cconv.c_1 : 0.000028s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000475s : 0.86% validate : 0.000032s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.044703s : 80.96% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000172 30 16.14% : 0.000028s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 65.56% : 0.000113s : 3: substitution.inline 2.04% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.36% : 0.000004s : 4: substitution.replace_old_param 6.43% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005634 2 89.61% : 0.005049s : 1: type_inference.infer 10.39% : 0.000585s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.99% : 0.000027s : 3: replace.inline 30.01% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.76% : 0.000110s : 3: match.inline 8.24% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000001s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.74% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.21% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000002s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.47% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.26% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000361 8 45.90% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.10% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068627 196 0.00% : 0.000003s : 1: ForceFp32Comm 4.39% : 0.003015s : 1: add_attr 4.38% : 0.003006s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000062s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.78% : 0.000536s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.62% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.68% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.96% : 0.001347s : 78: opt.transform.opt_a 0.04% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000032s : 4: opt.transform.symbol_engine_opt 3.77% : 0.002587s : 1: opt_a 0.15% : 0.000100s : 1: opt_after_cconv 0.71% : 0.000485s : 1: opt_after_jit_grad 0.28% : 0.000189s : 1: opt_b 6.49% : 0.004454s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.05% : 0.000034s : 1: pre_auto_parallel 0.04% : 0.000028s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.33% : 0.000228s : 1: renormalize.infer 0.32% : 0.000220s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.09% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000070s : 1: symbol_engine_optimizer 65.17% : 0.044721s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 8.29% : 0.005692s : 1: type_inference 0.08% : 0.000054s : 1: validate TotalTime = 0.0858298, [24] [bootstrap]: 0.00054604 [type_inference]: 0.0119959 [event_method]: 7.174e-05 [auto_monad]: 0.00012851 [graph_reusing]: 8.95999e-06 [inline]: 1.66e-06 [add_attr]: 0.00310895, [1] [add_attr_with_inline]: 0.00310074, [1] [Cycle 1]: 7.429e-05, [2] [tag_attr]: 3.631e-05 [meta_addattr_fg_expand]: 1.027e-05 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 5.217e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.93997e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0135518, [53] [py_interpret_to_execute]: 3.73e-05 [rewriter_before_opt_a]: 0.00014969 [opt_a]: 0.0112545, [3] [Cycle 1]: 0.00746342, [45] [expand_dump_flag]: 4.33001e-06 [switch_simplify]: 7.638e-05 [loop_unroll]: 6.356e-05 [a_1]: 0.00152032 [with_stream_mark]: 2.435e-05 [recompute_prepare]: 2.291e-05 [updatestate_depend_eliminate]: 9.69999e-06 [updatestate_assign_eliminate]: 8.11002e-06 [updatestate_loads_eliminate]: 7.52002e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.00024965 [accelerated_algorithm]: 3.205e-05 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 3.71999e-06 [shard_inline]: 1.66e-05 [merge_send_recv]: 1.646e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.947e-05 [flash_sp]: 1.219e-05 [merge_comm]: 1.042e-05 [allreduce_fusion]: 9.26998e-06 [matmul_add_comm_reduction]: 2.843e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.849e-05 [virtual_dataset]: 1.653e-05 [get_grad_eliminate_]: 1.544e-05 [virtual_output]: 1.533e-05 [merge_forward]: 9.82999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.859e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.873e-05 [merge_recompute_call_nodes]: 1.87999e-06 [before_grad]: 2.804e-05 [set_forward_comm_id_for_comm_node_pass]: 1.017e-05 [meta_fg_expand]: 0.00151663 [flash_sp_send_recv_attached]: 4.03999e-06 [receive_attached]: 2.84001e-06 
[after_resolve]: 6.15e-05 [a_after_grad]: 8.416e-05 [renormalize]: 0.00261028 [add_forward_monad_depend]: 9.82999e-06 [auto_monad_grad]: 5.69e-06 [auto_monad_eliminator]: 5.798e-05 [cse]: 0.00017811 [a_3]: 0.00033582 [Cycle 2]: 0.00295802, [45] [expand_dump_flag]: 1.94e-06 [switch_simplify]: 4.743e-05 [loop_unroll]: 4.379e-05 [a_1]: 0.00156752 [with_stream_mark]: 1.208e-05 [recompute_prepare]: 1.024e-05 [updatestate_depend_eliminate]: 4.48001e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 3.15998e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 0.00011226 [accelerated_algorithm]: 1.193e-05 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 2.04e-06 [shard_inline]: 8.64998e-06 [merge_send_recv]: 6.29001e-06 [auto_parallel]: 7.03e-06 [parallel]: 5.14e-06 [flash_sp]: 3.28e-06 [merge_comm]: 5.27999e-06 [allreduce_fusion]: 4.61002e-06 [matmul_add_comm_reduction]: 7.9e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 9.57001e-06 [virtual_dataset]: 8.22e-06 [get_grad_eliminate_]: 8.02e-06 [virtual_output]: 7.85e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 8.10999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.481e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.232e-05 [set_forward_comm_id_for_comm_node_pass]: 4.84e-06 [meta_fg_expand]: 7.536e-05 [flash_sp_send_recv_attached]: 1.07998e-06 [receive_attached]: 1.20001e-06 [after_resolve]: 1.518e-05 [a_after_grad]: 1.326e-05 [renormalize]: 0.00055616 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.32e-06 [auto_monad_eliminator]: 1.413e-05 [cse]: 2.343e-05 [a_3]: 5.864e-05 [Cycle 3]: 0.00081809, [45] [expand_dump_flag]: 1.06002e-06 [switch_simplify]: 1.008e-05 [loop_unroll]: 8.05999e-06 [a_1]: 0.0002177 [with_stream_mark]: 9.31e-06 [recompute_prepare]: 8.62998e-06 [updatestate_depend_eliminate]: 4.33001e-06 [updatestate_assign_eliminate]: 3.55998e-06 [updatestate_loads_eliminate]: 3.98001e-06 [parameter_eliminate]: 
9.70002e-07 [a_2]: 0.00011024 [accelerated_algorithm]: 1.117e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 8.32e-06 [merge_send_recv]: 5.89999e-06 [auto_parallel]: 6.54999e-06 [parallel]: 4.90999e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 4.46002e-06 [allreduce_fusion]: 4.05e-06 [matmul_add_comm_reduction]: 6.91999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.07999e-06 [virtual_dataset]: 7.81001e-06 [get_grad_eliminate_]: 7.71999e-06 [virtual_output]: 7.43999e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.32e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.422e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.222e-05 [set_forward_comm_id_for_comm_node_pass]: 4.50001e-06 [meta_fg_expand]: 2.83e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.246e-05 [a_after_grad]: 1.289e-05 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 8.74998e-06 [cse]: 2.091e-05 [a_3]: 5.046e-05 [py_interpret_to_execute_after_opt_a]: 1.01e-05 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 4.339e-05 [convert_after_rewriter]: 8.87e-06 [order_py_execute_after_rewriter]: 6.76999e-06 [mutable_eliminate]: 0.00049664 [opt_b]: 0.00026924, [1] [Cycle 1]: 0.00026224, [7] [b_1]: 0.00016825 [b_2]: 1.718e-05 [updatestate_depend_eliminate]: 6.79999e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 3.69002e-06 [renormalize]: 2.59985e-07 [cse]: 2.546e-05 [optimize_parallel_all_gather_comm]: 1.897e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 2.111e-05 [loop_unroll]: 0.00046666 [opt_after_cconv]: 0.00012576, [1] [Cycle 1]: 0.00012006, [7] [c_1]: 4.435e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 6.73e-06 [updatestate_assign_eliminate]: 3.77998e-06 [updatestate_loads_eliminate]: 
3.55e-06 [cse]: 2.494e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 2.832e-05 [tuple_transform]: 9.604e-05, [1] [Cycle 1]: 9.116e-05, [4] [d_1]: 6.121e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 3.39991e-07 [switch_simplify]: 9.02e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.806e-05 [cse_after_recomputation]: 2.915e-05, [1] [Cycle 1]: 2.416e-05, [1] [cse]: 1.823e-05 [environ_conv]: 8.32e-06 [swap_dp_allreduce_reducescatter]: 6.98e-06 [bias_add_comm_swap]: 2.84001e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.43998e-06 [micro_interleaved_order_control]: 2.13998e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.12999e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.40999e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.24998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.573e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 4.62998e-06 [overlap_recompute_and_grad_model_parallel]: 5.32001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.46002e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.92e-06 [overlap_grad_flash_sp]: 2.203e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.11e-06 [handle_group_info]: 9.00007e-07 [symbol_engine_optimizer]: 9.301e-05, [1] [Cycle 1]: 8.854e-05, [6] [build]: 9.36e-06 [elim_shapecalc]: 1.251e-05 [elim_not_effective]: 1.638e-05 [opt_reshape]: 8.97999e-06 [fold_const_symbol]: 1.296e-05 [renormalize]: 2.19996e-07 [detach_backward]: 
2.16e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.231e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.74002e-06 [opt_after_jit_grad]: 0.00047389 [validate]: 4.035e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.0555787 [execute]: 9.19e-06 Sums bootstrap : 0.000546s : 0.67% type_inference : 0.011996s : 14.73% event_method : 0.000072s : 0.09% auto_monad : 0.000129s : 0.16% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000052s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.05% optimize.rewriter_before_opt_a : 0.000150s : 0.18% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000134s : 0.16% optimize.opt_a.loop_unroll : 0.000115s : 0.14% optimize.opt_a.a_1 : 0.003306s : 4.06% optimize.opt_a.with_stream_mark : 0.000046s : 0.06% optimize.opt_a.recompute_prepare : 0.000042s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000472s : 0.58% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.07% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.04% optimize.opt_a.merge_send_recv : 0.000029s : 0.04% optimize.opt_a.auto_parallel : 0.000025s : 0.03% optimize.opt_a.parallel : 0.000030s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.05% optimize.opt_a.virtual_dataset : 0.000033s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.04% optimize.opt_a.virtual_output : 0.000031s : 0.04% optimize.opt_a.merge_forward : 0.000017s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000058s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000053s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001595s : 1.96% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.11% optimize.opt_a.a_after_grad : 0.000110s : 0.14% optimize.opt_a.renormalize : 0.003167s : 3.89% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.10% optimize.opt_a.cse : 0.000222s : 0.27% optimize.opt_a.a_3 : 0.000445s : 0.55% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000043s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000497s : 0.61% optimize.opt_b.b_1 : 0.000168s : 0.21% optimize.opt_b.b_2 : 0.000017s : 0.02% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000467s : 0.57% optimize.opt_after_cconv.c_1 : 0.000044s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000061s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.07% optimize.cse_after_recomputation.cse : 0.000018s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000022s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000474s : 0.58% validate : 0.000040s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.055579s : 68.24% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000818 213 5.77% : 0.000047s : 12: substitution.arithmetic_simplify 0.32% : 0.000003s : 4: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 4: substitution.fold_const_symbol 0.87% : 0.000007s : 7: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 59.55% : 0.000487s : 17: substitution.inline 2.00% : 0.000016s : 2: substitution.inline_without_move 1.16% : 0.000009s : 18: substitution.j_node_and_user_rematch 1.94% : 0.000016s : 3: substitution.less_batch_normalization 1.64% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000006s : 5: substitution.partial_eliminate 1.57% : 0.000013s : 18: substitution.remove_not_recompute_node 3.10% : 0.000025s : 10: substitution.replace_applicator 1.24% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.43% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.24% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.31% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011916 2 85.91% : 0.010236s : 1: type_inference.infer 14.09% : 0.001679s : 1: type_inference.specialize ------[replace.] 0.000236 33 57.70% : 0.000136s : 17: replace.inline 42.30% : 0.000100s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000513 33 93.19% : 0.000478s : 17: match.inline 6.81% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000733 5530 1.07% : 0.000008s : 66: predicate.accumulaten_eliminater 0.26% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 30: predicate.addn_check_dump 1.08% : 0.000008s : 66: predicate.addn_zero_filter 1.05% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 2.13% : 0.000016s : 96: predicate.arithmetic_simplify 1.12% : 0.000008s : 66: predicate.cast_eliminate 1.13% : 0.000008s : 65: predicate.check_bprop_eliminate 0.50% : 0.000004s : 30: predicate.compare_switch_simplify 0.09% : 0.000001s : 7: predicate.const_output_eliminate 0.50% : 0.000004s : 30: predicate.depend_value_elim 1.19% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 66: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 7: predicate.elim_not_effective 0.15% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 73: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 73: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 73: predicate.environ_get_depend_swap 1.74% : 0.000013s : 103: predicate.environ_get_eliminate 1.16% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.40% : 0.000018s : 99: predicate.float_depend_g_call 0.49% : 0.000004s : 30: predicate.float_environ_get_switch 0.64% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.53% : 0.000004s : 30: predicate.get_grad_eliminate 0.11% : 0.000001s : 7: predicate.graph_param_transform 0.54% : 0.000004s : 30: predicate.incorporate_call 0.48% : 0.000003s : 30: predicate.incorporate_call_switch 5.66% : 0.000041s : 239: predicate.inline 1.32% : 0.000010s : 53: predicate.inline_without_move 0.31% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 30: predicate.less_batch_normalization 1.61% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 2.62% : 0.000019s : 162: predicate.load_eliminater 0.32% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.37% : 0.000017s : 134: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 80: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 30: predicate.merge_addn 1.10% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 66: predicate.minmaximum_grad 0.32% : 0.000002s : 7: predicate.mutable_eliminate 0.14% : 0.000001s : 7: predicate.opt_reshape 0.14% : 0.000001s : 7: predicate.parallel_virtual_node 2.05% : 0.000015s : 99: predicate.partial_defer_inline 1.77% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 66: predicate.print_const_string_wrapper 0.50% : 0.000004s : 30: predicate.reduce_all_const_elim 1.38% : 0.000010s : 66: predicate.reduce_eliminate 2.63% : 0.000019s : 162: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 30: predicate.remove_not_recompute_node 1.88% : 0.000014s : 147: predicate.replace_applicator 0.63% : 0.000005s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.07% : 0.000008s : 66: predicate.reshape_eliminate 1.12% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 7: predicate.row_tensor_eliminate 1.24% : 0.000009s : 65: predicate.same_eliminate 0.36% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 30: predicate.shard_identity_eliminate 0.29% : 0.000002s : 14: predicate.special_op_eliminate 0.61% : 0.000004s : 30: predicate.specialize_transform 1.29% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 99: predicate.switch_defer_inline 2.96% : 0.000022s : 164: predicate.switch_layer_defer_inline 5.03% : 0.000037s : 270: 
predicate.switch_simplify 1.07% : 0.000008s : 66: predicate.tile_eliminate 1.07% : 0.000008s : 66: predicate.transpose_eliminate 1.54% : 0.000011s : 80: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 80: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 80: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000022s : 126: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 80: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000015s : 110: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 162: predicate.updatestate_pure_node_eliminater 3.20% : 0.000023s : 192: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 7: predicate.value_based_eliminate 0.54% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 30: predicate.virtual_output_eliminate 0.11% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001760 34 55.08% : 0.000970s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.92% : 0.000791s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.110862 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.81% : 0.003113s : 1: add_attr 2.80% : 0.003105s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000136s : 1: auto_monad 0.02% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000581s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000032s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.07% : 0.000081s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.43% : 0.000476s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000506s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 4.44% : 0.004922s : 117: opt.transform.opt_a 0.04% : 0.000043s : 1: opt.transform.opt_after_cconv 0.03% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000159s : 28: opt.transform.opt_b 0.06% : 0.000068s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000047s : 4: opt.transform.symbol_engine_opt 10.15% : 0.011258s : 1: opt_a 0.12% : 0.000129s : 1: opt_after_cconv 0.44% : 0.000484s : 1: opt_after_jit_grad 0.25% : 0.000273s : 1: opt_b 12.23% : 0.013555s : 1: optimize 0.02% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.05% : 0.000057s : 1: pre_auto_parallel 0.04% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.46% : 0.001614s : 2: renormalize.infer 1.39% : 0.001539s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000047s : 1: rewriter_after_opt_a 0.14% : 0.000154s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000096s : 1: symbol_engine_optimizer 50.15% : 0.055599s : 1: task_emit 0.09% : 0.000099s : 1: 
tuple_transform 10.83% : 0.012012s : 1: type_inference 0.06% : 0.000066s : 1: validate TotalTime = 0.0553609, [24] [bootstrap]: 0.00046535 [type_inference]: 0.00440977 [event_method]: 1.123e-05 [auto_monad]: 5.134e-05 [graph_reusing]: 5.27001e-06 [inline]: 1.99999e-06 [add_attr]: 0.00307381, [1] [add_attr_with_inline]: 0.00306523, [1] [Cycle 1]: 4.624e-05, [2] [tag_attr]: 1.277e-05 [meta_addattr_fg_expand]: 3.26999e-06 [parallel-infer-symbol]: 2.65002e-06 [pre_auto_parallel]: 2.196e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 9.50007e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00379443, [53] [py_interpret_to_execute]: 1.526e-05 [rewriter_before_opt_a]: 3.883e-05 [opt_a]: 0.00191632, [2] [Cycle 1]: 0.00129842, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 2.446e-05 [loop_unroll]: 1.407e-05 [a_1]: 0.00030228 [with_stream_mark]: 1.245e-05 [recompute_prepare]: 7.58999e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.55999e-06 [a_2]: 7.864e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.38999e-06 [auto_parallel]: 5.72001e-06 [parallel]: 1.848e-05 [flash_sp]: 7.73999e-06 [merge_comm]: 3.44001e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 8.87999e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.66999e-06 [virtual_dataset]: 6.13002e-06 [get_grad_eliminate_]: 5.96e-06 [virtual_output]: 5.81998e-06 [merge_forward]: 3.5e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 8.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.202e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.62999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.36998e-06 
[after_resolve]: 1.134e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00036232 [add_forward_monad_depend]: 4.63001e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 2.896e-05 [a_3]: 4.211e-05 [Cycle 2]: 0.00060837, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.79999e-06 [a_1]: 0.00012859 [with_stream_mark]: 1.181e-05 [recompute_prepare]: 5.95002e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 7.50006e-07 [a_2]: 7.09e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.11002e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 4.31002e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.78001e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.32001e-06 [virtual_dataset]: 5.53002e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.24e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.20002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86998e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.3e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 9.43002e-06 [a_after_grad]: 8.16002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.47001e-06 [cse]: 1.275e-05 [a_3]: 3.326e-05 [py_interpret_to_execute_after_opt_a]: 7.86001e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.054e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 4.90001e-06 [mutable_eliminate]: 0.00045289 [opt_b]: 0.0001861, [1] [Cycle 1]: 0.00018006, [7] [b_1]: 
0.00011085 [b_2]: 7.69002e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 3.80009e-07 [cse]: 1.632e-05 [optimize_parallel_all_gather_comm]: 1.607e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.242e-05 [loop_unroll]: 0.0004636 [opt_after_cconv]: 9.77e-05, [1] [Cycle 1]: 9.175e-05, [7] [c_1]: 2.918e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.41002e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.41998e-06 [cse]: 1.576e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.323e-05 [tuple_transform]: 7.023e-05, [1] [Cycle 1]: 6.606e-05, [4] [d_1]: 3.966e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.59999e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.507e-05 [cse_after_recomputation]: 2.067e-05, [1] [Cycle 1]: 1.615e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.14998e-06 [bias_add_comm_swap]: 2.97002e-06 [label_micro_interleaved_index]: 3.81001e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 [control_data_broadcast_order]: 1.184e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 4.04002e-06 [overlap_recompute_and_grad_model_parallel]: 5.19998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.79e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.984e-05, [1] [Cycle 1]: 6.529e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.48001e-06 [elim_not_effective]: 1.185e-05 [opt_reshape]: 6.29999e-06 [fold_const_symbol]: 8.95001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.603e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00045752 [validate]: 3.042e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0427912 [execute]: 8.33999e-06 Sums bootstrap : 0.000465s : 0.91% type_inference : 0.004410s : 8.59% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000022s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000039s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.06% optimize.opt_a.loop_unroll : 0.000020s : 0.04% optimize.opt_a.a_1 : 0.000431s : 0.84% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000014s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.03% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000362s : 0.71% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000042s : 0.08% optimize.opt_a.a_3 : 0.000075s : 0.15% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.88% optimize.opt_b.b_1 : 0.000111s : 0.22% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000464s : 0.90% optimize.opt_after_cconv.c_1 : 0.000029s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.03% optimize.tuple_transform.d_1 : 0.000040s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000001s : 0.00%
auto_monad_reorder : 0.000016s : 0.03%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.01%
opt_after_jit_grad : 0.000458s : 0.89%
validate : 0.000030s : 0.06%
backend_pass : 0.000001s : 0.00%
task_emit : 0.042791s : 83.40%
execute : 0.000008s : 0.02%
Time group info:
------[substitution.] 0.000124 26
    17.85% : 0.000022s : 4: substitution.arithmetic_simplify
    1.37% : 0.000002s : 2: substitution.elim_not_effective
    0.98% : 0.000001s : 2: substitution.fold_const_symbol
    4.31% : 0.000005s : 4: substitution.graph_param_transform
    65.76% : 0.000082s : 2: substitution.inline
    2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch
    4.01% : 0.000005s : 4: substitution.remove_not_recompute_node
    3.44% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004367 2
    91.46% : 0.003995s : 1: type_inference.infer
    8.54% : 0.000373s : 1: type_inference.specialize
------[replace.] 0.000020 2
    100.00% : 0.000020s : 2: replace.inline
------[match.] 0.000080 2
    100.00% : 0.000080s : 2: match.inline
------[predicate.]
0.000142 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.87% : 0.000001s : 8: predicate.check_bprop_eliminate 0.71% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.96% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.17% : 0.000002s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.80% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.50% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.03% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.88% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.14% : 0.000002s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.74% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000260 6 41.03% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.97% : 0.000154s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063548 196 0.01% : 0.000003s : 1: ForceFp32Comm 4.84% : 0.003078s : 1: add_attr 4.83% : 0.003069s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000056s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.79% : 0.000502s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000472s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.25% : 0.000795s : 78: opt.transform.opt_a 0.04% : 0.000028s : 1: opt.transform.opt_after_cconv 0.04% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000094s : 28: opt.transform.opt_b 0.07% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000032s : 4: opt.transform.symbol_engine_opt 3.02% : 0.001919s : 1: opt_a 0.16% : 0.000101s : 1: opt_after_cconv 0.74% : 0.000467s : 1: opt_after_jit_grad 0.30% : 0.000190s : 1: opt_b 5.98% : 0.003798s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000026s : 1: pre_auto_parallel 0.03% : 0.000019s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000017s : 1: remove_dup_value 0.31% : 0.000194s : 1: renormalize.infer 0.25% : 0.000162s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000072s : 1: symbol_engine_optimizer 67.37% : 0.042809s : 1: task_emit 0.12% : 0.000073s : 1: 
tuple_transform 6.96% : 0.004424s : 1: type_inference 0.08% : 0.000052s : 1: validate TotalTime = 0.167163, [24] [bootstrap]: 0.00051521 [type_inference]: 0.0107634 [event_method]: 4.719e-05 [auto_monad]: 0.00012168 [graph_reusing]: 8.48999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00310027, [1] [add_attr_with_inline]: 0.00309164, [1] [Cycle 1]: 7.079e-05, [2] [tag_attr]: 3.313e-05 [meta_addattr_fg_expand]: 9.29e-06 [parallel-infer-symbol]: 2.93998e-06 [pre_auto_parallel]: 4.789e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.46002e-06 [optimize]: 0.0138857, [53] [py_interpret_to_execute]: 3.556e-05 [rewriter_before_opt_a]: 0.00013204 [opt_a]: 0.0113843, [3] [Cycle 1]: 0.00728607, [45] [expand_dump_flag]: 3.82998e-06 [switch_simplify]: 6.719e-05 [loop_unroll]: 5.616e-05 [a_1]: 0.00139502 [with_stream_mark]: 2.341e-05 [recompute_prepare]: 2.245e-05 [updatestate_depend_eliminate]: 9.57999e-06 [updatestate_assign_eliminate]: 8.03001e-06 [updatestate_loads_eliminate]: 7.53999e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024878 [accelerated_algorithm]: 3.191e-05 [shard]: 1.83997e-06 [meta_shard_fg_expand]: 3.66999e-06 [shard_inline]: 1.67e-05 [merge_send_recv]: 1.726e-05 [auto_parallel]: 1.116e-05 [parallel]: 1.789e-05 [flash_sp]: 1.147e-05 [merge_comm]: 9.62999e-06 [allreduce_fusion]: 9.22999e-06 [matmul_add_comm_reduction]: 2.64e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.859e-05 [virtual_dataset]: 1.635e-05 [get_grad_eliminate_]: 1.561e-05 [virtual_output]: 1.524e-05 [merge_forward]: 9.74e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 1.792e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.898e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 2.772e-05 [set_forward_comm_id_for_comm_node_pass]: 9.96e-06 [meta_fg_expand]: 0.00148595 [flash_sp_send_recv_attached]: 3.93999e-06 [receive_attached]: 2.81e-06 
[after_resolve]: 6.098e-05 [a_after_grad]: 8.384e-05 [renormalize]: 0.00259197 [add_forward_monad_depend]: 1.042e-05 [auto_monad_grad]: 6.69001e-06 [auto_monad_eliminator]: 5.889e-05 [cse]: 0.00018532 [a_3]: 0.00034467 [Cycle 2]: 0.0032367, [45] [expand_dump_flag]: 1.66e-06 [switch_simplify]: 4.747e-05 [loop_unroll]: 4.405e-05 [a_1]: 0.00159138 [with_stream_mark]: 1.684e-05 [recompute_prepare]: 1.101e-05 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.27997e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 0.00014446 [accelerated_algorithm]: 1.213e-05 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 2.40002e-06 [shard_inline]: 8.71002e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 8.08999e-06 [parallel]: 6.77002e-06 [flash_sp]: 4.27e-06 [merge_comm]: 6.41e-06 [allreduce_fusion]: 5.40001e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 9.89999e-06 [virtual_dataset]: 9.02e-06 [get_grad_eliminate_]: 8.05e-06 [virtual_output]: 8.03001e-06 [merge_forward]: 4.56002e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 9.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.564e-05 [merge_recompute_call_nodes]: 8.49977e-07 [before_grad]: 1.33e-05 [set_forward_comm_id_for_comm_node_pass]: 4.87998e-06 [meta_fg_expand]: 5.889e-05 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.23002e-06 [after_resolve]: 1.51e-05 [a_after_grad]: 1.34e-05 [renormalize]: 0.00075226 [add_forward_monad_depend]: 4.71997e-06 [auto_monad_grad]: 1.50999e-06 [auto_monad_eliminator]: 1.477e-05 [cse]: 2.804e-05 [a_3]: 6.167e-05 [Cycle 3]: 0.00084562, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 9.77999e-06 [loop_unroll]: 8.67e-06 [a_1]: 0.00022245 [with_stream_mark]: 1.028e-05 [recompute_prepare]: 8.54e-06 [updatestate_depend_eliminate]: 4.53999e-06 [updatestate_assign_eliminate]: 3.67002e-06 [updatestate_loads_eliminate]: 
3.73999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 0.00010937 [accelerated_algorithm]: 1.154e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 8.40999e-06 [merge_send_recv]: 6.59999e-06 [auto_parallel]: 7.24001e-06 [parallel]: 6.31e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.4e-06 [allreduce_fusion]: 4.30999e-06 [matmul_add_comm_reduction]: 7.61001e-06 [allreduce_slice_to_reducescatter]: 4.40021e-07 [virtual_shard_identity]: 9.86e-06 [virtual_dataset]: 8.22998e-06 [get_grad_eliminate_]: 7.72002e-06 [virtual_output]: 7.90998e-06 [merge_forward]: 3.75998e-06 [cell_reuse_recompute_pass]: 1.72999e-06 [offload_activation]: 8.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.524e-05 [merge_recompute_call_nodes]: 8.59989e-07 [before_grad]: 1.266e-05 [set_forward_comm_id_for_comm_node_pass]: 4.65999e-06 [meta_fg_expand]: 3.13e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.29e-06 [after_resolve]: 1.305e-05 [a_after_grad]: 1.313e-05 [renormalize]: 1.19995e-07 [add_forward_monad_depend]: 1.23002e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 9.56003e-06 [cse]: 2.252e-05 [a_3]: 5.195e-05 [py_interpret_to_execute_after_opt_a]: 1.252e-05 [slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 4.597e-05 [convert_after_rewriter]: 9.47001e-06 [order_py_execute_after_rewriter]: 6.11998e-06 [mutable_eliminate]: 0.00059979 [opt_b]: 0.00029348, [1] [Cycle 1]: 0.00028658, [7] [b_1]: 0.0001914 [b_2]: 1.132e-05 [updatestate_depend_eliminate]: 7.56999e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 3.73001e-06 [renormalize]: 6.50005e-07 [cse]: 2.918e-05 [optimize_parallel_all_gather_comm]: 1.979e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.412e-05 [loop_unroll]: 0.00047889 [opt_after_cconv]: 0.00013296, [1] [Cycle 1]: 0.0001268, [7] [c_1]: 4.522e-05 [parameter_eliminate]: 3.44001e-06 [updatestate_depend_eliminate]: 7.38e-06 [updatestate_assign_eliminate]: 3.81999e-06 
[updatestate_loads_eliminate]: 3.65998e-06 [cse]: 2.715e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 3.014e-05 [tuple_transform]: 0.00013845, [1] [Cycle 1]: 9.8e-05, [4] [d_1]: 6.502e-05 [none_parameter_eliminate]: 2.02001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.57001e-06 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 6.109e-05 [cse_after_recomputation]: 3.211e-05, [1] [Cycle 1]: 2.633e-05, [1] [cse]: 2.066e-05 [environ_conv]: 9.05999e-06 [swap_dp_allreduce_reducescatter]: 7.31999e-06 [bias_add_comm_swap]: 2.93003e-06 [label_micro_interleaved_index]: 4.73001e-06 [label_fine_grained_interleaved_index]: 2.53998e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.51998e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.52999e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.536e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 4.72998e-06 [overlap_recompute_and_grad_model_parallel]: 5.58002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.95999e-06 [overlap_grad_flash_sp]: 2.205e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 9.665e-05, [1] [Cycle 1]: 9.207e-05, [6] [build]: 1.064e-05 [elim_shapecalc]: 1.287e-05 [elim_not_effective]: 1.664e-05 [opt_reshape]: 9.62001e-06 [fold_const_symbol]: 1.367e-05 
[renormalize]: 2.10013e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 2.406e-05 [get_jit_bprop_graph]: 1.92999e-06 [rewriter_after_jit_bprop_graph]: 4.62e-06 [opt_after_jit_grad]: 0.00054224 [validate]: 4.927e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.137799 [execute]: 9.67999e-06 Sums bootstrap : 0.000515s : 0.32% type_inference : 0.010763s : 6.62% event_method : 0.000047s : 0.03% auto_monad : 0.000122s : 0.07% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.02% optimize.rewriter_before_opt_a : 0.000132s : 0.08% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000124s : 0.08% optimize.opt_a.loop_unroll : 0.000109s : 0.07% optimize.opt_a.a_1 : 0.003209s : 1.97% optimize.opt_a.with_stream_mark : 0.000051s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000503s : 0.31% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.03% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000032s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000031s : 0.02% optimize.opt_a.flash_sp : 0.000017s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.02% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.02% optimize.opt_a.virtual_output : 0.000031s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000054s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.01% optimize.opt_a.meta_fg_expand : 0.001548s : 0.95% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.05% optimize.opt_a.a_after_grad : 0.000110s : 0.07% optimize.opt_a.renormalize : 0.003344s : 2.06% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.05% optimize.opt_a.cse : 0.000236s : 0.14% optimize.opt_a.a_3 : 0.000458s : 0.28% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000600s : 0.37% optimize.opt_b.b_1 : 0.000191s : 0.12% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000029s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.01% optimize.loop_unroll : 0.000479s : 0.29% optimize.opt_after_cconv.c_1 : 0.000045s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000027s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000065s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.04% optimize.cse_after_recomputation.cse : 0.000021s : 0.01% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000542s : 0.33% validate : 0.000049s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.137799s : 84.70% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000792 209 6.37% : 0.000050s : 11: substitution.arithmetic_simplify 0.31% : 0.000002s : 4: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 4: substitution.fold_const_symbol 1.00% : 0.000008s : 7: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.57% : 0.000456s : 16: substitution.inline 1.99% : 0.000016s : 2: substitution.inline_without_move 1.26% : 0.000010s : 18: substitution.j_node_and_user_rematch 1.96% : 0.000016s : 3: substitution.less_batch_normalization 1.81% : 0.000014s : 11: substitution.minmaximum_grad 0.70% : 0.000006s : 5: substitution.partial_eliminate 1.70% : 0.000013s : 18: substitution.remove_not_recompute_node 3.35% : 0.000027s : 10: substitution.replace_applicator 1.35% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.21% : 0.000065s : 28: substitution.tuple_list_get_item_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010692 2 86.34% : 0.009231s : 1: type_inference.infer 13.66% : 0.001461s : 1: type_inference.specialize ------[replace.] 0.000224 30 57.97% : 0.000130s : 16: replace.inline 42.03% : 0.000094s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000478 30 93.50% : 0.000447s : 16: match.inline 6.50% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5429 1.09% : 0.000008s : 65: predicate.accumulaten_eliminater 0.25% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.06% : 0.000008s : 65: predicate.addn_zero_filter 1.03% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.20% : 0.000016s : 95: predicate.arithmetic_simplify 1.14% : 0.000008s : 65: predicate.cast_eliminate 1.18% : 0.000009s : 65: predicate.check_bprop_eliminate 0.53% : 0.000004s : 30: predicate.compare_switch_simplify 0.07% : 0.000001s : 7: predicate.const_output_eliminate 0.52% : 0.000004s : 30: predicate.depend_value_elim 1.19% : 0.000009s : 65: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 65: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 7: predicate.elim_not_effective 0.18% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 72: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 72: predicate.environ_get_depend_swap 1.73% : 0.000013s : 102: predicate.environ_get_eliminate 1.17% : 0.000009s : 72: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 95: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 95: predicate.float_depend_g_call 0.52% : 0.000004s : 30: predicate.float_environ_get_switch 0.66% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 7: predicate.fold_const_symbol 0.58% : 0.000004s : 30: predicate.get_grad_eliminate 0.09% : 0.000001s : 7: predicate.graph_param_transform 0.52% : 0.000004s : 30: predicate.incorporate_call 0.49% : 0.000004s : 30: predicate.incorporate_call_switch 5.63% : 0.000041s : 234: predicate.inline 1.27% : 0.000009s : 53: predicate.inline_without_move 0.29% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.73% : 0.000005s : 30: predicate.less_batch_normalization 1.62% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.60% : 0.000019s : 158: predicate.load_eliminater 0.34% : 0.000003s : 7: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 126: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 79: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 30: predicate.merge_addn 1.16% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 65: predicate.minmaximum_grad 0.33% : 0.000002s : 7: predicate.mutable_eliminate 0.16% : 0.000001s : 7: predicate.opt_reshape 0.18% : 0.000001s : 7: predicate.parallel_virtual_node 2.03% : 0.000015s : 95: predicate.partial_defer_inline 1.71% : 0.000013s : 86: predicate.partial_eliminate 1.06% : 0.000008s : 65: predicate.print_const_string_wrapper 0.53% : 0.000004s : 30: predicate.reduce_all_const_elim 1.34% : 0.000010s : 65: predicate.reduce_eliminate 2.64% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 30: predicate.remove_not_recompute_node 1.88% : 0.000014s : 144: predicate.replace_applicator 0.61% : 0.000004s : 53: predicate.replace_old_param 0.11% : 0.000001s : 7: predicate.reset_defer_inline 1.10% : 0.000008s : 65: predicate.reshape_eliminate 1.14% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 7: predicate.row_tensor_eliminate 1.30% : 0.000010s : 65: predicate.same_eliminate 0.37% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 30: predicate.shard_identity_eliminate 0.30% : 0.000002s : 14: predicate.special_op_eliminate 0.60% : 0.000004s : 30: predicate.specialize_transform 1.28% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 95: predicate.switch_defer_inline 2.89% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 258: 
predicate.switch_simplify 1.07% : 0.000008s : 65: predicate.tile_eliminate 1.10% : 0.000008s : 65: predicate.transpose_eliminate 1.49% : 0.000011s : 79: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 79: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000010s : 79: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000021s : 123: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 79: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000015s : 109: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.52% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.16% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 7: predicate.value_based_eliminate 0.64% : 0.000005s : 30: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 30: predicate.virtual_output_eliminate 0.14% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001697 32 54.30% : 0.000921s : 12: func_graph_cloner_run.FuncGraphClonerGraph 45.70% : 0.000775s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.192644 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.61% : 0.003105s : 1: add_attr 1.61% : 0.003096s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000129s : 1: auto_monad 0.01% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.29% : 0.000549s : 1: bootstrap 0.01% : 0.000028s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000055s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.25% : 0.000488s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.32% : 0.000610s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.52% : 0.004854s : 117: opt.transform.opt_a 0.02% : 0.000044s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000170s : 28: opt.transform.opt_b 0.04% : 0.000072s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000049s : 4: opt.transform.symbol_engine_opt 5.91% : 0.011387s : 1: opt_a 0.07% : 0.000137s : 1: opt_after_cconv 0.29% : 0.000553s : 1: opt_after_jit_grad 0.15% : 0.000297s : 1: opt_b 7.21% : 0.013890s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000053s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000035s : 1: remove_dup_value 0.86% : 0.001661s : 2: renormalize.infer 0.87% : 0.001668s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000050s : 1: rewriter_after_opt_a 0.07% : 0.000137s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000099s : 1: symbol_engine_optimizer 71.54% : 0.137820s : 1: task_emit 0.07% : 0.000142s : 1: 
tuple_transform 5.60% : 0.010778s : 1: type_inference 0.04% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x1-ge],max_mem:6.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x2-pynative],max_mem:6.0M TotalTime = 0.0236609, [24] [bootstrap]: 0.00063285 [type_inference]: 0.00701029 [event_method]: 1.593e-05 [auto_monad]: 5.894e-05 [graph_reusing]: 5.59e-06 [inline]: 2.00002e-06 [add_attr]: 0.00364835, [1] [add_attr_with_inline]: 0.00363627, [1] [Cycle 1]: 5.045e-05, [2] [tag_attr]: 1.691e-05 [meta_addattr_fg_expand]: 4.83001e-06 [parallel-infer-symbol]: 3.56001e-06 [pre_auto_parallel]: 3.242e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.00422276, [53] [py_interpret_to_execute]: 2.18e-05 [rewriter_before_opt_a]: 6.008e-05 [opt_a]: 0.00230737, [2] [Cycle 1]: 0.00168443, [45] [expand_dump_flag]: 2.50002e-06 [switch_simplify]: 3.2e-05 [loop_unroll]: 2.124e-05 [a_1]: 0.00047899 [with_stream_mark]: 1.33e-05 [recompute_prepare]: 8.59998e-06 [updatestate_depend_eliminate]: 4.38999e-06 [updatestate_assign_eliminate]: 3.11001e-06 [updatestate_loads_eliminate]: 2.99001e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.888e-05 [accelerated_algorithm]: 6.76999e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 8.45001e-06 [auto_parallel]: 6.12001e-06 [parallel]: 2.671e-05 [flash_sp]: 7.5e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 8.84998e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 8.48999e-06 [virtual_dataset]: 6.56e-06 
[get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 6.05002e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.002e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.154e-05 [merge_recompute_call_nodes]: 1.44003e-06 [before_grad]: 9.87001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.48998e-06 [flash_sp_send_recv_attached]: 2.68998e-06 [receive_attached]: 2.73998e-06 [after_resolve]: 1.074e-05 [a_after_grad]: 8.97e-06 [renormalize]: 0.00050578 [add_forward_monad_depend]: 4.72998e-06 [auto_monad_grad]: 2.18002e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.899e-05 [a_3]: 7.081e-05 [Cycle 2]: 0.00061326, [45] [expand_dump_flag]: 1.37999e-06 [switch_simplify]: 7.54002e-06 [loop_unroll]: 5.77999e-06 [a_1]: 0.00012993 [with_stream_mark]: 1.047e-05 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.899e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.50999e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.28e-06 [merge_comm]: 3.2e-06 [allreduce_fusion]: 2.81999e-06 [matmul_add_comm_reduction]: 5.33002e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.18002e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 5.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.027e-05 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 8.27998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 8.15999e-06 [renormalize]: 7.99773e-08 
[add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 6.99001e-06 [cse]: 1.507e-05 [a_3]: 3.346e-05 [py_interpret_to_execute_after_opt_a]: 8.35001e-06 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 3.059e-05 [convert_after_rewriter]: 7.46001e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00048199 [opt_b]: 0.00018767, [1] [Cycle 1]: 0.00018141, [7] [b_1]: 0.00011188 [b_2]: 7.09001e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.09986e-07 [cse]: 1.721e-05 [optimize_parallel_all_gather_comm]: 1.66e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 2.296e-05 [loop_unroll]: 0.00042297 [opt_after_cconv]: 9.727e-05, [1] [Cycle 1]: 9.139e-05, [7] [c_1]: 2.883e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.60001e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.659e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.334e-05 [tuple_transform]: 7.15e-05, [1] [Cycle 1]: 6.701e-05, [4] [d_1]: 4.062e-05 [none_parameter_eliminate]: 1.38002e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.62002e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.104e-05 [cse_after_recomputation]: 2.115e-05, [1] [Cycle 1]: 1.657e-05, [1] [cse]: 1.15e-05 [environ_conv]: 5.26002e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.14997e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.60002e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.38002e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.22e-06 
[add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 1.22e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.68002e-06 [overlap_recompute_comm]: 1.92999e-06 [overlap_grad_ring_attention]: 4.3e-06 [overlap_grad_flash_sp]: 1.765e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.69998e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 7.077e-05, [1] [Cycle 1]: 6.658e-05, [6] [build]: 2.79001e-06 [elim_shapecalc]: 9.34e-06 [elim_not_effective]: 1.136e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 9.24e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.63e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 0.00014264 [opt_after_jit_grad]: 0.00046663 [validate]: 3.438e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00713108 [execute]: 7.98001e-06 Sums bootstrap : 0.000633s : 3.33% type_inference : 0.007010s : 36.87% event_method : 0.000016s : 0.08% auto_monad : 0.000059s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000032s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.11% optimize.rewriter_before_opt_a : 0.000060s : 0.32% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.21% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000609s : 3.20% optimize.opt_a.with_stream_mark : 0.000024s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000148s : 0.78% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000031s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000506s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000044s : 0.23% optimize.opt_a.a_3 : 0.000104s : 0.55% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000482s : 2.54% optimize.opt_b.b_1 : 0.000112s : 0.59% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.12% optimize.loop_unroll : 0.000423s : 2.22% optimize.opt_after_cconv.c_1 : 0.000029s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000143s : 0.75% opt_after_jit_grad : 0.000467s : 2.45% validate : 0.000034s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.007131s : 37.51% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000179 30 14.79% : 0.000027s : 5: substitution.arithmetic_simplify 0.94% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000006s : 4: substitution.graph_param_transform 67.88% : 0.000122s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000005s : 4: substitution.remove_not_recompute_node 2.19% : 0.000004s : 4: substitution.replace_old_param 6.01% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006953 2 90.51% : 0.006294s : 1: type_inference.infer 9.49% : 0.000660s : 1: type_inference.specialize ------[replace.] 0.000041 5 69.63% : 0.000029s : 3: replace.inline 30.37% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000129 5 92.53% : 0.000120s : 3: match.inline 7.47% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000194 1131 0.73% : 0.000001s : 11: predicate.accumulaten_eliminater 0.75% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.48% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 11: predicate.addn_zero_filter 0.65% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.92% : 0.000004s : 19: predicate.arithmetic_simplify 0.77% : 0.000001s : 11: predicate.cast_eliminate 0.62% : 0.000001s : 8: predicate.check_bprop_eliminate 0.47% : 0.000001s : 8: predicate.compare_switch_simplify 0.20% : 0.000000s : 4: predicate.const_output_eliminate 0.54% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.83% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.66% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.03% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.93% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.88% : 0.000002s : 15: predicate.environ_get_depend_swap 1.50% : 0.000003s : 23: predicate.environ_get_eliminate 0.89% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.04% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.88% : 0.000004s : 16: predicate.float_depend_g_call 0.49% : 0.000001s : 8: predicate.float_environ_get_switch 0.76% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 4: predicate.fold_const_symbol 0.63% : 0.000001s : 8: predicate.get_grad_eliminate 0.20% : 0.000000s : 4: predicate.graph_param_transform 0.57% : 0.000001s : 8: predicate.incorporate_call 0.46% : 0.000001s : 8: predicate.incorporate_call_switch 5.13% : 0.000010s : 51: predicate.inline 0.74% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000002s : 8: predicate.less_batch_normalization 1.49% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.07% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.50% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.50% : 0.000001s : 8: predicate.merge_addn 0.59% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.63% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.33% : 0.000001s : 4: predicate.parallel_virtual_node 1.45% : 0.000003s : 16: predicate.partial_defer_inline 1.26% : 0.000002s : 17: predicate.partial_eliminate 0.93% : 0.000002s : 11: predicate.print_const_string_wrapper 0.57% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000002s : 11: predicate.reduce_eliminate 1.98% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.42% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.24% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 11: predicate.reshape_eliminate 14.35% : 0.000028s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.63% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.75% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.66% : 0.000001s : 8: predicate.specialize_transform 0.81% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.71% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.32% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.12% : 0.000002s : 16: predicate.switch_defer_inline 1.71% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.30% : 0.000008s : 54: predicate.switch_simplify 0.75% 
: 0.000001s : 11: predicate.tile_eliminate 0.74% : 0.000001s : 11: predicate.transpose_eliminate 1.36% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.37% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.38% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.94% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.79% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.61% : 0.000001s : 8: predicate.virtual_output_eliminate 0.29% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000467 8 47.59% : 0.000222s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.41% : 0.000245s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033206 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.00% : 0.003653s : 1: add_attr 10.96% : 0.003640s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000064s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.03% : 0.000676s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.30% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.48% : 0.000492s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.06% : 0.001015s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002310s : 1: opt_a 0.30% : 0.000101s : 1: opt_after_cconv 1.44% : 0.000477s : 1: opt_after_jit_grad 0.58% : 0.000191s : 1: opt_b 12.73% : 0.004227s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000037s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.78% : 0.000260s : 1: renormalize.infer 0.72% : 0.000238s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000149s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000034s : 1: rewriter_after_opt_a 0.19% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000073s : 1: symbol_engine_optimizer 21.51% : 0.007144s : 1: task_emit 0.22% : 0.000074s : 1: 
tuple_transform 21.16% : 0.007026s : 1: type_inference 0.19% : 0.000062s : 1: validate TotalTime = 0.0186475, [24] [bootstrap]: 0.0004733 [type_inference]: 0.00451245 [event_method]: 1.117e-05 [auto_monad]: 5.228e-05 [graph_reusing]: 4.65001e-06 [inline]: 2.21998e-06 [add_attr]: 0.00303911, [1] [add_attr_with_inline]: 0.00303142, [1] [Cycle 1]: 4.374e-05, [2] [tag_attr]: 1.226e-05 [meta_addattr_fg_expand]: 3.58e-06 [parallel-infer-symbol]: 2.97002e-06 [pre_auto_parallel]: 2.201e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.10002e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00379561, [53] [py_interpret_to_execute]: 1.584e-05 [rewriter_before_opt_a]: 3.784e-05 [opt_a]: 0.00195815, [2] [Cycle 1]: 0.00129521, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.383e-05 [loop_unroll]: 1.452e-05 [a_1]: 0.00030368 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.42002e-06 [updatestate_depend_eliminate]: 3.62998e-06 [updatestate_assign_eliminate]: 3.12002e-06 [updatestate_loads_eliminate]: 3.05998e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.889e-05 [accelerated_algorithm]: 6.93e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 6.31998e-06 [merge_send_recv]: 8.1e-06 [auto_parallel]: 5.79999e-06 [parallel]: 1.791e-05 [flash_sp]: 7.45003e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 9.54999e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.53e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.84e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.25999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.16998e-06 [receive_attached]: 2.61e-06 
[after_resolve]: 1.107e-05 [a_after_grad]: 9.31e-06 [renormalize]: 0.00035998 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.326e-05 [cse]: 2.707e-05 [a_3]: 4.059e-05 [Cycle 2]: 0.0006537, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 7.08e-06 [loop_unroll]: 5.73997e-06 [a_1]: 0.00013167 [with_stream_mark]: 1.161e-05 [recompute_prepare]: 6.09999e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.68998e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 0.00011093 [accelerated_algorithm]: 6.25002e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.32e-06 [shard_inline]: 5.93002e-06 [merge_send_recv]: 4.76002e-06 [auto_parallel]: 5.66e-06 [parallel]: 4.45999e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.40999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.40002e-06 [virtual_dataset]: 5.68002e-06 [get_grad_eliminate_]: 5.19998e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94999e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.37001e-06 [a_after_grad]: 8.07e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.363e-05 [a_3]: 3.283e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.156e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.62999e-06 [mutable_eliminate]: 0.0004562 [opt_b]: 0.0001852, [1] [Cycle 1]: 0.00017915, [7] [b_1]: 
0.00011061 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 5.64e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.33002e-06 [renormalize]: 3.10014e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.622e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.196e-05 [loop_unroll]: 0.00042501 [opt_after_cconv]: 9.702e-05, [1] [Cycle 1]: 9.133e-05, [7] [c_1]: 2.876e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.645e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.195e-05 [tuple_transform]: 7.113e-05, [1] [Cycle 1]: 6.687e-05, [4] [d_1]: 4.055e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.49001e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.447e-05 [cse_after_recomputation]: 2.065e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.105e-05 [environ_conv]: 4.73001e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.73998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.55999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.196e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.27e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 1.96e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.664e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 6.931e-05, [1] [Cycle 1]: 6.52e-05, [6] [build]: 2.53003e-06 [elim_shapecalc]: 8.54e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 6.33e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.51002e-06 [auto_monad_reorder]: 1.509e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00046236 [validate]: 3.08e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00600577 [execute]: 7.01999e-06 Sums bootstrap : 0.000473s : 3.23% type_inference : 0.004512s : 30.82% event_method : 0.000011s : 0.08% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000435s : 2.97% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000190s : 1.30% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000360s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000456s : 3.12% optimize.opt_b.b_1 : 0.000111s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000425s : 2.90% optimize.opt_after_cconv.c_1 : 0.000029s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01%
auto_monad_reorder : 0.000015s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.02%
opt_after_jit_grad : 0.000462s : 3.16%
validate : 0.000031s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006006s : 41.02%
execute : 0.000007s : 0.05%
Time group info:
------[substitution.] 0.000125 26
    18.23% : 0.000023s : 4: substitution.arithmetic_simplify
    1.51% : 0.000002s : 2: substitution.elim_not_effective
    0.97% : 0.000001s : 2: substitution.fold_const_symbol
    4.36% : 0.000005s : 4: substitution.graph_param_transform
    66.10% : 0.000083s : 2: substitution.inline
    2.16% : 0.000003s : 4: substitution.j_node_and_user_rematch
    3.47% : 0.000004s : 4: substitution.remove_not_recompute_node
    3.20% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004469 2
    91.82% : 0.004103s : 1: type_inference.infer
    8.18% : 0.000365s : 1: type_inference.specialize
------[replace.] 0.000018 2
    100.00% : 0.000018s : 2: replace.inline
------[match.] 0.000081 2
    100.00% : 0.000081s : 2: match.inline
------[predicate.]
0.000140 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.15% : 0.000002s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.10% : 0.000002s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.32% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 41.89% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.11% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026802 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.36% : 0.003043s : 1: add_attr 11.32% : 0.003035s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000509s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000465s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000797s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.09% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.32% : 0.001961s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.76% : 0.000472s : 1: opt_after_jit_grad 0.70% : 0.000189s : 1: opt_b 14.18% : 0.003799s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.74% : 0.000198s : 1: renormalize.infer 0.58% : 0.000155s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.44% : 0.006016s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.89% : 0.004526s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.0202454, [24] [bootstrap]: 0.00046769 [type_inference]: 0.00572959 [event_method]: 1.439e-05 [auto_monad]: 5.528e-05 [graph_reusing]: 5.88002e-06 [inline]: 2.22999e-06 [add_attr]: 0.00300223, [1] [add_attr_with_inline]: 0.00299364, [1] [Cycle 1]: 4.594e-05, [2] [tag_attr]: 1.562e-05 [meta_addattr_fg_expand]: 4.37998e-06 [parallel-infer-symbol]: 2.99999e-06 [pre_auto_parallel]: 2.637e-05 [insert-virtual-dataset]: 3.06999e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00411752, [53] [py_interpret_to_execute]: 2.129e-05 [rewriter_before_opt_a]: 5.791e-05 [opt_a]: 0.00224758, [2] [Cycle 1]: 0.00163337, [45] [expand_dump_flag]: 2.88998e-06 [switch_simplify]: 3.151e-05 [loop_unroll]: 2.14e-05 [a_1]: 0.00048955 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 8.05999e-06 [updatestate_depend_eliminate]: 3.92002e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 3.22002e-06 [parameter_eliminate]: 2.02999e-06 [a_2]: 7.856e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.66002e-06 [shard_inline]: 6.13002e-06 [merge_send_recv]: 8.43999e-06 [auto_parallel]: 6.61999e-06 [parallel]: 1.738e-05 [flash_sp]: 7.68999e-06 [merge_comm]: 3.40003e-06 [allreduce_fusion]: 3.71001e-06 [matmul_add_comm_reduction]: 8.83001e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.42999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 9.14998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.80998e-06 [meta_fg_expand]: 2.55002e-06 [flash_sp_send_recv_attached]: 2.59001e-06 
[receive_attached]: 2.68e-06 [after_resolve]: 1.044e-05 [a_after_grad]: 9.18002e-06 [renormalize]: 0.00049203 [add_forward_monad_depend]: 5.19998e-06 [auto_monad_grad]: 2.21e-06 [auto_monad_eliminator]: 1.397e-05 [cse]: 2.8e-05 [a_3]: 4.147e-05 [Cycle 2]: 0.00060475, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 6.95002e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012809 [with_stream_mark]: 1.033e-05 [recompute_prepare]: 6.01998e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.886e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.67999e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.47002e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.23002e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69e-06 [merge_recompute_call_nodes]: 6.39993e-07 [before_grad]: 8.08999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.76003e-06 [flash_sp_send_recv_attached]: 7.00005e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.39e-06 [a_after_grad]: 8.46002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.73e-06 [cse]: 1.758e-05 [a_3]: 3.242e-05 [py_interpret_to_execute_after_opt_a]: 8.08001e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.301e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.05001e-06 [mutable_eliminate]: 0.00045712 [opt_b]: 0.00018592, [1] [Cycle 1]: 
0.00017986, [7] [b_1]: 0.00011122 [b_2]: 7.53e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.23998e-06 [renormalize]: 3.7998e-07 [cse]: 1.702e-05 [optimize_parallel_all_gather_comm]: 1.515e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00042271 [opt_after_cconv]: 9.955e-05, [1] [Cycle 1]: 9.356e-05, [7] [c_1]: 3.005e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.685e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.987e-05, [1] [Cycle 1]: 6.557e-05, [4] [d_1]: 3.977e-05 [none_parameter_eliminate]: 1.38002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.695e-05 [cse_after_recomputation]: 2.082e-05, [1] [Cycle 1]: 1.645e-05, [1] [cse]: 1.136e-05 [environ_conv]: 4.35e-06 [swap_dp_allreduce_reducescatter]: 5.66e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 4.94e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.26997e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.188e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.62998e-06 [overlap_recompute_and_grad_model_parallel]: 4.47998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.10001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.58998e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 6.881e-05, [1] [Cycle 1]: 6.467e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.135e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 8.96002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.87001e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00051788 [validate]: 3.254e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00603347 [execute]: 6.56999e-06 Sums bootstrap : 0.000468s : 2.87% type_inference : 0.005730s : 35.20% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000618s : 3.79% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000492s : 3.02% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000046s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000457s : 2.81% optimize.opt_b.b_1 : 0.000111s : 0.68% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000423s : 2.60% optimize.opt_after_cconv.c_1 : 0.000030s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.01%
auto_monad_reorder : 0.000015s : 0.09%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000518s : 3.18%
validate : 0.000033s : 0.20%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006033s : 37.07%
execute : 0.000007s : 0.04%
Time group info:
------[substitution.] 0.000191 30
    12.71% : 0.000024s : 5: substitution.arithmetic_simplify
    0.94% : 0.000002s : 2: substitution.elim_not_effective
    0.65% : 0.000001s : 2: substitution.fold_const_symbol
    2.77% : 0.000005s : 4: substitution.graph_param_transform
    71.23% : 0.000136s : 3: substitution.inline
    1.45% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.26% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.05% : 0.000004s : 4: substitution.replace_old_param
    5.95% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005688 2
    89.20% : 0.005074s : 1: type_inference.infer
    10.80% : 0.000614s : 1: type_inference.specialize
------[replace.] 0.000042 5
    69.64% : 0.000029s : 3: replace.inline
    30.36% : 0.000013s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000144 5
    92.86% : 0.000134s : 3: match.inline
    7.14% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000165 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.93% : 0.000002s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 5.77% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 11: predicate.reduce_eliminate 2.23% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000002s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.86% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.14% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.99% : 0.000002s : 11: predicate.transpose_eliminate 1.67% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.24% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 45.08% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.92% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029007 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.37% : 0.003007s : 1: add_attr 10.33% : 0.002997s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.73% : 0.000503s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.49% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.41% : 0.000990s : 78: opt.transform.opt_a 0.10% : 0.000029s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000092s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.76% : 0.002251s : 1: opt_a 0.36% : 0.000103s : 1: opt_after_cconv 1.82% : 0.000528s : 1: opt_after_jit_grad 0.65% : 0.000189s : 1: opt_b 14.21% : 0.004121s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.96% : 0.000280s : 1: renormalize.infer 0.71% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000072s : 1: symbol_engine_optimizer 20.84% : 0.006044s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 19.80% : 0.005743s : 1: type_inference 0.20% : 0.000059s : 1: validate TotalTime = 0.0389563, [24] [bootstrap]: 0.00055572 [type_inference]: 0.0117348 [event_method]: 5.253e-05 [auto_monad]: 0.0001258 [graph_reusing]: 8.67998e-06 [inline]: 1.77999e-06 [add_attr]: 0.00304698, [1] [add_attr_with_inline]: 0.00303885, [1] [Cycle 1]: 7.422e-05, [2] [tag_attr]: 3.6e-05 [meta_addattr_fg_expand]: 1.044e-05 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 5.099e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.0139224, [53] [py_interpret_to_execute]: 3.88e-05 [rewriter_before_opt_a]: 0.00014874 [opt_a]: 0.0115722, [3] [Cycle 1]: 0.00745663, [45] [expand_dump_flag]: 4.18001e-06 [switch_simplify]: 7.672e-05 [loop_unroll]: 6.401e-05 [a_1]: 0.00154225 [with_stream_mark]: 2.309e-05 [recompute_prepare]: 2.266e-05 [updatestate_depend_eliminate]: 1e-05 [updatestate_assign_eliminate]: 7.71001e-06 [updatestate_loads_eliminate]: 7.88999e-06 [parameter_eliminate]: 2.56e-06 [a_2]: 0.00024885 [accelerated_algorithm]: 3.182e-05 [shard]: 2.06998e-06 [meta_shard_fg_expand]: 3.95e-06 [shard_inline]: 1.624e-05 [merge_send_recv]: 1.63e-05 [auto_parallel]: 1.13e-05 [parallel]: 1.838e-05 [flash_sp]: 1.117e-05 [merge_comm]: 9.79e-06 [allreduce_fusion]: 8.96002e-06 [matmul_add_comm_reduction]: 2.784e-05 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 1.819e-05 [virtual_dataset]: 1.621e-05 [get_grad_eliminate_]: 1.6e-05 [virtual_output]: 1.568e-05 [merge_forward]: 9.24998e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 1.831e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.929e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.78e-05 [set_forward_comm_id_for_comm_node_pass]: 1.033e-05 [meta_fg_expand]: 0.00144475 [flash_sp_send_recv_attached]: 3.97002e-06 [receive_attached]: 2.89001e-06 [after_resolve]: 6.183e-05 
[a_after_grad]: 8.368e-05 [renormalize]: 0.00264927 [add_forward_monad_depend]: 1.01e-05 [auto_monad_grad]: 5.32001e-06 [auto_monad_eliminator]: 5.822e-05 [cse]: 0.00017546 [a_3]: 0.00034652 [Cycle 2]: 0.00318006, [45] [expand_dump_flag]: 1.67999e-06 [switch_simplify]: 4.844e-05 [loop_unroll]: 4.511e-05 [a_1]: 0.00163588 [with_stream_mark]: 1.26e-05 [recompute_prepare]: 1.153e-05 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 4.58001e-06 [updatestate_loads_eliminate]: 3.78999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012992 [accelerated_algorithm]: 1.251e-05 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 2.01e-06 [shard_inline]: 9.51e-06 [merge_send_recv]: 7.06999e-06 [auto_parallel]: 7.55e-06 [parallel]: 4.87e-06 [flash_sp]: 3.25e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 4.67e-06 [matmul_add_comm_reduction]: 8.22e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 1.123e-05 [virtual_dataset]: 9.33002e-06 [get_grad_eliminate_]: 9.45001e-06 [virtual_output]: 8.75999e-06 [merge_forward]: 4.38999e-06 [cell_reuse_recompute_pass]: 1.04998e-06 [offload_activation]: 9.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.663e-05 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 1.449e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40001e-06 [meta_fg_expand]: 7.393e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.09e-06 [after_resolve]: 1.758e-05 [a_after_grad]: 1.553e-05 [renormalize]: 0.00062858 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.548e-05 [cse]: 5.071e-05 [a_3]: 6.815e-05 [Cycle 3]: 0.0009208, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 1.129e-05 [loop_unroll]: 9.34e-06 [a_1]: 0.00025641 [with_stream_mark]: 1.047e-05 [recompute_prepare]: 9.64e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 4.05e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 0.00012678 
[accelerated_algorithm]: 1.225e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.93002e-06 [shard_inline]: 9.08002e-06 [merge_send_recv]: 7.38999e-06 [auto_parallel]: 7.47002e-06 [parallel]: 4.35999e-06 [flash_sp]: 1.11997e-06 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 8.25e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.067e-05 [virtual_dataset]: 9.02999e-06 [get_grad_eliminate_]: 8.80999e-06 [virtual_output]: 8.78001e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.13002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.65e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.453e-05 [set_forward_comm_id_for_comm_node_pass]: 5.67999e-06 [meta_fg_expand]: 3.23998e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.389e-05 [a_after_grad]: 1.485e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.099e-05 [cse]: 2.491e-05 [a_3]: 5.809e-05 [py_interpret_to_execute_after_opt_a]: 1.1e-05 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 5.145e-05 [convert_after_rewriter]: 9.75002e-06 [order_py_execute_after_rewriter]: 7.36001e-06 [mutable_eliminate]: 0.00046457 [opt_b]: 0.00032995, [1] [Cycle 1]: 0.00032388, [7] [b_1]: 0.00022592 [b_2]: 1.192e-05 [updatestate_depend_eliminate]: 7.62002e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 4.10998e-06 [renormalize]: 4.2998e-07 [cse]: 3.316e-05 [optimize_parallel_all_gather_comm]: 2.102e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 1.992e-05 [loop_unroll]: 0.00043395 [opt_after_cconv]: 0.00013877, [1] [Cycle 1]: 0.00013284, [7] [c_1]: 5.012e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 7.61001e-06 [updatestate_assign_eliminate]: 4.45999e-06 [updatestate_loads_eliminate]: 4.15999e-06 [cse]: 2.978e-05 
[renormalize]: 3.09985e-07 [remove_dup_value]: 2.869e-05 [tuple_transform]: 0.00010538, [1] [Cycle 1]: 0.00010067, [4] [d_1]: 6.955e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 1.034e-05 [partial_unused_args_eliminate]: 1.68997e-06 [add_recomputation]: 6.1e-05 [cse_after_recomputation]: 3.357e-05, [1] [Cycle 1]: 2.885e-05, [1] [cse]: 2.285e-05 [environ_conv]: 8.42e-06 [swap_dp_allreduce_reducescatter]: 8.02e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.14002e-06 [label_fine_grained_interleaved_index]: 2.41998e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.71e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.67999e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 9.50007e-07 [full_micro_interleaved_order_control]: 1.95001e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.24998e-06 [overlap_opt_shard_in_pipeline]: 1.09003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.771e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 5.30001e-06 [overlap_recompute_and_grad_model_parallel]: 5.81998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.89e-06 [overlap_grad_ring_attention]: 5.61e-06 [overlap_grad_flash_sp]: 2.383e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010158, [1] [Cycle 1]: 9.73e-05, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.393e-05 [elim_not_effective]: 1.89e-05 [opt_reshape]: 1.094e-05 [fold_const_symbol]: 1.525e-05 [renormalize]: 2.30008e-07 [detach_backward]: 1.62999e-06 
[pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.546e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.8e-06 [opt_after_jit_grad]: 0.00047804 [validate]: 4.432e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0086738 [execute]: 7.01999e-06 Sums bootstrap : 0.000556s : 1.60% type_inference : 0.011735s : 33.88% event_method : 0.000053s : 0.15% auto_monad : 0.000126s : 0.36% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000149s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000136s : 0.39% optimize.opt_a.loop_unroll : 0.000118s : 0.34% optimize.opt_a.a_1 : 0.003435s : 9.92% optimize.opt_a.with_stream_mark : 0.000046s : 0.13% optimize.opt_a.recompute_prepare : 0.000044s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000506s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.04% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001522s : 4.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000093s : 0.27% optimize.opt_a.a_after_grad : 0.000114s : 0.33% optimize.opt_a.renormalize : 0.003278s : 9.46% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.24% optimize.opt_a.cse : 0.000251s : 0.72% optimize.opt_a.a_3 : 0.000473s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000465s : 1.34% optimize.opt_b.b_1 : 0.000226s : 0.65% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000434s : 1.25% optimize.opt_after_cconv.c_1 : 0.000050s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.08% optimize.tuple_transform.d_1 : 0.000070s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.18% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000478s : 1.38% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008674s : 25.04% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000800 222 6.02% : 0.000048s : 12: substitution.arithmetic_simplify 1.82% : 0.000015s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000007s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 56.96% : 0.000456s : 17: substitution.inline 2.08% : 0.000017s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.88% : 0.000015s : 3: substitution.less_batch_normalization 1.69% : 0.000014s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.68% : 0.000013s : 20: substitution.remove_not_recompute_node 3.05% : 0.000024s : 10: substitution.replace_applicator 1.31% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.27% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011658 2 86.34% : 0.010065s : 1: type_inference.infer 13.66% : 0.001593s : 1: type_inference.specialize ------[replace.] 0.000234 33 57.05% : 0.000134s : 17: replace.inline 42.95% : 0.000100s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000480 33 92.96% : 0.000446s : 17: match.inline 7.04% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000773 5764 1.05% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.14% : 0.000017s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.70% : 0.000013s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.60% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000044s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.08% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000016s : 101: predicate.partial_defer_inline 1.76% : 0.000014s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.63% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000015s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.90% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.98% : 
0.000038s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.54% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001681 34 55.99% : 0.000941s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.01% : 0.000740s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064722 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.71% : 0.003051s : 1: add_attr 4.70% : 0.003043s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000133s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.91% : 0.000589s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000060s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000474s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.96% : 0.005149s : 117: opt.transform.opt_a 0.08% : 0.000049s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000211s : 28: opt.transform.opt_b 0.12% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000056s : 4: opt.transform.symbol_engine_opt 17.88% : 0.011575s : 1: opt_a 0.22% : 0.000142s : 1: opt_after_cconv 0.75% : 0.000488s : 1: opt_after_jit_grad 0.52% : 0.000334s : 1: opt_b 21.52% : 0.013927s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000056s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.74% : 0.001771s : 2: renormalize.infer 2.31% : 0.001493s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000056s : 1: rewriter_after_opt_a 0.24% : 0.000154s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000104s : 1: symbol_engine_optimizer 13.42% : 0.008684s : 1: task_emit 0.17% : 0.000108s : 1: 
tuple_transform 18.15% : 0.011750s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0189203, [24] [bootstrap]: 0.00045459 [type_inference]: 0.00442127 [event_method]: 1.156e-05 [auto_monad]: 5.03e-05 [graph_reusing]: 5.55001e-06 [inline]: 1.76e-06 [add_attr]: 0.00305023, [1] [add_attr_with_inline]: 0.00304233, [1] [Cycle 1]: 4.496e-05, [2] [tag_attr]: 1.217e-05 [meta_addattr_fg_expand]: 3.2e-06 [parallel-infer-symbol]: 2.76999e-06 [pre_auto_parallel]: 2.177e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00381393, [53] [py_interpret_to_execute]: 1.562e-05 [rewriter_before_opt_a]: 4.07e-05 [opt_a]: 0.00191753, [2] [Cycle 1]: 0.00129851, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.532e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.00030336 [with_stream_mark]: 1.337e-05 [recompute_prepare]: 7.96001e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.923e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.37001e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 5.92999e-06 [parallel]: 1.72e-05 [flash_sp]: 7.73001e-06 [merge_comm]: 3.59002e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 9.77001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.30001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.30001e-06 [before_grad]: 1.002e-05 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.34999e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.113e-05 [a_after_grad]: 9.20999e-06 [renormalize]: 0.00035985 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.391e-05 [cse]: 2.669e-05 [a_3]: 4.201e-05 [Cycle 2]: 0.00060975, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 7.25e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.00012979 [with_stream_mark]: 9.85002e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.968e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.43002e-06 [parallel]: 4.12e-06 [flash_sp]: 3.66999e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.97002e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.34998e-06 [get_grad_eliminate_]: 5.21998e-06 [virtual_output]: 5.04998e-06 [merge_forward]: 2.76e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.14999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 8.26002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.3e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.298e-05 [a_3]: 3.326e-05 [py_interpret_to_execute_after_opt_a]: 7.54002e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.114e-05 [convert_after_rewriter]: 7.32002e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.00050812 [opt_b]: 0.00018865, [1] [Cycle 1]: 0.00018253, 
[7] [b_1]: 0.00011227 [b_2]: 7.61999e-06 [updatestate_depend_eliminate]: 5.59998e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 4.40021e-07 [cse]: 1.627e-05 [optimize_parallel_all_gather_comm]: 1.492e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 2.177e-05 [loop_unroll]: 0.00042515 [opt_after_cconv]: 9.605e-05, [1] [Cycle 1]: 9.028e-05, [7] [c_1]: 2.89e-05 [parameter_eliminate]: 2.16998e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.627e-05 [renormalize]: 3.49974e-07 [remove_dup_value]: 1.196e-05 [tuple_transform]: 7.119e-05, [1] [Cycle 1]: 6.7e-05, [4] [d_1]: 4.075e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.56e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.335e-05 [cse_after_recomputation]: 2.014e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.051e-05 [environ_conv]: 4.95999e-06 [swap_dp_allreduce_reducescatter]: 5.19998e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.45002e-06 [micro_interleaved_order_control]: 2.28002e-06 [assign_add_opt]: 1.19003e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.35001e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.39e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.92998e-06 [overlap_recompute_and_grad_model_parallel]: 4.90999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.669e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.015e-05, [1] [Cycle 1]: 6.6e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 8.88002e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.07001e-06 [fold_const_symbol]: 9.37001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.99999e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.537e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 3.42002e-06 [opt_after_jit_grad]: 0.00045575 [validate]: 3.129e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.00636645 [execute]: 7.09001e-06 Sums bootstrap : 0.000455s : 3.05% type_inference : 0.004421s : 29.68% event_method : 0.000012s : 0.08% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.10% optimize.rewriter_before_opt_a : 0.000041s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000033s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000433s : 2.91% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000149s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000360s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000508s : 3.41% optimize.opt_b.b_1 : 0.000112s : 0.75% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000425s : 2.85% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000456s : 3.06% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006366s : 42.74% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000126 26 17.34% : 0.000022s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.23% : 0.000002s : 2: substitution.fold_const_symbol 4.34% : 0.000005s : 4: substitution.graph_param_transform 66.60% : 0.000084s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.28% : 0.000004s : 4: substitution.remove_not_recompute_node 3.38% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004379 2 91.68% : 0.004015s : 1: type_inference.infer 8.32% : 0.000364s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000082 2 100.00% : 0.000082s : 2: match.inline ------[predicate.] 
0.000141 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.52% : 0.000004s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.77% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.95% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000009s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.69% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.90% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.46% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 41.22% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.78% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027103 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.27% : 0.003055s : 1: add_attr 11.24% : 0.003046s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.79% : 0.000486s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.91% : 0.000518s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000795s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000094s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001920s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.72% : 0.000466s : 1: opt_after_jit_grad 0.71% : 0.000192s : 1: opt_b 14.09% : 0.003818s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000196s : 1: renormalize.infer 0.58% : 0.000157s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000073s : 1: symbol_engine_optimizer 23.53% : 0.006377s : 1: task_emit 0.27% : 0.000074s : 1: 
tuple_transform 16.36% : 0.004435s : 1: type_inference 0.21% : 0.000058s : 1: validate TotalTime = 0.037246, [24] [bootstrap]: 0.00050549 [type_inference]: 0.0105929 [event_method]: 4.436e-05 [auto_monad]: 0.00012118 [graph_reusing]: 8.34002e-06 [inline]: 1.95001e-06 [add_attr]: 0.00305403, [1] [add_attr_with_inline]: 0.0030455, [1] [Cycle 1]: 6.826e-05, [2] [tag_attr]: 3.233e-05 [meta_addattr_fg_expand]: 9.49e-06 [parallel-infer-symbol]: 3.27002e-06 [pre_auto_parallel]: 4.8e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0135898, [53] [py_interpret_to_execute]: 3.862e-05 [rewriter_before_opt_a]: 0.00013113 [opt_a]: 0.0112496, [3] [Cycle 1]: 0.00721712, [45] [expand_dump_flag]: 4.27e-06 [switch_simplify]: 6.902e-05 [loop_unroll]: 5.573e-05 [a_1]: 0.00141459 [with_stream_mark]: 2.434e-05 [recompute_prepare]: 2.149e-05 [updatestate_depend_eliminate]: 9.42999e-06 [updatestate_assign_eliminate]: 8.19998e-06 [updatestate_loads_eliminate]: 7.66999e-06 [parameter_eliminate]: 2.76999e-06 [a_2]: 0.0002494 [accelerated_algorithm]: 3.092e-05 [shard]: 2.46e-06 [meta_shard_fg_expand]: 3.75998e-06 [shard_inline]: 1.628e-05 [merge_send_recv]: 1.665e-05 [auto_parallel]: 1.165e-05 [parallel]: 1.883e-05 [flash_sp]: 1.13e-05 [merge_comm]: 9.99999e-06 [allreduce_fusion]: 9.20001e-06 [matmul_add_comm_reduction]: 2.709e-05 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 1.796e-05 [virtual_dataset]: 1.605e-05 [get_grad_eliminate_]: 1.575e-05 [virtual_output]: 1.542e-05 [merge_forward]: 9.64e-06 [cell_reuse_recompute_pass]: 1.54998e-06 [offload_activation]: 1.84e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.962e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.799e-05 [set_forward_comm_id_for_comm_node_pass]: 1.03e-05 [meta_fg_expand]: 0.001435 [flash_sp_send_recv_attached]: 3.71999e-06 [receive_attached]: 2.51e-06 [after_resolve]: 6.151e-05 
[a_after_grad]: 8.354e-05 [renormalize]: 0.00256463 [add_forward_monad_depend]: 1.003e-05 [auto_monad_grad]: 5.39e-06 [auto_monad_eliminator]: 5.827e-05 [cse]: 0.00017556 [a_3]: 0.00034554 [Cycle 2]: 0.00309996, [45] [expand_dump_flag]: 1.66002e-06 [switch_simplify]: 4.822e-05 [loop_unroll]: 4.542e-05 [a_1]: 0.00160543 [with_stream_mark]: 1.265e-05 [recompute_prepare]: 1.182e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.62e-06 [updatestate_loads_eliminate]: 3.88999e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 0.00013013 [accelerated_algorithm]: 1.248e-05 [shard]: 1.65001e-06 [meta_shard_fg_expand]: 2.00002e-06 [shard_inline]: 9.51e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.73001e-06 [flash_sp]: 3.25e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 4.63999e-06 [matmul_add_comm_reduction]: 8.60001e-06 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 1.072e-05 [virtual_dataset]: 8.87e-06 [get_grad_eliminate_]: 9.25001e-06 [virtual_output]: 8.55001e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.663e-05 [merge_recompute_call_nodes]: 7.79983e-07 [before_grad]: 1.44e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.63e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.599e-05 [a_after_grad]: 1.472e-05 [renormalize]: 0.00062071 [add_forward_monad_depend]: 4.2e-06 [auto_monad_grad]: 1.35001e-06 [auto_monad_eliminator]: 1.555e-05 [cse]: 4.983e-05 [a_3]: 6.883e-05 [Cycle 3]: 0.00091858, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 1.101e-05 [loop_unroll]: 9.51e-06 [a_1]: 0.00025497 [with_stream_mark]: 1.029e-05 [recompute_prepare]: 9.84001e-06 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 3.98999e-06 [parameter_eliminate]: 9.29984e-07 
[a_2]: 0.00012617 [accelerated_algorithm]: 1.219e-05 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.41999e-06 [auto_parallel]: 7.63001e-06 [parallel]: 4.45999e-06 [flash_sp]: 1.11002e-06 [merge_comm]: 5.14e-06 [allreduce_fusion]: 5.29e-06 [matmul_add_comm_reduction]: 8.01001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.047e-05 [virtual_dataset]: 9.05999e-06 [get_grad_eliminate_]: 8.93002e-06 [virtual_output]: 8.70001e-06 [merge_forward]: 4.63001e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.612e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.43e-05 [set_forward_comm_id_for_comm_node_pass]: 5.59e-06 [meta_fg_expand]: 3.37002e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.373e-05 [a_after_grad]: 1.473e-05 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.08e-05 [cse]: 2.569e-05 [a_3]: 5.875e-05 [py_interpret_to_execute_after_opt_a]: 1.092e-05 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 5.187e-05 [convert_after_rewriter]: 9.94999e-06 [order_py_execute_after_rewriter]: 7.3e-06 [mutable_eliminate]: 0.00047125 [opt_b]: 0.00029614, [1] [Cycle 1]: 0.0002898, [7] [b_1]: 0.00019449 [b_2]: 1.164e-05 [updatestate_depend_eliminate]: 7.31999e-06 [updatestate_assign_eliminate]: 4.29002e-06 [updatestate_loads_eliminate]: 4.17e-06 [renormalize]: 7.2e-07 [cse]: 3.164e-05 [optimize_parallel_all_gather_comm]: 2.068e-05 [overlap_param_gather]: 2.34001e-06 [cconv]: 1.969e-05 [loop_unroll]: 0.00043498 [opt_after_cconv]: 0.00016713, [1] [Cycle 1]: 0.00016097, [7] [c_1]: 4.952e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 7.63001e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 4.25999e-06 [cse]: 
3.032e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 2.942e-05 [tuple_transform]: 0.00010511, [1] [Cycle 1]: 0.00010036, [4] [d_1]: 6.893e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 1.037e-05 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 6.041e-05 [cse_after_recomputation]: 3.347e-05, [1] [Cycle 1]: 2.865e-05, [1] [cse]: 2.308e-05 [environ_conv]: 8.87e-06 [swap_dp_allreduce_reducescatter]: 8.42e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.90025e-07 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.748e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 5.11002e-06 [overlap_recompute_and_grad_model_parallel]: 5.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.70001e-06 [overlap_recompute_comm]: 1.89999e-06 [overlap_grad_ring_attention]: 5.71e-06 [overlap_grad_flash_sp]: 2.494e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 0.0001017, [1] [Cycle 1]: 9.727e-05, [6] [build]: 1.011e-05 [elim_shapecalc]: 1.382e-05 [elim_not_effective]: 1.898e-05 [opt_reshape]: 1.061e-05 [fold_const_symbol]: 1.503e-05 [renormalize]: 2.40019e-07 [detach_backward]: 2.09e-06 
[pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.576e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00047686 [validate]: 4.47e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00849315 [execute]: 6.93e-06 Sums bootstrap : 0.000505s : 1.54% type_inference : 0.010593s : 32.22% event_method : 0.000044s : 0.13% auto_monad : 0.000121s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000131s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.39% optimize.opt_a.loop_unroll : 0.000111s : 0.34% optimize.opt_a.a_1 : 0.003275s : 9.96% optimize.opt_a.with_stream_mark : 0.000047s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000506s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001475s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.28% optimize.opt_a.a_after_grad : 0.000113s : 0.34% optimize.opt_a.renormalize : 0.003185s : 9.69% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.26% optimize.opt_a.cse : 0.000251s : 0.76% optimize.opt_a.a_3 : 0.000473s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000052s : 0.16% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000471s : 1.43% optimize.opt_b.b_1 : 0.000194s : 0.59% optimize.opt_b.b_2 : 0.000012s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000435s : 1.32% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000069s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.18% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000477s : 1.45% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008493s : 25.83% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000774 218 5.91% : 0.000046s : 11: substitution.arithmetic_simplify 1.79% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 56.38% : 0.000437s : 16: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.87% : 0.000014s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000025s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.62% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.17% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010517 2 87.25% : 0.009176s : 1: type_inference.infer 12.75% : 0.001341s : 1: type_inference.specialize ------[replace.] 0.000219 30 59.23% : 0.000129s : 16: replace.inline 40.77% : 0.000089s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000459 30 93.21% : 0.000428s : 16: match.inline 6.79% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000762 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.17% : 0.000016s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.65% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.37% : 0.000018s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000043s : 244: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 1.58% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 67: predicate.reduce_eliminate 2.60% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.12% : 0.000009s : 67: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.34% : 0.000003s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 97: predicate.switch_defer_inline 2.88% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000037s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.52% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000013s : 83: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.58% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.20% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001581 32 56.68% : 0.000896s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.32% : 0.000685s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062374 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.90% : 0.003058s : 1: add_attr 4.89% : 0.003049s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000128s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.87% : 0.000540s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000051s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000444s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.96% : 0.004965s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000180s : 28: opt.transform.opt_b 0.12% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000055s : 4: opt.transform.symbol_engine_opt 18.04% : 0.011252s : 1: opt_a 0.27% : 0.000171s : 1: opt_after_cconv 0.78% : 0.000487s : 1: opt_after_jit_grad 0.48% : 0.000300s : 1: opt_b 21.79% : 0.013594s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.73% : 0.001700s : 2: renormalize.infer 2.36% : 0.001470s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000056s : 1: rewriter_after_opt_a 0.22% : 0.000136s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000104s : 1: symbol_engine_optimizer 13.63% : 0.008504s : 1: task_emit 0.17% : 0.000108s : 1: 
tuple_transform 17.01% : 0.010609s : 1: type_inference 0.13% : 0.000079s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x2-kbk],max_mem:6.0M TotalTime = 0.122132, [24] [bootstrap]: 0.00063717 [type_inference]: 0.00661376 [event_method]: 1.511e-05 [auto_monad]: 5.618e-05 [graph_reusing]: 5.74999e-06 [inline]: 1.64e-06 [add_attr]: 0.00347806, [1] [add_attr_with_inline]: 0.00346746, [1] [Cycle 1]: 4.44e-05, [2] [tag_attr]: 1.531e-05 [meta_addattr_fg_expand]: 4.33001e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.801e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00412533, [53] [py_interpret_to_execute]: 2.129e-05 [rewriter_before_opt_a]: 5.886e-05 [opt_a]: 0.00222479, [2] [Cycle 1]: 0.00160946, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.281e-05 [loop_unroll]: 2.252e-05 [a_1]: 0.00047638 [with_stream_mark]: 1.31e-05 [recompute_prepare]: 8.13999e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 0.00010232 [accelerated_algorithm]: 6.88998e-06 [shard]: 2.44999e-06 [meta_shard_fg_expand]: 2.09e-06 [shard_inline]: 6.19001e-06 [merge_send_recv]: 8.37e-06 [auto_parallel]: 6.09001e-06 [parallel]: 2.264e-05 [flash_sp]: 7.21001e-06 [merge_comm]: 3.83999e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.45999e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 5.84999e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 9.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.134e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.30998e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.19001e-06 [after_resolve]: 1.053e-05 [a_after_grad]: 9.22999e-06 [renormalize]: 0.00044294 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.376e-05 [cse]: 2.904e-05 [a_3]: 4.372e-05 [Cycle 2]: 0.00060581, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.0001291 [with_stream_mark]: 1.025e-05 [recompute_prepare]: 5.80002e-06 [updatestate_depend_eliminate]: 2.94001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.96e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 4.45999e-06 [auto_parallel]: 5.35001e-06 [parallel]: 3.97e-06 [flash_sp]: 3.69002e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.63998e-06 [virtual_dataset]: 5.64998e-06 [get_grad_eliminate_]: 5.23002e-06 [virtual_output]: 5.09998e-06 [merge_forward]: 2.92002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.006e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.15e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13998e-06 [meta_fg_expand]: 1.81998e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.27e-06 [after_resolve]: 9.53997e-06 [a_after_grad]: 8.38001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 6.78e-06 [cse]: 1.348e-05 [a_3]: 3.353e-05 [py_interpret_to_execute_after_opt_a]: 7.82e-06 
[slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 3.192e-05 [convert_after_rewriter]: 7.2e-06 [order_py_execute_after_rewriter]: 5.15001e-06 [mutable_eliminate]: 0.00045845 [opt_b]: 0.00019408, [1] [Cycle 1]: 0.00018822, [7] [b_1]: 0.00011191 [b_2]: 6.91001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.28998e-06 [renormalize]: 4.59986e-07 [cse]: 2.351e-05 [optimize_parallel_all_gather_comm]: 1.637e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.22e-05 [loop_unroll]: 0.00043175 [opt_after_cconv]: 9.854e-05, [1] [Cycle 1]: 9.247e-05, [7] [c_1]: 2.839e-05 [parameter_eliminate]: 2.61999e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.49999e-06 [cse]: 1.69e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.327e-05 [tuple_transform]: 7.171e-05, [1] [Cycle 1]: 6.712e-05, [4] [d_1]: 4.079e-05 [none_parameter_eliminate]: 1.88002e-06 [renormalize]: 1.40019e-07 [switch_simplify]: 6.48003e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 5.225e-05 [cse_after_recomputation]: 2.163e-05, [1] [Cycle 1]: 1.721e-05, [1] [cse]: 1.191e-05 [environ_conv]: 4.77998e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.39002e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 1.02998e-06 [remove_cast_before_assign_add]: 1.42e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.50001e-06 [control_data_broadcast_order]: 1.164e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.78999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.964e-05, [1] [Cycle 1]: 6.553e-05, [6] [build]: 2.49001e-06 [elim_shapecalc]: 8.67998e-06 [elim_not_effective]: 1.198e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.99e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 1.562e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00045894 [validate]: 3.207e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.106418 [execute]: 9.12001e-06 Sums bootstrap : 0.000637s : 0.54% type_inference : 0.006614s : 5.62% event_method : 0.000015s : 0.01% auto_monad : 0.000056s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000028s : 0.02% optimize.opt_a.a_1 : 0.000605s : 0.51% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000172s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000443s : 0.38% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000077s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000458s : 0.39% optimize.opt_b.b_1 : 0.000112s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000024s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000432s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.04% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000459s : 0.39% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.106418s : 90.45% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000171 30 14.60% : 0.000025s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.34% : 0.000006s : 4: substitution.graph_param_transform 66.64% : 0.000114s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000005s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.72% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006562 2 90.79% : 0.005958s : 1: type_inference.infer 9.21% : 0.000605s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.01% : 0.000029s : 3: replace.inline 29.99% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.54% : 0.000111s : 3: match.inline 8.46% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000186 1131 0.75% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 11: predicate.addn_zero_filter 0.69% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.01% : 0.000004s : 19: predicate.arithmetic_simplify 0.80% : 0.000001s : 11: predicate.cast_eliminate 0.62% : 0.000001s : 8: predicate.check_bprop_eliminate 0.50% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.52% : 0.000001s : 8: predicate.depend_value_elim 0.75% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.75% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.99% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.34% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.98% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.93% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.97% : 0.000002s : 15: predicate.environ_get_depend_swap 1.61% : 0.000003s : 23: predicate.environ_get_eliminate 0.95% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.07% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.98% : 0.000004s : 16: predicate.float_depend_g_call 0.50% : 0.000001s : 8: predicate.float_environ_get_switch 0.78% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.67% : 0.000001s : 8: predicate.get_grad_eliminate 0.19% : 0.000000s : 4: predicate.graph_param_transform 0.58% : 0.000001s : 8: predicate.incorporate_call 12.35% : 0.000023s : 8: predicate.incorporate_call_switch 5.15% : 0.000010s : 51: predicate.inline 0.73% : 0.000001s : 8: predicate.inline_without_move 0.34% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.76% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000003s 
: 21: predicate.list_to_tuple_eliminator_ 2.09% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.02% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.52% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.57% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.31% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.51% : 0.000003s : 16: predicate.partial_defer_inline 1.29% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000002s : 11: predicate.print_const_string_wrapper 0.58% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.02% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 21: predicate.replace_applicator 0.47% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 4: predicate.row_tensor_eliminate 0.72% : 0.000001s : 8: predicate.same_eliminate 0.44% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.73% : 0.000001s : 8: predicate.shard_identity_eliminate 0.68% : 0.000001s : 8: predicate.special_op_eliminate 0.67% : 0.000001s : 8: predicate.specialize_transform 0.87% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.69% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.34% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.15% : 0.000002s : 16: predicate.switch_defer_inline 1.78% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.45% : 0.000008s : 54: predicate.switch_simplify 0.68% 
: 0.000001s : 11: predicate.tile_eliminate 0.79% : 0.000001s : 11: predicate.transpose_eliminate 1.40% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.41% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.95% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.45% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.97% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.76% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000390 8 46.87% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.13% : 0.000207s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131341 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.65% : 0.003483s : 1: add_attr 2.64% : 0.003471s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000680s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.77% : 0.001009s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000093s : 28: opt.transform.opt_b 0.03% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.70% : 0.002228s : 1: opt_a 0.08% : 0.000102s : 1: opt_after_cconv 0.36% : 0.000468s : 1: opt_after_jit_grad 0.15% : 0.000197s : 1: opt_b 3.14% : 0.004129s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.17% : 0.000222s : 1: renormalize.infer 0.16% : 0.000214s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 81.04% : 0.106440s : 1: task_emit 0.06% : 0.000075s : 1: 
tuple_transform 5.05% : 0.006628s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.114334, [24] [bootstrap]: 0.00047249 [type_inference]: 0.0045204 [event_method]: 1.117e-05 [auto_monad]: 5.194e-05 [graph_reusing]: 5.20999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00302491, [1] [add_attr_with_inline]: 0.00301703, [1] [Cycle 1]: 4.686e-05, [2] [tag_attr]: 1.225e-05 [meta_addattr_fg_expand]: 3.35e-06 [parallel-infer-symbol]: 3.48e-06 [pre_auto_parallel]: 2.127e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.75001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00379698, [53] [py_interpret_to_execute]: 1.505e-05 [rewriter_before_opt_a]: 4.024e-05 [opt_a]: 0.00192114, [2] [Cycle 1]: 0.00129946, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 2.565e-05 [loop_unroll]: 1.434e-05 [a_1]: 0.00030425 [with_stream_mark]: 1.43e-05 [recompute_prepare]: 7.35e-06 [updatestate_depend_eliminate]: 3.35e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.846e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.66e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.88999e-06 [auto_parallel]: 6.03998e-06 [parallel]: 1.852e-05 [flash_sp]: 7.41001e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.85999e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 6.86001e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.125e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 1.008e-05 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 2.59999e-06 
[after_resolve]: 1.066e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.00036493 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.287e-05 [cse]: 2.746e-05 [a_3]: 4.213e-05 [Cycle 2]: 0.00061247, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00013066 [with_stream_mark]: 1.202e-05 [recompute_prepare]: 5.99e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 7.058e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 4.23999e-06 [auto_parallel]: 5.72001e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.27997e-06 [merge_comm]: 3.19001e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.39e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 5.04998e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 6.43e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69999e-06 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 8.12998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 1.86998e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.25e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 9.19972e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.297e-05 [a_3]: 3.304e-05 [py_interpret_to_execute_after_opt_a]: 8.02e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 3.054e-05 [convert_after_rewriter]: 7.18998e-06 [order_py_execute_after_rewriter]: 5.86998e-06 [mutable_eliminate]: 0.00045038 [opt_b]: 0.00018978, [1] [Cycle 1]: 0.00018358, [7] [b_1]: 
0.0001138 [b_2]: 7.63001e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 6.90023e-07 [cse]: 1.686e-05 [optimize_parallel_all_gather_comm]: 1.603e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00041591 [opt_after_cconv]: 9.676e-05, [1] [Cycle 1]: 9.045e-05, [7] [c_1]: 2.879e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.28998e-06 [cse]: 1.621e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.239e-05 [tuple_transform]: 7.158e-05, [1] [Cycle 1]: 6.71e-05, [4] [d_1]: 4.07e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.62002e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.44e-05 [cse_after_recomputation]: 2.009e-05, [1] [Cycle 1]: 1.581e-05, [1] [cse]: 1.047e-05 [environ_conv]: 4.56002e-06 [swap_dp_allreduce_reducescatter]: 5.57001e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 1.12e-06 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.30999e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.172e-05 [grouped_pairwise_exchange_alltoall]: 1.67001e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.82998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.747e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 7.234e-05, [1] [Cycle 1]: 6.786e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 9.24e-06 [elim_not_effective]: 1.228e-05 [opt_reshape]: 6.49999e-06 [fold_const_symbol]: 9.15001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.656e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00045241 [validate]: 3.295e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.101686 [execute]: 8.62e-06 Sums bootstrap : 0.000472s : 0.43% type_inference : 0.004520s : 4.10% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000435s : 0.39% optimize.opt_a.with_stream_mark : 0.000026s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000149s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000365s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.41% optimize.opt_b.b_1 : 0.000114s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000416s : 0.38% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000452s : 0.41% validate : 0.000033s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.101686s : 92.20% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000123 26 17.72% : 0.000022s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.44% : 0.000005s : 4: substitution.graph_param_transform 66.14% : 0.000081s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.49% : 0.000004s : 4: substitution.remove_not_recompute_node 3.38% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004480 2 91.40% : 0.004095s : 1: type_inference.infer 8.60% : 0.000385s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000144 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.33% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.73% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.75% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.77% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.48% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.72% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000279 6 43.27% : 0.000121s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.73% : 0.000158s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.122441 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.47% : 0.003029s : 1: add_attr 2.47% : 0.003021s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000506s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.65% : 0.000797s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000095s : 28: opt.transform.opt_b 0.04% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.57% : 0.001924s : 1: opt_a 0.08% : 0.000101s : 1: opt_after_cconv 0.38% : 0.000463s : 1: opt_after_jit_grad 0.16% : 0.000193s : 1: opt_b 3.10% : 0.003801s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000197s : 1: renormalize.infer 0.13% : 0.000161s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000075s : 1: symbol_engine_optimizer 83.07% : 0.101708s : 1: task_emit 0.06% : 0.000075s : 1: 
tuple_transform 3.70% : 0.004535s : 1: type_inference 0.04% : 0.000055s : 1: validate TotalTime = 0.110421, [24] [bootstrap]: 0.0004752 [type_inference]: 0.00560774 [event_method]: 1.501e-05 [auto_monad]: 5.53e-05 [graph_reusing]: 5.12e-06 [inline]: 1.92001e-06 [add_attr]: 0.00299548, [1] [add_attr_with_inline]: 0.00298737, [1] [Cycle 1]: 4.63e-05, [2] [tag_attr]: 1.529e-05 [meta_addattr_fg_expand]: 4.40999e-06 [parallel-infer-symbol]: 2.72001e-06 [pre_auto_parallel]: 2.462e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.49003e-06 [optimize]: 0.00404356, [53] [py_interpret_to_execute]: 2.206e-05 [rewriter_before_opt_a]: 5.922e-05 [opt_a]: 0.00216424, [2] [Cycle 1]: 0.00154599, [45] [expand_dump_flag]: 2.65002e-06 [switch_simplify]: 3.176e-05 [loop_unroll]: 2.148e-05 [a_1]: 0.0004507 [with_stream_mark]: 1.378e-05 [recompute_prepare]: 7.93999e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.83001e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.831e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 1.81e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 7.34002e-06 [auto_parallel]: 6.03998e-06 [parallel]: 1.849e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 8.75001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.26999e-06 [virtual_dataset]: 6.33002e-06 [get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 1.012e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.045e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.30001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.48e-06 [receive_attached]: 
2.36e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00044856 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.373e-05 [cse]: 2.754e-05 [a_3]: 4.164e-05 [Cycle 2]: 0.00060866, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.03998e-06 [loop_unroll]: 5.74999e-06 [a_1]: 0.00012236 [with_stream_mark]: 1.009e-05 [recompute_prepare]: 6.02999e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 7.90023e-07 [a_2]: 6.793e-05 [accelerated_algorithm]: 5.56002e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.61003e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.26998e-06 [parallel]: 3.98001e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.78998e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.73998e-06 [cell_reuse_recompute_pass]: 1.59e-06 [offload_activation]: 6.39001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96998e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.15998e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.09988e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.21002e-06 [a_after_grad]: 8.77e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.412e-05 [a_3]: 3.281e-05 [py_interpret_to_execute_after_opt_a]: 7.8e-06 [slice_cell_reuse_recomputed_activation]: 2.16998e-06 [rewriter_after_opt_a]: 3.205e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.55001e-06 [mutable_eliminate]: 0.00047355 [opt_b]: 0.00018413, [1] [Cycle 1]: 0.00017816, [7] 
[b_1]: 0.00010887 [b_2]: 7.66001e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.43002e-06 [renormalize]: 4.40021e-07 [cse]: 1.675e-05 [optimize_parallel_all_gather_comm]: 1.644e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.214e-05 [loop_unroll]: 0.0004195 [opt_after_cconv]: 9.759e-05, [1] [Cycle 1]: 9.178e-05, [7] [c_1]: 2.89e-05 [parameter_eliminate]: 2.36998e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.665e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.282e-05 [tuple_transform]: 6.997e-05, [1] [Cycle 1]: 6.529e-05, [4] [d_1]: 3.895e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.74999e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.313e-05 [cse_after_recomputation]: 2.104e-05, [1] [Cycle 1]: 1.66e-05, [1] [cse]: 1.138e-05 [environ_conv]: 4.87e-06 [swap_dp_allreduce_reducescatter]: 5.78002e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.29002e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.25002e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.122e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.58999e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 6.84e-05, [1] [Cycle 1]: 6.43e-05, [6] [build]: 2.09e-06 [elim_shapecalc]: 8.57e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.59998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 4.01001e-06 [opt_after_jit_grad]: 0.00045364 [validate]: 3.106e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0964615 [execute]: 9.02e-06 Sums bootstrap : 0.000475s : 0.45% type_inference : 0.005608s : 5.27% event_method : 0.000015s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000573s : 0.54% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000449s : 0.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000474s : 0.44% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000419s : 0.39% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000454s : 0.43% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096462s : 90.63% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.82% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 66.90% : 0.000111s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.56% : 0.000004s : 4: substitution.replace_old_param 6.46% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005565 2 89.86% : 0.005001s : 1: type_inference.infer 10.14% : 0.000564s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.13% : 0.000028s : 3: replace.inline 28.87% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.81% : 0.000109s : 3: match.inline 8.19% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.92% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.60% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000002s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000002s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.93% : 0.000002s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 45.90% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.10% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119009 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.52% : 0.003000s : 1: add_attr 2.51% : 0.002991s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000511s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000483s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.79% : 0.000945s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.82% : 0.002167s : 1: opt_a 0.08% : 0.000101s : 1: opt_after_cconv 0.39% : 0.000463s : 1: opt_after_jit_grad 0.16% : 0.000188s : 1: opt_b 3.40% : 0.004047s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.19% : 0.000224s : 1: renormalize.infer 0.18% : 0.000218s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 81.07% : 0.096483s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.72% : 0.005622s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.148684, [24] [bootstrap]: 0.00050755 [type_inference]: 0.0115619 [event_method]: 4.922e-05 [auto_monad]: 0.00012245 [graph_reusing]: 8.68001e-06 [inline]: 1.77999e-06 [add_attr]: 0.00302085, [1] [add_attr_with_inline]: 0.0030128, [1] [Cycle 1]: 6.957e-05, [2] [tag_attr]: 3.418e-05 [meta_addattr_fg_expand]: 9.00999e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 4.909e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0134328, [53] [py_interpret_to_execute]: 3.853e-05 [rewriter_before_opt_a]: 0.00014495 [opt_a]: 0.0111169, [3] [Cycle 1]: 0.0071218, [45] [expand_dump_flag]: 3.81999e-06 [switch_simplify]: 7.38e-05 [loop_unroll]: 6.241e-05 [a_1]: 0.00145458 [with_stream_mark]: 2.298e-05 [recompute_prepare]: 2.197e-05 [updatestate_depend_eliminate]: 9.35001e-06 [updatestate_assign_eliminate]: 7.99002e-06 [updatestate_loads_eliminate]: 7.08e-06 [parameter_eliminate]: 2.83998e-06 [a_2]: 0.00024494 [accelerated_algorithm]: 2.98e-05 [shard]: 2.34999e-06 [meta_shard_fg_expand]: 3.39001e-06 [shard_inline]: 1.591e-05 [merge_send_recv]: 1.563e-05 [auto_parallel]: 1.057e-05 [parallel]: 1.841e-05 [flash_sp]: 1.18e-05 [merge_comm]: 9.54e-06 [allreduce_fusion]: 9.05999e-06 [matmul_add_comm_reduction]: 2.58e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.793e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.512e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.86998e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.712e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.78e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 2.74e-05 [set_forward_comm_id_for_comm_node_pass]: 9.50001e-06 [meta_fg_expand]: 0.0014208 [flash_sp_send_recv_attached]: 3.43e-06 [receive_attached]: 3.26001e-06 [after_resolve]: 
5.919e-05 [a_after_grad]: 8.067e-05 [renormalize]: 0.00247709 [add_forward_monad_depend]: 9.15999e-06 [auto_monad_grad]: 5.25001e-06 [auto_monad_eliminator]: 5.603e-05 [cse]: 0.00017103 [a_3]: 0.00033457 [Cycle 2]: 0.0030685, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.681e-05 [loop_unroll]: 4.465e-05 [a_1]: 0.00157447 [with_stream_mark]: 1.182e-05 [recompute_prepare]: 1.118e-05 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 3.5e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 0.00012637 [accelerated_algorithm]: 1.179e-05 [shard]: 1.24003e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 6.67002e-06 [auto_parallel]: 7.33e-06 [parallel]: 4.72e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 6.10002e-06 [allreduce_fusion]: 5.15001e-06 [matmul_add_comm_reduction]: 7.75998e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 9.74e-06 [virtual_dataset]: 8.91002e-06 [get_grad_eliminate_]: 8.52998e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.50999e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 9.28002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.663e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.399e-05 [set_forward_comm_id_for_comm_node_pass]: 5.33002e-06 [meta_fg_expand]: 7.027e-05 [flash_sp_send_recv_attached]: 1.10001e-06 [receive_attached]: 1.03001e-06 [after_resolve]: 1.643e-05 [a_after_grad]: 1.494e-05 [renormalize]: 0.00061048 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.506e-05 [cse]: 4.442e-05 [a_3]: 6.621e-05 [Cycle 3]: 0.00091239, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.084e-05 [loop_unroll]: 8.85001e-06 [a_1]: 0.00025344 [with_stream_mark]: 9.40001e-06 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 4.73001e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 3.95e-06 
[parameter_eliminate]: 8.39995e-07 [a_2]: 0.00012423 [accelerated_algorithm]: 1.179e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 8.93002e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.82e-06 [parallel]: 5.12999e-06 [flash_sp]: 9.70002e-07 [merge_comm]: 4.89003e-06 [allreduce_fusion]: 4.97999e-06 [matmul_add_comm_reduction]: 7.61999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.02e-05 [virtual_dataset]: 8.77999e-06 [get_grad_eliminate_]: 8.42e-06 [virtual_output]: 8.12e-06 [merge_forward]: 4.34002e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 8.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.6e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.406e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 1.336e-05 [a_after_grad]: 1.418e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 1.154e-05 [cse]: 2.733e-05 [a_3]: 5.909e-05 [py_interpret_to_execute_after_opt_a]: 1.028e-05 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 4.755e-05 [convert_after_rewriter]: 9.46e-06 [order_py_execute_after_rewriter]: 7.08998e-06 [mutable_eliminate]: 0.00049481 [opt_b]: 0.00029182, [1] [Cycle 1]: 0.00028558, [7] [b_1]: 0.00019184 [b_2]: 1.065e-05 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 3.95998e-06 [renormalize]: 6.39993e-07 [cse]: 3.221e-05 [optimize_parallel_all_gather_comm]: 2.093e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 1.914e-05 [loop_unroll]: 0.00042782 [opt_after_cconv]: 0.00013879, [1] [Cycle 1]: 0.00013291, [7] [c_1]: 4.895e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 7.25e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 
3.98001e-06 [cse]: 3.147e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 2.906e-05 [tuple_transform]: 0.00010366, [1] [Cycle 1]: 9.888e-05, [4] [d_1]: 6.829e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.94001e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 5.79e-05 [cse_after_recomputation]: 3.348e-05, [1] [Cycle 1]: 2.821e-05, [1] [cse]: 2.255e-05 [environ_conv]: 8.42998e-06 [swap_dp_allreduce_reducescatter]: 8.2e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.663e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 4.98001e-06 [overlap_recompute_and_grad_model_parallel]: 5.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 5.60001e-06 [overlap_grad_flash_sp]: 2.401e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 9.937e-05, [1] [Cycle 1]: 9.516e-05, [6] [build]: 9.80002e-06 [elim_shapecalc]: 1.366e-05 [elim_not_effective]: 1.818e-05 [opt_reshape]: 1.045e-05 [fold_const_symbol]: 1.489e-05 [renormalize]: 2.19996e-07 [detach_backward]: 
1.67999e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 2.478e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00047211 [validate]: 4.564e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.119138 [execute]: 9.24e-06 Sums bootstrap : 0.000508s : 0.35% type_inference : 0.011562s : 8.01% event_method : 0.000049s : 0.03% auto_monad : 0.000122s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000145s : 0.10% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.09% optimize.opt_a.loop_unroll : 0.000116s : 0.08% optimize.opt_a.a_1 : 0.003282s : 2.27% optimize.opt_a.with_stream_mark : 0.000044s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001494s : 1.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003088s : 2.14% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000243s : 0.17% optimize.opt_a.a_3 : 0.000460s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000495s : 0.34% optimize.opt_b.b_1 : 0.000192s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.01% optimize.loop_unroll : 0.000428s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000068s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000472s : 0.33% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.119138s : 82.52% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000767 222 6.00% : 0.000046s : 12: substitution.arithmetic_simplify 1.82% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 56.22% : 0.000431s : 17: substitution.inline 1.99% : 0.000015s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.84% : 0.000014s : 3: substitution.less_batch_normalization 1.66% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 20: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.49% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011486 2 86.39% : 0.009922s : 1: type_inference.infer 13.61% : 0.001564s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.76% : 0.000128s : 17: replace.inline 42.24% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000455 33 92.78% : 0.000422s : 17: match.inline 7.22% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000786 5764 1.03% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 68: predicate.addn_zero_filter 1.00% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.09% : 0.000016s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.06% : 0.000008s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_depend_swap 1.66% : 0.000013s : 108: predicate.environ_get_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.21% : 0.000017s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.38% : 0.000042s : 249: predicate.inline 1.21% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.60% : 0.000005s : 32: predicate.less_batch_normalization 
1.55% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.57% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.32% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.05% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.05% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.92% : 0.000015s : 101: predicate.partial_defer_inline 1.68% : 0.000013s : 92: predicate.partial_eliminate 1.03% : 0.000008s : 68: predicate.print_const_string_wrapper 0.50% : 0.000004s : 32: predicate.reduce_all_const_elim 1.24% : 0.000010s : 68: predicate.reduce_eliminate 2.56% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.83% : 0.000014s : 152: predicate.replace_applicator 0.56% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.05% : 0.000008s : 68: predicate.reshape_eliminate 1.06% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.18% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.58% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.19% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 101: predicate.switch_defer_inline 2.82% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.80% : 
0.000038s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.04% : 0.000008s : 68: predicate.transpose_eliminate 5.32% : 0.000042s : 84: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.49% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.53% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.12% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.53% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001649 34 55.30% : 0.000912s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.70% : 0.000737s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.173494 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.74% : 0.003025s : 1: add_attr 1.74% : 0.003017s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000130s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000543s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000057s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.29% : 0.000504s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.85% : 0.004944s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000176s : 28: opt.transform.opt_b 0.04% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.41% : 0.011120s : 1: opt_a 0.08% : 0.000142s : 1: opt_after_cconv 0.28% : 0.000481s : 1: opt_after_jit_grad 0.17% : 0.000296s : 1: opt_b 7.74% : 0.013437s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000054s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.95% : 0.001648s : 2: renormalize.infer 0.82% : 0.001426s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000102s : 1: symbol_engine_optimizer 68.68% : 0.119159s : 1: task_emit 0.06% : 0.000107s : 1: 
tuple_transform 6.67% : 0.011577s : 1: type_inference 0.04% : 0.000070s : 1: validate TotalTime = 0.106053, [24] [bootstrap]: 0.00049897 [type_inference]: 0.00434085 [event_method]: 1.066e-05 [auto_monad]: 5.108e-05 [graph_reusing]: 4.80999e-06 [inline]: 1.90001e-06 [add_attr]: 0.00295925, [1] [add_attr_with_inline]: 0.00295074, [1] [Cycle 1]: 4.617e-05, [2] [tag_attr]: 1.221e-05 [meta_addattr_fg_expand]: 3.31001e-06 [parallel-infer-symbol]: 2.95998e-06 [pre_auto_parallel]: 2.115e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 2.53e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00372021, [53] [py_interpret_to_execute]: 1.491e-05 [rewriter_before_opt_a]: 3.89e-05 [opt_a]: 0.00190383, [2] [Cycle 1]: 0.00130167, [45] [expand_dump_flag]: 2.46998e-06 [switch_simplify]: 2.429e-05 [loop_unroll]: 1.387e-05 [a_1]: 0.0003318 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 7.46999e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.54002e-06 [updatestate_loads_eliminate]: 2.78998e-06 [parameter_eliminate]: 2.06e-06 [a_2]: 7.661e-05 [accelerated_algorithm]: 6.23e-06 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 7.75e-06 [auto_parallel]: 5.95002e-06 [parallel]: 1.698e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.69998e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.76003e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.57002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.93998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.34999e-06 
[after_resolve]: 1.021e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00035355 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.57999e-06 [auto_monad_eliminator]: 1.266e-05 [cse]: 2.712e-05 [a_3]: 4.083e-05 [Cycle 2]: 0.00059274, [45] [expand_dump_flag]: 8.90024e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012546 [with_stream_mark]: 9.12001e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.832e-05 [accelerated_algorithm]: 5.83997e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.05999e-06 [parallel]: 4.20999e-06 [flash_sp]: 3.23e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.30001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 5.90002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.31e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.31002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.84999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 9.25001e-06 [a_after_grad]: 7.89002e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 5.97999e-06 [cse]: 1.327e-05 [a_3]: 3.182e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.26e-05 [convert_after_rewriter]: 6.75998e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00045102 [opt_b]: 0.00018216, [1] [Cycle 1]: 0.00017602, [7] [b_1]: 
0.00010843 [b_2]: 7.46001e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 3.10014e-07 [cse]: 1.594e-05 [optimize_parallel_all_gather_comm]: 1.55e-05 [overlap_param_gather]: 2.21998e-06 [cconv]: 2.222e-05 [loop_unroll]: 0.00041707 [opt_after_cconv]: 9.562e-05, [1] [Cycle 1]: 8.989e-05, [7] [c_1]: 2.851e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.575e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.175e-05 [tuple_transform]: 6.853e-05, [1] [Cycle 1]: 6.436e-05, [4] [d_1]: 3.906e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.383e-05 [cse_after_recomputation]: 2.018e-05, [1] [Cycle 1]: 1.58e-05, [1] [cse]: 1.074e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.96999e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 7.39994e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.31998e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 4.20999e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.58998e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.755e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 6.98e-05, [1] [Cycle 1]: 6.58e-05, [6] [build]: 2.77002e-06 [elim_shapecalc]: 8.99e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.95001e-06 [pipeline_parallel_scheduler]: 1.41998e-06 [auto_monad_reorder]: 1.517e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00045304 [validate]: 5.263e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.0936868 [execute]: 8.72e-06 Sums bootstrap : 0.000499s : 0.49% type_inference : 0.004341s : 4.25% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000457s : 0.45% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000354s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.44% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000417s : 0.41% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.44% validate : 0.000053s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.093687s : 91.74% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000124 26 17.35% : 0.000021s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.11% : 0.000005s : 4: substitution.graph_param_transform 67.39% : 0.000083s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.99% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004300 2 91.64% : 0.003940s : 1: type_inference.infer 8.36% : 0.000360s : 1: type_inference.specialize ------[replace.] 0.000054 2 100.00% : 0.000054s : 2: replace.inline ------[match.] 0.000082 2 100.00% : 0.000082s : 2: match.inline ------[predicate.] 
0.000137 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 1.02% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 1.04% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.83% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 26: predicate.load_eliminater 1.36% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.64% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 41.23% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.77% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114048 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.60% : 0.002963s : 1: add_attr 2.59% : 0.002954s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.47% : 0.000533s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.71% : 0.000809s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.67% : 0.001907s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.41% : 0.000462s : 1: opt_after_jit_grad 0.16% : 0.000186s : 1: opt_b 3.27% : 0.003724s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.17% : 0.000194s : 1: renormalize.infer 0.13% : 0.000153s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 82.17% : 0.093709s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.82% : 0.004354s : 1: type_inference 0.07% : 0.000075s : 1: validate TotalTime = 0.147249, [24] [bootstrap]: 0.00050303 [type_inference]: 0.0103383 [event_method]: 4.426e-05 [auto_monad]: 0.00011596 [graph_reusing]: 8.2e-06 [inline]: 1.96998e-06 [add_attr]: 0.00297574, [1] [add_attr_with_inline]: 0.00296705, [1] [Cycle 1]: 6.523e-05, [2] [tag_attr]: 3.064e-05 [meta_addattr_fg_expand]: 8.35999e-06 [parallel-infer-symbol]: 3.31001e-06 [pre_auto_parallel]: 4.553e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0132887, [53] [py_interpret_to_execute]: 3.57e-05 [rewriter_before_opt_a]: 0.00012816 [opt_a]: 0.0109961, [3] [Cycle 1]: 0.00704133, [45] [expand_dump_flag]: 3.83001e-06 [switch_simplify]: 6.794e-05 [loop_unroll]: 5.466e-05 [a_1]: 0.00137687 [with_stream_mark]: 2.315e-05 [recompute_prepare]: 2.139e-05 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 7.46001e-06 [updatestate_loads_eliminate]: 6.98998e-06 [parameter_eliminate]: 2.46e-06 [a_2]: 0.00024592 [accelerated_algorithm]: 3.045e-05 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 3.33e-06 [shard_inline]: 1.599e-05 [merge_send_recv]: 1.609e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.838e-05 [flash_sp]: 1.171e-05 [merge_comm]: 9.63002e-06 [allreduce_fusion]: 8.83001e-06 [matmul_add_comm_reduction]: 2.672e-05 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 1.826e-05 [virtual_dataset]: 1.562e-05 [get_grad_eliminate_]: 1.519e-05 [virtual_output]: 1.566e-05 [merge_forward]: 9.28002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.788e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.876e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.709e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71003e-06 [meta_fg_expand]: 0.00140114 [flash_sp_send_recv_attached]: 3.87998e-06 [receive_attached]: 2.69001e-06 
[after_resolve]: 5.993e-05 [a_after_grad]: 8.234e-05 [renormalize]: 0.00250361 [add_forward_monad_depend]: 1.013e-05 [auto_monad_grad]: 5.09e-06 [auto_monad_eliminator]: 5.745e-05 [cse]: 0.00016833 [a_3]: 0.000336 [Cycle 2]: 0.00303304, [45] [expand_dump_flag]: 1.61998e-06 [switch_simplify]: 4.699e-05 [loop_unroll]: 4.397e-05 [a_1]: 0.00154104 [with_stream_mark]: 1.198e-05 [recompute_prepare]: 1.101e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.65998e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00017591 [accelerated_algorithm]: 1.272e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.23002e-06 [merge_send_recv]: 6.86999e-06 [auto_parallel]: 7.39002e-06 [parallel]: 5.12e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 5.24e-06 [allreduce_fusion]: 5.11997e-06 [matmul_add_comm_reduction]: 8.36002e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.053e-05 [virtual_dataset]: 9.17999e-06 [get_grad_eliminate_]: 8.90999e-06 [virtual_output]: 8.63001e-06 [merge_forward]: 4.52998e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 9.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.696e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.427e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.407e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.508e-05 [a_after_grad]: 1.439e-05 [renormalize]: 0.0005889 [add_forward_monad_depend]: 4e-06 [auto_monad_grad]: 1.14998e-06 [auto_monad_eliminator]: 1.447e-05 [cse]: 4.767e-05 [a_3]: 6.651e-05 [Cycle 3]: 0.00090763, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 8.88002e-06 [a_1]: 0.00025195 [with_stream_mark]: 1.022e-05 [recompute_prepare]: 9.41e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.99002e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012381 [accelerated_algorithm]: 1.2e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 7.28e-06 [parallel]: 4.96002e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 4.95999e-06 [allreduce_fusion]: 4.86002e-06 [matmul_add_comm_reduction]: 7.66001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 8.66002e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 4.38001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.43999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.639e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 1.444e-05 [a_after_grad]: 1.473e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.24998e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 1.099e-05 [cse]: 2.623e-05 [a_3]: 5.942e-05 [py_interpret_to_execute_after_opt_a]: 1.089e-05 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 4.922e-05 [convert_after_rewriter]: 9.34e-06 [order_py_execute_after_rewriter]: 6.69999e-06 [mutable_eliminate]: 0.00045996 [opt_b]: 0.00029076, [1] [Cycle 1]: 0.00028448, [7] [b_1]: 0.00019017 [b_2]: 1.098e-05 [updatestate_depend_eliminate]: 7.34002e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 4.1e-06 [renormalize]: 5.10016e-07 [cse]: 3.207e-05 [optimize_parallel_all_gather_comm]: 2.1e-05 [overlap_param_gather]: 2.36e-06 [cconv]: 2.074e-05 [loop_unroll]: 0.00042412 [opt_after_cconv]: 0.00013827, [1] [Cycle 1]: 0.00013244, [7] [c_1]: 4.933e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 7.34002e-06 [updatestate_assign_eliminate]: 4.32e-06 
[updatestate_loads_eliminate]: 4.16001e-06 [cse]: 3.101e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 2.873e-05 [tuple_transform]: 0.00010245, [1] [Cycle 1]: 9.762e-05, [4] [d_1]: 6.688e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 9.94999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 5.828e-05 [cse_after_recomputation]: 3.375e-05, [1] [Cycle 1]: 2.899e-05, [1] [cse]: 2.354e-05 [environ_conv]: 8.90999e-06 [swap_dp_allreduce_reducescatter]: 7.84997e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 9.5999e-07 [full_micro_interleaved_order_control]: 2.03002e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.674e-05 [grouped_pairwise_exchange_alltoall]: 1.39e-06 [offloading_packed_experts]: 5.30001e-06 [overlap_recompute_and_grad_model_parallel]: 6.07999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 5.12e-06 [overlap_grad_flash_sp]: 2.496e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 0.0001014, [1] [Cycle 1]: 9.704e-05, [6] [build]: 9.91998e-06 [elim_shapecalc]: 1.406e-05 [elim_not_effective]: 1.892e-05 [opt_reshape]: 9.97999e-06 [fold_const_symbol]: 1.493e-05 [renormalize]: 
1.8999e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.478e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00046717 [validate]: 4.668e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.119068 [execute]: 8.109e-05 Sums bootstrap : 0.000503s : 0.35% type_inference : 0.010338s : 7.23% event_method : 0.000044s : 0.03% auto_monad : 0.000116s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.02% optimize.rewriter_before_opt_a : 0.000128s : 0.09% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000126s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.08% optimize.opt_a.a_1 : 0.003170s : 2.22% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000546s : 0.38% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001438s : 1.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000111s : 0.08% optimize.opt_a.renormalize : 0.003093s : 2.16% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000242s : 0.17% optimize.opt_a.a_3 : 0.000462s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000460s : 0.32% optimize.opt_b.b_1 : 0.000190s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000424s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.33% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.119068s : 83.28% execute : 0.000081s : 0.06% Time group info: ------[substitution.] 
0.000736 218 6.00% : 0.000044s : 11: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000007s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 54.76% : 0.000403s : 16: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.88% : 0.000014s : 20: substitution.remove_not_recompute_node 3.31% : 0.000024s : 10: substitution.replace_applicator 1.55% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.78% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.89% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.28% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.53% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010268 2 87.09% : 0.008943s : 1: type_inference.infer 12.91% : 0.001325s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.44% : 0.000118s : 16: replace.inline 41.56% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 92.80% : 0.000395s : 16: match.inline 7.20% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000790 5663 1.06% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 67: predicate.addn_zero_filter 1.00% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.91% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.05% : 0.000008s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.12% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.12% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.12% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_depend_swap 1.67% : 0.000013s : 107: predicate.environ_get_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.57% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.17% : 0.000017s : 97: predicate.float_depend_g_call 6.45% : 0.000051s : 32: predicate.float_environ_get_switch 0.62% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.25% : 0.000041s : 244: predicate.inline 1.22% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.59% : 0.000005s : 32: predicate.less_batch_normalization 1.50% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.49% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.06% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.30% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.05% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.04% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.89% : 0.000015s : 97: predicate.partial_defer_inline 1.63% : 0.000013s : 89: predicate.partial_eliminate 1.00% : 0.000008s : 67: predicate.print_const_string_wrapper 0.58% : 0.000005s : 32: predicate.reduce_all_const_elim 1.20% : 0.000009s : 67: predicate.reduce_eliminate 2.51% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.84% : 0.000015s : 149: predicate.replace_applicator 0.59% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.01% : 0.000008s : 67: predicate.reshape_eliminate 1.08% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.21% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.58% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.69% : 0.000013s : 97: predicate.switch_defer_inline 2.73% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.58% : 0.000036s : 265: 
predicate.switch_simplify 0.99% : 0.000008s : 67: predicate.tile_eliminate 1.00% : 0.000008s : 67: predicate.transpose_eliminate 1.36% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.67% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.35% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.49% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.48% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.12% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.53% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.22% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001532 32 57.77% : 0.000885s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.23% : 0.000647s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.171779 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.73% : 0.002980s : 1: add_attr 1.73% : 0.002971s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000123s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000538s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000051s : 1: event_method 0.05% : 0.000090s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000469s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.84% : 0.004877s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000176s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000055s : 4: opt.transform.symbol_engine_opt 6.40% : 0.010999s : 1: opt_a 0.08% : 0.000142s : 1: opt_after_cconv 0.28% : 0.000476s : 1: opt_after_jit_grad 0.17% : 0.000295s : 1: opt_b 7.74% : 0.013293s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.96% : 0.001654s : 2: renormalize.infer 0.83% : 0.001425s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000053s : 1: rewriter_after_opt_a 0.08% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000104s : 1: symbol_engine_optimizer 69.33% : 0.119090s : 1: task_emit 0.06% : 0.000106s : 1: 
tuple_transform 6.03% : 0.010353s : 1: type_inference 0.04% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x2-ge],max_mem:8.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x3-pynative],max_mem:8.0M TotalTime = 0.0217914, [24] [bootstrap]: 0.00055568 [type_inference]: 0.006428 [event_method]: 1.457e-05 [auto_monad]: 5.584e-05 [graph_reusing]: 5.15999e-06 [inline]: 1.78002e-06 [add_attr]: 0.00345749, [1] [add_attr_with_inline]: 0.00344676, [1] [Cycle 1]: 4.487e-05, [2] [tag_attr]: 1.657e-05 [meta_addattr_fg_expand]: 3.9e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.729e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.96998e-06 [optimize]: 0.00403026, [53] [py_interpret_to_execute]: 1.936e-05 [rewriter_before_opt_a]: 5.862e-05 [opt_a]: 0.00219002, [2] [Cycle 1]: 0.00158325, [45] [expand_dump_flag]: 2.62001e-06 [switch_simplify]: 3.243e-05 [loop_unroll]: 2.14e-05 [a_1]: 0.00046079 [with_stream_mark]: 1.374e-05 [recompute_prepare]: 7.51001e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.671e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 7.66999e-06 [auto_parallel]: 6.09001e-06 [parallel]: 2.551e-05 [flash_sp]: 7.99002e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.49001e-06 [matmul_add_comm_reduction]: 8.73001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 4.46e-05 
[get_grad_eliminate_]: 6.33e-06 [virtual_output]: 6.24999e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.167e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.47001e-06 [flash_sp_send_recv_attached]: 2.69001e-06 [receive_attached]: 2.88998e-06 [after_resolve]: 1.072e-05 [a_after_grad]: 8.82999e-06 [renormalize]: 0.00042205 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.802e-05 [a_3]: 4.127e-05 [Cycle 2]: 0.00059748, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.63002e-06 [a_1]: 0.00012744 [with_stream_mark]: 9.93998e-06 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.776e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.19998e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 6.16e-06 [virtual_dataset]: 5.40001e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 4.97999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.71998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.30001e-06 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.73002e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.94998e-06 [a_after_grad]: 8.05e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.389e-05 [a_3]: 3.214e-05 [py_interpret_to_execute_after_opt_a]: 7.78001e-06 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 2.969e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.01002e-06 [mutable_eliminate]: 0.00044724 [opt_b]: 0.00018375, [1] [Cycle 1]: 0.00017769, [7] [b_1]: 0.00010888 [b_2]: 7.1e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.70026e-07 [cse]: 1.609e-05 [optimize_parallel_all_gather_comm]: 1.612e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.209e-05 [loop_unroll]: 0.00041555 [opt_after_cconv]: 9.496e-05, [1] [Cycle 1]: 8.917e-05, [7] [c_1]: 2.751e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 5.37999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.611e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.265e-05 [tuple_transform]: 6.887e-05, [1] [Cycle 1]: 6.452e-05, [4] [d_1]: 3.88e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.21998e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.929e-05 [cse_after_recomputation]: 2.014e-05, [1] [Cycle 1]: 1.585e-05, [1] [cse]: 1.08e-05 [environ_conv]: 5.27001e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.52998e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.23002e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.29998e-06 [add_comm_op_reuse_tag]: 1.31002e-06 
[interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.129e-05 [grouped_pairwise_exchange_alltoall]: 1.46002e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.47998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 3.69002e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.55001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.923e-05, [1] [Cycle 1]: 6.517e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.69e-06 [elim_not_effective]: 1.199e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.96e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 0.00012439 [opt_after_jit_grad]: 0.00045689 [validate]: 3.16e-05 [backend_pass]: 9.90025e-07 [task_emit]: 0.00635354 [execute]: 6.51e-06 Sums bootstrap : 0.000556s : 3.20% type_inference : 0.006428s : 37.05% event_method : 0.000015s : 0.08% auto_monad : 0.000056s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.34% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000588s : 3.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000050s : 0.29% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000042s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.58% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000416s : 2.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000124s : 0.72% opt_after_jit_grad : 0.000457s : 2.63% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006354s : 36.62% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.68% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.93% : 0.000002s : 2: substitution.fold_const_symbol 3.11% : 0.000005s : 4: substitution.graph_param_transform 66.82% : 0.000111s : 3: substitution.inline 1.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.84% : 0.000005s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.48% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006383 2 90.41% : 0.005771s : 1: type_inference.infer 9.59% : 0.000612s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.65% : 0.000029s : 3: replace.inline 29.35% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.87% : 0.000109s : 3: match.inline 8.13% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.47% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 16: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.55% : 0.000002s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000002s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.25% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000397 8 44.25% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.75% : 0.000221s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030852 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.22% : 0.003462s : 1: add_attr 11.18% : 0.003451s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000598s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.48% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.23% : 0.000997s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.11% : 0.002193s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.51% : 0.000467s : 1: opt_after_jit_grad 0.61% : 0.000187s : 1: opt_b 13.08% : 0.004034s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.07% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000218s : 1: renormalize.infer 0.64% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.42% : 0.000130s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 20.63% : 0.006363s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.88% : 0.006441s : 1: type_inference 0.20% : 0.000063s : 1: validate TotalTime = 0.0183709, [24] [bootstrap]: 0.00047048 [type_inference]: 0.00446115 [event_method]: 1.036e-05 [auto_monad]: 4.927e-05 [graph_reusing]: 5.81e-06 [inline]: 1.88002e-06 [add_attr]: 0.00298729, [1] [add_attr_with_inline]: 0.00297915, [1] [Cycle 1]: 4.503e-05, [2] [tag_attr]: 1.234e-05 [meta_addattr_fg_expand]: 2.82002e-06 [parallel-infer-symbol]: 3.07002e-06 [pre_auto_parallel]: 2.19e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00372631, [53] [py_interpret_to_execute]: 1.605e-05 [rewriter_before_opt_a]: 3.87e-05 [opt_a]: 0.00186963, [2] [Cycle 1]: 0.0012696, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 2.424e-05 [loop_unroll]: 1.36e-05 [a_1]: 0.00029443 [with_stream_mark]: 1.308e-05 [recompute_prepare]: 7.65998e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.47997e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.66e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 5.48002e-06 [parallel]: 1.884e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.57999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 9.75002e-06 [a_after_grad]: 8.60999e-06 [renormalize]: 0.00035571 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.99999e-06 [auto_monad_eliminator]: 1.335e-05 [cse]: 2.696e-05 [a_3]: 3.967e-05 [Cycle 2]: 0.00059077, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012601 [with_stream_mark]: 1.185e-05 [recompute_prepare]: 5.51998e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 9.60019e-07 [a_2]: 6.718e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.33001e-06 [auto_parallel]: 5.20999e-06 [parallel]: 4.62e-06 [flash_sp]: 2.89999e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.46e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.88999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.19e-05 [a_3]: 3.174e-05 [py_interpret_to_execute_after_opt_a]: 7.75e-06 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 3.227e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 5.34e-06 [mutable_eliminate]: 0.00044514 [opt_b]: 0.0001802, [1] [Cycle 1]: 0.00017401, [7] [b_1]: 
0.00010742 [b_2]: 7.15998e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 3.19997e-07 [cse]: 1.531e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.196e-05 [loop_unroll]: 0.00046926 [opt_after_cconv]: 9.291e-05, [1] [Cycle 1]: 8.719e-05, [7] [c_1]: 2.784e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.05002e-06 [cse]: 1.507e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.21e-05 [tuple_transform]: 7.07e-05, [1] [Cycle 1]: 6.645e-05, [4] [d_1]: 4.008e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.48e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.351e-05 [cse_after_recomputation]: 1.96e-05, [1] [Cycle 1]: 1.532e-05, [1] [cse]: 1.024e-05 [environ_conv]: 4.59002e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.44001e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.122e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 3.78001e-06 [overlap_recompute_and_grad_model_parallel]: 4.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 1.96e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.706e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.69998e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 6.844e-05, [1] [Cycle 1]: 6.443e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.12999e-06 [fold_const_symbol]: 8.87e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.554e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00044754 [validate]: 3.08e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00592107 [execute]: 7.05002e-06 Sums bootstrap : 0.000470s : 3.26% type_inference : 0.004461s : 30.92% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000420s : 2.91% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000356s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000445s : 3.09% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000469s : 3.25% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000448s : 3.10% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005921s : 41.04% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.07% : 0.000022s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.18% : 0.000001s : 2: substitution.fold_const_symbol 4.71% : 0.000006s : 4: substitution.graph_param_transform 65.71% : 0.000078s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.94% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004420 2 92.12% : 0.004071s : 1: type_inference.infer 7.88% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.42% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.29% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.59% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.58% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.28% : 0.000002s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.22% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.75% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.25% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026372 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.35% : 0.002992s : 1: add_attr 11.31% : 0.002983s : 1: add_attr_with_inline 0.02% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000507s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.81% : 0.000478s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000770s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001872s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.73% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.14% : 0.003730s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.75% : 0.000198s : 1: renormalize.infer 0.58% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.49% : 0.005931s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.97% : 0.004475s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.019672, [24] [bootstrap]: 0.00049112 [type_inference]: 0.00561767 [event_method]: 1.357e-05 [auto_monad]: 5.406e-05 [graph_reusing]: 5.05001e-06 [inline]: 1.57001e-06 [add_attr]: 0.00293661, [1] [add_attr_with_inline]: 0.00292843, [1] [Cycle 1]: 4.573e-05, [2] [tag_attr]: 1.501e-05 [meta_addattr_fg_expand]: 4.65001e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.487e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00394745, [53] [py_interpret_to_execute]: 1.992e-05 [rewriter_before_opt_a]: 5.755e-05 [opt_a]: 0.00212378, [2] [Cycle 1]: 0.001494, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 3.13e-05 [loop_unroll]: 2.04e-05 [a_1]: 0.00044497 [with_stream_mark]: 1.373e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 3.52997e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 2.94999e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.645e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 1.95001e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 7.43999e-06 [auto_parallel]: 6.34001e-06 [parallel]: 1.794e-05 [flash_sp]: 7.1e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.28998e-06 [matmul_add_comm_reduction]: 8.75001e-06 [allreduce_slice_to_reducescatter]: 9.50007e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.68997e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.50998e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.75998e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.09e-06 [receive_attached]: 2.45002e-06 
[after_resolve]: 1.014e-05 [a_after_grad]: 8.74e-06 [renormalize]: 0.00040947 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 2.739e-05 [a_3]: 4.015e-05 [Cycle 2]: 0.00062044, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012577 [with_stream_mark]: 9.61e-06 [recompute_prepare]: 5.80002e-06 [updatestate_depend_eliminate]: 2.97002e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.932e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.37e-06 [flash_sp]: 2.84001e-06 [merge_comm]: 3.14001e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.34998e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32999e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 1.00999e-06 [after_resolve]: 8.87999e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 6.36998e-06 [cse]: 1.713e-05 [a_3]: 3.206e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 3.198e-05 [convert_after_rewriter]: 6.46999e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00044211 [opt_b]: 0.00018242, [1] [Cycle 1]: 0.00017604, [7] 
[b_1]: 0.00010763 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 4.59986e-07 [cse]: 1.615e-05 [optimize_parallel_all_gather_comm]: 1.683e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.125e-05 [loop_unroll]: 0.00041257 [opt_after_cconv]: 9.487e-05, [1] [Cycle 1]: 8.917e-05, [7] [c_1]: 2.816e-05 [parameter_eliminate]: 2.23002e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.511e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.98e-05, [1] [Cycle 1]: 6.555e-05, [4] [d_1]: 3.976e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 4.348e-05 [cse_after_recomputation]: 1.992e-05, [1] [Cycle 1]: 1.567e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.46002e-06 [swap_dp_allreduce_reducescatter]: 5.03002e-06 [bias_add_comm_swap]: 2.22999e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 1.32999e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.112e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.25002e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.653e-05 [begin_end_overlap_inline]: 8.89995e-07 [split_matmul_comm_elemetwise]: 1.85001e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.754e-05, [1] [Cycle 1]: 6.369e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.48999e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 5.86998e-06 [fold_const_symbol]: 8.72998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.81998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.511e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.22002e-06 [opt_after_jit_grad]: 0.00044768 [validate]: 3.149e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00586983 [execute]: 7.04001e-06 Sums bootstrap : 0.000491s : 3.12% type_inference : 0.005618s : 35.65% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000571s : 3.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000410s : 2.60% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000045s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000442s : 2.81% optimize.opt_b.b_1 : 0.000108s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.13% optimize.loop_unroll : 0.000413s : 2.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 2.84% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005870s : 37.25% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000161 30 14.78% : 0.000024s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.41% : 0.000005s : 4: substitution.graph_param_transform 66.75% : 0.000108s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.16% : 0.000003s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005575 2 90.12% : 0.005024s : 1: type_inference.infer 9.88% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.85% : 0.000027s : 3: replace.inline 29.15% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.70% : 0.000105s : 3: match.inline 8.30% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 45.90% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.10% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028062 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.48% : 0.002941s : 1: add_attr 10.45% : 0.002932s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000526s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000451s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.000937s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.58% : 0.002127s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.63% : 0.000457s : 1: opt_after_jit_grad 0.66% : 0.000186s : 1: opt_b 14.08% : 0.003951s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.09% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.75% : 0.000210s : 1: renormalize.infer 0.69% : 0.000193s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 20.95% : 0.005880s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 20.07% : 0.005632s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0376317, [24] [bootstrap]: 0.00051074 [type_inference]: 0.0115202 [event_method]: 4.99e-05 [auto_monad]: 0.00012071 [graph_reusing]: 7.8e-06 [inline]: 1.84e-06 [add_attr]: 0.00303664, [1] [add_attr_with_inline]: 0.00302791, [1] [Cycle 1]: 6.981e-05, [2] [tag_attr]: 3.471e-05 [meta_addattr_fg_expand]: 9.02e-06 [parallel-infer-symbol]: 3.00998e-06 [pre_auto_parallel]: 4.922e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0132827, [53] [py_interpret_to_execute]: 3.682e-05 [rewriter_before_opt_a]: 0.00014685 [opt_a]: 0.0110223, [3] [Cycle 1]: 0.00707039, [45] [expand_dump_flag]: 4.03001e-06 [switch_simplify]: 7.354e-05 [loop_unroll]: 6.16e-05 [a_1]: 0.0014448 [with_stream_mark]: 2.315e-05 [recompute_prepare]: 2.138e-05 [updatestate_depend_eliminate]: 9.25999e-06 [updatestate_assign_eliminate]: 7.55e-06 [updatestate_loads_eliminate]: 7.73001e-06 [parameter_eliminate]: 2.49999e-06 [a_2]: 0.0002788 [accelerated_algorithm]: 3.137e-05 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 3.58e-06 [shard_inline]: 1.622e-05 [merge_send_recv]: 1.639e-05 [auto_parallel]: 1.088e-05 [parallel]: 1.835e-05 [flash_sp]: 1.191e-05 [merge_comm]: 9.46003e-06 [allreduce_fusion]: 8.80999e-06 [matmul_add_comm_reduction]: 2.618e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.776e-05 [virtual_dataset]: 1.566e-05 [get_grad_eliminate_]: 1.508e-05 [virtual_output]: 1.511e-05 [merge_forward]: 9.26002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.759e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.819e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.715e-05 [set_forward_comm_id_for_comm_node_pass]: 1.006e-05 [meta_fg_expand]: 0.00139007 [flash_sp_send_recv_attached]: 3.71999e-06 [receive_attached]: 2.55002e-06 
[after_resolve]: 5.909e-05 [a_after_grad]: 8.111e-05 [renormalize]: 0.00243767 [add_forward_monad_depend]: 9.19998e-06 [auto_monad_grad]: 5.30999e-06 [auto_monad_eliminator]: 5.705e-05 [cse]: 0.00016432 [a_3]: 0.00033385 [Cycle 2]: 0.00303836, [45] [expand_dump_flag]: 1.54998e-06 [switch_simplify]: 4.737e-05 [loop_unroll]: 4.364e-05 [a_1]: 0.00152784 [with_stream_mark]: 1.199e-05 [recompute_prepare]: 1.076e-05 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012571 [accelerated_algorithm]: 1.209e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.86998e-06 [shard_inline]: 9.29998e-06 [merge_send_recv]: 6.68998e-06 [auto_parallel]: 7.03998e-06 [parallel]: 4.94e-06 [flash_sp]: 2.88e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.70001e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1e-05 [virtual_dataset]: 8.75001e-06 [get_grad_eliminate_]: 8.94e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.34002e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.622e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 6.909e-05 [flash_sp_send_recv_attached]: 1.08001e-06 [receive_attached]: 1.07e-06 [after_resolve]: 1.669e-05 [a_after_grad]: 1.458e-05 [renormalize]: 0.00063434 [add_forward_monad_depend]: 4.15e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.422e-05 [cse]: 4.561e-05 [a_3]: 6.597e-05 [Cycle 3]: 0.00089931, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 1.06e-05 [loop_unroll]: 9.03002e-06 [a_1]: 0.00025003 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 9.41003e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.76999e-06 [updatestate_loads_eliminate]: 
3.65e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 0.00012317 [accelerated_algorithm]: 1.158e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 9.12999e-06 [merge_send_recv]: 6.73e-06 [auto_parallel]: 6.91001e-06 [parallel]: 4.84e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 4.92999e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 8.02e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 9.99999e-06 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.47998e-06 [virtual_output]: 8.31002e-06 [merge_forward]: 4.1e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 8.83001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.592e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.414e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10001e-06 [meta_fg_expand]: 2.97002e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.319e-05 [a_after_grad]: 1.395e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 1.07e-05 [cse]: 2.619e-05 [a_3]: 5.961e-05 [py_interpret_to_execute_after_opt_a]: 1.043e-05 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 4.772e-05 [convert_after_rewriter]: 8.79e-06 [order_py_execute_after_rewriter]: 6.91999e-06 [mutable_eliminate]: 0.00045864 [opt_b]: 0.00028835, [1] [Cycle 1]: 0.00028193, [7] [b_1]: 0.00019018 [b_2]: 1.107e-05 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 3.99002e-06 [renormalize]: 2.80008e-07 [cse]: 3.093e-05 [optimize_parallel_all_gather_comm]: 2.073e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 1.989e-05 [loop_unroll]: 0.0004248 [opt_after_cconv]: 0.00013575, [1] [Cycle 1]: 0.00012978, [7] [c_1]: 4.885e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 7.37997e-06 [updatestate_assign_eliminate]: 4.18999e-06 
[updatestate_loads_eliminate]: 4.02e-06 [cse]: 2.9e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 2.852e-05 [tuple_transform]: 0.00010188, [1] [Cycle 1]: 9.724e-05, [4] [d_1]: 6.683e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.99999e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 5.664e-05 [cse_after_recomputation]: 3.046e-05, [1] [Cycle 1]: 2.583e-05, [1] [cse]: 2.035e-05 [environ_conv]: 8.39998e-06 [swap_dp_allreduce_reducescatter]: 7.67998e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.02999e-06 [micro_interleaved_order_control]: 2.58998e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.77001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.738e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.22e-06 [overlap_recompute_and_grad_model_parallel]: 5.89999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 5.40001e-06 [overlap_grad_flash_sp]: 2.318e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.01003e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.836e-05, [1] [Cycle 1]: 9.421e-05, [6] [build]: 8.84e-06 [elim_shapecalc]: 1.326e-05 [elim_not_effective]: 1.824e-05 [opt_reshape]: 1.043e-05 [fold_const_symbol]: 1.518e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 2.533e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00046944 [validate]: 4.459e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00827785 [execute]: 7.38e-06 Sums bootstrap : 0.000511s : 1.53% type_inference : 0.011520s : 34.55% event_method : 0.000050s : 0.15% auto_monad : 0.000121s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000147s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003223s : 9.67% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000528s : 1.58% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001462s : 4.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003072s : 9.21% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.25% optimize.opt_a.cse : 0.000236s : 0.71% optimize.opt_a.a_3 : 0.000459s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.38% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000425s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000020s : 0.06% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000469s : 1.41% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008278s : 24.83% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000761 222 5.82% : 0.000044s : 12: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.97% : 0.000426s : 17: substitution.inline 2.04% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.76% : 0.000013s : 20: substitution.remove_not_recompute_node 3.13% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.59% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011448 2 87.12% : 0.009974s : 1: type_inference.infer 12.88% : 0.001475s : 1: type_inference.specialize ------[replace.] 0.000220 33 58.23% : 0.000128s : 17: replace.inline 41.77% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.51% : 0.000417s : 17: match.inline 7.49% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000754 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.43% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.68% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.15% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.24% : 0.000009s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.95% : 0.000015s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.42% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.53% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001562 34 57.17% : 0.000893s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.83% : 0.000669s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062267 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.88% : 0.003041s : 1: add_attr 4.87% : 0.003032s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000546s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000033s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000057s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.90% : 0.004917s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.71% : 0.011025s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000479s : 1: opt_after_jit_grad 0.47% : 0.000292s : 1: opt_b 21.34% : 0.013286s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.61% : 0.001628s : 2: renormalize.infer 2.30% : 0.001431s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.31% : 0.008288s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.53% : 0.011536s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0182605, [24] [bootstrap]: 0.00043772 [type_inference]: 0.00427253 [event_method]: 1.073e-05 [auto_monad]: 5.04e-05 [graph_reusing]: 5.92001e-06 [inline]: 1.76e-06 [add_attr]: 0.00298237, [1] [add_attr_with_inline]: 0.00297391, [1] [Cycle 1]: 4.426e-05, [2] [tag_attr]: 1.186e-05 [meta_addattr_fg_expand]: 3.16999e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.139e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00364074, [53] [py_interpret_to_execute]: 1.545e-05 [rewriter_before_opt_a]: 3.845e-05 [opt_a]: 0.00185369, [2] [Cycle 1]: 0.0012538, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 2.48e-05 [loop_unroll]: 1.465e-05 [a_1]: 0.00028935 [with_stream_mark]: 1.266e-05 [recompute_prepare]: 7.06001e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.56001e-06 [updatestate_loads_eliminate]: 3.45e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.661e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.65998e-06 [auto_parallel]: 5.51998e-06 [parallel]: 1.708e-05 [flash_sp]: 7.63001e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7e-06 [virtual_dataset]: 5.65001e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.92001e-06 [merge_forward]: 4.14002e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 8.48001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 9.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.44999e-06 [receive_attached]: 2.12999e-06 
[after_resolve]: 1.076e-05 [a_after_grad]: 9.02e-06 [renormalize]: 0.00034708 [add_forward_monad_depend]: 4.87998e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.315e-05 [cse]: 2.677e-05 [a_3]: 4.095e-05 [Cycle 2]: 0.00059073, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.0001254 [with_stream_mark]: 9.71998e-06 [recompute_prepare]: 5.67999e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.763e-05 [accelerated_algorithm]: 5.58002e-06 [shard]: 1.11997e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.23999e-06 [auto_parallel]: 5.13002e-06 [parallel]: 4.11001e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.26998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.21998e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.41998e-06 [offload_activation]: 5.81998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.88999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 8.77999e-06 [a_after_grad]: 7.7e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.08998e-06 [cse]: 1.261e-05 [a_3]: 3.158e-05 [py_interpret_to_execute_after_opt_a]: 7.31999e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.16e-05 [convert_after_rewriter]: 6.96999e-06 [order_py_execute_after_rewriter]: 4.77e-06 [mutable_eliminate]: 0.00044171 [opt_b]: 0.00018149, [1] [Cycle 1]: 0.00017536, [7] 
[b_1]: 0.00010818 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 4.75999e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 7.10017e-07 [cse]: 1.544e-05 [optimize_parallel_all_gather_comm]: 1.54e-05 [overlap_param_gather]: 2.17001e-06 [cconv]: 2.21e-05 [loop_unroll]: 0.00040386 [opt_after_cconv]: 9.403e-05, [1] [Cycle 1]: 8.827e-05, [7] [c_1]: 2.801e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.06e-06 [cse]: 1.538e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.89e-05, [1] [Cycle 1]: 6.453e-05, [4] [d_1]: 3.896e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.42e-05 [cse_after_recomputation]: 1.942e-05, [1] [Cycle 1]: 1.513e-05, [1] [cse]: 1.019e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 4.79998e-06 [bias_add_comm_swap]: 2.84999e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.65002e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 6.79982e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.31002e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.04998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.93002e-06 [control_data_broadcast_order]: 1.097e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.82001e-06 [overlap_recompute_comm]: 2.39001e-06 [overlap_grad_ring_attention]: 4.23001e-06 [overlap_grad_flash_sp]: 1.698e-05 [begin_end_overlap_inline]: 8.49977e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.77e-05, [1] [Cycle 1]: 6.353e-05, [6] [build]: 2.03997e-06 [elim_shapecalc]: 8.35999e-06 [elim_not_effective]: 1.096e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 9.04998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.514e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.0004995 [validate]: 3.098e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00607211 [execute]: 6.15002e-06 Sums bootstrap : 0.000438s : 3.06% type_inference : 0.004273s : 29.83% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.90% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000347s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000442s : 3.08% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000404s : 2.82% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000499s : 3.49% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006072s : 42.40% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000118 26 17.94% : 0.000021s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.31% : 0.000002s : 2: substitution.fold_const_symbol 4.73% : 0.000006s : 4: substitution.graph_param_transform 65.46% : 0.000077s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004232 2 91.65% : 0.003878s : 1: type_inference.infer 8.35% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.07% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.63% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_depend_swap 1.99% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.74% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.45% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.22% : 0.000002s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 1.02% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 1.02% : 0.000001s : 8: predicate.special_op_eliminate 1.03% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.91% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 41.04% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.96% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026154 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.42% : 0.002987s : 1: add_attr 11.38% : 0.002977s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.80% : 0.000472s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000413s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000451s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001857s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.95% : 0.000509s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 13.94% : 0.003645s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.73% : 0.000191s : 1: renormalize.infer 0.57% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.26% : 0.006082s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.39% : 0.004286s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0358282, [24] [bootstrap]: 0.00049545 [type_inference]: 0.0102729 [event_method]: 4.596e-05 [auto_monad]: 0.00010847 [graph_reusing]: 8.84e-06 [inline]: 2.06e-06 [add_attr]: 0.00294289, [1] [add_attr_with_inline]: 0.00293414, [1] [Cycle 1]: 6.812e-05, [2] [tag_attr]: 3.249e-05 [meta_addattr_fg_expand]: 8.55001e-06 [parallel-infer-symbol]: 3.32002e-06 [pre_auto_parallel]: 4.538e-05 [insert-virtual-dataset]: 2.47001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0130345, [53] [py_interpret_to_execute]: 3.462e-05 [rewriter_before_opt_a]: 0.00012939 [opt_a]: 0.0107573, [3] [Cycle 1]: 0.0068642, [45] [expand_dump_flag]: 3.48e-06 [switch_simplify]: 6.591e-05 [loop_unroll]: 5.466e-05 [a_1]: 0.00138389 [with_stream_mark]: 2.305e-05 [recompute_prepare]: 2.127e-05 [updatestate_depend_eliminate]: 8.94998e-06 [updatestate_assign_eliminate]: 8.18001e-06 [updatestate_loads_eliminate]: 6.98e-06 [parameter_eliminate]: 2.34999e-06 [a_2]: 0.00024549 [accelerated_algorithm]: 3.032e-05 [shard]: 1.84998e-06 [meta_shard_fg_expand]: 3.25e-06 [shard_inline]: 1.602e-05 [merge_send_recv]: 1.584e-05 [auto_parallel]: 1.063e-05 [parallel]: 1.881e-05 [flash_sp]: 1.168e-05 [merge_comm]: 9.29e-06 [allreduce_fusion]: 8.52e-06 [matmul_add_comm_reduction]: 2.638e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 1.768e-05 [virtual_dataset]: 1.561e-05 [get_grad_eliminate_]: 1.506e-05 [virtual_output]: 1.504e-05 [merge_forward]: 9.42001e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 1.737e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.814e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 2.736e-05 [set_forward_comm_id_for_comm_node_pass]: 9.74e-06 [meta_fg_expand]: 0.00136227 [flash_sp_send_recv_attached]: 3.75e-06 [receive_attached]: 2.29999e-06 [after_resolve]: 
5.926e-05 [a_after_grad]: 8.067e-05 [renormalize]: 0.00238162 [add_forward_monad_depend]: 9.74e-06 [auto_monad_grad]: 5.23002e-06 [auto_monad_eliminator]: 5.6e-05 [cse]: 0.0001616 [a_3]: 0.00033178 [Cycle 2]: 0.00298869, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.669e-05 [loop_unroll]: 4.363e-05 [a_1]: 0.00156687 [with_stream_mark]: 1.227e-05 [recompute_prepare]: 1.106e-05 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.67002e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012579 [accelerated_algorithm]: 1.213e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.74001e-06 [auto_parallel]: 7.27002e-06 [parallel]: 4.99e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.22e-06 [allreduce_fusion]: 4.65999e-06 [matmul_add_comm_reduction]: 7.8e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.055e-05 [virtual_dataset]: 8.80999e-06 [get_grad_eliminate_]: 8.75001e-06 [virtual_output]: 8.48999e-06 [merge_forward]: 5.14e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.671e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.407e-05 [set_forward_comm_id_for_comm_node_pass]: 5.54e-06 [meta_fg_expand]: 3.412e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.476e-05 [a_after_grad]: 1.451e-05 [renormalize]: 0.00058184 [add_forward_monad_depend]: 3.78999e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.441e-05 [cse]: 4.528e-05 [a_3]: 6.419e-05 [Cycle 3]: 0.00089051, [45] [expand_dump_flag]: 1.19e-06 [switch_simplify]: 1.056e-05 [loop_unroll]: 8.99e-06 [a_1]: 0.00024786 [with_stream_mark]: 9.98002e-06 [recompute_prepare]: 9.28002e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 3.78001e-06 [parameter_eliminate]: 
9.20001e-07 [a_2]: 0.00012249 [accelerated_algorithm]: 1.174e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.05001e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.29001e-06 [parallel]: 4.60001e-06 [flash_sp]: 1.17999e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 5.06997e-06 [matmul_add_comm_reduction]: 7.53999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.66998e-06 [virtual_dataset]: 8.60999e-06 [get_grad_eliminate_]: 8.55999e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 8.85001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.56e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.361e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17e-06 [meta_fg_expand]: 3.01999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.364e-05 [a_after_grad]: 1.446e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 1.04003e-06 [auto_monad_eliminator]: 1.04e-05 [cse]: 2.445e-05 [a_3]: 5.666e-05 [py_interpret_to_execute_after_opt_a]: 1.056e-05 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 4.874e-05 [convert_after_rewriter]: 1.021e-05 [order_py_execute_after_rewriter]: 7.18e-06 [mutable_eliminate]: 0.00045298 [opt_b]: 0.00028466, [1] [Cycle 1]: 0.00027865, [7] [b_1]: 0.00018775 [b_2]: 1.059e-05 [updatestate_depend_eliminate]: 7.01001e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 4.08001e-06 [renormalize]: 4.7998e-07 [cse]: 3.012e-05 [optimize_parallel_all_gather_comm]: 2.023e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 1.934e-05 [loop_unroll]: 0.00041676 [opt_after_cconv]: 0.00018962, [1] [Cycle 1]: 0.00018377, [7] [c_1]: 4.84e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.29002e-06 [updatestate_loads_eliminate]: 
3.81999e-06 [cse]: 8.222e-05 [renormalize]: 2.70025e-07 [remove_dup_value]: 2.719e-05 [tuple_transform]: 0.00010254, [1] [Cycle 1]: 9.775e-05, [4] [d_1]: 6.724e-05 [none_parameter_eliminate]: 2.03997e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.006e-05 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 5.823e-05 [cse_after_recomputation]: 3.175e-05, [1] [Cycle 1]: 2.714e-05, [1] [cse]: 2.174e-05 [environ_conv]: 8.58001e-06 [swap_dp_allreduce_reducescatter]: 7.4e-06 [bias_add_comm_swap]: 3.04001e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.37999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.53003e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.01002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.739e-05 [grouped_pairwise_exchange_alltoall]: 1.77999e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 5.04998e-06 [overlap_grad_flash_sp]: 2.383e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 9.861e-05, [1] [Cycle 1]: 9.439e-05, [6] [build]: 9.88002e-06 [elim_shapecalc]: 1.322e-05 [elim_not_effective]: 1.851e-05 [opt_reshape]: 1.016e-05 [fold_const_symbol]: 1.484e-05 [renormalize]: 1.59984e-07 [detach_backward]: 
2.14e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 2.467e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00046479 [validate]: 4.405e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00810929 [execute]: 6.48e-06 Sums bootstrap : 0.000495s : 1.57% type_inference : 0.010273s : 32.46% event_method : 0.000046s : 0.15% auto_monad : 0.000108s : 0.34% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000129s : 0.41% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003199s : 10.11% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001399s : 4.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002964s : 9.37% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.26% optimize.opt_a.cse : 0.000231s : 0.73% optimize.opt_a.a_3 : 0.000453s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000453s : 1.43% optimize.opt_b.b_1 : 0.000188s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000417s : 1.32% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000082s : 0.26% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000027s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000465s : 1.47% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008109s : 25.63% execute : 0.000006s : 0.02% Time group info: ------[substitution.] 
0.000789 218 5.40% : 0.000043s : 11: substitution.arithmetic_simplify 1.78% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000007s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 57.94% : 0.000457s : 16: substitution.inline 2.03% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.85% : 0.000015s : 3: substitution.less_batch_normalization 1.64% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.70% : 0.000013s : 20: substitution.remove_not_recompute_node 2.97% : 0.000023s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.48% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.72% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.84% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.40% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010206 2 87.09% : 0.008889s : 1: type_inference.infer 12.91% : 0.001317s : 1: type_inference.specialize ------[replace.] 0.000204 30 58.57% : 0.000120s : 16: replace.inline 41.43% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000479 30 93.77% : 0.000449s : 16: match.inline 6.23% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.71% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000012s : 89: predicate.partial_eliminate 1.10% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.67% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.94% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001477 32 57.92% : 0.000856s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.08% : 0.000622s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059933 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.92% : 0.002947s : 1: add_attr 4.90% : 0.002938s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000116s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000529s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000052s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.07% : 0.004834s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.13% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.95% : 0.010760s : 1: opt_a 0.32% : 0.000193s : 1: opt_after_cconv 0.79% : 0.000474s : 1: opt_after_jit_grad 0.48% : 0.000288s : 1: opt_b 21.75% : 0.013038s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000031s : 1: remove_dup_value 2.66% : 0.001597s : 2: renormalize.infer 2.26% : 0.001354s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000053s : 1: rewriter_after_opt_a 0.22% : 0.000134s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.55% : 0.008120s : 1: task_emit 0.18% : 0.000106s : 1: 
tuple_transform 17.17% : 0.010288s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x3-kbk],max_mem:8.0M . TotalTime = 0.73615, [24] [bootstrap]: 0.00054513 [type_inference]: 0.00623183 [event_method]: 1.409e-05 [auto_monad]: 5.689e-05 [graph_reusing]: 5.44e-06 [inline]: 1.69e-06 [add_attr]: 0.00347649, [1] [add_attr_with_inline]: 0.00346564, [1] [Cycle 1]: 4.533e-05, [2] [tag_attr]: 1.559e-05 [meta_addattr_fg_expand]: 4.27e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.888e-05 [insert-virtual-dataset]: 2.75997e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.88997e-06 [pipeline_split]: 1.92999e-06 [optimize]: 0.00394398, [53] [py_interpret_to_execute]: 2.02e-05 [rewriter_before_opt_a]: 5.746e-05 [opt_a]: 0.00211279, [2] [Cycle 1]: 0.00151786, [45] [expand_dump_flag]: 2.37999e-06 [switch_simplify]: 3.146e-05 [loop_unroll]: 2.081e-05 [a_1]: 0.00045039 [with_stream_mark]: 1.266e-05 [recompute_prepare]: 8.09002e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.49e-06 [a_2]: 7.569e-05 [accelerated_algorithm]: 6.31998e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.01001e-06 [auto_parallel]: 6.07001e-06 [parallel]: 2.167e-05 [flash_sp]: 7.78999e-06 [merge_comm]: 3.57002e-06 [allreduce_fusion]: 3.27002e-06 [matmul_add_comm_reduction]: 8.40999e-06 [allreduce_slice_to_reducescatter]: 8.00006e-07 [virtual_shard_identity]: 8.05999e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.72999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 
[merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.14998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.14999e-06 [receive_attached]: 2.11e-06 [after_resolve]: 1.007e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 0.00042613 [add_forward_monad_depend]: 4.75999e-06 [auto_monad_grad]: 1.98002e-06 [auto_monad_eliminator]: 1.29e-05 [cse]: 2.533e-05 [a_3]: 3.99e-05 [Cycle 2]: 0.00058546, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.71999e-06 [loop_unroll]: 5.32999e-06 [a_1]: 0.0001254 [with_stream_mark]: 9.41e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.90002e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 6.834e-05 [accelerated_algorithm]: 5.63002e-06 [shard]: 1.29998e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.00999e-06 [parallel]: 4.2e-06 [flash_sp]: 3.04001e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.49001e-06 [matmul_add_comm_reduction]: 5.27999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 4.96997e-06 [merge_forward]: 2.40002e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 5.53002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.60001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 9.16002e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.99979e-07 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 5.84999e-06 [cse]: 1.19e-05 [a_3]: 3.188e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 
[slice_cell_reuse_recomputed_activation]: 1.68002e-06 [rewriter_after_opt_a]: 3.112e-05 [convert_after_rewriter]: 6.68998e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00044751 [opt_b]: 0.00018371, [1] [Cycle 1]: 0.00017764, [7] [b_1]: 0.00010705 [b_2]: 6.76999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 2.49973e-07 [cse]: 1.585e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.137e-05 [loop_unroll]: 0.00041393 [opt_after_cconv]: 9.462e-05, [1] [Cycle 1]: 8.863e-05, [7] [c_1]: 2.795e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.558e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.868e-05, [1] [Cycle 1]: 6.418e-05, [4] [d_1]: 3.894e-05 [none_parameter_eliminate]: 1.46998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.71002e-06 [add_recomputation]: 4.898e-05 [cse_after_recomputation]: 2.092e-05, [1] [Cycle 1]: 1.645e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.30999e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 3.11999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.70001e-06 [control_data_broadcast_order]: 1.107e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 3.31001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 3.7e-06 [overlap_grad_flash_sp]: 1.658e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.682e-05, [1] [Cycle 1]: 6.271e-05, [6] [build]: 2.37999e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.101e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.70001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.01998e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.503e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.50003e-06 [opt_after_jit_grad]: 0.00044713 [validate]: 3.082e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.721109 [execute]: 9.15999e-06 Sums bootstrap : 0.000545s : 0.07% type_inference : 0.006232s : 0.85% event_method : 0.000014s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000057s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000576s : 0.08% 
optimize.opt_a.with_stream_mark : 0.000022s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000426s : 0.06% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000037s : 0.01% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000448s : 0.06% optimize.opt_b.b_1 : 0.000107s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.00% optimize.loop_unroll : 0.000414s : 0.06% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.06% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.721109s : 98.55% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000164 30 14.56% : 0.000024s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.44% : 0.000006s : 4: substitution.graph_param_transform 67.05% : 0.000110s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.45% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.67% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006184 2 90.84% : 0.005617s : 1: type_inference.infer 9.16% : 0.000567s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.96% : 0.000027s : 3: replace.inline 30.04% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.58% : 0.000108s : 3: match.inline 8.42% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.93% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000363 8 46.38% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.62% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.745083 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.47% : 0.003481s : 1: add_attr 0.47% : 0.003469s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.08% : 0.000581s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.06% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000456s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.13% : 0.000940s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000089s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.28% : 0.002116s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.06% : 0.000456s : 1: opt_after_jit_grad 0.03% : 0.000187s : 1: opt_b 0.53% : 0.003948s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.03% : 0.000223s : 1: renormalize.infer 0.03% : 0.000196s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000069s : 1: symbol_engine_optimizer 96.79% : 0.721131s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.84% : 0.006245s : 1: type_inference 0.01% : 0.000058s : 1: validate TotalTime = 0.0703046, [24] [bootstrap]: 0.00052497 [type_inference]: 0.00441683 [event_method]: 1.106e-05 [auto_monad]: 5.09e-05 [graph_reusing]: 4.60999e-06 [inline]: 1.62001e-06 [add_attr]: 0.0029487, [1] [add_attr_with_inline]: 0.00294058, [1] [Cycle 1]: 4.675e-05, [2] [tag_attr]: 1.236e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.221e-05 [insert-virtual-dataset]: 2.67001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00367569, [53] [py_interpret_to_execute]: 1.558e-05 [rewriter_before_opt_a]: 7.087e-05 [opt_a]: 0.001845, [2] [Cycle 1]: 0.00125087, [45] [expand_dump_flag]: 2.75002e-06 [switch_simplify]: 2.383e-05 [loop_unroll]: 1.403e-05 [a_1]: 0.00028714 [with_stream_mark]: 1.394e-05 [recompute_prepare]: 7.7e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.34001e-06 [updatestate_loads_eliminate]: 2.78003e-06 [parameter_eliminate]: 2.01e-06 [a_2]: 7.685e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 5.91998e-06 [parallel]: 1.918e-05 [flash_sp]: 6.94999e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.06002e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 6.74999e-06 [virtual_dataset]: 5.85002e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.67998e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.79e-06 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33998e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 2.71e-06 
[after_resolve]: 1.03e-05 [a_after_grad]: 8.97e-06 [renormalize]: 0.00034756 [add_forward_monad_depend]: 4.13999e-06 [auto_monad_grad]: 1.87999e-06 [auto_monad_eliminator]: 1.302e-05 [cse]: 2.674e-05 [a_3]: 3.949e-05 [Cycle 2]: 0.00058457, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.69999e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.0001244 [with_stream_mark]: 8.99998e-06 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.63e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.68e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.31002e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.29e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.19e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.81998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46003e-06 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 7.62002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 8.98002e-06 [a_after_grad]: 7.85e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.322e-05 [a_3]: 3.145e-05 [py_interpret_to_execute_after_opt_a]: 7.59002e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 3.054e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 4.86002e-06 [mutable_eliminate]: 0.00045076 [opt_b]: 0.00018036, [1] [Cycle 1]: 0.00017396, [7] [b_1]: 
0.00010645 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 4.40021e-07 [cse]: 1.619e-05 [optimize_parallel_all_gather_comm]: 1.587e-05 [overlap_param_gather]: 1.66e-06 [cconv]: 2.32e-05 [loop_unroll]: 0.00040876 [opt_after_cconv]: 9.436e-05, [1] [Cycle 1]: 8.88e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.617e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.233e-05 [tuple_transform]: 6.737e-05, [1] [Cycle 1]: 6.305e-05, [4] [d_1]: 3.807e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.92999e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 4.51e-05 [cse_after_recomputation]: 2.107e-05, [1] [Cycle 1]: 1.685e-05, [1] [cse]: 1.176e-05 [environ_conv]: 5.05001e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.04002e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.46998e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.36002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.129e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11997e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.72001e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.747e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 6.741e-05, [1] [Cycle 1]: 6.331e-05, [6] [build]: 2.25002e-06 [elim_shapecalc]: 8.23999e-06 [elim_not_effective]: 1.123e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.40998e-06 [opt_after_jit_grad]: 0.00047994 [validate]: 3.102e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0578969 [execute]: 8.49998e-06 Sums bootstrap : 0.000525s : 0.79% type_inference : 0.004417s : 6.65% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000071s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000412s : 0.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000348s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.68% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000409s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000480s : 0.72% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057897s : 87.19% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000118 26 19.15% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 64.68% : 0.000076s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.68% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004377 2 91.55% : 0.004007s : 1: type_inference.infer 8.45% : 0.000370s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.53% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 1.03% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.52% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.96% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000264 6 42.35% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.65% : 0.000152s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078187 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.78% : 0.002953s : 1: add_attr 3.77% : 0.002944s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.71% : 0.000559s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000417s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000758s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.36% : 0.001848s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.63% : 0.000490s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.71% : 0.003679s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000192s : 1: renormalize.infer 0.19% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.10% : 0.000076s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.07% : 0.057914s : 1: task_emit 0.09% : 0.000070s : 1: 
tuple_transform 5.67% : 0.004431s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.0724627, [24] [bootstrap]: 0.00047233 [type_inference]: 0.00557136 [event_method]: 1.395e-05 [auto_monad]: 5.517e-05 [graph_reusing]: 5.49e-06 [inline]: 1.59998e-06 [add_attr]: 0.00295282, [1] [add_attr_with_inline]: 0.00294533, [1] [Cycle 1]: 4.412e-05, [2] [tag_attr]: 1.502e-05 [meta_addattr_fg_expand]: 3.85e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 2.478e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00400571, [53] [py_interpret_to_execute]: 2.13e-05 [rewriter_before_opt_a]: 5.671e-05 [opt_a]: 0.0021104, [2] [Cycle 1]: 0.00150757, [45] [expand_dump_flag]: 3.2e-06 [switch_simplify]: 3.216e-05 [loop_unroll]: 2.024e-05 [a_1]: 0.00044392 [with_stream_mark]: 1.294e-05 [recompute_prepare]: 7.68999e-06 [updatestate_depend_eliminate]: 3.45998e-06 [updatestate_assign_eliminate]: 3.25002e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.554e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 2.01003e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 5.91e-06 [parallel]: 1.741e-05 [flash_sp]: 7.3e-06 [merge_comm]: 3.75998e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.95999e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 4.02002e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.58002e-06 [before_grad]: 9.52999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 2.20002e-06 
[after_resolve]: 1.078e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00042447 [add_forward_monad_depend]: 4.40999e-06 [auto_monad_grad]: 2.06998e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.937e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.00059388, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.21002e-06 [a_1]: 0.00012439 [with_stream_mark]: 9.34e-06 [recompute_prepare]: 5.81998e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.726e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.05999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.31002e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.29997e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 3.07002e-06 [matmul_add_comm_reduction]: 4.82e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.26998e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.76998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.66e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.71999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.77e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.21998e-06 [cse]: 1.336e-05 [a_3]: 3.153e-05 [py_interpret_to_execute_after_opt_a]: 7.73999e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.046e-05 [convert_after_rewriter]: 6.31998e-06 [order_py_execute_after_rewriter]: 5.46998e-06 [mutable_eliminate]: 0.00051556 [opt_b]: 0.00018211, [1] [Cycle 1]: 0.00017586, [7] 
[b_1]: 0.00010767 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.89992e-07 [cse]: 1.665e-05 [optimize_parallel_all_gather_comm]: 1.686e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.148e-05 [loop_unroll]: 0.00041544 [opt_after_cconv]: 9.526e-05, [1] [Cycle 1]: 8.947e-05, [7] [c_1]: 2.839e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.588e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.19e-05 [tuple_transform]: 6.811e-05, [1] [Cycle 1]: 6.39e-05, [4] [d_1]: 3.858e-05 [none_parameter_eliminate]: 1.61998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 4.257e-05 [cse_after_recomputation]: 2.044e-05, [1] [Cycle 1]: 1.582e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.75999e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.45002e-06 [label_micro_interleaved_index]: 4.12003e-06 [label_fine_grained_interleaved_index]: 2.68003e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.60001e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 1.99999e-06 [reorder_send_recv_between_fp_bp]: 2.43e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.113e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 3.75e-06 [overlap_grad_flash_sp]: 1.694e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.42001e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 6.776e-05, [1] [Cycle 1]: 6.354e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.18999e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 5.84e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.49e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00045057 [validate]: 3.157e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0586363 [execute]: 8.53001e-06 Sums bootstrap : 0.000472s : 0.69% type_inference : 0.005571s : 8.13% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000025s : 0.04% optimize.opt_a.a_1 : 0.000568s : 0.83% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000425s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000043s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000516s : 0.75% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000415s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.66% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058636s : 85.55% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000162 30 15.02% : 0.000024s : 5: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.40% : 0.000005s : 4: substitution.graph_param_transform 65.53% : 0.000106s : 3: substitution.inline 1.82% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000004s : 4: substitution.remove_not_recompute_node 2.82% : 0.000005s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005529 2 89.92% : 0.004972s : 1: type_inference.infer 10.08% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.97% : 0.000026s : 3: replace.inline 30.03% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.48% : 0.000104s : 3: match.inline 8.52% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.51% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 47.40% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.60% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080933 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.65% : 0.002957s : 1: add_attr 3.64% : 0.002949s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000506s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.65% : 0.000524s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.15% : 0.000932s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.61% : 0.002113s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000460s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.95% : 0.004010s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.27% : 0.000217s : 1: renormalize.infer 0.25% : 0.000200s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 72.47% : 0.058653s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.90% : 0.005585s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.766038, [24] [bootstrap]: 0.00049083 [type_inference]: 0.0113752 [event_method]: 4.852e-05 [auto_monad]: 0.00011968 [graph_reusing]: 8.15999e-06 [inline]: 1.77999e-06 [add_attr]: 0.00297142, [1] [add_attr_with_inline]: 0.00296312, [1] [Cycle 1]: 6.965e-05, [2] [tag_attr]: 3.426e-05 [meta_addattr_fg_expand]: 9.20001e-06 [parallel-infer-symbol]: 2.73998e-06 [pre_auto_parallel]: 4.977e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.013366, [53] [py_interpret_to_execute]: 3.761e-05 [rewriter_before_opt_a]: 0.00018731 [opt_a]: 0.0110111, [3] [Cycle 1]: 0.00707422, [45] [expand_dump_flag]: 3.88999e-06 [switch_simplify]: 7.288e-05 [loop_unroll]: 6.51e-05 [a_1]: 0.00144608 [with_stream_mark]: 2.292e-05 [recompute_prepare]: 2.122e-05 [updatestate_depend_eliminate]: 8.99e-06 [updatestate_assign_eliminate]: 7.8e-06 [updatestate_loads_eliminate]: 7.71999e-06 [parameter_eliminate]: 2.63e-06 [a_2]: 0.00024329 [accelerated_algorithm]: 3.058e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 3.33998e-06 [shard_inline]: 1.602e-05 [merge_send_recv]: 1.626e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.767e-05 [flash_sp]: 1.114e-05 [merge_comm]: 9.87999e-06 [allreduce_fusion]: 8.77999e-06 [matmul_add_comm_reduction]: 2.612e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.803e-05 [virtual_dataset]: 1.562e-05 [get_grad_eliminate_]: 1.496e-05 [virtual_output]: 1.501e-05 [merge_forward]: 9.22001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.808e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.872e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.751e-05 [set_forward_comm_id_for_comm_node_pass]: 9.41998e-06 [meta_fg_expand]: 0.001395 [flash_sp_send_recv_attached]: 3.68e-06 [receive_attached]: 2.59001e-06 
[after_resolve]: 5.945e-05 [a_after_grad]: 0.00010507 [renormalize]: 0.00244333 [add_forward_monad_depend]: 8.88002e-06 [auto_monad_grad]: 5.34998e-06 [auto_monad_eliminator]: 5.667e-05 [cse]: 0.00016734 [a_3]: 0.00033376 [Cycle 2]: 0.00302564, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 4.643e-05 [loop_unroll]: 4.37e-05 [a_1]: 0.00155414 [with_stream_mark]: 1.158e-05 [recompute_prepare]: 1.074e-05 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 3.55e-06 [parameter_eliminate]: 1.18001e-06 [a_2]: 0.00012476 [accelerated_algorithm]: 1.161e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 9.35001e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 7.39002e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.00002e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 4.52e-06 [matmul_add_comm_reduction]: 7.8e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 1.041e-05 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 9.02999e-06 [virtual_output]: 8.54002e-06 [merge_forward]: 4.53001e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.65999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.577e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.393e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 6.877e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.645e-05 [a_after_grad]: 1.46e-05 [renormalize]: 0.00059675 [add_forward_monad_depend]: 4.26001e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.438e-05 [cse]: 4.588e-05 [a_3]: 6.523e-05 [Cycle 3]: 0.00089626, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 1.055e-05 [loop_unroll]: 8.99e-06 [a_1]: 0.00024817 [with_stream_mark]: 1.006e-05 [recompute_prepare]: 9.10999e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 
3.66999e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 0.00012256 [accelerated_algorithm]: 1.174e-05 [shard]: 8.80013e-07 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.81001e-06 [auto_parallel]: 7.00998e-06 [parallel]: 4.55999e-06 [flash_sp]: 1.06997e-06 [merge_comm]: 4.72e-06 [allreduce_fusion]: 4.92999e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 9.82999e-06 [virtual_dataset]: 8.75999e-06 [get_grad_eliminate_]: 8.44998e-06 [virtual_output]: 8.13001e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.42e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.588e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.549e-05 [a_after_grad]: 1.471e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.076e-05 [cse]: 2.562e-05 [a_3]: 5.966e-05 [py_interpret_to_execute_after_opt_a]: 1.045e-05 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 4.604e-05 [convert_after_rewriter]: 8.95999e-06 [order_py_execute_after_rewriter]: 6.61999e-06 [mutable_eliminate]: 0.00045767 [opt_b]: 0.0002878, [1] [Cycle 1]: 0.00028163, [7] [b_1]: 0.00018903 [b_2]: 1.077e-05 [updatestate_depend_eliminate]: 7e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 3.9e-06 [renormalize]: 3.89991e-07 [cse]: 3.157e-05 [optimize_parallel_all_gather_comm]: 2.065e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.007e-05 [loop_unroll]: 0.00042481 [opt_after_cconv]: 0.00013572, [1] [Cycle 1]: 0.00012957, [7] [c_1]: 4.76e-05 [parameter_eliminate]: 2.50002e-06 [updatestate_depend_eliminate]: 7.17002e-06 [updatestate_assign_eliminate]: 4.3e-06 
[updatestate_loads_eliminate]: 4.02998e-06 [cse]: 3.038e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.878e-05 [tuple_transform]: 0.00010124, [1] [Cycle 1]: 9.692e-05, [4] [d_1]: 6.656e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.84999e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 5.701e-05 [cse_after_recomputation]: 3.201e-05, [1] [Cycle 1]: 2.729e-05, [1] [cse]: 2.156e-05 [environ_conv]: 9.22001e-06 [swap_dp_allreduce_reducescatter]: 8e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.60999e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.59999e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.712e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 4.68001e-06 [overlap_recompute_and_grad_model_parallel]: 5.74999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67001e-06 [overlap_recompute_comm]: 1.85001e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.338e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.38998e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 9.938e-05, [1] [Cycle 1]: 9.498e-05, [6] [build]: 1.005e-05 [elim_shapecalc]: 1.392e-05 [elim_not_effective]: 1.82e-05 [opt_reshape]: 9.86e-06 [fold_const_symbol]: 1.444e-05 
[renormalize]: 2.50002e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 2.385e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.62002e-06 [opt_after_jit_grad]: 0.00046771 [validate]: 4.577e-05 [backend_pass]: 1.01002e-06 [task_emit]: 0.736825 [execute]: 9.49e-06 Sums bootstrap : 0.000491s : 0.06% type_inference : 0.011375s : 1.49% event_method : 0.000049s : 0.01% auto_monad : 0.000120s : 0.02% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000187s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000130s : 0.02% optimize.opt_a.loop_unroll : 0.000118s : 0.02% optimize.opt_a.a_1 : 0.003248s : 0.43% optimize.opt_a.with_stream_mark : 0.000045s : 0.01% optimize.opt_a.recompute_prepare : 0.000041s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000491s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000025s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000015s : 0.00% 
optimize.opt_a.merge_comm : 0.000020s : 0.00% optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.01% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001467s : 0.19% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.01% optimize.opt_a.a_after_grad : 0.000134s : 0.02% optimize.opt_a.renormalize : 0.003040s : 0.40% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.00% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.01% optimize.opt_a.cse : 0.000239s : 0.03% optimize.opt_a.a_3 : 0.000459s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000458s : 0.06% optimize.opt_b.b_1 : 0.000189s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000425s : 0.06% optimize.opt_after_cconv.c_1 : 0.000048s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000468s : 0.06% validate : 0.000046s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.736825s : 96.73% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000762 222 5.77% : 0.000044s : 12: substitution.arithmetic_simplify 1.80% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 55.75% : 0.000425s : 17: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.42% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000014s : 3: substitution.less_batch_normalization 1.69% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.76% : 0.000013s : 20: substitution.remove_not_recompute_node 3.19% : 0.000024s : 10: substitution.replace_applicator 1.51% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.64% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011302 2 86.51% : 0.009777s : 1: type_inference.infer 13.49% : 0.001525s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.61% : 0.000127s : 17: replace.inline 42.39% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.38% : 0.000416s : 17: match.inline 7.62% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000043s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001592 34 56.79% : 0.000904s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.21% : 0.000688s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.790670 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.38% : 0.002976s : 1: add_attr 0.38% : 0.002967s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.01% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.02% : 0.000127s : 1: auto_monad 0.00% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000525s : 1: bootstrap 0.01% : 0.000077s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000056s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000467s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.62% : 0.004933s : 117: opt.transform.opt_a 0.01% : 0.000046s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000174s : 28: opt.transform.opt_b 0.01% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.39% : 0.011014s : 1: opt_a 0.02% : 0.000139s : 1: opt_after_cconv 0.06% : 0.000477s : 1: opt_after_jit_grad 0.04% : 0.000291s : 1: opt_b 1.69% : 0.013370s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000054s : 1: pre_auto_parallel 0.01% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000033s : 1: remove_dup_value 0.21% : 0.001626s : 2: renormalize.infer 0.18% : 0.001401s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000050s : 1: rewriter_after_opt_a 0.02% : 0.000193s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000102s : 1: symbol_engine_optimizer 93.19% : 0.736847s : 1: task_emit 0.01% : 0.000104s : 1: 
tuple_transform 1.44% : 0.011390s : 1: type_inference 0.01% : 0.000069s : 1: validate TotalTime = 0.0738139, [24] [bootstrap]: 0.00047643 [type_inference]: 0.00441036 [event_method]: 1.139e-05 [auto_monad]: 5.251e-05 [graph_reusing]: 5.01002e-06 [inline]: 2.09e-06 [add_attr]: 0.00309579, [1] [add_attr_with_inline]: 0.00308649, [1] [Cycle 1]: 5.067e-05, [2] [tag_attr]: 1.297e-05 [meta_addattr_fg_expand]: 3.6e-06 [parallel-infer-symbol]: 2.93e-06 [pre_auto_parallel]: 2.26e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.47999e-06 [optimize]: 0.00383146, [53] [py_interpret_to_execute]: 1.605e-05 [rewriter_before_opt_a]: 4.07e-05 [opt_a]: 0.00198289, [2] [Cycle 1]: 0.00131777, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 2.456e-05 [loop_unroll]: 1.396e-05 [a_1]: 0.00029386 [with_stream_mark]: 1.4e-05 [recompute_prepare]: 7.62002e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.24001e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.655e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.54999e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 6.48e-06 [merge_send_recv]: 8.07998e-06 [auto_parallel]: 6.33e-06 [parallel]: 1.827e-05 [flash_sp]: 7.23999e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 9.22001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.15998e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 4.11001e-06 [cell_reuse_recompute_pass]: 1.04998e-06 [offload_activation]: 9.68002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.209e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.21998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.23998e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.32001e-06 
[after_resolve]: 1.109e-05 [a_after_grad]: 8.88002e-06 [renormalize]: 0.00039071 [add_forward_monad_depend]: 4.94e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.256e-05 [cse]: 2.891e-05 [a_3]: 4.121e-05 [Cycle 2]: 0.0006558, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012525 [with_stream_mark]: 9.22001e-06 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.28002e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.723e-05 [accelerated_algorithm]: 5.88998e-06 [shard]: 1.36002e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.24002e-06 [auto_parallel]: 5.38002e-06 [parallel]: 3.8e-06 [flash_sp]: 2.90998e-06 [merge_comm]: 3.57997e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.86001e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.18002e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.83998e-06 [cell_reuse_recompute_pass]: 1.49998e-06 [offload_activation]: 6.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.002e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 9.49e-06 [a_after_grad]: 8.1e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.23002e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 7.03e-06 [cse]: 1.465e-05 [a_3]: 3.301e-05 [py_interpret_to_execute_after_opt_a]: 7.46001e-06 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 3.13e-05 [convert_after_rewriter]: 7.13998e-06 [order_py_execute_after_rewriter]: 4.92e-06 [mutable_eliminate]: 0.00048096 [opt_b]: 0.00018252, [1] [Cycle 1]: 0.00017554, [7] [b_1]: 
0.00010697 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 5.10016e-07 [cse]: 1.654e-05 [optimize_parallel_all_gather_comm]: 1.526e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.227e-05 [loop_unroll]: 0.00041594 [opt_after_cconv]: 9.44e-05, [1] [Cycle 1]: 8.858e-05, [7] [c_1]: 2.733e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.626e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.253e-05 [tuple_transform]: 6.956e-05, [1] [Cycle 1]: 6.517e-05, [4] [d_1]: 3.924e-05 [none_parameter_eliminate]: 1.43002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.32001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.552e-05 [cse_after_recomputation]: 2.115e-05, [1] [Cycle 1]: 1.676e-05, [1] [cse]: 1.104e-05 [environ_conv]: 4.53001e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 3.80998e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.12999e-06 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.69972e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.116e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.62998e-06 [overlap_recompute_and_grad_model_parallel]: 4.93001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.849e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.82999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.953e-05, [1] [Cycle 1]: 6.545e-05, [6] [build]: 2.78e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.553e-05 [get_jit_bprop_graph]: 1.20001e-06 [rewriter_after_jit_bprop_graph]: 3.81999e-06 [opt_after_jit_grad]: 0.00045516 [validate]: 3.483e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0611563 [execute]: 9.14998e-06 Sums bootstrap : 0.000476s : 0.68% type_inference : 0.004410s : 6.33% event_method : 0.000011s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.60% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000391s : 0.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000044s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000481s : 0.69% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000416s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000455s : 0.65% validate : 0.000035s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.061156s : 87.77% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 17.54% : 0.000021s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.19% : 0.000005s : 4: substitution.graph_param_transform 65.94% : 0.000081s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.99% : 0.000005s : 4: substitution.remove_not_recompute_node 3.54% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004367 2 91.66% : 0.004003s : 1: type_inference.infer 8.34% : 0.000364s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000139 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.29% : 0.000003s : 17: predicate.arithmetic_simplify 1.02% : 0.000001s : 9: predicate.cast_eliminate 0.96% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.72% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.76% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.84% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.84% : 0.000003s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.24% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 1.11% : 0.000002s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.11% : 0.000002s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.99% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000257 6 40.07% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.93% : 0.000154s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.082060 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.78% : 0.003100s : 1: add_attr 3.77% : 0.003090s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000512s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.60% : 0.000490s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000776s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.42% : 0.001986s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000465s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.67% : 0.003835s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000214s : 1: renormalize.infer 0.21% : 0.000170s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 74.55% : 0.061179s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.39% : 0.004426s : 1: type_inference 0.07% : 0.000058s : 1: validate TotalTime = 0.112489, [24] [bootstrap]: 0.00051198 [type_inference]: 0.0106724 [event_method]: 4.392e-05 [auto_monad]: 0.00011766 [graph_reusing]: 7.92e-06 [inline]: 1.86003e-06 [add_attr]: 0.00309784, [1] [add_attr_with_inline]: 0.00308883, [1] [Cycle 1]: 7.26e-05, [2] [tag_attr]: 3.275e-05 [meta_addattr_fg_expand]: 8.13999e-06 [parallel-infer-symbol]: 3.11999e-06 [pre_auto_parallel]: 4.835e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.23002e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.0136112, [53] [py_interpret_to_execute]: 3.827e-05 [rewriter_before_opt_a]: 0.00012781 [opt_a]: 0.0112804, [3] [Cycle 1]: 0.00722963, [45] [expand_dump_flag]: 3.71001e-06 [switch_simplify]: 6.724e-05 [loop_unroll]: 5.493e-05 [a_1]: 0.00134903 [with_stream_mark]: 2.465e-05 [recompute_prepare]: 2.385e-05 [updatestate_depend_eliminate]: 9.43002e-06 [updatestate_assign_eliminate]: 7.92e-06 [updatestate_loads_eliminate]: 7.38e-06 [parameter_eliminate]: 2.43e-06 [a_2]: 0.00024447 [accelerated_algorithm]: 3.21e-05 [shard]: 1.79e-06 [meta_shard_fg_expand]: 3.31001e-06 [shard_inline]: 1.63e-05 [merge_send_recv]: 1.691e-05 [auto_parallel]: 1.201e-05 [parallel]: 1.836e-05 [flash_sp]: 1.2e-05 [merge_comm]: 9.54999e-06 [allreduce_fusion]: 8.90001e-06 [matmul_add_comm_reduction]: 2.81e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.865e-05 [virtual_dataset]: 1.548e-05 [get_grad_eliminate_]: 1.539e-05 [virtual_output]: 1.524e-05 [merge_forward]: 9.94999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.769e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.919e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.797e-05 [set_forward_comm_id_for_comm_node_pass]: 9.81e-06 [meta_fg_expand]: 0.00150116 [flash_sp_send_recv_attached]: 3.91001e-06 [receive_attached]: 2.74999e-06 [after_resolve]: 
6.131e-05 [a_after_grad]: 8.236e-05 [renormalize]: 0.00253101 [add_forward_monad_depend]: 1.123e-05 [auto_monad_grad]: 5.79e-06 [auto_monad_eliminator]: 5.791e-05 [cse]: 0.00022601 [a_3]: 0.00033739 [Cycle 2]: 0.00308884, [45] [expand_dump_flag]: 2.02999e-06 [switch_simplify]: 4.71e-05 [loop_unroll]: 4.427e-05 [a_1]: 0.00155219 [with_stream_mark]: 1.421e-05 [recompute_prepare]: 1.22e-05 [updatestate_depend_eliminate]: 5.75001e-06 [updatestate_assign_eliminate]: 4.75001e-06 [updatestate_loads_eliminate]: 4.17998e-06 [parameter_eliminate]: 1.33002e-06 [a_2]: 0.00012688 [accelerated_algorithm]: 1.224e-05 [shard]: 1.37999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.32999e-06 [merge_send_recv]: 7.60998e-06 [auto_parallel]: 8.03999e-06 [parallel]: 6.01998e-06 [flash_sp]: 3.48e-06 [merge_comm]: 5.47999e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 1.05e-05 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 1.057e-05 [virtual_dataset]: 9.00001e-06 [get_grad_eliminate_]: 8.65001e-06 [virtual_output]: 8.49002e-06 [merge_forward]: 4.79e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.99001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.702e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 1.449e-05 [set_forward_comm_id_for_comm_node_pass]: 5.46998e-06 [meta_fg_expand]: 3.864e-05 [flash_sp_send_recv_attached]: 1.05001e-06 [receive_attached]: 1.91e-06 [after_resolve]: 1.631e-05 [a_after_grad]: 1.417e-05 [renormalize]: 0.0006577 [add_forward_monad_depend]: 4.12e-06 [auto_monad_grad]: 1.51998e-06 [auto_monad_eliminator]: 1.477e-05 [cse]: 4.97e-05 [a_3]: 6.609e-05 [Cycle 3]: 0.00094633, [45] [expand_dump_flag]: 1.24998e-06 [switch_simplify]: 1.059e-05 [loop_unroll]: 9.01002e-06 [a_1]: 0.00025202 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 9.32999e-06 [updatestate_depend_eliminate]: 4.71002e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 4.01001e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.00014997 [accelerated_algorithm]: 1.31e-05 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.12999e-06 [merge_send_recv]: 7.38999e-06 [auto_parallel]: 7.39002e-06 [parallel]: 5.15001e-06 [flash_sp]: 1.45999e-06 [merge_comm]: 4.88001e-06 [allreduce_fusion]: 4.89e-06 [matmul_add_comm_reduction]: 8.58001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.65999e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.13001e-06 [merge_forward]: 4.52e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.636e-05 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 1.417e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 9.40025e-07 [receive_attached]: 1.54e-06 [after_resolve]: 1.555e-05 [a_after_grad]: 1.524e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 1.14e-05 [cse]: 2.844e-05 [a_3]: 5.934e-05 [py_interpret_to_execute_after_opt_a]: 1.269e-05 [slice_cell_reuse_recomputed_activation]: 2.01998e-06 [rewriter_after_opt_a]: 4.684e-05 [convert_after_rewriter]: 9.94001e-06 [order_py_execute_after_rewriter]: 6.91999e-06 [mutable_eliminate]: 0.00050278 [opt_b]: 0.00029311, [1] [Cycle 1]: 0.00028584, [7] [b_1]: 0.00018965 [b_2]: 1.084e-05 [updatestate_depend_eliminate]: 8.02998e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 4.12e-06 [renormalize]: 2.19996e-07 [cse]: 3.388e-05 [optimize_parallel_all_gather_comm]: 2.156e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.242e-05 [loop_unroll]: 0.00043453 [opt_after_cconv]: 0.00013873, [1] [Cycle 1]: 0.00013258, [7] [c_1]: 4.868e-05 [parameter_eliminate]: 2.87002e-06 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4.42e-06 
[updatestate_loads_eliminate]: 4.05e-06 [cse]: 3.04e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 3.199e-05 [tuple_transform]: 0.00010235, [1] [Cycle 1]: 9.755e-05, [4] [d_1]: 6.709e-05 [none_parameter_eliminate]: 1.88002e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.89999e-06 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 6.101e-05 [cse_after_recomputation]: 3.205e-05, [1] [Cycle 1]: 2.756e-05, [1] [cse]: 2.216e-05 [environ_conv]: 9.36e-06 [swap_dp_allreduce_reducescatter]: 7.88001e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.68001e-06 [label_fine_grained_interleaved_index]: 3.13998e-06 [merge_cast_opt]: 1.29003e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 9.99979e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 1.98997e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.765e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 4.92e-06 [overlap_recompute_and_grad_model_parallel]: 5.84999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.35999e-06 [overlap_grad_flash_sp]: 2.553e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.08002e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010118, [1] [Cycle 1]: 9.706e-05, [6] [build]: 1.091e-05 [elim_shapecalc]: 1.373e-05 [elim_not_effective]: 1.825e-05 [opt_reshape]: 9.68002e-06 [fold_const_symbol]: 1.484e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 2.598e-05 [get_jit_bprop_graph]: 1.55999e-06 [rewriter_after_jit_bprop_graph]: 4.13001e-06 [opt_after_jit_grad]: 0.00049478 [validate]: 4.889e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.0835441 [execute]: 9.19998e-06 Sums bootstrap : 0.000512s : 0.47% type_inference : 0.010672s : 9.87% event_method : 0.000044s : 0.04% auto_monad : 0.000118s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000128s : 0.12% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003153s : 2.92% optimize.opt_a.with_stream_mark : 0.000049s : 0.05% optimize.opt_a.recompute_prepare : 0.000045s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000521s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000032s : 0.03% optimize.opt_a.auto_parallel : 0.000027s : 0.03% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001543s : 1.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000093s : 0.09% optimize.opt_a.a_after_grad : 0.000112s : 0.10% optimize.opt_a.renormalize : 0.003189s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.02% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.08% optimize.opt_a.cse : 0.000304s : 0.28% optimize.opt_a.a_3 : 0.000463s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000503s : 0.47% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000435s : 0.40% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000032s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000495s : 0.46% validate : 0.000049s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.083544s : 77.30% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000771 218 6.34% : 0.000049s : 11: substitution.arithmetic_simplify 1.99% : 0.000015s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000003s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 54.86% : 0.000423s : 16: substitution.inline 2.19% : 0.000017s : 2: substitution.inline_without_move 1.41% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000016s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.26% : 0.000025s : 10: substitution.replace_applicator 1.59% : 0.000012s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.34% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010597 2 87.24% : 0.009245s : 1: type_inference.infer 12.76% : 0.001352s : 1: type_inference.specialize ------[replace.] 0.000207 30 59.79% : 0.000124s : 16: replace.inline 40.21% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000446 30 93.06% : 0.000415s : 16: match.inline 6.94% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000739 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.98% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.45% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.19% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.38% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.12% : 0.000016s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.72% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.33% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.84% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.28% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001558 32 57.62% : 0.000898s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.38% : 0.000660s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.137553 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.26% : 0.003103s : 1: add_attr 2.25% : 0.003093s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000125s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.40% : 0.000547s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000052s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000512s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.52% : 0.004846s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.20% : 0.011283s : 1: opt_a 0.10% : 0.000142s : 1: opt_after_cconv 0.37% : 0.000505s : 1: opt_after_jit_grad 0.22% : 0.000297s : 1: opt_b 9.90% : 0.013616s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000036s : 1: remove_dup_value 1.24% : 0.001702s : 2: renormalize.infer 1.07% : 0.001471s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000104s : 1: symbol_engine_optimizer 60.75% : 0.083567s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 7.77% : 0.010690s : 1: type_inference 0.06% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x3-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x4-pynative],max_mem:10.0M TotalTime = 0.0226734, [24] [bootstrap]: 0.00054431 [type_inference]: 0.00645133 [event_method]: 1.496e-05 [auto_monad]: 6.057e-05 [graph_reusing]: 5.17e-06 [inline]: 1.77001e-06 [add_attr]: 0.0035921, [1] [add_attr_with_inline]: 0.00358086, [1] [Cycle 1]: 5.04e-05, [2] [tag_attr]: 1.746e-05 [meta_addattr_fg_expand]: 3.97e-06 [parallel-infer-symbol]: 3.33998e-06 [pre_auto_parallel]: 3.069e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00424265, [53] [py_interpret_to_execute]: 2.176e-05 [rewriter_before_opt_a]: 5.974e-05 [opt_a]: 0.0022892, [2] [Cycle 1]: 0.00161761, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.209e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.00046599 [with_stream_mark]: 1.452e-05 [recompute_prepare]: 8.80999e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.568e-05 [accelerated_algorithm]: 6.86001e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.48999e-06 [auto_parallel]: 6.08002e-06 [parallel]: 2.362e-05 [flash_sp]: 7.60998e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 1.037e-05 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.66999e-06 [virtual_dataset]: 6.04001e-06 
[get_grad_eliminate_]: 6.03998e-06 [virtual_output]: 5.51e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 1.67999e-06 [offload_activation]: 9.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.177e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 1.004e-05 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.96999e-06 [receive_attached]: 2.51e-06 [after_resolve]: 1.026e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00048003 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 1.513e-05 [cse]: 2.858e-05 [a_3]: 4.164e-05 [Cycle 2]: 0.00066168, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 7.25003e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.00016828 [with_stream_mark]: 1.063e-05 [recompute_prepare]: 6.66e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 1.19e-06 [a_2]: 6.927e-05 [accelerated_algorithm]: 5.76003e-06 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 5.37001e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.92999e-06 [flash_sp]: 3.58e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 6.51999e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 6.66e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.81e-06 [cell_reuse_recompute_pass]: 1.47999e-06 [offload_activation]: 6.39999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.77001e-06 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 8.01001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.29e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 7.8e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.35001e-06 [auto_monad_grad]: 1.26002e-06 [auto_monad_eliminator]: 7.09001e-06 [cse]: 1.918e-05 [a_3]: 3.161e-05 [py_interpret_to_execute_after_opt_a]: 8.97e-06 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.092e-05 [convert_after_rewriter]: 7.26999e-06 [order_py_execute_after_rewriter]: 5.72999e-06 [mutable_eliminate]: 0.0004997 [opt_b]: 0.00018835, [1] [Cycle 1]: 0.00018139, [7] [b_1]: 0.000108 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 6.58998e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.49999e-06 [renormalize]: 3.09985e-07 [cse]: 1.954e-05 [optimize_parallel_all_gather_comm]: 1.749e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.462e-05 [loop_unroll]: 0.00043668 [opt_after_cconv]: 9.844e-05, [1] [Cycle 1]: 9.189e-05, [7] [c_1]: 2.738e-05 [parameter_eliminate]: 2.48002e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.07999e-06 [cse]: 1.897e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.312e-05 [tuple_transform]: 7.182e-05, [1] [Cycle 1]: 6.749e-05, [4] [d_1]: 4.206e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.02999e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 5.34e-05 [cse_after_recomputation]: 2.091e-05, [1] [Cycle 1]: 1.656e-05, [1] [cse]: 1.117e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.96998e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 5.05999e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 9.60019e-07 [full_micro_interleaved_order_control]: 2.50002e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.30001e-06 
[add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.155e-05 [grouped_pairwise_exchange_alltoall]: 2.01998e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.70999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.41002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.914e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.952e-05, [1] [Cycle 1]: 6.539e-05, [6] [build]: 3.09999e-06 [elim_shapecalc]: 9.34998e-06 [elim_not_effective]: 1.137e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.58001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.61e-05 [get_jit_bprop_graph]: 1.30001e-06 [rewriter_after_jit_bprop_graph]: 0.00014389 [opt_after_jit_grad]: 0.00050153 [validate]: 3.498e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00678911 [execute]: 6.95002e-06 Sums bootstrap : 0.000544s : 3.01% type_inference : 0.006451s : 35.70% event_method : 0.000015s : 0.08% auto_monad : 0.000061s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000031s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 
0.33% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000634s : 3.51% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.80% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000480s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.12% optimize.opt_a.cse : 0.000048s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000500s : 2.77% optimize.opt_b.b_1 : 0.000108s : 0.60% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.14% optimize.loop_unroll : 0.000437s : 2.42% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000042s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000144s : 0.80% opt_after_jit_grad : 0.000502s : 2.78% validate : 0.000035s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006789s : 37.57% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000177 30 16.13% : 0.000029s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.68% : 0.000001s : 2: substitution.fold_const_symbol 2.99% : 0.000005s : 4: substitution.graph_param_transform 65.63% : 0.000116s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000005s : 4: substitution.remove_not_recompute_node 2.61% : 0.000005s : 4: substitution.replace_old_param 6.57% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006401 2 90.58% : 0.005798s : 1: type_inference.infer 9.42% : 0.000603s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.62% : 0.000027s : 3: replace.inline 29.38% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 5 91.52% : 0.000114s : 3: match.inline 8.48% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.47% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.94% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.54% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.05% : 0.000002s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.82% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.64% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000381 8 46.07% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.93% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032150 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.19% : 0.003597s : 1: add_attr 11.15% : 0.003585s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000066s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000581s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.39% : 0.000447s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.59% : 0.000510s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 3.13% : 0.001007s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.13% : 0.002292s : 1: opt_a 0.32% : 0.000102s : 1: opt_after_cconv 1.60% : 0.000513s : 1: opt_after_jit_grad 0.60% : 0.000192s : 1: opt_b 13.21% : 0.004247s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.78% : 0.000252s : 1: renormalize.infer 0.68% : 0.000220s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.47% : 0.000152s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000072s : 1: symbol_engine_optimizer 21.15% : 0.006801s : 1: task_emit 0.23% : 0.000075s : 1: 
tuple_transform 20.12% : 0.006468s : 1: type_inference 0.22% : 0.000069s : 1: validate TotalTime = 0.0191991, [24] [bootstrap]: 0.00051223 [type_inference]: 0.00461156 [event_method]: 1.08e-05 [auto_monad]: 5.567e-05 [graph_reusing]: 4.89e-06 [inline]: 2.17001e-06 [add_attr]: 0.00313038, [1] [add_attr_with_inline]: 0.00312034, [1] [Cycle 1]: 5.086e-05, [2] [tag_attr]: 1.316e-05 [meta_addattr_fg_expand]: 3.14001e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.418e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 1.72001e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.00384936, [53] [py_interpret_to_execute]: 1.681e-05 [rewriter_before_opt_a]: 3.948e-05 [opt_a]: 0.00192573, [2] [Cycle 1]: 0.00131452, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 2.506e-05 [loop_unroll]: 1.336e-05 [a_1]: 0.0002974 [with_stream_mark]: 1.745e-05 [recompute_prepare]: 8e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.708e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 8e-06 [auto_parallel]: 5.95002e-06 [parallel]: 1.772e-05 [flash_sp]: 7.15998e-06 [merge_comm]: 3.40998e-06 [allreduce_fusion]: 3.32997e-06 [matmul_add_comm_reduction]: 9.25999e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.79001e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.63002e-06 [merge_forward]: 3.79002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.117e-05 [merge_recompute_call_nodes]: 1.91e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.38998e-06 [receive_attached]: 2.73e-06 
[after_resolve]: 1.117e-05 [a_after_grad]: 8.80999e-06 [renormalize]: 0.00038427 [add_forward_monad_depend]: 4.73001e-06 [auto_monad_grad]: 2.06e-06 [auto_monad_eliminator]: 1.351e-05 [cse]: 2.832e-05 [a_3]: 4.064e-05 [Cycle 2]: 0.00060181, [45] [expand_dump_flag]: 1.29e-06 [switch_simplify]: 6.93998e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.000126 [with_stream_mark]: 1.123e-05 [recompute_prepare]: 5.44998e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.43998e-06 [parameter_eliminate]: 1.02998e-06 [a_2]: 6.808e-05 [accelerated_algorithm]: 5.56998e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.19003e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.59998e-06 [auto_parallel]: 6.21e-06 [parallel]: 4.07003e-06 [flash_sp]: 3.17002e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 2.70002e-06 [matmul_add_comm_reduction]: 5.40999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.43002e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.60002e-06 [cell_reuse_recompute_pass]: 1.55001e-06 [offload_activation]: 6.06998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.67e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.22e-06 [after_resolve]: 9.91e-06 [a_after_grad]: 7.97003e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 7.32997e-06 [cse]: 1.353e-05 [a_3]: 3.273e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.78e-05 [convert_after_rewriter]: 7.53e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00052512 [opt_b]: 0.00018355, [1] [Cycle 1]: 0.0001771, [7] [b_1]: 
0.00010742 [b_2]: 7.01999e-06 [updatestate_depend_eliminate]: 5.66003e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 3.50003e-07 [cse]: 1.689e-05 [optimize_parallel_all_gather_comm]: 2.051e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.494e-05 [loop_unroll]: 0.00042172 [opt_after_cconv]: 9.568e-05, [1] [Cycle 1]: 8.966e-05, [7] [c_1]: 2.797e-05 [parameter_eliminate]: 2.57001e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.633e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.313e-05 [tuple_transform]: 7.002e-05, [1] [Cycle 1]: 6.564e-05, [4] [d_1]: 4.029e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 4.481e-05 [cse_after_recomputation]: 2.027e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.061e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.59e-06 [bias_add_comm_swap]: 2.35002e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.37999e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.49998e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.158e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 3.93999e-06 [overlap_recompute_and_grad_model_parallel]: 4.31002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 3.90998e-06 [overlap_grad_flash_sp]: 1.75e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 7.201e-05, [1] [Cycle 1]: 6.762e-05, [6] [build]: 3.13e-06 [elim_shapecalc]: 8.90001e-06 [elim_not_effective]: 1.166e-05 [opt_reshape]: 6.07001e-06 [fold_const_symbol]: 9.51003e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.51e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.95e-06 [opt_after_jit_grad]: 0.00047495 [validate]: 3.304e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00625029 [execute]: 7.3e-06 Sums bootstrap : 0.000512s : 3.39% type_inference : 0.004612s : 30.55% event_method : 0.000011s : 0.07% auto_monad : 0.000056s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000423s : 2.80% optimize.opt_a.with_stream_mark : 0.000029s : 0.19% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.96% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000384s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.14% optimize.opt_a.cse : 0.000042s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000038s : 0.25% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000525s : 3.48% optimize.opt_b.b_1 : 0.000107s : 0.71% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.14% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000025s : 0.17% optimize.loop_unroll : 0.000422s : 2.79% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000475s : 3.15% validate : 0.000033s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006250s : 41.40% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000126 26 18.01% : 0.000023s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 1.34% : 0.000002s : 2: substitution.fold_const_symbol 4.70% : 0.000006s : 4: substitution.graph_param_transform 65.38% : 0.000083s : 2: substitution.inline 2.48% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.34% : 0.000004s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004565 2 92.27% : 0.004212s : 1: type_inference.infer 7.73% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000139 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 1.06% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.67% : 0.000004s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.79% : 0.000002s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.77% : 0.000008s : 44: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.59% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.35% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.95% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.06% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.31% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.27% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.49% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000247 6 42.89% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.11% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027498 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.40% : 0.003135s : 1: add_attr 11.36% : 0.003124s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000061s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.98% : 0.000546s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.95% : 0.000535s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.83% : 0.000780s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.01% : 0.001929s : 1: opt_a 0.36% : 0.000099s : 1: opt_after_cconv 1.76% : 0.000485s : 1: opt_after_jit_grad 0.68% : 0.000187s : 1: opt_b 14.01% : 0.003853s : 1: optimize 0.09% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.07% : 0.000021s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.79% : 0.000216s : 1: renormalize.infer 0.59% : 0.000161s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.15% : 0.000042s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000075s : 1: symbol_engine_optimizer 22.77% : 0.006261s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.83% : 0.004627s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0214199, [24] [bootstrap]: 0.0004909 [type_inference]: 0.00610299 [event_method]: 1.484e-05 [auto_monad]: 5.767e-05 [graph_reusing]: 6.09999e-06 [inline]: 2.09e-06 [add_attr]: 0.00316703, [1] [add_attr_with_inline]: 0.00315769, [1] [Cycle 1]: 5.595e-05, [2] [tag_attr]: 1.715e-05 [meta_addattr_fg_expand]: 4.12e-06 [parallel-infer-symbol]: 3.28998e-06 [pre_auto_parallel]: 2.7e-05 [insert-virtual-dataset]: 2.55002e-06 [parallel-infer-symbol-second]: 1.09e-06 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00433013, [53] [py_interpret_to_execute]: 2.143e-05 [rewriter_before_opt_a]: 5.903e-05 [opt_a]: 0.00233321, [2] [Cycle 1]: 0.00169512, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 3.3e-05 [loop_unroll]: 2.082e-05 [a_1]: 0.00051117 [with_stream_mark]: 1.623e-05 [recompute_prepare]: 9.84001e-06 [updatestate_depend_eliminate]: 3.39001e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.672e-05 [accelerated_algorithm]: 6.89999e-06 [shard]: 2.64001e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.92e-06 [auto_parallel]: 7.15e-06 [parallel]: 1.743e-05 [flash_sp]: 8.35001e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 1.067e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.74e-06 [merge_forward]: 4.19002e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 1.012e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.187e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.57999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.73998e-06 [receive_attached]: 2.63e-06 
[after_resolve]: 1.038e-05 [a_after_grad]: 9.23002e-06 [renormalize]: 0.00050057 [add_forward_monad_depend]: 5.44998e-06 [auto_monad_grad]: 2.51e-06 [auto_monad_eliminator]: 1.505e-05 [cse]: 3.032e-05 [a_3]: 4.316e-05 [Cycle 2]: 0.00062723, [45] [expand_dump_flag]: 1.56002e-06 [switch_simplify]: 8.01001e-06 [loop_unroll]: 5.60001e-06 [a_1]: 0.00012916 [with_stream_mark]: 1.126e-05 [recompute_prepare]: 6.19001e-06 [updatestate_depend_eliminate]: 3.21001e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 6.864e-05 [accelerated_algorithm]: 6.15002e-06 [shard]: 1.38002e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.94e-06 [auto_parallel]: 5.68002e-06 [parallel]: 4.80001e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.58e-06 [matmul_add_comm_reduction]: 7.23999e-06 [allreduce_slice_to_reducescatter]: 4.89992e-07 [virtual_shard_identity]: 6.49001e-06 [virtual_dataset]: 5.30999e-06 [get_grad_eliminate_]: 5.18002e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.42001e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 6.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.93002e-06 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 8.03001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.52999e-06 [after_resolve]: 9.62001e-06 [a_after_grad]: 8.18999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.60001e-06 [auto_monad_grad]: 1.14998e-06 [auto_monad_eliminator]: 7.93001e-06 [cse]: 1.881e-05 [a_3]: 3.279e-05 [py_interpret_to_execute_after_opt_a]: 9.19e-06 [slice_cell_reuse_recomputed_activation]: 2.30002e-06 [rewriter_after_opt_a]: 3.553e-05 [convert_after_rewriter]: 6.94001e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00050809 [opt_b]: 0.0001874, [1] [Cycle 1]: 
0.0001797, [7] [b_1]: 0.0001062 [b_2]: 7.29001e-06 [updatestate_depend_eliminate]: 6.12999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 7.60017e-07 [cse]: 1.957e-05 [optimize_parallel_all_gather_comm]: 1.754e-05 [overlap_param_gather]: 2.30002e-06 [cconv]: 2.564e-05 [loop_unroll]: 0.00046173 [opt_after_cconv]: 0.0001017, [1] [Cycle 1]: 9.422e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.78e-06 [updatestate_depend_eliminate]: 6.43998e-06 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.899e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.314e-05 [tuple_transform]: 7.074e-05, [1] [Cycle 1]: 6.641e-05, [4] [d_1]: 3.979e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 4.583e-05 [cse_after_recomputation]: 2.149e-05, [1] [Cycle 1]: 1.672e-05, [1] [cse]: 1.105e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 5.17e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 9.60019e-07 [remove_cast_before_assign_add]: 1.38002e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81003e-06 [control_data_broadcast_order]: 1.304e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.32e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.43998e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.851e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.03002e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 7.767e-05, [1] [Cycle 1]: 7.334e-05, [6] [build]: 3.28e-06 [elim_shapecalc]: 1.199e-05 [elim_not_effective]: 1.161e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 9.62001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 1.35999e-06 [rewriter_after_jit_bprop_graph]: 4.10998e-06 [opt_after_jit_grad]: 0.00053343 [validate]: 3.702e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00638597 [execute]: 7.15e-06 Sums bootstrap : 0.000491s : 2.85% type_inference : 0.006103s : 35.44% event_method : 0.000015s : 0.09% auto_monad : 0.000058s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000640s : 3.72% optimize.opt_a.with_stream_mark : 0.000027s : 0.16% optimize.opt_a.recompute_prepare : 0.000016s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.13% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000501s : 2.91% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.13% optimize.opt_a.cse : 0.000049s : 0.29% optimize.opt_a.a_3 : 0.000076s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000508s : 2.95% optimize.opt_b.b_1 : 0.000106s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.15% optimize.loop_unroll : 0.000462s : 2.68% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000019s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.07% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000533s : 3.10% validate : 0.000037s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006386s : 37.08% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000177 30 16.71% : 0.000030s : 5: substitution.arithmetic_simplify 0.96% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000002s : 2: substitution.fold_const_symbol 3.35% : 0.000006s : 4: substitution.graph_param_transform 64.63% : 0.000115s : 3: substitution.inline 1.54% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000005s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 7.01% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006053 2 89.56% : 0.005421s : 1: type_inference.infer 10.44% : 0.000632s : 1: type_inference.specialize ------[replace.] 0.000042 5 67.15% : 0.000028s : 3: replace.inline 32.85% : 0.000014s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 5 90.86% : 0.000113s : 3: match.inline 9.14% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000212 1131 0.67% : 0.000001s : 11: predicate.accumulaten_eliminater 0.79% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.45% : 0.000001s : 8: predicate.addn_check_dump 0.61% : 0.000001s : 11: predicate.addn_zero_filter 0.58% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.80% : 0.000004s : 19: predicate.arithmetic_simplify 0.67% : 0.000001s : 11: predicate.cast_eliminate 0.53% : 0.000001s : 8: predicate.check_bprop_eliminate 0.49% : 0.000001s : 8: predicate.compare_switch_simplify 0.18% : 0.000000s : 4: predicate.const_output_eliminate 0.50% : 0.000001s : 8: predicate.depend_value_elim 0.67% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.68% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.19% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.93% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.85% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.82% : 0.000002s : 15: predicate.environ_get_depend_swap 1.33% : 0.000003s : 23: predicate.environ_get_eliminate 0.82% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.96% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.59% : 0.000003s : 16: predicate.float_depend_g_call 0.44% : 0.000001s : 8: predicate.float_environ_get_switch 0.68% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000002s : 8: predicate.get_grad_eliminate 0.20% : 0.000000s : 4: predicate.graph_param_transform 0.53% : 0.000001s : 8: predicate.incorporate_call 0.45% : 0.000001s : 8: predicate.incorporate_call_switch 4.62% : 0.000010s : 51: predicate.inline 0.59% : 0.000001s : 8: predicate.inline_without_move 0.30% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000002s : 8: predicate.less_batch_normalization 1.41% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.81% : 0.000004s : 32: predicate.load_eliminater 1.24% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.64% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.44% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.46% : 0.000001s : 8: predicate.merge_addn 0.50% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.51% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.58% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.28% : 0.000001s : 4: predicate.opt_reshape 0.32% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000003s : 16: predicate.partial_defer_inline 1.08% : 0.000002s : 17: predicate.partial_eliminate 0.62% : 0.000001s : 11: predicate.print_const_string_wrapper 0.51% : 0.000001s : 8: predicate.reduce_all_const_elim 0.80% : 0.000002s : 11: predicate.reduce_eliminate 1.78% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.12% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000001s : 4: predicate.reset_defer_inline 0.65% : 0.000001s : 11: predicate.reshape_eliminate 0.55% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.31% : 0.000001s : 4: predicate.row_tensor_eliminate 0.70% : 0.000001s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000002s : 8: predicate.shard_identity_eliminate 0.75% : 0.000002s : 8: predicate.special_op_eliminate 0.61% : 0.000001s : 8: predicate.specialize_transform 0.87% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.29% : 0.000001s : 4: predicate.switch_call_monad_eliminater 22.43% : 0.000048s : 16: predicate.switch_defer_inline 1.53% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.94% : 0.000008s : 54: predicate.switch_simplify 0.64% 
: 0.000001s : 11: predicate.tile_eliminate 0.65% : 0.000001s : 11: predicate.transpose_eliminate 1.16% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.23% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.04% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.55% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.04% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.61% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.31% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.75% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.45% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.30% : 0.000001s : 4: predicate.value_based_eliminate 0.55% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.55% : 0.000001s : 8: predicate.virtual_output_eliminate 0.24% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.41% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000371 8 46.39% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.61% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030599 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.37% : 0.003172s : 1: add_attr 10.33% : 0.003162s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.73% : 0.000529s : 1: bootstrap 0.10% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.04% : 0.000012s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.55% : 0.000473s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000520s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 3.33% : 0.001018s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000036s : 4: opt.transform.symbol_engine_opt 7.63% : 0.002336s : 1: opt_a 0.34% : 0.000105s : 1: opt_after_cconv 1.78% : 0.000545s : 1: opt_after_jit_grad 0.62% : 0.000191s : 1: opt_b 14.16% : 0.004334s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.88% : 0.000269s : 1: renormalize.infer 0.73% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000040s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000080s : 1: symbol_engine_optimizer 20.92% : 0.006401s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 20.01% : 0.006122s : 1: type_inference 0.23% : 0.000070s : 1: validate TotalTime = 0.038963, [24] [bootstrap]: 0.00052841 [type_inference]: 0.0119515 [event_method]: 4.766e-05 [auto_monad]: 0.000122 [graph_reusing]: 8.42e-06 [inline]: 1.84998e-06 [add_attr]: 0.0031436, [1] [add_attr_with_inline]: 0.00313442, [1] [Cycle 1]: 7.69e-05, [2] [tag_attr]: 3.584e-05 [meta_addattr_fg_expand]: 9.24998e-06 [parallel-infer-symbol]: 3.11001e-06 [pre_auto_parallel]: 5.209e-05 [insert-virtual-dataset]: 2.86999e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0138367, [53] [py_interpret_to_execute]: 3.918e-05 [rewriter_before_opt_a]: 0.00014587 [opt_a]: 0.0114872, [3] [Cycle 1]: 0.00737051, [45] [expand_dump_flag]: 4.38001e-06 [switch_simplify]: 7.475e-05 [loop_unroll]: 6.193e-05 [a_1]: 0.00146287 [with_stream_mark]: 2.595e-05 [recompute_prepare]: 2.313e-05 [updatestate_depend_eliminate]: 9.34e-06 [updatestate_assign_eliminate]: 7.91001e-06 [updatestate_loads_eliminate]: 7.23e-06 [parameter_eliminate]: 2.71e-06 [a_2]: 0.00024331 [accelerated_algorithm]: 3.101e-05 [shard]: 1.96e-06 [meta_shard_fg_expand]: 3.48999e-06 [shard_inline]: 1.591e-05 [merge_send_recv]: 1.617e-05 [auto_parallel]: 1.129e-05 [parallel]: 1.929e-05 [flash_sp]: 1.224e-05 [merge_comm]: 9.57001e-06 [allreduce_fusion]: 8.79998e-06 [matmul_add_comm_reduction]: 2.79e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.853e-05 [virtual_dataset]: 1.572e-05 [get_grad_eliminate_]: 1.507e-05 [virtual_output]: 4.529e-05 [merge_forward]: 9.86e-06 [cell_reuse_recompute_pass]: 1.62001e-06 [offload_activation]: 1.974e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.046e-05 [merge_recompute_call_nodes]: 1.61998e-06 [before_grad]: 2.806e-05 [set_forward_comm_id_for_comm_node_pass]: 1.028e-05 [meta_fg_expand]: 0.0014919 [flash_sp_send_recv_attached]: 3.78001e-06 [receive_attached]: 2.48e-06 [after_resolve]: 
6.213e-05 [a_after_grad]: 8.331e-05 [renormalize]: 0.00257373 [add_forward_monad_depend]: 1.007e-05 [auto_monad_grad]: 5.84999e-06 [auto_monad_eliminator]: 5.677e-05 [cse]: 0.00016456 [a_3]: 0.00033846 [Cycle 2]: 0.00318335, [45] [expand_dump_flag]: 2.14e-06 [switch_simplify]: 4.763e-05 [loop_unroll]: 4.377e-05 [a_1]: 0.00155422 [with_stream_mark]: 1.463e-05 [recompute_prepare]: 1.187e-05 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 3.81999e-06 [parameter_eliminate]: 1.35999e-06 [a_2]: 0.00012673 [accelerated_algorithm]: 1.276e-05 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 7.7e-06 [parallel]: 6.04999e-06 [flash_sp]: 3.93999e-06 [merge_comm]: 5.08002e-06 [allreduce_fusion]: 4.82998e-06 [matmul_add_comm_reduction]: 9.03002e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.073e-05 [virtual_dataset]: 9.02999e-06 [get_grad_eliminate_]: 8.74e-06 [virtual_output]: 8.59e-06 [merge_forward]: 4.69002e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 9.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.71e-05 [merge_recompute_call_nodes]: 1.02998e-06 [before_grad]: 1.501e-05 [set_forward_comm_id_for_comm_node_pass]: 5.59e-06 [meta_fg_expand]: 7.827e-05 [flash_sp_send_recv_attached]: 1.32e-06 [receive_attached]: 1.96e-06 [after_resolve]: 1.72e-05 [a_after_grad]: 1.475e-05 [renormalize]: 0.00070372 [add_forward_monad_depend]: 4.18999e-06 [auto_monad_grad]: 1.59e-06 [auto_monad_eliminator]: 1.509e-05 [cse]: 5.067e-05 [a_3]: 6.763e-05 [Cycle 3]: 0.00091735, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 1.062e-05 [loop_unroll]: 8.82e-06 [a_1]: 0.00024971 [with_stream_mark]: 1.077e-05 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4e-06 [parameter_eliminate]: 1.10001e-06 
[a_2]: 0.00012398 [accelerated_algorithm]: 1.179e-05 [shard]: 1.35001e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 7.73999e-06 [auto_parallel]: 7.51001e-06 [parallel]: 5.32001e-06 [flash_sp]: 1.47001e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 5.19e-06 [matmul_add_comm_reduction]: 9.05001e-06 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 1.043e-05 [virtual_dataset]: 9.05001e-06 [get_grad_eliminate_]: 8.50999e-06 [virtual_output]: 8.35999e-06 [merge_forward]: 4.50999e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 9.42999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.596e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 1.426e-05 [set_forward_comm_id_for_comm_node_pass]: 5.57001e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.54998e-06 [after_resolve]: 1.384e-05 [a_after_grad]: 1.421e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.223e-05 [cse]: 2.888e-05 [a_3]: 5.905e-05 [py_interpret_to_execute_after_opt_a]: 1.177e-05 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 4.828e-05 [convert_after_rewriter]: 9.57001e-06 [order_py_execute_after_rewriter]: 6.71e-06 [mutable_eliminate]: 0.0005067 [opt_b]: 0.00029361, [1] [Cycle 1]: 0.000286, [7] [b_1]: 0.00018865 [b_2]: 1.077e-05 [updatestate_depend_eliminate]: 8.33999e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 4.18999e-06 [renormalize]: 4.89992e-07 [cse]: 3.386e-05 [optimize_parallel_all_gather_comm]: 2.211e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.263e-05 [loop_unroll]: 0.0004374 [opt_after_cconv]: 0.00013799, [1] [Cycle 1]: 0.00013143, [7] [c_1]: 4.794e-05 [parameter_eliminate]: 2.54001e-06 [updatestate_depend_eliminate]: 7.73001e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4.07998e-06 
[cse]: 3.063e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 3.127e-05 [tuple_transform]: 0.00010112, [1] [Cycle 1]: 9.627e-05, [4] [d_1]: 6.649e-05 [none_parameter_eliminate]: 1.82999e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 9.66e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.863e-05 [cse_after_recomputation]: 3.206e-05, [1] [Cycle 1]: 2.696e-05, [1] [cse]: 2.126e-05 [environ_conv]: 9.05999e-06 [swap_dp_allreduce_reducescatter]: 7.65e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.67998e-06 [label_fine_grained_interleaved_index]: 2.35002e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.44998e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.681e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 5.19998e-06 [overlap_recompute_and_grad_model_parallel]: 5.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 5.02999e-06 [overlap_grad_flash_sp]: 2.585e-05 [begin_end_overlap_inline]: 7.59988e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 0.00010031, [1] [Cycle 1]: 9.561e-05, [6] [build]: 1.071e-05 [elim_shapecalc]: 1.337e-05 [elim_not_effective]: 1.801e-05 [opt_reshape]: 1.014e-05 [fold_const_symbol]: 1.448e-05 [renormalize]: 2.30008e-07 [detach_backward]: 1.64e-06 
[pipeline_parallel_scheduler]: 1.70001e-06 [auto_monad_reorder]: 2.524e-05 [get_jit_bprop_graph]: 1.44e-06 [rewriter_after_jit_bprop_graph]: 3.57997e-06 [opt_after_jit_grad]: 0.00049656 [validate]: 4.723e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00842242 [execute]: 6.43e-06 Sums bootstrap : 0.000528s : 1.53% type_inference : 0.011951s : 34.67% event_method : 0.000048s : 0.14% auto_monad : 0.000122s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000052s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.42% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000133s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.33% optimize.opt_a.a_1 : 0.003267s : 9.48% optimize.opt_a.with_stream_mark : 0.000051s : 0.15% optimize.opt_a.recompute_prepare : 0.000044s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.43% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.16% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000032s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000046s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.09% optimize.opt_a.virtual_output : 0.000062s : 0.18% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000039s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001573s : 4.56% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000093s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.33% optimize.opt_a.renormalize : 0.003278s : 9.51% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.24% optimize.opt_a.cse : 0.000244s : 0.71% optimize.opt_a.a_3 : 0.000465s : 1.35% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000507s : 1.47% optimize.opt_b.b_1 : 0.000189s : 0.55% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000437s : 1.27% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000026s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000497s : 1.44% validate : 0.000047s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008422s : 24.43% execute : 0.000006s : 0.02% Time group info: ------[substitution.] 
0.000803 222 6.29% : 0.000051s : 12: substitution.arithmetic_simplify 1.85% : 0.000015s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 56.06% : 0.000450s : 17: substitution.inline 2.11% : 0.000017s : 2: substitution.inline_without_move 1.35% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000016s : 3: substitution.less_batch_normalization 1.59% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000026s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.72% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.22% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.42% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011870 2 87.43% : 0.010378s : 1: type_inference.infer 12.57% : 0.001492s : 1: type_inference.specialize ------[replace.] 0.000225 33 58.73% : 0.000132s : 17: replace.inline 41.27% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000476 33 92.84% : 0.000442s : 17: match.inline 7.16% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000759 5764 1.11% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 100: predicate.arithmetic_simplify 1.16% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.16% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 249: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.70% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.12% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.35% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.24% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001610 34 57.87% : 0.000932s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.13% : 0.000678s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064520 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.88% : 0.003148s : 1: add_attr 4.86% : 0.003138s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000129s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000567s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000447s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.80% : 0.000516s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.71% : 0.004977s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000173s : 28: opt.transform.opt_b 0.11% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.81% : 0.011490s : 1: opt_a 0.22% : 0.000141s : 1: opt_after_cconv 0.79% : 0.000508s : 1: opt_after_jit_grad 0.46% : 0.000298s : 1: opt_b 21.45% : 0.013841s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000057s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000036s : 1: remove_dup_value 2.72% : 0.001758s : 2: renormalize.infer 2.33% : 0.001505s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000053s : 1: rewriter_after_opt_a 0.23% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000103s : 1: symbol_engine_optimizer 13.07% : 0.008433s : 1: task_emit 0.16% : 0.000104s : 1: 
tuple_transform 18.55% : 0.011971s : 1: type_inference 0.17% : 0.000111s : 1: validate TotalTime = 0.0190283, [24] [bootstrap]: 0.00048421 [type_inference]: 0.00445181 [event_method]: 1.052e-05 [auto_monad]: 5.152e-05 [graph_reusing]: 5.59e-06 [inline]: 1.94e-06 [add_attr]: 0.0030592, [1] [add_attr_with_inline]: 0.00305059, [1] [Cycle 1]: 4.748e-05, [2] [tag_attr]: 1.279e-05 [meta_addattr_fg_expand]: 3.13e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.287e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.83002e-06 [pipeline_split]: 2.12999e-06 [optimize]: 0.00384655, [53] [py_interpret_to_execute]: 1.526e-05 [rewriter_before_opt_a]: 3.963e-05 [opt_a]: 0.0019196, [2] [Cycle 1]: 0.00130936, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 2.388e-05 [loop_unroll]: 1.362e-05 [a_1]: 0.00029528 [with_stream_mark]: 1.362e-05 [recompute_prepare]: 7.06001e-06 [updatestate_depend_eliminate]: 3.34001e-06 [updatestate_assign_eliminate]: 2.97002e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.504e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.02999e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 5.61003e-06 [merge_send_recv]: 8.25e-06 [auto_parallel]: 6.31998e-06 [parallel]: 1.769e-05 [flash_sp]: 7.48999e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 9.91e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 6.73998e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.21e-06 [after_resolve]: 
1.119e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00038622 [add_forward_monad_depend]: 4.93001e-06 [auto_monad_grad]: 2.11998e-06 [auto_monad_eliminator]: 1.461e-05 [cse]: 2.768e-05 [a_3]: 3.99e-05 [Cycle 2]: 0.00060032, [45] [expand_dump_flag]: 1.19e-06 [switch_simplify]: 7.36999e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.00012877 [with_stream_mark]: 1.107e-05 [recompute_prepare]: 5.47999e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 6.795e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.34998e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.90999e-06 [auto_parallel]: 5.47999e-06 [parallel]: 4.53999e-06 [flash_sp]: 3.18998e-06 [merge_comm]: 3.21999e-06 [allreduce_fusion]: 2.95002e-06 [matmul_add_comm_reduction]: 5.89999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.06002e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.98998e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.29e-06 [merge_recompute_call_nodes]: 9.39996e-07 [before_grad]: 7.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.37e-06 [after_resolve]: 9.54e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.335e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 8.25999e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.188e-05 [convert_after_rewriter]: 7.56999e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00048191 [opt_b]: 0.00018427, [1] [Cycle 1]: 0.00017779, [7] [b_1]: 0.00010818 
[b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 6.49001e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.63998e-06 [renormalize]: 3.69997e-07 [cse]: 1.716e-05 [optimize_parallel_all_gather_comm]: 1.731e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.327e-05 [loop_unroll]: 0.00041587 [opt_after_cconv]: 9.612e-05, [1] [Cycle 1]: 9.029e-05, [7] [c_1]: 2.76e-05 [parameter_eliminate]: 2.58998e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.32001e-06 [cse]: 1.667e-05 [renormalize]: 3.70026e-07 [remove_dup_value]: 1.236e-05 [tuple_transform]: 6.96e-05, [1] [Cycle 1]: 6.522e-05, [4] [d_1]: 3.889e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.454e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.582e-05, [1] [cse]: 1.072e-05 [environ_conv]: 4.57e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.65999e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.47999e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.87002e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 3.90998e-06 [overlap_grad_flash_sp]: 1.787e-05 [begin_end_overlap_inline]: 4.40021e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 7.013e-05, [1] [Cycle 1]: 6.555e-05, [6] [build]: 3.04999e-06 [elim_shapecalc]: 8.89998e-06 [elim_not_effective]: 1.151e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.95001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.52001e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.606e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00046721 [validate]: 3.232e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00635027 [execute]: 6.76999e-06 Sums bootstrap : 0.000484s : 3.24% type_inference : 0.004452s : 29.83% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000424s : 2.84% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.96% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000386s : 2.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.14% optimize.opt_a.cse : 0.000041s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000482s : 3.23% optimize.opt_b.b_1 : 0.000108s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000416s : 2.79% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000467s : 3.13% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006350s : 42.55% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000125 26 18.81% : 0.000024s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.26% : 0.000002s : 2: substitution.fold_const_symbol 4.19% : 0.000005s : 4: substitution.graph_param_transform 65.01% : 0.000081s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.20% : 0.000004s : 4: substitution.remove_not_recompute_node 3.70% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004408 2 91.96% : 0.004054s : 1: type_inference.infer 8.04% : 0.000354s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.74% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.78% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.60% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.82% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.06% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 1.10% : 0.000002s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.70% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.58% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.97% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.91% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000287 6 49.01% : 0.000141s : 2: func_graph_cloner_run.FuncGraphClonerGraph 50.99% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027252 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.24% : 0.003064s : 1: add_attr 11.21% : 0.003054s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000522s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.56% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.80% : 0.000492s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.84% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.06% : 0.001923s : 1: opt_a 0.36% : 0.000099s : 1: opt_after_cconv 1.75% : 0.000478s : 1: opt_after_jit_grad 0.69% : 0.000188s : 1: opt_b 14.13% : 0.003851s : 1: optimize 0.08% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.27% : 0.000073s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.79% : 0.000217s : 1: renormalize.infer 0.60% : 0.000163s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000073s : 1: symbol_engine_optimizer 23.34% : 0.006361s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.39% : 0.004467s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0367739, [24] [bootstrap]: 0.00051005 [type_inference]: 0.0104543 [event_method]: 4.099e-05 [auto_monad]: 0.0001126 [graph_reusing]: 7.98001e-06 [inline]: 2.06e-06 [add_attr]: 0.00309886, [1] [add_attr_with_inline]: 0.00309046, [1] [Cycle 1]: 7.077e-05, [2] [tag_attr]: 3.194e-05 [meta_addattr_fg_expand]: 8.69003e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 4.613e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 8.99978e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0133978, [53] [py_interpret_to_execute]: 3.747e-05 [rewriter_before_opt_a]: 0.00012656 [opt_a]: 0.0110859, [3] [Cycle 1]: 0.00706656, [45] [expand_dump_flag]: 3.93001e-06 [switch_simplify]: 6.71e-05 [loop_unroll]: 5.453e-05 [a_1]: 0.00138004 [with_stream_mark]: 2.445e-05 [recompute_prepare]: 2.312e-05 [updatestate_depend_eliminate]: 9.52999e-06 [updatestate_assign_eliminate]: 7.95998e-06 [updatestate_loads_eliminate]: 7.43e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024397 [accelerated_algorithm]: 3.141e-05 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 3.19001e-06 [shard_inline]: 1.608e-05 [merge_send_recv]: 1.642e-05 [auto_parallel]: 1.068e-05 [parallel]: 1.845e-05 [flash_sp]: 1.232e-05 [merge_comm]: 9.32001e-06 [allreduce_fusion]: 9.16002e-06 [matmul_add_comm_reduction]: 2.82e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.848e-05 [virtual_dataset]: 1.539e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.529e-05 [merge_forward]: 9.88998e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 1.795e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.835e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 2.8e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00143894 [flash_sp_send_recv_attached]: 3.76999e-06 [receive_attached]: 2.27999e-06 
[after_resolve]: 6.151e-05 [a_after_grad]: 8.273e-05 [renormalize]: 0.00246773 [add_forward_monad_depend]: 9.72999e-06 [auto_monad_grad]: 5.89e-06 [auto_monad_eliminator]: 5.567e-05 [cse]: 0.00016333 [a_3]: 0.00033935 [Cycle 2]: 0.00309922, [45] [expand_dump_flag]: 2.08002e-06 [switch_simplify]: 4.731e-05 [loop_unroll]: 4.362e-05 [a_1]: 0.00158891 [with_stream_mark]: 1.462e-05 [recompute_prepare]: 1.21e-05 [updatestate_depend_eliminate]: 5.04998e-06 [updatestate_assign_eliminate]: 4.84003e-06 [updatestate_loads_eliminate]: 3.88001e-06 [parameter_eliminate]: 1.39e-06 [a_2]: 0.00012764 [accelerated_algorithm]: 1.179e-05 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 2.04999e-06 [shard_inline]: 9.42001e-06 [merge_send_recv]: 7.04001e-06 [auto_parallel]: 8.17998e-06 [parallel]: 5.65001e-06 [flash_sp]: 3.44001e-06 [merge_comm]: 4.89e-06 [allreduce_fusion]: 4.83001e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 1.043e-05 [virtual_dataset]: 9.08002e-06 [get_grad_eliminate_]: 8.60001e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.72e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.659e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 3.769e-05 [flash_sp_send_recv_attached]: 1.01002e-06 [receive_attached]: 1.77999e-06 [after_resolve]: 1.593e-05 [a_after_grad]: 1.437e-05 [renormalize]: 0.0006384 [add_forward_monad_depend]: 4.23999e-06 [auto_monad_grad]: 1.54e-06 [auto_monad_eliminator]: 1.531e-05 [cse]: 4.92e-05 [a_3]: 6.547e-05 [Cycle 3]: 0.00090486, [45] [expand_dump_flag]: 1.15999e-06 [switch_simplify]: 1.035e-05 [loop_unroll]: 8.81002e-06 [a_1]: 0.00024976 [with_stream_mark]: 1.021e-05 [recompute_prepare]: 9.34998e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 4.30999e-06 [updatestate_loads_eliminate]: 
4.06001e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012271 [accelerated_algorithm]: 1.137e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.84998e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 7.48999e-06 [auto_parallel]: 6.89001e-06 [parallel]: 4.56002e-06 [flash_sp]: 1.02e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 1.013e-05 [virtual_dataset]: 8.59e-06 [get_grad_eliminate_]: 8.42e-06 [virtual_output]: 8.10999e-06 [merge_forward]: 4.15999e-06 [cell_reuse_recompute_pass]: 1.41998e-06 [offload_activation]: 8.78001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.563e-05 [merge_recompute_call_nodes]: 9.09989e-07 [before_grad]: 1.373e-05 [set_forward_comm_id_for_comm_node_pass]: 5.01997e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.35999e-06 [after_resolve]: 1.359e-05 [a_after_grad]: 1.391e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 1.24003e-06 [auto_monad_eliminator]: 1.197e-05 [cse]: 2.822e-05 [a_3]: 5.922e-05 [py_interpret_to_execute_after_opt_a]: 1.115e-05 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 4.83e-05 [convert_after_rewriter]: 8.92999e-06 [order_py_execute_after_rewriter]: 6.63998e-06 [mutable_eliminate]: 0.00049435 [opt_b]: 0.00029091, [1] [Cycle 1]: 0.00028365, [7] [b_1]: 0.00018767 [b_2]: 1.091e-05 [updatestate_depend_eliminate]: 7.83001e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 4.12e-06 [renormalize]: 3.80009e-07 [cse]: 3.359e-05 [optimize_parallel_all_gather_comm]: 2.136e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.174e-05 [loop_unroll]: 0.00042477 [opt_after_cconv]: 0.00013504, [1] [Cycle 1]: 0.00012934, [7] [c_1]: 4.812e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 7.09001e-06 [updatestate_assign_eliminate]: 4.23001e-06 
[updatestate_loads_eliminate]: 3.81001e-06 [cse]: 2.932e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 3.185e-05 [tuple_transform]: 0.00011638, [1] [Cycle 1]: 9.93e-05, [4] [d_1]: 6.814e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.77001e-06 [partial_unused_args_eliminate]: 1.99e-06 [add_recomputation]: 6.047e-05 [cse_after_recomputation]: 3.159e-05, [1] [Cycle 1]: 2.687e-05, [1] [cse]: 2.171e-05 [environ_conv]: 9.30001e-06 [swap_dp_allreduce_reducescatter]: 7.70998e-06 [bias_add_comm_swap]: 2.49999e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.94001e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.64e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.34e-06 [overlap_recompute_and_grad_model_parallel]: 5.58002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.04999e-06 [overlap_grad_ring_attention]: 5.12e-06 [overlap_grad_flash_sp]: 2.444e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010062, [1] [Cycle 1]: 9.653e-05, [6] [build]: 1.067e-05 [elim_shapecalc]: 1.359e-05 [elim_not_effective]: 1.826e-05 [opt_reshape]: 1.028e-05 [fold_const_symbol]: 1.46e-05 
[renormalize]: 2.29978e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 2.425e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00048104 [validate]: 4.732e-05 [backend_pass]: 1.23002e-06 [task_emit]: 0.00831086 [execute]: 7.23e-06 Sums bootstrap : 0.000510s : 1.58% type_inference : 0.010454s : 32.30% event_method : 0.000041s : 0.13% auto_monad : 0.000113s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000127s : 0.39% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.33% optimize.opt_a.a_1 : 0.003219s : 9.94% optimize.opt_a.with_stream_mark : 0.000049s : 0.15% optimize.opt_a.recompute_prepare : 0.000045s : 0.14% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000494s : 1.53% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% 
optimize.opt_a.merge_comm : 0.000019s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001480s : 4.57% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000091s : 0.28% optimize.opt_a.a_after_grad : 0.000111s : 0.34% optimize.opt_a.renormalize : 0.003106s : 9.60% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.26% optimize.opt_a.cse : 0.000241s : 0.74% optimize.opt_a.a_3 : 0.000464s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000494s : 1.53% optimize.opt_b.b_1 : 0.000188s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.07% optimize.loop_unroll : 0.000425s : 1.31% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.10% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.19% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000481s : 1.49% validate : 0.000047s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008311s : 25.68% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000760 218 6.19% : 0.000047s : 11: substitution.arithmetic_simplify 2.00% : 0.000015s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.08% : 0.000008s : 8: substitution.graph_param_transform 0.47% : 0.000004s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.38% : 0.000413s : 16: substitution.inline 2.20% : 0.000017s : 2: substitution.inline_without_move 1.43% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.76% : 0.000013s : 20: substitution.remove_not_recompute_node 3.39% : 0.000026s : 10: substitution.replace_applicator 1.53% : 0.000012s : 15: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000065s : 28: substitution.tuple_list_get_item_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010382 2 87.68% : 0.009103s : 1: type_inference.infer 12.32% : 0.001279s : 1: type_inference.specialize ------[replace.] 0.000238 30 64.64% : 0.000154s : 16: replace.inline 35.36% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000437 30 92.64% : 0.000405s : 16: match.inline 7.36% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.99% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.84% : 0.000014s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.56% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 97: predicate.partial_defer_inline 1.69% : 0.000012s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.70% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.19% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.91% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.97% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.20% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001489 32 57.68% : 0.000859s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.32% : 0.000630s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061575 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.04% : 0.003103s : 1: add_attr 5.02% : 0.003094s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000120s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000544s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.82% : 0.000504s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.92% : 0.004876s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 18.01% : 0.011089s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.80% : 0.000491s : 1: opt_after_jit_grad 0.48% : 0.000295s : 1: opt_b 21.77% : 0.013402s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000036s : 1: remove_dup_value 2.75% : 0.001696s : 2: renormalize.infer 2.27% : 0.001397s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000052s : 1: rewriter_after_opt_a 0.21% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.52% : 0.008322s : 1: task_emit 0.19% : 0.000119s : 1: 
tuple_transform 17.01% : 0.010472s : 1: type_inference 0.13% : 0.000081s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x4-kbk],max_mem:10.0M TotalTime = 0.8672, [24] [bootstrap]: 0.00056565 [type_inference]: 0.00632414 [event_method]: 1.383e-05 [auto_monad]: 5.561e-05 [graph_reusing]: 5.40999e-06 [inline]: 2.27999e-06 [add_attr]: 0.00351073, [1] [add_attr_with_inline]: 0.00349869, [1] [Cycle 1]: 5.189e-05, [2] [tag_attr]: 1.69e-05 [meta_addattr_fg_expand]: 4.32e-06 [parallel-infer-symbol]: 3.38e-06 [pre_auto_parallel]: 2.967e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 6.99976e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00412551, [53] [py_interpret_to_execute]: 2.122e-05 [rewriter_before_opt_a]: 5.847e-05 [opt_a]: 0.00218725, [2] [Cycle 1]: 0.00158194, [45] [expand_dump_flag]: 2.47001e-06 [switch_simplify]: 3.27e-05 [loop_unroll]: 2.145e-05 [a_1]: 0.00046057 [with_stream_mark]: 1.329e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.46001e-06 [updatestate_assign_eliminate]: 3.10002e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 7.637e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.88002e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 6.14001e-06 [parallel]: 2.343e-05 [flash_sp]: 7.1e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.33002e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.23e-06 [virtual_dataset]: 6.36e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.122e-05 
[merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.21998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.84001e-06 [after_resolve]: 1.082e-05 [a_after_grad]: 9.01998e-06 [renormalize]: 0.00046498 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.376e-05 [cse]: 2.712e-05 [a_3]: 4.164e-05 [Cycle 2]: 0.00059476, [45] [expand_dump_flag]: 1.36002e-06 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012436 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 6.759e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.34998e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.63001e-06 [auto_parallel]: 5.61003e-06 [parallel]: 4.67e-06 [flash_sp]: 3.08e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 2.70002e-06 [matmul_add_comm_reduction]: 5.81e-06 [allreduce_slice_to_reducescatter]: 2.40019e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.13002e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.39001e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.26998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 8.69972e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.32999e-06 [after_resolve]: 9.36998e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.321e-05 [a_3]: 3.336e-05 [py_interpret_to_execute_after_opt_a]: 7.99002e-06 
[slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.207e-05 [convert_after_rewriter]: 7.31001e-06 [order_py_execute_after_rewriter]: 4.68999e-06 [mutable_eliminate]: 0.00048891 [opt_b]: 0.00019061, [1] [Cycle 1]: 0.00018321, [7] [b_1]: 0.00010705 [b_2]: 1.091e-05 [updatestate_depend_eliminate]: 6.20002e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 4.89992e-07 [cse]: 1.721e-05 [optimize_parallel_all_gather_comm]: 1.633e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.502e-05 [loop_unroll]: 0.00042644 [opt_after_cconv]: 9.638e-05, [1] [Cycle 1]: 9.051e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.64999e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.621e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.264e-05 [tuple_transform]: 7.02e-05, [1] [Cycle 1]: 6.565e-05, [4] [d_1]: 4.029e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 5.418e-05 [cse_after_recomputation]: 2.104e-05, [1] [Cycle 1]: 1.675e-05, [1] [cse]: 1.15e-05 [environ_conv]: 4.99998e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.38002e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.10019e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.91e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.33998e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.77e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.13002e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.742e-05, [1] [Cycle 1]: 6.331e-05, [6] [build]: 2.74001e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.557e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 4.11001e-06 [opt_after_jit_grad]: 0.0004712 [validate]: 3.292e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.851801 [execute]: 9.44e-06 Sums bootstrap : 0.000566s : 0.07% type_inference : 0.006324s : 0.73% event_method : 0.000014s : 0.00% auto_monad : 0.000056s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000030s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000585s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000465s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000040s : 0.00% optimize.opt_a.a_3 : 0.000075s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000489s : 0.06% optimize.opt_b.b_1 : 0.000107s : 0.01% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.00% optimize.loop_unroll : 0.000426s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000471s : 0.05% validate : 0.000033s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.851801s : 98.74% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000175 30 14.86% : 0.000026s : 5: substitution.arithmetic_simplify 0.99% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.37% : 0.000006s : 4: substitution.graph_param_transform 66.83% : 0.000117s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000005s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.26% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006273 2 91.20% : 0.005721s : 1: type_inference.infer 8.80% : 0.000552s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.95% : 0.000028s : 3: replace.inline 29.05% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000125 5 92.05% : 0.000115s : 3: match.inline 7.95% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.31% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.53% : 0.000002s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.54% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 46.55% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.45% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.876404 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.40% : 0.003515s : 1: add_attr 0.40% : 0.003502s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000061s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000603s : 1: bootstrap 0.00% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000521s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000954s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000093s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002190s : 1: opt_a 0.01% : 0.000100s : 1: opt_after_cconv 0.05% : 0.000481s : 1: opt_after_jit_grad 0.02% : 0.000194s : 1: opt_b 0.47% : 0.004130s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000026s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.03% : 0.000247s : 1: renormalize.infer 0.02% : 0.000211s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.20% : 0.851822s : 1: task_emit 0.01% : 0.000073s : 1: 
tuple_transform 0.72% : 0.006340s : 1: type_inference 0.01% : 0.000056s : 1: validate TotalTime = 0.0767178, [24] [bootstrap]: 0.00048864 [type_inference]: 0.00461026 [event_method]: 1.05e-05 [auto_monad]: 5.398e-05 [graph_reusing]: 4.97999e-06 [inline]: 1.76e-06 [add_attr]: 0.00313473, [1] [add_attr_with_inline]: 0.0031268, [1] [Cycle 1]: 4.816e-05, [2] [tag_attr]: 1.212e-05 [meta_addattr_fg_expand]: 2.96001e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.177e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 2.11998e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.00380133, [53] [py_interpret_to_execute]: 1.618e-05 [rewriter_before_opt_a]: 3.913e-05 [opt_a]: 0.00194034, [2] [Cycle 1]: 0.00129866, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 2.323e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.00029406 [with_stream_mark]: 1.371e-05 [recompute_prepare]: 7.45998e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 2.95002e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.682e-05 [accelerated_algorithm]: 6.43998e-06 [shard]: 2.55997e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 7.35998e-06 [auto_parallel]: 6.03002e-06 [parallel]: 1.787e-05 [flash_sp]: 7.58001e-06 [merge_comm]: 3.41999e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.95998e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.78998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.138e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.34998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.79001e-06 
[after_resolve]: 1.091e-05 [a_after_grad]: 8.79998e-06 [renormalize]: 0.00038125 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 1.99e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.74e-05 [a_3]: 4.01e-05 [Cycle 2]: 0.00063192, [45] [expand_dump_flag]: 1.18001e-06 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012624 [with_stream_mark]: 9.58997e-06 [recompute_prepare]: 5.35001e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 9.989e-05 [accelerated_algorithm]: 5.67999e-06 [shard]: 1.53002e-06 [meta_shard_fg_expand]: 1.04e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.65001e-06 [auto_parallel]: 5.05999e-06 [parallel]: 4.38999e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.92001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.49001e-06 [virtual_dataset]: 5.26998e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.65002e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.11e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.01e-05 [merge_recompute_call_nodes]: 8.59989e-07 [before_grad]: 8.21002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.18001e-06 [after_resolve]: 9.51e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.77002e-06 [cse]: 1.373e-05 [a_3]: 3.197e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 1.66e-06 [rewriter_after_opt_a]: 3.496e-05 [convert_after_rewriter]: 6.74999e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00048171 [opt_b]: 0.00018589, [1] [Cycle 1]: 
0.00017858, [7] [b_1]: 0.00010837 [b_2]: 7.43999e-06 [updatestate_depend_eliminate]: 5.93002e-06 [updatestate_assign_eliminate]: 2.70002e-06 [updatestate_loads_eliminate]: 2.61e-06 [renormalize]: 4.00003e-07 [cse]: 1.716e-05 [optimize_parallel_all_gather_comm]: 1.543e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 2.297e-05 [loop_unroll]: 0.00041412 [opt_after_cconv]: 9.565e-05, [1] [Cycle 1]: 8.959e-05, [7] [c_1]: 2.769e-05 [parameter_eliminate]: 2.56e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.667e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.263e-05 [tuple_transform]: 6.93e-05, [1] [Cycle 1]: 6.512e-05, [4] [d_1]: 3.892e-05 [none_parameter_eliminate]: 1.92999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 5.15e-05 [cse_after_recomputation]: 2.067e-05, [1] [Cycle 1]: 1.616e-05, [1] [cse]: 1.11e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.07003e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.66999e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66002e-06 [control_data_broadcast_order]: 1.124e-05 [grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.773e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.943e-05, [1] [Cycle 1]: 6.531e-05, [6] [build]: 2.66999e-06 [elim_shapecalc]: 8.71002e-06 [elim_not_effective]: 1.104e-05 [opt_reshape]: 6.28998e-06 [fold_const_symbol]: 9.08002e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 4.03001e-06 [opt_after_jit_grad]: 0.00045567 [validate]: 3.778e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0638289 [execute]: 1.028e-05 Sums bootstrap : 0.000489s : 0.67% type_inference : 0.004610s : 6.35% event_method : 0.000010s : 0.01% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000007s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000177s : 0.24% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000381s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000482s : 0.66% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000414s : 0.57% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000456s : 0.63% validate : 0.000038s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.063829s : 87.93% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.27% : 0.000023s : 4: substitution.arithmetic_simplify 1.32% : 0.000002s : 2: substitution.elim_not_effective 1.15% : 0.000001s : 2: substitution.fold_const_symbol 4.31% : 0.000005s : 4: substitution.graph_param_transform 65.32% : 0.000081s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.68% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004567 2 91.95% : 0.004199s : 1: type_inference.infer 8.05% : 0.000368s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000138 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.13% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.49% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.57% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_depend_swap 1.80% : 0.000002s : 21: predicate.environ_get_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.79% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.46% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.73% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.29% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.77% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000275 6 42.84% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.16% : 0.000157s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084993 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.69% : 0.003140s : 1: add_attr 3.68% : 0.003130s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000524s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.03% : 0.000025s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000491s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.94% : 0.000803s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.29% : 0.001943s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.55% : 0.000465s : 1: opt_after_jit_grad 0.22% : 0.000189s : 1: opt_b 4.48% : 0.003806s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000211s : 1: renormalize.infer 0.19% : 0.000163s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000039s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 75.13% : 0.063852s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 5.44% : 0.004626s : 1: type_inference 0.07% : 0.000061s : 1: validate TotalTime = 0.0765386, [24] [bootstrap]: 0.00048332 [type_inference]: 0.00631201 [event_method]: 1.503e-05 [auto_monad]: 5.481e-05 [graph_reusing]: 5.25999e-06 [inline]: 1.94e-06 [add_attr]: 0.00315185, [1] [add_attr_with_inline]: 0.00314231, [1] [Cycle 1]: 5.346e-05, [2] [tag_attr]: 1.613e-05 [meta_addattr_fg_expand]: 4.89e-06 [parallel-infer-symbol]: 3.36001e-06 [pre_auto_parallel]: 2.863e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00422764, [53] [py_interpret_to_execute]: 2.121e-05 [rewriter_before_opt_a]: 5.847e-05 [opt_a]: 0.00228743, [2] [Cycle 1]: 0.00165518, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 3.385e-05 [loop_unroll]: 2.066e-05 [a_1]: 0.00049465 [with_stream_mark]: 1.458e-05 [recompute_prepare]: 8.05e-06 [updatestate_depend_eliminate]: 3.84002e-06 [updatestate_assign_eliminate]: 3.66999e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.759e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.26998e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.26998e-06 [merge_send_recv]: 8.59002e-06 [auto_parallel]: 6.71e-06 [parallel]: 1.834e-05 [flash_sp]: 8.48999e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.12002e-06 [matmul_add_comm_reduction]: 9.58002e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.61e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 9.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.09998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.77002e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.86e-06 [receive_attached]: 2.16998e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 8.62998e-06 [renormalize]: 0.00049238 [add_forward_monad_depend]: 5.62999e-06 [auto_monad_grad]: 2.24999e-06 [auto_monad_eliminator]: 1.477e-05 [cse]: 2.864e-05 [a_3]: 4.3e-05 [Cycle 2]: 0.00062181, [45] [expand_dump_flag]: 1.37e-06 [switch_simplify]: 7.04001e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012622 [with_stream_mark]: 1.089e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 3.15002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 6.868e-05 [accelerated_algorithm]: 5.92999e-06 [shard]: 1.91998e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.59998e-06 [merge_send_recv]: 4.72e-06 [auto_parallel]: 6.17001e-06 [parallel]: 4.93001e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 9.39998e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 6.46e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.56998e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.015e-05 [merge_recompute_call_nodes]: 1.19998e-06 [before_grad]: 8.18999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 9.40025e-07 [receive_attached]: 1.52001e-06 [after_resolve]: 9.18002e-06 [a_after_grad]: 8.40001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 7.47998e-06 [cse]: 1.553e-05 [a_3]: 3.314e-05 [py_interpret_to_execute_after_opt_a]: 8.92999e-06 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 3.246e-05 [convert_after_rewriter]: 7.38e-06 [order_py_execute_after_rewriter]: 5.71e-06 [mutable_eliminate]: 0.00049821 [opt_b]: 0.00018882, [1] [Cycle 1]: 0.00018144, [7] 
[b_1]: 0.00010884 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 6.24001e-06 [updatestate_assign_eliminate]: 2.90002e-06 [updatestate_loads_eliminate]: 2.54999e-06 [renormalize]: 3.60014e-07 [cse]: 1.89e-05 [optimize_parallel_all_gather_comm]: 1.792e-05 [overlap_param_gather]: 2.34001e-06 [cconv]: 2.462e-05 [loop_unroll]: 0.00043501 [opt_after_cconv]: 9.83e-05, [1] [Cycle 1]: 9.201e-05, [7] [c_1]: 2.773e-05 [parameter_eliminate]: 2.63e-06 [updatestate_depend_eliminate]: 5.41998e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.48e-06 [cse]: 1.8e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 6.998e-05, [1] [Cycle 1]: 6.567e-05, [4] [d_1]: 3.978e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 5.254e-05 [cse_after_recomputation]: 2.18e-05, [1] [Cycle 1]: 1.708e-05, [1] [cse]: 1.179e-05 [environ_conv]: 5.35001e-06 [swap_dp_allreduce_reducescatter]: 5.34998e-06 [bias_add_comm_swap]: 2.45997e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.65001e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.15001e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 9.99979e-07 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.147e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.68001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67001e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 3.81001e-06 [overlap_grad_flash_sp]: 1.78e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 7.064e-05, [1] [Cycle 1]: 6.622e-05, [6] [build]: 3.11999e-06 [elim_shapecalc]: 8.69998e-06 [elim_not_effective]: 1.185e-05 [opt_reshape]: 6.12999e-06 [fold_const_symbol]: 8.62998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.626e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.86999e-06 [opt_after_jit_grad]: 0.00053975 [validate]: 3.392e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0614138 [execute]: 9.44e-06 Sums bootstrap : 0.000483s : 0.67% type_inference : 0.006312s : 8.72% event_method : 0.000015s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000621s : 0.86% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000492s : 0.68% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.03% optimize.opt_a.cse : 0.000044s : 0.06% optimize.opt_a.a_3 : 0.000076s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000498s : 0.69% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000435s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000540s : 0.75% validate : 0.000034s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.061414s : 84.86% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000212 30 12.93% : 0.000027s : 5: substitution.arithmetic_simplify 0.85% : 0.000002s : 2: substitution.elim_not_effective 0.57% : 0.000001s : 2: substitution.fold_const_symbol 2.56% : 0.000005s : 4: substitution.graph_param_transform 54.80% : 0.000116s : 3: substitution.inline 1.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.11% : 0.000004s : 4: substitution.remove_not_recompute_node 1.90% : 0.000004s : 4: substitution.replace_old_param 22.85% : 0.000048s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006265 2 90.25% : 0.005654s : 1: type_inference.infer 9.75% : 0.000611s : 1: type_inference.specialize ------[replace.] 0.000040 5 68.15% : 0.000027s : 3: replace.inline 31.85% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000162 5 70.68% : 0.000114s : 3: match.inline 29.32% : 0.000047s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_depend_swap 1.72% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.51% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.76% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 21: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000002s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.59% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000376 8 45.85% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.15% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.085559 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.69% : 0.003157s : 1: add_attr 3.68% : 0.003146s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000520s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000444s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000508s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 1.16% : 0.000993s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.68% : 0.002291s : 1: opt_a 0.12% : 0.000102s : 1: opt_after_cconv 0.64% : 0.000552s : 1: opt_after_jit_grad 0.22% : 0.000192s : 1: opt_b 4.95% : 0.004232s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.29% : 0.000252s : 1: renormalize.infer 0.27% : 0.000232s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 71.81% : 0.061437s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 7.40% : 0.006331s : 1: type_inference 0.07% : 0.000058s : 1: validate TotalTime = 0.117279, [24] [bootstrap]: 0.00050094 [type_inference]: 0.011392 [event_method]: 4.827e-05 [auto_monad]: 0.00012046 [graph_reusing]: 8.12998e-06 [inline]: 1.87999e-06 [add_attr]: 0.00303324, [1] [add_attr_with_inline]: 0.00302517, [1] [Cycle 1]: 7.004e-05, [2] [tag_attr]: 3.393e-05 [meta_addattr_fg_expand]: 9.56e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 4.956e-05 [insert-virtual-dataset]: 2.61999e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0134239, [53] [py_interpret_to_execute]: 3.717e-05 [rewriter_before_opt_a]: 0.00014303 [opt_a]: 0.011135, [3] [Cycle 1]: 0.00713954, [45] [expand_dump_flag]: 3.66001e-06 [switch_simplify]: 7.369e-05 [loop_unroll]: 6.161e-05 [a_1]: 0.00144807 [with_stream_mark]: 2.291e-05 [recompute_prepare]: 2.153e-05 [updatestate_depend_eliminate]: 9.00999e-06 [updatestate_assign_eliminate]: 7.38e-06 [updatestate_loads_eliminate]: 7.15003e-06 [parameter_eliminate]: 2.40002e-06 [a_2]: 0.00024422 [accelerated_algorithm]: 3.068e-05 [shard]: 1.80001e-06 [meta_shard_fg_expand]: 3.17002e-06 [shard_inline]: 1.61e-05 [merge_send_recv]: 1.568e-05 [auto_parallel]: 1.087e-05 [parallel]: 1.782e-05 [flash_sp]: 1.103e-05 [merge_comm]: 9.59e-06 [allreduce_fusion]: 8.64e-06 [matmul_add_comm_reduction]: 2.536e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.79e-05 [virtual_dataset]: 1.551e-05 [get_grad_eliminate_]: 1.507e-05 [virtual_output]: 1.496e-05 [merge_forward]: 9.33002e-06 [cell_reuse_recompute_pass]: 1.11997e-06 [offload_activation]: 1.748e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.865e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 2.796e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61e-06 [meta_fg_expand]: 0.00143604 [flash_sp_send_recv_attached]: 3.71999e-06 [receive_attached]: 2.93998e-06 [after_resolve]: 
5.904e-05 [a_after_grad]: 8.115e-05 [renormalize]: 0.00246851 [add_forward_monad_depend]: 9.33002e-06 [auto_monad_grad]: 5.29e-06 [auto_monad_eliminator]: 5.638e-05 [cse]: 0.00016805 [a_3]: 0.00036103 [Cycle 2]: 0.00301194, [45] [expand_dump_flag]: 1.67001e-06 [switch_simplify]: 4.789e-05 [loop_unroll]: 4.406e-05 [a_1]: 0.0015337 [with_stream_mark]: 1.205e-05 [recompute_prepare]: 1.062e-05 [updatestate_depend_eliminate]: 5.04998e-06 [updatestate_assign_eliminate]: 4.28001e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012655 [accelerated_algorithm]: 1.203e-05 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.25001e-06 [merge_send_recv]: 6.86001e-06 [auto_parallel]: 7.3e-06 [parallel]: 4.98001e-06 [flash_sp]: 3.45e-06 [merge_comm]: 5.02999e-06 [allreduce_fusion]: 4.53001e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.078e-05 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.94e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.651e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 6.985e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.633e-05 [a_after_grad]: 1.508e-05 [renormalize]: 0.00059776 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.503e-05 [cse]: 4.613e-05 [a_3]: 6.533e-05 [Cycle 3]: 0.00096887, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.07e-05 [loop_unroll]: 8.82999e-06 [a_1]: 0.00024927 [with_stream_mark]: 1.013e-05 [recompute_prepare]: 9.25001e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 3.81999e-06 [updatestate_loads_eliminate]: 3.88001e-06 
[parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012448 [accelerated_algorithm]: 1.194e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.88002e-06 [shard_inline]: 9.27999e-06 [merge_send_recv]: 7.06999e-06 [auto_parallel]: 6.96001e-06 [parallel]: 4.89e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 5.50001e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 1.042e-05 [virtual_dataset]: 8.88002e-06 [get_grad_eliminate_]: 8.56002e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 4.39998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.589e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.402e-05 [set_forward_comm_id_for_comm_node_pass]: 5.09003e-06 [meta_fg_expand]: 3.16001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.351e-05 [a_after_grad]: 1.428e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.1e-05 [cse]: 8.907e-05 [a_3]: 6.079e-05 [py_interpret_to_execute_after_opt_a]: 1.056e-05 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 4.843e-05 [convert_after_rewriter]: 9.42999e-06 [order_py_execute_after_rewriter]: 6.62002e-06 [mutable_eliminate]: 0.00047769 [opt_b]: 0.00029082, [1] [Cycle 1]: 0.00028459, [7] [b_1]: 0.00019018 [b_2]: 1.065e-05 [updatestate_depend_eliminate]: 7.08998e-06 [updatestate_assign_eliminate]: 4.12003e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.89991e-07 [cse]: 3.339e-05 [optimize_parallel_all_gather_comm]: 2.07e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.01e-05 [loop_unroll]: 0.00043041 [opt_after_cconv]: 0.00013601, [1] [Cycle 1]: 0.00013018, [7] [c_1]: 4.775e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 7.30998e-06 [updatestate_assign_eliminate]: 4.35e-06 
[updatestate_loads_eliminate]: 4.06001e-06 [cse]: 3.09e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 2.832e-05 [tuple_transform]: 0.00010115, [1] [Cycle 1]: 9.648e-05, [4] [d_1]: 6.617e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.007e-05 [partial_unused_args_eliminate]: 2.02999e-06 [add_recomputation]: 5.748e-05 [cse_after_recomputation]: 3.316e-05, [1] [Cycle 1]: 2.847e-05, [1] [cse]: 2.287e-05 [environ_conv]: 8.89e-06 [swap_dp_allreduce_reducescatter]: 8.02003e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.32998e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.33998e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.662e-05 [grouped_pairwise_exchange_alltoall]: 1.40001e-06 [offloading_packed_experts]: 4.97e-06 [overlap_recompute_and_grad_model_parallel]: 5.52999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 5.25001e-06 [overlap_grad_flash_sp]: 2.359e-05 [begin_end_overlap_inline]: 7.39994e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 9.96e-05, [1] [Cycle 1]: 9.507e-05, [6] [build]: 1.003e-05 [elim_shapecalc]: 1.352e-05 [elim_not_effective]: 1.835e-05 [opt_reshape]: 1.017e-05 [fold_const_symbol]: 1.468e-05 [renormalize]: 1.8999e-07 [detach_backward]: 
1.78002e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 2.425e-05 [get_jit_bprop_graph]: 1.19003e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00047114 [validate]: 4.647e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.0879204 [execute]: 7.99997e-06 Sums bootstrap : 0.000501s : 0.44% type_inference : 0.011392s : 10.08% event_method : 0.000048s : 0.04% auto_monad : 0.000120s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000143s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.12% optimize.opt_a.loop_unroll : 0.000114s : 0.10% optimize.opt_a.a_1 : 0.003231s : 2.86% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001509s : 1.34% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.003066s : 2.71% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.07% optimize.opt_a.cse : 0.000303s : 0.27% optimize.opt_a.a_3 : 0.000487s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000478s : 0.42% optimize.opt_b.b_1 : 0.000190s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000430s : 0.38% optimize.opt_after_cconv.c_1 : 0.000048s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000471s : 0.42% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.087920s : 77.81% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000762 222 5.96% : 0.000045s : 12: substitution.arithmetic_simplify 1.81% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.74% : 0.000425s : 17: substitution.inline 2.07% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.71% : 0.000013s : 20: substitution.remove_not_recompute_node 3.14% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.55% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.52% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011320 2 86.32% : 0.009772s : 1: type_inference.infer 13.68% : 0.001548s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.42% : 0.000124s : 17: replace.inline 42.58% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 33 92.55% : 0.000416s : 17: match.inline 7.45% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.12% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 100: predicate.arithmetic_simplify 1.17% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.53% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.07% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001610 34 56.55% : 0.000910s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.45% : 0.000699s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.142052 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.14% : 0.003037s : 1: add_attr 2.13% : 0.003029s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000128s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.38% : 0.000537s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000055s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000439s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000486s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.47% : 0.004924s : 117: opt.transform.opt_a 0.03% : 0.000046s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000176s : 28: opt.transform.opt_b 0.05% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 7.84% : 0.011138s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.34% : 0.000481s : 1: opt_after_jit_grad 0.21% : 0.000294s : 1: opt_b 9.45% : 0.013428s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.15% : 0.001636s : 2: renormalize.infer 1.00% : 0.001416s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000148s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000102s : 1: symbol_engine_optimizer 61.91% : 0.087938s : 1: task_emit 0.07% : 0.000104s : 1: 
tuple_transform 8.03% : 0.011408s : 1: type_inference 0.05% : 0.000071s : 1: validate TotalTime = 0.0702185, [24] [bootstrap]: 0.00045136 [type_inference]: 0.00428195 [event_method]: 1.066e-05 [auto_monad]: 0.00010924 [graph_reusing]: 5.03002e-06 [inline]: 1.76e-06 [add_attr]: 0.00293958, [1] [add_attr_with_inline]: 0.00293067, [1] [Cycle 1]: 4.698e-05, [2] [tag_attr]: 1.189e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 3.51001e-06 [pre_auto_parallel]: 2.061e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.00368974, [53] [py_interpret_to_execute]: 1.523e-05 [rewriter_before_opt_a]: 3.907e-05 [opt_a]: 0.0018927, [2] [Cycle 1]: 0.00129623, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 2.469e-05 [loop_unroll]: 1.348e-05 [a_1]: 0.0002939 [with_stream_mark]: 1.262e-05 [recompute_prepare]: 7.82998e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 7.675e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 2.18002e-06 [meta_shard_fg_expand]: 1.42999e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 7.65e-06 [auto_parallel]: 5.68002e-06 [parallel]: 1.808e-05 [flash_sp]: 7.51001e-06 [merge_comm]: 3.59002e-06 [allreduce_fusion]: 3.37997e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.86998e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.65001e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 8.84998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 
2.63e-06 [after_resolve]: 1.096e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.00038392 [add_forward_monad_depend]: 4.53999e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.232e-05 [cse]: 2.689e-05 [a_3]: 4.033e-05 [Cycle 2]: 0.00058757, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.94001e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012472 [with_stream_mark]: 9.10001e-06 [recompute_prepare]: 5.69e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.12001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.653e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.22e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.22001e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 6.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.87998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.09988e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.80999e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.286e-05 [a_3]: 3.159e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.125e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.0004477 [opt_b]: 0.00018426, [1] [Cycle 1]: 0.00017862, [7] [b_1]: 
0.00011278 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 2.3999e-07 [cse]: 1.563e-05 [optimize_parallel_all_gather_comm]: 1.626e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 2.161e-05 [loop_unroll]: 0.00040696 [opt_after_cconv]: 9.374e-05, [1] [Cycle 1]: 8.818e-05, [7] [c_1]: 2.706e-05 [parameter_eliminate]: 2.10002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.644e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.805e-05, [1] [Cycle 1]: 6.378e-05, [4] [d_1]: 3.838e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.15997e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.388e-05 [cse_after_recomputation]: 2.023e-05, [1] [Cycle 1]: 1.608e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.64002e-06 [swap_dp_allreduce_reducescatter]: 5.18002e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.40999e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 1.99e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 9.19972e-07 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.131e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.627e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.749e-05, [1] [Cycle 1]: 6.327e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 5.84e-06 [fold_const_symbol]: 8.53001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00044725 [validate]: 3.125e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.0579878 [execute]: 8.13001e-06 Sums bootstrap : 0.000451s : 0.68% type_inference : 0.004282s : 6.46% event_method : 0.000011s : 0.02% auto_monad : 0.000109s : 0.16% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.63% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000384s : 0.58% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.68% optimize.opt_b.b_1 : 0.000113s : 0.17% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000407s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000447s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057988s : 87.44% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.12% : 0.000020s : 4: substitution.arithmetic_simplify 1.74% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.43% : 0.000005s : 4: substitution.graph_param_transform 66.79% : 0.000080s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.11% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004242 2 91.67% : 0.003889s : 1: type_inference.infer 8.33% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.69% : 0.000004s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.88% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 2.08% : 0.000003s : 21: predicate.environ_get_eliminate 1.16% : 0.000002s : 13: predicate.environ_get_set_eliminate 1.03% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.12% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.48% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.84% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 41.23% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.77% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078154 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.77% : 0.002944s : 1: add_attr 3.76% : 0.002935s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.15% : 0.000115s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000486s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000415s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000769s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.43% : 0.001896s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000457s : 1: opt_after_jit_grad 0.24% : 0.000188s : 1: opt_b 4.73% : 0.003693s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.29% : 0.000223s : 1: renormalize.infer 0.20% : 0.000154s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.22% : 0.058006s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.50% : 0.004296s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.116985, [24] [bootstrap]: 0.00050263 [type_inference]: 0.0102797 [event_method]: 4.284e-05 [auto_monad]: 0.00011279 [graph_reusing]: 7.69002e-06 [inline]: 1.84e-06 [add_attr]: 0.00301031, [1] [add_attr_with_inline]: 0.00300208, [1] [Cycle 1]: 6.636e-05, [2] [tag_attr]: 3.167e-05 [meta_addattr_fg_expand]: 8.57e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 4.596e-05 [insert-virtual-dataset]: 2.71999e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0131221, [53] [py_interpret_to_execute]: 3.487e-05 [rewriter_before_opt_a]: 0.00012649 [opt_a]: 0.0108422, [3] [Cycle 1]: 0.00692497, [45] [expand_dump_flag]: 3.69002e-06 [switch_simplify]: 6.682e-05 [loop_unroll]: 5.497e-05 [a_1]: 0.00133626 [with_stream_mark]: 2.291e-05 [recompute_prepare]: 2.14e-05 [updatestate_depend_eliminate]: 9.17999e-06 [updatestate_assign_eliminate]: 7.81001e-06 [updatestate_loads_eliminate]: 8.05e-06 [parameter_eliminate]: 2.49999e-06 [a_2]: 0.00024737 [accelerated_algorithm]: 3.097e-05 [shard]: 1.76003e-06 [meta_shard_fg_expand]: 3.25e-06 [shard_inline]: 1.64e-05 [merge_send_recv]: 1.554e-05 [auto_parallel]: 1.059e-05 [parallel]: 1.785e-05 [flash_sp]: 1.156e-05 [merge_comm]: 9.72001e-06 [allreduce_fusion]: 8.85999e-06 [matmul_add_comm_reduction]: 2.508e-05 [allreduce_slice_to_reducescatter]: 5.70028e-07 [virtual_shard_identity]: 1.785e-05 [virtual_dataset]: 1.572e-05 [get_grad_eliminate_]: 1.499e-05 [virtual_output]: 1.54e-05 [merge_forward]: 9.51e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.814e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.869e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 2.724e-05 [set_forward_comm_id_for_comm_node_pass]: 9.86998e-06 [meta_fg_expand]: 0.00141406 [flash_sp_send_recv_attached]: 3.72002e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 
5.917e-05 [a_after_grad]: 8.288e-05 [renormalize]: 0.00242392 [add_forward_monad_depend]: 9.47001e-06 [auto_monad_grad]: 5.17e-06 [auto_monad_eliminator]: 5.667e-05 [cse]: 0.00016928 [a_3]: 0.00033401 [Cycle 2]: 0.00300539, [45] [expand_dump_flag]: 1.62001e-06 [switch_simplify]: 4.72e-05 [loop_unroll]: 4.371e-05 [a_1]: 0.0015801 [with_stream_mark]: 1.159e-05 [recompute_prepare]: 1.134e-05 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.79002e-06 [parameter_eliminate]: 1.04003e-06 [a_2]: 0.00012555 [accelerated_algorithm]: 1.214e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 7.26001e-06 [auto_parallel]: 7.60998e-06 [parallel]: 4.87998e-06 [flash_sp]: 3.25e-06 [merge_comm]: 6.19999e-06 [allreduce_fusion]: 5.22e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 2.99973e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.84e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.60001e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.609e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.407e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37999e-06 [meta_fg_expand]: 3.288e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.428e-05 [a_after_grad]: 1.426e-05 [renormalize]: 0.00058695 [add_forward_monad_depend]: 3.97e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.447e-05 [cse]: 4.668e-05 [a_3]: 6.499e-05 [Cycle 3]: 0.00089827, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.088e-05 [loop_unroll]: 9.14e-06 [a_1]: 0.00024896 [with_stream_mark]: 9.61e-06 [recompute_prepare]: 9.23002e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 3.81001e-06 [updatestate_loads_eliminate]: 3.73001e-06 
[parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012325 [accelerated_algorithm]: 1.165e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 9.09998e-06 [merge_send_recv]: 6.67002e-06 [auto_parallel]: 6.84999e-06 [parallel]: 4.48999e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 5.00001e-06 [matmul_add_comm_reduction]: 7.9e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 9.96998e-06 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.43999e-06 [virtual_output]: 8.52e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 8.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.564e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 4.92999e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.313e-05 [a_after_grad]: 1.389e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.08001e-06 [auto_monad_eliminator]: 1.108e-05 [cse]: 2.812e-05 [a_3]: 6.109e-05 [py_interpret_to_execute_after_opt_a]: 1.047e-05 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 4.676e-05 [convert_after_rewriter]: 8.76002e-06 [order_py_execute_after_rewriter]: 6.97002e-06 [mutable_eliminate]: 0.00049626 [opt_b]: 0.00028922, [1] [Cycle 1]: 0.00028321, [7] [b_1]: 0.00018951 [b_2]: 1.11e-05 [updatestate_depend_eliminate]: 7.28e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.95e-06 [renormalize]: 3.60014e-07 [cse]: 3.245e-05 [optimize_parallel_all_gather_comm]: 2.038e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.024e-05 [loop_unroll]: 0.0004241 [opt_after_cconv]: 0.00013754, [1] [Cycle 1]: 0.00013174, [7] [c_1]: 4.839e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.32e-06 
[updatestate_loads_eliminate]: 3.83999e-06 [cse]: 3.083e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 2.855e-05 [tuple_transform]: 0.00010164, [1] [Cycle 1]: 9.683e-05, [4] [d_1]: 6.659e-05 [none_parameter_eliminate]: 1.66002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.98002e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 5.64e-05 [cse_after_recomputation]: 3.369e-05, [1] [Cycle 1]: 2.878e-05, [1] [cse]: 2.295e-05 [environ_conv]: 9.37999e-06 [swap_dp_allreduce_reducescatter]: 7.98001e-06 [bias_add_comm_swap]: 2.33002e-06 [label_micro_interleaved_index]: 4.44002e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.74999e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.778e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 5.10999e-06 [overlap_recompute_and_grad_model_parallel]: 5.77999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 5.32999e-06 [overlap_grad_flash_sp]: 2.405e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.87001e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 9.76e-05, [1] [Cycle 1]: 9.336e-05, [6] [build]: 9.23002e-06 [elim_shapecalc]: 1.327e-05 [elim_not_effective]: 1.815e-05 [opt_reshape]: 9.77999e-06 [fold_const_symbol]: 1.485e-05 
[renormalize]: 1.69995e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 2.432e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00046319 [validate]: 4.593e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.089085 [execute]: 9.05999e-06 Sums bootstrap : 0.000503s : 0.45% type_inference : 0.010280s : 9.12% event_method : 0.000043s : 0.04% auto_monad : 0.000113s : 0.10% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000126s : 0.11% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.11% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003165s : 2.81% optimize.opt_a.with_stream_mark : 0.000044s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001450s : 1.29% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.003011s : 2.67% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.07% optimize.opt_a.cse : 0.000244s : 0.22% optimize.opt_a.a_3 : 0.000460s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000496s : 0.44% optimize.opt_b.b_1 : 0.000190s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000424s : 0.38% optimize.opt_after_cconv.c_1 : 0.000048s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000463s : 0.41% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.089085s : 79.03% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000730 218 5.85% : 0.000043s : 11: substitution.arithmetic_simplify 2.01% : 0.000015s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.72% : 0.000400s : 16: substitution.inline 2.17% : 0.000016s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.81% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000005s : 5: substitution.partial_eliminate 1.84% : 0.000013s : 20: substitution.remove_not_recompute_node 3.28% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.71% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.53% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.56% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010213 2 86.90% : 0.008875s : 1: type_inference.infer 13.10% : 0.001337s : 1: type_inference.specialize ------[replace.] 0.000199 30 58.84% : 0.000117s : 16: replace.inline 41.16% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000422 30 92.86% : 0.000392s : 16: match.inline 7.14% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.25% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000016s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.66% : 0.000042s : 244: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.70% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.12% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001546 32 56.71% : 0.000876s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.29% : 0.000669s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.141267 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.13% : 0.003015s : 1: add_attr 2.13% : 0.003006s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000120s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.38% : 0.000536s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000506s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.41% : 0.004814s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000176s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000052s : 4: opt.transform.symbol_engine_opt 7.68% : 0.010845s : 1: opt_a 0.10% : 0.000141s : 1: opt_after_cconv 0.33% : 0.000473s : 1: opt_after_jit_grad 0.21% : 0.000293s : 1: opt_b 9.29% : 0.013126s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.12% : 0.001586s : 2: renormalize.infer 1.00% : 0.001412s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.09% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000100s : 1: symbol_engine_optimizer 63.08% : 0.089105s : 1: task_emit 0.07% : 0.000104s : 1: 
tuple_transform 7.29% : 0.010295s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x4-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x5-pynative],max_mem:10.0M TotalTime = 0.0213267, [24] [bootstrap]: 0.00050723 [type_inference]: 0.00608426 [event_method]: 1.449e-05 [auto_monad]: 5.527e-05 [graph_reusing]: 5.01002e-06 [inline]: 1.85001e-06 [add_attr]: 0.00332332, [1] [add_attr_with_inline]: 0.00331273, [1] [Cycle 1]: 4.494e-05, [2] [tag_attr]: 1.51e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 2.73998e-06 [pre_auto_parallel]: 2.777e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.00397322, [53] [py_interpret_to_execute]: 1.939e-05 [rewriter_before_opt_a]: 5.864e-05 [opt_a]: 0.0021487, [2] [Cycle 1]: 0.00151039, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 3.217e-05 [loop_unroll]: 2.132e-05 [a_1]: 0.00045059 [with_stream_mark]: 1.275e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.621e-05 [accelerated_algorithm]: 6.49001e-06 [shard]: 1.81998e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.75e-06 [auto_parallel]: 6.01998e-06 [parallel]: 2.303e-05 [flash_sp]: 7.6e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.72e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 
5.52001e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.51003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.88e-06 [after_resolve]: 1.019e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00041613 [add_forward_monad_depend]: 4.76002e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.626e-05 [a_3]: 3.992e-05 [Cycle 2]: 0.00058841, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 6.48e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.00012509 [with_stream_mark]: 9.12999e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.717e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.08001e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.03999e-06 [flash_sp]: 2.89001e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 4.97999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.33002e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.01002e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 5.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 8.91002e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.57002e-06 [cse]: 1.638e-05 [a_3]: 3.171e-05 [py_interpret_to_execute_after_opt_a]: 7.43999e-06 [slice_cell_reuse_recomputed_activation]: 2.24999e-06 [rewriter_after_opt_a]: 2.888e-05 [convert_after_rewriter]: 7.11999e-06 [order_py_execute_after_rewriter]: 5.24e-06 [mutable_eliminate]: 0.00044351 [opt_b]: 0.00017963, [1] [Cycle 1]: 0.0001735, [7] [b_1]: 0.0001067 [b_2]: 6.97002e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.33002e-06 [renormalize]: 5.00004e-07 [cse]: 1.596e-05 [optimize_parallel_all_gather_comm]: 1.653e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.165e-05 [loop_unroll]: 0.00040913 [opt_after_cconv]: 9.465e-05, [1] [Cycle 1]: 8.877e-05, [7] [c_1]: 2.773e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.42001e-06 [cse]: 1.593e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.192e-05 [tuple_transform]: 6.992e-05, [1] [Cycle 1]: 6.575e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.59999e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.97e-05 [cse_after_recomputation]: 2.049e-05, [1] [Cycle 1]: 1.609e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.72001e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.41002e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.26997e-06 [full_micro_interleaved_order_control]: 2.03002e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 9.50007e-07 
[add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.184e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.38e-06 [overlap_recompute_and_grad_model_parallel]: 4.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.37001e-06 [overlap_grad_ring_attention]: 3.85998e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.839e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.20999e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 5.87001e-06 [fold_const_symbol]: 9.05999e-06 [renormalize]: 3.19997e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 0.00013365 [opt_after_jit_grad]: 0.00045234 [validate]: 3.077e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00647739 [execute]: 6.49001e-06 Sums bootstrap : 0.000507s : 2.98% type_inference : 0.006084s : 35.79% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 
0.000059s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000576s : 3.39% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000416s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.61% optimize.opt_b.b_1 : 0.000107s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000409s : 2.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000134s : 0.79% opt_after_jit_grad : 0.000452s : 2.66% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006477s : 38.10% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.94% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 3.52% : 0.000006s : 4: substitution.graph_param_transform 66.24% : 0.000108s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.89% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006039 2 90.23% : 0.005449s : 1: type_inference.infer 9.77% : 0.000590s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.01% : 0.000027s : 3: replace.inline 29.99% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.31% : 0.000106s : 3: match.inline 8.69% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.20% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.77% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.85% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.88% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.94% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 45.20% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.80% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030132 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.04% : 0.003327s : 1: add_attr 11.01% : 0.003316s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.80% : 0.000544s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.12% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002152s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.53% : 0.000462s : 1: opt_after_jit_grad 0.61% : 0.000183s : 1: opt_b 13.20% : 0.003977s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.71% : 0.000214s : 1: renormalize.infer 0.65% : 0.000195s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.46% : 0.000140s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 21.53% : 0.006488s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.24% : 0.006098s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0189415, [24] [bootstrap]: 0.00052698 [type_inference]: 0.00447476 [event_method]: 1.086e-05 [auto_monad]: 5.179e-05 [graph_reusing]: 4.94998e-06 [inline]: 1.91998e-06 [add_attr]: 0.00323372, [1] [add_attr_with_inline]: 0.00322624, [1] [Cycle 1]: 4.474e-05, [2] [tag_attr]: 1.192e-05 [meta_addattr_fg_expand]: 3.06001e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.153e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00368693, [53] [py_interpret_to_execute]: 1.474e-05 [rewriter_before_opt_a]: 3.828e-05 [opt_a]: 0.00186018, [2] [Cycle 1]: 0.0012612, [45] [expand_dump_flag]: 2.49001e-06 [switch_simplify]: 2.504e-05 [loop_unroll]: 1.4e-05 [a_1]: 0.00029651 [with_stream_mark]: 1.313e-05 [recompute_prepare]: 7.18e-06 [updatestate_depend_eliminate]: 3.74002e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 7.612e-05 [accelerated_algorithm]: 6.10002e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 8.43999e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.823e-05 [flash_sp]: 7.06999e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 9.27999e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 6.85998e-06 [virtual_dataset]: 5.62001e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.005e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 9.48002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.07001e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00034413 [add_forward_monad_depend]: 4.32003e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.386e-05 [cse]: 2.652e-05 [a_3]: 4.026e-05 [Cycle 2]: 0.00058984, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 7.15e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.000124 [with_stream_mark]: 9.56e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.721e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.33002e-06 [parallel]: 4.31002e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.26998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.32001e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.24e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.40999e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.221e-05 [a_3]: 3.356e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 [slice_cell_reuse_recomputed_activation]: 2.38998e-06 [rewriter_after_opt_a]: 3.211e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.00044996 [opt_b]: 0.00017903, [1] [Cycle 1]: 0.00017297, [7] 
[b_1]: 0.0001069 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 4.40021e-07 [cse]: 1.493e-05 [optimize_parallel_all_gather_comm]: 1.627e-05 [overlap_param_gather]: 2.21e-06 [cconv]: 2.288e-05 [loop_unroll]: 0.00041223 [opt_after_cconv]: 9.32e-05, [1] [Cycle 1]: 8.759e-05, [7] [c_1]: 2.695e-05 [parameter_eliminate]: 2.13002e-06 [updatestate_depend_eliminate]: 4.81002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.547e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.352e-05 [tuple_transform]: 6.826e-05, [1] [Cycle 1]: 6.4e-05, [4] [d_1]: 3.847e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.422e-05 [cse_after_recomputation]: 1.844e-05, [1] [Cycle 1]: 1.422e-05, [1] [cse]: 9.26998e-06 [environ_conv]: 4.64002e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 3.06999e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.43002e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.36002e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02999e-06 [control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.97e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.67001e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.38998e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.025e-05, [1] [Cycle 1]: 8.603e-05, [6] [build]: 2.04999e-06 [elim_shapecalc]: 7.83001e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.74003e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00044726 [validate]: 3.077e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00621337 [execute]: 7.77002e-06 Sums bootstrap : 0.000527s : 3.58% type_inference : 0.004475s : 30.38% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000421s : 2.85% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000344s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.05% optimize.opt_b.b_1 : 0.000107s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000412s : 2.80% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000009s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000447s : 3.04% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006213s : 42.18% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.83% : 0.000023s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 65.27% : 0.000079s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.79% : 0.000005s : 4: substitution.remove_not_recompute_node 2.89% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004434 2 92.00% : 0.004080s : 1: type_inference.infer 8.00% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.01% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.41% : 0.000001s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.29% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.83% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.05% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 42.64% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.36% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027134 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.93% : 0.003238s : 1: add_attr 11.90% : 0.003230s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.08% : 0.000565s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000021s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.55% : 0.000421s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.84% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.87% : 0.001863s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.68% : 0.000457s : 1: opt_after_jit_grad 0.67% : 0.000183s : 1: opt_b 13.60% : 0.003691s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000190s : 1: renormalize.infer 0.55% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.34% : 0.000093s : 1: symbol_engine_optimizer 22.93% : 0.006223s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 16.54% : 0.004488s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.0200015, [24] [bootstrap]: 0.00046747 [type_inference]: 0.00564373 [event_method]: 1.421e-05 [auto_monad]: 5.711e-05 [graph_reusing]: 5.76e-06 [inline]: 2.28998e-06 [add_attr]: 0.0030713, [1] [add_attr_with_inline]: 0.00306312, [1] [Cycle 1]: 4.508e-05, [2] [tag_attr]: 1.532e-05 [meta_addattr_fg_expand]: 4.13001e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 2.579e-05 [insert-virtual-dataset]: 2.38998e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00393004, [53] [py_interpret_to_execute]: 1.953e-05 [rewriter_before_opt_a]: 5.879e-05 [opt_a]: 0.00208215, [2] [Cycle 1]: 0.0014895, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 3.209e-05 [loop_unroll]: 2.098e-05 [a_1]: 0.00044208 [with_stream_mark]: 1.325e-05 [recompute_prepare]: 7.53e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.595e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 2.14e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 8.47e-06 [auto_parallel]: 6.24999e-06 [parallel]: 1.886e-05 [flash_sp]: 7.38999e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 1.25999e-06 [virtual_shard_identity]: 6.99001e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 6.16e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.20001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.27999e-06 
[after_resolve]: 9.81e-06 [a_after_grad]: 8.42e-06 [renormalize]: 0.00040391 [add_forward_monad_depend]: 4.33001e-06 [auto_monad_grad]: 1.87999e-06 [auto_monad_eliminator]: 1.416e-05 [cse]: 2.879e-05 [a_3]: 3.999e-05 [Cycle 2]: 0.00058334, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.71999e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012548 [with_stream_mark]: 9.47001e-06 [recompute_prepare]: 5.69e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 9.60019e-07 [a_2]: 6.672e-05 [accelerated_algorithm]: 5.34998e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.24002e-06 [auto_parallel]: 5.25999e-06 [parallel]: 3.88999e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 2.89999e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 4.92999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 5.80002e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.91002e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.34001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 5.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 2.80002e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.85999e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 5.84e-06 [cse]: 1.501e-05 [a_3]: 3.168e-05 [py_interpret_to_execute_after_opt_a]: 7.64002e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.113e-05 [convert_after_rewriter]: 6.73998e-06 [order_py_execute_after_rewriter]: 5.15001e-06 [mutable_eliminate]: 0.00044656 [opt_b]: 0.00018079, [1] [Cycle 1]: 0.0001747, [7] 
[b_1]: 0.0001075 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 3.60014e-07 [cse]: 1.66e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.249e-05 [loop_unroll]: 0.00043496 [opt_after_cconv]: 9.418e-05, [1] [Cycle 1]: 8.845e-05, [7] [c_1]: 2.784e-05 [parameter_eliminate]: 2.28002e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.523e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.315e-05 [tuple_transform]: 6.93e-05, [1] [Cycle 1]: 6.493e-05, [4] [d_1]: 3.898e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.264e-05 [cse_after_recomputation]: 2.029e-05, [1] [Cycle 1]: 1.58e-05, [1] [cse]: 1.074e-05 [environ_conv]: 4.23001e-06 [swap_dp_allreduce_reducescatter]: 5.57001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.85999e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.96999e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.96998e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 5.15999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.04999e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.656e-05 [begin_end_overlap_inline]: 7.79983e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 6.804e-05, [1] [Cycle 1]: 6.398e-05, [6] [build]: 2.51998e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.117e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 3.21001e-06 [opt_after_jit_grad]: 0.00044719 [validate]: 3.058e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00606786 [execute]: 7.19001e-06 Sums bootstrap : 0.000467s : 2.93% type_inference : 0.005644s : 35.32% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000568s : 3.55% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.89% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000404s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000044s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.79% optimize.opt_b.b_1 : 0.000108s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000435s : 2.72% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 2.80% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006068s : 37.98% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000164 30 15.08% : 0.000025s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.63% : 0.000006s : 4: substitution.graph_param_transform 66.03% : 0.000108s : 3: substitution.inline 1.82% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.15% : 0.000004s : 4: substitution.replace_old_param 6.81% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005603 2 90.04% : 0.005045s : 1: type_inference.infer 9.96% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.92% : 0.000027s : 3: replace.inline 29.08% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.31% : 0.000106s : 3: match.inline 8.69% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000154 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.52% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.21% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.59% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 46.99% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.01% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028493 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.80% : 0.003076s : 1: add_attr 10.76% : 0.003067s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000504s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.56% : 0.000444s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.60% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.26% : 0.000929s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.32% : 0.002085s : 1: opt_a 0.34% : 0.000097s : 1: opt_after_cconv 1.60% : 0.000457s : 1: opt_after_jit_grad 0.65% : 0.000184s : 1: opt_b 13.81% : 0.003934s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000208s : 1: renormalize.infer 0.67% : 0.000190s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.33% : 0.006078s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.86% : 0.005658s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0386819, [24] [bootstrap]: 0.00052183 [type_inference]: 0.0118986 [event_method]: 4.806e-05 [auto_monad]: 0.00012501 [graph_reusing]: 8.89998e-06 [inline]: 1.67999e-06 [add_attr]: 0.0031393, [1] [add_attr_with_inline]: 0.00313124, [1] [Cycle 1]: 7.227e-05, [2] [tag_attr]: 3.542e-05 [meta_addattr_fg_expand]: 9.85002e-06 [parallel-infer-symbol]: 3.2e-06 [pre_auto_parallel]: 5.257e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 2.31e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0137556, [53] [py_interpret_to_execute]: 3.873e-05 [rewriter_before_opt_a]: 0.00014885 [opt_a]: 0.0114146, [3] [Cycle 1]: 0.00739905, [45] [expand_dump_flag]: 3.81999e-06 [switch_simplify]: 7.56e-05 [loop_unroll]: 6.249e-05 [a_1]: 0.00147173 [with_stream_mark]: 5.557e-05 [recompute_prepare]: 2.34e-05 [updatestate_depend_eliminate]: 9.89999e-06 [updatestate_assign_eliminate]: 8e-06 [updatestate_loads_eliminate]: 7.81001e-06 [parameter_eliminate]: 2.94001e-06 [a_2]: 0.00024911 [accelerated_algorithm]: 3.147e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 3.46999e-06 [shard_inline]: 1.741e-05 [merge_send_recv]: 1.65e-05 [auto_parallel]: 1.079e-05 [parallel]: 1.825e-05 [flash_sp]: 1.182e-05 [merge_comm]: 9.51e-06 [allreduce_fusion]: 9.87999e-06 [matmul_add_comm_reduction]: 2.769e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.784e-05 [virtual_dataset]: 1.642e-05 [get_grad_eliminate_]: 1.563e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.87999e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 1.931e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.894e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 2.812e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66998e-06 [meta_fg_expand]: 0.00160299 [flash_sp_send_recv_attached]: 3.68999e-06 [receive_attached]: 2.30002e-06 [after_resolve]: 
6.068e-05 [a_after_grad]: 8.223e-05 [renormalize]: 0.00249324 [add_forward_monad_depend]: 9.44e-06 [auto_monad_grad]: 5.03002e-06 [auto_monad_eliminator]: 5.611e-05 [cse]: 0.00016884 [a_3]: 0.00034244 [Cycle 2]: 0.00308309, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 4.72e-05 [loop_unroll]: 4.379e-05 [a_1]: 0.00153887 [with_stream_mark]: 1.192e-05 [recompute_prepare]: 1.091e-05 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 4.54002e-06 [updatestate_loads_eliminate]: 3.81999e-06 [parameter_eliminate]: 1.20001e-06 [a_2]: 0.00012733 [accelerated_algorithm]: 1.197e-05 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 6.56e-06 [auto_parallel]: 7.43e-06 [parallel]: 5.04e-06 [flash_sp]: 3.51999e-06 [merge_comm]: 5.79e-06 [allreduce_fusion]: 5.42999e-06 [matmul_add_comm_reduction]: 8.15e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.061e-05 [virtual_dataset]: 9.10001e-06 [get_grad_eliminate_]: 8.79e-06 [virtual_output]: 8.64998e-06 [merge_forward]: 4.74e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 9.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.663e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.456e-05 [set_forward_comm_id_for_comm_node_pass]: 5.59998e-06 [meta_fg_expand]: 7.422e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.07e-06 [after_resolve]: 1.703e-05 [a_after_grad]: 1.481e-05 [renormalize]: 0.00064528 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.547e-05 [cse]: 4.972e-05 [a_3]: 6.67e-05 [Cycle 3]: 0.0009176, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 1.095e-05 [loop_unroll]: 9.09998e-06 [a_1]: 0.0002536 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 9.89001e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 4.27998e-06 [updatestate_loads_eliminate]: 3.92998e-06 [parameter_eliminate]: 
1.20999e-06 [a_2]: 0.00012541 [accelerated_algorithm]: 1.207e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.99e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 7.23999e-06 [auto_parallel]: 7.13e-06 [parallel]: 4.89998e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 4.84998e-06 [allreduce_fusion]: 5.13002e-06 [matmul_add_comm_reduction]: 7.99002e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.036e-05 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 8.67e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.47998e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 8.84998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.635e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.64e-06 [meta_fg_expand]: 3.31001e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.378e-05 [a_after_grad]: 1.447e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 1.063e-05 [cse]: 2.653e-05 [a_3]: 6.158e-05 [py_interpret_to_execute_after_opt_a]: 1.113e-05 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 4.978e-05 [convert_after_rewriter]: 9.15999e-06 [order_py_execute_after_rewriter]: 6.94001e-06 [mutable_eliminate]: 0.00050364 [opt_b]: 0.00029196, [1] [Cycle 1]: 0.00028527, [7] [b_1]: 0.00019173 [b_2]: 1.085e-05 [updatestate_depend_eliminate]: 7.70998e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 4.17e-06 [renormalize]: 5.69999e-07 [cse]: 3.129e-05 [optimize_parallel_all_gather_comm]: 2.109e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.163e-05 [loop_unroll]: 0.00043078 [opt_after_cconv]: 0.00013511, [1] [Cycle 1]: 0.00012936, [7] [c_1]: 4.833e-05 [parameter_eliminate]: 2.28002e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 
4.02e-06 [cse]: 2.879e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 3.047e-05 [tuple_transform]: 0.00010412, [1] [Cycle 1]: 9.921e-05, [4] [d_1]: 6.819e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.027e-05 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.947e-05 [cse_after_recomputation]: 3.246e-05, [1] [Cycle 1]: 2.761e-05, [1] [cse]: 2.194e-05 [environ_conv]: 8.48001e-06 [swap_dp_allreduce_reducescatter]: 8.40999e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.12999e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.70997e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.741e-05 [grouped_pairwise_exchange_alltoall]: 1.69998e-06 [offloading_packed_experts]: 5.09e-06 [overlap_recompute_and_grad_model_parallel]: 5.58997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.96e-06 [overlap_recompute_comm]: 2.52001e-06 [overlap_grad_ring_attention]: 5.59998e-06 [overlap_grad_flash_sp]: 2.514e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.46998e-06 [split_layernorm_comm]: 2.14e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 9.846e-05, [1] [Cycle 1]: 9.394e-05, [6] [build]: 9.60001e-06 [elim_shapecalc]: 1.301e-05 [elim_not_effective]: 1.81e-05 [opt_reshape]: 1.016e-05 [fold_const_symbol]: 1.532e-05 [renormalize]: 2.00002e-07 [detach_backward]: 
1.85001e-06 [pipeline_parallel_scheduler]: 1.44003e-06 [auto_monad_reorder]: 2.583e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.40998e-06 [opt_after_jit_grad]: 0.0005117 [validate]: 4.31e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00831042 [execute]: 7.06001e-06 Sums bootstrap : 0.000522s : 1.52% type_inference : 0.011899s : 34.71% event_method : 0.000048s : 0.14% auto_monad : 0.000125s : 0.36% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000053s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000149s : 0.43% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000134s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.34% optimize.opt_a.a_1 : 0.003264s : 9.52% optimize.opt_a.with_stream_mark : 0.000078s : 0.23% optimize.opt_a.recompute_prepare : 0.000044s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000502s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001681s : 4.90% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.33% optimize.opt_a.renormalize : 0.003139s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.24% optimize.opt_a.cse : 0.000245s : 0.72% optimize.opt_a.a_3 : 0.000471s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000504s : 1.47% optimize.opt_b.b_1 : 0.000192s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.06% optimize.loop_unroll : 0.000431s : 1.26% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.08% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000512s : 1.49% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008310s : 24.25% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000778 222 5.84% : 0.000045s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 56.18% : 0.000437s : 17: substitution.inline 1.98% : 0.000015s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000015s : 3: substitution.less_batch_normalization 1.75% : 0.000014s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.10% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.53% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.58% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011821 2 84.15% : 0.009947s : 1: type_inference.infer 15.85% : 0.001873s : 1: type_inference.specialize ------[replace.] 0.000229 33 57.56% : 0.000132s : 17: replace.inline 42.44% : 0.000097s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000462 33 92.53% : 0.000428s : 17: match.inline 7.47% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000016s : 100: predicate.arithmetic_simplify 1.11% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000043s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.33% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.64% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.07% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.51% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.60% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001886 34 54.10% : 0.001021s : 13: func_graph_cloner_run.FuncGraphClonerGraph 45.90% : 0.000866s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064004 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.91% : 0.003144s : 1: add_attr 4.90% : 0.003135s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000132s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.87% : 0.000559s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.80% : 0.000513s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.75% : 0.004960s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000177s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.84% : 0.011418s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.81% : 0.000521s : 1: opt_after_jit_grad 0.46% : 0.000296s : 1: opt_b 21.50% : 0.013760s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000057s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000035s : 1: remove_dup_value 2.64% : 0.001689s : 2: renormalize.infer 2.24% : 0.001436s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000054s : 1: rewriter_after_opt_a 0.24% : 0.000153s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.00% : 0.008320s : 1: task_emit 0.17% : 0.000107s : 1: 
tuple_transform 18.61% : 0.011914s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0188168, [24] [bootstrap]: 0.00046154 [type_inference]: 0.00438945 [event_method]: 1.069e-05 [auto_monad]: 5.363e-05 [graph_reusing]: 5.42001e-06 [inline]: 1.84998e-06 [add_attr]: 0.00303844, [1] [add_attr_with_inline]: 0.00303082, [1] [Cycle 1]: 4.488e-05, [2] [tag_attr]: 1.202e-05 [meta_addattr_fg_expand]: 3.34001e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.201e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 2.23002e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00372125, [53] [py_interpret_to_execute]: 1.618e-05 [rewriter_before_opt_a]: 4.047e-05 [opt_a]: 0.00190426, [2] [Cycle 1]: 0.00129877, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 2.535e-05 [loop_unroll]: 1.406e-05 [a_1]: 0.00029687 [with_stream_mark]: 1.372e-05 [recompute_prepare]: 7.15998e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 7.827e-05 [accelerated_algorithm]: 7.00998e-06 [shard]: 2.67001e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8e-06 [auto_parallel]: 5.95002e-06 [parallel]: 1.855e-05 [flash_sp]: 7.53e-06 [merge_comm]: 3.81001e-06 [allreduce_fusion]: 4.00998e-06 [matmul_add_comm_reduction]: 9.33997e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 4.10998e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.11e-05 [merge_recompute_call_nodes]: 1.41998e-06 [before_grad]: 9.21998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 3.04001e-06 
[after_resolve]: 1.065e-05 [a_after_grad]: 9.12999e-06 [renormalize]: 0.00034109 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.342e-05 [cse]: 2.743e-05 [a_3]: 7.184e-05 [Cycle 2]: 0.00059607, [45] [expand_dump_flag]: 8.00006e-07 [switch_simplify]: 7.13e-06 [loop_unroll]: 5.82999e-06 [a_1]: 0.00012597 [with_stream_mark]: 9.25999e-06 [recompute_prepare]: 5.73002e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.714e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.02999e-06 [parallel]: 4.61002e-06 [flash_sp]: 3.07002e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.30001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.00001e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.66e-06 [offload_activation]: 6.59001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.003e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.84002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 7.74002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.44999e-06 [cse]: 1.328e-05 [a_3]: 3.163e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.238e-05 [convert_after_rewriter]: 7.05e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00045389 [opt_b]: 0.00017947, [1] [Cycle 1]: 0.00017359, [7] [b_1]: 
0.00010708 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 2.30008e-07 [cse]: 1.496e-05 [optimize_parallel_all_gather_comm]: 1.575e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.306e-05 [loop_unroll]: 0.00041418 [opt_after_cconv]: 9.389e-05, [1] [Cycle 1]: 8.812e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 5.46e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.08002e-06 [cse]: 1.555e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.274e-05 [tuple_transform]: 6.867e-05, [1] [Cycle 1]: 6.447e-05, [4] [d_1]: 3.952e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.93002e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.589e-05 [cse_after_recomputation]: 2.029e-05, [1] [Cycle 1]: 1.581e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.80999e-06 [swap_dp_allreduce_reducescatter]: 5.76e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.54001e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.73003e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 3.87998e-06 [overlap_recompute_and_grad_model_parallel]: 4.39002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.63002e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.11997e-06 [symbol_engine_optimizer]: 6.696e-05, [1] [Cycle 1]: 6.3e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 7.89002e-06 [elim_not_effective]: 1.106e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.83002e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.566e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.2e-06 [opt_after_jit_grad]: 0.00045199 [validate]: 3.032e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.0063907 [execute]: 7.56001e-06 Sums bootstrap : 0.000462s : 3.11% type_inference : 0.004389s : 29.62% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000423s : 2.85% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000341s : 2.30% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.27% optimize.opt_a.a_3 : 0.000103s : 0.70% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000454s : 3.06% optimize.opt_b.b_1 : 0.000107s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000414s : 2.80% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 3.05% validate : 0.000030s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006391s : 43.13% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000124 26 18.31% : 0.000023s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000006s : 4: substitution.graph_param_transform 65.89% : 0.000082s : 2: substitution.inline 2.15% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004346 2 91.74% : 0.003987s : 1: type_inference.infer 8.26% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.97% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.91% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.99% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.73% : 0.000001s : 4: predicate.row_tensor_eliminate 1.06% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.61% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000244 6 41.72% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.28% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026881 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.32% : 0.003043s : 1: add_attr 11.29% : 0.003034s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.85% : 0.000498s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000463s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 3.01% : 0.000809s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001907s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.72% : 0.000461s : 1: opt_after_jit_grad 0.68% : 0.000183s : 1: opt_b 13.86% : 0.003725s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.69% : 0.000187s : 1: renormalize.infer 0.55% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.17% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.81% : 0.006401s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.38% : 0.004404s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0366126, [24] [bootstrap]: 0.00052617 [type_inference]: 0.010369 [event_method]: 4.226e-05 [auto_monad]: 0.00011977 [graph_reusing]: 8.62e-06 [inline]: 1.89e-06 [add_attr]: 0.00307296, [1] [add_attr_with_inline]: 0.00306498, [1] [Cycle 1]: 6.834e-05, [2] [tag_attr]: 3.208e-05 [meta_addattr_fg_expand]: 9.19e-06 [parallel-infer-symbol]: 2.95998e-06 [pre_auto_parallel]: 4.787e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.07999e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0132327, [53] [py_interpret_to_execute]: 3.582e-05 [rewriter_before_opt_a]: 0.00012876 [opt_a]: 0.0109605, [3] [Cycle 1]: 0.00699392, [45] [expand_dump_flag]: 3.97e-06 [switch_simplify]: 6.704e-05 [loop_unroll]: 5.557e-05 [a_1]: 0.00136467 [with_stream_mark]: 2.434e-05 [recompute_prepare]: 2.249e-05 [updatestate_depend_eliminate]: 9.25999e-06 [updatestate_assign_eliminate]: 8.66002e-06 [updatestate_loads_eliminate]: 8.05e-06 [parameter_eliminate]: 2.79001e-06 [a_2]: 0.00024761 [accelerated_algorithm]: 3.259e-05 [shard]: 1.91998e-06 [meta_shard_fg_expand]: 3.86001e-06 [shard_inline]: 1.621e-05 [merge_send_recv]: 1.664e-05 [auto_parallel]: 1.096e-05 [parallel]: 1.895e-05 [flash_sp]: 1.135e-05 [merge_comm]: 9.89001e-06 [allreduce_fusion]: 9.25001e-06 [matmul_add_comm_reduction]: 2.812e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.792e-05 [virtual_dataset]: 1.61e-05 [get_grad_eliminate_]: 1.581e-05 [virtual_output]: 1.556e-05 [merge_forward]: 1.014e-05 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 1.875e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.887e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.788e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76e-06 [meta_fg_expand]: 0.00140327 [flash_sp_send_recv_attached]: 3.53e-06 [receive_attached]: 2.56e-06 [after_resolve]: 
6.03e-05 [a_after_grad]: 8.124e-05 [renormalize]: 0.00245047 [add_forward_monad_depend]: 9.02e-06 [auto_monad_grad]: 4.95001e-06 [auto_monad_eliminator]: 5.544e-05 [cse]: 0.00016789 [a_3]: 0.00034068 [Cycle 2]: 0.00301548, [45] [expand_dump_flag]: 1.40001e-06 [switch_simplify]: 4.711e-05 [loop_unroll]: 4.46e-05 [a_1]: 0.00155288 [with_stream_mark]: 1.231e-05 [recompute_prepare]: 1.143e-05 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 3.60998e-06 [parameter_eliminate]: 1.17999e-06 [a_2]: 0.000128 [accelerated_algorithm]: 1.199e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.44e-06 [merge_send_recv]: 6.78e-06 [auto_parallel]: 7.67998e-06 [parallel]: 4.83001e-06 [flash_sp]: 3.43e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.71002e-06 [matmul_add_comm_reduction]: 7.90998e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 1.007e-05 [virtual_dataset]: 8.95999e-06 [get_grad_eliminate_]: 8.81002e-06 [virtual_output]: 8.34998e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.594e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.364e-05 [set_forward_comm_id_for_comm_node_pass]: 5.80002e-06 [meta_fg_expand]: 3.498e-05 [flash_sp_send_recv_attached]: 9.40025e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.52e-05 [a_after_grad]: 1.436e-05 [renormalize]: 0.00061634 [add_forward_monad_depend]: 4.21001e-06 [auto_monad_grad]: 1.25999e-06 [auto_monad_eliminator]: 1.467e-05 [cse]: 4.782e-05 [a_3]: 6.647e-05 [Cycle 3]: 0.00093735, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.076e-05 [loop_unroll]: 9.06002e-06 [a_1]: 0.00026026 [with_stream_mark]: 1.127e-05 [recompute_prepare]: 1.074e-05 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 4.41002e-06 
[parameter_eliminate]: 1.07998e-06 [a_2]: 0.00013299 [accelerated_algorithm]: 1.349e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 2.12999e-06 [shard_inline]: 8.80001e-06 [merge_send_recv]: 7.2e-06 [auto_parallel]: 7.41001e-06 [parallel]: 4.72998e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 5.17999e-06 [allreduce_fusion]: 5.07e-06 [matmul_add_comm_reduction]: 7.85998e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.005e-05 [virtual_dataset]: 9.43997e-06 [get_grad_eliminate_]: 8.52e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.46002e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 9.32999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.613e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.436e-05 [set_forward_comm_id_for_comm_node_pass]: 5.46e-06 [meta_fg_expand]: 3.25e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.06997e-06 [after_resolve]: 1.315e-05 [a_after_grad]: 1.432e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 1.167e-05 [cse]: 2.603e-05 [a_3]: 6.043e-05 [py_interpret_to_execute_after_opt_a]: 1.045e-05 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 4.706e-05 [convert_after_rewriter]: 9.09e-06 [order_py_execute_after_rewriter]: 7.11001e-06 [mutable_eliminate]: 0.00046537 [opt_b]: 0.00029005, [1] [Cycle 1]: 0.00028381, [7] [b_1]: 0.00019207 [b_2]: 1.103e-05 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 4e-06 [renormalize]: 5.20027e-07 [cse]: 3.02e-05 [optimize_parallel_all_gather_comm]: 2.141e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.137e-05 [loop_unroll]: 0.00042772 [opt_after_cconv]: 0.00013561, [1] [Cycle 1]: 0.00012987, [7] [c_1]: 4.869e-05 [parameter_eliminate]: 2.33002e-06 [updatestate_depend_eliminate]: 7.38e-06 [updatestate_assign_eliminate]: 4.23001e-06 
[updatestate_loads_eliminate]: 3.92998e-06 [cse]: 2.9e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 3.119e-05 [tuple_transform]: 0.00010428, [1] [Cycle 1]: 9.953e-05, [4] [d_1]: 6.853e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 2.99973e-07 [switch_simplify]: 1.018e-05 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 6.004e-05 [cse_after_recomputation]: 3.152e-05, [1] [Cycle 1]: 2.688e-05, [1] [cse]: 2.159e-05 [environ_conv]: 8.75999e-06 [swap_dp_allreduce_reducescatter]: 8.33999e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.43001e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.10999e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.19e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 2.10002e-06 [control_data_broadcast_order]: 1.776e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 5.16002e-06 [overlap_recompute_and_grad_model_parallel]: 5.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.78997e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 5.50001e-06 [overlap_grad_flash_sp]: 2.47e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 2.12999e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 9.91e-05, [1] [Cycle 1]: 9.497e-05, [6] [build]: 9.90002e-06 [elim_shapecalc]: 1.369e-05 [elim_not_effective]: 1.798e-05 [opt_reshape]: 1.019e-05 [fold_const_symbol]: 1.56e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.597e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00052397 [validate]: 4.554e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00835966 [execute]: 7.37002e-06 Sums bootstrap : 0.000526s : 1.63% type_inference : 0.010369s : 32.12% event_method : 0.000042s : 0.13% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000129s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000109s : 0.34% optimize.opt_a.a_1 : 0.003178s : 9.84% optimize.opt_a.with_stream_mark : 0.000048s : 0.15% optimize.opt_a.recompute_prepare : 0.000045s : 0.14% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000509s : 1.58% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.18% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.07% optimize.opt_a.meta_fg_expand : 0.001441s : 4.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.003067s : 9.50% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.25% optimize.opt_a.cse : 0.000242s : 0.75% optimize.opt_a.a_3 : 0.000468s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000465s : 1.44% optimize.opt_b.b_1 : 0.000192s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000428s : 1.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.10% optimize.tuple_transform.d_1 : 0.000069s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.19% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000524s : 1.62% validate : 0.000046s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008360s : 25.90% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000743 218 5.81% : 0.000043s : 11: substitution.arithmetic_simplify 1.95% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.24% : 0.000410s : 16: substitution.inline 2.08% : 0.000015s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.17% : 0.000016s : 3: substitution.less_batch_normalization 1.81% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000013s : 20: substitution.remove_not_recompute_node 3.22% : 0.000024s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.76% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.28% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010296 2 87.32% : 0.008991s : 1: type_inference.infer 12.68% : 0.001306s : 1: type_inference.specialize ------[replace.] 0.000206 30 58.87% : 0.000121s : 16: replace.inline 41.13% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000432 30 92.92% : 0.000402s : 16: match.inline 7.08% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.55% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.53% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 244: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.61% : 0.000019s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 67: predicate.reduce_eliminate 2.62% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.19% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.93% : 0.000037s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.60% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001509 32 57.42% : 0.000867s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.58% : 0.000643s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061176 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.03% : 0.003077s : 1: add_attr 5.02% : 0.003069s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.92% : 0.000562s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000437s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000474s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.94% : 0.004858s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000177s : 28: opt.transform.opt_b 0.13% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.92% : 0.010964s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.87% : 0.000533s : 1: opt_after_jit_grad 0.48% : 0.000294s : 1: opt_b 21.64% : 0.013237s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.06% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000036s : 1: remove_dup_value 2.67% : 0.001635s : 2: renormalize.infer 2.32% : 0.001419s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000102s : 1: symbol_engine_optimizer 13.68% : 0.008370s : 1: task_emit 0.18% : 0.000107s : 1: 
tuple_transform 16.98% : 0.010385s : 1: type_inference 0.13% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x5-kbk],max_mem:10.0M . TotalTime = 0.0869364, [24] [bootstrap]: 0.00060996 [type_inference]: 0.00642843 [event_method]: 1.445e-05 [auto_monad]: 5.724e-05 [graph_reusing]: 5.90002e-06 [inline]: 1.78002e-06 [add_attr]: 0.0034414, [1] [add_attr_with_inline]: 0.00343022, [1] [Cycle 1]: 4.6e-05, [2] [tag_attr]: 1.525e-05 [meta_addattr_fg_expand]: 4.72e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.882e-05 [insert-virtual-dataset]: 2.45997e-06 [parallel-infer-symbol-second]: 9.5999e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00405485, [53] [py_interpret_to_execute]: 2.137e-05 [rewriter_before_opt_a]: 5.863e-05 [opt_a]: 0.00215775, [2] [Cycle 1]: 0.00155177, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 3.283e-05 [loop_unroll]: 2.14e-05 [a_1]: 0.00047171 [with_stream_mark]: 1.281e-05 [recompute_prepare]: 7.68001e-06 [updatestate_depend_eliminate]: 3.52997e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.845e-05 [accelerated_algorithm]: 6.44999e-06 [shard]: 2.02001e-06 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 8.74e-06 [auto_parallel]: 5.94e-06 [parallel]: 2.454e-05 [flash_sp]: 8.08999e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 8.89998e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.6e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 6.07999e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.091e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 9.71e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 2.53998e-06 [receive_attached]: 2.42001e-06 [after_resolve]: 1.041e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00042078 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 2.12001e-06 [auto_monad_eliminator]: 1.433e-05 [cse]: 2.786e-05 [a_3]: 4.2e-05 [Cycle 2]: 0.00059673, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012971 [with_stream_mark]: 9.58002e-06 [recompute_prepare]: 5.84999e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.886e-05 [accelerated_algorithm]: 5.65001e-06 [shard]: 1.06997e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.41002e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.84999e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 4.96997e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.53002e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.95998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.71002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.62001e-06 [a_after_grad]: 8.23999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.213e-05 [a_3]: 3.3e-05 [py_interpret_to_execute_after_opt_a]: 7.50998e-06 
[slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 3.125e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.51998e-06 [mutable_eliminate]: 0.00049225 [opt_b]: 0.0001909, [1] [Cycle 1]: 0.00018503, [7] [b_1]: 0.0001152 [b_2]: 7.4e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.33002e-06 [renormalize]: 3.59985e-07 [cse]: 1.621e-05 [optimize_parallel_all_gather_comm]: 1.697e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.293e-05 [loop_unroll]: 0.00041335 [opt_after_cconv]: 9.436e-05, [1] [Cycle 1]: 8.857e-05, [7] [c_1]: 2.837e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.537e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.303e-05 [tuple_transform]: 7.032e-05, [1] [Cycle 1]: 6.603e-05, [4] [d_1]: 4.042e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 5.041e-05 [cse_after_recomputation]: 2.065e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.071e-05 [environ_conv]: 4.69002e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.24002e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.23001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.64999e-06 [overlap_grad_ring_attention]: 4.07003e-06 [overlap_grad_flash_sp]: 1.727e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 2.17999e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.811e-05, [1] [Cycle 1]: 6.403e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.81997e-06 [elim_not_effective]: 1.137e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.567e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00045551 [validate]: 3.113e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.071551 [execute]: 8.27e-06 Sums bootstrap : 0.000610s : 0.74% type_inference : 0.006428s : 7.79% event_method : 0.000014s : 0.02% auto_monad : 0.000057s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 
0.000601s : 0.73% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000421s : 0.51% 
optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000075s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000492s : 0.60% optimize.opt_b.b_1 : 0.000115s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000413s : 0.50% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.06% optimize.cse_after_recomputation.cse 
: 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000456s : 0.55% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.071551s : 86.71% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000170 30 15.01% : 0.000025s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.20% : 0.000005s : 4: substitution.graph_param_transform 67.08% : 0.000114s : 3: substitution.inline 1.57% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.48% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006378 2 90.89% : 0.005797s : 1: type_inference.infer 9.11% : 0.000581s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.12% : 0.000028s : 3: replace.inline 28.88% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.90% : 0.000112s : 3: match.inline 8.10% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.10% : 0.000003s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.07% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.01% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.51% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.71% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000361 8 46.65% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.35% : 0.000193s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.095989 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.59% : 0.003446s : 1: add_attr 3.58% : 0.003434s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000062s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000652s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.44% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.02% : 0.000976s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000097s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.25% : 0.002161s : 1: opt_a 0.10% : 0.000098s : 1: opt_after_cconv 0.48% : 0.000466s : 1: opt_after_jit_grad 0.20% : 0.000194s : 1: opt_b 4.23% : 0.004059s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.23% : 0.000216s : 1: renormalize.infer 0.21% : 0.000198s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000071s : 1: symbol_engine_optimizer 74.56% : 0.071569s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 6.71% : 0.006442s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.0782652, [24] [bootstrap]: 0.00050733 [type_inference]: 0.00451033 [event_method]: 1.144e-05 [auto_monad]: 5.308e-05 [graph_reusing]: 5.23002e-06 [inline]: 1.78002e-06 [add_attr]: 0.00303939, [1] [add_attr_with_inline]: 0.00303158, [1] [Cycle 1]: 4.767e-05, [2] [tag_attr]: 1.281e-05 [meta_addattr_fg_expand]: 3.8e-06 [parallel-infer-symbol]: 2.67001e-06 [pre_auto_parallel]: 2.137e-05 [insert-virtual-dataset]: 2.45002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.00374543, [53] [py_interpret_to_execute]: 1.548e-05 [rewriter_before_opt_a]: 3.969e-05 [opt_a]: 0.00187983, [2] [Cycle 1]: 0.0012754, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 2.502e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.00029447 [with_stream_mark]: 1.3e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 7.776e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.23002e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 8.38001e-06 [auto_parallel]: 6.58998e-06 [parallel]: 1.832e-05 [flash_sp]: 7.5e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.44e-06 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 7.42002e-06 [virtual_dataset]: 5.73997e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.75998e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 9.30001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 2.66e-06 [after_resolve]: 
1.021e-05 [a_after_grad]: 8.92999e-06 [renormalize]: 0.00035416 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.98997e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 2.738e-05 [a_3]: 4.082e-05 [Cycle 2]: 0.00059547, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.48997e-06 [a_1]: 0.0001246 [with_stream_mark]: 1.089e-05 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 3.13e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.923e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.40999e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.26002e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 6.29001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.22e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.30013e-07 [after_resolve]: 9.00001e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.236e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 7.83999e-06 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 3.164e-05 [convert_after_rewriter]: 7.35998e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00045364 [opt_b]: 0.0001821, [1] [Cycle 1]: 0.00017612, [7] [b_1]: 0.00010831 
[b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 2.50002e-07 [cse]: 1.598e-05 [optimize_parallel_all_gather_comm]: 1.67e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.377e-05 [loop_unroll]: 0.00045209 [opt_after_cconv]: 9.626e-05, [1] [Cycle 1]: 9.068e-05, [7] [c_1]: 2.879e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 4.86002e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.656e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.269e-05 [tuple_transform]: 6.984e-05, [1] [Cycle 1]: 6.575e-05, [4] [d_1]: 3.955e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.23998e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.625e-05 [cse_after_recomputation]: 2.076e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.108e-05 [environ_conv]: 4.93001e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.65002e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.25002e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.01997e-06 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.39003e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06e-06 [control_data_broadcast_order]: 1.155e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.83999e-06 [overlap_recompute_and_grad_model_parallel]: 4.62e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.735e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 6.775e-05, [1] [Cycle 1]: 6.374e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.32998e-06 [elim_not_effective]: 1.136e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 8.59e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.664e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00045186 [validate]: 3.215e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0656427 [execute]: 8.48999e-06 Sums bootstrap : 0.000507s : 0.68% type_inference : 0.004510s : 6.07% event_method : 0.000011s : 0.02% auto_monad : 0.000053s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.56% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000354s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.61% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000452s : 0.61% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000452s : 0.61% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.065643s : 88.39% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 17.71% : 0.000021s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.44% : 0.000005s : 4: substitution.graph_param_transform 66.49% : 0.000081s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.47% : 0.000004s : 4: substitution.remove_not_recompute_node 3.06% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004468 2 91.65% : 0.004095s : 1: type_inference.infer 8.35% : 0.000373s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.60% : 0.000004s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.09% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.91% : 0.000001s : 8: predicate.get_grad_eliminate 0.41% : 0.000001s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.73% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.77% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.59% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000271 6 44.50% : 0.000120s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.50% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.086338 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.53% : 0.003044s : 1: add_attr 3.52% : 0.003035s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000545s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000461s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.54% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.90% : 0.000774s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.18% : 0.001883s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.53% : 0.000461s : 1: opt_after_jit_grad 0.22% : 0.000186s : 1: opt_b 4.34% : 0.003749s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000197s : 1: renormalize.infer 0.17% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.05% : 0.065658s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 5.24% : 0.004524s : 1: type_inference 0.06% : 0.000054s : 1: validate TotalTime = 0.0826457, [24] [bootstrap]: 0.00077342 [type_inference]: 0.00676759 [event_method]: 1.493e-05 [auto_monad]: 5.763e-05 [graph_reusing]: 6.29001e-06 [inline]: 1.80001e-06 [add_attr]: 0.00313873, [1] [add_attr_with_inline]: 0.00313097, [1] [Cycle 1]: 4.762e-05, [2] [tag_attr]: 1.553e-05 [meta_addattr_fg_expand]: 4.36002e-06 [parallel-infer-symbol]: 2.62001e-06 [pre_auto_parallel]: 2.533e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.00407896, [53] [py_interpret_to_execute]: 2.119e-05 [rewriter_before_opt_a]: 5.983e-05 [opt_a]: 0.00222288, [2] [Cycle 1]: 0.0015703, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 3.341e-05 [loop_unroll]: 2.072e-05 [a_1]: 0.00045183 [with_stream_mark]: 1.438e-05 [recompute_prepare]: 7.71999e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 7.622e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 5.76998e-06 [parallel]: 4.826e-05 [flash_sp]: 7.30003e-06 [merge_comm]: 4.34002e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 9.37999e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 7.98999e-06 [virtual_dataset]: 6.23e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 5.66998e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.111e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.79999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.52001e-06 [flash_sp_send_recv_attached]: 2.80997e-06 [receive_attached]: 
2.56e-06 [after_resolve]: 1.092e-05 [a_after_grad]: 8.80999e-06 [renormalize]: 0.00043293 [add_forward_monad_depend]: 5.51e-06 [auto_monad_grad]: 1.87999e-06 [auto_monad_eliminator]: 1.413e-05 [cse]: 2.874e-05 [a_3]: 4.201e-05 [Cycle 2]: 0.0006434, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012655 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.26e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.878e-05 [accelerated_algorithm]: 5.79e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.39998e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.52999e-06 [parallel]: 3.61999e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.00998e-06 [allreduce_fusion]: 2.98998e-06 [matmul_add_comm_reduction]: 5.99999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.82999e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.50002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.77001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.26002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.29998e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.34001e-06 [cse]: 1.375e-05 [a_3]: 3.174e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.433e-05 [convert_after_rewriter]: 7.48e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00045223 [opt_b]: 0.000182, [1] [Cycle 1]: 0.00017599, [7] [b_1]: 
0.00010757 [b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.89991e-07 [cse]: 1.658e-05 [optimize_parallel_all_gather_comm]: 1.625e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.266e-05 [loop_unroll]: 0.00041954 [opt_after_cconv]: 9.507e-05, [1] [Cycle 1]: 8.966e-05, [7] [c_1]: 2.772e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.665e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.983e-05, [1] [Cycle 1]: 6.543e-05, [4] [d_1]: 3.967e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.444e-05 [cse_after_recomputation]: 2.048e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.75001e-06 [swap_dp_allreduce_reducescatter]: 5.21002e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 4.22998e-06 [label_fine_grained_interleaved_index]: 2.85998e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.78998e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.60001e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.66999e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.764e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.66999e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 6.909e-05, [1] [Cycle 1]: 6.48e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 8.43999e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.94998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.625e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00044842 [validate]: 3.135e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0670414 [execute]: 8.32998e-06 Sums bootstrap : 0.000773s : 0.99% type_inference : 0.006768s : 8.62% event_method : 0.000015s : 0.02% auto_monad : 0.000058s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000578s : 0.74% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000052s : 0.07% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000433s : 0.55% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.58% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000420s : 0.53% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.57% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.067041s : 85.43% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.92% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.48% : 0.000006s : 4: substitution.graph_param_transform 66.25% : 0.000110s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.43% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.78% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006726 2 91.33% : 0.006143s : 1: type_inference.infer 8.67% : 0.000583s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.31% : 0.000028s : 3: replace.inline 29.69% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.37% : 0.000108s : 3: match.inline 8.63% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.87% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.54% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.53% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000000s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.61% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000360 8 47.22% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.78% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.091401 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.44% : 0.003143s : 1: add_attr 3.43% : 0.003134s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000063s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.90% : 0.000825s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.47% : 0.000428s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.04% : 0.000950s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.44% : 0.002226s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.50% : 0.000458s : 1: opt_after_jit_grad 0.20% : 0.000185s : 1: opt_b 4.47% : 0.004083s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000216s : 1: renormalize.infer 0.23% : 0.000209s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000038s : 1: rewriter_after_opt_a 0.07% : 0.000065s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 73.37% : 0.067058s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 7.42% : 0.006781s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.119769, [24] [bootstrap]: 0.00056469 [type_inference]: 0.0115937 [event_method]: 4.991e-05 [auto_monad]: 0.00012349 [graph_reusing]: 8.47e-06 [inline]: 1.97999e-06 [add_attr]: 0.00304133, [1] [add_attr_with_inline]: 0.00303222, [1] [Cycle 1]: 7.282e-05, [2] [tag_attr]: 3.604e-05 [meta_addattr_fg_expand]: 9.82001e-06 [parallel-infer-symbol]: 2.90002e-06 [pre_auto_parallel]: 4.908e-05 [insert-virtual-dataset]: 2.66999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.0134331, [53] [py_interpret_to_execute]: 3.77e-05 [rewriter_before_opt_a]: 0.00014705 [opt_a]: 0.0111317, [3] [Cycle 1]: 0.00719155, [45] [expand_dump_flag]: 3.61001e-06 [switch_simplify]: 7.494e-05 [loop_unroll]: 6.265e-05 [a_1]: 0.00147196 [with_stream_mark]: 2.408e-05 [recompute_prepare]: 2.204e-05 [updatestate_depend_eliminate]: 9.57001e-06 [updatestate_assign_eliminate]: 7.85e-06 [updatestate_loads_eliminate]: 7.13e-06 [parameter_eliminate]: 2.87002e-06 [a_2]: 0.00024501 [accelerated_algorithm]: 3.039e-05 [shard]: 2.00002e-06 [meta_shard_fg_expand]: 3.4e-06 [shard_inline]: 1.625e-05 [merge_send_recv]: 1.654e-05 [auto_parallel]: 1.053e-05 [parallel]: 2.011e-05 [flash_sp]: 1.165e-05 [merge_comm]: 9.79e-06 [allreduce_fusion]: 9.02999e-06 [matmul_add_comm_reduction]: 2.673e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 1.806e-05 [virtual_dataset]: 1.57e-05 [get_grad_eliminate_]: 1.501e-05 [virtual_output]: 1.54e-05 [merge_forward]: 9.60001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.91e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.865e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.716e-05 [set_forward_comm_id_for_comm_node_pass]: 9.32999e-06 [meta_fg_expand]: 0.00142927 [flash_sp_send_recv_attached]: 3.63e-06 [receive_attached]: 2.77002e-06 [after_resolve]: 
5.915e-05 [a_after_grad]: 8.114e-05 [renormalize]: 0.00251014 [add_forward_monad_depend]: 9.52999e-06 [auto_monad_grad]: 5.23002e-06 [auto_monad_eliminator]: 5.655e-05 [cse]: 0.00017192 [a_3]: 0.00033603 [Cycle 2]: 0.00302246, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.782e-05 [loop_unroll]: 4.404e-05 [a_1]: 0.0015484 [with_stream_mark]: 1.209e-05 [recompute_prepare]: 1.076e-05 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 4.52998e-06 [updatestate_loads_eliminate]: 3.70998e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012537 [accelerated_algorithm]: 1.188e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.29e-06 [merge_send_recv]: 6.71999e-06 [auto_parallel]: 7.06001e-06 [parallel]: 4.76002e-06 [flash_sp]: 3.25e-06 [merge_comm]: 5.24998e-06 [allreduce_fusion]: 4.63999e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.92001e-06 [virtual_dataset]: 8.70999e-06 [get_grad_eliminate_]: 9.02e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.08999e-06 [cell_reuse_recompute_pass]: 9.90025e-07 [offload_activation]: 9.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.571e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.366e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 6.792e-05 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.537e-05 [a_after_grad]: 1.431e-05 [renormalize]: 0.00059927 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 1.441e-05 [cse]: 4.755e-05 [a_3]: 6.643e-05 [Cycle 3]: 0.00090371, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.059e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00024897 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 9.35001e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 3.89997e-06 
[parameter_eliminate]: 9.09989e-07 [a_2]: 0.00012405 [accelerated_algorithm]: 1.182e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.67002e-06 [auto_parallel]: 6.88e-06 [parallel]: 4.54002e-06 [flash_sp]: 1.11002e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 5.00001e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.79003e-06 [get_grad_eliminate_]: 8.55001e-06 [virtual_output]: 8.27998e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.59e-06 [offload_activation]: 9.28002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.704e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.458e-05 [set_forward_comm_id_for_comm_node_pass]: 5.79e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.34e-05 [a_after_grad]: 1.439e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 1.063e-05 [cse]: 2.677e-05 [a_3]: 5.942e-05 [py_interpret_to_execute_after_opt_a]: 1.031e-05 [slice_cell_reuse_recomputed_activation]: 2.53003e-06 [rewriter_after_opt_a]: 4.788e-05 [convert_after_rewriter]: 8.96002e-06 [order_py_execute_after_rewriter]: 6.79999e-06 [mutable_eliminate]: 0.0004603 [opt_b]: 0.00028819, [1] [Cycle 1]: 0.00028197, [7] [b_1]: 0.00018894 [b_2]: 1.092e-05 [updatestate_depend_eliminate]: 7.13998e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 3.30008e-07 [cse]: 3.172e-05 [optimize_parallel_all_gather_comm]: 2.15e-05 [overlap_param_gather]: 1.83997e-06 [cconv]: 2.074e-05 [loop_unroll]: 0.00042516 [opt_after_cconv]: 0.0001351, [1] [Cycle 1]: 0.00012935, [7] [c_1]: 4.796e-05 [parameter_eliminate]: 2.28998e-06 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 4.13001e-06 
[updatestate_loads_eliminate]: 3.75998e-06 [cse]: 3.013e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.931e-05 [tuple_transform]: 0.00012601, [1] [Cycle 1]: 0.00012144, [4] [d_1]: 8.97e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.003e-05 [partial_unused_args_eliminate]: 2.04999e-06 [add_recomputation]: 6e-05 [cse_after_recomputation]: 3.37e-05, [1] [Cycle 1]: 2.898e-05, [1] [cse]: 2.36e-05 [environ_conv]: 9.51003e-06 [swap_dp_allreduce_reducescatter]: 8.23001e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.47e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.41998e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.31998e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.737e-05 [grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 5.00999e-06 [overlap_recompute_and_grad_model_parallel]: 5.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54998e-06 [overlap_recompute_comm]: 2.53998e-06 [overlap_grad_ring_attention]: 5.57999e-06 [overlap_grad_flash_sp]: 2.423e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.28002e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 9.883e-05, [1] [Cycle 1]: 9.464e-05, [6] [build]: 1.039e-05 [elim_shapecalc]: 1.346e-05 [elim_not_effective]: 1.807e-05 [opt_reshape]: 1.035e-05 [fold_const_symbol]: 1.483e-05 [renormalize]: 2.60014e-07 
[detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 2.598e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00047051 [validate]: 4.604e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0901216 [execute]: 8.92e-06 Sums bootstrap : 0.000565s : 0.49% type_inference : 0.011594s : 10.04% event_method : 0.000050s : 0.04% auto_monad : 0.000123s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000133s : 0.12% optimize.opt_a.loop_unroll : 0.000116s : 0.10% optimize.opt_a.a_1 : 0.003269s : 2.83% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.43% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000024s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001500s : 1.30% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003109s : 2.69% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.07% optimize.opt_a.cse : 0.000246s : 0.21% optimize.opt_a.a_3 : 0.000462s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.40% optimize.opt_b.b_1 : 0.000189s : 0.16% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000425s : 0.37% optimize.opt_after_cconv.c_1 : 0.000048s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000090s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.05% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000471s : 0.41% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.090122s : 78.04% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000766 222 5.90% : 0.000045s : 12: substitution.arithmetic_simplify 1.82% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.88% : 0.000428s : 17: substitution.inline 2.02% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000014s : 3: substitution.less_batch_normalization 1.76% : 0.000014s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000014s : 20: substitution.remove_not_recompute_node 3.25% : 0.000025s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.57% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.64% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011516 2 86.77% : 0.009993s : 1: type_inference.infer 13.23% : 0.001524s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.96% : 0.000127s : 17: replace.inline 42.04% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000453 33 92.40% : 0.000419s : 17: match.inline 7.60% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.11% : 0.000008s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.46% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000038s : 277: predicate.switch_simplify 1.10% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.59% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001652 34 57.51% : 0.000950s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.49% : 0.000702s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.144636 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.11% : 0.003046s : 1: add_attr 2.10% : 0.003036s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000130s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000601s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000057s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.32% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.41% : 0.004937s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000174s : 28: opt.transform.opt_b 0.07% : 0.000097s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 7.70% : 0.011135s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.33% : 0.000480s : 1: opt_after_jit_grad 0.20% : 0.000292s : 1: opt_b 9.29% : 0.013437s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.15% : 0.001667s : 2: renormalize.infer 0.99% : 0.001429s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000101s : 1: symbol_engine_optimizer 62.32% : 0.090137s : 1: task_emit 0.09% : 0.000129s : 1: 
tuple_transform 8.03% : 0.011608s : 1: type_inference 0.05% : 0.000072s : 1: validate TotalTime = 0.0784425, [24] [bootstrap]: 0.00048869 [type_inference]: 0.00440252 [event_method]: 1.094e-05 [auto_monad]: 5.294e-05 [graph_reusing]: 5.57999e-06 [inline]: 1.70001e-06 [add_attr]: 0.0030022, [1] [add_attr_with_inline]: 0.00299454, [1] [Cycle 1]: 4.72e-05, [2] [tag_attr]: 1.198e-05 [meta_addattr_fg_expand]: 3.34001e-06 [parallel-infer-symbol]: 3.60998e-06 [pre_auto_parallel]: 2.104e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00371406, [53] [py_interpret_to_execute]: 1.539e-05 [rewriter_before_opt_a]: 3.815e-05 [opt_a]: 0.00191024, [2] [Cycle 1]: 0.0013088, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 2.471e-05 [loop_unroll]: 1.372e-05 [a_1]: 0.00029786 [with_stream_mark]: 1.371e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.75e-05 [accelerated_algorithm]: 6.44999e-06 [shard]: 2.63998e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 8.15999e-06 [auto_parallel]: 5.89e-06 [parallel]: 1.973e-05 [flash_sp]: 7.39002e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.05001e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 6.72002e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 9.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 9.16002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.16998e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 
2.68e-06 [after_resolve]: 1.123e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00038184 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 1.64998e-06 [auto_monad_eliminator]: 1.379e-05 [cse]: 2.825e-05 [a_3]: 4.1e-05 [Cycle 2]: 0.0005923, [45] [expand_dump_flag]: 8.40024e-07 [switch_simplify]: 6.54999e-06 [loop_unroll]: 5.48002e-06 [a_1]: 0.00012597 [with_stream_mark]: 1.082e-05 [recompute_prepare]: 5.67999e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.801e-05 [accelerated_algorithm]: 5.61998e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.56002e-06 [flash_sp]: 3.03998e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.54999e-06 [matmul_add_comm_reduction]: 5.69e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 5.22e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 6.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72999e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.83999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00998e-06 [meta_fg_expand]: 1.66998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 8.57e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 7.59988e-07 [auto_monad_eliminator]: 5.84e-06 [cse]: 1.311e-05 [a_3]: 3.138e-05 [py_interpret_to_execute_after_opt_a]: 7.27002e-06 [slice_cell_reuse_recomputed_activation]: 2.37999e-06 [rewriter_after_opt_a]: 3.165e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00044606 [opt_b]: 0.00018094, [1] [Cycle 1]: 0.00017455, 
[7] [b_1]: 0.00010733 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 3.9002e-07 [cse]: 1.574e-05 [optimize_parallel_all_gather_comm]: 1.604e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.197e-05 [loop_unroll]: 0.00041389 [opt_after_cconv]: 9.544e-05, [1] [Cycle 1]: 9.007e-05, [7] [c_1]: 2.829e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.638e-05 [renormalize]: 2.70025e-07 [remove_dup_value]: 1.315e-05 [tuple_transform]: 6.863e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.876e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.02999e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.321e-05 [cse_after_recomputation]: 2.032e-05, [1] [Cycle 1]: 1.611e-05, [1] [cse]: 1.099e-05 [environ_conv]: 4.4e-06 [swap_dp_allreduce_reducescatter]: 5.46998e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.42001e-06 [overlap_grad_ring_attention]: 4.33999e-06 [overlap_grad_flash_sp]: 1.785e-05 [begin_end_overlap_inline]: 8.70001e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.776e-05, [1] [Cycle 1]: 6.361e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 5.86e-06 [fold_const_symbol]: 8.68001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.534e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00044833 [validate]: 3.033e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0660228 [execute]: 7.8e-06 Sums bootstrap : 0.000489s : 0.66% type_inference : 0.004403s : 5.91% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000424s : 0.57% optimize.opt_a.with_stream_mark : 0.000025s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000382s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.60% optimize.opt_b.b_1 : 0.000107s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000414s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000448s : 0.60% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066023s : 88.65% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000124 26 17.74% : 0.000022s : 4: substitution.arithmetic_simplify 1.67% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.27% : 0.000005s : 4: substitution.graph_param_transform 66.03% : 0.000082s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.53% : 0.000004s : 4: substitution.remove_not_recompute_node 3.47% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004360 2 91.79% : 0.004002s : 1: type_inference.infer 8.21% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.96% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.00% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.91% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.27% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.84% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 42.03% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.97% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.086470 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.48% : 0.003006s : 1: add_attr 3.47% : 0.002998s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.61% : 0.000526s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000455s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.90% : 0.000777s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.21% : 0.001913s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.53% : 0.000458s : 1: opt_after_jit_grad 0.21% : 0.000184s : 1: opt_b 4.30% : 0.003718s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.22% : 0.000188s : 1: renormalize.infer 0.22% : 0.000187s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.37% : 0.066038s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 5.11% : 0.004416s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.120805, [24] [bootstrap]: 0.00056653 [type_inference]: 0.0108428 [event_method]: 4.518e-05 [auto_monad]: 0.00011896 [graph_reusing]: 8.40001e-06 [inline]: 1.96e-06 [add_attr]: 0.00316831, [1] [add_attr_with_inline]: 0.00315995, [1] [Cycle 1]: 6.852e-05, [2] [tag_attr]: 3.178e-05 [meta_addattr_fg_expand]: 8.53001e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 4.652e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.013401, [53] [py_interpret_to_execute]: 3.784e-05 [rewriter_before_opt_a]: 0.0001303 [opt_a]: 0.011108, [3] [Cycle 1]: 0.00715178, [45] [expand_dump_flag]: 3.84002e-06 [switch_simplify]: 7.213e-05 [loop_unroll]: 5.623e-05 [a_1]: 0.00139249 [with_stream_mark]: 2.429e-05 [recompute_prepare]: 2.148e-05 [updatestate_depend_eliminate]: 9.10001e-06 [updatestate_assign_eliminate]: 7.95e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 3.01999e-06 [a_2]: 0.00024779 [accelerated_algorithm]: 3.105e-05 [shard]: 1.96e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.637e-05 [merge_send_recv]: 1.639e-05 [auto_parallel]: 1.045e-05 [parallel]: 2.067e-05 [flash_sp]: 1.125e-05 [merge_comm]: 1.01e-05 [allreduce_fusion]: 8.90999e-06 [matmul_add_comm_reduction]: 2.699e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.775e-05 [virtual_dataset]: 1.585e-05 [get_grad_eliminate_]: 1.532e-05 [virtual_output]: 1.531e-05 [merge_forward]: 9.28002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.863e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.894e-05 [merge_recompute_call_nodes]: 1.79e-06 [before_grad]: 2.762e-05 [set_forward_comm_id_for_comm_node_pass]: 9.31998e-06 [meta_fg_expand]: 0.00140213 [flash_sp_send_recv_attached]: 3.64002e-06 [receive_attached]: 2.58003e-06 [after_resolve]: 
6.023e-05 [a_after_grad]: 8.196e-05 [renormalize]: 0.00257394 [add_forward_monad_depend]: 9.17999e-06 [auto_monad_grad]: 5.42001e-06 [auto_monad_eliminator]: 5.624e-05 [cse]: 0.00017516 [a_3]: 0.00033658 [Cycle 2]: 0.00303395, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.673e-05 [loop_unroll]: 4.482e-05 [a_1]: 0.00154897 [with_stream_mark]: 1.2e-05 [recompute_prepare]: 1.079e-05 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.12999e-06 [a_2]: 0.00012661 [accelerated_algorithm]: 1.222e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 9.34e-06 [merge_send_recv]: 7.15e-06 [auto_parallel]: 7.10002e-06 [parallel]: 4.79998e-06 [flash_sp]: 3.13e-06 [merge_comm]: 6.21e-06 [allreduce_fusion]: 5.04e-06 [matmul_add_comm_reduction]: 8.1e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.014e-05 [virtual_dataset]: 2.846e-05 [get_grad_eliminate_]: 9.27001e-06 [virtual_output]: 8.55999e-06 [merge_forward]: 4.53001e-06 [cell_reuse_recompute_pass]: 9.00007e-07 [offload_activation]: 9.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.657e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.405e-05 [set_forward_comm_id_for_comm_node_pass]: 5.52999e-06 [meta_fg_expand]: 3.51e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.14003e-06 [after_resolve]: 1.455e-05 [a_after_grad]: 1.425e-05 [renormalize]: 0.00061191 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.477e-05 [cse]: 4.768e-05 [a_3]: 6.519e-05 [Cycle 3]: 0.00090885, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 1.048e-05 [loop_unroll]: 9.15999e-06 [a_1]: 0.00025279 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 9.40001e-06 [updatestate_depend_eliminate]: 4.74998e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.63999e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012384 [accelerated_algorithm]: 1.157e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 8.99e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 6.89001e-06 [parallel]: 4.50001e-06 [flash_sp]: 9.80013e-07 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.85999e-06 [matmul_add_comm_reduction]: 7.7e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.85001e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.16001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.736e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.502e-05 [set_forward_comm_id_for_comm_node_pass]: 5.79e-06 [meta_fg_expand]: 2.93003e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.336e-05 [a_after_grad]: 1.412e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 1.032e-05 [cse]: 2.751e-05 [a_3]: 5.982e-05 [py_interpret_to_execute_after_opt_a]: 1.044e-05 [slice_cell_reuse_recomputed_activation]: 2.19999e-06 [rewriter_after_opt_a]: 4.902e-05 [convert_after_rewriter]: 9.33002e-06 [order_py_execute_after_rewriter]: 7.47998e-06 [mutable_eliminate]: 0.00046564 [opt_b]: 0.00029144, [1] [Cycle 1]: 0.00028528, [7] [b_1]: 0.0001913 [b_2]: 1.098e-05 [updatestate_depend_eliminate]: 7.01999e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 3.83001e-06 [renormalize]: 4.40021e-07 [cse]: 3.278e-05 [optimize_parallel_all_gather_comm]: 2.097e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.031e-05 [loop_unroll]: 0.00042868 [opt_after_cconv]: 0.00013988, [1] [Cycle 1]: 0.00013393, [7] [c_1]: 4.948e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.25003e-06 [updatestate_assign_eliminate]: 4.18001e-06 
[updatestate_loads_eliminate]: 3.78999e-06 [cse]: 3.159e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.925e-05 [tuple_transform]: 0.00010314, [1] [Cycle 1]: 9.834e-05, [4] [d_1]: 6.789e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 5.797e-05 [cse_after_recomputation]: 3.232e-05, [1] [Cycle 1]: 2.763e-05, [1] [cse]: 2.233e-05 [environ_conv]: 9.22001e-06 [swap_dp_allreduce_reducescatter]: 7.51999e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.73003e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.74e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 5.24e-06 [overlap_recompute_and_grad_model_parallel]: 5.46998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.62001e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 5.62001e-06 [overlap_grad_flash_sp]: 2.454e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 0.00011605, [1] [Cycle 1]: 0.00011145, [6] [build]: 1.011e-05 [elim_shapecalc]: 1.291e-05 [elim_not_effective]: 1.817e-05 [opt_reshape]: 2.536e-05 [fold_const_symbol]: 1.593e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 2.04999e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 2.611e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.93001e-06 [opt_after_jit_grad]: 0.00047462 [validate]: 4.619e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.0918096 [execute]: 8.29002e-06 Sums bootstrap : 0.000567s : 0.49% type_inference : 0.010843s : 9.32% event_method : 0.000045s : 0.04% auto_monad : 0.000119s : 0.10% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000130s : 0.11% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000129s : 0.11% optimize.opt_a.loop_unroll : 0.000110s : 0.09% optimize.opt_a.a_1 : 0.003194s : 2.75% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000498s : 0.43% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000024s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm 
: 0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000053s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001440s : 1.24% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.09% optimize.opt_a.renormalize : 0.003186s : 2.74% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.07% optimize.opt_a.cse : 0.000250s : 0.22% optimize.opt_a.a_3 : 0.000462s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000466s : 0.40% optimize.opt_b.b_1 : 0.000191s : 0.16% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000429s : 0.37% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000025s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.41% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.091810s : 78.90% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000752 218 5.72% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.63% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.34% : 0.000003s : 2: substitution.incorporate_call_switch 55.62% : 0.000418s : 16: substitution.inline 2.06% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.84% : 0.000014s : 20: substitution.remove_not_recompute_node 3.25% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.70% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.32% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010770 2 83.00% : 0.008939s : 1: type_inference.infer 17.00% : 0.001830s : 1: type_inference.specialize ------[replace.] 0.000207 30 58.89% : 0.000122s : 16: replace.inline 41.11% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000441 30 93.00% : 0.000410s : 16: match.inline 7.00% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000741 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.14% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.32% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.10% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.56% : 0.000041s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.61% : 0.000019s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.69% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.52% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001944 32 44.24% : 0.000860s : 12: func_graph_cloner_run.FuncGraphClonerGraph 55.76% : 0.001084s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.145782 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.18% : 0.003173s : 1: add_attr 2.17% : 0.003164s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000126s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000606s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000052s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000475s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.35% : 0.004879s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000176s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000069s : 4: opt.transform.symbol_engine_opt 7.62% : 0.011111s : 1: opt_a 0.10% : 0.000143s : 1: opt_after_cconv 0.33% : 0.000484s : 1: opt_after_jit_grad 0.20% : 0.000295s : 1: opt_b 9.20% : 0.013405s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.18% : 0.001727s : 2: renormalize.infer 0.99% : 0.001446s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000053s : 1: rewriter_after_opt_a 0.09% : 0.000135s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000119s : 1: symbol_engine_optimizer 62.99% : 0.091825s : 1: task_emit 0.07% : 0.000106s : 1: 
tuple_transform 7.45% : 0.010861s : 1: type_inference 0.05% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x5-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x6-pynative],max_mem:10.0M TotalTime = 0.0218607, [24] [bootstrap]: 0.0005509 [type_inference]: 0.00627892 [event_method]: 1.444e-05 [auto_monad]: 5.76e-05 [graph_reusing]: 5.34e-06 [inline]: 1.76e-06 [add_attr]: 0.00335626, [1] [add_attr_with_inline]: 0.00334502, [1] [Cycle 1]: 4.539e-05, [2] [tag_attr]: 1.567e-05 [meta_addattr_fg_expand]: 4.78001e-06 [parallel-infer-symbol]: 3.09001e-06 [pre_auto_parallel]: 4.394e-05 [insert-virtual-dataset]: 2.76999e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 2.34001e-06 [pipeline_split]: 1.91998e-06 [optimize]: 0.00400391, [53] [py_interpret_to_execute]: 2.015e-05 [rewriter_before_opt_a]: 5.882e-05 [opt_a]: 0.00212507, [2] [Cycle 1]: 0.00152251, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 3.193e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00045858 [with_stream_mark]: 1.36e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.83999e-06 [updatestate_assign_eliminate]: 3.68999e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.686e-05 [accelerated_algorithm]: 6.66999e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 8.15999e-06 [auto_parallel]: 5.97999e-06 [parallel]: 2.395e-05 [flash_sp]: 7.94997e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.16998e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.14001e-06 [virtual_dataset]: 
6.19999e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.77002e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.41e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 9.41998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42997e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.88e-06 [after_resolve]: 1.016e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.00041453 [add_forward_monad_depend]: 4.53001e-06 [auto_monad_grad]: 2.08002e-06 [auto_monad_eliminator]: 1.4e-05 [cse]: 2.71e-05 [a_3]: 4.083e-05 [Cycle 2]: 0.00059289, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.62999e-06 [a_1]: 0.00012554 [with_stream_mark]: 9.26002e-06 [recompute_prepare]: 6.19999e-06 [updatestate_depend_eliminate]: 2.90002e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.803e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 4.44002e-06 [auto_parallel]: 5.07999e-06 [parallel]: 4.12998e-06 [flash_sp]: 3.8e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.17999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 5.26998e-06 [merge_forward]: 2.77002e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56998e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.74001e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 6.50005e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.75999e-06 [a_after_grad]: 7.95e-06 [renormalize]: 
7.99773e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 5.99e-06 [cse]: 1.316e-05 [a_3]: 3.144e-05 [py_interpret_to_execute_after_opt_a]: 7.05e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 2.966e-05 [convert_after_rewriter]: 7.35e-06 [order_py_execute_after_rewriter]: 5.54e-06 [mutable_eliminate]: 0.00044737 [opt_b]: 0.00018045, [1] [Cycle 1]: 0.0001743, [7] [b_1]: 0.000107 [b_2]: 7.08998e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 4.89992e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.629e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.337e-05 [loop_unroll]: 0.00041238 [opt_after_cconv]: 9.387e-05, [1] [Cycle 1]: 8.838e-05, [7] [c_1]: 2.74e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.624e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.319e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.462e-05, [4] [d_1]: 3.884e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 5.143e-05 [cse_after_recomputation]: 2.051e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.055e-05 [environ_conv]: 4.67998e-06 [swap_dp_allreduce_reducescatter]: 4.93001e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 1.04998e-06 [remove_cast_before_assign_add]: 1.50999e-06 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.02998e-06 
[add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.58999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.712e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 0.00010845, [1] [Cycle 1]: 0.00010397, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.55001e-06 [elim_not_effective]: 1.186e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.628e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 0.00013161 [opt_after_jit_grad]: 0.00045087 [validate]: 3.071e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00669412 [execute]: 7e-06 Sums bootstrap : 0.000551s : 3.15% type_inference : 0.006279s : 35.88% event_method : 0.000014s : 0.08% auto_monad : 0.000058s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000044s : 0.25% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000584s : 3.34% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000415s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000040s : 0.23% optimize.opt_a.a_3 : 0.000072s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.56% optimize.opt_b.b_1 : 0.000107s : 0.61% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000412s : 2.36% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000002s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000132s : 0.75% opt_after_jit_grad : 0.000451s : 2.58% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006694s : 38.25% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000170 30 14.71% : 0.000025s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000005s : 4: substitution.graph_param_transform 66.87% : 0.000114s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000005s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 6.73% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006233 2 89.63% : 0.005587s : 1: type_inference.infer 10.37% : 0.000646s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.33% : 0.000028s : 3: replace.inline 28.67% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.47% : 0.000112s : 3: match.inline 8.53% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.61% : 0.000001s : 4: predicate.parallel_virtual_node 1.79% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.27% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.17% : 0.000008s : 54: predicate.switch_simplify 0.94% : 
0.000001s : 11: predicate.tile_eliminate 0.93% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 46.48% : 0.000178s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.52% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030735 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.93% : 0.003361s : 1: add_attr 10.90% : 0.003349s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000063s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000589s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.09% : 0.000950s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.92% : 0.002128s : 1: opt_a 0.32% : 0.000097s : 1: opt_after_cconv 1.50% : 0.000461s : 1: opt_after_jit_grad 0.60% : 0.000184s : 1: opt_b 13.04% : 0.004008s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.16% : 0.000050s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.69% : 0.000212s : 1: renormalize.infer 0.64% : 0.000196s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000137s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.36% : 0.000111s : 1: symbol_engine_optimizer 21.81% : 0.006705s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.47% : 0.006292s : 1: type_inference 0.19% : 0.000058s : 1: validate TotalTime = 0.0183845, [24] [bootstrap]: 0.00049329 [type_inference]: 0.00445962 [event_method]: 1.061e-05 [auto_monad]: 5.193e-05 [graph_reusing]: 5.04e-06 [inline]: 1.52999e-06 [add_attr]: 0.00296759, [1] [add_attr_with_inline]: 0.00296001, [1] [Cycle 1]: 4.733e-05, [2] [tag_attr]: 1.214e-05 [meta_addattr_fg_expand]: 4.14002e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.117e-05 [insert-virtual-dataset]: 2.83e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0036943, [53] [py_interpret_to_execute]: 1.541e-05 [rewriter_before_opt_a]: 3.93e-05 [opt_a]: 0.00188607, [2] [Cycle 1]: 0.00128925, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 2.469e-05 [loop_unroll]: 1.392e-05 [a_1]: 0.00029516 [with_stream_mark]: 3.99e-05 [recompute_prepare]: 8.02998e-06 [updatestate_depend_eliminate]: 3.80998e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 7.788e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 7.70998e-06 [auto_parallel]: 5.61e-06 [parallel]: 1.941e-05 [flash_sp]: 7.8e-06 [merge_comm]: 3.67002e-06 [allreduce_fusion]: 3.85998e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.51001e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.79999e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 9.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.081e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 9.43997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.52002e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 3.23e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 1.029e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.00033985 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.274e-05 [cse]: 2.762e-05 [a_3]: 3.949e-05 [Cycle 2]: 0.00058739, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012468 [with_stream_mark]: 1.083e-05 [recompute_prepare]: 5.40999e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.43002e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.676e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.02999e-06 [parallel]: 4.24997e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.51e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 6.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27999e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.07003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.61002e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.07999e-06 [a_after_grad]: 8.37e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 5.75001e-06 [cse]: 1.28e-05 [a_3]: 3.138e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.266e-05 [convert_after_rewriter]: 7.43999e-06 [order_py_execute_after_rewriter]: 5.45001e-06 [mutable_eliminate]: 0.00044847 [opt_b]: 0.00018166, [1] [Cycle 1]: 0.00017551, 
[7] [b_1]: 0.00010838 [b_2]: 7.22002e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 2.89991e-07 [cse]: 1.557e-05 [optimize_parallel_all_gather_comm]: 1.675e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.211e-05 [loop_unroll]: 0.00041114 [opt_after_cconv]: 9.331e-05, [1] [Cycle 1]: 8.769e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.18998e-06 [cse]: 1.538e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.305e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.46e-05, [4] [d_1]: 3.91e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.05002e-06 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 4.552e-05 [cse_after_recomputation]: 1.964e-05, [1] [Cycle 1]: 1.529e-05, [1] [cse]: 1.022e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.02e-06 [label_fine_grained_interleaved_index]: 2.63998e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.23998e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 1.12e-06 [remove_cast_before_assign_add]: 1.45999e-06 [full_micro_interleaved_order_control]: 2.49999e-06 [reorder_send_recv_between_fp_bp]: 2.70002e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.54e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.72999e-06 [overlap_recompute_comm]: 2.45002e-06 [overlap_grad_ring_attention]: 4.3e-06 [overlap_grad_flash_sp]: 1.695e-05 [begin_end_overlap_inline]: 8.40024e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.63997e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 6.88e-05, [1] [Cycle 1]: 6.47e-05, [6] [build]: 2.02999e-06 [elim_shapecalc]: 8.40999e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 9.15001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.64e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.46999e-06 [opt_after_jit_grad]: 0.0004494 [validate]: 3.171e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.00596006 [execute]: 7.06001e-06 Sums bootstrap : 0.000493s : 3.41% type_inference : 0.004460s : 30.83% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000420s : 2.90% optimize.opt_a.with_stream_mark : 0.000051s : 0.35% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.10% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000411s : 2.84% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.11% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005960s : 41.21% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000124 26 18.66% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.21% : 0.000002s : 2: substitution.fold_const_symbol 4.71% : 0.000006s : 4: substitution.graph_param_transform 65.14% : 0.000081s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004418 2 92.22% : 0.004074s : 1: type_inference.infer 7.78% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.33% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 2.12% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.96% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.63% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 43.92% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.08% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026317 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.29% : 0.002972s : 1: add_attr 11.26% : 0.002963s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.02% : 0.000532s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.93% : 0.000772s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.18% : 0.001889s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000459s : 1: opt_after_jit_grad 0.70% : 0.000185s : 1: opt_b 14.05% : 0.003698s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.72% : 0.000189s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000037s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.68% : 0.005970s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 17.00% : 0.004473s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0197038, [24] [bootstrap]: 0.00048159 [type_inference]: 0.00555952 [event_method]: 1.4e-05 [auto_monad]: 5.634e-05 [graph_reusing]: 6.06998e-06 [inline]: 1.93002e-06 [add_attr]: 0.00296986, [1] [add_attr_with_inline]: 0.00296148, [1] [Cycle 1]: 4.588e-05, [2] [tag_attr]: 1.512e-05 [meta_addattr_fg_expand]: 4.4e-06 [parallel-infer-symbol]: 2.78998e-06 [pre_auto_parallel]: 2.573e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.37001e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00394306, [53] [py_interpret_to_execute]: 1.903e-05 [rewriter_before_opt_a]: 5.751e-05 [opt_a]: 0.00212214, [2] [Cycle 1]: 0.00149762, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 3.319e-05 [loop_unroll]: 2.037e-05 [a_1]: 0.00045054 [with_stream_mark]: 1.325e-05 [recompute_prepare]: 7.97e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.40003e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.668e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 5.81e-06 [parallel]: 1.872e-05 [flash_sp]: 7.37997e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.59002e-06 [matmul_add_comm_reduction]: 9.57999e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.43e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.064e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.57997e-06 [meta_fg_expand]: 2.28998e-06 [flash_sp_send_recv_attached]: 2.78e-06 [receive_attached]: 2.27999e-06 
[after_resolve]: 1.036e-05 [a_after_grad]: 8.54e-06 [renormalize]: 0.00040282 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.32e-05 [cse]: 2.854e-05 [a_3]: 3.999e-05 [Cycle 2]: 0.00061534, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012504 [with_stream_mark]: 9.60001e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.736e-05 [accelerated_algorithm]: 5.74999e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.11002e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.16999e-06 [merge_comm]: 3.05998e-06 [allreduce_fusion]: 3.06001e-06 [matmul_add_comm_reduction]: 5.55001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.18998e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.89001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.009e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 8.24002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.29001e-06 [cse]: 1.43e-05 [a_3]: 3.169e-05 [py_interpret_to_execute_after_opt_a]: 7.60998e-06 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 3.137e-05 [convert_after_rewriter]: 6.93998e-06 [order_py_execute_after_rewriter]: 4.92e-06 [mutable_eliminate]: 0.00044354 [opt_b]: 0.00018247, [1] [Cycle 1]: 0.00017655, 
[7] [b_1]: 0.00010987 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 3.19997e-07 [cse]: 1.629e-05 [optimize_parallel_all_gather_comm]: 1.556e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00041233 [opt_after_cconv]: 9.501e-05, [1] [Cycle 1]: 8.927e-05, [7] [c_1]: 2.754e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.624e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.306e-05 [tuple_transform]: 6.859e-05, [1] [Cycle 1]: 6.412e-05, [4] [d_1]: 3.906e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.01998e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 4.534e-05 [cse_after_recomputation]: 2.004e-05, [1] [Cycle 1]: 1.582e-05, [1] [cse]: 1.09e-05 [environ_conv]: 4.35999e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 3.06001e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.46998e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 3.67998e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.24999e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.729e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 1.22999e-06 [symbol_engine_optimizer]: 6.695e-05, [1] [Cycle 1]: 6.292e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 8.45001e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.69998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.565e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00044679 [validate]: 3.014e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.005933 [execute]: 7.1e-06 Sums bootstrap : 0.000482s : 3.06% type_inference : 0.005560s : 35.27% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000576s : 3.65% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000403s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.81% optimize.opt_b.b_1 : 0.000110s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 2.83% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005933s : 37.64% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000166 30 15.10% : 0.000025s : 5: substitution.arithmetic_simplify 1.18% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000006s : 4: substitution.graph_param_transform 66.41% : 0.000110s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.47% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005517 2 90.05% : 0.004968s : 1: type_inference.infer 9.95% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.04% : 0.000027s : 3: replace.inline 29.96% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.76% : 0.000108s : 3: match.inline 8.24% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000002s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000002s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.08% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.64% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 47.24% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.76% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028119 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.58% : 0.002974s : 1: add_attr 10.55% : 0.002965s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.84% : 0.000518s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.61% : 0.000452s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.35% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.56% : 0.002125s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.62% : 0.000457s : 1: opt_after_jit_grad 0.66% : 0.000186s : 1: opt_b 14.04% : 0.003947s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000208s : 1: renormalize.infer 0.67% : 0.000188s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.14% : 0.005944s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.82% : 0.005573s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0381383, [24] [bootstrap]: 0.00053003 [type_inference]: 0.01151 [event_method]: 4.708e-05 [auto_monad]: 0.00012149 [graph_reusing]: 7.86001e-06 [inline]: 2.06e-06 [add_attr]: 0.00300991, [1] [add_attr_with_inline]: 0.00300117, [1] [Cycle 1]: 7.187e-05, [2] [tag_attr]: 3.514e-05 [meta_addattr_fg_expand]: 9.32001e-06 [parallel-infer-symbol]: 2.96999e-06 [pre_auto_parallel]: 5.087e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 2.34001e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.0132626, [53] [py_interpret_to_execute]: 3.853e-05 [rewriter_before_opt_a]: 0.00014549 [opt_a]: 0.011003, [3] [Cycle 1]: 0.00703807, [45] [expand_dump_flag]: 4.38999e-06 [switch_simplify]: 7.402e-05 [loop_unroll]: 6.088e-05 [a_1]: 0.00143732 [with_stream_mark]: 2.346e-05 [recompute_prepare]: 2.166e-05 [updatestate_depend_eliminate]: 9.32999e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.3e-06 [parameter_eliminate]: 2.72001e-06 [a_2]: 0.00024335 [accelerated_algorithm]: 3.149e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.26999e-06 [shard_inline]: 1.614e-05 [merge_send_recv]: 1.661e-05 [auto_parallel]: 1.109e-05 [parallel]: 1.939e-05 [flash_sp]: 1.187e-05 [merge_comm]: 9.72001e-06 [allreduce_fusion]: 9.02999e-06 [matmul_add_comm_reduction]: 2.679e-05 [allreduce_slice_to_reducescatter]: 7.40023e-07 [virtual_shard_identity]: 1.756e-05 [virtual_dataset]: 1.553e-05 [get_grad_eliminate_]: 1.5e-05 [virtual_output]: 1.489e-05 [merge_forward]: 1.042e-05 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 1.827e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.92e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.71e-05 [set_forward_comm_id_for_comm_node_pass]: 9.81998e-06 [meta_fg_expand]: 0.0013964 [flash_sp_send_recv_attached]: 4.12e-06 [receive_attached]: 2.02001e-06 [after_resolve]: 
5.907e-05 [a_after_grad]: 8.133e-05 [renormalize]: 0.00241863 [add_forward_monad_depend]: 9.07001e-06 [auto_monad_grad]: 5.25001e-06 [auto_monad_eliminator]: 5.591e-05 [cse]: 0.00016578 [a_3]: 0.00033621 [Cycle 2]: 0.00298871, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 4.732e-05 [loop_unroll]: 4.408e-05 [a_1]: 0.00152437 [with_stream_mark]: 1.185e-05 [recompute_prepare]: 1.094e-05 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 4.20999e-06 [updatestate_loads_eliminate]: 3.76001e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012595 [accelerated_algorithm]: 1.169e-05 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.36998e-06 [merge_send_recv]: 6.56e-06 [auto_parallel]: 7.17002e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.81001e-06 [merge_comm]: 5.93002e-06 [allreduce_fusion]: 4.98001e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 1.046e-05 [virtual_dataset]: 8.82999e-06 [get_grad_eliminate_]: 8.48001e-06 [virtual_output]: 8.45999e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 8.09989e-07 [offload_activation]: 8.90999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.669e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.455e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35999e-06 [meta_fg_expand]: 6.854e-05 [flash_sp_send_recv_attached]: 1.01002e-06 [receive_attached]: 1.14e-06 [after_resolve]: 1.58e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00058591 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 1.378e-05 [cse]: 4.546e-05 [a_3]: 6.546e-05 [Cycle 3]: 0.00096247, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.029e-05 [loop_unroll]: 8.89e-06 [a_1]: 0.00031191 [with_stream_mark]: 1.043e-05 [recompute_prepare]: 9.31e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 4.02998e-06 [updatestate_loads_eliminate]: 3.79002e-06 
[parameter_eliminate]: 8.30012e-07 [a_2]: 0.00012445 [accelerated_algorithm]: 1.161e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 8.94e-06 [merge_send_recv]: 6.90002e-06 [auto_parallel]: 7.02002e-06 [parallel]: 4.58999e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.87e-06 [allreduce_fusion]: 4.88001e-06 [matmul_add_comm_reduction]: 7.45e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.021e-05 [virtual_dataset]: 8.55999e-06 [get_grad_eliminate_]: 8.40999e-06 [virtual_output]: 8.20999e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.561e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.379e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.444e-05 [a_after_grad]: 1.439e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.21997e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 1.076e-05 [cse]: 2.614e-05 [a_3]: 5.895e-05 [py_interpret_to_execute_after_opt_a]: 9.67999e-06 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 4.707e-05 [convert_after_rewriter]: 8.94003e-06 [order_py_execute_after_rewriter]: 6.61999e-06 [mutable_eliminate]: 0.00045714 [opt_b]: 0.00028656, [1] [Cycle 1]: 0.00028048, [7] [b_1]: 0.00018956 [b_2]: 1.065e-05 [updatestate_depend_eliminate]: 6.98998e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.83999e-06 [renormalize]: 3.19997e-07 [cse]: 3.04e-05 [optimize_parallel_all_gather_comm]: 2.056e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.105e-05 [loop_unroll]: 0.00042344 [opt_after_cconv]: 0.00013542, [1] [Cycle 1]: 0.00012955, [7] [c_1]: 4.879e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 7.27002e-06 [updatestate_assign_eliminate]: 4.32998e-06 
[updatestate_loads_eliminate]: 3.9e-06 [cse]: 2.953e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.919e-05 [tuple_transform]: 0.00010087, [1] [Cycle 1]: 9.62e-05, [4] [d_1]: 6.654e-05 [none_parameter_eliminate]: 1.68997e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.56998e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 5.891e-05 [cse_after_recomputation]: 3.174e-05, [1] [Cycle 1]: 2.712e-05, [1] [cse]: 2.145e-05 [environ_conv]: 8.55999e-06 [swap_dp_allreduce_reducescatter]: 7.93999e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.65999e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.28002e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 2.03002e-06 [control_data_broadcast_order]: 1.766e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 5.18002e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 5.14998e-06 [overlap_grad_flash_sp]: 2.43e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.61e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 9.676e-05, [1] [Cycle 1]: 9.261e-05, [6] [build]: 9.53002e-06 [elim_shapecalc]: 1.286e-05 [elim_not_effective]: 1.819e-05 [opt_reshape]: 1.005e-05 [fold_const_symbol]: 1.496e-05 [renormalize]: 
1.99972e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.77001e-06 [auto_monad_reorder]: 2.505e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00047003 [validate]: 4.404e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.00881626 [execute]: 7.93999e-06 Sums bootstrap : 0.000530s : 1.57% type_inference : 0.011510s : 34.00% event_method : 0.000047s : 0.14% auto_monad : 0.000121s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000145s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003274s : 9.67% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.08% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 
0.000021s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.09% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001468s : 4.34% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.26% optimize.opt_a.a_after_grad : 0.000110s : 0.32% optimize.opt_a.renormalize : 0.003005s : 8.87% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000237s : 0.70% optimize.opt_a.a_3 : 0.000461s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000457s : 1.35% optimize.opt_b.b_1 : 0.000190s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000423s : 1.25% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.39% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008816s : 26.04% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000759 222 5.93% : 0.000045s : 12: substitution.arithmetic_simplify 1.98% : 0.000015s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 55.31% : 0.000420s : 17: substitution.inline 2.04% : 0.000015s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000014s : 20: substitution.remove_not_recompute_node 3.18% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.63% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011433 2 87.26% : 0.009977s : 1: type_inference.infer 12.74% : 0.001456s : 1: type_inference.specialize ------[replace.] 0.000217 33 57.40% : 0.000125s : 17: replace.inline 42.60% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.33% : 0.000411s : 17: match.inline 7.67% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.12% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.46% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.15% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001543 34 57.44% : 0.000886s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.56% : 0.000657s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062678 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.81% : 0.003015s : 1: add_attr 4.79% : 0.003005s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.91% : 0.000568s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.69% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.88% : 0.004936s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.56% : 0.011006s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.76% : 0.000479s : 1: opt_after_jit_grad 0.46% : 0.000290s : 1: opt_b 21.17% : 0.013267s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.55% : 0.001601s : 2: renormalize.infer 2.22% : 0.001391s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000099s : 1: symbol_engine_optimizer 14.09% : 0.008829s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.39% : 0.011525s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0186196, [24] [bootstrap]: 0.00049641 [type_inference]: 0.00435718 [event_method]: 1.036e-05 [auto_monad]: 5.353e-05 [graph_reusing]: 5.65001e-06 [inline]: 1.96003e-06 [add_attr]: 0.00301307, [1] [add_attr_with_inline]: 0.00300532, [1] [Cycle 1]: 4.583e-05, [2] [tag_attr]: 1.242e-05 [meta_addattr_fg_expand]: 3.36999e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.125e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 6.99976e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.00369234, [53] [py_interpret_to_execute]: 1.525e-05 [rewriter_before_opt_a]: 3.965e-05 [opt_a]: 0.00186119, [2] [Cycle 1]: 0.00126469, [45] [expand_dump_flag]: 2.68998e-06 [switch_simplify]: 2.474e-05 [loop_unroll]: 1.385e-05 [a_1]: 0.00029744 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.25998e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.817e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.19002e-06 [auto_parallel]: 5.60001e-06 [parallel]: 1.943e-05 [flash_sp]: 7.35998e-06 [merge_comm]: 3.68999e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.2e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.55003e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.55001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.36998e-06 [receive_attached]: 2.11998e-06 
[after_resolve]: 1.041e-05 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00034263 [add_forward_monad_depend]: 4.85999e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.36e-05 [cse]: 2.743e-05 [a_3]: 3.956e-05 [Cycle 2]: 0.00058769, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012488 [with_stream_mark]: 9.27999e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.783e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 9.90025e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.66003e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 4.94e-06 [parallel]: 4.4e-06 [flash_sp]: 3.40003e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.27999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.14998e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 6.05002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 1.99e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.73001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.389e-05 [a_3]: 3.122e-05 [py_interpret_to_execute_after_opt_a]: 7.13e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.17e-05 [convert_after_rewriter]: 7.11999e-06 [order_py_execute_after_rewriter]: 6.36e-06 [mutable_eliminate]: 0.00044292 [opt_b]: 0.00018097, [1] [Cycle 1]: 0.00017513, [7] [b_1]: 0.00010823 
[b_2]: 7.12002e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.69997e-07 [cse]: 1.591e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.328e-05 [loop_unroll]: 0.00040895 [opt_after_cconv]: 9.459e-05, [1] [Cycle 1]: 8.87e-05, [7] [c_1]: 2.751e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.597e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.421e-05, [4] [d_1]: 3.911e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.07001e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.412e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.046e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.20001e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.35001e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.228e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.75e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.65001e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.68e-05, [1] [Cycle 1]: 6.277e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 7.8e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 8.62e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 1.61e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00044873 [validate]: 3.064e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00625144 [execute]: 7.19001e-06 Sums bootstrap : 0.000496s : 3.39% type_inference : 0.004357s : 29.80% event_method : 0.000010s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000422s : 2.89% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000343s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000443s : 3.03% optimize.opt_b.b_1 : 0.000108s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000409s : 2.80% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000449s : 3.07% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006251s : 42.75% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000123 26 18.42% : 0.000023s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.56% : 0.000006s : 4: substitution.graph_param_transform 65.72% : 0.000081s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.09% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004316 2 91.88% : 0.003966s : 1: type_inference.infer 8.12% : 0.000351s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.60% : 0.000004s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 2.11% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.72% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.72% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 42.83% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.17% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026566 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.36% : 0.003017s : 1: add_attr 11.33% : 0.003009s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.01% : 0.000533s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000452s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.91% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.02% : 0.001864s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.72% : 0.000458s : 1: opt_after_jit_grad 0.69% : 0.000184s : 1: opt_b 13.91% : 0.003696s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.04% : 0.000010s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000189s : 1: renormalize.infer 0.55% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000069s : 1: symbol_engine_optimizer 23.57% : 0.006262s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.45% : 0.004371s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0361294, [24] [bootstrap]: 0.00053986 [type_inference]: 0.010278 [event_method]: 4.642e-05 [auto_monad]: 0.00011186 [graph_reusing]: 8.15e-06 [inline]: 1.71998e-06 [add_attr]: 0.00303299, [1] [add_attr_with_inline]: 0.00302442, [1] [Cycle 1]: 6.734e-05, [2] [tag_attr]: 3.117e-05 [meta_addattr_fg_expand]: 8.65999e-06 [parallel-infer-symbol]: 3.01999e-06 [pre_auto_parallel]: 4.69e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.013021, [53] [py_interpret_to_execute]: 3.578e-05 [rewriter_before_opt_a]: 0.00012666 [opt_a]: 0.0107415, [3] [Cycle 1]: 0.00685601, [45] [expand_dump_flag]: 3.56999e-06 [switch_simplify]: 6.736e-05 [loop_unroll]: 5.524e-05 [a_1]: 0.00134211 [with_stream_mark]: 2.386e-05 [recompute_prepare]: 2.107e-05 [updatestate_depend_eliminate]: 8.84e-06 [updatestate_assign_eliminate]: 7.62998e-06 [updatestate_loads_eliminate]: 7.09001e-06 [parameter_eliminate]: 2.69001e-06 [a_2]: 0.00024338 [accelerated_algorithm]: 3.067e-05 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 3.3e-06 [shard_inline]: 1.575e-05 [merge_send_recv]: 1.671e-05 [auto_parallel]: 1.065e-05 [parallel]: 1.843e-05 [flash_sp]: 1.196e-05 [merge_comm]: 9.64999e-06 [allreduce_fusion]: 9.27001e-06 [matmul_add_comm_reduction]: 2.739e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.742e-05 [virtual_dataset]: 1.549e-05 [get_grad_eliminate_]: 1.486e-05 [virtual_output]: 1.504e-05 [merge_forward]: 9.51e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.884e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.82e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 2.677e-05 [set_forward_comm_id_for_comm_node_pass]: 9.51e-06 [meta_fg_expand]: 0.00142421 [flash_sp_send_recv_attached]: 3.56999e-06 [receive_attached]: 2.44001e-06 [after_resolve]: 
5.892e-05 [a_after_grad]: 8.078e-05 [renormalize]: 0.00234308 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 5.40999e-06 [auto_monad_eliminator]: 5.578e-05 [cse]: 0.00016542 [a_3]: 0.0003333 [Cycle 2]: 0.00297948, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.657e-05 [loop_unroll]: 4.355e-05 [a_1]: 0.00156632 [with_stream_mark]: 1.211e-05 [recompute_prepare]: 1.08e-05 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 4.45999e-06 [updatestate_loads_eliminate]: 3.70998e-06 [parameter_eliminate]: 1.10999e-06 [a_2]: 0.00012483 [accelerated_algorithm]: 1.206e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 9.04003e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 7.14001e-06 [parallel]: 5.03002e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.15999e-06 [allreduce_fusion]: 4.70001e-06 [matmul_add_comm_reduction]: 7.95998e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 9.94999e-06 [virtual_dataset]: 8.92e-06 [get_grad_eliminate_]: 8.79998e-06 [virtual_output]: 8.79998e-06 [merge_forward]: 5.04998e-06 [cell_reuse_recompute_pass]: 9.40025e-07 [offload_activation]: 8.90999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.636e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.378e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40001e-06 [meta_fg_expand]: 3.579e-05 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.477e-05 [a_after_grad]: 1.399e-05 [renormalize]: 0.00057424 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.14998e-06 [auto_monad_eliminator]: 1.382e-05 [cse]: 4.573e-05 [a_3]: 6.49e-05 [Cycle 3]: 0.00089179, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.075e-05 [loop_unroll]: 8.84e-06 [a_1]: 0.0002486 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 9.52999e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 3.79002e-06 
[parameter_eliminate]: 9.49978e-07 [a_2]: 0.00012262 [accelerated_algorithm]: 1.167e-05 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 8.95001e-06 [merge_send_recv]: 7.03e-06 [auto_parallel]: 6.87002e-06 [parallel]: 4.52e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 4.86002e-06 [allreduce_fusion]: 4.97999e-06 [matmul_add_comm_reduction]: 7.66001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.64e-06 [virtual_dataset]: 8.59e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.18001e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.50999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.56e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35999e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.301e-05 [a_after_grad]: 1.443e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 1.008e-05 [cse]: 2.478e-05 [a_3]: 5.955e-05 [py_interpret_to_execute_after_opt_a]: 1.027e-05 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 4.77e-05 [convert_after_rewriter]: 9.51e-06 [order_py_execute_after_rewriter]: 6.76999e-06 [mutable_eliminate]: 0.00049921 [opt_b]: 0.00028765, [1] [Cycle 1]: 0.0002816, [7] [b_1]: 0.00018856 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.11001e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 3.95e-06 [renormalize]: 5.00004e-07 [cse]: 3.115e-05 [optimize_parallel_all_gather_comm]: 2.006e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.128e-05 [loop_unroll]: 0.00042248 [opt_after_cconv]: 0.00013367, [1] [Cycle 1]: 0.00012782, [7] [c_1]: 4.8e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 7.05e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 
3.85e-06 [cse]: 2.857e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.884e-05 [tuple_transform]: 0.00010119, [1] [Cycle 1]: 9.653e-05, [4] [d_1]: 6.65e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 9.82999e-06 [partial_unused_args_eliminate]: 2.36e-06 [add_recomputation]: 5.827e-05 [cse_after_recomputation]: 3.241e-05, [1] [Cycle 1]: 2.725e-05, [1] [cse]: 2.165e-05 [environ_conv]: 8.74e-06 [swap_dp_allreduce_reducescatter]: 8.22e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.04002e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.56e-06 [reorder_send_recv_between_fp_bp]: 2.90002e-06 [comm_op_add_attrs]: 1.16997e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.73e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 4.94e-06 [overlap_recompute_and_grad_model_parallel]: 5.49e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 5.07e-06 [overlap_grad_flash_sp]: 2.395e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 2.11e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.875e-05, [1] [Cycle 1]: 9.438e-05, [6] [build]: 9.71998e-06 [elim_shapecalc]: 1.353e-05 [elim_not_effective]: 1.828e-05 [opt_reshape]: 1.011e-05 [fold_const_symbol]: 1.496e-05 [renormalize]: 2.09984e-07 [detach_backward]: 
1.94999e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 2.625e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00046621 [validate]: 4.362e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0082714 [execute]: 7.35e-06 Sums bootstrap : 0.000540s : 1.70% type_inference : 0.010278s : 32.27% event_method : 0.000046s : 0.15% auto_monad : 0.000112s : 0.35% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003157s : 9.91% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000491s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000054s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001463s : 4.59% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.002917s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000236s : 0.74% optimize.opt_a.a_3 : 0.000458s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000499s : 1.57% optimize.opt_b.b_1 : 0.000189s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000422s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000466s : 1.46% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008271s : 25.97% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000732 218 5.88% : 0.000043s : 11: substitution.arithmetic_simplify 1.84% : 0.000013s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 54.90% : 0.000402s : 16: substitution.inline 2.06% : 0.000015s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.34% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.79% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.92% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.47% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010208 2 87.56% : 0.008938s : 1: type_inference.infer 12.44% : 0.001270s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.87% : 0.000119s : 16: replace.inline 41.13% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000424 30 92.82% : 0.000394s : 16: match.inline 7.18% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000733 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.58% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.44% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001506 32 58.89% : 0.000887s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.11% : 0.000619s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060181 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.05% : 0.003037s : 1: add_attr 5.03% : 0.003028s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000118s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.96% : 0.000576s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.85% : 0.000509s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.90% : 0.004752s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.85% : 0.010745s : 1: opt_a 0.23% : 0.000137s : 1: opt_after_cconv 0.79% : 0.000476s : 1: opt_after_jit_grad 0.48% : 0.000291s : 1: opt_b 21.64% : 0.013025s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.60% : 0.001564s : 2: renormalize.infer 2.23% : 0.001341s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000102s : 1: symbol_engine_optimizer 13.76% : 0.008282s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.10% : 0.010293s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x6-kbk],max_mem:10.0M TotalTime = 0.815286, [24] [bootstrap]: 0.00061391 [type_inference]: 0.00613119 [event_method]: 1.365e-05 [auto_monad]: 5.704e-05 [graph_reusing]: 6.16998e-06 [inline]: 1.76e-06 [add_attr]: 0.00339386, [1] [add_attr_with_inline]: 0.00338314, [1] [Cycle 1]: 4.685e-05, [2] [tag_attr]: 1.579e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 3.18998e-06 [pre_auto_parallel]: 2.864e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00399798, [53] [py_interpret_to_execute]: 1.959e-05 [rewriter_before_opt_a]: 5.9e-05 [opt_a]: 0.00215123, [2] [Cycle 1]: 0.00152038, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.262e-05 [loop_unroll]: 2.035e-05 [a_1]: 0.00045794 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.59002e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 2.01e-06 [a_2]: 7.673e-05 [accelerated_algorithm]: 6.61e-06 [shard]: 2.47001e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 8.04997e-06 [auto_parallel]: 5.97999e-06 [parallel]: 2.603e-05 [flash_sp]: 7.23999e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 8.62998e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.64999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.038e-05 
[merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 1.009e-05 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.18002e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.63e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 9.22999e-06 [renormalize]: 0.00040806 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.357e-05 [cse]: 2.853e-05 [a_3]: 4.1e-05 [Cycle 2]: 0.00062123, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012617 [with_stream_mark]: 9.50001e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.93e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.60002e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.737e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.54998e-06 [auto_parallel]: 5.21998e-06 [parallel]: 4.1e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.18002e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.38998e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.17001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 3.852e-05 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.45001e-06 [a_after_grad]: 8.38001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 5.82001e-06 [cse]: 1.286e-05 [a_3]: 3.236e-05 [py_interpret_to_execute_after_opt_a]: 7.21001e-06 
[slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.244e-05 [convert_after_rewriter]: 6.44999e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00044934 [opt_b]: 0.00018257, [1] [Cycle 1]: 0.00017665, [7] [b_1]: 0.00010597 [b_2]: 6.51e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 5.3001e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.655e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00041516 [opt_after_cconv]: 9.412e-05, [1] [Cycle 1]: 8.835e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.596e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.363e-05 [tuple_transform]: 6.871e-05, [1] [Cycle 1]: 6.447e-05, [4] [d_1]: 3.933e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.96e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 5.227e-05 [cse_after_recomputation]: 2.046e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.99e-06 [swap_dp_allreduce_reducescatter]: 4.93001e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.19998e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 
[control_data_broadcast_order]: 1.141e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.91001e-06 [overlap_recompute_and_grad_model_parallel]: 4.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.82001e-06 [overlap_recompute_comm]: 2.67001e-06 [overlap_grad_ring_attention]: 3.78001e-06 [overlap_grad_flash_sp]: 1.834e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.34998e-06 [symbol_engine_optimizer]: 6.663e-05, [1] [Cycle 1]: 6.26e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 5.93002e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.609e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.0004482 [validate]: 3.063e-05 [backend_pass]: 7.89994e-07 [task_emit]: 0.800307 [execute]: 8.89e-06 Sums bootstrap : 0.000614s : 0.08% type_inference : 0.006131s : 0.76% event_method : 0.000014s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000584s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000049s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000408s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000041s : 0.01% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.06% optimize.opt_b.b_1 : 0.000106s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000415s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.06% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.800307s : 98.69% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000198 30 12.73% : 0.000025s : 5: substitution.arithmetic_simplify 0.95% : 0.000002s : 2: substitution.elim_not_effective 0.64% : 0.000001s : 2: substitution.fold_const_symbol 2.74% : 0.000005s : 4: substitution.graph_param_transform 56.82% : 0.000113s : 3: substitution.inline 16.45% : 0.000033s : 4: substitution.j_node_and_user_rematch 2.08% : 0.000004s : 4: substitution.remove_not_recompute_node 1.90% : 0.000004s : 4: substitution.replace_old_param 5.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006083 2 90.88% : 0.005529s : 1: type_inference.infer 9.12% : 0.000555s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.76% : 0.000027s : 3: replace.inline 30.24% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.49% : 0.000111s : 3: match.inline 8.51% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.27% : 0.000000s : 4: predicate.reset_defer_inline 0.91% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.78% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.27% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 47.32% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.68% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.824215 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.41% : 0.003398s : 1: add_attr 0.41% : 0.003387s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.08% : 0.000655s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000459s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.000983s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000089s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.26% : 0.002154s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.06% : 0.000458s : 1: opt_after_jit_grad 0.02% : 0.000186s : 1: opt_b 0.49% : 0.004002s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000023s : 1: py_interpret_to_execute 0.00% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.03% : 0.000208s : 1: renormalize.infer 0.02% : 0.000193s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000069s : 1: symbol_engine_optimizer 97.10% : 0.800327s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.75% : 0.006146s : 1: type_inference 0.01% : 0.000053s : 1: validate TotalTime = 0.0770274, [24] [bootstrap]: 0.00050706 [type_inference]: 0.00447203 [event_method]: 1.079e-05 [auto_monad]: 5.262e-05 [graph_reusing]: 5.29e-06 [inline]: 1.75001e-06 [add_attr]: 0.00301338, [1] [add_attr_with_inline]: 0.00300537, [1] [Cycle 1]: 4.863e-05, [2] [tag_attr]: 1.243e-05 [meta_addattr_fg_expand]: 3.25998e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 2.09e-05 [insert-virtual-dataset]: 2.62001e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.72999e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00368046, [53] [py_interpret_to_execute]: 1.527e-05 [rewriter_before_opt_a]: 3.989e-05 [opt_a]: 0.00185074, [2] [Cycle 1]: 0.00125754, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 2.512e-05 [loop_unroll]: 1.383e-05 [a_1]: 0.00029337 [with_stream_mark]: 1.371e-05 [recompute_prepare]: 7.31999e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 2.37999e-06 [a_2]: 7.621e-05 [accelerated_algorithm]: 6.09001e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 8.45001e-06 [auto_parallel]: 6.54999e-06 [parallel]: 1.927e-05 [flash_sp]: 7.6e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.75999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.91001e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 4.34002e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 9.64999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 1.72999e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28998e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.38002e-06 [receive_attached]: 
2.31998e-06 [after_resolve]: 1.052e-05 [a_after_grad]: 8.86997e-06 [renormalize]: 0.00034292 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 2.30002e-06 [auto_monad_eliminator]: 1.371e-05 [cse]: 2.713e-05 [a_3]: 3.98e-05 [Cycle 2]: 0.00058411, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.41998e-06 [loop_unroll]: 5.33002e-06 [a_1]: 0.00012269 [with_stream_mark]: 1.086e-05 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.78003e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.38002e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.71e-05 [accelerated_algorithm]: 5.30001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.24998e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.6e-06 [merge_comm]: 2.84001e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.38002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.93998e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.85001e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 5.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.24e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 7.52002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 8.75001e-06 [a_after_grad]: 7.8e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 7.59988e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.267e-05 [a_3]: 3.249e-05 [py_interpret_to_execute_after_opt_a]: 7.70998e-06 [slice_cell_reuse_recomputed_activation]: 2.45002e-06 [rewriter_after_opt_a]: 3.147e-05 [convert_after_rewriter]: 7.23e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.00044768 [opt_b]: 0.00018226, [1] [Cycle 1]: 0.00017616, 
[7] [b_1]: 0.0001088 [b_2]: 7.15e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.29979e-07 [cse]: 1.601e-05 [optimize_parallel_all_gather_comm]: 1.596e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.303e-05 [loop_unroll]: 0.00043293 [opt_after_cconv]: 9.358e-05, [1] [Cycle 1]: 8.789e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.569e-05 [renormalize]: 1.60013e-07 [remove_dup_value]: 1.255e-05 [tuple_transform]: 6.976e-05, [1] [Cycle 1]: 6.572e-05, [4] [d_1]: 3.959e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.449e-05 [cse_after_recomputation]: 2.082e-05, [1] [Cycle 1]: 1.638e-05, [1] [cse]: 1.097e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 3.04999e-06 [merge_cast_opt]: 1.50001e-06 [slice_recompute_activation]: 2.71e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.92002e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.25001e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.58e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.736e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.832e-05, [1] [Cycle 1]: 6.434e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.27998e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.57999e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.63e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.34001e-06 [opt_after_jit_grad]: 0.00044805 [validate]: 3.094e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0645416 [execute]: 8.32e-06 Sums bootstrap : 0.000507s : 0.69% type_inference : 0.004472s : 6.12% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.57% optimize.opt_a.with_stream_mark : 0.000025s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000343s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.61% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000433s : 0.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064542s : 88.34% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000124 26 17.93% : 0.000022s : 4: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 5.12% : 0.000006s : 4: substitution.graph_param_transform 65.68% : 0.000081s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.99% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004429 2 91.49% : 0.004052s : 1: type_inference.infer 8.51% : 0.000377s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000135 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.98% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.37% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000006s : 41: predicate.switch_simplify 0.82% : 
0.000001s : 9: predicate.tile_eliminate 0.99% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.21% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000278 6 43.07% : 0.000120s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.93% : 0.000158s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084989 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.55% : 0.003018s : 1: add_attr 3.54% : 0.003009s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000547s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000441s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.90% : 0.000764s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.18% : 0.001854s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.54% : 0.000457s : 1: opt_after_jit_grad 0.22% : 0.000186s : 1: opt_b 4.33% : 0.003684s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.22% : 0.000188s : 1: renormalize.infer 0.17% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 75.96% : 0.064558s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.28% : 0.004485s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.0784366, [24] [bootstrap]: 0.00048909 [type_inference]: 0.00562317 [event_method]: 1.422e-05 [auto_monad]: 5.744e-05 [graph_reusing]: 6.01998e-06 [inline]: 1.64e-06 [add_attr]: 0.00297661, [1] [add_attr_with_inline]: 0.00296807, [1] [Cycle 1]: 4.491e-05, [2] [tag_attr]: 1.476e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 2.72001e-06 [pre_auto_parallel]: 2.577e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 1.02998e-06 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0039912, [53] [py_interpret_to_execute]: 2.115e-05 [rewriter_before_opt_a]: 5.776e-05 [opt_a]: 0.00215387, [2] [Cycle 1]: 0.00154446, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 3.298e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.00044811 [with_stream_mark]: 1.303e-05 [recompute_prepare]: 7.56001e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.93998e-06 [parameter_eliminate]: 2.16e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.44001e-06 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.70001e-06 [merge_send_recv]: 7.91001e-06 [auto_parallel]: 5.19998e-06 [parallel]: 1.819e-05 [flash_sp]: 7.61001e-06 [merge_comm]: 3.54002e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.05001e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 6.98998e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.68e-06 [receive_attached]: 2.58e-06 
[after_resolve]: 1.05e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00045517 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.436e-05 [cse]: 2.84e-05 [a_3]: 4.125e-05 [Cycle 2]: 0.00059934, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.53997e-06 [a_1]: 0.00012447 [with_stream_mark]: 1.014e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.36998e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.88e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.15e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 3.14001e-06 [allreduce_fusion]: 4.4e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.30002e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.41998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16999e-06 [meta_fg_expand]: 1.59998e-06 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 8.70001e-07 [after_resolve]: 9.34e-06 [a_after_grad]: 8.37e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.21997e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.319e-05 [a_3]: 3.347e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.176e-05 [convert_after_rewriter]: 7.01999e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00044787 [opt_b]: 0.00018212, [1] [Cycle 1]: 0.00017575, [7] [b_1]: 0.0001079 
[b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.46e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 4.89992e-07 [cse]: 1.583e-05 [optimize_parallel_all_gather_comm]: 1.591e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.273e-05 [loop_unroll]: 0.00041553 [opt_after_cconv]: 9.502e-05, [1] [Cycle 1]: 8.931e-05, [7] [c_1]: 2.823e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.575e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.34e-05 [tuple_transform]: 6.829e-05, [1] [Cycle 1]: 6.364e-05, [4] [d_1]: 3.859e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.06998e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.426e-05 [cse_after_recomputation]: 1.95e-05, [1] [Cycle 1]: 1.514e-05, [1] [cse]: 1.013e-05 [environ_conv]: 5.03002e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.74001e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.04003e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.141e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.36001e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 1.855e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.07998e-06 [symbol_engine_optimizer]: 6.811e-05, [1] [Cycle 1]: 6.392e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.27998e-06 [elim_not_effective]: 1.12e-05 [opt_reshape]: 6.04999e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.22999e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00045073 [validate]: 3.133e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0645245 [execute]: 9.39e-06 Sums bootstrap : 0.000489s : 0.66% type_inference : 0.005623s : 7.55% event_method : 0.000014s : 0.02% auto_monad : 0.000057s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000573s : 0.77% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000010s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000008s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000455s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.60% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000416s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000451s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064524s : 86.62% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.73% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 4: substitution.graph_param_transform 66.39% : 0.000109s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000004s : 4: substitution.remove_not_recompute_node 2.71% : 0.000004s : 4: substitution.replace_old_param 6.83% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005581 2 90.07% : 0.005026s : 1: type_inference.infer 9.93% : 0.000554s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.69% : 0.000027s : 3: replace.inline 30.31% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.40% : 0.000107s : 3: match.inline 8.60% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 2.04% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.36% : 0.000001s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.53% : 0.000002s : 16: predicate.partial_defer_inline 1.54% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000002s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 48.38% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.62% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.086954 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.43% : 0.002981s : 1: add_attr 3.42% : 0.002972s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000062s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000528s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.08% : 0.000941s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.48% : 0.002157s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.53% : 0.000460s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.59% : 0.003995s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.29% : 0.000249s : 1: renormalize.infer 0.23% : 0.000199s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 74.22% : 0.064539s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 6.48% : 0.005637s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.848758, [24] [bootstrap]: 0.00052486 [type_inference]: 0.0114864 [event_method]: 5.038e-05 [auto_monad]: 0.00012199 [graph_reusing]: 8.27998e-06 [inline]: 1.97999e-06 [add_attr]: 0.00302728, [1] [add_attr_with_inline]: 0.00301888, [1] [Cycle 1]: 7.192e-05, [2] [tag_attr]: 3.51e-05 [meta_addattr_fg_expand]: 9.46e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 5.096e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0133454, [53] [py_interpret_to_execute]: 3.796e-05 [rewriter_before_opt_a]: 0.00014553 [opt_a]: 0.0110924, [3] [Cycle 1]: 0.0071244, [45] [expand_dump_flag]: 3.55e-06 [switch_simplify]: 7.381e-05 [loop_unroll]: 6.161e-05 [a_1]: 0.0014509 [with_stream_mark]: 2.291e-05 [recompute_prepare]: 2.149e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 8.27998e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.00024425 [accelerated_algorithm]: 3.097e-05 [shard]: 2.29999e-06 [meta_shard_fg_expand]: 3.16001e-06 [shard_inline]: 1.61e-05 [merge_send_recv]: 1.624e-05 [auto_parallel]: 1.06e-05 [parallel]: 1.877e-05 [flash_sp]: 1.182e-05 [merge_comm]: 9.50001e-06 [allreduce_fusion]: 8.74e-06 [matmul_add_comm_reduction]: 2.613e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 1.791e-05 [virtual_dataset]: 1.596e-05 [get_grad_eliminate_]: 1.544e-05 [virtual_output]: 1.573e-05 [merge_forward]: 9.69e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.781e-05 [cell_reuse_handle_not_recompute_node_pass]: 5.475e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 2.723e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61e-06 [meta_fg_expand]: 0.00139735 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 2.83998e-06 [after_resolve]: 5.953e-05 
[a_after_grad]: 8.136e-05 [renormalize]: 0.00247229 [add_forward_monad_depend]: 9.51e-06 [auto_monad_grad]: 5.33002e-06 [auto_monad_eliminator]: 5.572e-05 [cse]: 0.00017098 [a_3]: 0.00033925 [Cycle 2]: 0.00302301, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.779e-05 [loop_unroll]: 4.397e-05 [a_1]: 0.00153822 [with_stream_mark]: 1.153e-05 [recompute_prepare]: 1.138e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.32003e-06 [updatestate_loads_eliminate]: 3.56001e-06 [parameter_eliminate]: 1.18001e-06 [a_2]: 0.00012605 [accelerated_algorithm]: 1.203e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.88997e-06 [shard_inline]: 9.34998e-06 [merge_send_recv]: 6.54999e-06 [auto_parallel]: 6.97002e-06 [parallel]: 4.92e-06 [flash_sp]: 3.08e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.63001e-06 [matmul_add_comm_reduction]: 7.68999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.029e-05 [virtual_dataset]: 9.09e-06 [get_grad_eliminate_]: 8.87e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.67e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 8.85999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.67e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.462e-05 [set_forward_comm_id_for_comm_node_pass]: 5.50001e-06 [meta_fg_expand]: 6.971e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.19e-06 [after_resolve]: 1.614e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.00059984 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.426e-05 [cse]: 4.655e-05 [a_3]: 6.536e-05 [Cycle 3]: 0.00093104, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.072e-05 [loop_unroll]: 8.80999e-06 [a_1]: 0.00027702 [with_stream_mark]: 1.013e-05 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 3.75998e-06 [updatestate_loads_eliminate]: 3.70998e-06 [parameter_eliminate]: 
9.50007e-07 [a_2]: 0.00012312 [accelerated_algorithm]: 1.176e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 6.96001e-06 [auto_parallel]: 6.99001e-06 [parallel]: 4.67e-06 [flash_sp]: 1.02e-06 [merge_comm]: 4.84e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.68999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 8.65999e-06 [get_grad_eliminate_]: 8.42998e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.08999e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.50999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.614e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.467e-05 [set_forward_comm_id_for_comm_node_pass]: 5.96003e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.415e-05 [a_after_grad]: 1.396e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.056e-05 [cse]: 2.666e-05 [a_3]: 6.042e-05 [py_interpret_to_execute_after_opt_a]: 1.031e-05 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 4.684e-05 [convert_after_rewriter]: 9.35001e-06 [order_py_execute_after_rewriter]: 7.38e-06 [mutable_eliminate]: 0.00045611 [opt_b]: 0.00028684, [1] [Cycle 1]: 0.00028074, [7] [b_1]: 0.00018856 [b_2]: 1.066e-05 [updatestate_depend_eliminate]: 7.02002e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 4.1e-06 [renormalize]: 4.30009e-07 [cse]: 3.129e-05 [optimize_parallel_all_gather_comm]: 2.133e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.029e-05 [loop_unroll]: 0.00042078 [opt_after_cconv]: 0.00013759, [1] [Cycle 1]: 0.00013159, [7] [c_1]: 4.863e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 7.38999e-06 [updatestate_assign_eliminate]: 4.12998e-06 [updatestate_loads_eliminate]: 3.97e-06 
[cse]: 2.985e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 2.962e-05 [tuple_transform]: 0.00010146, [1] [Cycle 1]: 9.677e-05, [4] [d_1]: 6.673e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.96e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 5.622e-05 [cse_after_recomputation]: 3.118e-05, [1] [Cycle 1]: 2.655e-05, [1] [cse]: 2.13e-05 [environ_conv]: 9.12001e-06 [swap_dp_allreduce_reducescatter]: 8.03001e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.35001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.723e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 4.84e-06 [overlap_recompute_and_grad_model_parallel]: 5.57001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.496e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.18002e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 9.684e-05, [1] [Cycle 1]: 9.279e-05, [6] [build]: 9.91e-06 [elim_shapecalc]: 1.292e-05 [elim_not_effective]: 1.777e-05 [opt_reshape]: 1.015e-05 [fold_const_symbol]: 1.499e-05 [renormalize]: 2.10013e-07 [detach_backward]: 1.69e-06 
[pipeline_parallel_scheduler]: 1.76998e-06 [auto_monad_reorder]: 2.545e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00047012 [validate]: 4.616e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.819357 [execute]: 8.72e-06 Sums bootstrap : 0.000525s : 0.06% type_inference : 0.011486s : 1.36% event_method : 0.000050s : 0.01% auto_monad : 0.000122s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000146s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.02% optimize.opt_a.loop_unroll : 0.000114s : 0.01% optimize.opt_a.a_1 : 0.003266s : 0.39% optimize.opt_a.with_stream_mark : 0.000045s : 0.01% optimize.opt_a.recompute_prepare : 0.000042s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000025s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000019s : 0.00% optimize.opt_a.allreduce_fusion : 
0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000088s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001470s : 0.17% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.01% optimize.opt_a.a_after_grad : 0.000110s : 0.01% optimize.opt_a.renormalize : 0.003072s : 0.36% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.01% optimize.opt_a.cse : 0.000244s : 0.03% optimize.opt_a.a_3 : 0.000465s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000456s : 0.05% optimize.opt_b.b_1 : 0.000189s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% 
optimize.opt_b.cse : 0.000031s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000421s : 0.05% optimize.opt_after_cconv.c_1 : 0.000049s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.01% optimize.cse_after_recomputation.cse : 0.000021s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000470s : 0.06% validate : 0.000046s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.819357s : 97.03% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000767 222 5.87% : 0.000045s : 12: substitution.arithmetic_simplify 1.78% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000003s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.95% : 0.000429s : 17: substitution.inline 2.07% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000014s : 11: substitution.minmaximum_grad 0.65% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.08% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.57% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011410 2 86.83% : 0.009907s : 1: type_inference.infer 13.17% : 0.001503s : 1: type_inference.specialize ------[replace.] 0.000219 33 58.03% : 0.000127s : 17: replace.inline 41.97% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000455 33 92.44% : 0.000420s : 17: match.inline 7.56% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000774 5764 1.05% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 68: predicate.addn_zero_filter 1.02% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.94% : 0.000015s : 100: predicate.arithmetic_simplify 1.10% : 0.000009s : 68: predicate.cast_eliminate 1.09% : 0.000008s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_depend_swap 1.68% : 0.000013s : 108: predicate.environ_get_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.67% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.20% : 0.000017s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.41% : 0.000042s : 249: predicate.inline 1.22% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.57% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.36% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.08% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.92% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.02% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.24% : 0.000010s : 68: predicate.reduce_eliminate 2.62% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 152: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.23% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.20% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 101: predicate.switch_defer_inline 2.84% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.83% : 
0.000037s : 277: predicate.switch_simplify 4.52% : 0.000035s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.41% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.55% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.16% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.53% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001593 34 57.11% : 0.000910s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.89% : 0.000683s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.873464 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.35% : 0.003032s : 1: add_attr 0.35% : 0.003023s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000129s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000561s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000034s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000058s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000465s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.57% : 0.004937s : 117: opt.transform.opt_a 0.01% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000174s : 28: opt.transform.opt_b 0.01% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.27% : 0.011095s : 1: opt_a 0.02% : 0.000141s : 1: opt_after_cconv 0.05% : 0.000479s : 1: opt_after_jit_grad 0.03% : 0.000291s : 1: opt_b 1.53% : 0.013349s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000056s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 0.19% : 0.001624s : 2: renormalize.infer 0.16% : 0.001435s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000050s : 1: rewriter_after_opt_a 0.02% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000099s : 1: symbol_engine_optimizer 93.81% : 0.819375s : 1: task_emit 0.01% : 0.000104s : 1: 
tuple_transform 1.32% : 0.011502s : 1: type_inference 0.01% : 0.000072s : 1: validate TotalTime = 0.0765849, [24] [bootstrap]: 0.00046941 [type_inference]: 0.00436396 [event_method]: 1.089e-05 [auto_monad]: 5.312e-05 [graph_reusing]: 5.13002e-06 [inline]: 2.24001e-06 [add_attr]: 0.00299238, [1] [add_attr_with_inline]: 0.00298473, [1] [Cycle 1]: 4.517e-05, [2] [tag_attr]: 1.237e-05 [meta_addattr_fg_expand]: 3.28998e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.145e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.99999e-06 [optimize]: 0.00369182, [53] [py_interpret_to_execute]: 1.498e-05 [rewriter_before_opt_a]: 3.875e-05 [opt_a]: 0.00186708, [2] [Cycle 1]: 0.00126676, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 2.461e-05 [loop_unroll]: 1.372e-05 [a_1]: 0.00029875 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 6.89999e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.42997e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 1.99e-06 [a_2]: 7.669e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 2.68998e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 8.45001e-06 [auto_parallel]: 6.11998e-06 [parallel]: 1.829e-05 [flash_sp]: 7.58999e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.25999e-06 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 6.83e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.052e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.75001e-06 [before_grad]: 9.51e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 3.15002e-06 [receive_attached]: 2.78e-06 
[after_resolve]: 1.008e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.00034512 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.316e-05 [cse]: 2.745e-05 [a_3]: 3.965e-05 [Cycle 2]: 0.00059122, [45] [expand_dump_flag]: 9.40025e-07 [switch_simplify]: 6.70002e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00012462 [with_stream_mark]: 9.27001e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 6.85e-05 [accelerated_algorithm]: 5.91998e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.15999e-06 [auto_parallel]: 5.39e-06 [parallel]: 3.83999e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.09998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 5.93998e-06 [virtual_dataset]: 5.34998e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.34999e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 6.05002e-06 [cell_reuse_handle_not_recompute_node_pass]: 8.92999e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 2.92002e-06 [meta_fg_expand]: 1.61998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.28002e-06 [a_after_grad]: 8.39002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 5.84e-06 [cse]: 1.289e-05 [a_3]: 3.192e-05 [py_interpret_to_execute_after_opt_a]: 7.19001e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.236e-05 [convert_after_rewriter]: 6.75002e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00044465 [opt_b]: 0.00019675, [1] [Cycle 1]: 
0.00019088, [7] [b_1]: 0.00012169 [b_2]: 7.25998e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 8.09989e-07 [cse]: 1.695e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.304e-05 [loop_unroll]: 0.00041418 [opt_after_cconv]: 9.401e-05, [1] [Cycle 1]: 8.841e-05, [7] [c_1]: 2.727e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.07999e-06 [cse]: 1.621e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.283e-05 [tuple_transform]: 6.994e-05, [1] [Cycle 1]: 6.568e-05, [4] [d_1]: 4.005e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.571e-05 [cse_after_recomputation]: 2.017e-05, [1] [Cycle 1]: 1.586e-05, [1] [cse]: 1.071e-05 [environ_conv]: 5.20001e-06 [swap_dp_allreduce_reducescatter]: 5.76e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.85999e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.69001e-06 [micro_interleaved_order_control]: 2.95998e-06 [assign_add_opt]: 1.14e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.45999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.05002e-06 [control_data_broadcast_order]: 1.148e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.82e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67999e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 4.30999e-06 [overlap_grad_flash_sp]: 1.753e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.44e-06 [symbol_engine_optimizer]: 6.738e-05, [1] [Cycle 1]: 6.335e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 7.92998e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.98002e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.637e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00044326 [validate]: 3.104e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.0642579 [execute]: 8.1e-06 Sums bootstrap : 0.000469s : 0.65% type_inference : 0.004364s : 6.01% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000423s : 0.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000345s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.61% optimize.opt_b.b_1 : 0.000122s : 0.17% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000414s : 0.57% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000443s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064258s : 88.47% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000124 26 18.18% : 0.000023s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.65% : 0.000006s : 4: substitution.graph_param_transform 66.16% : 0.000082s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.27% : 0.000004s : 4: substitution.remove_not_recompute_node 2.92% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004322 2 91.96% : 0.003975s : 1: type_inference.infer 8.04% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000137 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 1.01% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.51% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.56% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.84% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.96% : 0.000001s : 9: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 1.04% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 42.52% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.48% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084541 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.54% : 0.002997s : 1: add_attr 3.53% : 0.002988s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.60% : 0.000505s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.50% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.54% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.91% : 0.000773s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.21% : 0.001870s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.54% : 0.000453s : 1: opt_after_jit_grad 0.24% : 0.000200s : 1: opt_b 4.37% : 0.003696s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000191s : 1: renormalize.infer 0.17% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.03% : 0.064274s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.18% : 0.004378s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.118717, [24] [bootstrap]: 0.00058252 [type_inference]: 0.0103313 [event_method]: 4.252e-05 [auto_monad]: 0.00012153 [graph_reusing]: 8.53001e-06 [inline]: 1.91e-06 [add_attr]: 0.00303392, [1] [add_attr_with_inline]: 0.00302387, [1] [Cycle 1]: 6.622e-05, [2] [tag_attr]: 3.126e-05 [meta_addattr_fg_expand]: 8.47e-06 [parallel-infer-symbol]: 2.59999e-06 [pre_auto_parallel]: 4.574e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.01328, [53] [py_interpret_to_execute]: 3.603e-05 [rewriter_before_opt_a]: 0.00012939 [opt_a]: 0.0110176, [3] [Cycle 1]: 0.00709792, [45] [expand_dump_flag]: 3.98001e-06 [switch_simplify]: 6.747e-05 [loop_unroll]: 5.559e-05 [a_1]: 0.00135535 [with_stream_mark]: 2.668e-05 [recompute_prepare]: 2.17e-05 [updatestate_depend_eliminate]: 8.91002e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.21999e-06 [parameter_eliminate]: 3.04001e-06 [a_2]: 0.00027103 [accelerated_algorithm]: 3.124e-05 [shard]: 1.98002e-06 [meta_shard_fg_expand]: 3.33998e-06 [shard_inline]: 1.611e-05 [merge_send_recv]: 1.674e-05 [auto_parallel]: 1.095e-05 [parallel]: 1.867e-05 [flash_sp]: 1.139e-05 [merge_comm]: 9.25999e-06 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 2.729e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.814e-05 [virtual_dataset]: 1.567e-05 [get_grad_eliminate_]: 1.547e-05 [virtual_output]: 1.509e-05 [merge_forward]: 9.22999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.845e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.882e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 2.747e-05 [set_forward_comm_id_for_comm_node_pass]: 9.56e-06 [meta_fg_expand]: 0.00140834 [flash_sp_send_recv_attached]: 4.00998e-06 [receive_attached]: 2.97002e-06 
[after_resolve]: 5.99e-05 [a_after_grad]: 8.086e-05 [renormalize]: 0.00254013 [add_forward_monad_depend]: 9.36e-06 [auto_monad_grad]: 6.01998e-06 [auto_monad_eliminator]: 5.591e-05 [cse]: 0.00017025 [a_3]: 0.00033954 [Cycle 2]: 0.00299936, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 4.791e-05 [loop_unroll]: 4.487e-05 [a_1]: 0.00155267 [with_stream_mark]: 1.225e-05 [recompute_prepare]: 1.069e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 3.55e-06 [parameter_eliminate]: 1.12999e-06 [a_2]: 0.00012793 [accelerated_algorithm]: 1.246e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 6.81001e-06 [auto_parallel]: 7.26999e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 5.35001e-06 [allreduce_fusion]: 4.77998e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.65999e-06 [virtual_output]: 8.84998e-06 [merge_forward]: 4.97e-06 [cell_reuse_recompute_pass]: 9.49978e-07 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.678e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.407e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34e-06 [meta_fg_expand]: 3.522e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.492e-05 [a_after_grad]: 1.398e-05 [renormalize]: 0.00059887 [add_forward_monad_depend]: 3.95e-06 [auto_monad_grad]: 1.09003e-06 [auto_monad_eliminator]: 1.432e-05 [cse]: 4.695e-05 [a_3]: 6.569e-05 [Cycle 3]: 0.00090679, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.103e-05 [loop_unroll]: 9.09e-06 [a_1]: 0.00025175 [with_stream_mark]: 1.044e-05 [recompute_prepare]: 9.36e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 4.17998e-06 [updatestate_loads_eliminate]: 
4.03001e-06 [parameter_eliminate]: 8.99978e-07 [a_2]: 0.00012502 [accelerated_algorithm]: 1.177e-05 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 7.08e-06 [parallel]: 4.79e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 4.94e-06 [allreduce_fusion]: 4.85999e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 9.84999e-06 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.60001e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.47e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.63e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.442e-05 [set_forward_comm_id_for_comm_node_pass]: 5.16998e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.379e-05 [a_after_grad]: 1.525e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.11e-05 [cse]: 2.619e-05 [a_3]: 5.997e-05 [py_interpret_to_execute_after_opt_a]: 1.021e-05 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 5.235e-05 [convert_after_rewriter]: 9.41998e-06 [order_py_execute_after_rewriter]: 6.63e-06 [mutable_eliminate]: 0.00045794 [opt_b]: 0.00028976, [1] [Cycle 1]: 0.00028377, [7] [b_1]: 0.00019122 [b_2]: 1.095e-05 [updatestate_depend_eliminate]: 7.01001e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 3.96001e-06 [renormalize]: 3.59985e-07 [cse]: 3.147e-05 [optimize_parallel_all_gather_comm]: 2.453e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.168e-05 [loop_unroll]: 0.00042083 [opt_after_cconv]: 0.000138, [1] [Cycle 1]: 0.00013197, [7] [c_1]: 5.015e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 7.24001e-06 [updatestate_assign_eliminate]: 4.15e-06 
[updatestate_loads_eliminate]: 3.74002e-06 [cse]: 3.024e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 2.805e-05 [tuple_transform]: 0.00010158, [1] [Cycle 1]: 9.699e-05, [4] [d_1]: 6.747e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.79999e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 5.867e-05 [cse_after_recomputation]: 3.3e-05, [1] [Cycle 1]: 2.831e-05, [1] [cse]: 2.257e-05 [environ_conv]: 8.82e-06 [swap_dp_allreduce_reducescatter]: 7.8e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.19997e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.46998e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.06998e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.35001e-06 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 3.05002e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.762e-05 [grouped_pairwise_exchange_alltoall]: 2.07001e-06 [offloading_packed_experts]: 5.20001e-06 [overlap_recompute_and_grad_model_parallel]: 5.95002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.68998e-06 [overlap_grad_ring_attention]: 5.10001e-06 [overlap_grad_flash_sp]: 2.367e-05 [begin_end_overlap_inline]: 7.09988e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 9.862e-05, [1] [Cycle 1]: 9.444e-05, [6] [build]: 1.014e-05 [elim_shapecalc]: 1.355e-05 [elim_not_effective]: 1.835e-05 [opt_reshape]: 1.007e-05 [fold_const_symbol]: 1.487e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.57999e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 2.569e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00046986 [validate]: 4.764e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.0904876 [execute]: 9.04e-06 Sums bootstrap : 0.000583s : 0.51% type_inference : 0.010331s : 9.03% event_method : 0.000043s : 0.04% auto_monad : 0.000122s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000129s : 0.11% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.11% optimize.opt_a.loop_unroll : 0.000110s : 0.10% optimize.opt_a.a_1 : 0.003160s : 2.76% optimize.opt_a.with_stream_mark : 0.000049s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000524s : 0.46% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001447s : 1.26% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003139s : 2.74% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.07% optimize.opt_a.cse : 0.000243s : 0.21% optimize.opt_a.a_3 : 0.000465s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000052s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.40% optimize.opt_b.b_1 : 0.000191s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000421s : 0.37% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000470s : 0.41% validate : 0.000048s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.090488s : 79.08% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000732 218 5.95% : 0.000044s : 11: substitution.arithmetic_simplify 1.93% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.11% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 54.57% : 0.000399s : 16: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.81% : 0.000013s : 20: substitution.remove_not_recompute_node 3.25% : 0.000024s : 10: substitution.replace_applicator 1.53% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.81% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.38% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.53% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010259 2 87.00% : 0.008926s : 1: type_inference.infer 13.00% : 0.001333s : 1: type_inference.specialize ------[replace.] 0.000202 30 59.23% : 0.000120s : 16: replace.inline 40.77% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000422 30 92.77% : 0.000391s : 16: match.inline 7.23% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000757 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.99% : 0.000015s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.63% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 3.27% : 0.000025s : 32: predicate.incorporate_call_switch 5.47% : 0.000041s : 244: predicate.inline 1.21% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 1.57% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.59% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.11% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.34% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.94% : 0.000015s : 97: predicate.partial_defer_inline 1.64% : 0.000012s : 89: predicate.partial_eliminate 1.04% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000010s : 67: predicate.reduce_eliminate 2.59% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 149: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 97: predicate.switch_defer_inline 2.87% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.77% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.41% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.38% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.57% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.17% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001571 32 55.50% : 0.000872s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.50% : 0.000699s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.143343 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.12% : 0.003038s : 1: add_attr 2.11% : 0.003028s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000129s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000619s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000049s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.38% : 0.004847s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000177s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 7.69% : 0.011020s : 1: opt_a 0.10% : 0.000142s : 1: opt_after_cconv 0.33% : 0.000479s : 1: opt_after_jit_grad 0.20% : 0.000293s : 1: opt_b 9.27% : 0.013284s : 1: optimize 0.02% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.15% : 0.001644s : 2: renormalize.infer 1.03% : 0.001482s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000056s : 1: rewriter_after_opt_a 0.09% : 0.000134s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000101s : 1: symbol_engine_optimizer 63.14% : 0.090504s : 1: task_emit 0.07% : 0.000105s : 1: 
tuple_transform 7.22% : 0.010347s : 1: type_inference 0.05% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x6-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x7-pynative],max_mem:10.0M TotalTime = 0.0216656, [24] [bootstrap]: 0.00052169 [type_inference]: 0.00620887 [event_method]: 1.428e-05 [auto_monad]: 5.851e-05 [graph_reusing]: 5.39e-06 [inline]: 1.76e-06 [add_attr]: 0.0033768, [1] [add_attr_with_inline]: 0.00336637, [1] [Cycle 1]: 4.408e-05, [2] [tag_attr]: 1.517e-05 [meta_addattr_fg_expand]: 4.2e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.773e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 2.29999e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00402482, [53] [py_interpret_to_execute]: 2.067e-05 [rewriter_before_opt_a]: 6.038e-05 [opt_a]: 0.00218416, [2] [Cycle 1]: 0.00158434, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.194e-05 [loop_unroll]: 2.088e-05 [a_1]: 0.00051496 [with_stream_mark]: 1.318e-05 [recompute_prepare]: 8.09002e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.526e-05 [accelerated_algorithm]: 6.67002e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.60001e-06 [auto_parallel]: 5.81998e-06 [parallel]: 2.458e-05 [flash_sp]: 6.74001e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.20999e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 6.81001e-06 [virtual_dataset]: 5.86e-06 
[get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.46002e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.47001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.44003e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.96999e-06 [receive_attached]: 2.27999e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 9.05001e-06 [renormalize]: 0.00041875 [add_forward_monad_depend]: 5.20001e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 2.913e-05 [a_3]: 4.071e-05 [Cycle 2]: 0.00059048, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012526 [with_stream_mark]: 9.96e-06 [recompute_prepare]: 5.73997e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.677e-05 [accelerated_algorithm]: 5.41002e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.05999e-06 [parallel]: 3.92998e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.98e-06 [matmul_add_comm_reduction]: 4.82e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.37001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.93001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.80001e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 7.60017e-07 [auto_monad_eliminator]: 5.79e-06 [cse]: 1.36e-05 [a_3]: 3.282e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 2.941e-05 [convert_after_rewriter]: 7.12002e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00044443 [opt_b]: 0.00018284, [1] [Cycle 1]: 0.00017687, [7] [b_1]: 0.00010798 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 5.39992e-07 [cse]: 1.735e-05 [optimize_parallel_all_gather_comm]: 1.675e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.216e-05 [loop_unroll]: 0.00041291 [opt_after_cconv]: 9.435e-05, [1] [Cycle 1]: 8.876e-05, [7] [c_1]: 2.791e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.647e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.338e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.467e-05, [4] [d_1]: 3.932e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.92999e-06 [partial_unused_args_eliminate]: 1.71002e-06 [add_recomputation]: 5.082e-05 [cse_after_recomputation]: 2.061e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.092e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 1.99e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.10999e-06 
[interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 [control_data_broadcast_order]: 1.167e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.726e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.963e-05, [1] [Cycle 1]: 6.552e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.97e-06 [elim_not_effective]: 1.21e-05 [opt_reshape]: 6.29999e-06 [fold_const_symbol]: 8.94e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.578e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 0.00012735 [opt_after_jit_grad]: 0.00048793 [validate]: 3.126e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00653408 [execute]: 6.86001e-06 Sums bootstrap : 0.000522s : 3.01% type_inference : 0.006209s : 35.85% event_method : 0.000014s : 0.08% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.35% optimize.opt_a.expand_dump_flag 
: 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000640s : 3.70% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000419s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.57% optimize.opt_b.b_1 : 0.000108s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000413s : 2.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000127s : 0.74% opt_after_jit_grad : 0.000488s : 2.82% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006534s : 37.73% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000223 30 11.28% : 0.000025s : 5: substitution.arithmetic_simplify 0.96% : 0.000002s : 2: substitution.elim_not_effective 0.57% : 0.000001s : 2: substitution.fold_const_symbol 2.44% : 0.000005s : 4: substitution.graph_param_transform 74.65% : 0.000167s : 3: substitution.inline 1.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 1.93% : 0.000004s : 4: substitution.remove_not_recompute_node 1.70% : 0.000004s : 4: substitution.replace_old_param 5.14% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006165 2 89.94% : 0.005545s : 1: type_inference.infer 10.06% : 0.000620s : 1: type_inference.specialize ------[replace.] 0.000040 5 72.53% : 0.000029s : 3: replace.inline 27.47% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000175 5 94.02% : 0.000165s : 3: match.inline 5.98% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.35% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.44% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.92% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.88% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000377 8 47.31% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.69% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030643 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.03% : 0.003381s : 1: add_attr 11.00% : 0.003370s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.82% : 0.000559s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.48% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.28% : 0.001004s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002187s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.62% : 0.000497s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.15% : 0.004029s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000213s : 1: renormalize.infer 0.65% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.43% : 0.000133s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 21.36% : 0.006544s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.31% : 0.006224s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0185426, [24] [bootstrap]: 0.00045705 [type_inference]: 0.00440656 [event_method]: 1.04e-05 [auto_monad]: 5.411e-05 [graph_reusing]: 5.10001e-06 [inline]: 1.75001e-06 [add_attr]: 0.00293949, [1] [add_attr_with_inline]: 0.00293173, [1] [Cycle 1]: 4.568e-05, [2] [tag_attr]: 1.219e-05 [meta_addattr_fg_expand]: 3.31999e-06 [parallel-infer-symbol]: 2.80997e-06 [pre_auto_parallel]: 2.2e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00368274, [53] [py_interpret_to_execute]: 1.494e-05 [rewriter_before_opt_a]: 4.133e-05 [opt_a]: 0.00188711, [2] [Cycle 1]: 0.0012852, [45] [expand_dump_flag]: 3.14001e-06 [switch_simplify]: 2.51e-05 [loop_unroll]: 1.349e-05 [a_1]: 0.00029161 [with_stream_mark]: 1.323e-05 [recompute_prepare]: 7.56999e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.40998e-06 [updatestate_loads_eliminate]: 2.85998e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.79e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.74e-06 [auto_parallel]: 5.91e-06 [parallel]: 1.769e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 9.04998e-06 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 7.12002e-06 [virtual_dataset]: 5.68002e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.46e-06 [merge_forward]: 3.91001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 9.71998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 2.68e-06 [after_resolve]: 
1.099e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00033876 [add_forward_monad_depend]: 4.92e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.326e-05 [cse]: 2.815e-05 [a_3]: 3.988e-05 [Cycle 2]: 0.00059299, [45] [expand_dump_flag]: 8.00006e-07 [switch_simplify]: 6.63998e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012388 [with_stream_mark]: 1.111e-05 [recompute_prepare]: 6.02999e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.18998e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 6.787e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 4.97999e-06 [parallel]: 4.52998e-06 [flash_sp]: 3.53999e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.26002e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.80001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 8.95999e-06 [a_after_grad]: 8.33001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.07001e-06 [cse]: 1.351e-05 [a_3]: 3.254e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.109e-05 [convert_after_rewriter]: 7.06999e-06 [order_py_execute_after_rewriter]: 4.91002e-06 [mutable_eliminate]: 0.00044349 [opt_b]: 0.00017965, [1] [Cycle 1]: 0.00017344, [7] [b_1]: 
0.00010696 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.59985e-07 [cse]: 1.62e-05 [optimize_parallel_all_gather_comm]: 1.569e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.29e-05 [loop_unroll]: 0.00040942 [opt_after_cconv]: 9.493e-05, [1] [Cycle 1]: 8.92e-05, [7] [c_1]: 2.845e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.605e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.879e-05, [1] [Cycle 1]: 6.439e-05, [4] [d_1]: 3.891e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.38e-05 [cse_after_recomputation]: 1.945e-05, [1] [Cycle 1]: 1.511e-05, [1] [cse]: 1.024e-05 [environ_conv]: 4.80999e-06 [swap_dp_allreduce_reducescatter]: 5.42999e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.25999e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.29998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.05002e-06 [control_data_broadcast_order]: 1.171e-05 [grouped_pairwise_exchange_alltoall]: 1.66998e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.71e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.739e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.69999e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 9.10019e-07 [symbol_engine_optimizer]: 6.802e-05, [1] [Cycle 1]: 6.386e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.151e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.71e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.25e-06 [opt_after_jit_grad]: 0.00044868 [validate]: 3.12e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00624327 [execute]: 7.75e-06 Sums bootstrap : 0.000457s : 3.13% type_inference : 0.004407s : 30.14% event_method : 0.000010s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000041s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.84% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000042s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000443s : 3.03% optimize.opt_b.b_1 : 0.000107s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000409s : 2.80% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.12% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.07% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006243s : 42.71% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000122 26 19.00% : 0.000023s : 4: substitution.arithmetic_simplify 1.71% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.13% : 0.000005s : 4: substitution.graph_param_transform 65.28% : 0.000080s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.90% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004365 2 91.65% : 0.004000s : 1: type_inference.infer 8.35% : 0.000364s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.99% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.10% : 0.000001s : 13: predicate.environ_get_depend_swap 1.97% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.14% : 0.000002s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.35% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.95% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.13% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 43.12% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.88% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026431 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.14% : 0.002944s : 1: add_attr 11.11% : 0.002935s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.08% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000496s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000452s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.91% : 0.000768s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.15% : 0.001890s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000458s : 1: opt_after_jit_grad 0.69% : 0.000183s : 1: opt_b 13.95% : 0.003687s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000186s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000046s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.66% : 0.006253s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.72% : 0.004420s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0197648, [24] [bootstrap]: 0.00048022 [type_inference]: 0.0055538 [event_method]: 1.457e-05 [auto_monad]: 5.712e-05 [graph_reusing]: 5.91998e-06 [inline]: 2.31e-06 [add_attr]: 0.0029915, [1] [add_attr_with_inline]: 0.00298349, [1] [Cycle 1]: 4.599e-05, [2] [tag_attr]: 1.537e-05 [meta_addattr_fg_expand]: 4.36002e-06 [parallel-infer-symbol]: 3.25002e-06 [pre_auto_parallel]: 2.442e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00393874, [53] [py_interpret_to_execute]: 1.865e-05 [rewriter_before_opt_a]: 5.839e-05 [opt_a]: 0.00211658, [2] [Cycle 1]: 0.0015121, [45] [expand_dump_flag]: 2.93998e-06 [switch_simplify]: 3.178e-05 [loop_unroll]: 2.057e-05 [a_1]: 0.00044331 [with_stream_mark]: 1.314e-05 [recompute_prepare]: 7.56001e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.548e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.03002e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 8.49998e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.825e-05 [flash_sp]: 7.1e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 8.80001e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.03e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.93998e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.03002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.044e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.16998e-06 
[after_resolve]: 1.014e-05 [a_after_grad]: 8.45001e-06 [renormalize]: 0.00040068 [add_forward_monad_depend]: 4.67998e-06 [auto_monad_grad]: 2.12999e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.821e-05 [a_3]: 3.961e-05 [Cycle 2]: 0.00059529, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 7.25998e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012877 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.75002e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.654e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.08001e-06 [auto_parallel]: 4.99e-06 [parallel]: 4.25e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.10998e-06 [allreduce_fusion]: 2.78998e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.81001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95998e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 8.1e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.717e-05 [a_3]: 3.123e-05 [py_interpret_to_execute_after_opt_a]: 7.06001e-06 [slice_cell_reuse_recomputed_activation]: 2.38998e-06 [rewriter_after_opt_a]: 3.093e-05 [convert_after_rewriter]: 6.53e-06 [order_py_execute_after_rewriter]: 4.85999e-06 [mutable_eliminate]: 0.00044953 [opt_b]: 0.00018174, [1] [Cycle 1]: 0.00017557, [7] [b_1]: 
0.00010903 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 3.10014e-07 [cse]: 1.589e-05 [optimize_parallel_all_gather_comm]: 1.591e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.26e-05 [loop_unroll]: 0.00041024 [opt_after_cconv]: 9.288e-05, [1] [Cycle 1]: 8.737e-05, [7] [c_1]: 2.709e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.519e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.251e-05 [tuple_transform]: 6.831e-05, [1] [Cycle 1]: 6.418e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.355e-05 [cse_after_recomputation]: 2.011e-05, [1] [Cycle 1]: 1.586e-05, [1] [cse]: 1.082e-05 [environ_conv]: 4.80001e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.43001e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.92002e-06 [overlap_grad_ring_attention]: 3.88999e-06 [overlap_grad_flash_sp]: 1.674e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.789e-05, [1] [Cycle 1]: 6.343e-05, [6] [build]: 2.24999e-06 [elim_shapecalc]: 8.1e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 5.79e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.91e-06 [auto_monad_reorder]: 1.607e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00045019 [validate]: 3.124e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0059781 [execute]: 6.88e-06 Sums bootstrap : 0.000480s : 3.04% type_inference : 0.005554s : 35.17% event_method : 0.000015s : 0.09% auto_monad : 0.000057s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000572s : 3.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000401s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000045s : 0.29% optimize.opt_a.a_3 : 0.000071s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.85% optimize.opt_b.b_1 : 0.000109s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000410s : 2.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000450s : 2.85% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005978s : 37.86% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 15.15% : 0.000025s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000002s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.20% : 0.000108s : 3: substitution.inline 1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005511 2 90.09% : 0.004965s : 1: type_inference.infer 9.91% : 0.000546s : 1: type_inference.specialize ------[replace.] 0.000037 5 70.51% : 0.000026s : 3: replace.inline 29.49% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.48% : 0.000106s : 3: match.inline 8.52% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.54% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 1.00% : 0.000002s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.42% : 0.000002s : 16: predicate.switch_defer_inline 2.16% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.18% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.87% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.27% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000336 8 47.33% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.67% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028185 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.63% : 0.002996s : 1: add_attr 10.60% : 0.002987s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.84% : 0.000519s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.31% : 0.000934s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.52% : 0.002120s : 1: opt_a 0.34% : 0.000096s : 1: opt_after_cconv 1.63% : 0.000459s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 13.99% : 0.003943s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000206s : 1: renormalize.infer 0.67% : 0.000188s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.24% : 0.005988s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.75% : 0.005568s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0374798, [24] [bootstrap]: 0.00052219 [type_inference]: 0.0114442 [event_method]: 4.844e-05 [auto_monad]: 0.00012209 [graph_reusing]: 9.09e-06 [inline]: 1.94999e-06 [add_attr]: 0.0030126, [1] [add_attr_with_inline]: 0.00300474, [1] [Cycle 1]: 6.971e-05, [2] [tag_attr]: 3.437e-05 [meta_addattr_fg_expand]: 9.42999e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 5.033e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.0132471, [53] [py_interpret_to_execute]: 3.71e-05 [rewriter_before_opt_a]: 0.00014386 [opt_a]: 0.0109858, [3] [Cycle 1]: 0.00707431, [45] [expand_dump_flag]: 4.66002e-06 [switch_simplify]: 7.443e-05 [loop_unroll]: 6.189e-05 [a_1]: 0.00145791 [with_stream_mark]: 2.336e-05 [recompute_prepare]: 2.205e-05 [updatestate_depend_eliminate]: 8.83001e-06 [updatestate_assign_eliminate]: 7.98999e-06 [updatestate_loads_eliminate]: 7.06999e-06 [parameter_eliminate]: 2.54999e-06 [a_2]: 0.00024229 [accelerated_algorithm]: 3.069e-05 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 3.4e-06 [shard_inline]: 1.59e-05 [merge_send_recv]: 1.646e-05 [auto_parallel]: 1.057e-05 [parallel]: 1.928e-05 [flash_sp]: 1.17e-05 [merge_comm]: 9.58002e-06 [allreduce_fusion]: 9.02999e-06 [matmul_add_comm_reduction]: 2.707e-05 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 1.818e-05 [virtual_dataset]: 1.549e-05 [get_grad_eliminate_]: 1.5e-05 [virtual_output]: 1.549e-05 [merge_forward]: 9.42001e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.788e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.881e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 2.703e-05 [set_forward_comm_id_for_comm_node_pass]: 9.59e-06 [meta_fg_expand]: 0.00142089 [flash_sp_send_recv_attached]: 3.7e-06 [receive_attached]: 2.34001e-06 [after_resolve]: 
5.904e-05 [a_after_grad]: 8.043e-05 [renormalize]: 0.00243465 [add_forward_monad_depend]: 9.36002e-06 [auto_monad_grad]: 5.24e-06 [auto_monad_eliminator]: 5.594e-05 [cse]: 0.00016282 [a_3]: 0.00033544 [Cycle 2]: 0.00300081, [45] [expand_dump_flag]: 1.56002e-06 [switch_simplify]: 4.727e-05 [loop_unroll]: 4.333e-05 [a_1]: 0.00152123 [with_stream_mark]: 1.14e-05 [recompute_prepare]: 1.039e-05 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 1.32999e-06 [a_2]: 0.00012595 [accelerated_algorithm]: 1.204e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 9.12999e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 8.18999e-06 [parallel]: 5.15999e-06 [flash_sp]: 3.25e-06 [merge_comm]: 5.30001e-06 [allreduce_fusion]: 4.73001e-06 [matmul_add_comm_reduction]: 7.81001e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.75001e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 8.29983e-07 [offload_activation]: 8.83001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.62999e-06 [meta_fg_expand]: 6.931e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.559e-05 [a_after_grad]: 1.434e-05 [renormalize]: 0.00060297 [add_forward_monad_depend]: 4e-06 [auto_monad_grad]: 1.43002e-06 [auto_monad_eliminator]: 1.395e-05 [cse]: 4.518e-05 [a_3]: 6.502e-05 [Cycle 3]: 0.00089692, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.018e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.00024864 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 9.27999e-06 [updatestate_depend_eliminate]: 4.85001e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.79002e-06 
[parameter_eliminate]: 8.39995e-07 [a_2]: 0.00012375 [accelerated_algorithm]: 1.188e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 8.90999e-06 [merge_send_recv]: 7.31999e-06 [auto_parallel]: 6.99001e-06 [parallel]: 4.32e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 4.99003e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.01e-05 [virtual_dataset]: 8.60001e-06 [get_grad_eliminate_]: 8.42e-06 [virtual_output]: 8.25999e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 8.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.577e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.363e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99e-06 [meta_fg_expand]: 3.05002e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.373e-05 [a_after_grad]: 1.428e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 1.054e-05 [cse]: 2.582e-05 [a_3]: 5.919e-05 [py_interpret_to_execute_after_opt_a]: 1.045e-05 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 4.794e-05 [convert_after_rewriter]: 9.38002e-06 [order_py_execute_after_rewriter]: 6.63e-06 [mutable_eliminate]: 0.000459 [opt_b]: 0.00028667, [1] [Cycle 1]: 0.00028065, [7] [b_1]: 0.00018897 [b_2]: 1.117e-05 [updatestate_depend_eliminate]: 7.21001e-06 [updatestate_assign_eliminate]: 3.92002e-06 [updatestate_loads_eliminate]: 4.08001e-06 [renormalize]: 5.00004e-07 [cse]: 3.038e-05 [optimize_parallel_all_gather_comm]: 2.06e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.053e-05 [loop_unroll]: 0.00042358 [opt_after_cconv]: 0.00013419, [1] [Cycle 1]: 0.0001282, [7] [c_1]: 4.788e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.05002e-06 [updatestate_assign_eliminate]: 4.15e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 2.907e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 2.956e-05 [tuple_transform]: 0.00010242, [1] [Cycle 1]: 9.771e-05, [4] [d_1]: 6.698e-05 [none_parameter_eliminate]: 2.12999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.81e-06 [partial_unused_args_eliminate]: 2.29001e-06 [add_recomputation]: 5.809e-05 [cse_after_recomputation]: 3.128e-05, [1] [Cycle 1]: 2.669e-05, [1] [cse]: 2.146e-05 [environ_conv]: 8.85001e-06 [swap_dp_allreduce_reducescatter]: 7.75e-06 [bias_add_comm_swap]: 2.61999e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 3.33e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.48998e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 8.40024e-07 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.16e-06 [control_data_broadcast_order]: 1.673e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 5.17e-06 [overlap_recompute_and_grad_model_parallel]: 5.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.57999e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 5.19998e-06 [overlap_grad_flash_sp]: 2.421e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 2.19001e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 9.838e-05, [1] [Cycle 1]: 9.422e-05, [6] [build]: 9.69e-06 [elim_shapecalc]: 1.29e-05 [elim_not_effective]: 1.844e-05 [opt_reshape]: 9.93002e-06 [fold_const_symbol]: 1.528e-05 [renormalize]: 
1.80007e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.515e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.66001e-06 [opt_after_jit_grad]: 0.00051375 [validate]: 4.474e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00820288 [execute]: 7.34002e-06 Sums bootstrap : 0.000522s : 1.57% type_inference : 0.011444s : 34.45% event_method : 0.000048s : 0.15% auto_monad : 0.000122s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003228s : 9.72% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.48% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm 
: 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001493s : 4.50% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003038s : 9.14% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000234s : 0.70% optimize.opt_a.a_3 : 0.000460s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.38% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000424s : 1.28% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000514s : 1.55% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008203s : 24.69% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000755 222 6.13% : 0.000046s : 12: substitution.arithmetic_simplify 1.78% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.03% : 0.000415s : 17: substitution.inline 2.02% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.00% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.80% : 0.000006s : 5: substitution.partial_eliminate 1.81% : 0.000014s : 20: substitution.remove_not_recompute_node 3.07% : 0.000023s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.83% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011369 2 87.11% : 0.009904s : 1: type_inference.infer 12.89% : 0.001466s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.44% : 0.000126s : 17: replace.inline 42.56% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 33 92.01% : 0.000407s : 17: match.inline 7.99% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000813 5764 1.01% : 0.000008s : 68: predicate.accumulaten_eliminater 5.65% : 0.000046s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.00% : 0.000008s : 68: predicate.addn_zero_filter 0.98% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.88% : 0.000015s : 100: predicate.arithmetic_simplify 1.05% : 0.000009s : 68: predicate.cast_eliminate 1.04% : 0.000008s : 68: predicate.check_bprop_eliminate 0.47% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.46% : 0.000004s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.10% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.02% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.14% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.11% : 0.000009s : 76: predicate.environ_get_depend_swap 1.61% : 0.000013s : 108: predicate.environ_get_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.60% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.14% : 0.000017s : 101: predicate.float_depend_g_call 0.46% : 0.000004s : 32: predicate.float_environ_get_switch 0.60% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.51% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.50% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.17% : 0.000042s : 249: predicate.inline 1.13% : 0.000009s : 55: predicate.inline_without_move 0.28% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.58% : 0.000005s : 32: predicate.less_batch_normalization 
1.52% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.46% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.08% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.29% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.50% : 0.000004s : 32: predicate.merge_addn 1.03% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.04% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000008s : 68: predicate.minmaximum_grad 0.29% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.83% : 0.000015s : 101: predicate.partial_defer_inline 1.63% : 0.000013s : 92: predicate.partial_eliminate 0.97% : 0.000008s : 68: predicate.print_const_string_wrapper 0.49% : 0.000004s : 32: predicate.reduce_all_const_elim 1.21% : 0.000010s : 68: predicate.reduce_eliminate 2.46% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 32: predicate.remove_not_recompute_node 1.78% : 0.000014s : 152: predicate.replace_applicator 0.55% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.00% : 0.000008s : 68: predicate.reshape_eliminate 1.02% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.15% : 0.000009s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.58% : 0.000005s : 32: predicate.specialize_transform 1.13% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.71% : 0.000014s : 101: predicate.switch_defer_inline 2.72% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.65% : 
0.000038s : 277: predicate.switch_simplify 3.50% : 0.000028s : 68: predicate.tile_eliminate 0.99% : 0.000008s : 68: predicate.transpose_eliminate 1.34% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.41% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.23% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.61% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.34% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.90% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.51% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.45% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.02% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 8: predicate.value_based_eliminate 0.51% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001594 34 55.81% : 0.000890s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.19% : 0.000705s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062037 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.86% : 0.003017s : 1: add_attr 4.85% : 0.003008s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000129s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.90% : 0.000557s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000056s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.88% : 0.004886s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.12% : 0.000077s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.71% : 0.010989s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.84% : 0.000524s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.36% : 0.013251s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.61% : 0.001621s : 2: renormalize.infer 2.26% : 0.001404s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000148s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.24% : 0.008213s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.47% : 0.011460s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0185951, [24] [bootstrap]: 0.00052188 [type_inference]: 0.00436164 [event_method]: 1.027e-05 [auto_monad]: 5.43e-05 [graph_reusing]: 5.27001e-06 [inline]: 2.09999e-06 [add_attr]: 0.00296889, [1] [add_attr_with_inline]: 0.00296083, [1] [Cycle 1]: 4.695e-05, [2] [tag_attr]: 1.256e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 5.961e-05 [insert-virtual-dataset]: 2.78998e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00365181, [53] [py_interpret_to_execute]: 1.594e-05 [rewriter_before_opt_a]: 4.012e-05 [opt_a]: 0.00185276, [2] [Cycle 1]: 0.0012556, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.516e-05 [loop_unroll]: 1.389e-05 [a_1]: 0.00029393 [with_stream_mark]: 1.409e-05 [recompute_prepare]: 7.33999e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 2.95998e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.535e-05 [accelerated_algorithm]: 6.53998e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.71001e-06 [auto_parallel]: 5.39998e-06 [parallel]: 1.909e-05 [flash_sp]: 7.4e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.21002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 3.62002e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.147e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 9.67001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 
2.48e-06 [after_resolve]: 1.076e-05 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00033805 [add_forward_monad_depend]: 4.44002e-06 [auto_monad_grad]: 1.58002e-06 [auto_monad_eliminator]: 1.308e-05 [cse]: 2.78e-05 [a_3]: 4.021e-05 [Cycle 2]: 0.00058789, [45] [expand_dump_flag]: 7.39994e-07 [switch_simplify]: 6.58998e-06 [loop_unroll]: 5.48002e-06 [a_1]: 0.00011935 [with_stream_mark]: 9.60001e-06 [recompute_prepare]: 5.49e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.674e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.18999e-06 [auto_parallel]: 4.92999e-06 [parallel]: 3.91001e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.95002e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.44001e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.75002e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.97003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23998e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 9.42001e-06 [a_after_grad]: 8.50999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.04001e-06 [cse]: 1.313e-05 [a_3]: 3.397e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.155e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00044653 [opt_b]: 0.00017842, [1] 
[Cycle 1]: 0.00017252, [7] [b_1]: 0.00010586 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.35002e-06 [renormalize]: 4.69998e-07 [cse]: 1.618e-05 [optimize_parallel_all_gather_comm]: 1.569e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.343e-05 [loop_unroll]: 0.00040858 [opt_after_cconv]: 9.415e-05, [1] [Cycle 1]: 8.888e-05, [7] [c_1]: 2.73e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.605e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.279e-05 [tuple_transform]: 6.956e-05, [1] [Cycle 1]: 6.534e-05, [4] [d_1]: 3.969e-05 [none_parameter_eliminate]: 1.92999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.94e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.452e-05 [cse_after_recomputation]: 1.976e-05, [1] [Cycle 1]: 1.54e-05, [1] [cse]: 1.021e-05 [environ_conv]: 4.37e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.30002e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.79999e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.30001e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.68998e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.739e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 6.738e-05, [1] [Cycle 1]: 6.315e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.01001e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.634e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00047325 [validate]: 3.19e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00621723 [execute]: 6.83998e-06 Sums bootstrap : 0.000522s : 3.56% type_inference : 0.004362s : 29.73% event_method : 0.000010s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000060s : 0.41% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000413s : 2.82% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000010s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000338s : 2.30% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000447s : 3.04% optimize.opt_b.b_1 : 0.000106s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000409s : 2.78% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000473s : 3.23% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006217s : 42.37% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000124 26 18.04% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 5.00% : 0.000006s : 4: substitution.graph_param_transform 65.04% : 0.000081s : 2: substitution.inline 2.47% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.80% : 0.000005s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004323 2 91.81% : 0.003969s : 1: type_inference.infer 8.19% : 0.000354s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.60% : 0.000001s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.03% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.11% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.08% : 0.000001s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.60% : 0.000006s : 41: predicate.switch_simplify 0.84% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 41.73% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.27% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026478 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.23% : 0.002973s : 1: add_attr 11.20% : 0.002964s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.11% : 0.000558s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.89% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.01% : 0.001856s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.82% : 0.000483s : 1: opt_after_jit_grad 0.69% : 0.000182s : 1: opt_b 13.81% : 0.003656s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.24% : 0.000065s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000186s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.52% : 0.006227s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.52% : 0.004375s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0359487, [24] [bootstrap]: 0.00050708 [type_inference]: 0.0102408 [event_method]: 4.073e-05 [auto_monad]: 0.00011779 [graph_reusing]: 8.03001e-06 [inline]: 1.87999e-06 [add_attr]: 0.00299965, [1] [add_attr_with_inline]: 0.00299156, [1] [Cycle 1]: 6.692e-05, [2] [tag_attr]: 3.079e-05 [meta_addattr_fg_expand]: 8.52e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 4.621e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0129635, [53] [py_interpret_to_execute]: 3.42e-05 [rewriter_before_opt_a]: 0.00012665 [opt_a]: 0.0107387, [3] [Cycle 1]: 0.00687041, [45] [expand_dump_flag]: 3.36999e-06 [switch_simplify]: 6.615e-05 [loop_unroll]: 5.414e-05 [a_1]: 0.00138649 [with_stream_mark]: 2.336e-05 [recompute_prepare]: 2.185e-05 [updatestate_depend_eliminate]: 9.18002e-06 [updatestate_assign_eliminate]: 7.77e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 3.01001e-06 [a_2]: 0.00024324 [accelerated_algorithm]: 3.046e-05 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 3.57997e-06 [shard_inline]: 1.626e-05 [merge_send_recv]: 1.71e-05 [auto_parallel]: 1.067e-05 [parallel]: 1.989e-05 [flash_sp]: 1.184e-05 [merge_comm]: 9.93002e-06 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 2.767e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 1.774e-05 [virtual_dataset]: 1.523e-05 [get_grad_eliminate_]: 1.507e-05 [virtual_output]: 1.496e-05 [merge_forward]: 9.81e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 1.791e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.832e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.716e-05 [set_forward_comm_id_for_comm_node_pass]: 9.36e-06 [meta_fg_expand]: 0.00137645 [flash_sp_send_recv_attached]: 3.74002e-06 [receive_attached]: 2.48e-06 
[after_resolve]: 5.834e-05 [a_after_grad]: 8.098e-05 [renormalize]: 0.00236447 [add_forward_monad_depend]: 9.19e-06 [auto_monad_grad]: 5.05001e-06 [auto_monad_eliminator]: 5.525e-05 [cse]: 0.00016198 [a_3]: 0.00033323 [Cycle 2]: 0.00296871, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.692e-05 [loop_unroll]: 4.383e-05 [a_1]: 0.00151634 [with_stream_mark]: 1.177e-05 [recompute_prepare]: 1.097e-05 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 4.50001e-06 [updatestate_loads_eliminate]: 3.61999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.0001254 [accelerated_algorithm]: 1.165e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 9.07001e-06 [merge_send_recv]: 6.74999e-06 [auto_parallel]: 7.43e-06 [parallel]: 4.61002e-06 [flash_sp]: 3.18998e-06 [merge_comm]: 5.17999e-06 [allreduce_fusion]: 4.74002e-06 [matmul_add_comm_reduction]: 7.86001e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 1.002e-05 [virtual_dataset]: 8.82e-06 [get_grad_eliminate_]: 4.449e-05 [virtual_output]: 9.04e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.773e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 1.443e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37001e-06 [meta_fg_expand]: 3.486e-05 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.07e-06 [after_resolve]: 1.493e-05 [a_after_grad]: 1.383e-05 [renormalize]: 0.00057443 [add_forward_monad_depend]: 3.95e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.433e-05 [cse]: 4.545e-05 [a_3]: 6.527e-05 [Cycle 3]: 0.00088577, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.047e-05 [loop_unroll]: 8.91997e-06 [a_1]: 0.00024643 [with_stream_mark]: 9.68997e-06 [recompute_prepare]: 9.44998e-06 [updatestate_depend_eliminate]: 4.86002e-06 [updatestate_assign_eliminate]: 3.99997e-06 
[updatestate_loads_eliminate]: 3.76001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012135 [accelerated_algorithm]: 1.15e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 8.85001e-06 [merge_send_recv]: 7.27002e-06 [auto_parallel]: 6.96001e-06 [parallel]: 4.3e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 4.92999e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.64002e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 9.74999e-06 [virtual_dataset]: 8.33999e-06 [get_grad_eliminate_]: 8.27998e-06 [virtual_output]: 8.15e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 8.55999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.558e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.421e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.252e-05 [a_after_grad]: 1.414e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 1.007e-05 [cse]: 2.532e-05 [a_3]: 5.732e-05 [py_interpret_to_execute_after_opt_a]: 9.94001e-06 [slice_cell_reuse_recomputed_activation]: 2.19999e-06 [rewriter_after_opt_a]: 4.925e-05 [convert_after_rewriter]: 9.05999e-06 [order_py_execute_after_rewriter]: 6.86001e-06 [mutable_eliminate]: 0.00045553 [opt_b]: 0.00028452, [1] [Cycle 1]: 0.00027837, [7] [b_1]: 0.00018738 [b_2]: 1.081e-05 [updatestate_depend_eliminate]: 6.98998e-06 [updatestate_assign_eliminate]: 4.07998e-06 [updatestate_loads_eliminate]: 3.94002e-06 [renormalize]: 3.19997e-07 [cse]: 3.051e-05 [optimize_parallel_all_gather_comm]: 2.098e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 1.932e-05 [loop_unroll]: 0.00041893 [opt_after_cconv]: 0.00013298, [1] [Cycle 1]: 0.00012727, [7] [c_1]: 4.734e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 6.84001e-06 
[updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 3.88001e-06 [cse]: 2.933e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.753e-05 [tuple_transform]: 0.00010031, [1] [Cycle 1]: 9.56e-05, [4] [d_1]: 6.572e-05 [none_parameter_eliminate]: 1.97001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.86998e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 5.787e-05 [cse_after_recomputation]: 3.214e-05, [1] [Cycle 1]: 2.723e-05, [1] [cse]: 2.185e-05 [environ_conv]: 8.35999e-06 [swap_dp_allreduce_reducescatter]: 7.75998e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 3.01999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.48998e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.70002e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.40999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.801e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 4.88001e-06 [overlap_recompute_and_grad_model_parallel]: 5.91e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 5.10001e-06 [overlap_grad_flash_sp]: 2.464e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 2.15002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.799e-05, [1] [Cycle 1]: 9.374e-05, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.281e-05 [elim_not_effective]: 1.783e-05 [opt_reshape]: 1.021e-05 
[fold_const_symbol]: 1.475e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 2.489e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00049971 [validate]: 4.445e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00822095 [execute]: 6.89999e-06 Sums bootstrap : 0.000507s : 1.60% type_inference : 0.010241s : 32.31% event_method : 0.000041s : 0.13% auto_monad : 0.000118s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003149s : 9.94% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000490s : 1.55% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 
0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000032s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000068s : 0.21% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001414s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000086s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.002939s : 9.27% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000233s : 0.73% optimize.opt_a.a_3 : 0.000456s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000456s : 1.44% optimize.opt_b.b_1 : 0.000187s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000419s : 1.32% optimize.opt_after_cconv.c_1 : 0.000047s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000500s : 1.58% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008221s : 25.94% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000727 218 5.82% : 0.000042s : 11: substitution.arithmetic_simplify 1.83% : 0.000013s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000007s : 8: substitution.graph_param_transform 0.44% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 54.77% : 0.000398s : 16: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.94% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.80% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.90% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010170 2 87.54% : 0.008903s : 1: type_inference.infer 12.46% : 0.001267s : 1: type_inference.specialize ------[replace.] 0.000200 30 58.63% : 0.000117s : 16: replace.inline 41.37% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000421 30 92.64% : 0.000390s : 16: match.inline 7.36% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000730 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.10% : 0.000015s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000019s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000009s : 67: predicate.reduce_eliminate 2.70% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.85% : 0.000035s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000014s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001486 32 58.71% : 0.000873s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.29% : 0.000614s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059996 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.01% : 0.003004s : 1: add_attr 4.99% : 0.002995s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000125s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.91% : 0.000545s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.03% : 0.004820s : 117: opt.transform.opt_a 0.08% : 0.000046s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.90% : 0.010742s : 1: opt_a 0.23% : 0.000136s : 1: opt_after_cconv 0.85% : 0.000509s : 1: opt_after_jit_grad 0.48% : 0.000288s : 1: opt_b 21.61% : 0.012967s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.64% : 0.001585s : 2: renormalize.infer 2.23% : 0.001341s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000053s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.72% : 0.008231s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.10% : 0.010256s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x7-kbk],max_mem:10.0M . TotalTime = 0.809237, [24] [bootstrap]: 0.0005537 [type_inference]: 0.00615056 [event_method]: 1.418e-05 [auto_monad]: 5.665e-05 [graph_reusing]: 4.87e-06 [inline]: 1.92999e-06 [add_attr]: 0.0034072, [1] [add_attr_with_inline]: 0.00339607, [1] [Cycle 1]: 4.627e-05, [2] [tag_attr]: 1.512e-05 [meta_addattr_fg_expand]: 4.14002e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.858e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.00399781, [53] [py_interpret_to_execute]: 1.995e-05 [rewriter_before_opt_a]: 5.831e-05 [opt_a]: 0.00211114, [2] [Cycle 1]: 0.0015198, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 3.172e-05 [loop_unroll]: 2.147e-05 [a_1]: 0.00046079 [with_stream_mark]: 1.36e-05 [recompute_prepare]: 7.26999e-06 [updatestate_depend_eliminate]: 4.10998e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.90002e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 7.719e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 8.13999e-06 [auto_parallel]: 6.11e-06 [parallel]: 2.335e-05 [flash_sp]: 7.58999e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.40998e-06 [matmul_add_comm_reduction]: 9.23002e-06 [allreduce_slice_to_reducescatter]: 1.04998e-06 [virtual_shard_identity]: 7.37002e-06 [virtual_dataset]: 5.88998e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.41e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.143e-05 [merge_recompute_call_nodes]: 1.61002e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.39001e-06 [after_resolve]: 1.023e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.0004094 [add_forward_monad_depend]: 4.85999e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.731e-05 [a_3]: 4.063e-05 [Cycle 2]: 0.00058218, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012551 [with_stream_mark]: 9.57999e-06 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.43998e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.723e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.65999e-06 [auto_parallel]: 5.09998e-06 [parallel]: 4.2e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.13002e-06 [get_grad_eliminate_]: 4.87998e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94001e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 7.78001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.07999e-06 [cse]: 1.177e-05 [a_3]: 3.113e-05 [py_interpret_to_execute_after_opt_a]: 7.1e-06 
[slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.193e-05 [convert_after_rewriter]: 6.56999e-06 [order_py_execute_after_rewriter]: 4.87e-06 [mutable_eliminate]: 0.00044518 [opt_b]: 0.00023296, [1] [Cycle 1]: 0.00022706, [7] [b_1]: 0.00015724 [b_2]: 7.34002e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 5.10016e-07 [cse]: 1.551e-05 [optimize_parallel_all_gather_comm]: 1.641e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.311e-05 [loop_unroll]: 0.00041528 [opt_after_cconv]: 9.258e-05, [1] [Cycle 1]: 8.707e-05, [7] [c_1]: 2.758e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 4.94998e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.568e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.431e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.461e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.68002e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 5.087e-05 [cse_after_recomputation]: 2.005e-05, [1] [Cycle 1]: 1.556e-05, [1] [cse]: 1.039e-05 [environ_conv]: 4.64002e-06 [swap_dp_allreduce_reducescatter]: 5.06002e-06 [bias_add_comm_swap]: 2.96001e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 
[control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.59002e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.68999e-06 [overlap_grad_flash_sp]: 1.751e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 6.68e-05, [1] [Cycle 1]: 6.266e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.17e-06 [elim_not_effective]: 1.113e-05 [opt_reshape]: 5.94999e-06 [fold_const_symbol]: 8.60999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00044657 [validate]: 3.062e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.794259 [execute]: 8.32998e-06 Sums bootstrap : 0.000554s : 0.07% type_inference : 0.006151s : 0.76% event_method : 0.000014s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000586s : 0.07% optimize.opt_a.with_stream_mark 
: 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000410s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000445s : 0.06% optimize.opt_b.b_1 : 0.000157s : 0.02% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000415s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000447s : 0.06% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.794259s : 98.69% execute : 0.000008s : 0.00% Time group info: ------[substitution.] 0.000168 30 15.77% : 0.000027s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000005s : 4: substitution.graph_param_transform 65.69% : 0.000111s : 3: substitution.inline 1.92% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.79% : 0.000005s : 4: substitution.remove_not_recompute_node 2.17% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006105 2 90.98% : 0.005554s : 1: type_inference.infer 9.02% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.78% : 0.000028s : 3: replace.inline 28.22% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.38% : 0.000108s : 3: match.inline 8.62% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.84% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.47% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.32% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.63% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.04% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.88% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.47% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.53% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000336 8 47.29% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.71% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.818204 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.42% : 0.003412s : 1: add_attr 0.42% : 0.003400s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000594s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000454s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.000952s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000139s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.26% : 0.002114s : 1: opt_a 0.01% : 0.000096s : 1: opt_after_cconv 0.06% : 0.000456s : 1: opt_after_jit_grad 0.03% : 0.000237s : 1: opt_b 0.49% : 0.004002s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000018s : 1: remove_dup_value 0.03% : 0.000212s : 1: renormalize.infer 0.02% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000069s : 1: symbol_engine_optimizer 97.08% : 0.794309s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.75% : 0.006165s : 1: type_inference 0.01% : 0.000057s : 1: validate TotalTime = 0.0840394, [24] [bootstrap]: 0.00061603 [type_inference]: 0.00476693 [event_method]: 1.099e-05 [auto_monad]: 5.706e-05 [graph_reusing]: 5.15999e-06 [inline]: 1.75001e-06 [add_attr]: 0.00315977, [1] [add_attr_with_inline]: 0.00315123, [1] [Cycle 1]: 3.985e-05, [2] [tag_attr]: 1.208e-05 [meta_addattr_fg_expand]: 3.21001e-06 [parallel-infer-symbol]: 2.73998e-06 [pre_auto_parallel]: 2.128e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00370336, [53] [py_interpret_to_execute]: 1.496e-05 [rewriter_before_opt_a]: 4.017e-05 [opt_a]: 0.00186167, [2] [Cycle 1]: 0.00126195, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 2.425e-05 [loop_unroll]: 1.426e-05 [a_1]: 0.00029562 [with_stream_mark]: 1.361e-05 [recompute_prepare]: 7.34002e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.706e-05 [accelerated_algorithm]: 6.42001e-06 [shard]: 2.21998e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 7.6e-06 [auto_parallel]: 5.77999e-06 [parallel]: 2.007e-05 [flash_sp]: 7.05002e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 9.93002e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 7.56001e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.69002e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 9.64999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 9.30001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.24001e-06 
[receive_attached]: 2.59001e-06 [after_resolve]: 1.054e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00034096 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.64998e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 2.894e-05 [a_3]: 4.046e-05 [Cycle 2]: 0.00059031, [45] [expand_dump_flag]: 8.00006e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.00012695 [with_stream_mark]: 1.068e-05 [recompute_prepare]: 5.68997e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.849e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.44998e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.02e-06 [parallel]: 4.12998e-06 [flash_sp]: 3.71001e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.19998e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.92999e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.13002e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.26e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39998e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.71002e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 5.92999e-06 [cse]: 1.291e-05 [a_3]: 3.309e-05 [py_interpret_to_execute_after_opt_a]: 7.25003e-06 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 5.454e-05 [convert_after_rewriter]: 7.5e-06 [order_py_execute_after_rewriter]: 5.24998e-06 [mutable_eliminate]: 0.00044719 [opt_b]: 
0.00018235, [1] [Cycle 1]: 0.00017647, [7] [b_1]: 0.00010903 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 4.2998e-07 [cse]: 1.604e-05 [optimize_parallel_all_gather_comm]: 1.675e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.279e-05 [loop_unroll]: 0.00041272 [opt_after_cconv]: 9.555e-05, [1] [Cycle 1]: 8.997e-05, [7] [c_1]: 2.831e-05 [parameter_eliminate]: 2.45002e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.598e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.328e-05 [tuple_transform]: 7.01e-05, [1] [Cycle 1]: 6.564e-05, [4] [d_1]: 3.993e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.866e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.58e-05, [1] [cse]: 1.071e-05 [environ_conv]: 4.71002e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 3.99002e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.98e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.34e-06 [interleave_split_concat_branches]: 1.34998e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.97e-06 [overlap_recompute_and_grad_model_parallel]: 4.62998e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.48998e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.777e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.54999e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 9.10019e-07 [symbol_engine_optimizer]: 6.774e-05, [1] [Cycle 1]: 6.36e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.76003e-06 [auto_monad_reorder]: 1.646e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00045925 [validate]: 3.576e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0709595 [execute]: 9.25999e-06 Sums bootstrap : 0.000616s : 0.77% type_inference : 0.004767s : 5.96% event_method : 0.000011s : 0.01% auto_monad : 0.000057s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000423s : 0.53% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate 
: 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000341s : 0.43% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000055s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.56% optimize.opt_b.b_1 : 0.000109s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000413s : 0.52% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000459s : 0.57% validate : 0.000036s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.070960s : 88.78% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000123 26 18.19% : 0.000022s : 4: substitution.arithmetic_simplify 1.66% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.83% : 0.000006s : 4: substitution.graph_param_transform 65.38% : 0.000080s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.38% : 0.000004s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004725 2 91.67% : 0.004331s : 1: type_inference.infer 8.33% : 0.000394s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.96% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.28% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.87% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.24% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.97% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000319 6 44.57% : 0.000142s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.43% : 0.000177s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.092180 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.43% : 0.003164s : 1: add_attr 3.42% : 0.003155s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.71% : 0.000653s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.46% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.49% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.84% : 0.000777s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.02% : 0.001864s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.51% : 0.000468s : 1: opt_after_jit_grad 0.20% : 0.000186s : 1: opt_b 4.02% : 0.003707s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.20% : 0.000187s : 1: renormalize.infer 0.16% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000059s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 77.00% : 0.070976s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 5.19% : 0.004781s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.0725078, [24] [bootstrap]: 0.00046127 [type_inference]: 0.00557037 [event_method]: 1.426e-05 [auto_monad]: 5.563e-05 [graph_reusing]: 5.25999e-06 [inline]: 2.01003e-06 [add_attr]: 0.00298003, [1] [add_attr_with_inline]: 0.00297207, [1] [Cycle 1]: 4.613e-05, [2] [tag_attr]: 1.57e-05 [meta_addattr_fg_expand]: 4.13001e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.589e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.68002e-06 [optimize]: 0.00396345, [53] [py_interpret_to_execute]: 2.111e-05 [rewriter_before_opt_a]: 5.797e-05 [opt_a]: 0.00211107, [2] [Cycle 1]: 0.00150686, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.297e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.00044375 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 7.68001e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.26999e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.85001e-06 [a_2]: 7.513e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 7.63001e-06 [auto_parallel]: 6.10002e-06 [parallel]: 1.909e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.79002e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.74001e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.51e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 9.14e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 
2.78998e-06 [after_resolve]: 1.017e-05 [a_after_grad]: 9.05001e-06 [renormalize]: 0.00042408 [add_forward_monad_depend]: 4.49998e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.351e-05 [cse]: 2.803e-05 [a_3]: 4.069e-05 [Cycle 2]: 0.00059461, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.72002e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012365 [with_stream_mark]: 9.54e-06 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.778e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.10001e-06 [parallel]: 3.81999e-06 [flash_sp]: 3.30003e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.90998e-06 [matmul_add_comm_reduction]: 8.45999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.38998e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 4.88001e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.86998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.31002e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00998e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 8.92999e-06 [a_after_grad]: 7.97e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.48e-06 [cse]: 1.386e-05 [a_3]: 3.347e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 4.975e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.24e-06 [mutable_eliminate]: 0.00045269 [opt_b]: 0.00018069, [1] [Cycle 1]: 0.00017472, [7] 
[b_1]: 0.00010659 [b_2]: 7.05998e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 5.09986e-07 [cse]: 1.686e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.218e-05 [loop_unroll]: 0.00041277 [opt_after_cconv]: 9.491e-05, [1] [Cycle 1]: 8.936e-05, [7] [c_1]: 2.829e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.624e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.279e-05 [tuple_transform]: 6.971e-05, [1] [Cycle 1]: 6.535e-05, [4] [d_1]: 3.945e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.76998e-06 [add_recomputation]: 4.429e-05 [cse_after_recomputation]: 1.949e-05, [1] [Cycle 1]: 1.535e-05, [1] [cse]: 1.046e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.21003e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.96999e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.89001e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 1.19998e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.16003e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.55999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86003e-06 [control_data_broadcast_order]: 1.183e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.49998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.748e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.91998e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 6.753e-05, [1] [Cycle 1]: 6.341e-05, [6] [build]: 2.54001e-06 [elim_shapecalc]: 8.60001e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.75999e-06 [renormalize]: 2.60014e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.41002e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00044823 [validate]: 3.152e-05 [backend_pass]: 8.49977e-07 [task_emit]: 0.0587073 [execute]: 8.74e-06 Sums bootstrap : 0.000461s : 0.67% type_inference : 0.005570s : 8.12% event_method : 0.000014s : 0.02% auto_monad : 0.000056s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000567s : 0.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000424s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.07% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.66% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000413s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000448s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058707s : 85.62% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.76% : 0.000024s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 66.88% : 0.000110s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.35% : 0.000004s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.77% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005530 2 90.12% : 0.004984s : 1: type_inference.infer 9.88% : 0.000547s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.80% : 0.000027s : 3: replace.inline 29.20% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.44% : 0.000108s : 3: match.inline 8.56% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 1.04% : 0.000002s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.87% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 1.01% : 0.000002s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.65% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 46.32% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.68% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080964 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.69% : 0.002984s : 1: add_attr 3.68% : 0.002976s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000499s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.15% : 0.000932s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.61% : 0.002114s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000458s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.90% : 0.003967s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000209s : 1: renormalize.infer 0.26% : 0.000209s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000054s : 1: rewriter_after_opt_a 0.08% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 72.53% : 0.058723s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 6.90% : 0.005584s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.861285, [24] [bootstrap]: 0.00051643 [type_inference]: 0.0114239 [event_method]: 4.898e-05 [auto_monad]: 0.00012222 [graph_reusing]: 8.22e-06 [inline]: 2.03002e-06 [add_attr]: 0.00300271, [1] [add_attr_with_inline]: 0.0029948, [1] [Cycle 1]: 7.106e-05, [2] [tag_attr]: 3.475e-05 [meta_addattr_fg_expand]: 9.77999e-06 [parallel-infer-symbol]: 2.74001e-06 [pre_auto_parallel]: 5.011e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.0133616, [53] [py_interpret_to_execute]: 3.77e-05 [rewriter_before_opt_a]: 0.00014481 [opt_a]: 0.0110999, [3] [Cycle 1]: 0.00715794, [45] [expand_dump_flag]: 3.55e-06 [switch_simplify]: 7.325e-05 [loop_unroll]: 6.337e-05 [a_1]: 0.00149691 [with_stream_mark]: 2.354e-05 [recompute_prepare]: 2.198e-05 [updatestate_depend_eliminate]: 9.67001e-06 [updatestate_assign_eliminate]: 7.98999e-06 [updatestate_loads_eliminate]: 7.82e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.00024464 [accelerated_algorithm]: 3.137e-05 [shard]: 1.98002e-06 [meta_shard_fg_expand]: 3.41999e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.661e-05 [auto_parallel]: 1.063e-05 [parallel]: 1.986e-05 [flash_sp]: 1.142e-05 [merge_comm]: 9.77999e-06 [allreduce_fusion]: 9.17999e-06 [matmul_add_comm_reduction]: 2.642e-05 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 1.776e-05 [virtual_dataset]: 1.545e-05 [get_grad_eliminate_]: 1.507e-05 [virtual_output]: 1.524e-05 [merge_forward]: 9.72001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.736e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.86e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.726e-05 [set_forward_comm_id_for_comm_node_pass]: 9.75002e-06 [meta_fg_expand]: 0.00141535 [flash_sp_send_recv_attached]: 3.87998e-06 [receive_attached]: 2.52001e-06 [after_resolve]: 
5.981e-05 [a_after_grad]: 8.193e-05 [renormalize]: 0.00247049 [add_forward_monad_depend]: 9.02e-06 [auto_monad_grad]: 4.95999e-06 [auto_monad_eliminator]: 5.746e-05 [cse]: 0.0001691 [a_3]: 0.00033369 [Cycle 2]: 0.0030229, [45] [expand_dump_flag]: 1.44998e-06 [switch_simplify]: 4.686e-05 [loop_unroll]: 4.496e-05 [a_1]: 0.00152358 [with_stream_mark]: 1.184e-05 [recompute_prepare]: 1.071e-05 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 3.56999e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012662 [accelerated_algorithm]: 1.186e-05 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 2.03997e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 6.52001e-06 [auto_parallel]: 7.28e-06 [parallel]: 5.02e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 5.19e-06 [allreduce_fusion]: 4.54002e-06 [matmul_add_comm_reduction]: 7.61999e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 9.96e-06 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.97e-06 [virtual_output]: 8.56002e-06 [merge_forward]: 5.34998e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.611e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37001e-06 [meta_fg_expand]: 6.885e-05 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.657e-05 [a_after_grad]: 1.465e-05 [renormalize]: 0.00059424 [add_forward_monad_depend]: 4.07e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.434e-05 [cse]: 4.689e-05 [a_3]: 6.451e-05 [Cycle 3]: 0.00090498, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.056e-05 [loop_unroll]: 9.14e-06 [a_1]: 0.00025047 [with_stream_mark]: 1.047e-05 [recompute_prepare]: 9.42001e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.84002e-06 [parameter_eliminate]: 
9.09989e-07 [a_2]: 0.0001225 [accelerated_algorithm]: 1.168e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 7.06001e-06 [auto_parallel]: 7.05e-06 [parallel]: 4.65999e-06 [flash_sp]: 1.09003e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 5.02999e-06 [matmul_add_comm_reduction]: 7.55998e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 9.83998e-06 [virtual_dataset]: 8.54e-06 [get_grad_eliminate_]: 8.47998e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.28999e-06 [cell_reuse_recompute_pass]: 1.46002e-06 [offload_activation]: 8.50999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.691e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.485e-05 [set_forward_comm_id_for_comm_node_pass]: 5.98998e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 1.382e-05 [a_after_grad]: 1.4e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.054e-05 [cse]: 2.66e-05 [a_3]: 6.044e-05 [py_interpret_to_execute_after_opt_a]: 1.076e-05 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 4.675e-05 [convert_after_rewriter]: 8.84e-06 [order_py_execute_after_rewriter]: 7.3e-06 [mutable_eliminate]: 0.00045706 [opt_b]: 0.00028662, [1] [Cycle 1]: 0.00028063, [7] [b_1]: 0.00018868 [b_2]: 1.071e-05 [updatestate_depend_eliminate]: 7.17002e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.78999e-06 [renormalize]: 4.49974e-07 [cse]: 3.161e-05 [optimize_parallel_all_gather_comm]: 2.092e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 1.995e-05 [loop_unroll]: 0.00042036 [opt_after_cconv]: 0.00013677, [1] [Cycle 1]: 0.00013057, [7] [c_1]: 4.774e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 4.02002e-06 [cse]: 
3.07e-05 [renormalize]: 2.09984e-07 [remove_dup_value]: 2.993e-05 [tuple_transform]: 0.00010049, [1] [Cycle 1]: 9.58e-05, [4] [d_1]: 6.582e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 9.74e-06 [partial_unused_args_eliminate]: 2.14999e-06 [add_recomputation]: 5.907e-05 [cse_after_recomputation]: 3.322e-05, [1] [Cycle 1]: 2.843e-05, [1] [cse]: 2.261e-05 [environ_conv]: 9.29998e-06 [swap_dp_allreduce_reducescatter]: 8.05e-06 [bias_add_comm_swap]: 2.89999e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.48002e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.34e-06 [interleave_split_concat_branches]: 1.40999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.709e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.00001e-06 [overlap_recompute_and_grad_model_parallel]: 5.68002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.66e-06 [overlap_grad_ring_attention]: 5.19003e-06 [overlap_grad_flash_sp]: 2.428e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.43e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.859e-05, [1] [Cycle 1]: 9.451e-05, [6] [build]: 9.86e-06 [elim_shapecalc]: 1.329e-05 [elim_not_effective]: 1.844e-05 [opt_reshape]: 1.017e-05 [fold_const_symbol]: 1.459e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 
1.71998e-06 [auto_monad_reorder]: 2.583e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00050846 [validate]: 4.684e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.831919 [execute]: 9.09e-06 Sums bootstrap : 0.000516s : 0.06% type_inference : 0.011424s : 1.33% event_method : 0.000049s : 0.01% auto_monad : 0.000122s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000145s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.02% optimize.opt_a.loop_unroll : 0.000117s : 0.01% optimize.opt_a.a_1 : 0.003271s : 0.38% optimize.opt_a.with_stream_mark : 0.000046s : 0.01% optimize.opt_a.recompute_prepare : 0.000042s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000025s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% 
optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001487s : 0.17% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.01% optimize.opt_a.a_after_grad : 0.000111s : 0.01% optimize.opt_a.renormalize : 0.003065s : 0.36% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.00% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.01% optimize.opt_a.cse : 0.000243s : 0.03% optimize.opt_a.a_3 : 0.000459s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000457s : 0.05% optimize.opt_b.b_1 : 0.000189s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse 
: 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000420s : 0.05% optimize.opt_after_cconv.c_1 : 0.000048s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.00% optimize.tuple_transform.d_1 : 0.000066s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000508s : 0.06% validate : 0.000047s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.831919s : 97.07% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000767 222 5.84% : 0.000045s : 12: substitution.arithmetic_simplify 1.79% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 55.54% : 0.000426s : 17: substitution.inline 2.07% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000016s : 3: substitution.less_batch_normalization 1.78% : 0.000014s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.92% : 0.000015s : 20: substitution.remove_not_recompute_node 3.20% : 0.000025s : 10: substitution.replace_applicator 1.35% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011350 2 86.64% : 0.009834s : 1: type_inference.infer 13.36% : 0.001516s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.53% : 0.000125s : 17: replace.inline 42.47% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.39% : 0.000417s : 17: match.inline 7.61% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.32% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.33% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001589 34 56.52% : 0.000898s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.48% : 0.000691s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.885970 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.34% : 0.003007s : 1: add_attr 0.34% : 0.002999s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000553s : 1: bootstrap 0.00% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000056s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000466s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000018s : 1: opt.transform.mutable_eliminate 0.56% : 0.004936s : 117: opt.transform.opt_a 0.01% : 0.000046s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000174s : 28: opt.transform.opt_b 0.01% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.25% : 0.011103s : 1: opt_a 0.02% : 0.000140s : 1: opt_after_cconv 0.06% : 0.000518s : 1: opt_after_jit_grad 0.03% : 0.000290s : 1: opt_b 1.51% : 0.013365s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000055s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 0.19% : 0.001644s : 2: renormalize.infer 0.16% : 0.001407s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000051s : 1: rewriter_after_opt_a 0.02% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 93.90% : 0.831942s : 1: task_emit 0.01% : 0.000103s : 1: 
tuple_transform 1.29% : 0.011439s : 1: type_inference 0.01% : 0.000072s : 1: validate TotalTime = 0.0716552, [24] [bootstrap]: 0.00047692 [type_inference]: 0.00438755 [event_method]: 1.085e-05 [auto_monad]: 5.353e-05 [graph_reusing]: 5.77999e-06 [inline]: 1.64998e-06 [add_attr]: 0.00296159, [1] [add_attr_with_inline]: 0.00295259, [1] [Cycle 1]: 4.602e-05, [2] [tag_attr]: 1.192e-05 [meta_addattr_fg_expand]: 3.35e-06 [parallel-infer-symbol]: 2.70002e-06 [pre_auto_parallel]: 2.117e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 2.47001e-06 [optimize]: 0.00369469, [53] [py_interpret_to_execute]: 1.571e-05 [rewriter_before_opt_a]: 4.02e-05 [opt_a]: 0.00189291, [2] [Cycle 1]: 0.00129246, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 2.464e-05 [loop_unroll]: 1.382e-05 [a_1]: 0.00029481 [with_stream_mark]: 1.351e-05 [recompute_prepare]: 7.58999e-06 [updatestate_depend_eliminate]: 3.48999e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.81003e-06 [a_2]: 7.64e-05 [accelerated_algorithm]: 6.10002e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.50003e-06 [auto_parallel]: 5.57001e-06 [parallel]: 1.82e-05 [flash_sp]: 7.17002e-06 [merge_comm]: 3.44001e-06 [allreduce_fusion]: 3.28998e-06 [matmul_add_comm_reduction]: 3.774e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.75e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.39998e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.37999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.158e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.81999e-06 
[receive_attached]: 2.61e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.00034606 [add_forward_monad_depend]: 4.20999e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.387e-05 [cse]: 2.864e-05 [a_3]: 3.941e-05 [Cycle 2]: 0.00059138, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.73003e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012503 [with_stream_mark]: 9.54e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.767e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.32999e-06 [parallel]: 4.15e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.91999e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.31002e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 6.15002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.39e-06 [a_after_grad]: 8.33001e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 9.80013e-07 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 6.01e-06 [cse]: 1.209e-05 [a_3]: 3.206e-05 [py_interpret_to_execute_after_opt_a]: 7.26001e-06 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 3.144e-05 [convert_after_rewriter]: 6.87002e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00044769 [opt_b]: 0.00017986, [1] [Cycle 1]: 0.00017411, 
[7] [b_1]: 0.00010649 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.4002e-07 [cse]: 1.623e-05 [optimize_parallel_all_gather_comm]: 1.498e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00041248 [opt_after_cconv]: 9.386e-05, [1] [Cycle 1]: 8.818e-05, [7] [c_1]: 2.712e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.635e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.289e-05 [tuple_transform]: 6.816e-05, [1] [Cycle 1]: 6.385e-05, [4] [d_1]: 3.892e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 2.25002e-06 [add_recomputation]: 4.591e-05 [cse_after_recomputation]: 1.984e-05, [1] [Cycle 1]: 1.546e-05, [1] [cse]: 1.059e-05 [environ_conv]: 4.95999e-06 [swap_dp_allreduce_reducescatter]: 5.69e-06 [bias_add_comm_swap]: 2.92002e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.136e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.32998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.55002e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.75e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.845e-05, [1] [Cycle 1]: 6.441e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.87e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.596e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.00044711 [validate]: 3.23e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0593196 [execute]: 8.94e-06 Sums bootstrap : 0.000477s : 0.70% type_inference : 0.004388s : 6.48% event_method : 0.000011s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000346s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.66% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000412s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000447s : 0.66% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059320s : 87.57% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000123 26 18.27% : 0.000022s : 4: substitution.arithmetic_simplify 1.67% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.41% : 0.000005s : 4: substitution.graph_param_transform 65.51% : 0.000080s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004347 2 90.97% : 0.003954s : 1: type_inference.infer 9.03% : 0.000393s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.88% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.88% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.07% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.53% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.90% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 43.40% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.60% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079587 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.73% : 0.002966s : 1: add_attr 3.71% : 0.002956s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000515s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000772s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.38% : 0.001896s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.57% : 0.000457s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.65% : 0.003698s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.55% : 0.059335s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.53% : 0.004402s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.106487, [24] [bootstrap]: 0.00055618 [type_inference]: 0.0110718 [event_method]: 4.491e-05 [auto_monad]: 0.00012189 [graph_reusing]: 8.07998e-06 [inline]: 1.81e-06 [add_attr]: 0.00304291, [1] [add_attr_with_inline]: 0.00303496, [1] [Cycle 1]: 6.952e-05, [2] [tag_attr]: 3.231e-05 [meta_addattr_fg_expand]: 9.08002e-06 [parallel-infer-symbol]: 3.10002e-06 [pre_auto_parallel]: 4.769e-05 [insert-virtual-dataset]: 2.50002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0133146, [53] [py_interpret_to_execute]: 3.741e-05 [rewriter_before_opt_a]: 0.00013165 [opt_a]: 0.0110387, [3] [Cycle 1]: 0.00710339, [45] [expand_dump_flag]: 4.08001e-06 [switch_simplify]: 6.791e-05 [loop_unroll]: 5.612e-05 [a_1]: 0.00139636 [with_stream_mark]: 2.446e-05 [recompute_prepare]: 2.197e-05 [updatestate_depend_eliminate]: 8.99003e-06 [updatestate_assign_eliminate]: 8.18001e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 2.58003e-06 [a_2]: 0.00024768 [accelerated_algorithm]: 3.09e-05 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 3.53999e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.553e-05 [auto_parallel]: 1.085e-05 [parallel]: 1.956e-05 [flash_sp]: 1.12e-05 [merge_comm]: 9.87001e-06 [allreduce_fusion]: 9.01002e-06 [matmul_add_comm_reduction]: 2.815e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.833e-05 [virtual_dataset]: 1.619e-05 [get_grad_eliminate_]: 1.57e-05 [virtual_output]: 1.536e-05 [merge_forward]: 9.76998e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.824e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.9e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 2.782e-05 [set_forward_comm_id_for_comm_node_pass]: 9.39e-06 [meta_fg_expand]: 0.00139971 [flash_sp_send_recv_attached]: 3.78001e-06 [receive_attached]: 2.41e-06 
[after_resolve]: 6.008e-05 [a_after_grad]: 8.122e-05 [renormalize]: 0.00252728 [add_forward_monad_depend]: 9.28002e-06 [auto_monad_grad]: 5.02e-06 [auto_monad_eliminator]: 5.737e-05 [cse]: 0.0001761 [a_3]: 0.00033752 [Cycle 2]: 0.00301821, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.704e-05 [loop_unroll]: 4.416e-05 [a_1]: 0.00154212 [with_stream_mark]: 1.164e-05 [recompute_prepare]: 1.107e-05 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 3.72998e-06 [parameter_eliminate]: 1.13001e-06 [a_2]: 0.00012628 [accelerated_algorithm]: 1.184e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 9.12001e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 7.48999e-06 [parallel]: 4.47998e-06 [flash_sp]: 4.50999e-06 [merge_comm]: 5.99e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 7.97e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 9.15001e-06 [get_grad_eliminate_]: 8.98002e-06 [virtual_output]: 8.64003e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.666e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 2.377e-05 [meta_fg_expand]: 3.489e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 1.604e-05 [a_after_grad]: 1.412e-05 [renormalize]: 0.00060525 [add_forward_monad_depend]: 3.94002e-06 [auto_monad_grad]: 1.35001e-06 [auto_monad_eliminator]: 1.458e-05 [cse]: 4.815e-05 [a_3]: 6.551e-05 [Cycle 3]: 0.00090307, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.05e-05 [loop_unroll]: 9.09e-06 [a_1]: 0.0002499 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 9.09e-06 [updatestate_depend_eliminate]: 4.75999e-06 [updatestate_assign_eliminate]: 3.95998e-06 [updatestate_loads_eliminate]: 
4.42998e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 0.00012357 [accelerated_algorithm]: 1.161e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.92999e-06 [shard_inline]: 9.16998e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.18e-06 [parallel]: 4.56002e-06 [flash_sp]: 9.99979e-07 [merge_comm]: 5.06002e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 8.08001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.59e-06 [get_grad_eliminate_]: 8.51002e-06 [virtual_output]: 8.52998e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.556e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 1.417e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42001e-06 [meta_fg_expand]: 3.2e-06 [flash_sp_send_recv_attached]: 7.10017e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.318e-05 [a_after_grad]: 1.398e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 1.125e-05 [cse]: 2.74e-05 [a_3]: 5.914e-05 [py_interpret_to_execute_after_opt_a]: 1.063e-05 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 4.83e-05 [convert_after_rewriter]: 9.24e-06 [order_py_execute_after_rewriter]: 6.83e-06 [mutable_eliminate]: 0.0004625 [opt_b]: 0.00028979, [1] [Cycle 1]: 0.00028356, [7] [b_1]: 0.00019039 [b_2]: 1.119e-05 [updatestate_depend_eliminate]: 7.09001e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 4.07998e-06 [renormalize]: 5.69999e-07 [cse]: 3.12e-05 [optimize_parallel_all_gather_comm]: 2.109e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.039e-05 [loop_unroll]: 0.00042817 [opt_after_cconv]: 0.00013863, [1] [Cycle 1]: 0.00013291, [7] [c_1]: 4.941e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 7.58001e-06 [updatestate_assign_eliminate]: 4.25e-06 
[updatestate_loads_eliminate]: 4.01001e-06 [cse]: 3.068e-05 [renormalize]: 4.70027e-07 [remove_dup_value]: 3.132e-05 [tuple_transform]: 0.00010265, [1] [Cycle 1]: 9.773e-05, [4] [d_1]: 6.702e-05 [none_parameter_eliminate]: 1.97999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.018e-05 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 6.121e-05 [cse_after_recomputation]: 3.294e-05, [1] [Cycle 1]: 2.839e-05, [1] [cse]: 2.272e-05 [environ_conv]: 9.61998e-06 [swap_dp_allreduce_reducescatter]: 7.98999e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.45999e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.31998e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.36998e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.752e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 5.51e-06 [overlap_recompute_and_grad_model_parallel]: 5.97999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 5.46998e-06 [overlap_grad_flash_sp]: 2.549e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 0.00010052, [1] [Cycle 1]: 9.592e-05, [6] [build]: 1.068e-05 [elim_shapecalc]: 1.338e-05 [elim_not_effective]: 1.861e-05 [opt_reshape]: 1.031e-05 [fold_const_symbol]: 1.498e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.598e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00051625 [validate]: 4.636e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0774476 [execute]: 8.55001e-06 Sums bootstrap : 0.000556s : 0.54% type_inference : 0.011072s : 10.84% event_method : 0.000045s : 0.04% auto_monad : 0.000122s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000132s : 0.13% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000109s : 0.11% optimize.opt_a.a_1 : 0.003188s : 3.12% optimize.opt_a.with_stream_mark : 0.000046s : 0.05% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000498s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 
0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000039s : 0.04% optimize.opt_a.meta_fg_expand : 0.001438s : 1.41% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.11% optimize.opt_a.renormalize : 0.003133s : 3.07% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000252s : 0.25% optimize.opt_a.a_3 : 0.000462s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.45% optimize.opt_b.b_1 : 0.000190s : 0.19% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000428s : 0.42% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000516s : 0.51% validate : 0.000046s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.077448s : 75.79% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000750 218 5.84% : 0.000044s : 11: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.62% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 55.33% : 0.000415s : 16: substitution.inline 2.06% : 0.000015s : 2: substitution.inline_without_move 1.43% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.80% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.83% : 0.000014s : 20: substitution.remove_not_recompute_node 3.15% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.23% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011001 2 85.49% : 0.009405s : 1: type_inference.infer 14.51% : 0.001596s : 1: type_inference.specialize ------[replace.] 0.000210 30 58.39% : 0.000123s : 16: replace.inline 41.61% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000438 30 93.00% : 0.000407s : 16: match.inline 7.00% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.19% : 0.000016s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000019s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.19% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.91% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.94% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.28% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001847 32 55.99% : 0.001034s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.01% : 0.000813s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131156 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.32% : 0.003047s : 1: add_attr 2.32% : 0.003039s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000129s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.45% : 0.000593s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000052s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.70% : 0.004850s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000177s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.42% : 0.011042s : 1: opt_a 0.11% : 0.000142s : 1: opt_after_cconv 0.40% : 0.000526s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 10.15% : 0.013318s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000036s : 1: remove_dup_value 1.28% : 0.001685s : 2: renormalize.infer 1.09% : 0.001434s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000136s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000103s : 1: symbol_engine_optimizer 59.06% : 0.077464s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 8.45% : 0.011088s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x7-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x8-pynative],max_mem:10.0M TotalTime = 0.0219049, [24] [bootstrap]: 0.00053859 [type_inference]: 0.00631629 [event_method]: 1.485e-05 [auto_monad]: 5.69e-05 [graph_reusing]: 5.59998e-06 [inline]: 1.75001e-06 [add_attr]: 0.00341415, [1] [add_attr_with_inline]: 0.00340312, [1] [Cycle 1]: 4.518e-05, [2] [tag_attr]: 1.568e-05 [meta_addattr_fg_expand]: 4.22e-06 [parallel-infer-symbol]: 2.98003e-06 [pre_auto_parallel]: 3.05e-05 [insert-virtual-dataset]: 2.63998e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00405571, [53] [py_interpret_to_execute]: 2.121e-05 [rewriter_before_opt_a]: 5.9e-05 [opt_a]: 0.00215129, [2] [Cycle 1]: 0.00154801, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 3.403e-05 [loop_unroll]: 2.124e-05 [a_1]: 0.00046604 [with_stream_mark]: 1.309e-05 [recompute_prepare]: 7.87e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 3.41001e-06 [parameter_eliminate]: 1.86998e-06 [a_2]: 7.664e-05 [accelerated_algorithm]: 6.58998e-06 [shard]: 2.43002e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.30999e-06 [auto_parallel]: 6.11e-06 [parallel]: 2.47e-05 [flash_sp]: 7.46001e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.47002e-06 [virtual_dataset]: 6.40002e-06 
[get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 4.66002e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.052e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 9.16002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 9.99001e-06 [a_after_grad]: 8.43001e-06 [renormalize]: 0.00042487 [add_forward_monad_depend]: 5.22e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.467e-05 [cse]: 2.721e-05 [a_3]: 4.088e-05 [Cycle 2]: 0.00059368, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 7.02002e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012749 [with_stream_mark]: 9.69999e-06 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.869e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.31002e-06 [flash_sp]: 3.25002e-06 [merge_comm]: 2.84001e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 4.76002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.90002e-06 [virtual_dataset]: 5.24998e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 5.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.34e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.03001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 9.21998e-06 [a_after_grad]: 7.87e-06 [renormalize]: 
8.9989e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.653e-05 [a_3]: 3.268e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 2.49001e-06 [rewriter_after_opt_a]: 2.92e-05 [convert_after_rewriter]: 6.61e-06 [order_py_execute_after_rewriter]: 4.92e-06 [mutable_eliminate]: 0.00045252 [opt_b]: 0.0001844, [1] [Cycle 1]: 0.00017827, [7] [b_1]: 0.00011104 [b_2]: 7.17002e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.59985e-07 [cse]: 1.661e-05 [optimize_parallel_all_gather_comm]: 1.684e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.389e-05 [loop_unroll]: 0.00041827 [opt_after_cconv]: 9.571e-05, [1] [Cycle 1]: 9.01e-05, [7] [c_1]: 2.841e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.634e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 1.263e-05 [tuple_transform]: 7.023e-05, [1] [Cycle 1]: 6.555e-05, [4] [d_1]: 3.98e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 2.21e-06 [add_recomputation]: 9.245e-05 [cse_after_recomputation]: 2.234e-05, [1] [Cycle 1]: 1.772e-05, [1] [cse]: 1.244e-05 [environ_conv]: 5.12e-06 [swap_dp_allreduce_reducescatter]: 5.43002e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.99001e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.90025e-07 
[interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.26997e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.189e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.747e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 7.076e-05, [1] [Cycle 1]: 6.653e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 8.53001e-06 [elim_not_effective]: 1.169e-05 [opt_reshape]: 6.46e-06 [fold_const_symbol]: 9.41e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.51002e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 0.0001317 [opt_after_jit_grad]: 0.00045575 [validate]: 3.137e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.00660576 [execute]: 7.56999e-06 Sums bootstrap : 0.000539s : 3.07% type_inference : 0.006316s : 36.05% event_method : 0.000015s : 0.08% auto_monad : 0.000057s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.34% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000594s : 3.39% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000425s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.58% optimize.opt_b.b_1 : 0.000111s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000418s : 2.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000092s : 0.53% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000132s : 0.75% opt_after_jit_grad : 0.000456s : 2.60% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006606s : 37.70% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000169 30 14.58% : 0.000025s : 5: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.55% : 0.000006s : 4: substitution.graph_param_transform 66.79% : 0.000113s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.21% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006270 2 90.31% : 0.005663s : 1: type_inference.infer 9.69% : 0.000607s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.01% : 0.000028s : 3: replace.inline 29.99% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.51% : 0.000111s : 3: match.inline 8.49% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000001s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.79% : 0.000009s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.51% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.72% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000379 8 48.24% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.76% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030922 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.05% : 0.003418s : 1: add_attr 11.02% : 0.003407s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.31% : 0.000097s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000577s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.12% : 0.000964s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.97% : 0.002154s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.51% : 0.000466s : 1: opt_after_jit_grad 0.61% : 0.000188s : 1: opt_b 13.13% : 0.004060s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000219s : 1: renormalize.infer 0.64% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000137s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000074s : 1: symbol_engine_optimizer 21.40% : 0.006616s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.47% : 0.006330s : 1: type_inference 0.20% : 0.000063s : 1: validate TotalTime = 0.0184555, [24] [bootstrap]: 0.00045828 [type_inference]: 0.00440443 [event_method]: 1.132e-05 [auto_monad]: 5.394e-05 [graph_reusing]: 5.14e-06 [inline]: 1.94e-06 [add_attr]: 0.00299594, [1] [add_attr_with_inline]: 0.00298817, [1] [Cycle 1]: 4.513e-05, [2] [tag_attr]: 1.323e-05 [meta_addattr_fg_expand]: 3.46001e-06 [parallel-infer-symbol]: 3.41999e-06 [pre_auto_parallel]: 2.256e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.00372057, [53] [py_interpret_to_execute]: 1.549e-05 [rewriter_before_opt_a]: 3.91e-05 [opt_a]: 0.00187419, [2] [Cycle 1]: 0.0012671, [45] [expand_dump_flag]: 2.45002e-06 [switch_simplify]: 2.441e-05 [loop_unroll]: 1.393e-05 [a_1]: 0.00029676 [with_stream_mark]: 1.292e-05 [recompute_prepare]: 7.7e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.80998e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.789e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.23e-06 [merge_send_recv]: 8.34002e-06 [auto_parallel]: 5.51998e-06 [parallel]: 1.851e-05 [flash_sp]: 7.71999e-06 [merge_comm]: 3.91999e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.16999e-06 [virtual_dataset]: 6.00002e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.43002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.067e-05 [merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 9.29e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 2.49001e-06 [after_resolve]: 
1.064e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00034435 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.64998e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.718e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00059786, [45] [expand_dump_flag]: 7.60017e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.82999e-06 [a_1]: 0.00012541 [with_stream_mark]: 1.005e-05 [recompute_prepare]: 6.00002e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.40997e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 7.06e-05 [accelerated_algorithm]: 5.76998e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.63997e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.22999e-06 [parallel]: 4.22e-06 [flash_sp]: 3.65998e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 4.97999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.92999e-06 [virtual_dataset]: 5.10001e-06 [get_grad_eliminate_]: 5.04003e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 6.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 9.17001e-06 [a_after_grad]: 8.18999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.44999e-06 [cse]: 1.323e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 3.068e-05 [convert_after_rewriter]: 6.91999e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00044898 [opt_b]: 0.00017988, [1] [Cycle 1]: 0.000174, [7] [b_1]: 
0.00010779 [b_2]: 7.18e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.69997e-07 [cse]: 1.58e-05 [optimize_parallel_all_gather_comm]: 1.556e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.335e-05 [loop_unroll]: 0.00040879 [opt_after_cconv]: 9.448e-05, [1] [Cycle 1]: 8.882e-05, [7] [c_1]: 2.829e-05 [parameter_eliminate]: 2.06003e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.576e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.445e-05 [tuple_transform]: 6.897e-05, [1] [Cycle 1]: 6.446e-05, [4] [d_1]: 3.893e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 8.339e-05 [cse_after_recomputation]: 2.08e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.11e-05 [environ_conv]: 4.48999e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.42998e-06 [label_fine_grained_interleaved_index]: 2.65997e-06 [merge_cast_opt]: 1.29003e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.62001e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.33002e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.183e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.65999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.689e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.38002e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.786e-05, [1] [Cycle 1]: 6.375e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.10999e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.613e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044894 [validate]: 2.979e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.0060632 [execute]: 7.23e-06 Sums bootstrap : 0.000458s : 3.16% type_inference : 0.004404s : 30.37% event_method : 0.000011s : 0.08% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000422s : 2.91% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000148s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000344s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000449s : 3.10% optimize.opt_b.b_1 : 0.000108s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000409s : 2.82% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.10% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000083s : 0.58% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.10% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006063s : 41.81% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.65% : 0.000023s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.59% : 0.000006s : 4: substitution.graph_param_transform 65.36% : 0.000080s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 3.06% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004364 2 91.99% : 0.004014s : 1: type_inference.infer 8.01% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.91% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.03% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.31% : 0.000009s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.21% : 0.000002s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 43.23% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.77% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026452 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.34% : 0.003000s : 1: add_attr 11.31% : 0.002992s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.33% : 0.000088s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.88% : 0.000496s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000458s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000778s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001877s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000458s : 1: opt_after_jit_grad 0.69% : 0.000183s : 1: opt_b 14.08% : 0.003724s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000018s : 1: remove_dup_value 0.71% : 0.000189s : 1: renormalize.infer 0.56% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.96% : 0.006073s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.70% : 0.004419s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0198633, [24] [bootstrap]: 0.00047151 [type_inference]: 0.00560863 [event_method]: 1.39e-05 [auto_monad]: 5.677e-05 [graph_reusing]: 5.78002e-06 [inline]: 2.04999e-06 [add_attr]: 0.00301033, [1] [add_attr_with_inline]: 0.00294697, [1] [Cycle 1]: 4.804e-05, [2] [tag_attr]: 1.621e-05 [meta_addattr_fg_expand]: 4.77998e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.634e-05 [insert-virtual-dataset]: 2.62001e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 2.45002e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00398502, [53] [py_interpret_to_execute]: 2.14e-05 [rewriter_before_opt_a]: 5.93e-05 [opt_a]: 0.00211104, [2] [Cycle 1]: 0.00150756, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 3.262e-05 [loop_unroll]: 2.086e-05 [a_1]: 0.00045342 [with_stream_mark]: 1.37e-05 [recompute_prepare]: 7.52002e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 2.22001e-06 [a_2]: 7.798e-05 [accelerated_algorithm]: 6.06e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 5.88998e-06 [merge_send_recv]: 8.23999e-06 [auto_parallel]: 6.04999e-06 [parallel]: 1.812e-05 [flash_sp]: 7.01999e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 7.65998e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.91003e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 4.23999e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.59999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.42001e-06 [flash_sp_send_recv_attached]: 2.64001e-06 
[receive_attached]: 2.50002e-06 [after_resolve]: 1.013e-05 [a_after_grad]: 8.77e-06 [renormalize]: 0.00040512 [add_forward_monad_depend]: 4.63001e-06 [auto_monad_grad]: 2.03002e-06 [auto_monad_eliminator]: 1.345e-05 [cse]: 2.786e-05 [a_3]: 4.078e-05 [Cycle 2]: 0.00059413, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012575 [with_stream_mark]: 9.43002e-06 [recompute_prepare]: 5.49998e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.824e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.06002e-06 [parallel]: 4.32e-06 [flash_sp]: 3.43999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.05999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.10002e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.97999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.45001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.04002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.12999e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.69001e-06 [cse]: 1.346e-05 [a_3]: 3.267e-05 [py_interpret_to_execute_after_opt_a]: 7.61001e-06 [slice_cell_reuse_recomputed_activation]: 1.78002e-06 [rewriter_after_opt_a]: 3.115e-05 [convert_after_rewriter]: 7.30998e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00044518 [opt_b]: 
0.00018438, [1] [Cycle 1]: 0.00017845, [7] [b_1]: 0.00011048 [b_2]: 7.18998e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 3.59985e-07 [cse]: 1.617e-05 [optimize_parallel_all_gather_comm]: 1.6e-05 [overlap_param_gather]: 2.34001e-06 [cconv]: 2.319e-05 [loop_unroll]: 0.0004106 [opt_after_cconv]: 9.507e-05, [1] [Cycle 1]: 8.945e-05, [7] [c_1]: 2.823e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.611e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.401e-05 [tuple_transform]: 6.922e-05, [1] [Cycle 1]: 6.506e-05, [4] [d_1]: 3.946e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.318e-05 [cse_after_recomputation]: 2.087e-05, [1] [Cycle 1]: 1.635e-05, [1] [cse]: 1.094e-05 [environ_conv]: 4.42998e-06 [swap_dp_allreduce_reducescatter]: 5.19998e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.99999e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.30001e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.74e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.73e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.21003e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 0.0001092, [1] [Cycle 1]: 0.00010491, [6] [build]: 3.983e-05 [elim_shapecalc]: 9.17001e-06 [elim_not_effective]: 1.208e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.57002e-06 [opt_after_jit_grad]: 0.00045063 [validate]: 3.097e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.00595955 [execute]: 7.02002e-06 Sums bootstrap : 0.000472s : 2.97% type_inference : 0.005609s : 35.30% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000579s : 3.64% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000405s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.26% 
optimize.opt_a.a_3 : 0.000073s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.80% optimize.opt_b.b_1 : 0.000110s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000411s : 2.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% 
optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000040s : 0.25% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000451s : 2.84% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005960s : 37.50% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.81% : 0.000025s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 66.78% : 0.000111s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.97% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005568 2 90.00% : 0.005011s : 1: type_inference.infer 10.00% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.90% : 0.000028s : 3: replace.inline 30.10% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.21% : 0.000109s : 3: match.inline 8.79% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.70% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.54% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.75% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 47.26% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.74% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028320 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.65% : 0.003016s : 1: add_attr 10.42% : 0.002951s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.80% : 0.000510s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.60% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.35% : 0.000948s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.46% : 0.002114s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.63% : 0.000460s : 1: opt_after_jit_grad 0.66% : 0.000188s : 1: opt_b 14.08% : 0.003989s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.73% : 0.000207s : 1: renormalize.infer 0.68% : 0.000191s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.39% : 0.000112s : 1: symbol_engine_optimizer 21.08% : 0.005969s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.85% : 0.005623s : 1: type_inference 0.20% : 0.000058s : 1: validate TotalTime = 0.0376298, [24] [bootstrap]: 0.00050715 [type_inference]: 0.0114642 [event_method]: 4.686e-05 [auto_monad]: 0.00012341 [graph_reusing]: 8.74e-06 [inline]: 2.13998e-06 [add_attr]: 0.00303699, [1] [add_attr_with_inline]: 0.00302889, [1] [Cycle 1]: 7.107e-05, [2] [tag_attr]: 3.432e-05 [meta_addattr_fg_expand]: 9.54e-06 [parallel-infer-symbol]: 2.76999e-06 [pre_auto_parallel]: 5.007e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0133125, [53] [py_interpret_to_execute]: 3.697e-05 [rewriter_before_opt_a]: 0.00014492 [opt_a]: 0.0109974, [3] [Cycle 1]: 0.00703897, [45] [expand_dump_flag]: 3.80998e-06 [switch_simplify]: 7.417e-05 [loop_unroll]: 6.197e-05 [a_1]: 0.00145472 [with_stream_mark]: 2.358e-05 [recompute_prepare]: 2.162e-05 [updatestate_depend_eliminate]: 8.92999e-06 [updatestate_assign_eliminate]: 7.65998e-06 [updatestate_loads_eliminate]: 7.91001e-06 [parameter_eliminate]: 2.75002e-06 [a_2]: 0.00024293 [accelerated_algorithm]: 3.144e-05 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 3.33e-06 [shard_inline]: 1.598e-05 [merge_send_recv]: 1.699e-05 [auto_parallel]: 1.097e-05 [parallel]: 1.956e-05 [flash_sp]: 1.091e-05 [merge_comm]: 1.022e-05 [allreduce_fusion]: 8.95001e-06 [matmul_add_comm_reduction]: 2.678e-05 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 1.775e-05 [virtual_dataset]: 1.562e-05 [get_grad_eliminate_]: 1.538e-05 [virtual_output]: 1.504e-05 [merge_forward]: 9.71e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.851e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.809e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.691e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.0014316 [flash_sp_send_recv_attached]: 3.73999e-06 [receive_attached]: 2.73e-06 
[after_resolve]: 5.898e-05 [a_after_grad]: 8.188e-05 [renormalize]: 0.00238374 [add_forward_monad_depend]: 8.94e-06 [auto_monad_grad]: 5.04e-06 [auto_monad_eliminator]: 5.553e-05 [cse]: 0.00016517 [a_3]: 0.00033659 [Cycle 2]: 0.00303702, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.703e-05 [loop_unroll]: 4.414e-05 [a_1]: 0.00156164 [with_stream_mark]: 1.151e-05 [recompute_prepare]: 1.138e-05 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 4.50001e-06 [updatestate_loads_eliminate]: 3.76001e-06 [parameter_eliminate]: 1.24e-06 [a_2]: 0.00012694 [accelerated_algorithm]: 1.185e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.04001e-06 [parallel]: 5.09e-06 [flash_sp]: 3.59002e-06 [merge_comm]: 6.28e-06 [allreduce_fusion]: 5.22999e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 9.20001e-06 [get_grad_eliminate_]: 8.61002e-06 [virtual_output]: 8.65999e-06 [merge_forward]: 4.61002e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 8.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.636e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.377e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30001e-06 [meta_fg_expand]: 6.828e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.627e-05 [a_after_grad]: 1.48e-05 [renormalize]: 0.00058792 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.38002e-06 [auto_monad_eliminator]: 1.46e-05 [cse]: 4.572e-05 [a_3]: 6.569e-05 [Cycle 3]: 0.00090791, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 1.043e-05 [loop_unroll]: 9.04998e-06 [a_1]: 0.00024919 [with_stream_mark]: 9.91998e-06 [recompute_prepare]: 9.72999e-06 [updatestate_depend_eliminate]: 4.62e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 
4.07003e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 0.00012764 [accelerated_algorithm]: 1.181e-05 [shard]: 9.09989e-07 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.26002e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 7e-06 [parallel]: 4.63999e-06 [flash_sp]: 1.09e-06 [merge_comm]: 5.04998e-06 [allreduce_fusion]: 4.82998e-06 [matmul_add_comm_reduction]: 7.78001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.97001e-06 [virtual_dataset]: 8.68001e-06 [get_grad_eliminate_]: 8.50999e-06 [virtual_output]: 8.20999e-06 [merge_forward]: 4.47998e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 8.80001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.618e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.404e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.451e-05 [a_after_grad]: 1.499e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.27999e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.091e-05 [cse]: 2.52e-05 [a_3]: 5.943e-05 [py_interpret_to_execute_after_opt_a]: 1.055e-05 [slice_cell_reuse_recomputed_activation]: 2.25002e-06 [rewriter_after_opt_a]: 4.922e-05 [convert_after_rewriter]: 8.95999e-06 [order_py_execute_after_rewriter]: 6.71999e-06 [mutable_eliminate]: 0.00051177 [opt_b]: 0.00028658, [1] [Cycle 1]: 0.00028059, [7] [b_1]: 0.00018939 [b_2]: 1.055e-05 [updatestate_depend_eliminate]: 6.91001e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 3.88001e-06 [renormalize]: 4.7998e-07 [cse]: 3.08e-05 [optimize_parallel_all_gather_comm]: 2.057e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.06e-05 [loop_unroll]: 0.00042336 [opt_after_cconv]: 0.0001353, [1] [Cycle 1]: 0.00012959, [7] [c_1]: 4.848e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 7.23e-06 [updatestate_assign_eliminate]: 4.17e-06 
[updatestate_loads_eliminate]: 3.76999e-06 [cse]: 2.888e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.892e-05 [tuple_transform]: 0.00010167, [1] [Cycle 1]: 9.709e-05, [4] [d_1]: 6.662e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.76e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.837e-05 [cse_after_recomputation]: 3.128e-05, [1] [Cycle 1]: 2.656e-05, [1] [cse]: 2.093e-05 [environ_conv]: 8.48999e-06 [swap_dp_allreduce_reducescatter]: 8.13001e-06 [bias_add_comm_swap]: 2.22001e-06 [label_micro_interleaved_index]: 4.85001e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.66e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.21998e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.16997e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.782e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 4.89e-06 [overlap_recompute_and_grad_model_parallel]: 5.67999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.46998e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 5.01002e-06 [overlap_grad_flash_sp]: 2.433e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.50002e-06 [split_layernorm_comm]: 1.73997e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 9.838e-05, [1] [Cycle 1]: 9.419e-05, [6] [build]: 1e-05 [elim_shapecalc]: 1.339e-05 [elim_not_effective]: 1.877e-05 [opt_reshape]: 9.81e-06 [fold_const_symbol]: 1.483e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.87001e-06 [auto_monad_reorder]: 2.655e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00046833 [validate]: 4.36e-05 [backend_pass]: 1.04998e-06 [task_emit]: 0.00830405 [execute]: 6.66e-06 Sums bootstrap : 0.000507s : 1.52% type_inference : 0.011464s : 34.39% event_method : 0.000047s : 0.14% auto_monad : 0.000123s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000145s : 0.43% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003266s : 9.80% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000498s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000022s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001503s : 4.51% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.34% optimize.opt_a.renormalize : 0.002972s : 8.92% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000236s : 0.71% optimize.opt_a.a_3 : 0.000462s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000512s : 1.54% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000423s : 1.27% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000027s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000468s : 1.41% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008304s : 24.91% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000806 222 5.59% : 0.000045s : 12: substitution.arithmetic_simplify 1.71% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.84% : 0.000466s : 17: substitution.inline 1.96% : 0.000016s : 2: substitution.inline_without_move 1.21% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000015s : 3: substitution.less_batch_normalization 1.67% : 0.000013s : 11: substitution.minmaximum_grad 0.66% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000014s : 20: substitution.remove_not_recompute_node 3.03% : 0.000024s : 10: substitution.replace_applicator 1.32% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.38% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.68% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.34% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.22% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011391 2 86.83% : 0.009890s : 1: type_inference.infer 13.17% : 0.001501s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.44% : 0.000125s : 17: replace.inline 42.56% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000493 33 92.79% : 0.000457s : 17: match.inline 7.21% : 0.000036s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.72% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000043s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.96% : 
0.000037s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.11% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.70% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001612 34 57.49% : 0.000927s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.51% : 0.000685s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062210 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.89% : 0.003041s : 1: add_attr 4.88% : 0.003033s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000130s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.87% : 0.000544s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.69% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.84% : 0.000521s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.93% : 0.004933s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.68% : 0.011000s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000478s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.41% : 0.013316s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.59% : 0.001610s : 2: renormalize.infer 2.17% : 0.001348s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000053s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.36% : 0.008314s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.45% : 0.011479s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0186014, [24] [bootstrap]: 0.00047946 [type_inference]: 0.00434084 [event_method]: 1.01e-05 [auto_monad]: 5.293e-05 [graph_reusing]: 5.56e-06 [inline]: 1.87999e-06 [add_attr]: 0.00298827, [1] [add_attr_with_inline]: 0.00298036, [1] [Cycle 1]: 4.528e-05, [2] [tag_attr]: 1.185e-05 [meta_addattr_fg_expand]: 3.28e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.245e-05 [insert-virtual-dataset]: 2.50002e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00371854, [53] [py_interpret_to_execute]: 1.628e-05 [rewriter_before_opt_a]: 3.904e-05 [opt_a]: 0.00189945, [2] [Cycle 1]: 0.00127155, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 2.499e-05 [loop_unroll]: 1.417e-05 [a_1]: 0.00029463 [with_stream_mark]: 1.404e-05 [recompute_prepare]: 7.97e-06 [updatestate_depend_eliminate]: 3.72998e-06 [updatestate_assign_eliminate]: 3.37002e-06 [updatestate_loads_eliminate]: 3.55e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.763e-05 [accelerated_algorithm]: 6.61e-06 [shard]: 2.28002e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.26998e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 6.17999e-06 [parallel]: 1.842e-05 [flash_sp]: 7.38e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.9e-06 [matmul_add_comm_reduction]: 9.51e-06 [allreduce_slice_to_reducescatter]: 8.10018e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.84999e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.79999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.51e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.70997e-06 [after_resolve]: 
1.075e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 0.0003429 [add_forward_monad_depend]: 4.32998e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.433e-05 [cse]: 2.947e-05 [a_3]: 4.033e-05 [Cycle 2]: 0.00061819, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.76998e-06 [a_1]: 0.0001246 [with_stream_mark]: 1.067e-05 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.12001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.882e-05 [accelerated_algorithm]: 5.31002e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.34998e-06 [shard_inline]: 5.61998e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.68002e-06 [parallel]: 4.05e-06 [flash_sp]: 4.02e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.37999e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 6.48998e-06 [virtual_dataset]: 5.30999e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 5.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89999e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.1e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.49999e-06 [cse]: 1.291e-05 [a_3]: 3.177e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 2.08002e-06 [rewriter_after_opt_a]: 3.255e-05 [convert_after_rewriter]: 7.36999e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00045101 [opt_b]: 0.00018205, [1] [Cycle 1]: 0.00017616, [7] [b_1]: 0.00010841 
[b_2]: 7.24001e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.29979e-07 [cse]: 1.656e-05 [optimize_parallel_all_gather_comm]: 1.598e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.358e-05 [loop_unroll]: 0.00041323 [opt_after_cconv]: 9.359e-05, [1] [Cycle 1]: 8.833e-05, [7] [c_1]: 2.749e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 4.65001e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.625e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.322e-05 [tuple_transform]: 6.998e-05, [1] [Cycle 1]: 6.556e-05, [4] [d_1]: 3.985e-05 [none_parameter_eliminate]: 1.79998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.492e-05 [cse_after_recomputation]: 2.075e-05, [1] [Cycle 1]: 1.645e-05, [1] [cse]: 1.112e-05 [environ_conv]: 4.42998e-06 [swap_dp_allreduce_reducescatter]: 5.81e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.73998e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.37e-06 [add_comm_op_reuse_tag]: 1.32e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.198e-05 [grouped_pairwise_exchange_alltoall]: 1.66998e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.86997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.38999e-06 [overlap_grad_flash_sp]: 1.682e-05 [begin_end_overlap_inline]: 8.30012e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 6.741e-05, [1] [Cycle 1]: 6.339e-05, [6] [build]: 2.37999e-06 [elim_shapecalc]: 8.14997e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.70001e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.00044721 [validate]: 3.133e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00626393 [execute]: 7.22002e-06 Sums bootstrap : 0.000479s : 3.28% type_inference : 0.004341s : 29.67% event_method : 0.000010s : 0.07% auto_monad : 0.000053s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000419s : 2.87% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000343s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.14% optimize.opt_a.cse : 0.000042s : 0.29% optimize.opt_a.a_3 : 0.000072s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 3.08% optimize.opt_b.b_1 : 0.000108s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000413s : 2.82% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.06% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006264s : 42.82% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000125 26 17.81% : 0.000022s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.61% : 0.000006s : 4: substitution.graph_param_transform 66.30% : 0.000083s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.46% : 0.000004s : 4: substitution.remove_not_recompute_node 3.01% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004301 2 91.74% : 0.003946s : 1: type_inference.infer 8.26% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.73% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.42% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.48% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.36% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.92% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.84% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.61% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.53% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 41.85% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.15% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026584 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.002993s : 1: add_attr 11.22% : 0.002984s : 1: add_attr_with_inline 0.02% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000517s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000422s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000460s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000776s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.16% : 0.001902s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.72% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000186s : 1: opt_b 14.00% : 0.003722s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.71% : 0.000188s : 1: renormalize.infer 0.56% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.60% : 0.006274s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.38% : 0.004355s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0361211, [24] [bootstrap]: 0.00051366 [type_inference]: 0.0103519 [event_method]: 4.016e-05 [auto_monad]: 0.00011668 [graph_reusing]: 7.87e-06 [inline]: 1.87001e-06 [add_attr]: 0.00300388, [1] [add_attr_with_inline]: 0.00299577, [1] [Cycle 1]: 6.694e-05, [2] [tag_attr]: 3.167e-05 [meta_addattr_fg_expand]: 8.61002e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 4.684e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.0129652, [53] [py_interpret_to_execute]: 3.521e-05 [rewriter_before_opt_a]: 0.00012682 [opt_a]: 0.0107335, [3] [Cycle 1]: 0.00685812, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 6.646e-05 [loop_unroll]: 5.472e-05 [a_1]: 0.00133447 [with_stream_mark]: 2.32e-05 [recompute_prepare]: 2.129e-05 [updatestate_depend_eliminate]: 9.15999e-06 [updatestate_assign_eliminate]: 7.71001e-06 [updatestate_loads_eliminate]: 7.48e-06 [parameter_eliminate]: 2.66e-06 [a_2]: 0.00024479 [accelerated_algorithm]: 3.102e-05 [shard]: 2.11e-06 [meta_shard_fg_expand]: 3.3e-06 [shard_inline]: 1.63e-05 [merge_send_recv]: 1.684e-05 [auto_parallel]: 1.037e-05 [parallel]: 2.001e-05 [flash_sp]: 1.187e-05 [merge_comm]: 9.54e-06 [allreduce_fusion]: 8.70999e-06 [matmul_add_comm_reduction]: 2.664e-05 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 1.764e-05 [virtual_dataset]: 1.55e-05 [get_grad_eliminate_]: 1.486e-05 [virtual_output]: 1.495e-05 [merge_forward]: 9.25999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.725e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 2.68e-05 [set_forward_comm_id_for_comm_node_pass]: 9.34998e-06 [meta_fg_expand]: 0.00139766 [flash_sp_send_recv_attached]: 3.70998e-06 [receive_attached]: 2.73e-06 [after_resolve]: 5.966e-05 
[a_after_grad]: 8.084e-05 [renormalize]: 0.00235051 [add_forward_monad_depend]: 9.44e-06 [auto_monad_grad]: 5.14998e-06 [auto_monad_eliminator]: 5.537e-05 [cse]: 0.00018902 [a_3]: 0.00033907 [Cycle 2]: 0.00293756, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 4.66e-05 [loop_unroll]: 4.366e-05 [a_1]: 0.00152843 [with_stream_mark]: 1.156e-05 [recompute_prepare]: 1.053e-05 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 0.00012639 [accelerated_algorithm]: 1.202e-05 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 7.21999e-06 [auto_parallel]: 7.21999e-06 [parallel]: 4.62e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 5.04e-06 [allreduce_fusion]: 4.74998e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.042e-05 [virtual_dataset]: 8.87999e-06 [get_grad_eliminate_]: 8.75999e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 8.60018e-07 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.597e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44e-06 [meta_fg_expand]: 3.345e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.439e-05 [a_after_grad]: 1.38e-05 [renormalize]: 0.0005772 [add_forward_monad_depend]: 3.72998e-06 [auto_monad_grad]: 1.52001e-06 [auto_monad_eliminator]: 1.4e-05 [cse]: 4.462e-05 [a_3]: 6.46e-05 [Cycle 3]: 0.00092392, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 1.034e-05 [loop_unroll]: 8.80999e-06 [a_1]: 0.0002472 [with_stream_mark]: 9.86e-06 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 3.87998e-06 [updatestate_loads_eliminate]: 3.75998e-06 [parameter_eliminate]: 
1.00001e-06 [a_2]: 0.00012246 [accelerated_algorithm]: 1.15e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 6.69001e-06 [auto_parallel]: 6.96001e-06 [parallel]: 4.48001e-06 [flash_sp]: 9.49978e-07 [merge_comm]: 4.88001e-06 [allreduce_fusion]: 5.04e-06 [matmul_add_comm_reduction]: 7.72998e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 9.94999e-06 [virtual_dataset]: 8.61002e-06 [get_grad_eliminate_]: 8.36002e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 3.204e-05 [cell_reuse_recompute_pass]: 1.24003e-06 [offload_activation]: 8.83001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.624e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22999e-06 [meta_fg_expand]: 2.89001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.45e-05 [a_after_grad]: 1.487e-05 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.042e-05 [cse]: 2.508e-05 [a_3]: 5.917e-05 [py_interpret_to_execute_after_opt_a]: 1.01e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 4.724e-05 [convert_after_rewriter]: 9.04e-06 [order_py_execute_after_rewriter]: 7.04001e-06 [mutable_eliminate]: 0.00045397 [opt_b]: 0.0002857, [1] [Cycle 1]: 0.00027982, [7] [b_1]: 0.00018964 [b_2]: 1.055e-05 [updatestate_depend_eliminate]: 6.86001e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 4.28001e-06 [renormalize]: 3.89991e-07 [cse]: 2.986e-05 [optimize_parallel_all_gather_comm]: 2.075e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.111e-05 [loop_unroll]: 0.00042293 [opt_after_cconv]: 0.00013374, [1] [Cycle 1]: 0.0001279, [7] [c_1]: 4.804e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 3.73999e-06 
[cse]: 2.884e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.877e-05 [tuple_transform]: 0.00010066, [1] [Cycle 1]: 9.607e-05, [4] [d_1]: 6.608e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.74e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.795e-05 [cse_after_recomputation]: 3.226e-05, [1] [Cycle 1]: 2.756e-05, [1] [cse]: 2.205e-05 [environ_conv]: 9.10999e-06 [swap_dp_allreduce_reducescatter]: 8.04002e-06 [bias_add_comm_swap]: 2.85002e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.31998e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.48998e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.25999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.727e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 4.92999e-06 [overlap_recompute_and_grad_model_parallel]: 5.72001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 5.39998e-06 [overlap_grad_flash_sp]: 2.443e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.43e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 1.38002e-06 [symbol_engine_optimizer]: 9.774e-05, [1] [Cycle 1]: 9.355e-05, [6] [build]: 9.83002e-06 [elim_shapecalc]: 1.328e-05 [elim_not_effective]: 1.8e-05 [opt_reshape]: 9.58002e-06 [fold_const_symbol]: 1.496e-05 [renormalize]: 5.8001e-07 [detach_backward]: 1.94999e-06 
[pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.429e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00046788 [validate]: 4.383e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00830079 [execute]: 7.56001e-06 Sums bootstrap : 0.000514s : 1.61% type_inference : 0.010352s : 32.48% event_method : 0.000040s : 0.13% auto_monad : 0.000117s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003110s : 9.76% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000494s : 1.55% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000019s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000046s : 0.14% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001434s : 4.50% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.002928s : 9.19% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000259s : 0.81% optimize.opt_a.a_3 : 0.000463s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000454s : 1.42% optimize.opt_b.b_1 : 0.000190s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000423s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000001s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000468s : 1.47% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008301s : 26.05% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000734 218 5.93% : 0.000044s : 11: substitution.arithmetic_simplify 1.93% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.65% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.08% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.82% : 0.000403s : 16: substitution.inline 2.18% : 0.000016s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000015s : 3: substitution.less_batch_normalization 1.79% : 0.000013s : 11: substitution.minmaximum_grad 0.82% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.29% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.70% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.27% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010283 2 87.62% : 0.009010s : 1: type_inference.infer 12.38% : 0.001274s : 1: type_inference.specialize ------[replace.] 0.000204 30 59.65% : 0.000122s : 16: replace.inline 40.35% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000424 30 92.91% : 0.000394s : 16: match.inline 7.09% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.13% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.68% : 0.000042s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.34% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.19% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 97: predicate.switch_defer_inline 2.89% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.10% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001469 32 57.84% : 0.000850s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.16% : 0.000619s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060101 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.01% : 0.003008s : 1: add_attr 4.99% : 0.003000s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000123s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.92% : 0.000551s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.91% : 0.004756s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.86% : 0.010736s : 1: opt_a 0.23% : 0.000137s : 1: opt_after_cconv 0.79% : 0.000477s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.58% : 0.012969s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000052s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000033s : 1: remove_dup_value 2.60% : 0.001562s : 2: renormalize.infer 2.25% : 0.001353s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.83% : 0.008311s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.25% : 0.010368s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x8-kbk],max_mem:10.0M TotalTime = 0.859662, [24] [bootstrap]: 0.00054755 [type_inference]: 0.00620176 [event_method]: 1.349e-05 [auto_monad]: 5.667e-05 [graph_reusing]: 5.67999e-06 [inline]: 1.64e-06 [add_attr]: 0.00342786, [1] [add_attr_with_inline]: 0.00341672, [1] [Cycle 1]: 4.607e-05, [2] [tag_attr]: 1.548e-05 [meta_addattr_fg_expand]: 4.53001e-06 [parallel-infer-symbol]: 3.25998e-06 [pre_auto_parallel]: 2.845e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.32999e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00401553, [53] [py_interpret_to_execute]: 2.113e-05 [rewriter_before_opt_a]: 5.947e-05 [opt_a]: 0.00211197, [2] [Cycle 1]: 0.00152023, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.274e-05 [loop_unroll]: 2.075e-05 [a_1]: 0.00045607 [with_stream_mark]: 1.304e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.43999e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 7.554e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 1.98002e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 8.27998e-06 [auto_parallel]: 6.09001e-06 [parallel]: 2.595e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 7.31999e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.76e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 
[merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 2.26998e-06 [receive_attached]: 2.25002e-06 [after_resolve]: 1.102e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00041285 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.90001e-06 [auto_monad_eliminator]: 1.394e-05 [cse]: 2.802e-05 [a_3]: 4.042e-05 [Cycle 2]: 0.00058291, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.52001e-06 [loop_unroll]: 5.40001e-06 [a_1]: 0.00012519 [with_stream_mark]: 9.52001e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 6.792e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.32e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.21001e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.15e-06 [flash_sp]: 3.53e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.53998e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.64998e-06 [virtual_dataset]: 5.21002e-06 [get_grad_eliminate_]: 4.87e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.36e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 5.72999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95998e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.97e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 5.86998e-06 [cse]: 1.265e-05 [a_3]: 3.168e-05 [py_interpret_to_execute_after_opt_a]: 7.2e-06 [slice_cell_reuse_recomputed_activation]: 
2.26998e-06 [rewriter_after_opt_a]: 3.15e-05 [convert_after_rewriter]: 6.79001e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.0005092 [opt_b]: 0.0001807, [1] [Cycle 1]: 0.00017466, [7] [b_1]: 0.00010815 [b_2]: 6.64001e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 5.89993e-07 [cse]: 1.64e-05 [optimize_parallel_all_gather_comm]: 1.644e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.352e-05 [loop_unroll]: 0.00041384 [opt_after_cconv]: 9.22e-05, [1] [Cycle 1]: 8.683e-05, [7] [c_1]: 2.746e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 4.81997e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.585e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.265e-05 [tuple_transform]: 6.808e-05, [1] [Cycle 1]: 6.348e-05, [4] [d_1]: 3.817e-05 [none_parameter_eliminate]: 1.42999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.00002e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.306e-05 [cse_after_recomputation]: 2.126e-05, [1] [Cycle 1]: 1.692e-05, [1] [cse]: 1.153e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 5.16998e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.37999e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.12e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.31002e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 
1.149e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.62e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.00998e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.28002e-06 [split_layernorm_comm]: 1.81998e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 6.894e-05, [1] [Cycle 1]: 6.481e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.13999e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 9.48002e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.571e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00045025 [validate]: 3.069e-05 [backend_pass]: 8.00006e-07 [task_emit]: 0.844631 [execute]: 8.27998e-06 Sums bootstrap : 0.000548s : 0.06% type_inference : 0.006202s : 0.73% event_method : 0.000013s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000581s : 0.07% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000030s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000413s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000041s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000509s : 0.06% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000414s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.01% optimize.cse_after_recomputation.cse : 0.000012s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.05% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.844631s : 98.76% execute : 0.000008s : 0.00% Time group info: ------[substitution.] 0.000170 30 14.54% : 0.000025s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.34% : 0.000006s : 4: substitution.graph_param_transform 66.01% : 0.000112s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000004s : 4: substitution.remove_not_recompute_node 2.61% : 0.000004s : 4: substitution.replace_old_param 7.30% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006154 2 90.93% : 0.005596s : 1: type_inference.infer 9.07% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.08% : 0.000028s : 3: replace.inline 28.92% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 90.66% : 0.000110s : 3: match.inline 9.34% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.91% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.99% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.29% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.88% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 1.09% : 0.000002s : 11: predicate.transpose_eliminate 1.45% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000344 8 46.55% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.45% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.868613 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.40% : 0.003432s : 1: add_attr 0.39% : 0.003420s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000589s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000518s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000946s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002115s : 1: opt_a 0.01% : 0.000096s : 1: opt_after_cconv 0.05% : 0.000460s : 1: opt_after_jit_grad 0.02% : 0.000184s : 1: opt_b 0.46% : 0.004019s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000214s : 1: renormalize.infer 0.02% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.24% : 0.844648s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.72% : 0.006215s : 1: type_inference 0.01% : 0.000053s : 1: validate TotalTime = 0.0721576, [24] [bootstrap]: 0.00048391 [type_inference]: 0.00450402 [event_method]: 1.074e-05 [auto_monad]: 5.346e-05 [graph_reusing]: 5.35999e-06 [inline]: 1.81e-06 [add_attr]: 0.00298982, [1] [add_attr_with_inline]: 0.00298197, [1] [Cycle 1]: 4.531e-05, [2] [tag_attr]: 1.227e-05 [meta_addattr_fg_expand]: 3.56999e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.219e-05 [insert-virtual-dataset]: 2.48998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.92999e-06 [optimize]: 0.00365976, [53] [py_interpret_to_execute]: 1.503e-05 [rewriter_before_opt_a]: 3.897e-05 [opt_a]: 0.00184992, [2] [Cycle 1]: 0.00125529, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 2.434e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00029112 [with_stream_mark]: 1.287e-05 [recompute_prepare]: 7.26999e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.11001e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.679e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.54999e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.89e-06 [parallel]: 1.806e-05 [flash_sp]: 7.27002e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.67002e-06 [matmul_add_comm_reduction]: 9.32001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 5.70001e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.34e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.088e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 2.69001e-06 
[after_resolve]: 1.009e-05 [a_after_grad]: 8.76002e-06 [renormalize]: 0.00034428 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.846e-05 [a_3]: 4.007e-05 [Cycle 2]: 0.00058555, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.92002e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012331 [with_stream_mark]: 1.108e-05 [recompute_prepare]: 5.75001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.688e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.16002e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.18001e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.77e-06 [flash_sp]: 3.4e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 5.87999e-06 [virtual_dataset]: 4.99e-06 [get_grad_eliminate_]: 4.89998e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.24998e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.92003e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89001e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 8.73001e-06 [a_after_grad]: 7.9e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09998e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.13002e-06 [cse]: 1.201e-05 [a_3]: 3.314e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 [slice_cell_reuse_recomputed_activation]: 2.13998e-06 [rewriter_after_opt_a]: 3.267e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00044576 [opt_b]: 0.00017836, [1] [Cycle 1]: 0.0001724, [7] [b_1]: 
0.00010644 [b_2]: 6.76e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.80009e-07 [cse]: 1.569e-05 [optimize_parallel_all_gather_comm]: 1.547e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.325e-05 [loop_unroll]: 0.00041188 [opt_after_cconv]: 9.297e-05, [1] [Cycle 1]: 8.722e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.562e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.316e-05 [tuple_transform]: 6.898e-05, [1] [Cycle 1]: 6.479e-05, [4] [d_1]: 3.954e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.179e-05 [cse_after_recomputation]: 2.03e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.174e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.93999e-06 [overlap_recompute_and_grad_model_parallel]: 5.05999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.67999e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.755e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.29998e-06 [symbol_engine_optimizer]: 6.812e-05, [1] [Cycle 1]: 6.372e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.151e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00046216 [validate]: 3.195e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.059689 [execute]: 8.54e-06 Sums bootstrap : 0.000484s : 0.71% type_inference : 0.004504s : 6.60% event_method : 0.000011s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000414s : 0.61% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000344s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.65% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000412s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000462s : 0.68% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059689s : 87.50% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 19.06% : 0.000023s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.76% : 0.000006s : 4: substitution.graph_param_transform 64.81% : 0.000079s : 2: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.90% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004461 2 91.68% : 0.004089s : 1: type_inference.infer 8.32% : 0.000371s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.90% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.44% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.79% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.28% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000268 6 43.93% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.07% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080069 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.74% : 0.002994s : 1: add_attr 3.73% : 0.002986s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000521s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000761s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.31% : 0.001853s : 1: opt_a 0.12% : 0.000096s : 1: opt_after_cconv 0.59% : 0.000472s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.58% : 0.003664s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.18% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.57% : 0.059705s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.64% : 0.004519s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.0752115, [24] [bootstrap]: 0.00047258 [type_inference]: 0.00572541 [event_method]: 1.403e-05 [auto_monad]: 5.805e-05 [graph_reusing]: 5.69999e-06 [inline]: 2.24001e-06 [add_attr]: 0.00306169, [1] [add_attr_with_inline]: 0.00305335, [1] [Cycle 1]: 4.535e-05, [2] [tag_attr]: 1.555e-05 [meta_addattr_fg_expand]: 4.13999e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.621e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.64e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00397511, [53] [py_interpret_to_execute]: 2.085e-05 [rewriter_before_opt_a]: 5.94e-05 [opt_a]: 0.00213953, [2] [Cycle 1]: 0.00153413, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 3.206e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00045139 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.80998e-06 [updatestate_assign_eliminate]: 3.68999e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 2.34001e-06 [a_2]: 7.57e-05 [accelerated_algorithm]: 5.98002e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.21002e-06 [auto_parallel]: 6.07001e-06 [parallel]: 1.838e-05 [flash_sp]: 7.82e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.70998e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.30998e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 9.71998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.75002e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 
2.69001e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 8.48999e-06 [renormalize]: 0.00042145 [add_forward_monad_depend]: 5.00999e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.436e-05 [cse]: 2.693e-05 [a_3]: 4.182e-05 [Cycle 2]: 0.00059594, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00012432 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.844e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.37999e-06 [merge_send_recv]: 4.27998e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.05998e-06 [flash_sp]: 2.86999e-06 [merge_comm]: 3.3e-06 [allreduce_fusion]: 3.00002e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.19e-06 [get_grad_eliminate_]: 5.01997e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 5.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.83002e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.88999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.55999e-06 [a_after_grad]: 7.76001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.361e-05 [a_3]: 3.334e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 3.126e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.33002e-06 [mutable_eliminate]: 0.00044867 [opt_b]: 0.00018022, [1] [Cycle 1]: 
0.00017428, [7] [b_1]: 0.00010687 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.9002e-07 [cse]: 1.613e-05 [optimize_parallel_all_gather_comm]: 1.569e-05 [overlap_param_gather]: 1.73002e-06 [cconv]: 2.377e-05 [loop_unroll]: 0.00041259 [opt_after_cconv]: 9.417e-05, [1] [Cycle 1]: 8.861e-05, [7] [c_1]: 2.731e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.01997e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.666e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.283e-05 [tuple_transform]: 6.863e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.898e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.07001e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.435e-05 [cse_after_recomputation]: 2.048e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.096e-05 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.46002e-06 [label_fine_grained_interleaved_index]: 2.92002e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.89999e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.90002e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 1.09998e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.32e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.133e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 4.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.69001e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.948e-05, [1] [Cycle 1]: 6.531e-05, [6] [build]: 2.06998e-06 [elim_shapecalc]: 8.65001e-06 [elim_not_effective]: 1.227e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.50002e-07 [detach_backward]: 2.10002e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.567e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.9e-06 [opt_after_jit_grad]: 0.00045508 [validate]: 3.111e-05 [backend_pass]: 8.49977e-07 [task_emit]: 0.0611399 [execute]: 8.81002e-06 Sums bootstrap : 0.000473s : 0.66% type_inference : 0.005725s : 8.05% event_method : 0.000014s : 0.02% auto_monad : 0.000058s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000576s : 0.81% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000422s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.63% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000413s : 0.58% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000455s : 0.64% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.061140s : 85.91% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.85% : 0.000025s : 5: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.39% : 0.000006s : 4: substitution.graph_param_transform 66.65% : 0.000112s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000005s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005683 2 90.12% : 0.005122s : 1: type_inference.infer 9.88% : 0.000562s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.22% : 0.000027s : 3: replace.inline 29.78% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.90% : 0.000110s : 3: match.inline 8.10% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.77% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.78% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 11: predicate.reshape_eliminate 0.92% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.16% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000358 8 47.96% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.04% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.083766 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.66% : 0.003066s : 1: add_attr 3.65% : 0.003057s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000063s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000509s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.12% : 0.000941s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.56% : 0.002143s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.55% : 0.000465s : 1: opt_after_jit_grad 0.22% : 0.000184s : 1: opt_b 4.75% : 0.003979s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000210s : 1: renormalize.infer 0.24% : 0.000205s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 73.01% : 0.061156s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.85% : 0.005739s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.994716, [24] [bootstrap]: 0.00053321 [type_inference]: 0.0118397 [event_method]: 4.726e-05 [auto_monad]: 0.00012287 [graph_reusing]: 8.70001e-06 [inline]: 2.14e-06 [add_attr]: 0.00309056, [1] [add_attr_with_inline]: 0.00308208, [1] [Cycle 1]: 6.947e-05, [2] [tag_attr]: 3.383e-05 [meta_addattr_fg_expand]: 9.25999e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 5.099e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0133634, [53] [py_interpret_to_execute]: 3.776e-05 [rewriter_before_opt_a]: 0.00014631 [opt_a]: 0.0110874, [3] [Cycle 1]: 0.00713867, [45] [expand_dump_flag]: 3.85e-06 [switch_simplify]: 7.448e-05 [loop_unroll]: 6.243e-05 [a_1]: 0.00145656 [with_stream_mark]: 2.324e-05 [recompute_prepare]: 2.128e-05 [updatestate_depend_eliminate]: 9.04998e-06 [updatestate_assign_eliminate]: 8.12998e-06 [updatestate_loads_eliminate]: 7.85e-06 [parameter_eliminate]: 2.79999e-06 [a_2]: 0.0002464 [accelerated_algorithm]: 3.064e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.43999e-06 [shard_inline]: 1.627e-05 [merge_send_recv]: 1.643e-05 [auto_parallel]: 1.077e-05 [parallel]: 1.899e-05 [flash_sp]: 1.161e-05 [merge_comm]: 9.62999e-06 [allreduce_fusion]: 8.88002e-06 [matmul_add_comm_reduction]: 2.633e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.785e-05 [virtual_dataset]: 1.555e-05 [get_grad_eliminate_]: 1.534e-05 [virtual_output]: 1.512e-05 [merge_forward]: 9.05999e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 1.785e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.845e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 2.751e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46998e-06 [meta_fg_expand]: 0.00143647 [flash_sp_send_recv_attached]: 3.42002e-06 [receive_attached]: 2.71e-06 
[after_resolve]: 5.899e-05 [a_after_grad]: 8.139e-05 [renormalize]: 0.00247155 [add_forward_monad_depend]: 9.32001e-06 [auto_monad_grad]: 5.14e-06 [auto_monad_eliminator]: 5.668e-05 [cse]: 0.00017016 [a_3]: 0.00033632 [Cycle 2]: 0.00300263, [45] [expand_dump_flag]: 1.43002e-06 [switch_simplify]: 4.674e-05 [loop_unroll]: 4.388e-05 [a_1]: 0.00152309 [with_stream_mark]: 1.182e-05 [recompute_prepare]: 1.108e-05 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.98001e-06 [parameter_eliminate]: 1.17e-06 [a_2]: 0.0001254 [accelerated_algorithm]: 1.197e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 7.58999e-06 [auto_parallel]: 7.77002e-06 [parallel]: 5.12e-06 [flash_sp]: 3.44001e-06 [merge_comm]: 5.25999e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 7.82e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 9.70002e-06 [virtual_dataset]: 8.79e-06 [get_grad_eliminate_]: 8.52998e-06 [virtual_output]: 8.48001e-06 [merge_forward]: 4.42998e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.63e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.405e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44e-06 [meta_fg_expand]: 7.021e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.557e-05 [a_after_grad]: 1.445e-05 [renormalize]: 0.00059792 [add_forward_monad_depend]: 4.07003e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.471e-05 [cse]: 4.627e-05 [a_3]: 6.561e-05 [Cycle 3]: 0.00093263, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 1.055e-05 [loop_unroll]: 8.92e-06 [a_1]: 0.00024865 [with_stream_mark]: 9.99999e-06 [recompute_prepare]: 4.068e-05 [updatestate_depend_eliminate]: 4.99003e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.66999e-06 
[parameter_eliminate]: 1.11002e-06 [a_2]: 0.00012383 [accelerated_algorithm]: 1.146e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 8.88002e-06 [merge_send_recv]: 6.59001e-06 [auto_parallel]: 7.41001e-06 [parallel]: 4.65999e-06 [flash_sp]: 1.17e-06 [merge_comm]: 4.92999e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.59002e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 9.92999e-06 [virtual_dataset]: 8.60001e-06 [get_grad_eliminate_]: 8.64e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 8.3e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.603e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.375e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17999e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.483e-05 [a_after_grad]: 1.484e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 1.114e-05 [cse]: 2.73e-05 [a_3]: 5.902e-05 [py_interpret_to_execute_after_opt_a]: 1.004e-05 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 4.689e-05 [convert_after_rewriter]: 9.57001e-06 [order_py_execute_after_rewriter]: 7.03e-06 [mutable_eliminate]: 0.00046076 [opt_b]: 0.00028976, [1] [Cycle 1]: 0.00028345, [7] [b_1]: 0.00019015 [b_2]: 1.075e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.88999e-06 [renormalize]: 5.19998e-07 [cse]: 3.231e-05 [optimize_parallel_all_gather_comm]: 2.085e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.014e-05 [loop_unroll]: 0.00042494 [opt_after_cconv]: 0.00013796, [1] [Cycle 1]: 0.00013206, [7] [c_1]: 4.82e-05 [parameter_eliminate]: 2.61e-06 [updatestate_depend_eliminate]: 7.38e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 
4.07e-06 [cse]: 3.086e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 3e-05 [tuple_transform]: 0.00010092, [1] [Cycle 1]: 9.643e-05, [4] [d_1]: 6.601e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 1.032e-05 [partial_unused_args_eliminate]: 2.13998e-06 [add_recomputation]: 5.769e-05 [cse_after_recomputation]: 3.196e-05, [1] [Cycle 1]: 2.736e-05, [1] [cse]: 2.227e-05 [environ_conv]: 9.15001e-06 [swap_dp_allreduce_reducescatter]: 8.07e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.99999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 1.15999e-06 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.86999e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.703e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 5.14e-06 [overlap_recompute_and_grad_model_parallel]: 5.72999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.53e-06 [overlap_grad_ring_attention]: 5.28002e-06 [overlap_grad_flash_sp]: 2.465e-05 [begin_end_overlap_inline]: 8.60018e-07 [split_matmul_comm_elemetwise]: 2.46998e-06 [split_layernorm_comm]: 2.12001e-06 [handle_group_info]: 1.31998e-06 [symbol_engine_optimizer]: 9.867e-05, [1] [Cycle 1]: 9.431e-05, [6] [build]: 1.041e-05 [elim_shapecalc]: 1.311e-05 [elim_not_effective]: 1.795e-05 [opt_reshape]: 9.97001e-06 [fold_const_symbol]: 1.497e-05 [renormalize]: 2.50002e-07 [detach_backward]: 
1.60001e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.598e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.8e-06 [opt_after_jit_grad]: 0.00046906 [validate]: 4.604e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.964868 [execute]: 9.91e-06 Sums bootstrap : 0.000533s : 0.05% type_inference : 0.011840s : 1.20% event_method : 0.000047s : 0.00% auto_monad : 0.000123s : 0.01% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000146s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.01% optimize.opt_a.loop_unroll : 0.000115s : 0.01% optimize.opt_a.a_1 : 0.003228s : 0.33% optimize.opt_a.with_stream_mark : 0.000045s : 0.00% optimize.opt_a.recompute_prepare : 0.000073s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.00% optimize.opt_a.merge_send_recv : 0.000031s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001510s : 0.15% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.01% optimize.opt_a.a_after_grad : 0.000111s : 0.01% optimize.opt_a.renormalize : 0.003070s : 0.31% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.01% optimize.opt_a.cse : 0.000244s : 0.02% optimize.opt_a.a_3 : 0.000461s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000461s : 0.05% optimize.opt_b.b_1 : 0.000190s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000425s : 0.04% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.00% optimize.tuple_transform.d_1 : 0.000066s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.05% validate : 0.000046s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.964868s : 97.43% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000767 222 5.90% : 0.000045s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.68% : 0.000427s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.76% : 0.000014s : 20: substitution.remove_not_recompute_node 3.17% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.62% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.42% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011762 2 86.80% : 0.010210s : 1: type_inference.infer 13.20% : 0.001552s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.17% : 0.000125s : 17: replace.inline 42.83% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000452 33 92.43% : 0.000418s : 17: match.inline 7.57% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.11% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.12% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.34% : 0.000003s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001658 34 58.63% : 0.000972s : 13: func_graph_cloner_run.FuncGraphClonerGraph 41.37% : 0.000686s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.019487 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.30% : 0.003095s : 1: add_attr 0.30% : 0.003086s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000571s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000054s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.05% : 0.000470s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.48% : 0.004925s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000175s : 28: opt.transform.opt_b 0.01% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.09% : 0.011090s : 1: opt_a 0.01% : 0.000141s : 1: opt_after_cconv 0.05% : 0.000478s : 1: opt_after_jit_grad 0.03% : 0.000293s : 1: opt_b 1.31% : 0.013367s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000055s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 0.16% : 0.001626s : 2: renormalize.infer 0.14% : 0.001431s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000051s : 1: rewriter_after_opt_a 0.01% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 94.64% : 0.964891s : 1: task_emit 0.01% : 0.000104s : 1: 
tuple_transform 1.16% : 0.011855s : 1: type_inference 0.01% : 0.000070s : 1: validate TotalTime = 0.0711542, [24] [bootstrap]: 0.00046763 [type_inference]: 0.00439425 [event_method]: 1.052e-05 [auto_monad]: 5.214e-05 [graph_reusing]: 5.69e-06 [inline]: 2.29001e-06 [add_attr]: 0.00295704, [1] [add_attr_with_inline]: 0.0029491, [1] [Cycle 1]: 4.762e-05, [2] [tag_attr]: 1.221e-05 [meta_addattr_fg_expand]: 3.18e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.069e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00371053, [53] [py_interpret_to_execute]: 1.541e-05 [rewriter_before_opt_a]: 3.932e-05 [opt_a]: 0.00190083, [2] [Cycle 1]: 0.00130455, [45] [expand_dump_flag]: 3.21999e-06 [switch_simplify]: 2.427e-05 [loop_unroll]: 1.336e-05 [a_1]: 0.00029525 [with_stream_mark]: 1.328e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.62002e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.26001e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.756e-05 [accelerated_algorithm]: 6.68998e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 6.07001e-06 [merge_send_recv]: 8.50999e-06 [auto_parallel]: 5.84999e-06 [parallel]: 1.79e-05 [flash_sp]: 7.68999e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.53999e-06 [matmul_add_comm_reduction]: 8.74998e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.92997e-06 [virtual_dataset]: 5.97999e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 4.928e-05 [merge_recompute_call_nodes]: 1.76e-06 [before_grad]: 9.82001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.99999e-06 [receive_attached]: 
2.31e-06 [after_resolve]: 1.042e-05 [a_after_grad]: 9.34e-06 [renormalize]: 0.00034416 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 2.12001e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.879e-05 [a_3]: 3.916e-05 [Cycle 2]: 0.00058736, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.0001232 [with_stream_mark]: 1.094e-05 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.36e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.688e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.52999e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.08e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.63998e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.84001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 8.83001e-06 [a_after_grad]: 8.69e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.281e-05 [a_3]: 3.166e-05 [py_interpret_to_execute_after_opt_a]: 7.76001e-06 [slice_cell_reuse_recomputed_activation]: 2.25002e-06 [rewriter_after_opt_a]: 3.125e-05 [convert_after_rewriter]: 7.42998e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00044646 [opt_b]: 0.00018213, [1] [Cycle 1]: 
0.00017581, [7] [b_1]: 0.00010722 [b_2]: 7.31999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 7.80012e-07 [cse]: 1.643e-05 [optimize_parallel_all_gather_comm]: 1.523e-05 [overlap_param_gather]: 2.16998e-06 [cconv]: 2.276e-05 [loop_unroll]: 0.00041228 [opt_after_cconv]: 9.495e-05, [1] [Cycle 1]: 8.928e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.651e-05 [renormalize]: 1.8999e-07 [remove_dup_value]: 1.298e-05 [tuple_transform]: 6.938e-05, [1] [Cycle 1]: 6.514e-05, [4] [d_1]: 3.962e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.48e-05 [cse_after_recomputation]: 2.027e-05, [1] [Cycle 1]: 1.562e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.73998e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.16997e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88002e-06 [control_data_broadcast_order]: 1.219e-05 [grouped_pairwise_exchange_alltoall]: 1.95001e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 3.97998e-06 [overlap_grad_flash_sp]: 1.762e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 2.12001e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 6.849e-05, [1] [Cycle 1]: 6.435e-05, [6] [build]: 2.71999e-06 [elim_shapecalc]: 8.26002e-06 [elim_not_effective]: 1.09e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.66002e-06 [auto_monad_reorder]: 1.577e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00044665 [validate]: 3.218e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.058809 [execute]: 8.76997e-06 Sums bootstrap : 0.000468s : 0.70% type_inference : 0.004394s : 6.54% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.09% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000344s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.66% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000412s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000447s : 0.66% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058809s : 87.47% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.17% : 0.000023s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.65% : 0.000006s : 4: substitution.graph_param_transform 65.80% : 0.000082s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.74% : 0.000005s : 4: substitution.remove_not_recompute_node 2.95% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004352 2 91.41% : 0.003978s : 1: type_inference.infer 8.59% : 0.000374s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 17: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.27% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.25% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.41% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.05% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.88% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.68% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.19% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 43.20% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.80% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079130 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.74% : 0.002961s : 1: add_attr 3.73% : 0.002952s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.64% : 0.000506s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.02% : 0.000807s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.41% : 0.001904s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.58% : 0.000456s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.69% : 0.003714s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.19% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.34% : 0.058824s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.57% : 0.004409s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.109478, [24] [bootstrap]: 0.0005212 [type_inference]: 0.0108826 [event_method]: 4.647e-05 [auto_monad]: 0.00012177 [graph_reusing]: 8.32e-06 [inline]: 2.72001e-06 [add_attr]: 0.00312282, [1] [add_attr_with_inline]: 0.0031145, [1] [Cycle 1]: 6.885e-05, [2] [tag_attr]: 3.219e-05 [meta_addattr_fg_expand]: 8.59e-06 [parallel-infer-symbol]: 3.07002e-06 [pre_auto_parallel]: 4.681e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 2.21998e-06 [pipeline_split]: 1.98002e-06 [optimize]: 0.0133684, [53] [py_interpret_to_execute]: 3.611e-05 [rewriter_before_opt_a]: 0.00013151 [opt_a]: 0.0111025, [3] [Cycle 1]: 0.00708157, [45] [expand_dump_flag]: 3.91999e-06 [switch_simplify]: 6.728e-05 [loop_unroll]: 5.506e-05 [a_1]: 0.00134919 [with_stream_mark]: 2.344e-05 [recompute_prepare]: 2.127e-05 [updatestate_depend_eliminate]: 9.02999e-06 [updatestate_assign_eliminate]: 8.03999e-06 [updatestate_loads_eliminate]: 7.82e-06 [parameter_eliminate]: 3.23e-06 [a_2]: 0.00024586 [accelerated_algorithm]: 3.116e-05 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 3.45003e-06 [shard_inline]: 1.633e-05 [merge_send_recv]: 1.637e-05 [auto_parallel]: 1.082e-05 [parallel]: 1.91e-05 [flash_sp]: 1.142e-05 [merge_comm]: 9.45001e-06 [allreduce_fusion]: 8.72e-06 [matmul_add_comm_reduction]: 2.752e-05 [allreduce_slice_to_reducescatter]: 7.89994e-07 [virtual_shard_identity]: 1.822e-05 [virtual_dataset]: 1.57e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.551e-05 [merge_forward]: 9.87001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.814e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.859e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 2.725e-05 [set_forward_comm_id_for_comm_node_pass]: 9.77999e-06 [meta_fg_expand]: 0.00143289 [flash_sp_send_recv_attached]: 3.7e-06 [receive_attached]: 2.51e-06 [after_resolve]: 
6.014e-05 [a_after_grad]: 8.051e-05 [renormalize]: 0.00250715 [add_forward_monad_depend]: 9.86e-06 [auto_monad_grad]: 5.25999e-06 [auto_monad_eliminator]: 7.778e-05 [cse]: 0.00017711 [a_3]: 0.00033585 [Cycle 2]: 0.00307448, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 4.705e-05 [loop_unroll]: 4.391e-05 [a_1]: 0.00155229 [with_stream_mark]: 1.278e-05 [recompute_prepare]: 1.061e-05 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 4.48001e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.0001304 [accelerated_algorithm]: 1.218e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 2.16e-06 [shard_inline]: 1.088e-05 [merge_send_recv]: 7.75e-06 [auto_parallel]: 7.61999e-06 [parallel]: 5.51e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 5.49e-06 [allreduce_fusion]: 4.92999e-06 [matmul_add_comm_reduction]: 8.40999e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 9.04e-06 [get_grad_eliminate_]: 8.87999e-06 [virtual_output]: 9.19998e-06 [merge_forward]: 4.77e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 1.013e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.774e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.575e-05 [set_forward_comm_id_for_comm_node_pass]: 5.46e-06 [meta_fg_expand]: 3.659e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 1.736e-05 [a_after_grad]: 1.653e-05 [renormalize]: 0.00064528 [add_forward_monad_depend]: 4.12e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.478e-05 [cse]: 4.76e-05 [a_3]: 6.487e-05 [Cycle 3]: 0.00093251, [45] [expand_dump_flag]: 1.04998e-06 [switch_simplify]: 1.096e-05 [loop_unroll]: 8.92999e-06 [a_1]: 0.00024985 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 8.92e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 3.78001e-06 [updatestate_loads_eliminate]: 3.62998e-06 [parameter_eliminate]: 
1.06997e-06 [a_2]: 0.00012369 [accelerated_algorithm]: 1.141e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 9.07999e-06 [merge_send_recv]: 7.21999e-06 [auto_parallel]: 7.34002e-06 [parallel]: 4.83001e-06 [flash_sp]: 1.19e-06 [merge_comm]: 4.79998e-06 [allreduce_fusion]: 5.09e-06 [matmul_add_comm_reduction]: 7.86001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.011e-05 [virtual_dataset]: 8.76002e-06 [get_grad_eliminate_]: 8.52998e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.16001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.599e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.379e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.477e-05 [a_after_grad]: 1.492e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.32999e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 1.059e-05 [cse]: 2.76e-05 [a_3]: 5.913e-05 [py_interpret_to_execute_after_opt_a]: 1.068e-05 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 4.793e-05 [convert_after_rewriter]: 9.10999e-06 [order_py_execute_after_rewriter]: 7.23e-06 [mutable_eliminate]: 0.00046135 [opt_b]: 0.00028938, [1] [Cycle 1]: 0.0002834, [7] [b_1]: 0.00019019 [b_2]: 1.055e-05 [updatestate_depend_eliminate]: 7.31001e-06 [updatestate_assign_eliminate]: 4.14002e-06 [updatestate_loads_eliminate]: 4.22998e-06 [renormalize]: 4.19997e-07 [cse]: 3.233e-05 [optimize_parallel_all_gather_comm]: 2.019e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.077e-05 [loop_unroll]: 0.00042858 [opt_after_cconv]: 0.00013727, [1] [Cycle 1]: 0.00013123, [7] [c_1]: 4.821e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 7.41001e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 
3.75e-06 [cse]: 3.157e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 2.955e-05 [tuple_transform]: 0.00010242, [1] [Cycle 1]: 9.771e-05, [4] [d_1]: 6.714e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 9.97001e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 5.915e-05 [cse_after_recomputation]: 3.301e-05, [1] [Cycle 1]: 2.829e-05, [1] [cse]: 2.298e-05 [environ_conv]: 9.33002e-06 [swap_dp_allreduce_reducescatter]: 8.02998e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.23998e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.35001e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.766e-05 [grouped_pairwise_exchange_alltoall]: 2.05002e-06 [offloading_packed_experts]: 5.19998e-06 [overlap_recompute_and_grad_model_parallel]: 5.79999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 5.30001e-06 [overlap_grad_flash_sp]: 2.454e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 1.89e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 9.828e-05, [1] [Cycle 1]: 9.363e-05, [6] [build]: 9.92001e-06 [elim_shapecalc]: 1.335e-05 [elim_not_effective]: 1.782e-05 [opt_reshape]: 9.97999e-06 [fold_const_symbol]: 1.496e-05 [renormalize]: 2.3999e-07 [detach_backward]: 
1.93997e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 2.588e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.86999e-06 [opt_after_jit_grad]: 0.00047403 [validate]: 4.693e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.080567 [execute]: 8.17998e-06 Sums bootstrap : 0.000521s : 0.50% type_inference : 0.010883s : 10.36% event_method : 0.000046s : 0.04% auto_monad : 0.000122s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000132s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003151s : 3.00% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000500s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000036s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001473s : 1.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.09% optimize.opt_a.a_after_grad : 0.000112s : 0.11% optimize.opt_a.renormalize : 0.003152s : 3.00% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000103s : 0.10% optimize.opt_a.cse : 0.000252s : 0.24% optimize.opt_a.a_3 : 0.000460s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000461s : 0.44% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000429s : 0.41% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000474s : 0.45% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080567s : 76.69% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000748 218 5.94% : 0.000044s : 11: substitution.arithmetic_simplify 1.89% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.08% : 0.000008s : 8: substitution.graph_param_transform 0.44% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.07% : 0.000412s : 16: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000015s : 3: substitution.less_batch_normalization 1.80% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.84% : 0.000014s : 20: substitution.remove_not_recompute_node 3.28% : 0.000025s : 10: substitution.replace_applicator 1.46% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.41% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010809 2 83.39% : 0.009014s : 1: type_inference.infer 16.61% : 0.001795s : 1: type_inference.specialize ------[replace.] 0.000206 30 58.46% : 0.000121s : 16: replace.inline 41.54% : 0.000086s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000435 30 92.72% : 0.000404s : 16: match.inline 7.28% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 99: predicate.arithmetic_simplify 1.13% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.52% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000042s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.58% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.89% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000037s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.13% : 0.000008s : 67: predicate.transpose_eliminate 1.44% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.62% : 0.000005s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001944 32 44.76% : 0.000870s : 12: func_graph_cloner_run.FuncGraphClonerGraph 55.24% : 0.001074s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.134261 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.33% : 0.003127s : 1: add_attr 2.32% : 0.003118s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000129s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000558s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000053s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.59% : 0.004816s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.27% : 0.011105s : 1: opt_a 0.10% : 0.000141s : 1: opt_after_cconv 0.36% : 0.000484s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 9.96% : 0.013372s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000034s : 1: remove_dup_value 1.24% : 0.001666s : 2: renormalize.infer 1.10% : 0.001472s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000137s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 60.02% : 0.080583s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 8.12% : 0.010900s : 1: type_inference 0.05% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x8-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x9-pynative],max_mem:10.0M TotalTime = 0.0222267, [24] [bootstrap]: 0.00053631 [type_inference]: 0.00622437 [event_method]: 1.421e-05 [auto_monad]: 0.00011656 [graph_reusing]: 5.39998e-06 [inline]: 1.72999e-06 [add_attr]: 0.00335246, [1] [add_attr_with_inline]: 0.00334272, [1] [Cycle 1]: 4.624e-05, [2] [tag_attr]: 1.562e-05 [meta_addattr_fg_expand]: 4.54998e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.896e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.90001e-06 [optimize]: 0.00399479, [53] [py_interpret_to_execute]: 2.087e-05 [rewriter_before_opt_a]: 5.937e-05 [opt_a]: 0.00215327, [2] [Cycle 1]: 0.00155551, [45] [expand_dump_flag]: 3.13998e-06 [switch_simplify]: 3.205e-05 [loop_unroll]: 2.047e-05 [a_1]: 0.00048795 [with_stream_mark]: 1.431e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 7.675e-05 [accelerated_algorithm]: 6.23998e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.52998e-06 [auto_parallel]: 6.07001e-06 [parallel]: 2.517e-05 [flash_sp]: 7.09001e-06 [merge_comm]: 3.55998e-06 [allreduce_fusion]: 3.48999e-06 [matmul_add_comm_reduction]: 9.60001e-06 [allreduce_slice_to_reducescatter]: 9.5999e-07 [virtual_shard_identity]: 7.11001e-06 [virtual_dataset]: 
5.94e-06 [get_grad_eliminate_]: 5.92999e-06 [virtual_output]: 6.06e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.59999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.71e-06 [receive_attached]: 2.26998e-06 [after_resolve]: 1.038e-05 [a_after_grad]: 9.40001e-06 [renormalize]: 0.00041102 [add_forward_monad_depend]: 4.92e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.426e-05 [cse]: 2.752e-05 [a_3]: 4.006e-05 [Cycle 2]: 0.00058832, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.49999e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012593 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 6.709e-05 [accelerated_algorithm]: 5.38002e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.09e-06 [parallel]: 4.48999e-06 [flash_sp]: 3.18998e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 4.95001e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 5.84e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 5.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 8.76002e-06 [a_after_grad]: 7.8e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 5.72999e-06 [cse]: 1.548e-05 [a_3]: 3.248e-05 [py_interpret_to_execute_after_opt_a]: 7.54002e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 2.98e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.42999e-06 [mutable_eliminate]: 0.0004478 [opt_b]: 0.00017898, [1] [Cycle 1]: 0.00017295, [7] [b_1]: 0.0001066 [b_2]: 6.75002e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 5.00004e-07 [cse]: 1.552e-05 [optimize_parallel_all_gather_comm]: 1.605e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.237e-05 [loop_unroll]: 0.00041432 [opt_after_cconv]: 9.432e-05, [1] [Cycle 1]: 8.866e-05, [7] [c_1]: 2.815e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.588e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.344e-05 [tuple_transform]: 6.939e-05, [1] [Cycle 1]: 6.507e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.988e-05 [cse_after_recomputation]: 2.049e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.088e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.52001e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 9.5999e-07 
[interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71002e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.35e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.74e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.965e-05, [1] [Cycle 1]: 6.529e-05, [6] [build]: 2.35002e-06 [elim_shapecalc]: 8.84e-06 [elim_not_effective]: 1.162e-05 [opt_reshape]: 6.08998e-06 [fold_const_symbol]: 9.05001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.55999e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.523e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 0.00013014 [opt_after_jit_grad]: 0.00050634 [validate]: 3.212e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00703327 [execute]: 7.28999e-06 Sums bootstrap : 0.000536s : 3.00% type_inference : 0.006224s : 34.78% event_method : 0.000014s : 0.08% auto_monad : 0.000117s : 0.65% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.33% optimize.opt_a.expand_dump_flag 
: 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000614s : 3.43% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.80% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000411s : 2.30% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000043s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.50% optimize.opt_b.b_1 : 0.000107s : 0.60% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000414s : 2.31% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000130s : 0.73% opt_after_jit_grad : 0.000506s : 2.83% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.007033s : 39.30% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 15.19% : 0.000025s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 66.70% : 0.000112s : 3: substitution.inline 1.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.18% : 0.000004s : 4: substitution.replace_old_param 6.52% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006179 2 90.54% : 0.005594s : 1: type_inference.infer 9.46% : 0.000584s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.12% : 0.000028s : 3: replace.inline 30.88% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.77% : 0.000110s : 3: match.inline 8.23% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_depend_swap 1.93% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.39% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.30% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.54% : 0.000002s : 21: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.59% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000374 8 47.88% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.12% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031116 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.79% : 0.003357s : 1: add_attr 10.75% : 0.003346s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.39% : 0.000122s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.85% : 0.000577s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000978s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.93% : 0.002156s : 1: opt_a 0.31% : 0.000098s : 1: opt_after_cconv 1.66% : 0.000517s : 1: opt_after_jit_grad 0.59% : 0.000182s : 1: opt_b 12.85% : 0.003998s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.67% : 0.000209s : 1: renormalize.infer 0.63% : 0.000196s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000135s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 22.64% : 0.007045s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.05% : 0.006238s : 1: type_inference 0.20% : 0.000064s : 1: validate TotalTime = 0.0183427, [24] [bootstrap]: 0.0004607 [type_inference]: 0.00439869 [event_method]: 1.046e-05 [auto_monad]: 5.419e-05 [graph_reusing]: 5.34e-06 [inline]: 1.87001e-06 [add_attr]: 0.0029929, [1] [add_attr_with_inline]: 0.00298426, [1] [Cycle 1]: 4.657e-05, [2] [tag_attr]: 1.224e-05 [meta_addattr_fg_expand]: 3.07002e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.158e-05 [insert-virtual-dataset]: 2.64999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.003674, [53] [py_interpret_to_execute]: 1.444e-05 [rewriter_before_opt_a]: 3.993e-05 [opt_a]: 0.00187859, [2] [Cycle 1]: 0.00125068, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 2.468e-05 [loop_unroll]: 1.423e-05 [a_1]: 0.00029265 [with_stream_mark]: 1.405e-05 [recompute_prepare]: 7.09001e-06 [updatestate_depend_eliminate]: 4.35e-06 [updatestate_assign_eliminate]: 3.55998e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.636e-05 [accelerated_algorithm]: 6.18002e-06 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 8.45999e-06 [auto_parallel]: 5.73997e-06 [parallel]: 1.812e-05 [flash_sp]: 7.77002e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 9.35001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.49e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.12e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.07999e-06 [flash_sp_send_recv_attached]: 2.72001e-06 [receive_attached]: 2.71e-06 
[after_resolve]: 1.002e-05 [a_after_grad]: 8.43999e-06 [renormalize]: 0.00033577 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.427e-05 [cse]: 2.73e-05 [a_3]: 3.978e-05 [Cycle 2]: 0.00061825, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.10002e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.00012393 [with_stream_mark]: 1.099e-05 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.70002e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.58998e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.767e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.09003e-06 [parallel]: 4.14002e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.664e-05 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.61e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.33999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.09998e-06 [a_after_grad]: 7.87e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.03002e-06 [cse]: 1.283e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 7.25e-06 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 3.198e-05 [convert_after_rewriter]: 6.83998e-06 [order_py_execute_after_rewriter]: 5.42999e-06 [mutable_eliminate]: 0.00044337 [opt_b]: 0.00017947, [1] [Cycle 1]: 0.0001732, 
[7] [b_1]: 0.00010678 [b_2]: 6.67002e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.30009e-07 [cse]: 1.567e-05 [optimize_parallel_all_gather_comm]: 1.487e-05 [overlap_param_gather]: 2.11e-06 [cconv]: 2.374e-05 [loop_unroll]: 0.00040957 [opt_after_cconv]: 9.364e-05, [1] [Cycle 1]: 8.802e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.18998e-06 [cse]: 1.52e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.267e-05 [tuple_transform]: 7.036e-05, [1] [Cycle 1]: 6.563e-05, [4] [d_1]: 4.021e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 4.445e-05 [cse_after_recomputation]: 1.943e-05, [1] [Cycle 1]: 1.504e-05, [1] [cse]: 9.99001e-06 [environ_conv]: 4.86002e-06 [swap_dp_allreduce_reducescatter]: 5.39998e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.43002e-06 [micro_interleaved_order_control]: 2.90002e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.39e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.191e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.716e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 6.696e-05, [1] [Cycle 1]: 6.303e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 7.93999e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 5.91998e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.649e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00044792 [validate]: 3.128e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00600517 [execute]: 7.07002e-06 Sums bootstrap : 0.000461s : 3.20% type_inference : 0.004399s : 30.56% event_method : 0.000010s : 0.07% auto_monad : 0.000054s : 0.38% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000417s : 2.89% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000030s : 0.21% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000336s : 2.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000443s : 3.08% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000410s : 2.85% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 3.11% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006005s : 41.72% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.37% : 0.000022s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.10% : 0.000001s : 2: substitution.fold_const_symbol 4.96% : 0.000006s : 4: substitution.graph_param_transform 64.87% : 0.000079s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.73% : 0.000005s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004357 2 92.23% : 0.004018s : 1: type_inference.infer 7.77% : 0.000339s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.86% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.47% : 0.000001s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.98% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.37% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.59% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.97% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 43.97% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.03% : 0.000132s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026271 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.41% : 0.002997s : 1: add_attr 11.37% : 0.002988s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.23% : 0.000059s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.90% : 0.000500s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.72% : 0.000452s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000768s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.16% : 0.001881s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.00% : 0.003678s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000187s : 1: renormalize.infer 0.54% : 0.000142s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 22.90% : 0.006015s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.80% : 0.004413s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0197535, [24] [bootstrap]: 0.00045938 [type_inference]: 0.00558785 [event_method]: 1.401e-05 [auto_monad]: 5.713e-05 [graph_reusing]: 5.93998e-06 [inline]: 2.24001e-06 [add_attr]: 0.0029409, [1] [add_attr_with_inline]: 0.0029333, [1] [Cycle 1]: 4.757e-05, [2] [tag_attr]: 1.615e-05 [meta_addattr_fg_expand]: 4.2e-06 [parallel-infer-symbol]: 2.99999e-06 [pre_auto_parallel]: 2.589e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00394025, [53] [py_interpret_to_execute]: 1.967e-05 [rewriter_before_opt_a]: 5.942e-05 [opt_a]: 0.00210843, [2] [Cycle 1]: 0.00151349, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.261e-05 [loop_unroll]: 2.072e-05 [a_1]: 0.00044849 [with_stream_mark]: 1.336e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.45998e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 9.801e-05 [accelerated_algorithm]: 6.83998e-06 [shard]: 2.06998e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 5.69e-06 [parallel]: 1.825e-05 [flash_sp]: 7.30998e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.18002e-06 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 6.71999e-06 [virtual_dataset]: 6.13002e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.067e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 8.94e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.62001e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 2.27001e-06 
[after_resolve]: 9.87001e-06 [a_after_grad]: 8.97e-06 [renormalize]: 0.00040021 [add_forward_monad_depend]: 4.48999e-06 [auto_monad_grad]: 1.88997e-06 [auto_monad_eliminator]: 1.422e-05 [cse]: 2.938e-05 [a_3]: 4.017e-05 [Cycle 2]: 0.00058555, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.27001e-06 [a_1]: 0.00012347 [with_stream_mark]: 9.52001e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.717e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.30999e-06 [auto_parallel]: 4.99e-06 [parallel]: 4.32e-06 [flash_sp]: 3.43e-06 [merge_comm]: 3.33e-06 [allreduce_fusion]: 2.70002e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.01003e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 4.84e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.53003e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 5.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.29e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.74002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.99998e-06 [a_after_grad]: 7.71001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.19999e-06 [cse]: 1.306e-05 [a_3]: 3.14e-05 [py_interpret_to_execute_after_opt_a]: 7.57998e-06 [slice_cell_reuse_recomputed_activation]: 2.53998e-06 [rewriter_after_opt_a]: 3.203e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00045088 [opt_b]: 0.00017933, [1] [Cycle 1]: 0.00017332, [7] 
[b_1]: 0.0001068 [b_2]: 6.98998e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.10015e-07 [cse]: 1.65e-05 [optimize_parallel_all_gather_comm]: 1.588e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.254e-05 [loop_unroll]: 0.0004105 [opt_after_cconv]: 9.477e-05, [1] [Cycle 1]: 8.894e-05, [7] [c_1]: 2.77e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.593e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.335e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.46e-05, [4] [d_1]: 3.911e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.474e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.564e-05, [1] [cse]: 1.055e-05 [environ_conv]: 5.18002e-06 [swap_dp_allreduce_reducescatter]: 5.35001e-06 [bias_add_comm_swap]: 2.40002e-06 [label_micro_interleaved_index]: 3.93001e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.22001e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 1.12e-06 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.40999e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.193e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.78001e-06 [overlap_grad_flash_sp]: 1.784e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 6.826e-05, [1] [Cycle 1]: 6.4e-05, [6] [build]: 2.54999e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.638e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00044662 [validate]: 3.12e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00600278 [execute]: 7.13998e-06 Sums bootstrap : 0.000459s : 2.90% type_inference : 0.005588s : 35.24% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000572s : 3.61% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000165s : 1.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000400s : 2.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.84% optimize.opt_b.b_1 : 0.000107s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000411s : 2.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000447s : 2.82% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006003s : 37.86% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000166 30 15.28% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.35% : 0.000006s : 4: substitution.graph_param_transform 66.18% : 0.000110s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.33% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 7.05% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005545 2 89.69% : 0.004974s : 1: type_inference.infer 10.31% : 0.000571s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.60% : 0.000027s : 3: replace.inline 28.40% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.01% : 0.000108s : 3: match.inline 8.99% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.67% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 8: predicate.less_batch_normalization 1.94% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 1.04% : 0.000002s : 11: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.61% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.63% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.17% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.93% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.29% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.43% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000334 8 47.83% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.17% : 0.000174s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028149 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.46% : 0.002945s : 1: add_attr 10.43% : 0.002937s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000498s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.39% : 0.000955s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.50% : 0.002111s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.62% : 0.000456s : 1: opt_after_jit_grad 0.65% : 0.000183s : 1: opt_b 14.01% : 0.003944s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.74% : 0.000208s : 1: renormalize.infer 0.66% : 0.000186s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.23% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.36% : 0.006013s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.90% : 0.005602s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0374472, [24] [bootstrap]: 0.00049588 [type_inference]: 0.0114112 [event_method]: 4.691e-05 [auto_monad]: 0.00012138 [graph_reusing]: 8.10999e-06 [inline]: 2.10002e-06 [add_attr]: 0.00302524, [1] [add_attr_with_inline]: 0.00301701, [1] [Cycle 1]: 7.296e-05, [2] [tag_attr]: 3.47e-05 [meta_addattr_fg_expand]: 9.84001e-06 [parallel-infer-symbol]: 2.90002e-06 [pre_auto_parallel]: 5.096e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 6.70028e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.0132983, [53] [py_interpret_to_execute]: 3.779e-05 [rewriter_before_opt_a]: 0.00014304 [opt_a]: 0.011046, [3] [Cycle 1]: 0.00709158, [45] [expand_dump_flag]: 3.83999e-06 [switch_simplify]: 7.379e-05 [loop_unroll]: 6.146e-05 [a_1]: 0.00144565 [with_stream_mark]: 2.26e-05 [recompute_prepare]: 2.163e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 7.78999e-06 [updatestate_loads_eliminate]: 7.14001e-06 [parameter_eliminate]: 3.31999e-06 [a_2]: 0.00024573 [accelerated_algorithm]: 3.137e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.54002e-06 [shard_inline]: 1.642e-05 [merge_send_recv]: 1.612e-05 [auto_parallel]: 1.106e-05 [parallel]: 1.903e-05 [flash_sp]: 1.178e-05 [merge_comm]: 9.49999e-06 [allreduce_fusion]: 8.69e-06 [matmul_add_comm_reduction]: 2.657e-05 [allreduce_slice_to_reducescatter]: 6.20028e-07 [virtual_shard_identity]: 1.817e-05 [virtual_dataset]: 1.575e-05 [get_grad_eliminate_]: 1.544e-05 [virtual_output]: 1.52e-05 [merge_forward]: 9.87001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 1.767e-05 [cell_reuse_handle_not_recompute_node_pass]: 8.293e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 2.771e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76e-06 [meta_fg_expand]: 0.00140295 [flash_sp_send_recv_attached]: 3.46001e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 5.895e-05 [a_after_grad]: 8.019e-05 [renormalize]: 0.00241974 [add_forward_monad_depend]: 9.32001e-06 [auto_monad_grad]: 5.86998e-06 [auto_monad_eliminator]: 5.589e-05 [cse]: 0.00016294 [a_3]: 0.00033798 [Cycle 2]: 0.00299904, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.706e-05 [loop_unroll]: 4.421e-05 [a_1]: 0.00152837 [with_stream_mark]: 1.223e-05 [recompute_prepare]: 1.09e-05 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.000127 [accelerated_algorithm]: 1.197e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.20999e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 7.00998e-06 [parallel]: 4.94e-06 [flash_sp]: 3.33e-06 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.67998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.055e-05 [virtual_dataset]: 9.09e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.56002e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.63e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.396e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 6.882e-05 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.609e-05 [a_after_grad]: 1.436e-05 [renormalize]: 0.00058856 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.446e-05 [cse]: 4.633e-05 [a_3]: 6.558e-05 [Cycle 3]: 0.00094147, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.115e-05 [loop_unroll]: 9.05001e-06 [a_1]: 0.00028081 [with_stream_mark]: 1.029e-05 [recompute_prepare]: 9.77999e-06 [updatestate_depend_eliminate]: 4.69998e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 3.90998e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012516 [accelerated_algorithm]: 1.176e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 7.01001e-06 [auto_parallel]: 7.33e-06 [parallel]: 4.53001e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 5.04e-06 [matmul_add_comm_reduction]: 7.75e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.055e-05 [virtual_dataset]: 8.77999e-06 [get_grad_eliminate_]: 8.64e-06 [virtual_output]: 8.3e-06 [merge_forward]: 4.19002e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 8.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.737e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.469e-05 [set_forward_comm_id_for_comm_node_pass]: 6.14001e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.374e-05 [a_after_grad]: 1.452e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 1.057e-05 [cse]: 2.609e-05 [a_3]: 5.931e-05 [py_interpret_to_execute_after_opt_a]: 9.94999e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 4.775e-05 [convert_after_rewriter]: 9.32001e-06 [order_py_execute_after_rewriter]: 6.98e-06 [mutable_eliminate]: 0.00046079 [opt_b]: 0.00028635, [1] [Cycle 1]: 0.00028034, [7] [b_1]: 0.00018914 [b_2]: 1.09e-05 [updatestate_depend_eliminate]: 6.83e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 3.98001e-06 [renormalize]: 2.79979e-07 [cse]: 3.057e-05 [optimize_parallel_all_gather_comm]: 2.097e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.115e-05 [loop_unroll]: 0.00042193 [opt_after_cconv]: 0.00013481, [1] [Cycle 1]: 0.00012901, [7] [c_1]: 4.839e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 6.83998e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 
4.23001e-06 [cse]: 2.929e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 2.946e-05 [tuple_transform]: 0.00010036, [1] [Cycle 1]: 9.594e-05, [4] [d_1]: 6.632e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.59e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 5.673e-05 [cse_after_recomputation]: 3.103e-05, [1] [Cycle 1]: 2.629e-05, [1] [cse]: 2.085e-05 [environ_conv]: 9.24e-06 [swap_dp_allreduce_reducescatter]: 7.76001e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.23998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.29998e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.24998e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.668e-05 [grouped_pairwise_exchange_alltoall]: 1.68002e-06 [offloading_packed_experts]: 4.94e-06 [overlap_recompute_and_grad_model_parallel]: 5.57999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.38002e-06 [overlap_grad_ring_attention]: 5.42999e-06 [overlap_grad_flash_sp]: 2.443e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.704e-05, [1] [Cycle 1]: 9.274e-05, [6] [build]: 9.87999e-06 [elim_shapecalc]: 1.311e-05 [elim_not_effective]: 1.834e-05 [opt_reshape]: 9.67999e-06 [fold_const_symbol]: 1.457e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 2.59e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00046708 [validate]: 4.475e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00820871 [execute]: 7.95e-06 Sums bootstrap : 0.000496s : 1.50% type_inference : 0.011411s : 34.41% event_method : 0.000047s : 0.14% auto_monad : 0.000121s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000143s : 0.43% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.40% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003255s : 9.81% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000498s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000117s : 0.35% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001475s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003008s : 9.07% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000235s : 0.71% optimize.opt_a.a_3 : 0.000463s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.39% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000422s : 1.27% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000467s : 1.41% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008209s : 24.75% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000762 222 5.88% : 0.000045s : 12: substitution.arithmetic_simplify 1.82% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.57% : 0.000423s : 17: substitution.inline 2.01% : 0.000015s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.91% : 0.000015s : 20: substitution.remove_not_recompute_node 3.07% : 0.000023s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.58% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.80% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.44% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011336 2 87.25% : 0.009890s : 1: type_inference.infer 12.75% : 0.001446s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.53% : 0.000126s : 17: replace.inline 42.47% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 33 92.21% : 0.000414s : 17: match.inline 7.79% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.07% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 100: predicate.arithmetic_simplify 1.16% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.78% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001544 34 57.41% : 0.000886s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.59% : 0.000657s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062087 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.88% : 0.003029s : 1: add_attr 4.87% : 0.003021s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000129s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.86% : 0.000533s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.02% : 0.004981s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.80% : 0.011049s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.77% : 0.000477s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.43% : 0.013302s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000056s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.58% : 0.001599s : 2: renormalize.infer 2.25% : 0.001396s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000147s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.24% : 0.008220s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 18.40% : 0.011427s : 1: type_inference 0.13% : 0.000078s : 1: validate TotalTime = 0.0185901, [24] [bootstrap]: 0.00048242 [type_inference]: 0.00434226 [event_method]: 1.051e-05 [auto_monad]: 5.311e-05 [graph_reusing]: 5.44998e-06 [inline]: 1.81998e-06 [add_attr]: 0.00301907, [1] [add_attr_with_inline]: 0.0030111, [1] [Cycle 1]: 4.707e-05, [2] [tag_attr]: 1.146e-05 [meta_addattr_fg_expand]: 3.61001e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.192e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.91003e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00367867, [53] [py_interpret_to_execute]: 1.49e-05 [rewriter_before_opt_a]: 3.822e-05 [opt_a]: 0.00187476, [2] [Cycle 1]: 0.00125505, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 2.392e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.00029286 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.77002e-06 [updatestate_depend_eliminate]: 3.46999e-06 [updatestate_assign_eliminate]: 3.17002e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 2.32999e-06 [a_2]: 7.739e-05 [accelerated_algorithm]: 6.16998e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.13001e-06 [auto_parallel]: 5.95002e-06 [parallel]: 1.934e-05 [flash_sp]: 7.71999e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.12002e-06 [virtual_dataset]: 5.62001e-06 [get_grad_eliminate_]: 5.33002e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.078e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61001e-06 [meta_fg_expand]: 2.53003e-06 [flash_sp_send_recv_attached]: 2.45002e-06 
[receive_attached]: 2.63e-06 [after_resolve]: 1.121e-05 [a_after_grad]: 8.48001e-06 [renormalize]: 0.00033828 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 1.60001e-06 [auto_monad_eliminator]: 1.361e-05 [cse]: 2.919e-05 [a_3]: 4.029e-05 [Cycle 2]: 0.00061008, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012479 [with_stream_mark]: 9.14e-06 [recompute_prepare]: 5.69999e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.761e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.43e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.15001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 2.65002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.51e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59999e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 7.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 1.51002e-06 [flash_sp_send_recv_attached]: 6.60017e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 9.23002e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.241e-05 [a_3]: 5.028e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.151e-05 [convert_after_rewriter]: 6.73e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00044815 [opt_b]: 0.00018004, [1] [Cycle 
1]: 0.00017438, [7] [b_1]: 0.00010894 [b_2]: 6.69999e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 4.09986e-07 [cse]: 1.567e-05 [optimize_parallel_all_gather_comm]: 1.547e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.436e-05 [loop_unroll]: 0.00040957 [opt_after_cconv]: 9.333e-05, [1] [Cycle 1]: 8.757e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.553e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.329e-05 [tuple_transform]: 7.09e-05, [1] [Cycle 1]: 6.621e-05, [4] [d_1]: 4.06e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.518e-05 [cse_after_recomputation]: 1.976e-05, [1] [Cycle 1]: 1.542e-05, [1] [cse]: 1.022e-05 [environ_conv]: 4.87998e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 2.81999e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.04998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.40999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.98002e-06 [control_data_broadcast_order]: 1.172e-05 [grouped_pairwise_exchange_alltoall]: 1.90001e-06 [offloading_packed_experts]: 3.89002e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.48002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.63003e-06 [overlap_grad_ring_attention]: 4.20999e-06 [overlap_grad_flash_sp]: 1.699e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.21997e-06 [symbol_engine_optimizer]: 6.804e-05, [1] [Cycle 1]: 6.373e-05, [6] [build]: 2.10002e-06 [elim_shapecalc]: 8.28999e-06 [elim_not_effective]: 1.112e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.629e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00044612 [validate]: 3.103e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00626168 [execute]: 6.94999e-06 Sums bootstrap : 0.000482s : 3.30% type_inference : 0.004342s : 29.70% event_method : 0.000011s : 0.07% auto_monad : 0.000053s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000418s : 2.86% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000338s : 2.31% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000042s : 0.28% 
optimize.opt_a.a_3 : 0.000091s : 0.62% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.07% optimize.opt_b.b_1 : 0.000109s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.17% optimize.loop_unroll : 0.000410s : 2.80% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% 
optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.05% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006262s : 42.83% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.73% : 0.000023s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.75% : 0.000006s : 4: substitution.graph_param_transform 64.96% : 0.000079s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.38% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004299 2 91.97% : 0.003954s : 1: type_inference.infer 8.03% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.96% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000001s : 8: predicate.shard_identity_eliminate 1.13% : 0.000002s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.61% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.39% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026575 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.38% : 0.003023s : 1: add_attr 11.34% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000520s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.96% : 0.000788s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.07% : 0.001878s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.71% : 0.000455s : 1: opt_after_jit_grad 0.69% : 0.000184s : 1: opt_b 13.86% : 0.003682s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.55% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.60% : 0.006271s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.39% : 0.004356s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0360503, [24] [bootstrap]: 0.00051592 [type_inference]: 0.0102925 [event_method]: 4.076e-05 [auto_monad]: 0.00011417 [graph_reusing]: 8.03999e-06 [inline]: 1.96e-06 [add_attr]: 0.00299249, [1] [add_attr_with_inline]: 0.00298429, [1] [Cycle 1]: 6.807e-05, [2] [tag_attr]: 3.215e-05 [meta_addattr_fg_expand]: 8.33001e-06 [parallel-infer-symbol]: 2.88998e-06 [pre_auto_parallel]: 4.573e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.0130623, [53] [py_interpret_to_execute]: 3.538e-05 [rewriter_before_opt_a]: 0.00012792 [opt_a]: 0.0107817, [3] [Cycle 1]: 0.00688137, [45] [expand_dump_flag]: 4.02e-06 [switch_simplify]: 6.702e-05 [loop_unroll]: 5.586e-05 [a_1]: 0.00136558 [with_stream_mark]: 2.409e-05 [recompute_prepare]: 2.231e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 8e-06 [updatestate_loads_eliminate]: 7.78001e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 0.00024925 [accelerated_algorithm]: 3.118e-05 [shard]: 1.98002e-06 [meta_shard_fg_expand]: 3.43999e-06 [shard_inline]: 1.639e-05 [merge_send_recv]: 1.611e-05 [auto_parallel]: 1.104e-05 [parallel]: 1.952e-05 [flash_sp]: 1.153e-05 [merge_comm]: 9.96e-06 [allreduce_fusion]: 8.75999e-06 [matmul_add_comm_reduction]: 2.664e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.772e-05 [virtual_dataset]: 1.557e-05 [get_grad_eliminate_]: 1.518e-05 [virtual_output]: 1.51e-05 [merge_forward]: 9.51e-06 [cell_reuse_recompute_pass]: 1.04998e-06 [offload_activation]: 1.778e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.906e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 2.75e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91e-06 [meta_fg_expand]: 0.0013753 [flash_sp_send_recv_attached]: 3.7e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 
5.92e-05 [a_after_grad]: 8.188e-05 [renormalize]: 0.0023762 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 4.93001e-06 [auto_monad_eliminator]: 5.575e-05 [cse]: 0.00016179 [a_3]: 0.00033602 [Cycle 2]: 0.00297955, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.757e-05 [loop_unroll]: 4.468e-05 [a_1]: 0.00154111 [with_stream_mark]: 1.157e-05 [recompute_prepare]: 1.073e-05 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 3.63999e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 0.00012731 [accelerated_algorithm]: 1.211e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 2.341e-05 [shard_inline]: 1.016e-05 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.31999e-06 [parallel]: 4.88001e-06 [flash_sp]: 3.43e-06 [merge_comm]: 4.97e-06 [allreduce_fusion]: 4.68999e-06 [matmul_add_comm_reduction]: 7.40998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.52998e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 9.09989e-07 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.679e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.411e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49e-06 [meta_fg_expand]: 3.429e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.488e-05 [a_after_grad]: 1.41e-05 [renormalize]: 0.00057182 [add_forward_monad_depend]: 4.04002e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.437e-05 [cse]: 4.591e-05 [a_3]: 6.388e-05 [Cycle 3]: 0.00090653, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 1.067e-05 [loop_unroll]: 9.04998e-06 [a_1]: 0.00025786 [with_stream_mark]: 9.99001e-06 [recompute_prepare]: 9.10001e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 3.86999e-06 
[parameter_eliminate]: 8.79983e-07 [a_2]: 0.00012639 [accelerated_algorithm]: 1.21e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 7.55e-06 [parallel]: 4.42e-06 [flash_sp]: 9.5999e-07 [merge_comm]: 5.17e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.60998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.042e-05 [virtual_dataset]: 8.97999e-06 [get_grad_eliminate_]: 8.60001e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 4.08999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 8.71002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.64e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 1.402e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.326e-05 [a_after_grad]: 1.447e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 1.024e-05 [cse]: 2.48e-05 [a_3]: 5.694e-05 [py_interpret_to_execute_after_opt_a]: 9.74e-06 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 4.943e-05 [convert_after_rewriter]: 9.01002e-06 [order_py_execute_after_rewriter]: 6.77002e-06 [mutable_eliminate]: 0.00045721 [opt_b]: 0.00028607, [1] [Cycle 1]: 0.00027997, [7] [b_1]: 0.00018889 [b_2]: 1.054e-05 [updatestate_depend_eliminate]: 7.46999e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 4.28999e-06 [renormalize]: 3.29979e-07 [cse]: 3.013e-05 [optimize_parallel_all_gather_comm]: 2.048e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.084e-05 [loop_unroll]: 0.0004235 [opt_after_cconv]: 0.00013473, [1] [Cycle 1]: 0.00012909, [7] [c_1]: 4.869e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 7.03e-06 [updatestate_assign_eliminate]: 4.17998e-06 
[updatestate_loads_eliminate]: 3.95e-06 [cse]: 2.93e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.941e-05 [tuple_transform]: 0.0001023, [1] [Cycle 1]: 9.777e-05, [4] [d_1]: 6.764e-05 [none_parameter_eliminate]: 2.02999e-06 [renormalize]: 2.70025e-07 [switch_simplify]: 9.81998e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 5.784e-05 [cse_after_recomputation]: 3.138e-05, [1] [Cycle 1]: 2.677e-05, [1] [cse]: 2.146e-05 [environ_conv]: 8.34998e-06 [swap_dp_allreduce_reducescatter]: 7.46999e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.93998e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.86999e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.16003e-06 [reorder_send_recv_between_fp_bp]: 2.83003e-06 [comm_op_add_attrs]: 1.04003e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.759e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4.90001e-06 [overlap_recompute_and_grad_model_parallel]: 5.62999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.426e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.58e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 0.000137, [1] [Cycle 1]: 0.00013285, [6] [build]: 9.39998e-06 [elim_shapecalc]: 5.008e-05 [elim_not_effective]: 1.875e-05 [opt_reshape]: 1.036e-05 [fold_const_symbol]: 1.486e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 2.535e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.71999e-06 [opt_after_jit_grad]: 0.00046925 [validate]: 4.439e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00820335 [execute]: 7.14001e-06 Sums bootstrap : 0.000516s : 1.62% type_inference : 0.010292s : 32.37% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000128s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000110s : 0.34% optimize.opt_a.a_1 : 0.003165s : 9.95% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000503s : 1.58% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000029s : 0.09% optimize.opt_a.shard_inline : 0.000036s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s 
: 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001413s : 4.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000087s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002948s : 9.27% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000233s : 0.73% optimize.opt_a.a_3 : 0.000457s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000457s : 1.44% optimize.opt_b.b_1 : 0.000189s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000423s : 1.33% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000050s : 0.16% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000469s : 1.48% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008203s : 25.80% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000732 218 5.88% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.63% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 54.63% : 0.000400s : 16: substitution.inline 2.06% : 0.000015s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000015s : 3: substitution.less_batch_normalization 1.79% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.35% : 0.000025s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.87% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.43% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010221 2 87.45% : 0.008938s : 1: type_inference.infer 12.55% : 0.001283s : 1: type_inference.specialize ------[replace.] 0.000204 30 59.55% : 0.000121s : 16: replace.inline 40.45% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000422 30 92.84% : 0.000392s : 16: match.inline 7.16% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.12% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.29% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.51% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001462 32 58.05% : 0.000849s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.95% : 0.000613s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060208 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.98% : 0.002997s : 1: add_attr 4.96% : 0.002988s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.92% : 0.000553s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.01% : 0.004824s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.13% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.91% : 0.010785s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.80% : 0.000479s : 1: opt_after_jit_grad 0.48% : 0.000290s : 1: opt_b 21.70% : 0.013066s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.65% : 0.001596s : 2: renormalize.infer 2.23% : 0.001340s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000140s : 1: symbol_engine_optimizer 13.64% : 0.008213s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 17.12% : 0.010308s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x9-kbk],max_mem:10.0M . TotalTime = 0.834987, [24] [bootstrap]: 0.0005996 [type_inference]: 0.00605994 [event_method]: 1.409e-05 [auto_monad]: 5.757e-05 [graph_reusing]: 5.62999e-06 [inline]: 1.90001e-06 [add_attr]: 0.00340054, [1] [add_attr_with_inline]: 0.00338931, [1] [Cycle 1]: 4.599e-05, [2] [tag_attr]: 1.613e-05 [meta_addattr_fg_expand]: 4.00998e-06 [parallel-infer-symbol]: 3.24001e-06 [pre_auto_parallel]: 2.876e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.00397455, [53] [py_interpret_to_execute]: 2.012e-05 [rewriter_before_opt_a]: 5.811e-05 [opt_a]: 0.00213345, [2] [Cycle 1]: 0.00151594, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 3.227e-05 [loop_unroll]: 2.111e-05 [a_1]: 0.00045584 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.55998e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.75e-05 [accelerated_algorithm]: 6.16998e-06 [shard]: 2.03997e-06 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.15999e-06 [auto_parallel]: 6.26998e-06 [parallel]: 2.422e-05 [flash_sp]: 7.51999e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.17001e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.43999e-06 [virtual_dataset]: 6.41e-06 [get_grad_eliminate_]: 5.77001e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.71003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 
[merge_recompute_call_nodes]: 1.58002e-06 [before_grad]: 1.019e-05 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.14e-06 [after_resolve]: 1.06e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00040588 [add_forward_monad_depend]: 4.92e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.366e-05 [cse]: 2.85e-05 [a_3]: 4.138e-05 [Cycle 2]: 0.00060849, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 7.04001e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.00012586 [with_stream_mark]: 9.72001e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.777e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 4.95001e-06 [parallel]: 4.12e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.19998e-06 [merge_forward]: 2.36e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.65999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11999e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.31002e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.19001e-06 [cse]: 1.259e-05 [a_3]: 3.205e-05 [py_interpret_to_execute_after_opt_a]: 7.46001e-06 
[slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.058e-05 [convert_after_rewriter]: 6.73e-06 [order_py_execute_after_rewriter]: 5.11002e-06 [mutable_eliminate]: 0.00045182 [opt_b]: 0.00018177, [1] [Cycle 1]: 0.00017588, [7] [b_1]: 0.00010849 [b_2]: 6.93e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.12999e-06 [renormalize]: 3.39991e-07 [cse]: 1.666e-05 [optimize_parallel_all_gather_comm]: 1.578e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.198e-05 [loop_unroll]: 0.0004146 [opt_after_cconv]: 9.33e-05, [1] [Cycle 1]: 8.759e-05, [7] [c_1]: 2.727e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.09999e-06 [cse]: 1.56e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.296e-05 [tuple_transform]: 6.811e-05, [1] [Cycle 1]: 6.379e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.02999e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 5.189e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.571e-05, [1] [cse]: 1.056e-05 [environ_conv]: 4.02e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.28002e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.19003e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 
[control_data_broadcast_order]: 1.197e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 3.81999e-06 [overlap_grad_flash_sp]: 1.796e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 6.845e-05, [1] [Cycle 1]: 6.433e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.45997e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.594e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.0004489 [validate]: 3.048e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.820107 [execute]: 9.12999e-06 Sums bootstrap : 0.000600s : 0.07% type_inference : 0.006060s : 0.73% event_method : 0.000014s : 0.00% auto_monad : 0.000058s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000582s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000406s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000041s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000452s : 0.05% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000415s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.05% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.820107s : 98.74% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000164 30 14.98% : 0.000025s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 66.47% : 0.000109s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.54% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006012 2 90.83% : 0.005461s : 1: type_inference.infer 9.17% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.57% : 0.000027s : 3: replace.inline 30.43% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.69% : 0.000107s : 3: match.inline 8.31% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.05% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.21% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.20% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.27% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.85% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.84% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.94% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.47% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.78% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.22% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.843868 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.40% : 0.003405s : 1: add_attr 0.40% : 0.003393s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000063s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.08% : 0.000642s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000461s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000951s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000089s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002136s : 1: opt_a 0.01% : 0.000097s : 1: opt_after_cconv 0.05% : 0.000458s : 1: opt_after_jit_grad 0.02% : 0.000185s : 1: opt_b 0.47% : 0.003978s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.03% : 0.000211s : 1: renormalize.infer 0.02% : 0.000188s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000034s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000071s : 1: symbol_engine_optimizer 97.19% : 0.820128s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.72% : 0.006074s : 1: type_inference 0.01% : 0.000056s : 1: validate TotalTime = 0.073644, [24] [bootstrap]: 0.00050841 [type_inference]: 0.00504002 [event_method]: 1.131e-05 [auto_monad]: 5.178e-05 [graph_reusing]: 5.06002e-06 [inline]: 2.34001e-06 [add_attr]: 0.00305551, [1] [add_attr_with_inline]: 0.00304742, [1] [Cycle 1]: 4.705e-05, [2] [tag_attr]: 1.198e-05 [meta_addattr_fg_expand]: 3.71999e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.164e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.91998e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.00373159, [53] [py_interpret_to_execute]: 1.523e-05 [rewriter_before_opt_a]: 3.964e-05 [opt_a]: 0.00187764, [2] [Cycle 1]: 0.00127657, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 2.52e-05 [loop_unroll]: 1.391e-05 [a_1]: 0.00029465 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 7.19001e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.49001e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.743e-05 [accelerated_algorithm]: 6.26998e-06 [shard]: 2.42001e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 7.95e-06 [auto_parallel]: 5.89e-06 [parallel]: 1.867e-05 [flash_sp]: 7.81001e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 7.26001e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.72001e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.85998e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.067e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.84e-06 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.42001e-06 [receive_attached]: 2.58e-06 
[after_resolve]: 1.042e-05 [a_after_grad]: 9.07999e-06 [renormalize]: 0.00035747 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.812e-05 [a_3]: 4.008e-05 [Cycle 2]: 0.00059184, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012492 [with_stream_mark]: 1.131e-05 [recompute_prepare]: 5.68002e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.14e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.14003e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.48001e-06 [flash_sp]: 3.23e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.46e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.01998e-06 [virtual_dataset]: 5.12999e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.55002e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49999e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.52998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.97e-06 [a_after_grad]: 8.40001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 6.28998e-06 [cse]: 1.315e-05 [a_3]: 3.179e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 3.055e-05 [convert_after_rewriter]: 6.64001e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.0004481 [opt_b]: 0.00018213, [1] [Cycle 1]: 0.00017616, [7] 
[b_1]: 0.00010825 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 4.72e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.23002e-06 [renormalize]: 3.80009e-07 [cse]: 1.642e-05 [optimize_parallel_all_gather_comm]: 1.56e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.218e-05 [loop_unroll]: 0.00045641 [opt_after_cconv]: 9.546e-05, [1] [Cycle 1]: 8.971e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.68e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 1.313e-05 [tuple_transform]: 6.944e-05, [1] [Cycle 1]: 6.482e-05, [4] [d_1]: 3.975e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 2.07999e-06 [add_recomputation]: 4.435e-05 [cse_after_recomputation]: 1.953e-05, [1] [Cycle 1]: 1.538e-05, [1] [cse]: 1.057e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 5.61e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.67e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.64999e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.77002e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.66e-06 [overlap_recompute_comm]: 2.21998e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 6.815e-05, [1] [Cycle 1]: 6.407e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.33001e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 9.14e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.55999e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.53e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00045001 [validate]: 3.184e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0604891 [execute]: 8.37e-06 Sums bootstrap : 0.000508s : 0.73% type_inference : 0.005040s : 7.24% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.60% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000358s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.64% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000456s : 0.66% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060489s : 86.88% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.78% : 0.000023s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.58% : 0.000006s : 4: substitution.graph_param_transform 65.39% : 0.000081s : 2: substitution.inline 2.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.34% : 0.000004s : 4: substitution.remove_not_recompute_node 2.78% : 0.000003s : 4: substitution.replace_old_param ------[type_inference.] 0.004996 2 92.54% : 0.004623s : 1: type_inference.infer 7.46% : 0.000373s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.98% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.08% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.74% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.59% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.30% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 1.00% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.95% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.18% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000304 6 49.41% : 0.000150s : 2: func_graph_cloner_run.FuncGraphClonerGraph 50.59% : 0.000154s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081719 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.74% : 0.003060s : 1: add_attr 3.73% : 0.003051s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000547s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.57% : 0.000465s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000773s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.30% : 0.001880s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.56% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.57% : 0.003735s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.24% : 0.000197s : 1: renormalize.infer 0.19% : 0.000154s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.04% : 0.060505s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.19% : 0.005056s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.0724527, [24] [bootstrap]: 0.00048831 [type_inference]: 0.00562344 [event_method]: 1.402e-05 [auto_monad]: 5.688e-05 [graph_reusing]: 5.51e-06 [inline]: 2.02999e-06 [add_attr]: 0.00294812, [1] [add_attr_with_inline]: 0.00294022, [1] [Cycle 1]: 4.761e-05, [2] [tag_attr]: 1.588e-05 [meta_addattr_fg_expand]: 4.22e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.492e-05 [insert-virtual-dataset]: 2.33998e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00399615, [53] [py_interpret_to_execute]: 2.026e-05 [rewriter_before_opt_a]: 5.908e-05 [opt_a]: 0.00215872, [2] [Cycle 1]: 0.00155164, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.211e-05 [loop_unroll]: 2.06e-05 [a_1]: 0.00045286 [with_stream_mark]: 1.317e-05 [recompute_prepare]: 7.56999e-06 [updatestate_depend_eliminate]: 4.09002e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.6e-05 [accelerated_algorithm]: 6.49001e-06 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 5.82999e-06 [parallel]: 1.938e-05 [flash_sp]: 7.4e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.66e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 6.81999e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.36998e-06 [virtual_output]: 5.55001e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.11002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.067e-05 [merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.21998e-06 [receive_attached]: 2.63e-06 
[after_resolve]: 9.84999e-06 [a_after_grad]: 8.80999e-06 [renormalize]: 0.0004565 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.374e-05 [cse]: 2.806e-05 [a_3]: 4.104e-05 [Cycle 2]: 0.00059796, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 6.87002e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012487 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.40999e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.778e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.37001e-06 [merge_send_recv]: 4.20999e-06 [auto_parallel]: 5.17e-06 [parallel]: 3.93999e-06 [flash_sp]: 3.85e-06 [merge_comm]: 4.62e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.04998e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 4.85001e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.68998e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.21e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32999e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.58999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.20002e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.51003e-06 [a_after_grad]: 8.55001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.38998e-06 [cse]: 1.346e-05 [a_3]: 3.342e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.71e-06 [mutable_eliminate]: 0.00044976 [opt_b]: 0.00018257, [1] [Cycle 1]: 0.00017668, [7] [b_1]: 0.00010881 
[b_2]: 7.09001e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 2.80008e-07 [cse]: 1.644e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.239e-05 [loop_unroll]: 0.00041545 [opt_after_cconv]: 9.497e-05, [1] [Cycle 1]: 8.912e-05, [7] [c_1]: 2.725e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.659e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.338e-05 [tuple_transform]: 6.948e-05, [1] [Cycle 1]: 6.516e-05, [4] [d_1]: 3.922e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.352e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.578e-05, [1] [cse]: 1.062e-05 [environ_conv]: 4.53001e-06 [swap_dp_allreduce_reducescatter]: 5.13002e-06 [bias_add_comm_swap]: 3.09001e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 3.13998e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 1.00001e-06 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.16002e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88002e-06 [control_data_broadcast_order]: 1.097e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.52997e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.76001e-06 [overlap_grad_flash_sp]: 1.743e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.82999e-06 [handle_group_info]: 1.28002e-06 [symbol_engine_optimizer]: 6.916e-05, [1] [Cycle 1]: 6.487e-05, [6] [build]: 2.79999e-06 [elim_shapecalc]: 8.53001e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.96002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.592e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00044902 [validate]: 3.076e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0585724 [execute]: 8.67998e-06 Sums bootstrap : 0.000488s : 0.71% type_inference : 0.005623s : 8.20% event_method : 0.000014s : 0.02% auto_monad : 0.000057s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000578s : 0.84% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000457s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.66% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000449s : 0.66% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058572s : 85.45% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.92% : 0.000025s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 67.09% : 0.000111s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.18% : 0.000004s : 4: substitution.replace_old_param 6.60% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005581 2 90.16% : 0.005032s : 1: type_inference.infer 9.84% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.27% : 0.000028s : 3: replace.inline 29.73% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.68% : 0.000109s : 3: match.inline 8.32% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 1.12% : 0.000002s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.30% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.41% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.31% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.39% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.72% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 47.80% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.20% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080953 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.65% : 0.002952s : 1: add_attr 3.64% : 0.002944s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000525s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.16% : 0.000942s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.67% : 0.002162s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.94% : 0.004000s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.31% : 0.000248s : 1: renormalize.infer 0.25% : 0.000201s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 72.37% : 0.058588s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.96% : 0.005637s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 1.06263, [24] [bootstrap]: 0.00054668 [type_inference]: 0.0115809 [event_method]: 4.91e-05 [auto_monad]: 0.00012155 [graph_reusing]: 8.61002e-06 [inline]: 1.70001e-06 [add_attr]: 0.00304688, [1] [add_attr_with_inline]: 0.00303823, [1] [Cycle 1]: 7.089e-05, [2] [tag_attr]: 3.489e-05 [meta_addattr_fg_expand]: 9.64999e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 5.005e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.013345, [53] [py_interpret_to_execute]: 3.734e-05 [rewriter_before_opt_a]: 0.00014808 [opt_a]: 0.0110812, [3] [Cycle 1]: 0.0071449, [45] [expand_dump_flag]: 3.93999e-06 [switch_simplify]: 7.56e-05 [loop_unroll]: 6.222e-05 [a_1]: 0.00145615 [with_stream_mark]: 2.385e-05 [recompute_prepare]: 2.18e-05 [updatestate_depend_eliminate]: 9.12001e-06 [updatestate_assign_eliminate]: 8.01001e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.62001e-06 [a_2]: 0.00024564 [accelerated_algorithm]: 3.19e-05 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 3.30998e-06 [shard_inline]: 1.626e-05 [merge_send_recv]: 1.662e-05 [auto_parallel]: 1.145e-05 [parallel]: 2.266e-05 [flash_sp]: 1.12e-05 [merge_comm]: 9.86998e-06 [allreduce_fusion]: 9.03002e-06 [matmul_add_comm_reduction]: 2.67e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.771e-05 [virtual_dataset]: 1.608e-05 [get_grad_eliminate_]: 1.526e-05 [virtual_output]: 1.543e-05 [merge_forward]: 9.27999e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.819e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.878e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 2.779e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00141106 [flash_sp_send_recv_attached]: 4.25e-06 [receive_attached]: 2.93e-06 [after_resolve]: 
5.965e-05 [a_after_grad]: 8.057e-05 [renormalize]: 0.00247572 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 5.14998e-06 [auto_monad_eliminator]: 5.62e-05 [cse]: 0.00016872 [a_3]: 0.00033411 [Cycle 2]: 0.00301803, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.738e-05 [loop_unroll]: 4.388e-05 [a_1]: 0.00153119 [with_stream_mark]: 1.191e-05 [recompute_prepare]: 1.065e-05 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.66999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012653 [accelerated_algorithm]: 1.207e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.36e-06 [merge_send_recv]: 6.64999e-06 [auto_parallel]: 7.25998e-06 [parallel]: 4.94998e-06 [flash_sp]: 3.8e-06 [merge_comm]: 5.69999e-06 [allreduce_fusion]: 4.58999e-06 [matmul_add_comm_reduction]: 8.03999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.033e-05 [virtual_dataset]: 9.12999e-06 [get_grad_eliminate_]: 8.62e-06 [virtual_output]: 8.48999e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.656e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 6.924e-05 [flash_sp_send_recv_attached]: 1.12999e-06 [receive_attached]: 1.12e-06 [after_resolve]: 1.581e-05 [a_after_grad]: 1.443e-05 [renormalize]: 0.00060852 [add_forward_monad_depend]: 3.98999e-06 [auto_monad_grad]: 1.14003e-06 [auto_monad_eliminator]: 1.411e-05 [cse]: 4.562e-05 [a_3]: 6.464e-05 [Cycle 3]: 0.00090434, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 1.047e-05 [loop_unroll]: 8.95001e-06 [a_1]: 0.00025067 [with_stream_mark]: 9.72001e-06 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.65001e-06 [updatestate_assign_eliminate]: 3.95998e-06 [updatestate_loads_eliminate]: 3.75998e-06 
[parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012317 [accelerated_algorithm]: 1.162e-05 [shard]: 1.41002e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 9.00001e-06 [merge_send_recv]: 7.08e-06 [auto_parallel]: 7.26001e-06 [parallel]: 4.37998e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 5.08002e-06 [allreduce_fusion]: 4.89998e-06 [matmul_add_comm_reduction]: 7.71999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.91e-06 [virtual_dataset]: 8.85001e-06 [get_grad_eliminate_]: 8.47998e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.72e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.561e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.405e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 3.08e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 1.465e-05 [a_after_grad]: 1.506e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.36002e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 1.07e-05 [cse]: 2.713e-05 [a_3]: 5.974e-05 [py_interpret_to_execute_after_opt_a]: 1.004e-05 [slice_cell_reuse_recomputed_activation]: 1.93997e-06 [rewriter_after_opt_a]: 4.869e-05 [convert_after_rewriter]: 9.04e-06 [order_py_execute_after_rewriter]: 6.69001e-06 [mutable_eliminate]: 0.00045651 [opt_b]: 0.00028765, [1] [Cycle 1]: 0.00028182, [7] [b_1]: 0.00018861 [b_2]: 1.057e-05 [updatestate_depend_eliminate]: 6.96001e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.99002e-06 [renormalize]: 4.39992e-07 [cse]: 3.345e-05 [optimize_parallel_all_gather_comm]: 2.083e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.009e-05 [loop_unroll]: 0.00042222 [opt_after_cconv]: 0.00013629, [1] [Cycle 1]: 0.00013042, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 6.89999e-06 [updatestate_assign_eliminate]: 4.05e-06 
[updatestate_loads_eliminate]: 3.81999e-06 [cse]: 3.12e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 3.029e-05 [tuple_transform]: 0.00010014, [1] [Cycle 1]: 9.548e-05, [4] [d_1]: 6.585e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.63002e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5.878e-05 [cse_after_recomputation]: 3.243e-05, [1] [Cycle 1]: 2.784e-05, [1] [cse]: 2.24e-05 [environ_conv]: 9.21002e-06 [swap_dp_allreduce_reducescatter]: 7.87e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.26002e-06 [full_micro_interleaved_order_control]: 2.56e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.19998e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.746e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.27001e-06 [overlap_recompute_and_grad_model_parallel]: 5.57999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.39999e-06 [overlap_grad_ring_attention]: 5.31002e-06 [overlap_grad_flash_sp]: 2.457e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 9.10019e-07 [symbol_engine_optimizer]: 9.877e-05, [1] [Cycle 1]: 9.449e-05, [6] [build]: 9.75002e-06 [elim_shapecalc]: 1.343e-05 [elim_not_effective]: 1.828e-05 [opt_reshape]: 1.006e-05 [fold_const_symbol]: 1.474e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 2.528e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.0004654 [validate]: 4.625e-05 [backend_pass]: 9.50007e-07 [task_emit]: 1.03307 [execute]: 9.27001e-06 Sums bootstrap : 0.000547s : 0.05% type_inference : 0.011581s : 1.09% event_method : 0.000049s : 0.00% auto_monad : 0.000122s : 0.01% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.00% optimize.rewriter_before_opt_a : 0.000148s : 0.01% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000133s : 0.01% optimize.opt_a.loop_unroll : 0.000115s : 0.01% optimize.opt_a.a_1 : 0.003238s : 0.31% optimize.opt_a.with_stream_mark : 0.000045s : 0.00% optimize.opt_a.recompute_prepare : 0.000042s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 
0.000021s : 0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001483s : 0.14% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.01% optimize.opt_a.a_after_grad : 0.000110s : 0.01% optimize.opt_a.renormalize : 0.003084s : 0.29% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.01% optimize.opt_a.cse : 0.000241s : 0.02% optimize.opt_a.a_3 : 0.000458s : 0.04% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000457s : 0.04% optimize.opt_b.b_1 : 0.000189s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000422s : 0.04% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.00% optimize.tuple_transform.d_1 : 0.000066s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000465s : 0.04% validate : 0.000046s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.033070s : 97.62% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000764 222 5.90% : 0.000045s : 12: substitution.arithmetic_simplify 1.74% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.89% : 0.000427s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000016s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 20: substitution.remove_not_recompute_node 3.07% : 0.000023s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.63% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.59% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.44% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011507 2 86.87% : 0.009996s : 1: type_inference.infer 13.13% : 0.001511s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.59% : 0.000126s : 17: replace.inline 42.41% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000452 33 92.57% : 0.000418s : 17: match.inline 7.43% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.25% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.07% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000042s : 249: predicate.inline 1.20% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.48% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.57% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000015s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.12% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001611 34 55.81% : 0.000899s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.19% : 0.000712s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.087331 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.28% : 0.003051s : 1: add_attr 0.28% : 0.003042s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000129s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.05% : 0.000584s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000056s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000465s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.45% : 0.004906s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000174s : 28: opt.transform.opt_b 0.01% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.02% : 0.011084s : 1: opt_a 0.01% : 0.000140s : 1: opt_after_cconv 0.04% : 0.000475s : 1: opt_after_jit_grad 0.03% : 0.000291s : 1: opt_b 1.23% : 0.013349s : 1: optimize 0.00% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000055s : 1: pre_auto_parallel 0.00% : 0.000041s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000034s : 1: remove_dup_value 0.15% : 0.001610s : 2: renormalize.infer 0.13% : 0.001460s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000052s : 1: rewriter_after_opt_a 0.01% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 95.01% : 1.033088s : 1: task_emit 0.01% : 0.000103s : 1: 
tuple_transform 1.07% : 0.011595s : 1: type_inference 0.01% : 0.000097s : 1: validate TotalTime = 0.0705422, [24] [bootstrap]: 0.00048416 [type_inference]: 0.00434996 [event_method]: 1.114e-05 [auto_monad]: 5.576e-05 [graph_reusing]: 5.22e-06 [inline]: 2.02001e-06 [add_attr]: 0.00299255, [1] [add_attr_with_inline]: 0.00298445, [1] [Cycle 1]: 4.371e-05, [2] [tag_attr]: 1.23e-05 [meta_addattr_fg_expand]: 3.58999e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.126e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00371237, [53] [py_interpret_to_execute]: 1.567e-05 [rewriter_before_opt_a]: 3.884e-05 [opt_a]: 0.00185264, [2] [Cycle 1]: 0.00125713, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 2.481e-05 [loop_unroll]: 1.388e-05 [a_1]: 0.00029399 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 7.44002e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.693e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 2.88998e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 8.1e-06 [auto_parallel]: 5.69e-06 [parallel]: 1.814e-05 [flash_sp]: 7.34002e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.16001e-06 [matmul_add_comm_reduction]: 9.42001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.41002e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.10001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.078e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.20998e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.29001e-06 
[after_resolve]: 1.098e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00034274 [add_forward_monad_depend]: 4.46002e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.802e-05 [a_3]: 4.002e-05 [Cycle 2]: 0.00058646, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.17e-06 [a_1]: 0.00012439 [with_stream_mark]: 9.40001e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.707e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 4.82e-06 [parallel]: 3.86001e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 5.39e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 4.94003e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.38002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.69e-06 [a_after_grad]: 8.41002e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.79984e-07 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.21e-06 [cse]: 1.355e-05 [a_3]: 3.127e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 2.18002e-06 [rewriter_after_opt_a]: 3.238e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 5.08002e-06 [mutable_eliminate]: 0.00044392 [opt_b]: 0.00018012, [1] [Cycle 1]: 0.00017409, [7] 
[b_1]: 0.00010747 [b_2]: 6.96999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 4.09986e-07 [cse]: 1.62e-05 [optimize_parallel_all_gather_comm]: 1.589e-05 [overlap_param_gather]: 2.79001e-06 [cconv]: 2.393e-05 [loop_unroll]: 0.00041025 [opt_after_cconv]: 9.443e-05, [1] [Cycle 1]: 8.873e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.608e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.275e-05 [tuple_transform]: 7.037e-05, [1] [Cycle 1]: 6.578e-05, [4] [d_1]: 3.97e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.15997e-06 [partial_unused_args_eliminate]: 1.76003e-06 [add_recomputation]: 4.581e-05 [cse_after_recomputation]: 2.14e-05, [1] [Cycle 1]: 1.712e-05, [1] [cse]: 1.203e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 5.03002e-06 [bias_add_comm_swap]: 2.65002e-06 [label_micro_interleaved_index]: 4.52e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.66999e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.48998e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 5.07e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.12e-06 [overlap_grad_flash_sp]: 1.699e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.855e-05, [1] [Cycle 1]: 6.45e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 7.83999e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.13002e-06 [fold_const_symbol]: 9.17001e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.62e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.34001e-06 [opt_after_jit_grad]: 0.00044498 [validate]: 3.24e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0581889 [execute]: 8.22e-06 Sums bootstrap : 0.000484s : 0.73% type_inference : 0.004350s : 6.54% event_method : 0.000011s : 0.02% auto_monad : 0.000056s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.63% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000343s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000444s : 0.67% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000410s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000445s : 0.67% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058189s : 87.45% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.01% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.79% : 0.000006s : 4: substitution.graph_param_transform 65.20% : 0.000079s : 2: substitution.inline 2.51% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000004s : 4: substitution.remove_not_recompute_node 3.28% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004307 2 91.80% : 0.003954s : 1: type_inference.infer 8.20% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.97% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.65% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.43% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.66% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 43.33% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.67% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078460 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.82% : 0.002997s : 1: add_attr 3.81% : 0.002988s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000061s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.66% : 0.000521s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000768s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.36% : 0.001855s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.58% : 0.000454s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.74% : 0.003716s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.24% : 0.000187s : 1: renormalize.infer 0.19% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.19% : 0.058206s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.56% : 0.004363s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.104814, [24] [bootstrap]: 0.00053501 [type_inference]: 0.010411 [event_method]: 4.294e-05 [auto_monad]: 0.00011694 [graph_reusing]: 8.07e-06 [inline]: 1.96e-06 [add_attr]: 0.0030454, [1] [add_attr_with_inline]: 0.00303674, [1] [Cycle 1]: 6.778e-05, [2] [tag_attr]: 3.261e-05 [meta_addattr_fg_expand]: 8.52998e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 4.542e-05 [insert-virtual-dataset]: 2.66999e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0131671, [53] [py_interpret_to_execute]: 3.696e-05 [rewriter_before_opt_a]: 0.00012853 [opt_a]: 0.0109179, [3] [Cycle 1]: 0.00699333, [45] [expand_dump_flag]: 3.78001e-06 [switch_simplify]: 6.819e-05 [loop_unroll]: 5.467e-05 [a_1]: 0.00133506 [with_stream_mark]: 2.429e-05 [recompute_prepare]: 2.188e-05 [updatestate_depend_eliminate]: 9.39e-06 [updatestate_assign_eliminate]: 8.20999e-06 [updatestate_loads_eliminate]: 7.41001e-06 [parameter_eliminate]: 2.64999e-06 [a_2]: 0.00024425 [accelerated_algorithm]: 3.002e-05 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 3.3e-06 [shard_inline]: 1.594e-05 [merge_send_recv]: 1.642e-05 [auto_parallel]: 1.111e-05 [parallel]: 1.882e-05 [flash_sp]: 1.167e-05 [merge_comm]: 9.54999e-06 [allreduce_fusion]: 8.80999e-06 [matmul_add_comm_reduction]: 6.513e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 1.864e-05 [virtual_dataset]: 1.573e-05 [get_grad_eliminate_]: 1.515e-05 [virtual_output]: 1.512e-05 [merge_forward]: 9.89999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.763e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.848e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.749e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76e-06 [meta_fg_expand]: 0.00138989 [flash_sp_send_recv_attached]: 3.65e-06 [receive_attached]: 2.73998e-06 [after_resolve]: 
5.919e-05 [a_after_grad]: 8.05e-05 [renormalize]: 0.00247076 [add_forward_monad_depend]: 9.65002e-06 [auto_monad_grad]: 5.06002e-06 [auto_monad_eliminator]: 5.61e-05 [cse]: 0.00016934 [a_3]: 0.00033276 [Cycle 2]: 0.00296572, [45] [expand_dump_flag]: 1.56998e-06 [switch_simplify]: 4.686e-05 [loop_unroll]: 4.362e-05 [a_1]: 0.00153535 [with_stream_mark]: 1.145e-05 [recompute_prepare]: 1.101e-05 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.34998e-06 [a_2]: 0.00012424 [accelerated_algorithm]: 1.181e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 9.19998e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.56001e-06 [parallel]: 4.74e-06 [flash_sp]: 3.22997e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 4.60001e-06 [matmul_add_comm_reduction]: 7.89002e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 1.045e-05 [virtual_dataset]: 8.86997e-06 [get_grad_eliminate_]: 8.64003e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.52e-06 [cell_reuse_recompute_pass]: 9.80013e-07 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.633e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 3.541e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.502e-05 [a_after_grad]: 1.404e-05 [renormalize]: 0.00058748 [add_forward_monad_depend]: 3.97998e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.445e-05 [cse]: 4.705e-05 [a_3]: 6.482e-05 [Cycle 3]: 0.00094521, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 1.042e-05 [loop_unroll]: 8.84e-06 [a_1]: 0.00029062 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 9.17999e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.81001e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012424 [accelerated_algorithm]: 1.179e-05 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 7.25998e-06 [auto_parallel]: 7.1e-06 [parallel]: 4.38999e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 5.05999e-06 [matmul_add_comm_reduction]: 7.58999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 8.94998e-06 [get_grad_eliminate_]: 8.59998e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.34002e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 8.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.613e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27001e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.309e-05 [a_after_grad]: 1.399e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.127e-05 [cse]: 2.791e-05 [a_3]: 5.781e-05 [py_interpret_to_execute_after_opt_a]: 1.043e-05 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 4.699e-05 [convert_after_rewriter]: 9.05001e-06 [order_py_execute_after_rewriter]: 7.48999e-06 [mutable_eliminate]: 0.0004602 [opt_b]: 0.00028637, [1] [Cycle 1]: 0.00028041, [7] [b_1]: 0.00018785 [b_2]: 1.056e-05 [updatestate_depend_eliminate]: 6.73998e-06 [updatestate_assign_eliminate]: 3.97998e-06 [updatestate_loads_eliminate]: 4.07e-06 [renormalize]: 5.8001e-07 [cse]: 3.229e-05 [optimize_parallel_all_gather_comm]: 2.062e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.067e-05 [loop_unroll]: 0.00042184 [opt_after_cconv]: 0.00013725, [1] [Cycle 1]: 0.00013121, [7] [c_1]: 4.845e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 7.18998e-06 [updatestate_assign_eliminate]: 4.06001e-06 
[updatestate_loads_eliminate]: 4.07e-06 [cse]: 3.06e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 3.018e-05 [tuple_transform]: 0.00010287, [1] [Cycle 1]: 9.828e-05, [4] [d_1]: 6.754e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 9.89001e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 5.847e-05 [cse_after_recomputation]: 3.33e-05, [1] [Cycle 1]: 2.835e-05, [1] [cse]: 2.259e-05 [environ_conv]: 9.19e-06 [swap_dp_allreduce_reducescatter]: 7.56999e-06 [bias_add_comm_swap]: 2.26998e-06 [label_micro_interleaved_index]: 3.86999e-06 [label_fine_grained_interleaved_index]: 2.73003e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.68e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.90025e-07 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.83997e-06 [control_data_broadcast_order]: 1.783e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.13002e-06 [overlap_recompute_and_grad_model_parallel]: 6.28e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.99e-06 [overlap_grad_flash_sp]: 2.416e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.01997e-06 [symbol_engine_optimizer]: 9.82e-05, [1] [Cycle 1]: 9.402e-05, [6] [build]: 9.92999e-06 [elim_shapecalc]: 1.32e-05 [elim_not_effective]: 1.824e-05 [opt_reshape]: 9.71998e-06 [fold_const_symbol]: 1.489e-05 [renormalize]: 
1.90019e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 2.508e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00046502 [validate]: 4.65e-05 [backend_pass]: 7.99977e-07 [task_emit]: 0.0766339 [execute]: 7.95e-06 Sums bootstrap : 0.000535s : 0.53% type_inference : 0.010411s : 10.36% event_method : 0.000043s : 0.04% auto_monad : 0.000117s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000129s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000107s : 0.11% optimize.opt_a.a_1 : 0.003161s : 3.15% optimize.opt_a.with_stream_mark : 0.000046s : 0.05% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000081s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001428s : 1.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.11% optimize.opt_a.renormalize : 0.003058s : 3.04% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000244s : 0.24% optimize.opt_a.a_3 : 0.000455s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.46% optimize.opt_b.b_1 : 0.000188s : 0.19% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000422s : 0.42% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000465s : 0.46% validate : 0.000047s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.076634s : 76.27% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000733 218 5.90% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.66% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 1.11% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.81% : 0.000402s : 16: substitution.inline 2.11% : 0.000015s : 2: substitution.inline_without_move 1.40% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000014s : 3: substitution.less_batch_normalization 1.82% : 0.000013s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.81% : 0.000013s : 20: substitution.remove_not_recompute_node 3.12% : 0.000023s : 10: substitution.replace_applicator 1.43% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.73% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.54% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010339 2 87.02% : 0.008998s : 1: type_inference.infer 12.98% : 0.001342s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.93% : 0.000119s : 16: replace.inline 41.07% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 92.63% : 0.000393s : 16: match.inline 7.37% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.07% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000041s : 244: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.37% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.10% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001539 32 56.95% : 0.000876s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.05% : 0.000663s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.129179 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.36% : 0.003050s : 1: add_attr 2.35% : 0.003041s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000124s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000573s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000013s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.72% : 0.004801s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000173s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.45% : 0.010921s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.37% : 0.000475s : 1: opt_after_jit_grad 0.22% : 0.000290s : 1: opt_b 10.20% : 0.013171s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000034s : 1: remove_dup_value 1.24% : 0.001597s : 2: renormalize.infer 1.12% : 0.001448s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.10% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 59.34% : 0.076649s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 8.07% : 0.010426s : 1: type_inference 0.06% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y1-dtype_x9-ge],max_mem:10.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x0-pynative],max_mem:10.0M TotalTime = 0.0219159, [24] [bootstrap]: 0.00054401 [type_inference]: 0.00625922 [event_method]: 1.45e-05 [auto_monad]: 5.949e-05 [graph_reusing]: 6.11e-06 [inline]: 1.80001e-06 [add_attr]: 0.00339923, [1] [add_attr_with_inline]: 0.00338884, [1] [Cycle 1]: 4.598e-05, [2] [tag_attr]: 1.573e-05 [meta_addattr_fg_expand]: 4.66002e-06 [parallel-infer-symbol]: 3.11999e-06 [pre_auto_parallel]: 2.859e-05 [insert-virtual-dataset]: 2.79999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00399404, [53] [py_interpret_to_execute]: 1.905e-05 [rewriter_before_opt_a]: 5.926e-05 [opt_a]: 0.00213278, [2] [Cycle 1]: 0.0015342, [45] [expand_dump_flag]: 2.89001e-06 [switch_simplify]: 3.381e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.00045959 [with_stream_mark]: 1.384e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.573e-05 [accelerated_algorithm]: 6.02001e-06 [shard]: 2.53998e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 5.65001e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 5.83002e-06 [parallel]: 2.429e-05 [flash_sp]: 7.17002e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.24998e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.33999e-06 [virtual_dataset]: 
5.96998e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.69999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.61002e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.37001e-06 [receive_attached]: 2.53e-06 [after_resolve]: 1.071e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00042036 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 1.73997e-06 [auto_monad_eliminator]: 1.406e-05 [cse]: 2.961e-05 [a_3]: 4.07e-05 [Cycle 2]: 0.00058955, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.87002e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.00012305 [with_stream_mark]: 9.48002e-06 [recompute_prepare]: 5.45001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.753e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.55999e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.78999e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 3.16999e-06 [matmul_add_comm_reduction]: 4.93001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.51e-06 [virtual_dataset]: 5.36002e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.12002e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.06998e-06 [a_after_grad]: 8.22998e-06 
[renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.29001e-06 [cse]: 1.666e-05 [a_3]: 3.271e-05 [py_interpret_to_execute_after_opt_a]: 7.67998e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 2.936e-05 [convert_after_rewriter]: 7.28999e-06 [order_py_execute_after_rewriter]: 5.15001e-06 [mutable_eliminate]: 0.00046767 [opt_b]: 0.00018211, [1] [Cycle 1]: 0.00017602, [7] [b_1]: 0.00010803 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 4.90021e-07 [cse]: 1.671e-05 [optimize_parallel_all_gather_comm]: 1.556e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.329e-05 [loop_unroll]: 0.00041333 [opt_after_cconv]: 9.512e-05, [1] [Cycle 1]: 8.967e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.694e-05 [renormalize]: 2.70025e-07 [remove_dup_value]: 1.273e-05 [tuple_transform]: 6.919e-05, [1] [Cycle 1]: 6.491e-05, [4] [d_1]: 3.89e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.41998e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.089e-05 [cse_after_recomputation]: 2.047e-05, [1] [Cycle 1]: 1.602e-05, [1] [cse]: 1.093e-05 [environ_conv]: 4.94998e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.06002e-06 
[add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.21002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.129e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 4.03001e-06 [overlap_recompute_and_grad_model_parallel]: 4.86002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 3.92998e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.55002e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 6.865e-05, [1] [Cycle 1]: 6.461e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.39002e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 9.04e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 0.00012636 [opt_after_jit_grad]: 0.00045433 [validate]: 3.148e-05 [backend_pass]: 1.17999e-06 [task_emit]: 0.00675693 [execute]: 6.73998e-06 Sums bootstrap : 0.000544s : 3.10% type_inference : 0.006259s : 35.66% event_method : 0.000015s : 0.08% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 
0.000059s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000583s : 3.32% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000420s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000046s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000468s : 2.66% optimize.opt_b.b_1 : 0.000108s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000413s : 2.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000126s : 0.72% opt_after_jit_grad : 0.000454s : 2.59% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006757s : 38.49% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.97% : 0.000025s : 5: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.77% : 0.000111s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006214 2 90.61% : 0.005631s : 1: type_inference.infer 9.39% : 0.000584s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.03% : 0.000028s : 3: replace.inline 29.97% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.58% : 0.000109s : 3: match.inline 8.42% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 1.11% : 0.000002s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 1.08% : 0.000002s : 11: predicate.cast_eliminate 0.92% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.06% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.87% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.78% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.29% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000376 8 47.77% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.23% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030835 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.04% : 0.003403s : 1: add_attr 11.00% : 0.003392s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000583s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.55% : 0.000477s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.08% : 0.000950s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.93% : 0.002136s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.51% : 0.000464s : 1: opt_after_jit_grad 0.60% : 0.000186s : 1: opt_b 12.96% : 0.003998s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.07% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.69% : 0.000214s : 1: renormalize.infer 0.65% : 0.000200s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.43% : 0.000132s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.95% : 0.006767s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.35% : 0.006274s : 1: type_inference 0.19% : 0.000058s : 1: validate TotalTime = 0.0186985, [24] [bootstrap]: 0.00049454 [type_inference]: 0.00446645 [event_method]: 1.007e-05 [auto_monad]: 5.066e-05 [graph_reusing]: 5.59e-06 [inline]: 1.94999e-06 [add_attr]: 0.00299496, [1] [add_attr_with_inline]: 0.00298736, [1] [Cycle 1]: 4.627e-05, [2] [tag_attr]: 1.223e-05 [meta_addattr_fg_expand]: 3.28e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.268e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00391918, [53] [py_interpret_to_execute]: 1.511e-05 [rewriter_before_opt_a]: 4.039e-05 [opt_a]: 0.00195437, [2] [Cycle 1]: 0.0013046, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.426e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.00029468 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.42998e-06 [updatestate_depend_eliminate]: 4.25999e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.34001e-06 [parameter_eliminate]: 2.07001e-06 [a_2]: 7.735e-05 [accelerated_algorithm]: 6.79999e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 8.18001e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.895e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.22001e-06 [allreduce_slice_to_reducescatter]: 9.79984e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 6.01e-06 [merge_forward]: 4.17998e-06 [cell_reuse_recompute_pass]: 1.41002e-06 [offload_activation]: 1.057e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.195e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 1.109e-05 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 2.96999e-06 
[after_resolve]: 1.15e-05 [a_after_grad]: 9.66998e-06 [renormalize]: 0.00036184 [add_forward_monad_depend]: 4.47e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.393e-05 [cse]: 2.765e-05 [a_3]: 4.506e-05 [Cycle 2]: 0.0006403, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.93998e-06 [loop_unroll]: 6.09001e-06 [a_1]: 0.00013811 [with_stream_mark]: 9.95002e-06 [recompute_prepare]: 6.41e-06 [updatestate_depend_eliminate]: 3.09001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.183e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 6.24001e-06 [merge_send_recv]: 4.88001e-06 [auto_parallel]: 5.86998e-06 [parallel]: 4.07e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.67999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.86999e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.55001e-06 [merge_forward]: 2.66999e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.012e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.19e-06 [after_resolve]: 1.039e-05 [a_after_grad]: 9.75002e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.23002e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.38998e-06 [cse]: 1.284e-05 [a_3]: 3.549e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.481e-05 [convert_after_rewriter]: 8.00999e-06 [order_py_execute_after_rewriter]: 5.78997e-06 [mutable_eliminate]: 0.00049241 [opt_b]: 0.00019686, [1] [Cycle 1]: 0.0001906, [7] [b_1]: 
0.0001186 [b_2]: 7.65e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 5.10016e-07 [cse]: 1.704e-05 [optimize_parallel_all_gather_comm]: 1.617e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.446e-05 [loop_unroll]: 0.00049454 [opt_after_cconv]: 9.742e-05, [1] [Cycle 1]: 9.143e-05, [7] [c_1]: 3.08e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.10002e-06 [cse]: 1.586e-05 [renormalize]: 4.7998e-07 [remove_dup_value]: 1.292e-05 [tuple_transform]: 6.924e-05, [1] [Cycle 1]: 6.49e-05, [4] [d_1]: 3.895e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.467e-05 [cse_after_recomputation]: 1.907e-05, [1] [Cycle 1]: 1.479e-05, [1] [cse]: 9.87001e-06 [environ_conv]: 4.15e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.19002e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.36002e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.92e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.74998e-06 [overlap_recompute_comm]: 2.72001e-06 [overlap_grad_ring_attention]: 4.22998e-06 [overlap_grad_flash_sp]: 1.783e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 2.21e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.759e-05, [1] [Cycle 1]: 6.313e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 5.82999e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.61998e-06 [pipeline_parallel_scheduler]: 1.61998e-06 [auto_monad_reorder]: 1.595e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.44001e-06 [opt_after_jit_grad]: 0.00044653 [validate]: 3.081e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00601647 [execute]: 7.4e-06 Sums bootstrap : 0.000495s : 3.36% type_inference : 0.004466s : 30.35% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000433s : 2.94% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000149s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.09% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000012s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000022s : 0.15% optimize.opt_a.a_after_grad : 0.000019s : 0.13% optimize.opt_a.renormalize : 0.000362s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000081s : 0.55% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000035s : 0.24% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000492s : 3.35% optimize.opt_b.b_1 : 0.000119s : 0.81% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.17% optimize.loop_unroll : 0.000495s : 3.36% optimize.opt_after_cconv.c_1 : 0.000031s : 0.21% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.02% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.03% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006016s : 40.89% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000126 26 18.15% : 0.000023s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.18% : 0.000001s : 2: substitution.fold_const_symbol 4.22% : 0.000005s : 4: substitution.graph_param_transform 65.47% : 0.000082s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.86% : 0.000005s : 4: substitution.remove_not_recompute_node 3.28% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004426 2 92.05% : 0.004074s : 1: type_inference.infer 7.95% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000144 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.71% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000009s : 44: predicate.inline 1.09% : 0.000002s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 1.21% : 0.000002s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.25% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.52% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.23% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.02% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.96% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.84% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 43.01% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.99% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026954 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.13% : 0.002999s : 1: add_attr 11.10% : 0.002991s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.98% : 0.000534s : 1: bootstrap 0.11% : 0.000028s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.87% : 0.000504s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.86% : 0.000502s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.99% : 0.000806s : 78: opt.transform.opt_a 0.11% : 0.000030s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.37% : 0.000099s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.26% : 0.001957s : 1: opt_a 0.37% : 0.000101s : 1: opt_after_cconv 1.69% : 0.000456s : 1: opt_after_jit_grad 0.74% : 0.000201s : 1: opt_b 14.55% : 0.003923s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.75% : 0.000202s : 1: renormalize.infer 0.57% : 0.000153s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000039s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 22.36% : 0.006026s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.62% : 0.004481s : 1: type_inference 0.21% : 0.000058s : 1: validate TotalTime = 0.0199006, [24] [bootstrap]: 0.00049992 [type_inference]: 0.00560666 [event_method]: 1.395e-05 [auto_monad]: 6.015e-05 [graph_reusing]: 5.81e-06 [inline]: 1.75001e-06 [add_attr]: 0.00298473, [1] [add_attr_with_inline]: 0.00297705, [1] [Cycle 1]: 4.704e-05, [2] [tag_attr]: 1.502e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.545e-05 [insert-virtual-dataset]: 2.79999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00395553, [53] [py_interpret_to_execute]: 1.996e-05 [rewriter_before_opt_a]: 5.845e-05 [opt_a]: 0.00213416, [2] [Cycle 1]: 0.00149886, [45] [expand_dump_flag]: 3.01001e-06 [switch_simplify]: 3.213e-05 [loop_unroll]: 2.024e-05 [a_1]: 0.00044946 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.36999e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.825e-05 [accelerated_algorithm]: 6.66e-06 [shard]: 2.29999e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.92998e-06 [auto_parallel]: 6.43998e-06 [parallel]: 1.881e-05 [flash_sp]: 7.03998e-06 [merge_comm]: 3.3e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 8.70001e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.21001e-06 [virtual_dataset]: 5.79999e-06 [get_grad_eliminate_]: 5.38002e-06 [virtual_output]: 5.45001e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.75002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 
2.57001e-06 [after_resolve]: 1.101e-05 [a_after_grad]: 8.88002e-06 [renormalize]: 0.0004076 [add_forward_monad_depend]: 4.60001e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.347e-05 [cse]: 2.804e-05 [a_3]: 4.057e-05 [Cycle 2]: 0.00062633, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.56e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00015955 [with_stream_mark]: 9.92001e-06 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 1.34e-06 [a_2]: 6.788e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.44998e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.92e-06 [flash_sp]: 3.59002e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.96999e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 5.01002e-06 [virtual_output]: 4.91997e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.15001e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 2.72001e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 9.10999e-06 [a_after_grad]: 7.78999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 6.48998e-06 [cse]: 1.33e-05 [a_3]: 3.183e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.13e-05 [convert_after_rewriter]: 7.55e-06 [order_py_execute_after_rewriter]: 4.79e-06 [mutable_eliminate]: 0.00044815 [opt_b]: 0.00017968, [1] [Cycle 1]: 
0.00017399, [7] [b_1]: 0.0001078 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 2.30008e-07 [cse]: 1.558e-05 [optimize_parallel_all_gather_comm]: 1.602e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.223e-05 [loop_unroll]: 0.00041153 [opt_after_cconv]: 9.309e-05, [1] [Cycle 1]: 8.733e-05, [7] [c_1]: 2.766e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.56e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.24e-05 [tuple_transform]: 6.858e-05, [1] [Cycle 1]: 6.431e-05, [4] [d_1]: 3.898e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.18998e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 4.331e-05 [cse_after_recomputation]: 1.984e-05, [1] [Cycle 1]: 1.52e-05, [1] [cse]: 1.009e-05 [environ_conv]: 4.80001e-06 [swap_dp_allreduce_reducescatter]: 5.52999e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 3.98001e-06 [label_fine_grained_interleaved_index]: 3.03e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.69999e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.97002e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.21997e-06 [interleave_parallel_branches]: 1.27999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.176e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.38999e-06 [overlap_recompute_and_grad_model_parallel]: 4.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.718e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.781e-05, [1] [Cycle 1]: 6.372e-05, [6] [build]: 2.20002e-06 [elim_shapecalc]: 8.53001e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.94998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.547e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00044638 [validate]: 3.118e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00602944 [execute]: 8.71002e-06 Sums bootstrap : 0.000500s : 3.13% type_inference : 0.005607s : 35.12% event_method : 0.000014s : 0.09% auto_monad : 0.000060s : 0.38% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000609s : 3.81% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000408s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.81% optimize.opt_b.b_1 : 0.000108s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000446s : 2.80% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006029s : 37.77% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000166 30 15.42% : 0.000026s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 66.06% : 0.000110s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.43% : 0.000004s : 4: substitution.remove_not_recompute_node 2.68% : 0.000004s : 4: substitution.replace_old_param 6.44% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005566 2 90.15% : 0.005017s : 1: type_inference.infer 9.85% : 0.000548s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.39% : 0.000027s : 3: replace.inline 29.61% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.72% : 0.000107s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.60% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.53% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.06% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.39% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 47.02% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.98% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028380 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.53% : 0.002989s : 1: add_attr 10.50% : 0.002981s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.23% : 0.000065s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000538s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.42% : 0.000972s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.53% : 0.002137s : 1: opt_a 0.34% : 0.000096s : 1: opt_after_cconv 1.61% : 0.000456s : 1: opt_after_jit_grad 0.65% : 0.000183s : 1: opt_b 13.95% : 0.003959s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000211s : 1: renormalize.infer 0.67% : 0.000190s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.28% : 0.006040s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.80% : 0.005620s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0376914, [24] [bootstrap]: 0.00053987 [type_inference]: 0.011491 [event_method]: 4.726e-05 [auto_monad]: 0.00012267 [graph_reusing]: 8.13999e-06 [inline]: 1.96e-06 [add_attr]: 0.00303251, [1] [add_attr_with_inline]: 0.00302415, [1] [Cycle 1]: 7.229e-05, [2] [tag_attr]: 3.506e-05 [meta_addattr_fg_expand]: 9.56998e-06 [parallel-infer-symbol]: 3.18998e-06 [pre_auto_parallel]: 4.953e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.0132527, [53] [py_interpret_to_execute]: 3.945e-05 [rewriter_before_opt_a]: 0.00014589 [opt_a]: 0.0109961, [3] [Cycle 1]: 0.00708176, [45] [expand_dump_flag]: 3.59002e-06 [switch_simplify]: 7.317e-05 [loop_unroll]: 6.098e-05 [a_1]: 0.00144125 [with_stream_mark]: 2.27e-05 [recompute_prepare]: 2.117e-05 [updatestate_depend_eliminate]: 8.75999e-06 [updatestate_assign_eliminate]: 8.48999e-06 [updatestate_loads_eliminate]: 7.31999e-06 [parameter_eliminate]: 2.73e-06 [a_2]: 0.00024432 [accelerated_algorithm]: 3.07e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.38999e-06 [shard_inline]: 1.582e-05 [merge_send_recv]: 1.587e-05 [auto_parallel]: 1.106e-05 [parallel]: 1.866e-05 [flash_sp]: 1.163e-05 [merge_comm]: 9.55001e-06 [allreduce_fusion]: 9.16998e-06 [matmul_add_comm_reduction]: 2.754e-05 [allreduce_slice_to_reducescatter]: 9.49978e-07 [virtual_shard_identity]: 1.797e-05 [virtual_dataset]: 1.541e-05 [get_grad_eliminate_]: 1.49e-05 [virtual_output]: 1.517e-05 [merge_forward]: 9.52001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.871e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.883e-05 [merge_recompute_call_nodes]: 1.71002e-06 [before_grad]: 2.774e-05 [set_forward_comm_id_for_comm_node_pass]: 9.97001e-06 [meta_fg_expand]: 0.00145351 [flash_sp_send_recv_attached]: 3.6e-06 [receive_attached]: 2.83e-06 
[after_resolve]: 5.878e-05 [a_after_grad]: 8.019e-05 [renormalize]: 0.0024291 [add_forward_monad_depend]: 9.97999e-06 [auto_monad_grad]: 4.97e-06 [auto_monad_eliminator]: 5.492e-05 [cse]: 0.00016157 [a_3]: 0.00033385 [Cycle 2]: 0.00298683, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.673e-05 [loop_unroll]: 4.336e-05 [a_1]: 0.00152738 [with_stream_mark]: 1.139e-05 [recompute_prepare]: 1.106e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.54998e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.02998e-06 [a_2]: 0.00012521 [accelerated_algorithm]: 1.213e-05 [shard]: 9.40025e-07 [meta_shard_fg_expand]: 1.83002e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 7.06001e-06 [auto_parallel]: 7.46999e-06 [parallel]: 4.95999e-06 [flash_sp]: 3.63999e-06 [merge_comm]: 5.82001e-06 [allreduce_fusion]: 4.86002e-06 [matmul_add_comm_reduction]: 7.51999e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.63001e-06 [virtual_output]: 8.47e-06 [merge_forward]: 4.60999e-06 [cell_reuse_recompute_pass]: 9.80013e-07 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.631e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.378e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 7.071e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.14003e-06 [after_resolve]: 1.659e-05 [a_after_grad]: 1.416e-05 [renormalize]: 0.00058017 [add_forward_monad_depend]: 4.15999e-06 [auto_monad_grad]: 1.16997e-06 [auto_monad_eliminator]: 1.427e-05 [cse]: 4.607e-05 [a_3]: 6.516e-05 [Cycle 3]: 0.00091395, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.049e-05 [loop_unroll]: 8.74e-06 [a_1]: 0.00024704 [with_stream_mark]: 9.70002e-06 [recompute_prepare]: 9.36e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 4.00998e-06 
[parameter_eliminate]: 1.17999e-06 [a_2]: 0.00012429 [accelerated_algorithm]: 1.149e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.83002e-06 [shard_inline]: 8.92999e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.62e-06 [flash_sp]: 1.00999e-06 [merge_comm]: 4.86997e-06 [allreduce_fusion]: 4.90001e-06 [matmul_add_comm_reduction]: 7.90998e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.018e-05 [virtual_dataset]: 8.69003e-06 [get_grad_eliminate_]: 8.40001e-06 [virtual_output]: 8.1e-06 [merge_forward]: 4.02002e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.581e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05999e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.445e-05 [a_after_grad]: 1.47e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 1.02998e-06 [auto_monad_eliminator]: 1.059e-05 [cse]: 2.577e-05 [a_3]: 5.867e-05 [py_interpret_to_execute_after_opt_a]: 9.92001e-06 [slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 4.683e-05 [convert_after_rewriter]: 9.04e-06 [order_py_execute_after_rewriter]: 6.61999e-06 [mutable_eliminate]: 0.00045763 [opt_b]: 0.00028376, [1] [Cycle 1]: 0.00027793, [7] [b_1]: 0.00018689 [b_2]: 1.046e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 3.91999e-06 [renormalize]: 4.09986e-07 [cse]: 3.075e-05 [optimize_parallel_all_gather_comm]: 2.101e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 1.981e-05 [loop_unroll]: 0.00042283 [opt_after_cconv]: 0.0001354, [1] [Cycle 1]: 0.00012977, [7] [c_1]: 4.822e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.37998e-06 
[updatestate_loads_eliminate]: 3.91999e-06 [cse]: 2.982e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 3e-05 [tuple_transform]: 0.00010104, [1] [Cycle 1]: 9.64e-05, [4] [d_1]: 6.637e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.83002e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.825e-05 [cse_after_recomputation]: 3.242e-05, [1] [Cycle 1]: 2.783e-05, [1] [cse]: 2.219e-05 [environ_conv]: 8.70999e-06 [swap_dp_allreduce_reducescatter]: 7.60998e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.40999e-06 [overlap_opt_shard_in_pipeline]: 1.44e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.726e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 4.97999e-06 [overlap_recompute_and_grad_model_parallel]: 5.81e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 5.22999e-06 [overlap_grad_flash_sp]: 2.377e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 9.676e-05, [1] [Cycle 1]: 9.268e-05, [6] [build]: 9.41e-06 [elim_shapecalc]: 1.364e-05 [elim_not_effective]: 1.814e-05 [opt_reshape]: 9.94999e-06 [fold_const_symbol]: 1.446e-05 [renormalize]: 2.20025e-07 
[detach_backward]: 1.55999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.49e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00046995 [validate]: 4.541e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00837039 [execute]: 7.60998e-06 Sums bootstrap : 0.000540s : 1.62% type_inference : 0.011491s : 34.41% event_method : 0.000047s : 0.14% auto_monad : 0.000123s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.39% optimize.opt_a.loop_unroll : 0.000113s : 0.34% optimize.opt_a.a_1 : 0.003216s : 9.63% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001527s : 4.57% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003009s : 9.01% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000233s : 0.70% optimize.opt_a.a_3 : 0.000458s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.37% optimize.opt_b.b_1 : 0.000187s : 0.56% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000423s : 1.27% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000470s : 1.41% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008370s : 25.07% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000760 222 5.98% : 0.000045s : 12: substitution.arithmetic_simplify 1.73% : 0.000013s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.70% : 0.000424s : 17: substitution.inline 2.04% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.09% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.46% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011416 2 87.05% : 0.009937s : 1: type_inference.infer 12.95% : 0.001479s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.25% : 0.000125s : 17: replace.inline 42.75% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 33 92.44% : 0.000415s : 17: match.inline 7.56% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.83% : 0.000014s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.12% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.22% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.33% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.93% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.39% : 0.000010s : 84: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001571 34 56.84% : 0.000893s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.16% : 0.000678s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062182 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.88% : 0.003037s : 1: add_attr 4.87% : 0.003028s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000131s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.93% : 0.000577s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.83% : 0.004870s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.69% : 0.010999s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000480s : 1: opt_after_jit_grad 0.46% : 0.000287s : 1: opt_b 21.32% : 0.013257s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000035s : 1: remove_dup_value 2.58% : 0.001601s : 2: renormalize.infer 2.24% : 0.001395s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.48% : 0.008381s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.50% : 0.011506s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0188494, [24] [bootstrap]: 0.00050165 [type_inference]: 0.0043774 [event_method]: 1.064e-05 [auto_monad]: 5.438e-05 [graph_reusing]: 5.23002e-06 [inline]: 1.84e-06 [add_attr]: 0.003007, [1] [add_attr_with_inline]: 0.00299933, [1] [Cycle 1]: 4.567e-05, [2] [tag_attr]: 1.231e-05 [meta_addattr_fg_expand]: 3.58e-06 [parallel-infer-symbol]: 3.81001e-06 [pre_auto_parallel]: 2.295e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.82999e-06 [optimize]: 0.00369189, [53] [py_interpret_to_execute]: 1.647e-05 [rewriter_before_opt_a]: 3.94e-05 [opt_a]: 0.00188477, [2] [Cycle 1]: 0.0012844, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 2.428e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00029402 [with_stream_mark]: 1.287e-05 [recompute_prepare]: 7.31001e-06 [updatestate_depend_eliminate]: 3.98999e-06 [updatestate_assign_eliminate]: 3.61999e-06 [updatestate_loads_eliminate]: 3.05998e-06 [parameter_eliminate]: 2.17001e-06 [a_2]: 7.651e-05 [accelerated_algorithm]: 5.99e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 8.45001e-06 [auto_parallel]: 5.51e-06 [parallel]: 1.791e-05 [flash_sp]: 8.12e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 3.85e-06 [matmul_add_comm_reduction]: 8.87e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.72002e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.66998e-06 [merge_forward]: 3.92002e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 1.031e-05 [set_forward_comm_id_for_comm_node_pass]: 3.66999e-06 [meta_fg_expand]: 2.18998e-06 [flash_sp_send_recv_attached]: 2.53998e-06 [receive_attached]: 2.26998e-06 
[after_resolve]: 1.085e-05 [a_after_grad]: 9.09e-06 [renormalize]: 0.00034345 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.287e-05 [cse]: 2.81e-05 [a_3]: 3.915e-05 [Cycle 2]: 0.00059124, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 7.28e-06 [loop_unroll]: 5.55001e-06 [a_1]: 0.00012507 [with_stream_mark]: 1.093e-05 [recompute_prepare]: 5.57999e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.791e-05 [accelerated_algorithm]: 5.87001e-06 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.27998e-06 [auto_parallel]: 5.40001e-06 [parallel]: 4.42e-06 [flash_sp]: 3.92998e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.15001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.22e-06 [set_forward_comm_id_for_comm_node_pass]: 3.07002e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.18002e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.30002e-06 [cse]: 1.321e-05 [a_3]: 3.205e-05 [py_interpret_to_execute_after_opt_a]: 7.01999e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 3.236e-05 [convert_after_rewriter]: 7.48e-06 [order_py_execute_after_rewriter]: 5.57001e-06 [mutable_eliminate]: 0.00044604 [opt_b]: 0.00018015, [1] [Cycle 1]: 0.00017408, [7] 
[b_1]: 0.00010777 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.10014e-07 [cse]: 1.562e-05 [optimize_parallel_all_gather_comm]: 1.601e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.284e-05 [loop_unroll]: 0.00041482 [opt_after_cconv]: 9.426e-05, [1] [Cycle 1]: 8.814e-05, [7] [c_1]: 2.823e-05 [parameter_eliminate]: 2.08002e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.514e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.317e-05 [tuple_transform]: 6.84e-05, [1] [Cycle 1]: 6.389e-05, [4] [d_1]: 3.836e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.431e-05 [cse_after_recomputation]: 1.965e-05, [1] [Cycle 1]: 1.536e-05, [1] [cse]: 1.034e-05 [environ_conv]: 4.76002e-06 [swap_dp_allreduce_reducescatter]: 5.24998e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 3.00998e-06 [merge_cast_opt]: 1.59998e-06 [slice_recompute_activation]: 2.53998e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.117e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.49998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.77002e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.714e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.812e-05, [1] [Cycle 1]: 6.39e-05, [6] [build]: 2.13998e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 5.93998e-06 [fold_const_symbol]: 9.00999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.564e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00044774 [validate]: 3.171e-05 [backend_pass]: 8.49977e-07 [task_emit]: 0.00645956 [execute]: 6.87002e-06 Sums bootstrap : 0.000502s : 3.37% type_inference : 0.004377s : 29.45% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.03% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000419s : 2.82% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000344s : 2.31% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000446s : 3.00% optimize.opt_b.b_1 : 0.000108s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000415s : 2.79% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 3.01% validate : 0.000032s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006460s : 43.46% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000123 26 18.46% : 0.000023s : 4: substitution.arithmetic_simplify 1.55% : 0.000002s : 2: substitution.elim_not_effective 1.17% : 0.000001s : 2: substitution.fold_const_symbol 4.16% : 0.000005s : 4: substitution.graph_param_transform 65.33% : 0.000080s : 2: substitution.inline 2.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004335 2 91.99% : 0.003988s : 1: type_inference.infer 8.01% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.41% : 0.000001s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.21% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.34% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.56% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 42.60% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.40% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026821 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.23% : 0.003011s : 1: add_attr 11.20% : 0.003003s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.01% : 0.000539s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.87% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.04% : 0.001888s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000457s : 1: opt_after_jit_grad 0.69% : 0.000184s : 1: opt_b 13.78% : 0.003696s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000187s : 1: renormalize.infer 0.56% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 24.12% : 0.006469s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.37% : 0.004392s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.03636, [24] [bootstrap]: 0.00054104 [type_inference]: 0.010408 [event_method]: 4.127e-05 [auto_monad]: 0.00011434 [graph_reusing]: 8.75999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00302522, [1] [add_attr_with_inline]: 0.00301706, [1] [Cycle 1]: 6.89e-05, [2] [tag_attr]: 3.236e-05 [meta_addattr_fg_expand]: 8.65001e-06 [parallel-infer-symbol]: 3.21999e-06 [pre_auto_parallel]: 4.69e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0130483, [53] [py_interpret_to_execute]: 3.662e-05 [rewriter_before_opt_a]: 0.00012874 [opt_a]: 0.0108065, [3] [Cycle 1]: 0.00690275, [45] [expand_dump_flag]: 3.58999e-06 [switch_simplify]: 6.641e-05 [loop_unroll]: 5.549e-05 [a_1]: 0.0013418 [with_stream_mark]: 2.235e-05 [recompute_prepare]: 2.21e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 7.89002e-06 [updatestate_loads_eliminate]: 7.30998e-06 [parameter_eliminate]: 3.08e-06 [a_2]: 0.00024558 [accelerated_algorithm]: 3.076e-05 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 3.25e-06 [shard_inline]: 1.629e-05 [merge_send_recv]: 1.69e-05 [auto_parallel]: 1.09e-05 [parallel]: 1.812e-05 [flash_sp]: 1.123e-05 [merge_comm]: 9.50001e-06 [allreduce_fusion]: 8.91002e-06 [matmul_add_comm_reduction]: 2.711e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.74e-05 [virtual_dataset]: 1.6e-05 [get_grad_eliminate_]: 1.556e-05 [virtual_output]: 1.513e-05 [merge_forward]: 9.37001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.78e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.79e-05 [merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 2.733e-05 [set_forward_comm_id_for_comm_node_pass]: 9.36e-06 [meta_fg_expand]: 0.00142264 [flash_sp_send_recv_attached]: 4.07e-06 [receive_attached]: 2.63e-06 [after_resolve]: 
5.868e-05 [a_after_grad]: 8.086e-05 [renormalize]: 0.00236504 [add_forward_monad_depend]: 9.15001e-06 [auto_monad_grad]: 2.921e-05 [auto_monad_eliminator]: 5.71e-05 [cse]: 0.00016533 [a_3]: 0.00033281 [Cycle 2]: 0.00293505, [45] [expand_dump_flag]: 1.37999e-06 [switch_simplify]: 4.642e-05 [loop_unroll]: 4.386e-05 [a_1]: 0.0015234 [with_stream_mark]: 1.18e-05 [recompute_prepare]: 1.084e-05 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4.51002e-06 [updatestate_loads_eliminate]: 3.71001e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 0.00012556 [accelerated_algorithm]: 1.199e-05 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.22999e-06 [merge_send_recv]: 6.87002e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.87e-06 [flash_sp]: 3.84002e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.70001e-06 [matmul_add_comm_reduction]: 7.55e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 9.59e-06 [get_grad_eliminate_]: 8.98002e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 8.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.674e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.38e-05 [set_forward_comm_id_for_comm_node_pass]: 5.33002e-06 [meta_fg_expand]: 3.547e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.497e-05 [a_after_grad]: 1.423e-05 [renormalize]: 0.00057054 [add_forward_monad_depend]: 4.24997e-06 [auto_monad_grad]: 1.25999e-06 [auto_monad_eliminator]: 1.389e-05 [cse]: 4.393e-05 [a_3]: 6.481e-05 [Cycle 3]: 0.00095447, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 1.033e-05 [loop_unroll]: 8.78001e-06 [a_1]: 0.00024843 [with_stream_mark]: 9.68002e-06 [recompute_prepare]: 9.20999e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 3.80998e-06 [updatestate_loads_eliminate]: 3.75e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.00018525 [accelerated_algorithm]: 1.225e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.91997e-06 [merge_send_recv]: 7.05002e-06 [auto_parallel]: 7.06999e-06 [parallel]: 4.53999e-06 [flash_sp]: 1.14998e-06 [merge_comm]: 4.86997e-06 [allreduce_fusion]: 4.85999e-06 [matmul_add_comm_reduction]: 8.1e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 9.74e-06 [virtual_dataset]: 8.63001e-06 [get_grad_eliminate_]: 8.3e-06 [virtual_output]: 8.18999e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 8.53001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.635e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 3.00002e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.311e-05 [a_after_grad]: 1.44e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 1.03e-05 [cse]: 2.49e-05 [a_3]: 5.8e-05 [py_interpret_to_execute_after_opt_a]: 1.053e-05 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 4.826e-05 [convert_after_rewriter]: 9.19e-06 [order_py_execute_after_rewriter]: 6.93998e-06 [mutable_eliminate]: 0.00046016 [opt_b]: 0.00028286, [1] [Cycle 1]: 0.00027695, [7] [b_1]: 0.00018757 [b_2]: 1.054e-05 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.29979e-07 [cse]: 2.916e-05 [optimize_parallel_all_gather_comm]: 2.088e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.04e-05 [loop_unroll]: 0.00042227 [opt_after_cconv]: 0.00013313, [1] [Cycle 1]: 0.00012751, [7] [c_1]: 4.746e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 6.78003e-06 [updatestate_assign_eliminate]: 4.12e-06 
[updatestate_loads_eliminate]: 3.98001e-06 [cse]: 2.91e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 3.006e-05 [tuple_transform]: 9.993e-05, [1] [Cycle 1]: 9.547e-05, [4] [d_1]: 6.609e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 9.79999e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.682e-05 [cse_after_recomputation]: 3.171e-05, [1] [Cycle 1]: 2.654e-05, [1] [cse]: 2.09e-05 [environ_conv]: 8.18001e-06 [swap_dp_allreduce_reducescatter]: 7.67002e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 4.58001e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.58002e-06 [slice_recompute_activation]: 2.43002e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.716e-05 [grouped_pairwise_exchange_alltoall]: 1.61998e-06 [offloading_packed_experts]: 5.45001e-06 [overlap_recompute_and_grad_model_parallel]: 5.49e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 5.47001e-06 [overlap_grad_flash_sp]: 2.463e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 0.00010064, [1] [Cycle 1]: 9.642e-05, [6] [build]: 1.03e-05 [elim_shapecalc]: 1.372e-05 [elim_not_effective]: 1.83e-05 [opt_reshape]: 1.017e-05 [fold_const_symbol]: 1.524e-05 [renormalize]: 
2.40019e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 2.527e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00046732 [validate]: 4.418e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00834964 [execute]: 7.65998e-06 Sums bootstrap : 0.000541s : 1.69% type_inference : 0.010408s : 32.45% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000129s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.38% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003114s : 9.71% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000556s : 1.73% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000019s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001461s : 4.55% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.002936s : 9.15% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000032s : 0.10% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000234s : 0.73% optimize.opt_a.a_3 : 0.000456s : 1.42% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000460s : 1.43% optimize.opt_b.b_1 : 0.000188s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000029s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000422s : 1.32% optimize.opt_after_cconv.c_1 : 0.000047s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000467s : 1.46% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008350s : 26.03% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000735 218 5.85% : 0.000043s : 11: substitution.arithmetic_simplify 1.93% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.10% : 0.000008s : 8: substitution.graph_param_transform 0.49% : 0.000004s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 54.58% : 0.000401s : 16: substitution.inline 2.11% : 0.000015s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.03% : 0.000015s : 3: substitution.less_batch_normalization 1.83% : 0.000013s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 20: substitution.remove_not_recompute_node 3.23% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.89% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.41% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010337 2 87.47% : 0.009041s : 1: type_inference.infer 12.53% : 0.001296s : 1: type_inference.specialize ------[replace.] 0.000202 30 59.22% : 0.000119s : 16: replace.inline 40.78% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000424 30 92.79% : 0.000393s : 16: match.inline 7.21% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000732 5663 1.13% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.24% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.81% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.55% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.10% : 0.000001s : 8: predicate.fold_const_symbol 0.61% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 149: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 32: predicate.shard_identity_eliminate 0.34% : 0.000003s : 16: predicate.special_op_eliminate 0.60% : 0.000004s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000010s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001490 32 57.61% : 0.000859s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.39% : 0.000632s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060514 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.01% : 0.003029s : 1: add_attr 4.99% : 0.003021s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000122s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.96% : 0.000580s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.96% : 0.004819s : 117: opt.transform.opt_a 0.08% : 0.000046s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.86% : 0.010810s : 1: opt_a 0.23% : 0.000137s : 1: opt_after_cconv 0.79% : 0.000477s : 1: opt_after_jit_grad 0.47% : 0.000287s : 1: opt_b 21.57% : 0.013052s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.58% : 0.001564s : 2: renormalize.infer 2.25% : 0.001359s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.81% : 0.008360s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.23% : 0.010424s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x0-kbk],max_mem:10.0M TotalTime = 0.122011, [24] [bootstrap]: 0.00057517 [type_inference]: 0.00623756 [event_method]: 1.401e-05 [auto_monad]: 5.66e-05 [graph_reusing]: 5.35001e-06 [inline]: 1.71e-06 [add_attr]: 0.00353301, [1] [add_attr_with_inline]: 0.00352134, [1] [Cycle 1]: 4.704e-05, [2] [tag_attr]: 1.598e-05 [meta_addattr_fg_expand]: 4.2e-06 [parallel-infer-symbol]: 3.66999e-06 [pre_auto_parallel]: 2.817e-05 [insert-virtual-dataset]: 2.84001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00401408, [53] [py_interpret_to_execute]: 1.984e-05 [rewriter_before_opt_a]: 6.138e-05 [opt_a]: 0.00212715, [2] [Cycle 1]: 0.00153319, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 3.324e-05 [loop_unroll]: 2.135e-05 [a_1]: 0.00046854 [with_stream_mark]: 1.398e-05 [recompute_prepare]: 8.32e-06 [updatestate_depend_eliminate]: 4.57e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 7.545e-05 [accelerated_algorithm]: 6.81999e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.3e-06 [auto_parallel]: 5.59e-06 [parallel]: 2.491e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.58999e-06 [virtual_dataset]: 6.10002e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 1.022e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 
[merge_recompute_call_nodes]: 1.81e-06 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.18002e-06 [receive_attached]: 2.59999e-06 [after_resolve]: 1.116e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.00040824 [add_forward_monad_depend]: 4.90999e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.331e-05 [cse]: 2.968e-05 [a_3]: 3.939e-05 [Cycle 2]: 0.00058486, [45] [expand_dump_flag]: 8.00006e-07 [switch_simplify]: 7.04001e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.0001249 [with_stream_mark]: 9.42999e-06 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.794e-05 [accelerated_algorithm]: 5.74e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.07998e-06 [auto_parallel]: 5.20001e-06 [parallel]: 3.97e-06 [flash_sp]: 3.43e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.53998e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.33002e-06 [get_grad_eliminate_]: 5.06997e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 5.68002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.12001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 7.73001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.94001e-06 [cse]: 1.232e-05 [a_3]: 3.171e-05 [py_interpret_to_execute_after_opt_a]: 7.41001e-06 
[slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.163e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.72999e-06 [mutable_eliminate]: 0.00045106 [opt_b]: 0.0001844, [1] [Cycle 1]: 0.00017801, [7] [b_1]: 0.00010981 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.79001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [renormalize]: 3.50003e-07 [cse]: 1.622e-05 [optimize_parallel_all_gather_comm]: 1.635e-05 [overlap_param_gather]: 1.86003e-06 [cconv]: 2.244e-05 [loop_unroll]: 0.00041634 [opt_after_cconv]: 0.00012722, [1] [Cycle 1]: 0.00012152, [7] [c_1]: 5.755e-05 [parameter_eliminate]: 2.49999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.636e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.323e-05 [tuple_transform]: 6.894e-05, [1] [Cycle 1]: 6.433e-05, [4] [d_1]: 3.904e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.02999e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.243e-05 [cse_after_recomputation]: 2.018e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.045e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 5.57001e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.68001e-06 [label_fine_grained_interleaved_index]: 3.07002e-06 [merge_cast_opt]: 1.58002e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.33002e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 3.16001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.42999e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.91e-06 [control_data_broadcast_order]: 1.194e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.47002e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.72e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.679e-05, [1] [Cycle 1]: 6.279e-05, [6] [build]: 2.16e-06 [elim_shapecalc]: 8.17e-06 [elim_not_effective]: 1.122e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.634e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00044848 [validate]: 3.154e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.106802 [execute]: 9.81e-06 Sums bootstrap : 0.000575s : 0.49% type_inference : 0.006238s : 5.31% event_method : 0.000014s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000061s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000593s : 0.51% optimize.opt_a.with_stream_mark : 
0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.01% optimize.opt_a.renormalize : 0.000408s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.38% optimize.opt_b.b_1 : 0.000110s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000416s : 0.35% optimize.opt_after_cconv.c_1 : 0.000058s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.38% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.106802s : 90.89% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000174 30 14.76% : 0.000026s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 67.05% : 0.000117s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000005s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.54% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006189 2 90.90% : 0.005625s : 1: type_inference.infer 9.10% : 0.000563s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.36% : 0.000028s : 3: replace.inline 29.64% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000125 5 91.74% : 0.000115s : 3: match.inline 8.26% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.88% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.17% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000001s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.45% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.89% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000002s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.05% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.39% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.29% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 47.28% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.72% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131109 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.70% : 0.003537s : 1: add_attr 2.69% : 0.003526s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.47% : 0.000617s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000461s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.73% : 0.000961s : 78: opt.transform.opt_a 0.04% : 0.000056s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000093s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.62% : 0.002130s : 1: opt_a 0.10% : 0.000131s : 1: opt_after_cconv 0.35% : 0.000458s : 1: opt_after_jit_grad 0.14% : 0.000188s : 1: opt_b 3.06% : 0.004018s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000210s : 1: renormalize.infer 0.15% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000066s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000069s : 1: symbol_engine_optimizer 81.48% : 0.106825s : 1: task_emit 0.05% : 0.000072s : 1: 
tuple_transform 4.77% : 0.006251s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.111375, [24] [bootstrap]: 0.00050091 [type_inference]: 0.00456677 [event_method]: 1.122e-05 [auto_monad]: 5.26e-05 [graph_reusing]: 5.35999e-06 [inline]: 1.78002e-06 [add_attr]: 0.00298643, [1] [add_attr_with_inline]: 0.00297817, [1] [Cycle 1]: 4.219e-05, [2] [tag_attr]: 1.125e-05 [meta_addattr_fg_expand]: 3.46001e-06 [parallel-infer-symbol]: 3.00002e-06 [pre_auto_parallel]: 2.189e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 6.29982e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.003721, [53] [py_interpret_to_execute]: 1.502e-05 [rewriter_before_opt_a]: 3.975e-05 [opt_a]: 0.00192071, [2] [Cycle 1]: 0.00132505, [45] [expand_dump_flag]: 2.90998e-06 [switch_simplify]: 2.498e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.00035046 [with_stream_mark]: 1.396e-05 [recompute_prepare]: 7.68999e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 3.98999e-06 [parameter_eliminate]: 1.81998e-06 [a_2]: 7.754e-05 [accelerated_algorithm]: 6.30002e-06 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 8.79003e-06 [auto_parallel]: 6.06e-06 [parallel]: 1.914e-05 [flash_sp]: 7.10998e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 9.31998e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.30001e-06 [merge_forward]: 4.33999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.61e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.24001e-06 
[receive_attached]: 2.37001e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00034619 [add_forward_monad_depend]: 4.91002e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 2.992e-05 [a_3]: 4.049e-05 [Cycle 2]: 0.00058642, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.54999e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.0001247 [with_stream_mark]: 1.114e-05 [recompute_prepare]: 5.56998e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.745e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.32999e-06 [merge_send_recv]: 4.39002e-06 [auto_parallel]: 5.20999e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.58e-06 [merge_comm]: 2.90002e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.30001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 5.91998e-06 [virtual_dataset]: 4.99e-06 [get_grad_eliminate_]: 4.79998e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.41998e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13998e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.74003e-06 [a_after_grad]: 7.97e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 8.90024e-07 [auto_monad_eliminator]: 5.96e-06 [cse]: 1.302e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.14e-05 [convert_after_rewriter]: 7.10002e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00044679 [opt_b]: 0.00017997, 
[1] [Cycle 1]: 0.00017384, [7] [b_1]: 0.00010593 [b_2]: 7.1e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [renormalize]: 3.50003e-07 [cse]: 1.685e-05 [optimize_parallel_all_gather_comm]: 1.584e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.305e-05 [loop_unroll]: 0.0004092 [opt_after_cconv]: 9.409e-05, [1] [Cycle 1]: 8.855e-05, [7] [c_1]: 2.702e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 4.74998e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.687e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.31e-05 [tuple_transform]: 6.864e-05, [1] [Cycle 1]: 6.426e-05, [4] [d_1]: 3.897e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.87001e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 4.491e-05 [cse_after_recomputation]: 2.06e-05, [1] [Cycle 1]: 1.624e-05, [1] [cse]: 1.134e-05 [environ_conv]: 4.57998e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.53001e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.51998e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 3.30998e-06 [overlap_recompute_and_grad_model_parallel]: 4.23001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.766e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.751e-05, [1] [Cycle 1]: 6.355e-05, [6] [build]: 2.24999e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 1.654e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00044823 [validate]: 3.262e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0987654 [execute]: 1.045e-05 Sums bootstrap : 0.000501s : 0.47% type_inference : 0.004567s : 4.25% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000475s : 0.44% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000346s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.42% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000409s : 0.38% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000448s : 0.42% validate : 0.000033s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098765s : 91.94% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.47% : 0.000023s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.43% : 0.000005s : 4: substitution.graph_param_transform 65.61% : 0.000080s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.59% : 0.000004s : 4: substitution.remove_not_recompute_node 3.09% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004524 2 91.74% : 0.004150s : 1: type_inference.infer 8.26% : 0.000374s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.96% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.56% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.41% : 0.000001s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 2.12% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.65% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.97% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000267 6 42.49% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.51% : 0.000154s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119404 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.50% : 0.002991s : 1: add_attr 2.50% : 0.002982s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000541s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.02% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000418s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000826s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.61% : 0.001924s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000457s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.12% : 0.003725s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.15% : 0.000185s : 1: renormalize.infer 0.13% : 0.000155s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.73% : 0.098788s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.84% : 0.004582s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.112301, [24] [bootstrap]: 0.00052487 [type_inference]: 0.00566975 [event_method]: 1.449e-05 [auto_monad]: 5.635e-05 [graph_reusing]: 5.97999e-06 [inline]: 1.89999e-06 [add_attr]: 0.00299324, [1] [add_attr_with_inline]: 0.00298574, [1] [Cycle 1]: 4.769e-05, [2] [tag_attr]: 1.534e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 2.76999e-06 [pre_auto_parallel]: 2.626e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00396602, [53] [py_interpret_to_execute]: 2.082e-05 [rewriter_before_opt_a]: 6.049e-05 [opt_a]: 0.00211128, [2] [Cycle 1]: 0.00150674, [45] [expand_dump_flag]: 3.23998e-06 [switch_simplify]: 3.124e-05 [loop_unroll]: 2.07e-05 [a_1]: 0.00044796 [with_stream_mark]: 1.337e-05 [recompute_prepare]: 7.95998e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.45e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.621e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.54e-06 [auto_parallel]: 5.96998e-06 [parallel]: 1.876e-05 [flash_sp]: 7.21001e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.16001e-06 [matmul_add_comm_reduction]: 9.38002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.43e-06 [virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.76e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.53e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 1.064e-05 [a_after_grad]: 8.44998e-06 [renormalize]: 0.00041428 [add_forward_monad_depend]: 5.05001e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.852e-05 [a_3]: 4.149e-05 [Cycle 2]: 0.00059496, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.33e-06 [loop_unroll]: 5.76e-06 [a_1]: 0.00012427 [with_stream_mark]: 9.78002e-06 [recompute_prepare]: 5.69e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.27999e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.768e-05 [accelerated_algorithm]: 5.43002e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.12e-06 [auto_parallel]: 5.21002e-06 [parallel]: 4.07998e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.96001e-06 [matmul_add_comm_reduction]: 5.16002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.48998e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 4.91997e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.88998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.55001e-06 [a_after_grad]: 7.9e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.288e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 7.60998e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.347e-05 [convert_after_rewriter]: 7.35e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00045082 [opt_b]: 0.00017928, [1] [Cycle 1]: 0.00017307, [7] [b_1]: 
0.00010635 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 3.30008e-07 [cse]: 1.657e-05 [optimize_parallel_all_gather_comm]: 1.59e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 3.797e-05 [loop_unroll]: 0.00041666 [opt_after_cconv]: 9.369e-05, [1] [Cycle 1]: 8.792e-05, [7] [c_1]: 2.706e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.609e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 6.834e-05, [1] [Cycle 1]: 6.38e-05, [4] [d_1]: 3.832e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.416e-05 [cse_after_recomputation]: 2.041e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.099e-05 [environ_conv]: 4.98001e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.203e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.45998e-06 [overlap_recompute_and_grad_model_parallel]: 4.62998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44998e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.787e-05, [1] [Cycle 1]: 6.387e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.1e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.55001e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.578e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.30998e-06 [opt_after_jit_grad]: 0.00045131 [validate]: 3.128e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.0983037 [execute]: 9.81998e-06 Sums bootstrap : 0.000525s : 0.48% type_inference : 0.005670s : 5.23% event_method : 0.000014s : 0.01% auto_monad : 0.000056s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000060s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000572s : 0.53% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000414s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.42% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000038s : 0.04% optimize.loop_unroll : 0.000417s : 0.38% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.42% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098304s : 90.74% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.69% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 66.20% : 0.000109s : 3: substitution.inline 2.06% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.52% : 0.000004s : 4: substitution.replace_old_param 7.08% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005628 2 90.02% : 0.005066s : 1: type_inference.infer 9.98% : 0.000562s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.71% : 0.000028s : 3: replace.inline 29.29% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 90.95% : 0.000107s : 3: match.inline 9.05% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.55% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000002s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000002s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.15% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.99% : 0.000002s : 11: predicate.transpose_eliminate 1.59% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.60% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000355 8 47.38% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.62% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120763 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.48% : 0.002997s : 1: add_attr 2.48% : 0.002989s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.47% : 0.000563s : 1: bootstrap 0.03% : 0.000042s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.78% : 0.000938s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000088s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.75% : 0.002114s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000461s : 1: opt_after_jit_grad 0.15% : 0.000183s : 1: opt_b 3.29% : 0.003970s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.17% : 0.000204s : 1: renormalize.infer 0.17% : 0.000204s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000065s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.42% : 0.098326s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.71% : 0.005684s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.148944, [24] [bootstrap]: 0.00057662 [type_inference]: 0.0115568 [event_method]: 4.857e-05 [auto_monad]: 0.00015747 [graph_reusing]: 8.67e-06 [inline]: 2.09999e-06 [add_attr]: 0.0029995, [1] [add_attr_with_inline]: 0.00299134, [1] [Cycle 1]: 7.27e-05, [2] [tag_attr]: 3.567e-05 [meta_addattr_fg_expand]: 9.55001e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 5.032e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0135033, [53] [py_interpret_to_execute]: 3.881e-05 [rewriter_before_opt_a]: 0.000147 [opt_a]: 0.0111647, [3] [Cycle 1]: 0.0072051, [45] [expand_dump_flag]: 4.08999e-06 [switch_simplify]: 7.464e-05 [loop_unroll]: 6.16e-05 [a_1]: 0.00149013 [with_stream_mark]: 2.331e-05 [recompute_prepare]: 2.202e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 7.87e-06 [updatestate_loads_eliminate]: 7.44002e-06 [parameter_eliminate]: 2.66e-06 [a_2]: 0.00024585 [accelerated_algorithm]: 3.152e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.73001e-06 [shard_inline]: 1.631e-05 [merge_send_recv]: 1.649e-05 [auto_parallel]: 1.071e-05 [parallel]: 1.923e-05 [flash_sp]: 1.171e-05 [merge_comm]: 9.54999e-06 [allreduce_fusion]: 9.01998e-06 [matmul_add_comm_reduction]: 2.767e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.79e-05 [virtual_dataset]: 1.578e-05 [get_grad_eliminate_]: 1.526e-05 [virtual_output]: 1.537e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.833e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.883e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 2.768e-05 [set_forward_comm_id_for_comm_node_pass]: 9.72999e-06 [meta_fg_expand]: 0.00142546 [flash_sp_send_recv_attached]: 4.24002e-06 [receive_attached]: 2.80002e-06 [after_resolve]: 5.904e-05 
[a_after_grad]: 8.11e-05 [renormalize]: 0.00250626 [add_forward_monad_depend]: 9.00999e-06 [auto_monad_grad]: 4.82e-06 [auto_monad_eliminator]: 5.796e-05 [cse]: 0.00017035 [a_3]: 0.00033593 [Cycle 2]: 0.00304069, [45] [expand_dump_flag]: 1.53002e-06 [switch_simplify]: 4.648e-05 [loop_unroll]: 4.411e-05 [a_1]: 0.00156283 [with_stream_mark]: 1.239e-05 [recompute_prepare]: 1.064e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00012613 [accelerated_algorithm]: 1.198e-05 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.84998e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 6.77002e-06 [auto_parallel]: 7.25e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.17002e-06 [merge_comm]: 5.02999e-06 [allreduce_fusion]: 4.66002e-06 [matmul_add_comm_reduction]: 7.65e-06 [allreduce_slice_to_reducescatter]: 5.00004e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.57998e-06 [get_grad_eliminate_]: 8.93002e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 8.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.614e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.542e-05 [set_forward_comm_id_for_comm_node_pass]: 5.81e-06 [meta_fg_expand]: 7.179e-05 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.15999e-06 [after_resolve]: 1.658e-05 [a_after_grad]: 1.447e-05 [renormalize]: 0.00059543 [add_forward_monad_depend]: 4.13001e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.423e-05 [cse]: 4.637e-05 [a_3]: 6.497e-05 [Cycle 3]: 0.00090495, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.035e-05 [loop_unroll]: 8.78001e-06 [a_1]: 0.00024931 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 9.73002e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 3.97e-06 [updatestate_loads_eliminate]: 3.84002e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012317 [accelerated_algorithm]: 1.181e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.84998e-06 [shard_inline]: 9.04998e-06 [merge_send_recv]: 7.04001e-06 [auto_parallel]: 6.87002e-06 [parallel]: 4.74002e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 4.89e-06 [allreduce_fusion]: 4.90001e-06 [matmul_add_comm_reduction]: 7.92998e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.95002e-06 [virtual_dataset]: 8.60999e-06 [get_grad_eliminate_]: 8.40999e-06 [virtual_output]: 8.13999e-06 [merge_forward]: 4.45999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.587e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.53e-05 [set_forward_comm_id_for_comm_node_pass]: 5.91003e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.451e-05 [a_after_grad]: 1.416e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 1.07e-05 [cse]: 2.652e-05 [a_3]: 6.007e-05 [py_interpret_to_execute_after_opt_a]: 1.063e-05 [slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 4.906e-05 [convert_after_rewriter]: 9.39e-06 [order_py_execute_after_rewriter]: 7.13998e-06 [mutable_eliminate]: 0.00046387 [opt_b]: 0.00028688, [1] [Cycle 1]: 0.00028074, [7] [b_1]: 0.0001888 [b_2]: 1.046e-05 [updatestate_depend_eliminate]: 7.37002e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 3.97e-06 [renormalize]: 3.50003e-07 [cse]: 3.147e-05 [optimize_parallel_all_gather_comm]: 1.998e-05 [overlap_param_gather]: 1.78002e-06 [cconv]: 2.023e-05 [loop_unroll]: 0.00042534 [opt_after_cconv]: 0.00019783, [1] [Cycle 1]: 0.0001919, [7] [c_1]: 4.796e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 7.3e-06 [updatestate_assign_eliminate]: 4.07e-06 
[updatestate_loads_eliminate]: 4.04002e-06 [cse]: 3.086e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 2.952e-05 [tuple_transform]: 0.00010209, [1] [Cycle 1]: 9.729e-05, [4] [d_1]: 6.705e-05 [none_parameter_eliminate]: 1.82001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 2.25002e-06 [add_recomputation]: 5.806e-05 [cse_after_recomputation]: 3.212e-05, [1] [Cycle 1]: 2.721e-05, [1] [cse]: 2.162e-05 [environ_conv]: 9.55001e-06 [swap_dp_allreduce_reducescatter]: 7.91001e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.63999e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.69e-06 [slice_recompute_activation]: 2.51e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 1.15001e-06 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.18001e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.711e-05 [grouped_pairwise_exchange_alltoall]: 2.06e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 6.01e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.23002e-06 [overlap_grad_ring_attention]: 5.02999e-06 [overlap_grad_flash_sp]: 2.504e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.82999e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 9.766e-05, [1] [Cycle 1]: 9.363e-05, [6] [build]: 9.42999e-06 [elim_shapecalc]: 1.313e-05 [elim_not_effective]: 1.822e-05 [opt_reshape]: 9.87001e-06 [fold_const_symbol]: 1.494e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 2.665e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00046937 [validate]: 4.681e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.119245 [execute]: 9.97999e-06 Sums bootstrap : 0.000577s : 0.40% type_inference : 0.011557s : 7.99% event_method : 0.000049s : 0.03% auto_monad : 0.000157s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.09% optimize.opt_a.loop_unroll : 0.000114s : 0.08% optimize.opt_a.a_1 : 0.003302s : 2.28% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000019s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001500s : 1.04% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003102s : 2.14% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000243s : 0.17% optimize.opt_a.a_3 : 0.000461s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000464s : 0.32% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000425s : 0.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000469s : 0.32% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.119245s : 82.46% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000769 222 5.96% : 0.000046s : 12: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.33% : 0.000003s : 2: substitution.incorporate_call_switch 55.21% : 0.000425s : 17: substitution.inline 2.04% : 0.000016s : 2: substitution.inline_without_move 1.43% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.73% : 0.000013s : 20: substitution.remove_not_recompute_node 3.32% : 0.000026s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011479 2 86.54% : 0.009934s : 1: type_inference.infer 13.46% : 0.001545s : 1: type_inference.specialize ------[replace.] 0.000246 33 62.53% : 0.000154s : 17: replace.inline 37.47% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.34% : 0.000416s : 17: match.inline 7.66% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.07% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.12% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.28% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.11% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.39% : 0.000010s : 84: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001632 34 56.55% : 0.000923s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.45% : 0.000709s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.173843 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.73% : 0.003004s : 1: add_attr 1.72% : 0.002995s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000165s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.35% : 0.000615s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000055s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.25% : 0.000434s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000473s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.86% : 0.004972s : 117: opt.transform.opt_a 0.03% : 0.000046s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.42% : 0.011168s : 1: opt_a 0.12% : 0.000201s : 1: opt_after_cconv 0.28% : 0.000479s : 1: opt_after_jit_grad 0.17% : 0.000290s : 1: opt_b 7.77% : 0.013507s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.02% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.96% : 0.001663s : 2: renormalize.infer 0.82% : 0.001425s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000053s : 1: rewriter_after_opt_a 0.09% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000100s : 1: symbol_engine_optimizer 68.61% : 0.119267s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.66% : 0.011572s : 1: type_inference 0.04% : 0.000072s : 1: validate TotalTime = 0.106315, [24] [bootstrap]: 0.00048812 [type_inference]: 0.0044111 [event_method]: 1.132e-05 [auto_monad]: 5.265e-05 [graph_reusing]: 5.96e-06 [inline]: 1.81e-06 [add_attr]: 0.00299547, [1] [add_attr_with_inline]: 0.00298783, [1] [Cycle 1]: 4.835e-05, [2] [tag_attr]: 1.322e-05 [meta_addattr_fg_expand]: 3.25998e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 2.179e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0037054, [53] [py_interpret_to_execute]: 1.601e-05 [rewriter_before_opt_a]: 3.887e-05 [opt_a]: 0.00189083, [2] [Cycle 1]: 0.00126866, [45] [expand_dump_flag]: 2.99001e-06 [switch_simplify]: 2.453e-05 [loop_unroll]: 1.394e-05 [a_1]: 0.00029497 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 7.85e-06 [updatestate_depend_eliminate]: 3.40998e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 3.78001e-06 [parameter_eliminate]: 2.23002e-06 [a_2]: 7.828e-05 [accelerated_algorithm]: 6.01e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.44998e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 5.85002e-06 [parallel]: 1.856e-05 [flash_sp]: 7.74002e-06 [merge_comm]: 3.54002e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.50001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.31001e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.088e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.62999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 2.56e-06 
[after_resolve]: 1.063e-05 [a_after_grad]: 8.67e-06 [renormalize]: 0.00034558 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.954e-05 [a_3]: 3.98e-05 [Cycle 2]: 0.00061291, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.0001475 [with_stream_mark]: 9.31998e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 6.735e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.65001e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.25e-06 [flash_sp]: 3.98001e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.70997e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.10002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59999e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.12001e-06 [a_after_grad]: 7.88001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.01e-06 [cse]: 1.366e-05 [a_3]: 3.16e-05 [py_interpret_to_execute_after_opt_a]: 7.15e-06 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.305e-05 [convert_after_rewriter]: 7.71999e-06 [order_py_execute_after_rewriter]: 5.53002e-06 [mutable_eliminate]: 0.00044614 [opt_b]: 0.00017932, [1] [Cycle 1]: 0.00017342, [7] [b_1]: 
0.00010726 [b_2]: 6.84999e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.60014e-07 [cse]: 1.62e-05 [optimize_parallel_all_gather_comm]: 1.645e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.329e-05 [loop_unroll]: 0.00041291 [opt_after_cconv]: 9.435e-05, [1] [Cycle 1]: 8.884e-05, [7] [c_1]: 2.766e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.666e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.359e-05 [tuple_transform]: 6.872e-05, [1] [Cycle 1]: 6.459e-05, [4] [d_1]: 3.952e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 4.584e-05 [cse_after_recomputation]: 2.059e-05, [1] [Cycle 1]: 1.621e-05, [1] [cse]: 1.117e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.44e-06 [full_micro_interleaved_order_control]: 2.63e-06 [reorder_send_recv_between_fp_bp]: 2.98998e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.27999e-06 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09999e-06 [control_data_broadcast_order]: 1.205e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.81999e-06 [overlap_recompute_and_grad_model_parallel]: 4.64002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.59e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.696e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 6.935e-05, [1] [Cycle 1]: 6.523e-05, [6] [build]: 2.94001e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.654e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00044443 [validate]: 3.203e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0938878 [execute]: 1.023e-05 Sums bootstrap : 0.000488s : 0.48% type_inference : 0.004411s : 4.31% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000442s : 0.43% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000346s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.44% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000413s : 0.40% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.43% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.093888s : 91.73% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000123 26 18.24% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.89% : 0.000006s : 4: substitution.graph_param_transform 65.49% : 0.000081s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004370 2 91.80% : 0.004012s : 1: type_inference.infer 8.20% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000004s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.00% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.43% : 0.000001s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.71% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.67% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 43.34% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.66% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114310 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.62% : 0.003000s : 1: add_attr 2.62% : 0.002991s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.46% : 0.000525s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000455s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.70% : 0.000795s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.66% : 0.001894s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.40% : 0.000454s : 1: opt_after_jit_grad 0.16% : 0.000183s : 1: opt_b 3.24% : 0.003709s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.17% : 0.000190s : 1: renormalize.infer 0.13% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 82.15% : 0.093911s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.87% : 0.004425s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.148965, [24] [bootstrap]: 0.00053311 [type_inference]: 0.0107354 [event_method]: 4.577e-05 [auto_monad]: 0.00012313 [graph_reusing]: 7.93001e-06 [inline]: 2.30002e-06 [add_attr]: 0.00308407, [1] [add_attr_with_inline]: 0.00307508, [1] [Cycle 1]: 7.002e-05, [2] [tag_attr]: 3.438e-05 [meta_addattr_fg_expand]: 8.92e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 4.75e-05 [insert-virtual-dataset]: 2.79001e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.31998e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.0135178, [53] [py_interpret_to_execute]: 3.697e-05 [rewriter_before_opt_a]: 0.00012825 [opt_a]: 0.0112226, [3] [Cycle 1]: 0.00723498, [45] [expand_dump_flag]: 3.84002e-06 [switch_simplify]: 6.852e-05 [loop_unroll]: 5.566e-05 [a_1]: 0.00140353 [with_stream_mark]: 2.292e-05 [recompute_prepare]: 2.213e-05 [updatestate_depend_eliminate]: 9.36e-06 [updatestate_assign_eliminate]: 7.93999e-06 [updatestate_loads_eliminate]: 8.05e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.00025005 [accelerated_algorithm]: 3.292e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.56999e-06 [shard_inline]: 1.644e-05 [merge_send_recv]: 1.612e-05 [auto_parallel]: 1.119e-05 [parallel]: 2.007e-05 [flash_sp]: 1.196e-05 [merge_comm]: 9.76e-06 [allreduce_fusion]: 9.19e-06 [matmul_add_comm_reduction]: 2.772e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.93e-05 [virtual_dataset]: 1.588e-05 [get_grad_eliminate_]: 1.566e-05 [virtual_output]: 1.605e-05 [merge_forward]: 9.99001e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 1.852e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.924e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 2.791e-05 [set_forward_comm_id_for_comm_node_pass]: 1.02e-05 [meta_fg_expand]: 0.00149424 [flash_sp_send_recv_attached]: 3.86999e-06 [receive_attached]: 2.90002e-06 [after_resolve]: 
6.017e-05 [a_after_grad]: 8.326e-05 [renormalize]: 0.00253074 [add_forward_monad_depend]: 9.47999e-06 [auto_monad_grad]: 5.47999e-06 [auto_monad_eliminator]: 5.935e-05 [cse]: 0.00017584 [a_3]: 0.00034401 [Cycle 2]: 0.00306004, [45] [expand_dump_flag]: 1.77999e-06 [switch_simplify]: 4.79e-05 [loop_unroll]: 4.501e-05 [a_1]: 0.00156242 [with_stream_mark]: 1.195e-05 [recompute_prepare]: 1.136e-05 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 3.93999e-06 [parameter_eliminate]: 1.30001e-06 [a_2]: 0.00015787 [accelerated_algorithm]: 1.27e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 2.11e-06 [shard_inline]: 9.52999e-06 [merge_send_recv]: 7.08e-06 [auto_parallel]: 8.02e-06 [parallel]: 5.34e-06 [flash_sp]: 3.79002e-06 [merge_comm]: 5.69e-06 [allreduce_fusion]: 5.05999e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 9.28002e-06 [get_grad_eliminate_]: 9.10999e-06 [virtual_output]: 8.58001e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.722e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.42e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27001e-06 [meta_fg_expand]: 3.699e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.475e-05 [a_after_grad]: 1.478e-05 [renormalize]: 0.00060372 [add_forward_monad_depend]: 3.90998e-06 [auto_monad_grad]: 1.23002e-06 [auto_monad_eliminator]: 1.467e-05 [cse]: 4.877e-05 [a_3]: 6.62e-05 [Cycle 3]: 0.00091317, [45] [expand_dump_flag]: 9.90025e-07 [switch_simplify]: 1.098e-05 [loop_unroll]: 9.05001e-06 [a_1]: 0.00025524 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 9.44e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.94002e-06 [updatestate_loads_eliminate]: 3.8e-06 
[parameter_eliminate]: 8.39995e-07 [a_2]: 0.0001251 [accelerated_algorithm]: 1.175e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.86998e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 7.25e-06 [auto_parallel]: 7.31001e-06 [parallel]: 4.50999e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 5.35999e-06 [allreduce_fusion]: 5.12999e-06 [matmul_add_comm_reduction]: 7.94002e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.036e-05 [virtual_dataset]: 8.62e-06 [get_grad_eliminate_]: 8.67998e-06 [virtual_output]: 8.47e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.611e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 3.21001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.347e-05 [a_after_grad]: 1.441e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.131e-05 [cse]: 2.747e-05 [a_3]: 6.075e-05 [py_interpret_to_execute_after_opt_a]: 1.059e-05 [slice_cell_reuse_recomputed_activation]: 1.98002e-06 [rewriter_after_opt_a]: 4.805e-05 [convert_after_rewriter]: 9.20001e-06 [order_py_execute_after_rewriter]: 7.21001e-06 [mutable_eliminate]: 0.00046242 [opt_b]: 0.00029405, [1] [Cycle 1]: 0.00028707, [7] [b_1]: 0.00019325 [b_2]: 1.106e-05 [updatestate_depend_eliminate]: 7.26001e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.99002e-06 [renormalize]: 3.70026e-07 [cse]: 3.243e-05 [optimize_parallel_all_gather_comm]: 2.058e-05 [overlap_param_gather]: 2.46e-06 [cconv]: 2.072e-05 [loop_unroll]: 0.00042494 [opt_after_cconv]: 0.00013708, [1] [Cycle 1]: 0.00013107, [7] [c_1]: 4.85e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.35998e-06 [updatestate_assign_eliminate]: 4.2e-06 
[updatestate_loads_eliminate]: 3.80998e-06 [cse]: 3.082e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 3.104e-05 [tuple_transform]: 0.00010205, [1] [Cycle 1]: 9.751e-05, [4] [d_1]: 6.717e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.008e-05 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5.976e-05 [cse_after_recomputation]: 3.351e-05, [1] [Cycle 1]: 2.846e-05, [1] [cse]: 2.3e-05 [environ_conv]: 9.07001e-06 [swap_dp_allreduce_reducescatter]: 8.45001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.00998e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.42999e-06 [full_micro_interleaved_order_control]: 2.76e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.14003e-06 [interleave_split_concat_branches]: 1.39e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.806e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 5.20001e-06 [overlap_recompute_and_grad_model_parallel]: 5.82001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.72001e-06 [overlap_recompute_comm]: 2.73998e-06 [overlap_grad_ring_attention]: 5.47001e-06 [overlap_grad_flash_sp]: 2.441e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 0.00010147, [1] [Cycle 1]: 9.722e-05, [6] [build]: 1.06e-05 [elim_shapecalc]: 1.411e-05 [elim_not_effective]: 1.846e-05 [opt_reshape]: 1.044e-05 [fold_const_symbol]: 1.529e-05 [renormalize]: 2.20025e-07 
[detach_backward]: 1.51998e-06 [pipeline_parallel_scheduler]: 1.51998e-06 [auto_monad_reorder]: 2.605e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00046991 [validate]: 4.552e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.120069 [execute]: 9.51e-06 Sums bootstrap : 0.000533s : 0.37% type_inference : 0.010735s : 7.43% event_method : 0.000046s : 0.03% auto_monad : 0.000123s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000128s : 0.09% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000127s : 0.09% optimize.opt_a.loop_unroll : 0.000110s : 0.08% optimize.opt_a.a_1 : 0.003221s : 2.23% optimize.opt_a.with_stream_mark : 0.000044s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000533s : 0.37% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000027s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.02% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001534s : 1.06% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000112s : 0.08% optimize.opt_a.renormalize : 0.003135s : 2.17% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.06% optimize.opt_a.cse : 0.000252s : 0.17% optimize.opt_a.a_3 : 0.000471s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000462s : 0.32% optimize.opt_b.b_1 : 0.000193s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000425s : 0.29% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000470s : 0.33% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.120069s : 83.05% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000755 218 5.76% : 0.000043s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.56% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.48% : 0.000004s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.18% : 0.000417s : 16: substitution.inline 2.14% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.14% : 0.000016s : 3: substitution.less_batch_normalization 1.88% : 0.000014s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000014s : 20: substitution.remove_not_recompute_node 3.31% : 0.000025s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.73% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.19% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.48% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010665 2 86.02% : 0.009175s : 1: type_inference.infer 13.98% : 0.001491s : 1: type_inference.specialize ------[replace.] 0.000210 30 58.63% : 0.000123s : 16: replace.inline 41.37% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000438 30 93.10% : 0.000408s : 16: match.inline 6.90% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.24% : 0.000017s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000042s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000015s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.10% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.54% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001693 32 57.18% : 0.000968s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.82% : 0.000725s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.173965 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.78% : 0.003088s : 1: add_attr 1.77% : 0.003079s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000130s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000572s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000053s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000024s : 1: micro_interleaved_order_control 0.27% : 0.000471s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.84% : 0.004937s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000178s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000055s : 4: opt.transform.symbol_engine_opt 6.45% : 0.011226s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.28% : 0.000480s : 1: opt_after_jit_grad 0.17% : 0.000298s : 1: opt_b 7.77% : 0.013522s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000052s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000035s : 1: remove_dup_value 0.96% : 0.001663s : 2: renormalize.infer 0.84% : 0.001458s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.08% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000104s : 1: symbol_engine_optimizer 69.03% : 0.120093s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.18% : 0.010751s : 1: type_inference 0.04% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x0-ge],max_mem:12.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x1-pynative],max_mem:12.0M TotalTime = 0.0222892, [24] [bootstrap]: 0.00058741 [type_inference]: 0.00647476 [event_method]: 1.5e-05 [auto_monad]: 6.146e-05 [graph_reusing]: 5.76998e-06 [inline]: 1.83002e-06 [add_attr]: 0.0034485, [1] [add_attr_with_inline]: 0.00343828, [1] [Cycle 1]: 4.645e-05, [2] [tag_attr]: 1.592e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 3.327e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00406428, [53] [py_interpret_to_execute]: 2.114e-05 [rewriter_before_opt_a]: 6.073e-05 [opt_a]: 0.00217241, [2] [Cycle 1]: 0.00156299, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 3.299e-05 [loop_unroll]: 2.101e-05 [a_1]: 0.00047038 [with_stream_mark]: 1.407e-05 [recompute_prepare]: 7.96001e-06 [updatestate_depend_eliminate]: 4.16001e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 3.17002e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.65e-05 [accelerated_algorithm]: 6.69001e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 6.84999e-06 [merge_send_recv]: 8.29002e-06 [auto_parallel]: 6.09001e-06 [parallel]: 2.483e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.73001e-06 [virtual_dataset]: 6.02001e-06 
[get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.94002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 9.59999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.42001e-06 [receive_attached]: 2.66e-06 [after_resolve]: 1.097e-05 [a_after_grad]: 9.03002e-06 [renormalize]: 0.00043456 [add_forward_monad_depend]: 4.60001e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.403e-05 [cse]: 2.783e-05 [a_3]: 4.144e-05 [Cycle 2]: 0.00060001, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.66998e-06 [a_1]: 0.00012786 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.87001e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.957e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.26002e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.12e-06 [flash_sp]: 3.2e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.38003e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 5.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.007e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.12002e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.87999e-06 [a_after_grad]: 8.15e-06 [renormalize]: 6.99947e-08 
[add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.339e-05 [a_3]: 3.27e-05 [py_interpret_to_execute_after_opt_a]: 7.49002e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 2.967e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 5.30999e-06 [mutable_eliminate]: 0.0004519 [opt_b]: 0.00018755, [1] [Cycle 1]: 0.00018133, [7] [b_1]: 0.00011298 [b_2]: 7.46999e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 3.50003e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.637e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.32e-05 [loop_unroll]: 0.00041622 [opt_after_cconv]: 9.623e-05, [1] [Cycle 1]: 9.056e-05, [7] [c_1]: 2.849e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.631e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.336e-05 [tuple_transform]: 7.131e-05, [1] [Cycle 1]: 6.68e-05, [4] [d_1]: 4.07e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 5.145e-05 [cse_after_recomputation]: 2.073e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.111e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.66e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.58001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.52001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 
1.29e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.185e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.62e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.45002e-06 [overlap_grad_ring_attention]: 2.82e-05 [overlap_grad_flash_sp]: 1.775e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 7.067e-05, [1] [Cycle 1]: 6.66e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.70001e-06 [elim_not_effective]: 1.231e-05 [opt_reshape]: 6.37001e-06 [fold_const_symbol]: 9.15999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.11002e-06 [rewriter_after_jit_bprop_graph]: 0.00014027 [opt_after_jit_grad]: 0.00046091 [validate]: 3.213e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.00669445 [execute]: 7.33e-06 Sums bootstrap : 0.000587s : 3.29% type_inference : 0.006475s : 36.29% event_method : 0.000015s : 0.08% auto_monad : 0.000061s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000033s : 0.19% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000061s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000598s : 3.35% optimize.opt_a.with_stream_mark : 0.000024s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000146s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000435s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000041s : 0.23% optimize.opt_a.a_3 : 0.000074s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.53% optimize.opt_b.b_1 : 0.000113s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000416s : 2.33% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000028s : 0.16% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000140s : 0.79% opt_after_jit_grad : 0.000461s : 2.58% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.006694s : 37.52% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000172 30 14.67% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.53% : 0.000006s : 4: substitution.graph_param_transform 66.94% : 0.000115s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000005s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 4: substitution.replace_old_param 6.52% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006428 2 89.93% : 0.005781s : 1: type_inference.infer 10.07% : 0.000647s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.79% : 0.000028s : 3: replace.inline 30.21% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.70% : 0.000113s : 3: match.inline 8.30% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 1.02% : 0.000002s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.32% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000002s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.57% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000400 8 46.05% : 0.000184s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.95% : 0.000216s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031347 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.01% : 0.003453s : 1: add_attr 10.98% : 0.003442s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000067s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.00% : 0.000628s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000972s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.94% : 0.002175s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.50% : 0.000471s : 1: opt_after_jit_grad 0.61% : 0.000191s : 1: opt_b 12.98% : 0.004068s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.10% : 0.000032s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.12% : 0.000039s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.71% : 0.000223s : 1: renormalize.infer 0.65% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.46% : 0.000146s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000073s : 1: symbol_engine_optimizer 21.39% : 0.006705s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 20.70% : 0.006489s : 1: type_inference 0.20% : 0.000063s : 1: validate TotalTime = 0.0186751, [24] [bootstrap]: 0.00049163 [type_inference]: 0.00451022 [event_method]: 1.04e-05 [auto_monad]: 5.298e-05 [graph_reusing]: 5.27999e-06 [inline]: 2.04e-06 [add_attr]: 0.00301779, [1] [add_attr_with_inline]: 0.00301005, [1] [Cycle 1]: 4.635e-05, [2] [tag_attr]: 1.229e-05 [meta_addattr_fg_expand]: 3.4e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.235e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 6.99976e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00375915, [53] [py_interpret_to_execute]: 1.552e-05 [rewriter_before_opt_a]: 3.982e-05 [opt_a]: 0.00192644, [2] [Cycle 1]: 0.00131547, [45] [expand_dump_flag]: 3.07002e-06 [switch_simplify]: 2.528e-05 [loop_unroll]: 1.431e-05 [a_1]: 0.00030325 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.74002e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 3.56999e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 7.916e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 6.28e-06 [merge_send_recv]: 8.01001e-06 [auto_parallel]: 6.03002e-06 [parallel]: 1.922e-05 [flash_sp]: 7.51001e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 8.08999e-06 [virtual_dataset]: 5.97999e-06 [get_grad_eliminate_]: 6.31e-06 [virtual_output]: 5.91e-06 [merge_forward]: 4.02002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.028e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.193e-05 [merge_recompute_call_nodes]: 1.86e-06 [before_grad]: 9.87001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76999e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.54999e-06 
[after_resolve]: 1.128e-05 [a_after_grad]: 9.02999e-06 [renormalize]: 0.00034685 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.343e-05 [cse]: 2.804e-05 [a_3]: 4.005e-05 [Cycle 2]: 0.00060175, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 7.38999e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.00012765 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.909e-05 [accelerated_algorithm]: 5.84999e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.35999e-06 [parallel]: 4.53999e-06 [flash_sp]: 3.33e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82001e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.30999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 9.56e-06 [a_after_grad]: 8.51002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.37001e-06 [cse]: 1.246e-05 [a_3]: 3.256e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.147e-05 [convert_after_rewriter]: 7.71001e-06 [order_py_execute_after_rewriter]: 5.67999e-06 [mutable_eliminate]: 0.00045417 [opt_b]: 0.00018439, [1] [Cycle 1]: 0.00017845, [7] 
[b_1]: 0.00010948 [b_2]: 7.41001e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.30002e-06 [renormalize]: 4.00003e-07 [cse]: 1.585e-05 [optimize_parallel_all_gather_comm]: 1.625e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.217e-05 [loop_unroll]: 0.00041637 [opt_after_cconv]: 9.515e-05, [1] [Cycle 1]: 8.981e-05, [7] [c_1]: 2.856e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.41998e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.578e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.314e-05 [tuple_transform]: 7.045e-05, [1] [Cycle 1]: 6.622e-05, [4] [d_1]: 4.025e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 4.347e-05 [cse_after_recomputation]: 2.075e-05, [1] [Cycle 1]: 1.617e-05, [1] [cse]: 1.068e-05 [environ_conv]: 4.42e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 8.60018e-07 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.42e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.229e-05 [grouped_pairwise_exchange_alltoall]: 2.11e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.68001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.858e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.08002e-06 [split_layernorm_comm]: 1.91998e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 6.962e-05, [1] [Cycle 1]: 6.527e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.199e-05 [opt_reshape]: 6.04999e-06 [fold_const_symbol]: 9.20001e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.623e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.0004487 [validate]: 3.239e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00608292 [execute]: 7.08e-06 Sums bootstrap : 0.000492s : 3.35% type_inference : 0.004510s : 30.76% event_method : 0.000010s : 0.07% auto_monad : 0.000053s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000431s : 2.94% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000148s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000347s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000454s : 3.10% optimize.opt_b.b_1 : 0.000109s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.84% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000019s : 0.13% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.06% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006083s : 41.49% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000125 26 18.67% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.69% : 0.000006s : 4: substitution.graph_param_transform 65.16% : 0.000081s : 2: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.44% : 0.000004s : 4: substitution.remove_not_recompute_node 3.28% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004467 2 91.43% : 0.004084s : 1: type_inference.infer 8.57% : 0.000383s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000140 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.21% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000004s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.42% : 0.000001s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.33% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.95% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.96% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.97% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.19% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000255 6 41.84% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.16% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026754 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.30% : 0.003022s : 1: add_attr 11.26% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.98% : 0.000529s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000425s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000794s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.21% : 0.001929s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.71% : 0.000459s : 1: opt_after_jit_grad 0.70% : 0.000188s : 1: opt_b 14.07% : 0.003763s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000022s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.71% : 0.000191s : 1: renormalize.infer 0.56% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.77% : 0.006093s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.91% : 0.004524s : 1: type_inference 0.22% : 0.000059s : 1: validate TotalTime = 0.0200734, [24] [bootstrap]: 0.00049939 [type_inference]: 0.00566418 [event_method]: 1.41e-05 [auto_monad]: 5.795e-05 [graph_reusing]: 5.56e-06 [inline]: 2.06e-06 [add_attr]: 0.00302424, [1] [add_attr_with_inline]: 0.00301624, [1] [Cycle 1]: 4.674e-05, [2] [tag_attr]: 1.562e-05 [meta_addattr_fg_expand]: 4.54998e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.607e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00401898, [53] [py_interpret_to_execute]: 2.021e-05 [rewriter_before_opt_a]: 5.989e-05 [opt_a]: 0.00213369, [2] [Cycle 1]: 0.00152934, [45] [expand_dump_flag]: 3.26001e-06 [switch_simplify]: 3.287e-05 [loop_unroll]: 2.182e-05 [a_1]: 0.00046122 [with_stream_mark]: 1.357e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.654e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 1.95001e-06 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 5.86003e-06 [merge_send_recv]: 8.13001e-06 [auto_parallel]: 6.11e-06 [parallel]: 1.924e-05 [flash_sp]: 8.03001e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.69002e-06 [matmul_add_comm_reduction]: 9.72001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 4.10998e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 1.03e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.57001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.81999e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 2.41998e-06 [receive_attached]: 2.12001e-06 
[after_resolve]: 1.094e-05 [a_after_grad]: 8.77e-06 [renormalize]: 0.000415 [add_forward_monad_depend]: 4.83001e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.377e-05 [cse]: 2.812e-05 [a_3]: 4.184e-05 [Cycle 2]: 0.00059506, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 7.09001e-06 [loop_unroll]: 5.89e-06 [a_1]: 0.00012598 [with_stream_mark]: 1.042e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.751e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 4.70001e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.46002e-06 [flash_sp]: 3.33e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 5.30999e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.82998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 7.81001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.615e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 7.21001e-06 [slice_cell_reuse_recomputed_activation]: 2.76e-06 [rewriter_after_opt_a]: 3.117e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.20999e-06 [mutable_eliminate]: 0.00048059 [opt_b]: 0.00018287, [1] [Cycle 1]: 0.00017708, [7] [b_1]: 
0.00010863 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.43998e-06 [renormalize]: 7.7e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.648e-05 [overlap_param_gather]: 2.10002e-06 [cconv]: 2.404e-05 [loop_unroll]: 0.00042053 [opt_after_cconv]: 9.58e-05, [1] [Cycle 1]: 8.998e-05, [7] [c_1]: 2.87e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.37999e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.554e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.28e-05 [tuple_transform]: 7.132e-05, [1] [Cycle 1]: 6.703e-05, [4] [d_1]: 4.101e-05 [none_parameter_eliminate]: 1.44003e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.461e-05 [cse_after_recomputation]: 2.028e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.062e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.18002e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 3.06001e-06 [comm_op_add_attrs]: 1.34e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.204e-05 [grouped_pairwise_exchange_alltoall]: 1.94e-06 [offloading_packed_experts]: 3.50998e-06 [overlap_recompute_and_grad_model_parallel]: 4.25e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.54e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.25002e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.812e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.23998e-06 [split_layernorm_comm]: 2.12999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.939e-05, [1] [Cycle 1]: 6.53e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.243e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.684e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00045487 [validate]: 3.079e-05 [backend_pass]: 1.07998e-06 [task_emit]: 0.00603222 [execute]: 7.11999e-06 Sums bootstrap : 0.000499s : 3.10% type_inference : 0.005664s : 35.20% event_method : 0.000014s : 0.09% auto_monad : 0.000058s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000060s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000028s : 0.17% optimize.opt_a.a_1 : 0.000587s : 3.65% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000415s : 2.58% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000481s : 2.99% optimize.opt_b.b_1 : 0.000109s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000421s : 2.61% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000455s : 2.83% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006032s : 37.49% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000172 30 14.61% : 0.000025s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.50% : 0.000006s : 4: substitution.graph_param_transform 66.32% : 0.000114s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.62% : 0.000004s : 4: substitution.replace_old_param 6.77% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005622 2 89.88% : 0.005052s : 1: type_inference.infer 10.12% : 0.000569s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.58% : 0.000028s : 3: replace.inline 30.42% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.42% : 0.000112s : 3: match.inline 8.58% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.04% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.64% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 46.49% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.51% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028650 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.57% : 0.003029s : 1: add_attr 10.54% : 0.003020s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000063s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000538s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000490s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.000956s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.46% : 0.002137s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.62% : 0.000464s : 1: opt_after_jit_grad 0.65% : 0.000186s : 1: opt_b 14.04% : 0.004023s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000210s : 1: renormalize.infer 0.69% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000072s : 1: symbol_engine_optimizer 21.09% : 0.006042s : 1: task_emit 0.26% : 0.000074s : 1: 
tuple_transform 19.82% : 0.005678s : 1: type_inference 0.20% : 0.000058s : 1: validate TotalTime = 0.0382527, [24] [bootstrap]: 0.00062086 [type_inference]: 0.0117073 [event_method]: 4.675e-05 [auto_monad]: 0.00012761 [graph_reusing]: 9.12999e-06 [inline]: 1.95001e-06 [add_attr]: 0.00303403, [1] [add_attr_with_inline]: 0.00302445, [1] [Cycle 1]: 7.225e-05, [2] [tag_attr]: 3.528e-05 [meta_addattr_fg_expand]: 9.57999e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 5.033e-05 [insert-virtual-dataset]: 2.80002e-06 [parallel-infer-symbol-second]: 8.19971e-07 [dataset_repeat_opt]: 2.39999e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0134554, [53] [py_interpret_to_execute]: 3.881e-05 [rewriter_before_opt_a]: 0.00014677 [opt_a]: 0.0111184, [3] [Cycle 1]: 0.00713752, [45] [expand_dump_flag]: 4.43999e-06 [switch_simplify]: 7.562e-05 [loop_unroll]: 9.319e-05 [a_1]: 0.00146366 [with_stream_mark]: 2.65e-05 [recompute_prepare]: 2.215e-05 [updatestate_depend_eliminate]: 9.58002e-06 [updatestate_assign_eliminate]: 8.12998e-06 [updatestate_loads_eliminate]: 7.08e-06 [parameter_eliminate]: 2.64001e-06 [a_2]: 0.00024602 [accelerated_algorithm]: 3.108e-05 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 3.30003e-06 [shard_inline]: 1.602e-05 [merge_send_recv]: 1.597e-05 [auto_parallel]: 1.026e-05 [parallel]: 1.824e-05 [flash_sp]: 1.15e-05 [merge_comm]: 9.32999e-06 [allreduce_fusion]: 8.77e-06 [matmul_add_comm_reduction]: 2.684e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.746e-05 [virtual_dataset]: 1.599e-05 [get_grad_eliminate_]: 1.586e-05 [virtual_output]: 1.571e-05 [merge_forward]: 9.52999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.844e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.885e-05 [merge_recompute_call_nodes]: 1.70001e-06 [before_grad]: 2.741e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89999e-06 [meta_fg_expand]: 0.00141512 [flash_sp_send_recv_attached]: 3.70998e-06 [receive_attached]: 2.43e-06 
[after_resolve]: 6.02e-05 [a_after_grad]: 8.25e-05 [renormalize]: 0.0024484 [add_forward_monad_depend]: 9.65002e-06 [auto_monad_grad]: 5.02e-06 [auto_monad_eliminator]: 5.504e-05 [cse]: 0.00016489 [a_3]: 0.0003371 [Cycle 2]: 0.00306213, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.734e-05 [loop_unroll]: 4.444e-05 [a_1]: 0.00158766 [with_stream_mark]: 1.199e-05 [recompute_prepare]: 1.089e-05 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 4.42e-06 [updatestate_loads_eliminate]: 3.56999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012587 [accelerated_algorithm]: 1.193e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.50001e-06 [merge_send_recv]: 6.60002e-06 [auto_parallel]: 7.17002e-06 [parallel]: 4.85001e-06 [flash_sp]: 3.28e-06 [merge_comm]: 5.12999e-06 [allreduce_fusion]: 4.53999e-06 [matmul_add_comm_reduction]: 7.95998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 9.00001e-06 [get_grad_eliminate_]: 9.04e-06 [virtual_output]: 8.35999e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 9.10019e-07 [offload_activation]: 9.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.723e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14998e-06 [meta_fg_expand]: 6.98e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 1.65e-05 [a_after_grad]: 1.488e-05 [renormalize]: 0.00058917 [add_forward_monad_depend]: 3.99002e-06 [auto_monad_grad]: 1.59e-06 [auto_monad_eliminator]: 1.478e-05 [cse]: 4.707e-05 [a_3]: 6.629e-05 [Cycle 3]: 0.00090458, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.065e-05 [loop_unroll]: 9.11998e-06 [a_1]: 0.00025135 [with_stream_mark]: 1.01e-05 [recompute_prepare]: 9.12999e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 3.90998e-06 
[parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012366 [accelerated_algorithm]: 1.174e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 8.98002e-06 [merge_send_recv]: 7.11999e-06 [auto_parallel]: 7.19001e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 5.15999e-06 [allreduce_fusion]: 5.18002e-06 [matmul_add_comm_reduction]: 7.83999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 9.82001e-06 [virtual_dataset]: 8.91002e-06 [get_grad_eliminate_]: 8.51002e-06 [virtual_output]: 8.47998e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 8.68001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.601e-05 [merge_recompute_call_nodes]: 6.49976e-07 [before_grad]: 1.413e-05 [set_forward_comm_id_for_comm_node_pass]: 5.38002e-06 [meta_fg_expand]: 3.11001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.308e-05 [a_after_grad]: 1.414e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.21997e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 1.088e-05 [cse]: 2.622e-05 [a_3]: 5.976e-05 [py_interpret_to_execute_after_opt_a]: 9.98998e-06 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 5.277e-05 [convert_after_rewriter]: 9.31e-06 [order_py_execute_after_rewriter]: 7.23e-06 [mutable_eliminate]: 0.00046147 [opt_b]: 0.00028947, [1] [Cycle 1]: 0.00028323, [7] [b_1]: 0.0001909 [b_2]: 1.082e-05 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 4.07e-06 [renormalize]: 7.10017e-07 [cse]: 3.071e-05 [optimize_parallel_all_gather_comm]: 2.527e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.114e-05 [loop_unroll]: 0.00046814 [opt_after_cconv]: 0.00013691, [1] [Cycle 1]: 0.00013092, [7] [c_1]: 4.914e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 7.33999e-06 [updatestate_assign_eliminate]: 4.03001e-06 
[updatestate_loads_eliminate]: 3.77998e-06 [cse]: 2.961e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.783e-05 [tuple_transform]: 0.00010271, [1] [Cycle 1]: 9.795e-05, [4] [d_1]: 6.72e-05 [none_parameter_eliminate]: 1.87999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 1.006e-05 [partial_unused_args_eliminate]: 2.09e-06 [add_recomputation]: 5.756e-05 [cse_after_recomputation]: 3.227e-05, [1] [Cycle 1]: 2.742e-05, [1] [cse]: 2.122e-05 [environ_conv]: 8.74003e-06 [swap_dp_allreduce_reducescatter]: 8.08001e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.21002e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.698e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.04998e-06 [overlap_recompute_and_grad_model_parallel]: 5.47999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.27001e-06 [overlap_grad_ring_attention]: 5.47999e-06 [overlap_grad_flash_sp]: 2.488e-05 [begin_end_overlap_inline]: 7.7e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.44e-06 [symbol_engine_optimizer]: 0.00010002, [1] [Cycle 1]: 9.584e-05, [6] [build]: 1.052e-05 [elim_shapecalc]: 1.348e-05 [elim_not_effective]: 1.857e-05 [opt_reshape]: 1.01e-05 [fold_const_symbol]: 1.476e-05 [renormalize]: 
1.80007e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 2.491e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.98999e-06 [opt_after_jit_grad]: 0.00047095 [validate]: 4.523e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0084204 [execute]: 7.24001e-06 Sums bootstrap : 0.000621s : 1.83% type_inference : 0.011707s : 34.48% event_method : 0.000047s : 0.14% auto_monad : 0.000128s : 0.38% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000147s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000134s : 0.39% optimize.opt_a.loop_unroll : 0.000147s : 0.43% optimize.opt_a.a_1 : 0.003303s : 9.73% optimize.opt_a.with_stream_mark : 0.000049s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001488s : 4.38% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.26% optimize.opt_a.a_after_grad : 0.000112s : 0.33% optimize.opt_a.renormalize : 0.003038s : 8.95% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000238s : 0.70% optimize.opt_a.a_3 : 0.000463s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000053s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.36% optimize.opt_b.b_1 : 0.000191s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000468s : 1.38% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000471s : 1.39% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008420s : 24.80% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000768 222 5.94% : 0.000046s : 12: substitution.arithmetic_simplify 1.82% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.63% : 0.000427s : 17: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000025s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.52% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.67% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011630 2 86.53% : 0.010063s : 1: type_inference.infer 13.47% : 0.001567s : 1: type_inference.specialize ------[replace.] 0.000226 33 57.29% : 0.000129s : 17: replace.inline 42.71% : 0.000096s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000453 33 92.39% : 0.000418s : 17: match.inline 7.61% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000792 5764 1.02% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 68: predicate.addn_zero_filter 1.00% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.93% : 0.000015s : 100: predicate.arithmetic_simplify 1.07% : 0.000008s : 68: predicate.cast_eliminate 1.06% : 0.000008s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.13% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.04% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.12% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_depend_swap 1.64% : 0.000013s : 108: predicate.environ_get_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.23% : 0.000018s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.35% : 0.000042s : 249: predicate.inline 1.21% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.57% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.54% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.32% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.04% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.05% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.13% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.92% : 0.000015s : 101: predicate.partial_defer_inline 1.68% : 0.000013s : 92: predicate.partial_eliminate 1.00% : 0.000008s : 68: predicate.print_const_string_wrapper 0.50% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 7.68% : 0.000061s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.83% : 0.000014s : 152: predicate.replace_applicator 0.56% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.02% : 0.000008s : 68: predicate.reshape_eliminate 1.09% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.17% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.58% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.17% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.77% : 0.000014s : 101: predicate.switch_defer_inline 2.78% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.79% : 
0.000038s : 277: predicate.switch_simplify 1.03% : 0.000008s : 68: predicate.tile_eliminate 1.01% : 0.000008s : 68: predicate.transpose_eliminate 1.39% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.86% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.54% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.50% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.07% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.53% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.12% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001583 34 57.96% : 0.000918s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.04% : 0.000665s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063113 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.81% : 0.003038s : 1: add_attr 4.80% : 0.003028s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000135s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.04% : 0.000658s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.76% : 0.000477s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.94% : 0.005009s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.62% : 0.011121s : 1: opt_a 0.22% : 0.000140s : 1: opt_after_cconv 0.76% : 0.000481s : 1: opt_after_jit_grad 0.46% : 0.000293s : 1: opt_b 21.33% : 0.013459s : 1: optimize 0.05% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.63% : 0.001661s : 2: renormalize.infer 2.16% : 0.001365s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000057s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000103s : 1: symbol_engine_optimizer 13.36% : 0.008431s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.57% : 0.011722s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0188737, [24] [bootstrap]: 0.00048818 [type_inference]: 0.0044028 [event_method]: 1.124e-05 [auto_monad]: 5.501e-05 [graph_reusing]: 5.00001e-06 [inline]: 1.91e-06 [add_attr]: 0.00303134, [1] [add_attr_with_inline]: 0.0030237, [1] [Cycle 1]: 4.784e-05, [2] [tag_attr]: 1.238e-05 [meta_addattr_fg_expand]: 3.36001e-06 [parallel-infer-symbol]: 2.66999e-06 [pre_auto_parallel]: 2.184e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.30002e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.00375901, [53] [py_interpret_to_execute]: 1.578e-05 [rewriter_before_opt_a]: 3.949e-05 [opt_a]: 0.00188412, [2] [Cycle 1]: 0.00127706, [45] [expand_dump_flag]: 2.78003e-06 [switch_simplify]: 2.553e-05 [loop_unroll]: 1.373e-05 [a_1]: 0.00029846 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.40998e-06 [updatestate_depend_eliminate]: 4.56002e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.728e-05 [accelerated_algorithm]: 6.44001e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.76003e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 8.43001e-06 [auto_parallel]: 5.98002e-06 [parallel]: 1.81e-05 [flash_sp]: 8.33999e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.18998e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 5.37999e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.136e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.76999e-06 
[receive_attached]: 2.86e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 9.74999e-06 [renormalize]: 0.00034449 [add_forward_monad_depend]: 4.73001e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.345e-05 [cse]: 2.939e-05 [a_3]: 4.132e-05 [Cycle 2]: 0.00059793, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 7e-06 [loop_unroll]: 5.69999e-06 [a_1]: 0.0001263 [with_stream_mark]: 1.186e-05 [recompute_prepare]: 6.01998e-06 [updatestate_depend_eliminate]: 2.88998e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.823e-05 [accelerated_algorithm]: 5.72001e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.02e-06 [parallel]: 4.19997e-06 [flash_sp]: 3.13e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.01998e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.14998e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27999e-06 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.19002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.50001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.39996e-07 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.07999e-06 [cse]: 1.254e-05 [a_3]: 3.349e-05 [py_interpret_to_execute_after_opt_a]: 7.16001e-06 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 3.192e-05 [convert_after_rewriter]: 6.63998e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00049753 [opt_b]: 0.00018481, [1] 
[Cycle 1]: 0.0001791, [7] [b_1]: 0.00011067 [b_2]: 7.59002e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.28998e-06 [renormalize]: 3.69997e-07 [cse]: 1.614e-05 [optimize_parallel_all_gather_comm]: 1.623e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.363e-05 [loop_unroll]: 0.00041934 [opt_after_cconv]: 9.469e-05, [1] [Cycle 1]: 8.925e-05, [7] [c_1]: 2.791e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.595e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.985e-05, [1] [Cycle 1]: 6.555e-05, [4] [d_1]: 3.95e-05 [none_parameter_eliminate]: 1.89999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.44001e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.449e-05 [cse_after_recomputation]: 2.158e-05, [1] [Cycle 1]: 1.722e-05, [1] [cse]: 1.196e-05 [environ_conv]: 5.02999e-06 [swap_dp_allreduce_reducescatter]: 5.36002e-06 [bias_add_comm_swap]: 2.69999e-06 [label_micro_interleaved_index]: 4.17998e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 3.02002e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.30001e-06 [add_comm_op_reuse_tag]: 1.32999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.202e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.75002e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.713e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 6.918e-05, [1] [Cycle 1]: 6.507e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.48001e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 9.02999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.654e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00044858 [validate]: 3.092e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.00637916 [execute]: 7.11999e-06 Sums bootstrap : 0.000488s : 3.28% type_inference : 0.004403s : 29.58% event_method : 0.000011s : 0.08% auto_monad : 0.000055s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000033s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000425s : 2.85% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000345s : 2.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000498s : 3.34% optimize.opt_b.b_1 : 0.000111s : 0.74% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000419s : 2.82% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000449s : 3.01% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006379s : 42.87% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000124 26 18.39% : 0.000023s : 4: substitution.arithmetic_simplify 1.71% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.68% : 0.000006s : 4: substitution.graph_param_transform 65.25% : 0.000081s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.49% : 0.000004s : 4: substitution.remove_not_recompute_node 3.07% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004361 2 91.80% : 0.004003s : 1: type_inference.infer 8.20% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.51% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.70% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.02% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.48% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 42.84% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.16% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026954 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.26% : 0.003036s : 1: add_attr 11.23% : 0.003027s : 1: add_attr_with_inline 0.02% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.95% : 0.000525s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000428s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.88% : 0.000507s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000783s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.00% : 0.001887s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.70% : 0.000458s : 1: opt_after_jit_grad 0.70% : 0.000188s : 1: opt_b 13.96% : 0.003763s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000189s : 1: renormalize.infer 0.55% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 23.70% : 0.006389s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.39% : 0.004417s : 1: type_inference 0.21% : 0.000058s : 1: validate TotalTime = 0.0362632, [24] [bootstrap]: 0.00053412 [type_inference]: 0.0103581 [event_method]: 6.635e-05 [auto_monad]: 0.00011836 [graph_reusing]: 8.45001e-06 [inline]: 1.82999e-06 [add_attr]: 0.00298253, [1] [add_attr_with_inline]: 0.00297402, [1] [Cycle 1]: 6.805e-05, [2] [tag_attr]: 3.164e-05 [meta_addattr_fg_expand]: 9.00999e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 4.58e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.83997e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.0130739, [53] [py_interpret_to_execute]: 3.57e-05 [rewriter_before_opt_a]: 0.00012777 [opt_a]: 0.0108, [3] [Cycle 1]: 0.00689504, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 6.698e-05 [loop_unroll]: 5.486e-05 [a_1]: 0.00136665 [with_stream_mark]: 2.392e-05 [recompute_prepare]: 2.067e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 7.48999e-06 [updatestate_loads_eliminate]: 7.47998e-06 [parameter_eliminate]: 3.08998e-06 [a_2]: 0.00024402 [accelerated_algorithm]: 3.06e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.23e-06 [shard_inline]: 1.612e-05 [merge_send_recv]: 1.567e-05 [auto_parallel]: 1.079e-05 [parallel]: 1.751e-05 [flash_sp]: 1.194e-05 [merge_comm]: 9.67001e-06 [allreduce_fusion]: 9.31e-06 [matmul_add_comm_reduction]: 2.628e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.816e-05 [virtual_dataset]: 1.574e-05 [get_grad_eliminate_]: 1.501e-05 [virtual_output]: 1.513e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.834e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.874e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.677e-05 [set_forward_comm_id_for_comm_node_pass]: 9.35001e-06 [meta_fg_expand]: 0.00138598 [flash_sp_send_recv_attached]: 3.84002e-06 [receive_attached]: 2.43002e-06 [after_resolve]: 
5.918e-05 [a_after_grad]: 8.107e-05 [renormalize]: 0.00239772 [add_forward_monad_depend]: 8.84998e-06 [auto_monad_grad]: 5.29998e-06 [auto_monad_eliminator]: 5.505e-05 [cse]: 0.0001642 [a_3]: 0.00033457 [Cycle 2]: 0.00298526, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.682e-05 [loop_unroll]: 4.418e-05 [a_1]: 0.00156204 [with_stream_mark]: 1.252e-05 [recompute_prepare]: 1.098e-05 [updatestate_depend_eliminate]: 5.04998e-06 [updatestate_assign_eliminate]: 4.42e-06 [updatestate_loads_eliminate]: 3.78999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012647 [accelerated_algorithm]: 1.179e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.20001e-06 [merge_send_recv]: 6.44001e-06 [auto_parallel]: 7.03e-06 [parallel]: 4.57e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 4.97e-06 [allreduce_fusion]: 4.47998e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.48997e-06 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.84998e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.14002e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.603e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.373e-05 [set_forward_comm_id_for_comm_node_pass]: 5.94e-06 [meta_fg_expand]: 3.444e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.561e-05 [a_after_grad]: 1.441e-05 [renormalize]: 0.0005826 [add_forward_monad_depend]: 4.07e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 1.419e-05 [cse]: 4.605e-05 [a_3]: 6.455e-05 [Cycle 3]: 0.00090597, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.069e-05 [loop_unroll]: 8.97e-06 [a_1]: 0.0002521 [with_stream_mark]: 1.046e-05 [recompute_prepare]: 9.67999e-06 [updatestate_depend_eliminate]: 4.85001e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 3.81001e-06 [parameter_eliminate]: 
1.02998e-06 [a_2]: 0.00012346 [accelerated_algorithm]: 1.16e-05 [shard]: 9.90025e-07 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 9.00001e-06 [merge_send_recv]: 7.26001e-06 [auto_parallel]: 7.06001e-06 [parallel]: 4.75001e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.75e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 1.034e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.47e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.06001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.52998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.624e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05999e-06 [meta_fg_expand]: 3.21999e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.306e-05 [a_after_grad]: 1.417e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 1.04998e-06 [auto_monad_eliminator]: 1.093e-05 [cse]: 2.642e-05 [a_3]: 5.95e-05 [py_interpret_to_execute_after_opt_a]: 1.067e-05 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 4.757e-05 [convert_after_rewriter]: 9.37001e-06 [order_py_execute_after_rewriter]: 6.76999e-06 [mutable_eliminate]: 0.00046116 [opt_b]: 0.00028483, [1] [Cycle 1]: 0.00027896, [7] [b_1]: 0.00018772 [b_2]: 1.046e-05 [updatestate_depend_eliminate]: 7.34002e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.30008e-07 [cse]: 3.074e-05 [optimize_parallel_all_gather_comm]: 2.042e-05 [overlap_param_gather]: 2.59001e-06 [cconv]: 2.073e-05 [loop_unroll]: 0.00042346 [opt_after_cconv]: 0.00013538, [1] [Cycle 1]: 0.00012978, [7] [c_1]: 4.892e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 7.2e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 
4.04002e-06 [cse]: 2.905e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 2.887e-05 [tuple_transform]: 0.00010078, [1] [Cycle 1]: 9.602e-05, [4] [d_1]: 6.582e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.82999e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 5.753e-05 [cse_after_recomputation]: 3.187e-05, [1] [Cycle 1]: 2.712e-05, [1] [cse]: 2.139e-05 [environ_conv]: 3.225e-05 [swap_dp_allreduce_reducescatter]: 8.55001e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4.96997e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.83e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.23998e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.40999e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.787e-05 [grouped_pairwise_exchange_alltoall]: 1.79998e-06 [offloading_packed_experts]: 5.15001e-06 [overlap_recompute_and_grad_model_parallel]: 6.14999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.36002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.64001e-06 [overlap_grad_ring_attention]: 5.25999e-06 [overlap_grad_flash_sp]: 2.397e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 0.00010015, [1] [Cycle 1]: 9.582e-05, [6] [build]: 9.91998e-06 [elim_shapecalc]: 1.386e-05 [elim_not_effective]: 1.853e-05 [opt_reshape]: 1.021e-05 [fold_const_symbol]: 1.48e-05 [renormalize]: 1.8999e-07 [detach_backward]: 
1.47999e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 2.547e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00046647 [validate]: 4.529e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00830066 [execute]: 7.21999e-06 Sums bootstrap : 0.000534s : 1.67% type_inference : 0.010358s : 32.35% event_method : 0.000066s : 0.21% auto_monad : 0.000118s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000128s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003181s : 9.93% optimize.opt_a.with_stream_mark : 0.000047s : 0.15% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000494s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001424s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.002980s : 9.31% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000237s : 0.74% optimize.opt_a.a_3 : 0.000459s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.44% optimize.opt_b.b_1 : 0.000188s : 0.59% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000423s : 1.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000032s : 0.10% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000466s : 1.46% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008301s : 25.92% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000769 218 5.63% : 0.000043s : 11: substitution.arithmetic_simplify 1.73% : 0.000013s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 56.84% : 0.000437s : 16: substitution.inline 2.02% : 0.000016s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000014s : 11: substitution.minmaximum_grad 0.65% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.17% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.09% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010287 2 87.53% : 0.009004s : 1: type_inference.infer 12.47% : 0.001283s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.09% : 0.000118s : 16: replace.inline 40.91% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000426 30 92.79% : 0.000396s : 16: match.inline 7.21% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000733 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.80% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000041s : 244: predicate.inline 1.29% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.14% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.91% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.10% : 0.000008s : 67: predicate.tile_eliminate 1.13% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001479 32 58.23% : 0.000861s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.77% : 0.000618s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060448 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.94% : 0.002987s : 1: add_attr 4.93% : 0.002978s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000125s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.94% : 0.000571s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.06% : 0.000036s : 1: environ_conv 0.12% : 0.000074s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.71% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.78% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.98% : 0.004824s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.87% : 0.010803s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.79% : 0.000476s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.63% : 0.013078s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.67% : 0.001612s : 2: renormalize.infer 2.24% : 0.001356s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.75% : 0.008311s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.16% : 0.010373s : 1: type_inference 0.13% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x1-kbk],max_mem:12.0M TotalTime = 0.131921, [24] [bootstrap]: 0.00060713 [type_inference]: 0.00638902 [event_method]: 1.413e-05 [auto_monad]: 5.662e-05 [graph_reusing]: 5.37999e-06 [inline]: 1.95001e-06 [add_attr]: 0.00354621, [1] [add_attr_with_inline]: 0.00353508, [1] [Cycle 1]: 4.657e-05, [2] [tag_attr]: 1.574e-05 [meta_addattr_fg_expand]: 4.28001e-06 [parallel-infer-symbol]: 2.83998e-06 [pre_auto_parallel]: 2.863e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.0040228, [53] [py_interpret_to_execute]: 2.096e-05 [rewriter_before_opt_a]: 6.118e-05 [opt_a]: 0.00216013, [2] [Cycle 1]: 0.00156489, [45] [expand_dump_flag]: 3.15998e-06 [switch_simplify]: 3.352e-05 [loop_unroll]: 2.086e-05 [a_1]: 0.00045997 [with_stream_mark]: 1.341e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 3.7e-06 [updatestate_loads_eliminate]: 3.32002e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.687e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 2.25002e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 2.642e-05 [merge_send_recv]: 8.52998e-06 [auto_parallel]: 6.49999e-06 [parallel]: 2.606e-05 [flash_sp]: 7.43999e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.56e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 8.00999e-06 [virtual_dataset]: 6.24001e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.36002e-06 [offload_activation]: 9.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.119e-05 
[merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.63998e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.74001e-06 [after_resolve]: 1.035e-05 [a_after_grad]: 8.86002e-06 [renormalize]: 0.00042298 [add_forward_monad_depend]: 4.92e-06 [auto_monad_grad]: 1.95001e-06 [auto_monad_eliminator]: 1.367e-05 [cse]: 2.78e-05 [a_3]: 4.021e-05 [Cycle 2]: 0.0005855, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.46e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012635 [with_stream_mark]: 9.67999e-06 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.33002e-06 [parallel]: 4.21001e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 5.14998e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 5.76e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79999e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.68999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.94998e-06 [a_after_grad]: 7.63001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 5.92999e-06 [cse]: 1.193e-05 [a_3]: 3.199e-05 [py_interpret_to_execute_after_opt_a]: 7.75e-06 
[slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.194e-05 [convert_after_rewriter]: 7.26001e-06 [order_py_execute_after_rewriter]: 4.79002e-06 [mutable_eliminate]: 0.00045057 [opt_b]: 0.00018324, [1] [Cycle 1]: 0.00017738, [7] [b_1]: 0.00010623 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.09e-06 [renormalize]: 4.2998e-07 [cse]: 1.615e-05 [optimize_parallel_all_gather_comm]: 1.637e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.277e-05 [loop_unroll]: 0.00042356 [opt_after_cconv]: 9.445e-05, [1] [Cycle 1]: 8.909e-05, [7] [c_1]: 2.823e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.587e-05 [renormalize]: 4.7998e-07 [remove_dup_value]: 1.28e-05 [tuple_transform]: 6.831e-05, [1] [Cycle 1]: 6.413e-05, [4] [d_1]: 3.868e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.02999e-06 [partial_unused_args_eliminate]: 2.24999e-06 [add_recomputation]: 5.233e-05 [cse_after_recomputation]: 2.065e-05, [1] [Cycle 1]: 1.615e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.81999e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.21998e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.33002e-06 [full_micro_interleaved_order_control]: 2.60002e-06 [reorder_send_recv_between_fp_bp]: 3.06001e-06 [comm_op_add_attrs]: 1.29998e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.91e-06 [control_data_broadcast_order]: 1.198e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 3.71999e-06 [overlap_grad_flash_sp]: 1.705e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.23002e-06 [symbol_engine_optimizer]: 6.698e-05, [1] [Cycle 1]: 6.305e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.28999e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00045397 [validate]: 3.216e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.116498 [execute]: 1.014e-05 Sums bootstrap : 0.000607s : 0.48% type_inference : 0.006389s : 5.02% event_method : 0.000014s : 0.01% auto_monad : 0.000057s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000061s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000586s : 0.46% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000032s : 0.03% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000030s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.01% optimize.opt_a.renormalize : 0.000423s : 0.33% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.35% optimize.opt_b.b_1 : 0.000106s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000424s : 0.33% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000454s : 0.36% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.116498s : 91.45% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000169 30 14.73% : 0.000025s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.07% : 0.000005s : 4: substitution.graph_param_transform 66.73% : 0.000113s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.92% : 0.000005s : 4: substitution.remove_not_recompute_node 2.36% : 0.000004s : 4: substitution.replace_old_param 6.76% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006341 2 90.65% : 0.005748s : 1: type_inference.infer 9.35% : 0.000593s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.01% : 0.000029s : 3: replace.inline 29.99% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.44% : 0.000111s : 3: match.inline 8.56% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.40% : 0.000004s : 19: predicate.arithmetic_simplify 0.95% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.89% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 44.21% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.79% : 0.000208s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.141032 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.52% : 0.003550s : 1: add_attr 2.51% : 0.003539s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000062s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.46% : 0.000650s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000459s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000975s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000089s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.53% : 0.002163s : 1: opt_a 0.07% : 0.000098s : 1: opt_after_cconv 0.33% : 0.000463s : 1: opt_after_jit_grad 0.13% : 0.000187s : 1: opt_b 2.86% : 0.004027s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.15% : 0.000218s : 1: renormalize.infer 0.14% : 0.000198s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000065s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 82.62% : 0.116521s : 1: task_emit 0.05% : 0.000071s : 1: 
tuple_transform 4.54% : 0.006403s : 1: type_inference 0.04% : 0.000058s : 1: validate TotalTime = 0.11254, [24] [bootstrap]: 0.00051725 [type_inference]: 0.0045712 [event_method]: 1.065e-05 [auto_monad]: 5.255e-05 [graph_reusing]: 6.06e-06 [inline]: 1.66e-06 [add_attr]: 0.00300771, [1] [add_attr_with_inline]: 0.00299964, [1] [Cycle 1]: 4.284e-05, [2] [tag_attr]: 1.255e-05 [meta_addattr_fg_expand]: 3.44001e-06 [parallel-infer-symbol]: 2.73003e-06 [pre_auto_parallel]: 2.251e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.00375125, [53] [py_interpret_to_execute]: 1.518e-05 [rewriter_before_opt_a]: 4.03e-05 [opt_a]: 0.00188346, [2] [Cycle 1]: 0.00127938, [45] [expand_dump_flag]: 2.89001e-06 [switch_simplify]: 2.461e-05 [loop_unroll]: 1.34e-05 [a_1]: 0.00029865 [with_stream_mark]: 1.494e-05 [recompute_prepare]: 7.2e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 1.78002e-06 [a_2]: 7.571e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.47001e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 5.91e-06 [parallel]: 1.895e-05 [flash_sp]: 7.52002e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 9.43002e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.31001e-06 [virtual_dataset]: 5.97999e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.24003e-06 [offload_activation]: 1.005e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.157e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 2.56998e-06 
[after_resolve]: 1.063e-05 [a_after_grad]: 8.63001e-06 [renormalize]: 0.00035569 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.32e-05 [cse]: 2.888e-05 [a_3]: 3.941e-05 [Cycle 2]: 0.00059497, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.47001e-06 [loop_unroll]: 5.32999e-06 [a_1]: 0.00012584 [with_stream_mark]: 9.43002e-06 [recompute_prepare]: 5.74999e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.751e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.38999e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.15002e-06 [virtual_dataset]: 5.57999e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.78002e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.36002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 8.80999e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.00002e-06 [cse]: 1.291e-05 [a_3]: 3.186e-05 [py_interpret_to_execute_after_opt_a]: 7.05e-06 [slice_cell_reuse_recomputed_activation]: 2.37999e-06 [rewriter_after_opt_a]: 8.39e-05 [convert_after_rewriter]: 6.99001e-06 [order_py_execute_after_rewriter]: 5.23002e-06 [mutable_eliminate]: 0.00045104 [opt_b]: 0.00018112, [1] [Cycle 1]: 0.00017493, 
[7] [b_1]: 0.00010782 [b_2]: 6.81001e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 4.00003e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.575e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.248e-05 [loop_unroll]: 0.0004157 [opt_after_cconv]: 9.355e-05, [1] [Cycle 1]: 8.789e-05, [7] [c_1]: 2.78e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.564e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.373e-05 [tuple_transform]: 6.77e-05, [1] [Cycle 1]: 6.359e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.07001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.41e-05 [cse_after_recomputation]: 1.945e-05, [1] [Cycle 1]: 1.522e-05, [1] [cse]: 1.024e-05 [environ_conv]: 5.14e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.40001e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.162e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.79998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.06998e-06 [overlap_grad_ring_attention]: 4.19002e-06 [overlap_grad_flash_sp]: 1.835e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.869e-05, [1] [Cycle 1]: 6.452e-05, [6] [build]: 2.52001e-06 [elim_shapecalc]: 8.55001e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.78002e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.578e-05 [get_jit_bprop_graph]: 9.29984e-07 [rewriter_after_jit_bprop_graph]: 3.28e-06 [opt_after_jit_grad]: 0.00044978 [validate]: 3.227e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0998633 [execute]: 1.023e-05 Sums bootstrap : 0.000517s : 0.48% type_inference : 0.004571s : 4.21% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000424s : 0.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000356s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000084s : 0.08% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.42% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000416s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.41% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099863s : 91.99% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000126 26 17.59% : 0.000022s : 4: substitution.arithmetic_simplify 1.35% : 0.000002s : 2: substitution.elim_not_effective 1.16% : 0.000001s : 2: substitution.fold_const_symbol 4.37% : 0.000006s : 4: substitution.graph_param_transform 66.70% : 0.000084s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000005s : 4: substitution.remove_not_recompute_node 3.01% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004527 2 91.68% : 0.004150s : 1: type_inference.infer 8.32% : 0.000377s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000082 2 100.00% : 0.000082s : 2: match.inline ------[predicate.] 
0.000138 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.73% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.94% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.96% : 0.000001s : 9: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.07% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.98% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000277 6 41.02% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.98% : 0.000163s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120585 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.50% : 0.003012s : 1: add_attr 2.49% : 0.003003s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.46% : 0.000554s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.64% : 0.000774s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.56% : 0.001886s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000459s : 1: opt_after_jit_grad 0.15% : 0.000185s : 1: opt_b 3.11% : 0.003755s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000190s : 1: renormalize.infer 0.13% : 0.000159s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000088s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.83% : 0.099886s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.80% : 0.004585s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.113096, [24] [bootstrap]: 0.00049031 [type_inference]: 0.00562259 [event_method]: 1.424e-05 [auto_monad]: 5.682e-05 [graph_reusing]: 6.06998e-06 [inline]: 1.82001e-06 [add_attr]: 0.00298309, [1] [add_attr_with_inline]: 0.00297562, [1] [Cycle 1]: 4.761e-05, [2] [tag_attr]: 1.599e-05 [meta_addattr_fg_expand]: 4.77e-06 [parallel-infer-symbol]: 3.21999e-06 [pre_auto_parallel]: 2.588e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 6.00005e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.00399906, [53] [py_interpret_to_execute]: 2.131e-05 [rewriter_before_opt_a]: 5.876e-05 [opt_a]: 0.00215619, [2] [Cycle 1]: 0.00155348, [45] [expand_dump_flag]: 2.80002e-06 [switch_simplify]: 3.239e-05 [loop_unroll]: 2.099e-05 [a_1]: 0.00045324 [with_stream_mark]: 1.319e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.583e-05 [accelerated_algorithm]: 6.46999e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 7.89002e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.828e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.49001e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.85001e-06 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.45001e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.24e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.98e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 
1.053e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00042019 [add_forward_monad_depend]: 4.40999e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.518e-05 [cse]: 2.814e-05 [a_3]: 4.1e-05 [Cycle 2]: 0.00059342, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.46e-06 [loop_unroll]: 5.12999e-06 [a_1]: 0.00012382 [with_stream_mark]: 9.64e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.63003e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.21e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.695e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.35001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.51998e-06 [parallel]: 4.2e-06 [flash_sp]: 4.04997e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.95002e-06 [matmul_add_comm_reduction]: 5.11997e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 5.87999e-06 [virtual_dataset]: 5.10999e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 4.82998e-06 [merge_forward]: 2.31998e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.81003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.17001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.94002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.55999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.376e-05 [a_3]: 3.188e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 3.187e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 5.42001e-06 [mutable_eliminate]: 0.00044945 [opt_b]: 0.00018073, [1] [Cycle 1]: 0.00017484, [7] [b_1]: 
0.00010714 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.29979e-07 [cse]: 1.671e-05 [optimize_parallel_all_gather_comm]: 1.672e-05 [overlap_param_gather]: 2.44999e-06 [cconv]: 2.33e-05 [loop_unroll]: 0.00041701 [opt_after_cconv]: 9.453e-05, [1] [Cycle 1]: 8.901e-05, [7] [c_1]: 2.754e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.644e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.296e-05 [tuple_transform]: 6.864e-05, [1] [Cycle 1]: 6.441e-05, [4] [d_1]: 3.905e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.518e-05 [cse_after_recomputation]: 2.038e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.80001e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.68003e-06 [merge_cast_opt]: 1.69998e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.14003e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.162e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.51999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41998e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 3.78999e-06 [overlap_grad_flash_sp]: 1.806e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.49999e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.749e-05, [1] [Cycle 1]: 6.345e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.6e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.0004522 [validate]: 3.185e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.0991527 [execute]: 9.69e-06 Sums bootstrap : 0.000490s : 0.45% type_inference : 0.005623s : 5.15% event_method : 0.000014s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000577s : 0.53% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000420s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.41% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000417s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000452s : 0.41% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099153s : 90.89% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.93% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.20% : 0.000005s : 4: substitution.graph_param_transform 67.14% : 0.000113s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.36% : 0.000004s : 4: substitution.remove_not_recompute_node 2.21% : 0.000004s : 4: substitution.replace_old_param 6.67% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005579 2 89.97% : 0.005019s : 1: type_inference.infer 10.03% : 0.000559s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.33% : 0.000027s : 3: replace.inline 29.67% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.65% : 0.000111s : 3: match.inline 8.35% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.18% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000002s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.61% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.73% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.36% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 47.07% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.93% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.121588 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.46% : 0.002987s : 1: add_attr 2.45% : 0.002979s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000529s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.77% : 0.000939s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.78% : 0.002159s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.38% : 0.000462s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.29% : 0.004003s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000207s : 1: renormalize.infer 0.17% : 0.000207s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.57% : 0.099176s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.64% : 0.005637s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.148679, [24] [bootstrap]: 0.00052674 [type_inference]: 0.0114957 [event_method]: 4.906e-05 [auto_monad]: 0.00012264 [graph_reusing]: 8.56002e-06 [inline]: 2.16e-06 [add_attr]: 0.00303889, [1] [add_attr_with_inline]: 0.00303068, [1] [Cycle 1]: 7.192e-05, [2] [tag_attr]: 3.559e-05 [meta_addattr_fg_expand]: 9.52001e-06 [parallel-infer-symbol]: 2.99999e-06 [pre_auto_parallel]: 5.058e-05 [insert-virtual-dataset]: 2.45002e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.0134475, [53] [py_interpret_to_execute]: 3.935e-05 [rewriter_before_opt_a]: 0.00014613 [opt_a]: 0.0111153, [3] [Cycle 1]: 0.00715648, [45] [expand_dump_flag]: 3.79002e-06 [switch_simplify]: 7.506e-05 [loop_unroll]: 6.286e-05 [a_1]: 0.00148155 [with_stream_mark]: 2.317e-05 [recompute_prepare]: 2.179e-05 [updatestate_depend_eliminate]: 9.64999e-06 [updatestate_assign_eliminate]: 8.08001e-06 [updatestate_loads_eliminate]: 7.56001e-06 [parameter_eliminate]: 2.82002e-06 [a_2]: 0.00024896 [accelerated_algorithm]: 3.182e-05 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.651e-05 [merge_send_recv]: 1.616e-05 [auto_parallel]: 1.055e-05 [parallel]: 1.896e-05 [flash_sp]: 1.148e-05 [merge_comm]: 9.61e-06 [allreduce_fusion]: 8.84e-06 [matmul_add_comm_reduction]: 2.793e-05 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 1.806e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.553e-05 [virtual_output]: 1.521e-05 [merge_forward]: 1.013e-05 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.864e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.973e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 2.697e-05 [set_forward_comm_id_for_comm_node_pass]: 9.92001e-06 [meta_fg_expand]: 0.00140176 [flash_sp_send_recv_attached]: 3.54002e-06 [receive_attached]: 2.79999e-06 
[after_resolve]: 5.969e-05 [a_after_grad]: 8.117e-05 [renormalize]: 0.00248392 [add_forward_monad_depend]: 9.17001e-06 [auto_monad_grad]: 5.04998e-06 [auto_monad_eliminator]: 5.759e-05 [cse]: 0.00017179 [a_3]: 0.00033608 [Cycle 2]: 0.00304499, [45] [expand_dump_flag]: 1.84998e-06 [switch_simplify]: 4.768e-05 [loop_unroll]: 4.482e-05 [a_1]: 0.00153706 [with_stream_mark]: 1.192e-05 [recompute_prepare]: 1.083e-05 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 4.27003e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 0.00014592 [accelerated_algorithm]: 1.25e-05 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.30001e-06 [merge_send_recv]: 6.66999e-06 [auto_parallel]: 7.65e-06 [parallel]: 4.58999e-06 [flash_sp]: 3.91001e-06 [merge_comm]: 5.77999e-06 [allreduce_fusion]: 5.40001e-06 [matmul_add_comm_reduction]: 7.95e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.021e-05 [virtual_dataset]: 9.09e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.60001e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.621e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 6.992e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.562e-05 [a_after_grad]: 1.433e-05 [renormalize]: 0.00060086 [add_forward_monad_depend]: 3.92002e-06 [auto_monad_grad]: 1.21997e-06 [auto_monad_eliminator]: 1.485e-05 [cse]: 4.612e-05 [a_3]: 6.563e-05 [Cycle 3]: 0.00089971, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.034e-05 [loop_unroll]: 8.82999e-06 [a_1]: 0.00025072 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 9.04e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 
3.78999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012326 [accelerated_algorithm]: 1.156e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 8.97e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 6.97002e-06 [parallel]: 4.87998e-06 [flash_sp]: 1.00999e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 7.88999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.38001e-06 [get_grad_eliminate_]: 8.39998e-06 [virtual_output]: 8.37998e-06 [merge_forward]: 4.12003e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.577e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.367e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20001e-06 [meta_fg_expand]: 2.96999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.3e-05 [a_after_grad]: 1.429e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.109e-05 [cse]: 2.701e-05 [a_3]: 5.983e-05 [py_interpret_to_execute_after_opt_a]: 1.082e-05 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 4.816e-05 [convert_after_rewriter]: 9.36998e-06 [order_py_execute_after_rewriter]: 6.87002e-06 [mutable_eliminate]: 0.00046239 [opt_b]: 0.00028889, [1] [Cycle 1]: 0.00028229, [7] [b_1]: 0.00018906 [b_2]: 1.059e-05 [updatestate_depend_eliminate]: 6.81001e-06 [updatestate_assign_eliminate]: 4.20999e-06 [updatestate_loads_eliminate]: 3.90998e-06 [renormalize]: 4.00003e-07 [cse]: 3.249e-05 [optimize_parallel_all_gather_comm]: 2.151e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.021e-05 [loop_unroll]: 0.0004268 [opt_after_cconv]: 0.00013539, [1] [Cycle 1]: 0.00012954, [7] [c_1]: 4.86e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 6.91999e-06 [updatestate_assign_eliminate]: 4.23001e-06 
[updatestate_loads_eliminate]: 3.81999e-06 [cse]: 3.011e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 2.979e-05 [tuple_transform]: 0.00010145, [1] [Cycle 1]: 9.676e-05, [4] [d_1]: 6.675e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 9.88998e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 5.926e-05 [cse_after_recomputation]: 3.335e-05, [1] [Cycle 1]: 2.857e-05, [1] [cse]: 2.279e-05 [environ_conv]: 9.56e-06 [swap_dp_allreduce_reducescatter]: 7.98001e-06 [bias_add_comm_swap]: 2.19999e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.22999e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 3.13e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 2.21e-06 [control_data_broadcast_order]: 1.698e-05 [grouped_pairwise_exchange_alltoall]: 1.74998e-06 [offloading_packed_experts]: 5.44e-06 [overlap_recompute_and_grad_model_parallel]: 5.98002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 5.37001e-06 [overlap_grad_flash_sp]: 2.46e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.13998e-06 [split_layernorm_comm]: 2.06998e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 0.00010093, [1] [Cycle 1]: 9.643e-05, [6] [build]: 1.032e-05 [elim_shapecalc]: 1.359e-05 [elim_not_effective]: 1.874e-05 [opt_reshape]: 1.001e-05 [fold_const_symbol]: 1.526e-05 [renormalize]: 2.40019e-07 
[detach_backward]: 1.59998e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.519e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00046932 [validate]: 4.561e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.119097 [execute]: 5.351e-05 Sums bootstrap : 0.000527s : 0.36% type_inference : 0.011496s : 7.97% event_method : 0.000049s : 0.03% auto_monad : 0.000123s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000146s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000133s : 0.09% optimize.opt_a.loop_unroll : 0.000117s : 0.08% optimize.opt_a.a_1 : 0.003269s : 2.27% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000518s : 0.36% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000054s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001475s : 1.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003085s : 2.14% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.06% optimize.opt_a.cse : 0.000245s : 0.17% optimize.opt_a.a_3 : 0.000462s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000462s : 0.32% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000427s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000469s : 0.33% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.119097s : 82.52% execute : 0.000054s : 0.04% Time group info: ------[substitution.] 
0.000769 222 5.88% : 0.000045s : 12: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 55.80% : 0.000429s : 17: substitution.inline 2.01% : 0.000015s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.03% : 0.000016s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.16% : 0.000024s : 10: substitution.replace_applicator 1.28% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.62% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.65% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011417 2 86.67% : 0.009895s : 1: type_inference.infer 13.33% : 0.001522s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.73% : 0.000126s : 17: replace.inline 42.27% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000454 33 92.43% : 0.000420s : 17: match.inline 7.57% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.07% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 100: predicate.arithmetic_simplify 1.21% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.78% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.36% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.79% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.42% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.16% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001608 34 56.44% : 0.000907s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.56% : 0.000700s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.173527 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.75% : 0.003043s : 1: add_attr 1.75% : 0.003034s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000131s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000566s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000056s : 1: event_method 0.04% : 0.000061s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.03% : 0.000051s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.86% : 0.004958s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.41% : 0.011118s : 1: opt_a 0.08% : 0.000139s : 1: opt_after_cconv 0.28% : 0.000479s : 1: opt_after_jit_grad 0.17% : 0.000293s : 1: opt_b 7.75% : 0.013451s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.95% : 0.001647s : 2: renormalize.infer 0.82% : 0.001425s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000104s : 1: symbol_engine_optimizer 68.65% : 0.119120s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.63% : 0.011511s : 1: type_inference 0.04% : 0.000070s : 1: validate TotalTime = 0.109418, [24] [bootstrap]: 0.0005459 [type_inference]: 0.00438925 [event_method]: 1.084e-05 [auto_monad]: 5.185e-05 [graph_reusing]: 5.66e-06 [inline]: 1.77001e-06 [add_attr]: 0.00296129, [1] [add_attr_with_inline]: 0.00295314, [1] [Cycle 1]: 4.409e-05, [2] [tag_attr]: 1.23e-05 [meta_addattr_fg_expand]: 3.55e-06 [parallel-infer-symbol]: 3.24001e-06 [pre_auto_parallel]: 2.155e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.91998e-06 [pipeline_split]: 1.85001e-06 [optimize]: 0.00369026, [53] [py_interpret_to_execute]: 1.546e-05 [rewriter_before_opt_a]: 3.878e-05 [opt_a]: 0.00187849, [2] [Cycle 1]: 0.00128143, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.472e-05 [loop_unroll]: 1.361e-05 [a_1]: 0.00031317 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 7.44002e-06 [updatestate_depend_eliminate]: 3.87998e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.93998e-06 [parameter_eliminate]: 1.82999e-06 [a_2]: 7.606e-05 [accelerated_algorithm]: 6.83998e-06 [shard]: 2.43002e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.83002e-06 [merge_send_recv]: 8.28001e-06 [auto_parallel]: 5.59e-06 [parallel]: 1.83e-05 [flash_sp]: 7.08998e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 9.23002e-06 [allreduce_slice_to_reducescatter]: 9.5999e-07 [virtual_shard_identity]: 6.59001e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.54e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 9.42001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50998e-06 [meta_fg_expand]: 2.40997e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 
2.55997e-06 [after_resolve]: 1.126e-05 [a_after_grad]: 8.52e-06 [renormalize]: 0.00034818 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 2.835e-05 [a_3]: 3.968e-05 [Cycle 2]: 0.00058796, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.73998e-06 [loop_unroll]: 5.24998e-06 [a_1]: 0.00012526 [with_stream_mark]: 1.092e-05 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.656e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.07999e-06 [parallel]: 4.13001e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 2.86999e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.45001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.02999e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 5.76998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59999e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.85998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.37998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 5.89999e-06 [cse]: 1.317e-05 [a_3]: 3.194e-05 [py_interpret_to_execute_after_opt_a]: 6.83e-06 [slice_cell_reuse_recomputed_activation]: 2.01998e-06 [rewriter_after_opt_a]: 3.297e-05 [convert_after_rewriter]: 7.46999e-06 [order_py_execute_after_rewriter]: 5.64e-06 [mutable_eliminate]: 0.00045002 [opt_b]: 0.00017869, [1] [Cycle 1]: 
0.00017287, [7] [b_1]: 0.00010617 [b_2]: 6.74999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 2.59985e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 2.31e-06 [cconv]: 2.296e-05 [loop_unroll]: 0.00041253 [opt_after_cconv]: 9.447e-05, [1] [Cycle 1]: 8.876e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.09003e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.07001e-06 [cse]: 1.636e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.32e-05 [tuple_transform]: 6.931e-05, [1] [Cycle 1]: 6.478e-05, [4] [d_1]: 3.926e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.616e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.633e-05, [1] [cse]: 1.128e-05 [environ_conv]: 5.05001e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 2.61999e-06 [label_micro_interleaved_index]: 4.60999e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 3.06001e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.206e-05 [grouped_pairwise_exchange_alltoall]: 1.97001e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.34002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.69001e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.763e-05, [1] [Cycle 1]: 6.339e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 8.47998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.78002e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00051541 [validate]: 3.224e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0969365 [execute]: 1.008e-05 Sums bootstrap : 0.000546s : 0.52% type_inference : 0.004389s : 4.16% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000438s : 0.42% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000348s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.43% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000413s : 0.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000515s : 0.49% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096937s : 91.89% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000122 26 17.97% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.96% : 0.000006s : 4: substitution.graph_param_transform 65.20% : 0.000080s : 2: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.59% : 0.000004s : 4: substitution.remove_not_recompute_node 3.62% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004347 2 91.70% : 0.003986s : 1: type_inference.infer 8.30% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.04% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.23% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.61% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.46% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000253 6 43.18% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.82% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.117356 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.53% : 0.002965s : 1: add_attr 2.52% : 0.002957s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.50% : 0.000582s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.67% : 0.000788s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.60% : 0.001882s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.45% : 0.000525s : 1: opt_after_jit_grad 0.16% : 0.000182s : 1: opt_b 3.15% : 0.003694s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000191s : 1: renormalize.infer 0.13% : 0.000151s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.62% : 0.096961s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.75% : 0.004403s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.151043, [24] [bootstrap]: 0.00053199 [type_inference]: 0.0103816 [event_method]: 4.229e-05 [auto_monad]: 0.0001155 [graph_reusing]: 8.13001e-06 [inline]: 1.99999e-06 [add_attr]: 0.00298884, [1] [add_attr_with_inline]: 0.00298003, [1] [Cycle 1]: 6.791e-05, [2] [tag_attr]: 3.124e-05 [meta_addattr_fg_expand]: 8.47e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 4.594e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0131774, [53] [py_interpret_to_execute]: 3.716e-05 [rewriter_before_opt_a]: 0.00016116 [opt_a]: 0.0108524, [3] [Cycle 1]: 0.00692619, [45] [expand_dump_flag]: 4.15999e-06 [switch_simplify]: 6.793e-05 [loop_unroll]: 5.502e-05 [a_1]: 0.00133839 [with_stream_mark]: 2.377e-05 [recompute_prepare]: 2.148e-05 [updatestate_depend_eliminate]: 9.61003e-06 [updatestate_assign_eliminate]: 8.33999e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.68e-06 [a_2]: 0.00024704 [accelerated_algorithm]: 3.079e-05 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 3.91999e-06 [shard_inline]: 1.604e-05 [merge_send_recv]: 1.61e-05 [auto_parallel]: 1.045e-05 [parallel]: 1.978e-05 [flash_sp]: 1.143e-05 [merge_comm]: 9.49999e-06 [allreduce_fusion]: 8.72998e-06 [matmul_add_comm_reduction]: 2.69e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.799e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.564e-05 [merge_forward]: 9.92001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.778e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.984e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.701e-05 [set_forward_comm_id_for_comm_node_pass]: 9.25001e-06 [meta_fg_expand]: 0.00138585 [flash_sp_send_recv_attached]: 3.6e-06 [receive_attached]: 2.54001e-06 [after_resolve]: 
5.944e-05 [a_after_grad]: 8.071e-05 [renormalize]: 0.00244202 [add_forward_monad_depend]: 9.62999e-06 [auto_monad_grad]: 5.52999e-06 [auto_monad_eliminator]: 5.618e-05 [cse]: 0.00017038 [a_3]: 0.0003327 [Cycle 2]: 0.00301199, [45] [expand_dump_flag]: 1.57001e-06 [switch_simplify]: 4.666e-05 [loop_unroll]: 4.421e-05 [a_1]: 0.00158436 [with_stream_mark]: 1.205e-05 [recompute_prepare]: 1.089e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.53e-06 [parameter_eliminate]: 1.15001e-06 [a_2]: 0.00012673 [accelerated_algorithm]: 1.177e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.02999e-06 [merge_send_recv]: 6.75002e-06 [auto_parallel]: 7.09001e-06 [parallel]: 4.80001e-06 [flash_sp]: 3.14999e-06 [merge_comm]: 5.97001e-06 [allreduce_fusion]: 5.29998e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 4.49974e-07 [virtual_shard_identity]: 1.021e-05 [virtual_dataset]: 8.70999e-06 [get_grad_eliminate_]: 8.72998e-06 [virtual_output]: 8.48999e-06 [merge_forward]: 4.55001e-06 [cell_reuse_recompute_pass]: 9.00007e-07 [offload_activation]: 9.42999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.666e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25001e-06 [meta_fg_expand]: 3.367e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.509e-05 [a_after_grad]: 1.46e-05 [renormalize]: 0.0005855 [add_forward_monad_depend]: 4.30999e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.399e-05 [cse]: 4.62e-05 [a_3]: 6.466e-05 [Cycle 3]: 0.00089991, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 1.058e-05 [loop_unroll]: 9.08002e-06 [a_1]: 0.00024826 [with_stream_mark]: 9.87001e-06 [recompute_prepare]: 9.24998e-06 [updatestate_depend_eliminate]: 4.87998e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 
3.96001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012347 [accelerated_algorithm]: 1.131e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 8.96002e-06 [merge_send_recv]: 7.31999e-06 [auto_parallel]: 7.21999e-06 [parallel]: 4.79998e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 7.8e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 9.99999e-06 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.52e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 8.53001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.567e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.08002e-06 [meta_fg_expand]: 3.05002e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.308e-05 [a_after_grad]: 1.435e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.103e-05 [cse]: 2.745e-05 [a_3]: 5.926e-05 [py_interpret_to_execute_after_opt_a]: 1.036e-05 [slice_cell_reuse_recomputed_activation]: 2.21e-06 [rewriter_after_opt_a]: 4.971e-05 [convert_after_rewriter]: 9.46e-06 [order_py_execute_after_rewriter]: 7.23e-06 [mutable_eliminate]: 0.00045991 [opt_b]: 0.00028868, [1] [Cycle 1]: 0.00028216, [7] [b_1]: 0.00018907 [b_2]: 1.066e-05 [updatestate_depend_eliminate]: 7.15998e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 4.02e-06 [renormalize]: 3.69997e-07 [cse]: 3.165e-05 [optimize_parallel_all_gather_comm]: 2.214e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.07e-05 [loop_unroll]: 0.00045659 [opt_after_cconv]: 0.00013673, [1] [Cycle 1]: 0.00013075, [7] [c_1]: 4.872e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 7.04001e-06 [updatestate_assign_eliminate]: 4.21001e-06 
[updatestate_loads_eliminate]: 3.82998e-06 [cse]: 3.064e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 3.01e-05 [tuple_transform]: 0.00010169, [1] [Cycle 1]: 9.679e-05, [4] [d_1]: 6.649e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.84001e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 5.783e-05 [cse_after_recomputation]: 3.233e-05, [1] [Cycle 1]: 2.775e-05, [1] [cse]: 2.209e-05 [environ_conv]: 8.84e-06 [swap_dp_allreduce_reducescatter]: 8.04002e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.32998e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.33002e-06 [full_micro_interleaved_order_control]: 2.45002e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.36002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.672e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 5.22999e-06 [overlap_recompute_and_grad_model_parallel]: 6.00002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.62001e-06 [overlap_grad_ring_attention]: 5.05999e-06 [overlap_grad_flash_sp]: 2.406e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 9.879e-05, [1] [Cycle 1]: 9.453e-05, [6] [build]: 9.40001e-06 [elim_shapecalc]: 1.338e-05 [elim_not_effective]: 1.844e-05 [opt_reshape]: 1.008e-05 [fold_const_symbol]: 1.5e-05 
[renormalize]: 2.19996e-07 [detach_backward]: 1.96e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.506e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.0004674 [validate]: 4.709e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.122957 [execute]: 9.90002e-06 Sums bootstrap : 0.000532s : 0.36% type_inference : 0.010382s : 7.07% event_method : 0.000042s : 0.03% auto_monad : 0.000115s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000161s : 0.11% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.07% optimize.opt_a.a_1 : 0.003171s : 2.16% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000497s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000021s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001423s : 0.97% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.07% optimize.opt_a.renormalize : 0.003028s : 2.06% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.06% optimize.opt_a.cse : 0.000244s : 0.17% optimize.opt_a.a_3 : 0.000457s : 0.31% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000460s : 0.31% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000457s : 0.31% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.32% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.122957s : 83.77% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000734 218 5.86% : 0.000043s : 11: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.41% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000007s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.04% : 0.000404s : 16: substitution.inline 2.17% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000015s : 3: substitution.less_batch_normalization 1.81% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.89% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.37% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010312 2 86.95% : 0.008966s : 1: type_inference.infer 13.05% : 0.001346s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.70% : 0.000119s : 16: replace.inline 41.30% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000426 30 92.85% : 0.000396s : 16: match.inline 7.15% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000742 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.28% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.54% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 244: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.10% : 0.000008s : 67: predicate.transpose_eliminate 1.42% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.68% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.61% : 0.000005s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001511 32 57.46% : 0.000868s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.54% : 0.000643s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.175374 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.71% : 0.002993s : 1: add_attr 1.70% : 0.002984s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000122s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.32% : 0.000569s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000049s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.27% : 0.000465s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000469s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.75% : 0.004817s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000173s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.19% : 0.010856s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.27% : 0.000476s : 1: opt_after_jit_grad 0.17% : 0.000292s : 1: opt_b 7.52% : 0.013181s : 1: optimize 0.01% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.92% : 0.001610s : 2: renormalize.infer 0.80% : 0.001404s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000054s : 1: rewriter_after_opt_a 0.10% : 0.000167s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 70.12% : 0.122981s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 5.93% : 0.010397s : 1: type_inference 0.04% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x1-ge],max_mem:14.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x2-pynative],max_mem:14.0M TotalTime = 0.0263128, [24] [bootstrap]: 0.00055617 [type_inference]: 0.00635017 [event_method]: 1.437e-05 [auto_monad]: 5.98e-05 [graph_reusing]: 5.44e-06 [inline]: 1.88002e-06 [add_attr]: 0.0034954, [1] [add_attr_with_inline]: 0.00348477, [1] [Cycle 1]: 4.474e-05, [2] [tag_attr]: 1.57e-05 [meta_addattr_fg_expand]: 4.13999e-06 [parallel-infer-symbol]: 2.78998e-06 [pre_auto_parallel]: 2.878e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00404026, [53] [py_interpret_to_execute]: 2.086e-05 [rewriter_before_opt_a]: 5.945e-05 [opt_a]: 0.00219985, [2] [Cycle 1]: 0.00159523, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 3.28e-05 [loop_unroll]: 2.164e-05 [a_1]: 0.00045979 [with_stream_mark]: 1.379e-05 [recompute_prepare]: 8.09002e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 3.45e-06 [updatestate_loads_eliminate]: 2.88998e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 0.00012917 [accelerated_algorithm]: 7.04001e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.71998e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 8.88002e-06 [auto_parallel]: 5.89999e-06 [parallel]: 2.488e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.74002e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.67999e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 7.31001e-06 [virtual_dataset]: 
5.94999e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.98999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.008e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 8.89003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32002e-06 [meta_fg_expand]: 2.38002e-06 [flash_sp_send_recv_attached]: 2.67001e-06 [receive_attached]: 2.92002e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00042171 [add_forward_monad_depend]: 4.82e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.417e-05 [cse]: 2.941e-05 [a_3]: 4.08e-05 [Cycle 2]: 0.00059532, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.68002e-06 [a_1]: 0.00012604 [with_stream_mark]: 9.92001e-06 [recompute_prepare]: 5.39e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.944e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.21001e-06 [auto_parallel]: 4.87e-06 [parallel]: 4.3e-06 [flash_sp]: 3.30998e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.28002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.23998e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 5.01002e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 5.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46003e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.24002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 1.21997e-06 [receive_attached]: 1.00001e-06 [after_resolve]: 9.32999e-06 [a_after_grad]: 7.98999e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 7.39994e-07 [auto_monad_eliminator]: 6.07999e-06 [cse]: 1.32e-05 [a_3]: 3.265e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 2.973e-05 [convert_after_rewriter]: 6.63998e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00044964 [opt_b]: 0.00018042, [1] [Cycle 1]: 0.00017432, [7] [b_1]: 0.00010745 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.08002e-06 [renormalize]: 2.70025e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.563e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.262e-05 [loop_unroll]: 0.00041291 [opt_after_cconv]: 9.526e-05, [1] [Cycle 1]: 8.962e-05, [7] [c_1]: 2.772e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.652e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.277e-05 [tuple_transform]: 6.877e-05, [1] [Cycle 1]: 6.436e-05, [4] [d_1]: 3.909e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 5.209e-05 [cse_after_recomputation]: 2.058e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.068e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.07999e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.48002e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 1.04e-06 
[add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.24002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.58998e-06 [overlap_grad_ring_attention]: 4.75001e-06 [overlap_grad_flash_sp]: 1.761e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.13002e-06 [split_layernorm_comm]: 1.98997e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.754e-05, [1] [Cycle 1]: 6.341e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 5.82999e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.59998e-06 [auto_monad_reorder]: 1.577e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 0.00013236 [opt_after_jit_grad]: 0.00045726 [validate]: 3.12e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.0108409 [execute]: 8.00999e-06 Sums bootstrap : 0.000556s : 2.55% type_inference : 0.006350s : 29.14% event_method : 0.000014s : 0.07% auto_monad : 0.000060s : 0.27% graph_reusing : 0.000005s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.13% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.10% optimize.rewriter_before_opt_a : 
0.000059s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.18% optimize.opt_a.loop_unroll : 0.000027s : 0.13% optimize.opt_a.a_1 : 0.000586s : 2.69% optimize.opt_a.with_stream_mark : 0.000024s : 0.11% optimize.opt_a.recompute_prepare : 0.000013s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000199s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.06% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.05% optimize.opt_a.merge_send_recv : 0.000013s : 0.06% optimize.opt_a.auto_parallel : 0.000011s : 0.05% optimize.opt_a.parallel : 0.000029s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.05% optimize.opt_a.merge_comm : 0.000007s : 0.03% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.06% optimize.opt_a.virtual_dataset : 0.000011s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.07% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.09% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.03% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.09% optimize.opt_a.a_after_grad : 0.000017s : 0.08% optimize.opt_a.renormalize : 0.000422s : 1.94% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.09% optimize.opt_a.cse : 0.000043s : 0.20% optimize.opt_a.a_3 : 0.000073s : 0.34% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.14% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000450s : 2.06% optimize.opt_b.b_1 : 0.000107s : 0.49% optimize.opt_b.b_2 : 0.000007s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.07% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.10% optimize.loop_unroll : 0.000413s : 1.89% optimize.opt_after_cconv.c_1 : 0.000028s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.06% optimize.tuple_transform.d_1 : 0.000039s : 0.18% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.24% optimize.cse_after_recomputation.cse : 0.000011s : 0.05% optimize.environ_conv : 0.000005s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000132s : 0.61% opt_after_jit_grad : 0.000457s : 2.10% validate : 0.000031s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.010841s : 49.74% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000172 30 15.21% : 0.000026s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 66.88% : 0.000115s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.43% : 0.000004s : 4: substitution.remove_not_recompute_node 2.25% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006303 2 90.39% : 0.005698s : 1: type_inference.infer 9.61% : 0.000606s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.06% : 0.000027s : 3: replace.inline 29.94% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.72% : 0.000113s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.82% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.87% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.24% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 1.29% : 0.000002s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000396 8 46.88% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.12% : 0.000210s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.035432 196 0.01% : 0.000004s : 1: ForceFp32Comm 9.88% : 0.003500s : 1: add_attr 9.84% : 0.003488s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000065s : 1: auto_monad 0.05% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.68% : 0.000596s : 1: bootstrap 0.07% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.19% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.29% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.03% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.85% : 0.001009s : 78: opt.transform.opt_a 0.07% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000090s : 28: opt.transform.opt_b 0.12% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.22% : 0.002203s : 1: opt_a 0.28% : 0.000099s : 1: opt_after_cconv 1.32% : 0.000467s : 1: opt_after_jit_grad 0.52% : 0.000184s : 1: opt_b 11.41% : 0.004044s : 1: optimize 0.05% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000033s : 1: pre_auto_parallel 0.07% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.61% : 0.000215s : 1: renormalize.infer 0.56% : 0.000199s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.39% : 0.000138s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000033s : 1: rewriter_after_opt_a 0.18% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000070s : 1: symbol_engine_optimizer 30.64% : 0.010857s : 1: task_emit 0.20% : 0.000072s : 1: 
tuple_transform 17.96% : 0.006364s : 1: type_inference 0.30% : 0.000106s : 1: validate TotalTime = 0.018909, [24] [bootstrap]: 0.00055252 [type_inference]: 0.00450503 [event_method]: 1.03e-05 [auto_monad]: 5.181e-05 [graph_reusing]: 6.31998e-06 [inline]: 1.79e-06 [add_attr]: 0.00326681, [1] [add_attr_with_inline]: 0.00325882, [1] [Cycle 1]: 4.49e-05, [2] [tag_attr]: 1.222e-05 [meta_addattr_fg_expand]: 3.34001e-06 [parallel-infer-symbol]: 2.53e-06 [pre_auto_parallel]: 2.14e-05 [insert-virtual-dataset]: 3.06001e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00370642, [53] [py_interpret_to_execute]: 1.54e-05 [rewriter_before_opt_a]: 3.897e-05 [opt_a]: 0.00185615, [2] [Cycle 1]: 0.00126067, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 2.441e-05 [loop_unroll]: 1.339e-05 [a_1]: 0.00029394 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.2e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.94999e-06 [a_2]: 7.719e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.59001e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 8.43999e-06 [auto_parallel]: 6.19999e-06 [parallel]: 1.886e-05 [flash_sp]: 7.45e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.98001e-06 [matmul_add_comm_reduction]: 8.74998e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 6.86001e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.34e-06 [merge_forward]: 3.64002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 8.90001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.117e-05 [merge_recompute_call_nodes]: 2.02999e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.53e-06 
[after_resolve]: 1.038e-05 [a_after_grad]: 9.18002e-06 [renormalize]: 0.00033999 [add_forward_monad_depend]: 4.57998e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.952e-05 [a_3]: 3.925e-05 [Cycle 2]: 0.00058672, [45] [expand_dump_flag]: 7.80012e-07 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.0001236 [with_stream_mark]: 9.10001e-06 [recompute_prepare]: 5.42001e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.706e-05 [accelerated_algorithm]: 5.61998e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.11001e-06 [auto_parallel]: 5.56998e-06 [parallel]: 4.10998e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.75002e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 5.89999e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.34001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.34e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.92003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 7.40023e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.84998e-06 [a_after_grad]: 8.13999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.237e-05 [a_3]: 3.13e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 3.115e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00044921 [opt_b]: 0.00017883, [1] [Cycle 1]: 
0.00017272, [7] [b_1]: 0.00010674 [b_2]: 6.83998e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 2.70025e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.631e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.302e-05 [loop_unroll]: 0.00041429 [opt_after_cconv]: 9.361e-05, [1] [Cycle 1]: 8.819e-05, [7] [c_1]: 2.701e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.48e-06 [cse]: 1.563e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.287e-05 [tuple_transform]: 0.00010495, [1] [Cycle 1]: 0.00010072, [4] [d_1]: 7.409e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 5.065e-05 [cse_after_recomputation]: 2.142e-05, [1] [Cycle 1]: 1.688e-05, [1] [cse]: 1.144e-05 [environ_conv]: 4.42998e-06 [swap_dp_allreduce_reducescatter]: 5.27001e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.84e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.94001e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.201e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.82e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.729e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.91998e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 6.787e-05, [1] [Cycle 1]: 6.381e-05, [6] [build]: 2.56e-06 [elim_shapecalc]: 8.15e-06 [elim_not_effective]: 1.123e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.64998e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.624e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00045006 [validate]: 3.094e-05 [backend_pass]: 8.00006e-07 [task_emit]: 0.00606788 [execute]: 7.26999e-06 Sums bootstrap : 0.000553s : 3.76% type_inference : 0.004505s : 30.68% event_method : 0.000010s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000418s : 2.84% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000042s : 0.29% optimize.opt_a.a_3 : 0.000071s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000449s : 3.06% optimize.opt_b.b_1 : 0.000107s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000414s : 2.82% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000074s : 0.50% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.34% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.06% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006068s : 41.32% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.65% : 0.000023s : 4: substitution.arithmetic_simplify 1.61% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.59% : 0.000006s : 4: substitution.graph_param_transform 65.24% : 0.000079s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.86% : 0.000003s : 4: substitution.replace_old_param ------[type_inference.] 0.004463 2 92.24% : 0.004117s : 1: type_inference.infer 7.76% : 0.000346s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.72% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000004s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.77% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.97% : 0.000003s : 11: predicate.float_depend_g_call 0.72% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.85% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 1.09% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 43.50% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.50% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027178 196 0.01% : 0.000003s : 1: ForceFp32Comm 12.04% : 0.003271s : 1: add_attr 12.00% : 0.003262s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.20% : 0.000055s : 1: add_recomputation 0.02% : 0.000005s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.17% : 0.000591s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.56% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.69% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 2.82% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000089s : 28: opt.transform.opt_b 0.29% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.84% : 0.001859s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.69% : 0.000459s : 1: opt_after_jit_grad 0.67% : 0.000182s : 1: opt_b 13.65% : 0.003710s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.68% : 0.000186s : 1: renormalize.infer 0.55% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 22.36% : 0.006078s : 1: task_emit 0.40% : 0.000108s : 1: 
tuple_transform 16.63% : 0.004519s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0200591, [24] [bootstrap]: 0.00050222 [type_inference]: 0.00581097 [event_method]: 1.386e-05 [auto_monad]: 5.635e-05 [graph_reusing]: 5.51e-06 [inline]: 1.79e-06 [add_attr]: 0.00295967, [1] [add_attr_with_inline]: 0.00295194, [1] [Cycle 1]: 4.683e-05, [2] [tag_attr]: 1.614e-05 [meta_addattr_fg_expand]: 4.03001e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 2.634e-05 [insert-virtual-dataset]: 2.70002e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.39999e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00397938, [53] [py_interpret_to_execute]: 2.033e-05 [rewriter_before_opt_a]: 5.859e-05 [opt_a]: 0.00209999, [2] [Cycle 1]: 0.00150072, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 3.18e-05 [loop_unroll]: 2.103e-05 [a_1]: 0.00045491 [with_stream_mark]: 1.45e-05 [recompute_prepare]: 7.70998e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.657e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.92e-06 [auto_parallel]: 5.46002e-06 [parallel]: 1.989e-05 [flash_sp]: 8.25e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 9.39998e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 7.06001e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.072e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 8.80999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40998e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.44999e-06 
[after_resolve]: 9.84001e-06 [a_after_grad]: 8.72998e-06 [renormalize]: 0.0004034 [add_forward_monad_depend]: 5.09e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.295e-05 [cse]: 2.795e-05 [a_3]: 3.996e-05 [Cycle 2]: 0.00059008, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012487 [with_stream_mark]: 9.09e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.766e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.29997e-06 [auto_parallel]: 5.02e-06 [parallel]: 4.2e-06 [flash_sp]: 3.22997e-06 [merge_comm]: 2.91999e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 5.40999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.58e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.96001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 8e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 7.50006e-07 [auto_monad_eliminator]: 5.72999e-06 [cse]: 1.667e-05 [a_3]: 3.184e-05 [py_interpret_to_execute_after_opt_a]: 7.28e-06 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 3.234e-05 [convert_after_rewriter]: 6.79001e-06 [order_py_execute_after_rewriter]: 5.35001e-06 [mutable_eliminate]: 0.00044502 [opt_b]: 0.00018386, [1] [Cycle 1]: 0.00017753, [7] [b_1]: 0.00010779 
[b_2]: 7.66001e-06 [updatestate_depend_eliminate]: 4.94998e-06 [updatestate_assign_eliminate]: 2.75002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 5.49975e-07 [cse]: 1.7e-05 [optimize_parallel_all_gather_comm]: 1.589e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.399e-05 [loop_unroll]: 0.00041163 [opt_after_cconv]: 9.428e-05, [1] [Cycle 1]: 8.86e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.609e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.905e-05, [1] [Cycle 1]: 6.482e-05, [4] [d_1]: 3.909e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.18998e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.989e-05 [cse_after_recomputation]: 1.982e-05, [1] [Cycle 1]: 1.526e-05, [1] [cse]: 1.01e-05 [environ_conv]: 4.90999e-06 [swap_dp_allreduce_reducescatter]: 5.52001e-06 [bias_add_comm_swap]: 2.56998e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.50001e-06 [slice_recompute_activation]: 2.81999e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 8.99978e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.73003e-06 [comm_op_add_attrs]: 1.06002e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.172e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.78003e-06 [overlap_grad_ring_attention]: 4.27e-06 [overlap_grad_flash_sp]: 1.76e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.28002e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.773e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 9.22001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.59998e-06 [pipeline_parallel_scheduler]: 1.87001e-06 [auto_monad_reorder]: 4.101e-05 [get_jit_bprop_graph]: 1.48002e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.00044579 [validate]: 2.972e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.00596489 [execute]: 7.00998e-06 Sums bootstrap : 0.000502s : 3.12% type_inference : 0.005811s : 36.08% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000580s : 3.60% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000010s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000403s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000045s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.76% optimize.opt_b.b_1 : 0.000108s : 0.67% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000412s : 2.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000041s : 0.25% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 2.77% validate : 0.000030s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.005965s : 37.03% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.72% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.93% : 0.000002s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.98% : 0.000111s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 4: substitution.replace_old_param 6.68% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005767 2 90.63% : 0.005227s : 1: type_inference.infer 9.37% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.75% : 0.000026s : 3: replace.inline 30.25% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.67% : 0.000109s : 3: match.inline 8.33% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000001s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.59% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.70% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.55% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.31% : 0.000004s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 47.45% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.55% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028463 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.41% : 0.002964s : 1: add_attr 10.38% : 0.002955s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000061s : 1: auto_monad 0.16% : 0.000045s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000540s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.59% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.31% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.39% : 0.002103s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.60% : 0.000455s : 1: opt_after_jit_grad 0.66% : 0.000187s : 1: opt_b 13.99% : 0.003983s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000006s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000207s : 1: renormalize.infer 0.66% : 0.000189s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 20.99% : 0.005975s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 20.46% : 0.005824s : 1: type_inference 0.19% : 0.000055s : 1: validate TotalTime = 0.0395789, [24] [bootstrap]: 0.0005992 [type_inference]: 0.0114188 [event_method]: 4.727e-05 [auto_monad]: 0.00016593 [graph_reusing]: 9.41e-06 [inline]: 2.12001e-06 [add_attr]: 0.00299439, [1] [add_attr_with_inline]: 0.00298603, [1] [Cycle 1]: 7.102e-05, [2] [tag_attr]: 3.463e-05 [meta_addattr_fg_expand]: 9.49e-06 [parallel-infer-symbol]: 2.93e-06 [pre_auto_parallel]: 4.929e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.03997e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.0127526, [53] [py_interpret_to_execute]: 3.775e-05 [rewriter_before_opt_a]: 0.00014443 [opt_a]: 0.0105947, [3] [Cycle 1]: 0.00695501, [45] [expand_dump_flag]: 3.93001e-06 [switch_simplify]: 7.386e-05 [loop_unroll]: 6.143e-05 [a_1]: 0.00149951 [with_stream_mark]: 2.335e-05 [recompute_prepare]: 2.198e-05 [updatestate_depend_eliminate]: 9.38002e-06 [updatestate_assign_eliminate]: 8.47998e-06 [updatestate_loads_eliminate]: 7.3e-06 [parameter_eliminate]: 2.73e-06 [a_2]: 0.0002468 [accelerated_algorithm]: 3.203e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.33998e-06 [shard_inline]: 1.619e-05 [merge_send_recv]: 1.624e-05 [auto_parallel]: 1.069e-05 [parallel]: 1.874e-05 [flash_sp]: 1.157e-05 [merge_comm]: 9.32001e-06 [allreduce_fusion]: 8.85001e-06 [matmul_add_comm_reduction]: 2.627e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.749e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.508e-05 [virtual_output]: 1.494e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.762e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.847e-05 [merge_recompute_call_nodes]: 1.46998e-06 [before_grad]: 2.778e-05 [set_forward_comm_id_for_comm_node_pass]: 9.30001e-06 [meta_fg_expand]: 0.00144549 [flash_sp_send_recv_attached]: 3.65998e-06 [receive_attached]: 2.49999e-06 [after_resolve]: 
5.983e-05 [a_after_grad]: 8.047e-05 [renormalize]: 0.00225884 [add_forward_monad_depend]: 9.64e-06 [auto_monad_grad]: 5.17999e-06 [auto_monad_eliminator]: 5.389e-05 [cse]: 0.00015981 [a_3]: 0.00032722 [Cycle 2]: 0.00283172, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 4.586e-05 [loop_unroll]: 4.308e-05 [a_1]: 0.00148213 [with_stream_mark]: 1.118e-05 [recompute_prepare]: 9.59999e-06 [updatestate_depend_eliminate]: 4.3e-06 [updatestate_assign_eliminate]: 3.70998e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.12999e-06 [a_2]: 0.00014876 [accelerated_algorithm]: 1.15e-05 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 8.13001e-06 [merge_send_recv]: 6.21998e-06 [auto_parallel]: 6.80998e-06 [parallel]: 4.86002e-06 [flash_sp]: 3.69002e-06 [merge_comm]: 4.23999e-06 [allreduce_fusion]: 3.79002e-06 [matmul_add_comm_reduction]: 6.48998e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 9.02e-06 [virtual_dataset]: 7.68001e-06 [get_grad_eliminate_]: 7.46001e-06 [virtual_output]: 7.66999e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 8.29983e-07 [offload_activation]: 7.79002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.473e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.31e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 6.756e-05 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.16002e-06 [after_resolve]: 1.451e-05 [a_after_grad]: 1.264e-05 [renormalize]: 0.00050912 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.255e-05 [cse]: 2.256e-05 [a_3]: 5.677e-05 [Cycle 3]: 0.00079397, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 9.34998e-06 [loop_unroll]: 8.01001e-06 [a_1]: 0.00021082 [with_stream_mark]: 8.67e-06 [recompute_prepare]: 8.01001e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.45998e-06 
[parameter_eliminate]: 8.2e-07 [a_2]: 0.00010736 [accelerated_algorithm]: 1.057e-05 [shard]: 8.2e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 8.16002e-06 [merge_send_recv]: 5.62001e-06 [auto_parallel]: 6.34001e-06 [parallel]: 4.90001e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.88999e-06 [matmul_add_comm_reduction]: 6.56e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.00001e-06 [virtual_dataset]: 7.84002e-06 [get_grad_eliminate_]: 7.48999e-06 [virtual_output]: 7.3e-06 [merge_forward]: 3.43e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 7.53e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.396e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.214e-05 [set_forward_comm_id_for_comm_node_pass]: 4.33999e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.221e-05 [a_after_grad]: 1.241e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 7.2e-07 [auto_monad_eliminator]: 8.38001e-06 [cse]: 1.827e-05 [a_3]: 4.9e-05 [py_interpret_to_execute_after_opt_a]: 8.87999e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 4.267e-05 [convert_after_rewriter]: 8.55001e-06 [order_py_execute_after_rewriter]: 6.33e-06 [mutable_eliminate]: 0.00046134 [opt_b]: 0.00025054, [1] [Cycle 1]: 0.00024462, [7] [b_1]: 0.00016463 [b_2]: 9.68002e-06 [updatestate_depend_eliminate]: 6.56999e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.25e-06 [renormalize]: 5.00004e-07 [cse]: 2.249e-05 [optimize_parallel_all_gather_comm]: 1.815e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 1.953e-05 [loop_unroll]: 0.00041912 [opt_after_cconv]: 0.00012046, [1] [Cycle 1]: 0.00011487, [7] [c_1]: 4.269e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 6.31e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.48e-06 [cse]: 
2.287e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 2.695e-05 [tuple_transform]: 9.137e-05, [1] [Cycle 1]: 8.636e-05, [4] [d_1]: 5.849e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 8.62e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5.303e-05 [cse_after_recomputation]: 2.701e-05, [1] [Cycle 1]: 2.231e-05, [1] [cse]: 1.695e-05 [environ_conv]: 7.77e-06 [swap_dp_allreduce_reducescatter]: 7.13e-06 [bias_add_comm_swap]: 2.97002e-06 [label_micro_interleaved_index]: 4.45999e-06 [label_fine_grained_interleaved_index]: 3.00998e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.52001e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.56e-06 [reorder_send_recv_between_fp_bp]: 2.68998e-06 [comm_op_add_attrs]: 1.11002e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.46998e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.529e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.17e-06 [overlap_recompute_and_grad_model_parallel]: 5.51998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 4.65001e-06 [overlap_grad_flash_sp]: 2.275e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 8.951e-05, [1] [Cycle 1]: 8.533e-05, [6] [build]: 9.47999e-06 [elim_shapecalc]: 1.109e-05 [elim_not_effective]: 1.576e-05 [opt_reshape]: 8.53001e-06 [fold_const_symbol]: 1.328e-05 [renormalize]: 1.80007e-07 [detach_backward]: 1.64998e-06 
[pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.187e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00048852 [validate]: 4.029e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0107506 [execute]: 7.97998e-06 Sums bootstrap : 0.000599s : 1.70% type_inference : 0.011419s : 32.31% event_method : 0.000047s : 0.13% auto_monad : 0.000166s : 0.47% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.41% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000129s : 0.37% optimize.opt_a.loop_unroll : 0.000113s : 0.32% optimize.opt_a.a_1 : 0.003192s : 9.03% optimize.opt_a.with_stream_mark : 0.000043s : 0.12% optimize.opt_a.recompute_prepare : 0.000040s : 0.11% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000503s : 1.42% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.15% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000032s : 0.09% optimize.opt_a.merge_send_recv : 0.000028s : 0.08% optimize.opt_a.auto_parallel : 0.000024s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000018s : 0.05% 
optimize.opt_a.allreduce_fusion : 0.000017s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.10% optimize.opt_a.virtual_dataset : 0.000031s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.08% optimize.opt_a.virtual_output : 0.000030s : 0.08% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000033s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.16% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000053s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.05% optimize.opt_a.meta_fg_expand : 0.001516s : 4.29% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.24% optimize.opt_a.a_after_grad : 0.000106s : 0.30% optimize.opt_a.renormalize : 0.002768s : 7.83% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000075s : 0.21% optimize.opt_a.cse : 0.000201s : 0.57% optimize.opt_a.a_3 : 0.000433s : 1.23% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000043s : 0.12% optimize.convert_after_rewriter : 0.000009s : 0.02% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.31% optimize.opt_b.b_1 : 0.000165s : 0.47% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000022s : 0.06% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.05% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000419s : 1.19% optimize.opt_after_cconv.c_1 : 0.000043s : 0.12% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000023s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000027s : 0.08% optimize.tuple_transform.d_1 : 0.000058s : 0.17% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.02% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.15% optimize.cse_after_recomputation.cse : 0.000017s : 0.05% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.04% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.06% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.02% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000022s : 0.06% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000489s : 1.38% validate : 0.000040s : 0.11% backend_pass : 0.000001s : 0.00% task_emit : 0.010751s : 30.42% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000737 213 6.07% : 0.000045s : 12: substitution.arithmetic_simplify 0.32% : 0.000002s : 4: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 4: substitution.fold_const_symbol 0.91% : 0.000007s : 7: substitution.graph_param_transform 0.42% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 56.49% : 0.000417s : 17: substitution.inline 2.10% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 18: substitution.j_node_and_user_rematch 2.15% : 0.000016s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.68% : 0.000012s : 18: substitution.remove_not_recompute_node 3.28% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.71% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.89% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011344 2 87.08% : 0.009878s : 1: type_inference.infer 12.92% : 0.001466s : 1: type_inference.specialize ------[replace.] 0.000221 33 58.31% : 0.000129s : 17: replace.inline 41.69% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 33 92.30% : 0.000408s : 17: match.inline 7.70% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000756 5530 1.05% : 0.000008s : 66: predicate.accumulaten_eliminater 0.26% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 30: predicate.addn_check_dump 1.03% : 0.000008s : 66: predicate.addn_zero_filter 1.01% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 1.92% : 0.000015s : 96: predicate.arithmetic_simplify 1.06% : 0.000008s : 66: predicate.cast_eliminate 1.07% : 0.000008s : 65: predicate.check_bprop_eliminate 0.49% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.48% : 0.000004s : 30: predicate.depend_value_elim 1.14% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 66: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.33% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 7: predicate.elim_not_effective 0.13% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000009s : 73: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 73: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 73: predicate.environ_get_depend_swap 1.65% : 0.000012s : 103: predicate.environ_get_eliminate 1.16% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 99: predicate.float_depend_g_call 0.48% : 0.000004s : 30: predicate.float_environ_get_switch 0.61% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.53% : 0.000004s : 30: predicate.get_grad_eliminate 0.09% : 0.000001s : 7: predicate.graph_param_transform 0.50% : 0.000004s : 30: predicate.incorporate_call 0.46% : 0.000003s : 30: predicate.incorporate_call_switch 5.27% : 0.000040s : 239: predicate.inline 1.19% : 0.000009s : 53: predicate.inline_without_move 0.31% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.60% : 0.000005s : 30: predicate.less_batch_normalization 1.54% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 2.54% : 0.000019s : 162: predicate.load_eliminater 0.27% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 134: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 80: predicate.make_slice_get_slice_eliminator 5.45% : 0.000041s : 30: predicate.merge_addn 1.06% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.07% : 0.000008s : 66: predicate.minmaximum_grad 0.29% : 0.000002s : 7: predicate.mutable_eliminate 0.14% : 0.000001s : 7: predicate.opt_reshape 0.17% : 0.000001s : 7: predicate.parallel_virtual_node 2.04% : 0.000015s : 99: predicate.partial_defer_inline 1.68% : 0.000013s : 89: predicate.partial_eliminate 1.02% : 0.000008s : 66: predicate.print_const_string_wrapper 0.50% : 0.000004s : 30: predicate.reduce_all_const_elim 1.28% : 0.000010s : 66: predicate.reduce_eliminate 2.56% : 0.000019s : 162: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 30: predicate.remove_not_recompute_node 1.84% : 0.000014s : 147: predicate.replace_applicator 0.59% : 0.000004s : 53: predicate.replace_old_param 0.09% : 0.000001s : 7: predicate.reset_defer_inline 1.03% : 0.000008s : 66: predicate.reshape_eliminate 1.08% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 7: predicate.row_tensor_eliminate 1.18% : 0.000009s : 65: predicate.same_eliminate 0.34% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 30: predicate.shard_identity_eliminate 0.24% : 0.000002s : 14: predicate.special_op_eliminate 0.58% : 0.000004s : 30: predicate.specialize_transform 1.17% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 99: predicate.switch_defer_inline 2.86% : 0.000022s : 164: predicate.switch_layer_defer_inline 4.85% : 0.000037s : 270: 
predicate.switch_simplify 1.03% : 0.000008s : 66: predicate.tile_eliminate 1.03% : 0.000008s : 66: predicate.transpose_eliminate 1.39% : 0.000010s : 80: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 80: predicate.tuple_list_get_item_const_eliminator 1.24% : 0.000009s : 80: predicate.tuple_list_get_item_depend_reorder 2.69% : 0.000020s : 126: predicate.tuple_list_get_item_eliminator 1.37% : 0.000010s : 80: predicate.tuple_list_get_set_item_eliminator 1.85% : 0.000014s : 110: predicate.tuple_list_set_item_eliminator 1.55% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 2.53% : 0.000019s : 162: predicate.updatestate_pure_node_eliminater 3.13% : 0.000024s : 192: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 7: predicate.value_based_eliminate 0.52% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 30: predicate.virtual_output_eliminate 0.13% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001525 34 57.06% : 0.000870s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.94% : 0.000655s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063171 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.75% : 0.002999s : 1: add_attr 4.73% : 0.002990s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.27% : 0.000173s : 1: auto_monad 0.04% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.01% : 0.000637s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000030s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000016s : 1: opt.transform.mutable_eliminate 7.61% : 0.004807s : 117: opt.transform.opt_a 0.07% : 0.000041s : 1: opt.transform.opt_after_cconv 0.05% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.23% : 0.000148s : 28: opt.transform.opt_b 0.10% : 0.000065s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000045s : 4: opt.transform.symbol_engine_opt 16.78% : 0.010598s : 1: opt_a 0.20% : 0.000124s : 1: opt_after_cconv 0.79% : 0.000498s : 1: opt_after_jit_grad 0.40% : 0.000254s : 1: opt_b 20.19% : 0.012756s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000031s : 1: remove_dup_value 2.28% : 0.001439s : 2: renormalize.infer 2.08% : 0.001315s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000047s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000092s : 1: symbol_engine_optimizer 17.04% : 0.010764s : 1: task_emit 0.15% : 0.000094s : 1: 
tuple_transform 18.10% : 0.011434s : 1: type_inference 0.11% : 0.000071s : 1: validate TotalTime = 0.0189138, [24] [bootstrap]: 0.00050812 [type_inference]: 0.00440965 [event_method]: 1.055e-05 [auto_monad]: 5.165e-05 [graph_reusing]: 5.48002e-06 [inline]: 2.34001e-06 [add_attr]: 0.00307513, [1] [add_attr_with_inline]: 0.00306772, [1] [Cycle 1]: 4.494e-05, [2] [tag_attr]: 1.215e-05 [meta_addattr_fg_expand]: 3.15002e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 2.153e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00370511, [53] [py_interpret_to_execute]: 1.457e-05 [rewriter_before_opt_a]: 4.094e-05 [opt_a]: 0.00186763, [2] [Cycle 1]: 0.00126698, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.485e-05 [loop_unroll]: 1.368e-05 [a_1]: 0.00029412 [with_stream_mark]: 1.473e-05 [recompute_prepare]: 7.16001e-06 [updatestate_depend_eliminate]: 3.97998e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.803e-05 [accelerated_algorithm]: 6.35002e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 5.66e-06 [parallel]: 1.906e-05 [flash_sp]: 7.83001e-06 [merge_comm]: 4.03001e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 9.20999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.53002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.94001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.95e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.55997e-06 [receive_attached]: 2.71e-06 
[after_resolve]: 1.036e-05 [a_after_grad]: 9.11998e-06 [renormalize]: 0.00034089 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 2.04999e-06 [auto_monad_eliminator]: 1.309e-05 [cse]: 2.851e-05 [a_3]: 4.069e-05 [Cycle 2]: 0.00059176, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.55001e-06 [a_1]: 0.00012575 [with_stream_mark]: 1.065e-05 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.68998e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 7.79983e-07 [a_2]: 6.809e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.66003e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.22e-06 [parallel]: 4.28001e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.10998e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.22001e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 6.11e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.76998e-06 [flash_sp_send_recv_attached]: 1.06002e-06 [receive_attached]: 1.05001e-06 [after_resolve]: 8.95999e-06 [a_after_grad]: 7.95e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 5.59e-06 [cse]: 1.226e-05 [a_3]: 3.169e-05 [py_interpret_to_execute_after_opt_a]: 7.33999e-06 [slice_cell_reuse_recomputed_activation]: 1.93997e-06 [rewriter_after_opt_a]: 3.158e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 4.93001e-06 [mutable_eliminate]: 0.00044655 [opt_b]: 0.00018153, [1] [Cycle 1]: 0.00017573, [7] 
[b_1]: 0.00010881 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 4.39992e-07 [cse]: 1.604e-05 [optimize_parallel_all_gather_comm]: 1.578e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.36e-05 [loop_unroll]: 0.00043849 [opt_after_cconv]: 9.558e-05, [1] [Cycle 1]: 8.977e-05, [7] [c_1]: 2.866e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.09999e-06 [cse]: 1.516e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.301e-05 [tuple_transform]: 6.938e-05, [1] [Cycle 1]: 6.504e-05, [4] [d_1]: 3.916e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 4.577e-05 [cse_after_recomputation]: 1.954e-05, [1] [Cycle 1]: 1.512e-05, [1] [cse]: 1.008e-05 [environ_conv]: 5.07e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 3.09999e-06 [label_micro_interleaved_index]: 4.09002e-06 [label_fine_grained_interleaved_index]: 2.85998e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.06997e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.24998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.229e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 3.89002e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 1.39998e-06 [symbol_engine_optimizer]: 6.785e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.37999e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.57998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.618e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.00044778 [validate]: 3.217e-05 [backend_pass]: 1.01002e-06 [task_emit]: 0.00640608 [execute]: 6.92997e-06 Sums bootstrap : 0.000508s : 3.41% type_inference : 0.004410s : 29.63% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.14% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000041s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000420s : 2.82% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000341s : 2.29% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 3.00% optimize.opt_b.b_1 : 0.000109s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000438s : 2.95% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000448s : 3.01% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006406s : 43.05% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000123 26 18.43% : 0.000023s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.54% : 0.000006s : 4: substitution.graph_param_transform 65.10% : 0.000080s : 2: substitution.inline 2.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004369 2 91.99% : 0.004019s : 1: type_inference.infer 8.01% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.37% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.43% : 0.000001s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.01% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.48% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.55% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.99% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.36% : 0.000000s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 42.32% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.68% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026970 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.003079s : 1: add_attr 11.39% : 0.003071s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000050s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.02% : 0.000546s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.66% : 0.000447s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.87% : 0.000774s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.94% : 0.001870s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.69% : 0.000457s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 13.75% : 0.003709s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.68% : 0.000185s : 1: renormalize.infer 0.55% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.17% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 23.79% : 0.006416s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.40% : 0.004424s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0361369, [24] [bootstrap]: 0.00055644 [type_inference]: 0.0112485 [event_method]: 4.196e-05 [auto_monad]: 0.00011861 [graph_reusing]: 8.80001e-06 [inline]: 2.27001e-06 [add_attr]: 0.00308237, [1] [add_attr_with_inline]: 0.00307415, [1] [Cycle 1]: 7.134e-05, [2] [tag_attr]: 3.253e-05 [meta_addattr_fg_expand]: 9.07999e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 4.596e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.0126369, [53] [py_interpret_to_execute]: 3.527e-05 [rewriter_before_opt_a]: 0.0001284 [opt_a]: 0.0104472, [3] [Cycle 1]: 0.00684363, [45] [expand_dump_flag]: 4.16001e-06 [switch_simplify]: 6.762e-05 [loop_unroll]: 5.472e-05 [a_1]: 0.00133893 [with_stream_mark]: 2.437e-05 [recompute_prepare]: 2.209e-05 [updatestate_depend_eliminate]: 9.40001e-06 [updatestate_assign_eliminate]: 7.35e-06 [updatestate_loads_eliminate]: 7.51999e-06 [parameter_eliminate]: 2.96001e-06 [a_2]: 0.00024714 [accelerated_algorithm]: 3.127e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 3.71001e-06 [shard_inline]: 1.623e-05 [merge_send_recv]: 1.666e-05 [auto_parallel]: 1.075e-05 [parallel]: 2.036e-05 [flash_sp]: 1.123e-05 [merge_comm]: 9.89999e-06 [allreduce_fusion]: 9.07999e-06 [matmul_add_comm_reduction]: 2.684e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.834e-05 [virtual_dataset]: 1.547e-05 [get_grad_eliminate_]: 1.529e-05 [virtual_output]: 1.489e-05 [merge_forward]: 1.004e-05 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.766e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.97999e-06 [before_grad]: 2.785e-05 [set_forward_comm_id_for_comm_node_pass]: 9.60001e-06 [meta_fg_expand]: 0.00141624 [flash_sp_send_recv_attached]: 3.43e-06 [receive_attached]: 3.09999e-06 
[after_resolve]: 5.937e-05 [a_after_grad]: 8.229e-05 [renormalize]: 0.00230679 [add_forward_monad_depend]: 9.58002e-06 [auto_monad_grad]: 5.32001e-06 [auto_monad_eliminator]: 5.39e-05 [cse]: 0.00016362 [a_3]: 0.00035511 [Cycle 2]: 0.00279609, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.654e-05 [loop_unroll]: 4.272e-05 [a_1]: 0.00149932 [with_stream_mark]: 1.129e-05 [recompute_prepare]: 9.67001e-06 [updatestate_depend_eliminate]: 4.42e-06 [updatestate_assign_eliminate]: 3.58999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 0.00010924 [accelerated_algorithm]: 1.096e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 8.47998e-06 [merge_send_recv]: 6.38e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.72998e-06 [flash_sp]: 3.58e-06 [merge_comm]: 4.50999e-06 [allreduce_fusion]: 4.00998e-06 [matmul_add_comm_reduction]: 7.01001e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 9.09e-06 [virtual_dataset]: 7.75e-06 [get_grad_eliminate_]: 7.4e-06 [virtual_output]: 7.45998e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 8.59989e-07 [offload_activation]: 8.33001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.511e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.221e-05 [set_forward_comm_id_for_comm_node_pass]: 4.50001e-06 [meta_fg_expand]: 3.694e-05 [flash_sp_send_recv_attached]: 8.79983e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.421e-05 [a_after_grad]: 1.331e-05 [renormalize]: 0.00052739 [add_forward_monad_depend]: 4.21001e-06 [auto_monad_grad]: 1.47001e-06 [auto_monad_eliminator]: 1.249e-05 [cse]: 2.221e-05 [a_3]: 5.654e-05 [Cycle 3]: 0.00079326, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 9.39e-06 [loop_unroll]: 7.94002e-06 [a_1]: 0.00021172 [with_stream_mark]: 8.44998e-06 [recompute_prepare]: 8.09002e-06 [updatestate_depend_eliminate]: 4.2e-06 [updatestate_assign_eliminate]: 3.31999e-06 
[updatestate_loads_eliminate]: 3.23998e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 0.00010735 [accelerated_algorithm]: 1.044e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 8.28999e-06 [merge_send_recv]: 5.87999e-06 [auto_parallel]: 6.43e-06 [parallel]: 4.87e-06 [flash_sp]: 9.39996e-07 [merge_comm]: 4.22e-06 [allreduce_fusion]: 3.93999e-06 [matmul_add_comm_reduction]: 6.58e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 8.57998e-06 [virtual_dataset]: 7.65998e-06 [get_grad_eliminate_]: 7.46999e-06 [virtual_output]: 7.21999e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 7.61999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.392e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.174e-05 [set_forward_comm_id_for_comm_node_pass]: 4.23999e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 9.49978e-07 [after_resolve]: 1.235e-05 [a_after_grad]: 1.261e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 7.29982e-07 [auto_monad_eliminator]: 8.22998e-06 [cse]: 1.839e-05 [a_3]: 4.838e-05 [py_interpret_to_execute_after_opt_a]: 9.47001e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 4.327e-05 [convert_after_rewriter]: 8.35001e-06 [order_py_execute_after_rewriter]: 6.98998e-06 [mutable_eliminate]: 0.0004832 [opt_b]: 0.00026134, [1] [Cycle 1]: 0.00025526, [7] [b_1]: 0.00017196 [b_2]: 1.003e-05 [updatestate_depend_eliminate]: 6.54999e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 3.18e-06 [renormalize]: 3.89991e-07 [cse]: 2.402e-05 [optimize_parallel_all_gather_comm]: 1.894e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.011e-05 [loop_unroll]: 0.00042321 [opt_after_cconv]: 0.00012064, [1] [Cycle 1]: 0.0001151, [7] [c_1]: 4.281e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 6.48e-06 
[updatestate_assign_eliminate]: 3.62002e-06 [updatestate_loads_eliminate]: 3.28e-06 [cse]: 2.28e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 2.629e-05 [tuple_transform]: 9.327e-05, [1] [Cycle 1]: 8.849e-05, [4] [d_1]: 5.931e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 8.79e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 5.595e-05 [cse_after_recomputation]: 2.658e-05, [1] [Cycle 1]: 2.196e-05, [1] [cse]: 1.666e-05 [environ_conv]: 8.33999e-06 [swap_dp_allreduce_reducescatter]: 7.37002e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.66997e-06 [label_fine_grained_interleaved_index]: 2.82002e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.48002e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 3.00998e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.533e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 4.48001e-06 [overlap_recompute_and_grad_model_parallel]: 5.25001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 5.00999e-06 [overlap_grad_flash_sp]: 2.306e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.68997e-06 [handle_group_info]: 9.00007e-07 [symbol_engine_optimizer]: 9.141e-05, [1] [Cycle 1]: 8.731e-05, [6] [build]: 9.91998e-06 [elim_shapecalc]: 1.163e-05 [elim_not_effective]: 1.623e-05 [opt_reshape]: 
8.61997e-06 [fold_const_symbol]: 1.279e-05 [renormalize]: 2.09984e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 2.262e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00046167 [validate]: 4.044e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00762861 [execute]: 7.72998e-06 Sums bootstrap : 0.000556s : 1.75% type_inference : 0.011249s : 35.38% event_method : 0.000042s : 0.13% auto_monad : 0.000119s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000128s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000105s : 0.33% optimize.opt_a.a_1 : 0.003050s : 9.59% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000040s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000014s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000464s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000033s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000024s : 0.08% optimize.opt_a.parallel : 0.000030s : 0.09% 
optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000019s : 0.06% optimize.opt_a.allreduce_fusion : 0.000017s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.11% optimize.opt_a.virtual_dataset : 0.000031s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.09% optimize.opt_a.virtual_output : 0.000030s : 0.09% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000058s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000052s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.06% optimize.opt_a.meta_fg_expand : 0.001456s : 4.58% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000086s : 0.27% optimize.opt_a.a_after_grad : 0.000108s : 0.34% optimize.opt_a.renormalize : 0.002834s : 8.91% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000075s : 0.23% optimize.opt_a.cse : 0.000204s : 0.64% optimize.opt_a.a_3 : 0.000460s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000043s : 0.14% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000483s : 1.52% optimize.opt_b.b_1 : 0.000172s : 0.54% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 
0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000024s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000423s : 1.33% optimize.opt_after_cconv.c_1 : 0.000043s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000023s : 0.07% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000026s : 0.08% optimize.tuple_transform.d_1 : 0.000059s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000017s : 0.05% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000023s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000462s : 1.45% validate : 0.000040s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.007629s : 23.99% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000727 209 6.00% : 0.000044s : 11: substitution.arithmetic_simplify 0.37% : 0.000003s : 4: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.64% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 4: substitution.fold_const_symbol 0.98% : 0.000007s : 7: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 55.67% : 0.000405s : 16: substitution.inline 2.27% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 18: substitution.j_node_and_user_rematch 2.13% : 0.000015s : 3: substitution.less_batch_normalization 1.84% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000013s : 18: substitution.remove_not_recompute_node 3.50% : 0.000025s : 10: substitution.replace_applicator 1.44% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.87% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.83% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.51% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011173 2 88.22% : 0.009857s : 1: type_inference.infer 11.78% : 0.001316s : 1: type_inference.specialize ------[replace.] 0.000203 30 58.99% : 0.000120s : 16: replace.inline 41.01% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000428 30 92.60% : 0.000397s : 16: match.inline 7.40% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5429 1.08% : 0.000008s : 65: predicate.accumulaten_eliminater 0.27% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.11% : 0.000008s : 65: predicate.addn_zero_filter 1.06% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 95: predicate.arithmetic_simplify 1.07% : 0.000008s : 65: predicate.cast_eliminate 1.12% : 0.000008s : 65: predicate.check_bprop_eliminate 0.52% : 0.000004s : 30: predicate.compare_switch_simplify 0.07% : 0.000001s : 7: predicate.const_output_eliminate 0.52% : 0.000004s : 30: predicate.depend_value_elim 1.18% : 0.000009s : 65: predicate.dict_get_item_const_eliminator 1.16% : 0.000008s : 65: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 7: predicate.elim_not_effective 0.15% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.15% : 0.000008s : 72: predicate.environ_get_add_eliminate 1.16% : 0.000008s : 72: predicate.environ_get_depend_swap 1.71% : 0.000013s : 102: predicate.environ_get_eliminate 1.15% : 0.000008s : 72: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 95: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 95: predicate.float_depend_g_call 0.50% : 0.000004s : 30: predicate.float_environ_get_switch 0.64% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.53% : 0.000004s : 30: predicate.get_grad_eliminate 0.08% : 0.000001s : 7: predicate.graph_param_transform 0.52% : 0.000004s : 30: predicate.incorporate_call 0.48% : 0.000003s : 30: predicate.incorporate_call_switch 5.31% : 0.000039s : 234: predicate.inline 1.21% : 0.000009s : 53: predicate.inline_without_move 0.28% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.60% : 0.000004s : 30: predicate.less_batch_normalization 1.60% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.59% : 0.000019s : 158: predicate.load_eliminater 0.31% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.16% : 0.000016s : 126: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 79: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 30: predicate.merge_addn 1.10% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.15% : 0.000008s : 65: predicate.minmaximum_grad 0.30% : 0.000002s : 7: predicate.mutable_eliminate 0.14% : 0.000001s : 7: predicate.opt_reshape 0.16% : 0.000001s : 7: predicate.parallel_virtual_node 2.01% : 0.000015s : 95: predicate.partial_defer_inline 1.67% : 0.000012s : 86: predicate.partial_eliminate 1.06% : 0.000008s : 65: predicate.print_const_string_wrapper 0.51% : 0.000004s : 30: predicate.reduce_all_const_elim 1.30% : 0.000010s : 65: predicate.reduce_eliminate 2.58% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 30: predicate.remove_not_recompute_node 5.20% : 0.000038s : 144: predicate.replace_applicator 0.58% : 0.000004s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.09% : 0.000008s : 65: predicate.reshape_eliminate 1.12% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 7: predicate.row_tensor_eliminate 1.23% : 0.000009s : 65: predicate.same_eliminate 0.34% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 30: predicate.shard_identity_eliminate 0.25% : 0.000002s : 14: predicate.special_op_eliminate 0.59% : 0.000004s : 30: predicate.specialize_transform 1.18% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.13% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 95: predicate.switch_defer_inline 2.84% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.81% : 0.000035s : 258: 
predicate.switch_simplify 1.09% : 0.000008s : 65: predicate.tile_eliminate 1.06% : 0.000008s : 65: predicate.transpose_eliminate 1.40% : 0.000010s : 79: predicate.tuple_list_convert_item_index_to_positive 1.43% : 0.000010s : 79: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 79: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000020s : 123: predicate.tuple_list_get_item_eliminator 1.35% : 0.000010s : 79: predicate.tuple_list_get_set_item_eliminator 1.88% : 0.000014s : 109: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.55% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.17% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 7: predicate.value_based_eliminate 0.54% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 30: predicate.virtual_output_eliminate 0.13% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001484 32 57.99% : 0.000861s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.01% : 0.000623s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059605 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.18% : 0.003087s : 1: add_attr 5.16% : 0.003078s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000125s : 1: auto_monad 0.04% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 1.00% : 0.000595s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.83% : 0.000493s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000016s : 1: opt.transform.mutable_eliminate 7.78% : 0.004640s : 117: opt.transform.opt_a 0.07% : 0.000041s : 1: opt.transform.opt_after_cconv 0.05% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000149s : 28: opt.transform.opt_b 0.11% : 0.000066s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000046s : 4: opt.transform.symbol_engine_opt 17.53% : 0.010450s : 1: opt_a 0.21% : 0.000124s : 1: opt_after_cconv 0.79% : 0.000472s : 1: opt_after_jit_grad 0.44% : 0.000265s : 1: opt_b 21.21% : 0.012641s : 1: optimize 0.04% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000031s : 1: remove_dup_value 2.46% : 0.001469s : 2: renormalize.infer 2.27% : 0.001352s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000047s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000094s : 1: symbol_engine_optimizer 12.82% : 0.007639s : 1: task_emit 0.16% : 0.000096s : 1: 
tuple_transform 18.90% : 0.011265s : 1: type_inference 0.12% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x2-kbk],max_mem:14.0M TotalTime = 0.126061, [24] [bootstrap]: 0.00055853 [type_inference]: 0.00638493 [event_method]: 1.417e-05 [auto_monad]: 6.129e-05 [graph_reusing]: 5.67001e-06 [inline]: 1.87999e-06 [add_attr]: 0.00370228, [1] [add_attr_with_inline]: 0.00369017, [1] [Cycle 1]: 5.183e-05, [2] [tag_attr]: 1.776e-05 [meta_addattr_fg_expand]: 4.21001e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 3.155e-05 [insert-virtual-dataset]: 2.71999e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00425642, [53] [py_interpret_to_execute]: 2.101e-05 [rewriter_before_opt_a]: 6.275e-05 [opt_a]: 0.00235067, [2] [Cycle 1]: 0.00174686, [45] [expand_dump_flag]: 3.41001e-06 [switch_simplify]: 3.316e-05 [loop_unroll]: 2.085e-05 [a_1]: 0.00052195 [with_stream_mark]: 1.435e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 4.49998e-06 [updatestate_assign_eliminate]: 3.55003e-06 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.662e-05 [accelerated_algorithm]: 6.70002e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 8.37e-06 [auto_parallel]: 5.91e-06 [parallel]: 2.738e-05 [flash_sp]: 8.3e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.17002e-06 [matmul_add_comm_reduction]: 9.56e-06 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 7.87998e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.94999e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.047e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.163e-05 
[merge_recompute_call_nodes]: 1.75001e-06 [before_grad]: 9.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.70998e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.33002e-06 [after_resolve]: 1.043e-05 [a_after_grad]: 9.82001e-06 [renormalize]: 0.00055239 [add_forward_monad_depend]: 5.15999e-06 [auto_monad_grad]: 2.36998e-06 [auto_monad_eliminator]: 1.469e-05 [cse]: 3.032e-05 [a_3]: 4.2e-05 [Cycle 2]: 0.00059442, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.66999e-06 [loop_unroll]: 5.77999e-06 [a_1]: 0.00012789 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.36e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.78e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 5.10999e-06 [auto_parallel]: 5.46998e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 3.05998e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.06997e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.01e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 8.21002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.296e-05 [a_3]: 3.247e-05 [py_interpret_to_execute_after_opt_a]: 7.82998e-06 
[slice_cell_reuse_recomputed_activation]: 2.25002e-06 [rewriter_after_opt_a]: 3.38e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00046907 [opt_b]: 0.0001936, [1] [Cycle 1]: 0.00018682, [7] [b_1]: 0.0001093 [b_2]: 1.369e-05 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 2.79999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 2.19996e-07 [cse]: 1.749e-05 [optimize_parallel_all_gather_comm]: 1.669e-05 [overlap_param_gather]: 1.96003e-06 [cconv]: 2.47e-05 [loop_unroll]: 0.0004176 [opt_after_cconv]: 9.64e-05, [1] [Cycle 1]: 9.045e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.685e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.336e-05 [tuple_transform]: 7.069e-05, [1] [Cycle 1]: 6.619e-05, [4] [d_1]: 4.019e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.54999e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 5.212e-05 [cse_after_recomputation]: 2.114e-05, [1] [Cycle 1]: 1.679e-05, [1] [cse]: 1.143e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.64001e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.81e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.34e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 
[control_data_broadcast_order]: 1.243e-05 [grouped_pairwise_exchange_alltoall]: 2.05002e-06 [offloading_packed_experts]: 3.42997e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.74999e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.854e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.989e-05, [1] [Cycle 1]: 6.577e-05, [6] [build]: 2.74999e-06 [elim_shapecalc]: 8.50999e-06 [elim_not_effective]: 1.208e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 9.40001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.47999e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.718e-05 [get_jit_bprop_graph]: 1.39e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00050146 [validate]: 3.394e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.110227 [execute]: 9.76e-06 Sums bootstrap : 0.000559s : 0.46% type_inference : 0.006385s : 5.26% event_method : 0.000014s : 0.01% auto_monad : 0.000061s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000032s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000063s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000650s : 0.54% optimize.opt_a.with_stream_mark : 
0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000032s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000552s : 0.46% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000469s : 0.39% optimize.opt_b.b_1 : 0.000109s : 0.09% optimize.opt_b.b_2 : 0.000014s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.02% optimize.loop_unroll : 0.000418s : 0.34% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000501s : 0.41% validate : 0.000034s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.110227s : 90.83% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000231 30 10.86% : 0.000025s : 5: substitution.arithmetic_simplify 0.79% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000002s : 2: substitution.fold_const_symbol 2.55% : 0.000006s : 4: substitution.graph_param_transform 74.66% : 0.000172s : 3: substitution.inline 1.39% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.06% : 0.000005s : 4: substitution.remove_not_recompute_node 1.81% : 0.000004s : 4: substitution.replace_old_param 5.15% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006334 2 90.62% : 0.005740s : 1: type_inference.infer 9.38% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000041 5 71.14% : 0.000029s : 3: replace.inline 28.86% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000181 5 94.02% : 0.000170s : 3: match.inline 5.98% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 1.03% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.12% : 0.000010s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.29% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.86% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000368 8 45.87% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.13% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.135748 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.73% : 0.003707s : 1: add_attr 2.72% : 0.003694s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000066s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.44% : 0.000599s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000479s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.75% : 0.001022s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000098s : 28: opt.transform.opt_b 0.03% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.73% : 0.002354s : 1: opt_a 0.07% : 0.000100s : 1: opt_after_cconv 0.38% : 0.000511s : 1: opt_after_jit_grad 0.15% : 0.000197s : 1: opt_b 3.14% : 0.004261s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000036s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.23% : 0.000314s : 1: renormalize.infer 0.17% : 0.000230s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.05% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000072s : 1: symbol_engine_optimizer 81.22% : 0.110252s : 1: task_emit 0.05% : 0.000074s : 1: 
tuple_transform 4.72% : 0.006405s : 1: type_inference 0.05% : 0.000063s : 1: validate TotalTime = 0.10078, [24] [bootstrap]: 0.00044752 [type_inference]: 0.00452139 [event_method]: 1.117e-05 [auto_monad]: 5.491e-05 [graph_reusing]: 5.46998e-06 [inline]: 1.96e-06 [add_attr]: 0.00308394, [1] [add_attr_with_inline]: 0.00307586, [1] [Cycle 1]: 4.555e-05, [2] [tag_attr]: 1.293e-05 [meta_addattr_fg_expand]: 3.24001e-06 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 2.284e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00380199, [53] [py_interpret_to_execute]: 1.616e-05 [rewriter_before_opt_a]: 4.14e-05 [opt_a]: 0.00195327, [2] [Cycle 1]: 0.00135714, [45] [expand_dump_flag]: 3.14999e-06 [switch_simplify]: 2.638e-05 [loop_unroll]: 1.363e-05 [a_1]: 0.00029585 [with_stream_mark]: 1.568e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.36999e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.683e-05 [accelerated_algorithm]: 6.09001e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 1.56002e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 8.46997e-06 [auto_parallel]: 6.00002e-06 [parallel]: 1.981e-05 [flash_sp]: 7.25e-06 [merge_comm]: 3.67998e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 9.72001e-06 [allreduce_slice_to_reducescatter]: 9.49978e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 6.81999e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 3.69002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.129e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 9.24e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33998e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.11e-06 
[after_resolve]: 1.073e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00042993 [add_forward_monad_depend]: 4.70999e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.354e-05 [cse]: 2.863e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00058654, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00011937 [with_stream_mark]: 1.09e-05 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.74e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.29998e-06 [parallel]: 4.34002e-06 [flash_sp]: 3.81999e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.51002e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.58002e-06 [offload_activation]: 6.21e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 9.27999e-06 [a_after_grad]: 7.97998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 5.96998e-06 [cse]: 1.196e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 7.03e-06 [slice_cell_reuse_recomputed_activation]: 2.01998e-06 [rewriter_after_opt_a]: 3.168e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00047964 [opt_b]: 0.00018361, [1] [Cycle 1]: 0.00017719, [7] [b_1]: 0.00010791 
[b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.04e-06 [renormalize]: 6.69999e-07 [cse]: 1.72e-05 [optimize_parallel_all_gather_comm]: 1.692e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.371e-05 [loop_unroll]: 0.00041005 [opt_after_cconv]: 9.432e-05, [1] [Cycle 1]: 8.858e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.41998e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.568e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.288e-05 [tuple_transform]: 6.947e-05, [1] [Cycle 1]: 6.471e-05, [4] [d_1]: 3.877e-05 [none_parameter_eliminate]: 1.90001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.96e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.423e-05 [cse_after_recomputation]: 1.946e-05, [1] [Cycle 1]: 1.514e-05, [1] [cse]: 1.023e-05 [environ_conv]: 5.08002e-06 [swap_dp_allreduce_reducescatter]: 5.32999e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.55001e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.66e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.30999e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.215e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.27998e-06 [overlap_grad_flash_sp]: 1.834e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.29999e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.869e-05, [1] [Cycle 1]: 6.442e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 8.31002e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.614e-05 [get_jit_bprop_graph]: 1.23002e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00044634 [validate]: 3.508e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0880825 [execute]: 1.102e-05 Sums bootstrap : 0.000448s : 0.46% type_inference : 0.004521s : 4.67% event_method : 0.000011s : 0.01% auto_monad : 0.000055s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000415s : 0.43% optimize.opt_a.with_stream_mark : 0.000027s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000430s : 0.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000480s : 0.50% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000410s : 0.42% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.05% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000446s : 0.46% validate : 0.000035s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.088083s : 91.07% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.34% : 0.000023s : 4: substitution.arithmetic_simplify 1.67% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.35% : 0.000005s : 4: substitution.graph_param_transform 65.76% : 0.000082s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004475 2 91.52% : 0.004096s : 1: type_inference.infer 8.48% : 0.000380s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000139 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.15% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.71% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.84% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.43% : 0.000001s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.51% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.47% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 1.17% : 0.000002s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000283 6 41.19% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.81% : 0.000167s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109017 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.83% : 0.003088s : 1: add_attr 2.82% : 0.003079s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000484s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000419s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000489s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.71% : 0.000769s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.79% : 0.001956s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.42% : 0.000456s : 1: opt_after_jit_grad 0.17% : 0.000187s : 1: opt_b 3.49% : 0.003806s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.22% : 0.000244s : 1: renormalize.infer 0.16% : 0.000180s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000046s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000071s : 1: symbol_engine_optimizer 80.82% : 0.088105s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 4.16% : 0.004539s : 1: type_inference 0.06% : 0.000060s : 1: validate TotalTime = 0.101641, [24] [bootstrap]: 0.00044607 [type_inference]: 0.00583158 [event_method]: 1.494e-05 [auto_monad]: 5.87e-05 [graph_reusing]: 6.02001e-06 [inline]: 2.12001e-06 [add_attr]: 0.00304973, [1] [add_attr_with_inline]: 0.00304095, [1] [Cycle 1]: 4.795e-05, [2] [tag_attr]: 1.576e-05 [meta_addattr_fg_expand]: 4.60001e-06 [parallel-infer-symbol]: 3.60998e-06 [pre_auto_parallel]: 2.624e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.31e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0041572, [53] [py_interpret_to_execute]: 2.234e-05 [rewriter_before_opt_a]: 6.158e-05 [opt_a]: 0.00226097, [2] [Cycle 1]: 0.00165031, [45] [expand_dump_flag]: 2.68003e-06 [switch_simplify]: 3.24e-05 [loop_unroll]: 2.084e-05 [a_1]: 0.00045632 [with_stream_mark]: 1.51e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.37002e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 2.21e-06 [a_2]: 8.967e-05 [accelerated_algorithm]: 6.83998e-06 [shard]: 2.56e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.65999e-06 [auto_parallel]: 6.44001e-06 [parallel]: 1.984e-05 [flash_sp]: 8.04002e-06 [merge_comm]: 3.77002e-06 [allreduce_fusion]: 3.67998e-06 [matmul_add_comm_reduction]: 9.81e-06 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.91003e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.64e-06 [merge_forward]: 4.08999e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 1.025e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 9.00001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 
2.64999e-06 [after_resolve]: 1.008e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00052145 [add_forward_monad_depend]: 4.71002e-06 [auto_monad_grad]: 2.34999e-06 [auto_monad_eliminator]: 1.435e-05 [cse]: 3.201e-05 [a_3]: 4.197e-05 [Cycle 2]: 0.00060043, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.71999e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.0001253 [with_stream_mark]: 9.84001e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.35002e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 6.877e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.65999e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 3.30998e-06 [matmul_add_comm_reduction]: 9.10001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.36e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.21998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 9.00007e-07 [after_resolve]: 8.78001e-06 [a_after_grad]: 8.22e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.319e-05 [a_3]: 3.21e-05 [py_interpret_to_execute_after_opt_a]: 7.71999e-06 [slice_cell_reuse_recomputed_activation]: 2.47001e-06 [rewriter_after_opt_a]: 3.26e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.78002e-06 [mutable_eliminate]: 0.00047608 [opt_b]: 0.00019038, [1] [Cycle 1]: 
0.00018432, [7] [b_1]: 0.00011624 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.99975e-07 [cse]: 1.689e-05 [optimize_parallel_all_gather_comm]: 1.624e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.497e-05 [loop_unroll]: 0.00042305 [opt_after_cconv]: 9.515e-05, [1] [Cycle 1]: 8.95e-05, [7] [c_1]: 2.796e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.652e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.3e-05 [tuple_transform]: 6.988e-05, [1] [Cycle 1]: 6.57e-05, [4] [d_1]: 3.961e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.30002e-06 [partial_unused_args_eliminate]: 2.22999e-06 [add_recomputation]: 4.716e-05 [cse_after_recomputation]: 1.972e-05, [1] [Cycle 1]: 1.536e-05, [1] [cse]: 1.04e-05 [environ_conv]: 5.14e-06 [swap_dp_allreduce_reducescatter]: 5.78002e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.47999e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.32999e-06 [full_micro_interleaved_order_control]: 2.38998e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 4.06001e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 4.02998e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.84001e-06 [split_layernorm_comm]: 1.86003e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.709e-05, [1] [Cycle 1]: 6.322e-05, [6] [build]: 2.57001e-06 [elim_shapecalc]: 8.38001e-06 [elim_not_effective]: 1.123e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.59e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.611e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.62002e-06 [opt_after_jit_grad]: 0.00045457 [validate]: 3.403e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0872779 [execute]: 9.99001e-06 Sums bootstrap : 0.000446s : 0.46% type_inference : 0.005832s : 5.98% event_method : 0.000015s : 0.02% auto_monad : 0.000059s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000026s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000062s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000582s : 0.60% optimize.opt_a.with_stream_mark : 0.000025s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% 
optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000158s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000522s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000045s : 0.05% 
optimize.opt_a.a_3 : 0.000074s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000476s : 0.49% optimize.opt_b.b_1 : 0.000116s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000423s : 0.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.05% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% 
optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000455s : 0.47% validate : 0.000034s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.087278s : 89.43% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000172 30 14.65% : 0.000025s : 5: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.63% : 0.000006s : 4: substitution.graph_param_transform 66.87% : 0.000115s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.45% : 0.000004s : 4: substitution.remove_not_recompute_node 2.24% : 0.000004s : 4: substitution.replace_old_param 6.56% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005786 2 89.86% : 0.005199s : 1: type_inference.infer 10.14% : 0.000587s : 1: type_inference.specialize ------[replace.] 0.000041 5 72.47% : 0.000030s : 3: replace.inline 27.53% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 5 91.71% : 0.000113s : 3: match.inline 8.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.61% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_depend_swap 1.94% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.78% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 1.11% : 0.000002s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 44.66% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.34% : 0.000213s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.110478 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.76% : 0.003055s : 1: add_attr 2.76% : 0.003045s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000064s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000480s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.39% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.44% : 0.000486s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.86% : 0.000950s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000096s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.05% : 0.002264s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.42% : 0.000464s : 1: opt_after_jit_grad 0.18% : 0.000194s : 1: opt_b 3.77% : 0.004161s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.25% : 0.000275s : 1: renormalize.infer 0.22% : 0.000239s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.06% : 0.000066s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 79.02% : 0.087304s : 1: task_emit 0.07% : 0.000073s : 1: 
tuple_transform 5.31% : 0.005861s : 1: type_inference 0.05% : 0.000060s : 1: validate TotalTime = 0.142944, [24] [bootstrap]: 0.00046037 [type_inference]: 0.0115957 [event_method]: 5.193e-05 [auto_monad]: 0.00012535 [graph_reusing]: 8.47e-06 [inline]: 1.93002e-06 [add_attr]: 0.00307393, [1] [add_attr_with_inline]: 0.00306558, [1] [Cycle 1]: 7.482e-05, [2] [tag_attr]: 3.629e-05 [meta_addattr_fg_expand]: 9.59e-06 [parallel-infer-symbol]: 3.29001e-06 [pre_auto_parallel]: 5.126e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.90001e-06 [optimize]: 0.0132454, [53] [py_interpret_to_execute]: 4.026e-05 [rewriter_before_opt_a]: 0.00014651 [opt_a]: 0.0109862, [3] [Cycle 1]: 0.00726042, [45] [expand_dump_flag]: 4.21001e-06 [switch_simplify]: 7.41e-05 [loop_unroll]: 6.159e-05 [a_1]: 0.00149229 [with_stream_mark]: 2.489e-05 [recompute_prepare]: 2.306e-05 [updatestate_depend_eliminate]: 9.62001e-06 [updatestate_assign_eliminate]: 7.95998e-06 [updatestate_loads_eliminate]: 7.35998e-06 [parameter_eliminate]: 2.64001e-06 [a_2]: 0.00024813 [accelerated_algorithm]: 3.23e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.58999e-06 [shard_inline]: 1.61e-05 [merge_send_recv]: 1.761e-05 [auto_parallel]: 1.095e-05 [parallel]: 2.076e-05 [flash_sp]: 1.167e-05 [merge_comm]: 9.64e-06 [allreduce_fusion]: 8.69e-06 [matmul_add_comm_reduction]: 2.736e-05 [allreduce_slice_to_reducescatter]: 8.00006e-07 [virtual_shard_identity]: 1.837e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.55e-05 [virtual_output]: 1.531e-05 [merge_forward]: 9.66e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 1.942e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.958e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 2.723e-05 [set_forward_comm_id_for_comm_node_pass]: 9.77999e-06 [meta_fg_expand]: 0.00145167 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 2.54001e-06 [after_resolve]: 
6.012e-05 [a_after_grad]: 8.109e-05 [renormalize]: 0.00252666 [add_forward_monad_depend]: 1.019e-05 [auto_monad_grad]: 6.26998e-06 [auto_monad_eliminator]: 5.745e-05 [cse]: 0.00017334 [a_3]: 0.00032952 [Cycle 2]: 0.00291077, [45] [expand_dump_flag]: 1.96998e-06 [switch_simplify]: 4.694e-05 [loop_unroll]: 4.357e-05 [a_1]: 0.00150017 [with_stream_mark]: 1.244e-05 [recompute_prepare]: 9.59e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 3.7e-06 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 0.00010916 [accelerated_algorithm]: 1.06e-05 [shard]: 9.90025e-07 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 7.89002e-06 [merge_send_recv]: 6.22001e-06 [auto_parallel]: 7.13e-06 [parallel]: 5.88002e-06 [flash_sp]: 3.51999e-06 [merge_comm]: 4.53999e-06 [allreduce_fusion]: 4.18001e-06 [matmul_add_comm_reduction]: 7.51001e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 9.14e-06 [virtual_dataset]: 7.66999e-06 [get_grad_eliminate_]: 7.55e-06 [virtual_output]: 7.82e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 8.71997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.441e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 2.532e-05 [set_forward_comm_id_for_comm_node_pass]: 4.73001e-06 [meta_fg_expand]: 8.128e-05 [flash_sp_send_recv_attached]: 1.25001e-06 [receive_attached]: 1.10001e-06 [after_resolve]: 1.552e-05 [a_after_grad]: 1.299e-05 [renormalize]: 0.00057021 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.54e-06 [auto_monad_eliminator]: 1.343e-05 [cse]: 2.283e-05 [a_3]: 5.655e-05 [Cycle 3]: 0.0007999, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 9.19e-06 [loop_unroll]: 7.69002e-06 [a_1]: 0.00021335 [with_stream_mark]: 8.92999e-06 [recompute_prepare]: 8.01001e-06 [updatestate_depend_eliminate]: 4.21001e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 3.43e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 0.00010715 [accelerated_algorithm]: 1.049e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 8.03999e-06 [merge_send_recv]: 6.22001e-06 [auto_parallel]: 6.45002e-06 [parallel]: 5.10999e-06 [flash_sp]: 9.50007e-07 [merge_comm]: 4.21001e-06 [allreduce_fusion]: 4.17998e-06 [matmul_add_comm_reduction]: 6.69999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 8.76002e-06 [virtual_dataset]: 7.50998e-06 [get_grad_eliminate_]: 7.65998e-06 [virtual_output]: 7.23999e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 7.90998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.39e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.213e-05 [set_forward_comm_id_for_comm_node_pass]: 4.67e-06 [meta_fg_expand]: 2.71999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.231e-05 [a_after_grad]: 1.268e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 7.99977e-07 [auto_monad_eliminator]: 8.36002e-06 [cse]: 1.963e-05 [a_3]: 4.936e-05 [py_interpret_to_execute_after_opt_a]: 1.041e-05 [slice_cell_reuse_recomputed_activation]: 2.27999e-06 [rewriter_after_opt_a]: 4.518e-05 [convert_after_rewriter]: 8.29998e-06 [order_py_execute_after_rewriter]: 6.34001e-06 [mutable_eliminate]: 0.00051929 [opt_b]: 0.00025635, [1] [Cycle 1]: 0.00025046, [7] [b_1]: 0.00016753 [b_2]: 9.72999e-06 [updatestate_depend_eliminate]: 6.47001e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 3.24001e-06 [renormalize]: 4.90021e-07 [cse]: 2.513e-05 [optimize_parallel_all_gather_comm]: 1.98e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.208e-05 [loop_unroll]: 0.00042862 [opt_after_cconv]: 0.00012374, [1] [Cycle 1]: 0.00011789, [7] [c_1]: 4.337e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 6.94001e-06 [updatestate_assign_eliminate]: 
3.56001e-06 [updatestate_loads_eliminate]: 3.22002e-06 [cse]: 2.504e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.704e-05 [tuple_transform]: 9.348e-05, [1] [Cycle 1]: 8.878e-05, [4] [d_1]: 5.99e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 8.70001e-06 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 5.594e-05 [cse_after_recomputation]: 2.785e-05, [1] [Cycle 1]: 2.287e-05, [1] [cse]: 1.773e-05 [environ_conv]: 8.31002e-06 [swap_dp_allreduce_reducescatter]: 7.16999e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.43001e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.81e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 [control_data_broadcast_order]: 1.531e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 5.39998e-06 [overlap_recompute_and_grad_model_parallel]: 5.44e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 4.71002e-06 [overlap_grad_flash_sp]: 2.334e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.152e-05, [1] [Cycle 1]: 8.737e-05, [6] [build]: 1.069e-05 [elim_shapecalc]: 1.174e-05 [elim_not_effective]: 1.58e-05 [opt_reshape]: 8.50001e-06 [fold_const_symbol]: 1.28e-05 
[renormalize]: 1.59984e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 2.241e-05 [get_jit_bprop_graph]: 1.39e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00048708 [validate]: 4.388e-05 [backend_pass]: 8.90024e-07 [task_emit]: 0.113518 [execute]: 1.084e-05 Sums bootstrap : 0.000460s : 0.33% type_inference : 0.011596s : 8.37% event_method : 0.000052s : 0.04% auto_monad : 0.000125s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.11% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.09% optimize.opt_a.loop_unroll : 0.000113s : 0.08% optimize.opt_a.a_1 : 0.003206s : 2.31% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000041s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000464s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000032s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000032s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000018s : 0.01% optimize.opt_a.allreduce_fusion : 0.000017s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.03% optimize.opt_a.virtual_dataset : 0.000031s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000031s : 0.02% optimize.opt_a.virtual_output : 0.000030s : 0.02% optimize.opt_a.merge_forward : 0.000017s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000058s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000065s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.01% optimize.opt_a.meta_fg_expand : 0.001536s : 1.11% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000107s : 0.08% optimize.opt_a.renormalize : 0.003097s : 2.23% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.06% optimize.opt_a.cse : 0.000216s : 0.16% optimize.opt_a.a_3 : 0.000435s : 0.31% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000519s : 0.37% optimize.opt_b.b_1 : 0.000168s : 0.12% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000429s : 0.31% optimize.opt_after_cconv.c_1 : 0.000043s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000027s : 0.02% optimize.tuple_transform.d_1 : 0.000060s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000018s : 0.01% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000487s : 0.35% validate : 0.000044s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.113518s : 81.90% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 
0.000774 213 6.12% : 0.000047s : 12: substitution.arithmetic_simplify 0.34% : 0.000003s : 4: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 4: substitution.fold_const_symbol 0.95% : 0.000007s : 7: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 57.27% : 0.000443s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 18: substitution.j_node_and_user_rematch 2.00% : 0.000016s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000006s : 5: substitution.partial_eliminate 1.61% : 0.000012s : 18: substitution.remove_not_recompute_node 3.17% : 0.000025s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.42% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.62% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.77% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.48% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011517 2 86.47% : 0.009958s : 1: type_inference.infer 13.53% : 0.001559s : 1: type_inference.specialize ------[replace.] 0.000227 33 57.80% : 0.000131s : 17: replace.inline 42.20% : 0.000096s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000469 33 92.63% : 0.000434s : 17: match.inline 7.37% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000722 5530 1.08% : 0.000008s : 66: predicate.accumulaten_eliminater 0.27% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 30: predicate.addn_check_dump 1.05% : 0.000008s : 66: predicate.addn_zero_filter 1.07% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 96: predicate.arithmetic_simplify 1.10% : 0.000008s : 66: predicate.cast_eliminate 1.11% : 0.000008s : 65: predicate.check_bprop_eliminate 0.51% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.53% : 0.000004s : 30: predicate.depend_value_elim 1.21% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 66: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 7: predicate.elim_not_effective 0.16% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000009s : 73: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 73: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 73: predicate.environ_get_depend_swap 1.76% : 0.000013s : 103: predicate.environ_get_eliminate 1.19% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.76% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.38% : 0.000017s : 99: predicate.float_depend_g_call 0.52% : 0.000004s : 30: predicate.float_environ_get_switch 0.65% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.54% : 0.000004s : 30: predicate.get_grad_eliminate 0.08% : 0.000001s : 7: predicate.graph_param_transform 0.52% : 0.000004s : 30: predicate.incorporate_call 0.48% : 0.000003s : 30: predicate.incorporate_call_switch 5.58% : 0.000040s : 239: predicate.inline 1.28% : 0.000009s : 53: predicate.inline_without_move 0.30% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 30: predicate.less_batch_normalization 1.65% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 2.67% : 0.000019s : 162: predicate.load_eliminater 0.32% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.34% : 0.000017s : 134: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 80: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 30: predicate.merge_addn 1.14% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 66: predicate.minmaximum_grad 0.35% : 0.000003s : 7: predicate.mutable_eliminate 0.15% : 0.000001s : 7: predicate.opt_reshape 0.14% : 0.000001s : 7: predicate.parallel_virtual_node 2.13% : 0.000015s : 99: predicate.partial_defer_inline 1.77% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 66: predicate.print_const_string_wrapper 0.54% : 0.000004s : 30: predicate.reduce_all_const_elim 1.29% : 0.000009s : 66: predicate.reduce_eliminate 2.68% : 0.000019s : 162: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 30: predicate.remove_not_recompute_node 1.93% : 0.000014s : 147: predicate.replace_applicator 0.61% : 0.000004s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.12% : 0.000008s : 66: predicate.reshape_eliminate 1.15% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 7: predicate.row_tensor_eliminate 1.25% : 0.000009s : 65: predicate.same_eliminate 0.36% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 30: predicate.shard_identity_eliminate 0.26% : 0.000002s : 14: predicate.special_op_eliminate 0.61% : 0.000004s : 30: predicate.specialize_transform 1.22% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 99: predicate.switch_defer_inline 2.95% : 0.000021s : 164: predicate.switch_layer_defer_inline 5.04% : 0.000036s : 270: 
predicate.switch_simplify 1.09% : 0.000008s : 66: predicate.tile_eliminate 1.09% : 0.000008s : 66: predicate.transpose_eliminate 1.44% : 0.000010s : 80: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 80: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000009s : 80: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 126: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 80: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000014s : 110: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 162: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 192: predicate.updatestate_useless_node_eliminater 0.12% : 0.000001s : 7: predicate.value_based_eliminate 0.56% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 30: predicate.virtual_output_eliminate 0.13% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001639 34 56.61% : 0.000928s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.39% : 0.000711s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.167429 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.84% : 0.003078s : 1: add_attr 1.83% : 0.003069s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000133s : 1: auto_monad 0.02% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.29% : 0.000493s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000031s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.04% : 0.000059s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.26% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.32% : 0.000528s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.87% : 0.004802s : 117: opt.transform.opt_a 0.03% : 0.000042s : 1: opt.transform.opt_after_cconv 0.02% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000149s : 28: opt.transform.opt_b 0.04% : 0.000067s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000046s : 4: opt.transform.symbol_engine_opt 6.56% : 0.010989s : 1: opt_a 0.08% : 0.000127s : 1: opt_after_cconv 0.30% : 0.000497s : 1: opt_after_jit_grad 0.16% : 0.000260s : 1: opt_b 7.91% : 0.013250s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000056s : 1: pre_auto_parallel 0.03% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000031s : 1: remove_dup_value 0.97% : 0.001618s : 2: renormalize.infer 0.88% : 0.001466s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000049s : 1: rewriter_after_opt_a 0.09% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000094s : 1: symbol_engine_optimizer 67.82% : 0.113544s : 1: task_emit 0.06% : 0.000096s : 1: 
tuple_transform 6.94% : 0.011613s : 1: type_inference 0.04% : 0.000070s : 1: validate TotalTime = 0.101254, [24] [bootstrap]: 0.00042004 [type_inference]: 0.00444693 [event_method]: 1.074e-05 [auto_monad]: 5.5e-05 [graph_reusing]: 5.77999e-06 [inline]: 1.80001e-06 [add_attr]: 0.00304765, [1] [add_attr_with_inline]: 0.0030385, [1] [Cycle 1]: 4.393e-05, [2] [tag_attr]: 1.229e-05 [meta_addattr_fg_expand]: 3.26999e-06 [parallel-infer-symbol]: 4.20999e-06 [pre_auto_parallel]: 2.398e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00406307, [53] [py_interpret_to_execute]: 1.644e-05 [rewriter_before_opt_a]: 4.291e-05 [opt_a]: 0.00212862, [2] [Cycle 1]: 0.00148846, [45] [expand_dump_flag]: 3.26001e-06 [switch_simplify]: 2.587e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00030667 [with_stream_mark]: 1.519e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 9.852e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.66999e-06 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 9.07001e-06 [auto_parallel]: 6.85002e-06 [parallel]: 1.891e-05 [flash_sp]: 8.22e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.51998e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.40999e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 9.80013e-07 [offload_activation]: 9.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.137e-05 [merge_recompute_call_nodes]: 1.73002e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.18998e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.48998e-06 
[after_resolve]: 1.032e-05 [a_after_grad]: 9.17001e-06 [renormalize]: 0.0005072 [add_forward_monad_depend]: 5.50001e-06 [auto_monad_grad]: 2.06e-06 [auto_monad_eliminator]: 1.574e-05 [cse]: 3.109e-05 [a_3]: 4.409e-05 [Cycle 2]: 0.0006291, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 7.86001e-06 [loop_unroll]: 6.02999e-06 [a_1]: 0.00013536 [with_stream_mark]: 1.243e-05 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.99978e-07 [a_2]: 6.804e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.50001e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.56002e-06 [merge_send_recv]: 5.14998e-06 [auto_parallel]: 5.10999e-06 [parallel]: 6.94999e-06 [flash_sp]: 3.50998e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.06999e-06 [matmul_add_comm_reduction]: 6.86999e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 6.34999e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 5.12999e-06 [virtual_output]: 4.87e-06 [merge_forward]: 3.31001e-06 [cell_reuse_recompute_pass]: 1.69e-06 [offload_activation]: 7.70998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 1.14998e-06 [before_grad]: 8.34002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.76998e-06 [flash_sp_send_recv_attached]: 1.25001e-06 [receive_attached]: 1.71e-06 [after_resolve]: 9.42999e-06 [a_after_grad]: 8.69003e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.417e-05 [a_3]: 3.258e-05 [py_interpret_to_execute_after_opt_a]: 1.098e-05 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 3.382e-05 [convert_after_rewriter]: 7.29001e-06 [order_py_execute_after_rewriter]: 5.55001e-06 [mutable_eliminate]: 0.00052626 [opt_b]: 0.00018813, [1] [Cycle 1]: 
0.00018108, [7] [b_1]: 0.00011016 [b_2]: 7.24001e-06 [updatestate_depend_eliminate]: 6.02001e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 4.19997e-07 [cse]: 1.758e-05 [optimize_parallel_all_gather_comm]: 1.614e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.649e-05 [loop_unroll]: 0.00041615 [opt_after_cconv]: 9.661e-05, [1] [Cycle 1]: 9.068e-05, [7] [c_1]: 2.758e-05 [parameter_eliminate]: 2.61999e-06 [updatestate_depend_eliminate]: 5.62001e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.721e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.368e-05 [tuple_transform]: 6.962e-05, [1] [Cycle 1]: 6.526e-05, [4] [d_1]: 3.957e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.101e-05 [cse_after_recomputation]: 2.024e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.092e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 4.94e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.12e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.188e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.60999e-06 [overlap_recompute_comm]: 2.76e-06 [overlap_grad_ring_attention]: 4.64998e-06 [overlap_grad_flash_sp]: 1.953e-05 [begin_end_overlap_inline]: 6.79982e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 7.344e-05, [1] [Cycle 1]: 6.896e-05, [6] [build]: 3.36001e-06 [elim_shapecalc]: 9.21998e-06 [elim_not_effective]: 1.219e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.18002e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.69998e-06 [auto_monad_reorder]: 1.623e-05 [get_jit_bprop_graph]: 2.22001e-06 [rewriter_after_jit_bprop_graph]: 3.92998e-06 [opt_after_jit_grad]: 0.00050381 [validate]: 3.842e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.0883573 [execute]: 9.97001e-06 Sums bootstrap : 0.000420s : 0.43% type_inference : 0.004447s : 4.58% event_method : 0.000011s : 0.01% auto_monad : 0.000055s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000024s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000043s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000034s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000442s : 0.45% optimize.opt_a.with_stream_mark : 0.000028s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000167s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000507s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000045s : 0.05% optimize.opt_a.a_3 : 0.000077s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000526s : 0.54% optimize.opt_b.b_1 : 0.000110s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.03% optimize.loop_unroll : 0.000416s : 0.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000504s : 0.52% validate : 0.000038s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.088357s : 90.91% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000132 26 18.73% : 0.000025s : 4: substitution.arithmetic_simplify 1.72% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.56% : 0.000006s : 4: substitution.graph_param_transform 65.21% : 0.000086s : 2: substitution.inline 2.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.19% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004401 2 91.11% : 0.004010s : 1: type_inference.infer 8.89% : 0.000391s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000085 2 100.00% : 0.000085s : 2: match.inline ------[predicate.] 
0.000144 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 1.19% : 0.000002s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.81% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.43% : 0.000001s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.99% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.91% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000009s : 44: predicate.inline 1.23% : 0.000002s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.41% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.86% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 9: predicate.minmaximum_grad 1.50% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.87% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.06% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.55% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 4: predicate.row_tensor_eliminate 1.06% : 0.000002s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.13% : 0.000002s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.45% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000281 6 40.12% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.88% : 0.000168s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.109898 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.78% : 0.003052s : 1: add_attr 2.77% : 0.003043s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000458s : 1: bootstrap 0.03% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.39% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.49% : 0.000536s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.75% : 0.000821s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000073s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.94% : 0.002132s : 1: opt_a 0.09% : 0.000100s : 1: opt_after_cconv 0.47% : 0.000514s : 1: opt_after_jit_grad 0.17% : 0.000192s : 1: opt_b 3.70% : 0.004067s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000021s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.26% : 0.000287s : 1: renormalize.infer 0.19% : 0.000212s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.04% : 0.000047s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000076s : 1: symbol_engine_optimizer 80.42% : 0.088383s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 4.06% : 0.004464s : 1: type_inference 0.06% : 0.000069s : 1: validate TotalTime = 0.142574, [24] [bootstrap]: 0.0004769 [type_inference]: 0.010997 [event_method]: 4.924e-05 [auto_monad]: 0.00012288 [graph_reusing]: 8.23999e-06 [inline]: 2.24999e-06 [add_attr]: 0.00335663, [1] [add_attr_with_inline]: 0.00334731, [1] [Cycle 1]: 7.655e-05, [2] [tag_attr]: 3.359e-05 [meta_addattr_fg_expand]: 8.75001e-06 [parallel-infer-symbol]: 3.41999e-06 [pre_auto_parallel]: 4.985e-05 [insert-virtual-dataset]: 2.94999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0143202, [53] [py_interpret_to_execute]: 3.868e-05 [rewriter_before_opt_a]: 0.00013253 [opt_a]: 0.0116719, [3] [Cycle 1]: 0.0076225, [45] [expand_dump_flag]: 4.70999e-06 [switch_simplify]: 6.819e-05 [loop_unroll]: 5.535e-05 [a_1]: 0.00138308 [with_stream_mark]: 2.49e-05 [recompute_prepare]: 2.243e-05 [updatestate_depend_eliminate]: 9.79e-06 [updatestate_assign_eliminate]: 8.25e-06 [updatestate_loads_eliminate]: 7.70998e-06 [parameter_eliminate]: 3.06001e-06 [a_2]: 0.0002468 [accelerated_algorithm]: 3.242e-05 [shard]: 2.02999e-06 [meta_shard_fg_expand]: 3.56001e-06 [shard_inline]: 1.639e-05 [merge_send_recv]: 1.818e-05 [auto_parallel]: 1.135e-05 [parallel]: 2e-05 [flash_sp]: 1.26e-05 [merge_comm]: 1.027e-05 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 2.912e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.822e-05 [virtual_dataset]: 1.582e-05 [get_grad_eliminate_]: 1.649e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.97001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.852e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.867e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 2.806e-05 [set_forward_comm_id_for_comm_node_pass]: 9.99999e-06 [meta_fg_expand]: 0.001605 [flash_sp_send_recv_attached]: 3.97998e-06 [receive_attached]: 2.54999e-06 [after_resolve]: 
6.412e-05 [a_after_grad]: 8.437e-05 [renormalize]: 0.00281625 [add_forward_monad_depend]: 9.52999e-06 [auto_monad_grad]: 6.58e-06 [auto_monad_eliminator]: 5.838e-05 [cse]: 0.00017931 [a_3]: 0.00033662 [Cycle 2]: 0.00316253, [45] [expand_dump_flag]: 2.42001e-06 [switch_simplify]: 4.634e-05 [loop_unroll]: 4.271e-05 [a_1]: 0.00155027 [with_stream_mark]: 1.543e-05 [recompute_prepare]: 1.084e-05 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.31999e-06 [parameter_eliminate]: 1.44998e-06 [a_2]: 0.00011061 [accelerated_algorithm]: 1.239e-05 [shard]: 1.67999e-06 [meta_shard_fg_expand]: 2.30002e-06 [shard_inline]: 8.25e-06 [merge_send_recv]: 8.86002e-06 [auto_parallel]: 9.77001e-06 [parallel]: 8.3e-06 [flash_sp]: 3.61999e-06 [merge_comm]: 4.82e-06 [allreduce_fusion]: 4.27e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 9.47999e-06 [virtual_dataset]: 8.12998e-06 [get_grad_eliminate_]: 7.9e-06 [virtual_output]: 7.83001e-06 [merge_forward]: 4.87998e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 1.022e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.545e-05 [merge_recompute_call_nodes]: 1.20001e-06 [before_grad]: 1.304e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07999e-06 [meta_fg_expand]: 5.173e-05 [flash_sp_send_recv_attached]: 1.44e-06 [receive_attached]: 2.08002e-06 [after_resolve]: 1.606e-05 [a_after_grad]: 1.319e-05 [renormalize]: 0.00073429 [add_forward_monad_depend]: 5.62001e-06 [auto_monad_grad]: 2.48e-06 [auto_monad_eliminator]: 1.781e-05 [cse]: 4.719e-05 [a_3]: 6.314e-05 [Cycle 3]: 0.00086873, [45] [expand_dump_flag]: 1.77999e-06 [switch_simplify]: 1.014e-05 [loop_unroll]: 8.15999e-06 [a_1]: 0.00022645 [with_stream_mark]: 1.354e-05 [recompute_prepare]: 8.82e-06 [updatestate_depend_eliminate]: 4.70001e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.65e-06 
[parameter_eliminate]: 1.74e-06 [a_2]: 0.00010872 [accelerated_algorithm]: 1.193e-05 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 2.01e-06 [shard_inline]: 8.27e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 7.66999e-06 [parallel]: 8.11002e-06 [flash_sp]: 1.40999e-06 [merge_comm]: 4.47998e-06 [allreduce_fusion]: 4.33999e-06 [matmul_add_comm_reduction]: 9.49999e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 9.24998e-06 [virtual_dataset]: 7.91001e-06 [get_grad_eliminate_]: 7.63999e-06 [virtual_output]: 7.57002e-06 [merge_forward]: 4.72e-06 [cell_reuse_recompute_pass]: 1.76e-06 [offload_activation]: 9.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.482e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 1.332e-05 [set_forward_comm_id_for_comm_node_pass]: 4.71997e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 1.22999e-06 [receive_attached]: 1.57001e-06 [after_resolve]: 1.322e-05 [a_after_grad]: 1.271e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.53002e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.211e-05 [cse]: 2.387e-05 [a_3]: 5.482e-05 [py_interpret_to_execute_after_opt_a]: 1.643e-05 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 4.984e-05 [convert_after_rewriter]: 8.71002e-06 [order_py_execute_after_rewriter]: 6.41998e-06 [mutable_eliminate]: 0.00076077 [opt_b]: 0.0002867, [1] [Cycle 1]: 0.00027803, [7] [b_1]: 0.00018178 [b_2]: 1.006e-05 [updatestate_depend_eliminate]: 8.71997e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 3.66999e-06 [renormalize]: 4.69998e-07 [cse]: 3.129e-05 [optimize_parallel_all_gather_comm]: 2e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.915e-05 [loop_unroll]: 0.00047382 [opt_after_cconv]: 0.00013512, [1] [Cycle 1]: 0.0001287, [7] [c_1]: 4.943e-05 [parameter_eliminate]: 3.21999e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 3.7e-06 
[updatestate_loads_eliminate]: 3.44001e-06 [cse]: 2.546e-05 [renormalize]: 4.7998e-07 [remove_dup_value]: 3.775e-05 [tuple_transform]: 0.00010355, [1] [Cycle 1]: 9.82e-05, [4] [d_1]: 6.75e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 9.84001e-06 [partial_unused_args_eliminate]: 2.01998e-06 [add_recomputation]: 6.601e-05 [cse_after_recomputation]: 3.069e-05, [1] [Cycle 1]: 2.537e-05, [1] [cse]: 1.94e-05 [environ_conv]: 1.032e-05 [swap_dp_allreduce_reducescatter]: 7.18998e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 5.45001e-06 [label_fine_grained_interleaved_index]: 3.02002e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.23998e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 1.13001e-06 [remove_cast_before_assign_add]: 1.42e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.30999e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.54998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88997e-06 [control_data_broadcast_order]: 1.522e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 5.07e-06 [overlap_recompute_and_grad_model_parallel]: 5.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.81e-06 [overlap_grad_ring_attention]: 4.50001e-06 [overlap_grad_flash_sp]: 2.549e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 9.283e-05, [1] [Cycle 1]: 8.85e-05, [6] [build]: 1.12e-05 [elim_shapecalc]: 1.168e-05 [elim_not_effective]: 1.566e-05 [opt_reshape]: 8.52e-06 [fold_const_symbol]: 1.361e-05 [renormalize]: 
1.59984e-07 [detach_backward]: 2.25002e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 2.25e-05 [get_jit_bprop_graph]: 2.49001e-06 [rewriter_after_jit_bprop_graph]: 4.95999e-06 [opt_after_jit_grad]: 0.0004766 [validate]: 6.28e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.112341 [execute]: 9.81e-06 Sums bootstrap : 0.000477s : 0.35% type_inference : 0.010997s : 7.98% event_method : 0.000049s : 0.04% auto_monad : 0.000123s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000133s : 0.10% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000106s : 0.08% optimize.opt_a.a_1 : 0.003160s : 2.29% optimize.opt_a.with_stream_mark : 0.000054s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000466s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.04% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000033s : 0.02% optimize.opt_a.merge_send_recv : 0.000035s : 0.03% optimize.opt_a.auto_parallel : 0.000029s : 0.02% optimize.opt_a.parallel : 0.000036s : 0.03% optimize.opt_a.flash_sp : 0.000018s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.03% optimize.opt_a.virtual_dataset : 0.000032s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000031s : 0.02% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000039s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000054s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001660s : 1.20% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000093s : 0.07% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003551s : 2.58% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000011s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.06% optimize.opt_a.cse : 0.000250s : 0.18% optimize.opt_a.a_3 : 0.000455s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000761s : 0.55% optimize.opt_b.b_1 : 0.000182s : 0.13% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.02% optimize.loop_unroll : 0.000474s : 0.34% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000025s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000038s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000066s : 0.05% optimize.cse_after_recomputation.cse : 0.000019s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000477s : 0.35% validate : 0.000063s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.112341s : 81.50% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000798 209 6.60% : 0.000053s : 11: substitution.arithmetic_simplify 0.32% : 0.000003s : 4: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 4: substitution.fold_const_symbol 1.01% : 0.000008s : 7: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 56.37% : 0.000450s : 16: substitution.inline 2.16% : 0.000017s : 2: substitution.inline_without_move 1.39% : 0.000011s : 18: substitution.j_node_and_user_rematch 2.27% : 0.000018s : 3: substitution.less_batch_normalization 1.77% : 0.000014s : 11: substitution.minmaximum_grad 0.78% : 0.000006s : 5: substitution.partial_eliminate 1.66% : 0.000013s : 18: substitution.remove_not_recompute_node 3.31% : 0.000026s : 10: substitution.replace_applicator 1.53% : 0.000012s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.81% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.33% : 0.000066s : 28: substitution.tuple_list_get_item_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010912 2 86.08% : 0.009393s : 1: type_inference.infer 13.92% : 0.001519s : 1: type_inference.specialize ------[replace.] 0.000213 30 59.15% : 0.000126s : 16: replace.inline 40.85% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000474 30 93.20% : 0.000441s : 16: match.inline 6.80% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000729 5429 1.11% : 0.000008s : 65: predicate.accumulaten_eliminater 0.32% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.06% : 0.000008s : 65: predicate.addn_zero_filter 1.04% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.16% : 0.000016s : 95: predicate.arithmetic_simplify 1.13% : 0.000008s : 65: predicate.cast_eliminate 1.17% : 0.000009s : 65: predicate.check_bprop_eliminate 0.52% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.52% : 0.000004s : 30: predicate.depend_value_elim 1.18% : 0.000009s : 65: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 65: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 7: predicate.elim_not_effective 0.15% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 72: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 72: predicate.environ_get_depend_swap 1.77% : 0.000013s : 102: predicate.environ_get_eliminate 1.19% : 0.000009s : 72: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 95: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 95: predicate.float_depend_g_call 0.50% : 0.000004s : 30: predicate.float_environ_get_switch 0.68% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.59% : 0.000004s : 30: predicate.get_grad_eliminate 0.09% : 0.000001s : 7: predicate.graph_param_transform 0.53% : 0.000004s : 30: predicate.incorporate_call 0.47% : 0.000003s : 30: predicate.incorporate_call_switch 5.64% : 0.000041s : 234: predicate.inline 1.32% : 0.000010s : 53: predicate.inline_without_move 0.29% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 30: predicate.less_batch_normalization 1.64% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.61% : 0.000019s : 158: predicate.load_eliminater 0.33% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 126: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 79: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 30: predicate.merge_addn 1.10% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.15% : 0.000008s : 65: predicate.minmaximum_grad 0.49% : 0.000004s : 7: predicate.mutable_eliminate 0.13% : 0.000001s : 7: predicate.opt_reshape 0.16% : 0.000001s : 7: predicate.parallel_virtual_node 2.00% : 0.000015s : 95: predicate.partial_defer_inline 1.69% : 0.000012s : 86: predicate.partial_eliminate 1.08% : 0.000008s : 65: predicate.print_const_string_wrapper 0.54% : 0.000004s : 30: predicate.reduce_all_const_elim 1.33% : 0.000010s : 65: predicate.reduce_eliminate 2.67% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 30: predicate.remove_not_recompute_node 1.91% : 0.000014s : 144: predicate.replace_applicator 0.67% : 0.000005s : 53: predicate.replace_old_param 0.13% : 0.000001s : 7: predicate.reset_defer_inline 1.08% : 0.000008s : 65: predicate.reshape_eliminate 1.19% : 0.000009s : 65: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 7: predicate.row_tensor_eliminate 1.27% : 0.000009s : 65: predicate.same_eliminate 0.36% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 30: predicate.shard_identity_eliminate 0.28% : 0.000002s : 14: predicate.special_op_eliminate 0.61% : 0.000004s : 30: predicate.specialize_transform 1.26% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 53: predicate.stack_unstack_eliminate 0.18% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 95: predicate.switch_defer_inline 2.92% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.80% : 0.000035s : 258: 
predicate.switch_simplify 1.08% : 0.000008s : 65: predicate.tile_eliminate 1.07% : 0.000008s : 65: predicate.transpose_eliminate 1.52% : 0.000011s : 79: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 79: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 79: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 123: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 79: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 109: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.59% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.22% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 7: predicate.value_based_eliminate 0.56% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 30: predicate.virtual_output_eliminate 0.13% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001686 32 56.25% : 0.000948s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.75% : 0.000738s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168851 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.99% : 0.003361s : 1: add_attr 1.98% : 0.003352s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000070s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000130s : 1: auto_monad 0.02% : 0.000026s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.30% : 0.000514s : 1: bootstrap 0.02% : 0.000033s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000034s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000057s : 1: event_method 0.01% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.29% : 0.000483s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000773s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 2.82% : 0.004761s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000161s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000046s : 4: opt.transform.symbol_engine_opt 6.92% : 0.011676s : 1: opt_a 0.08% : 0.000139s : 1: opt_after_cconv 0.29% : 0.000487s : 1: opt_after_jit_grad 0.17% : 0.000290s : 1: opt_b 8.48% : 0.014326s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000042s : 1: remove_dup_value 1.11% : 0.001874s : 2: renormalize.infer 0.98% : 0.001659s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000055s : 1: rewriter_after_opt_a 0.08% : 0.000137s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000095s : 1: symbol_engine_optimizer 66.55% : 0.112364s : 1: task_emit 0.06% : 0.000107s : 1: 
tuple_transform 6.53% : 0.011024s : 1: type_inference 0.06% : 0.000097s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x2-ge],max_mem:18.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x3-pynative],max_mem:18.0M TotalTime = 0.0244944, [24] [bootstrap]: 0.00058593 [type_inference]: 0.00717766 [event_method]: 1.464e-05 [auto_monad]: 6.515e-05 [graph_reusing]: 5.89e-06 [inline]: 2.19001e-06 [add_attr]: 0.00403981, [1] [add_attr_with_inline]: 0.00402408, [1] [Cycle 1]: 6.12e-05, [2] [tag_attr]: 1.971e-05 [meta_addattr_fg_expand]: 4.22e-06 [parallel-infer-symbol]: 4.35e-06 [pre_auto_parallel]: 4.087e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 2.19001e-06 [optimize]: 0.00443011, [53] [py_interpret_to_execute]: 2.548e-05 [rewriter_before_opt_a]: 6.517e-05 [opt_a]: 0.00251504, [2] [Cycle 1]: 0.00190609, [45] [expand_dump_flag]: 3.53e-06 [switch_simplify]: 3.314e-05 [loop_unroll]: 2.051e-05 [a_1]: 0.0004931 [with_stream_mark]: 1.56e-05 [recompute_prepare]: 8.47998e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 3.45998e-06 [parameter_eliminate]: 1.73997e-06 [a_2]: 0.00013013 [accelerated_algorithm]: 6.47001e-06 [shard]: 2.68e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 6.84001e-06 [parallel]: 2.714e-05 [flash_sp]: 8.78001e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.68e-06 [matmul_add_comm_reduction]: 9.46e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.51001e-06 [virtual_dataset]: 6.04001e-06 
[get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.71998e-06 [merge_forward]: 3.82998e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.024e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.48998e-06 [receive_attached]: 2.88e-06 [after_resolve]: 1.038e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.00068494 [add_forward_monad_depend]: 4.79e-06 [auto_monad_grad]: 2.53e-06 [auto_monad_eliminator]: 1.488e-05 [cse]: 2.971e-05 [a_3]: 4.154e-05 [Cycle 2]: 0.00059836, [45] [expand_dump_flag]: 8.30012e-07 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.00012591 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 6.909e-05 [accelerated_algorithm]: 5.48002e-06 [shard]: 1.34e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.48001e-06 [auto_parallel]: 5.02e-06 [parallel]: 4.24002e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 5.52999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.56002e-06 [offload_activation]: 6.54001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46998e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.27998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 1.92999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.06997e-06 [after_resolve]: 9.30001e-06 [a_after_grad]: 7.77e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.41998e-06 [cse]: 1.776e-05 [a_3]: 3.251e-05 [py_interpret_to_execute_after_opt_a]: 9.66e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.26e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00048581 [opt_b]: 0.00018189, [1] [Cycle 1]: 0.00017552, [7] [b_1]: 0.00010778 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.14999e-06 [renormalize]: 4.50003e-07 [cse]: 1.717e-05 [optimize_parallel_all_gather_comm]: 1.596e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.657e-05 [loop_unroll]: 0.00041757 [opt_after_cconv]: 9.726e-05, [1] [Cycle 1]: 9.168e-05, [7] [c_1]: 2.827e-05 [parameter_eliminate]: 2.91e-06 [updatestate_depend_eliminate]: 5.53002e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 1.604e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.376e-05 [tuple_transform]: 7.047e-05, [1] [Cycle 1]: 6.629e-05, [4] [d_1]: 4.12e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 1.91998e-06 [add_recomputation]: 5.183e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.056e-05 [environ_conv]: 5.25999e-06 [swap_dp_allreduce_reducescatter]: 5.61e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.59998e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.24999e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.27e-06 
[add_comm_op_reuse_tag]: 1.37e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09003e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.93999e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.837e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.12001e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.896e-05, [1] [Cycle 1]: 6.458e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 8.40999e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 5.95002e-06 [fold_const_symbol]: 9.01002e-06 [renormalize]: 1.99972e-07 [detach_backward]: 2.08002e-06 [pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 1.641e-05 [get_jit_bprop_graph]: 1.49998e-06 [rewriter_after_jit_bprop_graph]: 0.00012438 [opt_after_jit_grad]: 0.00049593 [validate]: 3.601e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00719438 [execute]: 9.05001e-06 Sums bootstrap : 0.000586s : 3.01% type_inference : 0.007178s : 36.90% event_method : 0.000015s : 0.08% auto_monad : 0.000065s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000041s : 0.21% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000025s : 0.13% optimize.rewriter_before_opt_a : 
0.000065s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.13% optimize.opt_a.a_1 : 0.000619s : 3.18% optimize.opt_a.with_stream_mark : 0.000026s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000199s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000031s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000685s : 3.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000047s : 0.24% optimize.opt_a.a_3 : 0.000074s : 0.38% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000486s : 2.50% optimize.opt_b.b_1 : 0.000108s : 0.55% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.14% optimize.loop_unroll : 0.000418s : 2.15% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.05% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.08% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000124s : 0.64% opt_after_jit_grad : 0.000496s : 2.55% validate : 0.000036s : 0.19% backend_pass : 0.000001s : 0.00% task_emit : 0.007194s : 36.99% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000194 30 14.55% : 0.000028s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.66% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000006s : 4: substitution.graph_param_transform 68.51% : 0.000133s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.22% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 5.96% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007120 2 91.24% : 0.006496s : 1: type_inference.infer 8.76% : 0.000624s : 1: type_inference.specialize ------[replace.] 0.000041 5 72.12% : 0.000029s : 3: replace.inline 27.88% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000141 5 92.56% : 0.000130s : 3: match.inline 7.44% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 1.01% : 0.000002s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 19: predicate.arithmetic_simplify 1.09% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.52% : 0.000002s : 21: predicate.replace_applicator 0.50% : 0.000001s : 8: predicate.replace_old_param 0.46% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.08% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.44% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000448 8 46.64% : 0.000209s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.36% : 0.000239s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.034782 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.63% : 0.004045s : 1: add_attr 11.58% : 0.004028s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000071s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.81% : 0.000630s : 1: bootstrap 0.09% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.23% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.42% : 0.000495s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.84% : 0.000989s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.06% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.24% : 0.002518s : 1: opt_a 0.29% : 0.000101s : 1: opt_after_cconv 1.46% : 0.000507s : 1: opt_after_jit_grad 0.53% : 0.000186s : 1: opt_b 12.75% : 0.004434s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.13% : 0.000045s : 1: pre_auto_parallel 0.08% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 1.08% : 0.000376s : 1: renormalize.infer 0.87% : 0.000301s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.37% : 0.000130s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000037s : 1: rewriter_after_opt_a 0.20% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.21% : 0.000072s : 1: symbol_engine_optimizer 20.73% : 0.007212s : 1: task_emit 0.21% : 0.000073s : 1: 
tuple_transform 20.69% : 0.007198s : 1: type_inference 0.20% : 0.000070s : 1: validate TotalTime = 0.0202958, [24] [bootstrap]: 0.00042619 [type_inference]: 0.00482925 [event_method]: 1.255e-05 [auto_monad]: 5.646e-05 [graph_reusing]: 5.29e-06 [inline]: 2.22999e-06 [add_attr]: 0.00335772, [1] [add_attr_with_inline]: 0.00334776, [1] [Cycle 1]: 5.306e-05, [2] [tag_attr]: 1.483e-05 [meta_addattr_fg_expand]: 3.31001e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 2.743e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.32001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00420935, [53] [py_interpret_to_execute]: 1.823e-05 [rewriter_before_opt_a]: 4.319e-05 [opt_a]: 0.0021715, [2] [Cycle 1]: 0.00154175, [45] [expand_dump_flag]: 2.46e-06 [switch_simplify]: 2.622e-05 [loop_unroll]: 1.407e-05 [a_1]: 0.00031199 [with_stream_mark]: 1.611e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 4.48001e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 2.22001e-06 [a_2]: 7.813e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 3.03e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 8.69998e-06 [auto_parallel]: 6.09001e-06 [parallel]: 1.949e-05 [flash_sp]: 8.90001e-06 [merge_comm]: 3.57002e-06 [allreduce_fusion]: 3.25002e-06 [matmul_add_comm_reduction]: 9.91998e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 5.61998e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 1.056e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.42999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.52001e-06 [flash_sp_send_recv_attached]: 2.66999e-06 [receive_attached]: 
2.49999e-06 [after_resolve]: 1.135e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 0.00057923 [add_forward_monad_depend]: 4.88001e-06 [auto_monad_grad]: 2.53998e-06 [auto_monad_eliminator]: 1.446e-05 [cse]: 2.998e-05 [a_3]: 4.255e-05 [Cycle 2]: 0.00061822, [45] [expand_dump_flag]: 1.18001e-06 [switch_simplify]: 7.13e-06 [loop_unroll]: 5.94e-06 [a_1]: 0.00013008 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 6.23e-06 [updatestate_depend_eliminate]: 3.14999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.886e-05 [accelerated_algorithm]: 5.82999e-06 [shard]: 1.44e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.92999e-06 [auto_parallel]: 6.49999e-06 [parallel]: 5.42001e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.18998e-06 [allreduce_fusion]: 2.84999e-06 [matmul_add_comm_reduction]: 5.62999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.89999e-06 [cell_reuse_recompute_pass]: 1.80001e-06 [offload_activation]: 6.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 8.49977e-07 [before_grad]: 8.30999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76999e-06 [meta_fg_expand]: 1.94999e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.32999e-06 [after_resolve]: 9.52999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.45999e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 6.89999e-06 [cse]: 1.435e-05 [a_3]: 3.201e-05 [py_interpret_to_execute_after_opt_a]: 1.015e-05 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 3.445e-05 [convert_after_rewriter]: 6.94999e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00057053 [opt_b]: 0.00018997, [1] [Cycle 
1]: 0.00018309, [7] [b_1]: 0.00011147 [b_2]: 7.55e-06 [updatestate_depend_eliminate]: 5.59998e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.40997e-06 [renormalize]: 3.50003e-07 [cse]: 1.83e-05 [optimize_parallel_all_gather_comm]: 1.684e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.738e-05 [loop_unroll]: 0.00045493 [opt_after_cconv]: 0.00010086, [1] [Cycle 1]: 9.466e-05, [7] [c_1]: 2.88e-05 [parameter_eliminate]: 3.28e-06 [updatestate_depend_eliminate]: 5.92999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.846e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.385e-05 [tuple_transform]: 7.247e-05, [1] [Cycle 1]: 6.804e-05, [4] [d_1]: 4.188e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.37001e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 4.888e-05 [cse_after_recomputation]: 2.083e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.095e-05 [environ_conv]: 5.29e-06 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 2.77002e-06 [label_micro_interleaved_index]: 4.65001e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 1.33002e-06 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 1.44998e-06 [add_comm_op_reuse_tag]: 1.64e-06 [interleave_split_concat_branches]: 1.40001e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88002e-06 [control_data_broadcast_order]: 1.231e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 4.07003e-06 [overlap_recompute_and_grad_model_parallel]: 4.56002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.54999e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 1.959e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 1.42999e-06 [symbol_engine_optimizer]: 7.187e-05, [1] [Cycle 1]: 6.76e-05, [6] [build]: 3.63999e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.166e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.52001e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.769e-05 [get_jit_bprop_graph]: 1.68002e-06 [rewriter_after_jit_bprop_graph]: 3.60998e-06 [opt_after_jit_grad]: 0.0004839 [validate]: 3.914e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00655939 [execute]: 9.40001e-06 Sums bootstrap : 0.000426s : 2.68% type_inference : 0.004829s : 30.35% event_method : 0.000013s : 0.08% auto_monad : 0.000056s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.11% optimize.rewriter_before_opt_a : 0.000043s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000033s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000442s : 2.78% optimize.opt_a.with_stream_mark : 0.000029s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000025s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000579s : 3.64% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000571s : 3.58% optimize.opt_b.b_1 : 0.000111s : 0.70% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.17% optimize.loop_unroll : 0.000455s : 2.86% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000042s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000002s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.02% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.11% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000484s : 3.04% validate : 0.000039s : 0.25% backend_pass : 0.000001s : 0.01% task_emit : 0.006559s : 41.22% execute : 0.000009s : 0.06% Time group info: ------[substitution.] 0.000139 26 18.64% : 0.000026s : 4: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000006s : 4: substitution.graph_param_transform 65.62% : 0.000091s : 2: substitution.inline 2.51% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.14% : 0.000004s : 4: substitution.remove_not_recompute_node 3.51% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004779 2 91.88% : 0.004390s : 1: type_inference.infer 8.12% : 0.000388s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000090 2 100.00% : 0.000090s : 2: match.inline ------[predicate.] 
0.000145 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.90% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.73% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.53% : 0.000001s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.47% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.64% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.81% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.53% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.16% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.26% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 1.07% : 0.000002s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000002s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.67% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.39% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.65% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.87% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000294 6 40.72% : 0.000120s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.28% : 0.000175s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029395 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.44% : 0.003363s : 1: add_attr 11.40% : 0.003351s : 1: add_attr_with_inline 0.02% : 0.000005s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.58% : 0.000465s : 1: bootstrap 0.11% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000005s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000464s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.98% : 0.000581s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.72% : 0.000800s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.40% : 0.002175s : 1: opt_a 0.36% : 0.000104s : 1: opt_after_cconv 1.68% : 0.000494s : 1: opt_after_jit_grad 0.66% : 0.000193s : 1: opt_b 14.33% : 0.004214s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.05% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 1.22% : 0.000358s : 1: renormalize.infer 0.73% : 0.000215s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.16% : 0.000047s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000075s : 1: symbol_engine_optimizer 22.38% : 0.006579s : 1: task_emit 0.26% : 0.000075s : 1: 
tuple_transform 16.51% : 0.004854s : 1: type_inference 0.25% : 0.000073s : 1: validate TotalTime = 0.0206067, [24] [bootstrap]: 0.00043379 [type_inference]: 0.00574718 [event_method]: 1.478e-05 [auto_monad]: 5.614e-05 [graph_reusing]: 6.33002e-06 [inline]: 2.33998e-06 [add_attr]: 0.00320212, [1] [add_attr_with_inline]: 0.00319377, [1] [Cycle 1]: 5.007e-05, [2] [tag_attr]: 1.711e-05 [meta_addattr_fg_expand]: 4.43001e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 2.914e-05 [insert-virtual-dataset]: 2.49999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 2.09e-06 [optimize]: 0.00417124, [53] [py_interpret_to_execute]: 2.136e-05 [rewriter_before_opt_a]: 6.163e-05 [opt_a]: 0.00225971, [2] [Cycle 1]: 0.00164963, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 3.256e-05 [loop_unroll]: 2.066e-05 [a_1]: 0.00046467 [with_stream_mark]: 1.512e-05 [recompute_prepare]: 7.28999e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 7.571e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.87998e-06 [auto_parallel]: 5.67999e-06 [parallel]: 1.933e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.63999e-06 [matmul_add_comm_reduction]: 9.46e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.05998e-06 [virtual_dataset]: 6.01998e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.55001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.63e-06 [flash_sp_send_recv_attached]: 2.49001e-06 [receive_attached]: 2.39001e-06 
[after_resolve]: 1.011e-05 [a_after_grad]: 8.77e-06 [renormalize]: 0.00053229 [add_forward_monad_depend]: 5.24e-06 [auto_monad_grad]: 2.22001e-06 [auto_monad_eliminator]: 1.512e-05 [cse]: 2.956e-05 [a_3]: 4.193e-05 [Cycle 2]: 0.00059991, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.50998e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012728 [with_stream_mark]: 1.007e-05 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.86e-05 [accelerated_algorithm]: 5.75001e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.53002e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.26001e-06 [flash_sp]: 3.71001e-06 [merge_comm]: 2.81999e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.89e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.84e-06 [virtual_dataset]: 5.07e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.42001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.25999e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 1.04003e-06 [auto_monad_eliminator]: 6.02999e-06 [cse]: 1.821e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 8.20999e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.208e-05 [convert_after_rewriter]: 6.90998e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00045888 [opt_b]: 0.00018567, [1] [Cycle 1]: 0.00017913, [7] [b_1]: 0.0001096 
[b_2]: 6.93998e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 3.33998e-06 [renormalize]: 2.49973e-07 [cse]: 1.728e-05 [optimize_parallel_all_gather_comm]: 1.671e-05 [overlap_param_gather]: 2.46998e-06 [cconv]: 2.409e-05 [loop_unroll]: 0.00045751 [opt_after_cconv]: 9.785e-05, [1] [Cycle 1]: 9.213e-05, [7] [c_1]: 2.828e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.73998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.681e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.349e-05 [tuple_transform]: 7.007e-05, [1] [Cycle 1]: 6.567e-05, [4] [d_1]: 3.986e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.29978e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.722e-05 [cse_after_recomputation]: 2.001e-05, [1] [Cycle 1]: 1.537e-05, [1] [cse]: 1.04e-05 [environ_conv]: 5.09e-06 [swap_dp_allreduce_reducescatter]: 5.32999e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 4.77e-06 [label_fine_grained_interleaved_index]: 3.01001e-06 [merge_cast_opt]: 1.67001e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.73002e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.47999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.167e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 4.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.38002e-06 [overlap_grad_ring_attention]: 4.32998e-06 [overlap_grad_flash_sp]: 1.828e-05 [begin_end_overlap_inline]: 8.00006e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 6.876e-05, [1] [Cycle 1]: 6.462e-05, [6] [build]: 2.57001e-06 [elim_shapecalc]: 8.55001e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.609e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00044654 [validate]: 3.411e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00619893 [execute]: 9.67001e-06 Sums bootstrap : 0.000434s : 2.64% type_inference : 0.005747s : 34.99% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.18% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000062s : 0.38% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000592s : 3.60% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000532s : 3.24% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000048s : 0.29% optimize.opt_a.a_3 : 0.000074s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000459s : 2.79% optimize.opt_b.b_1 : 0.000110s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000024s : 0.15% optimize.loop_unroll : 0.000458s : 2.79% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.01%
auto_monad_reorder : 0.000016s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.02%
opt_after_jit_grad : 0.000447s : 2.72%
validate : 0.000034s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006199s : 37.74%
execute : 0.000010s : 0.06%

Time group info:
------[substitution.] 0.000178 30
    14.14% : 0.000025s : 5: substitution.arithmetic_simplify
    1.01% : 0.000002s : 2: substitution.elim_not_effective
    0.90% : 0.000002s : 2: substitution.fold_const_symbol
    3.25% : 0.000006s : 4: substitution.graph_param_transform
    67.54% : 0.000121s : 3: substitution.inline
    1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.61% : 0.000005s : 4: substitution.remove_not_recompute_node
    2.31% : 0.000004s : 4: substitution.replace_old_param
    6.44% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005697 2
    89.83% : 0.005117s : 1: type_inference.infer
    10.17% : 0.000579s : 1: type_inference.specialize
------[replace.] 0.000040 5
    69.95% : 0.000028s : 3: replace.inline
    30.05% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000129 5
    91.94% : 0.000119s : 3: match.inline
    8.06% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000161 1131 0.95% : 0.000002s : 11: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.95% : 0.000002s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.88% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.79% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.92% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 44.93% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.07% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029627 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.82% : 0.003207s : 1: add_attr 10.79% : 0.003197s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.60% : 0.000473s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.05% : 0.000016s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.58% : 0.000467s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.58% : 0.000468s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.23% : 0.000958s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.64% : 0.002263s : 1: opt_a 0.34% : 0.000101s : 1: opt_after_cconv 1.54% : 0.000456s : 1: opt_after_jit_grad 0.64% : 0.000189s : 1: opt_b 14.09% : 0.004175s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.97% : 0.000288s : 1: renormalize.infer 0.80% : 0.000237s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000066s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.97% : 0.006213s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 19.46% : 0.005765s : 1: type_inference 0.22% : 0.000064s : 1: validate TotalTime = 0.0397595, [24] [bootstrap]: 0.00051331 [type_inference]: 0.0115471 [event_method]: 7.98e-05 [auto_monad]: 0.00012601 [graph_reusing]: 8.28999e-06 [inline]: 1.97999e-06 [add_attr]: 0.00314712, [1] [add_attr_with_inline]: 0.00313819, [1] [Cycle 1]: 7.781e-05, [2] [tag_attr]: 3.653e-05 [meta_addattr_fg_expand]: 9.77999e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 5.307e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0145422, [53] [py_interpret_to_execute]: 3.953e-05 [rewriter_before_opt_a]: 0.00015215 [opt_a]: 0.011953, [3] [Cycle 1]: 0.00775984, [45] [expand_dump_flag]: 4.32998e-06 [switch_simplify]: 7.466e-05 [loop_unroll]: 6.12e-05 [a_1]: 0.00155521 [with_stream_mark]: 2.438e-05 [recompute_prepare]: 2.282e-05 [updatestate_depend_eliminate]: 9.63002e-06 [updatestate_assign_eliminate]: 7.82e-06 [updatestate_loads_eliminate]: 7.2e-06 [parameter_eliminate]: 3.36999e-06 [a_2]: 0.00024584 [accelerated_algorithm]: 3.167e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 4.22e-06 [shard_inline]: 1.624e-05 [merge_send_recv]: 1.696e-05 [auto_parallel]: 1.194e-05 [parallel]: 2.212e-05 [flash_sp]: 1.298e-05 [merge_comm]: 9.64e-06 [allreduce_fusion]: 8.87999e-06 [matmul_add_comm_reduction]: 2.895e-05 [allreduce_slice_to_reducescatter]: 1.02e-06 [virtual_shard_identity]: 1.762e-05 [virtual_dataset]: 1.547e-05 [get_grad_eliminate_]: 1.632e-05 [virtual_output]: 1.552e-05 [merge_forward]: 9.61003e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 1.866e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.949e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 2.73e-05 [set_forward_comm_id_for_comm_node_pass]: 9.44e-06 [meta_fg_expand]: 0.00157817 [flash_sp_send_recv_attached]: 3.78999e-06 [receive_attached]: 2.88e-06 [after_resolve]: 6.261e-05 
[a_after_grad]: 8.467e-05 [renormalize]: 0.00280237 [add_forward_monad_depend]: 1.035e-05 [auto_monad_grad]: 6.17999e-06 [auto_monad_eliminator]: 5.779e-05 [cse]: 0.00017536 [a_3]: 0.0003442 [Cycle 2]: 0.00326516, [45] [expand_dump_flag]: 2.36998e-06 [switch_simplify]: 4.639e-05 [loop_unroll]: 4.372e-05 [a_1]: 0.00160204 [with_stream_mark]: 1.442e-05 [recompute_prepare]: 1.164e-05 [updatestate_depend_eliminate]: 5.74999e-06 [updatestate_assign_eliminate]: 4.63001e-06 [updatestate_loads_eliminate]: 4.23001e-06 [parameter_eliminate]: 1.37999e-06 [a_2]: 0.00012746 [accelerated_algorithm]: 1.258e-05 [shard]: 2.23998e-06 [meta_shard_fg_expand]: 2.24999e-06 [shard_inline]: 9.16002e-06 [merge_send_recv]: 9.12999e-06 [auto_parallel]: 9.04e-06 [parallel]: 8.45999e-06 [flash_sp]: 3.75e-06 [merge_comm]: 5.89e-06 [allreduce_fusion]: 5.56e-06 [matmul_add_comm_reduction]: 9.71e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 8.99e-06 [get_grad_eliminate_]: 8.99e-06 [virtual_output]: 8.51002e-06 [merge_forward]: 4.85001e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 1.165e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.79e-05 [merge_recompute_call_nodes]: 1.04003e-06 [before_grad]: 1.489e-05 [set_forward_comm_id_for_comm_node_pass]: 5.62001e-06 [meta_fg_expand]: 9.92e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.79e-06 [after_resolve]: 1.769e-05 [a_after_grad]: 1.469e-05 [renormalize]: 0.00070827 [add_forward_monad_depend]: 4.74e-06 [auto_monad_grad]: 2.14e-06 [auto_monad_eliminator]: 1.67e-05 [cse]: 4.999e-05 [a_3]: 6.708e-05 [Cycle 3]: 0.00091037, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 1.074e-05 [loop_unroll]: 8.96002e-06 [a_1]: 0.00025153 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 9.41998e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 4.06001e-06 [parameter_eliminate]: 1.16002e-06 
[a_2]: 0.00012525 [accelerated_algorithm]: 1.158e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 2.06e-06 [shard_inline]: 8.99e-06 [merge_send_recv]: 7.11999e-06 [auto_parallel]: 6.88e-06 [parallel]: 5.39e-06 [flash_sp]: 1.18001e-06 [merge_comm]: 4.90001e-06 [allreduce_fusion]: 5.40001e-06 [matmul_add_comm_reduction]: 8.25999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.71002e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.68999e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 9.11002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.598e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.441e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 3.32997e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 1.347e-05 [a_after_grad]: 1.384e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.188e-05 [cse]: 2.62e-05 [a_3]: 5.902e-05 [py_interpret_to_execute_after_opt_a]: 1.482e-05 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 5.133e-05 [convert_after_rewriter]: 9.15999e-06 [order_py_execute_after_rewriter]: 6.99001e-06 [mutable_eliminate]: 0.00069179 [opt_b]: 0.00029475, [1] [Cycle 1]: 0.00028721, [7] [b_1]: 0.00019245 [b_2]: 1.141e-05 [updatestate_depend_eliminate]: 7.25e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 4.08999e-06 [renormalize]: 3.59985e-07 [cse]: 3.208e-05 [optimize_parallel_all_gather_comm]: 2.154e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 2.561e-05 [loop_unroll]: 0.0004497 [opt_after_cconv]: 0.00013741, [1] [Cycle 1]: 0.0001312, [7] [c_1]: 4.832e-05 [parameter_eliminate]: 3.21999e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.9e-06 [cse]: 3.044e-05 
[renormalize]: 3.80009e-07 [remove_dup_value]: 3.64e-05 [tuple_transform]: 0.00010554, [1] [Cycle 1]: 0.00010049, [4] [d_1]: 6.967e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.86e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.988e-05 [cse_after_recomputation]: 3.346e-05, [1] [Cycle 1]: 2.831e-05, [1] [cse]: 2.206e-05 [environ_conv]: 9.77001e-06 [swap_dp_allreduce_reducescatter]: 8.26002e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.44998e-06 [label_fine_grained_interleaved_index]: 2.53003e-06 [merge_cast_opt]: 1.36002e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.33998e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.62001e-06 [reorder_send_recv_between_fp_bp]: 2.85998e-06 [comm_op_add_attrs]: 1.24998e-06 [add_comm_op_reuse_tag]: 1.41002e-06 [interleave_split_concat_branches]: 1.36998e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.24003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 [control_data_broadcast_order]: 1.79e-05 [grouped_pairwise_exchange_alltoall]: 1.81998e-06 [offloading_packed_experts]: 5.28002e-06 [overlap_recompute_and_grad_model_parallel]: 6.76999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.97999e-06 [overlap_grad_flash_sp]: 2.715e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.53e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 0.00010291, [1] [Cycle 1]: 9.852e-05, [6] [build]: 1.117e-05 [elim_shapecalc]: 1.354e-05 [elim_not_effective]: 1.848e-05 [opt_reshape]: 1.089e-05 [fold_const_symbol]: 1.602e-05 [renormalize]: 2.00002e-07 [detach_backward]: 2.17001e-06 
[pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.698e-05 [get_jit_bprop_graph]: 1.86998e-06 [rewriter_after_jit_bprop_graph]: 3.8e-06 [opt_after_jit_grad]: 0.00048663 [validate]: 5.238e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.00890595 [execute]: 8.42e-06 Sums bootstrap : 0.000513s : 1.45% type_inference : 0.011547s : 32.71% event_method : 0.000080s : 0.23% auto_monad : 0.000126s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000053s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000152s : 0.43% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.37% optimize.opt_a.loop_unroll : 0.000114s : 0.32% optimize.opt_a.a_1 : 0.003409s : 9.66% optimize.opt_a.with_stream_mark : 0.000049s : 0.14% optimize.opt_a.recompute_prepare : 0.000044s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000499s : 1.41% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.16% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000033s : 0.09% optimize.opt_a.auto_parallel : 0.000028s : 0.08% optimize.opt_a.parallel : 0.000036s : 0.10% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000039s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001681s : 4.76% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000094s : 0.27% optimize.opt_a.a_after_grad : 0.000113s : 0.32% optimize.opt_a.renormalize : 0.003511s : 9.95% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.24% optimize.opt_a.cse : 0.000252s : 0.71% optimize.opt_a.a_3 : 0.000470s : 1.33% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000692s : 1.96% optimize.opt_b.b_1 : 0.000192s : 0.55% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000026s : 0.07% optimize.loop_unroll : 0.000450s : 1.27% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000036s : 0.10% optimize.tuple_transform.d_1 : 0.000070s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000487s : 1.38% validate : 0.000052s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008906s : 25.23% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000905 222 5.78% : 0.000052s : 12: substitution.arithmetic_simplify 1.70% : 0.000015s : 2: substitution.cast_eliminate 0.31% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.48% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000003s : 5: substitution.fold_const_symbol 0.98% : 0.000009s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 59.22% : 0.000536s : 17: substitution.inline 1.97% : 0.000018s : 2: substitution.inline_without_move 1.21% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.83% : 0.000017s : 3: substitution.less_batch_normalization 1.56% : 0.000014s : 11: substitution.minmaximum_grad 0.72% : 0.000007s : 5: substitution.partial_eliminate 1.70% : 0.000015s : 20: substitution.remove_not_recompute_node 2.87% : 0.000026s : 10: substitution.replace_applicator 1.27% : 0.000012s : 15: substitution.replace_old_param 0.29% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.33% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.53% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.15% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.59% : 0.000069s : 30: substitution.tuple_list_get_item_eliminator 2.11% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011467 2 86.64% : 0.009935s : 1: type_inference.infer 13.36% : 0.001532s : 1: type_inference.specialize ------[replace.] 0.000232 33 57.82% : 0.000134s : 17: replace.inline 42.18% : 0.000098s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000563 33 93.74% : 0.000527s : 17: match.inline 6.26% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000801 5764 1.09% : 0.000009s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 5.47% : 0.000044s : 68: predicate.addn_zero_filter 0.99% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.98% : 0.000016s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.13% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.12% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.08% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_depend_swap 1.66% : 0.000013s : 108: predicate.environ_get_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.19% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.34% : 0.000043s : 249: predicate.inline 1.15% : 0.000009s : 55: predicate.inline_without_move 0.28% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.57% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.54% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.33% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.06% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.13% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000016s : 101: predicate.partial_defer_inline 1.62% : 0.000013s : 92: predicate.partial_eliminate 1.00% : 0.000008s : 68: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.23% : 0.000010s : 68: predicate.reduce_eliminate 2.56% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.77% : 0.000014s : 152: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.05% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.22% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.56% : 0.000004s : 32: predicate.specialize_transform 1.24% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.78% : 0.000014s : 101: predicate.switch_defer_inline 2.76% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.72% : 
0.000038s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.02% : 0.000008s : 68: predicate.transpose_eliminate 1.36% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.39% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.88% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.58% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.52% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.08% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000005s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001656 34 56.80% : 0.000940s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.20% : 0.000715s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.066382 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.75% : 0.003152s : 1: add_attr 4.73% : 0.003142s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000133s : 1: auto_monad 0.05% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.83% : 0.000550s : 1: bootstrap 0.04% : 0.000029s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.13% : 0.000089s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.06% : 0.000701s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.67% : 0.005095s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000056s : 4: opt.transform.symbol_engine_opt 18.01% : 0.011957s : 1: opt_a 0.21% : 0.000141s : 1: opt_after_cconv 0.75% : 0.000496s : 1: opt_after_jit_grad 0.45% : 0.000299s : 1: opt_b 21.91% : 0.014547s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000031s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000058s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.03% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000041s : 1: remove_dup_value 2.94% : 0.001955s : 2: renormalize.infer 2.32% : 0.001540s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000056s : 1: rewriter_after_opt_a 0.24% : 0.000156s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000106s : 1: symbol_engine_optimizer 13.44% : 0.008923s : 1: task_emit 0.16% : 0.000109s : 1: 
tuple_transform 17.42% : 0.011567s : 1: type_inference 0.14% : 0.000093s : 1: validate TotalTime = 0.0202511, [24] [bootstrap]: 0.00043911 [type_inference]: 0.0045486 [event_method]: 1.107e-05 [auto_monad]: 5.601e-05 [graph_reusing]: 5.45001e-06 [inline]: 2.51e-06 [add_attr]: 0.00317684, [1] [add_attr_with_inline]: 0.00316644, [1] [Cycle 1]: 5.097e-05, [2] [tag_attr]: 1.42e-05 [meta_addattr_fg_expand]: 3.38999e-06 [parallel-infer-symbol]: 3.93999e-06 [pre_auto_parallel]: 2.745e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 1.05001e-06 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00416026, [53] [py_interpret_to_execute]: 1.738e-05 [rewriter_before_opt_a]: 4.232e-05 [opt_a]: 0.00213675, [2] [Cycle 1]: 0.00150998, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 2.495e-05 [loop_unroll]: 1.398e-05 [a_1]: 0.00031031 [with_stream_mark]: 1.774e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.719e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.43e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.57998e-06 [auto_parallel]: 6.02001e-06 [parallel]: 2.034e-05 [flash_sp]: 8.37998e-06 [merge_comm]: 3.90998e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.52001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.68001e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.35999e-06 [merge_forward]: 4.16001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 9.78998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.09e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.57999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.37001e-06 [flash_sp_send_recv_attached]: 2.48998e-06 [receive_attached]: 2.51998e-06 
[after_resolve]: 1.137e-05 [a_after_grad]: 9.12999e-06 [renormalize]: 0.00055441 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 2.86e-06 [auto_monad_eliminator]: 1.386e-05 [cse]: 3.017e-05 [a_3]: 4.31e-05 [Cycle 2]: 0.00061645, [45] [expand_dump_flag]: 1.24003e-06 [switch_simplify]: 7.19001e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012756 [with_stream_mark]: 1.294e-05 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 3.13998e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.08001e-06 [a_2]: 6.79e-05 [accelerated_algorithm]: 5.64998e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.21997e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 5.59998e-06 [auto_parallel]: 6.71e-06 [parallel]: 5.39998e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 6.39001e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 6.34001e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 4.89e-06 [merge_forward]: 3.20998e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 6.99001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.45001e-06 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 8.56002e-06 [set_forward_comm_id_for_comm_node_pass]: 4.45e-06 [meta_fg_expand]: 1.83002e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 9.68002e-06 [a_after_grad]: 8.42998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.54e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 7.48999e-06 [cse]: 1.478e-05 [a_3]: 3.217e-05 [py_interpret_to_execute_after_opt_a]: 9.52001e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.503e-05 [convert_after_rewriter]: 7.23e-06 [order_py_execute_after_rewriter]: 5.47999e-06 [mutable_eliminate]: 0.00056828 [opt_b]: 0.00018653, [1] [Cycle 1]: 0.00017941, 
[7] [b_1]: 0.00010774 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 6.07001e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 8.70001e-07 [cse]: 1.908e-05 [optimize_parallel_all_gather_comm]: 1.722e-05 [overlap_param_gather]: 2.48e-06 [cconv]: 2.604e-05 [loop_unroll]: 0.0004553 [opt_after_cconv]: 9.871e-05, [1] [Cycle 1]: 9.296e-05, [7] [c_1]: 2.873e-05 [parameter_eliminate]: 3.99002e-06 [updatestate_depend_eliminate]: 5.75001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.667e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.386e-05 [tuple_transform]: 7.169e-05, [1] [Cycle 1]: 6.748e-05, [4] [d_1]: 4.196e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 4.883e-05 [cse_after_recomputation]: 2.146e-05, [1] [Cycle 1]: 1.672e-05, [1] [cse]: 1.118e-05 [environ_conv]: 5.29e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 4.83001e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.23998e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 8.49977e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.41998e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.52001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.244e-05 [grouped_pairwise_exchange_alltoall]: 1.71998e-06 [offloading_packed_experts]: 3.69002e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.53e-06 [overlap_grad_ring_attention]: 4.84e-06 [overlap_grad_flash_sp]: 2.008e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.53e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 7.248e-05, [1] [Cycle 1]: 6.814e-05, [6] [build]: 3.34001e-06 [elim_shapecalc]: 9.23002e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 6.51e-06 [fold_const_symbol]: 8.93002e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.99999e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.702e-05 [get_jit_bprop_graph]: 2.04999e-06 [rewriter_after_jit_bprop_graph]: 4.1e-06 [opt_after_jit_grad]: 0.00047376 [validate]: 4.012e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00701977 [execute]: 9.58997e-06 Sums bootstrap : 0.000439s : 2.73% type_inference : 0.004549s : 28.33% event_method : 0.000011s : 0.07% auto_monad : 0.000056s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000027s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000042s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.20% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000438s : 2.73% optimize.opt_a.with_stream_mark : 0.000031s : 0.19% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000555s : 3.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000045s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000568s : 3.54% optimize.opt_b.b_1 : 0.000108s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.01% optimize.opt_b.cse : 0.000019s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000026s : 0.16% optimize.loop_unroll : 0.000455s : 2.84% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000042s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.13% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.11% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000474s : 2.95% validate : 0.000040s : 0.25% backend_pass : 0.000001s : 0.01% task_emit : 0.007020s : 43.72% execute : 0.000010s : 0.06% Time group info: ------[substitution.] 0.000138 26 17.95% : 0.000025s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.67% : 0.000006s : 4: substitution.graph_param_transform 66.19% : 0.000092s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.06% : 0.000004s : 4: substitution.remove_not_recompute_node 3.39% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004500 2 92.02% : 0.004141s : 1: type_inference.infer 7.98% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000090 2 100.00% : 0.000090s : 2: match.inline ------[predicate.] 
0.000141 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.48% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.69% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.98% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.85% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.07% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.70% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.43% : 0.000006s : 41: predicate.switch_simplify 0.83% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.75% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.76% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.61% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.02% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.79% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000280 6 40.44% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.56% : 0.000167s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029086 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.94% : 0.003182s : 1: add_attr 10.90% : 0.003171s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.64% : 0.000478s : 1: bootstrap 0.10% : 0.000030s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.06% : 0.000017s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.60% : 0.000465s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.99% : 0.000578s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.73% : 0.000793s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.36% : 0.002140s : 1: opt_a 0.35% : 0.000102s : 1: opt_after_cconv 1.66% : 0.000483s : 1: opt_after_jit_grad 0.65% : 0.000190s : 1: opt_b 14.32% : 0.004165s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.07% : 0.000021s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 1.11% : 0.000322s : 1: renormalize.infer 0.78% : 0.000225s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000039s : 1: rewriter_after_opt_a 0.16% : 0.000046s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000075s : 1: symbol_engine_optimizer 24.20% : 0.007040s : 1: task_emit 0.26% : 0.000074s : 1: 
tuple_transform 15.71% : 0.004569s : 1: type_inference 0.27% : 0.000078s : 1: validate TotalTime = 0.0406877, [24] [bootstrap]: 0.00046987 [type_inference]: 0.0111407 [event_method]: 4.686e-05 [auto_monad]: 0.00012282 [graph_reusing]: 8.77e-06 [inline]: 2.46e-06 [add_attr]: 0.00347091, [1] [add_attr_with_inline]: 0.00345907, [1] [Cycle 1]: 8.42e-05, [2] [tag_attr]: 3.696e-05 [meta_addattr_fg_expand]: 8.78001e-06 [parallel-infer-symbol]: 3.83001e-06 [pre_auto_parallel]: 5.465e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0152361, [53] [py_interpret_to_execute]: 4.227e-05 [rewriter_before_opt_a]: 0.00014129 [opt_a]: 0.0125141, [3] [Cycle 1]: 0.00786277, [45] [expand_dump_flag]: 4.05998e-06 [switch_simplify]: 6.737e-05 [loop_unroll]: 5.469e-05 [a_1]: 0.00139272 [with_stream_mark]: 2.887e-05 [recompute_prepare]: 2.183e-05 [updatestate_depend_eliminate]: 9.27001e-06 [updatestate_assign_eliminate]: 8.02e-06 [updatestate_loads_eliminate]: 7.8e-06 [parameter_eliminate]: 2.61999e-06 [a_2]: 0.00024766 [accelerated_algorithm]: 3.191e-05 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 3.53999e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.727e-05 [auto_parallel]: 1.143e-05 [parallel]: 2.089e-05 [flash_sp]: 1.319e-05 [merge_comm]: 1.029e-05 [allreduce_fusion]: 9.17001e-06 [matmul_add_comm_reduction]: 3.341e-05 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 1.86e-05 [virtual_dataset]: 1.579e-05 [get_grad_eliminate_]: 1.538e-05 [virtual_output]: 1.552e-05 [merge_forward]: 9.71998e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 2e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.954e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 2.697e-05 [set_forward_comm_id_for_comm_node_pass]: 9.64e-06 [meta_fg_expand]: 0.00168071 [flash_sp_send_recv_attached]: 4.18999e-06 [receive_attached]: 3.45998e-06 [after_resolve]: 
6.424e-05 [a_after_grad]: 8.274e-05 [renormalize]: 0.00292964 [add_forward_monad_depend]: 1.178e-05 [auto_monad_grad]: 7e-06 [auto_monad_eliminator]: 6.013e-05 [cse]: 0.00017597 [a_3]: 0.00035347 [Cycle 2]: 0.00368286, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 4.951e-05 [loop_unroll]: 4.348e-05 [a_1]: 0.00162119 [with_stream_mark]: 2.176e-05 [recompute_prepare]: 1.375e-05 [updatestate_depend_eliminate]: 6.31998e-06 [updatestate_assign_eliminate]: 5.52999e-06 [updatestate_loads_eliminate]: 4.28001e-06 [parameter_eliminate]: 2.49999e-06 [a_2]: 0.00012958 [accelerated_algorithm]: 1.375e-05 [shard]: 2.52001e-06 [meta_shard_fg_expand]: 3.08e-06 [shard_inline]: 1.013e-05 [merge_send_recv]: 1.128e-05 [auto_parallel]: 1.225e-05 [parallel]: 1.105e-05 [flash_sp]: 3.96001e-06 [merge_comm]: 5.74999e-06 [allreduce_fusion]: 5.05999e-06 [matmul_add_comm_reduction]: 1.257e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 1.088e-05 [virtual_dataset]: 9.66e-06 [get_grad_eliminate_]: 9.05999e-06 [virtual_output]: 8.42e-06 [merge_forward]: 5.07999e-06 [cell_reuse_recompute_pass]: 1.51002e-06 [offload_activation]: 1.455e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.732e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.46e-05 [set_forward_comm_id_for_comm_node_pass]: 5.82999e-06 [meta_fg_expand]: 6.267e-05 [flash_sp_send_recv_attached]: 1.86e-06 [receive_attached]: 2.74999e-06 [after_resolve]: 1.937e-05 [a_after_grad]: 1.422e-05 [renormalize]: 0.00106221 [add_forward_monad_depend]: 6.43e-06 [auto_monad_grad]: 2.56998e-06 [auto_monad_eliminator]: 1.99e-05 [cse]: 6.426e-05 [a_3]: 6.983e-05 [Cycle 3]: 0.00094722, [45] [expand_dump_flag]: 2.21998e-06 [switch_simplify]: 1.062e-05 [loop_unroll]: 9.09998e-06 [a_1]: 0.0002585 [with_stream_mark]: 1.333e-05 [recompute_prepare]: 9.44e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 4.21001e-06 [parameter_eliminate]: 
1.87999e-06 [a_2]: 0.0001262 [accelerated_algorithm]: 1.306e-05 [shard]: 1.47999e-06 [meta_shard_fg_expand]: 2.34001e-06 [shard_inline]: 9.03002e-06 [merge_send_recv]: 8.15e-06 [auto_parallel]: 8.88002e-06 [parallel]: 6.54001e-06 [flash_sp]: 1.09e-06 [merge_comm]: 5.34e-06 [allreduce_fusion]: 5.11002e-06 [matmul_add_comm_reduction]: 8.93002e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 8.65001e-06 [get_grad_eliminate_]: 8.42e-06 [virtual_output]: 8.67e-06 [merge_forward]: 5.34e-06 [cell_reuse_recompute_pass]: 1.85001e-06 [offload_activation]: 1.189e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.6e-05 [merge_recompute_call_nodes]: 1.19e-06 [before_grad]: 1.511e-05 [set_forward_comm_id_for_comm_node_pass]: 5.66e-06 [meta_fg_expand]: 3.23e-06 [flash_sp_send_recv_attached]: 1.05999e-06 [receive_attached]: 1.60999e-06 [after_resolve]: 1.407e-05 [a_after_grad]: 1.446e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.157e-05 [cse]: 2.93e-05 [a_3]: 5.97e-05 [py_interpret_to_execute_after_opt_a]: 1.767e-05 [slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 5.712e-05 [convert_after_rewriter]: 9.96e-06 [order_py_execute_after_rewriter]: 7.26001e-06 [mutable_eliminate]: 0.00072457 [opt_b]: 0.00029632, [1] [Cycle 1]: 0.00028876, [7] [b_1]: 0.00019071 [b_2]: 1.185e-05 [updatestate_depend_eliminate]: 8.85001e-06 [updatestate_assign_eliminate]: 4.12998e-06 [updatestate_loads_eliminate]: 3.86999e-06 [renormalize]: 7.00005e-07 [cse]: 3.388e-05 [optimize_parallel_all_gather_comm]: 2.219e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.874e-05 [loop_unroll]: 0.00045652 [opt_after_cconv]: 0.00014054, [1] [Cycle 1]: 0.00013423, [7] [c_1]: 4.989e-05 [parameter_eliminate]: 2.56e-06 [updatestate_depend_eliminate]: 7.38999e-06 [updatestate_assign_eliminate]: 4.34002e-06 [updatestate_loads_eliminate]: 3.88999e-06 [cse]: 
3.048e-05 [renormalize]: 9.89996e-07 [remove_dup_value]: 4.213e-05 [tuple_transform]: 0.00010608, [1] [Cycle 1]: 0.00010107, [4] [d_1]: 6.975e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.088e-05 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 7.027e-05 [cse_after_recomputation]: 3.241e-05, [1] [Cycle 1]: 2.751e-05, [1] [cse]: 2.173e-05 [environ_conv]: 1.076e-05 [swap_dp_allreduce_reducescatter]: 6.179e-05 [bias_add_comm_swap]: 2.77002e-06 [label_micro_interleaved_index]: 5.24e-06 [label_fine_grained_interleaved_index]: 3.18998e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.04003e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 3.3e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.24998e-06 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02999e-06 [control_data_broadcast_order]: 1.876e-05 [grouped_pairwise_exchange_alltoall]: 1.66998e-06 [offloading_packed_experts]: 5.00999e-06 [overlap_recompute_and_grad_model_parallel]: 6.61e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 5.56e-06 [overlap_grad_flash_sp]: 3.033e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 0.00010644, [1] [Cycle 1]: 0.00010151, [6] [build]: 1.221e-05 [elim_shapecalc]: 1.483e-05 [elim_not_effective]: 1.915e-05 [opt_reshape]: 1.087e-05 [fold_const_symbol]: 1.514e-05 [renormalize]: 2.10013e-07 [detach_backward]: 2.54999e-06 
[pipeline_parallel_scheduler]: 1.68002e-06 [auto_monad_reorder]: 2.657e-05 [get_jit_bprop_graph]: 1.88002e-06 [rewriter_after_jit_bprop_graph]: 4.97999e-06 [opt_after_jit_grad]: 0.00048874 [validate]: 5.568e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00926616 [execute]: 9.92999e-06 Sums bootstrap : 0.000470s : 1.31% type_inference : 0.011141s : 31.10% event_method : 0.000047s : 0.13% auto_monad : 0.000123s : 0.34% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000055s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.12% optimize.rewriter_before_opt_a : 0.000141s : 0.39% optimize.opt_a.expand_dump_flag : 0.000009s : 0.03% optimize.opt_a.switch_simplify : 0.000127s : 0.36% optimize.opt_a.loop_unroll : 0.000107s : 0.30% optimize.opt_a.a_1 : 0.003272s : 9.14% optimize.opt_a.with_stream_mark : 0.000064s : 0.18% optimize.opt_a.recompute_prepare : 0.000045s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000007s : 0.02% optimize.opt_a.a_2 : 0.000503s : 1.41% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.16% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.03% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000037s : 0.10% optimize.opt_a.auto_parallel : 0.000033s : 0.09% optimize.opt_a.parallel : 0.000038s : 0.11% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000055s : 0.15% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000033s : 0.09% optimize.opt_a.merge_forward : 0.000020s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000046s : 0.13% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001747s : 4.88% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000008s : 0.02% optimize.opt_a.after_resolve : 0.000098s : 0.27% optimize.opt_a.a_after_grad : 0.000111s : 0.31% optimize.opt_a.renormalize : 0.003992s : 11.14% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.05% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.26% optimize.opt_a.cse : 0.000270s : 0.75% optimize.opt_a.a_3 : 0.000483s : 1.35% optimize.py_interpret_to_execute_after_opt_a : 0.000018s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000057s : 0.16% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000725s : 2.02% optimize.opt_b.b_1 : 0.000191s : 0.53% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000034s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.08% optimize.loop_unroll : 0.000457s : 1.27% optimize.opt_after_cconv.c_1 : 0.000050s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000042s : 0.12% optimize.tuple_transform.d_1 : 0.000070s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000070s : 0.20% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000011s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000062s : 0.17% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000019s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000030s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000489s : 1.36% validate : 0.000056s : 0.16% backend_pass : 0.000001s : 0.00% task_emit : 0.009266s : 25.87% execute : 0.000010s : 0.03% Time group info: ------[substitution.] 
0.000871 218 6.59% : 0.000057s : 11: substitution.arithmetic_simplify 2.11% : 0.000018s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.56% : 0.000005s : 5: substitution.float_depend_g_call 0.49% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000009s : 8: substitution.graph_param_transform 0.44% : 0.000004s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 55.92% : 0.000487s : 16: substitution.inline 2.05% : 0.000018s : 2: substitution.inline_without_move 1.33% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000018s : 3: substitution.less_batch_normalization 1.65% : 0.000014s : 11: substitution.minmaximum_grad 0.81% : 0.000007s : 5: substitution.partial_eliminate 1.64% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000028s : 10: substitution.replace_applicator 1.52% : 0.000013s : 15: substitution.replace_old_param 0.27% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.68% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.88% : 0.000069s : 28: substitution.tuple_list_get_item_eliminator 2.25% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011049 2 86.71% : 0.009580s : 1: type_inference.infer 13.29% : 0.001469s : 1: type_inference.specialize ------[replace.] 0.000216 30 60.95% : 0.000132s : 16: replace.inline 39.05% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000510 30 93.66% : 0.000478s : 16: match.inline 6.34% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 99: predicate.arithmetic_simplify 1.21% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.24% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.54% : 0.000042s : 244: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.71% : 0.000005s : 32: predicate.less_batch_normalization 1.65% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 164: predicate.load_eliminater 0.36% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.14% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.16% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.40% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.67% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.68% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.36% : 0.000010s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.33% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000014s : 97: predicate.switch_defer_inline 2.88% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.82% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.52% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.57% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.21% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.61% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001765 32 54.95% : 0.000970s : 12: func_graph_cloner_run.FuncGraphClonerGraph 45.05% : 0.000795s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.068673 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.06% : 0.003476s : 1: add_attr 5.04% : 0.003464s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000075s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000130s : 1: auto_monad 0.04% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.74% : 0.000510s : 1: bootstrap 0.05% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000022s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.08% : 0.000055s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.68% : 0.000466s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.07% : 0.000734s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.23% : 0.004964s : 117: opt.transform.opt_a 0.07% : 0.000049s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000177s : 28: opt.transform.opt_b 0.11% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000057s : 4: opt.transform.symbol_engine_opt 18.23% : 0.012518s : 1: opt_a 0.21% : 0.000144s : 1: opt_after_cconv 0.73% : 0.000499s : 1: opt_after_jit_grad 0.44% : 0.000300s : 1: opt_b 22.19% : 0.015241s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000033s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000059s : 1: pre_auto_parallel 0.07% : 0.000046s : 1: py_interpret_to_execute 0.03% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000046s : 1: remove_dup_value 3.31% : 0.002271s : 2: renormalize.infer 2.48% : 0.001703s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000061s : 1: rewriter_after_opt_a 0.21% : 0.000147s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.10% : 0.000066s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000109s : 1: symbol_engine_optimizer 13.52% : 0.009287s : 1: task_emit 0.16% : 0.000109s : 1: 
tuple_transform 16.27% : 0.011170s : 1: type_inference 0.14% : 0.000098s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x3-kbk],max_mem:18.0M TotalTime = 0.137998, [24] [bootstrap]: 0.00052055 [type_inference]: 0.00639701 [event_method]: 1.391e-05 [auto_monad]: 7.481e-05 [graph_reusing]: 6.73998e-06 [inline]: 2.94001e-06 [add_attr]: 0.00385312, [1] [add_attr_with_inline]: 0.00383979, [1] [Cycle 1]: 5.601e-05, [2] [tag_attr]: 1.831e-05 [meta_addattr_fg_expand]: 4.68001e-06 [parallel-infer-symbol]: 4.18999e-06 [pre_auto_parallel]: 3.354e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00457407, [53] [py_interpret_to_execute]: 9.495e-05 [rewriter_before_opt_a]: 6.656e-05 [opt_a]: 0.00244864, [2] [Cycle 1]: 0.0018332, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 3.521e-05 [loop_unroll]: 2.28e-05 [a_1]: 0.00049529 [with_stream_mark]: 1.519e-05 [recompute_prepare]: 8.77e-06 [updatestate_depend_eliminate]: 4.08999e-06 [updatestate_assign_eliminate]: 3.68e-06 [updatestate_loads_eliminate]: 3.21001e-06 [parameter_eliminate]: 1.88002e-06 [a_2]: 8.112e-05 [accelerated_algorithm]: 6.42001e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.3e-06 [auto_parallel]: 6.99001e-06 [parallel]: 2.85e-05 [flash_sp]: 8.38001e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.13002e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 8.42e-06 [virtual_dataset]: 6.63e-06 [get_grad_eliminate_]: 5.92999e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 3.96001e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 9.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.37e-05 
[merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 1.004e-05 [set_forward_comm_id_for_comm_node_pass]: 3.86001e-06 [meta_fg_expand]: 2.89999e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.44001e-06 [after_resolve]: 1.114e-05 [a_after_grad]: 8.42998e-06 [renormalize]: 0.00064542 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 2.29999e-06 [auto_monad_eliminator]: 1.41e-05 [cse]: 2.959e-05 [a_3]: 4.234e-05 [Cycle 2]: 0.00060396, [45] [expand_dump_flag]: 1.14e-06 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.77001e-06 [a_1]: 0.00012931 [with_stream_mark]: 1.086e-05 [recompute_prepare]: 5.92001e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.848e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.73002e-06 [merge_send_recv]: 4.87e-06 [auto_parallel]: 6.41e-06 [parallel]: 4.82e-06 [flash_sp]: 3.56999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 5.66998e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.40002e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.95998e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 6.68e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 9.39996e-07 [before_grad]: 8.43999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.02002e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.42999e-06 [after_resolve]: 9.32999e-06 [a_after_grad]: 7.78999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.57002e-06 [cse]: 1.365e-05 [a_3]: 3.203e-05 [py_interpret_to_execute_after_opt_a]: 8.78001e-06 
[slice_cell_reuse_recomputed_activation]: 1.96998e-06 [rewriter_after_opt_a]: 3.448e-05 [convert_after_rewriter]: 7.07002e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00055027 [opt_b]: 0.0001925, [1] [Cycle 1]: 0.00018581, [7] [b_1]: 0.00010882 [b_2]: 6.87002e-06 [updatestate_depend_eliminate]: 5.60001e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.11003e-06 [renormalize]: 3.50003e-07 [cse]: 2.6e-05 [optimize_parallel_all_gather_comm]: 1.708e-05 [overlap_param_gather]: 2.14e-06 [cconv]: 2.651e-05 [loop_unroll]: 0.00043605 [opt_after_cconv]: 0.00011953, [1] [Cycle 1]: 0.00011319, [7] [c_1]: 2.722e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.52999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.01e-06 [cse]: 1.69e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.379e-05 [tuple_transform]: 7.365e-05, [1] [Cycle 1]: 6.888e-05, [4] [d_1]: 4.236e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 2.17001e-06 [add_recomputation]: 5.881e-05 [cse_after_recomputation]: 2.173e-05, [1] [Cycle 1]: 1.706e-05, [1] [cse]: 1.128e-05 [environ_conv]: 5.78002e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 3.23998e-06 [label_micro_interleaved_index]: 5.32001e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.71e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 1.15999e-06 [remove_cast_before_assign_add]: 1.34003e-06 [full_micro_interleaved_order_control]: 2.49999e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.04998e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 
[control_data_broadcast_order]: 1.195e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.85e-06 [overlap_recompute_and_grad_model_parallel]: 4.28999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.42001e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.87e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 1.24998e-06 [symbol_engine_optimizer]: 7.034e-05, [1] [Cycle 1]: 6.575e-05, [6] [build]: 3.36999e-06 [elim_shapecalc]: 8.92999e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.41998e-06 [pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 1.604e-05 [get_jit_bprop_graph]: 1.82999e-06 [rewriter_after_jit_bprop_graph]: 3.87002e-06 [opt_after_jit_grad]: 0.0004725 [validate]: 3.692e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.121731 [execute]: 1.022e-05 Sums bootstrap : 0.000521s : 0.39% type_inference : 0.006397s : 4.81% event_method : 0.000014s : 0.01% auto_monad : 0.000075s : 0.06% graph_reusing : 0.000007s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000034s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000095s : 0.07% optimize.rewriter_before_opt_a : 0.000067s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.03% optimize.opt_a.loop_unroll : 0.000029s : 0.02% optimize.opt_a.a_1 : 0.000625s : 0.47% optimize.opt_a.with_stream_mark : 
0.000026s : 0.02% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000033s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.01% optimize.opt_a.renormalize : 0.000645s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000043s : 0.03% optimize.opt_a.a_3 : 0.000074s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000550s : 0.41% optimize.opt_b.b_1 : 0.000109s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000026s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.02% optimize.loop_unroll : 0.000436s : 0.33% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000042s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000473s : 0.35% validate : 0.000037s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.121731s : 91.45% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000191 30 14.80% : 0.000028s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.68% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000006s : 4: substitution.graph_param_transform 66.88% : 0.000128s : 3: substitution.inline 1.92% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000005s : 4: substitution.remove_not_recompute_node 2.64% : 0.000005s : 4: substitution.replace_old_param 5.98% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006342 2 90.67% : 0.005750s : 1: type_inference.infer 9.33% : 0.000592s : 1: type_inference.specialize ------[replace.] 0.000042 5 70.50% : 0.000029s : 3: replace.inline 29.50% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000136 5 92.41% : 0.000126s : 3: match.inline 7.59% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.41% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000002s : 11: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.01% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.02% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.39% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 1.00% : 0.000002s : 11: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.64% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.45% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.51% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.21% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.06% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.91% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 43.78% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.22% : 0.000217s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.148221 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.60% : 0.003858s : 1: add_attr 2.59% : 0.003844s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000081s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.38% : 0.000556s : 1: bootstrap 0.02% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000445s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000560s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.68% : 0.001007s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.65% : 0.002452s : 1: opt_a 0.08% : 0.000123s : 1: opt_after_cconv 0.33% : 0.000483s : 1: opt_after_jit_grad 0.13% : 0.000196s : 1: opt_b 3.09% : 0.004578s : 1: optimize 0.01% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000038s : 1: pre_auto_parallel 0.07% : 0.000100s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.24% : 0.000355s : 1: renormalize.infer 0.19% : 0.000283s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.05% : 0.000071s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000073s : 1: symbol_engine_optimizer 82.14% : 0.121756s : 1: task_emit 0.05% : 0.000076s : 1: 
tuple_transform 4.33% : 0.006416s : 1: type_inference 0.05% : 0.000067s : 1: validate TotalTime = 0.131057, [24] [bootstrap]: 0.00051696 [type_inference]: 0.00485346 [event_method]: 1.266e-05 [auto_monad]: 5.474e-05 [graph_reusing]: 5.53002e-06 [inline]: 2.87002e-06 [add_attr]: 0.00375788, [1] [add_attr_with_inline]: 0.00374455, [1] [Cycle 1]: 6.333e-05, [2] [tag_attr]: 1.655e-05 [meta_addattr_fg_expand]: 3.37997e-06 [parallel-infer-symbol]: 3.54002e-06 [pre_auto_parallel]: 3.256e-05 [insert-virtual-dataset]: 3.17002e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00469592, [53] [py_interpret_to_execute]: 2.227e-05 [rewriter_before_opt_a]: 4.691e-05 [opt_a]: 0.0023588, [2] [Cycle 1]: 0.00169762, [45] [expand_dump_flag]: 3.11999e-06 [switch_simplify]: 2.575e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.0003396 [with_stream_mark]: 1.973e-05 [recompute_prepare]: 8.23999e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.81999e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.911e-05 [accelerated_algorithm]: 7.18998e-06 [shard]: 2.91999e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 7.4e-06 [merge_send_recv]: 8.75999e-06 [auto_parallel]: 7.94002e-06 [parallel]: 2.002e-05 [flash_sp]: 9.15001e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.56e-06 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.88001e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 6.04999e-06 [virtual_output]: 6.11998e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 1.082e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.223e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 1.087e-05 [set_forward_comm_id_for_comm_node_pass]: 3.73999e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 2.36998e-06 
[receive_attached]: 2.66e-06 [after_resolve]: 1.15e-05 [a_after_grad]: 9.86e-06 [renormalize]: 0.00067591 [add_forward_monad_depend]: 5.83002e-06 [auto_monad_grad]: 3.08e-06 [auto_monad_eliminator]: 1.506e-05 [cse]: 2.966e-05 [a_3]: 4.568e-05 [Cycle 2]: 0.00064915, [45] [expand_dump_flag]: 1.30001e-06 [switch_simplify]: 7.47998e-06 [loop_unroll]: 6.40002e-06 [a_1]: 0.00013707 [with_stream_mark]: 1.133e-05 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 3.2e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 7.283e-05 [accelerated_algorithm]: 6.26998e-06 [shard]: 1.59e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 5.92999e-06 [auto_parallel]: 6.14999e-06 [parallel]: 5.99e-06 [flash_sp]: 3.76001e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.99999e-06 [matmul_add_comm_reduction]: 6.31e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.36998e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 3.01001e-06 [cell_reuse_recompute_pass]: 1.99e-06 [offload_activation]: 6.92002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.03e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 9.45001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.01001e-06 [meta_fg_expand]: 1.94e-06 [flash_sp_send_recv_attached]: 7.10017e-07 [receive_attached]: 1.49998e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 9.52001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.96003e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 7.46001e-06 [cse]: 1.522e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 1.124e-05 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 3.813e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 6.19001e-06 [mutable_eliminate]: 0.00075259 [opt_b]: 0.00020565, [1] 
[Cycle 1]: 0.00019741, [7] [b_1]: 0.00011444 [b_2]: 7.44002e-06 [updatestate_depend_eliminate]: 8.78001e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 7.50006e-07 [cse]: 2.329e-05 [optimize_parallel_all_gather_comm]: 1.932e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 3.107e-05 [loop_unroll]: 0.0004849 [opt_after_cconv]: 0.0001097, [1] [Cycle 1]: 0.00010323, [7] [c_1]: 3.041e-05 [parameter_eliminate]: 5.36998e-06 [updatestate_depend_eliminate]: 6.75002e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.973e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 1.513e-05 [tuple_transform]: 7.719e-05, [1] [Cycle 1]: 7.241e-05, [4] [d_1]: 4.51e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.50999e-06 [add_recomputation]: 5.532e-05 [cse_after_recomputation]: 2.179e-05, [1] [Cycle 1]: 1.683e-05, [1] [cse]: 1.129e-05 [environ_conv]: 5.97001e-06 [swap_dp_allreduce_reducescatter]: 5.13002e-06 [bias_add_comm_swap]: 3.41999e-06 [label_micro_interleaved_index]: 6.68e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.48002e-06 [assign_add_opt]: 1.36002e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.42001e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 2.09999e-06 [control_data_broadcast_order]: 1.468e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 4.62e-06 [overlap_recompute_and_grad_model_parallel]: 5.21002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.16997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4.42e-06 [overlap_grad_flash_sp]: 2.324e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 7.824e-05, [1] [Cycle 1]: 7.32e-05, [6] [build]: 4.03001e-06 [elim_shapecalc]: 1.042e-05 [elim_not_effective]: 1.237e-05 [opt_reshape]: 6.73e-06 [fold_const_symbol]: 9.16998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.51e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.853e-05 [get_jit_bprop_graph]: 2.09e-06 [rewriter_after_jit_bprop_graph]: 6.05002e-06 [opt_after_jit_grad]: 0.00058953 [validate]: 4.684e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.116177 [execute]: 9.40001e-06 Sums bootstrap : 0.000517s : 0.41% type_inference : 0.004853s : 3.85% event_method : 0.000013s : 0.01% auto_monad : 0.000055s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000033s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000047s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000477s : 0.38% optimize.opt_a.with_stream_mark : 0.000031s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 
0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000152s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000019s : 0.02% optimize.opt_a.renormalize : 0.000676s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.02% optimize.opt_a.cse : 0.000045s : 0.04% optimize.opt_a.a_3 : 0.000078s : 0.06% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000038s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000753s : 0.60% optimize.opt_b.b_1 : 0.000114s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000031s : 0.02% optimize.loop_unroll : 0.000485s : 0.38% optimize.opt_after_cconv.c_1 : 0.000030s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.01% optimize.tuple_transform.d_1 : 0.000045s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000007s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000019s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000590s : 0.47% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.116177s : 92.05% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000156 26 17.63% : 0.000027s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 4.08% : 0.000006s : 4: substitution.graph_param_transform 67.79% : 0.000106s : 2: substitution.inline 2.53% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.96% : 0.000005s : 4: substitution.remove_not_recompute_node 2.78% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004804 2 91.72% : 0.004407s : 1: type_inference.infer 8.28% : 0.000398s : 1: type_inference.specialize ------[replace.] 0.000023 2 100.00% : 0.000023s : 2: replace.inline ------[match.] 0.000104 2 100.00% : 0.000104s : 2: match.inline ------[predicate.] 
0.000155 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.67% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.03% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.96% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.95% : 0.000001s : 13: predicate.environ_get_depend_swap 1.80% : 0.000003s : 21: predicate.environ_get_eliminate 0.96% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.89% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 1.01% : 0.000002s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 6.44% : 0.000010s : 44: predicate.inline 1.08% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.46% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000003s : 26: predicate.load_eliminater 1.73% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.64% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.99% : 0.000002s : 8: predicate.mini_step_allgather_replace 0.65% : 0.000001s : 9: predicate.minmaximum_grad 2.41% : 0.000004s : 4: predicate.mutable_eliminate 0.49% : 0.000001s : 4: predicate.opt_reshape 0.65% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 9: predicate.reduce_eliminate 2.06% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 1.03% : 0.000002s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.56% : 0.000001s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000002s : 11: predicate.switch_defer_inline 1.69% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.15% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.75% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.90% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000320 6 39.49% : 0.000126s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.51% : 0.000194s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.141207 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.67% : 0.003764s : 1: add_attr 2.65% : 0.003749s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000061s : 1: auto_monad 0.02% : 0.000023s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.39% : 0.000556s : 1: bootstrap 0.02% : 0.000035s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000010s : 1: label_micro_interleaved_index 0.35% : 0.000496s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000768s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 0.60% : 0.000853s : 78: opt.transform.opt_a 0.02% : 0.000029s : 1: opt.transform.opt_after_cconv 0.02% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000095s : 28: opt.transform.opt_b 0.03% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000036s : 4: opt.transform.symbol_engine_opt 1.67% : 0.002362s : 1: opt_a 0.08% : 0.000114s : 1: opt_after_cconv 0.43% : 0.000605s : 1: opt_after_jit_grad 0.15% : 0.000210s : 1: opt_b 3.33% : 0.004701s : 1: optimize 0.02% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000037s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000019s : 1: remove_dup_value 0.28% : 0.000401s : 1: renormalize.infer 0.19% : 0.000266s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000043s : 1: rewriter_after_opt_a 0.04% : 0.000052s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000081s : 1: symbol_engine_optimizer 82.29% : 0.116201s : 1: task_emit 0.06% : 0.000080s : 1: 
tuple_transform 3.45% : 0.004878s : 1: type_inference 0.06% : 0.000081s : 1: validate TotalTime = 0.133287, [24] [bootstrap]: 0.00045423 [type_inference]: 0.00642488 [event_method]: 1.557e-05 [auto_monad]: 5.658e-05 [graph_reusing]: 5.32001e-06 [inline]: 2.36e-06 [add_attr]: 0.00361884, [1] [add_attr_with_inline]: 0.00360787, [1] [Cycle 1]: 6.887e-05, [2] [tag_attr]: 2.076e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 3.246e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.00496617, [53] [py_interpret_to_execute]: 2.592e-05 [rewriter_before_opt_a]: 6.798e-05 [opt_a]: 0.0027214, [2] [Cycle 1]: 0.00207326, [45] [expand_dump_flag]: 3.03998e-06 [switch_simplify]: 3.281e-05 [loop_unroll]: 2.054e-05 [a_1]: 0.00050277 [with_stream_mark]: 1.756e-05 [recompute_prepare]: 9.37999e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 4.999e-05 [parameter_eliminate]: 2.46e-06 [a_2]: 7.993e-05 [accelerated_algorithm]: 6.73e-06 [shard]: 3.23e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 9.19e-06 [auto_parallel]: 6.78e-06 [parallel]: 2.083e-05 [flash_sp]: 9.53002e-06 [merge_comm]: 3.93001e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 9.87001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.80998e-06 [virtual_dataset]: 6.28e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.51e-06 [merge_forward]: 4.31002e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 1.067e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.24e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.73998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.82002e-06 [receive_attached]: 2.27999e-06 
[after_resolve]: 1.203e-05 [a_after_grad]: 9.10999e-06 [renormalize]: 0.00080641 [add_forward_monad_depend]: 6.91001e-06 [auto_monad_grad]: 2.51e-06 [auto_monad_eliminator]: 1.757e-05 [cse]: 3.178e-05 [a_3]: 4.79e-05 [Cycle 2]: 0.0006355, [45] [expand_dump_flag]: 1.66e-06 [switch_simplify]: 7.58001e-06 [loop_unroll]: 5.46998e-06 [a_1]: 0.00013096 [with_stream_mark]: 1.246e-05 [recompute_prepare]: 5.91e-06 [updatestate_depend_eliminate]: 3.11999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.34999e-06 [parameter_eliminate]: 1.17999e-06 [a_2]: 6.812e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 5.47001e-06 [auto_parallel]: 7.03e-06 [parallel]: 7.21999e-06 [flash_sp]: 3.86999e-06 [merge_comm]: 3.35e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 6.91001e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.69999e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 5.23002e-06 [merge_forward]: 3.11001e-06 [cell_reuse_recompute_pass]: 1.76998e-06 [offload_activation]: 7.93001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 8.08999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.91998e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.40001e-06 [after_resolve]: 9.55001e-06 [a_after_grad]: 8.67998e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.76998e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 7.36001e-06 [cse]: 1.536e-05 [a_3]: 3.373e-05 [py_interpret_to_execute_after_opt_a]: 1.231e-05 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.983e-05 [convert_after_rewriter]: 7.08e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00068469 [opt_b]: 0.00019577, [1] [Cycle 1]: 0.00018836, [7] 
[b_1]: 0.00011245 [b_2]: 7.39002e-06 [updatestate_depend_eliminate]: 7.98999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 2.89991e-07 [cse]: 1.921e-05 [optimize_parallel_all_gather_comm]: 1.745e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 3.05e-05 [loop_unroll]: 0.00043948 [opt_after_cconv]: 0.00010064, [1] [Cycle 1]: 9.383e-05, [7] [c_1]: 2.937e-05 [parameter_eliminate]: 3.38e-06 [updatestate_depend_eliminate]: 5.97999e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.737e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.387e-05 [tuple_transform]: 7.291e-05, [1] [Cycle 1]: 6.842e-05, [4] [d_1]: 4.163e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 1.64998e-06 [add_recomputation]: 4.839e-05 [cse_after_recomputation]: 2.217e-05, [1] [Cycle 1]: 1.672e-05, [1] [cse]: 1.151e-05 [environ_conv]: 5.66e-06 [swap_dp_allreduce_reducescatter]: 5.52999e-06 [bias_add_comm_swap]: 3.09001e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.59001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 9.49978e-07 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.47999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.265e-05 [grouped_pairwise_exchange_alltoall]: 1.88002e-06 [offloading_packed_experts]: 4.2e-06 [overlap_recompute_and_grad_model_parallel]: 6.042e-05 [overlap_grad_matmul_and_grad_allreduce]: 1.37999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.39001e-06 [overlap_grad_ring_attention]: 4.27998e-06 [overlap_grad_flash_sp]: 2.074e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 7.286e-05, [1] [Cycle 1]: 6.837e-05, [6] [build]: 3.46001e-06 [elim_shapecalc]: 9.74999e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 9.37999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.53003e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 1.676e-05 [get_jit_bprop_graph]: 1.76e-06 [rewriter_after_jit_bprop_graph]: 4.31002e-06 [opt_after_jit_grad]: 0.00046504 [validate]: 3.991e-05 [backend_pass]: 1.01002e-06 [task_emit]: 0.116913 [execute]: 1.029e-05 Sums bootstrap : 0.000454s : 0.35% type_inference : 0.006425s : 5.00% event_method : 0.000016s : 0.01% auto_monad : 0.000057s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000032s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.02% optimize.rewriter_before_opt_a : 0.000068s : 0.05% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000634s : 0.49% optimize.opt_a.with_stream_mark : 0.000030s : 0.02% optimize.opt_a.recompute_prepare : 0.000015s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000052s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000807s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.02% optimize.opt_a.cse : 0.000047s : 0.04% optimize.opt_a.a_3 : 0.000082s : 0.06% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000040s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000685s : 0.53% optimize.opt_b.b_1 : 0.000112s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.02% optimize.loop_unroll : 0.000439s : 0.34% optimize.opt_after_cconv.c_1 : 0.000029s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000042s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000006s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000060s : 0.05% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000465s : 0.36% validate : 0.000040s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.116913s : 90.91% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000209 30 15.45% : 0.000032s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.56% : 0.000001s : 2: substitution.fold_const_symbol 2.72% : 0.000006s : 4: substitution.graph_param_transform 68.19% : 0.000143s : 3: substitution.inline 1.57% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.34% : 0.000005s : 4: substitution.remove_not_recompute_node 2.39% : 0.000005s : 4: substitution.replace_old_param 5.72% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006372 2 90.18% : 0.005746s : 1: type_inference.infer 9.82% : 0.000626s : 1: type_inference.specialize ------[replace.] 0.000043 5 72.08% : 0.000031s : 3: replace.inline 27.92% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000152 5 92.82% : 0.000141s : 3: match.inline 7.18% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000168 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.84% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.74% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.29% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000011s : 51: predicate.inline 0.91% : 0.000002s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000002s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 11: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.71% : 0.000003s : 16: predicate.partial_defer_inline 1.35% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000002s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.60% : 0.000003s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000002s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 1.31% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.93% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.19% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000452 8 42.76% : 0.000193s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.24% : 0.000259s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.143841 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.52% : 0.003625s : 1: add_attr 2.51% : 0.003612s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000063s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.34% : 0.000491s : 1: bootstrap 0.02% : 0.000034s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000022s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000449s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.48% : 0.000696s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.71% : 0.001015s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000092s : 28: opt.transform.opt_b 0.03% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.89% : 0.002724s : 1: opt_a 0.07% : 0.000105s : 1: opt_after_cconv 0.33% : 0.000476s : 1: opt_after_jit_grad 0.14% : 0.000199s : 1: opt_b 3.46% : 0.004971s : 1: optimize 0.01% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.04% : 0.000064s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000037s : 1: pre_auto_parallel 0.02% : 0.000030s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000018s : 1: remove_dup_value 0.31% : 0.000443s : 1: renormalize.infer 0.25% : 0.000353s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000044s : 1: rewriter_after_opt_a 0.05% : 0.000073s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000075s : 1: symbol_engine_optimizer 81.30% : 0.116938s : 1: task_emit 0.05% : 0.000076s : 1: 
tuple_transform 4.48% : 0.006446s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.185312, [24] [bootstrap]: 0.00050107 [type_inference]: 0.0137233 [event_method]: 6.104e-05 [auto_monad]: 0.00013252 [graph_reusing]: 8.63001e-06 [inline]: 2.55002e-06 [add_attr]: 0.00380075, [1] [add_attr_with_inline]: 0.00378948, [1] [Cycle 1]: 9.782e-05, [2] [tag_attr]: 4.498e-05 [meta_addattr_fg_expand]: 9.76998e-06 [parallel-infer-symbol]: 4.82e-06 [pre_auto_parallel]: 6.028e-05 [insert-virtual-dataset]: 2.86e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.21998e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.016393, [53] [py_interpret_to_execute]: 4.564e-05 [rewriter_before_opt_a]: 0.00016715 [opt_a]: 0.0136359, [3] [Cycle 1]: 0.0087347, [45] [expand_dump_flag]: 6.07999e-06 [switch_simplify]: 7.653e-05 [loop_unroll]: 6.309e-05 [a_1]: 0.00156713 [with_stream_mark]: 3.135e-05 [recompute_prepare]: 2.562e-05 [updatestate_depend_eliminate]: 1.01e-05 [updatestate_assign_eliminate]: 7.71999e-06 [updatestate_loads_eliminate]: 7.13e-06 [parameter_eliminate]: 2.98e-06 [a_2]: 0.00025272 [accelerated_algorithm]: 3.467e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 4.50999e-06 [shard_inline]: 1.783e-05 [merge_send_recv]: 1.857e-05 [auto_parallel]: 1.494e-05 [parallel]: 2.11e-05 [flash_sp]: 1.471e-05 [merge_comm]: 1.009e-05 [allreduce_fusion]: 9.13002e-06 [matmul_add_comm_reduction]: 3.429e-05 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 1.973e-05 [virtual_dataset]: 1.604e-05 [get_grad_eliminate_]: 1.541e-05 [virtual_output]: 1.623e-05 [merge_forward]: 1.018e-05 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 1.781e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.008e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 2.858e-05 [set_forward_comm_id_for_comm_node_pass]: 1.069e-05 [meta_fg_expand]: 0.0019162 [flash_sp_send_recv_attached]: 4.73001e-06 [receive_attached]: 2.79999e-06 [after_resolve]: 7.55e-05 
[a_after_grad]: 8.755e-05 [renormalize]: 0.00325683 [add_forward_monad_depend]: 1.362e-05 [auto_monad_grad]: 8.43001e-06 [auto_monad_eliminator]: 6.239e-05 [cse]: 0.00018704 [a_3]: 0.00036796 [Cycle 2]: 0.00391827, [45] [expand_dump_flag]: 2.99999e-06 [switch_simplify]: 5.059e-05 [loop_unroll]: 4.56e-05 [a_1]: 0.00168904 [with_stream_mark]: 2.716e-05 [recompute_prepare]: 1.39e-05 [updatestate_depend_eliminate]: 6.14999e-06 [updatestate_assign_eliminate]: 5.39998e-06 [updatestate_loads_eliminate]: 4.74e-06 [parameter_eliminate]: 2.53003e-06 [a_2]: 0.00014152 [accelerated_algorithm]: 1.513e-05 [shard]: 2.89001e-06 [meta_shard_fg_expand]: 3.72002e-06 [shard_inline]: 9.59e-06 [merge_send_recv]: 1.101e-05 [auto_parallel]: 1.24e-05 [parallel]: 1.05e-05 [flash_sp]: 3.98001e-06 [merge_comm]: 5.64e-06 [allreduce_fusion]: 5.57001e-06 [matmul_add_comm_reduction]: 1.109e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.203e-05 [virtual_dataset]: 9.39e-06 [get_grad_eliminate_]: 9.27001e-06 [virtual_output]: 8.90999e-06 [merge_forward]: 5.634e-05 [cell_reuse_recompute_pass]: 2.43e-06 [offload_activation]: 1.563e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.223e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 1.57e-05 [set_forward_comm_id_for_comm_node_pass]: 6.25002e-06 [meta_fg_expand]: 0.00013576 [flash_sp_send_recv_attached]: 1.90001e-06 [receive_attached]: 3.47002e-06 [after_resolve]: 2.03e-05 [a_after_grad]: 1.599e-05 [renormalize]: 0.00104082 [add_forward_monad_depend]: 7.05e-06 [auto_monad_grad]: 2.84999e-06 [auto_monad_eliminator]: 1.942e-05 [cse]: 6.468e-05 [a_3]: 7.276e-05 [Cycle 3]: 0.00096317, [45] [expand_dump_flag]: 2.16998e-06 [switch_simplify]: 1.109e-05 [loop_unroll]: 9.65002e-06 [a_1]: 0.00026299 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 9.62999e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 4.51002e-06 [updatestate_loads_eliminate]: 4.20999e-06 [parameter_eliminate]: 1.32e-06 
[a_2]: 0.00012517 [accelerated_algorithm]: 1.249e-05 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 2.08002e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 8.44002e-06 [parallel]: 6.94001e-06 [flash_sp]: 9.80013e-07 [merge_comm]: 5.16002e-06 [allreduce_fusion]: 5.27001e-06 [matmul_add_comm_reduction]: 9.36e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.076e-05 [virtual_dataset]: 8.97999e-06 [get_grad_eliminate_]: 8.64998e-06 [virtual_output]: 8.54e-06 [merge_forward]: 4.97e-06 [cell_reuse_recompute_pass]: 2.40002e-06 [offload_activation]: 1.127e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.642e-05 [merge_recompute_call_nodes]: 8.59989e-07 [before_grad]: 1.529e-05 [set_forward_comm_id_for_comm_node_pass]: 5.55001e-06 [meta_fg_expand]: 3.33e-06 [flash_sp_send_recv_attached]: 1.25999e-06 [receive_attached]: 1.38002e-06 [after_resolve]: 1.608e-05 [a_after_grad]: 1.502e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.80001e-06 [auto_monad_grad]: 1.28002e-06 [auto_monad_eliminator]: 1.288e-05 [cse]: 3.125e-05 [a_3]: 6.098e-05 [py_interpret_to_execute_after_opt_a]: 1.781e-05 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 5.665e-05 [convert_after_rewriter]: 9.37001e-06 [order_py_execute_after_rewriter]: 7e-06 [mutable_eliminate]: 0.00074198 [opt_b]: 0.00030841, [1] [Cycle 1]: 0.00029971, [7] [b_1]: 0.00019555 [b_2]: 1.119e-05 [updatestate_depend_eliminate]: 9.29e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 4.47e-06 [renormalize]: 4.50003e-07 [cse]: 3.79e-05 [optimize_parallel_all_gather_comm]: 2.204e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.984e-05 [loop_unroll]: 0.00047757 [opt_after_cconv]: 0.000146, [1] [Cycle 1]: 0.00013843, [7] [c_1]: 5.004e-05 [parameter_eliminate]: 3.06001e-06 [updatestate_depend_eliminate]: 7.93999e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.91001e-06 [cse]: 3.329e-05 
[renormalize]: 6.90023e-07 [remove_dup_value]: 4.32e-05 [tuple_transform]: 0.00010638, [1] [Cycle 1]: 0.00010102, [4] [d_1]: 6.957e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 3.09985e-07 [switch_simplify]: 1.027e-05 [partial_unused_args_eliminate]: 1.99e-06 [add_recomputation]: 6.652e-05 [cse_after_recomputation]: 3.491e-05, [1] [Cycle 1]: 3.007e-05, [1] [cse]: 2.415e-05 [environ_conv]: 1.156e-05 [swap_dp_allreduce_reducescatter]: 8.15e-06 [bias_add_comm_swap]: 2.89999e-06 [label_micro_interleaved_index]: 4.57e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.40025e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 9.90025e-07 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.47999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91003e-06 [control_data_broadcast_order]: 1.827e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 5.65001e-06 [overlap_recompute_and_grad_model_parallel]: 5.74e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.58e-06 [overlap_grad_ring_attention]: 5.98998e-06 [overlap_grad_flash_sp]: 2.774e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.43002e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 1.16997e-06 [symbol_engine_optimizer]: 0.00010843, [1] [Cycle 1]: 0.00010354, [6] [build]: 1.267e-05 [elim_shapecalc]: 1.423e-05 [elim_not_effective]: 1.927e-05 [opt_reshape]: 1.172e-05 [fold_const_symbol]: 1.535e-05 [renormalize]: 2.09984e-07 [detach_backward]: 2.16998e-06 
[pipeline_parallel_scheduler]: 1.85001e-06 [auto_monad_reorder]: 2.683e-05 [get_jit_bprop_graph]: 1.84998e-06 [rewriter_after_jit_bprop_graph]: 5.15001e-06 [opt_after_jit_grad]: 0.00050639 [validate]: 5.977e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.149734 [execute]: 8.95999e-06 Sums bootstrap : 0.000501s : 0.28% type_inference : 0.013723s : 7.62% event_method : 0.000061s : 0.03% auto_monad : 0.000133s : 0.07% graph_reusing : 0.000009s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000045s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000005s : 0.00% pre_auto_parallel : 0.000060s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000046s : 0.03% optimize.rewriter_before_opt_a : 0.000167s : 0.09% optimize.opt_a.expand_dump_flag : 0.000011s : 0.01% optimize.opt_a.switch_simplify : 0.000138s : 0.08% optimize.opt_a.loop_unroll : 0.000118s : 0.07% optimize.opt_a.a_1 : 0.003519s : 1.95% optimize.opt_a.with_stream_mark : 0.000072s : 0.04% optimize.opt_a.recompute_prepare : 0.000049s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000519s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000062s : 0.03% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.01% optimize.opt_a.shard_inline : 0.000037s : 0.02% optimize.opt_a.merge_send_recv : 0.000038s : 0.02% optimize.opt_a.auto_parallel : 0.000036s : 0.02% optimize.opt_a.parallel : 0.000039s : 0.02% optimize.opt_a.flash_sp : 0.000020s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000055s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.02% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000034s : 0.02% optimize.opt_a.merge_forward : 0.000071s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000045s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000069s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000060s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.01% optimize.opt_a.meta_fg_expand : 0.002055s : 1.14% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000008s : 0.00% optimize.opt_a.after_resolve : 0.000112s : 0.06% optimize.opt_a.a_after_grad : 0.000119s : 0.07% optimize.opt_a.renormalize : 0.004298s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.01% optimize.opt_a.auto_monad_grad : 0.000013s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000095s : 0.05% optimize.opt_a.cse : 0.000283s : 0.16% optimize.opt_a.a_3 : 0.000502s : 0.28% optimize.py_interpret_to_execute_after_opt_a : 0.000018s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000057s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000742s : 0.41% optimize.opt_b.b_1 : 0.000196s : 0.11% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000038s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.02% optimize.loop_unroll : 0.000478s : 0.27% optimize.opt_after_cconv.c_1 : 0.000050s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000043s : 0.02% optimize.tuple_transform.d_1 : 0.000070s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000067s : 0.04% optimize.cse_after_recomputation.cse : 0.000024s : 0.01% optimize.environ_conv : 0.000012s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000506s : 0.28% validate : 0.000060s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.149734s : 83.17% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000982 222 6.58% : 0.000065s : 12: substitution.arithmetic_simplify 2.18% : 0.000021s : 2: substitution.cast_eliminate 0.27% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000005s : 5: substitution.float_depend_g_call 0.46% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.22% : 0.000002s : 5: substitution.fold_const_symbol 0.89% : 0.000009s : 8: substitution.graph_param_transform 0.28% : 0.000003s : 2: substitution.incorporate_call 0.20% : 0.000002s : 2: substitution.incorporate_call_switch 57.50% : 0.000564s : 17: substitution.inline 2.15% : 0.000021s : 2: substitution.inline_without_move 1.22% : 0.000012s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000019s : 3: substitution.less_batch_normalization 1.52% : 0.000015s : 11: substitution.minmaximum_grad 0.68% : 0.000007s : 5: substitution.partial_eliminate 1.52% : 0.000015s : 20: substitution.remove_not_recompute_node 3.16% : 0.000031s : 10: substitution.replace_applicator 1.51% : 0.000015s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.28% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.52% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.93% : 0.000078s : 30: substitution.tuple_list_get_item_eliminator 2.11% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013617 2 86.13% : 0.011728s : 1: type_inference.infer 13.87% : 0.001889s : 1: type_inference.specialize ------[replace.] 0.000246 33 59.26% : 0.000146s : 17: replace.inline 40.74% : 0.000100s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000593 33 93.58% : 0.000555s : 17: match.inline 6.42% : 0.000038s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000794 5764 1.09% : 0.000009s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.03% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.23% : 0.000010s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000010s : 76: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_depend_swap 1.79% : 0.000014s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.66% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000006s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000005s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.54% : 0.000044s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000006s : 32: predicate.less_batch_normalization 
1.62% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.60% : 0.000021s : 168: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 68: predicate.minmaximum_grad 0.39% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000016s : 101: predicate.partial_defer_inline 1.72% : 0.000014s : 92: predicate.partial_eliminate 1.09% : 0.000009s : 68: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000011s : 68: predicate.reduce_eliminate 2.67% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 2.06% : 0.000016s : 152: predicate.replace_applicator 0.68% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000009s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.34% : 0.000011s : 68: predicate.same_eliminate 0.41% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.35% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.84% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000009s : 68: predicate.tile_eliminate 1.08% : 0.000009s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.57% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.14% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002049 34 54.80% : 0.001123s : 13: func_graph_cloner_run.FuncGraphClonerGraph 45.20% : 0.000926s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.215424 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.77% : 0.003807s : 1: add_attr 1.76% : 0.003794s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000071s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000140s : 1: auto_monad 0.01% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.25% : 0.000540s : 1: bootstrap 0.02% : 0.000034s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000015s : 1: environ_conv 0.03% : 0.000070s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.23% : 0.000488s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000753s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 2.46% : 0.005300s : 117: opt.transform.opt_a 0.02% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000179s : 28: opt.transform.opt_b 0.04% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000057s : 4: opt.transform.symbol_engine_opt 6.33% : 0.013640s : 1: opt_a 0.07% : 0.000149s : 1: opt_after_cconv 0.24% : 0.000517s : 1: opt_after_jit_grad 0.15% : 0.000313s : 1: opt_b 7.61% : 0.016399s : 1: optimize 0.01% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.01% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000065s : 1: pre_auto_parallel 0.02% : 0.000050s : 1: py_interpret_to_execute 0.01% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000048s : 1: remove_dup_value 1.10% : 0.002375s : 2: renormalize.infer 0.88% : 0.001902s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000061s : 1: rewriter_after_opt_a 0.08% : 0.000173s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000111s : 1: symbol_engine_optimizer 69.52% : 0.149757s : 1: task_emit 0.05% : 0.000110s : 1: 
tuple_transform 6.39% : 0.013756s : 1: type_inference 0.04% : 0.000095s : 1: validate TotalTime = 0.122776, [24] [bootstrap]: 0.00045029 [type_inference]: 0.00461025 [event_method]: 1.156e-05 [auto_monad]: 5.057e-05 [graph_reusing]: 5.82999e-06 [inline]: 3.09001e-06 [add_attr]: 0.00331216, [1] [add_attr_with_inline]: 0.00330173, [1] [Cycle 1]: 5.557e-05, [2] [tag_attr]: 1.479e-05 [meta_addattr_fg_expand]: 3.20998e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.873e-05 [insert-virtual-dataset]: 2.80002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00422351, [53] [py_interpret_to_execute]: 1.791e-05 [rewriter_before_opt_a]: 4.462e-05 [opt_a]: 0.00217506, [2] [Cycle 1]: 0.00155012, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 2.459e-05 [loop_unroll]: 1.38e-05 [a_1]: 0.00031677 [with_stream_mark]: 1.666e-05 [recompute_prepare]: 7.40003e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.92001e-06 [a_2]: 7.851e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 2.59001e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 8.24998e-06 [auto_parallel]: 7.42998e-06 [parallel]: 1.999e-05 [flash_sp]: 8.38001e-06 [merge_comm]: 3.52002e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.53002e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.10998e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 2.109e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.265e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62002e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.76999e-06 [receive_attached]: 
2.27001e-06 [after_resolve]: 1.083e-05 [a_after_grad]: 9.06002e-06 [renormalize]: 0.00057342 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 2.44999e-06 [auto_monad_eliminator]: 1.445e-05 [cse]: 3.024e-05 [a_3]: 4.189e-05 [Cycle 2]: 0.00061383, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012832 [with_stream_mark]: 1.25e-05 [recompute_prepare]: 6.16998e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.87e-05 [accelerated_algorithm]: 5.49998e-06 [shard]: 1.33002e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.68999e-06 [auto_parallel]: 5.69e-06 [parallel]: 5.79e-06 [flash_sp]: 3.48e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 6.24999e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.15002e-06 [virtual_dataset]: 5.10999e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 3.14999e-06 [cell_reuse_recompute_pass]: 1.69e-06 [offload_activation]: 7.45e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.023e-05 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 8.52e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.20999e-06 [after_resolve]: 9.61e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 7.93001e-06 [cse]: 1.505e-05 [a_3]: 3.33e-05 [py_interpret_to_execute_after_opt_a]: 8.85001e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.497e-05 [convert_after_rewriter]: 7.00998e-06 [order_py_execute_after_rewriter]: 5.42001e-06 [mutable_eliminate]: 0.0005917 [opt_b]: 0.00019426, [1] [Cycle 1]: 
0.00018743, [7] [b_1]: 0.00011012 [b_2]: 8.27e-06 [updatestate_depend_eliminate]: 7.11001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 4.90021e-07 [cse]: 2.15e-05 [optimize_parallel_all_gather_comm]: 1.799e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.682e-05 [loop_unroll]: 0.00044413 [opt_after_cconv]: 0.00010058, [1] [Cycle 1]: 9.501e-05, [7] [c_1]: 3.011e-05 [parameter_eliminate]: 3.71999e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.31998e-06 [cse]: 1.751e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.266e-05 [tuple_transform]: 7.257e-05, [1] [Cycle 1]: 6.819e-05, [4] [d_1]: 4.171e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.43003e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.89e-05 [cse_after_recomputation]: 2.06e-05, [1] [Cycle 1]: 1.613e-05, [1] [cse]: 1.089e-05 [environ_conv]: 5.00001e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 3.01999e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.13998e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.16997e-06 [full_micro_interleaved_order_control]: 2.68998e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.31002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.28999e-06 [overlap_grad_flash_sp]: 1.956e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 7.088e-05, [1] [Cycle 1]: 6.665e-05, [6] [build]: 3.43e-06 [elim_shapecalc]: 8.14002e-06 [elim_not_effective]: 1.207e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.692e-05 [get_jit_bprop_graph]: 1.91998e-06 [rewriter_after_jit_bprop_graph]: 3.83001e-06 [opt_after_jit_grad]: 0.00046953 [validate]: 3.957e-05 [backend_pass]: 1.13001e-06 [task_emit]: 0.109215 [execute]: 9.96e-06 Sums bootstrap : 0.000450s : 0.38% type_inference : 0.004610s : 3.89% event_method : 0.000012s : 0.01% auto_monad : 0.000051s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000018s : 0.02% optimize.rewriter_before_opt_a : 0.000045s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000445s : 0.38% optimize.opt_a.with_stream_mark : 0.000029s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000029s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000574s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000045s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.06% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000592s : 0.50% optimize.opt_b.b_1 : 0.000110s : 0.09% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.02% optimize.loop_unroll : 0.000444s : 0.38% optimize.opt_after_cconv.c_1 : 0.000030s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000042s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000470s : 0.40% validate : 0.000040s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.109215s : 92.27% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000140 26 17.07% : 0.000024s : 4: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 4.61% : 0.000006s : 4: substitution.graph_param_transform 67.33% : 0.000094s : 2: substitution.inline 2.11% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.89% : 0.000005s : 4: substitution.remove_not_recompute_node 2.86% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004562 2 91.64% : 0.004181s : 1: type_inference.infer 8.36% : 0.000381s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000093 2 100.00% : 0.000093s : 2: match.inline ------[predicate.] 
0.000145 984 0.92% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.99% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.97% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.74% : 0.000003s : 21: predicate.environ_get_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.96% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.30% : 0.000009s : 44: predicate.inline 1.03% : 0.000002s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.65% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.94% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 9: predicate.minmaximum_grad 2.18% : 0.000003s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.15% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.23% : 0.000002s : 9: predicate.reduce_eliminate 2.30% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.50% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 1.22% : 0.000002s : 8: predicate.same_eliminate 0.68% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.52% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.95% : 0.000001s : 11: predicate.switch_defer_inline 1.66% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.18% : 0.000006s : 41: predicate.switch_simplify 0.84% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.45% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.58% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 6 37.57% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.43% : 0.000184s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131846 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.52% : 0.003318s : 1: add_attr 2.51% : 0.003306s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000056s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.37% : 0.000487s : 1: bootstrap 0.02% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000018s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000455s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000603s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000016s : 1: opt.transform.mutable_eliminate 0.61% : 0.000803s : 78: opt.transform.opt_a 0.02% : 0.000029s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000093s : 28: opt.transform.opt_b 0.04% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.65% : 0.002178s : 1: opt_a 0.08% : 0.000104s : 1: opt_after_cconv 0.36% : 0.000480s : 1: opt_after_jit_grad 0.15% : 0.000198s : 1: opt_b 3.21% : 0.004229s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000022s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.25% : 0.000328s : 1: renormalize.infer 0.18% : 0.000239s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000039s : 1: rewriter_after_opt_a 0.04% : 0.000050s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000073s : 1: symbol_engine_optimizer 82.85% : 0.109238s : 1: task_emit 0.06% : 0.000075s : 1: 
tuple_transform 3.51% : 0.004632s : 1: type_inference 0.11% : 0.000140s : 1: validate TotalTime = 0.168269, [24] [bootstrap]: 0.00117506 [type_inference]: 0.0132296 [event_method]: 6.022e-05 [auto_monad]: 0.00012976 [graph_reusing]: 8.82e-06 [inline]: 2.44999e-06 [add_attr]: 0.00363089, [1] [add_attr_with_inline]: 0.00362043, [1] [Cycle 1]: 8.281e-05, [2] [tag_attr]: 3.564e-05 [meta_addattr_fg_expand]: 8.52e-06 [parallel-infer-symbol]: 3.86001e-06 [pre_auto_parallel]: 5.262e-05 [insert-virtual-dataset]: 2.65002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0154238, [53] [py_interpret_to_execute]: 3.838e-05 [rewriter_before_opt_a]: 0.00013741 [opt_a]: 0.0127872, [3] [Cycle 1]: 0.00815849, [45] [expand_dump_flag]: 4.23001e-06 [switch_simplify]: 6.704e-05 [loop_unroll]: 5.601e-05 [a_1]: 0.00146475 [with_stream_mark]: 2.819e-05 [recompute_prepare]: 2.248e-05 [updatestate_depend_eliminate]: 9.11998e-06 [updatestate_assign_eliminate]: 7.92e-06 [updatestate_loads_eliminate]: 7.21999e-06 [parameter_eliminate]: 2.72001e-06 [a_2]: 0.00025241 [accelerated_algorithm]: 3.376e-05 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 3.46001e-06 [shard_inline]: 1.625e-05 [merge_send_recv]: 1.656e-05 [auto_parallel]: 1.152e-05 [parallel]: 6.104e-05 [flash_sp]: 1.447e-05 [merge_comm]: 1.122e-05 [allreduce_fusion]: 9.16002e-06 [matmul_add_comm_reduction]: 3.266e-05 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 1.943e-05 [virtual_dataset]: 1.68e-05 [get_grad_eliminate_]: 1.553e-05 [virtual_output]: 1.531e-05 [merge_forward]: 1e-05 [cell_reuse_recompute_pass]: 1.46998e-06 [offload_activation]: 1.903e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.961e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.785e-05 [set_forward_comm_id_for_comm_node_pass]: 1.03e-05 [meta_fg_expand]: 0.00176839 [flash_sp_send_recv_attached]: 4.89998e-06 [receive_attached]: 2.71e-06 
[after_resolve]: 6.586e-05 [a_after_grad]: 8.686e-05 [renormalize]: 0.00299023 [add_forward_monad_depend]: 1.193e-05 [auto_monad_grad]: 6.56999e-06 [auto_monad_eliminator]: 6.07e-05 [cse]: 0.00018464 [a_3]: 0.00035363 [Cycle 2]: 0.00354969, [45] [expand_dump_flag]: 2.88998e-06 [switch_simplify]: 4.811e-05 [loop_unroll]: 4.531e-05 [a_1]: 0.00168585 [with_stream_mark]: 2.002e-05 [recompute_prepare]: 1.314e-05 [updatestate_depend_eliminate]: 6.07001e-06 [updatestate_assign_eliminate]: 5.30999e-06 [updatestate_loads_eliminate]: 4.32003e-06 [parameter_eliminate]: 2.07001e-06 [a_2]: 0.00013111 [accelerated_algorithm]: 1.424e-05 [shard]: 2.37001e-06 [meta_shard_fg_expand]: 3.09001e-06 [shard_inline]: 9.54e-06 [merge_send_recv]: 1.032e-05 [auto_parallel]: 1.187e-05 [parallel]: 9.30001e-06 [flash_sp]: 3.98001e-06 [merge_comm]: 5.49e-06 [allreduce_fusion]: 5.33002e-06 [matmul_add_comm_reduction]: 1.162e-05 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 1.08e-05 [virtual_dataset]: 9.67001e-06 [get_grad_eliminate_]: 9.54e-06 [virtual_output]: 9.39e-06 [merge_forward]: 6.12999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.282e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.79e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 1.495e-05 [set_forward_comm_id_for_comm_node_pass]: 5.63002e-06 [meta_fg_expand]: 5.647e-05 [flash_sp_send_recv_attached]: 1.68002e-06 [receive_attached]: 2.76e-06 [after_resolve]: 1.73e-05 [a_after_grad]: 1.502e-05 [renormalize]: 0.0008837 [add_forward_monad_depend]: 5.37001e-06 [auto_monad_grad]: 2.21003e-06 [auto_monad_eliminator]: 1.838e-05 [cse]: 5.978e-05 [a_3]: 7.403e-05 [Cycle 3]: 0.00105969, [45] [expand_dump_flag]: 2.04e-06 [switch_simplify]: 1.124e-05 [loop_unroll]: 9.94001e-06 [a_1]: 0.00027328 [with_stream_mark]: 1.328e-05 [recompute_prepare]: 1.058e-05 [updatestate_depend_eliminate]: 6.59001e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 4.07e-06 
[parameter_eliminate]: 1.02e-06 [a_2]: 0.00012591 [accelerated_algorithm]: 1.25e-05 [shard]: 1.32e-06 [meta_shard_fg_expand]: 2.22001e-06 [shard_inline]: 9.05001e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 8.82e-06 [parallel]: 6.48003e-06 [flash_sp]: 1.20001e-06 [merge_comm]: 5.66e-06 [allreduce_fusion]: 5.34e-06 [matmul_add_comm_reduction]: 8.54e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.115e-05 [virtual_dataset]: 9.12999e-06 [get_grad_eliminate_]: 8.72e-06 [virtual_output]: 8.48001e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 2.39001e-06 [offload_activation]: 1.033e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.845e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 1.55e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34e-06 [meta_fg_expand]: 3.33e-06 [flash_sp_send_recv_attached]: 1.24998e-06 [receive_attached]: 1.30999e-06 [after_resolve]: 1.506e-05 [a_after_grad]: 1.535e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.86e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.252e-05 [cse]: 3.111e-05 [a_3]: 6.203e-05 [py_interpret_to_execute_after_opt_a]: 1.672e-05 [slice_cell_reuse_recomputed_activation]: 2.55997e-06 [rewriter_after_opt_a]: 5.466e-05 [convert_after_rewriter]: 1.143e-05 [order_py_execute_after_rewriter]: 7.1e-06 [mutable_eliminate]: 0.00071932 [opt_b]: 0.00030393, [1] [Cycle 1]: 0.00029537, [7] [b_1]: 0.0001964 [b_2]: 1.129e-05 [updatestate_depend_eliminate]: 8.31002e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 4.15e-06 [renormalize]: 6.00005e-07 [cse]: 3.537e-05 [optimize_parallel_all_gather_comm]: 2.223e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.777e-05 [loop_unroll]: 0.0004442 [opt_after_cconv]: 0.00014239, [1] [Cycle 1]: 0.0001356, [7] [c_1]: 4.995e-05 [parameter_eliminate]: 2.60002e-06 [updatestate_depend_eliminate]: 7.42002e-06 [updatestate_assign_eliminate]: 4.52e-06 [updatestate_loads_eliminate]: 4.06001e-06 [cse]: 
3.235e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 3.773e-05 [tuple_transform]: 0.00010661, [1] [Cycle 1]: 0.00010132, [4] [d_1]: 7.049e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.001e-05 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 6.566e-05 [cse_after_recomputation]: 3.452e-05, [1] [Cycle 1]: 2.957e-05, [1] [cse]: 2.385e-05 [environ_conv]: 1.018e-05 [swap_dp_allreduce_reducescatter]: 8.17e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.84e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.817e-05 [grouped_pairwise_exchange_alltoall]: 2.02001e-06 [offloading_packed_experts]: 5.32999e-06 [overlap_recompute_and_grad_model_parallel]: 6.23e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 5.14e-06 [overlap_grad_flash_sp]: 2.797e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.18002e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 0.00010502, [1] [Cycle 1]: 0.00010067, [6] [build]: 1.162e-05 [elim_shapecalc]: 1.451e-05 [elim_not_effective]: 1.935e-05 [opt_reshape]: 1.075e-05 [fold_const_symbol]: 1.521e-05 [renormalize]: 2.30008e-07 [detach_backward]: 2.29001e-06 
[pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.543e-05 [get_jit_bprop_graph]: 2.37001e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00048067 [validate]: 5.574e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.133691 [execute]: 9.97999e-06 Sums bootstrap : 0.001175s : 0.72% type_inference : 0.013230s : 8.11% event_method : 0.000060s : 0.04% auto_monad : 0.000130s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000053s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.02% optimize.rewriter_before_opt_a : 0.000137s : 0.08% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.08% optimize.opt_a.loop_unroll : 0.000111s : 0.07% optimize.opt_a.a_1 : 0.003424s : 2.10% optimize.opt_a.with_stream_mark : 0.000061s : 0.04% optimize.opt_a.recompute_prepare : 0.000046s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000509s : 0.31% optimize.opt_a.accelerated_algorithm : 0.000060s : 0.04% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000034s : 0.02% optimize.opt_a.auto_parallel : 0.000032s : 0.02% optimize.opt_a.parallel : 0.000077s : 0.05% optimize.opt_a.flash_sp : 0.000020s : 0.01% optimize.opt_a.merge_comm : 0.000022s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000053s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.03% optimize.opt_a.virtual_dataset : 0.000036s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000021s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000042s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001828s : 1.12% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000098s : 0.06% optimize.opt_a.a_after_grad : 0.000117s : 0.07% optimize.opt_a.renormalize : 0.003874s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000019s : 0.01% optimize.opt_a.auto_monad_grad : 0.000010s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000092s : 0.06% optimize.opt_a.cse : 0.000276s : 0.17% optimize.opt_a.a_3 : 0.000490s : 0.30% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000055s : 0.03% optimize.convert_after_rewriter : 0.000011s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000719s : 0.44% optimize.opt_b.b_1 : 0.000196s : 0.12% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000035s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000028s : 0.02% optimize.loop_unroll : 0.000444s : 0.27% optimize.opt_after_cconv.c_1 : 0.000050s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000038s : 0.02% optimize.tuple_transform.d_1 : 0.000070s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000066s : 0.04% optimize.cse_after_recomputation.cse : 0.000024s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000481s : 0.29% validate : 0.000056s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.133691s : 81.95% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000884 218 6.48% : 0.000057s : 11: substitution.arithmetic_simplify 2.11% : 0.000019s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000005s : 5: substitution.float_depend_g_call 0.52% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000009s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000003s : 2: substitution.incorporate_call_switch 56.40% : 0.000498s : 16: substitution.inline 2.24% : 0.000020s : 2: substitution.inline_without_move 1.35% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000018s : 3: substitution.less_batch_normalization 1.75% : 0.000015s : 11: substitution.minmaximum_grad 0.71% : 0.000006s : 5: substitution.partial_eliminate 1.67% : 0.000015s : 20: substitution.remove_not_recompute_node 3.18% : 0.000028s : 10: substitution.replace_applicator 1.48% : 0.000013s : 15: substitution.replace_old_param 0.29% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.46% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.59% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.18% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.62% : 0.000067s : 28: substitution.tuple_list_get_item_eliminator 2.18% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013124 2 84.85% : 0.011137s : 1: type_inference.infer 15.15% : 0.001988s : 1: type_inference.specialize ------[replace.] 0.000266 30 67.02% : 0.000178s : 16: replace.inline 32.98% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000522 30 93.88% : 0.000490s : 16: match.inline 6.12% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000769 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.14% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.21% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.10% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000043s : 244: predicate.inline 1.33% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.71% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.56% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000009s : 67: predicate.minmaximum_grad 0.46% : 0.000004s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.67% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.57% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 67: predicate.reduce_eliminate 2.60% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.99% : 0.000015s : 149: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.16% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.35% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.77% : 0.000014s : 97: predicate.switch_defer_inline 2.89% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.74% : 0.000036s : 265: 
predicate.switch_simplify 1.04% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.57% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.20% : 0.000002s : 8: predicate.value_based_eliminate 0.62% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001774 32 57.18% : 0.001014s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.82% : 0.000760s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.196673 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.85% : 0.003636s : 1: add_attr 1.84% : 0.003625s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000070s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000137s : 1: auto_monad 0.01% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.62% : 0.001225s : 1: bootstrap 0.02% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000015s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000069s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.23% : 0.000454s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000731s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 2.62% : 0.005146s : 117: opt.transform.opt_a 0.02% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000181s : 28: opt.transform.opt_b 0.04% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000056s : 4: opt.transform.symbol_engine_opt 6.50% : 0.012791s : 1: opt_a 0.07% : 0.000146s : 1: opt_after_cconv 0.25% : 0.000491s : 1: opt_after_jit_grad 0.16% : 0.000308s : 1: opt_b 7.85% : 0.015429s : 1: optimize 0.01% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000057s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000042s : 1: remove_dup_value 1.09% : 0.002145s : 2: renormalize.infer 0.87% : 0.001713s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000059s : 1: rewriter_after_opt_a 0.07% : 0.000142s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000108s : 1: symbol_engine_optimizer 67.99% : 0.133715s : 1: task_emit 0.06% : 0.000110s : 1: 
tuple_transform 6.74% : 0.013262s : 1: type_inference 0.04% : 0.000088s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x3-ge],max_mem:20.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x4-pynative],max_mem:20.0M TotalTime = 0.0250453, [24] [bootstrap]: 0.00091511 [type_inference]: 0.00684651 [event_method]: 1.567e-05 [auto_monad]: 7.974e-05 [graph_reusing]: 5.47001e-06 [inline]: 2.24001e-06 [add_attr]: 0.0041822, [1] [add_attr_with_inline]: 0.0041537, [1] [Cycle 1]: 4.983e-05, [2] [tag_attr]: 1.69e-05 [meta_addattr_fg_expand]: 4.25e-06 [parallel-infer-symbol]: 2.98998e-06 [pre_auto_parallel]: 3.094e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00426597, [53] [py_interpret_to_execute]: 2.114e-05 [rewriter_before_opt_a]: 5.921e-05 [opt_a]: 0.00223223, [2] [Cycle 1]: 0.00162588, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.183e-05 [loop_unroll]: 2.079e-05 [a_1]: 0.00046346 [with_stream_mark]: 4.167e-05 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 2.32001e-06 [a_2]: 7.754e-05 [accelerated_algorithm]: 6.71e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 6.25002e-06 [parallel]: 2.479e-05 [flash_sp]: 7.46001e-06 [merge_comm]: 3.89002e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.63001e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 7.3e-06 [virtual_dataset]: 5.99e-06 
[get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.047e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 8.96998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.69002e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 2.21e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 8.43999e-06 [renormalize]: 0.00047707 [add_forward_monad_depend]: 4.84e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.369e-05 [cse]: 2.833e-05 [a_3]: 4.089e-05 [Cycle 2]: 0.00059657, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 7.02002e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012746 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.769e-05 [accelerated_algorithm]: 5.64998e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.48002e-06 [merge_send_recv]: 4.63001e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.02e-06 [flash_sp]: 2.96999e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.27001e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.06002e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.013e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 2.92002e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.92e-06 [a_after_grad]: 8.02e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.28002e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.61e-06 [cse]: 1.381e-05 [a_3]: 3.269e-05 [py_interpret_to_execute_after_opt_a]: 8.15e-06 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 7.766e-05 [convert_after_rewriter]: 7.55e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00052036 [opt_b]: 0.00018434, [1] [Cycle 1]: 0.0001781, [7] [b_1]: 0.00011017 [b_2]: 7.45998e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.80009e-07 [cse]: 1.647e-05 [optimize_parallel_all_gather_comm]: 4.391e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.332e-05 [loop_unroll]: 0.00041509 [opt_after_cconv]: 9.769e-05, [1] [Cycle 1]: 9.123e-05, [7] [c_1]: 2.835e-05 [parameter_eliminate]: 2.56998e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.767e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.308e-05 [tuple_transform]: 7.047e-05, [1] [Cycle 1]: 6.612e-05, [4] [d_1]: 4.014e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 5.124e-05 [cse_after_recomputation]: 2.018e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.067e-05 [environ_conv]: 4.99998e-06 [swap_dp_allreduce_reducescatter]: 5.20001e-06 [bias_add_comm_swap]: 2.87002e-06 [label_micro_interleaved_index]: 3.94002e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.38002e-06 [assign_add_opt]: 1.55999e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.85998e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 
9.00007e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52001e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.81001e-06 [overlap_recompute_and_grad_model_parallel]: 4.81002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 6.916e-05, [1] [Cycle 1]: 6.502e-05, [6] [build]: 2.46998e-06 [elim_shapecalc]: 8.98002e-06 [elim_not_effective]: 1.157e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 9.18002e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 1.564e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 0.00011798 [opt_after_jit_grad]: 0.00045198 [validate]: 3.402e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00784158 [execute]: 8.05999e-06 Sums bootstrap : 0.000915s : 4.61% type_inference : 0.006847s : 34.50% event_method : 0.000016s : 0.08% auto_monad : 0.000080s : 0.40% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000031s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.30% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.20% optimize.opt_a.loop_unroll : 0.000026s : 0.13% optimize.opt_a.a_1 : 0.000591s : 2.98% optimize.opt_a.with_stream_mark : 0.000052s : 0.26% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.73% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.06% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.05% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.10% optimize.opt_a.a_after_grad : 0.000016s : 0.08% optimize.opt_a.renormalize : 0.000477s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.10% optimize.opt_a.cse : 0.000042s : 0.21% optimize.opt_a.a_3 : 0.000074s : 0.37% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000078s : 0.39% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000520s : 2.62% optimize.opt_b.b_1 : 0.000110s : 0.56% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000044s : 0.22% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.12% optimize.loop_unroll : 0.000415s : 2.09% optimize.opt_after_cconv.c_1 : 0.000028s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.26% optimize.cse_after_recomputation.cse : 0.000011s : 0.05% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000118s : 0.59% opt_after_jit_grad : 0.000452s : 2.28% validate : 0.000034s : 0.17% backend_pass : 0.000001s : 0.00% task_emit : 0.007842s : 39.51% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000173 30 15.04% : 0.000026s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000002s : 2: substitution.fold_const_symbol 3.34% : 0.000006s : 4: substitution.graph_param_transform 67.18% : 0.000116s : 3: substitution.inline 1.55% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.24% : 0.000004s : 4: substitution.replace_old_param 6.28% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006776 2 90.51% : 0.006133s : 1: type_inference.infer 9.49% : 0.000643s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.02% : 0.000028s : 3: replace.inline 28.98% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 5 92.13% : 0.000114s : 3: match.inline 7.87% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.08% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.98% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.12% : 0.000002s : 8: predicate.less_batch_normalization 1.61% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 8: predicate.remove_not_recompute_node 1.56% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.64% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 1.02% : 0.000002s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000401 8 47.34% : 0.000190s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.66% : 0.000211s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.035039 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.95% : 0.004186s : 1: add_attr 11.86% : 0.004157s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.25% : 0.000086s : 1: auto_monad 0.05% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.73% : 0.000956s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000022s : 1: event_method 0.04% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.21% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000529s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.74% : 0.000959s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.06% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000091s : 28: opt.transform.opt_b 0.13% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.38% : 0.002235s : 1: opt_a 0.29% : 0.000101s : 1: opt_after_cconv 1.32% : 0.000462s : 1: opt_after_jit_grad 0.54% : 0.000188s : 1: opt_b 12.19% : 0.004270s : 1: optimize 0.14% : 0.000048s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000027s : 1: py_interpret_to_execute 0.03% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.72% : 0.000253s : 1: renormalize.infer 0.62% : 0.000217s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.35% : 0.000124s : 1: rewriter_after_jit_bprop_graph 0.24% : 0.000083s : 1: rewriter_after_opt_a 0.18% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000072s : 1: symbol_engine_optimizer 22.41% : 0.007852s : 1: task_emit 0.21% : 0.000073s : 1: 
tuple_transform 19.58% : 0.006862s : 1: type_inference 0.19% : 0.000067s : 1: validate TotalTime = 0.0186522, [24] [bootstrap]: 0.00045989 [type_inference]: 0.00438009 [event_method]: 1.116e-05 [auto_monad]: 5.082e-05 [graph_reusing]: 5.17999e-06 [inline]: 1.72999e-06 [add_attr]: 0.0030381, [1] [add_attr_with_inline]: 0.00303033, [1] [Cycle 1]: 4.732e-05, [2] [tag_attr]: 1.259e-05 [meta_addattr_fg_expand]: 3.53999e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.218e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 1.19e-06 [dataset_repeat_opt]: 1.96003e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00366514, [53] [py_interpret_to_execute]: 1.518e-05 [rewriter_before_opt_a]: 3.885e-05 [opt_a]: 0.00186125, [2] [Cycle 1]: 0.00126316, [45] [expand_dump_flag]: 2.43e-06 [switch_simplify]: 2.431e-05 [loop_unroll]: 1.33e-05 [a_1]: 0.00029015 [with_stream_mark]: 1.307e-05 [recompute_prepare]: 7.09001e-06 [updatestate_depend_eliminate]: 3.47997e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 2.11998e-06 [a_2]: 7.569e-05 [accelerated_algorithm]: 6.23e-06 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 6.03002e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.79e-06 [parallel]: 1.837e-05 [flash_sp]: 7.78001e-06 [merge_comm]: 3.46999e-06 [allreduce_fusion]: 3.62002e-06 [matmul_add_comm_reduction]: 8.40001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.18998e-06 [virtual_dataset]: 5.57001e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.65998e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 8.75001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.133e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 1.023e-05 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 
2.14e-06 [after_resolve]: 1.112e-05 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00035707 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.287e-05 [cse]: 2.594e-05 [a_3]: 3.953e-05 [Cycle 2]: 0.00058906, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.49999e-06 [loop_unroll]: 5.41002e-06 [a_1]: 0.00012485 [with_stream_mark]: 9.44998e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.759e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.10998e-06 [auto_parallel]: 4.95999e-06 [parallel]: 4.32e-06 [flash_sp]: 3.63e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.78003e-06 [matmul_add_comm_reduction]: 5.15001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.13002e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84001e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.88002e-06 [a_after_grad]: 7.92e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.218e-05 [a_3]: 3.168e-05 [py_interpret_to_execute_after_opt_a]: 7.79002e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.136e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 4.87998e-06 [mutable_eliminate]: 0.00045228 [opt_b]: 0.00018163, [1] [Cycle 1]: 0.0001755, [7] 
[b_1]: 0.00010896 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 4.19997e-07 [cse]: 1.501e-05 [optimize_parallel_all_gather_comm]: 1.584e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.281e-05 [loop_unroll]: 0.00041112 [opt_after_cconv]: 9.429e-05, [1] [Cycle 1]: 8.851e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.23002e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.32001e-06 [cse]: 1.55e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 1.276e-05 [tuple_transform]: 6.858e-05, [1] [Cycle 1]: 6.453e-05, [4] [d_1]: 3.873e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.05002e-06 [partial_unused_args_eliminate]: 2.07999e-06 [add_recomputation]: 4.242e-05 [cse_after_recomputation]: 1.874e-05, [1] [Cycle 1]: 1.444e-05, [1] [cse]: 9.42999e-06 [environ_conv]: 5.09e-06 [swap_dp_allreduce_reducescatter]: 4.82e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.74e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.58003e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.27999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.829e-05, [1] [Cycle 1]: 6.426e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.28999e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00044651 [validate]: 3.136e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00630319 [execute]: 7.87003e-06 Sums bootstrap : 0.000460s : 3.14% type_inference : 0.004380s : 29.88% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000357s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 3.09% optimize.opt_b.b_1 : 0.000109s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000411s : 2.80% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000009s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.05% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006303s : 43.00% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.22% : 0.000022s : 4: substitution.arithmetic_simplify 1.58% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 64.98% : 0.000077s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.63% : 0.000004s : 4: substitution.remove_not_recompute_node 3.55% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004340 2 91.88% : 0.003987s : 1: type_inference.infer 8.12% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.70% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.87% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.82% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 40.88% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.12% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026636 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.003043s : 1: add_attr 11.39% : 0.003034s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000496s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.58% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.87% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.00% : 0.001864s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000456s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 13.77% : 0.003669s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.75% : 0.000201s : 1: renormalize.infer 0.56% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.70% : 0.006313s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.50% : 0.004395s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.019808, [24] [bootstrap]: 0.00051566 [type_inference]: 0.00552468 [event_method]: 1.409e-05 [auto_monad]: 5.446e-05 [graph_reusing]: 5.72001e-06 [inline]: 1.89999e-06 [add_attr]: 0.00297317, [1] [add_attr_with_inline]: 0.00296509, [1] [Cycle 1]: 4.681e-05, [2] [tag_attr]: 1.567e-05 [meta_addattr_fg_expand]: 4.42998e-06 [parallel-infer-symbol]: 2.49999e-06 [pre_auto_parallel]: 2.576e-05 [insert-virtual-dataset]: 2.18002e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00394866, [53] [py_interpret_to_execute]: 1.918e-05 [rewriter_before_opt_a]: 5.822e-05 [opt_a]: 0.00209384, [2] [Cycle 1]: 0.00150059, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 3.101e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00044697 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 2.09e-06 [a_2]: 7.626e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 7.70998e-06 [auto_parallel]: 6.43e-06 [parallel]: 1.761e-05 [flash_sp]: 7.14001e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 8.3e-06 [allreduce_slice_to_reducescatter]: 8.40024e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 5.91003e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.92002e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.49e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.31002e-06 [before_grad]: 8.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 
2.66999e-06 [after_resolve]: 1.023e-05 [a_after_grad]: 8.64e-06 [renormalize]: 0.00041842 [add_forward_monad_depend]: 5.02999e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.385e-05 [cse]: 2.468e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.00058334, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012574 [with_stream_mark]: 9.35001e-06 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.75997e-06 [updatestate_assign_eliminate]: 2.07001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.56002e-06 [auto_parallel]: 5.52999e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.4e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.56998e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 5.86998e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.45002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 5.81003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.10999e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.73999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.12001e-06 [a_after_grad]: 7.82e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.10019e-07 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 5.97999e-06 [cse]: 1.191e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 2.44001e-06 [rewriter_after_opt_a]: 3.094e-05 [convert_after_rewriter]: 7.90998e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00045214 [opt_b]: 0.0002051, [1] [Cycle 
1]: 0.0001988, [7] [b_1]: 0.00010673 [b_2]: 3.007e-05 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 7.89994e-07 [cse]: 1.634e-05 [optimize_parallel_all_gather_comm]: 1.526e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.212e-05 [loop_unroll]: 0.00041376 [opt_after_cconv]: 9.71e-05, [1] [Cycle 1]: 9.1e-05, [7] [c_1]: 2.896e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.535e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.193e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.896e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 1.98002e-06 [add_recomputation]: 4.561e-05 [cse_after_recomputation]: 1.977e-05, [1] [Cycle 1]: 1.544e-05, [1] [cse]: 1.027e-05 [environ_conv]: 4.34997e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.20002e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.37999e-06 [merge_cast_opt]: 1.11002e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.14003e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.32999e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.39001e-06 [overlap_recompute_and_grad_model_parallel]: 4.38001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.84e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.592e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.86e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 1.26997e-06 [symbol_engine_optimizer]: 6.819e-05, [1] [Cycle 1]: 6.397e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.119e-05 [opt_reshape]: 6.03998e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.63002e-06 [pipeline_parallel_scheduler]: 1.87001e-06 [auto_monad_reorder]: 1.561e-05 [get_jit_bprop_graph]: 9.29984e-07 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00044596 [validate]: 2.993e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.0060324 [execute]: 7.74002e-06 Sums bootstrap : 0.000516s : 3.25% type_inference : 0.005525s : 34.78% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000573s : 3.61% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000419s : 2.63% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000037s : 0.23% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.85% optimize.opt_b.b_1 : 0.000107s : 0.67% optimize.opt_b.b_2 : 0.000030s : 0.19% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000414s : 2.60% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 2.81% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006032s : 37.98% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000162 30 15.17% : 0.000025s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.41% : 0.000006s : 4: substitution.graph_param_transform 66.27% : 0.000108s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.58% : 0.000004s : 4: substitution.replace_old_param 6.45% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005483 2 90.05% : 0.004938s : 1: type_inference.infer 9.95% : 0.000546s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.60% : 0.000026s : 3: replace.inline 30.40% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.74% : 0.000105s : 3: match.inline 8.26% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.01% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.40% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.51% : 0.000004s : 32: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.81% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.33% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.12% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.88% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028264 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.53% : 0.002977s : 1: add_attr 10.50% : 0.002969s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.95% : 0.000551s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.31% : 0.000935s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.40% : 0.000113s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.42% : 0.002097s : 1: opt_a 0.36% : 0.000100s : 1: opt_after_cconv 1.61% : 0.000455s : 1: opt_after_jit_grad 0.74% : 0.000208s : 1: opt_b 13.98% : 0.003952s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.76% : 0.000215s : 1: renormalize.infer 0.70% : 0.000197s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.38% : 0.006043s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.60% : 0.005538s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0380286, [24] [bootstrap]: 0.00051125 [type_inference]: 0.0113827 [event_method]: 0.00010038 [auto_monad]: 0.00012121 [graph_reusing]: 8.28001e-06 [inline]: 2.04e-06 [add_attr]: 0.00302516, [1] [add_attr_with_inline]: 0.0030171, [1] [Cycle 1]: 7.317e-05, [2] [tag_attr]: 3.506e-05 [meta_addattr_fg_expand]: 9.54e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 5.038e-05 [insert-virtual-dataset]: 2.90002e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.013604, [53] [py_interpret_to_execute]: 3.877e-05 [rewriter_before_opt_a]: 0.00014426 [opt_a]: 0.0112912, [3] [Cycle 1]: 0.00731907, [45] [expand_dump_flag]: 3.69002e-06 [switch_simplify]: 7.402e-05 [loop_unroll]: 6.113e-05 [a_1]: 0.00145514 [with_stream_mark]: 2.338e-05 [recompute_prepare]: 2.165e-05 [updatestate_depend_eliminate]: 9.56998e-06 [updatestate_assign_eliminate]: 8.40001e-06 [updatestate_loads_eliminate]: 7.19001e-06 [parameter_eliminate]: 2.68e-06 [a_2]: 0.00024458 [accelerated_algorithm]: 3.069e-05 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 3.38999e-06 [shard_inline]: 1.611e-05 [merge_send_recv]: 1.573e-05 [auto_parallel]: 1.086e-05 [parallel]: 1.839e-05 [flash_sp]: 1.172e-05 [merge_comm]: 9.37999e-06 [allreduce_fusion]: 8.62998e-06 [matmul_add_comm_reduction]: 2.645e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.763e-05 [virtual_dataset]: 1.53e-05 [get_grad_eliminate_]: 1.494e-05 [virtual_output]: 1.501e-05 [merge_forward]: 9.25001e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 1.803e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.874e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.76e-05 [set_forward_comm_id_for_comm_node_pass]: 9.45001e-06 [meta_fg_expand]: 0.0014303 [flash_sp_send_recv_attached]: 3.73999e-06 [receive_attached]: 2.64999e-06 
[after_resolve]: 5.822e-05 [a_after_grad]: 8.02e-05 [renormalize]: 0.00267403 [add_forward_monad_depend]: 9.12999e-06 [auto_monad_grad]: 5.41002e-06 [auto_monad_eliminator]: 5.708e-05 [cse]: 0.00016151 [a_3]: 0.00033658 [Cycle 2]: 0.00306044, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.765e-05 [loop_unroll]: 4.458e-05 [a_1]: 0.00157735 [with_stream_mark]: 1.261e-05 [recompute_prepare]: 1.107e-05 [updatestate_depend_eliminate]: 5.43002e-06 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 0.00012589 [accelerated_algorithm]: 1.197e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 2.19001e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.08998e-06 [auto_parallel]: 7.56001e-06 [parallel]: 5.46e-06 [flash_sp]: 3.58e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 5.00001e-06 [matmul_add_comm_reduction]: 8.18999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 9.02e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 9.09989e-07 [offload_activation]: 9.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.57e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 7.072e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.24e-06 [after_resolve]: 1.579e-05 [a_after_grad]: 1.417e-05 [renormalize]: 0.00059876 [add_forward_monad_depend]: 4e-06 [auto_monad_grad]: 1.46998e-06 [auto_monad_eliminator]: 1.497e-05 [cse]: 4.707e-05 [a_3]: 6.598e-05 [Cycle 3]: 0.00089665, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 1.075e-05 [loop_unroll]: 8.93002e-06 [a_1]: 0.00024742 [with_stream_mark]: 1.003e-05 [recompute_prepare]: 9.24998e-06 [updatestate_depend_eliminate]: 4.70001e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 3.82998e-06 
[parameter_eliminate]: 9.09989e-07 [a_2]: 0.00012337 [accelerated_algorithm]: 1.163e-05 [shard]: 9.40025e-07 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.84001e-06 [auto_parallel]: 6.93e-06 [parallel]: 4.45999e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 5.01002e-06 [allreduce_fusion]: 5.19e-06 [matmul_add_comm_reduction]: 7.75998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.004e-05 [virtual_dataset]: 8.82999e-06 [get_grad_eliminate_]: 8.38999e-06 [virtual_output]: 8.22e-06 [merge_forward]: 4.37998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.604e-05 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29998e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.19998e-06 [after_resolve]: 1.35e-05 [a_after_grad]: 1.376e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 1.115e-05 [cse]: 2.664e-05 [a_3]: 5.941e-05 [py_interpret_to_execute_after_opt_a]: 1.101e-05 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 4.8e-05 [convert_after_rewriter]: 9.41003e-06 [order_py_execute_after_rewriter]: 6.90002e-06 [mutable_eliminate]: 0.00047555 [opt_b]: 0.00028638, [1] [Cycle 1]: 0.0002798, [7] [b_1]: 0.00018855 [b_2]: 1.07e-05 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 3.92998e-06 [renormalize]: 3.80009e-07 [cse]: 3.077e-05 [optimize_parallel_all_gather_comm]: 2.078e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 1.944e-05 [loop_unroll]: 0.00046185 [opt_after_cconv]: 0.00013612, [1] [Cycle 1]: 0.00013006, [7] [c_1]: 4.806e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.10002e-06 [updatestate_assign_eliminate]: 4.46002e-06 
[updatestate_loads_eliminate]: 3.91001e-06 [cse]: 3.006e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.915e-05 [tuple_transform]: 0.00010155, [1] [Cycle 1]: 9.693e-05, [4] [d_1]: 6.68e-05 [none_parameter_eliminate]: 1.66998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.84999e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 5.602e-05 [cse_after_recomputation]: 3.089e-05, [1] [Cycle 1]: 2.619e-05, [1] [cse]: 2.088e-05 [environ_conv]: 8.47998e-06 [swap_dp_allreduce_reducescatter]: 7.98001e-06 [bias_add_comm_swap]: 2.70002e-06 [label_micro_interleaved_index]: 4.33001e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.60999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.676e-05 [grouped_pairwise_exchange_alltoall]: 1.41998e-06 [offloading_packed_experts]: 4.87e-06 [overlap_recompute_and_grad_model_parallel]: 5.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.02001e-06 [overlap_grad_ring_attention]: 5.14e-06 [overlap_grad_flash_sp]: 2.443e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.05999e-06 [symbol_engine_optimizer]: 9.915e-05, [1] [Cycle 1]: 9.478e-05, [6] [build]: 1.008e-05 [elim_shapecalc]: 1.323e-05 [elim_not_effective]: 1.794e-05 [opt_reshape]: 9.99001e-06 [fold_const_symbol]: 1.512e-05 [renormalize]: 
1.90019e-07 [detach_backward]: 2.19001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.465e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.97998e-06 [opt_after_jit_grad]: 0.00046873 [validate]: 4.517e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00844908 [execute]: 7.32002e-06 Sums bootstrap : 0.000511s : 1.51% type_inference : 0.011383s : 33.72% event_method : 0.000100s : 0.30% auto_monad : 0.000121s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.43% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.34% optimize.opt_a.a_1 : 0.003280s : 9.72% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm 
: 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001504s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.26% optimize.opt_a.a_after_grad : 0.000108s : 0.32% optimize.opt_a.renormalize : 0.003273s : 9.70% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.25% optimize.opt_a.cse : 0.000235s : 0.70% optimize.opt_a.a_3 : 0.000462s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000476s : 1.41% optimize.opt_b.b_1 : 0.000189s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000462s : 1.37% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000469s : 1.39% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008449s : 25.03% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000780 222 5.95% : 0.000046s : 12: substitution.arithmetic_simplify 1.73% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000003s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 56.47% : 0.000440s : 17: substitution.inline 2.00% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000014s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.69% : 0.000013s : 20: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.33% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.24% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.43% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011311 2 87.16% : 0.009859s : 1: type_inference.infer 12.84% : 0.001452s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.13% : 0.000125s : 17: replace.inline 42.87% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000466 33 92.66% : 0.000431s : 17: match.inline 7.34% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.07% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.17% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.78% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.73% : 0.000043s : 249: predicate.inline 1.22% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000004s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.97% : 
0.000037s : 277: predicate.switch_simplify 1.10% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.23% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001628 34 54.53% : 0.000888s : 13: func_graph_cloner_run.FuncGraphClonerGraph 45.47% : 0.000740s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063196 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.79% : 0.003029s : 1: add_attr 4.78% : 0.003021s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000129s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.86% : 0.000546s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.17% : 0.000109s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000470s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000484s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.82% : 0.004939s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.87% : 0.011294s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.76% : 0.000478s : 1: opt_after_jit_grad 0.46% : 0.000290s : 1: opt_b 21.53% : 0.013608s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.73% : 0.001727s : 2: renormalize.infer 2.43% : 0.001533s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000102s : 1: symbol_engine_optimizer 13.39% : 0.008459s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.04% : 0.011399s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0185826, [24] [bootstrap]: 0.00046267 [type_inference]: 0.00430136 [event_method]: 1.035e-05 [auto_monad]: 5.083e-05 [graph_reusing]: 5.38002e-06 [inline]: 2.04e-06 [add_attr]: 0.00308963, [1] [add_attr_with_inline]: 0.00308161, [1] [Cycle 1]: 4.537e-05, [2] [tag_attr]: 1.132e-05 [meta_addattr_fg_expand]: 3.16999e-06 [parallel-infer-symbol]: 3.17002e-06 [pre_auto_parallel]: 2.165e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.71002e-06 [optimize]: 0.00367573, [53] [py_interpret_to_execute]: 1.518e-05 [rewriter_before_opt_a]: 3.709e-05 [opt_a]: 0.0018887, [2] [Cycle 1]: 0.00129269, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 2.36e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.00029151 [with_stream_mark]: 1.317e-05 [recompute_prepare]: 7.64002e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.622e-05 [accelerated_algorithm]: 6.48003e-06 [shard]: 2.28998e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.5e-06 [auto_parallel]: 5.46998e-06 [parallel]: 1.74e-05 [flash_sp]: 7.75e-06 [merge_comm]: 3.35998e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.94e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 6.82002e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 9.55001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.44999e-06 [receive_attached]: 2.15002e-06 
[after_resolve]: 1.14e-05 [a_after_grad]: 8.85999e-06 [renormalize]: 0.00038453 [add_forward_monad_depend]: 4.85001e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.26e-05 [cse]: 2.709e-05 [a_3]: 3.963e-05 [Cycle 2]: 0.00058671, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.70002e-06 [loop_unroll]: 5.48002e-06 [a_1]: 0.00012567 [with_stream_mark]: 8.92999e-06 [recompute_prepare]: 5.76998e-06 [updatestate_depend_eliminate]: 2.61999e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.79e-05 [accelerated_algorithm]: 5.32001e-06 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.19003e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.07e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.11999e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 5.86998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.71999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16999e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 8.75999e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14003e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 5.92999e-06 [cse]: 1.203e-05 [a_3]: 3.16e-05 [py_interpret_to_execute_after_opt_a]: 7.35998e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 2.992e-05 [convert_after_rewriter]: 7.18998e-06 [order_py_execute_after_rewriter]: 5.45001e-06 [mutable_eliminate]: 0.00044707 [opt_b]: 0.00017746, [1] [Cycle 1]: 
0.00017145, [7] [b_1]: 0.00010538 [b_2]: 6.48e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.15002e-06 [renormalize]: 3.4002e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.637e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.211e-05 [loop_unroll]: 0.00041037 [opt_after_cconv]: 9.296e-05, [1] [Cycle 1]: 8.717e-05, [7] [c_1]: 2.742e-05 [parameter_eliminate]: 2.08998e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.551e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 1.245e-05 [tuple_transform]: 6.848e-05, [1] [Cycle 1]: 6.4e-05, [4] [d_1]: 3.832e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 4.38e-05 [cse_after_recomputation]: 1.954e-05, [1] [Cycle 1]: 1.506e-05, [1] [cse]: 1.002e-05 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 5.22e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.13001e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.121e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.58999e-06 [overlap_recompute_and_grad_model_parallel]: 4.37e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.66e-06 [overlap_grad_ring_attention]: 4.05998e-06 [overlap_grad_flash_sp]: 1.61e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.76e-05, [1] [Cycle 1]: 6.352e-05, [6] [build]: 2.06998e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.117e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.89998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.553e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.18998e-06 [opt_after_jit_grad]: 0.00044563 [validate]: 3.079e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.00625017 [execute]: 6.96001e-06 Sums bootstrap : 0.000463s : 3.18% type_inference : 0.004301s : 29.58% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.87% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000385s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000447s : 3.07% optimize.opt_b.b_1 : 0.000105s : 0.72% optimize.opt_b.b_2 : 0.000006s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000410s : 2.82% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.06% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006250s : 42.99% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.43% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.39% : 0.000005s : 4: substitution.graph_param_transform 65.07% : 0.000078s : 2: substitution.inline 2.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.46% : 0.000004s : 4: substitution.remove_not_recompute_node 3.75% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004261 2 91.93% : 0.003917s : 1: type_inference.infer 8.07% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.73% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000008s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.04% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.85% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 1.02% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.02% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.21% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.17% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.83% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026655 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.61% : 0.003094s : 1: add_attr 11.57% : 0.003085s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000498s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.88% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000088s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001892s : 1: opt_a 0.36% : 0.000096s : 1: opt_after_cconv 1.71% : 0.000455s : 1: opt_after_jit_grad 0.68% : 0.000181s : 1: opt_b 13.80% : 0.003679s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.86% : 0.000229s : 1: renormalize.infer 0.56% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.15% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.49% : 0.006261s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.19% : 0.004316s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0366723, [24] [bootstrap]: 0.0005094 [type_inference]: 0.0102412 [event_method]: 4.517e-05 [auto_monad]: 0.00011014 [graph_reusing]: 8.05e-06 [inline]: 2.27001e-06 [add_attr]: 0.00297329, [1] [add_attr_with_inline]: 0.00296491, [1] [Cycle 1]: 6.685e-05, [2] [tag_attr]: 3.129e-05 [meta_addattr_fg_expand]: 8.28999e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 4.655e-05 [insert-virtual-dataset]: 2.64999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.0136732, [53] [py_interpret_to_execute]: 3.647e-05 [rewriter_before_opt_a]: 0.00012693 [opt_a]: 0.0113804, [3] [Cycle 1]: 0.00741071, [45] [expand_dump_flag]: 3.51999e-06 [switch_simplify]: 6.74e-05 [loop_unroll]: 5.489e-05 [a_1]: 0.00139753 [with_stream_mark]: 2.263e-05 [recompute_prepare]: 2.172e-05 [updatestate_depend_eliminate]: 9.24e-06 [updatestate_assign_eliminate]: 8.2e-06 [updatestate_loads_eliminate]: 7.56001e-06 [parameter_eliminate]: 2.77002e-06 [a_2]: 0.00024593 [accelerated_algorithm]: 3.035e-05 [shard]: 2.29999e-06 [meta_shard_fg_expand]: 3.34001e-06 [shard_inline]: 1.581e-05 [merge_send_recv]: 1.642e-05 [auto_parallel]: 1.15e-05 [parallel]: 1.743e-05 [flash_sp]: 1.164e-05 [merge_comm]: 9.74e-06 [allreduce_fusion]: 8.65001e-06 [matmul_add_comm_reduction]: 2.716e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 1.819e-05 [virtual_dataset]: 1.553e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.519e-05 [merge_forward]: 9.29e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.763e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.813e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 2.704e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66e-06 [meta_fg_expand]: 0.00156819 [flash_sp_send_recv_attached]: 3.98001e-06 [receive_attached]: 3.18998e-06 [after_resolve]: 
6.105e-05 [a_after_grad]: 8.738e-05 [renormalize]: 0.00266928 [add_forward_monad_depend]: 1.009e-05 [auto_monad_grad]: 5.67999e-06 [auto_monad_eliminator]: 5.761e-05 [cse]: 0.00016849 [a_3]: 0.00034118 [Cycle 2]: 0.00305776, [45] [expand_dump_flag]: 1.73002e-06 [switch_simplify]: 4.726e-05 [loop_unroll]: 4.444e-05 [a_1]: 0.00159607 [with_stream_mark]: 1.249e-05 [recompute_prepare]: 1.127e-05 [updatestate_depend_eliminate]: 5.26998e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 3.95998e-06 [parameter_eliminate]: 1.04998e-06 [a_2]: 0.00012698 [accelerated_algorithm]: 1.28e-05 [shard]: 1.35001e-06 [meta_shard_fg_expand]: 2.03997e-06 [shard_inline]: 9.51e-06 [merge_send_recv]: 7.4e-06 [auto_parallel]: 7.63999e-06 [parallel]: 4.97999e-06 [flash_sp]: 3.28e-06 [merge_comm]: 5.37999e-06 [allreduce_fusion]: 4.68001e-06 [matmul_add_comm_reduction]: 8.22e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 1.048e-05 [virtual_dataset]: 8.60999e-06 [get_grad_eliminate_]: 8.92e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.647e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.507e-05 [set_forward_comm_id_for_comm_node_pass]: 5.73997e-06 [meta_fg_expand]: 3.551e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.25999e-06 [after_resolve]: 1.523e-05 [a_after_grad]: 1.415e-05 [renormalize]: 0.00060553 [add_forward_monad_depend]: 3.97e-06 [auto_monad_grad]: 1.23002e-06 [auto_monad_eliminator]: 1.438e-05 [cse]: 4.789e-05 [a_3]: 6.555e-05 [Cycle 3]: 0.00089723, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 1.053e-05 [loop_unroll]: 9.15999e-06 [a_1]: 0.00024948 [with_stream_mark]: 9.89999e-06 [recompute_prepare]: 9.19e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 3.98001e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012326 [accelerated_algorithm]: 1.159e-05 [shard]: 9.09989e-07 [meta_shard_fg_expand]: 2.06e-06 [shard_inline]: 9.36e-06 [merge_send_recv]: 6.91001e-06 [auto_parallel]: 6.99001e-06 [parallel]: 4.47e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 4.90999e-06 [allreduce_fusion]: 4.89003e-06 [matmul_add_comm_reduction]: 7.79002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 9.88998e-06 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.6e-05 [merge_recompute_call_nodes]: 8.60018e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 3.13e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.316e-05 [a_after_grad]: 1.413e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.16997e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 1.056e-05 [cse]: 2.438e-05 [a_3]: 6.07e-05 [py_interpret_to_execute_after_opt_a]: 1.122e-05 [slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 4.786e-05 [convert_after_rewriter]: 9.02e-06 [order_py_execute_after_rewriter]: 6.69001e-06 [mutable_eliminate]: 0.00047494 [opt_b]: 0.00028769, [1] [Cycle 1]: 0.00028126, [7] [b_1]: 0.00018916 [b_2]: 1.114e-05 [updatestate_depend_eliminate]: 7.06999e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 3.19997e-07 [cse]: 3.088e-05 [optimize_parallel_all_gather_comm]: 2.067e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.045e-05 [loop_unroll]: 0.00044993 [opt_after_cconv]: 0.00013617, [1] [Cycle 1]: 0.00013015, [7] [c_1]: 4.85e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 6.89999e-06 [updatestate_assign_eliminate]: 4.43001e-06 
[updatestate_loads_eliminate]: 4.01001e-06 [cse]: 2.932e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 3.011e-05 [tuple_transform]: 0.00010278, [1] [Cycle 1]: 9.831e-05, [4] [d_1]: 6.77e-05 [none_parameter_eliminate]: 1.90001e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 1.006e-05 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 5.827e-05 [cse_after_recomputation]: 3.174e-05, [1] [Cycle 1]: 2.704e-05, [1] [cse]: 2.172e-05 [environ_conv]: 8.99003e-06 [swap_dp_allreduce_reducescatter]: 8.12e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.52001e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.72001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.749e-05 [grouped_pairwise_exchange_alltoall]: 1.71998e-06 [offloading_packed_experts]: 4.85999e-06 [overlap_recompute_and_grad_model_parallel]: 6.31e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.60001e-06 [overlap_recompute_comm]: 2.18998e-06 [overlap_grad_ring_attention]: 5.07999e-06 [overlap_grad_flash_sp]: 2.466e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 1.89e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 0.00010017, [1] [Cycle 1]: 9.607e-05, [6] [build]: 1.012e-05 [elim_shapecalc]: 1.388e-05 [elim_not_effective]: 1.813e-05 [opt_reshape]: 1.08e-05 [fold_const_symbol]: 1.447e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 2.466e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00047039 [validate]: 4.395e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00829366 [execute]: 7.05998e-06 Sums bootstrap : 0.000509s : 1.57% type_inference : 0.010241s : 31.57% event_method : 0.000045s : 0.14% auto_monad : 0.000110s : 0.34% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.33% optimize.opt_a.a_1 : 0.003243s : 10.00% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.53% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm 
: 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001607s : 4.95% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.28% optimize.opt_a.a_after_grad : 0.000116s : 0.36% optimize.opt_a.renormalize : 0.003275s : 10.09% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.25% optimize.opt_a.cse : 0.000241s : 0.74% optimize.opt_a.a_3 : 0.000467s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000475s : 1.46% optimize.opt_b.b_1 : 0.000189s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s 
: 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000450s : 1.39% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000470s : 1.45% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008294s : 25.57% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000843 218 5.09% : 0.000043s : 11: substitution.arithmetic_simplify 1.63% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 60.36% : 0.000509s : 16: substitution.inline 1.91% : 0.000016s : 2: substitution.inline_without_move 1.24% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.75% : 0.000015s : 3: substitution.less_batch_normalization 1.54% : 0.000013s : 11: substitution.minmaximum_grad 0.62% : 0.000005s : 5: substitution.partial_eliminate 1.55% : 0.000013s : 20: substitution.remove_not_recompute_node 2.95% : 0.000025s : 10: substitution.replace_applicator 1.22% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.30% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.65% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.12% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.45% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.23% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010169 2 87.05% : 0.008852s : 1: type_inference.infer 12.95% : 0.001317s : 1: type_inference.specialize ------[replace.] 0.000210 30 59.52% : 0.000125s : 16: replace.inline 40.48% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000532 30 94.21% : 0.000501s : 16: match.inline 5.79% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000740 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.28% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000042s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000019s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.21% : 0.000002s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.69% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.33% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.97% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.51% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001714 32 56.99% : 0.000977s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.01% : 0.000737s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061835 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.82% : 0.002977s : 1: add_attr 4.80% : 0.002969s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000117s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000543s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000051s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000484s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.94% : 0.004910s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.41% : 0.011384s : 1: opt_a 0.23% : 0.000140s : 1: opt_after_cconv 0.78% : 0.000480s : 1: opt_after_jit_grad 0.47% : 0.000291s : 1: opt_b 22.12% : 0.013677s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.72% : 0.001681s : 2: renormalize.infer 2.56% : 0.001581s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.21% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.43% : 0.008304s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 16.59% : 0.010257s : 1: type_inference 0.12% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x4-kbk],max_mem:20.0M TotalTime = 0.117935, [24] [bootstrap]: 0.00053363 [type_inference]: 0.0060761 [event_method]: 1.434e-05 [auto_monad]: 5.877e-05 [graph_reusing]: 5.32999e-06 [inline]: 1.74998e-06 [add_attr]: 0.00345715, [1] [add_attr_with_inline]: 0.00344619, [1] [Cycle 1]: 4.628e-05, [2] [tag_attr]: 1.496e-05 [meta_addattr_fg_expand]: 4.20999e-06 [parallel-infer-symbol]: 2.68998e-06 [pre_auto_parallel]: 2.844e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0040051, [53] [py_interpret_to_execute]: 1.985e-05 [rewriter_before_opt_a]: 5.883e-05 [opt_a]: 0.00215206, [2] [Cycle 1]: 0.00155457, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 3.18e-05 [loop_unroll]: 2.064e-05 [a_1]: 0.00045576 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 3.86001e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 9.638e-05 [accelerated_algorithm]: 7.1e-06 [shard]: 2.02001e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.97998e-06 [auto_parallel]: 5.89e-06 [parallel]: 2.228e-05 [flash_sp]: 7.05e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.12998e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.77001e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 
[merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 2.12999e-06 [receive_attached]: 2.30002e-06 [after_resolve]: 1.058e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 0.00042992 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.34e-05 [cse]: 2.733e-05 [a_3]: 4.021e-05 [Cycle 2]: 0.00058798, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012636 [with_stream_mark]: 1.029e-05 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.69999e-06 [updatestate_assign_eliminate]: 2.18998e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.691e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.20001e-06 [parallel]: 4.01001e-06 [flash_sp]: 3.42002e-06 [merge_comm]: 2.89999e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 5.82001e-06 [virtual_dataset]: 5.24998e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.34998e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.15e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.25999e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.07e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.235e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 
[rewriter_after_opt_a]: 3.127e-05 [convert_after_rewriter]: 6.56999e-06 [order_py_execute_after_rewriter]: 5.33002e-06 [mutable_eliminate]: 0.00045959 [opt_b]: 0.00018367, [1] [Cycle 1]: 0.00017677, [7] [b_1]: 0.00010781 [b_2]: 7.21999e-06 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 2.76999e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 3.29979e-07 [cse]: 1.661e-05 [optimize_parallel_all_gather_comm]: 1.6e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.328e-05 [loop_unroll]: 0.00041534 [opt_after_cconv]: 9.471e-05, [1] [Cycle 1]: 8.894e-05, [7] [c_1]: 2.803e-05 [parameter_eliminate]: 2.15002e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.606e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.22e-05 [tuple_transform]: 6.804e-05, [1] [Cycle 1]: 6.381e-05, [4] [d_1]: 3.858e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.976e-05 [cse_after_recomputation]: 2.022e-05, [1] [Cycle 1]: 1.585e-05, [1] [cse]: 1.077e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.51e-06 [bias_add_comm_swap]: 2.50002e-06 [label_micro_interleaved_index]: 4.99e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.11003e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 1.21002e-06 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.16e-05 
[grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.81002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.11998e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.646e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 2.31e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.749e-05, [1] [Cycle 1]: 6.337e-05, [6] [build]: 2.17001e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.107e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 9.17001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 9.49978e-07 [rewriter_after_jit_bprop_graph]: 3.18e-06 [opt_after_jit_grad]: 0.00045189 [validate]: 3.13e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.103017 [execute]: 8.25999e-06 Sums bootstrap : 0.000534s : 0.47% type_inference : 0.006076s : 5.35% event_method : 0.000014s : 0.01% auto_monad : 0.000059s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000582s : 0.51% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000163s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000430s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000460s : 0.40% optimize.opt_b.b_1 : 0.000108s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000415s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000452s : 0.40% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.103017s : 90.76% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.85% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 67.07% : 0.000112s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.53% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.13% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006033 2 90.82% : 0.005479s : 1: type_inference.infer 9.18% : 0.000554s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.41% : 0.000028s : 3: replace.inline 29.59% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 92.36% : 0.000109s : 3: match.inline 7.64% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.24% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.04% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 1.02% : 0.000002s : 11: predicate.reshape_eliminate 0.88% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.52% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.33% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 46.69% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.31% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.126942 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.73% : 0.003461s : 1: add_attr 2.72% : 0.003450s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000064s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000575s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.33% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.76% : 0.000967s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.70% : 0.002155s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.36% : 0.000461s : 1: opt_after_jit_grad 0.15% : 0.000187s : 1: opt_b 3.16% : 0.004009s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.18% : 0.000223s : 1: renormalize.infer 0.16% : 0.000200s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.17% : 0.103040s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.80% : 0.006089s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.113989, [24] [bootstrap]: 0.00047016 [type_inference]: 0.00446347 [event_method]: 1.11e-05 [auto_monad]: 5.065e-05 [graph_reusing]: 4.99e-06 [inline]: 1.77999e-06 [add_attr]: 0.00304862, [1] [add_attr_with_inline]: 0.00304029, [1] [Cycle 1]: 4.528e-05, [2] [tag_attr]: 1.169e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 2.248e-05 [insert-virtual-dataset]: 2.74999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.07999e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00371839, [53] [py_interpret_to_execute]: 1.455e-05 [rewriter_before_opt_a]: 3.881e-05 [opt_a]: 0.00186602, [2] [Cycle 1]: 0.00126542, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.49e-05 [loop_unroll]: 1.416e-05 [a_1]: 0.00029244 [with_stream_mark]: 1.365e-05 [recompute_prepare]: 7.84997e-06 [updatestate_depend_eliminate]: 3.58999e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.48002e-06 [a_2]: 7.73e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 5.61998e-06 [parallel]: 1.814e-05 [flash_sp]: 6.93998e-06 [merge_comm]: 3.41999e-06 [allreduce_fusion]: 3.40998e-06 [matmul_add_comm_reduction]: 8.86997e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.03e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.81e-06 [before_grad]: 9.46003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25998e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.39999e-06 [receive_attached]: 2.16e-06 
[after_resolve]: 1.08e-05 [a_after_grad]: 8.61002e-06 [renormalize]: 0.00035543 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.27e-05 [cse]: 2.758e-05 [a_3]: 3.984e-05 [Cycle 2]: 0.0005917, [45] [expand_dump_flag]: 7.50006e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012491 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 5.86998e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.792e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.10999e-06 [parallel]: 4.89e-06 [flash_sp]: 3.11999e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.38002e-06 [get_grad_eliminate_]: 5.18002e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.43998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.61e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.68999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.13999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.3e-05 [a_3]: 3.256e-05 [py_interpret_to_execute_after_opt_a]: 7.34002e-06 [slice_cell_reuse_recomputed_activation]: 1.69998e-06 [rewriter_after_opt_a]: 3.088e-05 [convert_after_rewriter]: 7.34002e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00049285 [opt_b]: 0.00018141, [1] [Cycle 1]: 0.00017565, [7] 
[b_1]: 0.00010872 [b_2]: 7.08998e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 5.60016e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.509e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.168e-05 [loop_unroll]: 0.00041564 [opt_after_cconv]: 9.536e-05, [1] [Cycle 1]: 8.946e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.68002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.06e-06 [cse]: 1.58e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 1.32e-05 [tuple_transform]: 6.971e-05, [1] [Cycle 1]: 6.507e-05, [4] [d_1]: 3.94e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.38e-05 [cse_after_recomputation]: 2.142e-05, [1] [Cycle 1]: 1.685e-05, [1] [cse]: 1.143e-05 [environ_conv]: 4.41002e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 3.96001e-06 [label_fine_grained_interleaved_index]: 2.68998e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 1.98997e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.63997e-06 [control_data_broadcast_order]: 1.188e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.71e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 3.78999e-06 [overlap_grad_flash_sp]: 1.686e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.98002e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.812e-05, [1] [Cycle 1]: 6.411e-05, [6] [build]: 2.21998e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.173e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 8.50001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 1.583e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00044951 [validate]: 3.139e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.101464 [execute]: 8.85001e-06 Sums bootstrap : 0.000470s : 0.43% type_inference : 0.004463s : 4.06% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000417s : 0.38% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000356s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000493s : 0.45% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000416s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.41% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.101464s : 92.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.38% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.27% : 0.000005s : 4: substitution.graph_param_transform 65.44% : 0.000079s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000004s : 4: substitution.remove_not_recompute_node 3.59% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004421 2 91.48% : 0.004044s : 1: type_inference.infer 8.52% : 0.000377s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.57% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.90% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000266 6 41.89% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.11% : 0.000155s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.122036 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.50% : 0.003053s : 1: add_attr 2.49% : 0.003044s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000507s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.63% : 0.000770s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.53% : 0.001869s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 0.38% : 0.000459s : 1: opt_after_jit_grad 0.15% : 0.000185s : 1: opt_b 3.05% : 0.003722s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000195s : 1: renormalize.infer 0.13% : 0.000154s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 83.16% : 0.101486s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.67% : 0.004477s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.110814, [24] [bootstrap]: 0.00046381 [type_inference]: 0.00555493 [event_method]: 1.495e-05 [auto_monad]: 5.477e-05 [graph_reusing]: 5.22e-06 [inline]: 1.64998e-06 [add_attr]: 0.00298709, [1] [add_attr_with_inline]: 0.00297859, [1] [Cycle 1]: 4.571e-05, [2] [tag_attr]: 1.484e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.647e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0040231, [53] [py_interpret_to_execute]: 2.039e-05 [rewriter_before_opt_a]: 5.933e-05 [opt_a]: 0.00218005, [2] [Cycle 1]: 0.00157559, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 3.232e-05 [loop_unroll]: 2.144e-05 [a_1]: 0.00044915 [with_stream_mark]: 1.372e-05 [recompute_prepare]: 8.23999e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.516e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 7.96001e-06 [auto_parallel]: 5.82999e-06 [parallel]: 1.818e-05 [flash_sp]: 7.68999e-06 [merge_comm]: 3.43999e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.69998e-06 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 7.23e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.51001e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 9.61998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 9.82999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.21e-06 
[after_resolve]: 1.067e-05 [a_after_grad]: 9.12999e-06 [renormalize]: 0.00048348 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.385e-05 [cse]: 2.765e-05 [a_3]: 4.173e-05 [Cycle 2]: 0.00059471, [45] [expand_dump_flag]: 1.22999e-06 [switch_simplify]: 7.07997e-06 [loop_unroll]: 5.18002e-06 [a_1]: 0.00012427 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 5.77001e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.28998e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.766e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.13999e-06 [auto_parallel]: 5.40999e-06 [parallel]: 3.88001e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.93998e-06 [matmul_add_comm_reduction]: 8.3e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 5.87999e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86998e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 7.92998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.23e-06 [cse]: 1.354e-05 [a_3]: 3.194e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 3.106e-05 [convert_after_rewriter]: 7.00002e-06 [order_py_execute_after_rewriter]: 5.52999e-06 [mutable_eliminate]: 0.00045831 [opt_b]: 0.00018175, [1] [Cycle 1]: 0.00017565, [7] 
[b_1]: 0.00010753 [b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.23002e-06 [renormalize]: 3.50003e-07 [cse]: 1.631e-05 [optimize_parallel_all_gather_comm]: 1.62e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.285e-05 [loop_unroll]: 0.00041623 [opt_after_cconv]: 9.42e-05, [1] [Cycle 1]: 8.854e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.07001e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.576e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.893e-05, [1] [Cycle 1]: 6.447e-05, [4] [d_1]: 3.935e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.387e-05 [cse_after_recomputation]: 1.993e-05, [1] [Cycle 1]: 1.531e-05, [1] [cse]: 1.015e-05 [environ_conv]: 4.47e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 2.83998e-06 [label_micro_interleaved_index]: 4.49002e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.24998e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.127e-05 [grouped_pairwise_exchange_alltoall]: 1.61998e-06 [offloading_packed_experts]: 3.53999e-06 [overlap_recompute_and_grad_model_parallel]: 4.19997e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.746e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.808e-05, [1] [Cycle 1]: 6.396e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.186e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 3.09985e-07 [detach_backward]: 1.44e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.5e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.0004497 [validate]: 3.196e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0969509 [execute]: 8.59e-06 Sums bootstrap : 0.000464s : 0.43% type_inference : 0.005555s : 5.20% event_method : 0.000015s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000573s : 0.54% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000484s : 0.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.43% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000416s : 0.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.42% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096951s : 90.73% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.74% : 0.000024s : 5: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 66.95% : 0.000111s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.36% : 0.000004s : 4: substitution.replace_old_param 6.25% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005513 2 90.01% : 0.004962s : 1: type_inference.infer 9.99% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.61% : 0.000027s : 3: replace.inline 29.39% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 92.15% : 0.000109s : 3: match.inline 7.85% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.03% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.04% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 46.10% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.90% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119402 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.51% : 0.002991s : 1: add_attr 2.50% : 0.002983s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000500s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.36% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.79% : 0.000940s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.83% : 0.002183s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.38% : 0.000459s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.37% : 0.004027s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.22% : 0.000265s : 1: renormalize.infer 0.18% : 0.000212s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 81.22% : 0.096973s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.66% : 0.005569s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.147835, [24] [bootstrap]: 0.00056199 [type_inference]: 0.011482 [event_method]: 4.941e-05 [auto_monad]: 0.0001445 [graph_reusing]: 8.22998e-06 [inline]: 2.27999e-06 [add_attr]: 0.00301445, [1] [add_attr_with_inline]: 0.00300607, [1] [Cycle 1]: 7.265e-05, [2] [tag_attr]: 3.516e-05 [meta_addattr_fg_expand]: 9.19e-06 [parallel-infer-symbol]: 3.12002e-06 [pre_auto_parallel]: 5.051e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.0135438, [53] [py_interpret_to_execute]: 3.824e-05 [rewriter_before_opt_a]: 0.00014753 [opt_a]: 0.0112285, [3] [Cycle 1]: 0.00723202, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 7.472e-05 [loop_unroll]: 6.273e-05 [a_1]: 0.00149944 [with_stream_mark]: 2.276e-05 [recompute_prepare]: 2.226e-05 [updatestate_depend_eliminate]: 9.56e-06 [updatestate_assign_eliminate]: 8.09997e-06 [updatestate_loads_eliminate]: 7.3e-06 [parameter_eliminate]: 2.46998e-06 [a_2]: 0.00024611 [accelerated_algorithm]: 3.161e-05 [shard]: 1.81003e-06 [meta_shard_fg_expand]: 3.33e-06 [shard_inline]: 1.648e-05 [merge_send_recv]: 1.592e-05 [auto_parallel]: 1.071e-05 [parallel]: 1.852e-05 [flash_sp]: 1.123e-05 [merge_comm]: 9.49e-06 [allreduce_fusion]: 9.17999e-06 [matmul_add_comm_reduction]: 2.623e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.8e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.527e-05 [virtual_output]: 1.515e-05 [merge_forward]: 9.18002e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 1.756e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.862e-05 [merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 2.756e-05 [set_forward_comm_id_for_comm_node_pass]: 9.65002e-06 [meta_fg_expand]: 0.00142379 [flash_sp_send_recv_attached]: 3.89002e-06 [receive_attached]: 2.52001e-06 [after_resolve]: 
6.023e-05 [a_after_grad]: 8.154e-05 [renormalize]: 0.00253189 [add_forward_monad_depend]: 8.92e-06 [auto_monad_grad]: 5.40001e-06 [auto_monad_eliminator]: 5.664e-05 [cse]: 0.00016779 [a_3]: 0.0003369 [Cycle 2]: 0.003077, [45] [expand_dump_flag]: 1.91e-06 [switch_simplify]: 4.696e-05 [loop_unroll]: 4.443e-05 [a_1]: 0.00157017 [with_stream_mark]: 1.273e-05 [recompute_prepare]: 1.133e-05 [updatestate_depend_eliminate]: 5.53002e-06 [updatestate_assign_eliminate]: 4.45e-06 [updatestate_loads_eliminate]: 3.80998e-06 [parameter_eliminate]: 1.08001e-06 [a_2]: 0.00012644 [accelerated_algorithm]: 1.223e-05 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.98002e-06 [shard_inline]: 9.36e-06 [merge_send_recv]: 6.59001e-06 [auto_parallel]: 7.75e-06 [parallel]: 6.37001e-06 [flash_sp]: 3.56999e-06 [merge_comm]: 5.77001e-06 [allreduce_fusion]: 5.10001e-06 [matmul_add_comm_reduction]: 7.96001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.043e-05 [virtual_dataset]: 8.88002e-06 [get_grad_eliminate_]: 9.09e-06 [virtual_output]: 8.62e-06 [merge_forward]: 4.57e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.632e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.424e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42001e-06 [meta_fg_expand]: 7.517e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.24e-06 [after_resolve]: 1.537e-05 [a_after_grad]: 1.424e-05 [renormalize]: 0.00061238 [add_forward_monad_depend]: 4.10998e-06 [auto_monad_grad]: 1.32e-06 [auto_monad_eliminator]: 1.464e-05 [cse]: 4.742e-05 [a_3]: 6.63e-05 [Cycle 3]: 0.00090494, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 1.07e-05 [loop_unroll]: 9.05001e-06 [a_1]: 0.00024963 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 9.39998e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 4.15e-06 [parameter_eliminate]: 
9.50007e-07 [a_2]: 0.00012314 [accelerated_algorithm]: 1.141e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 7.23e-06 [auto_parallel]: 7.32002e-06 [parallel]: 5.39998e-06 [flash_sp]: 1.03001e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 5.04e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 8.75001e-06 [get_grad_eliminate_]: 8.47998e-06 [virtual_output]: 8.20999e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.626e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.415e-05 [set_forward_comm_id_for_comm_node_pass]: 5.28002e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.333e-05 [a_after_grad]: 1.536e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.153e-05 [cse]: 2.705e-05 [a_3]: 5.92e-05 [py_interpret_to_execute_after_opt_a]: 1.116e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 4.655e-05 [convert_after_rewriter]: 9.02999e-06 [order_py_execute_after_rewriter]: 6.77002e-06 [mutable_eliminate]: 0.00047584 [opt_b]: 0.00028729, [1] [Cycle 1]: 0.00028089, [7] [b_1]: 0.00018774 [b_2]: 1.094e-05 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 4.09002e-06 [updatestate_loads_eliminate]: 4.08999e-06 [renormalize]: 4.00003e-07 [cse]: 3.182e-05 [optimize_parallel_all_gather_comm]: 2.019e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.011e-05 [loop_unroll]: 0.00042663 [opt_after_cconv]: 0.00017167, [1] [Cycle 1]: 0.00016575, [7] [c_1]: 8.311e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 7.45e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.83001e-06 
[cse]: 3.022e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.889e-05 [tuple_transform]: 0.00010077, [1] [Cycle 1]: 9.632e-05, [4] [d_1]: 6.601e-05 [none_parameter_eliminate]: 1.63002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.008e-05 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 5.591e-05 [cse_after_recomputation]: 3.194e-05, [1] [Cycle 1]: 2.744e-05, [1] [cse]: 2.217e-05 [environ_conv]: 8.92999e-06 [swap_dp_allreduce_reducescatter]: 7.47002e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.726e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 5.20999e-06 [overlap_recompute_and_grad_model_parallel]: 5.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.76e-06 [overlap_grad_ring_attention]: 4.99e-06 [overlap_grad_flash_sp]: 2.439e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.96e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 9.758e-05, [1] [Cycle 1]: 9.343e-05, [6] [build]: 9.47001e-06 [elim_shapecalc]: 1.307e-05 [elim_not_effective]: 1.8e-05 [opt_reshape]: 1.014e-05 [fold_const_symbol]: 1.486e-05 [renormalize]: 2.19996e-07 [detach_backward]: 1.96998e-06 
[pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.483e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00047074 [validate]: 4.607e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.118189 [execute]: 8.60001e-06 Sums bootstrap : 0.000562s : 0.39% type_inference : 0.011482s : 8.00% event_method : 0.000049s : 0.03% auto_monad : 0.000144s : 0.10% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000148s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.09% optimize.opt_a.loop_unroll : 0.000116s : 0.08% optimize.opt_a.a_1 : 0.003319s : 2.31% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001502s : 1.05% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000111s : 0.08% optimize.opt_a.renormalize : 0.003144s : 2.19% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000242s : 0.17% optimize.opt_a.a_3 : 0.000462s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000476s : 0.33% optimize.opt_b.b_1 : 0.000188s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000427s : 0.30% optimize.opt_after_cconv.c_1 : 0.000083s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000471s : 0.33% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.118189s : 82.33% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000801 222 5.73% : 0.000046s : 12: substitution.arithmetic_simplify 1.83% : 0.000015s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.89% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.56% : 0.000461s : 17: substitution.inline 2.01% : 0.000016s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.61% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.69% : 0.000014s : 20: substitution.remove_not_recompute_node 3.05% : 0.000024s : 10: substitution.replace_applicator 1.31% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.37% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.67% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.24% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.32% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.21% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011407 2 86.58% : 0.009877s : 1: type_inference.infer 13.42% : 0.001531s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.60% : 0.000126s : 17: replace.inline 42.40% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000487 33 92.93% : 0.000453s : 17: match.inline 7.07% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000004s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.13% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001629 34 56.47% : 0.000920s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.53% : 0.000709s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.172885 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.75% : 0.003018s : 1: add_attr 1.74% : 0.003010s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000152s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.35% : 0.000599s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000057s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000485s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.89% : 0.004993s : 117: opt.transform.opt_a 0.05% : 0.000082s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.50% : 0.011232s : 1: opt_a 0.10% : 0.000175s : 1: opt_after_cconv 0.28% : 0.000480s : 1: opt_after_jit_grad 0.17% : 0.000291s : 1: opt_b 7.84% : 0.013548s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.98% : 0.001694s : 2: renormalize.infer 0.83% : 0.001437s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000050s : 1: rewriter_after_opt_a 0.09% : 0.000153s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000100s : 1: symbol_engine_optimizer 68.38% : 0.118210s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.65% : 0.011496s : 1: type_inference 0.04% : 0.000072s : 1: validate TotalTime = 0.1271, [24] [bootstrap]: 0.00047029 [type_inference]: 0.00435508 [event_method]: 1.04e-05 [auto_monad]: 5.126e-05 [graph_reusing]: 5.09998e-06 [inline]: 1.60999e-06 [add_attr]: 0.00298434, [1] [add_attr_with_inline]: 0.0029762, [1] [Cycle 1]: 4.718e-05, [2] [tag_attr]: 1.205e-05 [meta_addattr_fg_expand]: 3.08e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 2.195e-05 [insert-virtual-dataset]: 2.96001e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.64e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00876543, [53] [py_interpret_to_execute]: 1.554e-05 [rewriter_before_opt_a]: 3.863e-05 [opt_a]: 0.00601135, [2] [Cycle 1]: 0.005381, [45] [expand_dump_flag]: 2.80002e-06 [switch_simplify]: 2.462e-05 [loop_unroll]: 1.355e-05 [a_1]: 0.0002927 [with_stream_mark]: 2.773e-05 [recompute_prepare]: 2.757e-05 [updatestate_depend_eliminate]: 4.37003e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 2.22999e-06 [a_2]: 0.00015126 [accelerated_algorithm]: 7.1e-06 [shard]: 2.86e-06 [meta_shard_fg_expand]: 2.66e-06 [shard_inline]: 5.93002e-06 [merge_send_recv]: 4.846e-05 [auto_parallel]: 1.857e-05 [parallel]: 2.829e-05 [flash_sp]: 9.19e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 9.67999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 8.37998e-06 [virtual_dataset]: 6.17001e-06 [get_grad_eliminate_]: 6.12999e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.87002e-06 [cell_reuse_recompute_pass]: 1.83002e-06 [offload_activation]: 2.063e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.308e-05 [merge_recompute_call_nodes]: 4.919e-05 [before_grad]: 1.123e-05 [set_forward_comm_id_for_comm_node_pass]: 3.91999e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 2.73e-06 [receive_attached]: 2.18998e-06 
[after_resolve]: 1.183e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00075801 [add_forward_monad_depend]: 4.64998e-06 [auto_monad_grad]: 1.92001e-06 [auto_monad_eliminator]: 1.541e-05 [cse]: 2.777e-05 [a_3]: 4.241e-05 [Cycle 2]: 0.00061943, [45] [expand_dump_flag]: 1.11002e-06 [switch_simplify]: 6.48998e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00014384 [with_stream_mark]: 1.035e-05 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.94001e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.88e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.28002e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.47001e-06 [parallel]: 4.17998e-06 [flash_sp]: 3.43999e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.29001e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 4.85001e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.19003e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.049e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 8.3e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 8e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 6.81001e-06 [cse]: 1.442e-05 [a_3]: 3.232e-05 [py_interpret_to_execute_after_opt_a]: 9.29e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.446e-05 [convert_after_rewriter]: 7.83999e-06 [order_py_execute_after_rewriter]: 5.29998e-06 [mutable_eliminate]: 0.00120799 [opt_b]: 0.00020032, [1] [Cycle 1]: 0.00018215, 
[7] [b_1]: 0.0001112 [b_2]: 7.68999e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 5.3001e-07 [cse]: 1.801e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.223e-05 [loop_unroll]: 0.00045343 [opt_after_cconv]: 9.762e-05, [1] [Cycle 1]: 9.177e-05, [7] [c_1]: 2.809e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.77999e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.699e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.179e-05 [tuple_transform]: 7.016e-05, [1] [Cycle 1]: 6.577e-05, [4] [d_1]: 4.003e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 2.06998e-06 [add_recomputation]: 7.391e-05 [cse_after_recomputation]: 3.392e-05, [1] [Cycle 1]: 1.918e-05, [1] [cse]: 1.317e-05 [environ_conv]: 6.617e-05 [swap_dp_allreduce_reducescatter]: 5.66998e-06 [bias_add_comm_swap]: 3.02002e-06 [label_micro_interleaved_index]: 4.41002e-06 [label_fine_grained_interleaved_index]: 2.43998e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 1.30001e-06 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.212e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.88001e-06 [overlap_recompute_and_grad_model_parallel]: 4.54998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.719e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 7.332e-05, [1] [Cycle 1]: 6.889e-05, [6] [build]: 2.94001e-06 [elim_shapecalc]: 9.91998e-06 [elim_not_effective]: 1.273e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 3.19997e-07 [detach_backward]: 2.31e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.565e-05 [get_jit_bprop_graph]: 1.20999e-06 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.00048376 [validate]: 6.215e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.109551 [execute]: 9.07001e-06 Sums bootstrap : 0.000470s : 0.39% type_inference : 0.004355s : 3.64% event_method : 0.000010s : 0.01% auto_monad : 0.000051s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.03% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000437s : 0.37% optimize.opt_a.with_stream_mark : 0.000038s : 0.03% optimize.opt_a.recompute_prepare : 0.000033s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000220s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000053s : 0.04% optimize.opt_a.auto_parallel : 0.000024s : 0.02% optimize.opt_a.parallel : 0.000032s : 0.03% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000027s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000024s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000050s : 0.04% optimize.opt_a.before_grad : 0.000020s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000758s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.001208s : 1.01% optimize.opt_b.b_1 : 0.000111s : 0.09% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000453s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000074s : 0.06% optimize.cse_after_recomputation.cse : 0.000013s : 0.01% optimize.environ_conv : 0.000066s : 0.06% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000484s : 0.40% validate : 0.000062s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.109551s : 91.62% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000131 26 22.21% : 0.000029s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 4.64% : 0.000006s : 4: substitution.graph_param_transform 60.75% : 0.000080s : 2: substitution.inline 2.58% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.83% : 0.000005s : 4: substitution.remove_not_recompute_node 3.49% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004314 2 90.51% : 0.003905s : 1: type_inference.infer 9.49% : 0.000409s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000181 984 0.64% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.56% : 0.000001s : 9: predicate.addn_zero_filter 0.54% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 22.48% : 0.000041s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.63% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.68% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.62% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.90% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.87% : 0.000002s : 13: predicate.environ_get_add_eliminate 0.79% : 0.000001s : 13: predicate.environ_get_depend_swap 1.40% : 0.000003s : 21: predicate.environ_get_eliminate 0.83% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.75% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.39% : 0.000003s : 11: predicate.float_depend_g_call 0.80% : 0.000001s : 8: predicate.float_environ_get_switch 0.78% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.66% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 4.47% : 0.000008s : 44: predicate.inline 0.74% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000002s : 8: predicate.less_batch_normalization 1.29% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.71% : 0.000003s : 26: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.34% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.28% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.52% : 0.000001s : 9: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 0.98% : 0.000002s : 11: predicate.partial_defer_inline 0.93% : 0.000002s : 13: predicate.partial_eliminate 0.61% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 0.83% : 0.000002s : 9: predicate.reduce_eliminate 1.66% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.02% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.62% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.74% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.75% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000002s : 8: predicate.specialize_transform 0.87% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.78% : 0.000001s : 11: predicate.switch_defer_inline 1.34% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.43% : 0.000006s : 41: predicate.switch_simplify 0.57% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.12% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.20% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.05% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.39% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.23% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.88% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.16% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.56% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.43% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000366 6 27.75% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 72.25% : 0.000265s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.140612 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.13% : 0.002989s : 1: add_attr 2.12% : 0.002979s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000079s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000056s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000571s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.05% : 0.000071s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000463s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.87% : 0.001219s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.63% : 0.000887s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000092s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000034s : 4: opt.transform.symbol_engine_opt 4.28% : 0.006015s : 1: opt_a 0.07% : 0.000101s : 1: opt_after_cconv 0.35% : 0.000494s : 1: opt_after_jit_grad 0.14% : 0.000204s : 1: opt_b 6.24% : 0.008770s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.01% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.29% : 0.000404s : 1: renormalize.infer 0.22% : 0.000307s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.03% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000076s : 1: symbol_engine_optimizer 77.93% : 0.109573s : 1: task_emit 0.05% : 0.000073s : 1: 
tuple_transform 3.11% : 0.004369s : 1: type_inference 0.07% : 0.000100s : 1: validate TotalTime = 0.150541, [24] [bootstrap]: 0.00051977 [type_inference]: 0.0106566 [event_method]: 4.339e-05 [auto_monad]: 0.00011395 [graph_reusing]: 8.25e-06 [inline]: 2.44999e-06 [add_attr]: 0.00322232, [1] [add_attr_with_inline]: 0.00321422, [1] [Cycle 1]: 6.947e-05, [2] [tag_attr]: 3.304e-05 [meta_addattr_fg_expand]: 8.38999e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 4.635e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0134553, [53] [py_interpret_to_execute]: 3.543e-05 [rewriter_before_opt_a]: 0.00012857 [opt_a]: 0.0111129, [3] [Cycle 1]: 0.00715235, [45] [expand_dump_flag]: 4.13999e-06 [switch_simplify]: 6.671e-05 [loop_unroll]: 5.545e-05 [a_1]: 0.00140404 [with_stream_mark]: 2.396e-05 [recompute_prepare]: 2.19e-05 [updatestate_depend_eliminate]: 9.10999e-06 [updatestate_assign_eliminate]: 7.8e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 2.60002e-06 [a_2]: 0.00024463 [accelerated_algorithm]: 3.014e-05 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.639e-05 [merge_send_recv]: 1.627e-05 [auto_parallel]: 1.07e-05 [parallel]: 1.853e-05 [flash_sp]: 1.098e-05 [merge_comm]: 9.98998e-06 [allreduce_fusion]: 8.79e-06 [matmul_add_comm_reduction]: 2.721e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.813e-05 [virtual_dataset]: 1.6e-05 [get_grad_eliminate_]: 1.563e-05 [virtual_output]: 1.551e-05 [merge_forward]: 9.34e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.731e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.88e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.7e-05 [set_forward_comm_id_for_comm_node_pass]: 9.56e-06 [meta_fg_expand]: 0.00140567 [flash_sp_send_recv_attached]: 3.68e-06 [receive_attached]: 2.49999e-06 [after_resolve]: 
5.973e-05 [a_after_grad]: 8.241e-05 [renormalize]: 0.00257928 [add_forward_monad_depend]: 9.71e-06 [auto_monad_grad]: 5.44e-06 [auto_monad_eliminator]: 5.74e-05 [cse]: 0.00017027 [a_3]: 0.00033505 [Cycle 2]: 0.00303745, [45] [expand_dump_flag]: 1.63997e-06 [switch_simplify]: 4.719e-05 [loop_unroll]: 4.45e-05 [a_1]: 0.00154811 [with_stream_mark]: 1.269e-05 [recompute_prepare]: 1.119e-05 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 4.65999e-06 [updatestate_loads_eliminate]: 3.75003e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012611 [accelerated_algorithm]: 1.185e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.99999e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.69001e-06 [auto_parallel]: 7.73999e-06 [parallel]: 5.17e-06 [flash_sp]: 3.06999e-06 [merge_comm]: 5.12999e-06 [allreduce_fusion]: 4.82998e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.083e-05 [virtual_dataset]: 9.26998e-06 [get_grad_eliminate_]: 8.89e-06 [virtual_output]: 8.63001e-06 [merge_forward]: 4.249e-05 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 9.83998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.707e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.416e-05 [set_forward_comm_id_for_comm_node_pass]: 5.56e-06 [meta_fg_expand]: 3.705e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.12999e-06 [after_resolve]: 1.492e-05 [a_after_grad]: 1.438e-05 [renormalize]: 0.00059637 [add_forward_monad_depend]: 4.31002e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.556e-05 [cse]: 4.615e-05 [a_3]: 6.486e-05 [Cycle 3]: 0.00090819, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 9.02999e-06 [a_1]: 0.00025226 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 9.52001e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 4e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012385 [accelerated_algorithm]: 1.188e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 7.11001e-06 [auto_parallel]: 7.51999e-06 [parallel]: 4.61002e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.90001e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.83001e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.50999e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.622e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.429e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 3.04001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.323e-05 [a_after_grad]: 1.51e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.23002e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 1.132e-05 [cse]: 2.649e-05 [a_3]: 5.919e-05 [py_interpret_to_execute_after_opt_a]: 1.069e-05 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 4.692e-05 [convert_after_rewriter]: 9.85002e-06 [order_py_execute_after_rewriter]: 6.86001e-06 [mutable_eliminate]: 0.0004737 [opt_b]: 0.00028971, [1] [Cycle 1]: 0.00028339, [7] [b_1]: 0.0001897 [b_2]: 1.104e-05 [updatestate_depend_eliminate]: 7.57998e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 3.96001e-06 [renormalize]: 4.19997e-07 [cse]: 3.195e-05 [optimize_parallel_all_gather_comm]: 1.985e-05 [overlap_param_gather]: 2.10002e-06 [cconv]: 2.08e-05 [loop_unroll]: 0.00042476 [opt_after_cconv]: 0.00013666, [1] [Cycle 1]: 0.00013049, [7] [c_1]: 4.872e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 7.38e-06 [updatestate_assign_eliminate]: 4.17998e-06 
[updatestate_loads_eliminate]: 4.08001e-06 [cse]: 2.994e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.939e-05 [tuple_transform]: 0.00010136, [1] [Cycle 1]: 9.64e-05, [4] [d_1]: 6.649e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.68997e-06 [partial_unused_args_eliminate]: 2.08002e-06 [add_recomputation]: 8.459e-05 [cse_after_recomputation]: 3.385e-05, [1] [Cycle 1]: 2.877e-05, [1] [cse]: 2.296e-05 [environ_conv]: 9.66998e-06 [swap_dp_allreduce_reducescatter]: 8.06001e-06 [bias_add_comm_swap]: 2.59999e-06 [label_micro_interleaved_index]: 4.43001e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.684e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.99e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 5.40001e-06 [overlap_grad_flash_sp]: 2.433e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 0.00014967, [1] [Cycle 1]: 0.00014496, [6] [build]: 1.038e-05 [elim_shapecalc]: 1.386e-05 [elim_not_effective]: 1.848e-05 [opt_reshape]: 1.009e-05 [fold_const_symbol]: 6.341e-05 [renormalize]: 
1.79978e-07 [detach_backward]: 2.02999e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.542e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00046481 [validate]: 4.678e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.121682 [execute]: 9.09e-06 Sums bootstrap : 0.000520s : 0.36% type_inference : 0.010657s : 7.30% event_method : 0.000043s : 0.03% auto_monad : 0.000114s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.02% optimize.rewriter_before_opt_a : 0.000129s : 0.09% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000109s : 0.07% optimize.opt_a.a_1 : 0.003204s : 2.19% optimize.opt_a.with_stream_mark : 0.000047s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000056s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001446s : 0.99% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000112s : 0.08% optimize.opt_a.renormalize : 0.003176s : 2.17% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.06% optimize.opt_a.cse : 0.000243s : 0.17% optimize.opt_a.a_3 : 0.000459s : 0.31% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000474s : 0.32% optimize.opt_b.b_1 : 0.000190s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000425s : 0.29% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000085s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000063s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000465s : 0.32% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.121682s : 83.32% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000747 218 5.87% : 0.000044s : 11: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.06% : 0.000412s : 16: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.81% : 0.000006s : 5: substitution.partial_eliminate 1.86% : 0.000014s : 20: substitution.remove_not_recompute_node 3.29% : 0.000025s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.72% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.56% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.48% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010585 2 87.25% : 0.009236s : 1: type_inference.infer 12.75% : 0.001349s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.94% : 0.000119s : 16: replace.inline 41.06% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000434 30 92.80% : 0.000403s : 16: match.inline 7.20% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.12% : 0.000008s : 67: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 1.64% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000019s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.24% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.74% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 67: predicate.reduce_eliminate 2.69% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.89% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.92% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.16% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001632 32 59.52% : 0.000971s : 12: func_graph_cloner_run.FuncGraphClonerGraph 40.48% : 0.000661s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.175628 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.84% : 0.003227s : 1: add_attr 1.83% : 0.003218s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.05% : 0.000089s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000121s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.32% : 0.000555s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000050s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000483s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.77% : 0.004861s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000102s : 4: opt.transform.symbol_engine_opt 6.33% : 0.011116s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.27% : 0.000474s : 1: opt_after_jit_grad 0.17% : 0.000293s : 1: opt_b 7.66% : 0.013460s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.97% : 0.001711s : 2: renormalize.infer 0.83% : 0.001452s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.08% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000152s : 1: symbol_engine_optimizer 69.30% : 0.121704s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.08% : 0.010674s : 1: type_inference 0.04% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x4-ge],max_mem:24.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x5-pynative],max_mem:24.0M TotalTime = 0.0222727, [24] [bootstrap]: 0.00055978 [type_inference]: 0.00644813 [event_method]: 1.434e-05 [auto_monad]: 5.699e-05 [graph_reusing]: 5.35001e-06 [inline]: 1.83002e-06 [add_attr]: 0.00352932, [1] [add_attr_with_inline]: 0.00351845, [1] [Cycle 1]: 4.558e-05, [2] [tag_attr]: 1.435e-05 [meta_addattr_fg_expand]: 4.08001e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.957e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00398432, [53] [py_interpret_to_execute]: 2.144e-05 [rewriter_before_opt_a]: 5.655e-05 [opt_a]: 0.0021456, [2] [Cycle 1]: 0.00153557, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 3.132e-05 [loop_unroll]: 2.101e-05 [a_1]: 0.00044833 [with_stream_mark]: 1.282e-05 [recompute_prepare]: 7.58999e-06 [updatestate_depend_eliminate]: 3.77002e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 7.55e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 7.84002e-06 [auto_parallel]: 5.61e-06 [parallel]: 2.272e-05 [flash_sp]: 6.76999e-06 [merge_comm]: 3.83999e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 9.37001e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 7.37002e-06 
[virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 4.13999e-06 [cell_reuse_recompute_pass]: 1.21997e-06 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.129e-05 [merge_recompute_call_nodes]: 1.97001e-06 [before_grad]: 8.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 2.36e-06 [after_resolve]: 1.05e-05 [a_after_grad]: 8.62998e-06 [renormalize]: 0.0004448 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.295e-05 [cse]: 2.736e-05 [a_3]: 4.005e-05 [Cycle 2]: 0.00060045, [45] [expand_dump_flag]: 9.69972e-07 [switch_simplify]: 6.75998e-06 [loop_unroll]: 5.27001e-06 [a_1]: 0.00013309 [with_stream_mark]: 9.86998e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.785e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 4.50999e-06 [auto_parallel]: 5.18002e-06 [parallel]: 4.26001e-06 [flash_sp]: 4e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.14998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.98998e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.55002e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 6.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.45001e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.56999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 
7.73999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.01997e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.37e-05 [a_3]: 3.144e-05 [py_interpret_to_execute_after_opt_a]: 7.16001e-06 [slice_cell_reuse_recomputed_activation]: 1.85001e-06 [rewriter_after_opt_a]: 2.861e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00045158 [opt_b]: 0.00018116, [1] [Cycle 1]: 0.00017474, [7] [b_1]: 0.00010667 [b_2]: 7.18e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.10015e-07 [cse]: 1.694e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.215e-05 [loop_unroll]: 0.00041472 [opt_after_cconv]: 9.381e-05, [1] [Cycle 1]: 8.837e-05, [7] [c_1]: 2.758e-05 [parameter_eliminate]: 2.02999e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.653e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.289e-05 [tuple_transform]: 6.85e-05, [1] [Cycle 1]: 6.422e-05, [4] [d_1]: 3.839e-05 [none_parameter_eliminate]: 1.81003e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.927e-05 [cse_after_recomputation]: 2.068e-05, [1] [Cycle 1]: 1.614e-05, [1] [cse]: 1.091e-05 [environ_conv]: 5.07e-06 [swap_dp_allreduce_reducescatter]: 5.71998e-06 [bias_add_comm_swap]: 3.11001e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 
1.19e-06 [add_comm_op_reuse_tag]: 1.24003e-06 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.34997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 4.05998e-06 [overlap_grad_flash_sp]: 1.714e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 6.728e-05, [1] [Cycle 1]: 6.298e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.118e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.85001e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 1.529e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 0.00012818 [opt_after_jit_grad]: 0.00045723 [validate]: 3.221e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0067849 [execute]: 7.71001e-06 Sums bootstrap : 0.000560s : 3.15% type_inference : 0.006448s : 36.28% event_method : 0.000014s : 0.08% auto_monad : 0.000057s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 
0.000057s : 0.32% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000581s : 3.27% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.81% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000016s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000445s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000041s : 0.23% optimize.opt_a.a_3 : 0.000071s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.54% optimize.opt_b.b_1 : 0.000107s : 0.60% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000415s : 2.33% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000128s : 0.72% opt_after_jit_grad : 0.000457s : 2.57% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006785s : 38.17% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.84% : 0.000024s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 3.33% : 0.000005s : 4: substitution.graph_param_transform 66.62% : 0.000109s : 3: substitution.inline 1.59% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.90% : 0.000005s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.33% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006398 2 90.70% : 0.005803s : 1: type_inference.infer 9.30% : 0.000595s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.14% : 0.000027s : 3: replace.inline 28.86% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.98% : 0.000107s : 3: match.inline 8.02% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 1131 0.81% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 5.74% : 0.000010s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_depend_swap 1.71% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.60% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.04% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.56% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.51% : 0.000002s : 16: predicate.partial_defer_inline 1.40% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.24% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.26% : 0.000000s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.83% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.29% : 0.000002s : 16: predicate.switch_defer_inline 1.93% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.77% : 0.000008s : 54: predicate.switch_simplify 0.77% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.24% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.55% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.22% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000377 8 45.50% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.50% : 0.000206s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031324 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.28% : 0.003534s : 1: add_attr 11.24% : 0.003522s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.91% : 0.000599s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.01% : 0.000944s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.86% : 0.002148s : 1: opt_a 0.31% : 0.000097s : 1: opt_after_cconv 1.49% : 0.000468s : 1: opt_after_jit_grad 0.59% : 0.000184s : 1: opt_b 12.73% : 0.003988s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.73% : 0.000229s : 1: renormalize.infer 0.67% : 0.000209s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.43% : 0.000133s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000033s : 1: rewriter_after_opt_a 0.19% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000070s : 1: symbol_engine_optimizer 21.69% : 0.006795s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.63% : 0.006462s : 1: type_inference 0.19% : 0.000060s : 1: validate TotalTime = 0.0182816, [24] [bootstrap]: 0.00049251 [type_inference]: 0.00440668 [event_method]: 1.036e-05 [auto_monad]: 4.849e-05 [graph_reusing]: 4.57e-06 [inline]: 1.78997e-06 [add_attr]: 0.00297269, [1] [add_attr_with_inline]: 0.00296507, [1] [Cycle 1]: 4.523e-05, [2] [tag_attr]: 1.189e-05 [meta_addattr_fg_expand]: 2.94999e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 2.22e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00369447, [53] [py_interpret_to_execute]: 1.478e-05 [rewriter_before_opt_a]: 3.838e-05 [opt_a]: 0.00189302, [2] [Cycle 1]: 0.00129376, [45] [expand_dump_flag]: 2.47001e-06 [switch_simplify]: 2.432e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00032029 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.26001e-06 [updatestate_depend_eliminate]: 3.53999e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.16999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.68e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 2.57001e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 7.25e-06 [auto_parallel]: 6.12999e-06 [parallel]: 1.78e-05 [flash_sp]: 6.91001e-06 [merge_comm]: 3.68999e-06 [allreduce_fusion]: 3.52997e-06 [matmul_add_comm_reduction]: 8.76997e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.61001e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.90002e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.33002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.48998e-06 
[receive_attached]: 2.32999e-06 [after_resolve]: 1.122e-05 [a_after_grad]: 9.14998e-06 [renormalize]: 0.00035232 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.81998e-06 [auto_monad_eliminator]: 1.286e-05 [cse]: 2.718e-05 [a_3]: 4.021e-05 [Cycle 2]: 0.00058972, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.49998e-06 [a_1]: 0.00012532 [with_stream_mark]: 1.062e-05 [recompute_prepare]: 5.75001e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 6.802e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.12e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.78998e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 5.88998e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 4.94003e-06 [virtual_output]: 4.78001e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.91e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.73999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.91997e-06 [a_after_grad]: 7.71001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.07001e-06 [cse]: 1.271e-05 [a_3]: 3.124e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.126e-05 [convert_after_rewriter]: 6.97002e-06 [order_py_execute_after_rewriter]: 5.33002e-06 [mutable_eliminate]: 0.00044803 [opt_b]: 0.00017807, 
[1] [Cycle 1]: 0.00017203, [7] [b_1]: 0.0001067 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.39991e-07 [cse]: 1.568e-05 [optimize_parallel_all_gather_comm]: 1.564e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.202e-05 [loop_unroll]: 0.00041435 [opt_after_cconv]: 9.463e-05, [1] [Cycle 1]: 8.887e-05, [7] [c_1]: 2.769e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.559e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.245e-05 [tuple_transform]: 6.96e-05, [1] [Cycle 1]: 6.489e-05, [4] [d_1]: 3.894e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.18998e-06 [partial_unused_args_eliminate]: 2.17001e-06 [add_recomputation]: 4.51e-05 [cse_after_recomputation]: 1.965e-05, [1] [Cycle 1]: 1.543e-05, [1] [cse]: 1.056e-05 [environ_conv]: 4.45999e-06 [swap_dp_allreduce_reducescatter]: 4.94e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.97e-06 [label_fine_grained_interleaved_index]: 2.43998e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.33998e-06 [reorder_send_recv_between_fp_bp]: 2.53003e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.116e-05 [grouped_pairwise_exchange_alltoall]: 2.04e-06 [offloading_packed_experts]: 3.88001e-06 [overlap_recompute_and_grad_model_parallel]: 4.71002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.749e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.24998e-06 [symbol_engine_optimizer]: 6.78e-05, [1] [Cycle 1]: 6.362e-05, [6] [build]: 2.36998e-06 [elim_shapecalc]: 8.16002e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00045 [validate]: 3.18e-05 [backend_pass]: 8.49977e-07 [task_emit]: 0.00588002 [execute]: 8.18001e-06 Sums bootstrap : 0.000493s : 3.44% type_inference : 0.004407s : 30.76% event_method : 0.000010s : 0.07% auto_monad : 0.000048s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000446s : 3.11% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000011s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000352s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.13% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000414s : 2.89% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.14% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005880s : 41.05% execute : 0.000008s : 0.06% Time group info: ------[substitution.] 0.000147 26 15.13% : 0.000022s : 4: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.71% : 0.000005s : 4: substitution.graph_param_transform 70.93% : 0.000104s : 2: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.22% : 0.000005s : 4: substitution.remove_not_recompute_node 3.01% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004366 2 91.99% : 0.004016s : 1: type_inference.infer 8.01% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000102 2 100.00% : 0.000102s : 2: match.inline ------[predicate.] 
0.000137 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.07% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.76% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.35% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.55% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.42% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.10% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.90% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026256 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.34% : 0.002977s : 1: add_attr 11.31% : 0.002968s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.19% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.20% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.01% : 0.000528s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.61% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.03% : 0.000797s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.22% : 0.001896s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.87% : 0.000492s : 1: opt_after_jit_grad 0.69% : 0.000181s : 1: opt_b 14.08% : 0.003698s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.75% : 0.000196s : 1: renormalize.infer 0.57% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.43% : 0.005890s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.83% : 0.004420s : 1: type_inference 0.22% : 0.000059s : 1: validate TotalTime = 0.0261317, [24] [bootstrap]: 0.00671916 [type_inference]: 0.00562387 [event_method]: 1.436e-05 [auto_monad]: 5.377e-05 [graph_reusing]: 5.11002e-06 [inline]: 1.79e-06 [add_attr]: 0.00297507, [1] [add_attr_with_inline]: 0.00296712, [1] [Cycle 1]: 4.626e-05, [2] [tag_attr]: 1.509e-05 [meta_addattr_fg_expand]: 4.24002e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.473e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00397109, [53] [py_interpret_to_execute]: 2.109e-05 [rewriter_before_opt_a]: 5.904e-05 [opt_a]: 0.00213652, [2] [Cycle 1]: 0.00153622, [45] [expand_dump_flag]: 2.68998e-06 [switch_simplify]: 3.074e-05 [loop_unroll]: 2.092e-05 [a_1]: 0.0004466 [with_stream_mark]: 1.31e-05 [recompute_prepare]: 7.93001e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 7.54e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 8.24998e-06 [auto_parallel]: 6.54001e-06 [parallel]: 1.804e-05 [flash_sp]: 6.72002e-06 [merge_comm]: 4.38001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.62e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.01999e-06 [virtual_dataset]: 6.00002e-06 [get_grad_eliminate_]: 5.86e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 8.95001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.132e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 9.76e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.74999e-06 [receive_attached]: 2.64999e-06 
[after_resolve]: 1.045e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.00041611 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.68997e-06 [auto_monad_eliminator]: 1.389e-05 [cse]: 2.727e-05 [a_3]: 4.076e-05 [Cycle 2]: 0.00059091, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.48002e-06 [a_1]: 0.00012476 [with_stream_mark]: 9.54e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.739e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.39998e-06 [parallel]: 4.28001e-06 [flash_sp]: 2.94999e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.70997e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64999e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.84e-06 [a_after_grad]: 7.9e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.04001e-06 [cse]: 1.381e-05 [a_3]: 3.171e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.244e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.0004536 [opt_b]: 0.00017996, [1] [Cycle 1]: 0.0001737, [7] [b_1]: 
0.0001061 [b_2]: 6.75998e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.09e-06 [renormalize]: 3.19997e-07 [cse]: 1.699e-05 [optimize_parallel_all_gather_comm]: 1.588e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00041444 [opt_after_cconv]: 9.552e-05, [1] [Cycle 1]: 8.961e-05, [7] [c_1]: 2.801e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.533e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.84e-05, [1] [Cycle 1]: 6.378e-05, [4] [d_1]: 3.825e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.97999e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.41e-05 [cse_after_recomputation]: 2.034e-05, [1] [Cycle 1]: 1.598e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.58001e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 1.20001e-06 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.07e-06 [interleave_parallel_branches]: 1.33002e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66998e-06 [control_data_broadcast_order]: 1.171e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.51998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 4.3e-06 [overlap_grad_flash_sp]: 1.729e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 2.14999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.583e-05, [1] [Cycle 1]: 6.174e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.01001e-06 [elim_not_effective]: 1.068e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.54e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.57001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 9.29984e-07 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00044658 [validate]: 3.064e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00603051 [execute]: 7.13e-06 Sums bootstrap : 0.006719s : 30.31% type_inference : 0.005624s : 25.37% event_method : 0.000014s : 0.06% auto_monad : 0.000054s : 0.24% graph_reusing : 0.000005s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.11% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.10% optimize.rewriter_before_opt_a : 0.000059s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.17% optimize.opt_a.loop_unroll : 0.000026s : 0.12% optimize.opt_a.a_1 : 0.000571s : 2.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.10% optimize.opt_a.recompute_prepare : 0.000014s : 0.06% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.02% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.64% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.05% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.05% optimize.opt_a.merge_send_recv : 0.000013s : 0.06% optimize.opt_a.auto_parallel : 0.000012s : 0.05% optimize.opt_a.parallel : 0.000022s : 0.10% optimize.opt_a.flash_sp : 0.000010s : 0.04% optimize.opt_a.merge_comm : 0.000007s : 0.03% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.06% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.06% optimize.opt_a.virtual_dataset : 0.000011s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.07% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.09% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.08% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.09% optimize.opt_a.a_after_grad : 0.000017s : 0.08% optimize.opt_a.renormalize : 0.000416s : 1.88% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.02% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.09% optimize.opt_a.cse : 0.000041s : 0.19% optimize.opt_a.a_3 : 0.000072s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.03% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.15% optimize.convert_after_rewriter : 0.000007s : 0.03% optimize.order_py_execute_after_rewriter : 0.000005s : 0.02% optimize.mutable_eliminate : 0.000454s : 2.05% optimize.opt_b.b_1 : 0.000106s : 0.48% optimize.opt_b.b_2 : 0.000007s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.10% optimize.loop_unroll : 0.000414s : 1.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.07% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.06% optimize.tuple_transform.d_1 : 0.000038s : 0.17% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.20% optimize.cse_after_recomputation.cse : 0.000011s : 0.05% optimize.environ_conv : 0.000005s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% 
optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000447s : 2.01% validate : 0.000031s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.006031s : 27.20% execute : 0.000007s : 0.03% Time group info: ------[substitution.] 0.000162 30 14.83% : 0.000024s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.51% : 0.000107s : 3: substitution.inline 1.93% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.80% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005583 2 88.89% : 0.004963s : 1: type_inference.infer 11.11% : 0.000620s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.88% : 0.000027s : 3: replace.inline 30.12% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.41% : 0.000105s : 3: match.inline 8.59% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 1.10% : 0.000002s : 11: predicate.accumulaten_eliminater 1.07% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.48% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.60% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.48% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.31% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 45.94% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.06% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.034585 196 0.01% : 0.000004s : 1: ForceFp32Comm 8.61% : 0.002979s : 1: add_attr 8.59% : 0.002971s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.14% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.17% : 0.000059s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 19.53% : 0.006755s : 1: bootstrap 0.07% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.03% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.22% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.34% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.71% : 0.000937s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.06% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000088s : 28: opt.transform.opt_b 0.12% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000030s : 4: opt.transform.symbol_engine_opt 6.19% : 0.002139s : 1: opt_a 0.29% : 0.000099s : 1: opt_after_cconv 1.32% : 0.000456s : 1: opt_after_jit_grad 0.53% : 0.000183s : 1: opt_b 11.49% : 0.003975s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000029s : 1: pre_auto_parallel 0.07% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.62% : 0.000213s : 1: renormalize.infer 0.57% : 0.000196s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000036s : 1: rewriter_after_opt_a 0.18% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000068s : 1: symbol_engine_optimizer 17.46% : 0.006040s : 1: task_emit 0.21% : 0.000071s : 1: 
tuple_transform 16.30% : 0.005638s : 1: type_inference 0.16% : 0.000057s : 1: validate TotalTime = 0.0377292, [24] [bootstrap]: 0.00050925 [type_inference]: 0.011488 [event_method]: 4.815e-05 [auto_monad]: 0.00011926 [graph_reusing]: 8.13001e-06 [inline]: 1.99999e-06 [add_attr]: 0.00304751, [1] [add_attr_with_inline]: 0.00303842, [1] [Cycle 1]: 7.215e-05, [2] [tag_attr]: 3.526e-05 [meta_addattr_fg_expand]: 9.49e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 5.028e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.0133812, [53] [py_interpret_to_execute]: 3.828e-05 [rewriter_before_opt_a]: 0.00014622 [opt_a]: 0.0110953, [3] [Cycle 1]: 0.00716961, [45] [expand_dump_flag]: 3.41001e-06 [switch_simplify]: 7.321e-05 [loop_unroll]: 6.24e-05 [a_1]: 0.00148864 [with_stream_mark]: 2.297e-05 [recompute_prepare]: 2.179e-05 [updatestate_depend_eliminate]: 9.14e-06 [updatestate_assign_eliminate]: 7.92e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 2.66e-06 [a_2]: 0.0002439 [accelerated_algorithm]: 3.082e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.4e-06 [shard_inline]: 1.607e-05 [merge_send_recv]: 1.533e-05 [auto_parallel]: 1.098e-05 [parallel]: 1.858e-05 [flash_sp]: 1.101e-05 [merge_comm]: 9.67999e-06 [allreduce_fusion]: 9.20001e-06 [matmul_add_comm_reduction]: 2.635e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.793e-05 [virtual_dataset]: 1.596e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.509e-05 [merge_forward]: 9.52999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.773e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.876e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.768e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61998e-06 [meta_fg_expand]: 0.00140969 [flash_sp_send_recv_attached]: 3.82002e-06 [receive_attached]: 2.51e-06 [after_resolve]: 5.913e-05 
[a_after_grad]: 8.204e-05 [renormalize]: 0.00250971 [add_forward_monad_depend]: 8.95999e-06 [auto_monad_grad]: 4.85001e-06 [auto_monad_eliminator]: 5.538e-05 [cse]: 0.0001643 [a_3]: 0.00033447 [Cycle 2]: 0.00301703, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.691e-05 [loop_unroll]: 4.337e-05 [a_1]: 0.00151892 [with_stream_mark]: 1.254e-05 [recompute_prepare]: 1.097e-05 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 3.64002e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.0001254 [accelerated_algorithm]: 1.196e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 2.41e-06 [shard_inline]: 9.15001e-06 [merge_send_recv]: 6.78998e-06 [auto_parallel]: 7.72998e-06 [parallel]: 5.05999e-06 [flash_sp]: 3.20998e-06 [merge_comm]: 5.20001e-06 [allreduce_fusion]: 5.12e-06 [matmul_add_comm_reduction]: 8.18001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 9.89001e-06 [virtual_dataset]: 8.92999e-06 [get_grad_eliminate_]: 8.68001e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.616e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.415e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 9.072e-05 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.24e-06 [after_resolve]: 1.659e-05 [a_after_grad]: 1.486e-05 [renormalize]: 0.0005961 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.456e-05 [cse]: 4.6e-05 [a_3]: 6.382e-05 [Cycle 3]: 0.00089445, [45] [expand_dump_flag]: 1.25001e-06 [switch_simplify]: 1.046e-05 [loop_unroll]: 8.83001e-06 [a_1]: 0.00024857 [with_stream_mark]: 9.85002e-06 [recompute_prepare]: 9.53997e-06 [updatestate_depend_eliminate]: 4.88001e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.86001e-06 
[parameter_eliminate]: 8.2e-07 [a_2]: 0.00012311 [accelerated_algorithm]: 1.152e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.71002e-06 [shard_inline]: 8.94998e-06 [merge_send_recv]: 7.28999e-06 [auto_parallel]: 7.30998e-06 [parallel]: 4.50001e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 5.19e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 7.81001e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.79e-06 [get_grad_eliminate_]: 8.43999e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 4.32e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 8.57e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.428e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.309e-05 [a_after_grad]: 1.408e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 1.067e-05 [cse]: 2.532e-05 [a_3]: 5.704e-05 [py_interpret_to_execute_after_opt_a]: 1.135e-05 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 4.965e-05 [convert_after_rewriter]: 9.72001e-06 [order_py_execute_after_rewriter]: 6.84001e-06 [mutable_eliminate]: 0.00047763 [opt_b]: 0.0002884, [1] [Cycle 1]: 0.0002816, [7] [b_1]: 0.00018959 [b_2]: 1.082e-05 [updatestate_depend_eliminate]: 7.36999e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.9e-06 [renormalize]: 4.39992e-07 [cse]: 3.073e-05 [optimize_parallel_all_gather_comm]: 2.007e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.063e-05 [loop_unroll]: 0.00042264 [opt_after_cconv]: 0.00013386, [1] [Cycle 1]: 0.00012784, [7] [c_1]: 4.713e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 6.97002e-06 [updatestate_assign_eliminate]: 4.1e-06 
[updatestate_loads_eliminate]: 3.75e-06 [cse]: 2.946e-05 [renormalize]: 5.59987e-07 [remove_dup_value]: 2.953e-05 [tuple_transform]: 0.00010277, [1] [Cycle 1]: 9.816e-05, [4] [d_1]: 6.797e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.87999e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 5.817e-05 [cse_after_recomputation]: 3.167e-05, [1] [Cycle 1]: 2.706e-05, [1] [cse]: 2.158e-05 [environ_conv]: 8.65001e-06 [swap_dp_allreduce_reducescatter]: 7.73001e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.48002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.85002e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.07e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91998e-06 [control_data_broadcast_order]: 1.624e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.12999e-06 [overlap_recompute_and_grad_model_parallel]: 5.95002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 5.34e-06 [overlap_grad_flash_sp]: 2.564e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 9.671e-05, [1] [Cycle 1]: 9.264e-05, [6] [build]: 9.77001e-06 [elim_shapecalc]: 1.276e-05 [elim_not_effective]: 1.776e-05 [opt_reshape]: 9.90002e-06 [fold_const_symbol]: 1.479e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 2.21e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 2.476e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.10998e-06 [opt_after_jit_grad]: 0.00049126 [validate]: 4.509e-05 [backend_pass]: 1.19003e-06 [task_emit]: 0.00828174 [execute]: 7.16999e-06 Sums bootstrap : 0.000509s : 1.52% type_inference : 0.011488s : 34.36% event_method : 0.000048s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.34% optimize.opt_a.a_1 : 0.003256s : 9.74% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s 
: 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001503s : 4.50% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000111s : 0.33% optimize.opt_a.renormalize : 0.003106s : 9.29% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000236s : 0.70% optimize.opt_a.a_3 : 0.000455s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000478s : 1.43% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000423s : 1.26% optimize.opt_after_cconv.c_1 : 0.000047s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000026s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000491s : 1.47% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008282s : 24.77% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000807 222 5.64% : 0.000045s : 12: substitution.arithmetic_simplify 1.77% : 0.000014s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.50% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 52.57% : 0.000424s : 17: substitution.inline 1.99% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.87% : 0.000015s : 3: substitution.less_batch_normalization 7.13% : 0.000058s : 11: substitution.minmaximum_grad 0.71% : 0.000006s : 5: substitution.partial_eliminate 1.67% : 0.000013s : 20: substitution.remove_not_recompute_node 2.95% : 0.000024s : 10: substitution.replace_applicator 1.27% : 0.000010s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.36% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.66% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.16% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.31% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011413 2 87.12% : 0.009943s : 1: type_inference.infer 12.88% : 0.001470s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.12% : 0.000124s : 17: replace.inline 42.88% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 33 92.45% : 0.000415s : 17: match.inline 7.55% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.16% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.18% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.23% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000037s : 277: predicate.switch_simplify 1.11% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001583 34 57.22% : 0.000906s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.78% : 0.000677s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062499 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.88% : 0.003052s : 1: add_attr 4.87% : 0.003043s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.87% : 0.000543s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000019s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000487s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.86% : 0.004913s : 117: opt.transform.opt_a 0.07% : 0.000046s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.76% : 0.011098s : 1: opt_a 0.22% : 0.000137s : 1: opt_after_cconv 0.80% : 0.000501s : 1: opt_after_jit_grad 0.47% : 0.000292s : 1: opt_b 21.42% : 0.013385s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.74% : 0.001712s : 2: renormalize.infer 2.21% : 0.001380s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000099s : 1: symbol_engine_optimizer 13.27% : 0.008291s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.41% : 0.011503s : 1: type_inference 0.12% : 0.000078s : 1: validate TotalTime = 0.0203601, [24] [bootstrap]: 0.00046628 [type_inference]: 0.00431647 [event_method]: 1.071e-05 [auto_monad]: 7.762e-05 [graph_reusing]: 5.12999e-06 [inline]: 2.21998e-06 [add_attr]: 0.00297027, [1] [add_attr_with_inline]: 0.00296245, [1] [Cycle 1]: 4.611e-05, [2] [tag_attr]: 1.207e-05 [meta_addattr_fg_expand]: 3.10998e-06 [parallel-infer-symbol]: 3.2e-06 [pre_auto_parallel]: 2.159e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.59e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00371162, [53] [py_interpret_to_execute]: 1.524e-05 [rewriter_before_opt_a]: 3.882e-05 [opt_a]: 0.0019085, [2] [Cycle 1]: 0.00130752, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 2.401e-05 [loop_unroll]: 1.383e-05 [a_1]: 0.00029443 [with_stream_mark]: 1.332e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 3.66999e-06 [updatestate_assign_eliminate]: 3.04999e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.716e-05 [accelerated_algorithm]: 6.64999e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 5.86998e-06 [merge_send_recv]: 7.68999e-06 [auto_parallel]: 5.99e-06 [parallel]: 1.712e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 8.60999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 6.02999e-06 [get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.76e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.78002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.131e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.36002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.28998e-06 [receive_attached]: 2.31e-06 
[after_resolve]: 1.057e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00039295 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.292e-05 [cse]: 2.787e-05 [a_3]: 3.98e-05 [Cycle 2]: 0.00059142, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.19e-06 [a_1]: 0.00012518 [with_stream_mark]: 1.079e-05 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.794e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.03002e-06 [parallel]: 4.35999e-06 [flash_sp]: 3.45e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.01e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 9.00999e-06 [a_after_grad]: 7.77e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.30002e-06 [cse]: 1.275e-05 [a_3]: 3.215e-05 [py_interpret_to_execute_after_opt_a]: 6.98e-06 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 3.037e-05 [convert_after_rewriter]: 6.74999e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00045108 [opt_b]: 0.00018326, [1] [Cycle 1]: 0.00017695, [7] [b_1]: 
0.00010824 [b_2]: 7.42998e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.11e-06 [renormalize]: 3.50003e-07 [cse]: 1.665e-05 [optimize_parallel_all_gather_comm]: 1.559e-05 [overlap_param_gather]: 1.67999e-06 [cconv]: 2.172e-05 [loop_unroll]: 0.00041422 [opt_after_cconv]: 9.442e-05, [1] [Cycle 1]: 8.887e-05, [7] [c_1]: 2.783e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.566e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.297e-05 [tuple_transform]: 6.872e-05, [1] [Cycle 1]: 6.446e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.267e-05 [cse_after_recomputation]: 1.97e-05, [1] [Cycle 1]: 1.553e-05, [1] [cse]: 1.035e-05 [environ_conv]: 4.60001e-06 [swap_dp_allreduce_reducescatter]: 4.90001e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.31998e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.14999e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.55999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.37e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.28999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.96e-06 [overlap_grad_ring_attention]: 4.09002e-06 [overlap_grad_flash_sp]: 1.723e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.88997e-06 [split_layernorm_comm]: 1.78997e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 6.777e-05, [1] [Cycle 1]: 6.381e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 8.56002e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 6.00002e-06 [fold_const_symbol]: 8.99998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.531e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044819 [validate]: 3.205e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00806291 [execute]: 7.78001e-06 Sums bootstrap : 0.000466s : 2.84% type_inference : 0.004316s : 26.26% event_method : 0.000011s : 0.07% auto_monad : 0.000078s : 0.47% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.13% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.09% optimize.rewriter_before_opt_a : 0.000039s : 0.24% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.19% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000420s : 2.55% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000393s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000041s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.74% optimize.opt_b.b_1 : 0.000108s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000414s : 2.52% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.26% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 2.73% validate : 0.000032s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.008063s : 49.06% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000123 26 18.08% : 0.000022s : 4: substitution.arithmetic_simplify 1.40% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.62% : 0.000006s : 4: substitution.graph_param_transform 65.57% : 0.000080s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.69% : 0.000005s : 4: substitution.remove_not_recompute_node 3.12% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004276 2 91.87% : 0.003928s : 1: type_inference.infer 8.13% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.52% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.96% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_set_eliminate 1.00% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.18% : 0.000002s : 8: predicate.less_batch_normalization 1.50% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.97% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 1.06% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 41.31% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.69% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028365 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.49% : 0.002975s : 1: add_attr 10.46% : 0.002966s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.29% : 0.000083s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.76% : 0.000500s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.62% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.72% : 0.000770s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.74% : 0.001912s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.61% : 0.000458s : 1: opt_after_jit_grad 0.66% : 0.000187s : 1: opt_b 13.10% : 0.003715s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.83% : 0.000235s : 1: renormalize.infer 0.53% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.15% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 28.46% : 0.008073s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 15.27% : 0.004330s : 1: type_inference 0.21% : 0.000060s : 1: validate TotalTime = 0.0364869, [24] [bootstrap]: 0.00053102 [type_inference]: 0.0104568 [event_method]: 4.107e-05 [auto_monad]: 0.00011385 [graph_reusing]: 7.88001e-06 [inline]: 2.04999e-06 [add_attr]: 0.00306517, [1] [add_attr_with_inline]: 0.0030574, [1] [Cycle 1]: 6.8e-05, [2] [tag_attr]: 3.124e-05 [meta_addattr_fg_expand]: 8.60001e-06 [parallel-infer-symbol]: 2.93998e-06 [pre_auto_parallel]: 4.603e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0130973, [53] [py_interpret_to_execute]: 3.552e-05 [rewriter_before_opt_a]: 0.00012672 [opt_a]: 0.0108525, [3] [Cycle 1]: 0.00695283, [45] [expand_dump_flag]: 3.76001e-06 [switch_simplify]: 6.587e-05 [loop_unroll]: 5.543e-05 [a_1]: 0.0013367 [with_stream_mark]: 4.862e-05 [recompute_prepare]: 2.253e-05 [updatestate_depend_eliminate]: 9.57999e-06 [updatestate_assign_eliminate]: 8.08999e-06 [updatestate_loads_eliminate]: 7.56001e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024593 [accelerated_algorithm]: 3.153e-05 [shard]: 2.11998e-06 [meta_shard_fg_expand]: 3.26999e-06 [shard_inline]: 1.604e-05 [merge_send_recv]: 1.637e-05 [auto_parallel]: 1.072e-05 [parallel]: 1.827e-05 [flash_sp]: 1.117e-05 [merge_comm]: 9.67001e-06 [allreduce_fusion]: 9.07999e-06 [matmul_add_comm_reduction]: 2.707e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.8e-05 [virtual_dataset]: 1.547e-05 [get_grad_eliminate_]: 1.515e-05 [virtual_output]: 1.497e-05 [merge_forward]: 9.58002e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.723e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.862e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 2.694e-05 [set_forward_comm_id_for_comm_node_pass]: 9.45001e-06 [meta_fg_expand]: 0.00139006 [flash_sp_send_recv_attached]: 3.86999e-06 [receive_attached]: 2.27001e-06 
[after_resolve]: 5.9e-05 [a_after_grad]: 8.147e-05 [renormalize]: 0.00245217 [add_forward_monad_depend]: 8.99e-06 [auto_monad_grad]: 5.47001e-06 [auto_monad_eliminator]: 5.589e-05 [cse]: 0.00016117 [a_3]: 0.00033583 [Cycle 2]: 0.00298902, [45] [expand_dump_flag]: 1.84998e-06 [switch_simplify]: 4.71e-05 [loop_unroll]: 4.38e-05 [a_1]: 0.00153342 [with_stream_mark]: 1.284e-05 [recompute_prepare]: 1.114e-05 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 4.67e-06 [updatestate_loads_eliminate]: 3.78001e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 0.0001264 [accelerated_algorithm]: 1.179e-05 [shard]: 1.36998e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.53e-06 [parallel]: 5.27001e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.85999e-06 [matmul_add_comm_reduction]: 7.92998e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.034e-05 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 9.12001e-06 [virtual_output]: 8.64998e-06 [merge_forward]: 4.38999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.06998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.648e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.399e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20001e-06 [meta_fg_expand]: 3.622e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.536e-05 [a_after_grad]: 1.432e-05 [renormalize]: 0.00060855 [add_forward_monad_depend]: 3.86999e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 1.521e-05 [cse]: 4.586e-05 [a_3]: 6.521e-05 [Cycle 3]: 0.00089569, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 1.076e-05 [loop_unroll]: 9.12001e-06 [a_1]: 0.00024797 [with_stream_mark]: 9.88998e-06 [recompute_prepare]: 9.05001e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.9e-06 
[parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012272 [accelerated_algorithm]: 1.144e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.96998e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.07002e-06 [parallel]: 4.38999e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.90999e-06 [allreduce_fusion]: 5.17e-06 [matmul_add_comm_reduction]: 7.56999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.008e-05 [virtual_dataset]: 8.65001e-06 [get_grad_eliminate_]: 8.48999e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 8.58001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.616e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.369e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12999e-06 [meta_fg_expand]: 3.01999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.319e-05 [a_after_grad]: 1.413e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 1.02998e-06 [auto_monad_eliminator]: 1.135e-05 [cse]: 2.577e-05 [a_3]: 5.938e-05 [py_interpret_to_execute_after_opt_a]: 1.073e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 4.773e-05 [convert_after_rewriter]: 9.18002e-06 [order_py_execute_after_rewriter]: 6.86999e-06 [mutable_eliminate]: 0.00047352 [opt_b]: 0.00028566, [1] [Cycle 1]: 0.00027892, [7] [b_1]: 0.00018686 [b_2]: 1.099e-05 [updatestate_depend_eliminate]: 7.30998e-06 [updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 4.01001e-06 [renormalize]: 3.50003e-07 [cse]: 3.119e-05 [optimize_parallel_all_gather_comm]: 2.042e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 1.925e-05 [loop_unroll]: 0.00041978 [opt_after_cconv]: 0.00013482, [1] [Cycle 1]: 0.00012868, [7] [c_1]: 4.844e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.15e-06 
[updatestate_loads_eliminate]: 3.79002e-06 [cse]: 2.891e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.837e-05 [tuple_transform]: 0.00010181, [1] [Cycle 1]: 9.728e-05, [4] [d_1]: 6.678e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.002e-05 [partial_unused_args_eliminate]: 2.11e-06 [add_recomputation]: 5.811e-05 [cse_after_recomputation]: 3.062e-05, [1] [Cycle 1]: 2.59e-05, [1] [cse]: 2.056e-05 [environ_conv]: 8.55001e-06 [swap_dp_allreduce_reducescatter]: 7.51999e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.53998e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.669e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 5.00001e-06 [overlap_recompute_and_grad_model_parallel]: 5.61003e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 5.30999e-06 [overlap_grad_flash_sp]: 2.432e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 9.905e-05, [1] [Cycle 1]: 9.487e-05, [6] [build]: 9.70002e-06 [elim_shapecalc]: 1.346e-05 [elim_not_effective]: 1.849e-05 [opt_reshape]: 9.92999e-06 [fold_const_symbol]: 1.477e-05 [renormalize]: 
2.3999e-07 [detach_backward]: 2.30002e-06 [pipeline_parallel_scheduler]: 1.46998e-06 [auto_monad_reorder]: 2.457e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00046628 [validate]: 0.00010026 [backend_pass]: 9.30013e-07 [task_emit]: 0.00829968 [execute]: 7.85e-06 Sums bootstrap : 0.000531s : 1.65% type_inference : 0.010457s : 32.51% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.39% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.38% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003118s : 9.69% optimize.opt_a.with_stream_mark : 0.000071s : 0.22% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm 
: 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001429s : 4.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.003061s : 9.51% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.26% optimize.opt_a.cse : 0.000233s : 0.72% optimize.opt_a.a_3 : 0.000460s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000474s : 1.47% optimize.opt_b.b_1 : 0.000187s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000420s : 1.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000466s : 1.45% validate : 0.000100s : 0.31% backend_pass : 0.000001s : 0.00% task_emit : 0.008300s : 25.80% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000734 218 5.79% : 0.000043s : 11: substitution.arithmetic_simplify 1.91% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000002s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.03% : 0.000404s : 16: substitution.inline 2.23% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.11% : 0.000016s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 20: substitution.remove_not_recompute_node 3.25% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.38% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.42% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010389 2 87.61% : 0.009102s : 1: type_inference.infer 12.39% : 0.001287s : 1: type_inference.specialize ------[replace.] 0.000202 30 59.63% : 0.000121s : 16: replace.inline 40.37% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000426 30 92.86% : 0.000395s : 16: match.inline 7.14% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000731 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.14% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.17% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000041s : 244: predicate.inline 1.29% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.94% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000014s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.30% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001528 32 57.55% : 0.000879s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.45% : 0.000648s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060802 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.05% : 0.003069s : 1: add_attr 5.03% : 0.003061s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.93% : 0.000565s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000483s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.84% : 0.004767s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.85% : 0.010856s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000476s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.55% : 0.013101s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.66% : 0.001617s : 2: renormalize.infer 2.35% : 0.001431s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000102s : 1: symbol_engine_optimizer 13.67% : 0.008309s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 17.22% : 0.010472s : 1: type_inference 0.22% : 0.000134s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x5-kbk],max_mem:24.0M TotalTime = 0.120463, [24] [bootstrap]: 0.00052466 [type_inference]: 0.00605278 [event_method]: 1.455e-05 [auto_monad]: 0.00010259 [graph_reusing]: 5.44e-06 [inline]: 1.89e-06 [add_attr]: 0.00346026, [1] [add_attr_with_inline]: 0.00344988, [1] [Cycle 1]: 4.568e-05, [2] [tag_attr]: 1.549e-05 [meta_addattr_fg_expand]: 4.14002e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 2.912e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00401063, [53] [py_interpret_to_execute]: 1.98e-05 [rewriter_before_opt_a]: 5.886e-05 [opt_a]: 0.00216533, [2] [Cycle 1]: 0.00156976, [45] [expand_dump_flag]: 3.07002e-06 [switch_simplify]: 3.207e-05 [loop_unroll]: 2.073e-05 [a_1]: 0.00049544 [with_stream_mark]: 1.444e-05 [recompute_prepare]: 8.17998e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.71999e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.538e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 1.84998e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 6.11e-06 [parallel]: 2.336e-05 [flash_sp]: 7.06001e-06 [merge_comm]: 3.91999e-06 [allreduce_fusion]: 3.63999e-06 [matmul_add_comm_reduction]: 8.58001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.77002e-06 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 8.90001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 
[merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 9.57001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.85e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.61e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.051e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00042645 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.301e-05 [cse]: 2.74e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00058637, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012426 [with_stream_mark]: 9.72001e-06 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 7.105e-05 [accelerated_algorithm]: 5.37001e-06 [shard]: 1.11997e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.23002e-06 [parallel]: 4.63001e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.56e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 5.61998e-06 [virtual_dataset]: 5.08002e-06 [get_grad_eliminate_]: 4.77e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 5.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.26002e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.85999e-06 [a_after_grad]: 7.92e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.20001e-07 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 6.64999e-06 [cse]: 1.252e-05 [a_3]: 3.114e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 
1.74998e-06 [rewriter_after_opt_a]: 3.043e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00045394 [opt_b]: 0.00018119, [1] [Cycle 1]: 0.00017498, [7] [b_1]: 0.00010476 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.11e-06 [renormalize]: 3.50003e-07 [cse]: 2.004e-05 [optimize_parallel_all_gather_comm]: 1.669e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 2.232e-05 [loop_unroll]: 0.00041698 [opt_after_cconv]: 9.444e-05, [1] [Cycle 1]: 8.868e-05, [7] [c_1]: 2.795e-05 [parameter_eliminate]: 2.43998e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.58e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.228e-05 [tuple_transform]: 6.918e-05, [1] [Cycle 1]: 6.492e-05, [4] [d_1]: 3.84e-05 [none_parameter_eliminate]: 1.37999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.55997e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 5.017e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.61e-05, [1] [cse]: 1.093e-05 [environ_conv]: 4.57e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.09002e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.71e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.34e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.164e-05 
[grouped_pairwise_exchange_alltoall]: 1.70001e-06 [offloading_packed_experts]: 3.38999e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.658e-05, [1] [Cycle 1]: 6.26e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.01001e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 8.60001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.471e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.2e-06 [opt_after_jit_grad]: 0.00045331 [validate]: 3.2e-05 [backend_pass]: 8.10018e-07 [task_emit]: 0.105521 [execute]: 8.92e-06 Sums bootstrap : 0.000525s : 0.45% type_inference : 0.006053s : 5.22% event_method : 0.000015s : 0.01% auto_monad : 0.000103s : 0.09% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000620s : 0.53% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000427s : 0.37% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000454s : 0.39% optimize.opt_b.b_1 : 0.000105s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000417s : 0.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.39% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.105521s : 90.94% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 15.16% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.88% : 0.000111s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000004s : 4: substitution.remove_not_recompute_node 2.58% : 0.000004s : 4: substitution.replace_old_param 6.33% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006008 2 90.78% : 0.005454s : 1: type_inference.infer 9.22% : 0.000554s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.18% : 0.000027s : 3: replace.inline 29.82% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.97% : 0.000109s : 3: match.inline 8.03% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.18% : 0.000002s : 15: predicate.environ_get_depend_swap 1.99% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.85% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.64% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000002s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 1.14% : 0.000002s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 46.13% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.87% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.129487 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.68% : 0.003464s : 1: add_attr 2.67% : 0.003453s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000109s : 1: auto_monad 0.01% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000565s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000464s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.76% : 0.000982s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000087s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.67% : 0.002168s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.36% : 0.000463s : 1: opt_after_jit_grad 0.14% : 0.000184s : 1: opt_b 3.10% : 0.004014s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000219s : 1: renormalize.infer 0.15% : 0.000200s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000069s : 1: symbol_engine_optimizer 81.51% : 0.105543s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.68% : 0.006066s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.110319, [24] [bootstrap]: 0.00050555 [type_inference]: 0.00459847 [event_method]: 1.062e-05 [auto_monad]: 0.00010153 [graph_reusing]: 5.39998e-06 [inline]: 2.62001e-06 [add_attr]: 0.00309261, [1] [add_attr_with_inline]: 0.00308458, [1] [Cycle 1]: 4.669e-05, [2] [tag_attr]: 1.171e-05 [meta_addattr_fg_expand]: 3.33998e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.181e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.60001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0037618, [53] [py_interpret_to_execute]: 1.557e-05 [rewriter_before_opt_a]: 3.984e-05 [opt_a]: 0.00191411, [2] [Cycle 1]: 0.00131024, [45] [expand_dump_flag]: 3.11999e-06 [switch_simplify]: 2.347e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.00028997 [with_stream_mark]: 1.386e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.50999e-06 [a_2]: 7.653e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.45e-06 [auto_parallel]: 6.26e-06 [parallel]: 1.714e-05 [flash_sp]: 7.02002e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.19998e-06 [allreduce_slice_to_reducescatter]: 8.79983e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.122e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.57001e-06 
[after_resolve]: 1.031e-05 [a_after_grad]: 9.09e-06 [renormalize]: 0.00036494 [add_forward_monad_depend]: 4.99998e-06 [auto_monad_grad]: 1.83002e-06 [auto_monad_eliminator]: 1.343e-05 [cse]: 2.705e-05 [a_3]: 4.404e-05 [Cycle 2]: 0.00059454, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 7.05998e-06 [loop_unroll]: 5.74999e-06 [a_1]: 0.00012455 [with_stream_mark]: 9.89999e-06 [recompute_prepare]: 6.04001e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.55997e-06 [parameter_eliminate]: 7.59988e-07 [a_2]: 6.742e-05 [accelerated_algorithm]: 5.41998e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 5.72999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.13999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25002e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 8.3e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.76e-06 [cse]: 1.364e-05 [a_3]: 3.161e-05 [py_interpret_to_execute_after_opt_a]: 7.94002e-06 [slice_cell_reuse_recomputed_activation]: 1.91998e-06 [rewriter_after_opt_a]: 3.173e-05 [convert_after_rewriter]: 7.88001e-06 [order_py_execute_after_rewriter]: 5.78002e-06 [mutable_eliminate]: 0.00046385 [opt_b]: 0.00018771, [1] [Cycle 1]: 0.00018125, [7] 
[b_1]: 0.00011298 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 4.89998e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 7.7e-07 [cse]: 1.587e-05 [optimize_parallel_all_gather_comm]: 1.728e-05 [overlap_param_gather]: 1.98002e-06 [cconv]: 2.202e-05 [loop_unroll]: 0.00042634 [opt_after_cconv]: 9.454e-05, [1] [Cycle 1]: 8.867e-05, [7] [c_1]: 2.697e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.599e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.238e-05 [tuple_transform]: 6.913e-05, [1] [Cycle 1]: 6.453e-05, [4] [d_1]: 3.877e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.378e-05 [cse_after_recomputation]: 1.999e-05, [1] [Cycle 1]: 1.564e-05, [1] [cse]: 1.061e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 4.82e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 9.20001e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.50999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.117e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.67001e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.605e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.765e-05, [1] [Cycle 1]: 6.36e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.30999e-06 [elim_not_effective]: 1.153e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.52998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.57999e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.505e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.71999e-06 [opt_after_jit_grad]: 0.00044747 [validate]: 3.11e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.097488 [execute]: 8.38999e-06 Sums bootstrap : 0.000506s : 0.48% type_inference : 0.004598s : 4.33% event_method : 0.000011s : 0.01% auto_monad : 0.000102s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000415s : 0.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000365s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000076s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000464s : 0.44% optimize.opt_b.b_1 : 0.000113s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000426s : 0.40% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.42% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.097488s : 91.78% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.91% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000005s : 4: substitution.graph_param_transform 66.47% : 0.000080s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.96% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004558 2 91.57% : 0.004174s : 1: type_inference.infer 8.43% : 0.000384s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.86% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.10% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.14% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.13% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.80% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.84% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.93% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 1.06% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.83% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.43% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.70% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000285 6 40.66% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.34% : 0.000169s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118459 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.61% : 0.003097s : 1: add_attr 2.61% : 0.003088s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000108s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.46% : 0.000541s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000473s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.65% : 0.000766s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000094s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.62% : 0.001917s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.39% : 0.000457s : 1: opt_after_jit_grad 0.16% : 0.000191s : 1: opt_b 3.18% : 0.003766s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000205s : 1: renormalize.infer 0.13% : 0.000153s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.32% : 0.097510s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.89% : 0.004613s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.109576, [24] [bootstrap]: 0.00046142 [type_inference]: 0.00559447 [event_method]: 1.434e-05 [auto_monad]: 5.408e-05 [graph_reusing]: 5.84999e-06 [inline]: 1.87001e-06 [add_attr]: 0.0030012, [1] [add_attr_with_inline]: 0.00299329, [1] [Cycle 1]: 4.664e-05, [2] [tag_attr]: 1.501e-05 [meta_addattr_fg_expand]: 3.88001e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 2.573e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.00395786, [53] [py_interpret_to_execute]: 2.071e-05 [rewriter_before_opt_a]: 5.836e-05 [opt_a]: 0.00212893, [2] [Cycle 1]: 0.00152621, [45] [expand_dump_flag]: 2.65002e-06 [switch_simplify]: 3.144e-05 [loop_unroll]: 2.051e-05 [a_1]: 0.00044439 [with_stream_mark]: 1.304e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.94002e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 2.21e-06 [a_2]: 7.497e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 1.78002e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 8.13999e-06 [auto_parallel]: 5.72999e-06 [parallel]: 1.761e-05 [flash_sp]: 7.24001e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 8.05999e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 6.97002e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.71e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.72999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.034e-05 [merge_recompute_call_nodes]: 1.79e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.94001e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 2.161e-05 [a_after_grad]: 9.10999e-06 [renormalize]: 0.00043338 [add_forward_monad_depend]: 4.54998e-06 [auto_monad_grad]: 2.14e-06 [auto_monad_eliminator]: 1.334e-05 [cse]: 2.816e-05 [a_3]: 3.946e-05 [Cycle 2]: 0.00059327, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.0001243 [with_stream_mark]: 9.78002e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.704e-05 [accelerated_algorithm]: 5.32999e-06 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.32001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.76e-06 [parallel]: 3.85e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.2e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.16002e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.35002e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.90001e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.55002e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.02998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11999e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.72e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.286e-05 [a_3]: 3.167e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.065e-05 [convert_after_rewriter]: 6.60002e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00045237 [opt_b]: 0.00018137, [1] [Cycle 1]: 0.00017538, [7] [b_1]: 
0.00010761 [b_2]: 7.43999e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.71999e-06 [updatestate_loads_eliminate]: 2.53e-06 [renormalize]: 3.09985e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.545e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.282e-05 [loop_unroll]: 0.00041296 [opt_after_cconv]: 9.361e-05, [1] [Cycle 1]: 8.814e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.623e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.786e-05, [1] [Cycle 1]: 6.371e-05, [4] [d_1]: 3.891e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.77001e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.354e-05 [cse_after_recomputation]: 1.993e-05, [1] [Cycle 1]: 1.545e-05, [1] [cse]: 1.036e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.75002e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.33002e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 1.99999e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.105e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.46998e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.74e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 2.23002e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 6.712e-05, [1] [Cycle 1]: 6.278e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.095e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.41998e-06 [auto_monad_reorder]: 1.516e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.0004466 [validate]: 3.16e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0957278 [execute]: 9.24998e-06 Sums bootstrap : 0.000461s : 0.44% type_inference : 0.005594s : 5.30% event_method : 0.000014s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000569s : 0.54% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000030s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000433s : 0.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.43% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000413s : 0.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000447s : 0.42% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.095728s : 90.65% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000161 30 15.52% : 0.000025s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.93% : 0.000002s : 2: substitution.fold_const_symbol 3.32% : 0.000005s : 4: substitution.graph_param_transform 65.92% : 0.000106s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.46% : 0.000004s : 4: substitution.remove_not_recompute_node 2.56% : 0.000004s : 4: substitution.replace_old_param 6.33% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005554 2 90.03% : 0.005000s : 1: type_inference.infer 9.97% : 0.000554s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.84% : 0.000027s : 3: replace.inline 30.16% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 5 91.91% : 0.000104s : 3: match.inline 8.09% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.03% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000009s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 0.94% : 0.000001s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.40% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 45.20% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.80% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118061 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.55% : 0.003006s : 1: add_attr 2.54% : 0.002997s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000497s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.80% : 0.000941s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.81% : 0.002132s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.39% : 0.000456s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.36% : 0.003962s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.19% : 0.000220s : 1: renormalize.infer 0.17% : 0.000206s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.10% : 0.095751s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.75% : 0.005608s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.146644, [24] [bootstrap]: 0.00050514 [type_inference]: 0.0114559 [event_method]: 4.908e-05 [auto_monad]: 0.00012099 [graph_reusing]: 8.89998e-06 [inline]: 2.07001e-06 [add_attr]: 0.00304918, [1] [add_attr_with_inline]: 0.00304095, [1] [Cycle 1]: 7.304e-05, [2] [tag_attr]: 3.502e-05 [meta_addattr_fg_expand]: 9.31998e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 5.091e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.013515, [53] [py_interpret_to_execute]: 3.878e-05 [rewriter_before_opt_a]: 0.00014407 [opt_a]: 0.0112341, [3] [Cycle 1]: 0.00724237, [45] [expand_dump_flag]: 3.82002e-06 [switch_simplify]: 7.394e-05 [loop_unroll]: 6.199e-05 [a_1]: 0.00144756 [with_stream_mark]: 2.333e-05 [recompute_prepare]: 2.136e-05 [updatestate_depend_eliminate]: 9.50001e-06 [updatestate_assign_eliminate]: 7.83999e-06 [updatestate_loads_eliminate]: 7.61999e-06 [parameter_eliminate]: 2.59001e-06 [a_2]: 0.00027377 [accelerated_algorithm]: 3.096e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.23998e-06 [shard_inline]: 1.601e-05 [merge_send_recv]: 1.589e-05 [auto_parallel]: 1.109e-05 [parallel]: 1.892e-05 [flash_sp]: 1.187e-05 [merge_comm]: 9.59e-06 [allreduce_fusion]: 9.10999e-06 [matmul_add_comm_reduction]: 2.634e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.737e-05 [virtual_dataset]: 1.538e-05 [get_grad_eliminate_]: 1.559e-05 [virtual_output]: 1.51e-05 [merge_forward]: 9.15999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.736e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.901e-05 [merge_recompute_call_nodes]: 1.74998e-06 [before_grad]: 2.729e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61998e-06 [meta_fg_expand]: 0.00142517 [flash_sp_send_recv_attached]: 3.66999e-06 [receive_attached]: 2.77002e-06 
[after_resolve]: 5.951e-05 [a_after_grad]: 8.288e-05 [renormalize]: 0.00257103 [add_forward_monad_depend]: 9.07999e-06 [auto_monad_grad]: 5.37001e-06 [auto_monad_eliminator]: 5.601e-05 [cse]: 0.00016537 [a_3]: 0.00033638 [Cycle 2]: 0.00308007, [45] [expand_dump_flag]: 1.60999e-06 [switch_simplify]: 4.684e-05 [loop_unroll]: 4.418e-05 [a_1]: 0.00153651 [with_stream_mark]: 1.248e-05 [recompute_prepare]: 1.054e-05 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 4.60001e-06 [updatestate_loads_eliminate]: 3.70998e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 0.00012559 [accelerated_algorithm]: 1.211e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.43e-06 [parallel]: 4.92e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 4.76002e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.032e-05 [virtual_dataset]: 8.73001e-06 [get_grad_eliminate_]: 8.72998e-06 [virtual_output]: 9.00001e-06 [merge_forward]: 5.30001e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.70002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.668e-05 [merge_recompute_call_nodes]: 1.01002e-06 [before_grad]: 1.414e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 7.242e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.578e-05 [a_after_grad]: 1.427e-05 [renormalize]: 0.00066006 [add_forward_monad_depend]: 4.40999e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 1.441e-05 [cse]: 4.68e-05 [a_3]: 6.451e-05 [Cycle 3]: 0.0008973, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 1.025e-05 [loop_unroll]: 8.87e-06 [a_1]: 0.00024851 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 9.48002e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 3.75e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012335 [accelerated_algorithm]: 1.156e-05 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.02002e-06 [parallel]: 4.74e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 5.22e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 7.69002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.99999e-06 [virtual_dataset]: 8.57998e-06 [get_grad_eliminate_]: 8.28001e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 8.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.603e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.342e-05 [a_after_grad]: 1.404e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.45999e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 1.112e-05 [cse]: 2.705e-05 [a_3]: 5.894e-05 [py_interpret_to_execute_after_opt_a]: 1.073e-05 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 4.651e-05 [convert_after_rewriter]: 8.97e-06 [order_py_execute_after_rewriter]: 6.89999e-06 [mutable_eliminate]: 0.0004808 [opt_b]: 0.00028744, [1] [Cycle 1]: 0.00028101, [7] [b_1]: 0.00018831 [b_2]: 1.064e-05 [updatestate_depend_eliminate]: 7.82e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 4.17e-06 [renormalize]: 3.59985e-07 [cse]: 3.153e-05 [optimize_parallel_all_gather_comm]: 2.017e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.929e-05 [loop_unroll]: 0.00042299 [opt_after_cconv]: 0.00013628, [1] [Cycle 1]: 0.00013046, [7] [c_1]: 4.783e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.2e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 
4.13999e-06 [cse]: 3.037e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 2.955e-05 [tuple_transform]: 0.00010151, [1] [Cycle 1]: 9.685e-05, [4] [d_1]: 6.634e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.003e-05 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 5.632e-05 [cse_after_recomputation]: 3.307e-05, [1] [Cycle 1]: 2.838e-05, [1] [cse]: 2.253e-05 [environ_conv]: 9.00999e-06 [swap_dp_allreduce_reducescatter]: 7.58001e-06 [bias_add_comm_swap]: 2.77002e-06 [label_micro_interleaved_index]: 3.98001e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.35997e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.705e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 5.09e-06 [overlap_recompute_and_grad_model_parallel]: 6.01e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 4.99998e-06 [overlap_grad_flash_sp]: 2.418e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 9.873e-05, [1] [Cycle 1]: 9.449e-05, [6] [build]: 9.61998e-06 [elim_shapecalc]: 1.311e-05 [elim_not_effective]: 1.811e-05 [opt_reshape]: 1.044e-05 [fold_const_symbol]: 1.455e-05 [renormalize]: 2.50002e-07 
[detach_backward]: 1.83002e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 2.551e-05 [get_jit_bprop_graph]: 1.18001e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00051223 [validate]: 4.558e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.117048 [execute]: 9.39998e-06 Sums bootstrap : 0.000505s : 0.35% type_inference : 0.011456s : 8.05% event_method : 0.000049s : 0.03% auto_monad : 0.000121s : 0.09% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000144s : 0.10% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.09% optimize.opt_a.loop_unroll : 0.000115s : 0.08% optimize.opt_a.a_1 : 0.003233s : 2.27% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000041s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000523s : 0.37% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001501s : 1.05% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000111s : 0.08% optimize.opt_a.renormalize : 0.003231s : 2.27% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.06% optimize.opt_a.cse : 0.000239s : 0.17% optimize.opt_a.a_3 : 0.000460s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000481s : 0.34% optimize.opt_b.b_1 : 0.000188s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.01% optimize.loop_unroll : 0.000423s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000512s : 0.36% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.117048s : 82.24% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000763 222 5.84% : 0.000045s : 12: substitution.arithmetic_simplify 1.79% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 55.90% : 0.000427s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.86% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000025s : 10: substitution.replace_applicator 1.33% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.58% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.45% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011381 2 86.63% : 0.009859s : 1: type_inference.infer 13.37% : 0.001522s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.96% : 0.000126s : 17: replace.inline 42.04% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000452 33 92.49% : 0.000418s : 17: match.inline 7.51% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001618 34 56.18% : 0.000909s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.82% : 0.000709s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.171681 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.78% : 0.003054s : 1: add_attr 1.77% : 0.003045s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000128s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000540s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000056s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.29% : 0.000490s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.87% : 0.004924s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000173s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.55% : 0.011237s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.30% : 0.000522s : 1: opt_after_jit_grad 0.17% : 0.000291s : 1: opt_b 7.87% : 0.013519s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.00% : 0.001719s : 2: renormalize.infer 0.87% : 0.001499s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000050s : 1: rewriter_after_opt_a 0.09% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 68.19% : 0.117073s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.68% : 0.011472s : 1: type_inference 0.04% : 0.000071s : 1: validate TotalTime = 0.104612, [24] [bootstrap]: 0.00046832 [type_inference]: 0.00438277 [event_method]: 1.119e-05 [auto_monad]: 5.006e-05 [graph_reusing]: 4.86002e-06 [inline]: 1.65001e-06 [add_attr]: 0.00303057, [1] [add_attr_with_inline]: 0.00302268, [1] [Cycle 1]: 4.577e-05, [2] [tag_attr]: 1.188e-05 [meta_addattr_fg_expand]: 2.98998e-06 [parallel-infer-symbol]: 3.26999e-06 [pre_auto_parallel]: 2.175e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00368948, [53] [py_interpret_to_execute]: 1.592e-05 [rewriter_before_opt_a]: 3.956e-05 [opt_a]: 0.0018595, [2] [Cycle 1]: 0.00125983, [45] [expand_dump_flag]: 2.66999e-06 [switch_simplify]: 2.403e-05 [loop_unroll]: 1.377e-05 [a_1]: 0.00029077 [with_stream_mark]: 1.389e-05 [recompute_prepare]: 7.26001e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.56002e-06 [a_2]: 7.547e-05 [accelerated_algorithm]: 5.93002e-06 [shard]: 2.99999e-06 [meta_shard_fg_expand]: 1.33002e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 7.41999e-06 [auto_parallel]: 5.77001e-06 [parallel]: 1.705e-05 [flash_sp]: 7.21999e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.46003e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.66e-06 [virtual_dataset]: 5.65001e-06 [get_grad_eliminate_]: 5.56002e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.92001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.90998e-06 
[receive_attached]: 2.46e-06 [after_resolve]: 1.116e-05 [a_after_grad]: 8.40999e-06 [renormalize]: 0.00035338 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.285e-05 [cse]: 2.721e-05 [a_3]: 3.94e-05 [Cycle 2]: 0.0005905, [45] [expand_dump_flag]: 7.7e-07 [switch_simplify]: 6.96999e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012336 [with_stream_mark]: 9.88998e-06 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.79e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.21998e-06 [parallel]: 3.91999e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 5.23002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.29001e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.007e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14001e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.316e-05 [a_3]: 3.274e-05 [py_interpret_to_execute_after_opt_a]: 7.74997e-06 [slice_cell_reuse_recomputed_activation]: 2.26998e-06 [rewriter_after_opt_a]: 3.106e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00048113 [opt_b]: 0.00018241, [1] [Cycle 1]: 
0.00017616, [7] [b_1]: 0.00010843 [b_2]: 7.08998e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.3001e-07 [cse]: 1.641e-05 [optimize_parallel_all_gather_comm]: 1.527e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00040602 [opt_after_cconv]: 9.497e-05, [1] [Cycle 1]: 8.928e-05, [7] [c_1]: 2.735e-05 [parameter_eliminate]: 2.45002e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.621e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.228e-05 [tuple_transform]: 6.756e-05, [1] [Cycle 1]: 6.313e-05, [4] [d_1]: 3.814e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 5.99999e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 4.413e-05 [cse_after_recomputation]: 1.956e-05, [1] [Cycle 1]: 1.527e-05, [1] [cse]: 1.046e-05 [environ_conv]: 4.94998e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 1.99999e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 8.39995e-07 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.11997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.41002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 6.681e-05, [1] [Cycle 1]: 6.279e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 7.98999e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.50999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.585e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.00044407 [validate]: 3.199e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.0922208 [execute]: 9.22001e-06 Sums bootstrap : 0.000468s : 0.47% type_inference : 0.004383s : 4.36% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000414s : 0.41% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000353s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000481s : 0.48% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000406s : 0.40% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000444s : 0.44% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.092221s : 91.66% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.03% : 0.000022s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.48% : 0.000005s : 4: substitution.graph_param_transform 65.53% : 0.000079s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.88% : 0.000005s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004342 2 91.85% : 0.003988s : 1: type_inference.infer 8.15% : 0.000354s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 1.08% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.81% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 41.96% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.04% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.112602 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.70% : 0.003035s : 1: add_attr 2.69% : 0.003026s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000505s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000415s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.44% : 0.000491s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.68% : 0.000765s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.65% : 0.001863s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.40% : 0.000454s : 1: opt_after_jit_grad 0.17% : 0.000186s : 1: opt_b 3.28% : 0.003693s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000196s : 1: renormalize.infer 0.13% : 0.000151s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000069s : 1: symbol_engine_optimizer 81.92% : 0.092243s : 1: task_emit 0.06% : 0.000070s : 1: 
tuple_transform 3.90% : 0.004396s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.145276, [24] [bootstrap]: 0.00050258 [type_inference]: 0.0102571 [event_method]: 4.401e-05 [auto_monad]: 0.00011326 [graph_reusing]: 8.23999e-06 [inline]: 1.78997e-06 [add_attr]: 0.00299186, [1] [add_attr_with_inline]: 0.00298343, [1] [Cycle 1]: 7.112e-05, [2] [tag_attr]: 3.283e-05 [meta_addattr_fg_expand]: 8.23001e-06 [parallel-infer-symbol]: 2.66999e-06 [pre_auto_parallel]: 4.588e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0133698, [53] [py_interpret_to_execute]: 3.664e-05 [rewriter_before_opt_a]: 0.00012703 [opt_a]: 0.0110611, [3] [Cycle 1]: 0.00713443, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 6.723e-05 [loop_unroll]: 5.495e-05 [a_1]: 0.00139058 [with_stream_mark]: 2.369e-05 [recompute_prepare]: 2.133e-05 [updatestate_depend_eliminate]: 9.31e-06 [updatestate_assign_eliminate]: 7.71001e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 0.00024485 [accelerated_algorithm]: 3.059e-05 [shard]: 1.79e-06 [meta_shard_fg_expand]: 3.16001e-06 [shard_inline]: 1.631e-05 [merge_send_recv]: 1.586e-05 [auto_parallel]: 1.053e-05 [parallel]: 1.776e-05 [flash_sp]: 1.103e-05 [merge_comm]: 9.64999e-06 [allreduce_fusion]: 9.27001e-06 [matmul_add_comm_reduction]: 2.686e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.824e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.492e-05 [virtual_output]: 1.522e-05 [merge_forward]: 9.29998e-06 [cell_reuse_recompute_pass]: 1.01997e-06 [offload_activation]: 1.76e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.908e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.814e-05 [set_forward_comm_id_for_comm_node_pass]: 9.78002e-06 [meta_fg_expand]: 0.00142388 [flash_sp_send_recv_attached]: 3.47002e-06 [receive_attached]: 2.79999e-06 
[after_resolve]: 6.038e-05 [a_after_grad]: 8.142e-05 [renormalize]: 0.00255824 [add_forward_monad_depend]: 9.32999e-06 [auto_monad_grad]: 5.47999e-06 [auto_monad_eliminator]: 5.826e-05 [cse]: 0.00016881 [a_3]: 0.00033862 [Cycle 2]: 0.00301739, [45] [expand_dump_flag]: 1.77001e-06 [switch_simplify]: 4.8e-05 [loop_unroll]: 4.425e-05 [a_1]: 0.0015652 [with_stream_mark]: 1.326e-05 [recompute_prepare]: 1.071e-05 [updatestate_depend_eliminate]: 5.79999e-06 [updatestate_assign_eliminate]: 4.58001e-06 [updatestate_loads_eliminate]: 3.86999e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 0.00012584 [accelerated_algorithm]: 1.207e-05 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.88002e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 8.16002e-06 [parallel]: 5.05001e-06 [flash_sp]: 3.43999e-06 [merge_comm]: 5.38002e-06 [allreduce_fusion]: 4.82998e-06 [matmul_add_comm_reduction]: 7.86001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.064e-05 [virtual_dataset]: 9.17999e-06 [get_grad_eliminate_]: 8.67e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 8.90999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.648e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.461e-05 [set_forward_comm_id_for_comm_node_pass]: 5.45001e-06 [meta_fg_expand]: 3.448e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.35999e-06 [after_resolve]: 1.518e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.0006001 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.30001e-06 [auto_monad_eliminator]: 1.458e-05 [cse]: 4.646e-05 [a_3]: 6.413e-05 [Cycle 3]: 0.0008951, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 1.064e-05 [loop_unroll]: 8.96998e-06 [a_1]: 0.00024893 [with_stream_mark]: 1.033e-05 [recompute_prepare]: 9.07001e-06 [updatestate_depend_eliminate]: 4.70001e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 
3.75e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012281 [accelerated_algorithm]: 1.127e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 8.87999e-06 [merge_send_recv]: 6.90002e-06 [auto_parallel]: 7.34002e-06 [parallel]: 4.65001e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 4.95999e-06 [allreduce_fusion]: 4.94998e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 9.42001e-06 [get_grad_eliminate_]: 8.22e-06 [virtual_output]: 8.62e-06 [merge_forward]: 4.24997e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 8.55999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.592e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.43e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.341e-05 [a_after_grad]: 1.393e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.129e-05 [cse]: 2.647e-05 [a_3]: 5.808e-05 [py_interpret_to_execute_after_opt_a]: 1.129e-05 [slice_cell_reuse_recomputed_activation]: 1.79998e-06 [rewriter_after_opt_a]: 4.681e-05 [convert_after_rewriter]: 9.12001e-06 [order_py_execute_after_rewriter]: 6.72002e-06 [mutable_eliminate]: 0.00047413 [opt_b]: 0.00028684, [1] [Cycle 1]: 0.00027981, [7] [b_1]: 0.00018889 [b_2]: 1.065e-05 [updatestate_depend_eliminate]: 6.96999e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 3.88001e-06 [renormalize]: 5.10016e-07 [cse]: 3.057e-05 [optimize_parallel_all_gather_comm]: 2.09e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 1.987e-05 [loop_unroll]: 0.00042731 [opt_after_cconv]: 0.0001875, [1] [Cycle 1]: 0.0001814, [7] [c_1]: 4.856e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.05998e-06 
[updatestate_loads_eliminate]: 5.439e-05 [cse]: 3.003e-05 [renormalize]: 1.8999e-07 [remove_dup_value]: 2.932e-05 [tuple_transform]: 0.00010198, [1] [Cycle 1]: 9.707e-05, [4] [d_1]: 6.698e-05 [none_parameter_eliminate]: 1.88997e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 5.536e-05 [cse_after_recomputation]: 3.242e-05, [1] [Cycle 1]: 2.749e-05, [1] [cse]: 2.223e-05 [environ_conv]: 8.60001e-06 [swap_dp_allreduce_reducescatter]: 7.73999e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 3.89002e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.652e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.81e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 4.89e-06 [overlap_grad_flash_sp]: 2.594e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.871e-05, [1] [Cycle 1]: 9.452e-05, [6] [build]: 1.036e-05 [elim_shapecalc]: 1.336e-05 [elim_not_effective]: 1.797e-05 [opt_reshape]: 9.97999e-06 [fold_const_symbol]: 1.479e-05 [renormalize]: 2.60014e-07 
[detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 2.42e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.23e-06 [opt_after_jit_grad]: 0.00047226 [validate]: 4.816e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.117118 [execute]: 9.44e-06 Sums bootstrap : 0.000503s : 0.36% type_inference : 0.010257s : 7.28% event_method : 0.000044s : 0.03% auto_monad : 0.000113s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000126s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.08% optimize.opt_a.a_1 : 0.003205s : 2.27% optimize.opt_a.with_stream_mark : 0.000047s : 0.03% optimize.opt_a.recompute_prepare : 0.000041s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001461s : 1.04% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003158s : 2.24% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.06% optimize.opt_a.cse : 0.000242s : 0.17% optimize.opt_a.a_3 : 0.000461s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000474s : 0.34% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000427s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000054s : 0.04% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000472s : 0.33% validate : 0.000048s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.117118s : 83.07% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000774 218 5.64% : 0.000044s : 11: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 57.08% : 0.000442s : 16: substitution.inline 2.03% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.64% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.09% : 0.000024s : 10: substitution.replace_applicator 1.46% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.91% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010188 2 86.86% : 0.008850s : 1: type_inference.infer 13.14% : 0.001338s : 1: type_inference.specialize ------[replace.] 0.000204 30 59.47% : 0.000122s : 16: replace.inline 40.53% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000464 30 93.47% : 0.000433s : 16: match.inline 6.53% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000782 5663 1.04% : 0.000008s : 67: predicate.accumulaten_eliminater 0.25% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.00% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.93% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.12% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.15% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.07% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_depend_swap 1.67% : 0.000013s : 107: predicate.environ_get_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.58% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.10% : 0.000016s : 97: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.28% : 0.000041s : 244: predicate.inline 1.22% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.52% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.50% : 0.000020s : 164: predicate.load_eliminater 0.27% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.07% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.07% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.87% : 0.000015s : 97: predicate.partial_defer_inline 1.60% : 0.000013s : 89: predicate.partial_eliminate 1.01% : 0.000008s : 67: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.22% : 0.000010s : 67: predicate.reduce_eliminate 2.53% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.78% : 0.000014s : 149: predicate.replace_applicator 0.57% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.03% : 0.000008s : 67: predicate.reshape_eliminate 1.08% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.20% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.71% : 0.000013s : 97: predicate.switch_defer_inline 2.78% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.60% : 0.000036s : 265: 
predicate.switch_simplify 1.02% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.38% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.36% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.91% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.51% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.48% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 8.65% : 0.000068s : 196: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001548 32 56.98% : 0.000882s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.02% : 0.000666s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.169975 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.76% : 0.002996s : 1: add_attr 1.76% : 0.002987s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000120s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.32% : 0.000537s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000052s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.26% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000483s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.86% : 0.004858s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.51% : 0.011064s : 1: opt_a 0.11% : 0.000191s : 1: opt_after_cconv 0.28% : 0.000482s : 1: opt_after_jit_grad 0.17% : 0.000291s : 1: opt_b 7.87% : 0.013374s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.00% : 0.001700s : 2: renormalize.infer 0.85% : 0.001445s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.08% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 68.92% : 0.117140s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.06% : 0.010303s : 1: type_inference 0.04% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x5-ge],max_mem:26.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x6-pynative],max_mem:26.0M TotalTime = 0.0216303, [24] [bootstrap]: 0.00054423 [type_inference]: 0.00608798 [event_method]: 1.524e-05 [auto_monad]: 5.631e-05 [graph_reusing]: 5.62999e-06 [inline]: 1.96e-06 [add_attr]: 0.00341217, [1] [add_attr_with_inline]: 0.00340082, [1] [Cycle 1]: 4.577e-05, [2] [tag_attr]: 1.52e-05 [meta_addattr_fg_expand]: 3.78999e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 2.834e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.0040261, [53] [py_interpret_to_execute]: 1.984e-05 [rewriter_before_opt_a]: 6.015e-05 [opt_a]: 0.00219881, [2] [Cycle 1]: 0.0015412, [45] [expand_dump_flag]: 2.70997e-06 [switch_simplify]: 3.221e-05 [loop_unroll]: 2.074e-05 [a_1]: 0.00045082 [with_stream_mark]: 1.321e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.68999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.607e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 6.16e-06 [parallel]: 2.296e-05 [flash_sp]: 7.26001e-06 [merge_comm]: 4.1e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 6.36998e-06 
[get_grad_eliminate_]: 5.61998e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.09003e-06 [offload_activation]: 9.11002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.135e-05 [merge_recompute_call_nodes]: 1.31002e-06 [before_grad]: 9.18002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.42001e-06 [receive_attached]: 2.69001e-06 [after_resolve]: 1.019e-05 [a_after_grad]: 8.74e-06 [renormalize]: 0.00044379 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.409e-05 [cse]: 2.621e-05 [a_3]: 4.142e-05 [Cycle 2]: 0.00064814, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.58e-06 [loop_unroll]: 5.77001e-06 [a_1]: 0.00012514 [with_stream_mark]: 1.012e-05 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 7.50006e-07 [a_2]: 6.707e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.29e-06 [parallel]: 5.937e-05 [flash_sp]: 3.04999e-06 [merge_comm]: 3.40998e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.30001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.25002e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.04003e-06 [merge_forward]: 2.55002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.67999e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.83001e-06 [a_after_grad]: 8.25e-06 [renormalize]: 7.99773e-08 
[add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.66999e-06 [cse]: 1.637e-05 [a_3]: 3.13e-05 [py_interpret_to_execute_after_opt_a]: 7.45998e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 2.886e-05 [convert_after_rewriter]: 6.43003e-06 [order_py_execute_after_rewriter]: 4.74e-06 [mutable_eliminate]: 0.00044855 [opt_b]: 0.0001835, [1] [Cycle 1]: 0.00017731, [7] [b_1]: 0.00010895 [b_2]: 6.91999e-06 [updatestate_depend_eliminate]: 5.65001e-06 [updatestate_assign_eliminate]: 2.86999e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 3.30008e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.594e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.198e-05 [loop_unroll]: 0.00040763 [opt_after_cconv]: 9.452e-05, [1] [Cycle 1]: 8.872e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.651e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.203e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.457e-05, [4] [d_1]: 3.904e-05 [none_parameter_eliminate]: 1.93002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 5.95002e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.053e-05 [cse_after_recomputation]: 1.992e-05, [1] [Cycle 1]: 1.56e-05, [1] [cse]: 1.068e-05 [environ_conv]: 4.42998e-06 [swap_dp_allreduce_reducescatter]: 5.06997e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.40002e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.20001e-06 
[add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.201e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4.03999e-06 [overlap_recompute_and_grad_model_parallel]: 4.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.18998e-06 [overlap_grad_ring_attention]: 3.61999e-06 [overlap_grad_flash_sp]: 1.58e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 1.86003e-06 [split_layernorm_comm]: 2.19999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.83e-05, [1] [Cycle 1]: 6.392e-05, [6] [build]: 2.74999e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.119e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.90001e-06 [renormalize]: 2.59985e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.61002e-06 [auto_monad_reorder]: 1.657e-05 [get_jit_bprop_graph]: 9.29984e-07 [rewriter_after_jit_bprop_graph]: 0.00013137 [opt_after_jit_grad]: 0.00045495 [validate]: 3.168e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.0065929 [execute]: 6.79001e-06 Sums bootstrap : 0.000544s : 3.15% type_inference : 0.006088s : 35.28% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 
0.000060s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000576s : 3.34% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000082s : 0.48% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000444s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.60% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000408s : 2.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000131s : 0.76% opt_after_jit_grad : 0.000455s : 2.64% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.006593s : 38.21% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.51% : 0.000024s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.52% : 0.000006s : 4: substitution.graph_param_transform 66.59% : 0.000110s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.87% : 0.000005s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.81% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006044 2 90.25% : 0.005455s : 1: type_inference.infer 9.75% : 0.000590s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.40% : 0.000026s : 3: replace.inline 30.60% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.34% : 0.000107s : 3: match.inline 8.66% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.82% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.04% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.43% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.69% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.90% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.47% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 45.74% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.26% : 0.000210s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030606 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.16% : 0.003417s : 1: add_attr 11.12% : 0.003405s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000581s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000416s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.08% : 0.000941s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.19% : 0.002202s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.52% : 0.000464s : 1: opt_after_jit_grad 0.61% : 0.000187s : 1: opt_b 13.17% : 0.004030s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000006s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.76% : 0.000232s : 1: renormalize.infer 0.67% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000137s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.57% : 0.006603s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 19.94% : 0.006102s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0182827, [24] [bootstrap]: 0.00046844 [type_inference]: 0.00438527 [event_method]: 1.089e-05 [auto_monad]: 5.16e-05 [graph_reusing]: 5.12e-06 [inline]: 1.69e-06 [add_attr]: 0.00301454, [1] [add_attr_with_inline]: 0.0030069, [1] [Cycle 1]: 4.431e-05, [2] [tag_attr]: 1.156e-05 [meta_addattr_fg_expand]: 2.99999e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.22e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00366686, [53] [py_interpret_to_execute]: 1.49e-05 [rewriter_before_opt_a]: 3.873e-05 [opt_a]: 0.0018577, [2] [Cycle 1]: 0.0012637, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 2.404e-05 [loop_unroll]: 1.363e-05 [a_1]: 0.0002921 [with_stream_mark]: 1.273e-05 [recompute_prepare]: 7.25e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.73001e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.96e-06 [a_2]: 7.752e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.37001e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 6.08002e-06 [merge_send_recv]: 7.28e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.823e-05 [flash_sp]: 7.08e-06 [merge_comm]: 4.15e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 7.36999e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 3.64002e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.06998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.57001e-06 [flash_sp_send_recv_attached]: 2.86e-06 [receive_attached]: 2.39001e-06 
[after_resolve]: 1.068e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00034941 [add_forward_monad_depend]: 4.33001e-06 [auto_monad_grad]: 1.61002e-06 [auto_monad_eliminator]: 1.286e-05 [cse]: 2.78e-05 [a_3]: 3.982e-05 [Cycle 2]: 0.0005846, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.53998e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012456 [with_stream_mark]: 9.50001e-06 [recompute_prepare]: 5.47999e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.727e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.29e-06 [meta_shard_fg_expand]: 1.05001e-06 [shard_inline]: 5.48997e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.17003e-06 [flash_sp]: 2.87002e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.77e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.47999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27999e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.15998e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.87999e-06 [a_after_grad]: 7.73999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 6.31998e-06 [cse]: 1.247e-05 [a_3]: 3.095e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 3.034e-05 [convert_after_rewriter]: 6.69001e-06 [order_py_execute_after_rewriter]: 4.87e-06 [mutable_eliminate]: 0.00044516 [opt_b]: 0.00018116, [1] [Cycle 1]: 0.00017511, [7] 
[b_1]: 0.00010748 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 4.69998e-07 [cse]: 1.604e-05 [optimize_parallel_all_gather_comm]: 1.522e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.199e-05 [loop_unroll]: 0.00041228 [opt_after_cconv]: 9.361e-05, [1] [Cycle 1]: 8.793e-05, [7] [c_1]: 2.727e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.59e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 2.379e-05 [tuple_transform]: 6.872e-05, [1] [Cycle 1]: 6.43e-05, [4] [d_1]: 3.846e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 5.87999e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.397e-05 [cse_after_recomputation]: 2.086e-05, [1] [Cycle 1]: 1.655e-05, [1] [cse]: 1.136e-05 [environ_conv]: 4.48999e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.83998e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.86999e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.70001e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.733e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.93997e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 6.722e-05, [1] [Cycle 1]: 6.292e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.04002e-06 [elim_not_effective]: 1.095e-05 [opt_reshape]: 5.90002e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.673e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00044646 [validate]: 3.129e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00594643 [execute]: 6.54999e-06 Sums bootstrap : 0.000468s : 3.27% type_inference : 0.004385s : 30.64% event_method : 0.000011s : 0.08% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.91% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000350s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 3.11% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000412s : 2.88% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000024s : 0.17% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.12% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.12% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005946s : 41.54% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.10% : 0.000022s : 4: substitution.arithmetic_simplify 1.35% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.35% : 0.000005s : 4: substitution.graph_param_transform 66.06% : 0.000079s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 3.44% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004342 2 91.80% : 0.003986s : 1: type_inference.infer 8.20% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.69% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.19% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.52% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.80% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.67% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.88% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 42.38% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.62% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026236 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.51% : 0.003019s : 1: add_attr 11.47% : 0.003010s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.92% : 0.000503s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000454s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.92% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001860s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000456s : 1: opt_after_jit_grad 0.70% : 0.000185s : 1: opt_b 13.99% : 0.003671s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.11% : 0.000028s : 1: remove_dup_value 0.74% : 0.000195s : 1: renormalize.infer 0.57% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.70% : 0.005956s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.76% : 0.004398s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0197106, [24] [bootstrap]: 0.0004695 [type_inference]: 0.00552266 [event_method]: 1.415e-05 [auto_monad]: 5.531e-05 [graph_reusing]: 5.49998e-06 [inline]: 2.31e-06 [add_attr]: 0.00299966, [1] [add_attr_with_inline]: 0.00299207, [1] [Cycle 1]: 4.549e-05, [2] [tag_attr]: 1.478e-05 [meta_addattr_fg_expand]: 3.93999e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.513e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00397733, [53] [py_interpret_to_execute]: 2.02e-05 [rewriter_before_opt_a]: 5.804e-05 [opt_a]: 0.00211633, [2] [Cycle 1]: 0.0015131, [45] [expand_dump_flag]: 3.10002e-06 [switch_simplify]: 3.165e-05 [loop_unroll]: 2.122e-05 [a_1]: 0.00044791 [with_stream_mark]: 1.271e-05 [recompute_prepare]: 7.64002e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.26999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.778e-05 [accelerated_algorithm]: 6.67002e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 6.22001e-06 [parallel]: 1.719e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.81002e-06 [allreduce_slice_to_reducescatter]: 9.10019e-07 [virtual_shard_identity]: 7.21001e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 9.51003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.098e-05 [merge_recompute_call_nodes]: 1.61002e-06 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.102e-05 [a_after_grad]: 9.17999e-06 [renormalize]: 0.00042217 [add_forward_monad_depend]: 4.52998e-06 [auto_monad_grad]: 1.98002e-06 [auto_monad_eliminator]: 1.316e-05 [cse]: 2.853e-05 [a_3]: 4.025e-05 [Cycle 2]: 0.00059354, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.49002e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012533 [with_stream_mark]: 9.31e-06 [recompute_prepare]: 5.94e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 7.251e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.10998e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.93998e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 5.94e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 4.90001e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 5.82001e-06 [cell_reuse_handle_not_recompute_node_pass]: 8.94e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.50001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 8.29002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.289e-05 [a_3]: 3.237e-05 [py_interpret_to_execute_after_opt_a]: 7.13e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 2.923e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00048842 [opt_b]: 0.00018209, [1] [Cycle 1]: 0.0001758, [7] [b_1]: 
0.00010855 [b_2]: 6.56e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.19997e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.52e-05 [overlap_param_gather]: 2.05002e-06 [cconv]: 2.197e-05 [loop_unroll]: 0.00041207 [opt_after_cconv]: 9.545e-05, [1] [Cycle 1]: 8.978e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.637e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 6.859e-05, [1] [Cycle 1]: 6.412e-05, [4] [d_1]: 3.832e-05 [none_parameter_eliminate]: 1.76998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.336e-05 [cse_after_recomputation]: 2.037e-05, [1] [Cycle 1]: 1.621e-05, [1] [cse]: 1.12e-05 [environ_conv]: 4.67998e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.10002e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 3.09001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.167e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.45998e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 2.42001e-06 [overlap_grad_ring_attention]: 3.7e-06 [overlap_grad_flash_sp]: 1.641e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.61998e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.699e-05, [1] [Cycle 1]: 6.291e-05, [6] [build]: 2.00002e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 6.04999e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.61e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044502 [validate]: 3.117e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00592959 [execute]: 7.16001e-06 Sums bootstrap : 0.000469s : 2.98% type_inference : 0.005523s : 35.04% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000573s : 3.64% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000150s : 0.95% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000422s : 2.68% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000041s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000488s : 3.10% optimize.opt_b.b_1 : 0.000109s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 2.82% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005930s : 37.62% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000163 30 15.51% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 65.75% : 0.000107s : 3: substitution.inline 1.94% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000004s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.82% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005483 2 90.14% : 0.004942s : 1: type_inference.infer 9.86% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.49% : 0.000026s : 3: replace.inline 30.51% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.31% : 0.000105s : 3: match.inline 8.69% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 1.08% : 0.000002s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.97% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.28% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.06% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 46.12% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.88% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028211 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.65% : 0.003004s : 1: add_attr 10.62% : 0.002995s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.78% : 0.000503s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.76% : 0.000498s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.000943s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.51% : 0.002119s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.61% : 0.000455s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 14.11% : 0.003981s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.77% : 0.000218s : 1: renormalize.infer 0.70% : 0.000197s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000033s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.05% : 0.005939s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.62% : 0.005536s : 1: type_inference 0.21% : 0.000058s : 1: validate TotalTime = 0.0373949, [24] [bootstrap]: 0.00049715 [type_inference]: 0.0113975 [event_method]: 4.703e-05 [auto_monad]: 0.00011905 [graph_reusing]: 8.2e-06 [inline]: 2.12999e-06 [add_attr]: 0.00297127, [1] [add_attr_with_inline]: 0.00296284, [1] [Cycle 1]: 7.265e-05, [2] [tag_attr]: 3.512e-05 [meta_addattr_fg_expand]: 9.47001e-06 [parallel-infer-symbol]: 2.95002e-06 [pre_auto_parallel]: 4.978e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.0133084, [53] [py_interpret_to_execute]: 3.944e-05 [rewriter_before_opt_a]: 0.00014282 [opt_a]: 0.011034, [3] [Cycle 1]: 0.0071032, [45] [expand_dump_flag]: 3.93001e-06 [switch_simplify]: 7.346e-05 [loop_unroll]: 6.197e-05 [a_1]: 0.00146038 [with_stream_mark]: 2.27e-05 [recompute_prepare]: 2.116e-05 [updatestate_depend_eliminate]: 9.66e-06 [updatestate_assign_eliminate]: 7.63999e-06 [updatestate_loads_eliminate]: 7.46999e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 0.00024301 [accelerated_algorithm]: 2.996e-05 [shard]: 1.80001e-06 [meta_shard_fg_expand]: 3.23e-06 [shard_inline]: 1.632e-05 [merge_send_recv]: 1.651e-05 [auto_parallel]: 1.068e-05 [parallel]: 1.881e-05 [flash_sp]: 1.1e-05 [merge_comm]: 9.69999e-06 [allreduce_fusion]: 8.97999e-06 [matmul_add_comm_reduction]: 2.617e-05 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 1.773e-05 [virtual_dataset]: 1.569e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.501e-05 [merge_forward]: 9.90002e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.724e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.854e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.749e-05 [set_forward_comm_id_for_comm_node_pass]: 9.34e-06 [meta_fg_expand]: 0.00139772 [flash_sp_send_recv_attached]: 3.55e-06 [receive_attached]: 2.53003e-06 [after_resolve]: 
5.87e-05 [a_after_grad]: 8.065e-05 [renormalize]: 0.00249245 [add_forward_monad_depend]: 9.00001e-06 [auto_monad_grad]: 5.35999e-06 [auto_monad_eliminator]: 5.439e-05 [cse]: 0.00016239 [a_3]: 0.00033486 [Cycle 2]: 0.00302039, [45] [expand_dump_flag]: 2.02001e-06 [switch_simplify]: 4.701e-05 [loop_unroll]: 4.326e-05 [a_1]: 0.00153825 [with_stream_mark]: 1.277e-05 [recompute_prepare]: 1.116e-05 [updatestate_depend_eliminate]: 5.37999e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.9e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012627 [accelerated_algorithm]: 1.191e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.99e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 6.96001e-06 [auto_parallel]: 7.24001e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.67e-06 [matmul_add_comm_reduction]: 7.78001e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 8.79003e-06 [get_grad_eliminate_]: 8.89003e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.669e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.415e-05 [set_forward_comm_id_for_comm_node_pass]: 5.15999e-06 [meta_fg_expand]: 7.368e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.577e-05 [a_after_grad]: 1.427e-05 [renormalize]: 0.00059768 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.428e-05 [cse]: 4.595e-05 [a_3]: 6.536e-05 [Cycle 3]: 0.00089561, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.015e-05 [loop_unroll]: 8.84e-06 [a_1]: 0.000248 [with_stream_mark]: 9.78002e-06 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 3.88001e-06 [updatestate_loads_eliminate]: 3.86999e-06 
[parameter_eliminate]: 8.50006e-07 [a_2]: 0.00012226 [accelerated_algorithm]: 1.199e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.01998e-06 [merge_send_recv]: 7.01001e-06 [auto_parallel]: 7.18e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 4.81002e-06 [allreduce_fusion]: 4.86002e-06 [matmul_add_comm_reduction]: 7.47002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.032e-05 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.575e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.375e-05 [set_forward_comm_id_for_comm_node_pass]: 5.46e-06 [meta_fg_expand]: 3.02002e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 1.304e-05 [a_after_grad]: 1.389e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 1.074e-05 [cse]: 2.69e-05 [a_3]: 6.024e-05 [py_interpret_to_execute_after_opt_a]: 1.082e-05 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 4.682e-05 [convert_after_rewriter]: 9.30001e-06 [order_py_execute_after_rewriter]: 6.88e-06 [mutable_eliminate]: 0.00046866 [opt_b]: 0.0002854, [1] [Cycle 1]: 0.00027882, [7] [b_1]: 0.00018871 [b_2]: 1.059e-05 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.17998e-06 [updatestate_loads_eliminate]: 4.10998e-06 [renormalize]: 3.89991e-07 [cse]: 2.932e-05 [optimize_parallel_all_gather_comm]: 2.011e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.017e-05 [loop_unroll]: 0.00044083 [opt_after_cconv]: 0.0001344, [1] [Cycle 1]: 0.00012803, [7] [c_1]: 4.767e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 7.26999e-06 [updatestate_assign_eliminate]: 4.45e-06 
[updatestate_loads_eliminate]: 4.03999e-06 [cse]: 2.861e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 2.864e-05 [tuple_transform]: 0.00010056, [1] [Cycle 1]: 9.602e-05, [4] [d_1]: 6.578e-05 [none_parameter_eliminate]: 1.83002e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 9.82999e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 5.651e-05 [cse_after_recomputation]: 3.067e-05, [1] [Cycle 1]: 2.627e-05, [1] [cse]: 2.077e-05 [environ_conv]: 8.17003e-06 [swap_dp_allreduce_reducescatter]: 7.81001e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 1.00001e-06 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.43002e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.721e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.87998e-06 [overlap_recompute_and_grad_model_parallel]: 5.57001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.46998e-06 [overlap_grad_ring_attention]: 4.98001e-06 [overlap_grad_flash_sp]: 2.467e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 9.69e-05, [1] [Cycle 1]: 9.268e-05, [6] [build]: 8.94998e-06 [elim_shapecalc]: 1.274e-05 [elim_not_effective]: 1.824e-05 [opt_reshape]: 1.012e-05 [fold_const_symbol]: 1.476e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.47e-05 [get_jit_bprop_graph]: 1.15999e-06 [rewriter_after_jit_bprop_graph]: 3.32002e-06 [opt_after_jit_grad]: 0.00046068 [validate]: 4.421e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0082324 [execute]: 7.26999e-06 Sums bootstrap : 0.000497s : 1.50% type_inference : 0.011397s : 34.35% event_method : 0.000047s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000143s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.39% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003247s : 9.79% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001474s : 4.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.26% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003090s : 9.31% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.24% optimize.opt_a.cse : 0.000235s : 0.71% optimize.opt_a.a_3 : 0.000460s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000469s : 1.41% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000029s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000441s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000461s : 1.39% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008232s : 24.81% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000775 222 5.83% : 0.000045s : 12: substitution.arithmetic_simplify 1.68% : 0.000013s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.93% : 0.000007s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.58% : 0.000438s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.88% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.77% : 0.000014s : 20: substitution.remove_not_recompute_node 3.04% : 0.000024s : 10: substitution.replace_applicator 1.29% : 0.000010s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.61% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.44% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011323 2 86.62% : 0.009807s : 1: type_inference.infer 13.38% : 0.001516s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.38% : 0.000125s : 17: replace.inline 42.62% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000464 33 92.56% : 0.000430s : 17: match.inline 7.44% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.72% : 0.000043s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 68: predicate.reduce_eliminate 2.72% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.94% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000037s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001581 34 57.08% : 0.000902s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.92% : 0.000678s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061992 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.80% : 0.002976s : 1: add_attr 4.79% : 0.002967s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000126s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.86% : 0.000531s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000033s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000450s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000477s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.91% : 0.004904s : 117: opt.transform.opt_a 0.07% : 0.000046s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.80% : 0.011037s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.76% : 0.000470s : 1: opt_after_jit_grad 0.47% : 0.000289s : 1: opt_b 21.47% : 0.013312s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.73% : 0.001695s : 2: renormalize.infer 2.23% : 0.001382s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000147s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.30% : 0.008243s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 18.41% : 0.011413s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0185472, [24] [bootstrap]: 0.0004377 [type_inference]: 0.00435034 [event_method]: 1.072e-05 [auto_monad]: 5.107e-05 [graph_reusing]: 4.84e-06 [inline]: 1.80001e-06 [add_attr]: 0.00299207, [1] [add_attr_with_inline]: 0.00298392, [1] [Cycle 1]: 4.642e-05, [2] [tag_attr]: 1.251e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.184e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 9.80013e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00369712, [53] [py_interpret_to_execute]: 1.486e-05 [rewriter_before_opt_a]: 3.754e-05 [opt_a]: 0.00186262, [2] [Cycle 1]: 0.00126426, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.47e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.00029053 [with_stream_mark]: 1.355e-05 [recompute_prepare]: 7.67002e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 7.67e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 7.43e-06 [auto_parallel]: 6.22001e-06 [parallel]: 1.786e-05 [flash_sp]: 7.4e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 9.62001e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.12002e-06 [virtual_dataset]: 6.02001e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.45001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.012e-05 [a_after_grad]: 8.85999e-06 [renormalize]: 0.00035186 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.23e-05 [cse]: 2.65e-05 [a_3]: 4.053e-05 [Cycle 2]: 0.00058876, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012414 [with_stream_mark]: 1.096e-05 [recompute_prepare]: 5.33002e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.791e-05 [accelerated_algorithm]: 5.57999e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.06997e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.22999e-06 [parallel]: 4.33001e-06 [flash_sp]: 3.06999e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.92999e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.81001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.90999e-06 [a_after_grad]: 7.9e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.09001e-06 [cse]: 1.259e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.23e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.13e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 4.71002e-06 [mutable_eliminate]: 0.00049476 [opt_b]: 0.00018065, [1] [Cycle 1]: 0.00017433, [7] 
[b_1]: 0.00010724 [b_2]: 6.58003e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.10014e-07 [cse]: 1.588e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.2e-05 [loop_unroll]: 0.00040289 [opt_after_cconv]: 9.262e-05, [1] [Cycle 1]: 8.693e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 1.97999e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.552e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.231e-05 [tuple_transform]: 6.797e-05, [1] [Cycle 1]: 6.37e-05, [4] [d_1]: 3.833e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.97001e-06 [partial_unused_args_eliminate]: 1.55999e-06 [add_recomputation]: 4.377e-05 [cse_after_recomputation]: 1.987e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.44998e-06 [slice_recompute_activation]: 2.64001e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.08998e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.169e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.85999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.82998e-06 [overlap_grad_flash_sp]: 1.69e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.23998e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.856e-05, [1] [Cycle 1]: 6.432e-05, [6] [build]: 2.11998e-06 [elim_shapecalc]: 8.45999e-06 [elim_not_effective]: 1.214e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.71997e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.24001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.519e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00044049 [validate]: 3.179e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00627024 [execute]: 7.14001e-06 Sums bootstrap : 0.000438s : 3.00% type_inference : 0.004350s : 29.80% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.84% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000352s : 2.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000495s : 3.39% optimize.opt_b.b_1 : 0.000107s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000403s : 2.76% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.02% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000440s : 3.02% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006270s : 42.95% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.27% : 0.000022s : 4: substitution.arithmetic_simplify 1.74% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.27% : 0.000005s : 4: substitution.graph_param_transform 65.72% : 0.000079s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000004s : 4: substitution.remove_not_recompute_node 2.94% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004308 2 91.96% : 0.003962s : 1: type_inference.infer 8.04% : 0.000346s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.72% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.10% : 0.000001s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.70% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.13% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.87% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026514 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.30% : 0.002996s : 1: add_attr 11.27% : 0.002988s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.78% : 0.000473s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.55% : 0.000411s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.90% : 0.000504s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.89% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.04% : 0.001866s : 1: opt_a 0.36% : 0.000096s : 1: opt_after_cconv 1.70% : 0.000450s : 1: opt_after_jit_grad 0.69% : 0.000184s : 1: opt_b 13.96% : 0.003701s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000194s : 1: renormalize.infer 0.57% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.69% : 0.006280s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.46% : 0.004365s : 1: type_inference 0.22% : 0.000059s : 1: validate TotalTime = 0.0361682, [24] [bootstrap]: 0.00047727 [type_inference]: 0.0103352 [event_method]: 4.051e-05 [auto_monad]: 0.00011689 [graph_reusing]: 7.82998e-06 [inline]: 1.90001e-06 [add_attr]: 0.00304404, [1] [add_attr_with_inline]: 0.00303577, [1] [Cycle 1]: 6.627e-05, [2] [tag_attr]: 3.08e-05 [meta_addattr_fg_expand]: 8.42998e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 4.606e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.01998e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.013004, [53] [py_interpret_to_execute]: 3.57e-05 [rewriter_before_opt_a]: 0.00012658 [opt_a]: 0.0107719, [3] [Cycle 1]: 0.00689375, [45] [expand_dump_flag]: 4.13001e-06 [switch_simplify]: 6.622e-05 [loop_unroll]: 5.435e-05 [a_1]: 0.00133909 [with_stream_mark]: 2.335e-05 [recompute_prepare]: 2.173e-05 [updatestate_depend_eliminate]: 8.88002e-06 [updatestate_assign_eliminate]: 7.75e-06 [updatestate_loads_eliminate]: 7.26999e-06 [parameter_eliminate]: 2.83e-06 [a_2]: 0.00024413 [accelerated_algorithm]: 3.161e-05 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 3.70003e-06 [shard_inline]: 1.614e-05 [merge_send_recv]: 1.616e-05 [auto_parallel]: 1.07e-05 [parallel]: 1.941e-05 [flash_sp]: 1.162e-05 [merge_comm]: 9.61998e-06 [allreduce_fusion]: 8.69e-06 [matmul_add_comm_reduction]: 2.748e-05 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 1.808e-05 [virtual_dataset]: 1.562e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.27999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.725e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.837e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.867e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91998e-06 [meta_fg_expand]: 0.00136865 [flash_sp_send_recv_attached]: 3.94002e-06 [receive_attached]: 2.78e-06 [after_resolve]: 
5.988e-05 [a_after_grad]: 8.16e-05 [renormalize]: 0.00235326 [add_forward_monad_depend]: 9.61e-06 [auto_monad_grad]: 5.57001e-06 [auto_monad_eliminator]: 5.615e-05 [cse]: 0.00016505 [a_3]: 0.00033277 [Cycle 2]: 0.00293737, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 4.659e-05 [loop_unroll]: 4.318e-05 [a_1]: 0.00152771 [with_stream_mark]: 1.247e-05 [recompute_prepare]: 1.072e-05 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 4.42e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 0.0001256 [accelerated_algorithm]: 1.193e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 9.32999e-06 [merge_send_recv]: 6.90998e-06 [auto_parallel]: 7.16999e-06 [parallel]: 4.80001e-06 [flash_sp]: 3.55e-06 [merge_comm]: 5.29e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.068e-05 [virtual_dataset]: 9.00999e-06 [get_grad_eliminate_]: 8.90001e-06 [virtual_output]: 8.3e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 9.10019e-07 [offload_activation]: 9.47001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.633e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 3.493e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 1.506e-05 [a_after_grad]: 1.405e-05 [renormalize]: 0.00057458 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.415e-05 [cse]: 4.254e-05 [a_3]: 6.462e-05 [Cycle 3]: 0.00092707, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 1.026e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.00024718 [with_stream_mark]: 9.99001e-06 [recompute_prepare]: 8.85999e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 3.91999e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00015751 [accelerated_algorithm]: 1.192e-05 [shard]: 9.69972e-07 [meta_shard_fg_expand]: 1.71998e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.04001e-06 [parallel]: 4.63999e-06 [flash_sp]: 1.17999e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 4.98001e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.62998e-06 [get_grad_eliminate_]: 8.64e-06 [virtual_output]: 8.22998e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.73001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.57e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.356e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 2.96999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.331e-05 [a_after_grad]: 1.408e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.39003e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 1.033e-05 [cse]: 2.942e-05 [a_3]: 5.662e-05 [py_interpret_to_execute_after_opt_a]: 1.005e-05 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 4.743e-05 [convert_after_rewriter]: 1.019e-05 [order_py_execute_after_rewriter]: 7.06999e-06 [mutable_eliminate]: 0.00045469 [opt_b]: 0.00028806, [1] [Cycle 1]: 0.00028185, [7] [b_1]: 0.00019074 [b_2]: 1.072e-05 [updatestate_depend_eliminate]: 6.92002e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 3.97e-06 [renormalize]: 4.69998e-07 [cse]: 3.039e-05 [optimize_parallel_all_gather_comm]: 2.03e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.078e-05 [loop_unroll]: 0.0004179 [opt_after_cconv]: 0.00013511, [1] [Cycle 1]: 0.00012929, [7] [c_1]: 4.953e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 7.03998e-06 [updatestate_assign_eliminate]: 4.22e-06 
[updatestate_loads_eliminate]: 3.98999e-06 [cse]: 2.872e-05 [renormalize]: 3.49974e-07 [remove_dup_value]: 2.982e-05 [tuple_transform]: 0.00010064, [1] [Cycle 1]: 9.615e-05, [4] [d_1]: 6.624e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 9.86e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 5.792e-05 [cse_after_recomputation]: 3.092e-05, [1] [Cycle 1]: 2.625e-05, [1] [cse]: 2.115e-05 [environ_conv]: 9.20999e-06 [swap_dp_allreduce_reducescatter]: 7.89002e-06 [bias_add_comm_swap]: 2.33002e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.24998e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.757e-05 [grouped_pairwise_exchange_alltoall]: 1.92001e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.74999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.24999e-06 [overlap_grad_ring_attention]: 4.97999e-06 [overlap_grad_flash_sp]: 2.44e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.773e-05, [1] [Cycle 1]: 9.343e-05, [6] [build]: 9.72001e-06 [elim_shapecalc]: 1.282e-05 [elim_not_effective]: 1.794e-05 [opt_reshape]: 1.013e-05 [fold_const_symbol]: 1.454e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 2.608e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.0004628 [validate]: 4.471e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0083261 [execute]: 7.16001e-06 Sums bootstrap : 0.000477s : 1.50% type_inference : 0.010335s : 32.51% event_method : 0.000041s : 0.13% auto_monad : 0.000117s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000106s : 0.33% optimize.opt_a.a_1 : 0.003114s : 9.79% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000527s : 1.66% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001407s : 4.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002928s : 9.21% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000237s : 0.75% optimize.opt_a.a_3 : 0.000454s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000455s : 1.43% optimize.opt_b.b_1 : 0.000191s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000418s : 1.31% optimize.opt_after_cconv.c_1 : 0.000050s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000463s : 1.46% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008326s : 26.19% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000733 218 5.93% : 0.000043s : 11: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 54.19% : 0.000397s : 16: substitution.inline 2.25% : 0.000016s : 2: substitution.inline_without_move 1.50% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.12% : 0.000016s : 3: substitution.less_batch_normalization 1.81% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.83% : 0.000013s : 20: substitution.remove_not_recompute_node 3.30% : 0.000024s : 10: substitution.replace_applicator 1.43% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.74% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.49% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.48% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010266 2 87.71% : 0.009004s : 1: type_inference.infer 12.29% : 0.001261s : 1: type_inference.specialize ------[replace.] 0.000201 30 59.21% : 0.000119s : 16: replace.inline 40.79% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000420 30 92.53% : 0.000389s : 16: match.inline 7.47% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000739 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.07% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 244: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.65% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.69% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.33% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001522 32 59.57% : 0.000907s : 12: func_graph_cloner_run.FuncGraphClonerGraph 40.43% : 0.000615s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060263 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.06% : 0.003048s : 1: add_attr 5.04% : 0.003040s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000124s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.85% : 0.000514s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.95% : 0.004790s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.88% : 0.010775s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.78% : 0.000472s : 1: opt_after_jit_grad 0.48% : 0.000292s : 1: opt_b 21.58% : 0.013008s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.60% : 0.001566s : 2: renormalize.infer 2.24% : 0.001350s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.83% : 0.008337s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.18% : 0.010351s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x6-kbk],max_mem:26.0M TotalTime = 0.119485, [24] [bootstrap]: 0.00052139 [type_inference]: 0.00613818 [event_method]: 1.345e-05 [auto_monad]: 5.968e-05 [graph_reusing]: 5.43002e-06 [inline]: 2.02001e-06 [add_attr]: 0.00338683, [1] [add_attr_with_inline]: 0.00337676, [1] [Cycle 1]: 4.56e-05, [2] [tag_attr]: 1.555e-05 [meta_addattr_fg_expand]: 4.36002e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 2.895e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00397582, [53] [py_interpret_to_execute]: 2.074e-05 [rewriter_before_opt_a]: 5.952e-05 [opt_a]: 0.00211702, [2] [Cycle 1]: 0.00152183, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.263e-05 [loop_unroll]: 2.178e-05 [a_1]: 0.00045482 [with_stream_mark]: 1.398e-05 [recompute_prepare]: 7.68001e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 7.58e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 8.25e-06 [auto_parallel]: 6.33998e-06 [parallel]: 2.377e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.86001e-06 [allreduce_fusion]: 3.59002e-06 [matmul_add_comm_reduction]: 8.40999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.51003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 
[merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 9.13002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.28998e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.77002e-06 [after_resolve]: 9.85002e-06 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00041438 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 2.07999e-06 [auto_monad_eliminator]: 1.478e-05 [cse]: 2.853e-05 [a_3]: 4.045e-05 [Cycle 2]: 0.000586, [45] [expand_dump_flag]: 1.14e-06 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012546 [with_stream_mark]: 1.016e-05 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 6.702e-05 [accelerated_algorithm]: 5.49998e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.55001e-06 [flash_sp]: 3.73001e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.02999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 5.76e-06 [virtual_dataset]: 5.05001e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.67999e-06 [offload_activation]: 5.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.13002e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 5.98002e-06 [cse]: 1.191e-05 [a_3]: 3.156e-05 [py_interpret_to_execute_after_opt_a]: 7.77002e-06 
[slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 3.182e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 5.27001e-06 [mutable_eliminate]: 0.00044076 [opt_b]: 0.00018534, [1] [Cycle 1]: 0.00017905, [7] [b_1]: 0.00010745 [b_2]: 1.099e-05 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.29979e-07 [cse]: 1.615e-05 [optimize_parallel_all_gather_comm]: 1.607e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.25e-05 [loop_unroll]: 0.00040738 [opt_after_cconv]: 9.231e-05, [1] [Cycle 1]: 8.685e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.06998e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.536e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.273e-05 [tuple_transform]: 6.964e-05, [1] [Cycle 1]: 6.538e-05, [4] [d_1]: 3.955e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.05002e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 7.609e-05 [cse_after_recomputation]: 2.115e-05, [1] [Cycle 1]: 1.657e-05, [1] [cse]: 1.121e-05 [environ_conv]: 4.60999e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.51998e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.07001e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 8.40024e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.31002e-06 [add_comm_op_reuse_tag]: 1.31002e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.75001e-06 [control_data_broadcast_order]: 1.202e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.28998e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 2.13002e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.789e-05, [1] [Cycle 1]: 6.383e-05, [6] [build]: 2.25002e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.111e-05 [opt_reshape]: 6.48e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.76001e-06 [opt_after_jit_grad]: 0.00044265 [validate]: 3.043e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.104619 [execute]: 9.28002e-06 Sums bootstrap : 0.000521s : 0.45% type_inference : 0.006138s : 5.33% event_method : 0.000013s : 0.01% auto_monad : 0.000060s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000060s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000580s : 0.50% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000414s : 0.36% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000441s : 0.38% optimize.opt_b.b_1 : 0.000107s : 0.09% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000407s : 0.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000076s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000443s : 0.38% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.104619s : 90.88% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000167 30 14.93% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.55% : 0.000006s : 4: substitution.graph_param_transform 66.00% : 0.000111s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.91% : 0.000005s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 6.90% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006089 2 90.89% : 0.005535s : 1: type_inference.infer 9.11% : 0.000555s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.01% : 0.000027s : 3: replace.inline 30.99% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.20% : 0.000108s : 3: match.inline 8.80% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 1.10% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000009s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.40% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.94% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.93% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.71% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.29% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.128359 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.64% : 0.003392s : 1: add_attr 2.63% : 0.003380s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000080s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000065s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000559s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000416s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000449s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.74% : 0.000945s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000093s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.65% : 0.002120s : 1: opt_a 0.07% : 0.000096s : 1: opt_after_cconv 0.35% : 0.000452s : 1: opt_after_jit_grad 0.15% : 0.000189s : 1: opt_b 3.10% : 0.003979s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000215s : 1: renormalize.infer 0.15% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000071s : 1: symbol_engine_optimizer 81.52% : 0.104641s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.79% : 0.006153s : 1: type_inference 0.04% : 0.000057s : 1: validate TotalTime = 0.111315, [24] [bootstrap]: 0.00045544 [type_inference]: 0.00445607 [event_method]: 1.105e-05 [auto_monad]: 5.169e-05 [graph_reusing]: 5.34e-06 [inline]: 1.96e-06 [add_attr]: 0.00296668, [1] [add_attr_with_inline]: 0.00295883, [1] [Cycle 1]: 4.328e-05, [2] [tag_attr]: 1.323e-05 [meta_addattr_fg_expand]: 3.38999e-06 [parallel-infer-symbol]: 3.03998e-06 [pre_auto_parallel]: 2.19e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00375578, [53] [py_interpret_to_execute]: 1.617e-05 [rewriter_before_opt_a]: 3.942e-05 [opt_a]: 0.00194953, [2] [Cycle 1]: 0.00135512, [45] [expand_dump_flag]: 2.58998e-06 [switch_simplify]: 2.472e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00029576 [with_stream_mark]: 1.332e-05 [recompute_prepare]: 7.33999e-06 [updatestate_depend_eliminate]: 3.35e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.556e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 8.47998e-06 [auto_parallel]: 5.77999e-06 [parallel]: 1.85e-05 [flash_sp]: 7.08e-06 [merge_comm]: 3.54002e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 9.02e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.59002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.98e-06 [receive_attached]: 2.93e-06 
[after_resolve]: 1.066e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00043873 [add_forward_monad_depend]: 5.04e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.805e-05 [a_3]: 4.025e-05 [Cycle 2]: 0.00058518, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.44001e-06 [loop_unroll]: 5.58002e-06 [a_1]: 0.00012478 [with_stream_mark]: 9.05001e-06 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.63003e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.48002e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.751e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.34002e-06 [auto_parallel]: 5.09e-06 [parallel]: 4.01001e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 2.85002e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.05999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 4.75999e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.14e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.68999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.11998e-06 [a_after_grad]: 7.9e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 6.45997e-06 [cse]: 1.278e-05 [a_3]: 3.153e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.14e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00044415 [opt_b]: 0.00018066, [1] [Cycle 1]: 0.00017474, [7] [b_1]: 
0.00010851 [b_2]: 7.00998e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.17001e-06 [renormalize]: 3.89991e-07 [cse]: 1.651e-05 [optimize_parallel_all_gather_comm]: 1.583e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.273e-05 [loop_unroll]: 0.00041011 [opt_after_cconv]: 9.615e-05, [1] [Cycle 1]: 9.042e-05, [7] [c_1]: 2.817e-05 [parameter_eliminate]: 2.51e-06 [updatestate_depend_eliminate]: 5.41002e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.68e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.915e-05, [1] [Cycle 1]: 6.492e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 2.12999e-06 [add_recomputation]: 4.501e-05 [cse_after_recomputation]: 2.074e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.087e-05 [environ_conv]: 5.03002e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.65002e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.85998e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.79999e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.04998e-06 [add_comm_op_reuse_tag]: 1.17999e-06 [interleave_split_concat_branches]: 1.26002e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06e-06 [control_data_broadcast_order]: 1.121e-05 [grouped_pairwise_exchange_alltoall]: 1.79998e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 2.76e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.89e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 6.762e-05, [1] [Cycle 1]: 6.357e-05, [6] [build]: 2.40002e-06 [elim_shapecalc]: 8.17e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 6.30002e-06 [fold_const_symbol]: 8.86002e-06 [renormalize]: 3.19997e-07 [detach_backward]: 1.53002e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 1.53e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00044213 [validate]: 3.206e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.0988266 [execute]: 9.44e-06 Sums bootstrap : 0.000455s : 0.42% type_inference : 0.004456s : 4.15% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000421s : 0.39% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000439s : 0.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000444s : 0.41% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000410s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000442s : 0.41% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098827s : 92.06% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000124 26 18.32% : 0.000023s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.12% : 0.000001s : 2: substitution.fold_const_symbol 4.64% : 0.000006s : 4: substitution.graph_param_transform 65.87% : 0.000082s : 2: substitution.inline 2.18% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.27% : 0.000004s : 4: substitution.remove_not_recompute_node 3.13% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004412 2 91.51% : 0.004038s : 1: type_inference.infer 8.49% : 0.000374s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000139 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.72% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.94% : 0.000001s : 8: predicate.get_grad_eliminate 0.40% : 0.000001s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.78% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.76% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.61% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 1.14% : 0.000002s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.58% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.67% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000266 6 41.72% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.28% : 0.000155s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119395 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.49% : 0.002971s : 1: add_attr 2.48% : 0.002962s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000089s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000492s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000419s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.38% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000766s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.64% : 0.001952s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 0.38% : 0.000451s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.15% : 0.003759s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.23% : 0.000274s : 1: renormalize.infer 0.13% : 0.000159s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.79% : 0.098850s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.74% : 0.004471s : 1: type_inference 0.05% : 0.000055s : 1: validate TotalTime = 0.112896, [24] [bootstrap]: 0.00044399 [type_inference]: 0.00556816 [event_method]: 1.505e-05 [auto_monad]: 5.727e-05 [graph_reusing]: 5.87001e-06 [inline]: 2.04999e-06 [add_attr]: 0.00299689, [1] [add_attr_with_inline]: 0.00298883, [1] [Cycle 1]: 4.727e-05, [2] [tag_attr]: 1.596e-05 [meta_addattr_fg_expand]: 4.17998e-06 [parallel-infer-symbol]: 3.46001e-06 [pre_auto_parallel]: 2.735e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00405144, [53] [py_interpret_to_execute]: 2.171e-05 [rewriter_before_opt_a]: 5.948e-05 [opt_a]: 0.00213457, [2] [Cycle 1]: 0.00152703, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.213e-05 [loop_unroll]: 2.164e-05 [a_1]: 0.0004513 [with_stream_mark]: 1.323e-05 [recompute_prepare]: 7.67002e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.10002e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.655e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.61002e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 5.91998e-06 [parallel]: 1.902e-05 [flash_sp]: 6.93e-06 [merge_comm]: 3.48999e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.48999e-06 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.25998e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.53002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.46998e-06 [receive_attached]: 2.31e-06 
[after_resolve]: 1.12e-05 [a_after_grad]: 9.22001e-06 [renormalize]: 0.00042571 [add_forward_monad_depend]: 4.82998e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.955e-05 [a_3]: 4.199e-05 [Cycle 2]: 0.00059808, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.48002e-06 [a_1]: 0.00012573 [with_stream_mark]: 9.79e-06 [recompute_prepare]: 5.58002e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.34001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.75e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.43002e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.93999e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 3.32002e-06 [allreduce_fusion]: 2.93e-06 [matmul_add_comm_reduction]: 5.11002e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.48997e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.65e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.74e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.16e-06 [cse]: 1.342e-05 [a_3]: 3.228e-05 [py_interpret_to_execute_after_opt_a]: 7.93001e-06 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 3.174e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 5.10001e-06 [mutable_eliminate]: 0.00052774 [opt_b]: 0.00018133, [1] [Cycle 1]: 0.00017547, [7] [b_1]: 
0.00010815 [b_2]: 7.37002e-06 [updatestate_depend_eliminate]: 4.76002e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 3.39991e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.584e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.29e-05 [loop_unroll]: 0.00041313 [opt_after_cconv]: 9.548e-05, [1] [Cycle 1]: 8.982e-05, [7] [c_1]: 2.813e-05 [parameter_eliminate]: 2.28998e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.622e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.291e-05 [tuple_transform]: 6.969e-05, [1] [Cycle 1]: 6.524e-05, [4] [d_1]: 3.964e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.13998e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.371e-05 [cse_after_recomputation]: 2.056e-05, [1] [Cycle 1]: 1.611e-05, [1] [cse]: 1.105e-05 [environ_conv]: 5.00001e-06 [swap_dp_allreduce_reducescatter]: 5.48002e-06 [bias_add_comm_swap]: 2.53998e-06 [label_micro_interleaved_index]: 4.24997e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.31998e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.49999e-06 [reorder_send_recv_between_fp_bp]: 2.66999e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.42e-06 [overlap_opt_shard_in_pipeline]: 1.39e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.38001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 4.14002e-06 [overlap_grad_flash_sp]: 1.658e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 2.10002e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.708e-05, [1] [Cycle 1]: 6.308e-05, [6] [build]: 2.16e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.12e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.44e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.606e-05 [get_jit_bprop_graph]: 9.09989e-07 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044467 [validate]: 3.226e-05 [backend_pass]: 8.09989e-07 [task_emit]: 0.0989944 [execute]: 9.79999e-06 Sums bootstrap : 0.000444s : 0.41% type_inference : 0.005568s : 5.11% event_method : 0.000015s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000577s : 0.53% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000426s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000528s : 0.48% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000413s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000445s : 0.41% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098994s : 90.89% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.65% : 0.000025s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.68% : 0.000006s : 4: substitution.graph_param_transform 66.16% : 0.000111s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.63% : 0.000004s : 4: substitution.replace_old_param 6.85% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005525 2 89.83% : 0.004963s : 1: type_inference.infer 10.17% : 0.000562s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.48% : 0.000026s : 3: replace.inline 30.52% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.24% : 0.000109s : 3: match.inline 8.76% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000001s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.39% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.81% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.27% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.22% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.09% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.45% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 45.76% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.24% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.121468 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.47% : 0.003001s : 1: add_attr 2.46% : 0.002993s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000063s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000480s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.44% : 0.000537s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.78% : 0.000945s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.76% : 0.002137s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 0.37% : 0.000454s : 1: opt_after_jit_grad 0.15% : 0.000185s : 1: opt_b 3.34% : 0.004055s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.18% : 0.000214s : 1: renormalize.infer 0.17% : 0.000205s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.52% : 0.099018s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.60% : 0.005582s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.146649, [24] [bootstrap]: 0.00049067 [type_inference]: 0.0114455 [event_method]: 4.889e-05 [auto_monad]: 0.00012181 [graph_reusing]: 8.60999e-06 [inline]: 1.68002e-06 [add_attr]: 0.00302638, [1] [add_attr_with_inline]: 0.00301776, [1] [Cycle 1]: 7.016e-05, [2] [tag_attr]: 3.464e-05 [meta_addattr_fg_expand]: 9.23002e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 5.127e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0134517, [53] [py_interpret_to_execute]: 4.092e-05 [rewriter_before_opt_a]: 0.00014665 [opt_a]: 0.0111773, [3] [Cycle 1]: 0.00719209, [45] [expand_dump_flag]: 4.30999e-06 [switch_simplify]: 7.478e-05 [loop_unroll]: 6.187e-05 [a_1]: 0.00145076 [with_stream_mark]: 2.278e-05 [recompute_prepare]: 2.161e-05 [updatestate_depend_eliminate]: 9.34e-06 [updatestate_assign_eliminate]: 8.52e-06 [updatestate_loads_eliminate]: 7.38999e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024628 [accelerated_algorithm]: 3.198e-05 [shard]: 2.50002e-06 [meta_shard_fg_expand]: 3.30003e-06 [shard_inline]: 1.621e-05 [merge_send_recv]: 1.596e-05 [auto_parallel]: 1.062e-05 [parallel]: 1.867e-05 [flash_sp]: 1.174e-05 [merge_comm]: 9.25999e-06 [allreduce_fusion]: 8.94e-06 [matmul_add_comm_reduction]: 2.714e-05 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 1.815e-05 [virtual_dataset]: 1.558e-05 [get_grad_eliminate_]: 1.527e-05 [virtual_output]: 1.495e-05 [merge_forward]: 9.67001e-06 [cell_reuse_recompute_pass]: 1.48002e-06 [offload_activation]: 1.782e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.918e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 2.698e-05 [set_forward_comm_id_for_comm_node_pass]: 9.86e-06 [meta_fg_expand]: 0.00144816 [flash_sp_send_recv_attached]: 3.7e-06 [receive_attached]: 2.79999e-06 [after_resolve]: 
5.949e-05 [a_after_grad]: 8.178e-05 [renormalize]: 0.00247828 [add_forward_monad_depend]: 9.19e-06 [auto_monad_grad]: 5.22999e-06 [auto_monad_eliminator]: 5.641e-05 [cse]: 0.00020734 [a_3]: 0.00033584 [Cycle 2]: 0.00301243, [45] [expand_dump_flag]: 1.60001e-06 [switch_simplify]: 4.732e-05 [loop_unroll]: 4.389e-05 [a_1]: 0.00152805 [with_stream_mark]: 1.193e-05 [recompute_prepare]: 1.083e-05 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 4.30999e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.19e-06 [a_2]: 0.00012458 [accelerated_algorithm]: 1.2e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 9.30001e-06 [merge_send_recv]: 6.97002e-06 [auto_parallel]: 7.23e-06 [parallel]: 4.84e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.60999e-06 [matmul_add_comm_reduction]: 8.1e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.58001e-06 [get_grad_eliminate_]: 8.74e-06 [virtual_output]: 9.02e-06 [merge_forward]: 5.06997e-06 [cell_reuse_recompute_pass]: 9.00007e-07 [offload_activation]: 9.48002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.645e-05 [merge_recompute_call_nodes]: 6.70028e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05001e-06 [meta_fg_expand]: 6.918e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.05001e-06 [after_resolve]: 1.584e-05 [a_after_grad]: 1.454e-05 [renormalize]: 0.00060485 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.475e-05 [cse]: 4.695e-05 [a_3]: 6.48e-05 [Cycle 3]: 0.00095899, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 1.051e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00024956 [with_stream_mark]: 1.006e-05 [recompute_prepare]: 9.59999e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 3.94002e-06 [updatestate_loads_eliminate]: 3.83001e-06 [parameter_eliminate]: 
8.80013e-07 [a_2]: 0.00017565 [accelerated_algorithm]: 1.215e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.81003e-06 [shard_inline]: 9.07999e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 7.11999e-06 [parallel]: 4.89998e-06 [flash_sp]: 1.00999e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.97e-06 [matmul_add_comm_reduction]: 7.7e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.53001e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 8.42998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.615e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.478e-05 [a_after_grad]: 1.495e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 1.143e-05 [cse]: 2.645e-05 [a_3]: 5.994e-05 [py_interpret_to_execute_after_opt_a]: 1.106e-05 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 4.863e-05 [convert_after_rewriter]: 9.14e-06 [order_py_execute_after_rewriter]: 7.23e-06 [mutable_eliminate]: 0.00046196 [opt_b]: 0.0002878, [1] [Cycle 1]: 0.00028161, [7] [b_1]: 0.00018952 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 6.99001e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 3.88001e-06 [renormalize]: 3.9002e-07 [cse]: 3.193e-05 [optimize_parallel_all_gather_comm]: 2.153e-05 [overlap_param_gather]: 1.73002e-06 [cconv]: 2.137e-05 [loop_unroll]: 0.00042113 [opt_after_cconv]: 0.00013508, [1] [Cycle 1]: 0.00012932, [7] [c_1]: 4.811e-05 [parameter_eliminate]: 2.33998e-06 [updatestate_depend_eliminate]: 6.88e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.85e-06 
[cse]: 2.995e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 3.017e-05 [tuple_transform]: 0.0001024, [1] [Cycle 1]: 9.779e-05, [4] [d_1]: 6.756e-05 [none_parameter_eliminate]: 2.10002e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 9.77999e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 5.739e-05 [cse_after_recomputation]: 3.218e-05, [1] [Cycle 1]: 2.743e-05, [1] [cse]: 2.189e-05 [environ_conv]: 9.12999e-06 [swap_dp_allreduce_reducescatter]: 8e-06 [bias_add_comm_swap]: 2.84999e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 9.30013e-07 [remove_cast_before_assign_add]: 1.50999e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.14003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.688e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 5.16002e-06 [overlap_recompute_and_grad_model_parallel]: 5.68997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 5.40999e-06 [overlap_grad_flash_sp]: 2.384e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.91998e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 9.87e-05, [1] [Cycle 1]: 9.439e-05, [6] [build]: 1.044e-05 [elim_shapecalc]: 1.357e-05 [elim_not_effective]: 1.761e-05 [opt_reshape]: 1.027e-05 [fold_const_symbol]: 1.484e-05 [renormalize]: 2.09984e-07 [detach_backward]: 
1.50999e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 2.527e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.66001e-06 [opt_after_jit_grad]: 0.000469 [validate]: 4.598e-05 [backend_pass]: 8.90024e-07 [task_emit]: 0.11721 [execute]: 9.44e-06 Sums bootstrap : 0.000491s : 0.34% type_inference : 0.011446s : 8.04% event_method : 0.000049s : 0.03% auto_monad : 0.000122s : 0.09% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000133s : 0.09% optimize.opt_a.loop_unroll : 0.000115s : 0.08% optimize.opt_a.a_1 : 0.003228s : 2.27% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000547s : 0.38% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000019s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001520s : 1.07% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.06% optimize.opt_a.a_after_grad : 0.000111s : 0.08% optimize.opt_a.renormalize : 0.003083s : 2.17% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000281s : 0.20% optimize.opt_a.a_3 : 0.000461s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000462s : 0.32% optimize.opt_b.b_1 : 0.000190s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000421s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000068s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.33% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.117210s : 82.34% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000763 222 5.84% : 0.000045s : 12: substitution.arithmetic_simplify 1.77% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.54% : 0.000424s : 17: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000016s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.84% : 0.000014s : 20: substitution.remove_not_recompute_node 3.14% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.74% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.43% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011370 2 86.79% : 0.009868s : 1: type_inference.infer 13.21% : 0.001502s : 1: type_inference.specialize ------[replace.] 0.000221 33 56.89% : 0.000126s : 17: replace.inline 43.11% : 0.000095s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 33 92.34% : 0.000415s : 17: match.inline 7.66% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.98% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.16% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001612 34 56.97% : 0.000918s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.03% : 0.000694s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.171480 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.77% : 0.003031s : 1: add_attr 1.76% : 0.003022s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000129s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000528s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000056s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000471s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.89% : 0.004949s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.52% : 0.011180s : 1: opt_a 0.08% : 0.000138s : 1: opt_after_cconv 0.28% : 0.000479s : 1: opt_after_jit_grad 0.17% : 0.000291s : 1: opt_b 7.85% : 0.013456s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000056s : 1: pre_auto_parallel 0.03% : 0.000045s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.96% : 0.001646s : 2: renormalize.infer 0.83% : 0.001423s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000053s : 1: rewriter_after_opt_a 0.09% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 68.37% : 0.117233s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.68% : 0.011461s : 1: type_inference 0.04% : 0.000071s : 1: validate TotalTime = 0.107167, [24] [bootstrap]: 0.00049814 [type_inference]: 0.00433648 [event_method]: 1.062e-05 [auto_monad]: 5.311e-05 [graph_reusing]: 5.82001e-06 [inline]: 1.73002e-06 [add_attr]: 0.00294044, [1] [add_attr_with_inline]: 0.00293242, [1] [Cycle 1]: 4.269e-05, [2] [tag_attr]: 1.249e-05 [meta_addattr_fg_expand]: 3.26001e-06 [parallel-infer-symbol]: 3.04001e-06 [pre_auto_parallel]: 2.192e-05 [insert-virtual-dataset]: 2.68003e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00369658, [53] [py_interpret_to_execute]: 1.597e-05 [rewriter_before_opt_a]: 3.908e-05 [opt_a]: 0.00189457, [2] [Cycle 1]: 0.00129229, [45] [expand_dump_flag]: 3.27002e-06 [switch_simplify]: 2.445e-05 [loop_unroll]: 1.364e-05 [a_1]: 0.0003171 [with_stream_mark]: 1.374e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.669e-05 [accelerated_algorithm]: 6.47001e-06 [shard]: 2.41e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.15999e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.981e-05 [flash_sp]: 7.41001e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.61998e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.56001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.148e-05 [merge_recompute_call_nodes]: 1.84e-06 [before_grad]: 9.90002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 3.03998e-06 [receive_attached]: 
2.85002e-06 [after_resolve]: 1.034e-05 [a_after_grad]: 8.64003e-06 [renormalize]: 0.00034671 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 2.848e-05 [a_3]: 4.066e-05 [Cycle 2]: 0.00059299, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.00012341 [with_stream_mark]: 9.46003e-06 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.8e-05 [accelerated_algorithm]: 5.64998e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.18002e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.68999e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.89999e-06 [matmul_add_comm_reduction]: 5.44998e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.79997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.88002e-06 [a_after_grad]: 7.85998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.58998e-06 [cse]: 1.362e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 7.11001e-06 [slice_cell_reuse_recomputed_activation]: 2.08998e-06 [rewriter_after_opt_a]: 3.111e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00044816 [opt_b]: 0.0001784, [1] 
[Cycle 1]: 0.00017252, [7] [b_1]: 0.00010545 [b_2]: 7.64002e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 3.60014e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.661e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.284e-05 [loop_unroll]: 0.0004076 [opt_after_cconv]: 9.386e-05, [1] [Cycle 1]: 8.827e-05, [7] [c_1]: 2.717e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.635e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.338e-05 [tuple_transform]: 6.862e-05, [1] [Cycle 1]: 6.407e-05, [4] [d_1]: 3.857e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.45e-05 [cse_after_recomputation]: 2.054e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.055e-05 [environ_conv]: 5.09e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.46002e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 4.06001e-06 [overlap_recompute_and_grad_model_parallel]: 4.90001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.03002e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.829e-05, [1] [Cycle 1]: 6.416e-05, [6] [build]: 2.44001e-06 [elim_shapecalc]: 8.37998e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.593e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.25e-06 [opt_after_jit_grad]: 0.00044704 [validate]: 3.162e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.0948673 [execute]: 9.50001e-06 Sums bootstrap : 0.000498s : 0.48% type_inference : 0.004336s : 4.20% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000441s : 0.43% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000347s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.43% optimize.opt_b.b_1 : 0.000105s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000408s : 0.39% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000447s : 0.43% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.094867s : 91.88% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.13% : 0.000023s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.42% : 0.000006s : 4: substitution.graph_param_transform 65.94% : 0.000082s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.92% : 0.000005s : 4: substitution.remove_not_recompute_node 2.73% : 0.000003s : 4: substitution.replace_old_param ------[type_inference.] 0.004295 2 91.62% : 0.003935s : 1: type_inference.infer 8.38% : 0.000360s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000155 984 0.73% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.63% : 0.000001s : 9: predicate.addn_zero_filter 0.65% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.69% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.96% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.04% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.93% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.91% : 0.000001s : 13: predicate.environ_get_depend_swap 1.69% : 0.000003s : 21: predicate.environ_get_eliminate 0.93% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.85% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.65% : 0.000003s : 11: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.36% : 0.000008s : 44: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.43% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.86% : 0.000003s : 26: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.59% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.63% : 0.000001s : 9: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.10% : 0.000002s : 11: predicate.partial_defer_inline 1.09% : 0.000002s : 13: predicate.partial_eliminate 0.69% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 9: predicate.reduce_eliminate 1.80% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.14% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.64% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.92% : 0.000001s : 11: predicate.switch_defer_inline 1.61% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.82% : 0.000006s : 41: predicate.switch_simplify 0.66% : 
0.000001s : 9: predicate.tile_eliminate 0.74% : 0.000001s : 9: predicate.transpose_eliminate 1.37% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.20% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.17% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.85% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 14.18% : 0.000022s : 34: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 42.30% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.70% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.115094 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.56% : 0.002945s : 1: add_attr 2.55% : 0.002936s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.47% : 0.000536s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000416s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000791s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.65% : 0.001898s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.40% : 0.000456s : 1: opt_after_jit_grad 0.16% : 0.000182s : 1: opt_b 3.22% : 0.003700s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000189s : 1: renormalize.infer 0.13% : 0.000152s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.45% : 0.094891s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.78% : 0.004350s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.144625, [24] [bootstrap]: 0.00052202 [type_inference]: 0.0101515 [event_method]: 4.444e-05 [auto_monad]: 0.00011625 [graph_reusing]: 8.38999e-06 [inline]: 1.84e-06 [add_attr]: 0.00298572, [1] [add_attr_with_inline]: 0.00297699, [1] [Cycle 1]: 6.757e-05, [2] [tag_attr]: 3.221e-05 [meta_addattr_fg_expand]: 8.51002e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 4.549e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.013073, [53] [py_interpret_to_execute]: 3.666e-05 [rewriter_before_opt_a]: 0.00012969 [opt_a]: 0.0108378, [3] [Cycle 1]: 0.006935, [45] [expand_dump_flag]: 3.76001e-06 [switch_simplify]: 6.628e-05 [loop_unroll]: 5.512e-05 [a_1]: 0.00134225 [with_stream_mark]: 2.345e-05 [recompute_prepare]: 2.255e-05 [updatestate_depend_eliminate]: 9.10999e-06 [updatestate_assign_eliminate]: 7.76001e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 3.06999e-06 [a_2]: 0.00024565 [accelerated_algorithm]: 3.165e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.71001e-06 [shard_inline]: 1.626e-05 [merge_send_recv]: 1.569e-05 [auto_parallel]: 1.071e-05 [parallel]: 1.88e-05 [flash_sp]: 1.122e-05 [merge_comm]: 9.40001e-06 [allreduce_fusion]: 2.728e-05 [matmul_add_comm_reduction]: 2.746e-05 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 1.844e-05 [virtual_dataset]: 1.571e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.54999e-06 [cell_reuse_recompute_pass]: 1.11997e-06 [offload_activation]: 1.812e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.924e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 2.784e-05 [set_forward_comm_id_for_comm_node_pass]: 9.92999e-06 [meta_fg_expand]: 0.00138526 [flash_sp_send_recv_attached]: 3.48e-06 [receive_attached]: 2.59999e-06 
[after_resolve]: 5.926e-05 [a_after_grad]: 8.106e-05 [renormalize]: 0.00242127 [add_forward_monad_depend]: 9.66998e-06 [auto_monad_grad]: 5.10001e-06 [auto_monad_eliminator]: 5.628e-05 [cse]: 0.00017319 [a_3]: 0.00033625 [Cycle 2]: 0.00296135, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.788e-05 [loop_unroll]: 4.372e-05 [a_1]: 0.00153271 [with_stream_mark]: 1.173e-05 [recompute_prepare]: 1.059e-05 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 0.00012637 [accelerated_algorithm]: 1.167e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 9.35001e-06 [merge_send_recv]: 6.86999e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.70001e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 5.10001e-06 [allreduce_fusion]: 5.71e-06 [matmul_add_comm_reduction]: 8.08999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 9.02e-06 [get_grad_eliminate_]: 8.58001e-06 [virtual_output]: 8.35999e-06 [merge_forward]: 4.53999e-06 [cell_reuse_recompute_pass]: 9.30013e-07 [offload_activation]: 8.74998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.621e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.18002e-06 [meta_fg_expand]: 3.401e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.484e-05 [a_after_grad]: 1.4e-05 [renormalize]: 0.00058742 [add_forward_monad_depend]: 3.98001e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 1.439e-05 [cse]: 4.802e-05 [a_3]: 6.504e-05 [Cycle 3]: 0.00092777, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 1.067e-05 [loop_unroll]: 9.08002e-06 [a_1]: 0.00027167 [with_stream_mark]: 1.049e-05 [recompute_prepare]: 9.46e-06 [updatestate_depend_eliminate]: 4.77998e-06 [updatestate_assign_eliminate]: 3.83999e-06 [updatestate_loads_eliminate]: 
3.72002e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012394 [accelerated_algorithm]: 1.188e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 9.33997e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 6.66999e-06 [parallel]: 4.75001e-06 [flash_sp]: 1.07998e-06 [merge_comm]: 4.83001e-06 [allreduce_fusion]: 4.95001e-06 [matmul_add_comm_reduction]: 7.73999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 9.89999e-06 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.33999e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 8.65001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.625e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.396e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10001e-06 [meta_fg_expand]: 2.90998e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.293e-05 [a_after_grad]: 1.405e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.35999e-06 [auto_monad_grad]: 9.90025e-07 [auto_monad_eliminator]: 1.091e-05 [cse]: 2.755e-05 [a_3]: 6.063e-05 [py_interpret_to_execute_after_opt_a]: 1.06e-05 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 4.821e-05 [convert_after_rewriter]: 8.90999e-06 [order_py_execute_after_rewriter]: 6.81001e-06 [mutable_eliminate]: 0.0004491 [opt_b]: 0.0002882, [1] [Cycle 1]: 0.00028227, [7] [b_1]: 0.00018959 [b_2]: 1.063e-05 [updatestate_depend_eliminate]: 6.88e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.86001e-06 [renormalize]: 5.19998e-07 [cse]: 3.198e-05 [optimize_parallel_all_gather_comm]: 2.032e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.014e-05 [loop_unroll]: 0.00041535 [opt_after_cconv]: 0.000136, [1] [Cycle 1]: 0.00013001, [7] [c_1]: 4.835e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 6.74001e-06 [updatestate_assign_eliminate]: 
4.17e-06 [updatestate_loads_eliminate]: 3.75e-06 [cse]: 3.098e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 2.917e-05 [tuple_transform]: 0.00010486, [1] [Cycle 1]: 0.00010012, [4] [d_1]: 6.978e-05 [none_parameter_eliminate]: 1.99e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.79e-06 [partial_unused_args_eliminate]: 2.14999e-06 [add_recomputation]: 5.877e-05 [cse_after_recomputation]: 3.319e-05, [1] [Cycle 1]: 2.841e-05, [1] [cse]: 2.291e-05 [environ_conv]: 9.39e-06 [swap_dp_allreduce_reducescatter]: 7.75e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76998e-06 [control_data_broadcast_order]: 1.801e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 5.42001e-06 [overlap_recompute_and_grad_model_parallel]: 5.59e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 5.16002e-06 [overlap_grad_flash_sp]: 2.376e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.819e-05, [1] [Cycle 1]: 9.395e-05, [6] [build]: 9.77999e-06 [elim_shapecalc]: 1.335e-05 [elim_not_effective]: 1.809e-05 [opt_reshape]: 9.82001e-06 [fold_const_symbol]: 1.489e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.83002e-06 [auto_monad_reorder]: 2.517e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 4.14002e-06 [opt_after_jit_grad]: 0.0004519 [validate]: 4.679e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.116908 [execute]: 1.039e-05 Sums bootstrap : 0.000522s : 0.37% type_inference : 0.010151s : 7.23% event_method : 0.000044s : 0.03% auto_monad : 0.000116s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000130s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.08% optimize.opt_a.a_1 : 0.003147s : 2.24% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000019s : 0.01% optimize.opt_a.allreduce_fusion : 0.000038s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001422s : 1.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.08% optimize.opt_a.renormalize : 0.003009s : 2.14% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.06% optimize.opt_a.cse : 0.000249s : 0.18% optimize.opt_a.a_3 : 0.000462s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.32% optimize.opt_b.b_1 : 0.000190s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000415s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000070s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000452s : 0.32% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.116908s : 83.28% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000732 218 5.76% : 0.000042s : 11: substitution.arithmetic_simplify 1.93% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.45% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.65% : 0.000400s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.43% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.10% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000013s : 20: substitution.remove_not_recompute_node 3.27% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.45% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.72% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.55% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010081 2 86.79% : 0.008749s : 1: type_inference.infer 13.21% : 0.001332s : 1: type_inference.specialize ------[replace.] 0.000199 30 58.11% : 0.000115s : 16: replace.inline 41.89% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.60% : 0.000392s : 16: match.inline 7.40% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000737 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000008s : 67: predicate.cast_eliminate 1.19% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.33% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.94% : 0.000014s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.63% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.90% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001539 32 58.11% : 0.000894s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.89% : 0.000645s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168816 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.77% : 0.002990s : 1: add_attr 1.77% : 0.002981s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000123s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000555s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000051s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000458s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.84% : 0.004799s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.42% : 0.010841s : 1: opt_a 0.08% : 0.000139s : 1: opt_after_cconv 0.27% : 0.000461s : 1: opt_after_jit_grad 0.17% : 0.000292s : 1: opt_b 7.75% : 0.013077s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.93% : 0.001576s : 2: renormalize.infer 0.84% : 0.001420s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.08% : 0.000134s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 69.26% : 0.116930s : 1: task_emit 0.06% : 0.000108s : 1: 
tuple_transform 6.02% : 0.010167s : 1: type_inference 0.04% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x6-ge],max_mem:30.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x7-pynative],max_mem:30.0M TotalTime = 0.0219803, [24] [bootstrap]: 0.00057543 [type_inference]: 0.00623813 [event_method]: 1.417e-05 [auto_monad]: 5.931e-05 [graph_reusing]: 6.22001e-06 [inline]: 2.01998e-06 [add_attr]: 0.0034625, [1] [add_attr_with_inline]: 0.00345159, [1] [Cycle 1]: 4.607e-05, [2] [tag_attr]: 1.574e-05 [meta_addattr_fg_expand]: 4.33001e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.857e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00408946, [53] [py_interpret_to_execute]: 2.059e-05 [rewriter_before_opt_a]: 6.087e-05 [opt_a]: 0.0021616, [2] [Cycle 1]: 0.00156232, [45] [expand_dump_flag]: 2.83998e-06 [switch_simplify]: 3.325e-05 [loop_unroll]: 2.098e-05 [a_1]: 0.00046687 [with_stream_mark]: 1.443e-05 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.26999e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.623e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.68997e-06 [merge_send_recv]: 7.76001e-06 [auto_parallel]: 5.84e-06 [parallel]: 2.474e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.54e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 
5.37001e-06 [virtual_output]: 5.51e-06 [merge_forward]: 4.07e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 9.47001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.49001e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 8.74e-06 [renormalize]: 0.0004447 [add_forward_monad_depend]: 4.17e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.363e-05 [cse]: 2.985e-05 [a_3]: 4.039e-05 [Cycle 2]: 0.00058984, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012508 [with_stream_mark]: 9.69e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.71e-05 [accelerated_algorithm]: 5.27001e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.22e-06 [flash_sp]: 3.13998e-06 [merge_comm]: 2.78e-06 [allreduce_fusion]: 2.75002e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.08002e-06 [get_grad_eliminate_]: 4.79e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.40002e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 5.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37999e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.88001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.08002e-06 [a_after_grad]: 7.61999e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 9.99979e-07 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.463e-05 [a_3]: 3.209e-05 [py_interpret_to_execute_after_opt_a]: 8.01001e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.125e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.0005169 [opt_b]: 0.00018406, [1] [Cycle 1]: 0.00017813, [7] [b_1]: 0.0001101 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.00003e-07 [cse]: 1.708e-05 [optimize_parallel_all_gather_comm]: 1.556e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.32e-05 [loop_unroll]: 0.00041643 [opt_after_cconv]: 9.756e-05, [1] [Cycle 1]: 9.143e-05, [7] [c_1]: 2.903e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.714e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.27e-05 [tuple_transform]: 6.983e-05, [1] [Cycle 1]: 6.565e-05, [4] [d_1]: 3.975e-05 [none_parameter_eliminate]: 1.82999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 2.11e-06 [add_recomputation]: 5.229e-05 [cse_after_recomputation]: 2.156e-05, [1] [Cycle 1]: 1.674e-05, [1] [cse]: 1.153e-05 [environ_conv]: 4.92e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.63998e-06 [label_micro_interleaved_index]: 3.9e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 
1.03001e-06 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.34998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71002e-06 [control_data_broadcast_order]: 1.221e-05 [grouped_pairwise_exchange_alltoall]: 1.82001e-06 [offloading_packed_experts]: 4.08999e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.851e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.99999e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 6.809e-05, [1] [Cycle 1]: 6.394e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.63001e-06 [elim_not_effective]: 1.161e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.84998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 0.00010838 [opt_after_jit_grad]: 0.00045149 [validate]: 3.214e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00667084 [execute]: 7.73001e-06 Sums bootstrap : 0.000575s : 3.28% type_inference : 0.006238s : 35.54% event_method : 0.000014s : 0.08% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000061s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000592s : 3.37% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000445s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000517s : 2.94% optimize.opt_b.b_1 : 0.000110s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000416s : 2.37% optimize.opt_after_cconv.c_1 : 0.000029s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000108s : 0.62% opt_after_jit_grad : 0.000451s : 2.57% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006671s : 38.01% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000171 30 14.31% : 0.000024s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000006s : 4: substitution.graph_param_transform 67.69% : 0.000116s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.41% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.61% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006189 2 90.51% : 0.005602s : 1: type_inference.infer 9.49% : 0.000587s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.92% : 0.000028s : 3: replace.inline 30.08% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 5 91.74% : 0.000114s : 3: match.inline 8.26% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.80% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.63% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.37% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 1.06% : 0.000002s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.05% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.24% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000391 8 48.09% : 0.000188s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.91% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031090 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.15% : 0.003467s : 1: add_attr 11.11% : 0.003455s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000609s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000526s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000955s : 78: opt.transform.opt_a 0.09% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002164s : 1: opt_a 0.32% : 0.000101s : 1: opt_after_cconv 1.48% : 0.000461s : 1: opt_after_jit_grad 0.60% : 0.000188s : 1: opt_b 13.17% : 0.004093s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.73% : 0.000227s : 1: renormalize.infer 0.68% : 0.000211s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.37% : 0.000114s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.49% : 0.006681s : 1: task_emit 0.23% : 0.000073s : 1: 
tuple_transform 20.11% : 0.006253s : 1: type_inference 0.20% : 0.000063s : 1: validate TotalTime = 0.0188268, [24] [bootstrap]: 0.00050873 [type_inference]: 0.00448704 [event_method]: 1.085e-05 [auto_monad]: 5.469e-05 [graph_reusing]: 5.46998e-06 [inline]: 1.82001e-06 [add_attr]: 0.00300321, [1] [add_attr_with_inline]: 0.00299565, [1] [Cycle 1]: 4.114e-05, [2] [tag_attr]: 1.189e-05 [meta_addattr_fg_expand]: 3.26999e-06 [parallel-infer-symbol]: 3.00998e-06 [pre_auto_parallel]: 2.26e-05 [insert-virtual-dataset]: 2.73998e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00372464, [53] [py_interpret_to_execute]: 1.533e-05 [rewriter_before_opt_a]: 3.976e-05 [opt_a]: 0.00191347, [2] [Cycle 1]: 0.00131349, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 2.532e-05 [loop_unroll]: 1.363e-05 [a_1]: 0.0002922 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 3.23998e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 7.695e-05 [accelerated_algorithm]: 6.14999e-06 [shard]: 2.33002e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 6.05002e-06 [parallel]: 1.857e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.62002e-06 [allreduce_fusion]: 3.40003e-06 [matmul_add_comm_reduction]: 9.72999e-06 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 6.68e-06 [virtual_dataset]: 5.81998e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 
2.69001e-06 [after_resolve]: 1.003e-05 [a_after_grad]: 8.56002e-06 [renormalize]: 0.00040157 [add_forward_monad_depend]: 4.48999e-06 [auto_monad_grad]: 2.16e-06 [auto_monad_eliminator]: 1.409e-05 [cse]: 2.759e-05 [a_3]: 3.997e-05 [Cycle 2]: 0.00059089, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.49999e-06 [loop_unroll]: 5.21002e-06 [a_1]: 0.00012505 [with_stream_mark]: 1.13e-05 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.719e-05 [accelerated_algorithm]: 5.43002e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.29997e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.77e-06 [flash_sp]: 3.38e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.44998e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.04003e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 6.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 7.93999e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.23e-06 [cse]: 1.172e-05 [a_3]: 3.192e-05 [py_interpret_to_execute_after_opt_a]: 7.10998e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.172e-05 [convert_after_rewriter]: 7.64002e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00044354 [opt_b]: 0.00018159, [1] [Cycle 1]: 0.00017531, [7] 
[b_1]: 0.0001085 [b_2]: 7.46999e-06 [updatestate_depend_eliminate]: 4.89003e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.40002e-06 [renormalize]: 4.90021e-07 [cse]: 1.556e-05 [optimize_parallel_all_gather_comm]: 1.641e-05 [overlap_param_gather]: 2.05002e-06 [cconv]: 2.404e-05 [loop_unroll]: 0.00041382 [opt_after_cconv]: 9.408e-05, [1] [Cycle 1]: 8.832e-05, [7] [c_1]: 2.776e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.51002e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.509e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.298e-05 [tuple_transform]: 7.008e-05, [1] [Cycle 1]: 6.567e-05, [4] [d_1]: 4.038e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.556e-05 [cse_after_recomputation]: 1.963e-05, [1] [Cycle 1]: 1.537e-05, [1] [cse]: 1.042e-05 [environ_conv]: 4.78001e-06 [swap_dp_allreduce_reducescatter]: 5.71003e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 3.01001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.18998e-06 [assign_add_opt]: 1.47999e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.40999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.23e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 4.03001e-06 [overlap_recompute_and_grad_model_parallel]: 4.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 4.47e-06 [overlap_grad_flash_sp]: 1.845e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 6.885e-05, [1] [Cycle 1]: 6.474e-05, [6] [build]: 2.93e-06 [elim_shapecalc]: 8.55001e-06 [elim_not_effective]: 1.188e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.31e-06 [pipeline_parallel_scheduler]: 1.89999e-06 [auto_monad_reorder]: 1.614e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.67002e-06 [opt_after_jit_grad]: 0.00045669 [validate]: 3.29e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00627262 [execute]: 8.44002e-06 Sums bootstrap : 0.000509s : 3.42% type_inference : 0.004487s : 30.18% event_method : 0.000011s : 0.07% auto_monad : 0.000055s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.81% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000402s : 2.70% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.98% optimize.opt_b.b_1 : 0.000109s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000414s : 2.78% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.02% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000457s : 3.07% validate : 0.000033s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006273s : 42.19% execute : 0.000008s : 0.06% Time group info: ------[substitution.] 0.000121 26 18.24% : 0.000022s : 4: substitution.arithmetic_simplify 1.89% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 5.11% : 0.000006s : 4: substitution.graph_param_transform 64.79% : 0.000078s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.47% : 0.000004s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004445 2 92.06% : 0.004092s : 1: type_inference.infer 7.94% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.08% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.82% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.43% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.86% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.22% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.74% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 43.55% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.45% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026880 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.19% : 0.003007s : 1: add_attr 11.16% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.01% : 0.000541s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000422s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.68% : 0.000452s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.85% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.13% : 0.001916s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000466s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 13.87% : 0.003728s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000022s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.90% : 0.000243s : 1: renormalize.infer 0.57% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.39% : 0.006289s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.74% : 0.004501s : 1: type_inference 0.23% : 0.000062s : 1: validate TotalTime = 0.0199084, [24] [bootstrap]: 0.00050532 [type_inference]: 0.00559837 [event_method]: 1.415e-05 [auto_monad]: 5.763e-05 [graph_reusing]: 6.00002e-06 [inline]: 2.11e-06 [add_attr]: 0.00298617, [1] [add_attr_with_inline]: 0.00297781, [1] [Cycle 1]: 4.533e-05, [2] [tag_attr]: 1.574e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.616e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00392875, [53] [py_interpret_to_execute]: 1.967e-05 [rewriter_before_opt_a]: 5.878e-05 [opt_a]: 0.00210394, [2] [Cycle 1]: 0.00149789, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 3.319e-05 [loop_unroll]: 2.065e-05 [a_1]: 0.0004482 [with_stream_mark]: 1.247e-05 [recompute_prepare]: 7.51001e-06 [updatestate_depend_eliminate]: 3.52002e-06 [updatestate_assign_eliminate]: 3.73001e-06 [updatestate_loads_eliminate]: 3.26999e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 7.47e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 8.18001e-06 [auto_parallel]: 5.79e-06 [parallel]: 1.9e-05 [flash_sp]: 7.52998e-06 [merge_comm]: 3.21999e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.82001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.76999e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 3.87002e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.24e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.40997e-06 [receive_attached]: 
2.22001e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00041156 [add_forward_monad_depend]: 4.38999e-06 [auto_monad_grad]: 1.83002e-06 [auto_monad_eliminator]: 1.366e-05 [cse]: 2.827e-05 [a_3]: 4.077e-05 [Cycle 2]: 0.00059671, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.26002e-06 [a_1]: 0.0001242 [with_stream_mark]: 9.69999e-06 [recompute_prepare]: 5.46e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.668e-05 [accelerated_algorithm]: 5.37001e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.36002e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.11997e-06 [parallel]: 4.45e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 2.79001e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 5.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.21998e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 2.97002e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.08002e-06 [a_after_grad]: 7.88001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 2.781e-05 [a_3]: 3.156e-05 [py_interpret_to_execute_after_opt_a]: 7.71001e-06 [slice_cell_reuse_recomputed_activation]: 2.21e-06 [rewriter_after_opt_a]: 3.32e-05 [convert_after_rewriter]: 6.92002e-06 [order_py_execute_after_rewriter]: 4.90001e-06 [mutable_eliminate]: 0.00044572 [opt_b]: 0.00018052, [1] [Cycle 1]: 
0.00017405, [7] [b_1]: 0.00010694 [b_2]: 6.59999e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [renormalize]: 4.19997e-07 [cse]: 1.565e-05 [optimize_parallel_all_gather_comm]: 1.602e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.311e-05 [loop_unroll]: 0.00040867 [opt_after_cconv]: 9.351e-05, [1] [Cycle 1]: 8.781e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.534e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.352e-05 [tuple_transform]: 6.898e-05, [1] [Cycle 1]: 6.452e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.01998e-06 [partial_unused_args_eliminate]: 1.76998e-06 [add_recomputation]: 4.622e-05 [cse_after_recomputation]: 2.045e-05, [1] [Cycle 1]: 1.6e-05, [1] [cse]: 1.1e-05 [environ_conv]: 4.75999e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.94001e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.70997e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.51998e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.66e-06 [ForceFp32Comm]: 8.40024e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.71999e-06 [overlap_grad_ring_attention]: 4.00998e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 6.638e-05, [1] [Cycle 1]: 6.205e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 7.95998e-06 [elim_not_effective]: 1.1e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.578e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044309 [validate]: 3.168e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00608081 [execute]: 6.76e-06 Sums bootstrap : 0.000505s : 3.16% type_inference : 0.005598s : 35.03% event_method : 0.000014s : 0.09% auto_monad : 0.000058s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000572s : 3.58% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000141s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000412s : 2.58% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000056s : 0.35% optimize.opt_a.a_3 : 0.000072s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 2.79% optimize.opt_b.b_1 : 0.000107s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000409s : 2.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000443s : 2.77% validate : 0.000032s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006081s : 38.05% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 15.07% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.53% : 0.000006s : 4: substitution.graph_param_transform 66.66% : 0.000109s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 4: substitution.replace_old_param 6.45% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005556 2 90.08% : 0.005005s : 1: type_inference.infer 9.92% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.48% : 0.000027s : 3: replace.inline 29.52% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.83% : 0.000107s : 3: match.inline 8.17% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.03% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.46% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.24% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 47.76% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.24% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028323 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.56% : 0.002990s : 1: add_attr 10.53% : 0.002981s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000050s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.89% : 0.000535s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.47% : 0.000417s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.29% : 0.000933s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.44% : 0.002107s : 1: opt_a 0.34% : 0.000097s : 1: opt_after_cconv 1.60% : 0.000453s : 1: opt_after_jit_grad 0.65% : 0.000184s : 1: opt_b 13.88% : 0.003933s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.75% : 0.000214s : 1: renormalize.infer 0.67% : 0.000191s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000069s : 1: symbol_engine_optimizer 21.51% : 0.006091s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.81% : 0.005612s : 1: type_inference 0.21% : 0.000059s : 1: validate TotalTime = 0.0376723, [24] [bootstrap]: 0.00050243 [type_inference]: 0.0113422 [event_method]: 4.602e-05 [auto_monad]: 0.00012287 [graph_reusing]: 8.22e-06 [inline]: 2.40002e-06 [add_attr]: 0.00303576, [1] [add_attr_with_inline]: 0.00302778, [1] [Cycle 1]: 7.216e-05, [2] [tag_attr]: 3.545e-05 [meta_addattr_fg_expand]: 9.56998e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 5.039e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0134068, [53] [py_interpret_to_execute]: 3.868e-05 [rewriter_before_opt_a]: 0.00014407 [opt_a]: 0.011113, [3] [Cycle 1]: 0.00707702, [45] [expand_dump_flag]: 3.98001e-06 [switch_simplify]: 7.398e-05 [loop_unroll]: 6.417e-05 [a_1]: 0.00144614 [with_stream_mark]: 2.366e-05 [recompute_prepare]: 2.177e-05 [updatestate_depend_eliminate]: 9.17001e-06 [updatestate_assign_eliminate]: 7.74997e-06 [updatestate_loads_eliminate]: 7.61001e-06 [parameter_eliminate]: 2.91999e-06 [a_2]: 0.00024534 [accelerated_algorithm]: 3.063e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 3.29001e-06 [shard_inline]: 1.591e-05 [merge_send_recv]: 1.697e-05 [auto_parallel]: 1.115e-05 [parallel]: 1.933e-05 [flash_sp]: 1.204e-05 [merge_comm]: 1.004e-05 [allreduce_fusion]: 8.94e-06 [matmul_add_comm_reduction]: 2.756e-05 [allreduce_slice_to_reducescatter]: 9.09989e-07 [virtual_shard_identity]: 1.824e-05 [virtual_dataset]: 1.541e-05 [get_grad_eliminate_]: 1.527e-05 [virtual_output]: 1.509e-05 [merge_forward]: 9.72999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.89e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.834e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 2.693e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00144312 [flash_sp_send_recv_attached]: 3.65e-06 [receive_attached]: 2.59001e-06 
[after_resolve]: 6.072e-05 [a_after_grad]: 8.046e-05 [renormalize]: 0.00241746 [add_forward_monad_depend]: 9.22999e-06 [auto_monad_grad]: 5.14998e-06 [auto_monad_eliminator]: 5.566e-05 [cse]: 0.00016441 [a_3]: 0.00033445 [Cycle 2]: 0.003126, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 0.00012361 [loop_unroll]: 4.646e-05 [a_1]: 0.00151674 [with_stream_mark]: 1.159e-05 [recompute_prepare]: 1.078e-05 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 4.50001e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 1.22e-06 [a_2]: 0.00012554 [accelerated_algorithm]: 1.186e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 2.09999e-06 [shard_inline]: 9.38002e-06 [merge_send_recv]: 7.03e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.79e-06 [flash_sp]: 3.5e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 7.61001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 9.72999e-06 [get_grad_eliminate_]: 9.05001e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.662e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.463e-05 [set_forward_comm_id_for_comm_node_pass]: 5.02999e-06 [meta_fg_expand]: 6.988e-05 [flash_sp_send_recv_attached]: 1.05999e-06 [receive_attached]: 1.03001e-06 [after_resolve]: 1.597e-05 [a_after_grad]: 1.413e-05 [renormalize]: 0.00064573 [add_forward_monad_depend]: 4.11001e-06 [auto_monad_grad]: 1.25001e-06 [auto_monad_eliminator]: 1.463e-05 [cse]: 4.692e-05 [a_3]: 6.53e-05 [Cycle 3]: 0.00089515, [45] [expand_dump_flag]: 1.24998e-06 [switch_simplify]: 1.082e-05 [loop_unroll]: 9.01998e-06 [a_1]: 0.00024912 [with_stream_mark]: 9.44e-06 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 
3.88999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012273 [accelerated_algorithm]: 1.176e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 6.98998e-06 [auto_parallel]: 6.73e-06 [parallel]: 4.39002e-06 [flash_sp]: 1.17e-06 [merge_comm]: 4.80001e-06 [allreduce_fusion]: 5.05999e-06 [matmul_add_comm_reduction]: 7.45998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.92001e-06 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.49998e-06 [virtual_output]: 8.18999e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.555e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.4e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07e-06 [meta_fg_expand]: 2.93998e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 1.311e-05 [a_after_grad]: 1.428e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 1.116e-05 [cse]: 2.566e-05 [a_3]: 5.968e-05 [py_interpret_to_execute_after_opt_a]: 1.013e-05 [slice_cell_reuse_recomputed_activation]: 2.51e-06 [rewriter_after_opt_a]: 4.976e-05 [convert_after_rewriter]: 9.31002e-06 [order_py_execute_after_rewriter]: 7.01999e-06 [mutable_eliminate]: 0.0004594 [opt_b]: 0.00028746, [1] [Cycle 1]: 0.00028095, [7] [b_1]: 0.00018919 [b_2]: 1.105e-05 [updatestate_depend_eliminate]: 6.96999e-06 [updatestate_assign_eliminate]: 3.95998e-06 [updatestate_loads_eliminate]: 4.1e-06 [renormalize]: 4.19997e-07 [cse]: 3.066e-05 [optimize_parallel_all_gather_comm]: 2.166e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.119e-05 [loop_unroll]: 0.00041613 [opt_after_cconv]: 0.00013576, [1] [Cycle 1]: 0.00012998, [7] [c_1]: 4.899e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 7.5e-06 [updatestate_assign_eliminate]: 4.03001e-06 
[updatestate_loads_eliminate]: 3.97998e-06 [cse]: 2.917e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 2.893e-05 [tuple_transform]: 0.00010166, [1] [Cycle 1]: 9.722e-05, [4] [d_1]: 6.678e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 9.94001e-06 [partial_unused_args_eliminate]: 1.90001e-06 [add_recomputation]: 5.849e-05 [cse_after_recomputation]: 3.211e-05, [1] [Cycle 1]: 2.741e-05, [1] [cse]: 2.183e-05 [environ_conv]: 8.40999e-06 [swap_dp_allreduce_reducescatter]: 7.99997e-06 [bias_add_comm_swap]: 2.48002e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.73e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.68998e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.19003e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.79e-05 [grouped_pairwise_exchange_alltoall]: 1.76003e-06 [offloading_packed_experts]: 5.20999e-06 [overlap_recompute_and_grad_model_parallel]: 5.80002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67999e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 5.27999e-06 [overlap_grad_flash_sp]: 2.44e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.23002e-06 [symbol_engine_optimizer]: 9.874e-05, [1] [Cycle 1]: 9.457e-05, [6] [build]: 9.42001e-06 [elim_shapecalc]: 1.3e-05 [elim_not_effective]: 1.827e-05 [opt_reshape]: 1.037e-05 [fold_const_symbol]: 1.522e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 2.02999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.579e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.73999e-06 [opt_after_jit_grad]: 0.0004592 [validate]: 4.503e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.0083958 [execute]: 6.91999e-06 Sums bootstrap : 0.000502s : 1.51% type_inference : 0.011342s : 34.00% event_method : 0.000046s : 0.14% auto_monad : 0.000123s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000144s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000208s : 0.62% optimize.opt_a.loop_unroll : 0.000120s : 0.36% optimize.opt_a.a_1 : 0.003212s : 9.63% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000494s : 1.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001516s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003063s : 9.18% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000237s : 0.71% optimize.opt_a.a_3 : 0.000459s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.38% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000416s : 1.25% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000459s : 1.38% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008396s : 25.17% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000762 222 5.95% : 0.000045s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.42% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.70% : 0.000425s : 17: substitution.inline 2.03% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 20: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.40% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.51% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.75% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011268 2 86.70% : 0.009769s : 1: type_inference.infer 13.30% : 0.001498s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.83% : 0.000127s : 17: replace.inline 42.17% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.35% : 0.000416s : 17: match.inline 7.65% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000827 5764 0.98% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 0.98% : 0.000008s : 68: predicate.addn_zero_filter 0.95% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.80% : 0.000015s : 100: predicate.arithmetic_simplify 1.05% : 0.000009s : 68: predicate.cast_eliminate 1.03% : 0.000009s : 68: predicate.check_bprop_eliminate 0.47% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.46% : 0.000004s : 32: predicate.depend_value_elim 1.07% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.09% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.02% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.14% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.08% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_depend_swap 1.58% : 0.000013s : 108: predicate.environ_get_eliminate 1.09% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.59% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.08% : 0.000017s : 101: predicate.float_depend_g_call 0.46% : 0.000004s : 32: predicate.float_environ_get_switch 0.61% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.48% : 0.000004s : 32: predicate.incorporate_call 0.45% : 0.000004s : 32: predicate.incorporate_call_switch 5.08% : 0.000042s : 249: predicate.inline 1.12% : 0.000009s : 55: predicate.inline_without_move 0.27% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.59% : 0.000005s : 32: predicate.less_batch_normalization 
1.48% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.42% : 0.000020s : 168: predicate.load_eliminater 0.26% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.12% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.30% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.49% : 0.000004s : 32: predicate.merge_addn 1.02% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.02% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.01% : 0.000008s : 68: predicate.minmaximum_grad 0.26% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.83% : 0.000015s : 101: predicate.partial_defer_inline 1.58% : 0.000013s : 92: predicate.partial_eliminate 0.99% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.16% : 0.000010s : 68: predicate.reduce_eliminate 2.44% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000003s : 32: predicate.remove_not_recompute_node 1.72% : 0.000014s : 152: predicate.replace_applicator 0.55% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 0.99% : 0.000008s : 68: predicate.reshape_eliminate 1.02% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.15% : 0.000010s : 68: predicate.same_eliminate 0.33% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.57% : 0.000005s : 32: predicate.specialize_transform 1.12% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.70% : 0.000014s : 101: predicate.switch_defer_inline 2.67% : 0.000022s : 169: predicate.switch_layer_defer_inline 13.60% : 
0.000112s : 277: predicate.switch_simplify 0.97% : 0.000008s : 68: predicate.tile_eliminate 0.97% : 0.000008s : 68: predicate.transpose_eliminate 1.34% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.40% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.58% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.33% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.86% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.47% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.40% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 2.98% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 8: predicate.value_based_eliminate 0.50% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 32: predicate.virtual_output_eliminate 0.12% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001549 34 57.44% : 0.000890s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.56% : 0.000659s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062433 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.87% : 0.003040s : 1: add_attr 4.86% : 0.003032s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000130s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.85% : 0.000533s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.94% : 0.004955s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.80% : 0.011116s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.75% : 0.000468s : 1: opt_after_jit_grad 0.47% : 0.000291s : 1: opt_b 21.48% : 0.013411s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.69% : 0.001682s : 2: renormalize.infer 2.19% : 0.001369s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.46% : 0.008406s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.19% : 0.011358s : 1: type_inference 0.12% : 0.000078s : 1: validate TotalTime = 0.0188257, [24] [bootstrap]: 0.00044532 [type_inference]: 0.00434474 [event_method]: 1.092e-05 [auto_monad]: 5.242e-05 [graph_reusing]: 5.43002e-06 [inline]: 1.90001e-06 [add_attr]: 0.00301823, [1] [add_attr_with_inline]: 0.00300977, [1] [Cycle 1]: 4.438e-05, [2] [tag_attr]: 1.307e-05 [meta_addattr_fg_expand]: 3.36999e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.238e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00373621, [53] [py_interpret_to_execute]: 1.64e-05 [rewriter_before_opt_a]: 3.92e-05 [opt_a]: 0.00192748, [2] [Cycle 1]: 0.0012956, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.515e-05 [loop_unroll]: 1.341e-05 [a_1]: 0.00029457 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.49001e-06 [updatestate_assign_eliminate]: 3.40998e-06 [updatestate_loads_eliminate]: 3.22002e-06 [parameter_eliminate]: 2.46e-06 [a_2]: 7.736e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.81e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.37999e-06 [parallel]: 2.253e-05 [flash_sp]: 7.39002e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 9.47001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.17997e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.46e-06 [merge_forward]: 4.15999e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 9.72001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 9.23002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 
2.36e-06 [after_resolve]: 1.057e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00036914 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.878e-05 [a_3]: 4.137e-05 [Cycle 2]: 0.00062247, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 7.3e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.00012222 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.23002e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 9.634e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.37999e-06 [parallel]: 4.66002e-06 [flash_sp]: 3.93001e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.21998e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.22998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 8.16002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.328e-05 [a_3]: 3.303e-05 [py_interpret_to_execute_after_opt_a]: 7.28999e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.169e-05 [convert_after_rewriter]: 6.94001e-06 [order_py_execute_after_rewriter]: 4.80001e-06 [mutable_eliminate]: 0.00045457 [opt_b]: 0.00017863, [1] [Cycle 1]: 0.00017183, [7] 
[b_1]: 0.00010539 [b_2]: 6.78998e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 3.69997e-07 [cse]: 1.67e-05 [optimize_parallel_all_gather_comm]: 1.624e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.323e-05 [loop_unroll]: 0.00040525 [opt_after_cconv]: 9.414e-05, [1] [Cycle 1]: 8.842e-05, [7] [c_1]: 2.793e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 2.40997e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.594e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.289e-05 [tuple_transform]: 7.143e-05, [1] [Cycle 1]: 6.728e-05, [4] [d_1]: 4.091e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.81003e-06 [add_recomputation]: 4.473e-05 [cse_after_recomputation]: 2.001e-05, [1] [Cycle 1]: 1.571e-05, [1] [cse]: 1.067e-05 [environ_conv]: 4.60999e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 3.36999e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.16e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 3.83999e-06 [overlap_recompute_and_grad_model_parallel]: 4.79998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.00998e-06 [overlap_grad_flash_sp]: 1.805e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.38002e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 1.29998e-06 [symbol_engine_optimizer]: 6.786e-05, [1] [Cycle 1]: 6.364e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.36002e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.57998e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.68997e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.599e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00044382 [validate]: 3.284e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00647844 [execute]: 6.68e-06 Sums bootstrap : 0.000445s : 3.00% type_inference : 0.004345s : 29.25% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.81% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000174s : 1.17% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000369s : 2.49% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000455s : 3.06% optimize.opt_b.b_1 : 0.000105s : 0.71% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000405s : 2.73% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 2.99% validate : 0.000033s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006478s : 43.61% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000122 26 18.86% : 0.000023s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 5.03% : 0.000006s : 4: substitution.graph_param_transform 64.37% : 0.000079s : 2: substitution.inline 2.57% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004301 2 91.70% : 0.003944s : 1: type_inference.infer 8.30% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.97% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.74% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.27% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.93% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.03% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.95% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.99% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 1.11% : 0.000002s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.97% : 0.000007s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.45% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 41.61% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.39% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026908 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.23% : 0.003023s : 1: add_attr 11.20% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.76% : 0.000475s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.54% : 0.000414s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000800s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.17% : 0.001930s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.68% : 0.000453s : 1: opt_after_jit_grad 0.68% : 0.000182s : 1: opt_b 13.90% : 0.003740s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.77% : 0.000207s : 1: renormalize.infer 0.58% : 0.000155s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 24.12% : 0.006489s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.20% : 0.004360s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0360586, [24] [bootstrap]: 0.00049928 [type_inference]: 0.0102155 [event_method]: 4.046e-05 [auto_monad]: 0.00011743 [graph_reusing]: 8.02998e-06 [inline]: 2.09e-06 [add_attr]: 0.00296208, [1] [add_attr_with_inline]: 0.00295381, [1] [Cycle 1]: 6.766e-05, [2] [tag_attr]: 3.13e-05 [meta_addattr_fg_expand]: 8.88002e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 4.624e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0130296, [53] [py_interpret_to_execute]: 3.655e-05 [rewriter_before_opt_a]: 0.00012619 [opt_a]: 0.0107546, [3] [Cycle 1]: 0.00687947, [45] [expand_dump_flag]: 4.22998e-06 [switch_simplify]: 6.761e-05 [loop_unroll]: 5.48e-05 [a_1]: 0.00139364 [with_stream_mark]: 2.369e-05 [recompute_prepare]: 2.132e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 7.65e-06 [updatestate_loads_eliminate]: 7.30998e-06 [parameter_eliminate]: 2.74999e-06 [a_2]: 0.00024441 [accelerated_algorithm]: 3.108e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.20998e-06 [shard_inline]: 1.598e-05 [merge_send_recv]: 1.623e-05 [auto_parallel]: 1.085e-05 [parallel]: 1.945e-05 [flash_sp]: 1.139e-05 [merge_comm]: 9.54999e-06 [allreduce_fusion]: 8.79998e-06 [matmul_add_comm_reduction]: 2.648e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.806e-05 [virtual_dataset]: 1.555e-05 [get_grad_eliminate_]: 1.504e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.76e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 1.795e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.824e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 2.752e-05 [set_forward_comm_id_for_comm_node_pass]: 9.52001e-06 [meta_fg_expand]: 0.00136403 [flash_sp_send_recv_attached]: 3.76001e-06 [receive_attached]: 2.97002e-06 
[after_resolve]: 6.013e-05 [a_after_grad]: 8e-05 [renormalize]: 0.00237261 [add_forward_monad_depend]: 8.84e-06 [auto_monad_grad]: 5.69999e-06 [auto_monad_eliminator]: 5.562e-05 [cse]: 0.00016376 [a_3]: 0.00033697 [Cycle 2]: 0.00296989, [45] [expand_dump_flag]: 1.46002e-06 [switch_simplify]: 4.705e-05 [loop_unroll]: 4.362e-05 [a_1]: 0.00156051 [with_stream_mark]: 1.184e-05 [recompute_prepare]: 1.071e-05 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 4.48001e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 0.00012624 [accelerated_algorithm]: 1.208e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 6.89001e-06 [auto_parallel]: 7.18998e-06 [parallel]: 4.62e-06 [flash_sp]: 3.37002e-06 [merge_comm]: 5.06997e-06 [allreduce_fusion]: 4.60999e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.97e-06 [get_grad_eliminate_]: 8.97e-06 [virtual_output]: 8.24998e-06 [merge_forward]: 4.37998e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.659e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 1.414e-05 [set_forward_comm_id_for_comm_node_pass]: 5.41998e-06 [meta_fg_expand]: 3.418e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.509e-05 [a_after_grad]: 1.427e-05 [renormalize]: 0.00057353 [add_forward_monad_depend]: 3.56999e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.385e-05 [cse]: 4.481e-05 [a_3]: 6.541e-05 [Cycle 3]: 0.0008913, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 1.059e-05 [loop_unroll]: 9.01002e-06 [a_1]: 0.00025029 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 9.46998e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 3.92002e-06 [updatestate_loads_eliminate]: 
3.82002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 0.00012282 [accelerated_algorithm]: 1.176e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.20001e-06 [merge_send_recv]: 6.83e-06 [auto_parallel]: 6.91001e-06 [parallel]: 4.55001e-06 [flash_sp]: 1.11002e-06 [merge_comm]: 4.89998e-06 [allreduce_fusion]: 5.07999e-06 [matmul_add_comm_reduction]: 7.65998e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 8.61002e-06 [get_grad_eliminate_]: 8.43001e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 8.85999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.561e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.387e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29998e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.31e-05 [a_after_grad]: 1.409e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.061e-05 [cse]: 2.381e-05 [a_3]: 5.595e-05 [py_interpret_to_execute_after_opt_a]: 9.67999e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 4.771e-05 [convert_after_rewriter]: 9.97999e-06 [order_py_execute_after_rewriter]: 7.58001e-06 [mutable_eliminate]: 0.00045131 [opt_b]: 0.00028708, [1] [Cycle 1]: 0.00028051, [7] [b_1]: 0.00018989 [b_2]: 1.059e-05 [updatestate_depend_eliminate]: 7.03998e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 3.83999e-06 [renormalize]: 3.9002e-07 [cse]: 2.998e-05 [optimize_parallel_all_gather_comm]: 2.062e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.045e-05 [loop_unroll]: 0.00041504 [opt_after_cconv]: 0.00013449, [1] [Cycle 1]: 0.00012892, [7] [c_1]: 4.859e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 7.06999e-06 [updatestate_assign_eliminate]: 
4.23001e-06 [updatestate_loads_eliminate]: 3.78999e-06 [cse]: 2.936e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.804e-05 [tuple_transform]: 0.00010052, [1] [Cycle 1]: 9.618e-05, [4] [d_1]: 6.599e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 9.82001e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 0.00010384 [cse_after_recomputation]: 3.203e-05, [1] [Cycle 1]: 2.721e-05, [1] [cse]: 2.127e-05 [environ_conv]: 8.80999e-06 [swap_dp_allreduce_reducescatter]: 7.66999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.63002e-06 [slice_recompute_activation]: 2.28998e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.63003e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.804e-05 [grouped_pairwise_exchange_alltoall]: 2.06998e-06 [offloading_packed_experts]: 5.15999e-06 [overlap_recompute_and_grad_model_parallel]: 5.62999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.74999e-06 [overlap_grad_ring_attention]: 5.56e-06 [overlap_grad_flash_sp]: 2.443e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 9.93e-05, [1] [Cycle 1]: 9.522e-05, [6] [build]: 1.001e-05 [elim_shapecalc]: 1.373e-05 [elim_not_effective]: 1.844e-05 [opt_reshape]: 1.027e-05 [fold_const_symbol]: 1.466e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.66002e-06 [auto_monad_reorder]: 2.451e-05 [get_jit_bprop_graph]: 1.15999e-06 [rewriter_after_jit_bprop_graph]: 3.26001e-06 [opt_after_jit_grad]: 0.00046257 [validate]: 4.498e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00837973 [execute]: 7.43e-06 Sums bootstrap : 0.000499s : 1.57% type_inference : 0.010215s : 32.07% event_method : 0.000040s : 0.13% auto_monad : 0.000117s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000126s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003204s : 10.06% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.55% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001401s : 4.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000108s : 0.34% optimize.opt_a.renormalize : 0.002946s : 9.25% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000232s : 0.73% optimize.opt_a.a_3 : 0.000458s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000008s : 0.02% optimize.mutable_eliminate : 0.000451s : 1.42% optimize.opt_b.b_1 : 0.000190s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000415s : 1.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000104s : 0.33% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000463s : 1.45% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008380s : 26.31% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000779 218 5.58% : 0.000044s : 11: substitution.arithmetic_simplify 1.80% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 57.00% : 0.000444s : 16: substitution.inline 1.93% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000014s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.69% : 0.000013s : 20: substitution.remove_not_recompute_node 3.20% : 0.000025s : 10: substitution.replace_applicator 1.35% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.05% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010146 2 87.46% : 0.008873s : 1: type_inference.infer 12.54% : 0.001272s : 1: type_inference.specialize ------[replace.] 0.000205 30 59.48% : 0.000122s : 16: replace.inline 40.52% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000468 30 93.20% : 0.000436s : 16: match.inline 6.80% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000732 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.33% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.58% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.62% : 0.000019s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.16% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.58% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.66% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.53% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001455 32 58.02% : 0.000844s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.98% : 0.000611s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060176 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.93% : 0.002966s : 1: add_attr 4.92% : 0.002958s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000109s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000124s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000530s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000014s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.06% : 0.004851s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.88% : 0.010757s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000472s : 1: opt_after_jit_grad 0.48% : 0.000291s : 1: opt_b 21.66% : 0.013033s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.64% : 0.001591s : 2: renormalize.infer 2.23% : 0.001343s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000102s : 1: symbol_engine_optimizer 13.94% : 0.008390s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.00% : 0.010230s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x7-kbk],max_mem:30.0M TotalTime = 0.118657, [24] [bootstrap]: 0.00060456 [type_inference]: 0.00605061 [event_method]: 1.389e-05 [auto_monad]: 5.962e-05 [graph_reusing]: 5.22e-06 [inline]: 1.69e-06 [add_attr]: 0.0033955, [1] [add_attr_with_inline]: 0.00338512, [1] [Cycle 1]: 4.528e-05, [2] [tag_attr]: 1.534e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.809e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00395718, [53] [py_interpret_to_execute]: 1.975e-05 [rewriter_before_opt_a]: 5.825e-05 [opt_a]: 0.00212315, [2] [Cycle 1]: 0.00151118, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.303e-05 [loop_unroll]: 2.071e-05 [a_1]: 0.00045769 [with_stream_mark]: 1.375e-05 [recompute_prepare]: 7.93999e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.432e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.71998e-06 [merge_send_recv]: 8.25e-06 [auto_parallel]: 5.87999e-06 [parallel]: 2.43e-05 [flash_sp]: 8.05e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.64002e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.35998e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.93998e-06 [virtual_output]: 5.46e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 9.20001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 
[merge_recompute_call_nodes]: 1.71998e-06 [before_grad]: 8.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.23998e-06 [receive_attached]: 2.49001e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00040454 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 2.808e-05 [a_3]: 4.022e-05 [Cycle 2]: 0.00060269, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012607 [with_stream_mark]: 1.003e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.65002e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.688e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.51998e-06 [parallel]: 3.88001e-06 [flash_sp]: 3.50998e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.70997e-06 [matmul_add_comm_reduction]: 4.97999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.02998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 2.388e-05 [a_after_grad]: 8.49002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.44e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.261e-05 [a_3]: 3.157e-05 [py_interpret_to_execute_after_opt_a]: 7.18e-06 
[slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 3.075e-05 [convert_after_rewriter]: 6.97002e-06 [order_py_execute_after_rewriter]: 5.57001e-06 [mutable_eliminate]: 0.0004449 [opt_b]: 0.0001809, [1] [Cycle 1]: 0.00017489, [7] [b_1]: 0.00010605 [b_2]: 6.54999e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.08002e-06 [renormalize]: 3.60014e-07 [cse]: 1.925e-05 [optimize_parallel_all_gather_comm]: 1.6e-05 [overlap_param_gather]: 1.68002e-06 [cconv]: 2.33e-05 [loop_unroll]: 0.00041258 [opt_after_cconv]: 9.316e-05, [1] [Cycle 1]: 8.739e-05, [7] [c_1]: 2.723e-05 [parameter_eliminate]: 2.28998e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.52e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.352e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.426e-05, [4] [d_1]: 3.914e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 5.93998e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.113e-05 [cse_after_recomputation]: 2.115e-05, [1] [Cycle 1]: 1.636e-05, [1] [cse]: 1.096e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 5.26002e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.70999e-06 [label_fine_grained_interleaved_index]: 3.09999e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.67001e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.44999e-06 [reorder_send_recv_between_fp_bp]: 3.09001e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.32e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.67999e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.46999e-06 [overlap_recompute_and_grad_model_parallel]: 4.22998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 3.82998e-06 [overlap_grad_flash_sp]: 1.743e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.36998e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.29003e-06 [symbol_engine_optimizer]: 6.695e-05, [1] [Cycle 1]: 6.289e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.05999e-06 [elim_not_effective]: 1.133e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.73002e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.483e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00044433 [validate]: 3.035e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.10381 [execute]: 1.011e-05 Sums bootstrap : 0.000605s : 0.53% type_inference : 0.006051s : 5.29% event_method : 0.000014s : 0.01% auto_monad : 0.000060s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000584s : 0.51% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000034s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000405s : 0.35% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000445s : 0.39% optimize.opt_b.b_1 : 0.000106s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000413s : 0.36% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
    optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01%
    optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01%
    optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
    detach_backward : 0.000002s : 0.00%
    pipeline_parallel_scheduler : 0.000001s : 0.00%
    auto_monad_reorder : 0.000015s : 0.01%
    get_jit_bprop_graph : 0.000001s : 0.00%
    rewriter_after_jit_bprop_graph : 0.000004s : 0.00%
    opt_after_jit_grad : 0.000444s : 0.39%
    validate : 0.000030s : 0.03%
    backend_pass : 0.000001s : 0.00%
    task_emit : 0.103810s : 90.83%
    execute : 0.000010s : 0.01%

Time group info:
------[substitution.] 0.000166 30
    15.17% : 0.000025s : 5: substitution.arithmetic_simplify
    1.06% : 0.000002s : 2: substitution.elim_not_effective
    0.73% : 0.000001s : 2: substitution.fold_const_symbol
    3.58% : 0.000006s : 4: substitution.graph_param_transform
    65.63% : 0.000109s : 3: substitution.inline
    1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.58% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.37% : 0.000004s : 4: substitution.replace_old_param
    7.19% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.006004 2
    90.68% : 0.005445s : 1: type_inference.infer
    9.32% : 0.000559s : 1: type_inference.specialize
------[replace.] 0.000040 5
    71.22% : 0.000028s : 3: replace.inline
    28.78% : 0.000011s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000118 5
    90.76% : 0.000107s : 3: match.inline
    9.24% : 0.000011s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000159 1131 1.06% : 0.000002s : 11: predicate.accumulaten_eliminater 0.84% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 1.01% : 0.000002s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.94% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.01% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 0.99% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.05% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.65% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.67% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.33% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.127523 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.67% : 0.003400s : 1: add_attr 2.66% : 0.003389s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000065s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.51% : 0.000644s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000454s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.76% : 0.000964s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000088s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.67% : 0.002126s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.36% : 0.000454s : 1: opt_after_jit_grad 0.14% : 0.000184s : 1: opt_b 3.11% : 0.003961s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000207s : 1: renormalize.infer 0.15% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 81.42% : 0.103833s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.76% : 0.006064s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.1118, [24] [bootstrap]: 0.00049158 [type_inference]: 0.00440315 [event_method]: 1.14e-05 [auto_monad]: 5.189e-05 [graph_reusing]: 5.32001e-06 [inline]: 2.39999e-06 [add_attr]: 0.00294775, [1] [add_attr_with_inline]: 0.00293917, [1] [Cycle 1]: 4.32e-05, [2] [tag_attr]: 1.193e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 2.142e-05 [insert-virtual-dataset]: 2.86e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.80001e-06 [optimize]: 0.00369466, [53] [py_interpret_to_execute]: 1.507e-05 [rewriter_before_opt_a]: 3.869e-05 [opt_a]: 0.00188646, [2] [Cycle 1]: 0.00128405, [45] [expand_dump_flag]: 3.36999e-06 [switch_simplify]: 2.521e-05 [loop_unroll]: 1.352e-05 [a_1]: 0.00031249 [with_stream_mark]: 1.389e-05 [recompute_prepare]: 7.63001e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.60998e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 2.14e-06 [a_2]: 7.516e-05 [accelerated_algorithm]: 6.09001e-06 [shard]: 2.20002e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 7.47002e-06 [auto_parallel]: 5.97999e-06 [parallel]: 1.875e-05 [flash_sp]: 7.45998e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 6.98998e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.80002e-06 [merge_forward]: 3.65998e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.38002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.108e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 
2.47001e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00035057 [add_forward_monad_depend]: 4.97999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.276e-05 [cse]: 2.632e-05 [a_3]: 4.027e-05 [Cycle 2]: 0.00059297, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.39998e-06 [a_1]: 0.000126 [with_stream_mark]: 9.79e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.686e-05 [accelerated_algorithm]: 5.24e-06 [shard]: 1.35001e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.35001e-06 [merge_send_recv]: 4.12e-06 [auto_parallel]: 5.11997e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.5e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.44e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.09998e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 4.85001e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.81003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.11998e-06 [merge_recompute_call_nodes]: 8.09989e-07 [before_grad]: 7.83999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 7.50006e-07 [auto_monad_eliminator]: 5.82999e-06 [cse]: 1.3e-05 [a_3]: 3.215e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 [slice_cell_reuse_recomputed_activation]: 2.06003e-06 [rewriter_after_opt_a]: 3.072e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00045008 [opt_b]: 0.00018021, [1] [Cycle 1]: 0.00017458, [7] 
[b_1]: 0.00010734 [b_2]: 6.94001e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.16003e-06 [renormalize]: 5.00004e-07 [cse]: 1.585e-05 [optimize_parallel_all_gather_comm]: 1.592e-05 [overlap_param_gather]: 2.53e-06 [cconv]: 2.412e-05 [loop_unroll]: 0.00041486 [opt_after_cconv]: 9.382e-05, [1] [Cycle 1]: 8.822e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.06e-06 [cse]: 1.621e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.326e-05 [tuple_transform]: 6.961e-05, [1] [Cycle 1]: 6.55e-05, [4] [d_1]: 3.96e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.321e-05 [cse_after_recomputation]: 1.996e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.053e-05 [environ_conv]: 4.3e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.47e-06 [label_fine_grained_interleaved_index]: 2.75002e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.73e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.69002e-06 [overlap_recompute_and_grad_model_parallel]: 4.77998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.31998e-06 [overlap_grad_ring_attention]: 4.04002e-06 [overlap_grad_flash_sp]: 1.575e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.883e-05, [1] [Cycle 1]: 6.474e-05, [6] [build]: 2.71e-06 [elim_shapecalc]: 8.86002e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 6.29999e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.552e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00048209 [validate]: 3.244e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.099409 [execute]: 9.37001e-06 Sums bootstrap : 0.000492s : 0.46% type_inference : 0.004403s : 4.08% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000438s : 0.41% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000351s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000450s : 0.42% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000003s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000415s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000482s : 0.45% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099409s : 92.14% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.65% : 0.000023s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.57% : 0.000006s : 4: substitution.graph_param_transform 65.09% : 0.000079s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.53% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004363 2 91.64% : 0.003998s : 1: type_inference.infer 8.36% : 0.000365s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.11% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.75% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.38% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.66% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 1.13% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000269 6 42.82% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.18% : 0.000154s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119737 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.47% : 0.002952s : 1: add_attr 2.46% : 0.002943s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000522s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.66% : 0.000787s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.58% : 0.001890s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.41% : 0.000492s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.09% : 0.003698s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000186s : 1: renormalize.infer 0.13% : 0.000158s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 83.04% : 0.099432s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.69% : 0.004417s : 1: type_inference 0.05% : 0.000055s : 1: validate TotalTime = 0.117708, [24] [bootstrap]: 0.00047089 [type_inference]: 0.00559832 [event_method]: 1.438e-05 [auto_monad]: 5.832e-05 [graph_reusing]: 6.44001e-06 [inline]: 1.94999e-06 [add_attr]: 0.00303186, [1] [add_attr_with_inline]: 0.00302402, [1] [Cycle 1]: 4.591e-05, [2] [tag_attr]: 1.556e-05 [meta_addattr_fg_expand]: 5.11002e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 2.607e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00405011, [53] [py_interpret_to_execute]: 2.408e-05 [rewriter_before_opt_a]: 5.994e-05 [opt_a]: 0.00218365, [2] [Cycle 1]: 0.00157428, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 3.258e-05 [loop_unroll]: 2.07e-05 [a_1]: 0.00045687 [with_stream_mark]: 1.454e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 4.32e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 7.642e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 2.02999e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 9.37001e-06 [auto_parallel]: 6.38e-06 [parallel]: 1.828e-05 [flash_sp]: 7.21999e-06 [merge_comm]: 3.72002e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.64e-06 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 7.30003e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.53997e-06 [virtual_output]: 8.55001e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.73998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.73002e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.66999e-06 
[after_resolve]: 1.068e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00045928 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.519e-05 [cse]: 2.834e-05 [a_3]: 4.157e-05 [Cycle 2]: 0.00060008, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.16999e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.00012371 [with_stream_mark]: 1.044e-05 [recompute_prepare]: 5.96e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 6.806e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.39998e-06 [merge_send_recv]: 4.55001e-06 [auto_parallel]: 5.35999e-06 [parallel]: 4.3e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.97002e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.25002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.04998e-06 [after_resolve]: 8.60001e-06 [a_after_grad]: 7.98999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.21e-06 [cse]: 1.383e-05 [a_3]: 3.22e-05 [py_interpret_to_execute_after_opt_a]: 7.9e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.171e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.57999e-06 [mutable_eliminate]: 0.00045268 [opt_b]: 0.00018687, [1] [Cycle 1]: 0.00018097, [7] [b_1]: 0.00011244 
[b_2]: 7.33e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 2.20025e-07 [cse]: 1.67e-05 [optimize_parallel_all_gather_comm]: 1.672e-05 [overlap_param_gather]: 2.12999e-06 [cconv]: 2.309e-05 [loop_unroll]: 0.00042056 [opt_after_cconv]: 9.633e-05, [1] [Cycle 1]: 9.059e-05, [7] [c_1]: 2.773e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.718e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.255e-05 [tuple_transform]: 7.007e-05, [1] [Cycle 1]: 6.563e-05, [4] [d_1]: 3.996e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.06998e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.66e-05 [cse_after_recomputation]: 2.048e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.078e-05 [environ_conv]: 4.83001e-06 [swap_dp_allreduce_reducescatter]: 5.33002e-06 [bias_add_comm_swap]: 2.78003e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.52001e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 3.08998e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.14998e-06 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01998e-06 [control_data_broadcast_order]: 1.23e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.62998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 3.89002e-06 [overlap_grad_flash_sp]: 1.681e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.43002e-06 [symbol_engine_optimizer]: 6.838e-05, [1] [Cycle 1]: 6.414e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 8.54e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 6.28998e-06 [fold_const_symbol]: 8.94998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00045069 [validate]: 3.194e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.103712 [execute]: 9.74e-06 Sums bootstrap : 0.000471s : 0.41% type_inference : 0.005598s : 4.92% event_method : 0.000014s : 0.01% auto_monad : 0.000058s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.02% optimize.rewriter_before_opt_a : 0.000060s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000581s : 0.51% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000014s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000459s : 0.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000453s : 0.40% optimize.opt_b.b_1 : 0.000112s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000421s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.40% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.103712s : 91.22% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000166 30 15.10% : 0.000025s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.35% : 0.000006s : 4: substitution.graph_param_transform 66.10% : 0.000110s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.59% : 0.000004s : 4: substitution.replace_old_param 6.77% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005556 2 89.53% : 0.004975s : 1: type_inference.infer 10.47% : 0.000582s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.17% : 0.000028s : 3: replace.inline 29.83% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.34% : 0.000108s : 3: match.inline 8.66% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.11% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.78% : 0.000009s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.48% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.85% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000374 8 43.79% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.21% : 0.000210s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.126352 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.40% : 0.003036s : 1: add_attr 2.40% : 0.003028s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000064s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.40% : 0.000503s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.75% : 0.000949s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000092s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.73% : 0.002187s : 1: opt_a 0.08% : 0.000100s : 1: opt_after_cconv 0.36% : 0.000460s : 1: opt_after_jit_grad 0.15% : 0.000191s : 1: opt_b 3.21% : 0.004054s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.19% : 0.000237s : 1: renormalize.infer 0.17% : 0.000215s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.10% : 0.103735s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.44% : 0.005612s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.147673, [24] [bootstrap]: 0.00050903 [type_inference]: 0.0117762 [event_method]: 4.994e-05 [auto_monad]: 0.00012588 [graph_reusing]: 8.63001e-06 [inline]: 2.12999e-06 [add_attr]: 0.00301743, [1] [add_attr_with_inline]: 0.00300893, [1] [Cycle 1]: 7.225e-05, [2] [tag_attr]: 3.569e-05 [meta_addattr_fg_expand]: 9.74999e-06 [parallel-infer-symbol]: 3.11999e-06 [pre_auto_parallel]: 5.044e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0136813, [53] [py_interpret_to_execute]: 4.01e-05 [rewriter_before_opt_a]: 0.00014737 [opt_a]: 0.0113214, [3] [Cycle 1]: 0.00726452, [45] [expand_dump_flag]: 4.12003e-06 [switch_simplify]: 9.646e-05 [loop_unroll]: 6.254e-05 [a_1]: 0.00146567 [with_stream_mark]: 2.324e-05 [recompute_prepare]: 2.2e-05 [updatestate_depend_eliminate]: 9.40001e-06 [updatestate_assign_eliminate]: 8.60999e-06 [updatestate_loads_eliminate]: 7.58001e-06 [parameter_eliminate]: 3.01999e-06 [a_2]: 0.00024649 [accelerated_algorithm]: 3.113e-05 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 3.57997e-06 [shard_inline]: 1.631e-05 [merge_send_recv]: 1.587e-05 [auto_parallel]: 1.075e-05 [parallel]: 1.921e-05 [flash_sp]: 1.188e-05 [merge_comm]: 9.67999e-06 [allreduce_fusion]: 9.15001e-06 [matmul_add_comm_reduction]: 2.709e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.824e-05 [virtual_dataset]: 1.6e-05 [get_grad_eliminate_]: 1.549e-05 [virtual_output]: 1.549e-05 [merge_forward]: 9.09e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.821e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.939e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.753e-05 [set_forward_comm_id_for_comm_node_pass]: 9.68002e-06 [meta_fg_expand]: 0.00142623 [flash_sp_send_recv_attached]: 3.69002e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 6.072e-05 [a_after_grad]: 0.00012318 [renormalize]: 0.00251038 [add_forward_monad_depend]: 8.92999e-06 [auto_monad_grad]: 4.94e-06 [auto_monad_eliminator]: 5.8e-05 [cse]: 0.00017453 [a_3]: 0.00034067 [Cycle 2]: 0.00311313, [45] [expand_dump_flag]: 1.40999e-06 [switch_simplify]: 4.763e-05 [loop_unroll]: 4.427e-05 [a_1]: 0.00158829 [with_stream_mark]: 1.248e-05 [recompute_prepare]: 1.15e-05 [updatestate_depend_eliminate]: 5.64998e-06 [updatestate_assign_eliminate]: 4.50001e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 0.00012921 [accelerated_algorithm]: 1.228e-05 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.99e-06 [shard_inline]: 9.59999e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 7.83999e-06 [parallel]: 4.59002e-06 [flash_sp]: 3.38999e-06 [merge_comm]: 5.29998e-06 [allreduce_fusion]: 5.64e-06 [matmul_add_comm_reduction]: 8.46002e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 1.038e-05 [virtual_dataset]: 9.13002e-06 [get_grad_eliminate_]: 8.89e-06 [virtual_output]: 8.82e-06 [merge_forward]: 4.73001e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.662e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.449e-05 [set_forward_comm_id_for_comm_node_pass]: 5.28002e-06 [meta_fg_expand]: 7.228e-05 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.10001e-06 [after_resolve]: 1.611e-05 [a_after_grad]: 1.514e-05 [renormalize]: 0.0006188 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.32e-06 [auto_monad_eliminator]: 1.519e-05 [cse]: 4.901e-05 [a_3]: 6.728e-05 [Cycle 3]: 0.00093042, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.111e-05 [loop_unroll]: 9.28002e-06 [a_1]: 0.00025633 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 9.82001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 
4.02e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 0.00012755 [accelerated_algorithm]: 1.201e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.91003e-06 [shard_inline]: 9.25001e-06 [merge_send_recv]: 7.22002e-06 [auto_parallel]: 7.50998e-06 [parallel]: 4.79e-06 [flash_sp]: 1.04e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 5.09e-06 [matmul_add_comm_reduction]: 7.95998e-06 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 1.062e-05 [virtual_dataset]: 9.09e-06 [get_grad_eliminate_]: 8.82e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.10998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.83001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.677e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.451e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24998e-06 [meta_fg_expand]: 3.37002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.449e-05 [a_after_grad]: 1.596e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.33002e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.141e-05 [cse]: 2.809e-05 [a_3]: 6.198e-05 [py_interpret_to_execute_after_opt_a]: 1.093e-05 [slice_cell_reuse_recomputed_activation]: 2.37001e-06 [rewriter_after_opt_a]: 5.011e-05 [convert_after_rewriter]: 9.31998e-06 [order_py_execute_after_rewriter]: 6.78e-06 [mutable_eliminate]: 0.00046749 [opt_b]: 0.00033472, [1] [Cycle 1]: 0.00032771, [7] [b_1]: 0.00023012 [b_2]: 1.141e-05 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 4.18001e-06 [renormalize]: 3.19997e-07 [cse]: 3.347e-05 [optimize_parallel_all_gather_comm]: 2.209e-05 [overlap_param_gather]: 2.32001e-06 [cconv]: 2.253e-05 [loop_unroll]: 0.00043407 [opt_after_cconv]: 0.00014011, [1] [Cycle 1]: 0.00013354, [7] [c_1]: 4.971e-05 [parameter_eliminate]: 2.53e-06 [updatestate_depend_eliminate]: 7.37002e-06 [updatestate_assign_eliminate]: 4.32998e-06 
[updatestate_loads_eliminate]: 4.11001e-06 [cse]: 3.098e-05 [renormalize]: 6.10016e-07 [remove_dup_value]: 2.889e-05 [tuple_transform]: 0.00010195, [1] [Cycle 1]: 9.711e-05, [4] [d_1]: 6.715e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.19995e-07 [switch_simplify]: 9.83998e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 5.856e-05 [cse_after_recomputation]: 3.384e-05, [1] [Cycle 1]: 2.885e-05, [1] [cse]: 2.306e-05 [environ_conv]: 9.03002e-06 [swap_dp_allreduce_reducescatter]: 8.41002e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 1.10999e-06 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.79e-05 [grouped_pairwise_exchange_alltoall]: 1.88002e-06 [offloading_packed_experts]: 4.81002e-06 [overlap_recompute_and_grad_model_parallel]: 5.96998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.77001e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 5.40001e-06 [overlap_grad_flash_sp]: 2.428e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 9.965e-05, [1] [Cycle 1]: 9.537e-05, [6] [build]: 1.034e-05 [elim_shapecalc]: 1.381e-05 [elim_not_effective]: 1.775e-05 [opt_reshape]: 1.019e-05 [fold_const_symbol]: 1.464e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 2.641e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00046345 [validate]: 4.682e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.117665 [execute]: 9.74999e-06 Sums bootstrap : 0.000509s : 0.36% type_inference : 0.011776s : 8.21% event_method : 0.000050s : 0.03% auto_monad : 0.000126s : 0.09% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000155s : 0.11% optimize.opt_a.loop_unroll : 0.000116s : 0.08% optimize.opt_a.a_1 : 0.003310s : 2.31% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000503s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001502s : 1.05% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.06% optimize.opt_a.a_after_grad : 0.000154s : 0.11% optimize.opt_a.renormalize : 0.003129s : 2.18% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.06% optimize.opt_a.cse : 0.000252s : 0.18% optimize.opt_a.a_3 : 0.000470s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000467s : 0.33% optimize.opt_b.b_1 : 0.000230s : 0.16% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000434s : 0.30% optimize.opt_after_cconv.c_1 : 0.000050s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000463s : 0.32% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.117665s : 82.07% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000825 222 5.63% : 0.000046s : 12: substitution.arithmetic_simplify 1.73% : 0.000014s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.44% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 53.18% : 0.000439s : 17: substitution.inline 6.86% : 0.000057s : 2: substitution.inline_without_move 1.28% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000016s : 3: substitution.less_batch_normalization 1.64% : 0.000014s : 11: substitution.minmaximum_grad 0.64% : 0.000005s : 5: substitution.partial_eliminate 1.71% : 0.000014s : 20: substitution.remove_not_recompute_node 3.03% : 0.000025s : 10: substitution.replace_applicator 1.29% : 0.000011s : 15: substitution.replace_old_param 0.28% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.42% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.72% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.20% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.22% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.20% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011698 2 85.94% : 0.010053s : 1: type_inference.infer 14.06% : 0.001644s : 1: type_inference.specialize ------[replace.] 0.000227 33 57.15% : 0.000130s : 17: replace.inline 42.85% : 0.000097s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000465 33 92.37% : 0.000430s : 17: match.inline 7.63% : 0.000036s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000761 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.70% : 0.000043s : 249: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000009s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.04% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.04% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.20% : 0.000002s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001693 34 56.67% : 0.000960s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.33% : 0.000734s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.172900 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.75% : 0.003021s : 1: add_attr 1.74% : 0.003013s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000133s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.31% : 0.000544s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000056s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.26% : 0.000443s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000477s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.93% : 0.005069s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000181s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.55% : 0.011325s : 1: opt_a 0.08% : 0.000144s : 1: opt_after_cconv 0.27% : 0.000474s : 1: opt_after_jit_grad 0.20% : 0.000339s : 1: opt_b 7.92% : 0.013685s : 1: optimize 0.01% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000045s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.96% : 0.001661s : 2: renormalize.infer 0.84% : 0.001455s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000054s : 1: rewriter_after_opt_a 0.09% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000102s : 1: symbol_engine_optimizer 68.07% : 0.117688s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.82% : 0.011791s : 1: type_inference 0.04% : 0.000072s : 1: validate TotalTime = 0.108106, [24] [bootstrap]: 0.00045185 [type_inference]: 0.00435803 [event_method]: 1.117e-05 [auto_monad]: 5.295e-05 [graph_reusing]: 5.57001e-06 [inline]: 1.80001e-06 [add_attr]: 0.00301554, [1] [add_attr_with_inline]: 0.00300719, [1] [Cycle 1]: 4.311e-05, [2] [tag_attr]: 1.187e-05 [meta_addattr_fg_expand]: 3.73001e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.179e-05 [insert-virtual-dataset]: 2.92002e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00372472, [53] [py_interpret_to_execute]: 1.57e-05 [rewriter_before_opt_a]: 3.933e-05 [opt_a]: 0.00190494, [2] [Cycle 1]: 0.00130166, [45] [expand_dump_flag]: 3.15002e-06 [switch_simplify]: 2.481e-05 [loop_unroll]: 1.448e-05 [a_1]: 0.00029539 [with_stream_mark]: 1.371e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 4.25e-06 [updatestate_assign_eliminate]: 3.54002e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.834e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.42001e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.84e-05 [flash_sp]: 7.45998e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.50998e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 6.33002e-06 [merge_forward]: 3.57002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.048e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.131e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.85002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 1.112e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00035064 [add_forward_monad_depend]: 4.16001e-06 [auto_monad_grad]: 1.66002e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.826e-05 [a_3]: 6.467e-05 [Cycle 2]: 0.00059392, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012531 [with_stream_mark]: 1.157e-05 [recompute_prepare]: 5.84999e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.827e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.59998e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.27e-06 [flash_sp]: 3.20002e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.11002e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.96001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 8.02e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.03002e-06 [cse]: 1.374e-05 [a_3]: 3.191e-05 [py_interpret_to_execute_after_opt_a]: 7.15e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 3.167e-05 [convert_after_rewriter]: 7.06999e-06 [order_py_execute_after_rewriter]: 6.03002e-06 [mutable_eliminate]: 0.00044809 [opt_b]: 0.00018157, [1] [Cycle 1]: 0.00017551, [7] [b_1]: 
0.00010789 [b_2]: 6.89999e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.14e-06 [renormalize]: 4.40021e-07 [cse]: 1.693e-05 [optimize_parallel_all_gather_comm]: 1.644e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.321e-05 [loop_unroll]: 0.00041602 [opt_after_cconv]: 9.433e-05, [1] [Cycle 1]: 8.869e-05, [7] [c_1]: 2.733e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.639e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.203e-05 [tuple_transform]: 7.038e-05, [1] [Cycle 1]: 6.575e-05, [4] [d_1]: 3.989e-05 [none_parameter_eliminate]: 1.61998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 2.09e-06 [add_recomputation]: 4.566e-05 [cse_after_recomputation]: 2.076e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.116e-05 [environ_conv]: 5.16002e-06 [swap_dp_allreduce_reducescatter]: 5.24998e-06 [bias_add_comm_swap]: 2.22999e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 3.36001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.204e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.88999e-06 [overlap_recompute_and_grad_model_parallel]: 4.56002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.23998e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.69e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.931e-05, [1] [Cycle 1]: 6.522e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.87999e-06 [elim_not_effective]: 1.162e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.66002e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.614e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044536 [validate]: 3.306e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0957337 [execute]: 9.09e-06 Sums bootstrap : 0.000452s : 0.43% type_inference : 0.004358s : 4.19% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000421s : 0.40% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000351s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000097s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.43% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000416s : 0.40% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000445s : 0.43% validate : 0.000033s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.095734s : 91.94% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 17.89% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.18% : 0.000001s : 2: substitution.fold_const_symbol 4.49% : 0.000005s : 4: substitution.graph_param_transform 65.77% : 0.000080s : 2: substitution.inline 2.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 3.04% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004318 2 91.44% : 0.003948s : 1: type_inference.infer 8.56% : 0.000370s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000159 984 0.69% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.63% : 0.000001s : 9: predicate.addn_zero_filter 0.59% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.08% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.70% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.76% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.70% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.94% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.89% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.90% : 0.000001s : 13: predicate.environ_get_depend_swap 1.68% : 0.000003s : 21: predicate.environ_get_eliminate 0.91% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.85% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.61% : 0.000003s : 11: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.12% : 0.000008s : 44: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.43% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.88% : 0.000003s : 26: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.57% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.51% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.59% : 0.000001s : 9: predicate.minmaximum_grad 1.06% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.07% : 0.000002s : 11: predicate.partial_defer_inline 1.06% : 0.000002s : 13: predicate.partial_eliminate 0.68% : 0.000001s : 9: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 0.87% : 0.000001s : 9: predicate.reduce_eliminate 1.82% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.05% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.67% : 0.000001s : 9: predicate.reshape_eliminate 14.97% : 0.000024s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000002s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.82% : 0.000001s : 11: predicate.switch_defer_inline 1.47% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.92% : 0.000006s : 41: predicate.switch_simplify 0.63% : 
0.000001s : 9: predicate.tile_eliminate 0.72% : 0.000001s : 9: predicate.transpose_eliminate 1.27% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.15% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.19% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.26% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.77% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.52% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.94% : 0.000002s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 41.72% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.28% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.116153 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.60% : 0.003020s : 1: add_attr 2.59% : 0.003011s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000484s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000802s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.64% : 0.001908s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.39% : 0.000454s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.21% : 0.003729s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000191s : 1: renormalize.infer 0.13% : 0.000154s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 82.44% : 0.095756s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 3.76% : 0.004372s : 1: type_inference 0.05% : 0.000056s : 1: validate TotalTime = 0.1445, [24] [bootstrap]: 0.00052513 [type_inference]: 0.0104402 [event_method]: 4.65e-05 [auto_monad]: 0.00011945 [graph_reusing]: 7.66999e-06 [inline]: 1.98002e-06 [add_attr]: 0.00303949, [1] [add_attr_with_inline]: 0.00303132, [1] [Cycle 1]: 6.888e-05, [2] [tag_attr]: 3.266e-05 [meta_addattr_fg_expand]: 8.69003e-06 [parallel-infer-symbol]: 2.69001e-06 [pre_auto_parallel]: 4.663e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0132552, [53] [py_interpret_to_execute]: 3.795e-05 [rewriter_before_opt_a]: 0.00012989 [opt_a]: 0.0109927, [3] [Cycle 1]: 0.00706737, [45] [expand_dump_flag]: 3.48e-06 [switch_simplify]: 6.723e-05 [loop_unroll]: 5.477e-05 [a_1]: 0.00135652 [with_stream_mark]: 2.417e-05 [recompute_prepare]: 2.243e-05 [updatestate_depend_eliminate]: 9.32001e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.42002e-06 [parameter_eliminate]: 2.70002e-06 [a_2]: 0.00024867 [accelerated_algorithm]: 3.126e-05 [shard]: 2.63e-06 [meta_shard_fg_expand]: 3.23e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.529e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.919e-05 [flash_sp]: 1.149e-05 [merge_comm]: 9.69e-06 [allreduce_fusion]: 8.80001e-06 [matmul_add_comm_reduction]: 2.626e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 1.83e-05 [virtual_dataset]: 1.655e-05 [get_grad_eliminate_]: 1.581e-05 [virtual_output]: 1.605e-05 [merge_forward]: 9.63002e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.769e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.859e-05 [merge_recompute_call_nodes]: 1.81e-06 [before_grad]: 2.76e-05 [set_forward_comm_id_for_comm_node_pass]: 9.72001e-06 [meta_fg_expand]: 0.00140241 [flash_sp_send_recv_attached]: 3.9e-06 [receive_attached]: 3.11001e-06 [after_resolve]: 
6.038e-05 [a_after_grad]: 8.16e-05 [renormalize]: 0.00253482 [add_forward_monad_depend]: 9.42999e-06 [auto_monad_grad]: 5.92999e-06 [auto_monad_eliminator]: 5.657e-05 [cse]: 0.00017106 [a_3]: 0.00033839 [Cycle 2]: 0.00300612, [45] [expand_dump_flag]: 1.41002e-06 [switch_simplify]: 4.662e-05 [loop_unroll]: 4.367e-05 [a_1]: 0.00153652 [with_stream_mark]: 1.181e-05 [recompute_prepare]: 1.099e-05 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.51001e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012598 [accelerated_algorithm]: 1.198e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 9.09998e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 7.36999e-06 [parallel]: 4.85001e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 5.11997e-06 [allreduce_fusion]: 4.68999e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 9.20001e-06 [get_grad_eliminate_]: 8.84e-06 [virtual_output]: 8.87e-06 [merge_forward]: 4.87e-06 [cell_reuse_recompute_pass]: 9.10019e-07 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.676e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.44e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39998e-06 [meta_fg_expand]: 3.393e-05 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.525e-05 [a_after_grad]: 1.495e-05 [renormalize]: 0.0006226 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 1.33002e-06 [auto_monad_eliminator]: 1.456e-05 [cse]: 4.692e-05 [a_3]: 6.637e-05 [Cycle 3]: 0.0009049, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 1.056e-05 [loop_unroll]: 9.02e-06 [a_1]: 0.0002507 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 9.71998e-06 [updatestate_depend_eliminate]: 4.65001e-06 [updatestate_assign_eliminate]: 3.94002e-06 [updatestate_loads_eliminate]: 3.9e-06 [parameter_eliminate]: 
1.17e-06 [a_2]: 0.00012397 [accelerated_algorithm]: 1.155e-05 [shard]: 9.49978e-07 [meta_shard_fg_expand]: 1.86998e-06 [shard_inline]: 9.31998e-06 [merge_send_recv]: 7e-06 [auto_parallel]: 7.18e-06 [parallel]: 4.77e-06 [flash_sp]: 1.06997e-06 [merge_comm]: 4.85999e-06 [allreduce_fusion]: 5.06002e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.94998e-06 [get_grad_eliminate_]: 8.55001e-06 [virtual_output]: 8.26002e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 8.44998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.553e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.415e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 2.97002e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.345e-05 [a_after_grad]: 1.435e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.23002e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 1.105e-05 [cse]: 2.774e-05 [a_3]: 6.007e-05 [py_interpret_to_execute_after_opt_a]: 1.066e-05 [slice_cell_reuse_recomputed_activation]: 1.96003e-06 [rewriter_after_opt_a]: 4.829e-05 [convert_after_rewriter]: 1.003e-05 [order_py_execute_after_rewriter]: 6.94999e-06 [mutable_eliminate]: 0.00045791 [opt_b]: 0.00029025, [1] [Cycle 1]: 0.0002841, [7] [b_1]: 0.00018955 [b_2]: 1.089e-05 [updatestate_depend_eliminate]: 7.36999e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 4.1e-06 [renormalize]: 4.09986e-07 [cse]: 3.288e-05 [optimize_parallel_all_gather_comm]: 2.062e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.068e-05 [loop_unroll]: 0.00042612 [opt_after_cconv]: 0.00013742, [1] [Cycle 1]: 0.00013179, [7] [c_1]: 4.849e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 7.03998e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 4.10998e-06 [cse]: 
3.152e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.937e-05 [tuple_transform]: 0.00010251, [1] [Cycle 1]: 9.762e-05, [4] [d_1]: 6.683e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.006e-05 [partial_unused_args_eliminate]: 2.05002e-06 [add_recomputation]: 5.892e-05 [cse_after_recomputation]: 3.246e-05, [1] [Cycle 1]: 2.78e-05, [1] [cse]: 2.227e-05 [environ_conv]: 9.20999e-06 [swap_dp_allreduce_reducescatter]: 8.47e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 3.96001e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.56002e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.34998e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.752e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 5.38002e-06 [overlap_recompute_and_grad_model_parallel]: 5.89999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.89999e-06 [overlap_grad_ring_attention]: 5.09e-06 [overlap_grad_flash_sp]: 2.418e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.99999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 9.907e-05, [1] [Cycle 1]: 9.486e-05, [6] [build]: 1.008e-05 [elim_shapecalc]: 1.308e-05 [elim_not_effective]: 1.784e-05 [opt_reshape]: 1.077e-05 [fold_const_symbol]: 1.437e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.91e-06 
[pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 2.575e-05 [get_jit_bprop_graph]: 1.18001e-06 [rewriter_after_jit_bprop_graph]: 3.71999e-06 [opt_after_jit_grad]: 0.00048322 [validate]: 4.73e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.116215 [execute]: 9.86e-06 Sums bootstrap : 0.000525s : 0.37% type_inference : 0.010440s : 7.45% event_method : 0.000047s : 0.03% auto_monad : 0.000119s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000130s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000124s : 0.09% optimize.opt_a.loop_unroll : 0.000107s : 0.08% optimize.opt_a.a_1 : 0.003144s : 2.24% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.36% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000035s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001439s : 1.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000111s : 0.08% optimize.opt_a.renormalize : 0.003158s : 2.25% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.06% optimize.opt_a.cse : 0.000246s : 0.18% optimize.opt_a.a_3 : 0.000465s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000458s : 0.33% optimize.opt_b.b_1 : 0.000190s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000426s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000483s : 0.34% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.116215s : 82.90% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000737 218 5.93% : 0.000044s : 11: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.66% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 54.62% : 0.000403s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.83% : 0.000014s : 20: substitution.remove_not_recompute_node 3.25% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.80% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.45% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.49% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010371 2 86.75% : 0.008996s : 1: type_inference.infer 13.25% : 0.001374s : 1: type_inference.specialize ------[replace.] 0.000203 30 58.41% : 0.000119s : 16: replace.inline 41.59% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 92.79% : 0.000394s : 16: match.inline 7.21% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000740 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.22% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 1.94% : 0.000014s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.10% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000010s : 67: predicate.reduce_eliminate 2.63% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.84% : 0.000036s : 265: 
predicate.switch_simplify 1.10% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.53% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.61% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001613 32 55.10% : 0.000889s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.90% : 0.000724s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.169083 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.80% : 0.003044s : 1: add_attr 1.80% : 0.003035s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000127s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000556s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000054s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.26% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000468s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.84% : 0.004807s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.50% : 0.010996s : 1: opt_a 0.08% : 0.000141s : 1: opt_after_cconv 0.29% : 0.000493s : 1: opt_after_jit_grad 0.17% : 0.000294s : 1: opt_b 7.84% : 0.013259s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.96% : 0.001628s : 2: renormalize.infer 0.90% : 0.001516s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.08% : 0.000134s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000102s : 1: symbol_engine_optimizer 68.75% : 0.116239s : 1: task_emit 0.06% : 0.000106s : 1: 
tuple_transform 6.18% : 0.010456s : 1: type_inference 0.04% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x7-ge],max_mem:32.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x8-pynative],max_mem:32.0M TotalTime = 0.0218679, [24] [bootstrap]: 0.0005407 [type_inference]: 0.00638905 [event_method]: 1.513e-05 [auto_monad]: 6.14e-05 [graph_reusing]: 5.37999e-06 [inline]: 1.87999e-06 [add_attr]: 0.00334045, [1] [add_attr_with_inline]: 0.00332976, [1] [Cycle 1]: 4.505e-05, [2] [tag_attr]: 1.549e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 2.85002e-06 [pre_auto_parallel]: 2.86e-05 [insert-virtual-dataset]: 2.78e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00401339, [53] [py_interpret_to_execute]: 1.998e-05 [rewriter_before_opt_a]: 5.851e-05 [opt_a]: 0.00218164, [2] [Cycle 1]: 0.00157952, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 3.326e-05 [loop_unroll]: 2.122e-05 [a_1]: 0.00050426 [with_stream_mark]: 1.361e-05 [recompute_prepare]: 8.15e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 7.708e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.52001e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 8.00999e-06 [auto_parallel]: 6.16998e-06 [parallel]: 2.434e-05 [flash_sp]: 7.11999e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 8.70999e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 6.11e-06 
[get_grad_eliminate_]: 5.56998e-06 [virtual_output]: 5.74999e-06 [merge_forward]: 4.19997e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 9.99999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.52002e-06 [meta_fg_expand]: 2.66e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.34001e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00041859 [add_forward_monad_depend]: 4.81997e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.426e-05 [cse]: 2.822e-05 [a_3]: 4.123e-05 [Cycle 2]: 0.00059285, [45] [expand_dump_flag]: 8.40024e-07 [switch_simplify]: 6.87002e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012602 [with_stream_mark]: 9.84999e-06 [recompute_prepare]: 5.57999e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.716e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.14003e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.17e-06 [flash_sp]: 3.55e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.32999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.81003e-06 [virtual_dataset]: 5.21002e-06 [get_grad_eliminate_]: 5.01002e-06 [virtual_output]: 5.21002e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 5.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 9.11002e-06 [a_after_grad]: 8.70999e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.20002e-06 [cse]: 1.341e-05 [a_3]: 3.152e-05 [py_interpret_to_execute_after_opt_a]: 7.13998e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 2.834e-05 [convert_after_rewriter]: 6.66999e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00044468 [opt_b]: 0.00018271, [1] [Cycle 1]: 0.00017691, [7] [b_1]: 0.00011004 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.4002e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.369e-05 [loop_unroll]: 0.00040692 [opt_after_cconv]: 9.605e-05, [1] [Cycle 1]: 9.043e-05, [7] [c_1]: 2.837e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [cse]: 1.554e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.356e-05 [tuple_transform]: 6.876e-05, [1] [Cycle 1]: 6.431e-05, [4] [d_1]: 3.89e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 5.089e-05 [cse_after_recomputation]: 2.027e-05, [1] [Cycle 1]: 1.598e-05, [1] [cse]: 1.098e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 5.24998e-06 [bias_add_comm_swap]: 2.82002e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.33998e-06 [micro_interleaved_order_control]: 2.64999e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.05001e-06 
[add_comm_op_reuse_tag]: 1.33002e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.73002e-06 [control_data_broadcast_order]: 1.154e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.71e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.763e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 6.832e-05, [1] [Cycle 1]: 6.432e-05, [6] [build]: 2.53e-06 [elim_shapecalc]: 8.70999e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.81002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.602e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 0.00011338 [opt_after_jit_grad]: 0.00047283 [validate]: 3.216e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.00660979 [execute]: 6.98998e-06 Sums bootstrap : 0.000541s : 3.08% type_inference : 0.006389s : 36.38% event_method : 0.000015s : 0.09% auto_monad : 0.000061s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.33% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000630s : 3.59% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000419s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000042s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.53% optimize.opt_b.b_1 : 0.000110s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.13% optimize.loop_unroll : 0.000407s : 2.32% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000113s : 0.65% opt_after_jit_grad : 0.000473s : 2.69% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006610s : 37.64% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000208 30 12.02% : 0.000025s : 5: substitution.arithmetic_simplify 0.85% : 0.000002s : 2: substitution.elim_not_effective 0.61% : 0.000001s : 2: substitution.fold_const_symbol 2.56% : 0.000005s : 4: substitution.graph_param_transform 73.07% : 0.000152s : 3: substitution.inline 1.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.15% : 0.000004s : 4: substitution.remove_not_recompute_node 1.74% : 0.000004s : 4: substitution.replace_old_param 5.55% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006342 2 90.03% : 0.005709s : 1: type_inference.infer 9.97% : 0.000632s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.26% : 0.000028s : 3: replace.inline 28.74% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000160 5 93.42% : 0.000150s : 3: match.inline 6.58% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.57% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 1.01% : 0.000002s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.52% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.82% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.83% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000391 8 47.57% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.43% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030790 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.86% : 0.003344s : 1: add_attr 10.83% : 0.003334s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000066s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.88% : 0.000578s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000415s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.24% : 0.000999s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002185s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.57% : 0.000482s : 1: opt_after_jit_grad 0.60% : 0.000186s : 1: opt_b 13.05% : 0.004017s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.69% : 0.000213s : 1: renormalize.infer 0.65% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.39% : 0.000119s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.50% : 0.006620s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.79% : 0.006402s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0178916, [24] [bootstrap]: 0.00043779 [type_inference]: 0.0042825 [event_method]: 1.083e-05 [auto_monad]: 5.097e-05 [graph_reusing]: 4.76002e-06 [inline]: 1.62001e-06 [add_attr]: 0.00293737, [1] [add_attr_with_inline]: 0.00292941, [1] [Cycle 1]: 4.052e-05, [2] [tag_attr]: 1.137e-05 [meta_addattr_fg_expand]: 3.55e-06 [parallel-infer-symbol]: 2.24001e-06 [pre_auto_parallel]: 1.997e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00365245, [53] [py_interpret_to_execute]: 1.381e-05 [rewriter_before_opt_a]: 3.692e-05 [opt_a]: 0.00185605, [2] [Cycle 1]: 0.00125476, [45] [expand_dump_flag]: 2.22001e-06 [switch_simplify]: 2.363e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.00029325 [with_stream_mark]: 1.269e-05 [recompute_prepare]: 7.39002e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 2.89001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 7.728e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 8.33999e-06 [auto_parallel]: 5.78002e-06 [parallel]: 1.715e-05 [flash_sp]: 7.38e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.16001e-06 [matmul_add_comm_reduction]: 8.85001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.11001e-06 [virtual_dataset]: 6.03002e-06 [get_grad_eliminate_]: 5.59998e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 8.38999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.09e-05 [merge_recompute_call_nodes]: 1.15999e-06 [before_grad]: 9.81e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.19999e-06 [flash_sp_send_recv_attached]: 2.14999e-06 
[receive_attached]: 2.44999e-06 [after_resolve]: 1.066e-05 [a_after_grad]: 8.93002e-06 [renormalize]: 0.00034318 [add_forward_monad_depend]: 4.55001e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.394e-05 [cse]: 2.565e-05 [a_3]: 4.15e-05 [Cycle 2]: 0.00059229, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00012531 [with_stream_mark]: 9.37999e-06 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.23002e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 7.60017e-07 [a_2]: 6.749e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.17998e-06 [auto_parallel]: 5.37999e-06 [parallel]: 4.20999e-06 [flash_sp]: 2.58e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.27999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.66998e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 9.60019e-07 [after_resolve]: 9.00999e-06 [a_after_grad]: 7.96001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.332e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.17002e-06 [slice_cell_reuse_recomputed_activation]: 2.08998e-06 [rewriter_after_opt_a]: 3.116e-05 [convert_after_rewriter]: 6.36998e-06 [order_py_execute_after_rewriter]: 5.37001e-06 [mutable_eliminate]: 0.00044831 [opt_b]: 0.00018236, 
[1] [Cycle 1]: 0.00017626, [7] [b_1]: 0.00010731 [b_2]: 7.75998e-06 [updatestate_depend_eliminate]: 5.43002e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.30002e-06 [renormalize]: 4.10015e-07 [cse]: 1.636e-05 [optimize_parallel_all_gather_comm]: 1.505e-05 [overlap_param_gather]: 2.32999e-06 [cconv]: 2.188e-05 [loop_unroll]: 0.0004121 [opt_after_cconv]: 9.505e-05, [1] [Cycle 1]: 8.922e-05, [7] [c_1]: 2.772e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.637e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.148e-05 [tuple_transform]: 6.886e-05, [1] [Cycle 1]: 6.481e-05, [4] [d_1]: 3.909e-05 [none_parameter_eliminate]: 1.37e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.83002e-06 [add_recomputation]: 4.238e-05 [cse_after_recomputation]: 2.052e-05, [1] [Cycle 1]: 1.536e-05, [1] [cse]: 1.026e-05 [environ_conv]: 4.02e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.00001e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.07001e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.16002e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 7.80012e-07 [add_comm_op_reuse_tag]: 7.90023e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.20001e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.48002e-06 [control_data_broadcast_order]: 1.093e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.22999e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.718e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 1.79998e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 6.774e-05, [1] [Cycle 1]: 6.343e-05, [6] [build]: 1.80001e-06 [elim_shapecalc]: 8.29002e-06 [elim_not_effective]: 1.118e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.88002e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.45001e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.455e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00048393 [validate]: 2.9e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00575555 [execute]: 6.74001e-06 Sums bootstrap : 0.000438s : 3.13% type_inference : 0.004283s : 30.59% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000020s : 0.14% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000419s : 2.99% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% 
optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000343s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% 
optimize.opt_a.a_3 : 0.000073s : 0.52% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000006s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.20% optimize.opt_b.b_1 : 0.000107s : 0.77% optimize.opt_b.b_2 : 0.000008s : 0.06% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000412s : 2.94% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% 
optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000484s : 3.46% validate : 0.000029s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005756s : 41.11% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000117 26 18.16% : 0.000021s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.13% : 0.000001s : 2: substitution.fold_const_symbol 4.25% : 0.000005s : 4: substitution.graph_param_transform 65.56% : 0.000077s : 2: substitution.inline 2.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004242 2 91.87% : 0.003897s : 1: type_inference.infer 8.13% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000140 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.86% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 0.95% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.52% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.27% : 0.000003s : 26: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.98% : 0.000001s : 9: predicate.print_const_string_wrapper 0.82% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.25% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.92% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.67% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.24% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.13% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000254 6 44.08% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.92% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025757 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.002942s : 1: add_attr 11.39% : 0.002933s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.82% : 0.000468s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000003s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.78% : 0.000458s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.00% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.22% : 0.001859s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.92% : 0.000493s : 1: opt_after_jit_grad 0.72% : 0.000186s : 1: opt_b 14.19% : 0.003656s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000024s : 1: pre_auto_parallel 0.07% : 0.000017s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000186s : 1: renormalize.infer 0.58% : 0.000151s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.39% : 0.005766s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.68% : 0.004296s : 1: type_inference 0.21% : 0.000054s : 1: validate TotalTime = 0.0194841, [24] [bootstrap]: 0.00044602 [type_inference]: 0.00555858 [event_method]: 1.387e-05 [auto_monad]: 5.484e-05 [graph_reusing]: 5.71998e-06 [inline]: 1.79e-06 [add_attr]: 0.00288699, [1] [add_attr_with_inline]: 0.00287941, [1] [Cycle 1]: 4.374e-05, [2] [tag_attr]: 1.497e-05 [meta_addattr_fg_expand]: 4.25e-06 [parallel-infer-symbol]: 2.58998e-06 [pre_auto_parallel]: 2.441e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00392221, [53] [py_interpret_to_execute]: 1.893e-05 [rewriter_before_opt_a]: 5.633e-05 [opt_a]: 0.00212096, [2] [Cycle 1]: 0.0015212, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 3.083e-05 [loop_unroll]: 2.035e-05 [a_1]: 0.00044645 [with_stream_mark]: 1.277e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.56001e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 0.00011833 [accelerated_algorithm]: 6.71e-06 [shard]: 1.79e-06 [meta_shard_fg_expand]: 2.01e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 7.83001e-06 [auto_parallel]: 5.89999e-06 [parallel]: 1.733e-05 [flash_sp]: 6.84999e-06 [merge_comm]: 3.40003e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 7.87998e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 7.02002e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.70998e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50998e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.28998e-06 [receive_attached]: 
2.52001e-06 [after_resolve]: 1.08e-05 [a_after_grad]: 8.79998e-06 [renormalize]: 0.00039953 [add_forward_monad_depend]: 4.38999e-06 [auto_monad_grad]: 2.36e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.556e-05 [a_3]: 4.054e-05 [Cycle 2]: 0.00059019, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.00012559 [with_stream_mark]: 9.62001e-06 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.695e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.22999e-06 [parallel]: 4.22998e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 4.94998e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 5.93002e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.84998e-06 [merge_forward]: 2.75997e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.00001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.87998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28998e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 9.47001e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.42001e-06 [cse]: 1.603e-05 [a_3]: 3.159e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 3.039e-05 [convert_after_rewriter]: 6.56e-06 [order_py_execute_after_rewriter]: 5.39998e-06 [mutable_eliminate]: 0.00044034 [opt_b]: 0.00017896, [1] 
[Cycle 1]: 0.00017323, [7] [b_1]: 0.00010749 [b_2]: 6.31998e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 3.80009e-07 [cse]: 1.598e-05 [optimize_parallel_all_gather_comm]: 1.581e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.215e-05 [loop_unroll]: 0.00040712 [opt_after_cconv]: 9.484e-05, [1] [Cycle 1]: 8.94e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.13002e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.581e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.828e-05, [1] [Cycle 1]: 6.418e-05, [4] [d_1]: 3.852e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 4.209e-05 [cse_after_recomputation]: 2.025e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.044e-05 [environ_conv]: 4.29002e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.11e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 1.82001e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.36998e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 7.80012e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.19998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.14e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 3.70998e-06 [overlap_grad_flash_sp]: 1.633e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 2.09999e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.57e-05, [1] [Cycle 1]: 6.166e-05, [6] [build]: 2.30002e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.087e-05 [opt_reshape]: 5.84999e-06 [fold_const_symbol]: 8.57998e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.48e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00044038 [validate]: 3.028e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.00586837 [execute]: 6.84001e-06 Sums bootstrap : 0.000446s : 2.85% type_inference : 0.005559s : 35.52% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000056s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000572s : 3.66% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000185s : 1.18% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000400s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000440s : 2.81% optimize.opt_b.b_1 : 0.000107s : 0.69% optimize.opt_b.b_2 : 0.000006s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000407s : 2.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000440s : 2.81% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005868s : 37.50% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 15.17% : 0.000025s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.25% : 0.000005s : 4: substitution.graph_param_transform 66.12% : 0.000108s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.53% : 0.000004s : 4: substitution.remove_not_recompute_node 2.66% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005518 2 89.12% : 0.004918s : 1: type_inference.infer 10.88% : 0.000600s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.09% : 0.000026s : 3: replace.inline 30.91% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.62% : 0.000106s : 3: match.inline 8.38% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000010s : 51: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.83% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.90% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.40% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.38% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 45.92% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.08% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027828 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.39% : 0.002891s : 1: add_attr 10.36% : 0.002883s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.73% : 0.000481s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000416s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000449s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.51% : 0.000977s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.63% : 0.002124s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.62% : 0.000450s : 1: opt_after_jit_grad 0.66% : 0.000182s : 1: opt_b 14.11% : 0.003926s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000204s : 1: renormalize.infer 0.68% : 0.000189s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000068s : 1: symbol_engine_optimizer 21.12% : 0.005878s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 20.02% : 0.005572s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0367246, [24] [bootstrap]: 0.00043446 [type_inference]: 0.0112037 [event_method]: 4.738e-05 [auto_monad]: 0.00011713 [graph_reusing]: 6.96001e-06 [inline]: 1.54e-06 [add_attr]: 0.00292531, [1] [add_attr_with_inline]: 0.00291732, [1] [Cycle 1]: 6.693e-05, [2] [tag_attr]: 3.37e-05 [meta_addattr_fg_expand]: 8.58001e-06 [parallel-infer-symbol]: 2.22999e-06 [pre_auto_parallel]: 4.782e-05 [insert-virtual-dataset]: 1.84e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.24998e-06 [optimize]: 0.0131984, [53] [py_interpret_to_execute]: 3.671e-05 [rewriter_before_opt_a]: 0.00014116 [opt_a]: 0.0109458, [3] [Cycle 1]: 0.00701343, [45] [expand_dump_flag]: 4.17e-06 [switch_simplify]: 7.152e-05 [loop_unroll]: 6.125e-05 [a_1]: 0.0014372 [with_stream_mark]: 2.194e-05 [recompute_prepare]: 2.249e-05 [updatestate_depend_eliminate]: 9.07999e-06 [updatestate_assign_eliminate]: 7.33999e-06 [updatestate_loads_eliminate]: 6.63e-06 [parameter_eliminate]: 2.26998e-06 [a_2]: 0.00024225 [accelerated_algorithm]: 3.126e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.35e-06 [shard_inline]: 1.611e-05 [merge_send_recv]: 1.437e-05 [auto_parallel]: 1.11e-05 [parallel]: 1.536e-05 [flash_sp]: 9.67999e-06 [merge_comm]: 9.79e-06 [allreduce_fusion]: 8.74998e-06 [matmul_add_comm_reduction]: 2.425e-05 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 1.792e-05 [virtual_dataset]: 1.579e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.534e-05 [merge_forward]: 9.10001e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 1.633e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.832e-05 [merge_recompute_call_nodes]: 1.21997e-06 [before_grad]: 2.817e-05 [set_forward_comm_id_for_comm_node_pass]: 9.50001e-06 [meta_fg_expand]: 0.00143839 [flash_sp_send_recv_attached]: 2.90002e-06 [receive_attached]: 2.31e-06 
[after_resolve]: 5.975e-05 [a_after_grad]: 8.197e-05 [renormalize]: 0.00239931 [add_forward_monad_depend]: 8.66002e-06 [auto_monad_grad]: 5.51002e-06 [auto_monad_eliminator]: 5.444e-05 [cse]: 0.00015575 [a_3]: 0.00033677 [Cycle 2]: 0.00302001, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 8.462e-05 [loop_unroll]: 4.395e-05 [a_1]: 0.00152008 [with_stream_mark]: 1.187e-05 [recompute_prepare]: 1.102e-05 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.19e-06 [a_2]: 0.00012624 [accelerated_algorithm]: 1.227e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.38002e-06 [merge_send_recv]: 6.78e-06 [auto_parallel]: 7.27002e-06 [parallel]: 4.62e-06 [flash_sp]: 2.80997e-06 [merge_comm]: 5.13002e-06 [allreduce_fusion]: 4.60999e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.005e-05 [virtual_dataset]: 9.07999e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.50999e-06 [merge_forward]: 4.39998e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.675e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.39e-05 [set_forward_comm_id_for_comm_node_pass]: 5.15999e-06 [meta_fg_expand]: 6.895e-05 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.608e-05 [a_after_grad]: 1.439e-05 [renormalize]: 0.00058274 [add_forward_monad_depend]: 3.85998e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.426e-05 [cse]: 4.594e-05 [a_3]: 6.513e-05 [Cycle 3]: 0.00089882, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.075e-05 [loop_unroll]: 8.96002e-06 [a_1]: 0.0002496 [with_stream_mark]: 9.94999e-06 [recompute_prepare]: 9.44e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.93001e-06 
[parameter_eliminate]: 8.49977e-07 [a_2]: 0.00012359 [accelerated_algorithm]: 1.178e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.81998e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 6.84999e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.55999e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.92e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 7.69002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.84999e-06 [virtual_dataset]: 8.68001e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.21002e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.603e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.419e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29998e-06 [meta_fg_expand]: 2.89999e-06 [flash_sp_send_recv_attached]: 6.69999e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 1.315e-05 [a_after_grad]: 1.436e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.048e-05 [cse]: 2.719e-05 [a_3]: 5.946e-05 [py_interpret_to_execute_after_opt_a]: 1.077e-05 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 4.593e-05 [convert_after_rewriter]: 8.45999e-06 [order_py_execute_after_rewriter]: 6.85002e-06 [mutable_eliminate]: 0.0004785 [opt_b]: 0.00029035, [1] [Cycle 1]: 0.00028442, [7] [b_1]: 0.00019197 [b_2]: 1.099e-05 [updatestate_depend_eliminate]: 7.03e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 4.16001e-06 [renormalize]: 4.09986e-07 [cse]: 3.106e-05 [optimize_parallel_all_gather_comm]: 1.931e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 1.883e-05 [loop_unroll]: 0.00041566 [opt_after_cconv]: 0.00013551, [1] [Cycle 1]: 0.00012924, [7] [c_1]: 4.843e-05 [parameter_eliminate]: 2.52001e-06 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 2.935e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.65e-05 [tuple_transform]: 0.00010035, [1] [Cycle 1]: 9.6e-05, [4] [d_1]: 6.584e-05 [none_parameter_eliminate]: 1.32999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.033e-05 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 5.467e-05 [cse_after_recomputation]: 3.108e-05, [1] [Cycle 1]: 2.653e-05, [1] [cse]: 2.123e-05 [environ_conv]: 8.37e-06 [swap_dp_allreduce_reducescatter]: 7.51999e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 3.64002e-06 [label_fine_grained_interleaved_index]: 2.17999e-06 [merge_cast_opt]: 1.12e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 8.50006e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 1.84e-06 [comm_op_add_attrs]: 6.19999e-07 [add_comm_op_reuse_tag]: 8.00006e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.41002e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.634e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 5.06002e-06 [overlap_recompute_and_grad_model_parallel]: 5.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.26998e-06 [overlap_grad_ring_attention]: 5.46998e-06 [overlap_grad_flash_sp]: 2.201e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 1.42e-06 [split_layernorm_comm]: 2.29001e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 0.00010078, [1] [Cycle 1]: 9.657e-05, [6] [build]: 1.047e-05 [elim_shapecalc]: 1.351e-05 [elim_not_effective]: 1.824e-05 [opt_reshape]: 1.008e-05 [fold_const_symbol]: 1.506e-05 [renormalize]: 2.3999e-07 
[detach_backward]: 1.57001e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 2.296e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.11001e-06 [opt_after_jit_grad]: 0.0004621 [validate]: 4.143e-05 [backend_pass]: 7.60017e-07 [task_emit]: 0.00800032 [execute]: 6.07999e-06 Sums bootstrap : 0.000434s : 1.33% type_inference : 0.011204s : 34.41% event_method : 0.000047s : 0.15% auto_monad : 0.000117s : 0.36% graph_reusing : 0.000007s : 0.02% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000002s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000141s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000167s : 0.51% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003207s : 9.85% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.51% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000028s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000025s : 0.08% optimize.opt_a.flash_sp : 0.000014s : 0.04% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001510s : 4.64% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000111s : 0.34% optimize.opt_a.renormalize : 0.002982s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.24% optimize.opt_a.cse : 0.000229s : 0.70% optimize.opt_a.a_3 : 0.000461s : 1.42% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000479s : 1.47% optimize.opt_b.b_1 : 0.000192s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000416s : 1.28% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000026s : 0.08% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000022s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000001s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000023s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000462s : 1.42% validate : 0.000041s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008000s : 24.57% execute : 0.000006s : 0.02% Time group info: ------[substitution.] 
0.000754 222 5.55% : 0.000042s : 12: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.59% : 0.000419s : 17: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000013s : 20: substitution.remove_not_recompute_node 3.14% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.46% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.80% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011135 2 86.69% : 0.009653s : 1: type_inference.infer 13.31% : 0.001482s : 1: type_inference.specialize ------[replace.] 0.000218 33 56.88% : 0.000124s : 17: replace.inline 43.12% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.32% : 0.000411s : 17: match.inline 7.68% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.12% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.71% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.22% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001601 34 56.64% : 0.000907s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.36% : 0.000694s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061071 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.80% : 0.002930s : 1: add_attr 4.78% : 0.002921s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000059s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000124s : 1: auto_monad 0.04% : 0.000027s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.75% : 0.000458s : 1: bootstrap 0.04% : 0.000022s : 1: cconv 0.01% : 0.000003s : 1: comm_op_add_attrs 0.03% : 0.000019s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000011s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000006s : 1: label_micro_interleaved_index 0.69% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.80% : 0.000487s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.04% : 0.004908s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000177s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.93% : 0.010949s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000471s : 1: opt_after_jit_grad 0.48% : 0.000294s : 1: opt_b 21.62% : 0.013202s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000052s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000030s : 1: remove_dup_value 2.62% : 0.001599s : 2: renormalize.infer 2.25% : 0.001371s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000146s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000104s : 1: symbol_engine_optimizer 13.12% : 0.008011s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 18.37% : 0.011219s : 1: type_inference 0.12% : 0.000072s : 1: validate TotalTime = 0.0182281, [24] [bootstrap]: 0.00043994 [type_inference]: 0.00425699 [event_method]: 1.043e-05 [auto_monad]: 5.021e-05 [graph_reusing]: 5.34998e-06 [inline]: 2.23002e-06 [add_attr]: 0.00292796, [1] [add_attr_with_inline]: 0.00291995, [1] [Cycle 1]: 4.189e-05, [2] [tag_attr]: 1.104e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 2.63998e-06 [pre_auto_parallel]: 2.097e-05 [insert-virtual-dataset]: 2.17001e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 1.72999e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00368816, [53] [py_interpret_to_execute]: 1.408e-05 [rewriter_before_opt_a]: 3.632e-05 [opt_a]: 0.00184536, [2] [Cycle 1]: 0.00124768, [45] [expand_dump_flag]: 3.4e-06 [switch_simplify]: 2.301e-05 [loop_unroll]: 1.425e-05 [a_1]: 0.00029219 [with_stream_mark]: 1.326e-05 [recompute_prepare]: 7.19001e-06 [updatestate_depend_eliminate]: 3.98001e-06 [updatestate_assign_eliminate]: 3.15998e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.37e-06 [a_2]: 7.668e-05 [accelerated_algorithm]: 6.14999e-06 [shard]: 1.66e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 7.30003e-06 [auto_parallel]: 5.94e-06 [parallel]: 1.858e-05 [flash_sp]: 7.20003e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.30999e-06 [allreduce_slice_to_reducescatter]: 8.99978e-07 [virtual_shard_identity]: 7.6e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.15999e-06 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.41998e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 
2.13002e-06 [after_resolve]: 1.023e-05 [a_after_grad]: 8.84998e-06 [renormalize]: 0.00033703 [add_forward_monad_depend]: 4.47e-06 [auto_monad_grad]: 2.27999e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 2.356e-05 [a_3]: 4.208e-05 [Cycle 2]: 0.00058824, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.69001e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012443 [with_stream_mark]: 8.84998e-06 [recompute_prepare]: 5.67999e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 6.742e-05 [accelerated_algorithm]: 5.26998e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.10001e-06 [parallel]: 4.15e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 5.31998e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.01e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.16998e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.07998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.88002e-06 [a_after_grad]: 8.18999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.22e-05 [a_3]: 3.233e-05 [py_interpret_to_execute_after_opt_a]: 7.23999e-06 [slice_cell_reuse_recomputed_activation]: 2.33998e-06 [rewriter_after_opt_a]: 2.929e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 4.95999e-06 [mutable_eliminate]: 0.00050239 [opt_b]: 0.00018229, [1] [Cycle 1]: 
0.00017644, [7] [b_1]: 0.00010967 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.39991e-07 [cse]: 1.592e-05 [optimize_parallel_all_gather_comm]: 1.44e-05 [overlap_param_gather]: 1.40999e-06 [cconv]: 2.163e-05 [loop_unroll]: 0.00041233 [opt_after_cconv]: 9.361e-05, [1] [Cycle 1]: 8.803e-05, [7] [c_1]: 2.808e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.54e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 1.258e-05 [tuple_transform]: 6.871e-05, [1] [Cycle 1]: 6.444e-05, [4] [d_1]: 3.876e-05 [none_parameter_eliminate]: 1.15999e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.33998e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 3.985e-05 [cse_after_recomputation]: 1.978e-05, [1] [Cycle 1]: 1.55e-05, [1] [cse]: 1.044e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.20001e-06 [bias_add_comm_swap]: 3.27002e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 9.49978e-07 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.05001e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.30999e-06 [full_micro_interleaved_order_control]: 2.36998e-06 [reorder_send_recv_between_fp_bp]: 2.17001e-06 [comm_op_add_attrs]: 7.80012e-07 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.44e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 9.79984e-07 [overlap_opt_shard_grad_in_pipeline]: 1.30999e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.03001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.02e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 3.94002e-06 [overlap_grad_flash_sp]: 1.673e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.47001e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 6.786e-05, [1] [Cycle 1]: 6.392e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.22e-06 [elim_not_effective]: 1.157e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.30999e-06 [auto_monad_reorder]: 1.368e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00044096 [validate]: 2.983e-05 [backend_pass]: 6.89994e-07 [task_emit]: 0.00613019 [execute]: 7.08e-06 Sums bootstrap : 0.000440s : 3.07% type_inference : 0.004257s : 29.67% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000036s : 0.25% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000417s : 2.90% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000036s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000029s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000502s : 3.50% optimize.opt_b.b_1 : 0.000110s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000014s : 0.10% optimize.overlap_param_gather : 0.000001s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000412s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000040s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000001s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000014s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000441s : 3.07% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.00% task_emit : 0.006130s : 42.72% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.68% : 0.000022s : 4: substitution.arithmetic_simplify 1.78% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 4.31% : 0.000005s : 4: substitution.graph_param_transform 64.87% : 0.000077s : 2: substitution.inline 2.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.87% : 0.000005s : 4: substitution.remove_not_recompute_node 3.01% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004218 2 91.76% : 0.003870s : 1: type_inference.infer 8.24% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 1.09% : 0.000001s : 8: predicate.check_bprop_eliminate 0.83% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.70% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.41% : 0.000001s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.70% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.63% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 40.40% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.60% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026112 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.23% : 0.002932s : 1: add_attr 11.20% : 0.002924s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000044s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000017s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.81% : 0.000472s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.96% : 0.000511s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.94% : 0.000768s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001848s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.73% : 0.000451s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 14.14% : 0.003692s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000004s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000182s : 1: renormalize.infer 0.57% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000033s : 1: rewriter_after_opt_a 0.16% : 0.000040s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.51% : 0.006140s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.35% : 0.004270s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0359594, [24] [bootstrap]: 0.00049719 [type_inference]: 0.0102435 [event_method]: 4.205e-05 [auto_monad]: 0.00011592 [graph_reusing]: 7.82e-06 [inline]: 1.64e-06 [add_attr]: 0.00296163, [1] [add_attr_with_inline]: 0.00295362, [1] [Cycle 1]: 6.704e-05, [2] [tag_attr]: 3.159e-05 [meta_addattr_fg_expand]: 8.3e-06 [parallel-infer-symbol]: 2.62001e-06 [pre_auto_parallel]: 4.42e-05 [insert-virtual-dataset]: 3.01001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0130509, [53] [py_interpret_to_execute]: 3.632e-05 [rewriter_before_opt_a]: 0.00012506 [opt_a]: 0.0108116, [3] [Cycle 1]: 0.00690434, [45] [expand_dump_flag]: 3.58e-06 [switch_simplify]: 6.654e-05 [loop_unroll]: 5.48e-05 [a_1]: 0.0013874 [with_stream_mark]: 2.243e-05 [recompute_prepare]: 2.18e-05 [updatestate_depend_eliminate]: 9.00001e-06 [updatestate_assign_eliminate]: 7.71999e-06 [updatestate_loads_eliminate]: 7.41999e-06 [parameter_eliminate]: 2.29001e-06 [a_2]: 0.0002479 [accelerated_algorithm]: 3.119e-05 [shard]: 1.37999e-06 [meta_shard_fg_expand]: 3.31001e-06 [shard_inline]: 1.623e-05 [merge_send_recv]: 1.545e-05 [auto_parallel]: 1.103e-05 [parallel]: 1.674e-05 [flash_sp]: 1.027e-05 [merge_comm]: 9.86998e-06 [allreduce_fusion]: 9.22999e-06 [matmul_add_comm_reduction]: 2.607e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.781e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.547e-05 [virtual_output]: 1.533e-05 [merge_forward]: 9.15999e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 1.731e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.841e-05 [merge_recompute_call_nodes]: 1.14e-06 [before_grad]: 2.767e-05 [set_forward_comm_id_for_comm_node_pass]: 9.54999e-06 [meta_fg_expand]: 0.00138409 [flash_sp_send_recv_attached]: 3.31001e-06 [receive_attached]: 2.69001e-06 
[after_resolve]: 6.037e-05 [a_after_grad]: 8.219e-05 [renormalize]: 0.00238714 [add_forward_monad_depend]: 8.77e-06 [auto_monad_grad]: 5.00999e-06 [auto_monad_eliminator]: 5.533e-05 [cse]: 0.00015945 [a_3]: 0.00034039 [Cycle 2]: 0.0029954, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.739e-05 [loop_unroll]: 4.427e-05 [a_1]: 0.00153361 [with_stream_mark]: 1.141e-05 [recompute_prepare]: 1.137e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.22999e-06 [a_2]: 0.00012593 [accelerated_algorithm]: 1.206e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 6.73003e-06 [auto_parallel]: 7.47998e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 5.16998e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.001e-05 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 9.41e-06 [virtual_output]: 8.84998e-06 [merge_forward]: 4.91002e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.658e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.438e-05 [set_forward_comm_id_for_comm_node_pass]: 5.43002e-06 [meta_fg_expand]: 3.482e-05 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.542e-05 [a_after_grad]: 1.461e-05 [renormalize]: 0.00061758 [add_forward_monad_depend]: 3.98001e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.486e-05 [cse]: 4.484e-05 [a_3]: 6.506e-05 [Cycle 3]: 0.00089805, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.039e-05 [loop_unroll]: 8.84e-06 [a_1]: 0.0002502 [with_stream_mark]: 9.58002e-06 [recompute_prepare]: 9.39e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.81999e-06 
[parameter_eliminate]: 8.39995e-07 [a_2]: 0.00012337 [accelerated_algorithm]: 1.172e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 7.11999e-06 [auto_parallel]: 7.45998e-06 [parallel]: 4.54002e-06 [flash_sp]: 1.21002e-06 [merge_comm]: 4.97999e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.53e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 9.91e-06 [virtual_dataset]: 8.73001e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.02998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.49002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.58e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24998e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.321e-05 [a_after_grad]: 1.421e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.99979e-07 [auto_monad_eliminator]: 1.144e-05 [cse]: 2.517e-05 [a_3]: 5.954e-05 [py_interpret_to_execute_after_opt_a]: 1.014e-05 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 4.721e-05 [convert_after_rewriter]: 9.82999e-06 [order_py_execute_after_rewriter]: 7.08998e-06 [mutable_eliminate]: 0.00046117 [opt_b]: 0.00028724, [1] [Cycle 1]: 0.0002811, [7] [b_1]: 0.00018917 [b_2]: 1.088e-05 [updatestate_depend_eliminate]: 7.26999e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 3.96001e-06 [renormalize]: 7.39994e-07 [cse]: 3.012e-05 [optimize_parallel_all_gather_comm]: 1.984e-05 [overlap_param_gather]: 1.96998e-06 [cconv]: 2.13e-05 [loop_unroll]: 0.00042216 [opt_after_cconv]: 0.00013458, [1] [Cycle 1]: 0.00012836, [7] [c_1]: 4.803e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.32e-06 
[updatestate_loads_eliminate]: 3.73999e-06 [cse]: 2.916e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 2.833e-05 [tuple_transform]: 0.00010025, [1] [Cycle 1]: 9.592e-05, [4] [d_1]: 6.676e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 9.71998e-06 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 5.768e-05 [cse_after_recomputation]: 3.136e-05, [1] [Cycle 1]: 2.686e-05, [1] [cse]: 2.138e-05 [environ_conv]: 8.95001e-06 [swap_dp_allreduce_reducescatter]: 8.10999e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 4.07998e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.56998e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.66e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.43998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 8.30012e-07 [interleave_split_concat_branches]: 1.35999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.701e-05 [grouped_pairwise_exchange_alltoall]: 1.32e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 6.04001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.51998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 5.22e-06 [overlap_grad_flash_sp]: 2.556e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.49e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 9.745e-05, [1] [Cycle 1]: 9.328e-05, [6] [build]: 9.80002e-06 [elim_shapecalc]: 1.308e-05 [elim_not_effective]: 1.808e-05 [opt_reshape]: 9.90002e-06 [fold_const_symbol]: 1.487e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 2.515e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.0004799 [validate]: 4.315e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00821739 [execute]: 8.1e-06 Sums bootstrap : 0.000497s : 1.57% type_inference : 0.010243s : 32.26% event_method : 0.000042s : 0.13% auto_monad : 0.000116s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000044s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000125s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003171s : 9.99% optimize.opt_a.with_stream_mark : 0.000043s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000497s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.11% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001422s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.28% optimize.opt_a.a_after_grad : 0.000111s : 0.35% optimize.opt_a.renormalize : 0.003005s : 9.46% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.26% optimize.opt_a.cse : 0.000229s : 0.72% optimize.opt_a.a_3 : 0.000465s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.45% optimize.opt_b.b_1 : 0.000189s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000422s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000026s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000480s : 1.51% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008217s : 25.88% execute : 0.000008s : 0.03% Time group info: ------[substitution.] 
0.000768 218 5.41% : 0.000042s : 11: substitution.arithmetic_simplify 1.74% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 51.50% : 0.000395s : 16: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.74% : 0.000013s : 20: substitution.remove_not_recompute_node 3.16% : 0.000024s : 10: substitution.replace_applicator 1.33% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 13.61% : 0.000105s : 28: substitution.tuple_list_get_item_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010175 2 87.41% : 0.008894s : 1: type_inference.infer 12.59% : 0.001281s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.20% : 0.000118s : 16: replace.inline 40.80% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000418 30 92.76% : 0.000387s : 16: match.inline 7.24% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.10% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.65% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000019s : 164: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.47% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.30% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.94% : 0.000014s : 149: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001484 32 57.44% : 0.000853s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.56% : 0.000632s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060136 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.93% : 0.002966s : 1: add_attr 4.92% : 0.002957s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000122s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000529s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.03% : 0.004830s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.98% : 0.010815s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.81% : 0.000489s : 1: opt_after_jit_grad 0.48% : 0.000291s : 1: opt_b 21.71% : 0.013055s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000049s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.73% : 0.001641s : 2: renormalize.infer 2.25% : 0.001352s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.68% : 0.008227s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.06% : 0.010258s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x8-kbk],max_mem:32.0M TotalTime = 0.117167, [24] [bootstrap]: 0.00058231 [type_inference]: 0.00623902 [event_method]: 1.369e-05 [auto_monad]: 5.935e-05 [graph_reusing]: 5.14998e-06 [inline]: 1.82999e-06 [add_attr]: 0.00339566, [1] [add_attr_with_inline]: 0.00338489, [1] [Cycle 1]: 4.484e-05, [2] [tag_attr]: 1.486e-05 [meta_addattr_fg_expand]: 4.46002e-06 [parallel-infer-symbol]: 3.09001e-06 [pre_auto_parallel]: 2.958e-05 [insert-virtual-dataset]: 2.43998e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00395869, [53] [py_interpret_to_execute]: 1.982e-05 [rewriter_before_opt_a]: 5.883e-05 [opt_a]: 0.00212158, [2] [Cycle 1]: 0.0015295, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 3.229e-05 [loop_unroll]: 2.07e-05 [a_1]: 0.00047175 [with_stream_mark]: 1.36e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.553e-05 [accelerated_algorithm]: 6.44999e-06 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 7.9e-06 [auto_parallel]: 6.14001e-06 [parallel]: 2.406e-05 [flash_sp]: 7.29001e-06 [merge_comm]: 3.52997e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 9.48002e-06 [allreduce_slice_to_reducescatter]: 9.29984e-07 [virtual_shard_identity]: 7.13998e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.69e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 
[merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.63998e-06 [flash_sp_send_recv_attached]: 2.88e-06 [receive_attached]: 2.88e-06 [after_resolve]: 1.051e-05 [a_after_grad]: 8.86002e-06 [renormalize]: 0.00040767 [add_forward_monad_depend]: 4.62998e-06 [auto_monad_grad]: 2.07999e-06 [auto_monad_eliminator]: 1.335e-05 [cse]: 2.792e-05 [a_3]: 4.013e-05 [Cycle 2]: 0.00058317, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012431 [with_stream_mark]: 9.23002e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.686e-05 [accelerated_algorithm]: 5.58002e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.32003e-06 [auto_parallel]: 5.25001e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.02999e-06 [virtual_dataset]: 5.41002e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 4.91997e-06 [merge_forward]: 2.88e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.29998e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.02003e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.06997e-06 [after_resolve]: 9.49e-06 [a_after_grad]: 8.00999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.09999e-06 [cse]: 1.214e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 
[slice_cell_reuse_recomputed_activation]: 2.44001e-06 [rewriter_after_opt_a]: 3.075e-05 [convert_after_rewriter]: 6.98998e-06 [order_py_execute_after_rewriter]: 4.83001e-06 [mutable_eliminate]: 0.00044591 [opt_b]: 0.00018025, [1] [Cycle 1]: 0.00017404, [7] [b_1]: 0.00010701 [b_2]: 6.63e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.50003e-07 [cse]: 1.574e-05 [optimize_parallel_all_gather_comm]: 1.605e-05 [overlap_param_gather]: 2.34001e-06 [cconv]: 2.261e-05 [loop_unroll]: 0.00041154 [opt_after_cconv]: 9.469e-05, [1] [Cycle 1]: 8.906e-05, [7] [c_1]: 2.849e-05 [parameter_eliminate]: 2.58e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.56e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.249e-05 [tuple_transform]: 6.857e-05, [1] [Cycle 1]: 6.429e-05, [4] [d_1]: 3.85e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 2.18998e-06 [add_recomputation]: 5.275e-05 [cse_after_recomputation]: 2.034e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 5.19998e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.46002e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.67999e-06 [control_data_broadcast_order]: 1.183e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.13001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 1.96e-06 [overlap_grad_ring_attention]: 3.63999e-06 [overlap_grad_flash_sp]: 1.804e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 2.14e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 6.843e-05, [1] [Cycle 1]: 6.435e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.50999e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 6.20002e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00050496 [validate]: 3.151e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.102065 [execute]: 1.011e-05 Sums bootstrap : 0.000582s : 0.52% type_inference : 0.006239s : 5.53% event_method : 0.000014s : 0.01% auto_monad : 0.000059s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000030s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000596s : 0.53% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000408s : 0.36% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.40% optimize.opt_b.b_1 : 0.000107s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000412s : 0.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000505s : 0.45% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.102065s : 90.50% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000168 30 15.27% : 0.000026s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.37% : 0.000112s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.40% : 0.000004s : 4: substitution.remove_not_recompute_node 2.57% : 0.000004s : 4: substitution.replace_old_param 6.70% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006187 2 90.85% : 0.005621s : 1: type_inference.infer 9.15% : 0.000566s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.56% : 0.000026s : 3: replace.inline 31.44% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.52% : 0.000109s : 3: match.inline 8.48% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000216 1131 0.64% : 0.000001s : 11: predicate.accumulaten_eliminater 0.66% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.41% : 0.000001s : 8: predicate.addn_check_dump 0.59% : 0.000001s : 11: predicate.addn_zero_filter 0.57% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.57% : 0.000003s : 19: predicate.arithmetic_simplify 0.67% : 0.000001s : 11: predicate.cast_eliminate 0.52% : 0.000001s : 8: predicate.check_bprop_eliminate 0.42% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.44% : 0.000001s : 8: predicate.depend_value_elim 0.66% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.67% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.64% : 0.000001s : 11: predicate.dict_set_item_eliminator 27.28% : 0.000059s : 8: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 4: predicate.elim_not_effective 0.26% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.84% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_depend_swap 1.39% : 0.000003s : 23: predicate.environ_get_eliminate 0.86% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.93% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.59% : 0.000003s : 16: predicate.float_depend_g_call 0.43% : 0.000001s : 8: predicate.float_environ_get_switch 0.64% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.61% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000001s : 4: predicate.graph_param_transform 0.49% : 0.000001s : 8: predicate.incorporate_call 0.42% : 0.000001s : 8: predicate.incorporate_call_switch 4.45% : 0.000010s : 51: predicate.inline 0.63% : 0.000001s : 8: predicate.inline_without_move 0.29% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.67% : 0.000001s : 8: predicate.less_batch_normalization 1.27% : 0.000003s 
: 21: predicate.list_to_tuple_eliminator_ 1.76% : 0.000004s : 32: predicate.load_eliminater 0.75% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.58% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.24% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.45% : 0.000001s : 8: predicate.merge_addn 0.49% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.50% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.58% : 0.000001s : 11: predicate.minmaximum_grad 0.83% : 0.000002s : 4: predicate.mutable_eliminate 0.25% : 0.000001s : 4: predicate.opt_reshape 0.28% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000003s : 16: predicate.partial_defer_inline 1.08% : 0.000002s : 17: predicate.partial_eliminate 0.63% : 0.000001s : 11: predicate.print_const_string_wrapper 0.49% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000002s : 11: predicate.reduce_eliminate 1.74% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000001s : 8: predicate.remove_not_recompute_node 1.07% : 0.000002s : 21: predicate.replace_applicator 0.42% : 0.000001s : 8: predicate.replace_old_param 0.25% : 0.000001s : 4: predicate.reset_defer_inline 0.62% : 0.000001s : 11: predicate.reshape_eliminate 0.49% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.27% : 0.000001s : 4: predicate.row_tensor_eliminate 0.59% : 0.000001s : 8: predicate.same_eliminate 0.39% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.59% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000002s : 8: predicate.special_op_eliminate 0.60% : 0.000001s : 8: predicate.specialize_transform 0.72% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.61% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000002s : 16: predicate.switch_defer_inline 1.46% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.72% : 0.000008s : 54: predicate.switch_simplify 0.60% 
: 0.000001s : 11: predicate.tile_eliminate 0.67% : 0.000001s : 11: predicate.transpose_eliminate 1.12% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.16% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 0.99% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.49% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.03% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.75% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.21% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.73% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.38% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.27% : 0.000001s : 4: predicate.value_based_eliminate 0.51% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.52% : 0.000001s : 8: predicate.virtual_output_eliminate 0.26% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.36% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 48.01% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.99% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.126089 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.70% : 0.003400s : 1: add_attr 2.69% : 0.003388s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.49% : 0.000622s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000038s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000455s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.76% : 0.000960s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.06% : 0.000080s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.68% : 0.002124s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.41% : 0.000515s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.14% : 0.003962s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000034s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000209s : 1: renormalize.infer 0.15% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 80.97% : 0.102088s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.96% : 0.006252s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.111175, [24] [bootstrap]: 0.00047606 [type_inference]: 0.00455577 [event_method]: 1.035e-05 [auto_monad]: 5.358e-05 [graph_reusing]: 5.00999e-06 [inline]: 1.71e-06 [add_attr]: 0.00294498, [1] [add_attr_with_inline]: 0.00293681, [1] [Cycle 1]: 4.537e-05, [2] [tag_attr]: 1.235e-05 [meta_addattr_fg_expand]: 3.61999e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.151e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.96e-06 [optimize]: 0.00373988, [53] [py_interpret_to_execute]: 1.536e-05 [rewriter_before_opt_a]: 4.038e-05 [opt_a]: 0.00193972, [2] [Cycle 1]: 0.00134207, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 2.427e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00029148 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.33e-06 [updatestate_depend_eliminate]: 3.46999e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 2.73e-06 [a_2]: 7.815e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.35002e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.47e-06 [auto_parallel]: 6.46e-06 [parallel]: 1.938e-05 [flash_sp]: 7.84997e-06 [merge_comm]: 3.27997e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 9.92001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.81999e-06 [virtual_dataset]: 6.02999e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 1.005e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.138e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.46998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.23998e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 2.42001e-06 
[after_resolve]: 1.068e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00034714 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.364e-05 [cse]: 2.712e-05 [a_3]: 3.969e-05 [Cycle 2]: 0.00058831, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.69001e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.00012197 [with_stream_mark]: 1.09e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.61e-06 [updatestate_assign_eliminate]: 2.14e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.665e-05 [accelerated_algorithm]: 5.31998e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.37999e-06 [merge_send_recv]: 4.16001e-06 [auto_parallel]: 5.21998e-06 [parallel]: 5.01997e-06 [flash_sp]: 4.1e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.89999e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 4.90999e-06 [virtual_output]: 4.85001e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 5.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 1.58997e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.98002e-06 [a_after_grad]: 7.8e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.90025e-07 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.07999e-06 [cse]: 1.308e-05 [a_3]: 3.206e-05 [py_interpret_to_execute_after_opt_a]: 7.28999e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 3.019e-05 [convert_after_rewriter]: 6.62002e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00044408 [opt_b]: 0.00017965, [1] [Cycle 1]: 0.00017334, [7] [b_1]: 
0.00010583 [b_2]: 6.96999e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.09e-06 [renormalize]: 5.00004e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.664e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.391e-05 [loop_unroll]: 0.00040609 [opt_after_cconv]: 9.435e-05, [1] [Cycle 1]: 8.894e-05, [7] [c_1]: 2.768e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 4.86002e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.63e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.282e-05 [tuple_transform]: 7.018e-05, [1] [Cycle 1]: 6.595e-05, [4] [d_1]: 3.948e-05 [none_parameter_eliminate]: 1.82999e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.348e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.042e-05 [environ_conv]: 5.62999e-06 [swap_dp_allreduce_reducescatter]: 5.19003e-06 [bias_add_comm_swap]: 2.45002e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.69999e-06 [micro_interleaved_order_control]: 2.39999e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.04998e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.91002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 4.48001e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 6.999e-05, [1] [Cycle 1]: 6.557e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.45001e-06 [elim_not_effective]: 1.2e-05 [opt_reshape]: 6.52001e-06 [fold_const_symbol]: 9.04998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.88997e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.64e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00043739 [validate]: 3.253e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.0986438 [execute]: 9.76e-06 Sums bootstrap : 0.000476s : 0.44% type_inference : 0.004556s : 4.25% event_method : 0.000010s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000413s : 0.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000347s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000444s : 0.41% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000406s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000437s : 0.41% validate : 0.000033s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098644s : 92.03% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000123 26 19.16% : 0.000024s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.25% : 0.000002s : 2: substitution.fold_const_symbol 4.44% : 0.000005s : 4: substitution.graph_param_transform 64.32% : 0.000079s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.65% : 0.000005s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004513 2 90.52% : 0.004085s : 1: type_inference.infer 9.48% : 0.000428s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.41% : 0.000001s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.85% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.90% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.58% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.84% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.02% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.24% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000319 6 36.50% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.50% : 0.000203s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119123 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.48% : 0.002950s : 1: add_attr 2.47% : 0.002940s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000510s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000414s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000763s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.63% : 0.001943s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.37% : 0.000447s : 1: opt_after_jit_grad 0.15% : 0.000183s : 1: opt_b 3.14% : 0.003744s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000188s : 1: renormalize.infer 0.13% : 0.000153s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000073s : 1: symbol_engine_optimizer 82.83% : 0.098668s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 3.84% : 0.004569s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.112053, [24] [bootstrap]: 0.00046235 [type_inference]: 0.00570686 [event_method]: 1.477e-05 [auto_monad]: 5.664e-05 [graph_reusing]: 6.23e-06 [inline]: 1.71e-06 [add_attr]: 0.00294802, [1] [add_attr_with_inline]: 0.00294009, [1] [Cycle 1]: 4.508e-05, [2] [tag_attr]: 1.553e-05 [meta_addattr_fg_expand]: 4.17998e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.524e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00396559, [53] [py_interpret_to_execute]: 2.166e-05 [rewriter_before_opt_a]: 5.907e-05 [opt_a]: 0.00213362, [2] [Cycle 1]: 0.00152806, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 3.265e-05 [loop_unroll]: 2.015e-05 [a_1]: 0.0004774 [with_stream_mark]: 1.38e-05 [recompute_prepare]: 7.51001e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 7.668e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 2.54999e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 8.23001e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.706e-05 [flash_sp]: 6.66e-06 [merge_comm]: 3.81999e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 9.19998e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.21001e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.41998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.047e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 9.15001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.97002e-06 [receive_attached]: 3.01001e-06 
[after_resolve]: 1.032e-05 [a_after_grad]: 8.65999e-06 [renormalize]: 0.00041302 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.96998e-06 [auto_monad_eliminator]: 1.389e-05 [cse]: 2.634e-05 [a_3]: 4.099e-05 [Cycle 2]: 0.00059603, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012477 [with_stream_mark]: 9.87001e-06 [recompute_prepare]: 5.67999e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.898e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.11002e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.19e-06 [parallel]: 3.98001e-06 [flash_sp]: 3.2e-06 [merge_comm]: 3.15998e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 4.90999e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.35002e-06 [virtual_dataset]: 5.24998e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 4.89003e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 5.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.89998e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.52001e-06 [cse]: 1.224e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.66001e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.115e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00044677 [opt_b]: 0.00018246, [1] [Cycle 1]: 
0.00017661, [7] [b_1]: 0.000109 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 4.10015e-07 [cse]: 1.611e-05 [optimize_parallel_all_gather_comm]: 1.561e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.303e-05 [loop_unroll]: 0.0004118 [opt_after_cconv]: 9.587e-05, [1] [Cycle 1]: 9.023e-05, [7] [c_1]: 2.781e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.04e-06 [cse]: 1.66e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.389e-05 [tuple_transform]: 6.863e-05, [1] [Cycle 1]: 6.442e-05, [4] [d_1]: 3.874e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.37e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.569e-05, [1] [cse]: 1.055e-05 [environ_conv]: 5.15001e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.56998e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 3.04001e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.50002e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.72998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.731e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.675e-05, [1] [Cycle 1]: 6.266e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 7.93001e-06 [elim_not_effective]: 1.116e-05 [opt_reshape]: 5.82999e-06 [fold_const_symbol]: 9.07001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00048664 [validate]: 3.213e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0981006 [execute]: 9.74e-06 Sums bootstrap : 0.000462s : 0.43% type_inference : 0.005707s : 5.28% event_method : 0.000015s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000602s : 0.56% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000413s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.41% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000412s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000487s : 0.45% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098101s : 90.72% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.76% : 0.000024s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 4: substitution.graph_param_transform 66.91% : 0.000110s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.75% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005665 2 89.54% : 0.005072s : 1: type_inference.infer 10.46% : 0.000593s : 1: type_inference.specialize ------[replace.] 0.000067 5 83.35% : 0.000056s : 3: replace.inline 16.65% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.49% : 0.000108s : 3: match.inline 8.51% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.39% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.31% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.98% : 0.000002s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.81% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.30% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000357 8 46.21% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.79% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120500 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.45% : 0.002952s : 1: add_attr 2.44% : 0.002944s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000493s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.80% : 0.000966s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.77% : 0.002136s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 0.41% : 0.000496s : 1: opt_after_jit_grad 0.15% : 0.000186s : 1: opt_b 3.29% : 0.003970s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.17% : 0.000202s : 1: renormalize.infer 0.17% : 0.000204s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000069s : 1: symbol_engine_optimizer 81.43% : 0.098124s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.75% : 0.005721s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.151311, [24] [bootstrap]: 0.0004897 [type_inference]: 0.0114795 [event_method]: 4.839e-05 [auto_monad]: 0.00012395 [graph_reusing]: 8.81997e-06 [inline]: 2.01003e-06 [add_attr]: 0.00301765, [1] [add_attr_with_inline]: 0.00300893, [1] [Cycle 1]: 7.086e-05, [2] [tag_attr]: 3.536e-05 [meta_addattr_fg_expand]: 9.79e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 5.031e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0133596, [53] [py_interpret_to_execute]: 3.868e-05 [rewriter_before_opt_a]: 0.00014714 [opt_a]: 0.0110886, [3] [Cycle 1]: 0.00715786, [45] [expand_dump_flag]: 4.01001e-06 [switch_simplify]: 7.355e-05 [loop_unroll]: 6.148e-05 [a_1]: 0.00143777 [with_stream_mark]: 2.321e-05 [recompute_prepare]: 2.109e-05 [updatestate_depend_eliminate]: 9.51e-06 [updatestate_assign_eliminate]: 7.71999e-06 [updatestate_loads_eliminate]: 7.13e-06 [parameter_eliminate]: 2.51998e-06 [a_2]: 0.00024258 [accelerated_algorithm]: 2.997e-05 [shard]: 2.26e-06 [meta_shard_fg_expand]: 3.35998e-06 [shard_inline]: 1.62e-05 [merge_send_recv]: 1.623e-05 [auto_parallel]: 1.058e-05 [parallel]: 1.799e-05 [flash_sp]: 1.154e-05 [merge_comm]: 9.57999e-06 [allreduce_fusion]: 9.09e-06 [matmul_add_comm_reduction]: 2.657e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.758e-05 [virtual_dataset]: 1.565e-05 [get_grad_eliminate_]: 5.919e-05 [virtual_output]: 1.578e-05 [merge_forward]: 9.43002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.877e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.931e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 2.719e-05 [set_forward_comm_id_for_comm_node_pass]: 1.005e-05 [meta_fg_expand]: 0.00141202 [flash_sp_send_recv_attached]: 3.71999e-06 [receive_attached]: 2.89001e-06 
[after_resolve]: 6.069e-05 [a_after_grad]: 8.19e-05 [renormalize]: 0.00249682 [add_forward_monad_depend]: 9.46e-06 [auto_monad_grad]: 5.38002e-06 [auto_monad_eliminator]: 5.604e-05 [cse]: 0.00016814 [a_3]: 0.00033318 [Cycle 2]: 0.00298934, [45] [expand_dump_flag]: 1.53002e-06 [switch_simplify]: 4.716e-05 [loop_unroll]: 4.408e-05 [a_1]: 0.00151643 [with_stream_mark]: 1.255e-05 [recompute_prepare]: 1.087e-05 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 4.38001e-06 [updatestate_loads_eliminate]: 3.55998e-06 [parameter_eliminate]: 1.08001e-06 [a_2]: 0.00012524 [accelerated_algorithm]: 1.19e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.84998e-06 [shard_inline]: 9.21998e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 7.05998e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.33e-06 [merge_comm]: 4.99998e-06 [allreduce_fusion]: 4.59002e-06 [matmul_add_comm_reduction]: 7.61999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 8.74998e-06 [get_grad_eliminate_]: 9.22001e-06 [virtual_output]: 8.87999e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.624e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.369e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 7.08e-05 [flash_sp_send_recv_attached]: 1.24e-06 [receive_attached]: 1.19e-06 [after_resolve]: 1.649e-05 [a_after_grad]: 1.429e-05 [renormalize]: 0.00059655 [add_forward_monad_depend]: 4.05e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.448e-05 [cse]: 4.646e-05 [a_3]: 6.453e-05 [Cycle 3]: 0.00092694, [45] [expand_dump_flag]: 1.15999e-06 [switch_simplify]: 1.043e-05 [loop_unroll]: 9.19e-06 [a_1]: 0.00027165 [with_stream_mark]: 1.047e-05 [recompute_prepare]: 9.64e-06 [updatestate_depend_eliminate]: 4.88001e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 
3.88999e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 0.00012334 [accelerated_algorithm]: 1.181e-05 [shard]: 9.69972e-07 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 9.08002e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 6.86999e-06 [parallel]: 4.60001e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 4.87e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.63999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.007e-05 [virtual_dataset]: 8.90001e-06 [get_grad_eliminate_]: 8.41002e-06 [virtual_output]: 8.29998e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 8.62e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.57e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.398e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19998e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.49e-05 [a_after_grad]: 1.514e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.26002e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.068e-05 [cse]: 2.697e-05 [a_3]: 5.884e-05 [py_interpret_to_execute_after_opt_a]: 1.042e-05 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 4.772e-05 [convert_after_rewriter]: 9.10999e-06 [order_py_execute_after_rewriter]: 7.46999e-06 [mutable_eliminate]: 0.00045729 [opt_b]: 0.0002896, [1] [Cycle 1]: 0.00028322, [7] [b_1]: 0.00018852 [b_2]: 1.115e-05 [updatestate_depend_eliminate]: 7.51999e-06 [updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 4.25e-06 [renormalize]: 4.30009e-07 [cse]: 3.198e-05 [optimize_parallel_all_gather_comm]: 2.082e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.164e-05 [loop_unroll]: 0.00042161 [opt_after_cconv]: 0.00013635, [1] [Cycle 1]: 0.00013067, [7] [c_1]: 4.803e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 4.3e-06 
[updatestate_loads_eliminate]: 3.86999e-06 [cse]: 3.063e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.888e-05 [tuple_transform]: 0.00010134, [1] [Cycle 1]: 9.669e-05, [4] [d_1]: 6.639e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.004e-05 [partial_unused_args_eliminate]: 1.89999e-06 [add_recomputation]: 5.852e-05 [cse_after_recomputation]: 3.221e-05, [1] [Cycle 1]: 2.751e-05, [1] [cse]: 2.176e-05 [environ_conv]: 8.95001e-06 [swap_dp_allreduce_reducescatter]: 8.17e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.79e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.12999e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.93998e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.783e-05 [grouped_pairwise_exchange_alltoall]: 2.14e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.57001e-06 [overlap_recompute_comm]: 2.77002e-06 [overlap_grad_ring_attention]: 5.36998e-06 [overlap_grad_flash_sp]: 2.366e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 9.887e-05, [1] [Cycle 1]: 9.465e-05, [6] [build]: 1.081e-05 [elim_shapecalc]: 1.316e-05 [elim_not_effective]: 1.824e-05 [opt_reshape]: 9.92999e-06 [fold_const_symbol]: 1.466e-05 [renormalize]: 2.3999e-07 
[detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.574e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00046184 [validate]: 4.593e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.121955 [execute]: 9.30001e-06 Sums bootstrap : 0.000490s : 0.33% type_inference : 0.011479s : 7.81% event_method : 0.000048s : 0.03% auto_monad : 0.000124s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.09% optimize.opt_a.loop_unroll : 0.000115s : 0.08% optimize.opt_a.a_1 : 0.003226s : 2.19% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000491s : 0.33% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000019s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000077s : 0.05% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001486s : 1.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.06% optimize.opt_a.a_after_grad : 0.000111s : 0.08% optimize.opt_a.renormalize : 0.003093s : 2.10% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.06% optimize.opt_a.cse : 0.000242s : 0.16% optimize.opt_a.a_3 : 0.000457s : 0.31% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.31% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.01% optimize.loop_unroll : 0.000422s : 0.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.01% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000462s : 0.31% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.121955s : 82.94% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000759 222 6.01% : 0.000046s : 12: substitution.arithmetic_simplify 1.76% : 0.000013s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.48% : 0.000421s : 17: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000014s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.80% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000014s : 20: substitution.remove_not_recompute_node 3.15% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.62% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.67% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011403 2 86.59% : 0.009874s : 1: type_inference.infer 13.41% : 0.001529s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.36% : 0.000124s : 17: replace.inline 42.64% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.52% : 0.000412s : 17: match.inline 7.48% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.67% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001622 34 56.58% : 0.000918s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.42% : 0.000704s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.176030 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.72% : 0.003022s : 1: add_attr 1.71% : 0.003013s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000131s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.29% : 0.000519s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000055s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.24% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.26% : 0.000466s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.80% : 0.004931s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000032s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.30% : 0.011092s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.27% : 0.000472s : 1: opt_after_jit_grad 0.17% : 0.000293s : 1: opt_b 7.59% : 0.013363s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.02% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.92% : 0.001623s : 2: renormalize.infer 0.83% : 0.001457s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 69.29% : 0.121977s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.53% : 0.011495s : 1: type_inference 0.04% : 0.000071s : 1: validate TotalTime = 0.110744, [24] [bootstrap]: 0.00046158 [type_inference]: 0.0043843 [event_method]: 1.127e-05 [auto_monad]: 5.407e-05 [graph_reusing]: 5.05001e-06 [inline]: 1.77999e-06 [add_attr]: 0.00299472, [1] [add_attr_with_inline]: 0.00298665, [1] [Cycle 1]: 4.205e-05, [2] [tag_attr]: 1.165e-05 [meta_addattr_fg_expand]: 3.34001e-06 [parallel-infer-symbol]: 3.33e-06 [pre_auto_parallel]: 2.086e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00373105, [53] [py_interpret_to_execute]: 1.584e-05 [rewriter_before_opt_a]: 4.029e-05 [opt_a]: 0.00188044, [2] [Cycle 1]: 0.00127791, [45] [expand_dump_flag]: 3.2e-06 [switch_simplify]: 2.569e-05 [loop_unroll]: 1.376e-05 [a_1]: 0.00029558 [with_stream_mark]: 1.741e-05 [recompute_prepare]: 8.40001e-06 [updatestate_depend_eliminate]: 3.78999e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.93998e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 7.872e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 5.85002e-06 [parallel]: 1.842e-05 [flash_sp]: 8.17e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 8.75001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 5.57999e-06 [get_grad_eliminate_]: 5.31002e-06 [virtual_output]: 6.02001e-06 [merge_forward]: 3.75998e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.007e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.135e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.04e-06 [flash_sp_send_recv_attached]: 2.33998e-06 [receive_attached]: 2.56e-06 
[after_resolve]: 1.14e-05 [a_after_grad]: 9.24e-06 [renormalize]: 0.0003481 [add_forward_monad_depend]: 4.35999e-06 [auto_monad_grad]: 2.12999e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 2.761e-05 [a_3]: 4.1e-05 [Cycle 2]: 0.00059296, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.18998e-06 [loop_unroll]: 5.61998e-06 [a_1]: 0.00012364 [with_stream_mark]: 9.34e-06 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.784e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.65001e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 6.00002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.23999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98998e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.30001e-06 [a_after_grad]: 8.22e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.99979e-07 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.282e-05 [a_3]: 3.327e-05 [py_interpret_to_execute_after_opt_a]: 7.52998e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.095e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00044676 [opt_b]: 0.00018114, [1] [Cycle 1]: 0.00017495, [7] [b_1]: 
0.00010784 [b_2]: 7.1e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 3.69997e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.906e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.421e-05 [loop_unroll]: 0.00045177 [opt_after_cconv]: 9.494e-05, [1] [Cycle 1]: 8.955e-05, [7] [c_1]: 2.824e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.603e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.32e-05 [tuple_transform]: 6.943e-05, [1] [Cycle 1]: 6.491e-05, [4] [d_1]: 3.93e-05 [none_parameter_eliminate]: 1.84998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.18998e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.351e-05 [cse_after_recomputation]: 1.987e-05, [1] [Cycle 1]: 1.548e-05, [1] [cse]: 1.034e-05 [environ_conv]: 4.58001e-06 [swap_dp_allreduce_reducescatter]: 5.31998e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 [control_data_broadcast_order]: 1.149e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.73e-06 [overlap_grad_ring_attention]: 4.20999e-06 [overlap_grad_flash_sp]: 1.755e-05 [begin_end_overlap_inline]: 8.00006e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.771e-05, [1] [Cycle 1]: 6.375e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 5.82001e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.94e-06 [auto_monad_reorder]: 1.583e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00044572 [validate]: 3.147e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0983515 [execute]: 9.81e-06 Sums bootstrap : 0.000462s : 0.43% type_inference : 0.004384s : 4.11% event_method : 0.000011s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000419s : 0.39% optimize.opt_a.with_stream_mark : 0.000027s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000348s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.42% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000452s : 0.42% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000446s : 0.42% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098351s : 92.10% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000124 26 18.16% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 65.56% : 0.000081s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.69% : 0.000005s : 4: substitution.remove_not_recompute_node 3.51% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004344 2 91.65% : 0.003981s : 1: type_inference.infer 8.35% : 0.000363s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.16% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.38% : 0.000001s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.33% : 0.000003s : 26: predicate.load_eliminater 1.36% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.88% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.30% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.67% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.02% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000254 6 42.17% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.83% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118752 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.53% : 0.002999s : 1: add_attr 2.52% : 0.002990s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000493s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.39% : 0.000461s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.66% : 0.000778s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.59% : 0.001883s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.38% : 0.000455s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.15% : 0.003735s : 1: optimize 0.02% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000190s : 1: renormalize.infer 0.13% : 0.000152s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.84% : 0.098374s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.70% : 0.004398s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.147355, [24] [bootstrap]: 0.00048647 [type_inference]: 0.0103126 [event_method]: 4.387e-05 [auto_monad]: 0.00011862 [graph_reusing]: 8.15999e-06 [inline]: 2.20002e-06 [add_attr]: 0.00298579, [1] [add_attr_with_inline]: 0.00297705, [1] [Cycle 1]: 6.683e-05, [2] [tag_attr]: 3.201e-05 [meta_addattr_fg_expand]: 8.49002e-06 [parallel-infer-symbol]: 2.68003e-06 [pre_auto_parallel]: 4.526e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0131684, [53] [py_interpret_to_execute]: 3.714e-05 [rewriter_before_opt_a]: 0.00012903 [opt_a]: 0.0109247, [3] [Cycle 1]: 0.00700648, [45] [expand_dump_flag]: 3.78001e-06 [switch_simplify]: 6.639e-05 [loop_unroll]: 5.547e-05 [a_1]: 0.00135606 [with_stream_mark]: 2.343e-05 [recompute_prepare]: 2.156e-05 [updatestate_depend_eliminate]: 9.59e-06 [updatestate_assign_eliminate]: 7.81001e-06 [updatestate_loads_eliminate]: 7.35e-06 [parameter_eliminate]: 2.61e-06 [a_2]: 0.00024531 [accelerated_algorithm]: 3.013e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.38999e-06 [shard_inline]: 1.606e-05 [merge_send_recv]: 1.613e-05 [auto_parallel]: 1.105e-05 [parallel]: 1.96e-05 [flash_sp]: 1.164e-05 [merge_comm]: 9.82001e-06 [allreduce_fusion]: 8.79e-06 [matmul_add_comm_reduction]: 2.749e-05 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 1.806e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.516e-05 [virtual_output]: 1.533e-05 [merge_forward]: 9.29e-06 [cell_reuse_recompute_pass]: 1.04003e-06 [offload_activation]: 1.844e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.84e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.675e-05 [set_forward_comm_id_for_comm_node_pass]: 9.87999e-06 [meta_fg_expand]: 0.00139873 [flash_sp_send_recv_attached]: 3.76999e-06 [receive_attached]: 2.45002e-06 
[after_resolve]: 5.963e-05 [a_after_grad]: 8.048e-05 [renormalize]: 0.00249552 [add_forward_monad_depend]: 9.19e-06 [auto_monad_grad]: 4.84e-06 [auto_monad_eliminator]: 5.735e-05 [cse]: 0.0001691 [a_3]: 0.00033474 [Cycle 2]: 0.00300707, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.75e-05 [loop_unroll]: 4.361e-05 [a_1]: 0.00152794 [with_stream_mark]: 1.192e-05 [recompute_prepare]: 1.063e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.62998e-06 [updatestate_loads_eliminate]: 3.65998e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012724 [accelerated_algorithm]: 1.221e-05 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.17999e-06 [merge_send_recv]: 6.72002e-06 [auto_parallel]: 7.07002e-06 [parallel]: 4.74e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.61002e-06 [matmul_add_comm_reduction]: 7.7e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 9.20001e-06 [get_grad_eliminate_]: 8.92999e-06 [virtual_output]: 8.70999e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 9.80013e-07 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.65e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.435e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 3.446e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.09998e-06 [after_resolve]: 1.467e-05 [a_after_grad]: 1.46e-05 [renormalize]: 0.00063596 [add_forward_monad_depend]: 4.16001e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.472e-05 [cse]: 4.586e-05 [a_3]: 6.498e-05 [Cycle 3]: 0.00089716, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 1.045e-05 [loop_unroll]: 8.90001e-06 [a_1]: 0.00024935 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 9.20999e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 3.97e-06 [updatestate_loads_eliminate]: 
4.18001e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 0.00012303 [accelerated_algorithm]: 1.161e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.02e-06 [merge_send_recv]: 7.01999e-06 [auto_parallel]: 6.91001e-06 [parallel]: 4.62e-06 [flash_sp]: 1.02e-06 [merge_comm]: 4.96002e-06 [allreduce_fusion]: 5.13002e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 8.65999e-06 [get_grad_eliminate_]: 8.49998e-06 [virtual_output]: 8.22998e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.29003e-06 [offload_activation]: 8.67e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.558e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.441e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32999e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.297e-05 [a_after_grad]: 1.447e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.34003e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 1.096e-05 [cse]: 2.601e-05 [a_3]: 5.82e-05 [py_interpret_to_execute_after_opt_a]: 1.012e-05 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 4.735e-05 [convert_after_rewriter]: 8.90999e-06 [order_py_execute_after_rewriter]: 7.2e-06 [mutable_eliminate]: 0.00045833 [opt_b]: 0.0002857, [1] [Cycle 1]: 0.00027969, [7] [b_1]: 0.00018823 [b_2]: 1.052e-05 [updatestate_depend_eliminate]: 6.83998e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 4.05e-06 [renormalize]: 5.89993e-07 [cse]: 3.11e-05 [optimize_parallel_all_gather_comm]: 2.083e-05 [overlap_param_gather]: 2.25002e-06 [cconv]: 2.116e-05 [loop_unroll]: 0.00042187 [opt_after_cconv]: 0.00013526, [1] [Cycle 1]: 0.00012932, [7] [c_1]: 4.837e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 7.02002e-06 [updatestate_assign_eliminate]: 4.3e-06 
[updatestate_loads_eliminate]: 3.91001e-06 [cse]: 2.998e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 2.897e-05 [tuple_transform]: 0.00010074, [1] [Cycle 1]: 9.597e-05, [4] [d_1]: 6.588e-05 [none_parameter_eliminate]: 1.84998e-06 [renormalize]: 2.89991e-07 [switch_simplify]: 9.80002e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 6.008e-05 [cse_after_recomputation]: 3.163e-05, [1] [Cycle 1]: 2.689e-05, [1] [cse]: 2.155e-05 [environ_conv]: 9.17001e-06 [swap_dp_allreduce_reducescatter]: 7.77998e-06 [bias_add_comm_swap]: 2.88e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.31998e-06 [slice_recompute_activation]: 2.69001e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.29998e-06 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.719e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.15001e-06 [overlap_recompute_and_grad_model_parallel]: 5.66003e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 5.23002e-06 [overlap_grad_flash_sp]: 2.323e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 9.812e-05, [1] [Cycle 1]: 9.386e-05, [6] [build]: 1.027e-05 [elim_shapecalc]: 1.337e-05 [elim_not_effective]: 1.825e-05 [opt_reshape]: 9.91e-06 [fold_const_symbol]: 1.513e-05 [renormalize]: 
1.69995e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.551e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.62998e-06 [opt_after_jit_grad]: 0.00048112 [validate]: 4.634e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.119393 [execute]: 8.98002e-06 Sums bootstrap : 0.000486s : 0.34% type_inference : 0.010313s : 7.21% event_method : 0.000044s : 0.03% auto_monad : 0.000119s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000129s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000124s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.08% optimize.opt_a.a_1 : 0.003133s : 2.19% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000041s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001436s : 1.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003132s : 2.19% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000241s : 0.17% optimize.opt_a.a_3 : 0.000458s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.32% optimize.opt_b.b_1 : 0.000188s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000422s : 0.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000481s : 0.34% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.119393s : 83.42% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000728 218 6.02% : 0.000044s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000007s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.48% : 0.000397s : 16: substitution.inline 2.12% : 0.000015s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.84% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.34% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.73% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.52% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.64% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010243 2 86.94% : 0.008905s : 1: type_inference.infer 13.06% : 0.001338s : 1: type_inference.specialize ------[replace.] 0.000196 30 58.89% : 0.000115s : 16: replace.inline 41.11% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000419 30 92.73% : 0.000388s : 16: match.inline 7.27% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.33% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.69% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.96% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.11% : 0.000008s : 67: predicate.transpose_eliminate 1.51% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001582 32 56.04% : 0.000886s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.96% : 0.000695s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.171739 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.74% : 0.002990s : 1: add_attr 1.74% : 0.002981s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000126s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.30% : 0.000514s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000050s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000467s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.78% : 0.004778s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000173s : 28: opt.transform.opt_b 0.04% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.36% : 0.010928s : 1: opt_a 0.08% : 0.000139s : 1: opt_after_cconv 0.29% : 0.000491s : 1: opt_after_jit_grad 0.17% : 0.000289s : 1: opt_b 7.67% : 0.013172s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.96% : 0.001642s : 2: renormalize.infer 0.86% : 0.001476s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.08% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 69.53% : 0.119414s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.01% : 0.010328s : 1: type_inference 0.04% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x8-ge],max_mem:36.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x9-pynative],max_mem:36.0M TotalTime = 0.021699, [24] [bootstrap]: 0.00051939 [type_inference]: 0.00622303 [event_method]: 1.48e-05 [auto_monad]: 5.777e-05 [graph_reusing]: 5.76e-06 [inline]: 1.62999e-06 [add_attr]: 0.00339719, [1] [add_attr_with_inline]: 0.00338668, [1] [Cycle 1]: 4.493e-05, [2] [tag_attr]: 1.586e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.851e-05 [insert-virtual-dataset]: 2.65002e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.91003e-06 [optimize]: 0.00396087, [53] [py_interpret_to_execute]: 2.004e-05 [rewriter_before_opt_a]: 5.861e-05 [opt_a]: 0.00214547, [2] [Cycle 1]: 0.00154974, [45] [expand_dump_flag]: 3.11999e-06 [switch_simplify]: 3.254e-05 [loop_unroll]: 2.125e-05 [a_1]: 0.00045537 [with_stream_mark]: 1.393e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 3.97998e-06 [updatestate_assign_eliminate]: 3.17997e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 2.15002e-06 [a_2]: 7.788e-05 [accelerated_algorithm]: 6.36998e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 8.75001e-06 [auto_parallel]: 6.16e-06 [parallel]: 2.518e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.91001e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 9.33002e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.61001e-06 [virtual_dataset]: 5.84999e-06 
[get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.44998e-06 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.042e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 8.99e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.67001e-06 [after_resolve]: 1.043e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00042406 [add_forward_monad_depend]: 5.05999e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.787e-05 [a_3]: 4.073e-05 [Cycle 2]: 0.00058672, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012521 [with_stream_mark]: 9.72999e-06 [recompute_prepare]: 5.58997e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 9.99979e-07 [a_2]: 6.82e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.44998e-06 [merge_send_recv]: 4.42998e-06 [auto_parallel]: 4.90999e-06 [parallel]: 4.45e-06 [flash_sp]: 3.38e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.53998e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 5.83997e-06 [virtual_dataset]: 5.12999e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 4.82e-06 [merge_forward]: 2.40002e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.22999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.10002e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.59e-06 [a_after_grad]: 7.98001e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 5.77001e-06 [cse]: 1.553e-05 [a_3]: 3.128e-05 [py_interpret_to_execute_after_opt_a]: 7.29001e-06 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 2.971e-05 [convert_after_rewriter]: 7.06999e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00043594 [opt_b]: 0.00018179, [1] [Cycle 1]: 0.00017595, [7] [b_1]: 0.00010858 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 3.30008e-07 [cse]: 1.636e-05 [optimize_parallel_all_gather_comm]: 1.625e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.285e-05 [loop_unroll]: 0.00040031 [opt_after_cconv]: 9.197e-05, [1] [Cycle 1]: 8.61e-05, [7] [c_1]: 2.68e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.521e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.218e-05 [tuple_transform]: 7.051e-05, [1] [Cycle 1]: 6.589e-05, [4] [d_1]: 3.973e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 2.10002e-06 [add_recomputation]: 5.314e-05 [cse_after_recomputation]: 2.024e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.68999e-06 [swap_dp_allreduce_reducescatter]: 5.51998e-06 [bias_add_comm_swap]: 3.04001e-06 [label_micro_interleaved_index]: 4.00998e-06 [label_fine_grained_interleaved_index]: 3.12002e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.51998e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.35001e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.04e-06 
[add_comm_op_reuse_tag]: 1.29998e-06 [interleave_split_concat_branches]: 1.16997e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 2.02001e-06 [control_data_broadcast_order]: 1.151e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.23998e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.33002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.99999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.801e-05, [1] [Cycle 1]: 6.377e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.138e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 8.59e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.607e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 0.00010659 [opt_after_jit_grad]: 0.00044282 [validate]: 3.247e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0066684 [execute]: 6.52001e-06 Sums bootstrap : 0.000519s : 3.00% type_inference : 0.006223s : 35.91% event_method : 0.000015s : 0.09% auto_monad : 0.000058s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000581s : 3.35% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000424s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000436s : 2.52% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000400s : 2.31% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000107s : 0.62% opt_after_jit_grad : 0.000443s : 2.56% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006668s : 38.48% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000169 30 14.49% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.65% : 0.000006s : 4: substitution.graph_param_transform 66.72% : 0.000113s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.78% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006177 2 90.26% : 0.005575s : 1: type_inference.infer 9.74% : 0.000602s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.36% : 0.000028s : 3: replace.inline 29.64% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.37% : 0.000110s : 3: match.inline 8.63% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.36% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.86% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.52% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.76% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.32% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.55% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.93% : 0.000001s : 11: predicate.reshape_eliminate 0.93% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.21% : 0.000004s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.07% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 48.04% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.96% : 0.000201s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030580 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.12% : 0.003401s : 1: add_attr 11.09% : 0.003390s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.83% : 0.000559s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.34% : 0.000408s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.46% : 0.000445s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.09% : 0.000946s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.03% : 0.002148s : 1: opt_a 0.31% : 0.000095s : 1: opt_after_cconv 1.48% : 0.000452s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 12.96% : 0.003965s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.72% : 0.000219s : 1: renormalize.infer 0.65% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.37% : 0.000112s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.84% : 0.006678s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.40% : 0.006237s : 1: type_inference 0.19% : 0.000059s : 1: validate TotalTime = 0.0172422, [24] [bootstrap]: 0.00038677 [type_inference]: 0.00410099 [event_method]: 9.82999e-06 [auto_monad]: 4.379e-05 [graph_reusing]: 3.85998e-06 [inline]: 1.79e-06 [add_attr]: 0.00286542, [1] [add_attr_with_inline]: 0.0028583, [1] [Cycle 1]: 3.79e-05, [2] [tag_attr]: 1.013e-05 [meta_addattr_fg_expand]: 3.2e-06 [parallel-infer-symbol]: 2.10002e-06 [pre_auto_parallel]: 2.052e-05 [insert-virtual-dataset]: 2.02001e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.46002e-06 [optimize]: 0.00356105, [53] [py_interpret_to_execute]: 1.363e-05 [rewriter_before_opt_a]: 3.627e-05 [opt_a]: 0.00178994, [2] [Cycle 1]: 0.00119372, [45] [expand_dump_flag]: 2.29999e-06 [switch_simplify]: 2.148e-05 [loop_unroll]: 1.336e-05 [a_1]: 0.00027357 [with_stream_mark]: 1.162e-05 [recompute_prepare]: 7.12002e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.543e-05 [accelerated_algorithm]: 6.20002e-06 [shard]: 1.73002e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 6.32001e-06 [auto_parallel]: 5.99e-06 [parallel]: 1.424e-05 [flash_sp]: 6.51999e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.16999e-06 [matmul_add_comm_reduction]: 6.77002e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 6.56999e-06 [get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.06001e-06 [cell_reuse_recompute_pass]: 9.69972e-07 [offload_activation]: 7.69002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 2.16998e-06 [flash_sp_send_recv_attached]: 2.06e-06 
[receive_attached]: 1.62001e-06 [after_resolve]: 1.082e-05 [a_after_grad]: 9.46e-06 [renormalize]: 0.00033129 [add_forward_monad_depend]: 4.35999e-06 [auto_monad_grad]: 1.37e-06 [auto_monad_eliminator]: 1.199e-05 [cse]: 2.009e-05 [a_3]: 3.958e-05 [Cycle 2]: 0.0005869, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012371 [with_stream_mark]: 9.26002e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.16003e-06 [updatestate_loads_eliminate]: 2.34001e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.793e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.61003e-06 [merge_send_recv]: 4.18001e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.27003e-06 [flash_sp]: 3.14999e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.37999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.95002e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.36002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 8.84e-06 [a_after_grad]: 8.03999e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 7.29982e-07 [auto_monad_eliminator]: 5.93002e-06 [cse]: 1.273e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 7.16999e-06 [slice_cell_reuse_recomputed_activation]: 1.54e-06 [rewriter_after_opt_a]: 2.955e-05 [convert_after_rewriter]: 6.45002e-06 [order_py_execute_after_rewriter]: 4.79e-06 [mutable_eliminate]: 0.00043754 [opt_b]: 0.00018105, 
[1] [Cycle 1]: 0.00017526, [7] [b_1]: 0.00010842 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 3.39991e-07 [cse]: 1.571e-05 [optimize_parallel_all_gather_comm]: 1.37e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 1.966e-05 [loop_unroll]: 0.00040642 [opt_after_cconv]: 9.464e-05, [1] [Cycle 1]: 8.893e-05, [7] [c_1]: 2.828e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 2.55997e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.555e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 9.37001e-06 [tuple_transform]: 6.694e-05, [1] [Cycle 1]: 6.302e-05, [4] [d_1]: 3.828e-05 [none_parameter_eliminate]: 1.29e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.05002e-06 [partial_unused_args_eliminate]: 1.17e-06 [add_recomputation]: 3.831e-05 [cse_after_recomputation]: 1.978e-05, [1] [Cycle 1]: 1.558e-05, [1] [cse]: 1.031e-05 [environ_conv]: 3.9e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 1.72001e-06 [label_micro_interleaved_index]: 3.76001e-06 [label_fine_grained_interleaved_index]: 2.31e-06 [merge_cast_opt]: 8.79983e-07 [slice_recompute_activation]: 1.56002e-06 [micro_interleaved_order_control]: 2.28002e-06 [assign_add_opt]: 9.99979e-07 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.26002e-06 [full_micro_interleaved_order_control]: 2.49001e-06 [reorder_send_recv_between_fp_bp]: 2.40002e-06 [comm_op_add_attrs]: 6.50005e-07 [add_comm_op_reuse_tag]: 7.59988e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 9.80013e-07 [overlap_opt_shard_grad_in_pipeline]: 1.28002e-06 [control_data_broadcast_order]: 1.092e-05 [grouped_pairwise_exchange_alltoall]: 1.10999e-06 [offloading_packed_experts]: 3.37002e-06 [overlap_recompute_and_grad_model_parallel]: 1.948e-05 
[overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.14e-06 [overlap_recompute_comm]: 2.58e-06 [overlap_grad_ring_attention]: 3.73001e-06 [overlap_grad_flash_sp]: 1.655e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 1.77999e-06 [split_layernorm_comm]: 1.45001e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 6.764e-05, [1] [Cycle 1]: 6.354e-05, [6] [build]: 1.69e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 5.87999e-06 [fold_const_symbol]: 8.43999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.29e-06 [auto_monad_reorder]: 1.262e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00044339 [validate]: 2.865e-05 [backend_pass]: 6.29982e-07 [task_emit]: 0.00555943 [execute]: 6.60997e-06 Sums bootstrap : 0.000387s : 2.88% type_inference : 0.004101s : 30.53% event_method : 0.000010s : 0.07% auto_monad : 0.000044s : 0.33% graph_reusing : 0.000004s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000010s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000036s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000028s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000397s : 2.96% optimize.opt_a.with_stream_mark : 0.000021s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.07% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.09% optimize.opt_a.merge_send_recv : 0.000011s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000019s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000012s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.15% optimize.opt_a.a_after_grad : 0.000017s : 0.13% optimize.opt_a.renormalize : 0.000331s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.13% optimize.opt_a.cse : 0.000033s : 0.24% optimize.opt_a.a_3 : 0.000071s : 0.53% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.22% optimize.convert_after_rewriter : 0.000006s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000438s : 3.26% optimize.opt_b.b_1 : 0.000108s : 0.81% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000014s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000020s : 0.15% optimize.loop_unroll : 0.000406s : 3.03% optimize.opt_after_cconv.c_1 : 0.000028s : 0.21% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000009s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.05% optimize.partial_unused_args_eliminate : 0.000001s : 0.01% optimize.add_recomputation : 0.000038s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000019s : 0.15% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000001s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.09% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000013s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000443s : 3.30% validate : 0.000029s : 0.21% backend_pass : 0.000001s : 0.00% task_emit : 0.005559s : 41.38% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000104 26 17.68% : 0.000018s : 4: substitution.arithmetic_simplify 1.71% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.77% : 0.000005s : 4: substitution.graph_param_transform 64.31% : 0.000067s : 2: substitution.inline 2.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 4.02% : 0.000004s : 4: substitution.remove_not_recompute_node 3.84% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004062 2 92.04% : 0.003739s : 1: type_inference.infer 7.96% : 0.000323s : 1: type_inference.specialize ------[replace.] 0.000017 2 100.00% : 0.000017s : 2: replace.inline ------[match.] 0.000066 2 100.00% : 0.000066s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.45% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.51% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.20% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.18% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000226 6 37.76% : 0.000085s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.24% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.024907 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.52% : 0.002870s : 1: add_attr 11.49% : 0.002862s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000042s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000049s : 1: auto_monad 0.06% : 0.000016s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000004s : 1: bias_add_comm_swap 1.66% : 0.000414s : 1: bootstrap 0.09% : 0.000023s : 1: cconv 0.01% : 0.000003s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000006s : 1: label_micro_interleaved_index 1.67% : 0.000415s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000446s : 1: mutable_eliminate 0.03% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.99% : 0.000746s : 78: opt.transform.opt_a 0.11% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.37% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.13% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.20% : 0.001793s : 1: opt_a 0.39% : 0.000098s : 1: opt_after_cconv 1.82% : 0.000453s : 1: opt_after_jit_grad 0.74% : 0.000185s : 1: opt_b 14.31% : 0.003565s : 1: optimize 0.07% : 0.000017s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.09% : 0.000023s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000013s : 1: remove_dup_value 0.72% : 0.000179s : 1: renormalize.infer 0.59% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000040s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000004s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000004s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000070s : 1: symbol_engine_optimizer 22.36% : 0.005570s : 1: task_emit 0.28% : 0.000070s : 1: 
tuple_transform 16.52% : 0.004115s : 1: type_inference 0.22% : 0.000054s : 1: validate TotalTime = 0.0195059, [24] [bootstrap]: 0.00048235 [type_inference]: 0.00550693 [event_method]: 1.391e-05 [auto_monad]: 5.671e-05 [graph_reusing]: 5.47999e-06 [inline]: 1.86e-06 [add_attr]: 0.00294159, [1] [add_attr_with_inline]: 0.00293383, [1] [Cycle 1]: 4.506e-05, [2] [tag_attr]: 1.537e-05 [meta_addattr_fg_expand]: 4.24002e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 2.568e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.00391919, [53] [py_interpret_to_execute]: 1.943e-05 [rewriter_before_opt_a]: 5.75e-05 [opt_a]: 0.00206893, [2] [Cycle 1]: 0.00147139, [45] [expand_dump_flag]: 2.50002e-06 [switch_simplify]: 3.219e-05 [loop_unroll]: 2.057e-05 [a_1]: 0.00044121 [with_stream_mark]: 1.312e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.68999e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 7.522e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 6.88998e-06 [auto_parallel]: 5.49e-06 [parallel]: 1.67e-05 [flash_sp]: 7.48999e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.46002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.77002e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.66998e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.76998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.69999e-06 [flash_sp_send_recv_attached]: 2.94001e-06 [receive_attached]: 2.66e-06 
[after_resolve]: 1.02e-05 [a_after_grad]: 8.78001e-06 [renormalize]: 0.000399 [add_forward_monad_depend]: 4.87e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.283e-05 [cse]: 2.731e-05 [a_3]: 3.97e-05 [Cycle 2]: 0.00058827, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012539 [with_stream_mark]: 9.69999e-06 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.77e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.16002e-06 [parallel]: 4.27003e-06 [flash_sp]: 3.25e-06 [merge_comm]: 2.84999e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.29e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.6e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00998e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 7.66999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 5.81998e-06 [cse]: 1.788e-05 [a_3]: 3.172e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.169e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00043969 [opt_b]: 0.00017897, [1] [Cycle 1]: 0.00017303, [7] [b_1]: 
0.00010548 [b_2]: 6.54001e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 3.30008e-07 [cse]: 1.695e-05 [optimize_parallel_all_gather_comm]: 1.52e-05 [overlap_param_gather]: 2.69001e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00045293 [opt_after_cconv]: 9.363e-05, [1] [Cycle 1]: 8.813e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.06998e-06 [cse]: 1.562e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.142e-05 [tuple_transform]: 6.844e-05, [1] [Cycle 1]: 6.43e-05, [4] [d_1]: 3.902e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 5.89e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.335e-05 [cse_after_recomputation]: 2.002e-05, [1] [Cycle 1]: 1.571e-05, [1] [cse]: 1.062e-05 [environ_conv]: 4.18999e-06 [swap_dp_allreduce_reducescatter]: 4.55001e-06 [bias_add_comm_swap]: 2.16e-06 [label_micro_interleaved_index]: 4.44002e-06 [label_fine_grained_interleaved_index]: 2.80997e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 1.94999e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.50999e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 6.50005e-07 [add_comm_op_reuse_tag]: 7.80012e-07 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.189e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.95e-06 [overlap_recompute_and_grad_model_parallel]: 4.07998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.64999e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.698e-05 [begin_end_overlap_inline]: 5.89993e-07 [split_matmul_comm_elemetwise]: 2.40002e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.659e-05, [1] [Cycle 1]: 6.259e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 5.77001e-06 [fold_const_symbol]: 8.67998e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.514e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00044076 [validate]: 2.968e-05 [backend_pass]: 6.80011e-07 [task_emit]: 0.00585219 [execute]: 7.08e-06 Sums bootstrap : 0.000482s : 3.09% type_inference : 0.005507s : 35.25% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000567s : 3.63% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000011s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000399s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000045s : 0.29% optimize.opt_a.a_3 : 0.000071s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000440s : 2.81% optimize.opt_b.b_1 : 0.000105s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000003s : 0.02% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000453s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000002s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000441s : 2.82% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.00% task_emit : 0.005852s : 37.46% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000159 30 15.65% : 0.000025s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 65.30% : 0.000104s : 3: substitution.inline 2.10% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.35% : 0.000004s : 4: substitution.replace_old_param 6.91% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005467 2 90.03% : 0.004922s : 1: type_inference.infer 9.97% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.97% : 0.000026s : 3: replace.inline 30.03% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000112 5 91.12% : 0.000102s : 3: match.inline 8.88% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.99% : 0.000002s : 11: predicate.accumulaten_eliminater 1.08% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000001s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.79% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.87% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 45.59% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.41% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027848 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.58% : 0.002946s : 1: add_attr 10.55% : 0.002937s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.85% : 0.000515s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000003s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.66% : 0.000462s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000449s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.33% : 0.000928s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000088s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.44% : 0.002072s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.62% : 0.000450s : 1: opt_after_jit_grad 0.66% : 0.000182s : 1: opt_b 14.09% : 0.003923s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.73% : 0.000203s : 1: renormalize.infer 0.68% : 0.000190s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000007s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000069s : 1: symbol_engine_optimizer 21.05% : 0.005863s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.82% : 0.005521s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.037125, [24] [bootstrap]: 0.00049085 [type_inference]: 0.0113538 [event_method]: 4.759e-05 [auto_monad]: 0.00011642 [graph_reusing]: 7.83001e-06 [inline]: 1.87001e-06 [add_attr]: 0.00293134, [1] [add_attr_with_inline]: 0.00292283, [1] [Cycle 1]: 6.844e-05, [2] [tag_attr]: 3.437e-05 [meta_addattr_fg_expand]: 8.95999e-06 [parallel-infer-symbol]: 2.65002e-06 [pre_auto_parallel]: 4.919e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.42e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0131997, [53] [py_interpret_to_execute]: 3.737e-05 [rewriter_before_opt_a]: 0.00014183 [opt_a]: 0.0109223, [3] [Cycle 1]: 0.0069894, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 7.171e-05 [loop_unroll]: 6.063e-05 [a_1]: 0.00142428 [with_stream_mark]: 2.155e-05 [recompute_prepare]: 2.221e-05 [updatestate_depend_eliminate]: 9.67001e-06 [updatestate_assign_eliminate]: 7.08e-06 [updatestate_loads_eliminate]: 6.61999e-06 [parameter_eliminate]: 2.94001e-06 [a_2]: 0.00024494 [accelerated_algorithm]: 3.158e-05 [shard]: 1.70001e-06 [meta_shard_fg_expand]: 3.45e-06 [shard_inline]: 1.619e-05 [merge_send_recv]: 1.439e-05 [auto_parallel]: 1.101e-05 [parallel]: 1.655e-05 [flash_sp]: 1.03e-05 [merge_comm]: 9.77001e-06 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 2.462e-05 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.793e-05 [virtual_dataset]: 1.572e-05 [get_grad_eliminate_]: 1.541e-05 [virtual_output]: 1.534e-05 [merge_forward]: 9.54999e-06 [cell_reuse_recompute_pass]: 8.49977e-07 [offload_activation]: 1.65e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.855e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.753e-05 [set_forward_comm_id_for_comm_node_pass]: 9.59999e-06 [meta_fg_expand]: 0.00142419 [flash_sp_send_recv_attached]: 3.35003e-06 [receive_attached]: 2.51e-06 [after_resolve]: 
5.94e-05 [a_after_grad]: 8.116e-05 [renormalize]: 0.0023994 [add_forward_monad_depend]: 9.36e-06 [auto_monad_grad]: 5.34e-06 [auto_monad_eliminator]: 5.449e-05 [cse]: 0.00015825 [a_3]: 0.00033399 [Cycle 2]: 0.00302396, [45] [expand_dump_flag]: 1.49998e-06 [switch_simplify]: 4.667e-05 [loop_unroll]: 4.293e-05 [a_1]: 0.00155917 [with_stream_mark]: 1.228e-05 [recompute_prepare]: 1.059e-05 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.97e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012498 [accelerated_algorithm]: 1.17e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 7.11001e-06 [auto_parallel]: 7.13e-06 [parallel]: 4.84e-06 [flash_sp]: 3.13998e-06 [merge_comm]: 5.14998e-06 [allreduce_fusion]: 4.97e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.87001e-06 [virtual_dataset]: 8.62e-06 [get_grad_eliminate_]: 9.77001e-06 [virtual_output]: 8.75001e-06 [merge_forward]: 4.1e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.661e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.393e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14e-06 [meta_fg_expand]: 7.182e-05 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.12999e-06 [after_resolve]: 1.637e-05 [a_after_grad]: 1.43e-05 [renormalize]: 0.00058636 [add_forward_monad_depend]: 3.98001e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.505e-05 [cse]: 4.587e-05 [a_3]: 6.463e-05 [Cycle 3]: 0.00089509, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.057e-05 [loop_unroll]: 8.92e-06 [a_1]: 0.00024769 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 9.32001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 4.04002e-06 [updatestate_loads_eliminate]: 4.16001e-06 [parameter_eliminate]: 
9.60019e-07 [a_2]: 0.00012282 [accelerated_algorithm]: 1.166e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.88002e-06 [shard_inline]: 8.97999e-06 [merge_send_recv]: 7.14001e-06 [auto_parallel]: 6.99001e-06 [parallel]: 4.66002e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.73001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.66002e-06 [get_grad_eliminate_]: 8.43001e-06 [virtual_output]: 8.17998e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.67998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.635e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32999e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.324e-05 [a_after_grad]: 1.39e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.19003e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 1.035e-05 [cse]: 2.593e-05 [a_3]: 5.885e-05 [py_interpret_to_execute_after_opt_a]: 1.007e-05 [slice_cell_reuse_recomputed_activation]: 2.01998e-06 [rewriter_after_opt_a]: 4.651e-05 [convert_after_rewriter]: 9.26998e-06 [order_py_execute_after_rewriter]: 6.79999e-06 [mutable_eliminate]: 0.00049853 [opt_b]: 0.00028591, [1] [Cycle 1]: 0.0002799, [7] [b_1]: 0.00018775 [b_2]: 1.049e-05 [updatestate_depend_eliminate]: 7.4e-06 [updatestate_assign_eliminate]: 4.12003e-06 [updatestate_loads_eliminate]: 3.96001e-06 [renormalize]: 3.69997e-07 [cse]: 3.118e-05 [optimize_parallel_all_gather_comm]: 1.992e-05 [overlap_param_gather]: 2.17001e-06 [cconv]: 2.014e-05 [loop_unroll]: 0.00041937 [opt_after_cconv]: 0.00013578, [1] [Cycle 1]: 0.00013011, [7] [c_1]: 4.867e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 7.2e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 
3.88001e-06 [cse]: 2.943e-05 [renormalize]: 3.70026e-07 [remove_dup_value]: 2.867e-05 [tuple_transform]: 0.00010024, [1] [Cycle 1]: 9.556e-05, [4] [d_1]: 6.595e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 9.86998e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.359e-05 [cse_after_recomputation]: 3.091e-05, [1] [Cycle 1]: 2.64e-05, [1] [cse]: 2.124e-05 [environ_conv]: 7.9e-06 [swap_dp_allreduce_reducescatter]: 7.56001e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.18998e-06 [assign_add_opt]: 1.58002e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 8.00006e-07 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.704e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 4.85999e-06 [overlap_recompute_and_grad_model_parallel]: 5.72999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 [overlap_recompute_allgather_and_fa_grad]: 1.83002e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 5.09e-06 [overlap_grad_flash_sp]: 2.339e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.49999e-06 [split_layernorm_comm]: 1.42999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.75e-05, [1] [Cycle 1]: 9.342e-05, [6] [build]: 1.026e-05 [elim_shapecalc]: 1.296e-05 [elim_not_effective]: 1.801e-05 [opt_reshape]: 1.011e-05 [fold_const_symbol]: 1.481e-05 [renormalize]: 2.29978e-07 [detach_backward]: 
1.87001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.353e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.61001e-06 [opt_after_jit_grad]: 0.00046536 [validate]: 4.428e-05 [backend_pass]: 7.2e-07 [task_emit]: 0.00816735 [execute]: 7.44002e-06 Sums bootstrap : 0.000491s : 1.49% type_inference : 0.011354s : 34.45% event_method : 0.000048s : 0.14% auto_monad : 0.000116s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000001s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000142s : 0.43% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000129s : 0.39% optimize.opt_a.loop_unroll : 0.000112s : 0.34% optimize.opt_a.a_1 : 0.003231s : 9.80% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.04% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001499s : 4.55% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.002986s : 9.06% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000230s : 0.70% optimize.opt_a.a_3 : 0.000457s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000499s : 1.51% optimize.opt_b.b_1 : 0.000188s : 0.57% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000419s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000054s : 0.16% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000465s : 1.41% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008167s : 24.78% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000746 222 5.81% : 0.000043s : 12: substitution.arithmetic_simplify 1.77% : 0.000013s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.25% : 0.000412s : 17: substitution.inline 2.12% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000016s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.86% : 0.000014s : 20: substitution.remove_not_recompute_node 3.27% : 0.000024s : 10: substitution.replace_applicator 1.35% : 0.000010s : 15: substitution.replace_old_param 0.40% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011281 2 86.85% : 0.009798s : 1: type_inference.infer 13.15% : 0.001484s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.26% : 0.000124s : 17: replace.inline 42.74% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000436 33 92.44% : 0.000403s : 17: match.inline 7.56% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 249: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.68% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.14% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000037s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001586 34 56.76% : 0.000900s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.24% : 0.000686s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061457 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.78% : 0.002936s : 1: add_attr 4.76% : 0.002927s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.09% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000124s : 1: auto_monad 0.04% : 0.000027s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.85% : 0.000520s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.83% : 0.000508s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.95% : 0.004888s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.78% : 0.010925s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000474s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.48% : 0.013203s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.61% : 0.001604s : 2: renormalize.infer 2.23% : 0.001369s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000146s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.31% : 0.008177s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 18.50% : 0.011369s : 1: type_inference 0.13% : 0.000077s : 1: validate TotalTime = 0.0179244, [24] [bootstrap]: 0.00042305 [type_inference]: 0.00421683 [event_method]: 9.91e-06 [auto_monad]: 4.854e-05 [graph_reusing]: 5.25001e-06 [inline]: 1.51998e-06 [add_attr]: 0.00293066, [1] [add_attr_with_inline]: 0.00292323, [1] [Cycle 1]: 3.797e-05, [2] [tag_attr]: 1.079e-05 [meta_addattr_fg_expand]: 3.10998e-06 [parallel-infer-symbol]: 2.51998e-06 [pre_auto_parallel]: 1.998e-05 [insert-virtual-dataset]: 2.38998e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.02e-06 [optimize]: 0.0036576, [53] [py_interpret_to_execute]: 1.301e-05 [rewriter_before_opt_a]: 3.706e-05 [opt_a]: 0.00181678, [2] [Cycle 1]: 0.00121499, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.209e-05 [loop_unroll]: 1.391e-05 [a_1]: 0.00028296 [with_stream_mark]: 1.266e-05 [recompute_prepare]: 7.73999e-06 [updatestate_depend_eliminate]: 3.45003e-06 [updatestate_assign_eliminate]: 2.82002e-06 [updatestate_loads_eliminate]: 2.58003e-06 [parameter_eliminate]: 2.21998e-06 [a_2]: 7.571e-05 [accelerated_algorithm]: 6.33998e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 6.38e-06 [auto_parallel]: 5.62001e-06 [parallel]: 1.617e-05 [flash_sp]: 7.03e-06 [merge_comm]: 4.13999e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 6.68e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.32002e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.89e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 7.51999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.038e-05 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.04e-06 
[after_resolve]: 1.127e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00032966 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.265e-05 [cse]: 2.018e-05 [a_3]: 3.999e-05 [Cycle 2]: 0.00059238, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 7.03998e-06 [loop_unroll]: 5.75001e-06 [a_1]: 0.00012503 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.58998e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.783e-05 [accelerated_algorithm]: 5.48002e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.33001e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 3.27997e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.55001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.05999e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 6.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 8.18001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 9.09989e-07 [after_resolve]: 9.19998e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.266e-05 [a_3]: 3.17e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 2.993e-05 [convert_after_rewriter]: 7.87998e-06 [order_py_execute_after_rewriter]: 5.67001e-06 [mutable_eliminate]: 0.00051228 [opt_b]: 0.00018054, [1] [Cycle 1]: 
0.00017453, [7] [b_1]: 0.00010766 [b_2]: 7.01999e-06 [updatestate_depend_eliminate]: 5.58002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.80009e-07 [cse]: 1.554e-05 [optimize_parallel_all_gather_comm]: 1.422e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 1.941e-05 [loop_unroll]: 0.00041155 [opt_after_cconv]: 9.572e-05, [1] [Cycle 1]: 9.011e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.54001e-06 [cse]: 1.608e-05 [renormalize]: 6.19999e-07 [remove_dup_value]: 1.073e-05 [tuple_transform]: 6.842e-05, [1] [Cycle 1]: 6.427e-05, [4] [d_1]: 3.875e-05 [none_parameter_eliminate]: 1.04e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 3.789e-05 [cse_after_recomputation]: 1.955e-05, [1] [Cycle 1]: 1.544e-05, [1] [cse]: 1.034e-05 [environ_conv]: 3.53999e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 1.75001e-06 [label_micro_interleaved_index]: 3.90998e-06 [label_fine_grained_interleaved_index]: 2.65002e-06 [merge_cast_opt]: 8.80013e-07 [slice_recompute_activation]: 1.81e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 8.09989e-07 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 6.50005e-07 [full_micro_interleaved_order_control]: 1.89e-06 [reorder_send_recv_between_fp_bp]: 1.86e-06 [comm_op_add_attrs]: 7.59988e-07 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 8.89995e-07 [overlap_opt_shard_in_pipeline]: 9.30013e-07 [overlap_opt_shard_grad_in_pipeline]: 1.34998e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 5.07999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27999e-06 [overlap_recompute_comm]: 1.82999e-06 [overlap_grad_ring_attention]: 3.71999e-06 [overlap_grad_flash_sp]: 1.541e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 1.46002e-06 [split_layernorm_comm]: 1.34e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 6.789e-05, [1] [Cycle 1]: 6.384e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 8.29002e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.48001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.45001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.375e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00044458 [validate]: 2.94e-05 [backend_pass]: 6.79982e-07 [task_emit]: 0.00591595 [execute]: 6.64001e-06 Sums bootstrap : 0.000423s : 3.01% type_inference : 0.004217s : 30.02% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000020s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000013s : 0.09% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000029s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000408s : 2.90% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000011s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000012s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.15% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000330s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000033s : 0.23% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000008s : 0.06% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000512s : 3.65% optimize.opt_b.b_1 : 0.000108s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000014s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.14% optimize.loop_unroll : 0.000412s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000011s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000038s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.04% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000015s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000001s : 0.01% optimize.split_layernorm_comm : 0.000001s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000014s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000445s : 3.17%
validate : 0.000029s : 0.21%
backend_pass : 0.000001s : 0.00%
task_emit : 0.005916s : 42.12%
execute : 0.000007s : 0.05%
Time group info:
------[substitution.] 0.000113 26
18.43% : 0.000021s : 4: substitution.arithmetic_simplify
1.57% : 0.000002s : 2: substitution.elim_not_effective
0.94% : 0.000001s : 2: substitution.fold_const_symbol
4.53% : 0.000005s : 4: substitution.graph_param_transform
63.94% : 0.000072s : 2: substitution.inline
2.88% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.48% : 0.000004s : 4: substitution.remove_not_recompute_node
4.23% : 0.000005s : 4: substitution.replace_old_param
------[type_inference.] 0.004180 2
91.81% : 0.003838s : 1: type_inference.infer
8.19% : 0.000342s : 1: type_inference.specialize
------[replace.] 0.000018 2
100.00% : 0.000018s : 2: replace.inline
------[match.] 0.000070 2
100.00% : 0.000070s : 2: match.inline
------[predicate.]
0.000137 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.15% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.86% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.45% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 1.12% : 0.000002s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.39% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.86% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.85% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.31% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.23% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 39.70% : 0.000094s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.30% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025763 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.39% : 0.002935s : 1: add_attr 11.36% : 0.002927s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000042s : 1: add_recomputation 0.01% : 0.000003s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000017s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.75% : 0.000450s : 1: bootstrap 0.09% : 0.000023s : 1: cconv 0.01% : 0.000003s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.02% : 0.000522s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000760s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.06% : 0.001820s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.76% : 0.000454s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.21% : 0.003661s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000024s : 1: pre_auto_parallel 0.07% : 0.000017s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000014s : 1: remove_dup_value 0.70% : 0.000181s : 1: renormalize.infer 0.55% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000004s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000004s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.00% : 0.005926s : 1: task_emit 0.28% : 0.000071s : 1: 
tuple_transform 16.42% : 0.004231s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0366064, [24] [bootstrap]: 0.00054131 [type_inference]: 0.0104604 [event_method]: 4.169e-05 [auto_monad]: 0.00011652 [graph_reusing]: 7.98999e-06 [inline]: 1.77999e-06 [add_attr]: 0.00306494, [1] [add_attr_with_inline]: 0.00305649, [1] [Cycle 1]: 6.677e-05, [2] [tag_attr]: 3.21e-05 [meta_addattr_fg_expand]: 9.02e-06 [parallel-infer-symbol]: 2.69001e-06 [pre_auto_parallel]: 4.615e-05 [insert-virtual-dataset]: 2.45002e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.59e-06 [pipeline_split]: 1.39998e-06 [optimize]: 0.0131382, [53] [py_interpret_to_execute]: 3.697e-05 [rewriter_before_opt_a]: 0.00012571 [opt_a]: 0.0108869, [3] [Cycle 1]: 0.00701527, [45] [expand_dump_flag]: 3.85e-06 [switch_simplify]: 6.525e-05 [loop_unroll]: 5.435e-05 [a_1]: 0.00133291 [with_stream_mark]: 2.325e-05 [recompute_prepare]: 2.217e-05 [updatestate_depend_eliminate]: 9.50001e-06 [updatestate_assign_eliminate]: 8.22998e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 3.01999e-06 [a_2]: 0.00024476 [accelerated_algorithm]: 3.02e-05 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 3.26001e-06 [shard_inline]: 1.597e-05 [merge_send_recv]: 1.583e-05 [auto_parallel]: 1.144e-05 [parallel]: 1.838e-05 [flash_sp]: 1.105e-05 [merge_comm]: 9.88002e-06 [allreduce_fusion]: 8.87e-06 [matmul_add_comm_reduction]: 2.761e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.536e-05 [virtual_output]: 1.51e-05 [merge_forward]: 9.54e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.769e-05 [cell_reuse_handle_not_recompute_node_pass]: 7.675e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 2.824e-05 [set_forward_comm_id_for_comm_node_pass]: 9.97999e-06 [meta_fg_expand]: 0.00140856 [flash_sp_send_recv_attached]: 3.81999e-06 [receive_attached]: 2.61e-06 
[after_resolve]: 6.024e-05 [a_after_grad]: 8.121e-05 [renormalize]: 0.00247013 [add_forward_monad_depend]: 9.76998e-06 [auto_monad_grad]: 5.07999e-06 [auto_monad_eliminator]: 5.518e-05 [cse]: 0.00016675 [a_3]: 0.00033504 [Cycle 2]: 0.00293132, [45] [expand_dump_flag]: 1.67999e-06 [switch_simplify]: 4.642e-05 [loop_unroll]: 4.412e-05 [a_1]: 0.00152205 [with_stream_mark]: 1.191e-05 [recompute_prepare]: 1.079e-05 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 1.04998e-06 [a_2]: 0.00012663 [accelerated_algorithm]: 1.19e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 2.05002e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 7.11999e-06 [auto_parallel]: 7.13e-06 [parallel]: 5.06002e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 5.22999e-06 [allreduce_fusion]: 4.87998e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 8.53001e-06 [get_grad_eliminate_]: 8.73001e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 9.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.587e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.382e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 3.527e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 1.505e-05 [a_after_grad]: 1.416e-05 [renormalize]: 0.00057291 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.41e-05 [cse]: 4.558e-05 [a_3]: 6.467e-05 [Cycle 3]: 0.00092581, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 1.027e-05 [loop_unroll]: 8.89e-06 [a_1]: 0.00027659 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 9.57999e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 3.73001e-06 
[parameter_eliminate]: 8.50006e-07 [a_2]: 0.00012435 [accelerated_algorithm]: 1.153e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.07001e-06 [merge_send_recv]: 7.08998e-06 [auto_parallel]: 7.05998e-06 [parallel]: 4.35e-06 [flash_sp]: 1.03001e-06 [merge_comm]: 5.04998e-06 [allreduce_fusion]: 4.92e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 3.70026e-07 [virtual_shard_identity]: 9.76e-06 [virtual_dataset]: 8.54e-06 [get_grad_eliminate_]: 8.3e-06 [virtual_output]: 8.31002e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 8.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.627e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.368e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17e-06 [meta_fg_expand]: 3.28e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.279e-05 [a_after_grad]: 1.41e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.29998e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.133e-05 [cse]: 2.652e-05 [a_3]: 5.825e-05 [py_interpret_to_execute_after_opt_a]: 1.104e-05 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 5.167e-05 [convert_after_rewriter]: 9.32001e-06 [order_py_execute_after_rewriter]: 6.89999e-06 [mutable_eliminate]: 0.00046087 [opt_b]: 0.00028625, [1] [Cycle 1]: 0.00028011, [7] [b_1]: 0.00018787 [b_2]: 1.095e-05 [updatestate_depend_eliminate]: 7.40998e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.99002e-06 [renormalize]: 4.00003e-07 [cse]: 3.043e-05 [optimize_parallel_all_gather_comm]: 2.033e-05 [overlap_param_gather]: 1.93002e-06 [cconv]: 2.119e-05 [loop_unroll]: 0.00042107 [opt_after_cconv]: 0.00013458, [1] [Cycle 1]: 0.00012873, [7] [c_1]: 4.756e-05 [parameter_eliminate]: 2.36998e-06 [updatestate_depend_eliminate]: 7.33e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 
3.97e-06 [cse]: 2.962e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 2.907e-05 [tuple_transform]: 0.00010094, [1] [Cycle 1]: 9.628e-05, [4] [d_1]: 6.615e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.82001e-06 [partial_unused_args_eliminate]: 1.83002e-06 [add_recomputation]: 6.234e-05 [cse_after_recomputation]: 3.244e-05, [1] [Cycle 1]: 2.732e-05, [1] [cse]: 2.196e-05 [environ_conv]: 8.73001e-06 [swap_dp_allreduce_reducescatter]: 7.71999e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.16997e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 1.07998e-06 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.30999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81998e-06 [control_data_broadcast_order]: 1.734e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 5.15001e-06 [overlap_recompute_and_grad_model_parallel]: 5.66998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.68998e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.412e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 9.10019e-07 [symbol_engine_optimizer]: 9.825e-05, [1] [Cycle 1]: 9.395e-05, [6] [build]: 1.032e-05 [elim_shapecalc]: 1.331e-05 [elim_not_effective]: 1.813e-05 [opt_reshape]: 1.013e-05 [fold_const_symbol]: 1.433e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 2.578e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.30998e-06 [opt_after_jit_grad]: 0.00046276 [validate]: 4.783e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00842141 [execute]: 6.78998e-06 Sums bootstrap : 0.000541s : 1.68% type_inference : 0.010460s : 32.39% event_method : 0.000042s : 0.13% auto_monad : 0.000117s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000126s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000122s : 0.38% optimize.opt_a.loop_unroll : 0.000107s : 0.33% optimize.opt_a.a_1 : 0.003132s : 9.70% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000496s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000109s : 0.34% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001447s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.003043s : 9.42% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000239s : 0.74% optimize.opt_a.a_3 : 0.000458s : 1.42% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000052s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.43% optimize.opt_b.b_1 : 0.000188s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000421s : 1.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000062s : 0.19% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000463s : 1.43% validate : 0.000048s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008421s : 26.08% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000782 218 5.60% : 0.000044s : 11: substitution.arithmetic_simplify 1.74% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 51.13% : 0.000400s : 16: substitution.inline 2.03% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 7.75% : 0.000061s : 20: substitution.remove_not_recompute_node 2.99% : 0.000023s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.52% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.23% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010392 2 87.60% : 0.009104s : 1: type_inference.infer 12.40% : 0.001288s : 1: type_inference.specialize ------[replace.] 0.000198 30 58.95% : 0.000117s : 16: replace.inline 41.05% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.52% : 0.000392s : 16: match.inline 7.48% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.56% : 0.000041s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.17% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001560 32 58.80% : 0.000917s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.20% : 0.000643s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060999 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.03% : 0.003069s : 1: add_attr 5.02% : 0.003060s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000067s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000124s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.94% : 0.000576s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.91% : 0.004823s : 117: opt.transform.opt_a 0.08% : 0.000046s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.85% : 0.010890s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.77% : 0.000472s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.54% : 0.013142s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.59% : 0.001582s : 2: renormalize.infer 2.37% : 0.001447s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000056s : 1: rewriter_after_opt_a 0.21% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.82% : 0.008432s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.17% : 0.010475s : 1: type_inference 0.13% : 0.000081s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x9-kbk],max_mem:36.0M TotalTime = 0.121732, [24] [bootstrap]: 0.00054973 [type_inference]: 0.00617659 [event_method]: 1.358e-05 [auto_monad]: 5.914e-05 [graph_reusing]: 6.37001e-06 [inline]: 1.82001e-06 [add_attr]: 0.00343153, [1] [add_attr_with_inline]: 0.00342056, [1] [Cycle 1]: 4.586e-05, [2] [tag_attr]: 1.472e-05 [meta_addattr_fg_expand]: 4.22998e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.872e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.48e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.00399738, [53] [py_interpret_to_execute]: 2.035e-05 [rewriter_before_opt_a]: 6.028e-05 [opt_a]: 0.00215648, [2] [Cycle 1]: 0.00156083, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.297e-05 [loop_unroll]: 2.095e-05 [a_1]: 0.00049193 [with_stream_mark]: 1.417e-05 [recompute_prepare]: 7.85998e-06 [updatestate_depend_eliminate]: 4.33001e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.96998e-06 [a_2]: 7.66e-05 [accelerated_algorithm]: 6.63e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.12001e-06 [merge_send_recv]: 8.47998e-06 [auto_parallel]: 6.17001e-06 [parallel]: 2.45e-05 [flash_sp]: 6.88e-06 [merge_comm]: 3.88999e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 9.15001e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 7.00998e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 5.38002e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.46001e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.134e-05 
[merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 9.30001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.40002e-06 [receive_attached]: 2.56e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 8.50999e-06 [renormalize]: 0.00041482 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.344e-05 [cse]: 2.846e-05 [a_3]: 4.099e-05 [Cycle 2]: 0.00058634, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012612 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 5.66003e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.728e-05 [accelerated_algorithm]: 5.37001e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.15999e-06 [auto_parallel]: 5.02e-06 [parallel]: 4.42003e-06 [flash_sp]: 3.23e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.37001e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 5.40999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56998e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 9.09998e-06 [a_after_grad]: 8.29002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 5.74999e-06 [cse]: 1.233e-05 [a_3]: 3.139e-05 [py_interpret_to_execute_after_opt_a]: 7.11999e-06 
[slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 3.19e-05 [convert_after_rewriter]: 6.52001e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00044882 [opt_b]: 0.00018016, [1] [Cycle 1]: 0.00017421, [7] [b_1]: 0.00010633 [b_2]: 7.57002e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.09e-06 [renormalize]: 4.39992e-07 [cse]: 1.599e-05 [optimize_parallel_all_gather_comm]: 1.612e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.35e-05 [loop_unroll]: 0.00041287 [opt_after_cconv]: 9.538e-05, [1] [Cycle 1]: 8.986e-05, [7] [c_1]: 2.816e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.56e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.374e-05 [tuple_transform]: 6.866e-05, [1] [Cycle 1]: 6.436e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.24001e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.154e-05 [cse_after_recomputation]: 2.038e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.101e-05 [environ_conv]: 4.65999e-06 [swap_dp_allreduce_reducescatter]: 4.94998e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.25999e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.72999e-06 [control_data_broadcast_order]: 1.167e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.69999e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.99999e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 6.735e-05, [1] [Cycle 1]: 6.329e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.70001e-06 [elim_not_effective]: 1.097e-05 [opt_reshape]: 5.85002e-06 [fold_const_symbol]: 8.93002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.639e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044347 [validate]: 3.033e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.106737 [execute]: 1.003e-05 Sums bootstrap : 0.000550s : 0.47% type_inference : 0.006177s : 5.26% event_method : 0.000014s : 0.01% auto_monad : 0.000059s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000060s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000618s : 0.53% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000415s : 0.35% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000041s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.38% optimize.opt_b.b_1 : 0.000106s : 0.09% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000413s : 0.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000443s : 0.38% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.106737s : 90.97% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000168 30 15.34% : 0.000026s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.31% : 0.000006s : 4: substitution.graph_param_transform 65.90% : 0.000111s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.72% : 0.000005s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.72% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006129 2 90.76% : 0.005563s : 1: type_inference.infer 9.24% : 0.000566s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.06% : 0.000028s : 3: replace.inline 28.94% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.38% : 0.000109s : 3: match.inline 8.62% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000195 1131 0.74% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000001s : 8: predicate.addn_check_dump 0.62% : 0.000001s : 11: predicate.addn_zero_filter 0.65% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.87% : 0.000004s : 19: predicate.arithmetic_simplify 0.77% : 0.000002s : 11: predicate.cast_eliminate 0.57% : 0.000001s : 8: predicate.check_bprop_eliminate 0.46% : 0.000001s : 8: predicate.compare_switch_simplify 0.18% : 0.000000s : 4: predicate.const_output_eliminate 0.50% : 0.000001s : 8: predicate.depend_value_elim 0.71% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.73% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.88% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 4: predicate.elim_not_effective 0.33% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.92% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.87% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.90% : 0.000002s : 15: predicate.environ_get_depend_swap 1.44% : 0.000003s : 23: predicate.environ_get_eliminate 0.88% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.04% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.82% : 0.000004s : 16: predicate.float_depend_g_call 0.46% : 0.000001s : 8: predicate.float_environ_get_switch 0.71% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.61% : 0.000001s : 8: predicate.get_grad_eliminate 0.20% : 0.000000s : 4: predicate.graph_param_transform 0.57% : 0.000001s : 8: predicate.incorporate_call 0.45% : 0.000001s : 8: predicate.incorporate_call_switch 4.84% : 0.000009s : 51: predicate.inline 0.69% : 0.000001s : 8: predicate.inline_without_move 0.33% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.72% : 0.000001s : 8: predicate.less_batch_normalization 1.37% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.90% : 0.000004s : 32: predicate.load_eliminater 0.85% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.34% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.49% : 0.000001s : 8: predicate.merge_addn 0.55% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.63% : 0.000001s : 11: predicate.minmaximum_grad 0.89% : 0.000002s : 4: predicate.mutable_eliminate 0.29% : 0.000001s : 4: predicate.opt_reshape 0.29% : 0.000001s : 4: predicate.parallel_virtual_node 1.40% : 0.000003s : 16: predicate.partial_defer_inline 20.23% : 0.000039s : 17: predicate.partial_eliminate 0.69% : 0.000001s : 11: predicate.print_const_string_wrapper 0.53% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000002s : 11: predicate.reduce_eliminate 1.99% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000001s : 8: predicate.remove_not_recompute_node 1.16% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000001s : 4: predicate.reset_defer_inline 0.70% : 0.000001s : 11: predicate.reshape_eliminate 0.56% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.32% : 0.000001s : 4: predicate.row_tensor_eliminate 0.67% : 0.000001s : 8: predicate.same_eliminate 0.42% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.67% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.62% : 0.000001s : 8: predicate.specialize_transform 0.75% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.63% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.33% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.09% : 0.000002s : 16: predicate.switch_defer_inline 1.62% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.11% : 0.000008s : 54: predicate.switch_simplify 0.66% 
: 0.000001s : 11: predicate.tile_eliminate 0.70% : 0.000001s : 11: predicate.transpose_eliminate 1.24% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.22% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.12% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.10% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.73% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.34% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.88% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.54% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.29% : 0.000001s : 4: predicate.value_based_eliminate 0.60% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.60% : 0.000001s : 8: predicate.virtual_output_eliminate 0.26% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.41% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 45.89% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.11% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.130709 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.63% : 0.003436s : 1: add_attr 2.62% : 0.003424s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000064s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.45% : 0.000589s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000457s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.75% : 0.000985s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.65% : 0.002159s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 0.35% : 0.000453s : 1: opt_after_jit_grad 0.14% : 0.000184s : 1: opt_b 3.06% : 0.004001s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000214s : 1: renormalize.infer 0.15% : 0.000195s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 81.68% : 0.106761s : 1: task_emit 0.05% : 0.000072s : 1: 
tuple_transform 4.74% : 0.006190s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.111403, [24] [bootstrap]: 0.00046783 [type_inference]: 0.00452708 [event_method]: 1.101e-05 [auto_monad]: 5.196e-05 [graph_reusing]: 5.10999e-06 [inline]: 1.89e-06 [add_attr]: 0.00294773, [1] [add_attr_with_inline]: 0.00293962, [1] [Cycle 1]: 4.245e-05, [2] [tag_attr]: 1.243e-05 [meta_addattr_fg_expand]: 3.42997e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.205e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00367943, [53] [py_interpret_to_execute]: 1.487e-05 [rewriter_before_opt_a]: 3.894e-05 [opt_a]: 0.00188944, [2] [Cycle 1]: 0.00129527, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.436e-05 [loop_unroll]: 1.338e-05 [a_1]: 0.00028996 [with_stream_mark]: 1.357e-05 [recompute_prepare]: 7.29001e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.57002e-06 [updatestate_loads_eliminate]: 3.41001e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.691e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.68999e-06 [auto_parallel]: 6.02001e-06 [parallel]: 1.905e-05 [flash_sp]: 7.56999e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 9.13002e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.40003e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 [merge_recompute_call_nodes]: 1.86e-06 [before_grad]: 9.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.571e-05 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 2.52001e-06 [receive_attached]: 2.71999e-06 
[after_resolve]: 1.098e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00033973 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.344e-05 [cse]: 2.607e-05 [a_3]: 4.124e-05 [Cycle 2]: 0.00058503, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 6.68e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012349 [with_stream_mark]: 1.113e-05 [recompute_prepare]: 5.91998e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.752e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.12e-06 [parallel]: 3.85e-06 [flash_sp]: 3.2e-06 [merge_comm]: 2.87002e-06 [allreduce_fusion]: 2.56e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.30001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.85998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.20001e-07 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 5.89e-06 [cse]: 1.214e-05 [a_3]: 3.127e-05 [py_interpret_to_execute_after_opt_a]: 7.25e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.198e-05 [convert_after_rewriter]: 6.52001e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00044395 [opt_b]: 0.00017945, [1] [Cycle 1]: 0.00017373, [7] [b_1]: 0.0001071 
[b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.17001e-06 [renormalize]: 3.7998e-07 [cse]: 1.606e-05 [optimize_parallel_all_gather_comm]: 1.619e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.314e-05 [loop_unroll]: 0.0004083 [opt_after_cconv]: 9.311e-05, [1] [Cycle 1]: 8.729e-05, [7] [c_1]: 2.731e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 4.87998e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.569e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.266e-05 [tuple_transform]: 6.773e-05, [1] [Cycle 1]: 6.343e-05, [4] [d_1]: 3.836e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.97999e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.358e-05 [cse_after_recomputation]: 1.941e-05, [1] [Cycle 1]: 1.492e-05, [1] [cse]: 9.96e-06 [environ_conv]: 4.27998e-06 [swap_dp_allreduce_reducescatter]: 5.44998e-06 [bias_add_comm_swap]: 2.67001e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.98e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.155e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.39001e-06 [overlap_grad_ring_attention]: 4.02998e-06 [overlap_grad_flash_sp]: 1.632e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 6.835e-05, [1] [Cycle 1]: 6.405e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.88002e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.545e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00044089 [validate]: 3.198e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.098966 [execute]: 9.94999e-06 Sums bootstrap : 0.000468s : 0.44% type_inference : 0.004527s : 4.21% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000413s : 0.38% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000049s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000340s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000038s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000444s : 0.41% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000408s : 0.38% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000441s : 0.41% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.098966s : 92.06% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.78% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.71% : 0.000006s : 4: substitution.graph_param_transform 64.59% : 0.000077s : 2: substitution.inline 2.61% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.27% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004484 2 91.42% : 0.004099s : 1: type_inference.infer 8.58% : 0.000385s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 1.05% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.05% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.83% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 1.08% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.90% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000269 6 44.14% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.86% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119285 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.47% : 0.002952s : 1: add_attr 2.47% : 0.002943s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000500s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000416s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000764s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.59% : 0.001892s : 1: opt_a 0.08% : 0.000096s : 1: opt_after_cconv 0.38% : 0.000450s : 1: opt_after_jit_grad 0.15% : 0.000183s : 1: opt_b 3.09% : 0.003683s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.15% : 0.000184s : 1: renormalize.infer 0.13% : 0.000149s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.98% : 0.098988s : 1: task_emit 0.06% : 0.000070s : 1: 
tuple_transform 3.81% : 0.004542s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.113239, [24] [bootstrap]: 0.00047394 [type_inference]: 0.0056442 [event_method]: 1.4e-05 [auto_monad]: 5.636e-05 [graph_reusing]: 6.12001e-06 [inline]: 2.26e-06 [add_attr]: 0.00298479, [1] [add_attr_with_inline]: 0.00297641, [1] [Cycle 1]: 4.547e-05, [2] [tag_attr]: 1.579e-05 [meta_addattr_fg_expand]: 4.41002e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 2.481e-05 [insert-virtual-dataset]: 2.79999e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 2.41998e-06 [pipeline_split]: 2.04e-06 [optimize]: 0.00395179, [53] [py_interpret_to_execute]: 2.084e-05 [rewriter_before_opt_a]: 5.847e-05 [opt_a]: 0.00211001, [2] [Cycle 1]: 0.00151077, [45] [expand_dump_flag]: 3.26999e-06 [switch_simplify]: 3.24e-05 [loop_unroll]: 2.135e-05 [a_1]: 0.00045003 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 3.50003e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 2.32001e-06 [a_2]: 7.602e-05 [accelerated_algorithm]: 6.17001e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 6.21e-06 [merge_send_recv]: 8.05999e-06 [auto_parallel]: 6.36e-06 [parallel]: 1.928e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.53999e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.04998e-06 [allreduce_slice_to_reducescatter]: 9.5999e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.81998e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.88002e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.38002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 9.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.51998e-06 
[after_resolve]: 1.077e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00041631 [add_forward_monad_depend]: 4.90001e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.353e-05 [cse]: 2.696e-05 [a_3]: 4.146e-05 [Cycle 2]: 0.00059006, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012496 [with_stream_mark]: 9.79e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.17999e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.636e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.41002e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.20999e-06 [parallel]: 3.88999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 3.06001e-06 [matmul_add_comm_reduction]: 8.22e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.76e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.12999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.84002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.04002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.55997e-06 [cse]: 1.227e-05 [a_3]: 3.129e-05 [py_interpret_to_execute_after_opt_a]: 7.71999e-06 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 3.072e-05 [convert_after_rewriter]: 7.26001e-06 [order_py_execute_after_rewriter]: 5.30999e-06 [mutable_eliminate]: 0.00044443 [opt_b]: 0.00018324, [1] [Cycle 1]: 0.00017718, [7] [b_1]: 
0.00011067 [b_2]: 6.91001e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.50003e-07 [cse]: 1.562e-05 [optimize_parallel_all_gather_comm]: 1.65e-05 [overlap_param_gather]: 1.88997e-06 [cconv]: 2.267e-05 [loop_unroll]: 0.00040913 [opt_after_cconv]: 9.361e-05, [1] [Cycle 1]: 8.802e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.558e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.346e-05 [tuple_transform]: 7.019e-05, [1] [Cycle 1]: 6.608e-05, [4] [d_1]: 3.998e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 4.451e-05 [cse_after_recomputation]: 1.99e-05, [1] [Cycle 1]: 1.55e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.07999e-06 [bias_add_comm_swap]: 1.55e-05 [label_micro_interleaved_index]: 4.56002e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.31002e-06 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92001e-06 [control_data_broadcast_order]: 1.181e-05 [grouped_pairwise_exchange_alltoall]: 1.65001e-06 [offloading_packed_experts]: 3.51999e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.54999e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.31998e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 6.791e-05, [1] [Cycle 1]: 6.378e-05, [6] [build]: 2.56e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.95001e-06 [pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.65998e-06 [opt_after_jit_grad]: 0.00044437 [validate]: 3.052e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0993558 [execute]: 1.022e-05 Sums bootstrap : 0.000474s : 0.43% type_inference : 0.005644s : 5.16% event_method : 0.000014s : 0.01% auto_monad : 0.000056s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000575s : 0.53% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000416s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000444s : 0.41% optimize.opt_b.b_1 : 0.000111s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000409s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000016s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000444s : 0.41% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099356s : 90.91% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.37% : 0.000024s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.54% : 0.000006s : 4: substitution.graph_param_transform 66.51% : 0.000109s : 3: substitution.inline 1.82% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.58% : 0.000004s : 4: substitution.replace_old_param 6.86% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005604 2 89.80% : 0.005032s : 1: type_inference.infer 10.20% : 0.000571s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.50% : 0.000026s : 3: replace.inline 30.50% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.31% : 0.000107s : 3: match.inline 8.69% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000001s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.36% : 0.000001s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.90% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000358 8 46.95% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.05% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.121685 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.46% : 0.002989s : 1: add_attr 2.45% : 0.002980s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000019s : 1: bias_add_comm_swap 0.42% : 0.000506s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000417s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.77% : 0.000939s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.74% : 0.002113s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.37% : 0.000454s : 1: opt_after_jit_grad 0.15% : 0.000187s : 1: opt_b 3.25% : 0.003955s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.17% : 0.000206s : 1: renormalize.infer 0.17% : 0.000204s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 81.67% : 0.099379s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.65% : 0.005658s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.157433, [24] [bootstrap]: 0.00049623 [type_inference]: 0.0116824 [event_method]: 4.93e-05 [auto_monad]: 0.00012267 [graph_reusing]: 8.98002e-06 [inline]: 2.32999e-06 [add_attr]: 0.00308753, [1] [add_attr_with_inline]: 0.00307959, [1] [Cycle 1]: 7.624e-05, [2] [tag_attr]: 3.705e-05 [meta_addattr_fg_expand]: 9.74e-06 [parallel-infer-symbol]: 3.43999e-06 [pre_auto_parallel]: 5.262e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0137026, [53] [py_interpret_to_execute]: 4.089e-05 [rewriter_before_opt_a]: 0.0001502 [opt_a]: 0.0113725, [3] [Cycle 1]: 0.00731869, [45] [expand_dump_flag]: 4.08001e-06 [switch_simplify]: 7.676e-05 [loop_unroll]: 6.29e-05 [a_1]: 0.00147002 [with_stream_mark]: 2.351e-05 [recompute_prepare]: 2.198e-05 [updatestate_depend_eliminate]: 8.95999e-06 [updatestate_assign_eliminate]: 7.66999e-06 [updatestate_loads_eliminate]: 7.2e-06 [parameter_eliminate]: 2.83998e-06 [a_2]: 0.00027564 [accelerated_algorithm]: 3.23e-05 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 3.29001e-06 [shard_inline]: 1.626e-05 [merge_send_recv]: 1.617e-05 [auto_parallel]: 1.126e-05 [parallel]: 1.911e-05 [flash_sp]: 1.219e-05 [merge_comm]: 9.44998e-06 [allreduce_fusion]: 9.25001e-06 [matmul_add_comm_reduction]: 2.808e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.861e-05 [virtual_dataset]: 1.611e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.554e-05 [merge_forward]: 9.50001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.797e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.963e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 2.711e-05 [set_forward_comm_id_for_comm_node_pass]: 9.52999e-06 [meta_fg_expand]: 0.00143949 [flash_sp_send_recv_attached]: 3.8e-06 [receive_attached]: 2.48e-06 [after_resolve]: 
6.044e-05 [a_after_grad]: 8.257e-05 [renormalize]: 0.00258624 [add_forward_monad_depend]: 9.88002e-06 [auto_monad_grad]: 5.87999e-06 [auto_monad_eliminator]: 5.588e-05 [cse]: 0.00017119 [a_3]: 0.00033678 [Cycle 2]: 0.00313209, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.791e-05 [loop_unroll]: 4.449e-05 [a_1]: 0.00154495 [with_stream_mark]: 1.249e-05 [recompute_prepare]: 1.112e-05 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 4.65999e-06 [updatestate_loads_eliminate]: 3.85998e-06 [parameter_eliminate]: 1.22999e-06 [a_2]: 0.00012709 [accelerated_algorithm]: 1.209e-05 [shard]: 1.44e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 9.39e-06 [merge_send_recv]: 7.54002e-06 [auto_parallel]: 7.73999e-06 [parallel]: 4.82998e-06 [flash_sp]: 3.90998e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 7.82e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 1.038e-05 [virtual_dataset]: 8.75999e-06 [get_grad_eliminate_]: 8.87999e-06 [virtual_output]: 9.09e-06 [merge_forward]: 4.80001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.65e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.444e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 7.615e-05 [flash_sp_send_recv_attached]: 1.40001e-06 [receive_attached]: 1.47001e-06 [after_resolve]: 1.618e-05 [a_after_grad]: 1.431e-05 [renormalize]: 0.00068076 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.486e-05 [cse]: 5.02e-05 [a_3]: 6.635e-05 [Cycle 3]: 0.00090687, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 1.038e-05 [loop_unroll]: 8.79e-06 [a_1]: 0.00025107 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 9.39e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.73999e-06 
[parameter_eliminate]: 1.29998e-06 [a_2]: 0.00012283 [accelerated_algorithm]: 1.167e-05 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.56999e-06 [parallel]: 4.38999e-06 [flash_sp]: 1.30999e-06 [merge_comm]: 4.87e-06 [allreduce_fusion]: 4.97e-06 [matmul_add_comm_reduction]: 7.75998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.89e-06 [get_grad_eliminate_]: 8.51002e-06 [virtual_output]: 8.21002e-06 [merge_forward]: 4.39002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 8.48001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.62e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.406e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 1.492e-05 [a_after_grad]: 1.519e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.053e-05 [cse]: 2.711e-05 [a_3]: 5.965e-05 [py_interpret_to_execute_after_opt_a]: 1.135e-05 [slice_cell_reuse_recomputed_activation]: 2.11003e-06 [rewriter_after_opt_a]: 4.841e-05 [convert_after_rewriter]: 9.34e-06 [order_py_execute_after_rewriter]: 7.26999e-06 [mutable_eliminate]: 0.00049788 [opt_b]: 0.00029106, [1] [Cycle 1]: 0.00028447, [7] [b_1]: 0.00018993 [b_2]: 1.064e-05 [updatestate_depend_eliminate]: 7.37002e-06 [updatestate_assign_eliminate]: 3.98001e-06 [updatestate_loads_eliminate]: 4.10998e-06 [renormalize]: 3.69997e-07 [cse]: 3.243e-05 [optimize_parallel_all_gather_comm]: 2.081e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 2.148e-05 [loop_unroll]: 0.00042275 [opt_after_cconv]: 0.00013675, [1] [Cycle 1]: 0.00013087, [7] [c_1]: 4.82e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 7.43e-06 [updatestate_assign_eliminate]: 4.10998e-06 
[updatestate_loads_eliminate]: 3.91999e-06 [cse]: 3.063e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 3.173e-05 [tuple_transform]: 0.0001017, [1] [Cycle 1]: 9.698e-05, [4] [d_1]: 6.695e-05 [none_parameter_eliminate]: 1.63002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.92001e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 5.851e-05 [cse_after_recomputation]: 3.373e-05, [1] [Cycle 1]: 2.882e-05, [1] [cse]: 2.322e-05 [environ_conv]: 9.18002e-06 [swap_dp_allreduce_reducescatter]: 7.68001e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.26997e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 1.12999e-06 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.44999e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.772e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 5.17e-06 [overlap_recompute_and_grad_model_parallel]: 5.59e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37e-06 [overlap_recompute_allgather_and_fa_grad]: 1.63002e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 5.47001e-06 [overlap_grad_flash_sp]: 2.536e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.61e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 0.00010004, [1] [Cycle 1]: 9.562e-05, [6] [build]: 1.082e-05 [elim_shapecalc]: 1.322e-05 [elim_not_effective]: 1.795e-05 [opt_reshape]: 9.83998e-06 [fold_const_symbol]: 1.487e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 2.607e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00049115 [validate]: 4.738e-05 [backend_pass]: 1.10999e-06 [task_emit]: 0.127404 [execute]: 9.60001e-06 Sums bootstrap : 0.000496s : 0.32% type_inference : 0.011682s : 7.63% event_method : 0.000049s : 0.03% auto_monad : 0.000123s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000053s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.03% optimize.rewriter_before_opt_a : 0.000150s : 0.10% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000135s : 0.09% optimize.opt_a.loop_unroll : 0.000116s : 0.08% optimize.opt_a.a_1 : 0.003266s : 2.13% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000526s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000031s : 0.02% optimize.opt_a.auto_parallel : 0.000027s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 0.000019s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001519s : 0.99% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.06% optimize.opt_a.a_after_grad : 0.000112s : 0.07% optimize.opt_a.renormalize : 0.003267s : 2.13% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.05% optimize.opt_a.cse : 0.000249s : 0.16% optimize.opt_a.a_3 : 0.000463s : 0.30% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000498s : 0.33% optimize.opt_b.b_1 : 0.000190s : 0.12% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.01% optimize.loop_unroll : 0.000423s : 0.28% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000491s : 0.32% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.127404s : 83.24% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000782 222 5.85% : 0.000046s : 12: substitution.arithmetic_simplify 1.95% : 0.000015s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.46% : 0.000434s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000016s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.73% : 0.000014s : 20: substitution.remove_not_recompute_node 3.06% : 0.000024s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.84% : 0.000069s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011600 2 86.64% : 0.010050s : 1: type_inference.infer 13.36% : 0.001550s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.86% : 0.000128s : 17: replace.inline 42.14% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000461 33 92.17% : 0.000425s : 17: match.inline 7.83% : 0.000036s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.76% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.05% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001684 34 55.67% : 0.000937s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.33% : 0.000746s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.182787 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.69% : 0.003092s : 1: add_attr 1.69% : 0.003083s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000130s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.29% : 0.000533s : 1: bootstrap 0.01% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000057s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.24% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000507s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.72% : 0.004978s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.22% : 0.011376s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.27% : 0.000501s : 1: opt_after_jit_grad 0.16% : 0.000295s : 1: opt_b 7.50% : 0.013707s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000057s : 1: pre_auto_parallel 0.02% : 0.000045s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000036s : 1: remove_dup_value 0.96% : 0.001761s : 2: renormalize.infer 0.82% : 0.001491s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.08% : 0.000155s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000103s : 1: symbol_engine_optimizer 69.71% : 0.127427s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.40% : 0.011701s : 1: type_inference 0.04% : 0.000075s : 1: validate TotalTime = 0.106345, [24] [bootstrap]: 0.00044541 [type_inference]: 0.00437273 [event_method]: 1.099e-05 [auto_monad]: 5.471e-05 [graph_reusing]: 5.14e-06 [inline]: 1.79e-06 [add_attr]: 0.00297908, [1] [add_attr_with_inline]: 0.00297122, [1] [Cycle 1]: 4.373e-05, [2] [tag_attr]: 1.222e-05 [meta_addattr_fg_expand]: 3.78001e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.121e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00366506, [53] [py_interpret_to_execute]: 1.675e-05 [rewriter_before_opt_a]: 3.943e-05 [opt_a]: 0.00186374, [2] [Cycle 1]: 0.00126574, [45] [expand_dump_flag]: 2.78003e-06 [switch_simplify]: 2.528e-05 [loop_unroll]: 1.338e-05 [a_1]: 0.00029286 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.34002e-06 [updatestate_depend_eliminate]: 3.25998e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 3.41999e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.731e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.61999e-06 [auto_parallel]: 6.05002e-06 [parallel]: 1.873e-05 [flash_sp]: 7.63001e-06 [merge_comm]: 4e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 9.15001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.63997e-06 [get_grad_eliminate_]: 5.68002e-06 [virtual_output]: 5.96e-06 [merge_forward]: 3.84002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 2.78e-06 [receive_attached]: 2.78e-06 
[after_resolve]: 1.077e-05 [a_after_grad]: 8.54998e-06 [renormalize]: 0.00034855 [add_forward_monad_depend]: 4.79e-06 [auto_monad_grad]: 1.87999e-06 [auto_monad_eliminator]: 1.439e-05 [cse]: 2.754e-05 [a_3]: 3.919e-05 [Cycle 2]: 0.00058909, [45] [expand_dump_flag]: 7.30011e-07 [switch_simplify]: 6.44999e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012488 [with_stream_mark]: 1.04e-05 [recompute_prepare]: 5.43002e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.738e-05 [accelerated_algorithm]: 5.35001e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.32998e-06 [auto_parallel]: 5.04e-06 [parallel]: 3.86999e-06 [flash_sp]: 3.11999e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 2.85002e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.75999e-06 [a_after_grad]: 7.71999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.351e-05 [a_3]: 3.284e-05 [py_interpret_to_execute_after_opt_a]: 7.91001e-06 [slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 3.258e-05 [convert_after_rewriter]: 7.43e-06 [order_py_execute_after_rewriter]: 5.32001e-06 [mutable_eliminate]: 0.00044477 [opt_b]: 0.00017868, [1] [Cycle 1]: 0.0001727, [7] [b_1]: 0.00010655 
[b_2]: 7.2e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.80009e-07 [cse]: 1.569e-05 [optimize_parallel_all_gather_comm]: 1.562e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.233e-05 [loop_unroll]: 0.00040903 [opt_after_cconv]: 9.626e-05, [1] [Cycle 1]: 9.066e-05, [7] [c_1]: 2.807e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.36998e-06 [cse]: 1.636e-05 [renormalize]: 3.70026e-07 [remove_dup_value]: 1.318e-05 [tuple_transform]: 6.841e-05, [1] [Cycle 1]: 6.416e-05, [4] [d_1]: 3.88e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 4.55e-05 [cse_after_recomputation]: 1.974e-05, [1] [Cycle 1]: 1.536e-05, [1] [cse]: 1.019e-05 [environ_conv]: 4.71002e-06 [swap_dp_allreduce_reducescatter]: 5.22e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.75001e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.71e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.42001e-06 [reorder_send_recv_between_fp_bp]: 2.92002e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.145e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.24999e-06 [overlap_grad_ring_attention]: 3.79002e-06 [overlap_grad_flash_sp]: 1.618e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 6.726e-05, [1] [Cycle 1]: 6.33e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.13001e-06 [elim_not_effective]: 1.125e-05 [opt_reshape]: 5.94999e-06 [fold_const_symbol]: 8.72998e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00044214 [validate]: 3.061e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0940699 [execute]: 1.109e-05 Sums bootstrap : 0.000445s : 0.43% type_inference : 0.004373s : 4.27% event_method : 0.000011s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000418s : 0.41% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000349s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.43% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000409s : 0.40% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000442s : 0.43% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.094070s : 91.86% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.31% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.59% : 0.000006s : 4: substitution.graph_param_transform 65.75% : 0.000080s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.62% : 0.000004s : 4: substitution.remove_not_recompute_node 3.06% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004331 2 91.54% : 0.003964s : 1: type_inference.infer 8.46% : 0.000367s : 1: type_inference.specialize ------[replace.] 0.000017 2 100.00% : 0.000017s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.07% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.95% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.68% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.58% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.55% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.01% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000254 6 42.23% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.77% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114260 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.61% : 0.002983s : 1: add_attr 2.60% : 0.002975s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000474s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.37% : 0.000418s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.67% : 0.000768s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.63% : 0.001867s : 1: opt_a 0.09% : 0.000100s : 1: opt_after_cconv 0.40% : 0.000451s : 1: opt_after_jit_grad 0.16% : 0.000182s : 1: opt_b 3.21% : 0.003669s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000021s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000183s : 1: renormalize.infer 0.14% : 0.000159s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.35% : 0.094093s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.84% : 0.004387s : 1: type_inference 0.05% : 0.000053s : 1: validate TotalTime = 0.153042, [24] [bootstrap]: 0.00057721 [type_inference]: 0.0113409 [event_method]: 4.438e-05 [auto_monad]: 0.00012081 [graph_reusing]: 8.38001e-06 [inline]: 2.38998e-06 [add_attr]: 0.00329875, [1] [add_attr_with_inline]: 0.00328944, [1] [Cycle 1]: 8.051e-05, [2] [tag_attr]: 3.515e-05 [meta_addattr_fg_expand]: 8.56002e-06 [parallel-infer-symbol]: 3.95e-06 [pre_auto_parallel]: 5.36e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0177745, [53] [py_interpret_to_execute]: 4.12e-05 [rewriter_before_opt_a]: 0.00013892 [opt_a]: 0.015029, [3] [Cycle 1]: 0.0106861, [45] [expand_dump_flag]: 4.13999e-06 [switch_simplify]: 6.939e-05 [loop_unroll]: 5.484e-05 [a_1]: 0.00380733 [with_stream_mark]: 3.495e-05 [recompute_prepare]: 3.266e-05 [updatestate_depend_eliminate]: 9.25999e-06 [updatestate_assign_eliminate]: 8.03999e-06 [updatestate_loads_eliminate]: 7.32002e-06 [parameter_eliminate]: 3.70998e-06 [a_2]: 0.00025742 [accelerated_algorithm]: 3.375e-05 [shard]: 2.64999e-06 [meta_shard_fg_expand]: 3.63e-06 [shard_inline]: 1.626e-05 [merge_send_recv]: 1.839e-05 [auto_parallel]: 1.278e-05 [parallel]: 2.19e-05 [flash_sp]: 1.308e-05 [merge_comm]: 9.82999e-06 [allreduce_fusion]: 8.80999e-06 [matmul_add_comm_reduction]: 3.364e-05 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 1.799e-05 [virtual_dataset]: 1.579e-05 [get_grad_eliminate_]: 1.579e-05 [virtual_output]: 1.542e-05 [merge_forward]: 9.77999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.809e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.97e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 2.847e-05 [set_forward_comm_id_for_comm_node_pass]: 1.024e-05 [meta_fg_expand]: 0.00181795 [flash_sp_send_recv_attached]: 3.83999e-06 [receive_attached]: 2.60997e-06 
[after_resolve]: 6.482e-05 [a_after_grad]: 8.441e-05 [renormalize]: 0.00315955 [add_forward_monad_depend]: 1.128e-05 [auto_monad_grad]: 7.13998e-06 [auto_monad_eliminator]: 5.962e-05 [cse]: 0.00017928 [a_3]: 0.00034614 [Cycle 2]: 0.00336029, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 4.857e-05 [loop_unroll]: 4.425e-05 [a_1]: 0.0016251 [with_stream_mark]: 1.923e-05 [recompute_prepare]: 1.374e-05 [updatestate_depend_eliminate]: 6.74001e-06 [updatestate_assign_eliminate]: 5.74999e-06 [updatestate_loads_eliminate]: 4.57e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 0.00013044 [accelerated_algorithm]: 1.409e-05 [shard]: 2.50002e-06 [meta_shard_fg_expand]: 2.64999e-06 [shard_inline]: 9.49999e-06 [merge_send_recv]: 1.031e-05 [auto_parallel]: 1.028e-05 [parallel]: 9.36002e-06 [flash_sp]: 4.12998e-06 [merge_comm]: 5.37001e-06 [allreduce_fusion]: 5.39998e-06 [matmul_add_comm_reduction]: 1.072e-05 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 1.07e-05 [virtual_dataset]: 9.06998e-06 [get_grad_eliminate_]: 8.88002e-06 [virtual_output]: 8.62e-06 [merge_forward]: 5.67001e-06 [cell_reuse_recompute_pass]: 1.63002e-06 [offload_activation]: 1.254e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.81e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 1.451e-05 [set_forward_comm_id_for_comm_node_pass]: 7.23999e-06 [meta_fg_expand]: 4.922e-05 [flash_sp_send_recv_attached]: 1.45999e-06 [receive_attached]: 1.91e-06 [after_resolve]: 1.702e-05 [a_after_grad]: 1.42e-05 [renormalize]: 0.0007695 [add_forward_monad_depend]: 5.14e-06 [auto_monad_grad]: 2.20002e-06 [auto_monad_eliminator]: 1.74e-05 [cse]: 5.318e-05 [a_3]: 7.238e-05 [Cycle 3]: 0.00096333, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 1.061e-05 [loop_unroll]: 8.80001e-06 [a_1]: 0.00026292 [with_stream_mark]: 1.264e-05 [recompute_prepare]: 1.057e-05 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 4.07e-06 
[parameter_eliminate]: 1.35001e-06 [a_2]: 0.00012465 [accelerated_algorithm]: 1.206e-05 [shard]: 1.42e-06 [meta_shard_fg_expand]: 2.09999e-06 [shard_inline]: 9.59e-06 [merge_send_recv]: 7.83001e-06 [auto_parallel]: 7.65e-06 [parallel]: 5.56998e-06 [flash_sp]: 1.02e-06 [merge_comm]: 4.74e-06 [allreduce_fusion]: 4.94998e-06 [matmul_add_comm_reduction]: 9.59e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.063e-05 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.99e-06 [virtual_output]: 8.54002e-06 [merge_forward]: 4.58001e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 9.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.769e-05 [merge_recompute_call_nodes]: 1.17999e-06 [before_grad]: 1.606e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40999e-06 [meta_fg_expand]: 3.04001e-06 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.33002e-06 [after_resolve]: 1.445e-05 [a_after_grad]: 1.397e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.06e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 3.148e-05 [a_3]: 6.106e-05 [py_interpret_to_execute_after_opt_a]: 1.574e-05 [slice_cell_reuse_recomputed_activation]: 2.20002e-06 [rewriter_after_opt_a]: 5.422e-05 [convert_after_rewriter]: 1.04e-05 [order_py_execute_after_rewriter]: 6.98e-06 [mutable_eliminate]: 0.00071389 [opt_b]: 0.00030444, [1] [Cycle 1]: 0.0002968, [7] [b_1]: 0.00019123 [b_2]: 1.082e-05 [updatestate_depend_eliminate]: 9.70002e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 4.80001e-06 [renormalize]: 8.29983e-07 [cse]: 3.737e-05 [optimize_parallel_all_gather_comm]: 2.371e-05 [overlap_param_gather]: 2.32001e-06 [cconv]: 2.871e-05 [loop_unroll]: 0.00049098 [opt_after_cconv]: 0.00014634, [1] [Cycle 1]: 0.00013953, [7] [c_1]: 4.929e-05 [parameter_eliminate]: 3.46999e-06 [updatestate_depend_eliminate]: 8.43999e-06 [updatestate_assign_eliminate]: 4.08001e-06 
[updatestate_loads_eliminate]: 4.03999e-06 [cse]: 3.365e-05 [renormalize]: 8.30012e-07 [remove_dup_value]: 3.44e-05 [tuple_transform]: 0.00010744, [1] [Cycle 1]: 0.00010203, [4] [d_1]: 6.921e-05 [none_parameter_eliminate]: 1.73997e-06 [renormalize]: 4.2998e-07 [switch_simplify]: 1.016e-05 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 7.392e-05 [cse_after_recomputation]: 3.506e-05, [1] [Cycle 1]: 2.962e-05, [1] [cse]: 2.389e-05 [environ_conv]: 1.003e-05 [swap_dp_allreduce_reducescatter]: 8.05e-06 [bias_add_comm_swap]: 2.88003e-06 [label_micro_interleaved_index]: 4.99e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.01997e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.40001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.805e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 5.41998e-06 [overlap_recompute_and_grad_model_parallel]: 6.06998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 5.45001e-06 [overlap_grad_flash_sp]: 2.926e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.56998e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 0.0001468, [1] [Cycle 1]: 0.00014159, [6] [build]: 1.102e-05 [elim_shapecalc]: 1.648e-05 [elim_not_effective]: 4.974e-05 [opt_reshape]: 1.147e-05 [fold_const_symbol]: 1.653e-05 [renormalize]: 
2.59985e-07 [detach_backward]: 2.67001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.838e-05 [get_jit_bprop_graph]: 1.86998e-06 [rewriter_after_jit_bprop_graph]: 4.23999e-06 [opt_after_jit_grad]: 0.00053292 [validate]: 5.147e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.118923 [execute]: 9.63002e-06 Sums bootstrap : 0.000577s : 0.39% type_inference : 0.011341s : 7.65% event_method : 0.000044s : 0.03% auto_monad : 0.000121s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000054s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000041s : 0.03% optimize.rewriter_before_opt_a : 0.000139s : 0.09% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000129s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.07% optimize.opt_a.a_1 : 0.005695s : 3.84% optimize.opt_a.with_stream_mark : 0.000067s : 0.05% optimize.opt_a.recompute_prepare : 0.000057s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000007s : 0.01% optimize.opt_a.a_2 : 0.000513s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000060s : 0.04% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000037s : 0.02% optimize.opt_a.auto_parallel : 0.000031s : 0.02% optimize.opt_a.parallel : 0.000037s : 0.02% optimize.opt_a.flash_sp : 0.000018s : 0.01% optimize.opt_a.merge_comm 
: 0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000054s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000040s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000059s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.02% optimize.opt_a.meta_fg_expand : 0.001870s : 1.26% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000096s : 0.06% optimize.opt_a.a_after_grad : 0.000113s : 0.08% optimize.opt_a.renormalize : 0.003929s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.01% optimize.opt_a.auto_monad_grad : 0.000011s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000090s : 0.06% optimize.opt_a.cse : 0.000264s : 0.18% optimize.opt_a.a_3 : 0.000480s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000054s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000714s : 0.48% optimize.opt_b.b_1 : 0.000191s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000037s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.02% optimize.loop_unroll : 0.000491s : 0.33% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000034s : 0.02% optimize.tuple_transform.d_1 : 0.000069s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000074s : 0.05% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000050s : 0.03% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000028s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000533s : 0.36% validate : 0.000051s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.118923s : 80.19% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000859 218 6.25% : 0.000054s : 11: substitution.arithmetic_simplify 1.77% : 0.000015s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000005s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.90% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 55.73% : 0.000478s : 16: substitution.inline 2.21% : 0.000019s : 2: substitution.inline_without_move 1.38% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000018s : 3: substitution.less_batch_normalization 1.80% : 0.000015s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.59% : 0.000014s : 20: substitution.remove_not_recompute_node 3.17% : 0.000027s : 10: substitution.replace_applicator 1.31% : 0.000011s : 15: substitution.replace_old_param 0.44% : 0.000004s : 1: substitution.set_cell_output_no_recompute 3.58% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.64% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 8.41% : 0.000072s : 28: substitution.tuple_list_get_item_eliminator 2.37% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011259 2 87.54% : 0.009857s : 1: type_inference.infer 12.46% : 0.001402s : 1: type_inference.specialize ------[replace.] 0.000215 30 59.93% : 0.000129s : 16: replace.inline 40.07% : 0.000086s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000503 30 93.20% : 0.000469s : 16: match.inline 6.80% : 0.000034s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.003163 5663 0.26% : 0.000008s : 67: predicate.accumulaten_eliminater 0.07% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.12% : 0.000004s : 32: predicate.addn_check_dump 0.26% : 0.000008s : 67: predicate.addn_zero_filter 0.25% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 0.52% : 0.000017s : 99: predicate.arithmetic_simplify 0.28% : 0.000009s : 67: predicate.cast_eliminate 0.27% : 0.000009s : 68: predicate.check_bprop_eliminate 0.13% : 0.000004s : 32: predicate.compare_switch_simplify 0.02% : 0.000001s : 8: predicate.const_output_eliminate 0.13% : 0.000004s : 32: predicate.depend_value_elim 0.28% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 0.29% : 0.000009s : 67: predicate.dict_get_item_eliminator 0.26% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.09% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.02% : 0.000001s : 8: predicate.elim_not_effective 0.05% : 0.000002s : 8: predicate.elim_shapecalc_of_broadcastargs 0.30% : 0.000010s : 75: predicate.environ_add_const_eliminate 0.28% : 0.000009s : 75: predicate.environ_get_add_eliminate 0.28% : 0.000009s : 75: predicate.environ_get_depend_swap 0.41% : 0.000013s : 107: predicate.environ_get_eliminate 0.28% : 0.000009s : 75: predicate.environ_get_set_eliminate 0.39% : 0.000012s : 97: predicate.exchange_switch_depend_value 0.57% : 0.000018s : 97: predicate.float_depend_g_call 0.12% : 0.000004s : 32: predicate.float_environ_get_switch 0.16% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.02% : 0.000001s : 8: predicate.fold_const_symbol 0.15% : 0.000005s : 32: predicate.get_grad_eliminate 0.02% : 0.000001s : 8: predicate.graph_param_transform 0.13% : 0.000004s : 32: predicate.incorporate_call 0.12% : 0.000004s : 32: predicate.incorporate_call_switch 1.35% : 0.000043s : 244: predicate.inline 0.30% : 0.000009s : 55: predicate.inline_without_move 0.08% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.17% : 0.000006s : 32: predicate.less_batch_normalization 0.38% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 0.65% : 0.000020s : 164: predicate.load_eliminater 0.09% : 0.000003s : 8: predicate.loop_unroll_after_grad 0.51% : 0.000016s : 128: predicate.loop_unroll_before_grad 0.34% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.13% : 0.000004s : 32: predicate.merge_addn 0.27% : 0.000008s : 68: predicate.micro_step_allgather_replace 0.27% : 0.000009s : 68: predicate.mini_step_allgather_replace 0.27% : 0.000008s : 67: predicate.minmaximum_grad 0.10% : 0.000003s : 8: predicate.mutable_eliminate 0.05% : 0.000001s : 8: predicate.opt_reshape 0.05% : 0.000002s : 8: predicate.parallel_virtual_node 0.47% : 0.000015s : 97: predicate.partial_defer_inline 0.40% : 0.000013s : 89: predicate.partial_eliminate 0.26% : 0.000008s : 67: predicate.print_const_string_wrapper 0.13% : 0.000004s : 32: predicate.reduce_all_const_elim 0.32% : 0.000010s : 67: predicate.reduce_eliminate 0.63% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.08% : 0.000003s : 32: predicate.remove_not_recompute_node 0.47% : 0.000015s : 149: predicate.replace_applicator 0.17% : 0.000005s : 55: predicate.replace_old_param 0.03% : 0.000001s : 8: predicate.reset_defer_inline 0.26% : 0.000008s : 67: predicate.reshape_eliminate 0.27% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.04% : 0.000001s : 8: predicate.row_tensor_eliminate 0.31% : 0.000010s : 68: predicate.same_eliminate 0.11% : 0.000004s : 32: predicate.set_cell_output_no_recompute 0.14% : 0.000005s : 32: predicate.shard_identity_eliminate 0.08% : 0.000002s : 16: predicate.special_op_eliminate 0.15% : 0.000005s : 32: predicate.specialize_transform 0.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 0.27% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.04% : 0.000001s : 8: predicate.switch_call_monad_eliminater 0.43% : 0.000014s : 97: predicate.switch_defer_inline 0.69% : 0.000022s : 165: predicate.switch_layer_defer_inline 1.15% : 0.000036s : 265: 
predicate.switch_simplify 0.25% : 0.000008s : 67: predicate.tile_eliminate 0.26% : 0.000008s : 67: predicate.transpose_eliminate 0.35% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 0.36% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 0.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 0.66% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 0.36% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 0.49% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 0.38% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 76.64% : 0.002424s : 164: predicate.updatestate_pure_node_eliminater 0.77% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.04% : 0.000001s : 8: predicate.value_based_eliminate 0.13% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.14% : 0.000004s : 32: predicate.virtual_output_eliminate 0.03% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.04% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001826 32 57.89% : 0.001057s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.11% : 0.000769s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.185815 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.78% : 0.003303s : 1: add_attr 1.77% : 0.003294s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000079s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000128s : 1: auto_monad 0.02% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.33% : 0.000615s : 1: bootstrap 0.02% : 0.000032s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.02% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000053s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.27% : 0.000502s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000727s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000022s : 1: opt.transform.mutable_eliminate 3.99% : 0.007414s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000176s : 28: opt.transform.opt_b 0.04% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000090s : 4: opt.transform.symbol_engine_opt 8.09% : 0.015033s : 1: opt_a 0.08% : 0.000150s : 1: opt_after_cconv 0.29% : 0.000544s : 1: opt_after_jit_grad 0.17% : 0.000308s : 1: opt_b 9.57% : 0.017780s : 1: optimize 0.01% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000006s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000059s : 1: pre_auto_parallel 0.02% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000039s : 1: remove_dup_value 1.15% : 0.002135s : 2: renormalize.infer 0.96% : 0.001777s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000058s : 1: rewriter_after_opt_a 0.08% : 0.000144s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000150s : 1: symbol_engine_optimizer 64.01% : 0.118945s : 1: task_emit 0.06% : 0.000110s : 1: 
tuple_transform 6.12% : 0.011363s : 1: type_inference 0.05% : 0.000085s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y2-dtype_x9-ge],max_mem:38.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x0-pynative],max_mem:38.0M TotalTime = 0.0225092, [24] [bootstrap]: 0.00057567 [type_inference]: 0.00637598 [event_method]: 1.463e-05 [auto_monad]: 5.774e-05 [graph_reusing]: 6.12999e-06 [inline]: 1.76998e-06 [add_attr]: 0.00390401, [1] [add_attr_with_inline]: 0.00389352, [1] [Cycle 1]: 4.647e-05, [2] [tag_attr]: 1.698e-05 [meta_addattr_fg_expand]: 4.18999e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.994e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0041025, [53] [py_interpret_to_execute]: 2.124e-05 [rewriter_before_opt_a]: 6.399e-05 [opt_a]: 0.00220439, [2] [Cycle 1]: 0.00160328, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 3.268e-05 [loop_unroll]: 2.092e-05 [a_1]: 0.00046512 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 8.12e-06 [updatestate_depend_eliminate]: 4.34997e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 7.582e-05 [accelerated_algorithm]: 6.16998e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 8.89e-06 [auto_parallel]: 5.96998e-06 [parallel]: 2.475e-05 [flash_sp]: 7.59002e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.44e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 5.68997e-06 
[get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.99e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.44998e-06 [offload_activation]: 1.035e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 9.37999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.53e-06 [flash_sp_send_recv_attached]: 2.72001e-06 [receive_attached]: 2.74999e-06 [after_resolve]: 1.05e-05 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00048394 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.919e-05 [a_3]: 3.987e-05 [Cycle 2]: 0.00059127, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.07002e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.0001238 [with_stream_mark]: 9.82001e-06 [recompute_prepare]: 5.42001e-06 [updatestate_depend_eliminate]: 3.11001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 6.809e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.03001e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.83998e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.55002e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.24e-06 [a_after_grad]: 7.70998e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.54999e-06 [cse]: 1.389e-05 [a_3]: 3.195e-05 [py_interpret_to_execute_after_opt_a]: 7.53e-06 [slice_cell_reuse_recomputed_activation]: 2.61999e-06 [rewriter_after_opt_a]: 2.91e-05 [convert_after_rewriter]: 7.34002e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00049932 [opt_b]: 0.00018429, [1] [Cycle 1]: 0.0001779, [7] [b_1]: 0.00010949 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 8.2e-07 [cse]: 1.687e-05 [optimize_parallel_all_gather_comm]: 1.641e-05 [overlap_param_gather]: 2.01998e-06 [cconv]: 2.38e-05 [loop_unroll]: 0.00040743 [opt_after_cconv]: 9.479e-05, [1] [Cycle 1]: 8.894e-05, [7] [c_1]: 2.776e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.605e-05 [renormalize]: 2.59985e-07 [remove_dup_value]: 1.358e-05 [tuple_transform]: 6.908e-05, [1] [Cycle 1]: 6.472e-05, [4] [d_1]: 3.906e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 5.209e-05 [cse_after_recomputation]: 2.018e-05, [1] [Cycle 1]: 1.566e-05, [1] [cse]: 1.056e-05 [environ_conv]: 5.60001e-06 [swap_dp_allreduce_reducescatter]: 5.27001e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.61002e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.81999e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 
1.00001e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.48999e-06 [overlap_recompute_and_grad_model_parallel]: 4.32e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.693e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 2.24999e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 6.875e-05, [1] [Cycle 1]: 6.47e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.75999e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.03002e-06 [pipeline_parallel_scheduler]: 1.92001e-06 [auto_monad_reorder]: 1.584e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 0.00012693 [opt_after_jit_grad]: 0.00045175 [validate]: 3.336e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00658224 [execute]: 8.57e-06 Sums bootstrap : 0.000576s : 3.26% type_inference : 0.006376s : 36.14% event_method : 0.000015s : 0.08% auto_monad : 0.000058s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000064s : 0.36% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000589s : 3.34% optimize.opt_a.with_stream_mark : 0.000024s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000484s : 2.74% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000043s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000499s : 2.83% optimize.opt_b.b_1 : 0.000109s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.13% optimize.loop_unroll : 0.000407s : 2.31% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000127s : 0.72% opt_after_jit_grad : 0.000452s : 2.56% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006582s : 37.31% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000171 30 14.46% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.31% : 0.000006s : 4: substitution.graph_param_transform 67.34% : 0.000115s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000004s : 4: substitution.remove_not_recompute_node 2.51% : 0.000004s : 4: substitution.replace_old_param 6.50% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006329 2 90.68% : 0.005739s : 1: type_inference.infer 9.32% : 0.000590s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.78% : 0.000028s : 3: replace.inline 29.22% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.84% : 0.000113s : 3: match.inline 8.16% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.99% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.19% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.53% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 46.79% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.21% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032109 196 0.01% : 0.000004s : 1: ForceFp32Comm 12.17% : 0.003909s : 1: add_attr 12.14% : 0.003897s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000608s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.30% : 0.000416s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.58% : 0.000508s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000955s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.87% : 0.002207s : 1: opt_a 0.31% : 0.000098s : 1: opt_after_cconv 1.44% : 0.000462s : 1: opt_after_jit_grad 0.58% : 0.000188s : 1: opt_b 12.79% : 0.004106s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.80% : 0.000257s : 1: renormalize.infer 0.68% : 0.000220s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.41% : 0.000132s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000071s : 1: symbol_engine_optimizer 20.53% : 0.006593s : 1: task_emit 0.22% : 0.000072s : 1: 
tuple_transform 19.90% : 0.006390s : 1: type_inference 0.21% : 0.000067s : 1: validate TotalTime = 0.0189449, [24] [bootstrap]: 0.00050708 [type_inference]: 0.00448478 [event_method]: 1.025e-05 [auto_monad]: 5.192e-05 [graph_reusing]: 5.37999e-06 [inline]: 2.19001e-06 [add_attr]: 0.00309049, [1] [add_attr_with_inline]: 0.00308115, [1] [Cycle 1]: 4.769e-05, [2] [tag_attr]: 1.346e-05 [meta_addattr_fg_expand]: 3.6e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.372e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00382577, [53] [py_interpret_to_execute]: 1.737e-05 [rewriter_before_opt_a]: 4.343e-05 [opt_a]: 0.00195323, [2] [Cycle 1]: 0.00135256, [45] [expand_dump_flag]: 2.70002e-06 [switch_simplify]: 2.522e-05 [loop_unroll]: 1.408e-05 [a_1]: 0.00030725 [with_stream_mark]: 1.513e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.95998e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 3.26001e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.86e-05 [accelerated_algorithm]: 6.04001e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 8.75999e-06 [auto_parallel]: 6.07001e-06 [parallel]: 1.91e-05 [flash_sp]: 8.21002e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 8.93002e-06 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.036e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.141e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 
2.24001e-06 [after_resolve]: 1.103e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 0.00041115 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 2.883e-05 [a_3]: 4.047e-05 [Cycle 2]: 0.00059081, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012678 [with_stream_mark]: 1.072e-05 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.857e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.25001e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.14002e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.27e-06 [flash_sp]: 3.7e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.35999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.72001e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64999e-06 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 8.17998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.38997e-06 [a_after_grad]: 8.22e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.05002e-06 [cse]: 1.184e-05 [a_3]: 3.166e-05 [py_interpret_to_execute_after_opt_a]: 7.20998e-06 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 3.269e-05 [convert_after_rewriter]: 6.53e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00050885 [opt_b]: 0.00018144, [1] [Cycle 1]: 0.0001755, [7] 
[b_1]: 0.0001082 [b_2]: 7.03998e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.28002e-06 [updatestate_loads_eliminate]: 2.33002e-06 [renormalize]: 3.19997e-07 [cse]: 1.56e-05 [optimize_parallel_all_gather_comm]: 1.568e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.254e-05 [loop_unroll]: 0.00041076 [opt_after_cconv]: 9.373e-05, [1] [Cycle 1]: 8.815e-05, [7] [c_1]: 2.795e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.552e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.283e-05 [tuple_transform]: 6.964e-05, [1] [Cycle 1]: 6.537e-05, [4] [d_1]: 3.98e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.337e-05 [cse_after_recomputation]: 1.942e-05, [1] [Cycle 1]: 1.522e-05, [1] [cse]: 1.006e-05 [environ_conv]: 5.24e-06 [swap_dp_allreduce_reducescatter]: 5.34998e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.34998e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.58999e-06 [overlap_recompute_and_grad_model_parallel]: 4.13001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.86003e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.35e-06 [overlap_grad_flash_sp]: 1.719e-05 [begin_end_overlap_inline]: 8.50006e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.744e-05, [1] [Cycle 1]: 6.325e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.16002e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.50002e-07 [detach_backward]: 2.11e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.623e-05 [get_jit_bprop_graph]: 1.39e-06 [rewriter_after_jit_bprop_graph]: 3.19001e-06 [opt_after_jit_grad]: 0.00044262 [validate]: 3.222e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00622798 [execute]: 6.93e-06 Sums bootstrap : 0.000507s : 3.40% type_inference : 0.004485s : 30.11% event_method : 0.000010s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.12% optimize.rewriter_before_opt_a : 0.000043s : 0.29% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000434s : 2.91% optimize.opt_a.with_stream_mark : 0.000026s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000411s : 2.76% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000509s : 3.42% optimize.opt_b.b_1 : 0.000108s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000411s : 2.76% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000443s : 2.97% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006228s : 41.81% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000132 26 17.37% : 0.000023s : 4: substitution.arithmetic_simplify 1.66% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000006s : 4: substitution.graph_param_transform 66.68% : 0.000088s : 2: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.25% : 0.000004s : 4: substitution.remove_not_recompute_node 3.46% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004440 2 91.97% : 0.004084s : 1: type_inference.infer 8.03% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000086 2 100.00% : 0.000086s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.02% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.99% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.04% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.97% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.50% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.88% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.43% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.71% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000257 6 42.59% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.41% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027215 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.37% : 0.003095s : 1: add_attr 11.34% : 0.003085s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.97% : 0.000538s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.54% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.91% : 0.000519s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.90% : 0.000789s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.19% : 0.001956s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.66% : 0.000452s : 1: opt_after_jit_grad 0.68% : 0.000185s : 1: opt_b 14.07% : 0.003829s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000021s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.86% : 0.000233s : 1: renormalize.infer 0.63% : 0.000171s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.18% : 0.000048s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 22.93% : 0.006240s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.54% : 0.004501s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0200334, [24] [bootstrap]: 0.00052307 [type_inference]: 0.00571397 [event_method]: 1.379e-05 [auto_monad]: 5.839e-05 [graph_reusing]: 5.96e-06 [inline]: 2.19001e-06 [add_attr]: 0.00299627, [1] [add_attr_with_inline]: 0.0029877, [1] [Cycle 1]: 4.719e-05, [2] [tag_attr]: 1.635e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 2.90002e-06 [pre_auto_parallel]: 2.514e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0039627, [53] [py_interpret_to_execute]: 1.992e-05 [rewriter_before_opt_a]: 5.818e-05 [opt_a]: 0.00211845, [2] [Cycle 1]: 0.0015227, [45] [expand_dump_flag]: 3.31001e-06 [switch_simplify]: 3.299e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.00045011 [with_stream_mark]: 1.378e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 4.28001e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.87001e-06 [a_2]: 7.517e-05 [accelerated_algorithm]: 6.66e-06 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 1.98997e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 8.13001e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.874e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.82999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.23e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 9.54e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.42001e-06 [flash_sp_send_recv_attached]: 2.74001e-06 [receive_attached]: 2.59001e-06 
[after_resolve]: 1.064e-05 [a_after_grad]: 8.63001e-06 [renormalize]: 0.00043002 [add_forward_monad_depend]: 4.73001e-06 [auto_monad_grad]: 1.71998e-06 [auto_monad_eliminator]: 1.349e-05 [cse]: 2.863e-05 [a_3]: 3.997e-05 [Cycle 2]: 0.00058623, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012545 [with_stream_mark]: 1.008e-05 [recompute_prepare]: 5.57999e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 6.681e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.08002e-06 [parallel]: 3.66999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.88002e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 4.77e-06 [virtual_output]: 4.81997e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.73002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.23002e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.58999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 8.82999e-06 [a_after_grad]: 7.77e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.43003e-06 [cse]: 1.54e-05 [a_3]: 3.166e-05 [py_interpret_to_execute_after_opt_a]: 7.47002e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.209e-05 [convert_after_rewriter]: 6.82002e-06 [order_py_execute_after_rewriter]: 5.27999e-06 [mutable_eliminate]: 0.00044973 [opt_b]: 0.00017792, [1] [Cycle 1]: 
0.00017208, [7] [b_1]: 0.00010571 [b_2]: 6.63e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.53998e-06 [renormalize]: 3.39991e-07 [cse]: 1.601e-05 [optimize_parallel_all_gather_comm]: 1.601e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.334e-05 [loop_unroll]: 0.00041 [opt_after_cconv]: 9.609e-05, [1] [Cycle 1]: 9.05e-05, [7] [c_1]: 2.893e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.567e-05 [renormalize]: 5.50004e-07 [remove_dup_value]: 1.249e-05 [tuple_transform]: 8.428e-05, [1] [Cycle 1]: 7.955e-05, [4] [d_1]: 5.23e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 1.86003e-06 [add_recomputation]: 4.658e-05 [cse_after_recomputation]: 2.137e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.164e-05 [environ_conv]: 5.30999e-06 [swap_dp_allreduce_reducescatter]: 5.36998e-06 [bias_add_comm_swap]: 2.67001e-06 [label_micro_interleaved_index]: 4.09002e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.94001e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.209e-05 [grouped_pairwise_exchange_alltoall]: 1.92001e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.16001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.59e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 4.12e-06 [overlap_grad_flash_sp]: 1.724e-05 [begin_end_overlap_inline]: 6.79982e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.654e-05, [1] [Cycle 1]: 6.243e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.07e-06 [elim_not_effective]: 1.105e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 8.66002e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.61002e-06 [auto_monad_reorder]: 1.578e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00044572 [validate]: 3.25e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.0060186 [execute]: 7.57998e-06 Sums bootstrap : 0.000523s : 3.25% type_inference : 0.005714s : 35.51% event_method : 0.000014s : 0.09% auto_monad : 0.000058s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000576s : 3.58% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.88% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000430s : 2.67% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000044s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.79% optimize.opt_b.b_1 : 0.000106s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000410s : 2.55% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000052s : 0.33% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000446s : 2.77% validate : 0.000033s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006019s : 37.40% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000167 30 14.74% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000006s : 4: substitution.graph_param_transform 66.69% : 0.000111s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.64% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005671 2 90.26% : 0.005118s : 1: type_inference.infer 9.74% : 0.000553s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.73% : 0.000027s : 3: replace.inline 30.27% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.63% : 0.000109s : 3: match.inline 8.37% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000170 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 11: predicate.addn_zero_filter 0.72% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 19: predicate.arithmetic_simplify 0.81% : 0.000001s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.52% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_depend_swap 1.70% : 0.000003s : 23: predicate.environ_get_eliminate 1.00% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.18% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 16: predicate.float_depend_g_call 0.52% : 0.000001s : 8: predicate.float_environ_get_switch 0.80% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 7.62% : 0.000013s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 5.76% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.19% : 0.000004s : 32: predicate.load_eliminater 0.89% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 8: predicate.merge_addn 0.58% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.46% : 0.000002s : 16: predicate.partial_defer_inline 1.35% : 0.000002s : 17: predicate.partial_eliminate 0.78% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.52% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.63% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.68% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.83% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.34% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.29% : 0.000002s : 16: predicate.switch_defer_inline 1.88% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.75% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.36% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.57% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.12% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.92% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 46.38% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.62% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028527 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.52% : 0.003001s : 1: add_attr 10.49% : 0.002991s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000063s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.94% : 0.000553s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.47% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.29% : 0.000938s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000088s : 28: opt.transform.opt_b 0.20% : 0.000057s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.44% : 0.002121s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.59% : 0.000455s : 1: opt_after_jit_grad 0.64% : 0.000181s : 1: opt_b 13.91% : 0.003967s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.79% : 0.000224s : 1: renormalize.infer 0.70% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000069s : 1: symbol_engine_optimizer 21.14% : 0.006030s : 1: task_emit 0.31% : 0.000087s : 1: 
tuple_transform 20.08% : 0.005728s : 1: type_inference 0.21% : 0.000060s : 1: validate TotalTime = 0.039051, [24] [bootstrap]: 0.00055083 [type_inference]: 0.0115362 [event_method]: 4.803e-05 [auto_monad]: 0.00012229 [graph_reusing]: 8.44998e-06 [inline]: 2.04e-06 [add_attr]: 0.00304122, [1] [add_attr_with_inline]: 0.00303302, [1] [Cycle 1]: 7.187e-05, [2] [tag_attr]: 3.622e-05 [meta_addattr_fg_expand]: 9.14e-06 [parallel-infer-symbol]: 3.65998e-06 [pre_auto_parallel]: 4.973e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 6.19999e-07 [dataset_repeat_opt]: 2.16998e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0138004, [53] [py_interpret_to_execute]: 3.748e-05 [rewriter_before_opt_a]: 0.000145 [opt_a]: 0.0114456, [3] [Cycle 1]: 0.00725792, [45] [expand_dump_flag]: 3.69002e-06 [switch_simplify]: 7.461e-05 [loop_unroll]: 6.106e-05 [a_1]: 0.00144179 [with_stream_mark]: 2.293e-05 [recompute_prepare]: 2.184e-05 [updatestate_depend_eliminate]: 8.87999e-06 [updatestate_assign_eliminate]: 8.02e-06 [updatestate_loads_eliminate]: 7.38e-06 [parameter_eliminate]: 2.83e-06 [a_2]: 0.00024442 [accelerated_algorithm]: 3.075e-05 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 3.35e-06 [shard_inline]: 1.592e-05 [merge_send_recv]: 1.646e-05 [auto_parallel]: 1.11e-05 [parallel]: 1.847e-05 [flash_sp]: 1.185e-05 [merge_comm]: 9.75002e-06 [allreduce_fusion]: 8.85999e-06 [matmul_add_comm_reduction]: 2.727e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.816e-05 [virtual_dataset]: 1.553e-05 [get_grad_eliminate_]: 1.511e-05 [virtual_output]: 1.514e-05 [merge_forward]: 9.64e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.796e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 2.867e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66e-06 [meta_fg_expand]: 0.00147242 [flash_sp_send_recv_attached]: 4.35e-06 [receive_attached]: 2.69001e-06 [after_resolve]: 
6.038e-05 [a_after_grad]: 8.12e-05 [renormalize]: 0.0025641 [add_forward_monad_depend]: 9.77999e-06 [auto_monad_grad]: 5.91e-06 [auto_monad_eliminator]: 5.666e-05 [cse]: 0.00016542 [a_3]: 0.00034362 [Cycle 2]: 0.00317803, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.822e-05 [loop_unroll]: 6.8e-05 [a_1]: 0.00158497 [with_stream_mark]: 1.223e-05 [recompute_prepare]: 1.145e-05 [updatestate_depend_eliminate]: 5.72999e-06 [updatestate_assign_eliminate]: 4.74e-06 [updatestate_loads_eliminate]: 4.20999e-06 [parameter_eliminate]: 1.27e-06 [a_2]: 0.00013884 [accelerated_algorithm]: 1.223e-05 [shard]: 1.30001e-06 [meta_shard_fg_expand]: 2.18002e-06 [shard_inline]: 9.94001e-06 [merge_send_recv]: 8.2e-06 [auto_parallel]: 7.71999e-06 [parallel]: 4.90999e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 5.76998e-06 [allreduce_fusion]: 5.14998e-06 [matmul_add_comm_reduction]: 8.95001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.103e-05 [virtual_dataset]: 9.53002e-06 [get_grad_eliminate_]: 9.34e-06 [virtual_output]: 9.37999e-06 [merge_forward]: 4.99003e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.935e-05 [merge_recompute_call_nodes]: 8.10018e-07 [before_grad]: 1.647e-05 [set_forward_comm_id_for_comm_node_pass]: 5.96e-06 [meta_fg_expand]: 7.862e-05 [flash_sp_send_recv_attached]: 1.01002e-06 [receive_attached]: 1.30999e-06 [after_resolve]: 1.609e-05 [a_after_grad]: 1.554e-05 [renormalize]: 0.00063989 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.539e-05 [cse]: 4.566e-05 [a_3]: 7.091e-05 [Cycle 3]: 0.00099524, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.13e-05 [loop_unroll]: 1.024e-05 [a_1]: 0.00027991 [with_stream_mark]: 1.021e-05 [recompute_prepare]: 1.006e-05 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 4.73001e-06 [updatestate_loads_eliminate]: 4.38001e-06 
[parameter_eliminate]: 9.09989e-07 [a_2]: 0.00013542 [accelerated_algorithm]: 1.265e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 1.032e-05 [merge_send_recv]: 7.15e-06 [auto_parallel]: 7.45998e-06 [parallel]: 4.53999e-06 [flash_sp]: 1.02e-06 [merge_comm]: 5.71e-06 [allreduce_fusion]: 5.19998e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.098e-05 [virtual_dataset]: 9.76e-06 [get_grad_eliminate_]: 9.40001e-06 [virtual_output]: 9.14e-06 [merge_forward]: 4.65999e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 9.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.795e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.55e-05 [set_forward_comm_id_for_comm_node_pass]: 6.29999e-06 [meta_fg_expand]: 3.3e-06 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.544e-05 [a_after_grad]: 1.57e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.42999e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 1.138e-05 [cse]: 4.349e-05 [a_3]: 6.672e-05 [py_interpret_to_execute_after_opt_a]: 1.029e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 5.315e-05 [convert_after_rewriter]: 1.059e-05 [order_py_execute_after_rewriter]: 7.20003e-06 [mutable_eliminate]: 0.00047981 [opt_b]: 0.0003128, [1] [Cycle 1]: 0.00030626, [7] [b_1]: 0.00020968 [b_2]: 1.163e-05 [updatestate_depend_eliminate]: 7.91001e-06 [updatestate_assign_eliminate]: 4.72e-06 [updatestate_loads_eliminate]: 4.37998e-06 [renormalize]: 5.00004e-07 [cse]: 3.307e-05 [optimize_parallel_all_gather_comm]: 2.163e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.122e-05 [loop_unroll]: 0.0004184 [opt_after_cconv]: 0.00014644, [1] [Cycle 1]: 0.00014032, [7] [c_1]: 5.352e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 7.6e-06 [updatestate_assign_eliminate]: 4.90999e-06 [updatestate_loads_eliminate]: 
4.33001e-06 [cse]: 3.259e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 3.704e-05 [tuple_transform]: 0.00010994, [1] [Cycle 1]: 0.00010502, [4] [d_1]: 7.34e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 1.096e-05 [partial_unused_args_eliminate]: 1.86e-06 [add_recomputation]: 6.166e-05 [cse_after_recomputation]: 3.429e-05, [1] [Cycle 1]: 2.92e-05, [1] [cse]: 2.381e-05 [environ_conv]: 9.02999e-06 [swap_dp_allreduce_reducescatter]: 8.37998e-06 [bias_add_comm_swap]: 3.28998e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.60002e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.18998e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.34998e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.905e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 5.47999e-06 [overlap_recompute_and_grad_model_parallel]: 5.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 5.39998e-06 [overlap_grad_flash_sp]: 2.728e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.42e-06 [symbol_engine_optimizer]: 0.0001013, [1] [Cycle 1]: 9.693e-05, [6] [build]: 1.003e-05 [elim_shapecalc]: 1.296e-05 [elim_not_effective]: 1.912e-05 [opt_reshape]: 1.059e-05 [fold_const_symbol]: 1.668e-05 [renormalize]: 2.20025e-07 [detach_backward]: 
1.94999e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 2.434e-05 [get_jit_bprop_graph]: 1.32999e-06 [rewriter_after_jit_bprop_graph]: 3.55003e-06 [opt_after_jit_grad]: 0.00046613 [validate]: 4.681e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00910589 [execute]: 8.58001e-06 Sums bootstrap : 0.000551s : 1.59% type_inference : 0.011536s : 33.21% event_method : 0.000048s : 0.14% auto_monad : 0.000122s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000050s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000145s : 0.42% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000134s : 0.39% optimize.opt_a.loop_unroll : 0.000139s : 0.40% optimize.opt_a.a_1 : 0.003307s : 9.52% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000043s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000519s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.16% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.10% optimize.opt_a.merge_send_recv : 0.000032s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000034s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000061s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.06% optimize.opt_a.meta_fg_expand : 0.001554s : 4.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000092s : 0.26% optimize.opt_a.a_after_grad : 0.000112s : 0.32% optimize.opt_a.renormalize : 0.003204s : 9.22% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.24% optimize.opt_a.cse : 0.000255s : 0.73% optimize.opt_a.a_3 : 0.000481s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000053s : 0.15% optimize.convert_after_rewriter : 0.000011s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000480s : 1.38% optimize.opt_b.b_1 : 0.000210s : 0.60% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000418s : 1.20% optimize.opt_after_cconv.c_1 : 0.000054s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000033s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000037s : 0.11% optimize.tuple_transform.d_1 : 0.000073s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000062s : 0.18% optimize.cse_after_recomputation.cse : 0.000024s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000466s : 1.34% validate : 0.000047s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.009106s : 26.21% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000777 231 5.89% : 0.000046s : 12: substitution.arithmetic_simplify 2.40% : 0.000019s : 4: substitution.cast_eliminate 0.38% : 0.000003s : 6: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 6: substitution.fold_const_symbol 1.11% : 0.000009s : 9: substitution.graph_param_transform 0.42% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.04% : 0.000428s : 17: substitution.inline 2.01% : 0.000016s : 2: substitution.inline_without_move 1.41% : 0.000011s : 22: substitution.j_node_and_user_rematch 1.89% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000014s : 11: substitution.minmaximum_grad 0.64% : 0.000005s : 5: substitution.partial_eliminate 2.00% : 0.000016s : 22: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.54% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.64% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011461 2 86.72% : 0.009939s : 1: type_inference.infer 13.28% : 0.001522s : 1: type_inference.specialize ------[replace.] 0.000215 33 57.30% : 0.000123s : 17: replace.inline 42.70% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000453 33 92.45% : 0.000419s : 17: match.inline 7.55% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000806 5998 1.04% : 0.000008s : 70: predicate.accumulaten_eliminater 0.32% : 0.000003s : 9: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 34: predicate.addn_check_dump 1.05% : 0.000008s : 70: predicate.addn_zero_filter 1.01% : 0.000008s : 70: predicate.adjust_all_reduce_mul_add 1.95% : 0.000016s : 104: predicate.arithmetic_simplify 1.09% : 0.000009s : 70: predicate.cast_eliminate 1.12% : 0.000009s : 71: predicate.check_bprop_eliminate 0.52% : 0.000004s : 34: predicate.compare_switch_simplify 0.10% : 0.000001s : 9: predicate.const_output_eliminate 0.52% : 0.000004s : 34: predicate.depend_value_elim 1.14% : 0.000009s : 70: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 70: predicate.dict_get_item_eliminator 1.08% : 0.000009s : 70: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 18: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 9: predicate.elim_not_effective 0.16% : 0.000001s : 9: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000010s : 79: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 79: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 79: predicate.environ_get_depend_swap 1.72% : 0.000014s : 113: predicate.environ_get_eliminate 1.18% : 0.000010s : 79: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 103: predicate.exchange_switch_depend_value 2.17% : 0.000017s : 103: predicate.float_depend_g_call 0.51% : 0.000004s : 34: predicate.float_environ_get_switch 0.68% : 0.000005s : 43: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 9: predicate.fold_const_symbol 0.55% : 0.000004s : 34: predicate.get_grad_eliminate 0.09% : 0.000001s : 9: predicate.graph_param_transform 0.54% : 0.000004s : 34: predicate.incorporate_call 0.50% : 0.000004s : 34: predicate.incorporate_call_switch 5.45% : 0.000044s : 259: predicate.inline 1.24% : 0.000010s : 57: predicate.inline_without_move 0.30% : 0.000002s : 34: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 34: predicate.less_batch_normalization 
1.61% : 0.000013s : 104: predicate.list_to_tuple_eliminator_ 2.60% : 0.000021s : 174: predicate.load_eliminater 0.32% : 0.000003s : 9: predicate.loop_unroll_after_grad 4.95% : 0.000040s : 138: predicate.loop_unroll_before_grad 1.37% : 0.000011s : 88: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 34: predicate.merge_addn 1.10% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.10% : 0.000009s : 71: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 70: predicate.minmaximum_grad 0.33% : 0.000003s : 9: predicate.mutable_eliminate 0.15% : 0.000001s : 9: predicate.opt_reshape 0.17% : 0.000001s : 9: predicate.parallel_virtual_node 1.89% : 0.000015s : 103: predicate.partial_defer_inline 1.68% : 0.000014s : 95: predicate.partial_eliminate 1.02% : 0.000008s : 70: predicate.print_const_string_wrapper 0.53% : 0.000004s : 34: predicate.reduce_all_const_elim 1.28% : 0.000010s : 70: predicate.reduce_eliminate 2.62% : 0.000021s : 174: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 34: predicate.remove_not_recompute_node 1.84% : 0.000015s : 157: predicate.replace_applicator 0.60% : 0.000005s : 57: predicate.replace_old_param 0.11% : 0.000001s : 9: predicate.reset_defer_inline 1.04% : 0.000008s : 70: predicate.reshape_eliminate 1.11% : 0.000009s : 71: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 9: predicate.row_tensor_eliminate 1.25% : 0.000010s : 71: predicate.same_eliminate 0.37% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 34: predicate.shard_identity_eliminate 0.31% : 0.000002s : 18: predicate.special_op_eliminate 0.63% : 0.000005s : 34: predicate.specialize_transform 1.19% : 0.000010s : 71: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000009s : 57: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 9: predicate.switch_call_monad_eliminater 1.77% : 0.000014s : 103: predicate.switch_defer_inline 2.82% : 0.000023s : 174: predicate.switch_layer_defer_inline 4.77% : 
0.000038s : 284: predicate.switch_simplify 1.04% : 0.000008s : 70: predicate.tile_eliminate 1.03% : 0.000008s : 70: predicate.transpose_eliminate 1.42% : 0.000011s : 88: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000012s : 88: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000011s : 88: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000022s : 138: predicate.tuple_list_get_item_eliminator 1.38% : 0.000011s : 88: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000016s : 122: predicate.tuple_list_set_item_eliminator 1.62% : 0.000013s : 104: predicate.tuple_to_list_eliminator_ 2.58% : 0.000021s : 174: predicate.updatestate_pure_node_eliminater 3.17% : 0.000026s : 208: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 9: predicate.value_based_eliminate 0.56% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 34: predicate.virtual_output_eliminate 0.15% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001644 34 58.62% : 0.000963s : 13: func_graph_cloner_run.FuncGraphClonerGraph 41.38% : 0.000680s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064535 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.72% : 0.003046s : 1: add_attr 4.71% : 0.003037s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000129s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.90% : 0.000580s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000022s : 1: control_data_broadcast_order 0.02% : 0.000014s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000488s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.85% : 0.005069s : 117: opt.transform.opt_a 0.08% : 0.000052s : 1: opt.transform.opt_after_cconv 0.06% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000196s : 28: opt.transform.opt_b 0.13% : 0.000082s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000056s : 4: opt.transform.symbol_engine_opt 17.74% : 0.011449s : 1: opt_a 0.23% : 0.000150s : 1: opt_after_cconv 0.74% : 0.000476s : 1: opt_after_jit_grad 0.49% : 0.000316s : 1: opt_b 21.39% : 0.013804s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.06% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000042s : 1: remove_dup_value 2.75% : 0.001772s : 2: renormalize.infer 2.20% : 0.001419s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000057s : 1: rewriter_after_opt_a 0.23% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000104s : 1: symbol_engine_optimizer 14.15% : 0.009131s : 1: task_emit 0.17% : 0.000113s : 1: 
tuple_transform 17.90% : 0.011551s : 1: type_inference 0.13% : 0.000082s : 1: validate TotalTime = 0.0187105, [24] [bootstrap]: 0.00051475 [type_inference]: 0.00436743 [event_method]: 1.08e-05 [auto_monad]: 5.409e-05 [graph_reusing]: 5.47999e-06 [inline]: 2.07999e-06 [add_attr]: 0.00301359, [1] [add_attr_with_inline]: 0.00300606, [1] [Cycle 1]: 4.312e-05, [2] [tag_attr]: 1.201e-05 [meta_addattr_fg_expand]: 3.65e-06 [parallel-infer-symbol]: 3.39001e-06 [pre_auto_parallel]: 2.19e-05 [insert-virtual-dataset]: 2.96999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00367602, [53] [py_interpret_to_execute]: 1.507e-05 [rewriter_before_opt_a]: 3.905e-05 [opt_a]: 0.00188183, [2] [Cycle 1]: 0.00126329, [45] [expand_dump_flag]: 3.31001e-06 [switch_simplify]: 2.442e-05 [loop_unroll]: 1.419e-05 [a_1]: 0.00029593 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.23e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.45e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.86998e-06 [a_2]: 7.7e-05 [accelerated_algorithm]: 6.06e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.49998e-06 [auto_parallel]: 5.66e-06 [parallel]: 1.875e-05 [flash_sp]: 7.58001e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 9.14998e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.49002e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.072e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.49e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.31e-06 
[after_resolve]: 1.047e-05 [a_after_grad]: 9.04998e-06 [renormalize]: 0.00034588 [add_forward_monad_depend]: 4.14002e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.307e-05 [cse]: 2.676e-05 [a_3]: 4.03e-05 [Cycle 2]: 0.00060952, [45] [expand_dump_flag]: 9.40025e-07 [switch_simplify]: 6.76e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012397 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.36998e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.765e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.18999e-06 [auto_parallel]: 5.06002e-06 [parallel]: 4.60999e-06 [flash_sp]: 3.43e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.18002e-06 [virtual_dataset]: 5.10999e-06 [get_grad_eliminate_]: 4.80999e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32999e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 7.87003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 9.41e-06 [a_after_grad]: 8.02e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.274e-05 [a_3]: 3.238e-05 [py_interpret_to_execute_after_opt_a]: 6.96999e-06 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 3.196e-05 [convert_after_rewriter]: 6.64001e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00044248 [opt_b]: 0.00018121, [1] [Cycle 1]: 0.00017482, [7] 
[b_1]: 0.00010884 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 3.50003e-07 [cse]: 1.524e-05 [optimize_parallel_all_gather_comm]: 1.598e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 2.238e-05 [loop_unroll]: 0.00040634 [opt_after_cconv]: 9.314e-05, [1] [Cycle 1]: 8.753e-05, [7] [c_1]: 2.76e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.512e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.317e-05 [tuple_transform]: 6.903e-05, [1] [Cycle 1]: 6.455e-05, [4] [d_1]: 3.903e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 4.512e-05 [cse_after_recomputation]: 1.993e-05, [1] [Cycle 1]: 1.55e-05, [1] [cse]: 1.018e-05 [environ_conv]: 4.96002e-06 [swap_dp_allreduce_reducescatter]: 5.08002e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.64998e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.104e-05 [grouped_pairwise_exchange_alltoall]: 1.91998e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.78e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 6.817e-05, [1] [Cycle 1]: 6.411e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.09002e-06 [elim_not_effective]: 1.163e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 9.04998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.76003e-06 [auto_monad_reorder]: 1.592e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00044328 [validate]: 3.015e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00633624 [execute]: 6.78e-06 Sums bootstrap : 0.000515s : 3.50% type_inference : 0.004367s : 29.66% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000420s : 2.85% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000346s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000442s : 3.00% optimize.opt_b.b_1 : 0.000109s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000406s : 2.76% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000443s : 3.01% validate : 0.000030s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006336s : 43.03% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.14% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 66.03% : 0.000081s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.35% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004325 2 91.86% : 0.003973s : 1: type_inference.infer 8.14% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.42% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.93% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.72% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.21% : 0.000002s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 42.56% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.44% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026676 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.31% : 0.003018s : 1: add_attr 11.28% : 0.003009s : 1: add_attr_with_inline 0.02% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.05% : 0.000547s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.56% : 0.000415s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000451s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.89% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.07% : 0.001885s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.70% : 0.000452s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 13.80% : 0.003680s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.72% : 0.000191s : 1: renormalize.infer 0.56% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.79% : 0.006346s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.42% : 0.004381s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0375933, [24] [bootstrap]: 0.00047397 [type_inference]: 0.0102992 [event_method]: 4.475e-05 [auto_monad]: 0.00012099 [graph_reusing]: 8.05e-06 [inline]: 2.16e-06 [add_attr]: 0.00311046, [1] [add_attr_with_inline]: 0.00310162, [1] [Cycle 1]: 7.035e-05, [2] [tag_attr]: 3.434e-05 [meta_addattr_fg_expand]: 8.37998e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 4.791e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.0137312, [53] [py_interpret_to_execute]: 4.323e-05 [rewriter_before_opt_a]: 0.0001315 [opt_a]: 0.0113511, [3] [Cycle 1]: 0.00718809, [45] [expand_dump_flag]: 3.65998e-06 [switch_simplify]: 6.716e-05 [loop_unroll]: 5.418e-05 [a_1]: 0.00135997 [with_stream_mark]: 2.344e-05 [recompute_prepare]: 2.16e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 8.01001e-06 [updatestate_loads_eliminate]: 7.76001e-06 [parameter_eliminate]: 2.67001e-06 [a_2]: 0.00025311 [accelerated_algorithm]: 3.121e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.642e-05 [merge_send_recv]: 1.613e-05 [auto_parallel]: 1.105e-05 [parallel]: 1.925e-05 [flash_sp]: 1.149e-05 [merge_comm]: 9.51e-06 [allreduce_fusion]: 8.70999e-06 [matmul_add_comm_reduction]: 2.746e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.776e-05 [virtual_dataset]: 1.544e-05 [get_grad_eliminate_]: 1.576e-05 [virtual_output]: 1.565e-05 [merge_forward]: 9.32999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.827e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.867e-05 [merge_recompute_call_nodes]: 1.69998e-06 [before_grad]: 2.804e-05 [set_forward_comm_id_for_comm_node_pass]: 1.994e-05 [meta_fg_expand]: 0.00138431 [flash_sp_send_recv_attached]: 3.47002e-06 [receive_attached]: 2.82002e-06 [after_resolve]: 5.95e-05 
[a_after_grad]: 8.245e-05 [renormalize]: 0.00265525 [add_forward_monad_depend]: 9.34e-06 [auto_monad_grad]: 5.27001e-06 [auto_monad_eliminator]: 5.693e-05 [cse]: 0.00016823 [a_3]: 0.00034377 [Cycle 2]: 0.00316799, [45] [expand_dump_flag]: 1.51998e-06 [switch_simplify]: 4.84e-05 [loop_unroll]: 4.404e-05 [a_1]: 0.0015744 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 1.17e-05 [updatestate_depend_eliminate]: 6.04001e-06 [updatestate_assign_eliminate]: 4.87998e-06 [updatestate_loads_eliminate]: 4.43999e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00014084 [accelerated_algorithm]: 1.303e-05 [shard]: 1.17e-06 [meta_shard_fg_expand]: 2.16998e-06 [shard_inline]: 1.032e-05 [merge_send_recv]: 8.30999e-06 [auto_parallel]: 8.80001e-06 [parallel]: 5.15001e-06 [flash_sp]: 3.5e-06 [merge_comm]: 5.81e-06 [allreduce_fusion]: 5.46002e-06 [matmul_add_comm_reduction]: 8.87e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.205e-05 [virtual_dataset]: 9.56998e-06 [get_grad_eliminate_]: 9.76e-06 [virtual_output]: 9.72999e-06 [merge_forward]: 5.02e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.055e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.864e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.599e-05 [set_forward_comm_id_for_comm_node_pass]: 6.21e-06 [meta_fg_expand]: 4.309e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.579e-05 [a_after_grad]: 1.663e-05 [renormalize]: 0.00063204 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 1.541e-05 [cse]: 4.889e-05 [a_3]: 0.00012826 [Cycle 3]: 0.00098119, [45] [expand_dump_flag]: 1.17999e-06 [switch_simplify]: 1.204e-05 [loop_unroll]: 1.034e-05 [a_1]: 0.00028209 [with_stream_mark]: 1.085e-05 [recompute_prepare]: 9.94001e-06 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 4.50999e-06 [updatestate_loads_eliminate]: 4.36002e-06 [parameter_eliminate]: 
8.39995e-07 [a_2]: 0.00013758 [accelerated_algorithm]: 1.261e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.94999e-06 [shard_inline]: 1.039e-05 [merge_send_recv]: 7.18e-06 [auto_parallel]: 7.71001e-06 [parallel]: 4.83001e-06 [flash_sp]: 8.99978e-07 [merge_comm]: 5.86e-06 [allreduce_fusion]: 5.19e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.134e-05 [virtual_dataset]: 9.86e-06 [get_grad_eliminate_]: 9.77001e-06 [virtual_output]: 9.20001e-06 [merge_forward]: 5.07999e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 1.004e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.864e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.586e-05 [set_forward_comm_id_for_comm_node_pass]: 6.36e-06 [meta_fg_expand]: 3.56999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.435e-05 [a_after_grad]: 1.567e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 1.218e-05 [cse]: 2.894e-05 [a_3]: 6.322e-05 [py_interpret_to_execute_after_opt_a]: 1.161e-05 [slice_cell_reuse_recomputed_activation]: 2.62001e-06 [rewriter_after_opt_a]: 5.694e-05 [convert_after_rewriter]: 1.03e-05 [order_py_execute_after_rewriter]: 7.95e-06 [mutable_eliminate]: 0.00049317 [opt_b]: 0.00031621, [1] [Cycle 1]: 0.00030935, [7] [b_1]: 0.00021083 [b_2]: 1.192e-05 [updatestate_depend_eliminate]: 7.95e-06 [updatestate_assign_eliminate]: 4.69998e-06 [updatestate_loads_eliminate]: 4.48999e-06 [renormalize]: 8.60018e-07 [cse]: 3.378e-05 [optimize_parallel_all_gather_comm]: 2.263e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.185e-05 [loop_unroll]: 0.00041772 [opt_after_cconv]: 0.00014767, [1] [Cycle 1]: 0.00014162, [7] [c_1]: 5.473e-05 [parameter_eliminate]: 2.64001e-06 [updatestate_depend_eliminate]: 7.8e-06 [updatestate_assign_eliminate]: 4.58999e-06 [updatestate_loads_eliminate]: 4.38001e-06 [cse]: 
3.25e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 3.67e-05 [tuple_transform]: 0.00011029, [1] [Cycle 1]: 0.00010539, [4] [d_1]: 7.376e-05 [none_parameter_eliminate]: 1.90001e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 1.092e-05 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 6.35e-05 [cse_after_recomputation]: 3.405e-05, [1] [Cycle 1]: 2.929e-05, [1] [cse]: 2.413e-05 [environ_conv]: 8.94e-06 [swap_dp_allreduce_reducescatter]: 8.99e-06 [bias_add_comm_swap]: 3.23e-06 [label_micro_interleaved_index]: 4.33001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.39999e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 3.03998e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.952e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.62999e-06 [overlap_recompute_and_grad_model_parallel]: 5.79999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 5.96e-06 [overlap_grad_flash_sp]: 2.694e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 0.00010199, [1] [Cycle 1]: 9.773e-05, [6] [build]: 1.088e-05 [elim_shapecalc]: 1.377e-05 [elim_not_effective]: 1.915e-05 [opt_reshape]: 1.042e-05 [fold_const_symbol]: 1.635e-05 [renormalize]: 2.69996e-07 [detach_backward]: 1.92001e-06 
[pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.553e-05 [get_jit_bprop_graph]: 1.16002e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00045893 [validate]: 4.675e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00897877 [execute]: 7.3e-06 Sums bootstrap : 0.000474s : 1.43% type_inference : 0.010299s : 31.01% event_method : 0.000045s : 0.13% auto_monad : 0.000121s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000043s : 0.13% optimize.rewriter_before_opt_a : 0.000132s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.38% optimize.opt_a.loop_unroll : 0.000109s : 0.33% optimize.opt_a.a_1 : 0.003216s : 9.68% optimize.opt_a.with_stream_mark : 0.000047s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000532s : 1.60% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000037s : 0.11% optimize.opt_a.merge_send_recv : 0.000032s : 0.10% optimize.opt_a.auto_parallel : 0.000028s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.11% optimize.opt_a.virtual_output : 0.000035s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000039s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000060s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000033s : 0.10% optimize.opt_a.meta_fg_expand : 0.001431s : 4.31% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000115s : 0.35% optimize.opt_a.renormalize : 0.003287s : 9.90% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.25% optimize.opt_a.cse : 0.000246s : 0.74% optimize.opt_a.a_3 : 0.000535s : 1.61% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000057s : 0.17% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000008s : 0.02% optimize.mutable_eliminate : 0.000493s : 1.48% optimize.opt_b.b_1 : 0.000211s : 0.63% optimize.opt_b.b_2 : 0.000012s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.07% optimize.loop_unroll : 0.000418s : 1.26% optimize.opt_after_cconv.c_1 : 0.000055s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000037s : 0.11% optimize.tuple_transform.d_1 : 0.000074s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.19% optimize.cse_after_recomputation.cse : 0.000024s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000020s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000459s : 1.38% validate : 0.000047s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008979s : 27.03% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000772 227 5.77% : 0.000045s : 11: substitution.arithmetic_simplify 2.47% : 0.000019s : 4: substitution.cast_eliminate 0.38% : 0.000003s : 6: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 1.06% : 0.000008s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 6: substitution.fold_const_symbol 1.12% : 0.000009s : 9: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.31% : 0.000002s : 2: substitution.incorporate_call_switch 54.95% : 0.000424s : 16: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000011s : 22: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000006s : 5: substitution.partial_eliminate 1.90% : 0.000015s : 22: substitution.remove_not_recompute_node 3.18% : 0.000025s : 10: substitution.replace_applicator 1.30% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.20% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010229 2 86.94% : 0.008893s : 1: type_inference.infer 13.06% : 0.001336s : 1: type_inference.specialize ------[replace.] 0.000201 30 59.15% : 0.000119s : 16: replace.inline 40.85% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000447 30 92.95% : 0.000416s : 16: match.inline 7.05% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000767 5897 1.08% : 0.000008s : 69: predicate.accumulaten_eliminater 0.31% : 0.000002s : 9: predicate.ad_related_special_op_eliminate 0.54% : 0.000004s : 34: predicate.addn_check_dump 1.05% : 0.000008s : 69: predicate.addn_zero_filter 1.04% : 0.000008s : 69: predicate.adjust_all_reduce_mul_add 2.05% : 0.000016s : 103: predicate.arithmetic_simplify 1.22% : 0.000009s : 69: predicate.cast_eliminate 1.16% : 0.000009s : 71: predicate.check_bprop_eliminate 0.54% : 0.000004s : 34: predicate.compare_switch_simplify 0.10% : 0.000001s : 9: predicate.const_output_eliminate 0.53% : 0.000004s : 34: predicate.depend_value_elim 1.18% : 0.000009s : 69: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 69: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 69: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 18: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 9: predicate.elim_not_effective 0.18% : 0.000001s : 9: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000010s : 78: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 78: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 78: predicate.environ_get_depend_swap 1.76% : 0.000014s : 112: predicate.environ_get_eliminate 1.20% : 0.000009s : 78: predicate.environ_get_set_eliminate 1.66% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.19% : 0.000017s : 99: predicate.float_depend_g_call 0.53% : 0.000004s : 34: predicate.float_environ_get_switch 0.69% : 0.000005s : 43: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 9: predicate.fold_const_symbol 0.57% : 0.000004s : 34: predicate.get_grad_eliminate 0.10% : 0.000001s : 9: predicate.graph_param_transform 0.56% : 0.000004s : 34: predicate.incorporate_call 0.50% : 0.000004s : 34: predicate.incorporate_call_switch 5.61% : 0.000043s : 254: predicate.inline 1.29% : 0.000010s : 57: predicate.inline_without_move 0.31% : 0.000002s : 34: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 34: predicate.less_batch_normalization 
1.61% : 0.000012s : 101: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 170: predicate.load_eliminater 0.33% : 0.000003s : 9: predicate.loop_unroll_after_grad 2.17% : 0.000017s : 130: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 87: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 34: predicate.merge_addn 1.14% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 71: predicate.mini_step_allgather_replace 1.11% : 0.000009s : 69: predicate.minmaximum_grad 0.33% : 0.000003s : 9: predicate.mutable_eliminate 0.16% : 0.000001s : 9: predicate.opt_reshape 0.17% : 0.000001s : 9: predicate.parallel_virtual_node 1.91% : 0.000015s : 99: predicate.partial_defer_inline 1.68% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 69: predicate.print_const_string_wrapper 0.55% : 0.000004s : 34: predicate.reduce_all_const_elim 1.31% : 0.000010s : 69: predicate.reduce_eliminate 2.72% : 0.000021s : 170: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 34: predicate.remove_not_recompute_node 1.87% : 0.000014s : 154: predicate.replace_applicator 0.60% : 0.000005s : 57: predicate.replace_old_param 0.12% : 0.000001s : 9: predicate.reset_defer_inline 1.09% : 0.000008s : 69: predicate.reshape_eliminate 1.17% : 0.000009s : 71: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 9: predicate.row_tensor_eliminate 1.27% : 0.000010s : 71: predicate.same_eliminate 0.39% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 34: predicate.shard_identity_eliminate 0.32% : 0.000002s : 18: predicate.special_op_eliminate 0.66% : 0.000005s : 34: predicate.specialize_transform 1.29% : 0.000010s : 71: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 57: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 9: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 99: predicate.switch_defer_inline 2.88% : 0.000022s : 170: predicate.switch_layer_defer_inline 4.78% : 0.000037s 
: 272: predicate.switch_simplify 1.07% : 0.000008s : 69: predicate.tile_eliminate 1.06% : 0.000008s : 69: predicate.transpose_eliminate 1.45% : 0.000011s : 87: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000012s : 87: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 87: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000021s : 135: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 87: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000016s : 121: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 101: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 170: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 204: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 9: predicate.value_based_eliminate 0.60% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 34: predicate.virtual_output_eliminate 0.15% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.20% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001514 32 57.27% : 0.000867s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.73% : 0.000647s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063101 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.94% : 0.003115s : 1: add_attr 4.92% : 0.003105s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000068s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.81% : 0.000511s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000023s : 1: control_data_broadcast_order 0.02% : 0.000014s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.80% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.94% : 0.005013s : 117: opt.transform.opt_a 0.08% : 0.000053s : 1: opt.transform.opt_after_cconv 0.06% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000197s : 28: opt.transform.opt_b 0.13% : 0.000082s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000056s : 4: opt.transform.symbol_engine_opt 17.99% : 0.011354s : 1: opt_a 0.24% : 0.000151s : 1: opt_after_cconv 0.74% : 0.000468s : 1: opt_after_jit_grad 0.51% : 0.000320s : 1: opt_b 21.77% : 0.013736s : 1: optimize 0.04% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.05% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000053s : 1: pre_auto_parallel 0.08% : 0.000048s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000041s : 1: remove_dup_value 2.83% : 0.001784s : 2: renormalize.infer 2.36% : 0.001490s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000062s : 1: rewriter_after_opt_a 0.22% : 0.000136s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000105s : 1: symbol_engine_optimizer 14.25% : 0.008990s : 1: task_emit 0.18% : 0.000113s : 1: 
tuple_transform 16.35% : 0.010316s : 1: type_inference 0.13% : 0.000081s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x0-kbk],max_mem:38.0M . TotalTime = 0.881503, [24] [bootstrap]: 0.00062493 [type_inference]: 0.00645873 [event_method]: 1.42e-05 [auto_monad]: 6.194e-05 [graph_reusing]: 5.31998e-06 [inline]: 2.24001e-06 [add_attr]: 0.00374038, [1] [add_attr_with_inline]: 0.00372696, [1] [Cycle 1]: 5.425e-05, [2] [tag_attr]: 1.823e-05 [meta_addattr_fg_expand]: 4.18999e-06 [parallel-infer-symbol]: 3.33998e-06 [pre_auto_parallel]: 3.332e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.92999e-06 [optimize]: 0.00435171, [53] [py_interpret_to_execute]: 2.154e-05 [rewriter_before_opt_a]: 6.393e-05 [opt_a]: 0.00241621, [2] [Cycle 1]: 0.00181225, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 3.291e-05 [loop_unroll]: 2.106e-05 [a_1]: 0.00048495 [with_stream_mark]: 1.54e-05 [recompute_prepare]: 7.83999e-06 [updatestate_depend_eliminate]: 4.32998e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 1.81998e-06 [a_2]: 7.711e-05 [accelerated_algorithm]: 6.30002e-06 [shard]: 2.73e-06 [meta_shard_fg_expand]: 1.99e-06 [shard_inline]: 6.31e-06 [merge_send_recv]: 8.68001e-06 [auto_parallel]: 6.00002e-06 [parallel]: 2.632e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.72002e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 9.77001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.48e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 1.093e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 
[merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 1.039e-05 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 3.00998e-06 [flash_sp_send_recv_attached]: 2.93e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 1.096e-05 [a_after_grad]: 8.93002e-06 [renormalize]: 0.00065056 [add_forward_monad_depend]: 5.30001e-06 [auto_monad_grad]: 1.96e-06 [auto_monad_eliminator]: 1.476e-05 [cse]: 3.007e-05 [a_3]: 4.186e-05 [Cycle 2]: 0.00059435, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012682 [with_stream_mark]: 1.039e-05 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.811e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.36998e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.86002e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.92e-06 [flash_sp]: 3.43e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 6.01998e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.67e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.87999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 9.38002e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.83e-06 [cse]: 1.352e-05 [a_3]: 3.119e-05 [py_interpret_to_execute_after_opt_a]: 8.17003e-06 
[slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.269e-05 [convert_after_rewriter]: 6.81999e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00050686 [opt_b]: 0.00018536, [1] [Cycle 1]: 0.00017919, [7] [b_1]: 0.00010872 [b_2]: 7.06001e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 7.50006e-07 [cse]: 1.783e-05 [optimize_parallel_all_gather_comm]: 1.838e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.572e-05 [loop_unroll]: 0.00041536 [opt_after_cconv]: 9.618e-05, [1] [Cycle 1]: 9.026e-05, [7] [c_1]: 2.852e-05 [parameter_eliminate]: 2.69001e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.599e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.363e-05 [tuple_transform]: 7.033e-05, [1] [Cycle 1]: 6.601e-05, [4] [d_1]: 4.023e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.23998e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 5.408e-05 [cse_after_recomputation]: 2.112e-05, [1] [Cycle 1]: 1.632e-05, [1] [cse]: 1.107e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 3.13e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.33002e-06 [micro_interleaved_order_control]: 3.09999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.55997e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.21997e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.73002e-06 [control_data_broadcast_order]: 1.247e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.872e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.907e-05, [1] [Cycle 1]: 6.503e-05, [6] [build]: 2.94001e-06 [elim_shapecalc]: 8.54e-06 [elim_not_effective]: 1.214e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.95001e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.688e-05 [get_jit_bprop_graph]: 1.34e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00045526 [validate]: 3.504e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.865433 [execute]: 9.71e-06 Sums bootstrap : 0.000625s : 0.07% type_inference : 0.006459s : 0.74% event_method : 0.000014s : 0.00% auto_monad : 0.000062s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000033s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.00% optimize.rewriter_before_opt_a : 0.000064s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000612s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000026s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000014s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000031s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000651s : 0.07% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.00% optimize.opt_a.cse : 0.000044s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000507s : 0.06% optimize.opt_b.b_1 : 0.000109s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.00% optimize.loop_unroll : 0.000415s : 0.05% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000455s : 0.05% validate : 0.000035s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.865433s : 98.71% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000183 30 13.82% : 0.000025s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000006s : 4: substitution.graph_param_transform 68.19% : 0.000125s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.17% : 0.000004s : 4: substitution.remove_not_recompute_node 2.38% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006405 2 90.98% : 0.005827s : 1: type_inference.infer 9.02% : 0.000578s : 1: type_inference.specialize ------[replace.] 0.000042 5 71.68% : 0.000030s : 3: replace.inline 28.32% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000134 5 91.76% : 0.000123s : 3: match.inline 8.24% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 1.12% : 0.000002s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.92% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.33% : 0.000004s : 19: predicate.arithmetic_simplify 1.12% : 0.000002s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.59% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.40% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.56% : 0.000001s : 8: predicate.reduce_all_const_elim 1.23% : 0.000002s : 11: predicate.reduce_eliminate 2.46% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 45.04% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.96% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.891372 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.42% : 0.003745s : 1: add_attr 0.42% : 0.003731s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000067s : 1: auto_monad 0.00% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000664s : 1: bootstrap 0.00% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.06% : 0.000516s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000982s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.27% : 0.002419s : 1: opt_a 0.01% : 0.000100s : 1: opt_after_cconv 0.05% : 0.000465s : 1: opt_after_jit_grad 0.02% : 0.000189s : 1: opt_b 0.49% : 0.004356s : 1: optimize 0.00% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000038s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.04% : 0.000378s : 1: renormalize.infer 0.03% : 0.000265s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000037s : 1: rewriter_after_opt_a 0.01% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.09% : 0.865459s : 1: task_emit 0.01% : 0.000073s : 1: 
tuple_transform 0.73% : 0.006477s : 1: type_inference 0.01% : 0.000070s : 1: validate TotalTime = 0.145141, [24] [bootstrap]: 0.000463 [type_inference]: 0.00495479 [event_method]: 1.26e-05 [auto_monad]: 5.761e-05 [graph_reusing]: 5.86e-06 [inline]: 2.44999e-06 [add_attr]: 0.00351123, [1] [add_attr_with_inline]: 0.00349891, [1] [Cycle 1]: 6.368e-05, [2] [tag_attr]: 1.539e-05 [meta_addattr_fg_expand]: 3.44001e-06 [parallel-infer-symbol]: 3.90998e-06 [pre_auto_parallel]: 2.921e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0125666, [53] [py_interpret_to_execute]: 1.901e-05 [rewriter_before_opt_a]: 4.591e-05 [opt_a]: 0.0103143, [2] [Cycle 1]: 0.00162113, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 2.654e-05 [loop_unroll]: 1.433e-05 [a_1]: 0.00032442 [with_stream_mark]: 2.053e-05 [recompute_prepare]: 7.55998e-06 [updatestate_depend_eliminate]: 4.08001e-06 [updatestate_assign_eliminate]: 3.26999e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.814e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 2.82002e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 9.77999e-06 [auto_parallel]: 7.26001e-06 [parallel]: 2.207e-05 [flash_sp]: 9.91e-06 [merge_comm]: 4.02e-06 [allreduce_fusion]: 3.82002e-06 [matmul_add_comm_reduction]: 1.06e-05 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.7e-06 [virtual_dataset]: 6.06998e-06 [get_grad_eliminate_]: 5.86e-06 [virtual_output]: 5.45001e-06 [merge_forward]: 4.17998e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.127e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.12e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.47001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.83001e-06 [meta_fg_expand]: 2.13002e-06 [flash_sp_send_recv_attached]: 2.71999e-06 [receive_attached]: 2.72001e-06 
[after_resolve]: 1.104e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00062827 [add_forward_monad_depend]: 5.17e-06 [auto_monad_grad]: 2.84999e-06 [auto_monad_eliminator]: 1.468e-05 [cse]: 3.13e-05 [a_3]: 4.231e-05 [Cycle 2]: 0.00868136, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 6.68e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00792646 [with_stream_mark]: 4.515e-05 [recompute_prepare]: 1.782e-05 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 3.01999e-06 [updatestate_loads_eliminate]: 4.15999e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 0.000142 [accelerated_algorithm]: 7.70998e-06 [shard]: 3.35003e-06 [meta_shard_fg_expand]: 2.86e-06 [shard_inline]: 6.01998e-06 [merge_send_recv]: 1.129e-05 [auto_parallel]: 1.027e-05 [parallel]: 1.093e-05 [flash_sp]: 4.88001e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 3.49001e-06 [matmul_add_comm_reduction]: 1.324e-05 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 6.73e-06 [virtual_dataset]: 5.87001e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 4.29002e-06 [cell_reuse_recompute_pass]: 3.14001e-06 [offload_activation]: 1.26e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.209e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 9.92001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.33999e-06 [meta_fg_expand]: 2.68003e-06 [flash_sp_send_recv_attached]: 1.73997e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 1.291e-05 [a_after_grad]: 8.65001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 3.33998e-06 [auto_monad_grad]: 3.45e-06 [auto_monad_eliminator]: 1.805e-05 [cse]: 3.492e-05 [a_3]: 3.539e-05 [py_interpret_to_execute_after_opt_a]: 2.062e-05 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 4.419e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 4.91997e-06 [mutable_eliminate]: 0.00074534 [opt_b]: 0.00020283, [1] [Cycle 1]: 0.00019371, [7] [b_1]: 
0.00011495 [b_2]: 8.48001e-06 [updatestate_depend_eliminate]: 7.3e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.76e-06 [renormalize]: 2.80008e-07 [cse]: 2.163e-05 [optimize_parallel_all_gather_comm]: 1.793e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 3.685e-05 [loop_unroll]: 0.00043174 [opt_after_cconv]: 0.00010076, [1] [Cycle 1]: 9.458e-05, [7] [c_1]: 2.958e-05 [parameter_eliminate]: 3.56001e-06 [updatestate_depend_eliminate]: 5.77001e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.757e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.429e-05 [tuple_transform]: 7.412e-05, [1] [Cycle 1]: 6.939e-05, [4] [d_1]: 4.314e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 4.874e-05 [cse_after_recomputation]: 2.14e-05, [1] [Cycle 1]: 1.624e-05, [1] [cse]: 1.103e-05 [environ_conv]: 5.40001e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 3.33e-06 [label_micro_interleaved_index]: 5.37999e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.78e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.16002e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.24998e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.50999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.265e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.81001e-06 [overlap_recompute_and_grad_model_parallel]: 4.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.54998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.31002e-06 [overlap_grad_flash_sp]: 2.263e-05 [begin_end_overlap_inline]: 6.19999e-07 [split_matmul_comm_elemetwise]: 1.90001e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 7.455e-05, [1] [Cycle 1]: 6.977e-05, [6] [build]: 3.98001e-06 [elim_shapecalc]: 9.37001e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 7.04001e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.30002e-06 [pipeline_parallel_scheduler]: 1.76e-06 [auto_monad_reorder]: 1.676e-05 [get_jit_bprop_graph]: 1.83002e-06 [rewriter_after_jit_bprop_graph]: 5.38002e-06 [opt_after_jit_grad]: 0.00045582 [validate]: 4.477e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.122733 [execute]: 1.138e-05 Sums bootstrap : 0.000463s : 0.33% type_inference : 0.004955s : 3.53% event_method : 0.000013s : 0.01% auto_monad : 0.000058s : 0.04% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000029s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.01% optimize.rewriter_before_opt_a : 0.000046s : 0.03% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.02% optimize.opt_a.loop_unroll : 0.000020s : 0.01% optimize.opt_a.a_1 : 0.008251s : 5.87% optimize.opt_a.with_stream_mark : 0.000066s : 0.05% optimize.opt_a.recompute_prepare : 0.000025s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000007s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000220s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.01% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000005s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000021s : 0.01% optimize.opt_a.auto_parallel : 0.000018s : 0.01% optimize.opt_a.parallel : 0.000033s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000024s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000024s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000024s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000628s : 0.45% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000006s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000033s : 0.02% optimize.opt_a.cse : 0.000066s : 0.05% optimize.opt_a.a_3 : 0.000078s : 0.06% 
optimize.py_interpret_to_execute_after_opt_a : 0.000021s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000044s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000745s : 0.53% optimize.opt_b.b_1 : 0.000115s : 0.08% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000022s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.03% optimize.loop_unroll : 0.000432s : 0.31% optimize.opt_after_cconv.c_1 : 0.000030s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000043s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000456s : 0.32% validate : 0.000045s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.122733s : 87.34% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 0.000163 26 22.44% : 0.000037s : 4: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.94% : 0.000006s : 4: substitution.graph_param_transform 61.93% : 0.000101s : 2: substitution.inline 2.56% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.94% : 0.000005s : 4: substitution.remove_not_recompute_node 4.34% : 0.000007s : 4: substitution.replace_old_param ------[type_inference.] 0.004901 2 91.27% : 0.004473s : 1: type_inference.infer 8.73% : 0.000428s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000100 2 100.00% : 0.000100s : 2: match.inline ------[predicate.] 
0.000153 984 0.76% : 0.000001s : 9: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.65% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.93% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.97% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.79% : 0.000001s : 8: predicate.depend_value_elim 0.71% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.75% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.73% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.96% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.94% : 0.000001s : 13: predicate.environ_get_depend_swap 1.79% : 0.000003s : 21: predicate.environ_get_eliminate 0.94% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.88% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.96% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000009s : 44: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.24% : 0.000002s : 8: predicate.less_batch_normalization 1.52% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.57% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.98% : 0.000002s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 9: predicate.minmaximum_grad 1.53% : 0.000002s : 4: predicate.mutable_eliminate 0.69% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.09% : 0.000002s : 11: predicate.partial_defer_inline 1.11% : 0.000002s : 13: predicate.partial_eliminate 0.69% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.00% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.79% : 0.000001s : 8: predicate.remove_not_recompute_node 1.24% : 0.000002s : 17: predicate.replace_applicator 0.96% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 1.07% : 0.000002s : 8: predicate.same_eliminate 0.83% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 1.23% : 0.000002s : 8: predicate.special_op_eliminate 1.18% : 0.000002s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.95% : 0.000001s : 11: predicate.switch_defer_inline 1.60% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.12% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 1.25% : 0.000002s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.71% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.92% : 0.000003s : 17: predicate.tuple_list_get_set_item_eliminator 2.79% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.33% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.96% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.79% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 6 40.58% : 0.000144s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.42% : 0.000211s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.170653 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.06% : 0.003518s : 1: add_attr 2.05% : 0.003503s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000063s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.29% : 0.000496s : 1: bootstrap 0.02% : 0.000040s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000020s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.26% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.44% : 0.000756s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 5.07% : 0.008656s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000096s : 28: opt.transform.opt_b 0.03% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000034s : 4: opt.transform.symbol_engine_opt 6.05% : 0.010319s : 1: opt_a 0.06% : 0.000104s : 1: opt_after_cconv 0.27% : 0.000466s : 1: opt_after_jit_grad 0.12% : 0.000207s : 1: opt_b 7.37% : 0.012573s : 1: optimize 0.01% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000033s : 1: pre_auto_parallel 0.01% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000018s : 1: remove_dup_value 0.22% : 0.000370s : 1: renormalize.infer 0.15% : 0.000251s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000049s : 1: rewriter_after_opt_a 0.03% : 0.000051s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000077s : 1: symbol_engine_optimizer 71.93% : 0.122759s : 1: task_emit 0.05% : 0.000077s : 1: 
tuple_transform 2.92% : 0.004984s : 1: type_inference 0.05% : 0.000077s : 1: validate TotalTime = 0.109908, [24] [bootstrap]: 0.00050642 [type_inference]: 0.00624728 [event_method]: 1.548e-05 [auto_monad]: 5.833e-05 [graph_reusing]: 6.07999e-06 [inline]: 2.79001e-06 [add_attr]: 0.0034701, [1] [add_attr_with_inline]: 0.00345834, [1] [Cycle 1]: 6.57e-05, [2] [tag_attr]: 2.087e-05 [meta_addattr_fg_expand]: 4.35999e-06 [parallel-infer-symbol]: 3.61999e-06 [pre_auto_parallel]: 3.526e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.99978e-07 [dataset_repeat_opt]: 2.24999e-06 [pipeline_split]: 1.92001e-06 [optimize]: 0.00474236, [53] [py_interpret_to_execute]: 2.711e-05 [rewriter_before_opt_a]: 7.104e-05 [opt_a]: 0.00266658, [2] [Cycle 1]: 0.00201919, [45] [expand_dump_flag]: 3.28e-06 [switch_simplify]: 3.499e-05 [loop_unroll]: 2.077e-05 [a_1]: 0.00049443 [with_stream_mark]: 1.765e-05 [recompute_prepare]: 8.28999e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.37002e-06 [updatestate_loads_eliminate]: 2.83003e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 7.847e-05 [accelerated_algorithm]: 6.20002e-06 [shard]: 2.26998e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.87e-06 [auto_parallel]: 7.08e-06 [parallel]: 2.057e-05 [flash_sp]: 8.91997e-06 [merge_comm]: 3.59002e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.92999e-06 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.38999e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 1.07e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.182e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.57999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.54001e-06 [flash_sp_send_recv_attached]: 2.53998e-06 [receive_attached]: 3.06001e-06 
[after_resolve]: 1.111e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.00083556 [add_forward_monad_depend]: 5.92999e-06 [auto_monad_grad]: 2.74999e-06 [auto_monad_eliminator]: 1.603e-05 [cse]: 3.065e-05 [a_3]: 4.404e-05 [Cycle 2]: 0.00063608, [45] [expand_dump_flag]: 1.35999e-06 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.0001321 [with_stream_mark]: 1.408e-05 [recompute_prepare]: 6.06998e-06 [updatestate_depend_eliminate]: 3.3e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.38998e-06 [parameter_eliminate]: 1.03001e-06 [a_2]: 6.872e-05 [accelerated_algorithm]: 6.04001e-06 [shard]: 1.95001e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 5.54e-06 [auto_parallel]: 6.79001e-06 [parallel]: 5.44e-06 [flash_sp]: 4.09002e-06 [merge_comm]: 3.55998e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 6.16e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.38002e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.99003e-06 [merge_forward]: 3.26999e-06 [cell_reuse_recompute_pass]: 2.52001e-06 [offload_activation]: 8.2e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 1.07e-06 [before_grad]: 8.30999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.8e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.35001e-06 [after_resolve]: 9.77999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 7.53999e-06 [cse]: 1.404e-05 [a_3]: 3.239e-05 [py_interpret_to_execute_after_opt_a]: 1.097e-05 [slice_cell_reuse_recomputed_activation]: 2.38002e-06 [rewriter_after_opt_a]: 3.907e-05 [convert_after_rewriter]: 7.23999e-06 [order_py_execute_after_rewriter]: 5.66e-06 [mutable_eliminate]: 0.0006021 [opt_b]: 0.00019085, [1] [Cycle 1]: 0.00018346, [7] [b_1]: 0.00011057 
[b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 7.84002e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.49999e-06 [renormalize]: 5.10016e-07 [cse]: 1.864e-05 [optimize_parallel_all_gather_comm]: 1.669e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 2.876e-05 [loop_unroll]: 0.00042694 [opt_after_cconv]: 9.446e-05, [1] [Cycle 1]: 8.887e-05, [7] [c_1]: 2.765e-05 [parameter_eliminate]: 2.94999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.539e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.422e-05 [tuple_transform]: 7.199e-05, [1] [Cycle 1]: 6.737e-05, [4] [d_1]: 4.116e-05 [none_parameter_eliminate]: 1.92999e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 5.96e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.969e-05 [cse_after_recomputation]: 2.135e-05, [1] [Cycle 1]: 1.646e-05, [1] [cse]: 1.116e-05 [environ_conv]: 5.52001e-06 [swap_dp_allreduce_reducescatter]: 5.50001e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.21998e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.23002e-06 [reorder_send_recv_between_fp_bp]: 2.97002e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.59e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.21e-05 [grouped_pairwise_exchange_alltoall]: 2.21e-06 [offloading_packed_experts]: 3.82002e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 2.113e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 2.26e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 7.012e-05, [1] [Cycle 1]: 6.595e-05, [6] [build]: 3.66999e-06 [elim_shapecalc]: 8.64998e-06 [elim_not_effective]: 1.213e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.16e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.636e-05 [get_jit_bprop_graph]: 1.95001e-06 [rewriter_after_jit_bprop_graph]: 4.67e-06 [opt_after_jit_grad]: 0.00045512 [validate]: 4.336e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0940367 [execute]: 1.002e-05 Sums bootstrap : 0.000506s : 0.48% type_inference : 0.006247s : 5.93% event_method : 0.000015s : 0.01% auto_monad : 0.000058s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000035s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000027s : 0.03% optimize.rewriter_before_opt_a : 0.000071s : 0.07% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000042s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000627s : 0.59% optimize.opt_a.with_stream_mark : 0.000032s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000014s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000836s : 0.79% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.02% optimize.opt_a.cse : 0.000045s : 0.04% optimize.opt_a.a_3 : 0.000076s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000602s : 0.57% optimize.opt_b.b_1 : 0.000111s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.03% optimize.loop_unroll : 0.000427s : 0.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000455s : 0.43% validate : 0.000043s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.094037s : 89.21% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000207 30 14.26% : 0.000030s : 5: substitution.arithmetic_simplify 0.93% : 0.000002s : 2: substitution.elim_not_effective 0.61% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000006s : 4: substitution.graph_param_transform 69.08% : 0.000143s : 3: substitution.inline 1.73% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000005s : 4: substitution.remove_not_recompute_node 2.26% : 0.000005s : 4: substitution.replace_old_param 5.65% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006194 2 90.14% : 0.005583s : 1: type_inference.infer 9.86% : 0.000611s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.40% : 0.000028s : 3: replace.inline 28.60% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000152 5 92.97% : 0.000141s : 3: match.inline 7.03% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 8: predicate.addn_check_dump 0.92% : 0.000002s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 1.11% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000004s : 16: predicate.float_depend_g_call 0.52% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.85% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.34% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.93% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 1.05% : 0.000002s : 11: predicate.transpose_eliminate 1.61% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000434 8 42.69% : 0.000185s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.31% : 0.000249s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120108 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.89% : 0.003477s : 1: add_attr 2.88% : 0.003463s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000063s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000538s : 1: bootstrap 0.03% : 0.000032s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000022s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000613s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 0.84% : 0.001003s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.22% : 0.002670s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.39% : 0.000465s : 1: opt_after_jit_grad 0.16% : 0.000195s : 1: opt_b 3.95% : 0.004748s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000040s : 1: pre_auto_parallel 0.03% : 0.000031s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000018s : 1: remove_dup_value 0.36% : 0.000429s : 1: renormalize.infer 0.33% : 0.000398s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000043s : 1: rewriter_after_opt_a 0.06% : 0.000076s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000073s : 1: symbol_engine_optimizer 78.32% : 0.094063s : 1: task_emit 0.06% : 0.000075s : 1: 
tuple_transform 5.22% : 0.006269s : 1: type_inference 0.06% : 0.000076s : 1: validate TotalTime = 0.17055, [24] [bootstrap]: 0.00059259 [type_inference]: 0.0137074 [event_method]: 5.947e-05 [auto_monad]: 0.0001416 [graph_reusing]: 9.86e-06 [inline]: 3.23e-06 [add_attr]: 0.00381847, [1] [add_attr_with_inline]: 0.00380513, [1] [Cycle 1]: 9.801e-05, [2] [tag_attr]: 4.713e-05 [meta_addattr_fg_expand]: 9.96e-06 [parallel-infer-symbol]: 4.22e-06 [pre_auto_parallel]: 6.52e-05 [insert-virtual-dataset]: 3.16001e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.13998e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0173293, [53] [py_interpret_to_execute]: 4.901e-05 [rewriter_before_opt_a]: 0.00018563 [opt_a]: 0.0144584, [3] [Cycle 1]: 0.00938931, [45] [expand_dump_flag]: 5.82001e-06 [switch_simplify]: 7.775e-05 [loop_unroll]: 6.526e-05 [a_1]: 0.00177261 [with_stream_mark]: 3.676e-05 [recompute_prepare]: 2.954e-05 [updatestate_depend_eliminate]: 1.118e-05 [updatestate_assign_eliminate]: 9.09e-06 [updatestate_loads_eliminate]: 8.08999e-06 [parameter_eliminate]: 3.83999e-06 [a_2]: 0.0002654 [accelerated_algorithm]: 3.672e-05 [shard]: 2.44999e-06 [meta_shard_fg_expand]: 5.77999e-06 [shard_inline]: 1.702e-05 [merge_send_recv]: 2.073e-05 [auto_parallel]: 1.689e-05 [parallel]: 2.271e-05 [flash_sp]: 1.579e-05 [merge_comm]: 1.018e-05 [allreduce_fusion]: 9.33002e-06 [matmul_add_comm_reduction]: 3.647e-05 [allreduce_slice_to_reducescatter]: 8.00006e-07 [virtual_shard_identity]: 2.045e-05 [virtual_dataset]: 1.68e-05 [get_grad_eliminate_]: 1.574e-05 [virtual_output]: 1.59e-05 [merge_forward]: 1.034e-05 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 1.941e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.265e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 2.943e-05 [set_forward_comm_id_for_comm_node_pass]: 1.038e-05 [meta_fg_expand]: 0.00203692 [flash_sp_send_recv_attached]: 4.85999e-06 [receive_attached]: 2.54001e-06 [after_resolve]: 
8.127e-05 [a_after_grad]: 9.209e-05 [renormalize]: 0.0034538 [add_forward_monad_depend]: 1.362e-05 [auto_monad_grad]: 7.3e-06 [auto_monad_eliminator]: 6.423e-05 [cse]: 0.00019481 [a_3]: 0.00043049 [Cycle 2]: 0.00398996, [45] [expand_dump_flag]: 3.00002e-06 [switch_simplify]: 5.083e-05 [loop_unroll]: 4.722e-05 [a_1]: 0.00171584 [with_stream_mark]: 2.509e-05 [recompute_prepare]: 1.487e-05 [updatestate_depend_eliminate]: 7.77e-06 [updatestate_assign_eliminate]: 5.75001e-06 [updatestate_loads_eliminate]: 5.53002e-06 [parameter_eliminate]: 2.53e-06 [a_2]: 0.00014924 [accelerated_algorithm]: 1.687e-05 [shard]: 2.73998e-06 [meta_shard_fg_expand]: 3.31999e-06 [shard_inline]: 1.03e-05 [merge_send_recv]: 1.27e-05 [auto_parallel]: 1.386e-05 [parallel]: 1.037e-05 [flash_sp]: 5.00999e-06 [merge_comm]: 6.57002e-06 [allreduce_fusion]: 6.21e-06 [matmul_add_comm_reduction]: 1.354e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.342e-05 [virtual_dataset]: 1.001e-05 [get_grad_eliminate_]: 9.84999e-06 [virtual_output]: 1.009e-05 [merge_forward]: 6.12999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.459e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.305e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 1.774e-05 [set_forward_comm_id_for_comm_node_pass]: 6.91999e-06 [meta_fg_expand]: 0.00013689 [flash_sp_send_recv_attached]: 2.03002e-06 [receive_attached]: 3.37002e-06 [after_resolve]: 2.318e-05 [a_after_grad]: 1.792e-05 [renormalize]: 0.00108609 [add_forward_monad_depend]: 6.58e-06 [auto_monad_grad]: 2.77002e-06 [auto_monad_eliminator]: 2.188e-05 [cse]: 7.074e-05 [a_3]: 8.103e-05 [Cycle 3]: 0.00105705, [45] [expand_dump_flag]: 2.06e-06 [switch_simplify]: 1.235e-05 [loop_unroll]: 1.088e-05 [a_1]: 0.00029721 [with_stream_mark]: 1.417e-05 [recompute_prepare]: 1.116e-05 [updatestate_depend_eliminate]: 6.00002e-06 [updatestate_assign_eliminate]: 4.96002e-06 [updatestate_loads_eliminate]: 4.68999e-06 [parameter_eliminate]: 
1.66e-06 [a_2]: 0.00014144 [accelerated_algorithm]: 1.343e-05 [shard]: 1.80001e-06 [meta_shard_fg_expand]: 2.72001e-06 [shard_inline]: 1.032e-05 [merge_send_recv]: 9.25999e-06 [auto_parallel]: 9.93998e-06 [parallel]: 7.12002e-06 [flash_sp]: 1.52001e-06 [merge_comm]: 6.29001e-06 [allreduce_fusion]: 5.66e-06 [matmul_add_comm_reduction]: 1.099e-05 [allreduce_slice_to_reducescatter]: 4.40021e-07 [virtual_shard_identity]: 1.159e-05 [virtual_dataset]: 1.028e-05 [get_grad_eliminate_]: 1.004e-05 [virtual_output]: 9.14e-06 [merge_forward]: 6.19999e-06 [cell_reuse_recompute_pass]: 1.79998e-06 [offload_activation]: 1.339e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.942e-05 [merge_recompute_call_nodes]: 9.99979e-07 [before_grad]: 1.86e-05 [set_forward_comm_id_for_comm_node_pass]: 7.41999e-06 [meta_fg_expand]: 3.86999e-06 [flash_sp_send_recv_attached]: 1.05001e-06 [receive_attached]: 1.42e-06 [after_resolve]: 1.713e-05 [a_after_grad]: 1.66e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 2.16998e-06 [auto_monad_eliminator]: 1.357e-05 [cse]: 3.323e-05 [a_3]: 6.877e-05 [py_interpret_to_execute_after_opt_a]: 1.851e-05 [slice_cell_reuse_recomputed_activation]: 1.94999e-06 [rewriter_after_opt_a]: 6.064e-05 [convert_after_rewriter]: 1.052e-05 [order_py_execute_after_rewriter]: 7.4e-06 [mutable_eliminate]: 0.00074822 [opt_b]: 0.00033861, [1] [Cycle 1]: 0.00033001, [7] [b_1]: 0.00021792 [b_2]: 1.255e-05 [updatestate_depend_eliminate]: 1.063e-05 [updatestate_assign_eliminate]: 5.02e-06 [updatestate_loads_eliminate]: 4.80999e-06 [renormalize]: 5.39992e-07 [cse]: 4.231e-05 [optimize_parallel_all_gather_comm]: 2.58e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 3.286e-05 [loop_unroll]: 0.00046498 [opt_after_cconv]: 0.00015584, [1] [Cycle 1]: 0.00014851, [7] [c_1]: 5.505e-05 [parameter_eliminate]: 3.41001e-06 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 4.97e-06 [updatestate_loads_eliminate]: 4.89e-06 [cse]: 
3.593e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 5.437e-05 [tuple_transform]: 0.00011544, [1] [Cycle 1]: 0.00011015, [4] [d_1]: 7.802e-05 [none_parameter_eliminate]: 1.99999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.119e-05 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 6.993e-05 [cse_after_recomputation]: 3.704e-05, [1] [Cycle 1]: 3.188e-05, [1] [cse]: 2.572e-05 [environ_conv]: 1.26e-05 [swap_dp_allreduce_reducescatter]: 9.09998e-06 [bias_add_comm_swap]: 2.69999e-06 [label_micro_interleaved_index]: 5.49e-06 [label_fine_grained_interleaved_index]: 3.19001e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 3.33e-06 [comm_op_add_attrs]: 1.27e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.982e-05 [grouped_pairwise_exchange_alltoall]: 1.62999e-06 [offloading_packed_experts]: 6.44001e-06 [overlap_recompute_and_grad_model_parallel]: 7.31001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 5.92999e-06 [overlap_grad_flash_sp]: 3.281e-05 [begin_end_overlap_inline]: 5.49975e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.82999e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 0.00011117, [1] [Cycle 1]: 0.00010621, [6] [build]: 1.312e-05 [elim_shapecalc]: 1.482e-05 [elim_not_effective]: 2.055e-05 [opt_reshape]: 1.125e-05 [fold_const_symbol]: 1.693e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.32999e-06 
[pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.735e-05 [get_jit_bprop_graph]: 1.98002e-06 [rewriter_after_jit_bprop_graph]: 5.54998e-06 [opt_after_jit_grad]: 0.00058359 [validate]: 6.172e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.133855 [execute]: 9.87001e-06 Sums bootstrap : 0.000593s : 0.36% type_inference : 0.013707s : 8.29% event_method : 0.000059s : 0.04% auto_monad : 0.000142s : 0.09% graph_reusing : 0.000010s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000047s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000065s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000049s : 0.03% optimize.rewriter_before_opt_a : 0.000186s : 0.11% optimize.opt_a.expand_dump_flag : 0.000011s : 0.01% optimize.opt_a.switch_simplify : 0.000141s : 0.09% optimize.opt_a.loop_unroll : 0.000123s : 0.07% optimize.opt_a.a_1 : 0.003786s : 2.29% optimize.opt_a.with_stream_mark : 0.000076s : 0.05% optimize.opt_a.recompute_prepare : 0.000056s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000025s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000018s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.00% optimize.opt_a.a_2 : 0.000556s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000067s : 0.04% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000012s : 0.01% optimize.opt_a.shard_inline : 0.000038s : 0.02% optimize.opt_a.merge_send_recv : 0.000043s : 0.03% optimize.opt_a.auto_parallel : 0.000041s : 0.02% optimize.opt_a.parallel : 0.000040s : 0.02% optimize.opt_a.flash_sp : 0.000022s : 0.01% optimize.opt_a.merge_comm : 0.000023s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000021s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000061s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000045s : 0.03% optimize.opt_a.virtual_dataset : 0.000037s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000036s : 0.02% optimize.opt_a.virtual_output : 0.000035s : 0.02% optimize.opt_a.merge_forward : 0.000023s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000047s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000075s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000066s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000025s : 0.01% optimize.opt_a.meta_fg_expand : 0.002178s : 1.32% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000122s : 0.07% optimize.opt_a.a_after_grad : 0.000127s : 0.08% optimize.opt_a.renormalize : 0.004540s : 2.75% optimize.opt_a.add_forward_monad_depend : 0.000021s : 0.01% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000100s : 0.06% optimize.opt_a.cse : 0.000299s : 0.18% optimize.opt_a.a_3 : 0.000580s : 0.35% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000061s : 0.04% optimize.convert_after_rewriter : 0.000011s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000748s : 0.45% optimize.opt_b.b_1 : 0.000218s : 0.13% optimize.opt_b.b_2 : 0.000013s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000042s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000033s : 0.02% optimize.loop_unroll : 0.000465s : 0.28% optimize.opt_after_cconv.c_1 : 0.000055s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.cse : 0.000036s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000054s : 0.03% optimize.tuple_transform.d_1 : 0.000078s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000070s : 0.04% optimize.cse_after_recomputation.cse : 0.000026s : 0.02% optimize.environ_conv : 0.000013s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000033s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000584s : 0.35% validate : 0.000062s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.133855s : 81.00% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.001060 231 5.94% : 0.000063s : 12: substitution.arithmetic_simplify 2.41% : 0.000025s : 4: substitution.cast_eliminate 0.28% : 0.000003s : 6: substitution.elim_not_effective 0.45% : 0.000005s : 5: substitution.float_depend_g_call 0.47% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000003s : 6: substitution.fold_const_symbol 0.89% : 0.000009s : 9: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 59.20% : 0.000627s : 17: substitution.inline 2.17% : 0.000023s : 2: substitution.inline_without_move 1.25% : 0.000013s : 22: substitution.j_node_and_user_rematch 1.88% : 0.000020s : 3: substitution.less_batch_normalization 1.48% : 0.000016s : 11: substitution.minmaximum_grad 0.62% : 0.000007s : 5: substitution.partial_eliminate 1.67% : 0.000018s : 22: substitution.remove_not_recompute_node 2.86% : 0.000030s : 10: substitution.replace_applicator 1.33% : 0.000014s : 15: substitution.replace_old_param 0.39% : 0.000004s : 1: substitution.set_cell_output_no_recompute 3.03% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.39% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.91% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.59% : 0.000080s : 30: substitution.tuple_list_get_item_eliminator 2.00% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013594 2 85.01% : 0.011557s : 1: type_inference.infer 14.99% : 0.002038s : 1: type_inference.specialize ------[replace.] 0.000267 33 58.89% : 0.000157s : 17: replace.inline 41.11% : 0.000110s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000656 33 93.92% : 0.000616s : 17: match.inline 6.08% : 0.000040s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000939 5998 0.92% : 0.000009s : 70: predicate.accumulaten_eliminater 0.27% : 0.000003s : 9: predicate.ad_related_special_op_eliminate 0.44% : 0.000004s : 34: predicate.addn_check_dump 0.93% : 0.000009s : 70: predicate.addn_zero_filter 0.90% : 0.000008s : 70: predicate.adjust_all_reduce_mul_add 1.86% : 0.000017s : 104: predicate.arithmetic_simplify 0.98% : 0.000009s : 70: predicate.cast_eliminate 0.98% : 0.000009s : 71: predicate.check_bprop_eliminate 0.44% : 0.000004s : 34: predicate.compare_switch_simplify 0.08% : 0.000001s : 9: predicate.const_output_eliminate 0.45% : 0.000004s : 34: predicate.depend_value_elim 1.01% : 0.000009s : 70: predicate.dict_get_item_const_eliminator 1.02% : 0.000010s : 70: predicate.dict_get_item_eliminator 0.95% : 0.000009s : 70: predicate.dict_set_item_eliminator 0.34% : 0.000003s : 18: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 9: predicate.elim_not_effective 0.14% : 0.000001s : 9: predicate.elim_shapecalc_of_broadcastargs 1.05% : 0.000010s : 79: predicate.environ_add_const_eliminate 1.03% : 0.000010s : 79: predicate.environ_get_add_eliminate 1.02% : 0.000010s : 79: predicate.environ_get_depend_swap 1.50% : 0.000014s : 113: predicate.environ_get_eliminate 1.03% : 0.000010s : 79: predicate.environ_get_set_eliminate 1.45% : 0.000014s : 103: predicate.exchange_switch_depend_value 2.01% : 0.000019s : 103: predicate.float_depend_g_call 0.44% : 0.000004s : 34: predicate.float_environ_get_switch 0.59% : 0.000006s : 43: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 9: predicate.fold_const_symbol 0.52% : 0.000005s : 34: predicate.get_grad_eliminate 0.09% : 0.000001s : 9: predicate.graph_param_transform 0.46% : 0.000004s : 34: predicate.incorporate_call 0.43% : 0.000004s : 34: predicate.incorporate_call_switch 11.49% : 0.000108s : 259: predicate.inline 1.16% : 0.000011s : 57: predicate.inline_without_move 0.31% : 0.000003s : 34: predicate.j_node_and_user_rematch 0.59% : 0.000006s : 34: predicate.less_batch_normalization 
1.44% : 0.000014s : 104: predicate.list_to_tuple_eliminator_ 2.29% : 0.000021s : 174: predicate.load_eliminater 0.31% : 0.000003s : 9: predicate.loop_unroll_after_grad 1.97% : 0.000019s : 138: predicate.loop_unroll_before_grad 1.25% : 0.000012s : 88: predicate.make_slice_get_slice_eliminator 0.47% : 0.000004s : 34: predicate.merge_addn 0.97% : 0.000009s : 71: predicate.micro_step_allgather_replace 0.97% : 0.000009s : 71: predicate.mini_step_allgather_replace 0.95% : 0.000009s : 70: predicate.minmaximum_grad 0.39% : 0.000004s : 9: predicate.mutable_eliminate 0.13% : 0.000001s : 9: predicate.opt_reshape 0.15% : 0.000001s : 9: predicate.parallel_virtual_node 1.90% : 0.000018s : 103: predicate.partial_defer_inline 1.50% : 0.000014s : 95: predicate.partial_eliminate 0.92% : 0.000009s : 70: predicate.print_const_string_wrapper 0.48% : 0.000005s : 34: predicate.reduce_all_const_elim 1.21% : 0.000011s : 70: predicate.reduce_eliminate 2.28% : 0.000021s : 174: predicate.redundant_stop_gradient_eliminater 0.28% : 0.000003s : 34: predicate.remove_not_recompute_node 1.62% : 0.000015s : 157: predicate.replace_applicator 0.61% : 0.000006s : 57: predicate.replace_old_param 0.10% : 0.000001s : 9: predicate.reset_defer_inline 0.95% : 0.000009s : 70: predicate.reshape_eliminate 0.98% : 0.000009s : 71: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 9: predicate.row_tensor_eliminate 1.16% : 0.000011s : 71: predicate.same_eliminate 0.32% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.58% : 0.000005s : 34: predicate.shard_identity_eliminate 0.29% : 0.000003s : 18: predicate.special_op_eliminate 0.56% : 0.000005s : 34: predicate.specialize_transform 6.90% : 0.000065s : 71: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000009s : 57: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 9: predicate.switch_call_monad_eliminater 1.57% : 0.000015s : 103: predicate.switch_defer_inline 2.50% : 0.000023s : 174: predicate.switch_layer_defer_inline 4.30% : 
0.000040s : 284: predicate.switch_simplify 0.92% : 0.000009s : 70: predicate.tile_eliminate 0.95% : 0.000009s : 70: predicate.transpose_eliminate 1.31% : 0.000012s : 88: predicate.tuple_list_convert_item_index_to_positive 1.38% : 0.000013s : 88: predicate.tuple_list_get_item_const_eliminator 1.19% : 0.000011s : 88: predicate.tuple_list_get_item_depend_reorder 2.60% : 0.000024s : 138: predicate.tuple_list_get_item_eliminator 1.31% : 0.000012s : 88: predicate.tuple_list_get_set_item_eliminator 1.81% : 0.000017s : 122: predicate.tuple_list_set_item_eliminator 1.41% : 0.000013s : 104: predicate.tuple_to_list_eliminator_ 2.24% : 0.000021s : 174: predicate.updatestate_pure_node_eliminater 2.83% : 0.000027s : 208: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 9: predicate.value_based_eliminate 0.50% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.49% : 0.000005s : 34: predicate.virtual_output_eliminate 0.13% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.17% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002224 34 54.05% : 0.001202s : 13: func_graph_cloner_run.FuncGraphClonerGraph 45.95% : 0.001022s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.202331 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.89% : 0.003825s : 1: add_attr 1.88% : 0.003810s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000074s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000151s : 1: auto_monad 0.02% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000623s : 1: bootstrap 0.02% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000023s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.02% : 0.000040s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000016s : 1: environ_conv 0.03% : 0.000069s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000016s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.23% : 0.000475s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000760s : 1: mutable_eliminate 0.00% : 0.000010s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000023s : 1: opt.transform.mutable_eliminate 2.83% : 0.005734s : 117: opt.transform.opt_a 0.03% : 0.000054s : 1: opt.transform.opt_after_cconv 0.02% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000203s : 28: opt.transform.opt_b 0.04% : 0.000087s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000060s : 4: opt.transform.symbol_engine_opt 7.15% : 0.014462s : 1: opt_a 0.08% : 0.000159s : 1: opt_after_cconv 0.29% : 0.000595s : 1: opt_after_jit_grad 0.17% : 0.000343s : 1: opt_b 8.57% : 0.017335s : 1: optimize 0.01% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000036s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000070s : 1: pre_auto_parallel 0.03% : 0.000054s : 1: py_interpret_to_execute 0.01% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000059s : 1: remove_dup_value 1.25% : 0.002530s : 2: renormalize.infer 0.98% : 0.001988s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000066s : 1: rewriter_after_opt_a 0.09% : 0.000191s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000114s : 1: symbol_engine_optimizer 66.17% : 0.133879s : 1: task_emit 0.06% : 0.000119s : 1: 
tuple_transform 6.79% : 0.013733s : 1: type_inference 0.05% : 0.000097s : 1: validate TotalTime = 0.184501, [24] [bootstrap]: 0.00048615 [type_inference]: 0.00508402 [event_method]: 1.234e-05 [auto_monad]: 5.623e-05 [graph_reusing]: 5.67999e-06 [inline]: 2.57001e-06 [add_attr]: 0.00332174, [1] [add_attr_with_inline]: 0.00331035, [1] [Cycle 1]: 5.443e-05, [2] [tag_attr]: 1.508e-05 [meta_addattr_fg_expand]: 3.48999e-06 [parallel-infer-symbol]: 4.01001e-06 [pre_auto_parallel]: 2.546e-05 [insert-virtual-dataset]: 2.70002e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.00419457, [53] [py_interpret_to_execute]: 1.846e-05 [rewriter_before_opt_a]: 4.298e-05 [opt_a]: 0.00219034, [2] [Cycle 1]: 0.00155706, [45] [expand_dump_flag]: 2.42001e-06 [switch_simplify]: 2.601e-05 [loop_unroll]: 1.458e-05 [a_1]: 0.00031035 [with_stream_mark]: 1.631e-05 [recompute_prepare]: 8.31002e-06 [updatestate_depend_eliminate]: 4.18999e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.21001e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 8.406e-05 [accelerated_algorithm]: 7.2e-06 [shard]: 2.76999e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 6.52001e-06 [merge_send_recv]: 9.14998e-06 [auto_parallel]: 6.75002e-06 [parallel]: 1.97e-05 [flash_sp]: 8.47e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 9.61e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.55998e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 6.07001e-06 [virtual_output]: 5.76e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.089e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.213e-05 [merge_recompute_call_nodes]: 1.67999e-06 [before_grad]: 9.65002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.91999e-06 [meta_fg_expand]: 2.40002e-06 [flash_sp_send_recv_attached]: 2.48e-06 [receive_attached]: 
2.62001e-06 [after_resolve]: 1.113e-05 [a_after_grad]: 9.25999e-06 [renormalize]: 0.00058056 [add_forward_monad_depend]: 4.94e-06 [auto_monad_grad]: 2.37001e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 2.798e-05 [a_3]: 4.722e-05 [Cycle 2]: 0.00062371, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.24001e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00012838 [with_stream_mark]: 1.068e-05 [recompute_prepare]: 5.99e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.85998e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 7.081e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.97999e-06 [auto_parallel]: 5.57999e-06 [parallel]: 4.58001e-06 [flash_sp]: 3.93001e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 5.74e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 7.97e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.98998e-06 [virtual_output]: 5.38002e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 7.76001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.017e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.91e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.24e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 8.63001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 7.18998e-06 [cse]: 1.375e-05 [a_3]: 3.285e-05 [py_interpret_to_execute_after_opt_a]: 8.42998e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.468e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.70001e-06 [mutable_eliminate]: 0.0005298 [opt_b]: 0.00018891, [1] [Cycle 1]: 0.00018204, [7] 
[b_1]: 0.00011149 [b_2]: 7.68001e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 6.50005e-07 [cse]: 1.655e-05 [optimize_parallel_all_gather_comm]: 1.639e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.614e-05 [loop_unroll]: 0.00042922 [opt_after_cconv]: 0.00014072, [1] [Cycle 1]: 0.00013477, [7] [c_1]: 2.811e-05 [parameter_eliminate]: 4.334e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.756e-05 [renormalize]: 8.2e-07 [remove_dup_value]: 1.443e-05 [tuple_transform]: 7.18e-05, [1] [Cycle 1]: 6.746e-05, [4] [d_1]: 4.096e-05 [none_parameter_eliminate]: 1.85001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.37001e-06 [partial_unused_args_eliminate]: 2.37001e-06 [add_recomputation]: 4.51e-05 [cse_after_recomputation]: 2.061e-05, [1] [Cycle 1]: 1.645e-05, [1] [cse]: 1.08e-05 [environ_conv]: 5.36002e-06 [swap_dp_allreduce_reducescatter]: 5.53002e-06 [bias_add_comm_swap]: 3.22002e-06 [label_micro_interleaved_index]: 5.15001e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.53998e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.70001e-06 [full_micro_interleaved_order_control]: 2.89999e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.285e-05 [grouped_pairwise_exchange_alltoall]: 1.69998e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 4.99e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.62001e-06 [overlap_grad_ring_attention]: 4.53001e-06 [overlap_grad_flash_sp]: 1.893e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 2.08002e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 7.264e-05, [1] [Cycle 1]: 6.819e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.77999e-06 [elim_not_effective]: 1.205e-05 [opt_reshape]: 6.69001e-06 [fold_const_symbol]: 9.37999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.01e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.709e-05 [get_jit_bprop_graph]: 2.16998e-06 [rewriter_after_jit_bprop_graph]: 4.59002e-06 [opt_after_jit_grad]: 0.00048063 [validate]: 3.956e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.170498 [execute]: 1.188e-05 Sums bootstrap : 0.000486s : 0.27% type_inference : 0.005084s : 2.82% event_method : 0.000012s : 0.01% auto_monad : 0.000056s : 0.03% graph_reusing : 0.000006s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000025s : 0.01% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000018s : 0.01% optimize.rewriter_before_opt_a : 0.000043s : 0.02% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000033s : 0.02% optimize.opt_a.loop_unroll : 0.000020s : 0.01% optimize.opt_a.a_1 : 0.000439s : 0.24% optimize.opt_a.with_stream_mark : 0.000027s : 0.01% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000155s : 0.09% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000014s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.01% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000007s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.01% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000581s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.01% optimize.opt_a.cse : 0.000042s : 0.02% optimize.opt_a.a_3 : 0.000080s : 0.04% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000530s : 0.29% optimize.opt_b.b_1 : 0.000111s : 0.06% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.01% optimize.loop_unroll : 0.000429s : 0.24% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000043s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.02% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000481s : 0.27% validate : 0.000040s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.170498s : 94.64% execute : 0.000012s : 0.01% Time group info: ------[substitution.] 0.000134 26 18.16% : 0.000024s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000006s : 4: substitution.graph_param_transform 65.58% : 0.000088s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.74% : 0.000005s : 4: substitution.remove_not_recompute_node 3.39% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.005033 2 92.00% : 0.004630s : 1: type_inference.infer 8.00% : 0.000402s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000086 2 100.00% : 0.000086s : 2: match.inline ------[predicate.] 
0.000144 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 17: predicate.arithmetic_simplify 0.71% : 0.000001s : 9: predicate.cast_eliminate 0.99% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.80% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.48% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.93% : 0.000001s : 8: predicate.get_grad_eliminate 0.40% : 0.000001s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.18% : 0.000009s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.97% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.66% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 1.27% : 0.000002s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.22% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.28% : 0.000006s : 41: predicate.switch_simplify 0.65% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.00% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.94% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000301 6 39.64% : 0.000119s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.36% : 0.000182s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.193564 196 0.00% : 0.000003s : 1: ForceFp32Comm 1.72% : 0.003328s : 1: add_attr 1.71% : 0.003315s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.03% : 0.000062s : 1: auto_monad 0.01% : 0.000021s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.27% : 0.000516s : 1: bootstrap 0.02% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.01% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000009s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000020s : 1: execute 0.00% : 0.000006s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.23% : 0.000438s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000540s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.42% : 0.000815s : 78: opt.transform.opt_a 0.01% : 0.000027s : 1: opt.transform.opt_after_cconv 0.01% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.05% : 0.000094s : 28: opt.transform.opt_b 0.02% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.13% : 0.002193s : 1: opt_a 0.07% : 0.000144s : 1: opt_after_cconv 0.25% : 0.000491s : 1: opt_after_jit_grad 0.10% : 0.000192s : 1: opt_b 2.17% : 0.004200s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.01% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.01% : 0.000022s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000018s : 1: remove_dup_value 0.18% : 0.000339s : 1: renormalize.infer 0.12% : 0.000234s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000039s : 1: rewriter_after_opt_a 0.02% : 0.000047s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.04% : 0.000075s : 1: symbol_engine_optimizer 88.10% : 0.170526s : 1: task_emit 0.04% : 0.000075s : 1: 
tuple_transform 2.64% : 0.005110s : 1: type_inference 0.03% : 0.000068s : 1: validate TotalTime = 0.177545, [24] [bootstrap]: 0.00052874 [type_inference]: 0.0122661 [event_method]: 5.466e-05 [auto_monad]: 0.00012886 [graph_reusing]: 9.39e-06 [inline]: 2.81e-06 [add_attr]: 0.00381364, [1] [add_attr_with_inline]: 0.00380017, [1] [Cycle 1]: 9.025e-05, [2] [tag_attr]: 3.839e-05 [meta_addattr_fg_expand]: 9.19e-06 [parallel-infer-symbol]: 3.78999e-06 [pre_auto_parallel]: 5.468e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.26003e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.0165656, [53] [py_interpret_to_execute]: 4.339e-05 [rewriter_before_opt_a]: 0.00014629 [opt_a]: 0.0135845, [3] [Cycle 1]: 0.00849664, [45] [expand_dump_flag]: 5.15999e-06 [switch_simplify]: 7.013e-05 [loop_unroll]: 5.597e-05 [a_1]: 0.00144735 [with_stream_mark]: 3.117e-05 [recompute_prepare]: 2.69e-05 [updatestate_depend_eliminate]: 1.001e-05 [updatestate_assign_eliminate]: 7.73999e-06 [updatestate_loads_eliminate]: 7.81001e-06 [parameter_eliminate]: 3.09999e-06 [a_2]: 0.000253 [accelerated_algorithm]: 4.975e-05 [shard]: 2.73e-06 [meta_shard_fg_expand]: 4.74e-06 [shard_inline]: 1.765e-05 [merge_send_recv]: 2.032e-05 [auto_parallel]: 1.383e-05 [parallel]: 2.268e-05 [flash_sp]: 1.446e-05 [merge_comm]: 9.74999e-06 [allreduce_fusion]: 8.92e-06 [matmul_add_comm_reduction]: 3.251e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 2.137e-05 [virtual_dataset]: 1.643e-05 [get_grad_eliminate_]: 1.551e-05 [virtual_output]: 1.542e-05 [merge_forward]: 1.013e-05 [cell_reuse_recompute_pass]: 1.74998e-06 [offload_activation]: 1.973e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.261e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 3.022e-05 [set_forward_comm_id_for_comm_node_pass]: 9.84001e-06 [meta_fg_expand]: 0.00184209 [flash_sp_send_recv_attached]: 4.2e-06 [receive_attached]: 2.59999e-06 [after_resolve]: 7.414e-05 
[a_after_grad]: 9.117e-05 [renormalize]: 0.00320637 [add_forward_monad_depend]: 1.342e-05 [auto_monad_grad]: 7.63001e-06 [auto_monad_eliminator]: 6.44e-05 [cse]: 0.00018865 [a_3]: 0.00037555 [Cycle 2]: 0.00397811, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 5.114e-05 [loop_unroll]: 4.512e-05 [a_1]: 0.00171909 [with_stream_mark]: 2.509e-05 [recompute_prepare]: 1.481e-05 [updatestate_depend_eliminate]: 7.47002e-06 [updatestate_assign_eliminate]: 5.25999e-06 [updatestate_loads_eliminate]: 5.03002e-06 [parameter_eliminate]: 2.09999e-06 [a_2]: 0.00014507 [accelerated_algorithm]: 1.566e-05 [shard]: 2.56e-06 [meta_shard_fg_expand]: 3.65e-06 [shard_inline]: 1.028e-05 [merge_send_recv]: 1.192e-05 [auto_parallel]: 1.32e-05 [parallel]: 1.02e-05 [flash_sp]: 4.55999e-06 [merge_comm]: 6.41e-06 [allreduce_fusion]: 5.91e-06 [matmul_add_comm_reduction]: 1.365e-05 [allreduce_slice_to_reducescatter]: 1.05001e-06 [virtual_shard_identity]: 1.341e-05 [virtual_dataset]: 1.062e-05 [get_grad_eliminate_]: 1.013e-05 [virtual_output]: 9.84001e-06 [merge_forward]: 7.79002e-06 [cell_reuse_recompute_pass]: 2.17999e-06 [offload_activation]: 1.514e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.012e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 1.692e-05 [set_forward_comm_id_for_comm_node_pass]: 6.81999e-06 [meta_fg_expand]: 7.461e-05 [flash_sp_send_recv_attached]: 1.81e-06 [receive_attached]: 3.09999e-06 [after_resolve]: 1.942e-05 [a_after_grad]: 1.647e-05 [renormalize]: 0.00116207 [add_forward_monad_depend]: 7.28999e-06 [auto_monad_grad]: 2.62001e-06 [auto_monad_eliminator]: 2.134e-05 [cse]: 7.048e-05 [a_3]: 8.527e-05 [Cycle 3]: 0.00108813, [45] [expand_dump_flag]: 2.19999e-06 [switch_simplify]: 1.261e-05 [loop_unroll]: 1.047e-05 [a_1]: 0.00030244 [with_stream_mark]: 1.511e-05 [recompute_prepare]: 1.127e-05 [updatestate_depend_eliminate]: 6.57002e-06 [updatestate_assign_eliminate]: 5.00001e-06 [updatestate_loads_eliminate]: 4.95999e-06 [parameter_eliminate]: 1.44e-06 
[a_2]: 0.00014239 [accelerated_algorithm]: 1.466e-05 [shard]: 1.50001e-06 [meta_shard_fg_expand]: 2.75002e-06 [shard_inline]: 1.095e-05 [merge_send_recv]: 9.96e-06 [auto_parallel]: 1.087e-05 [parallel]: 7.85e-06 [flash_sp]: 1.06002e-06 [merge_comm]: 6.63e-06 [allreduce_fusion]: 5.61e-06 [matmul_add_comm_reduction]: 1.245e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 1.164e-05 [virtual_dataset]: 1.078e-05 [get_grad_eliminate_]: 9.58002e-06 [virtual_output]: 9.46e-06 [merge_forward]: 6.36e-06 [cell_reuse_recompute_pass]: 2.36e-06 [offload_activation]: 1.814e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.971e-05 [merge_recompute_call_nodes]: 1.28002e-06 [before_grad]: 1.729e-05 [set_forward_comm_id_for_comm_node_pass]: 5.99999e-06 [meta_fg_expand]: 4.05998e-06 [flash_sp_send_recv_attached]: 1.18001e-06 [receive_attached]: 1.32e-06 [after_resolve]: 1.833e-05 [a_after_grad]: 1.761e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 2.39001e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.474e-05 [cse]: 3.552e-05 [a_3]: 6.847e-05 [py_interpret_to_execute_after_opt_a]: 2.169e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 6.33e-05 [convert_after_rewriter]: 1.046e-05 [order_py_execute_after_rewriter]: 7.43999e-06 [mutable_eliminate]: 0.00082772 [opt_b]: 0.0003509, [1] [Cycle 1]: 0.00034223, [7] [b_1]: 0.0002223 [b_2]: 1.326e-05 [updatestate_depend_eliminate]: 1.051e-05 [updatestate_assign_eliminate]: 4.60999e-06 [updatestate_loads_eliminate]: 5.00999e-06 [renormalize]: 8.80013e-07 [cse]: 4.79e-05 [optimize_parallel_all_gather_comm]: 2.618e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 3.725e-05 [loop_unroll]: 0.00048407 [opt_after_cconv]: 0.00016358, [1] [Cycle 1]: 0.00015652, [7] [c_1]: 5.617e-05 [parameter_eliminate]: 4.24002e-06 [updatestate_depend_eliminate]: 8.88002e-06 [updatestate_assign_eliminate]: 4.94e-06 [updatestate_loads_eliminate]: 4.99e-06 [cse]: 4.077e-05 [renormalize]: 
4.69998e-07 [remove_dup_value]: 5.936e-05 [tuple_transform]: 0.00011919, [1] [Cycle 1]: 0.00011357, [4] [d_1]: 8.01e-05 [none_parameter_eliminate]: 1.76998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.184e-05 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 7.71e-05 [cse_after_recomputation]: 3.795e-05, [1] [Cycle 1]: 3.257e-05, [1] [cse]: 2.654e-05 [environ_conv]: 1.334e-05 [swap_dp_allreduce_reducescatter]: 9.22001e-06 [bias_add_comm_swap]: 3.18e-06 [label_micro_interleaved_index]: 5.96998e-06 [label_fine_grained_interleaved_index]: 2.74001e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.64999e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.15001e-06 [overlap_opt_shard_in_pipeline]: 1.32999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 2.126e-05 [grouped_pairwise_exchange_alltoall]: 1.66002e-06 [offloading_packed_experts]: 5.90002e-06 [overlap_recompute_and_grad_model_parallel]: 6.38e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.60999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.92001e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 6.51999e-06 [overlap_grad_flash_sp]: 3.14e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 0.00011643, [1] [Cycle 1]: 0.00011131, [6] [build]: 1.379e-05 [elim_shapecalc]: 1.508e-05 [elim_not_effective]: 2.061e-05 [opt_reshape]: 1.185e-05 [fold_const_symbol]: 1.91e-05 [renormalize]: 3.19997e-07 [detach_backward]: 2.24001e-06 [pipeline_parallel_scheduler]: 
1.60999e-06 [auto_monad_reorder]: 2.749e-05 [get_jit_bprop_graph]: 2.27001e-06 [rewriter_after_jit_bprop_graph]: 5.64e-06 [opt_after_jit_grad]: 0.00053697 [validate]: 6.615e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.143162 [execute]: 1.163e-05 Sums bootstrap : 0.000529s : 0.31% type_inference : 0.012266s : 7.12% event_method : 0.000055s : 0.03% auto_monad : 0.000129s : 0.07% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000055s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.03% optimize.rewriter_before_opt_a : 0.000146s : 0.08% optimize.opt_a.expand_dump_flag : 0.000010s : 0.01% optimize.opt_a.switch_simplify : 0.000134s : 0.08% optimize.opt_a.loop_unroll : 0.000112s : 0.06% optimize.opt_a.a_1 : 0.003469s : 2.01% optimize.opt_a.with_stream_mark : 0.000071s : 0.04% optimize.opt_a.recompute_prepare : 0.000053s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000024s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000018s : 0.01% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000540s : 0.31% optimize.opt_a.accelerated_algorithm : 0.000080s : 0.05% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.01% optimize.opt_a.shard_inline : 0.000039s : 0.02% optimize.opt_a.merge_send_recv : 0.000042s : 0.02% optimize.opt_a.auto_parallel : 0.000038s : 0.02% optimize.opt_a.parallel : 0.000041s : 0.02% optimize.opt_a.flash_sp : 0.000020s : 0.01% optimize.opt_a.merge_comm : 0.000023s : 0.01% optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% 
optimize.opt_a.matmul_add_comm_reduction : 0.000059s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000046s : 0.03% optimize.opt_a.virtual_dataset : 0.000038s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.02% optimize.opt_a.virtual_output : 0.000035s : 0.02% optimize.opt_a.merge_forward : 0.000024s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000053s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000072s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000064s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.01% optimize.opt_a.meta_fg_expand : 0.001921s : 1.12% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000112s : 0.06% optimize.opt_a.a_after_grad : 0.000125s : 0.07% optimize.opt_a.renormalize : 0.004369s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000023s : 0.01% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000100s : 0.06% optimize.opt_a.cse : 0.000295s : 0.17% optimize.opt_a.a_3 : 0.000529s : 0.31% optimize.py_interpret_to_execute_after_opt_a : 0.000022s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000063s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000828s : 0.48% optimize.opt_b.b_1 : 0.000222s : 0.13% optimize.opt_b.b_2 : 0.000013s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse 
: 0.000048s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000037s : 0.02% optimize.loop_unroll : 0.000484s : 0.28% optimize.opt_after_cconv.c_1 : 0.000056s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.cse : 0.000041s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000059s : 0.03% optimize.tuple_transform.d_1 : 0.000080s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000012s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000077s : 0.04% optimize.cse_after_recomputation.cse : 0.000027s : 0.02% optimize.environ_conv : 0.000013s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000021s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000007s : 0.00% optimize.overlap_grad_flash_sp : 0.000031s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000019s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000537s : 0.31% validate : 0.000066s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.143162s : 83.12% execute : 0.000012s : 0.01% Time group info: ------[substitution.] 
0.000947 227 6.36% : 0.000060s : 11: substitution.arithmetic_simplify 2.95% : 0.000028s : 4: substitution.cast_eliminate 0.32% : 0.000003s : 6: substitution.elim_not_effective 0.50% : 0.000005s : 5: substitution.float_depend_g_call 0.49% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000003s : 6: substitution.fold_const_symbol 0.94% : 0.000009s : 9: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 55.48% : 0.000525s : 16: substitution.inline 2.37% : 0.000022s : 2: substitution.inline_without_move 1.47% : 0.000014s : 22: substitution.j_node_and_user_rematch 2.03% : 0.000019s : 3: substitution.less_batch_normalization 1.64% : 0.000016s : 11: substitution.minmaximum_grad 0.82% : 0.000008s : 5: substitution.partial_eliminate 1.75% : 0.000017s : 22: substitution.remove_not_recompute_node 3.30% : 0.000031s : 10: substitution.replace_applicator 1.56% : 0.000015s : 15: substitution.replace_old_param 0.28% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.51% : 0.000033s : 11: substitution.tuple_list_convert_item_index_to_positive 1.58% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.18% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 7.46% : 0.000071s : 28: substitution.tuple_list_get_item_eliminator 2.13% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012169 2 85.80% : 0.010440s : 1: type_inference.infer 14.20% : 0.001729s : 1: type_inference.specialize ------[replace.] 0.000221 30 60.12% : 0.000133s : 16: replace.inline 39.88% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000549 30 93.99% : 0.000516s : 16: match.inline 6.01% : 0.000033s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000809 5897 1.04% : 0.000008s : 69: predicate.accumulaten_eliminater 0.30% : 0.000002s : 9: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 34: predicate.addn_check_dump 1.12% : 0.000009s : 69: predicate.addn_zero_filter 1.04% : 0.000008s : 69: predicate.adjust_all_reduce_mul_add 2.10% : 0.000017s : 103: predicate.arithmetic_simplify 1.14% : 0.000009s : 69: predicate.cast_eliminate 1.20% : 0.000010s : 71: predicate.check_bprop_eliminate 0.57% : 0.000005s : 34: predicate.compare_switch_simplify 0.09% : 0.000001s : 9: predicate.const_output_eliminate 0.51% : 0.000004s : 34: predicate.depend_value_elim 1.12% : 0.000009s : 69: predicate.dict_get_item_const_eliminator 1.20% : 0.000010s : 69: predicate.dict_get_item_eliminator 1.09% : 0.000009s : 69: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 18: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 9: predicate.elim_not_effective 0.18% : 0.000001s : 9: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000010s : 78: predicate.environ_add_const_eliminate 1.18% : 0.000010s : 78: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 78: predicate.environ_get_depend_swap 1.73% : 0.000014s : 112: predicate.environ_get_eliminate 1.23% : 0.000010s : 78: predicate.environ_get_set_eliminate 1.58% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.18% : 0.000018s : 99: predicate.float_depend_g_call 0.54% : 0.000004s : 34: predicate.float_environ_get_switch 0.71% : 0.000006s : 43: predicate.float_tuple_getitem_switch 0.10% : 0.000001s : 9: predicate.fold_const_symbol 0.59% : 0.000005s : 34: predicate.get_grad_eliminate 0.10% : 0.000001s : 9: predicate.graph_param_transform 0.55% : 0.000004s : 34: predicate.incorporate_call 0.48% : 0.000004s : 34: predicate.incorporate_call_switch 5.63% : 0.000046s : 254: predicate.inline 1.28% : 0.000010s : 57: predicate.inline_without_move 0.31% : 0.000002s : 34: predicate.j_node_and_user_rematch 0.77% : 0.000006s : 34: predicate.less_batch_normalization 
1.69% : 0.000014s : 101: predicate.list_to_tuple_eliminator_ 2.52% : 0.000020s : 170: predicate.load_eliminater 0.36% : 0.000003s : 9: predicate.loop_unroll_after_grad 2.11% : 0.000017s : 130: predicate.loop_unroll_before_grad 1.50% : 0.000012s : 87: predicate.make_slice_get_slice_eliminator 0.56% : 0.000005s : 34: predicate.merge_addn 1.17% : 0.000009s : 71: predicate.micro_step_allgather_replace 1.11% : 0.000009s : 71: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 69: predicate.minmaximum_grad 0.49% : 0.000004s : 9: predicate.mutable_eliminate 0.16% : 0.000001s : 9: predicate.opt_reshape 0.19% : 0.000002s : 9: predicate.parallel_virtual_node 1.93% : 0.000016s : 99: predicate.partial_defer_inline 1.65% : 0.000013s : 92: predicate.partial_eliminate 1.03% : 0.000008s : 69: predicate.print_const_string_wrapper 0.55% : 0.000004s : 34: predicate.reduce_all_const_elim 1.33% : 0.000011s : 69: predicate.reduce_eliminate 2.61% : 0.000021s : 170: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000003s : 34: predicate.remove_not_recompute_node 1.86% : 0.000015s : 154: predicate.replace_applicator 0.59% : 0.000005s : 57: predicate.replace_old_param 0.13% : 0.000001s : 9: predicate.reset_defer_inline 1.07% : 0.000009s : 69: predicate.reshape_eliminate 1.19% : 0.000010s : 71: predicate.row_tensor_add_zeros_like 0.24% : 0.000002s : 9: predicate.row_tensor_eliminate 1.37% : 0.000011s : 71: predicate.same_eliminate 0.36% : 0.000003s : 34: predicate.set_cell_output_no_recompute 0.68% : 0.000005s : 34: predicate.shard_identity_eliminate 0.37% : 0.000003s : 18: predicate.special_op_eliminate 0.64% : 0.000005s : 34: predicate.specialize_transform 1.32% : 0.000011s : 71: predicate.split_environ_get_set_with_tuple_value 1.24% : 0.000010s : 57: predicate.stack_unstack_eliminate 0.19% : 0.000002s : 9: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 99: predicate.switch_defer_inline 2.80% : 0.000023s : 170: predicate.switch_layer_defer_inline 4.78% : 0.000039s 
: 272: predicate.switch_simplify 1.15% : 0.000009s : 69: predicate.tile_eliminate 1.09% : 0.000009s : 69: predicate.transpose_eliminate 1.48% : 0.000012s : 87: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 87: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000011s : 87: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000023s : 135: predicate.tuple_list_get_item_eliminator 1.47% : 0.000012s : 87: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000017s : 121: predicate.tuple_list_set_item_eliminator 1.56% : 0.000013s : 101: predicate.tuple_to_list_eliminator_ 2.52% : 0.000020s : 170: predicate.updatestate_pure_node_eliminater 3.17% : 0.000026s : 204: predicate.updatestate_useless_node_eliminater 0.19% : 0.000002s : 9: predicate.value_based_eliminate 0.63% : 0.000005s : 34: predicate.virtual_dataset_eliminate 0.62% : 0.000005s : 34: predicate.virtual_output_eliminate 0.14% : 0.000001s : 9: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 9: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001971 32 55.56% : 0.001095s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.44% : 0.000876s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.207996 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.84% : 0.003820s : 1: add_attr 1.83% : 0.003805s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000082s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000138s : 1: auto_monad 0.02% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.27% : 0.000570s : 1: bootstrap 0.02% : 0.000041s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000025s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.02% : 0.000041s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000017s : 1: environ_conv 0.03% : 0.000063s : 1: event_method 0.01% : 0.000019s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.24% : 0.000494s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000841s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000025s : 1: opt.transform.mutable_eliminate 2.56% : 0.005328s : 117: opt.transform.opt_a 0.03% : 0.000055s : 1: opt.transform.opt_after_cconv 0.02% : 0.000040s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000207s : 28: opt.transform.opt_b 0.04% : 0.000090s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000063s : 4: opt.transform.symbol_engine_opt 6.53% : 0.013588s : 1: opt_a 0.08% : 0.000167s : 1: opt_after_cconv 0.26% : 0.000549s : 1: opt_after_jit_grad 0.17% : 0.000355s : 1: opt_b 7.97% : 0.016572s : 1: optimize 0.01% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000035s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000059s : 1: pre_auto_parallel 0.02% : 0.000048s : 1: py_interpret_to_execute 0.01% : 0.000025s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000064s : 1: remove_dup_value 1.17% : 0.002427s : 2: renormalize.infer 0.92% : 0.001921s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000069s : 1: rewriter_after_opt_a 0.07% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000119s : 1: symbol_engine_optimizer 68.85% : 0.143197s : 1: task_emit 0.06% : 0.000122s : 1: 
tuple_transform 5.91% : 0.012300s : 1: type_inference 0.05% : 0.000103s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x0-ge],max_mem:38.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x1-pynative],max_mem:38.0M TotalTime = 0.0283528, [24] [bootstrap]: 0.00066476 [type_inference]: 0.00816692 [event_method]: 1.774e-05 [auto_monad]: 6.77e-05 [graph_reusing]: 6.64001e-06 [inline]: 3.31999e-06 [add_attr]: 0.00481409, [1] [add_attr_with_inline]: 0.00479635, [1] [Cycle 1]: 7.666e-05, [2] [tag_attr]: 2.442e-05 [meta_addattr_fg_expand]: 4.18001e-06 [parallel-infer-symbol]: 4.14002e-06 [pre_auto_parallel]: 4.563e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 1.22999e-06 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.97999e-06 [optimize]: 0.00536562, [53] [py_interpret_to_execute]: 3.504e-05 [rewriter_before_opt_a]: 7.843e-05 [opt_a]: 0.00288048, [2] [Cycle 1]: 0.00219961, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 3.55e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.00055144 [with_stream_mark]: 2.298e-05 [recompute_prepare]: 1.016e-05 [updatestate_depend_eliminate]: 4.60001e-06 [updatestate_assign_eliminate]: 3.66999e-06 [updatestate_loads_eliminate]: 3.02002e-06 [parameter_eliminate]: 2.22999e-06 [a_2]: 8.382e-05 [accelerated_algorithm]: 6.69001e-06 [shard]: 2.86e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 1.06e-05 [merge_send_recv]: 9.52999e-06 [auto_parallel]: 7.65998e-06 [parallel]: 3.179e-05 [flash_sp]: 1.061e-05 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 1.063e-05 [allreduce_slice_to_reducescatter]: 1.11002e-06 [virtual_shard_identity]: 8.07998e-06 [virtual_dataset]: 
6.25002e-06 [get_grad_eliminate_]: 5.77999e-06 [virtual_output]: 6.13998e-06 [merge_forward]: 4.57e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 1.149e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.296e-05 [merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 1.123e-05 [set_forward_comm_id_for_comm_node_pass]: 3.71999e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.74999e-06 [after_resolve]: 1.185e-05 [a_after_grad]: 9.39e-06 [renormalize]: 0.00087394 [add_forward_monad_depend]: 7.15e-06 [auto_monad_grad]: 2.68e-06 [auto_monad_eliminator]: 1.856e-05 [cse]: 3.548e-05 [a_3]: 5.367e-05 [Cycle 2]: 0.00066656, [45] [expand_dump_flag]: 2.19001e-06 [switch_simplify]: 7.73999e-06 [loop_unroll]: 6.12001e-06 [a_1]: 0.00013828 [with_stream_mark]: 1.488e-05 [recompute_prepare]: 6.12999e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.054e-05 [accelerated_algorithm]: 5.66998e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.56002e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 7.68999e-06 [parallel]: 7.68999e-06 [flash_sp]: 3.98001e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 8.13001e-06 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.23002e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 2.48e-06 [offload_activation]: 9.94001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.175e-05 [merge_recompute_call_nodes]: 9.20001e-07 [before_grad]: 8.37e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 1.62001e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 9.40001e-06 [renormalize]: 9.99717e-08 
[add_forward_monad_depend]: 1.37e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 9.35001e-06 [cse]: 2.094e-05 [a_3]: 3.367e-05 [py_interpret_to_execute_after_opt_a]: 1.66e-05 [slice_cell_reuse_recomputed_activation]: 2.18002e-06 [rewriter_after_opt_a]: 4.099e-05 [convert_after_rewriter]: 8.12998e-06 [order_py_execute_after_rewriter]: 5.85002e-06 [mutable_eliminate]: 0.00075293 [opt_b]: 0.00026997, [1] [Cycle 1]: 0.0002618, [7] [b_1]: 0.00017644 [b_2]: 7.59002e-06 [updatestate_depend_eliminate]: 8.69e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.62001e-06 [renormalize]: 9.09989e-07 [cse]: 2.514e-05 [optimize_parallel_all_gather_comm]: 2.081e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 3.632e-05 [loop_unroll]: 0.00048523 [opt_after_cconv]: 0.00010919, [1] [Cycle 1]: 0.00010245, [7] [c_1]: 2.908e-05 [parameter_eliminate]: 4.85999e-06 [updatestate_depend_eliminate]: 8.33001e-06 [updatestate_assign_eliminate]: 2.70997e-06 [updatestate_loads_eliminate]: 2.81e-06 [cse]: 1.986e-05 [renormalize]: 6.49976e-07 [remove_dup_value]: 1.49e-05 [tuple_transform]: 8.103e-05, [1] [Cycle 1]: 7.628e-05, [4] [d_1]: 4.788e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.76e-06 [partial_unused_args_eliminate]: 1.96998e-06 [add_recomputation]: 6.509e-05 [cse_after_recomputation]: 2.316e-05, [1] [Cycle 1]: 1.794e-05, [1] [cse]: 1.206e-05 [environ_conv]: 6.38e-06 [swap_dp_allreduce_reducescatter]: 5.82001e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 5.12e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.94999e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 1.23002e-06 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.61999e-06 [reorder_send_recv_between_fp_bp]: 3.25e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 
1.10001e-06 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.498e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 4.22e-06 [overlap_recompute_and_grad_model_parallel]: 5.18002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.56002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.99001e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 2.282e-05 [begin_end_overlap_inline]: 7.30011e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.06997e-06 [symbol_engine_optimizer]: 7.943e-05, [1] [Cycle 1]: 7.411e-05, [6] [build]: 4.70999e-06 [elim_shapecalc]: 9.56e-06 [elim_not_effective]: 1.247e-05 [opt_reshape]: 6.91001e-06 [fold_const_symbol]: 1.021e-05 [renormalize]: 3.20026e-07 [detach_backward]: 2.54999e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.947e-05 [get_jit_bprop_graph]: 2.48e-06 [rewriter_after_jit_bprop_graph]: 0.00016643 [opt_after_jit_grad]: 0.00057658 [validate]: 4.83e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00798197 [execute]: 1.051e-05 Sums bootstrap : 0.000665s : 2.98% type_inference : 0.008167s : 36.60% event_method : 0.000018s : 0.08% auto_monad : 0.000068s : 0.30% graph_reusing : 0.000007s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000024s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000046s : 0.20% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.16% optimize.rewriter_before_opt_a : 0.000078s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000005s : 0.02% optimize.opt_a.switch_simplify : 0.000043s : 0.19% optimize.opt_a.loop_unroll : 0.000027s : 0.12% optimize.opt_a.a_1 : 0.000690s : 3.09% optimize.opt_a.with_stream_mark : 0.000038s : 0.17% optimize.opt_a.recompute_prepare : 0.000016s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.02% optimize.opt_a.a_2 : 0.000154s : 0.69% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000017s : 0.07% optimize.opt_a.merge_send_recv : 0.000016s : 0.07% optimize.opt_a.auto_parallel : 0.000015s : 0.07% optimize.opt_a.parallel : 0.000039s : 0.18% optimize.opt_a.flash_sp : 0.000015s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.03% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.06% optimize.opt_a.virtual_dataset : 0.000012s : 0.05% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.05% optimize.opt_a.virtual_output : 0.000011s : 0.05% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.02% optimize.opt_a.offload_activation : 0.000021s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000020s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000022s : 0.10% optimize.opt_a.a_after_grad : 0.000019s : 0.08% optimize.opt_a.renormalize : 0.000874s : 3.92% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000028s : 0.13% optimize.opt_a.cse : 0.000056s : 0.25% optimize.opt_a.a_3 : 0.000087s : 0.39% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000041s : 0.18% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000753s : 3.37% optimize.opt_b.b_1 : 0.000176s : 0.79% optimize.opt_b.b_2 : 0.000008s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000025s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000036s : 0.16% optimize.loop_unroll : 0.000485s : 2.17% optimize.opt_after_cconv.c_1 : 0.000029s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.07% optimize.tuple_transform.d_1 : 0.000048s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000065s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.05% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% 
optimize.symbol_engine_optimizer.build : 0.000005s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000019s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000166s : 0.75% opt_after_jit_grad : 0.000577s : 2.58% validate : 0.000048s : 0.22% backend_pass : 0.000001s : 0.00% task_emit : 0.007982s : 35.77% execute : 0.000011s : 0.05% Time group info: ------[substitution.] 0.000247 30 14.35% : 0.000035s : 5: substitution.arithmetic_simplify 0.79% : 0.000002s : 2: substitution.elim_not_effective 0.68% : 0.000002s : 2: substitution.fold_const_symbol 2.70% : 0.000007s : 4: substitution.graph_param_transform 69.73% : 0.000172s : 3: substitution.inline 1.68% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.32% : 0.000006s : 4: substitution.remove_not_recompute_node 2.54% : 0.000006s : 4: substitution.replace_old_param 5.22% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.008096 2 90.48% : 0.007325s : 1: type_inference.infer 9.52% : 0.000771s : 1: type_inference.specialize ------[replace.] 0.000046 5 72.14% : 0.000033s : 3: replace.inline 27.86% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000181 5 93.49% : 0.000170s : 3: match.inline 6.51% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000181 1131 0.87% : 0.000002s : 11: predicate.accumulaten_eliminater 1.06% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.49% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000002s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.51% : 0.000005s : 19: predicate.arithmetic_simplify 1.01% : 0.000002s : 11: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.51% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.50% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000001s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.00% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 0.93% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.13% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.80% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000001s : 4: predicate.graph_param_transform 0.60% : 0.000001s : 8: predicate.incorporate_call 0.48% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000011s : 51: predicate.inline 0.76% : 0.000001s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000002s : 8: predicate.less_batch_normalization 1.82% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.18% : 0.000004s : 32: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.55% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 11: predicate.minmaximum_grad 1.58% : 0.000003s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.51% : 0.000003s : 16: predicate.partial_defer_inline 1.27% : 0.000002s : 17: predicate.partial_eliminate 0.92% : 0.000002s : 11: predicate.print_const_string_wrapper 0.87% : 0.000002s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 11: predicate.reduce_eliminate 2.67% : 0.000005s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000003s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000002s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 8: predicate.shard_identity_eliminate 0.91% : 0.000002s : 8: predicate.special_op_eliminate 0.70% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.33% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.67% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.79% : 0.000001s : 11: predicate.transpose_eliminate 1.47% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.52% : 0.000005s : 27: predicate.tuple_list_set_item_eliminator 1.77% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.13% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.91% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.28% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000567 8 45.97% : 0.000261s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.03% : 0.000306s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.040631 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.87% : 0.004822s : 1: add_attr 11.82% : 0.004801s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000070s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000073s : 1: auto_monad 0.06% : 0.000025s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 1.75% : 0.000711s : 1: bootstrap 0.10% : 0.000040s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000019s : 1: control_data_broadcast_order 0.03% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.02% : 0.000010s : 1: environ_conv 0.06% : 0.000025s : 1: event_method 0.04% : 0.000018s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000007s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.22% : 0.000496s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.89% : 0.000767s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000018s : 1: opt.transform.mutable_eliminate 2.69% : 0.001092s : 78: opt.transform.opt_a 0.07% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.38% : 0.000156s : 28: opt.transform.opt_b 0.13% : 0.000053s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000036s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002884s : 1: opt_a 0.28% : 0.000113s : 1: opt_after_cconv 1.46% : 0.000592s : 1: opt_after_jit_grad 0.68% : 0.000274s : 1: opt_b 13.22% : 0.005371s : 1: optimize 0.06% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000009s : 1: parallel-infer-symbol 0.01% : 0.000005s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.12% : 0.000051s : 1: pre_auto_parallel 0.10% : 0.000040s : 1: py_interpret_to_execute 0.05% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000019s : 1: remove_dup_value 1.17% : 0.000476s : 1: renormalize.infer 0.95% : 0.000386s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.43% : 0.000175s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000046s : 1: rewriter_after_opt_a 0.21% : 0.000084s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.20% : 0.000082s : 1: symbol_engine_optimizer 19.70% : 0.008004s : 1: task_emit 0.21% : 0.000084s : 1: 
tuple_transform 20.17% : 0.008196s : 1: type_inference 0.24% : 0.000098s : 1: validate TotalTime = 0.020726, [24] [bootstrap]: 0.00045145 [type_inference]: 0.00483104 [event_method]: 1.217e-05 [auto_monad]: 5.733e-05 [graph_reusing]: 5.53002e-06 [inline]: 2.89999e-06 [add_attr]: 0.0033719, [1] [add_attr_with_inline]: 0.0033606, [1] [Cycle 1]: 5.561e-05, [2] [tag_attr]: 1.452e-05 [meta_addattr_fg_expand]: 3.81001e-06 [parallel-infer-symbol]: 3.61001e-06 [pre_auto_parallel]: 3.249e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.08998e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00442855, [53] [py_interpret_to_execute]: 2.018e-05 [rewriter_before_opt_a]: 4.849e-05 [opt_a]: 0.00229374, [2] [Cycle 1]: 0.00164211, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 2.519e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00032768 [with_stream_mark]: 1.926e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.91001e-06 [updatestate_assign_eliminate]: 3.12002e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 7.783e-05 [accelerated_algorithm]: 6.98e-06 [shard]: 3.11001e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 6.74999e-06 [merge_send_recv]: 9.12999e-06 [auto_parallel]: 7.13e-06 [parallel]: 2.034e-05 [flash_sp]: 9.51e-06 [merge_comm]: 3.74002e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 9.81e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.3e-06 [virtual_dataset]: 6.29001e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.017e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.164e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 1.009e-05 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.91999e-06 [receive_attached]: 2.66e-06 [after_resolve]: 
1.106e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00064709 [add_forward_monad_depend]: 4.82e-06 [auto_monad_grad]: 2.84999e-06 [auto_monad_eliminator]: 1.506e-05 [cse]: 3.15e-05 [a_3]: 4.278e-05 [Cycle 2]: 0.00064071, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 7.27002e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012831 [with_stream_mark]: 1.158e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 3.27002e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 6.864e-05 [accelerated_algorithm]: 6.01e-06 [shard]: 1.54e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 5.37999e-06 [auto_parallel]: 6.11e-06 [parallel]: 6.06998e-06 [flash_sp]: 3.98999e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.37997e-06 [matmul_add_comm_reduction]: 6.69999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.66e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.33002e-06 [merge_forward]: 3.84002e-06 [cell_reuse_recompute_pass]: 1.98002e-06 [offload_activation]: 8.24002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.95002e-06 [merge_recompute_call_nodes]: 1.10001e-06 [before_grad]: 8.65001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.69998e-06 [after_resolve]: 1.163e-05 [a_after_grad]: 8.41002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.18998e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 8.48001e-06 [cse]: 1.597e-05 [a_3]: 3.271e-05 [py_interpret_to_execute_after_opt_a]: 1.128e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.724e-05 [convert_after_rewriter]: 7.76001e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00061306 [opt_b]: 0.00020129, [1] [Cycle 1]: 0.00019292, [7] [b_1]: 0.00011034 [b_2]: 
7.63001e-06 [updatestate_depend_eliminate]: 8.50001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.76e-06 [renormalize]: 8.09989e-07 [cse]: 2.392e-05 [optimize_parallel_all_gather_comm]: 1.957e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 3.369e-05 [loop_unroll]: 0.00044907 [opt_after_cconv]: 0.00010325, [1] [Cycle 1]: 9.693e-05, [7] [c_1]: 2.843e-05 [parameter_eliminate]: 4.32e-06 [updatestate_depend_eliminate]: 6.62002e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.796e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 1.43e-05 [tuple_transform]: 7.556e-05, [1] [Cycle 1]: 7.05e-05, [4] [d_1]: 4.305e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 4.903e-05 [cse_after_recomputation]: 2.038e-05, [1] [Cycle 1]: 1.574e-05, [1] [cse]: 1.065e-05 [environ_conv]: 6.34001e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.97002e-06 [label_micro_interleaved_index]: 5.44e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.84999e-06 [micro_interleaved_order_control]: 2.99001e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.44e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 3.44001e-06 [comm_op_add_attrs]: 1.39e-06 [add_comm_op_reuse_tag]: 1.09e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.256e-05 [grouped_pairwise_exchange_alltoall]: 1.68002e-06 [offloading_packed_experts]: 4.55001e-06 [overlap_recompute_and_grad_model_parallel]: 4.69998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.60999e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 2.347e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 7.374e-05, [1] [Cycle 1]: 6.946e-05, [6] [build]: 3.63e-06 [elim_shapecalc]: 9.56998e-06 [elim_not_effective]: 1.198e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 1.60013e-07 [detach_backward]: 2.55002e-06 [pipeline_parallel_scheduler]: 2.01e-06 [auto_monad_reorder]: 1.809e-05 [get_jit_bprop_graph]: 2.53e-06 [rewriter_after_jit_bprop_graph]: 4.83001e-06 [opt_after_jit_grad]: 0.00048156 [validate]: 4.238e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00669767 [execute]: 9.44e-06 Sums bootstrap : 0.000451s : 2.77% type_inference : 0.004831s : 29.69% event_method : 0.000012s : 0.07% auto_monad : 0.000057s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000032s : 0.20% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000048s : 0.30% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.20% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000456s : 2.80% optimize.opt_a.with_stream_mark : 0.000031s : 0.19% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000005s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000015s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.16% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000008s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000023s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000647s : 3.98% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000005s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.14% optimize.opt_a.cse : 0.000047s : 0.29% optimize.opt_a.a_3 : 0.000075s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.23% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000613s : 3.77% optimize.opt_b.b_1 : 0.000110s : 0.68% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.05% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000024s : 0.15% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000034s : 0.21% optimize.loop_unroll : 0.000449s : 2.76% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.03% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000043s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000023s : 0.14% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.02% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000018s : 0.11% get_jit_bprop_graph : 0.000003s : 0.02% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000482s : 2.96% validate : 0.000042s : 0.26% backend_pass : 0.000001s : 0.01% task_emit : 0.006698s : 41.16% execute : 0.000009s : 0.06% Time group info: ------[substitution.] 0.000151 26 15.92% : 0.000024s : 4: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 4.20% : 0.000006s : 4: substitution.graph_param_transform 68.88% : 0.000104s : 2: substitution.inline 2.53% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.19% : 0.000005s : 4: substitution.remove_not_recompute_node 3.23% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004776 2 92.05% : 0.004396s : 1: type_inference.infer 7.95% : 0.000380s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000102 2 100.00% : 0.000102s : 2: match.inline ------[predicate.] 
0.000141 984 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.60% : 0.000004s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.75% : 0.000002s : 21: predicate.environ_get_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.66% : 0.000009s : 44: predicate.inline 1.07% : 0.000002s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.84% : 0.000003s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.01% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000001s : 8: predicate.shard_identity_eliminate 1.07% : 0.000002s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.35% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 1.04% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 1.08% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000311 6 40.92% : 0.000127s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.08% : 0.000184s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030147 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.20% : 0.003377s : 1: add_attr 11.16% : 0.003365s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.07% : 0.000022s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.64% : 0.000493s : 1: bootstrap 0.13% : 0.000038s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.05% : 0.000016s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000009s : 1: label_micro_interleaved_index 1.53% : 0.000460s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 2.08% : 0.000628s : 1: mutable_eliminate 0.03% : 0.000008s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 2.72% : 0.000821s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.62% : 0.002297s : 1: opt_a 0.35% : 0.000107s : 1: opt_after_cconv 1.64% : 0.000495s : 1: opt_after_jit_grad 0.68% : 0.000205s : 1: opt_b 14.71% : 0.004434s : 1: optimize 0.08% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.09% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.12% : 0.000037s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.05% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 1.28% : 0.000386s : 1: renormalize.infer 0.83% : 0.000251s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000042s : 1: rewriter_after_opt_a 0.17% : 0.000053s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000076s : 1: symbol_engine_optimizer 22.28% : 0.006718s : 1: task_emit 0.26% : 0.000078s : 1: 
tuple_transform 16.12% : 0.004858s : 1: type_inference 0.27% : 0.000083s : 1: validate TotalTime = 0.0230404, [24] [bootstrap]: 0.00048809 [type_inference]: 0.00621609 [event_method]: 1.49e-05 [auto_monad]: 5.692e-05 [graph_reusing]: 5.42999e-06 [inline]: 2.14e-06 [add_attr]: 0.00346381, [1] [add_attr_with_inline]: 0.00345146, [1] [Cycle 1]: 6.803e-05, [2] [tag_attr]: 2.111e-05 [meta_addattr_fg_expand]: 4.49002e-06 [parallel-infer-symbol]: 4.31002e-06 [pre_auto_parallel]: 3.4e-05 [insert-virtual-dataset]: 2.88e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00495546, [53] [py_interpret_to_execute]: 2.751e-05 [rewriter_before_opt_a]: 6.999e-05 [opt_a]: 0.00269286, [2] [Cycle 1]: 0.00204006, [45] [expand_dump_flag]: 3.09999e-06 [switch_simplify]: 3.405e-05 [loop_unroll]: 2.089e-05 [a_1]: 0.00048989 [with_stream_mark]: 1.737e-05 [recompute_prepare]: 8.67e-06 [updatestate_depend_eliminate]: 4.20999e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 7.764e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.86998e-06 [merge_send_recv]: 8.62e-06 [auto_parallel]: 7.33999e-06 [parallel]: 1.965e-05 [flash_sp]: 8.54e-06 [merge_comm]: 3.88999e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 1.003e-05 [allreduce_slice_to_reducescatter]: 1.08001e-06 [virtual_shard_identity]: 7.27002e-06 [virtual_dataset]: 6.17999e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 6.11998e-06 [merge_forward]: 3.95998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.097e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 [merge_recompute_call_nodes]: 1.71998e-06 [before_grad]: 9.41003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 2.69001e-06 [flash_sp_send_recv_attached]: 3.01999e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.058e-05 [a_after_grad]: 8.93002e-06 [renormalize]: 0.00084925 [add_forward_monad_depend]: 6.70002e-06 [auto_monad_grad]: 2.84001e-06 [auto_monad_eliminator]: 1.865e-05 [cse]: 3.168e-05 [a_3]: 4.485e-05 [Cycle 2]: 0.00064103, [45] [expand_dump_flag]: 1.73002e-06 [switch_simplify]: 7.61001e-06 [loop_unroll]: 5.74e-06 [a_1]: 0.00013441 [with_stream_mark]: 1.194e-05 [recompute_prepare]: 5.94e-06 [updatestate_depend_eliminate]: 3.12002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.25999e-06 [a_2]: 6.851e-05 [accelerated_algorithm]: 5.84999e-06 [shard]: 1.64e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 5.14e-06 [auto_parallel]: 7.35e-06 [parallel]: 6.40002e-06 [flash_sp]: 3.81001e-06 [merge_comm]: 3.3e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 7.77998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.14998e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 3.2e-06 [cell_reuse_recompute_pass]: 1.83997e-06 [offload_activation]: 8.08999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 9.49978e-07 [before_grad]: 8.53001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.00002e-06 [flash_sp_send_recv_attached]: 1.39e-06 [receive_attached]: 1.81e-06 [after_resolve]: 1.005e-05 [a_after_grad]: 8.47e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.44e-06 [auto_monad_grad]: 1.28002e-06 [auto_monad_eliminator]: 7.61001e-06 [cse]: 1.94e-05 [a_3]: 3.271e-05 [py_interpret_to_execute_after_opt_a]: 1.287e-05 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.979e-05 [convert_after_rewriter]: 7.93001e-06 [order_py_execute_after_rewriter]: 5.62999e-06 [mutable_eliminate]: 0.00072151 [opt_b]: 0.00019921, [1] [Cycle 1]: 0.00019124, [7] [b_1]: 
0.00011459 [b_2]: 7.73999e-06 [updatestate_depend_eliminate]: 6.95002e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 5.00004e-07 [cse]: 2.114e-05 [optimize_parallel_all_gather_comm]: 1.79e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 3.256e-05 [loop_unroll]: 0.00044549 [opt_after_cconv]: 0.00010124, [1] [Cycle 1]: 9.495e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 4.89e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.38998e-06 [cse]: 1.785e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.398e-05 [tuple_transform]: 7.458e-05, [1] [Cycle 1]: 6.993e-05, [4] [d_1]: 4.355e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.39999e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 5.087e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.056e-05 [environ_conv]: 6.10002e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 3.09001e-06 [label_micro_interleaved_index]: 4.58999e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.51998e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 1.08001e-06 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.72001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88002e-06 [control_data_broadcast_order]: 1.407e-05 [grouped_pairwise_exchange_alltoall]: 1.61998e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 5.52001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.41002e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.02998e-06 [overlap_grad_flash_sp]: 2.278e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.38002e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.11002e-06 [symbol_engine_optimizer]: 7.409e-05, [1] [Cycle 1]: 6.909e-05, [6] [build]: 4.84e-06 [elim_shapecalc]: 8.43999e-06 [elim_not_effective]: 1.188e-05 [opt_reshape]: 6.74999e-06 [fold_const_symbol]: 9.59e-06 [renormalize]: 1.50001e-07 [detach_backward]: 2.16e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.734e-05 [get_jit_bprop_graph]: 2.56998e-06 [rewriter_after_jit_bprop_graph]: 5.62001e-06 [opt_after_jit_grad]: 0.00051747 [validate]: 4.338e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00693292 [execute]: 1.036e-05 Sums bootstrap : 0.000488s : 2.64% type_inference : 0.006216s : 33.60% event_method : 0.000015s : 0.08% auto_monad : 0.000057s : 0.31% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000034s : 0.18% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000028s : 0.15% optimize.rewriter_before_opt_a : 0.000070s : 0.38% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000042s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000624s : 3.37% optimize.opt_a.with_stream_mark : 0.000029s : 0.16% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000015s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000019s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000849s : 4.59% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.14% optimize.opt_a.cse : 0.000051s : 0.28% optimize.opt_a.a_3 : 0.000078s : 0.42% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000040s : 0.22% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000722s : 3.90% optimize.opt_b.b_1 : 0.000115s : 0.62% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000033s : 0.18% optimize.loop_unroll : 0.000445s : 2.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000044s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000005s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.01%
auto_monad_reorder : 0.000017s : 0.09%
get_jit_bprop_graph : 0.000003s : 0.01%
rewriter_after_jit_bprop_graph : 0.000006s : 0.03%
opt_after_jit_grad : 0.000517s : 2.80%
validate : 0.000043s : 0.23%
backend_pass : 0.000001s : 0.00%
task_emit : 0.006933s : 37.47%
execute : 0.000010s : 0.06%

Time group info:
------[substitution.] 0.000205 30
    14.16% : 0.000029s : 5: substitution.arithmetic_simplify
    1.08% : 0.000002s : 2: substitution.elim_not_effective
    0.88% : 0.000002s : 2: substitution.fold_const_symbol
    2.93% : 0.000006s : 4: substitution.graph_param_transform
    68.48% : 0.000140s : 3: substitution.inline
    1.78% : 0.000004s : 4: substitution.j_node_and_user_rematch
    2.36% : 0.000005s : 4: substitution.remove_not_recompute_node
    2.28% : 0.000005s : 4: substitution.replace_old_param
    6.05% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.006159 2
    90.18% : 0.005554s : 1: type_inference.infer
    9.82% : 0.000605s : 1: type_inference.specialize
------[replace.] 0.000044 5
    72.64% : 0.000032s : 3: replace.inline
    27.36% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000150 5
    92.45% : 0.000138s : 3: match.inline
    7.55% : 0.000011s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000165 1131 0.95% : 0.000002s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000004s : 19: predicate.arithmetic_simplify 1.07% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000001s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 6.42% : 0.000011s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.44% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.27% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 8: predicate.remove_not_recompute_node 1.54% : 0.000003s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.47% : 0.000001s : 4: predicate.reset_defer_inline 0.92% : 0.000002s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000002s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.85% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.38% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.02% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000418 8 43.18% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.82% : 0.000237s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033459 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.37% : 0.003469s : 1: add_attr 10.33% : 0.003456s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000062s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.59% : 0.000532s : 1: bootstrap 0.11% : 0.000036s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000017s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.05% : 0.000017s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.36% : 0.000456s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.20% : 0.000735s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000016s : 1: opt.transform.mutable_eliminate 2.99% : 0.001001s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 8.06% : 0.002696s : 1: opt_a 0.31% : 0.000105s : 1: opt_after_cconv 1.58% : 0.000530s : 1: opt_after_jit_grad 0.61% : 0.000203s : 1: opt_b 14.83% : 0.004961s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000038s : 1: pre_auto_parallel 0.09% : 0.000032s : 1: py_interpret_to_execute 0.05% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000018s : 1: remove_dup_value 1.31% : 0.000437s : 1: renormalize.infer 1.20% : 0.000403s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000044s : 1: rewriter_after_opt_a 0.22% : 0.000075s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000077s : 1: symbol_engine_optimizer 20.79% : 0.006955s : 1: task_emit 0.23% : 0.000078s : 1: 
tuple_transform 18.64% : 0.006237s : 1: type_inference 0.25% : 0.000083s : 1: validate TotalTime = 0.0461644, [24] [bootstrap]: 0.00050991 [type_inference]: 0.0138492 [event_method]: 7.732e-05 [auto_monad]: 0.00014801 [graph_reusing]: 8.67998e-06 [inline]: 2.86999e-06 [add_attr]: 0.00390508, [1] [add_attr_with_inline]: 0.00389093, [1] [Cycle 1]: 0.00010673, [2] [tag_attr]: 4.982e-05 [meta_addattr_fg_expand]: 9.77001e-06 [parallel-infer-symbol]: 4.08999e-06 [pre_auto_parallel]: 6.599e-05 [insert-virtual-dataset]: 2.88e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.0170262, [53] [py_interpret_to_execute]: 4.641e-05 [rewriter_before_opt_a]: 0.00018147 [opt_a]: 0.0140825, [3] [Cycle 1]: 0.00876625, [45] [expand_dump_flag]: 6.07999e-06 [switch_simplify]: 7.897e-05 [loop_unroll]: 6.238e-05 [a_1]: 0.00167812 [with_stream_mark]: 3.278e-05 [recompute_prepare]: 2.488e-05 [updatestate_depend_eliminate]: 1.018e-05 [updatestate_assign_eliminate]: 7.56999e-06 [updatestate_loads_eliminate]: 7.3e-06 [parameter_eliminate]: 3.28998e-06 [a_2]: 0.00025931 [accelerated_algorithm]: 3.714e-05 [shard]: 2.21e-06 [meta_shard_fg_expand]: 4.66002e-06 [shard_inline]: 1.715e-05 [merge_send_recv]: 1.791e-05 [auto_parallel]: 1.4e-05 [parallel]: 2.094e-05 [flash_sp]: 1.496e-05 [merge_comm]: 1.044e-05 [allreduce_fusion]: 9.32999e-06 [matmul_add_comm_reduction]: 3.328e-05 [allreduce_slice_to_reducescatter]: 9.79984e-07 [virtual_shard_identity]: 1.928e-05 [virtual_dataset]: 1.728e-05 [get_grad_eliminate_]: 1.518e-05 [virtual_output]: 1.533e-05 [merge_forward]: 9.89999e-06 [cell_reuse_recompute_pass]: 1.71e-06 [offload_activation]: 2.073e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.411e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 2.92e-05 [set_forward_comm_id_for_comm_node_pass]: 1.046e-05 [meta_fg_expand]: 0.00190794 [flash_sp_send_recv_attached]: 4.05e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 
7.47e-05 [a_after_grad]: 8.901e-05 [renormalize]: 0.00318773 [add_forward_monad_depend]: 1.475e-05 [auto_monad_grad]: 7.24001e-06 [auto_monad_eliminator]: 6.342e-05 [cse]: 0.00018369 [a_3]: 0.00035857 [Cycle 2]: 0.00415132, [45] [expand_dump_flag]: 3.05002e-06 [switch_simplify]: 4.913e-05 [loop_unroll]: 4.391e-05 [a_1]: 0.0017836 [with_stream_mark]: 3.006e-05 [recompute_prepare]: 1.695e-05 [updatestate_depend_eliminate]: 6.59001e-06 [updatestate_assign_eliminate]: 5.51002e-06 [updatestate_loads_eliminate]: 4.58999e-06 [parameter_eliminate]: 2.49001e-06 [a_2]: 0.00014275 [accelerated_algorithm]: 1.69e-05 [shard]: 2.67001e-06 [meta_shard_fg_expand]: 2.91999e-06 [shard_inline]: 1.041e-05 [merge_send_recv]: 1.234e-05 [auto_parallel]: 1.424e-05 [parallel]: 1.199e-05 [flash_sp]: 4.55999e-06 [merge_comm]: 5.97001e-06 [allreduce_fusion]: 5.61e-06 [matmul_add_comm_reduction]: 1.227e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 1.514e-05 [virtual_dataset]: 1.09e-05 [get_grad_eliminate_]: 1.002e-05 [virtual_output]: 9.28002e-06 [merge_forward]: 5.79999e-06 [cell_reuse_recompute_pass]: 1.87999e-06 [offload_activation]: 1.401e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.974e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.803e-05 [set_forward_comm_id_for_comm_node_pass]: 6.54001e-06 [meta_fg_expand]: 0.00015976 [flash_sp_send_recv_attached]: 2.36998e-06 [receive_attached]: 2.96999e-06 [after_resolve]: 2.523e-05 [a_after_grad]: 1.731e-05 [renormalize]: 0.00114122 [add_forward_monad_depend]: 7.75e-06 [auto_monad_grad]: 2.39001e-06 [auto_monad_eliminator]: 2.186e-05 [cse]: 6.979e-05 [a_3]: 7.735e-05 [Cycle 3]: 0.00114341, [45] [expand_dump_flag]: 2.04e-06 [switch_simplify]: 1.314e-05 [loop_unroll]: 1.069e-05 [a_1]: 0.00029522 [with_stream_mark]: 1.654e-05 [recompute_prepare]: 1.083e-05 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 4.72e-06 [updatestate_loads_eliminate]: 4.92e-06 [parameter_eliminate]: 
1.84e-06 [a_2]: 0.00013999 [accelerated_algorithm]: 1.439e-05 [shard]: 2.90998e-06 [meta_shard_fg_expand]: 2.29001e-06 [shard_inline]: 9.81e-06 [merge_send_recv]: 1.074e-05 [auto_parallel]: 1.095e-05 [parallel]: 9.22001e-06 [flash_sp]: 1.72999e-06 [merge_comm]: 5.29e-06 [allreduce_fusion]: 5.59e-06 [matmul_add_comm_reduction]: 1.043e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.217e-05 [virtual_dataset]: 9.48002e-06 [get_grad_eliminate_]: 9.32001e-06 [virtual_output]: 8.94e-06 [merge_forward]: 6.46999e-06 [cell_reuse_recompute_pass]: 2.96001e-06 [offload_activation]: 1.402e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.994e-05 [merge_recompute_call_nodes]: 1.56002e-06 [before_grad]: 1.7e-05 [set_forward_comm_id_for_comm_node_pass]: 5.81998e-06 [meta_fg_expand]: 3.78001e-06 [flash_sp_send_recv_attached]: 1.48002e-06 [receive_attached]: 2.09999e-06 [after_resolve]: 1.688e-05 [a_after_grad]: 1.497e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.79e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 3.247e-05 [a_3]: 6.778e-05 [py_interpret_to_execute_after_opt_a]: 2.006e-05 [slice_cell_reuse_recomputed_activation]: 2.42001e-06 [rewriter_after_opt_a]: 5.987e-05 [convert_after_rewriter]: 9.67999e-06 [order_py_execute_after_rewriter]: 7.83999e-06 [mutable_eliminate]: 0.00076404 [opt_b]: 0.00033643, [1] [Cycle 1]: 0.00032694, [7] [b_1]: 0.00021542 [b_2]: 1.251e-05 [updatestate_depend_eliminate]: 9.41003e-06 [updatestate_assign_eliminate]: 4.62e-06 [updatestate_loads_eliminate]: 4.57e-06 [renormalize]: 5.3001e-07 [cse]: 4.108e-05 [optimize_parallel_all_gather_comm]: 2.46e-05 [overlap_param_gather]: 2.56e-06 [cconv]: 3.429e-05 [loop_unroll]: 0.00050463 [opt_after_cconv]: 0.00015872, [1] [Cycle 1]: 0.00015145, [7] [c_1]: 5.618e-05 [parameter_eliminate]: 4.03001e-06 [updatestate_depend_eliminate]: 8.97999e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 4.15999e-06 [cse]: 
3.522e-05 [renormalize]: 4.99975e-07 [remove_dup_value]: 4.906e-05 [tuple_transform]: 0.00011637, [1] [Cycle 1]: 0.00011065, [4] [d_1]: 7.636e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.154e-05 [partial_unused_args_eliminate]: 1.87001e-06 [add_recomputation]: 7.47e-05 [cse_after_recomputation]: 3.859e-05, [1] [Cycle 1]: 3.296e-05, [1] [cse]: 2.58e-05 [environ_conv]: 1.152e-05 [swap_dp_allreduce_reducescatter]: 8.18001e-06 [bias_add_comm_swap]: 3.77998e-06 [label_micro_interleaved_index]: 6.76999e-06 [label_fine_grained_interleaved_index]: 3.46001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.38002e-06 [assign_add_opt]: 1.84e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.56998e-06 [reorder_send_recv_between_fp_bp]: 3.01001e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.30999e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.968e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 6.06998e-06 [overlap_recompute_and_grad_model_parallel]: 6.16998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.33002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.56998e-06 [overlap_grad_ring_attention]: 5.71998e-06 [overlap_grad_flash_sp]: 2.939e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 0.00011856, [1] [Cycle 1]: 0.00011343, [6] [build]: 1.415e-05 [elim_shapecalc]: 1.686e-05 [elim_not_effective]: 2.107e-05 [opt_reshape]: 1.2e-05 [fold_const_symbol]: 1.783e-05 [renormalize]: 2.89991e-07 [detach_backward]: 2.44999e-06 
[pipeline_parallel_scheduler]: 1.74998e-06 [auto_monad_reorder]: 2.855e-05 [get_jit_bprop_graph]: 2.01e-06 [rewriter_after_jit_bprop_graph]: 6.51999e-06 [opt_after_jit_grad]: 0.00064432 [validate]: 6.181e-05 [backend_pass]: 1.24998e-06 [task_emit]: 0.00950302 [execute]: 9.89999e-06 Sums bootstrap : 0.000510s : 1.26% type_inference : 0.013849s : 34.09% event_method : 0.000077s : 0.19% auto_monad : 0.000148s : 0.36% graph_reusing : 0.000009s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000050s : 0.12% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000066s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000046s : 0.11% optimize.rewriter_before_opt_a : 0.000181s : 0.45% optimize.opt_a.expand_dump_flag : 0.000011s : 0.03% optimize.opt_a.switch_simplify : 0.000141s : 0.35% optimize.opt_a.loop_unroll : 0.000117s : 0.29% optimize.opt_a.a_1 : 0.003757s : 9.25% optimize.opt_a.with_stream_mark : 0.000079s : 0.20% optimize.opt_a.recompute_prepare : 0.000053s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000023s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.04% optimize.opt_a.parameter_eliminate : 0.000008s : 0.02% optimize.opt_a.a_2 : 0.000542s : 1.33% optimize.opt_a.accelerated_algorithm : 0.000068s : 0.17% optimize.opt_a.shard : 0.000008s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.02% optimize.opt_a.shard_inline : 0.000037s : 0.09% optimize.opt_a.merge_send_recv : 0.000041s : 0.10% optimize.opt_a.auto_parallel : 0.000039s : 0.10% optimize.opt_a.parallel : 0.000042s : 0.10% optimize.opt_a.flash_sp : 0.000021s : 0.05% optimize.opt_a.merge_comm : 0.000022s : 0.05% 
optimize.opt_a.allreduce_fusion : 0.000021s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000056s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000047s : 0.11% optimize.opt_a.virtual_dataset : 0.000038s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.08% optimize.opt_a.virtual_output : 0.000034s : 0.08% optimize.opt_a.merge_forward : 0.000022s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000007s : 0.02% optimize.opt_a.offload_activation : 0.000049s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000074s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.01% optimize.opt_a.before_grad : 0.000064s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.06% optimize.opt_a.meta_fg_expand : 0.002071s : 5.10% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.02% optimize.opt_a.receive_attached : 0.000008s : 0.02% optimize.opt_a.after_resolve : 0.000117s : 0.29% optimize.opt_a.a_after_grad : 0.000121s : 0.30% optimize.opt_a.renormalize : 0.004329s : 10.66% optimize.opt_a.add_forward_monad_depend : 0.000024s : 0.06% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000099s : 0.24% optimize.opt_a.cse : 0.000286s : 0.70% optimize.opt_a.a_3 : 0.000504s : 1.24% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000060s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.02% optimize.order_py_execute_after_rewriter : 0.000008s : 0.02% optimize.mutable_eliminate : 0.000764s : 1.88% optimize.opt_b.b_1 : 0.000215s : 0.53% optimize.opt_b.b_2 : 0.000013s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000041s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000025s : 0.06% optimize.overlap_param_gather : 0.000003s : 0.01% optimize.cconv : 0.000034s : 0.08% optimize.loop_unroll : 0.000505s : 1.24% optimize.opt_after_cconv.c_1 : 0.000056s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000035s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000049s : 0.12% optimize.tuple_transform.d_1 : 0.000076s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000012s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000075s : 0.18% optimize.cse_after_recomputation.cse : 0.000026s : 0.06% optimize.environ_conv : 0.000012s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000004s : 0.01% optimize.label_micro_interleaved_index : 0.000007s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000029s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000018s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000029s : 0.07% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.02% opt_after_jit_grad : 0.000644s : 1.59% validate : 0.000062s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.009503s : 23.39% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 
0.001042 222 6.72% : 0.000070s : 12: substitution.arithmetic_simplify 2.39% : 0.000025s : 2: substitution.cast_eliminate 0.27% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000006s : 5: substitution.float_depend_g_call 0.43% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 5: substitution.fold_const_symbol 0.88% : 0.000009s : 8: substitution.graph_param_transform 0.29% : 0.000003s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 57.65% : 0.000600s : 17: substitution.inline 2.03% : 0.000021s : 2: substitution.inline_without_move 1.21% : 0.000013s : 20: substitution.j_node_and_user_rematch 2.08% : 0.000022s : 3: substitution.less_batch_normalization 1.49% : 0.000015s : 11: substitution.minmaximum_grad 0.76% : 0.000008s : 5: substitution.partial_eliminate 1.61% : 0.000017s : 20: substitution.remove_not_recompute_node 2.94% : 0.000031s : 10: substitution.replace_applicator 1.54% : 0.000016s : 15: substitution.replace_old_param 0.25% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.12% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.48% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.97% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.90% : 0.000082s : 30: substitution.tuple_list_get_item_eliminator 2.00% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013701 2 85.30% : 0.011687s : 1: type_inference.infer 14.70% : 0.002014s : 1: type_inference.specialize ------[replace.] 0.000254 33 58.88% : 0.000149s : 17: replace.inline 41.12% : 0.000104s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000631 33 93.55% : 0.000590s : 17: match.inline 6.45% : 0.000041s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000821 5764 1.08% : 0.000009s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.55% : 0.000005s : 32: predicate.addn_check_dump 1.07% : 0.000009s : 68: predicate.addn_zero_filter 1.05% : 0.000009s : 68: predicate.adjust_all_reduce_mul_add 2.11% : 0.000017s : 100: predicate.arithmetic_simplify 1.25% : 0.000010s : 68: predicate.cast_eliminate 1.07% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000010s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.17% : 0.000010s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.28% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.23% : 0.000010s : 76: predicate.environ_get_depend_swap 1.73% : 0.000014s : 108: predicate.environ_get_eliminate 1.19% : 0.000010s : 76: predicate.environ_get_set_eliminate 1.67% : 0.000014s : 101: predicate.exchange_switch_depend_value 2.25% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000006s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.50% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.75% : 0.000047s : 249: predicate.inline 1.28% : 0.000011s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000006s : 32: predicate.less_batch_normalization 
1.62% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000022s : 168: predicate.load_eliminater 0.44% : 0.000004s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.44% : 0.000012s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000005s : 32: predicate.merge_addn 1.09% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.06% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000009s : 68: predicate.minmaximum_grad 0.46% : 0.000004s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000017s : 101: predicate.partial_defer_inline 1.65% : 0.000014s : 92: predicate.partial_eliminate 1.10% : 0.000009s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000011s : 68: predicate.reduce_eliminate 2.62% : 0.000022s : 168: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000003s : 32: predicate.remove_not_recompute_node 1.83% : 0.000015s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.39% : 0.000011s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.34% : 0.000003s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.36% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000015s : 101: predicate.switch_defer_inline 2.84% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.83% : 
0.000040s : 277: predicate.switch_simplify 1.08% : 0.000009s : 68: predicate.tile_eliminate 1.09% : 0.000009s : 68: predicate.transpose_eliminate 1.40% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000024s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.70% : 0.000014s : 100: predicate.tuple_to_list_eliminator_ 2.56% : 0.000021s : 168: predicate.updatestate_pure_node_eliminater 3.21% : 0.000026s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000005s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001965 34 56.49% : 0.001110s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.51% : 0.000855s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077373 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.06% : 0.003912s : 1: add_attr 5.03% : 0.003896s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000079s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.20% : 0.000156s : 1: auto_monad 0.04% : 0.000033s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000007s : 1: bias_add_comm_swap 0.71% : 0.000551s : 1: bootstrap 0.05% : 0.000038s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000023s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000042s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000007s : 1: detach_backward 0.02% : 0.000015s : 1: environ_conv 0.11% : 0.000088s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000007s : 1: label_fine_grained_interleaved_index 0.01% : 0.000010s : 1: label_micro_interleaved_index 0.67% : 0.000516s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.01% : 0.000778s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000022s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000024s : 1: opt.transform.mutable_eliminate 7.23% : 0.005590s : 117: opt.transform.opt_a 0.07% : 0.000055s : 1: opt.transform.opt_after_cconv 0.05% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000196s : 28: opt.transform.opt_b 0.11% : 0.000086s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000064s : 4: opt.transform.symbol_engine_opt 18.21% : 0.014087s : 1: opt_a 0.21% : 0.000162s : 1: opt_after_cconv 0.85% : 0.000658s : 1: opt_after_jit_grad 0.44% : 0.000341s : 1: opt_b 22.01% : 0.017032s : 1: optimize 0.04% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000033s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000006s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000071s : 1: pre_auto_parallel 0.07% : 0.000051s : 1: py_interpret_to_execute 0.03% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000054s : 1: remove_dup_value 3.14% : 0.002432s : 2: renormalize.infer 2.42% : 0.001875s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000064s : 1: rewriter_after_opt_a 0.24% : 0.000186s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000121s : 1: symbol_engine_optimizer 12.31% : 0.009526s : 1: task_emit 0.15% : 0.000120s : 1: 
tuple_transform 17.95% : 0.013887s : 1: type_inference 0.14% : 0.000107s : 1: validate TotalTime = 0.0219384, [24] [bootstrap]: 0.00044015 [type_inference]: 0.00471958 [event_method]: 1.197e-05 [auto_monad]: 5.615e-05 [graph_reusing]: 5.05999e-06 [inline]: 2.65002e-06 [add_attr]: 0.0035119, [1] [add_attr_with_inline]: 0.00349782, [1] [Cycle 1]: 6.738e-05, [2] [tag_attr]: 1.635e-05 [meta_addattr_fg_expand]: 3.83999e-06 [parallel-infer-symbol]: 4.42e-06 [pre_auto_parallel]: 3.203e-05 [insert-virtual-dataset]: 2.91999e-06 [parallel-infer-symbol-second]: 9.39996e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00469544, [53] [py_interpret_to_execute]: 2.159e-05 [rewriter_before_opt_a]: 5.033e-05 [opt_a]: 0.00231748, [2] [Cycle 1]: 0.00167381, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.694e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.00036204 [with_stream_mark]: 1.878e-05 [recompute_prepare]: 8.15e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 2.39001e-06 [a_2]: 7.899e-05 [accelerated_algorithm]: 6.74001e-06 [shard]: 2.73998e-06 [meta_shard_fg_expand]: 1.83997e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.62e-06 [auto_parallel]: 7.5e-06 [parallel]: 1.877e-05 [flash_sp]: 8.50999e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 9.97999e-06 [allreduce_slice_to_reducescatter]: 9.29984e-07 [virtual_shard_identity]: 8.50001e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.76e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.021e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.183e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 1.042e-05 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 2.48998e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 
2.70002e-06 [after_resolve]: 1.174e-05 [a_after_grad]: 9.14998e-06 [renormalize]: 0.00063379 [add_forward_monad_depend]: 5.64e-06 [auto_monad_grad]: 2.36e-06 [auto_monad_eliminator]: 1.456e-05 [cse]: 2.958e-05 [a_3]: 4.533e-05 [Cycle 2]: 0.00063197, [45] [expand_dump_flag]: 1.95001e-06 [switch_simplify]: 8.28001e-06 [loop_unroll]: 5.87999e-06 [a_1]: 0.00012939 [with_stream_mark]: 1.242e-05 [recompute_prepare]: 6.51e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.93997e-06 [a_2]: 6.917e-05 [accelerated_algorithm]: 5.74999e-06 [shard]: 1.27e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 5.37001e-06 [auto_parallel]: 7.19001e-06 [parallel]: 6.14001e-06 [flash_sp]: 3.97e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 3.08998e-06 [matmul_add_comm_reduction]: 6.53e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 6.61999e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 4.94998e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 3.16001e-06 [cell_reuse_recompute_pass]: 1.89999e-06 [offload_activation]: 6.87002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.067e-05 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 8.38999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 1.32e-06 [receive_attached]: 1.40001e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 7.97e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 7.70998e-06 [cse]: 1.493e-05 [a_3]: 3.216e-05 [py_interpret_to_execute_after_opt_a]: 1.189e-05 [slice_cell_reuse_recomputed_activation]: 1.91998e-06 [rewriter_after_opt_a]: 3.701e-05 [convert_after_rewriter]: 7.45998e-06 [order_py_execute_after_rewriter]: 5.74e-06 [mutable_eliminate]: 0.00069412 [opt_b]: 0.00020055, [1] [Cycle 1]: 0.00019251, 
[7] [b_1]: 0.00011284 [b_2]: 8.1e-06 [updatestate_depend_eliminate]: 9.12001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 6.80011e-07 [cse]: 2.097e-05 [optimize_parallel_all_gather_comm]: 1.836e-05 [overlap_param_gather]: 2.32001e-06 [cconv]: 3.217e-05 [loop_unroll]: 0.00049446 [opt_after_cconv]: 0.00010721, [1] [Cycle 1]: 0.0001, [7] [c_1]: 2.889e-05 [parameter_eliminate]: 4.97999e-06 [updatestate_depend_eliminate]: 6.24001e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.997e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.447e-05 [tuple_transform]: 8.142e-05, [1] [Cycle 1]: 7.629e-05, [4] [d_1]: 4.742e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 7.06001e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 5.288e-05 [cse_after_recomputation]: 0.00010491, [1] [Cycle 1]: 1.752e-05, [1] [cse]: 1.173e-05 [environ_conv]: 5.93998e-06 [swap_dp_allreduce_reducescatter]: 7.00998e-06 [bias_add_comm_swap]: 3.31999e-06 [label_micro_interleaved_index]: 5.67999e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.96e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 8.60018e-07 [full_micro_interleaved_order_control]: 2.82002e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.52999e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.23002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.17999e-06 [control_data_broadcast_order]: 1.414e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 4.57e-06 [overlap_recompute_and_grad_model_parallel]: 5.30001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.45001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 1.93002e-06 [overlap_grad_ring_attention]: 4.53001e-06 [overlap_grad_flash_sp]: 2.182e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 8.232e-05, [1] [Cycle 1]: 7.686e-05, [6] [build]: 4.25e-06 [elim_shapecalc]: 1.044e-05 [elim_not_effective]: 1.41e-05 [opt_reshape]: 7.38e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 2.01998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.941e-05 [get_jit_bprop_graph]: 2.37001e-06 [rewriter_after_jit_bprop_graph]: 6.29001e-06 [opt_after_jit_grad]: 0.00060888 [validate]: 4.683e-05 [backend_pass]: 1.32999e-06 [task_emit]: 0.00744304 [execute]: 1.005e-05 Sums bootstrap : 0.000440s : 2.56% type_inference : 0.004720s : 27.44% event_method : 0.000012s : 0.07% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.03% pre_auto_parallel : 0.000032s : 0.19% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000050s : 0.29% optimize.opt_a.expand_dump_flag : 0.000005s : 0.03% optimize.opt_a.switch_simplify : 0.000035s : 0.20% optimize.opt_a.loop_unroll : 0.000020s : 0.11% optimize.opt_a.a_1 : 0.000491s : 2.86% optimize.opt_a.with_stream_mark : 0.000031s : 0.18% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000004s : 0.03% optimize.opt_a.a_2 : 0.000148s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.08% optimize.opt_a.auto_parallel : 0.000015s : 0.09% optimize.opt_a.parallel : 0.000025s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000022s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000634s : 3.69% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.13% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000077s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000037s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000694s : 4.04% optimize.opt_b.b_1 : 0.000113s : 0.66% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.05% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000021s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000032s : 0.19% optimize.loop_unroll : 0.000494s : 2.88% optimize.opt_after_cconv.c_1 : 0.000029s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000020s : 0.12% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000047s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.31% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000006s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000002s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000022s : 0.13% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000004s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000014s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000019s : 0.11% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000006s : 0.04% opt_after_jit_grad : 0.000609s : 3.54% validate : 0.000047s : 0.27% backend_pass : 0.000001s : 0.01% task_emit : 0.007443s : 43.28% execute : 0.000010s : 0.06% Time group info: ------[substitution.] 0.000187 26 15.15% : 0.000028s : 4: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 3.41% : 0.000006s : 4: substitution.graph_param_transform 71.99% : 0.000135s : 2: substitution.inline 1.95% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.88% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004665 2 91.53% : 0.004270s : 1: type_inference.infer 8.47% : 0.000395s : 1: type_inference.specialize ------[replace.] 0.000026 2 100.00% : 0.000026s : 2: replace.inline ------[match.] 0.000133 2 100.00% : 0.000133s : 2: match.inline ------[predicate.] 
0.000147 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 2.16% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.67% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.72% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.80% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.98% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.97% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 13: predicate.environ_get_depend_swap 1.79% : 0.000003s : 21: predicate.environ_get_eliminate 0.98% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.89% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 5.66% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.36% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.91% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 9: predicate.minmaximum_grad 1.66% : 0.000002s : 4: predicate.mutable_eliminate 0.53% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.12% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.84% : 0.000001s : 8: predicate.reduce_all_const_elim 0.87% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.52% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000002s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.26% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.67% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000007s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.53% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.83% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.76% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000307 6 38.41% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.59% : 0.000189s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031810 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.06% : 0.003518s : 1: add_attr 11.01% : 0.003503s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000057s : 1: add_recomputation 0.02% : 0.000005s : 1: assign_add_opt 0.19% : 0.000062s : 1: auto_monad 0.07% : 0.000023s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.50% : 0.000477s : 1: bootstrap 0.11% : 0.000036s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000018s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.34% : 0.000109s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.17% : 0.000056s : 1: event_method 0.05% : 0.000017s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000006s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000009s : 1: label_micro_interleaved_index 1.59% : 0.000506s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 2.23% : 0.000709s : 1: mutable_eliminate 0.02% : 0.000008s : 1: offloading_packed_experts 0.05% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000017s : 1: opt.transform.mutable_eliminate 2.71% : 0.000861s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.10% : 0.000031s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000094s : 28: opt.transform.opt_b 0.16% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000037s : 4: opt.transform.symbol_engine_opt 7.30% : 0.002321s : 1: opt_a 0.35% : 0.000111s : 1: opt_after_cconv 1.98% : 0.000629s : 1: opt_after_jit_grad 0.64% : 0.000204s : 1: opt_b 14.78% : 0.004701s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000009s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.12% : 0.000037s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.05% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000019s : 1: remove_dup_value 1.17% : 0.000372s : 1: renormalize.infer 0.80% : 0.000253s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000042s : 1: rewriter_after_opt_a 0.17% : 0.000055s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000085s : 1: symbol_engine_optimizer 23.47% : 0.007465s : 1: task_emit 0.27% : 0.000084s : 1: 
tuple_transform 14.92% : 0.004747s : 1: type_inference 0.28% : 0.000089s : 1: validate TotalTime = 0.0441776, [24] [bootstrap]: 0.00049848 [type_inference]: 0.0121565 [event_method]: 4.87e-05 [auto_monad]: 0.00012403 [graph_reusing]: 8.45001e-06 [inline]: 3.31001e-06 [add_attr]: 0.00367484, [1] [add_attr_with_inline]: 0.0036615, [1] [Cycle 1]: 9.299e-05, [2] [tag_attr]: 3.828e-05 [meta_addattr_fg_expand]: 8.88002e-06 [parallel-infer-symbol]: 3.68999e-06 [pre_auto_parallel]: 5.859e-05 [insert-virtual-dataset]: 2.96999e-06 [parallel-infer-symbol-second]: 8.40024e-07 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0161483, [53] [py_interpret_to_execute]: 4.254e-05 [rewriter_before_opt_a]: 0.0001457 [opt_a]: 0.0132732, [3] [Cycle 1]: 0.00859769, [45] [expand_dump_flag]: 4.38999e-06 [switch_simplify]: 6.962e-05 [loop_unroll]: 5.634e-05 [a_1]: 0.0015109 [with_stream_mark]: 3.819e-05 [recompute_prepare]: 3.125e-05 [updatestate_depend_eliminate]: 1.022e-05 [updatestate_assign_eliminate]: 8.48999e-06 [updatestate_loads_eliminate]: 7.53999e-06 [parameter_eliminate]: 3.18998e-06 [a_2]: 0.00026629 [accelerated_algorithm]: 4.046e-05 [shard]: 2.26998e-06 [meta_shard_fg_expand]: 4.42e-06 [shard_inline]: 1.732e-05 [merge_send_recv]: 2.051e-05 [auto_parallel]: 1.618e-05 [parallel]: 2.235e-05 [flash_sp]: 1.522e-05 [merge_comm]: 1.078e-05 [allreduce_fusion]: 9.64999e-06 [matmul_add_comm_reduction]: 3.775e-05 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 2.389e-05 [virtual_dataset]: 1.644e-05 [get_grad_eliminate_]: 1.592e-05 [virtual_output]: 1.678e-05 [merge_forward]: 1.022e-05 [cell_reuse_recompute_pass]: 1.92001e-06 [offload_activation]: 2.24e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.291e-05 [merge_recompute_call_nodes]: 1.80001e-06 [before_grad]: 2.901e-05 [set_forward_comm_id_for_comm_node_pass]: 1.198e-05 [meta_fg_expand]: 0.00185319 [flash_sp_send_recv_attached]: 5.17e-06 [receive_attached]: 2.67001e-06 
[after_resolve]: 7.725e-05 [a_after_grad]: 9.082e-05 [renormalize]: 0.00316075 [add_forward_monad_depend]: 1.418e-05 [auto_monad_grad]: 6.73e-06 [auto_monad_eliminator]: 6.212e-05 [cse]: 0.00017817 [a_3]: 0.00036544 [Cycle 2]: 0.00366511, [45] [expand_dump_flag]: 3.38999e-06 [switch_simplify]: 5.08e-05 [loop_unroll]: 4.497e-05 [a_1]: 0.00168083 [with_stream_mark]: 2.748e-05 [recompute_prepare]: 1.666e-05 [updatestate_depend_eliminate]: 6.85998e-06 [updatestate_assign_eliminate]: 5.72999e-06 [updatestate_loads_eliminate]: 4.84003e-06 [parameter_eliminate]: 2.08998e-06 [a_2]: 0.00013456 [accelerated_algorithm]: 1.46e-05 [shard]: 2.02001e-06 [meta_shard_fg_expand]: 2.73e-06 [shard_inline]: 1.009e-05 [merge_send_recv]: 1.104e-05 [auto_parallel]: 1.108e-05 [parallel]: 2.601e-05 [flash_sp]: 3.97e-06 [merge_comm]: 6.16998e-06 [allreduce_fusion]: 5.14e-06 [matmul_add_comm_reduction]: 1.337e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.297e-05 [virtual_dataset]: 9.42001e-06 [get_grad_eliminate_]: 8.98002e-06 [virtual_output]: 8.94e-06 [merge_forward]: 5.35999e-06 [cell_reuse_recompute_pass]: 1.89e-06 [offload_activation]: 1.381e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.991e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 1.547e-05 [set_forward_comm_id_for_comm_node_pass]: 5.73997e-06 [meta_fg_expand]: 6.338e-05 [flash_sp_send_recv_attached]: 1.79998e-06 [receive_attached]: 2.76e-06 [after_resolve]: 1.967e-05 [a_after_grad]: 1.435e-05 [renormalize]: 0.00091531 [add_forward_monad_depend]: 5.77001e-06 [auto_monad_grad]: 2.51998e-06 [auto_monad_eliminator]: 2.175e-05 [cse]: 6.228e-05 [a_3]: 7.113e-05 [Cycle 3]: 0.00098858, [45] [expand_dump_flag]: 1.82999e-06 [switch_simplify]: 1.143e-05 [loop_unroll]: 9.29e-06 [a_1]: 0.00026508 [with_stream_mark]: 1.413e-05 [recompute_prepare]: 1.083e-05 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 4.50001e-06 [updatestate_loads_eliminate]: 4.91002e-06 
[parameter_eliminate]: 1.68002e-06 [a_2]: 0.00012879 [accelerated_algorithm]: 1.33e-05 [shard]: 2.13998e-06 [meta_shard_fg_expand]: 2.39999e-06 [shard_inline]: 9.20001e-06 [merge_send_recv]: 8.59e-06 [auto_parallel]: 9.91e-06 [parallel]: 8.55001e-06 [flash_sp]: 1.49e-06 [merge_comm]: 5.24e-06 [allreduce_fusion]: 5.15999e-06 [matmul_add_comm_reduction]: 1.092e-05 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 1.022e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 9.02e-06 [virtual_output]: 8.26002e-06 [merge_forward]: 5.34e-06 [cell_reuse_recompute_pass]: 2.37999e-06 [offload_activation]: 1.164e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.717e-05 [merge_recompute_call_nodes]: 1.11002e-06 [before_grad]: 1.539e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49998e-06 [meta_fg_expand]: 3.21999e-06 [flash_sp_send_recv_attached]: 1.50001e-06 [receive_attached]: 2.21e-06 [after_resolve]: 1.651e-05 [a_after_grad]: 1.489e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.47001e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 2.979e-05 [a_3]: 6.345e-05 [py_interpret_to_execute_after_opt_a]: 1.845e-05 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 5.907e-05 [convert_after_rewriter]: 1.007e-05 [order_py_execute_after_rewriter]: 7.4e-06 [mutable_eliminate]: 0.00072802 [opt_b]: 0.00032425, [1] [Cycle 1]: 0.00031465, [7] [b_1]: 0.00020025 [b_2]: 1.201e-05 [updatestate_depend_eliminate]: 1.073e-05 [updatestate_assign_eliminate]: 4.79e-06 [updatestate_loads_eliminate]: 4.28001e-06 [renormalize]: 9.50007e-07 [cse]: 4.186e-05 [optimize_parallel_all_gather_comm]: 2.565e-05 [overlap_param_gather]: 2.13002e-06 [cconv]: 3.45e-05 [loop_unroll]: 0.00054881 [opt_after_cconv]: 0.00015902, [1] [Cycle 1]: 0.00015117, [7] [c_1]: 5.374e-05 [parameter_eliminate]: 5.52999e-06 [updatestate_depend_eliminate]: 8.75001e-06 [updatestate_assign_eliminate]: 4.80001e-06 [updatestate_loads_eliminate]: 
4.08001e-06 [cse]: 3.762e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 4.894e-05 [tuple_transform]: 0.00011378, [1] [Cycle 1]: 0.00010809, [4] [d_1]: 7.506e-05 [none_parameter_eliminate]: 1.89e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 1.016e-05 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 7.441e-05 [cse_after_recomputation]: 3.661e-05, [1] [Cycle 1]: 3.068e-05, [1] [cse]: 2.276e-05 [environ_conv]: 1.216e-05 [swap_dp_allreduce_reducescatter]: 7.79997e-06 [bias_add_comm_swap]: 2.69999e-06 [label_micro_interleaved_index]: 6.26998e-06 [label_fine_grained_interleaved_index]: 3.23e-06 [merge_cast_opt]: 1.64e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71002e-06 [control_data_broadcast_order]: 1.954e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 5.74e-06 [overlap_recompute_and_grad_model_parallel]: 6.44999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 5.71e-06 [overlap_grad_flash_sp]: 2.949e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.13002e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 0.00011174, [1] [Cycle 1]: 0.00010599, [6] [build]: 1.299e-05 [elim_shapecalc]: 1.485e-05 [elim_not_effective]: 1.982e-05 [opt_reshape]: 1.129e-05 [fold_const_symbol]: 1.569e-05 [renormalize]: 1.8999e-07 [detach_backward]: 2.51e-06 
[pipeline_parallel_scheduler]: 1.66998e-06 [auto_monad_reorder]: 2.693e-05 [get_jit_bprop_graph]: 2.21998e-06 [rewriter_after_jit_bprop_graph]: 6.68998e-06 [opt_after_jit_grad]: 0.00057471 [validate]: 5.864e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0104813 [execute]: 9.79999e-06 Sums bootstrap : 0.000498s : 1.28% type_inference : 0.012157s : 31.23% event_method : 0.000049s : 0.13% auto_monad : 0.000124s : 0.32% graph_reusing : 0.000008s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000059s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.37% optimize.opt_a.expand_dump_flag : 0.000010s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.34% optimize.opt_a.loop_unroll : 0.000111s : 0.28% optimize.opt_a.a_1 : 0.003457s : 8.88% optimize.opt_a.with_stream_mark : 0.000080s : 0.20% optimize.opt_a.recompute_prepare : 0.000059s : 0.15% optimize.opt_a.updatestate_depend_eliminate : 0.000023s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000019s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.04% optimize.opt_a.parameter_eliminate : 0.000007s : 0.02% optimize.opt_a.a_2 : 0.000530s : 1.36% optimize.opt_a.accelerated_algorithm : 0.000068s : 0.18% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.02% optimize.opt_a.shard_inline : 0.000037s : 0.09% optimize.opt_a.merge_send_recv : 0.000040s : 0.10% optimize.opt_a.auto_parallel : 0.000037s : 0.10% optimize.opt_a.parallel : 0.000057s : 0.15% optimize.opt_a.flash_sp : 0.000021s : 0.05% optimize.opt_a.merge_comm : 0.000022s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000062s : 0.16% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000047s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.09% optimize.opt_a.virtual_output : 0.000034s : 0.09% optimize.opt_a.merge_forward : 0.000021s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.02% optimize.opt_a.offload_activation : 0.000048s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000070s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000060s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000023s : 0.06% optimize.opt_a.meta_fg_expand : 0.001920s : 4.93% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.02% optimize.opt_a.receive_attached : 0.000008s : 0.02% optimize.opt_a.after_resolve : 0.000113s : 0.29% optimize.opt_a.a_after_grad : 0.000120s : 0.31% optimize.opt_a.renormalize : 0.004076s : 10.47% optimize.opt_a.add_forward_monad_depend : 0.000021s : 0.06% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000097s : 0.25% optimize.opt_a.cse : 0.000270s : 0.69% optimize.opt_a.a_3 : 0.000500s : 1.28% optimize.py_interpret_to_execute_after_opt_a : 0.000018s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000059s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000728s : 1.87% optimize.opt_b.b_1 : 0.000200s : 0.51% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000011s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000042s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000035s : 0.09% optimize.loop_unroll : 0.000549s : 1.41% optimize.opt_after_cconv.c_1 : 0.000054s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000038s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000049s : 0.13% optimize.tuple_transform.d_1 : 0.000075s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000074s : 0.19% optimize.cse_after_recomputation.cse : 0.000023s : 0.06% optimize.environ_conv : 0.000012s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000006s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000020s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000029s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000007s : 0.02% opt_after_jit_grad : 0.000575s : 1.48% validate : 0.000059s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.010481s : 26.92% execute : 0.000010s : 0.03% Time group info: ------[substitution.] 
0.000975 218 6.55% : 0.000064s : 11: substitution.arithmetic_simplify 1.97% : 0.000019s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000005s : 5: substitution.float_depend_g_call 0.50% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 5: substitution.fold_const_symbol 0.90% : 0.000009s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 56.84% : 0.000554s : 16: substitution.inline 2.44% : 0.000024s : 2: substitution.inline_without_move 1.23% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000020s : 3: substitution.less_batch_normalization 1.51% : 0.000015s : 11: substitution.minmaximum_grad 0.75% : 0.000007s : 5: substitution.partial_eliminate 1.52% : 0.000015s : 20: substitution.remove_not_recompute_node 3.26% : 0.000032s : 10: substitution.replace_applicator 1.68% : 0.000016s : 15: substitution.replace_old_param 0.41% : 0.000004s : 1: substitution.set_cell_output_no_recompute 3.21% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.56% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.10% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.84% : 0.000076s : 28: substitution.tuple_list_get_item_eliminator 2.07% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012060 2 86.37% : 0.010416s : 1: type_inference.infer 13.63% : 0.001644s : 1: type_inference.specialize ------[replace.] 0.000231 30 61.66% : 0.000142s : 16: replace.inline 38.34% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000578 30 93.94% : 0.000543s : 16: match.inline 6.06% : 0.000035s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000783 5663 1.06% : 0.000008s : 67: predicate.accumulaten_eliminater 0.39% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000009s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 99: predicate.arithmetic_simplify 1.22% : 0.000010s : 67: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.47% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000010s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000014s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.62% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000018s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.12% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000044s : 244: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.58% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.55% : 0.000020s : 164: predicate.load_eliminater 0.45% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.13% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 67: predicate.minmaximum_grad 0.46% : 0.000004s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000016s : 97: predicate.partial_defer_inline 1.65% : 0.000013s : 89: predicate.partial_eliminate 1.02% : 0.000008s : 67: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.59% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000015s : 149: predicate.replace_applicator 0.75% : 0.000006s : 55: predicate.replace_old_param 0.16% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 67: predicate.reshape_eliminate 1.09% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.37% : 0.000011s : 68: predicate.same_eliminate 0.48% : 0.000004s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.38% : 0.000003s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.43% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.19% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000015s : 97: predicate.switch_defer_inline 2.82% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000038s : 265: 
predicate.switch_simplify 1.04% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.25% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000023s : 129: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.53% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.16% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.23% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001852 32 58.01% : 0.001074s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.99% : 0.000778s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073683 237 0.00% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.003680s : 1: add_attr 4.98% : 0.003666s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000079s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.18% : 0.000132s : 1: auto_monad 0.04% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.73% : 0.000535s : 1: bootstrap 0.05% : 0.000038s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000023s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000040s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000016s : 1: environ_conv 0.08% : 0.000057s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.76% : 0.000562s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.01% : 0.000743s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000023s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000023s : 1: opt.transform.mutable_eliminate 7.14% : 0.005258s : 117: opt.transform.opt_a 0.07% : 0.000052s : 1: opt.transform.opt_after_cconv 0.06% : 0.000041s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000184s : 28: opt.transform.opt_b 0.11% : 0.000083s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000058s : 4: opt.transform.symbol_engine_opt 18.02% : 0.013277s : 1: opt_a 0.22% : 0.000163s : 1: opt_after_cconv 0.80% : 0.000589s : 1: opt_after_jit_grad 0.45% : 0.000328s : 1: opt_b 21.92% : 0.016154s : 1: optimize 0.04% : 0.000029s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000033s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000010s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000063s : 1: pre_auto_parallel 0.06% : 0.000047s : 1: py_interpret_to_execute 0.03% : 0.000022s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000054s : 1: remove_dup_value 3.04% : 0.002241s : 2: renormalize.infer 2.46% : 0.001815s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000063s : 1: rewriter_after_opt_a 0.21% : 0.000152s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000115s : 1: symbol_engine_optimizer 14.26% : 0.010505s : 1: task_emit 0.16% : 0.000117s : 1: 
tuple_transform 16.54% : 0.012190s : 1: type_inference 0.14% : 0.000105s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x1-kbk],max_mem:38.0M TotalTime = 0.11657, [24] [bootstrap]: 0.00074438 [type_inference]: 0.00950159 [event_method]: 1.905e-05 [auto_monad]: 6.312e-05 [graph_reusing]: 6.36998e-06 [inline]: 3.60998e-06 [add_attr]: 0.00491787, [1] [add_attr_with_inline]: 0.00489824, [1] [Cycle 1]: 8.081e-05, [2] [tag_attr]: 2.328e-05 [meta_addattr_fg_expand]: 4.46002e-06 [parallel-infer-symbol]: 4.13999e-06 [pre_auto_parallel]: 4.501e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 3.07002e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00560218, [53] [py_interpret_to_execute]: 3.096e-05 [rewriter_before_opt_a]: 7.717e-05 [opt_a]: 0.00303033, [2] [Cycle 1]: 0.0022887, [45] [expand_dump_flag]: 3.12002e-06 [switch_simplify]: 3.596e-05 [loop_unroll]: 2.095e-05 [a_1]: 0.00054823 [with_stream_mark]: 2.283e-05 [recompute_prepare]: 1.175e-05 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 2.91999e-06 [a_2]: 8.226e-05 [accelerated_algorithm]: 8.77e-06 [shard]: 2.64001e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 6.82002e-06 [merge_send_recv]: 1.022e-05 [auto_parallel]: 1.012e-05 [parallel]: 3.621e-05 [flash_sp]: 1.194e-05 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 1.143e-05 [allreduce_slice_to_reducescatter]: 9.50007e-07 [virtual_shard_identity]: 1.219e-05 [virtual_dataset]: 6.55002e-06 [get_grad_eliminate_]: 6.05002e-06 [virtual_output]: 6.86001e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.66998e-06 [offload_activation]: 1.06e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.713e-05 
[merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 1.31e-05 [set_forward_comm_id_for_comm_node_pass]: 4.45e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 2.86999e-06 [receive_attached]: 2.83e-06 [after_resolve]: 1.415e-05 [a_after_grad]: 9.89999e-06 [renormalize]: 0.00089111 [add_forward_monad_depend]: 8.37e-06 [auto_monad_grad]: 2.75997e-06 [auto_monad_eliminator]: 2.242e-05 [cse]: 3.34e-05 [a_3]: 5.299e-05 [Cycle 2]: 0.00072568, [45] [expand_dump_flag]: 1.96e-06 [switch_simplify]: 8.65999e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00013675 [with_stream_mark]: 1.695e-05 [recompute_prepare]: 6.71999e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 2.93e-06 [updatestate_loads_eliminate]: 3.25e-06 [parameter_eliminate]: 1.91e-06 [a_2]: 7.096e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 3.04999e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 6.82002e-06 [merge_send_recv]: 8.74e-06 [auto_parallel]: 9.94999e-06 [parallel]: 8.38999e-06 [flash_sp]: 4.87e-06 [merge_comm]: 3.47997e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 9.61e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 9.67001e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.81998e-06 [merge_forward]: 4.07e-06 [cell_reuse_recompute_pass]: 2.24001e-06 [offload_activation]: 1.074e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.254e-05 [merge_recompute_call_nodes]: 2.07001e-06 [before_grad]: 8.95001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.21001e-06 [meta_fg_expand]: 1.90001e-06 [flash_sp_send_recv_attached]: 1.59e-06 [receive_attached]: 2.66e-06 [after_resolve]: 1.174e-05 [a_after_grad]: 8.77e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 2.99999e-06 [auto_monad_grad]: 1.25999e-06 [auto_monad_eliminator]: 8.87e-06 [cse]: 3.414e-05 [a_3]: 3.679e-05 [py_interpret_to_execute_after_opt_a]: 1.99e-05 [slice_cell_reuse_recomputed_activation]: 2.52001e-06 
[rewriter_after_opt_a]: 5.64e-05 [convert_after_rewriter]: 8.65999e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00081139 [opt_b]: 0.00022144, [1] [Cycle 1]: 0.00021114, [7] [b_1]: 0.00011647 [b_2]: 8.65001e-06 [updatestate_depend_eliminate]: 1.15e-05 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 3.3e-06 [renormalize]: 9.5999e-07 [cse]: 3.035e-05 [optimize_parallel_all_gather_comm]: 2.374e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 3.757e-05 [loop_unroll]: 0.00051123 [opt_after_cconv]: 0.00011586, [1] [Cycle 1]: 0.00010763, [7] [c_1]: 2.889e-05 [parameter_eliminate]: 6.73003e-06 [updatestate_depend_eliminate]: 7.00002e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 2.438e-05 [renormalize]: 7.7e-07 [remove_dup_value]: 1.563e-05 [tuple_transform]: 8.077e-05, [1] [Cycle 1]: 7.532e-05, [4] [d_1]: 4.7e-05 [none_parameter_eliminate]: 1.63002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.84999e-06 [partial_unused_args_eliminate]: 2.39999e-06 [add_recomputation]: 7.519e-05 [cse_after_recomputation]: 2.37e-05, [1] [Cycle 1]: 1.881e-05, [1] [cse]: 1.294e-05 [environ_conv]: 6.24999e-06 [swap_dp_allreduce_reducescatter]: 6.40997e-06 [bias_add_comm_swap]: 3.07002e-06 [label_micro_interleaved_index]: 6.40002e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.46998e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 1.11002e-06 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.39998e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.50999e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.52001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.35e-05 
[grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 4.3e-06 [overlap_recompute_and_grad_model_parallel]: 5.17e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.84e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.35999e-06 [overlap_grad_flash_sp]: 2.242e-05 [begin_end_overlap_inline]: 5.79981e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 7.877e-05, [1] [Cycle 1]: 7.361e-05, [6] [build]: 4.53001e-06 [elim_shapecalc]: 9.95002e-06 [elim_not_effective]: 1.343e-05 [opt_reshape]: 6.46e-06 [fold_const_symbol]: 9.72999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.71e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 2.009e-05 [get_jit_bprop_graph]: 2.25002e-06 [rewriter_after_jit_bprop_graph]: 7.16001e-06 [opt_after_jit_grad]: 0.00059325 [validate]: 4.931e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.0946646 [execute]: 1.009e-05 Sums bootstrap : 0.000744s : 0.67% type_inference : 0.009502s : 8.61% event_method : 0.000019s : 0.02% auto_monad : 0.000063s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000023s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000031s : 0.03% optimize.rewriter_before_opt_a : 0.000077s : 0.07% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000045s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000685s : 0.62% optimize.opt_a.with_stream_mark : 0.000040s : 0.04% 
optimize.opt_a.recompute_prepare : 0.000018s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000153s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000015s : 0.01% optimize.opt_a.shard : 0.000006s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000014s : 0.01% optimize.opt_a.merge_send_recv : 0.000019s : 0.02% optimize.opt_a.auto_parallel : 0.000020s : 0.02% optimize.opt_a.parallel : 0.000045s : 0.04% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000021s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000022s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000013s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000021s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000030s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000022s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000009s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000026s : 0.02% optimize.opt_a.a_after_grad : 0.000019s : 0.02% optimize.opt_a.renormalize : 0.000891s : 0.81% optimize.opt_a.add_forward_monad_depend : 0.000011s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000031s : 0.03% optimize.opt_a.cse : 0.000068s : 0.06% optimize.opt_a.a_3 : 0.000090s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000056s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000811s : 0.73% optimize.opt_b.b_1 : 0.000116s : 0.11% optimize.opt_b.b_2 : 0.000009s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000038s : 0.03% optimize.loop_unroll : 0.000511s : 0.46% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000024s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000016s : 0.01% optimize.tuple_transform.d_1 : 0.000047s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000075s : 0.07% optimize.cse_after_recomputation.cse : 0.000013s : 0.01% optimize.environ_conv : 0.000006s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000005s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000020s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.01% opt_after_jit_grad : 0.000593s : 0.54% validate : 0.000049s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.094665s : 85.74% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000246 30 15.44% : 0.000038s : 5: substitution.arithmetic_simplify 0.90% : 0.000002s : 2: substitution.elim_not_effective 0.57% : 0.000001s : 2: substitution.fold_const_symbol 2.42% : 0.000006s : 4: substitution.graph_param_transform 67.89% : 0.000167s : 3: substitution.inline 2.16% : 0.000005s : 4: substitution.j_node_and_user_rematch 2.21% : 0.000005s : 4: substitution.remove_not_recompute_node 2.63% : 0.000006s : 4: substitution.replace_old_param 5.78% : 0.000014s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.009415 2 92.50% : 0.008709s : 1: type_inference.infer 7.50% : 0.000706s : 1: type_inference.specialize ------[replace.] 0.000050 5 71.54% : 0.000036s : 3: replace.inline 28.46% : 0.000014s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000178 5 92.56% : 0.000164s : 3: match.inline 7.44% : 0.000013s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000178 1131 0.78% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 8: predicate.addn_check_dump 0.95% : 0.000002s : 11: predicate.addn_zero_filter 0.69% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000004s : 19: predicate.arithmetic_simplify 0.78% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.35% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.33% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.00% : 0.000002s : 15: predicate.environ_get_depend_swap 1.65% : 0.000003s : 23: predicate.environ_get_eliminate 1.00% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.74% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.19% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.50% : 0.000001s : 8: predicate.incorporate_call_switch 6.35% : 0.000011s : 51: predicate.inline 0.91% : 0.000002s : 8: predicate.inline_without_move 0.34% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.04% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.58% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.53% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 11: predicate.minmaximum_grad 1.71% : 0.000003s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.76% : 0.000003s : 16: predicate.partial_defer_inline 1.29% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.22% : 0.000002s : 11: predicate.reduce_eliminate 2.20% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 21: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.50% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000002s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000002s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.38% : 0.000002s : 8: predicate.shard_identity_eliminate 0.90% : 0.000002s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.20% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000009s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.96% : 0.000002s : 11: predicate.transpose_eliminate 1.37% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.42% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.14% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.06% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.94% : 0.000002s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000504 8 46.29% : 0.000233s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.71% : 0.000271s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.129245 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.81% : 0.004925s : 1: add_attr 3.79% : 0.004903s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000080s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000069s : 1: auto_monad 0.02% : 0.000024s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.62% : 0.000804s : 1: bootstrap 0.03% : 0.000042s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000027s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000026s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.41% : 0.000525s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.64% : 0.000833s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000022s : 1: opt.transform.mutable_eliminate 0.86% : 0.001106s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000097s : 28: opt.transform.opt_b 0.04% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000036s : 4: opt.transform.symbol_engine_opt 2.35% : 0.003034s : 1: opt_a 0.09% : 0.000120s : 1: opt_after_cconv 0.47% : 0.000612s : 1: opt_after_jit_grad 0.17% : 0.000226s : 1: opt_b 4.34% : 0.005608s : 1: optimize 0.02% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000036s : 1: py_interpret_to_execute 0.02% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000019s : 1: remove_dup_value 0.38% : 0.000497s : 1: renormalize.infer 0.29% : 0.000380s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000010s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000063s : 1: rewriter_after_opt_a 0.06% : 0.000082s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000082s : 1: symbol_engine_optimizer 73.26% : 0.094691s : 1: task_emit 0.06% : 0.000084s : 1: 
tuple_transform 7.38% : 0.009537s : 1: type_inference 0.07% : 0.000089s : 1: validate TotalTime = 0.0924318, [24] [bootstrap]: 0.00052347 [type_inference]: 0.00539775 [event_method]: 1.405e-05 [auto_monad]: 5.784e-05 [graph_reusing]: 5.72001e-06 [inline]: 2.99999e-06 [add_attr]: 0.00373267, [1] [add_attr_with_inline]: 0.0037191, [1] [Cycle 1]: 6.799e-05, [2] [tag_attr]: 1.641e-05 [meta_addattr_fg_expand]: 3.55998e-06 [parallel-infer-symbol]: 3.99002e-06 [pre_auto_parallel]: 3.093e-05 [insert-virtual-dataset]: 2.71999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00477475, [53] [py_interpret_to_execute]: 2.145e-05 [rewriter_before_opt_a]: 5.279e-05 [opt_a]: 0.00242313, [2] [Cycle 1]: 0.00172915, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.57e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00033092 [with_stream_mark]: 1.965e-05 [recompute_prepare]: 8.92e-06 [updatestate_depend_eliminate]: 4.45e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.851e-05 [accelerated_algorithm]: 6.98998e-06 [shard]: 2.79999e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.83001e-06 [auto_parallel]: 6.96001e-06 [parallel]: 2.559e-05 [flash_sp]: 8.77e-06 [merge_comm]: 3.91999e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 1.032e-05 [allreduce_slice_to_reducescatter]: 6.90023e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 4.1e-06 [cell_reuse_recompute_pass]: 1.69e-06 [offload_activation]: 1.118e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.229e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 1.085e-05 [set_forward_comm_id_for_comm_node_pass]: 3.88999e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 3.21001e-06 [receive_attached]: 
2.72001e-06 [after_resolve]: 1.152e-05 [a_after_grad]: 9.27001e-06 [renormalize]: 0.00069458 [add_forward_monad_depend]: 6.44999e-06 [auto_monad_grad]: 2.86e-06 [auto_monad_eliminator]: 1.712e-05 [cse]: 3.083e-05 [a_3]: 4.718e-05 [Cycle 2]: 0.00068136, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 8.00999e-06 [loop_unroll]: 5.38002e-06 [a_1]: 0.00013196 [with_stream_mark]: 1.598e-05 [recompute_prepare]: 6.22001e-06 [updatestate_depend_eliminate]: 3.18e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 1.71002e-06 [a_2]: 6.861e-05 [accelerated_algorithm]: 5.80002e-06 [shard]: 1.84998e-06 [meta_shard_fg_expand]: 1.91003e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 5.92999e-06 [auto_parallel]: 7.92998e-06 [parallel]: 7.18998e-06 [flash_sp]: 4.18001e-06 [merge_comm]: 3.30998e-06 [allreduce_fusion]: 3.11001e-06 [matmul_add_comm_reduction]: 8.81002e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 3.49001e-06 [cell_reuse_recompute_pass]: 1.85001e-06 [offload_activation]: 9.61e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.223e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 8.79e-06 [set_forward_comm_id_for_comm_node_pass]: 4.18999e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 1.40999e-06 [receive_attached]: 1.40999e-06 [after_resolve]: 1.103e-05 [a_after_grad]: 1.079e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 2.59999e-06 [auto_monad_grad]: 1.42e-06 [auto_monad_eliminator]: 9.66e-06 [cse]: 1.696e-05 [a_3]: 3.307e-05 [py_interpret_to_execute_after_opt_a]: 1.449e-05 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 4.485e-05 [convert_after_rewriter]: 7.5e-06 [order_py_execute_after_rewriter]: 5.67001e-06 [mutable_eliminate]: 0.00073715 [opt_b]: 0.00021341, [1] [Cycle 1]: 
0.00020495, [7] [b_1]: 0.00011512 [b_2]: 8.11002e-06 [updatestate_depend_eliminate]: 9.78002e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [renormalize]: 7.39994e-07 [cse]: 2.646e-05 [optimize_parallel_all_gather_comm]: 2.12e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 3.492e-05 [loop_unroll]: 0.00048521 [opt_after_cconv]: 0.00010781, [1] [Cycle 1]: 0.00010121, [7] [c_1]: 2.974e-05 [parameter_eliminate]: 4.74998e-06 [updatestate_depend_eliminate]: 6.27001e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 2.021e-05 [renormalize]: 5.99975e-07 [remove_dup_value]: 1.502e-05 [tuple_transform]: 7.732e-05, [1] [Cycle 1]: 7.226e-05, [4] [d_1]: 4.474e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 2.05002e-06 [add_recomputation]: 5.405e-05 [cse_after_recomputation]: 2.173e-05, [1] [Cycle 1]: 1.713e-05, [1] [cse]: 1.154e-05 [environ_conv]: 6.33998e-06 [swap_dp_allreduce_reducescatter]: 5.33002e-06 [bias_add_comm_swap]: 3.20002e-06 [label_micro_interleaved_index]: 5.34e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.94e-06 [slice_recompute_activation]: 2.41998e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.42e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 3.09999e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.43002e-06 [overlap_opt_shard_in_pipeline]: 1.26997e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06998e-06 [control_data_broadcast_order]: 1.329e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 4.24002e-06 [overlap_recompute_and_grad_model_parallel]: 4.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.62001e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 4.47e-06 [overlap_grad_flash_sp]: 2.301e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.711e-05, [1] [Cycle 1]: 7.265e-05, [6] [build]: 4.23999e-06 [elim_shapecalc]: 9.92999e-06 [elim_not_effective]: 1.206e-05 [opt_reshape]: 6.41998e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.66999e-06 [pipeline_parallel_scheduler]: 1.85001e-06 [auto_monad_reorder]: 1.884e-05 [get_jit_bprop_graph]: 1.86e-06 [rewriter_after_jit_bprop_graph]: 5.80002e-06 [opt_after_jit_grad]: 0.000539 [validate]: 4.837e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.0769769 [execute]: 9.92999e-06 Sums bootstrap : 0.000523s : 0.60% type_inference : 0.005398s : 6.16% event_method : 0.000014s : 0.02% auto_monad : 0.000058s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000031s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000053s : 0.06% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000034s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000463s : 0.53% optimize.opt_a.with_stream_mark : 0.000036s : 0.04% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.02% optimize.opt_a.auto_parallel : 0.000015s : 0.02% optimize.opt_a.parallel : 0.000033s : 0.04% optimize.opt_a.flash_sp : 0.000013s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000019s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000021s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000023s : 0.03% optimize.opt_a.a_after_grad : 0.000020s : 0.02% optimize.opt_a.renormalize : 0.000695s : 0.79% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.03% optimize.opt_a.cse : 0.000048s : 0.05% optimize.opt_a.a_3 : 0.000080s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000045s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000737s : 0.84% optimize.opt_b.b_1 : 0.000115s : 0.13% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000026s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.04% optimize.loop_unroll : 0.000485s : 0.55% optimize.opt_after_cconv.c_1 : 0.000030s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000020s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000019s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000539s : 0.62% validate : 0.000048s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.076977s : 87.92% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000158 26 18.14% : 0.000029s : 4: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 4.02% : 0.000006s : 4: substitution.graph_param_transform 66.62% : 0.000105s : 2: substitution.inline 2.48% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.18% : 0.000005s : 4: substitution.remove_not_recompute_node 3.45% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.005336 2 91.76% : 0.004896s : 1: type_inference.infer 8.24% : 0.000440s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000103 2 100.00% : 0.000103s : 2: match.inline ------[predicate.] 
0.000153 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 17: predicate.arithmetic_simplify 0.66% : 0.000001s : 9: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.71% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.87% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.62% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.00% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.00% : 0.000002s : 13: predicate.environ_get_depend_swap 1.77% : 0.000003s : 21: predicate.environ_get_eliminate 0.95% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.88% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.71% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.37% : 0.000010s : 44: predicate.inline 1.24% : 0.000002s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.18% : 0.000002s : 8: predicate.less_batch_normalization 1.39% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.98% : 0.000003s : 26: predicate.load_eliminater 2.00% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.60% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 9: predicate.minmaximum_grad 2.07% : 0.000003s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.72% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.12% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 9: predicate.reduce_eliminate 2.04% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.18% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.50% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.80% : 0.000001s : 4: predicate.row_tensor_eliminate 1.18% : 0.000002s : 8: predicate.same_eliminate 0.71% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000002s : 8: predicate.shard_identity_eliminate 1.07% : 0.000002s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.55% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.60% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.94% : 0.000001s : 11: predicate.switch_defer_inline 1.57% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.07% : 0.000006s : 41: predicate.switch_simplify 0.91% : 
0.000001s : 9: predicate.tile_eliminate 0.71% : 0.000001s : 9: predicate.transpose_eliminate 1.34% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 1.89% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.87% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000367 6 39.64% : 0.000145s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.36% : 0.000221s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.102638 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.64% : 0.003739s : 1: add_attr 3.63% : 0.003724s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000059s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000064s : 1: auto_monad 0.02% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.55% : 0.000567s : 1: bootstrap 0.04% : 0.000039s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000010s : 1: environ_conv 0.02% : 0.000022s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.48% : 0.000497s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000755s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000023s : 1: opt.transform.mutable_eliminate 0.81% : 0.000833s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000095s : 28: opt.transform.opt_b 0.05% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 2.36% : 0.002427s : 1: opt_a 0.11% : 0.000112s : 1: opt_after_cconv 0.54% : 0.000553s : 1: opt_after_jit_grad 0.21% : 0.000218s : 1: opt_b 4.66% : 0.004780s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000036s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.39% : 0.000402s : 1: renormalize.infer 0.28% : 0.000283s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000051s : 1: rewriter_after_opt_a 0.06% : 0.000057s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000080s : 1: symbol_engine_optimizer 75.02% : 0.077003s : 1: task_emit 0.08% : 0.000081s : 1: 
tuple_transform 5.29% : 0.005431s : 1: type_inference 0.08% : 0.000084s : 1: validate TotalTime = 0.0936458, [24] [bootstrap]: 0.00046231 [type_inference]: 0.00657035 [event_method]: 1.619e-05 [auto_monad]: 6.205e-05 [graph_reusing]: 5.57001e-06 [inline]: 2.74999e-06 [add_attr]: 0.00370113, [1] [add_attr_with_inline]: 0.00368887, [1] [Cycle 1]: 7.342e-05, [2] [tag_attr]: 2.004e-05 [meta_addattr_fg_expand]: 4.24997e-06 [parallel-infer-symbol]: 4.4e-06 [pre_auto_parallel]: 3.78e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.89999e-06 [optimize]: 0.00510137, [53] [py_interpret_to_execute]: 2.886e-05 [rewriter_before_opt_a]: 7.224e-05 [opt_a]: 0.00271382, [2] [Cycle 1]: 0.00203007, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.387e-05 [loop_unroll]: 2.079e-05 [a_1]: 0.00050532 [with_stream_mark]: 1.844e-05 [recompute_prepare]: 8.03999e-06 [updatestate_depend_eliminate]: 3.77002e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 2.02001e-06 [a_2]: 7.882e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 2.43998e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 8.43001e-06 [auto_parallel]: 7.33999e-06 [parallel]: 2.121e-05 [flash_sp]: 8.27e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 9.95002e-06 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 7.27997e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 5.96e-06 [virtual_output]: 5.91e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.058e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.236e-05 [merge_recompute_call_nodes]: 1.71998e-06 [before_grad]: 1.005e-05 [set_forward_comm_id_for_comm_node_pass]: 3.80998e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.83998e-06 [receive_attached]: 
3.00002e-06 [after_resolve]: 1.202e-05 [a_after_grad]: 9.44998e-06 [renormalize]: 0.0008133 [add_forward_monad_depend]: 7.19001e-06 [auto_monad_grad]: 2.54001e-06 [auto_monad_eliminator]: 1.801e-05 [cse]: 3.17e-05 [a_3]: 4.656e-05 [Cycle 2]: 0.00065462, [45] [expand_dump_flag]: 2.02999e-06 [switch_simplify]: 8.49002e-06 [loop_unroll]: 5.32999e-06 [a_1]: 0.00013335 [with_stream_mark]: 1.438e-05 [recompute_prepare]: 6.24001e-06 [updatestate_depend_eliminate]: 3.03e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 6.891e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.37999e-06 [merge_send_recv]: 6.25002e-06 [auto_parallel]: 7.49002e-06 [parallel]: 6.34999e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 6.12001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.69998e-06 [offload_activation]: 8.42e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.015e-05 [merge_recompute_call_nodes]: 8.80013e-07 [before_grad]: 8.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.92002e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 1.39e-06 [receive_attached]: 1.59e-06 [after_resolve]: 1.015e-05 [a_after_grad]: 8.15e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.74998e-06 [auto_monad_grad]: 1.42999e-06 [auto_monad_eliminator]: 8.81997e-06 [cse]: 1.685e-05 [a_3]: 3.436e-05 [py_interpret_to_execute_after_opt_a]: 1.498e-05 [slice_cell_reuse_recomputed_activation]: 2.17001e-06 [rewriter_after_opt_a]: 4.117e-05 [convert_after_rewriter]: 7.94997e-06 [order_py_execute_after_rewriter]: 5.72001e-06 [mutable_eliminate]: 0.00077329 [opt_b]: 0.00020758, [1] [Cycle 1]: 
0.00019835, [7] [b_1]: 0.00011372 [b_2]: 7.35998e-06 [updatestate_depend_eliminate]: 8.2e-06 [updatestate_assign_eliminate]: 2.82002e-06 [updatestate_loads_eliminate]: 2.66e-06 [renormalize]: 7.2e-07 [cse]: 2.619e-05 [optimize_parallel_all_gather_comm]: 2.118e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 3.469e-05 [loop_unroll]: 0.00048265 [opt_after_cconv]: 0.00010443, [1] [Cycle 1]: 9.814e-05, [7] [c_1]: 2.873e-05 [parameter_eliminate]: 4.76002e-06 [updatestate_depend_eliminate]: 6.26e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.52001e-06 [cse]: 1.846e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.509e-05 [tuple_transform]: 7.602e-05, [1] [Cycle 1]: 7.129e-05, [4] [d_1]: 4.54e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.32001e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 5.241e-05 [cse_after_recomputation]: 2.127e-05, [1] [Cycle 1]: 1.648e-05, [1] [cse]: 1.149e-05 [environ_conv]: 6.29999e-06 [swap_dp_allreduce_reducescatter]: 6.53e-06 [bias_add_comm_swap]: 3.05998e-06 [label_micro_interleaved_index]: 4.75999e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.79001e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.61998e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.35999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.55001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78997e-06 [control_data_broadcast_order]: 1.341e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.92999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.60002e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 2.193e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.88e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 7.845e-05, [1] [Cycle 1]: 7.4e-05, [6] [build]: 4.05998e-06 [elim_shapecalc]: 9.86e-06 [elim_not_effective]: 1.313e-05 [opt_reshape]: 6.94999e-06 [fold_const_symbol]: 1.107e-05 [renormalize]: 1.50001e-07 [detach_backward]: 2.03002e-06 [pipeline_parallel_scheduler]: 2.06e-06 [auto_monad_reorder]: 1.75e-05 [get_jit_bprop_graph]: 2.06e-06 [rewriter_after_jit_bprop_graph]: 5.72001e-06 [opt_after_jit_grad]: 0.0005404 [validate]: 4.516e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0767912 [execute]: 9.74e-06 Sums bootstrap : 0.000462s : 0.52% type_inference : 0.006570s : 7.40% event_method : 0.000016s : 0.02% auto_monad : 0.000062s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000038s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.03% optimize.rewriter_before_opt_a : 0.000072s : 0.08% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000042s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000639s : 0.72% optimize.opt_a.with_stream_mark : 0.000033s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.02% optimize.opt_a.auto_parallel : 0.000015s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000813s : 0.92% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000027s : 0.03% optimize.opt_a.cse : 0.000049s : 0.05% optimize.opt_a.a_3 : 0.000081s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000773s : 0.87% optimize.opt_b.b_1 : 0.000114s : 0.13% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000026s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.04% optimize.loop_unroll : 0.000483s : 0.54% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000011s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000540s : 0.61% validate : 0.000045s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.076791s : 86.44% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000219 30 14.10% : 0.000031s : 5: substitution.arithmetic_simplify 0.94% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000002s : 2: substitution.fold_const_symbol 3.09% : 0.000007s : 4: substitution.graph_param_transform 69.19% : 0.000152s : 3: substitution.inline 1.63% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.18% : 0.000005s : 4: substitution.remove_not_recompute_node 2.50% : 0.000005s : 4: substitution.replace_old_param 5.33% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006513 2 90.33% : 0.005884s : 1: type_inference.infer 9.67% : 0.000630s : 1: type_inference.specialize ------[replace.] 0.000042 5 71.82% : 0.000030s : 3: replace.inline 28.18% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000160 5 93.39% : 0.000150s : 3: match.inline 6.61% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000171 1131 1.06% : 0.000002s : 11: predicate.accumulaten_eliminater 1.79% : 0.000003s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 11: predicate.addn_zero_filter 0.72% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.48% : 0.000004s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.43% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.00% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.19% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.79% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000001s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.36% : 0.000011s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.21% : 0.000004s : 32: predicate.load_eliminater 0.92% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.05% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 11: predicate.minmaximum_grad 1.68% : 0.000003s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.54% : 0.000003s : 16: predicate.partial_defer_inline 1.35% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.27% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 1.31% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.29% : 0.000002s : 16: predicate.switch_defer_inline 1.86% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.60% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.79% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.24% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.60% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.14% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.03% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000450 8 43.53% : 0.000196s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.47% : 0.000254s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.104446 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.55% : 0.003707s : 1: add_attr 3.54% : 0.003693s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000067s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.48% : 0.000504s : 1: bootstrap 0.04% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000023s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.47% : 0.000493s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000789s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000020s : 1: opt.transform.mutable_eliminate 0.98% : 0.001022s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000028s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000093s : 28: opt.transform.opt_b 0.05% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000037s : 4: opt.transform.symbol_engine_opt 2.60% : 0.002718s : 1: opt_a 0.10% : 0.000108s : 1: opt_after_cconv 0.53% : 0.000553s : 1: opt_after_jit_grad 0.20% : 0.000212s : 1: opt_b 4.89% : 0.005106s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000006s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000042s : 1: pre_auto_parallel 0.03% : 0.000033s : 1: py_interpret_to_execute 0.02% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.43% : 0.000450s : 1: renormalize.infer 0.34% : 0.000353s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000048s : 1: rewriter_after_opt_a 0.07% : 0.000077s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000081s : 1: symbol_engine_optimizer 73.55% : 0.076817s : 1: task_emit 0.08% : 0.000079s : 1: 
tuple_transform 6.31% : 0.006593s : 1: type_inference 0.08% : 0.000079s : 1: validate TotalTime = 0.13029, [24] [bootstrap]: 0.00050771 [type_inference]: 0.0133216 [event_method]: 5.426e-05 [auto_monad]: 0.0001294 [graph_reusing]: 9.09e-06 [inline]: 2.54999e-06 [add_attr]: 0.00378941, [1] [add_attr_with_inline]: 0.0037775, [1] [Cycle 1]: 9.78e-05, [2] [tag_attr]: 4.282e-05 [meta_addattr_fg_expand]: 9.68002e-06 [parallel-infer-symbol]: 3.68e-06 [pre_auto_parallel]: 6.361e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.0172247, [53] [py_interpret_to_execute]: 4.658e-05 [rewriter_before_opt_a]: 0.00016915 [opt_a]: 0.0144304, [3] [Cycle 1]: 0.00892195, [45] [expand_dump_flag]: 5.34e-06 [switch_simplify]: 7.703e-05 [loop_unroll]: 6.186e-05 [a_1]: 0.00167012 [with_stream_mark]: 3.691e-05 [recompute_prepare]: 2.885e-05 [updatestate_depend_eliminate]: 1.087e-05 [updatestate_assign_eliminate]: 8.11002e-06 [updatestate_loads_eliminate]: 7.16001e-06 [parameter_eliminate]: 3.49001e-06 [a_2]: 0.00025208 [accelerated_algorithm]: 3.74e-05 [shard]: 2.78e-06 [meta_shard_fg_expand]: 5.25999e-06 [shard_inline]: 1.743e-05 [merge_send_recv]: 1.982e-05 [auto_parallel]: 1.538e-05 [parallel]: 2.216e-05 [flash_sp]: 1.496e-05 [merge_comm]: 1e-05 [allreduce_fusion]: 9.08002e-06 [matmul_add_comm_reduction]: 3.578e-05 [allreduce_slice_to_reducescatter]: 1.00001e-06 [virtual_shard_identity]: 2.145e-05 [virtual_dataset]: 1.647e-05 [get_grad_eliminate_]: 1.594e-05 [virtual_output]: 1.562e-05 [merge_forward]: 1.029e-05 [cell_reuse_recompute_pass]: 2.63e-06 [offload_activation]: 2.017e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.275e-05 [merge_recompute_call_nodes]: 2.12999e-06 [before_grad]: 2.812e-05 [set_forward_comm_id_for_comm_node_pass]: 1.01e-05 [meta_fg_expand]: 0.00191639 [flash_sp_send_recv_attached]: 5.64998e-06 [receive_attached]: 2.78e-06 [after_resolve]: 7.693e-05 
[a_after_grad]: 9.16e-05 [renormalize]: 0.00326312 [add_forward_monad_depend]: 1.629e-05 [auto_monad_grad]: 6.69001e-06 [auto_monad_eliminator]: 6.576e-05 [cse]: 0.00019082 [a_3]: 0.00036878 [Cycle 2]: 0.00446751, [45] [expand_dump_flag]: 3.72002e-06 [switch_simplify]: 4.85e-05 [loop_unroll]: 4.626e-05 [a_1]: 0.00180021 [with_stream_mark]: 2.963e-05 [recompute_prepare]: 1.898e-05 [updatestate_depend_eliminate]: 8.83001e-06 [updatestate_assign_eliminate]: 5.83997e-06 [updatestate_loads_eliminate]: 4.49002e-06 [parameter_eliminate]: 3.03e-06 [a_2]: 0.00014511 [accelerated_algorithm]: 1.602e-05 [shard]: 3.7e-06 [meta_shard_fg_expand]: 3.54002e-06 [shard_inline]: 1.079e-05 [merge_send_recv]: 1.238e-05 [auto_parallel]: 1.287e-05 [parallel]: 1.205e-05 [flash_sp]: 4.94e-06 [merge_comm]: 5.62001e-06 [allreduce_fusion]: 5.10001e-06 [matmul_add_comm_reduction]: 1.441e-05 [allreduce_slice_to_reducescatter]: 1.20999e-06 [virtual_shard_identity]: 1.265e-05 [virtual_dataset]: 9.89999e-06 [get_grad_eliminate_]: 9.17999e-06 [virtual_output]: 9.06998e-06 [merge_forward]: 6.76e-06 [cell_reuse_recompute_pass]: 2.14999e-06 [offload_activation]: 4.774e-05 [cell_reuse_handle_not_recompute_node_pass]: 6.079e-05 [merge_recompute_call_nodes]: 3.36999e-06 [before_grad]: 1.907e-05 [set_forward_comm_id_for_comm_node_pass]: 1.404e-05 [meta_fg_expand]: 0.00018177 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.86999e-06 [after_resolve]: 2.307e-05 [a_after_grad]: 1.674e-05 [renormalize]: 0.00120216 [add_forward_monad_depend]: 8.60001e-06 [auto_monad_grad]: 3.01999e-06 [auto_monad_eliminator]: 2.277e-05 [cse]: 7.088e-05 [a_3]: 7.941e-05 [Cycle 3]: 0.00101804, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 1.15e-05 [loop_unroll]: 9.86e-06 [a_1]: 0.00027941 [with_stream_mark]: 1.864e-05 [recompute_prepare]: 1.034e-05 [updatestate_depend_eliminate]: 5.84e-06 [updatestate_assign_eliminate]: 4.73001e-06 [updatestate_loads_eliminate]: 4.79998e-06 [parameter_eliminate]: 
1.77999e-06 [a_2]: 0.00012548 [accelerated_algorithm]: 1.385e-05 [shard]: 2.58e-06 [meta_shard_fg_expand]: 2.32999e-06 [shard_inline]: 9.22999e-06 [merge_send_recv]: 1.081e-05 [auto_parallel]: 1.311e-05 [parallel]: 9.76998e-06 [flash_sp]: 1.97999e-06 [merge_comm]: 5.67999e-06 [allreduce_fusion]: 5.32001e-06 [matmul_add_comm_reduction]: 1.148e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.104e-05 [virtual_dataset]: 9.10001e-06 [get_grad_eliminate_]: 9.01998e-06 [virtual_output]: 8.62998e-06 [merge_forward]: 5.79e-06 [cell_reuse_recompute_pass]: 3.12002e-06 [offload_activation]: 1.186e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.756e-05 [merge_recompute_call_nodes]: 1.20999e-06 [before_grad]: 1.559e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42999e-06 [meta_fg_expand]: 3.31999e-06 [flash_sp_send_recv_attached]: 1.27999e-06 [receive_attached]: 2.03002e-06 [after_resolve]: 1.581e-05 [a_after_grad]: 1.497e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.35001e-06 [auto_monad_grad]: 2.06998e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 3.195e-05 [a_3]: 6.03e-05 [py_interpret_to_execute_after_opt_a]: 1.999e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 5.639e-05 [convert_after_rewriter]: 9.61e-06 [order_py_execute_after_rewriter]: 6.66e-06 [mutable_eliminate]: 0.00073347 [opt_b]: 0.0003155, [1] [Cycle 1]: 0.00030786, [7] [b_1]: 0.00019553 [b_2]: 1.171e-05 [updatestate_depend_eliminate]: 1.04e-05 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.95e-06 [renormalize]: 8.80013e-07 [cse]: 4.351e-05 [optimize_parallel_all_gather_comm]: 2.251e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 3.187e-05 [loop_unroll]: 0.00046694 [opt_after_cconv]: 0.00015194, [1] [Cycle 1]: 0.0001448, [7] [c_1]: 5.138e-05 [parameter_eliminate]: 4.42e-06 [updatestate_depend_eliminate]: 8.35001e-06 [updatestate_assign_eliminate]: 4.41002e-06 [updatestate_loads_eliminate]: 3.96001e-06 
[cse]: 3.586e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 4.686e-05 [tuple_transform]: 0.00011362, [1] [Cycle 1]: 0.00010803, [4] [d_1]: 7.543e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 2.89991e-07 [switch_simplify]: 1.053e-05 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 7.286e-05 [cse_after_recomputation]: 3.744e-05, [1] [Cycle 1]: 3.173e-05, [1] [cse]: 2.508e-05 [environ_conv]: 1.24e-05 [swap_dp_allreduce_reducescatter]: 8.76002e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.70999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.56998e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.868e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 5.92001e-06 [overlap_recompute_and_grad_model_parallel]: 6.71999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.74e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 6.01e-06 [overlap_grad_flash_sp]: 2.934e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 0.00011065, [1] [Cycle 1]: 0.00010594, [6] [build]: 1.21e-05 [elim_shapecalc]: 1.63e-05 [elim_not_effective]: 2.059e-05 [opt_reshape]: 1.216e-05 [fold_const_symbol]: 1.505e-05 [renormalize]: 2.19996e-07 [detach_backward]: 2.36e-06 
[pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 2.731e-05 [get_jit_bprop_graph]: 2.03002e-06 [rewriter_after_jit_bprop_graph]: 4.64998e-06 [opt_after_jit_grad]: 0.00061988 [validate]: 6.1e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0941761 [execute]: 8.85999e-06 Sums bootstrap : 0.000508s : 0.41% type_inference : 0.013322s : 10.67% event_method : 0.000054s : 0.04% auto_monad : 0.000129s : 0.10% graph_reusing : 0.000009s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000043s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000064s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000047s : 0.04% optimize.rewriter_before_opt_a : 0.000169s : 0.14% optimize.opt_a.expand_dump_flag : 0.000012s : 0.01% optimize.opt_a.switch_simplify : 0.000137s : 0.11% optimize.opt_a.loop_unroll : 0.000118s : 0.09% optimize.opt_a.a_1 : 0.003750s : 3.00% optimize.opt_a.with_stream_mark : 0.000085s : 0.07% optimize.opt_a.recompute_prepare : 0.000058s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000026s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000523s : 0.42% optimize.opt_a.accelerated_algorithm : 0.000067s : 0.05% optimize.opt_a.shard : 0.000009s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.01% optimize.opt_a.shard_inline : 0.000037s : 0.03% optimize.opt_a.merge_send_recv : 0.000043s : 0.03% optimize.opt_a.auto_parallel : 0.000041s : 0.03% optimize.opt_a.parallel : 0.000044s : 0.04% optimize.opt_a.flash_sp : 0.000022s : 0.02% optimize.opt_a.merge_comm : 0.000021s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000062s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000045s : 0.04% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000023s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000008s : 0.01% optimize.opt_a.offload_activation : 0.000080s : 0.06% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000111s : 0.09% optimize.opt_a.merge_recompute_call_nodes : 0.000007s : 0.01% optimize.opt_a.before_grad : 0.000063s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000030s : 0.02% optimize.opt_a.meta_fg_expand : 0.002101s : 1.68% optimize.opt_a.flash_sp_send_recv_attached : 0.000010s : 0.01% optimize.opt_a.receive_attached : 0.000008s : 0.01% optimize.opt_a.after_resolve : 0.000116s : 0.09% optimize.opt_a.a_after_grad : 0.000123s : 0.10% optimize.opt_a.renormalize : 0.004465s : 3.58% optimize.opt_a.add_forward_monad_depend : 0.000026s : 0.02% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000102s : 0.08% optimize.opt_a.cse : 0.000294s : 0.24% optimize.opt_a.a_3 : 0.000508s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000056s : 0.05% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000733s : 0.59% optimize.opt_b.b_1 : 0.000196s : 0.16% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000044s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000032s : 0.03% optimize.loop_unroll : 0.000467s : 0.37% optimize.opt_after_cconv.c_1 : 0.000051s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000036s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000047s : 0.04% optimize.tuple_transform.d_1 : 0.000075s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000073s : 0.06% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000012s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000007s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000620s : 0.50% validate : 0.000061s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.094176s : 75.45% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.001075 222 6.58% : 0.000071s : 12: substitution.arithmetic_simplify 2.39% : 0.000026s : 2: substitution.cast_eliminate 0.25% : 0.000003s : 5: substitution.elim_not_effective 0.46% : 0.000005s : 5: substitution.float_depend_g_call 0.47% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.21% : 0.000002s : 5: substitution.fold_const_symbol 0.88% : 0.000009s : 8: substitution.graph_param_transform 0.28% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.46% : 0.000618s : 17: substitution.inline 2.02% : 0.000022s : 2: substitution.inline_without_move 1.14% : 0.000012s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000021s : 3: substitution.less_batch_normalization 1.45% : 0.000016s : 11: substitution.minmaximum_grad 0.76% : 0.000008s : 5: substitution.partial_eliminate 2.02% : 0.000022s : 20: substitution.remove_not_recompute_node 3.17% : 0.000034s : 10: substitution.replace_applicator 1.72% : 0.000019s : 15: substitution.replace_old_param 0.32% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.27% : 0.000035s : 11: substitution.tuple_list_convert_item_index_to_positive 1.37% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.94% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 7.62% : 0.000082s : 30: substitution.tuple_list_get_item_eliminator 2.05% : 0.000022s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013218 2 86.98% : 0.011497s : 1: type_inference.infer 13.02% : 0.001721s : 1: type_inference.specialize ------[replace.] 0.000259 33 60.01% : 0.000155s : 17: replace.inline 39.99% : 0.000104s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000649 33 93.60% : 0.000607s : 17: match.inline 6.40% : 0.000042s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000804 5764 1.07% : 0.000009s : 68: predicate.accumulaten_eliminater 0.35% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.19% : 0.000018s : 100: predicate.arithmetic_simplify 1.17% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000010s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.12% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_depend_swap 1.78% : 0.000014s : 108: predicate.environ_get_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.67% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000019s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000006s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.61% : 0.000005s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000005s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.71% : 0.000046s : 249: predicate.inline 1.35% : 0.000011s : 55: predicate.inline_without_move 0.28% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.74% : 0.000006s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.56% : 0.000021s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.07% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.07% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 68: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.08% : 0.000017s : 101: predicate.partial_defer_inline 1.65% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000009s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000011s : 68: predicate.reduce_eliminate 2.66% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000004s : 32: predicate.remove_not_recompute_node 1.92% : 0.000015s : 152: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.15% : 0.000001s : 8: predicate.reset_defer_inline 1.13% : 0.000009s : 68: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.39% : 0.000011s : 68: predicate.same_eliminate 0.46% : 0.000004s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.42% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.25% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000014s : 101: predicate.switch_defer_inline 2.81% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.84% : 
0.000039s : 277: predicate.switch_simplify 1.04% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.38% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.55% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.10% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000005s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002030 34 57.46% : 0.001166s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.54% : 0.000863s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.161686 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.35% : 0.003796s : 1: add_attr 2.34% : 0.003782s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.05% : 0.000078s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000138s : 1: auto_monad 0.02% : 0.000032s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.34% : 0.000547s : 1: bootstrap 0.02% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000041s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000016s : 1: environ_conv 0.04% : 0.000063s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000477s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000747s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 3.46% : 0.005590s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000179s : 28: opt.transform.opt_b 0.05% : 0.000084s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000060s : 4: opt.transform.symbol_engine_opt 8.93% : 0.014434s : 1: opt_a 0.10% : 0.000155s : 1: opt_after_cconv 0.39% : 0.000632s : 1: opt_after_jit_grad 0.20% : 0.000319s : 1: opt_b 10.66% : 0.017230s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000011s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000069s : 1: pre_auto_parallel 0.03% : 0.000051s : 1: py_interpret_to_execute 0.01% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000052s : 1: remove_dup_value 1.55% : 0.002507s : 2: renormalize.infer 1.20% : 0.001934s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000061s : 1: rewriter_after_opt_a 0.11% : 0.000175s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000113s : 1: symbol_engine_optimizer 58.26% : 0.094200s : 1: task_emit 0.07% : 0.000117s : 1: 
tuple_transform 8.26% : 0.013355s : 1: type_inference 0.06% : 0.000097s : 1: validate TotalTime = 0.0814497, [24] [bootstrap]: 0.00043271 [type_inference]: 0.00465841 [event_method]: 1.09e-05 [auto_monad]: 5.566e-05 [graph_reusing]: 5.32001e-06 [inline]: 2.49999e-06 [add_attr]: 0.00337006, [1] [add_attr_with_inline]: 0.00335898, [1] [Cycle 1]: 5.439e-05, [2] [tag_attr]: 1.38e-05 [meta_addattr_fg_expand]: 3.24001e-06 [parallel-infer-symbol]: 3.73999e-06 [pre_auto_parallel]: 2.816e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 9.10019e-07 [dataset_repeat_opt]: 1.91998e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.00442137, [53] [py_interpret_to_execute]: 1.841e-05 [rewriter_before_opt_a]: 4.582e-05 [opt_a]: 0.00231933, [2] [Cycle 1]: 0.00166861, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 2.606e-05 [loop_unroll]: 1.39e-05 [a_1]: 0.00032847 [with_stream_mark]: 1.865e-05 [recompute_prepare]: 8.82e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.59002e-06 [updatestate_loads_eliminate]: 3.58e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 8.379e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 2.37001e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 8.43999e-06 [auto_parallel]: 6.47001e-06 [parallel]: 1.964e-05 [flash_sp]: 8.87e-06 [merge_comm]: 4.04997e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 1.07e-05 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 8.27998e-06 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 6.21e-06 [virtual_output]: 5.83997e-06 [merge_forward]: 4.50001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 1.105e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.174e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 9.71e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55003e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.46998e-06 
[after_resolve]: 1.226e-05 [a_after_grad]: 9.66e-06 [renormalize]: 0.00065638 [add_forward_monad_depend]: 4.85999e-06 [auto_monad_grad]: 2.64999e-06 [auto_monad_eliminator]: 1.558e-05 [cse]: 3.077e-05 [a_3]: 4.441e-05 [Cycle 2]: 0.0006392, [45] [expand_dump_flag]: 1.04998e-06 [switch_simplify]: 7.65e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00013227 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 6.14001e-06 [updatestate_depend_eliminate]: 3.11999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 1.12e-06 [a_2]: 7.025e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.35001e-06 [meta_shard_fg_expand]: 1.36998e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 5.43002e-06 [auto_parallel]: 5.12e-06 [parallel]: 5.99e-06 [flash_sp]: 3.33e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.85002e-06 [matmul_add_comm_reduction]: 6.61e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 6.68e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.58002e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 3.04001e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 7.56001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 8.40024e-07 [before_grad]: 8.54998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28998e-06 [meta_fg_expand]: 1.89e-06 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.50001e-06 [after_resolve]: 1.022e-05 [a_after_grad]: 8.82999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.44003e-06 [auto_monad_grad]: 1.34e-06 [auto_monad_eliminator]: 7.11001e-06 [cse]: 1.691e-05 [a_3]: 3.237e-05 [py_interpret_to_execute_after_opt_a]: 1.068e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.642e-05 [convert_after_rewriter]: 8.50001e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00060602 [opt_b]: 0.00019377, [1] [Cycle 1]: 0.00018678, [7] [b_1]: 
0.00011099 [b_2]: 8.00999e-06 [updatestate_depend_eliminate]: 6.21998e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.57001e-06 [renormalize]: 7.2e-07 [cse]: 2.044e-05 [optimize_parallel_all_gather_comm]: 1.816e-05 [overlap_param_gather]: 2.36e-06 [cconv]: 2.662e-05 [loop_unroll]: 0.00045701 [opt_after_cconv]: 0.00010103, [1] [Cycle 1]: 9.433e-05, [7] [c_1]: 2.796e-05 [parameter_eliminate]: 3.29001e-06 [updatestate_depend_eliminate]: 5.70001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.771e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.411e-05 [tuple_transform]: 7.618e-05, [1] [Cycle 1]: 7.106e-05, [4] [d_1]: 4.454e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.44999e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 4.721e-05 [cse_after_recomputation]: 2.14e-05, [1] [Cycle 1]: 1.699e-05, [1] [cse]: 1.149e-05 [environ_conv]: 5.32001e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.57998e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.44998e-06 [slice_recompute_activation]: 2.38998e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 1.17e-06 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.36002e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.233e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4.15e-06 [overlap_recompute_and_grad_model_parallel]: 4.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 1.85001e-06 [overlap_grad_ring_attention]: 4.55001e-06 [overlap_grad_flash_sp]: 2.102e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 7.626e-05, [1] [Cycle 1]: 7.116e-05, [6] [build]: 3.09999e-06 [elim_shapecalc]: 9.56998e-06 [elim_not_effective]: 1.25e-05 [opt_reshape]: 7.06999e-06 [fold_const_symbol]: 9.13002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.96e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.677e-05 [get_jit_bprop_graph]: 2.11e-06 [rewriter_after_jit_bprop_graph]: 4.73001e-06 [opt_after_jit_grad]: 0.0004851 [validate]: 4.063e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.067645 [execute]: 9.57001e-06 Sums bootstrap : 0.000433s : 0.56% type_inference : 0.004658s : 6.05% event_method : 0.000011s : 0.01% auto_monad : 0.000056s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000018s : 0.02% optimize.rewriter_before_opt_a : 0.000046s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000034s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000461s : 0.60% optimize.opt_a.with_stream_mark : 0.000032s : 0.04% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000154s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000019s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000022s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000656s : 0.85% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.03% optimize.opt_a.cse : 0.000048s : 0.06% optimize.opt_a.a_3 : 0.000077s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000606s : 0.79% optimize.opt_b.b_1 : 0.000111s : 0.14% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000027s : 0.03% optimize.loop_unroll : 0.000457s : 0.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000485s : 0.63% validate : 0.000041s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.067645s : 87.84% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000146 26 17.65% : 0.000026s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000006s : 4: substitution.graph_param_transform 66.35% : 0.000097s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000005s : 4: substitution.remove_not_recompute_node 3.31% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004607 2 91.76% : 0.004227s : 1: type_inference.infer 8.24% : 0.000380s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000095 2 100.00% : 0.000095s : 2: match.inline ------[predicate.] 
0.000146 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.69% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.43% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.77% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000009s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.51% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.55% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.66% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000002s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.60% : 0.000007s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.23% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000297 6 38.27% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.73% : 0.000183s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.090885 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.71% : 0.003375s : 1: add_attr 3.70% : 0.003363s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.52% : 0.000470s : 1: bootstrap 0.03% : 0.000031s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000467s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.68% : 0.000617s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.92% : 0.000838s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000049s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000035s : 4: opt.transform.symbol_engine_opt 2.56% : 0.002323s : 1: opt_a 0.12% : 0.000105s : 1: opt_after_cconv 0.54% : 0.000495s : 1: opt_after_jit_grad 0.22% : 0.000197s : 1: opt_b 4.87% : 0.004427s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000009s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000022s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.38% : 0.000346s : 1: renormalize.infer 0.33% : 0.000302s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000041s : 1: rewriter_after_opt_a 0.06% : 0.000050s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000079s : 1: symbol_engine_optimizer 74.46% : 0.067669s : 1: task_emit 0.09% : 0.000079s : 1: 
tuple_transform 5.15% : 0.004682s : 1: type_inference 0.08% : 0.000072s : 1: validate TotalTime = 0.116358, [24] [bootstrap]: 0.00046125 [type_inference]: 0.0107639 [event_method]: 4.576e-05 [auto_monad]: 0.00011643 [graph_reusing]: 8.63001e-06 [inline]: 1.91003e-06 [add_attr]: 0.00318853, [1] [add_attr_with_inline]: 0.00317893, [1] [Cycle 1]: 7.683e-05, [2] [tag_attr]: 3.689e-05 [meta_addattr_fg_expand]: 8.22e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 4.871e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.77999e-06 [optimize]: 0.0142076, [53] [py_interpret_to_execute]: 3.801e-05 [rewriter_before_opt_a]: 0.00017807 [opt_a]: 0.0116503, [3] [Cycle 1]: 0.00746925, [45] [expand_dump_flag]: 5.40999e-06 [switch_simplify]: 6.767e-05 [loop_unroll]: 5.495e-05 [a_1]: 0.00137716 [with_stream_mark]: 2.551e-05 [recompute_prepare]: 2.277e-05 [updatestate_depend_eliminate]: 8.89e-06 [updatestate_assign_eliminate]: 8.25e-06 [updatestate_loads_eliminate]: 7.15998e-06 [parameter_eliminate]: 2.78998e-06 [a_2]: 0.00024626 [accelerated_algorithm]: 3.103e-05 [shard]: 1.81003e-06 [meta_shard_fg_expand]: 3.81999e-06 [shard_inline]: 1.621e-05 [merge_send_recv]: 1.627e-05 [auto_parallel]: 1.098e-05 [parallel]: 1.907e-05 [flash_sp]: 1.208e-05 [merge_comm]: 9.54e-06 [allreduce_fusion]: 9.57999e-06 [matmul_add_comm_reduction]: 2.74e-05 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 1.862e-05 [virtual_dataset]: 1.567e-05 [get_grad_eliminate_]: 1.527e-05 [virtual_output]: 1.53e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 1.807e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.917e-05 [merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 2.735e-05 [set_forward_comm_id_for_comm_node_pass]: 9.87999e-06 [meta_fg_expand]: 0.00148714 [flash_sp_send_recv_attached]: 3.89002e-06 [receive_attached]: 2.88e-06 [after_resolve]: 
6.126e-05 [a_after_grad]: 9.465e-05 [renormalize]: 0.0027832 [add_forward_monad_depend]: 1.011e-05 [auto_monad_grad]: 5.84999e-06 [auto_monad_eliminator]: 5.818e-05 [cse]: 0.00017798 [a_3]: 0.00034152 [Cycle 2]: 0.00324635, [45] [expand_dump_flag]: 1.82999e-06 [switch_simplify]: 4.714e-05 [loop_unroll]: 4.386e-05 [a_1]: 0.00161941 [with_stream_mark]: 1.483e-05 [recompute_prepare]: 1.293e-05 [updatestate_depend_eliminate]: 5.51998e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 3.73001e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 0.00012742 [accelerated_algorithm]: 1.414e-05 [shard]: 1.54e-06 [meta_shard_fg_expand]: 2.43002e-06 [shard_inline]: 9.54999e-06 [merge_send_recv]: 8.44002e-06 [auto_parallel]: 8.45999e-06 [parallel]: 6.43e-06 [flash_sp]: 3.86001e-06 [merge_comm]: 5.27999e-06 [allreduce_fusion]: 4.87998e-06 [matmul_add_comm_reduction]: 1.036e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.144e-05 [virtual_dataset]: 9.07001e-06 [get_grad_eliminate_]: 1.029e-05 [virtual_output]: 8.45999e-06 [merge_forward]: 5.20999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.202e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.951e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 1.494e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37999e-06 [meta_fg_expand]: 4.526e-05 [flash_sp_send_recv_attached]: 1.30999e-06 [receive_attached]: 1.57001e-06 [after_resolve]: 1.595e-05 [a_after_grad]: 1.449e-05 [renormalize]: 0.00070857 [add_forward_monad_depend]: 5.15999e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.712e-05 [cse]: 5.226e-05 [a_3]: 6.859e-05 [Cycle 3]: 0.00091752, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 1.039e-05 [loop_unroll]: 8.77999e-06 [a_1]: 0.00025306 [with_stream_mark]: 1.161e-05 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 4.1e-06 
[parameter_eliminate]: 1.32999e-06 [a_2]: 0.0001235 [accelerated_algorithm]: 1.164e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.91998e-06 [shard_inline]: 9.04998e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 7.44002e-06 [parallel]: 4.93001e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.92e-06 [allreduce_fusion]: 5.37001e-06 [matmul_add_comm_reduction]: 8.75001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 8.70999e-06 [get_grad_eliminate_]: 8.55999e-06 [virtual_output]: 8.35999e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 9.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.673e-05 [merge_recompute_call_nodes]: 8.30012e-07 [before_grad]: 1.526e-05 [set_forward_comm_id_for_comm_node_pass]: 5.87001e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.27e-06 [after_resolve]: 1.373e-05 [a_after_grad]: 1.428e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.45001e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.114e-05 [cse]: 2.716e-05 [a_3]: 5.892e-05 [py_interpret_to_execute_after_opt_a]: 1.422e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 5.039e-05 [convert_after_rewriter]: 9.22001e-06 [order_py_execute_after_rewriter]: 7.18998e-06 [mutable_eliminate]: 0.00064332 [opt_b]: 0.00029634, [1] [Cycle 1]: 0.00028937, [7] [b_1]: 0.00019146 [b_2]: 1.141e-05 [updatestate_depend_eliminate]: 7.95998e-06 [updatestate_assign_eliminate]: 4.07998e-06 [updatestate_loads_eliminate]: 4.22998e-06 [renormalize]: 6.89994e-07 [cse]: 3.396e-05 [optimize_parallel_all_gather_comm]: 2.113e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.633e-05 [loop_unroll]: 0.000445 [opt_after_cconv]: 0.00014215, [1] [Cycle 1]: 0.00013577, [7] [c_1]: 4.992e-05 [parameter_eliminate]: 2.88e-06 [updatestate_depend_eliminate]: 7.4e-06 [updatestate_assign_eliminate]: 4.34997e-06 
[updatestate_loads_eliminate]: 3.91999e-06 [cse]: 3.287e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 3.256e-05 [tuple_transform]: 0.00010455, [1] [Cycle 1]: 9.919e-05, [4] [d_1]: 6.873e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.02e-05 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 6.18e-05 [cse_after_recomputation]: 3.368e-05, [1] [Cycle 1]: 2.813e-05, [1] [cse]: 2.266e-05 [environ_conv]: 9.79e-06 [swap_dp_allreduce_reducescatter]: 7.63999e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.51998e-06 [merge_cast_opt]: 1.13001e-06 [slice_recompute_activation]: 2.33002e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.802e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 5.25999e-06 [overlap_recompute_and_grad_model_parallel]: 5.26998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.55002e-06 [overlap_grad_ring_attention]: 5.27999e-06 [overlap_grad_flash_sp]: 2.63e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.44001e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 0.00010295, [1] [Cycle 1]: 9.838e-05, [6] [build]: 1.11e-05 [elim_shapecalc]: 1.384e-05 [elim_not_effective]: 1.929e-05 [opt_reshape]: 9.81e-06 [fold_const_symbol]: 1.583e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 2.17999e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 2.595e-05 [get_jit_bprop_graph]: 1.77001e-06 [rewriter_after_jit_bprop_graph]: 4.02e-06 [opt_after_jit_grad]: 0.0004966 [validate]: 5.317e-05 [backend_pass]: 1.13001e-06 [task_emit]: 0.0866695 [execute]: 9.40001e-06 Sums bootstrap : 0.000461s : 0.41% type_inference : 0.010764s : 9.63% event_method : 0.000046s : 0.04% auto_monad : 0.000116s : 0.10% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000178s : 0.16% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.11% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003250s : 2.91% optimize.opt_a.with_stream_mark : 0.000052s : 0.05% optimize.opt_a.recompute_prepare : 0.000045s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000497s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000033s : 0.03% optimize.opt_a.auto_parallel : 0.000027s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000047s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000039s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001535s : 1.37% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.08% optimize.opt_a.a_after_grad : 0.000123s : 0.11% optimize.opt_a.renormalize : 0.003492s : 3.12% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.08% optimize.opt_a.cse : 0.000257s : 0.23% optimize.opt_a.a_3 : 0.000469s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000643s : 0.58% optimize.opt_b.b_1 : 0.000191s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000034s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.02% optimize.loop_unroll : 0.000445s : 0.40% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000033s : 0.03% optimize.tuple_transform.d_1 : 0.000069s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000497s : 0.44% validate : 0.000053s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.086669s : 77.50% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000801 218 6.02% : 0.000048s : 11: substitution.arithmetic_simplify 1.87% : 0.000015s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000003s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.31% : 0.000003s : 2: substitution.incorporate_call 0.33% : 0.000003s : 2: substitution.incorporate_call_switch 56.14% : 0.000450s : 16: substitution.inline 2.19% : 0.000017s : 2: substitution.inline_without_move 1.44% : 0.000012s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000016s : 3: substitution.less_batch_normalization 1.74% : 0.000014s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.89% : 0.000015s : 20: substitution.remove_not_recompute_node 2.96% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.03% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010687 2 86.50% : 0.009244s : 1: type_inference.infer 13.50% : 0.001443s : 1: type_inference.specialize ------[replace.] 0.000210 30 58.43% : 0.000123s : 16: replace.inline 41.57% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000473 30 93.39% : 0.000441s : 16: match.inline 6.61% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.12% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000009s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.23% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.54% : 0.000041s : 244: predicate.inline 1.32% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.71% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.40% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.98% : 0.000015s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.22% : 0.000002s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 97: predicate.switch_defer_inline 2.88% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.80% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.59% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001649 32 57.27% : 0.000944s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.73% : 0.000705s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.142509 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.24% : 0.003194s : 1: add_attr 2.23% : 0.003183s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000124s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.35% : 0.000494s : 1: bootstrap 0.02% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000053s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000454s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.46% : 0.000654s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000019s : 1: opt.transform.mutable_eliminate 3.46% : 0.004938s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000176s : 28: opt.transform.opt_b 0.05% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000055s : 4: opt.transform.symbol_engine_opt 8.18% : 0.011654s : 1: opt_a 0.10% : 0.000146s : 1: opt_after_cconv 0.36% : 0.000507s : 1: opt_after_jit_grad 0.21% : 0.000300s : 1: opt_b 9.97% : 0.014213s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000037s : 1: remove_dup_value 1.32% : 0.001884s : 2: renormalize.infer 1.12% : 0.001592s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000054s : 1: rewriter_after_opt_a 0.13% : 0.000184s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000106s : 1: symbol_engine_optimizer 60.83% : 0.086692s : 1: task_emit 0.08% : 0.000108s : 1: 
tuple_transform 7.57% : 0.010785s : 1: type_inference 0.06% : 0.000087s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x1-ge],max_mem:38.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x2-pynative],max_mem:38.0M TotalTime = 0.0219694, [24] [bootstrap]: 0.00050507 [type_inference]: 0.00628797 [event_method]: 1.404e-05 [auto_monad]: 5.389e-05 [graph_reusing]: 5.92999e-06 [inline]: 1.76998e-06 [add_attr]: 0.0035154, [1] [add_attr_with_inline]: 0.00350412, [1] [Cycle 1]: 4.348e-05, [2] [tag_attr]: 1.47e-05 [meta_addattr_fg_expand]: 4.19002e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.839e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 1.16997e-06 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00395515, [53] [py_interpret_to_execute]: 2.04e-05 [rewriter_before_opt_a]: 6.005e-05 [opt_a]: 0.00212203, [2] [Cycle 1]: 0.00152286, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 3.21e-05 [loop_unroll]: 2.089e-05 [a_1]: 0.00045571 [with_stream_mark]: 1.295e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.89997e-06 [updatestate_assign_eliminate]: 3.48999e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.544e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 5.76003e-06 [parallel]: 2.242e-05 [flash_sp]: 7.00998e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.79998e-06 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 6.41e-06 
[get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 9.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.18002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 2.73e-06 [receive_attached]: 2.18002e-06 [after_resolve]: 9.87999e-06 [a_after_grad]: 8.62e-06 [renormalize]: 0.00042774 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.456e-05 [cse]: 2.521e-05 [a_3]: 4.067e-05 [Cycle 2]: 0.00058962, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.22999e-06 [a_1]: 0.00012537 [with_stream_mark]: 9.25999e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.94001e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.837e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.16997e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.69999e-06 [merge_send_recv]: 4.16001e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.28001e-06 [flash_sp]: 3.2e-06 [merge_comm]: 3.00998e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.019e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 9.20001e-07 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 5.94e-06 [cse]: 1.595e-05 [a_3]: 3.167e-05 [py_interpret_to_execute_after_opt_a]: 7.65998e-06 [slice_cell_reuse_recomputed_activation]: 1.81998e-06 [rewriter_after_opt_a]: 2.992e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00045544 [opt_b]: 0.00017868, [1] [Cycle 1]: 0.00017271, [7] [b_1]: 0.00010629 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.30008e-07 [cse]: 1.598e-05 [optimize_parallel_all_gather_comm]: 1.596e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.211e-05 [loop_unroll]: 0.00041094 [opt_after_cconv]: 9.393e-05, [1] [Cycle 1]: 8.773e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.13998e-06 [cse]: 1.543e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.755e-05, [1] [Cycle 1]: 6.314e-05, [4] [d_1]: 3.811e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.84e-06 [partial_unused_args_eliminate]: 1.66002e-06 [add_recomputation]: 4.814e-05 [cse_after_recomputation]: 2.03e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.094e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 3.9e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 
9.50007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91998e-06 [control_data_broadcast_order]: 1.141e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36998e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.657e-05 [begin_end_overlap_inline]: 4.40021e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.97999e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 6.956e-05, [1] [Cycle 1]: 6.542e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.94003e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 6.08998e-06 [fold_const_symbol]: 9.07001e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.78002e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 0.00011123 [opt_after_jit_grad]: 0.00044775 [validate]: 3.169e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00676482 [execute]: 7.13998e-06 Sums bootstrap : 0.000505s : 2.89% type_inference : 0.006288s : 35.95% event_method : 0.000014s : 0.08% auto_monad : 0.000054s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000581s : 3.32% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000428s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000455s : 2.60% optimize.opt_b.b_1 : 0.000106s : 0.61% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000411s : 2.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000111s : 0.64% opt_after_jit_grad : 0.000448s : 2.56% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006765s : 38.68% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.46% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000005s : 4: substitution.graph_param_transform 67.31% : 0.000111s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.88% : 0.000005s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 4: substitution.replace_old_param 6.44% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006240 2 90.49% : 0.005647s : 1: type_inference.infer 9.51% : 0.000593s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.64% : 0.000027s : 3: replace.inline 30.36% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.91% : 0.000109s : 3: match.inline 8.09% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.34% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 11: predicate.reduce_eliminate 2.53% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.46% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 46.70% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.30% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030963 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.37% : 0.003520s : 1: add_attr 11.33% : 0.003508s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000059s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.76% : 0.000544s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000464s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.05% : 0.000944s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.86% : 0.002125s : 1: opt_a 0.31% : 0.000097s : 1: opt_after_cconv 1.48% : 0.000458s : 1: opt_after_jit_grad 0.59% : 0.000182s : 1: opt_b 12.79% : 0.003959s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000219s : 1: renormalize.infer 0.65% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.38% : 0.000118s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 21.88% : 0.006775s : 1: task_emit 0.23% : 0.000070s : 1: 
tuple_transform 20.35% : 0.006301s : 1: type_inference 0.20% : 0.000063s : 1: validate TotalTime = 0.0182814, [24] [bootstrap]: 0.00044465 [type_inference]: 0.00438748 [event_method]: 1.066e-05 [auto_monad]: 5.049e-05 [graph_reusing]: 5.32999e-06 [inline]: 1.96e-06 [add_attr]: 0.00304028, [1] [add_attr_with_inline]: 0.00303218, [1] [Cycle 1]: 4.402e-05, [2] [tag_attr]: 1.138e-05 [meta_addattr_fg_expand]: 3.10998e-06 [parallel-infer-symbol]: 2.46998e-06 [pre_auto_parallel]: 2.139e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00367816, [53] [py_interpret_to_execute]: 1.587e-05 [rewriter_before_opt_a]: 3.783e-05 [opt_a]: 0.00185103, [2] [Cycle 1]: 0.00125387, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.327e-05 [loop_unroll]: 1.382e-05 [a_1]: 0.00029473 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 4.03001e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 1.71002e-06 [a_2]: 7.668e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 1.38002e-06 [shard_inline]: 5.81998e-06 [merge_send_recv]: 7.73001e-06 [auto_parallel]: 5.49998e-06 [parallel]: 1.656e-05 [flash_sp]: 7.94002e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.86997e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.32002e-06 [virtual_dataset]: 5.89999e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.82998e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.59001e-06 [flash_sp_send_recv_attached]: 2.73998e-06 [receive_attached]: 
2.52001e-06 [after_resolve]: 1.086e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00033955 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.276e-05 [cse]: 2.852e-05 [a_3]: 4.001e-05 [Cycle 2]: 0.00058809, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012364 [with_stream_mark]: 9.09998e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.792e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.54e-06 [parallel]: 4.20999e-06 [flash_sp]: 3.05002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 4.97999e-06 [merge_forward]: 2.50002e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 6.05002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.55001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.12e-06 [after_resolve]: 8.90999e-06 [a_after_grad]: 8.15e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 5.70001e-06 [cse]: 1.22e-05 [a_3]: 3.328e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 3.001e-05 [convert_after_rewriter]: 6.73998e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00045043 [opt_b]: 0.00017818, [1] [Cycle 1]: 0.00017224, [7] 
[b_1]: 0.00010643 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.23002e-06 [renormalize]: 3.4002e-07 [cse]: 1.582e-05 [optimize_parallel_all_gather_comm]: 1.519e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.157e-05 [loop_unroll]: 0.00040843 [opt_after_cconv]: 9.339e-05, [1] [Cycle 1]: 8.751e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.549e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.278e-05 [tuple_transform]: 9.92e-05, [1] [Cycle 1]: 9.486e-05, [4] [d_1]: 3.888e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 3.564e-05 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 4.449e-05 [cse_after_recomputation]: 2.069e-05, [1] [Cycle 1]: 1.629e-05, [1] [cse]: 1.106e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 4.94e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 3.3e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.40002e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.28002e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.23002e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.83997e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 3.86001e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.876e-05, [1] [Cycle 1]: 6.475e-05, [6] [build]: 2.15002e-06 [elim_shapecalc]: 8.19998e-06 [elim_not_effective]: 1.157e-05 [opt_reshape]: 6.34001e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.19999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 1.14998e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044846 [validate]: 3.165e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00592377 [execute]: 7.05e-06 Sums bootstrap : 0.000445s : 3.11% type_inference : 0.004387s : 30.72% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000418s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.13% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.15% optimize.opt_b.b_1 : 0.000106s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000408s : 2.86% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000036s : 0.25% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.02% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 3.14% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005924s : 41.47% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 17.95% : 0.000021s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.35% : 0.000005s : 4: substitution.graph_param_transform 65.84% : 0.000078s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.73% : 0.000004s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004347 2 92.13% : 0.004005s : 1: type_inference.infer 7.87% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000163 984 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 8: predicate.addn_check_dump 0.59% : 0.000001s : 9: predicate.addn_zero_filter 0.58% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.98% : 0.000003s : 17: predicate.arithmetic_simplify 0.70% : 0.000001s : 9: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.66% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.74% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.34% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.95% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.87% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.88% : 0.000001s : 13: predicate.environ_get_depend_swap 1.60% : 0.000003s : 21: predicate.environ_get_eliminate 0.89% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.79% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.56% : 0.000003s : 11: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 4.90% : 0.000008s : 44: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.52% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 8: predicate.less_batch_normalization 1.36% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.77% : 0.000003s : 26: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.58% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.39% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.60% : 0.000001s : 9: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.03% : 0.000002s : 11: predicate.partial_defer_inline 1.07% : 0.000002s : 13: predicate.partial_eliminate 0.64% : 0.000001s : 9: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 0.81% : 0.000001s : 9: predicate.reduce_eliminate 1.80% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 8: predicate.remove_not_recompute_node 1.10% : 0.000002s : 17: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.69% : 0.000001s : 9: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.71% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.72% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.83% : 0.000001s : 11: predicate.switch_defer_inline 1.43% : 0.000002s : 19: predicate.switch_layer_defer_inline 21.46% : 0.000035s : 41: predicate.switch_simplify 0.62% : 
0.000001s : 9: predicate.tile_eliminate 0.64% : 0.000001s : 9: predicate.transpose_eliminate 1.28% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.27% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.12% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.67% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.12% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.93% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.24% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.75% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.51% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 42.68% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.32% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026294 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.58% : 0.003044s : 1: add_attr 11.54% : 0.003036s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.83% : 0.000481s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.93% : 0.000770s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.28% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.05% : 0.001854s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000458s : 1: opt_after_jit_grad 0.69% : 0.000181s : 1: opt_b 14.00% : 0.003682s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.57% : 0.005934s : 1: task_emit 0.39% : 0.000102s : 1: 
tuple_transform 16.74% : 0.004401s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0198439, [24] [bootstrap]: 0.00049326 [type_inference]: 0.00559626 [event_method]: 1.404e-05 [auto_monad]: 5.706e-05 [graph_reusing]: 5.60001e-06 [inline]: 1.82999e-06 [add_attr]: 0.00296975, [1] [add_attr_with_inline]: 0.00296086, [1] [Cycle 1]: 4.436e-05, [2] [tag_attr]: 1.473e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 2.46e-06 [pre_auto_parallel]: 2.601e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00398791, [53] [py_interpret_to_execute]: 2.076e-05 [rewriter_before_opt_a]: 5.809e-05 [opt_a]: 0.0021486, [2] [Cycle 1]: 0.00155255, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 3.145e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00049945 [with_stream_mark]: 1.719e-05 [recompute_prepare]: 8.15999e-06 [updatestate_depend_eliminate]: 4.08999e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.66e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 8.82e-06 [auto_parallel]: 5.89e-06 [parallel]: 1.788e-05 [flash_sp]: 6.76e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.54002e-06 [matmul_add_comm_reduction]: 8.3e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.96998e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.019e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.68e-06 [flash_sp_send_recv_attached]: 2.93e-06 [receive_attached]: 2.19999e-06 [after_resolve]: 
1.02e-05 [a_after_grad]: 8.85999e-06 [renormalize]: 0.00040691 [add_forward_monad_depend]: 4.32003e-06 [auto_monad_grad]: 1.59998e-06 [auto_monad_eliminator]: 1.335e-05 [cse]: 2.692e-05 [a_3]: 4.155e-05 [Cycle 2]: 0.0005862, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.39998e-06 [a_1]: 0.00012482 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.683e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.25999e-06 [flash_sp]: 2.86e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.49999e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.84e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.32999e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.20999e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.02002e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.65999e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.07001e-06 [cse]: 1.5e-05 [a_3]: 3.167e-05 [py_interpret_to_execute_after_opt_a]: 7.61001e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.538e-05 [convert_after_rewriter]: 7.11001e-06 [order_py_execute_after_rewriter]: 5.06997e-06 [mutable_eliminate]: 0.000446 [opt_b]: 0.00018124, [1] [Cycle 1]: 0.00017526, [7] [b_1]: 0.00010833 [b_2]: 
6.81001e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.49974e-07 [cse]: 1.583e-05 [optimize_parallel_all_gather_comm]: 1.87e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.192e-05 [loop_unroll]: 0.00041429 [opt_after_cconv]: 9.442e-05, [1] [Cycle 1]: 8.868e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.62e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.464e-05, [4] [d_1]: 3.929e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.99e-06 [add_recomputation]: 4.473e-05 [cse_after_recomputation]: 2.034e-05, [1] [Cycle 1]: 1.576e-05, [1] [cse]: 1.057e-05 [environ_conv]: 4.58001e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.02e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.29001e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.04998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.65999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.66e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.765e-05, [1] [Cycle 1]: 6.362e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.02e-06 [elim_not_effective]: 1.147e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 1.70025e-07 [detach_backward]: 2.11998e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.667e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.00047118 [validate]: 3.035e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00595933 [execute]: 6.51e-06 Sums bootstrap : 0.000493s : 3.10% type_inference : 0.005596s : 35.16% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000624s : 3.92% optimize.opt_a.with_stream_mark : 0.000027s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000019s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000407s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 2.80% optimize.opt_b.b_1 : 0.000108s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000414s : 2.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000471s : 2.96% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005959s : 37.44% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.71% : 0.000024s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000005s : 4: substitution.graph_param_transform 67.34% : 0.000111s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.36% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.35% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005553 2 89.53% : 0.004971s : 1: type_inference.infer 10.47% : 0.000582s : 1: type_inference.specialize ------[replace.] 0.000088 5 87.03% : 0.000077s : 3: replace.inline 12.97% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.97% : 0.000109s : 3: match.inline 8.03% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.42% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.81% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 0.95% : 0.000001s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.51% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.96% : 0.000002s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000346 8 45.56% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.44% : 0.000188s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028351 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.49% : 0.002974s : 1: add_attr 10.46% : 0.002964s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000527s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.49% : 0.000989s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.59% : 0.002151s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.70% : 0.000481s : 1: opt_after_jit_grad 0.65% : 0.000185s : 1: opt_b 14.08% : 0.003992s : 1: optimize 0.08% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000209s : 1: renormalize.infer 0.68% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000040s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.05% : 0.005969s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.79% : 0.005610s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0377786, [24] [bootstrap]: 0.00052385 [type_inference]: 0.0115596 [event_method]: 4.641e-05 [auto_monad]: 0.00012331 [graph_reusing]: 8.20999e-06 [inline]: 2.03002e-06 [add_attr]: 0.00309534, [1] [add_attr_with_inline]: 0.00308676, [1] [Cycle 1]: 6.922e-05, [2] [tag_attr]: 3.53e-05 [meta_addattr_fg_expand]: 9.35001e-06 [parallel-infer-symbol]: 2.90002e-06 [pre_auto_parallel]: 4.966e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0133045, [53] [py_interpret_to_execute]: 4.012e-05 [rewriter_before_opt_a]: 0.00014637 [opt_a]: 0.0110169, [3] [Cycle 1]: 0.00706976, [45] [expand_dump_flag]: 4.38001e-06 [switch_simplify]: 7.251e-05 [loop_unroll]: 6.148e-05 [a_1]: 0.00144141 [with_stream_mark]: 2.375e-05 [recompute_prepare]: 2.087e-05 [updatestate_depend_eliminate]: 9.23002e-06 [updatestate_assign_eliminate]: 8.07e-06 [updatestate_loads_eliminate]: 7.18998e-06 [parameter_eliminate]: 2.44999e-06 [a_2]: 0.00024333 [accelerated_algorithm]: 3.103e-05 [shard]: 1.77001e-06 [meta_shard_fg_expand]: 3.14999e-06 [shard_inline]: 1.597e-05 [merge_send_recv]: 1.508e-05 [auto_parallel]: 1.13e-05 [parallel]: 2.141e-05 [flash_sp]: 1.187e-05 [merge_comm]: 9.78002e-06 [allreduce_fusion]: 8.98002e-06 [matmul_add_comm_reduction]: 2.592e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.772e-05 [virtual_dataset]: 1.584e-05 [get_grad_eliminate_]: 1.55e-05 [virtual_output]: 1.498e-05 [merge_forward]: 9.49999e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 1.763e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.912e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 2.788e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00146192 [flash_sp_send_recv_attached]: 3.5e-06 [receive_attached]: 2.27001e-06 
[after_resolve]: 5.946e-05 [a_after_grad]: 8.15e-05 [renormalize]: 0.00240953 [add_forward_monad_depend]: 8.91002e-06 [auto_monad_grad]: 5.39998e-06 [auto_monad_eliminator]: 5.554e-05 [cse]: 0.00015931 [a_3]: 0.00033512 [Cycle 2]: 0.00302876, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.652e-05 [loop_unroll]: 4.434e-05 [a_1]: 0.00157014 [with_stream_mark]: 1.229e-05 [recompute_prepare]: 1.079e-05 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.64002e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 0.00012621 [accelerated_algorithm]: 1.186e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 9.35001e-06 [merge_send_recv]: 6.75002e-06 [auto_parallel]: 7.30998e-06 [parallel]: 5.09e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 5.88998e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.99002e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 1.015e-05 [virtual_dataset]: 8.91002e-06 [get_grad_eliminate_]: 8.63001e-06 [virtual_output]: 8.48999e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 8.69998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.675e-05 [merge_recompute_call_nodes]: 8.29983e-07 [before_grad]: 1.428e-05 [set_forward_comm_id_for_comm_node_pass]: 5.67001e-06 [meta_fg_expand]: 7.233e-05 [flash_sp_send_recv_attached]: 9.99979e-07 [receive_attached]: 1.19998e-06 [after_resolve]: 1.638e-05 [a_after_grad]: 1.468e-05 [renormalize]: 0.0005753 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.461e-05 [cse]: 4.585e-05 [a_3]: 6.436e-05 [Cycle 3]: 0.0009044, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 1.048e-05 [loop_unroll]: 8.88002e-06 [a_1]: 0.00024969 [with_stream_mark]: 9.87001e-06 [recompute_prepare]: 9.35001e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 3.92002e-06 [updatestate_loads_eliminate]: 
3.81001e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 0.00012297 [accelerated_algorithm]: 1.181e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.66002e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.87e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 4.79998e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.75999e-06 [get_grad_eliminate_]: 8.49998e-06 [virtual_output]: 8.20999e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.72e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.455e-05 [set_forward_comm_id_for_comm_node_pass]: 5.56e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 1.385e-05 [a_after_grad]: 1.397e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 1.06e-05 [cse]: 2.545e-05 [a_3]: 5.949e-05 [py_interpret_to_execute_after_opt_a]: 9.81998e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 5.051e-05 [convert_after_rewriter]: 9.22999e-06 [order_py_execute_after_rewriter]: 6.53e-06 [mutable_eliminate]: 0.00048585 [opt_b]: 0.0002893, [1] [Cycle 1]: 0.00028331, [7] [b_1]: 0.00018963 [b_2]: 1.125e-05 [updatestate_depend_eliminate]: 7.3e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 4.07998e-06 [renormalize]: 2.69996e-07 [cse]: 3.158e-05 [optimize_parallel_all_gather_comm]: 2.015e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 1.986e-05 [loop_unroll]: 0.0004269 [opt_after_cconv]: 0.00013511, [1] [Cycle 1]: 0.0001292, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.28999e-06 
[updatestate_loads_eliminate]: 3.94002e-06 [cse]: 2.929e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.77e-05 [tuple_transform]: 0.00010043, [1] [Cycle 1]: 9.578e-05, [4] [d_1]: 6.639e-05 [none_parameter_eliminate]: 1.41998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.61e-06 [partial_unused_args_eliminate]: 1.54e-06 [add_recomputation]: 5.566e-05 [cse_after_recomputation]: 3.132e-05, [1] [Cycle 1]: 2.678e-05, [1] [cse]: 2.134e-05 [environ_conv]: 8.18001e-06 [swap_dp_allreduce_reducescatter]: 7.87998e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 7.39994e-07 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.644e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 5.07999e-06 [overlap_recompute_and_grad_model_parallel]: 5.92001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.51998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4.98001e-06 [overlap_grad_flash_sp]: 2.449e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.565e-05, [1] [Cycle 1]: 9.14e-05, [6] [build]: 9.35001e-06 [elim_shapecalc]: 1.293e-05 [elim_not_effective]: 1.809e-05 [opt_reshape]: 9.32001e-06 [fold_const_symbol]: 1.438e-05 
[renormalize]: 2.29978e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 2.527e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00046905 [validate]: 4.588e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00829338 [execute]: 6.93e-06 Sums bootstrap : 0.000524s : 1.57% type_inference : 0.011560s : 34.57% event_method : 0.000046s : 0.14% auto_monad : 0.000123s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000040s : 0.12% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.34% optimize.opt_a.a_1 : 0.003261s : 9.75% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.47% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000021s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001537s : 4.60% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.002985s : 8.93% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000231s : 0.69% optimize.opt_a.a_3 : 0.000459s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000486s : 1.45% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000427s : 1.28% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000469s : 1.40% validate : 0.000046s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008293s : 24.80% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000806 222 5.56% : 0.000045s : 12: substitution.arithmetic_simplify 1.69% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.49% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 58.20% : 0.000469s : 17: substitution.inline 1.94% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000015s : 3: substitution.less_batch_normalization 1.59% : 0.000013s : 11: substitution.minmaximum_grad 0.63% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.01% : 0.000024s : 10: substitution.replace_applicator 1.31% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.41% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.66% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.16% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.29% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.21% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011486 2 86.66% : 0.009953s : 1: type_inference.infer 13.34% : 0.001532s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.77% : 0.000127s : 17: replace.inline 42.23% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000495 33 92.96% : 0.000460s : 17: match.inline 7.04% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.06% : 0.000008s : 68: predicate.accumulaten_eliminater 0.25% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000042s : 249: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.68% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.04% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.95% : 0.000015s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.23% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 3.00% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.98% : 
0.000037s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001559 34 56.92% : 0.000887s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.08% : 0.000672s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062412 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.97% : 0.003100s : 1: add_attr 4.95% : 0.003091s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000131s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.89% : 0.000557s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000495s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.89% : 0.004924s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.66% : 0.011020s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.77% : 0.000478s : 1: opt_after_jit_grad 0.47% : 0.000293s : 1: opt_b 21.32% : 0.013308s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.57% : 0.001606s : 2: renormalize.infer 2.19% : 0.001365s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000055s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000098s : 1: symbol_engine_optimizer 13.30% : 0.008304s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 18.54% : 0.011574s : 1: type_inference 0.13% : 0.000079s : 1: validate TotalTime = 0.0185817, [24] [bootstrap]: 0.00046119 [type_inference]: 0.00434624 [event_method]: 1.057e-05 [auto_monad]: 5.111e-05 [graph_reusing]: 5.24e-06 [inline]: 1.87001e-06 [add_attr]: 0.00301256, [1] [add_attr_with_inline]: 0.00300444, [1] [Cycle 1]: 4.448e-05, [2] [tag_attr]: 1.162e-05 [meta_addattr_fg_expand]: 3.01001e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.118e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.90001e-06 [optimize]: 0.00365851, [53] [py_interpret_to_execute]: 1.561e-05 [rewriter_before_opt_a]: 3.935e-05 [opt_a]: 0.00185986, [2] [Cycle 1]: 0.001268, [45] [expand_dump_flag]: 2.47001e-06 [switch_simplify]: 2.44e-05 [loop_unroll]: 1.395e-05 [a_1]: 0.00028986 [with_stream_mark]: 1.341e-05 [recompute_prepare]: 7.03998e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 3.48e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.684e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 6.00002e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 5.94e-06 [parallel]: 1.786e-05 [flash_sp]: 7.19001e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 9.30001e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 3.5e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.51e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.171e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 8.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 1.99999e-06 
[after_resolve]: 1.031e-05 [a_after_grad]: 8.62998e-06 [renormalize]: 0.00035928 [add_forward_monad_depend]: 4.41002e-06 [auto_monad_grad]: 1.88002e-06 [auto_monad_eliminator]: 1.359e-05 [cse]: 2.726e-05 [a_3]: 3.966e-05 [Cycle 2]: 0.00058255, [45] [expand_dump_flag]: 7.79983e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00012373 [with_stream_mark]: 9.34e-06 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.28002e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.737e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.15e-06 [auto_parallel]: 5.04003e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.2e-06 [merge_comm]: 2.93998e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.99974e-07 [virtual_shard_identity]: 6.05002e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.83998e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.97e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.07999e-06 [a_after_grad]: 7.82998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 5.99e-06 [cse]: 1.245e-05 [a_3]: 3.13e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 3.232e-05 [convert_after_rewriter]: 7.05e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00044827 [opt_b]: 0.00017905, [1] [Cycle 1]: 0.00017303, [7] [b_1]: 
0.00010649 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.39991e-07 [cse]: 1.605e-05 [optimize_parallel_all_gather_comm]: 1.525e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.154e-05 [loop_unroll]: 0.00041034 [opt_after_cconv]: 9.407e-05, [1] [Cycle 1]: 8.832e-05, [7] [c_1]: 2.739e-05 [parameter_eliminate]: 2.41998e-06 [updatestate_depend_eliminate]: 5.04003e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.615e-05 [renormalize]: 2.59985e-07 [remove_dup_value]: 1.231e-05 [tuple_transform]: 6.893e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.892e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.436e-05 [cse_after_recomputation]: 2.026e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.078e-05 [environ_conv]: 4.53001e-06 [swap_dp_allreduce_reducescatter]: 5.71e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 3.65003e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.35001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79998e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.71999e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.736e-05, [1] [Cycle 1]: 6.334e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.17998e-06 [elim_not_effective]: 1.122e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.60001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.587e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00044253 [validate]: 3.075e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00630755 [execute]: 6.52001e-06 Sums bootstrap : 0.000461s : 3.15% type_inference : 0.004346s : 29.73% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000414s : 2.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000359s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.07% optimize.opt_b.b_1 : 0.000106s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000410s : 2.81% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000443s : 3.03% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006308s : 43.14% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000119 26 18.50% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.42% : 0.000005s : 4: substitution.graph_param_transform 65.05% : 0.000078s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 4.05% : 0.000005s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004306 2 92.06% : 0.003964s : 1: type_inference.infer 7.94% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000134 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.49% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.86% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 2.12% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 11: predicate.float_depend_g_call 0.77% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.51% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.25% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 1.01% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.01% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.81% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.19% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026534 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.37% : 0.003017s : 1: add_attr 11.34% : 0.003008s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000496s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000006s : 1: label_micro_interleaved_index 1.58% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.88% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.02% : 0.001863s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.70% : 0.000452s : 1: opt_after_jit_grad 0.69% : 0.000183s : 1: opt_b 13.80% : 0.003662s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.63% : 0.000167s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000037s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.81% : 0.006318s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.43% : 0.004360s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0359357, [24] [bootstrap]: 0.0005502 [type_inference]: 0.0102223 [event_method]: 4.103e-05 [auto_monad]: 0.00011399 [graph_reusing]: 7.75e-06 [inline]: 1.75001e-06 [add_attr]: 0.00299532, [1] [add_attr_with_inline]: 0.00298598, [1] [Cycle 1]: 6.732e-05, [2] [tag_attr]: 3.243e-05 [meta_addattr_fg_expand]: 8.30999e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 4.47e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0129929, [53] [py_interpret_to_execute]: 3.672e-05 [rewriter_before_opt_a]: 0.00012741 [opt_a]: 0.0107582, [3] [Cycle 1]: 0.006889, [45] [expand_dump_flag]: 3.12002e-06 [switch_simplify]: 6.629e-05 [loop_unroll]: 5.469e-05 [a_1]: 0.00133788 [with_stream_mark]: 2.25e-05 [recompute_prepare]: 2.164e-05 [updatestate_depend_eliminate]: 8.86002e-06 [updatestate_assign_eliminate]: 7.93001e-06 [updatestate_loads_eliminate]: 7.51001e-06 [parameter_eliminate]: 2.61999e-06 [a_2]: 0.00024543 [accelerated_algorithm]: 3.042e-05 [shard]: 1.91003e-06 [meta_shard_fg_expand]: 3.38e-06 [shard_inline]: 1.605e-05 [merge_send_recv]: 1.64e-05 [auto_parallel]: 1.102e-05 [parallel]: 1.96e-05 [flash_sp]: 1.13e-05 [merge_comm]: 9.79e-06 [allreduce_fusion]: 9.07999e-06 [matmul_add_comm_reduction]: 2.741e-05 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 1.875e-05 [virtual_dataset]: 1.576e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.525e-05 [merge_forward]: 9.46003e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.769e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.909e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.761e-05 [set_forward_comm_id_for_comm_node_pass]: 9.59999e-06 [meta_fg_expand]: 0.00136681 [flash_sp_send_recv_attached]: 3.5e-06 [receive_attached]: 2.56e-06 
[after_resolve]: 5.932e-05 [a_after_grad]: 8.001e-05 [renormalize]: 0.00239504 [add_forward_monad_depend]: 9.10999e-06 [auto_monad_grad]: 5.51e-06 [auto_monad_eliminator]: 5.479e-05 [cse]: 0.00016078 [a_3]: 0.00033635 [Cycle 2]: 0.00292624, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.688e-05 [loop_unroll]: 4.382e-05 [a_1]: 0.00152686 [with_stream_mark]: 1.188e-05 [recompute_prepare]: 1.076e-05 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 1.02998e-06 [a_2]: 0.00012674 [accelerated_algorithm]: 1.206e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 6.61999e-06 [auto_parallel]: 7.85e-06 [parallel]: 4.94e-06 [flash_sp]: 3.08e-06 [merge_comm]: 5.19e-06 [allreduce_fusion]: 4.59002e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 9.88998e-06 [virtual_dataset]: 8.62e-06 [get_grad_eliminate_]: 9.42999e-06 [virtual_output]: 8.61002e-06 [merge_forward]: 4.52998e-06 [cell_reuse_recompute_pass]: 8.00006e-07 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.658e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.419e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 3.337e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.495e-05 [a_after_grad]: 1.412e-05 [renormalize]: 0.00056493 [add_forward_monad_depend]: 4e-06 [auto_monad_grad]: 1.21002e-06 [auto_monad_eliminator]: 1.433e-05 [cse]: 4.465e-05 [a_3]: 6.449e-05 [Cycle 3]: 0.00092856, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.045e-05 [loop_unroll]: 8.97e-06 [a_1]: 0.0002845 [with_stream_mark]: 1.024e-05 [recompute_prepare]: 9.47999e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 4e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 0.000123 [accelerated_algorithm]: 1.184e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 9.09003e-06 [merge_send_recv]: 7.09001e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.84e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.16002e-06 [allreduce_fusion]: 4.97999e-06 [matmul_add_comm_reduction]: 7.43e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.89e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.37998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.592e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.407e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29998e-06 [meta_fg_expand]: 2.96001e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.319e-05 [a_after_grad]: 1.411e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 1.072e-05 [cse]: 2.489e-05 [a_3]: 5.669e-05 [py_interpret_to_execute_after_opt_a]: 1.018e-05 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 4.604e-05 [convert_after_rewriter]: 9.60001e-06 [order_py_execute_after_rewriter]: 7.33e-06 [mutable_eliminate]: 0.00045529 [opt_b]: 0.00028655, [1] [Cycle 1]: 0.00028025, [7] [b_1]: 0.00018862 [b_2]: 1.08e-05 [updatestate_depend_eliminate]: 7.19001e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 3.93001e-06 [renormalize]: 5.50004e-07 [cse]: 3.024e-05 [optimize_parallel_all_gather_comm]: 2.072e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.968e-05 [loop_unroll]: 0.00042055 [opt_after_cconv]: 0.0001338, [1] [Cycle 1]: 0.00012803, [7] [c_1]: 4.793e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 
3.8e-06 [cse]: 2.898e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.805e-05 [tuple_transform]: 0.00010183, [1] [Cycle 1]: 9.726e-05, [4] [d_1]: 6.677e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 9.86e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 6.253e-05 [cse_after_recomputation]: 3.273e-05, [1] [Cycle 1]: 2.767e-05, [1] [cse]: 2.168e-05 [environ_conv]: 8.18001e-06 [swap_dp_allreduce_reducescatter]: 7.68999e-06 [bias_add_comm_swap]: 2.60002e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.14998e-06 [ForceFp32Comm]: 9.10019e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.70002e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.27999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.702e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.81e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.324e-05 [begin_end_overlap_inline]: 6.39993e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.824e-05, [1] [Cycle 1]: 9.41e-05, [6] [build]: 9.42001e-06 [elim_shapecalc]: 1.349e-05 [elim_not_effective]: 1.786e-05 [opt_reshape]: 1.011e-05 [fold_const_symbol]: 1.498e-05 [renormalize]: 2.50002e-07 [detach_backward]: 1.74e-06 
[pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.437e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.61001e-06 [opt_after_jit_grad]: 0.00046274 [validate]: 4.434e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00820569 [execute]: 6.62002e-06 Sums bootstrap : 0.000550s : 1.74% type_inference : 0.010222s : 32.29% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003149s : 9.95% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.10% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.11% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001403s : 4.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000108s : 0.34% optimize.opt_a.renormalize : 0.002960s : 9.35% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000230s : 0.73% optimize.opt_a.a_3 : 0.000458s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000455s : 1.44% optimize.opt_b.b_1 : 0.000189s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000421s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.20% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000463s : 1.46% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008206s : 25.92% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000727 218 5.90% : 0.000043s : 11: substitution.arithmetic_simplify 1.97% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.64% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 54.73% : 0.000398s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.44% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000015s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.29% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.80% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.41% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.52% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010153 2 87.49% : 0.008883s : 1: type_inference.infer 12.51% : 0.001270s : 1: type_inference.specialize ------[replace.] 0.000198 30 59.05% : 0.000117s : 16: replace.inline 40.95% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000420 30 92.86% : 0.000390s : 16: match.inline 7.14% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000733 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000016s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.69% : 0.000042s : 244: predicate.inline 1.29% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 164: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.74% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.68% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.93% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.42% : 0.000010s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.28% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001474 32 57.81% : 0.000852s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.19% : 0.000622s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060007 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.00% : 0.003000s : 1: add_attr 4.98% : 0.002990s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000067s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.97% : 0.000583s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.99% : 0.004797s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.93% : 0.010761s : 1: opt_a 0.23% : 0.000137s : 1: opt_after_cconv 0.79% : 0.000472s : 1: opt_after_jit_grad 0.48% : 0.000290s : 1: opt_b 21.66% : 0.012997s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000049s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.59% : 0.001555s : 2: renormalize.infer 2.32% : 0.001392s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.22% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.69% : 0.008216s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 17.06% : 0.010237s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x2-kbk],max_mem:38.0M TotalTime = 0.128339, [24] [bootstrap]: 0.00050453 [type_inference]: 0.00612629 [event_method]: 1.392e-05 [auto_monad]: 5.722e-05 [graph_reusing]: 5.53002e-06 [inline]: 1.87999e-06 [add_attr]: 0.00374046, [1] [add_attr_with_inline]: 0.00372831, [1] [Cycle 1]: 5.177e-05, [2] [tag_attr]: 1.915e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 3.053e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0043377, [53] [py_interpret_to_execute]: 2.109e-05 [rewriter_before_opt_a]: 6.491e-05 [opt_a]: 0.00232305, [2] [Cycle 1]: 0.00172192, [45] [expand_dump_flag]: 2.66999e-06 [switch_simplify]: 3.179e-05 [loop_unroll]: 2.056e-05 [a_1]: 0.00050435 [with_stream_mark]: 1.423e-05 [recompute_prepare]: 7.84002e-06 [updatestate_depend_eliminate]: 4.38001e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.639e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.6e-06 [auto_parallel]: 5.97001e-06 [parallel]: 2.435e-05 [flash_sp]: 7.71999e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.1e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 6.14999e-06 [get_grad_eliminate_]: 5.76998e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.16e-05 
[merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 9.32999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.36998e-06 [after_resolve]: 1.044e-05 [a_after_grad]: 9.60001e-06 [renormalize]: 0.00056208 [add_forward_monad_depend]: 4.92e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.367e-05 [cse]: 2.748e-05 [a_3]: 4.192e-05 [Cycle 2]: 0.00059186, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.55001e-06 [a_1]: 0.00012684 [with_stream_mark]: 1.06e-05 [recompute_prepare]: 5.75001e-06 [updatestate_depend_eliminate]: 3.35e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.823e-05 [accelerated_algorithm]: 5.68002e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.56002e-06 [auto_parallel]: 5.14998e-06 [parallel]: 4.44002e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 5.38002e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.24999e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 5.72001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82999e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.62998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.19e-06 [a_after_grad]: 7.96001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 7.29982e-07 [auto_monad_eliminator]: 6.26998e-06 [cse]: 1.294e-05 [a_3]: 3.227e-05 [py_interpret_to_execute_after_opt_a]: 8.38999e-06 
[slice_cell_reuse_recomputed_activation]: 1.79998e-06 [rewriter_after_opt_a]: 3.126e-05 [convert_after_rewriter]: 7.06999e-06 [order_py_execute_after_rewriter]: 4.85999e-06 [mutable_eliminate]: 0.0005086 [opt_b]: 0.00018792, [1] [Cycle 1]: 0.00018149, [7] [b_1]: 0.00011166 [b_2]: 7.56999e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 4.30009e-07 [cse]: 1.751e-05 [optimize_parallel_all_gather_comm]: 1.589e-05 [overlap_param_gather]: 2.11e-06 [cconv]: 2.254e-05 [loop_unroll]: 0.00043808 [opt_after_cconv]: 9.764e-05, [1] [Cycle 1]: 9.165e-05, [7] [c_1]: 2.827e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.55001e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.647e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.192e-05 [tuple_transform]: 7.05e-05, [1] [Cycle 1]: 6.568e-05, [4] [d_1]: 4.012e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 2.59985e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 5.091e-05 [cse_after_recomputation]: 2.078e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.088e-05 [environ_conv]: 4.45e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 3.01001e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.73998e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.60999e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.82998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.672e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.01998e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 0.00013619, [1] [Cycle 1]: 0.0001319, [6] [build]: 2.68e-06 [elim_shapecalc]: 8.80001e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 7.064e-05 [fold_const_symbol]: 9.75002e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00047065 [validate]: 3.183e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.112752 [execute]: 8.85001e-06 Sums bootstrap : 0.000505s : 0.41% type_inference : 0.006126s : 4.96% event_method : 0.000014s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000031s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000065s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000631s : 0.51% 
optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000562s : 0.45% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000074s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000509s : 0.41% optimize.opt_b.b_1 : 0.000112s : 0.09% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000438s : 0.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000071s : 0.06% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000471s : 0.38% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.112752s : 91.22% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000216 30 11.45% : 0.000025s : 5: substitution.arithmetic_simplify 0.87% : 0.000002s : 2: substitution.elim_not_effective 0.60% : 0.000001s : 2: substitution.fold_const_symbol 2.64% : 0.000006s : 4: substitution.graph_param_transform 73.75% : 0.000159s : 3: substitution.inline 1.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.31% : 0.000005s : 4: substitution.remove_not_recompute_node 1.98% : 0.000004s : 4: substitution.replace_old_param 5.15% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006081 2 90.57% : 0.005508s : 1: type_inference.infer 9.43% : 0.000574s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.68% : 0.000028s : 3: replace.inline 30.32% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000167 5 93.95% : 0.000157s : 3: match.inline 6.05% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.72% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.81% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.47% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.73% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.50% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.24% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000358 8 44.27% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.73% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.138191 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.71% : 0.003746s : 1: add_attr 2.70% : 0.003732s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000062s : 1: auto_monad 0.01% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000541s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000447s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000518s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.72% : 0.000999s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000091s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000097s : 4: opt.transform.symbol_engine_opt 1.68% : 0.002326s : 1: opt_a 0.07% : 0.000101s : 1: opt_after_cconv 0.35% : 0.000481s : 1: opt_after_jit_grad 0.14% : 0.000191s : 1: opt_b 3.14% : 0.004341s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000035s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.22% : 0.000302s : 1: renormalize.infer 0.18% : 0.000253s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000069s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000139s : 1: symbol_engine_optimizer 81.61% : 0.112774s : 1: task_emit 0.05% : 0.000073s : 1: 
tuple_transform 4.44% : 0.006141s : 1: type_inference 0.04% : 0.000060s : 1: validate TotalTime = 0.109178, [24] [bootstrap]: 0.00046123 [type_inference]: 0.00444405 [event_method]: 1.161e-05 [auto_monad]: 4.953e-05 [graph_reusing]: 4.87e-06 [inline]: 2.14999e-06 [add_attr]: 0.00299192, [1] [add_attr_with_inline]: 0.00298444, [1] [Cycle 1]: 4.541e-05, [2] [tag_attr]: 1.194e-05 [meta_addattr_fg_expand]: 3.14999e-06 [parallel-infer-symbol]: 2.89001e-06 [pre_auto_parallel]: 2.203e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00373774, [53] [py_interpret_to_execute]: 1.526e-05 [rewriter_before_opt_a]: 3.845e-05 [opt_a]: 0.00191124, [2] [Cycle 1]: 0.00130606, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 2.508e-05 [loop_unroll]: 1.417e-05 [a_1]: 0.00029208 [with_stream_mark]: 1.39e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.82999e-06 [a_2]: 7.645e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.23001e-06 [auto_parallel]: 5.65001e-06 [parallel]: 1.747e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.99002e-06 [allreduce_fusion]: 3.86001e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.95998e-06 [virtual_dataset]: 6.49001e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.60998e-06 [cell_reuse_recompute_pass]: 1.21997e-06 [offload_activation]: 9.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.124e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.69999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 
2.29001e-06 [after_resolve]: 1.116e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00035388 [add_forward_monad_depend]: 4.67998e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.237e-05 [cse]: 2.665e-05 [a_3]: 7.274e-05 [Cycle 2]: 0.00059603, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.94001e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.00012338 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.43002e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.721e-05 [accelerated_algorithm]: 5.79999e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.32e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.34001e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.60001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.76e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.51002e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.026e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 7.71001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1e-05 [a_after_grad]: 8.16002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.61999e-06 [cse]: 1.297e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 7.98999e-06 [slice_cell_reuse_recomputed_activation]: 1.93997e-06 [rewriter_after_opt_a]: 3.419e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00046412 [opt_b]: 0.00018647, [1] [Cycle 1]: 0.00018046, [7] [b_1]: 
0.00011279 [b_2]: 7.40998e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 5.19998e-07 [cse]: 1.572e-05 [optimize_parallel_all_gather_comm]: 1.509e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00041703 [opt_after_cconv]: 9.337e-05, [1] [Cycle 1]: 8.768e-05, [7] [c_1]: 2.787e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.06e-06 [cse]: 1.542e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.216e-05 [tuple_transform]: 6.86e-05, [1] [Cycle 1]: 6.435e-05, [4] [d_1]: 3.889e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 4.213e-05 [cse_after_recomputation]: 1.92e-05, [1] [Cycle 1]: 1.486e-05, [1] [cse]: 9.99999e-06 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.99999e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.136e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.47002e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.66e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.681e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.719e-05, [1] [Cycle 1]: 6.318e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.1e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.452e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.21999e-06 [opt_after_jit_grad]: 0.0004437 [validate]: 3.098e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.0967318 [execute]: 9.18002e-06 Sums bootstrap : 0.000461s : 0.44% type_inference : 0.004444s : 4.22% event_method : 0.000012s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000038s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000415s : 0.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000354s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000105s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000464s : 0.44% optimize.opt_b.b_1 : 0.000113s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000417s : 0.40% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.42% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096732s : 91.94% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000119 26 19.73% : 0.000023s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.32% : 0.000005s : 4: substitution.graph_param_transform 63.74% : 0.000076s : 2: substitution.inline 2.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000004s : 4: substitution.remove_not_recompute_node 3.52% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004403 2 91.81% : 0.004042s : 1: type_inference.infer 8.19% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000074 2 100.00% : 0.000074s : 2: match.inline ------[predicate.] 
0.000139 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.49% : 0.000003s : 17: predicate.arithmetic_simplify 0.72% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.57% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.07% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.78% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.08% : 0.000001s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.90% : 0.000001s : 9: predicate.print_const_string_wrapper 0.90% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 1.18% : 0.000002s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.11% : 0.000002s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.74% : 0.000007s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000262 6 44.13% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.87% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.117220 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.56% : 0.002996s : 1: add_attr 2.55% : 0.002988s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000054s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000496s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000473s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000804s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000093s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.63% : 0.001914s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.39% : 0.000453s : 1: opt_after_jit_grad 0.16% : 0.000190s : 1: opt_b 3.19% : 0.003741s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.16% : 0.000191s : 1: renormalize.infer 0.13% : 0.000156s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.04% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.54% : 0.096754s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.80% : 0.004458s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.111093, [24] [bootstrap]: 0.00045887 [type_inference]: 0.00550146 [event_method]: 1.434e-05 [auto_monad]: 5.475e-05 [graph_reusing]: 5.99e-06 [inline]: 1.87001e-06 [add_attr]: 0.00299869, [1] [add_attr_with_inline]: 0.00299107, [1] [Cycle 1]: 4.484e-05, [2] [tag_attr]: 1.539e-05 [meta_addattr_fg_expand]: 4.04997e-06 [parallel-infer-symbol]: 2.48e-06 [pre_auto_parallel]: 2.487e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00393863, [53] [py_interpret_to_execute]: 2.162e-05 [rewriter_before_opt_a]: 5.737e-05 [opt_a]: 0.00211418, [2] [Cycle 1]: 0.00151375, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 3.148e-05 [loop_unroll]: 2.052e-05 [a_1]: 0.00044328 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.55998e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 3.33998e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.554e-05 [accelerated_algorithm]: 6.52001e-06 [shard]: 1.86998e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 7.48e-06 [auto_parallel]: 6.02999e-06 [parallel]: 1.837e-05 [flash_sp]: 8e-06 [merge_comm]: 3.50998e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 8.06001e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.21001e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.24998e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 4.08999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.09998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 9.52999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.99001e-06 [receive_attached]: 
2.10002e-06 [after_resolve]: 9.96e-06 [a_after_grad]: 8.70999e-06 [renormalize]: 0.0004314 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.448e-05 [cse]: 2.846e-05 [a_3]: 4.067e-05 [Cycle 2]: 0.00059102, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 6.85998e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012504 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.41998e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.744e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.05999e-06 [shard_inline]: 5.24e-06 [merge_send_recv]: 4.39998e-06 [auto_parallel]: 4.94e-06 [parallel]: 3.74002e-06 [flash_sp]: 3.61001e-06 [merge_comm]: 2.97002e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.87001e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 4.85999e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 5.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.77001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.89002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00002e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.77999e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.21e-06 [cse]: 1.317e-05 [a_3]: 3.194e-05 [py_interpret_to_execute_after_opt_a]: 7.75998e-06 [slice_cell_reuse_recomputed_activation]: 2.21998e-06 [rewriter_after_opt_a]: 2.978e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.64e-06 [mutable_eliminate]: 0.00044872 [opt_b]: 0.00017885, [1] 
[Cycle 1]: 0.00017288, [7] [b_1]: 0.00010707 [b_2]: 6.75998e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.00003e-07 [cse]: 1.638e-05 [optimize_parallel_all_gather_comm]: 1.597e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.319e-05 [loop_unroll]: 0.00041415 [opt_after_cconv]: 9.391e-05, [1] [Cycle 1]: 8.824e-05, [7] [c_1]: 2.75e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 4.89998e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.13998e-06 [cse]: 1.609e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.185e-05 [tuple_transform]: 6.919e-05, [1] [Cycle 1]: 6.486e-05, [4] [d_1]: 3.921e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 4.318e-05 [cse_after_recomputation]: 2.025e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.52999e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 3.92002e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86003e-06 [control_data_broadcast_order]: 1.125e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.67002e-06 [overlap_grad_flash_sp]: 1.756e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.14e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 6.753e-05, [1] [Cycle 1]: 6.314e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 7.87003e-06 [elim_not_effective]: 1.099e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.73002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.496e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00044951 [validate]: 3.04e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0973638 [execute]: 9.25999e-06 Sums bootstrap : 0.000459s : 0.43% type_inference : 0.005501s : 5.14% event_method : 0.000014s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000002s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000057s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000568s : 0.53% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000431s : 0.40% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.42% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000414s : 0.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.42% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.097364s : 90.89% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 15.07% : 0.000025s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 3.49% : 0.000006s : 4: substitution.graph_param_transform 66.14% : 0.000108s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.71% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005462 2 89.92% : 0.004911s : 1: type_inference.infer 10.08% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.27% : 0.000028s : 3: replace.inline 28.73% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.48% : 0.000106s : 3: match.inline 8.52% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 1.08% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 1.00% : 0.000002s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.25% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 47.01% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.99% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.119543 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.51% : 0.003003s : 1: add_attr 2.51% : 0.002995s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000495s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.78% : 0.000930s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.77% : 0.002117s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000459s : 1: opt_after_jit_grad 0.15% : 0.000182s : 1: opt_b 3.30% : 0.003942s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.18% : 0.000209s : 1: renormalize.infer 0.18% : 0.000215s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000061s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.47% : 0.097386s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.61% : 0.005515s : 1: type_inference 0.04% : 0.000051s : 1: validate TotalTime = 0.146961, [24] [bootstrap]: 0.00050239 [type_inference]: 0.0113918 [event_method]: 4.875e-05 [auto_monad]: 0.00011878 [graph_reusing]: 8.11002e-06 [inline]: 1.89e-06 [add_attr]: 0.00306772, [1] [add_attr_with_inline]: 0.00305917, [1] [Cycle 1]: 7.025e-05, [2] [tag_attr]: 3.456e-05 [meta_addattr_fg_expand]: 9.46e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 4.922e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 2.24999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0133785, [53] [py_interpret_to_execute]: 3.841e-05 [rewriter_before_opt_a]: 0.00014351 [opt_a]: 0.0110785, [3] [Cycle 1]: 0.00712789, [45] [expand_dump_flag]: 4.07e-06 [switch_simplify]: 7.304e-05 [loop_unroll]: 6.176e-05 [a_1]: 0.00148928 [with_stream_mark]: 2.236e-05 [recompute_prepare]: 2.23e-05 [updatestate_depend_eliminate]: 9.19e-06 [updatestate_assign_eliminate]: 7.40998e-06 [updatestate_loads_eliminate]: 7.31001e-06 [parameter_eliminate]: 2.82002e-06 [a_2]: 0.0002449 [accelerated_algorithm]: 3.055e-05 [shard]: 1.74e-06 [meta_shard_fg_expand]: 3.23e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.621e-05 [auto_parallel]: 1.063e-05 [parallel]: 1.715e-05 [flash_sp]: 1.16e-05 [merge_comm]: 9.71e-06 [allreduce_fusion]: 8.90999e-06 [matmul_add_comm_reduction]: 2.693e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 1.798e-05 [virtual_dataset]: 1.559e-05 [get_grad_eliminate_]: 1.53e-05 [virtual_output]: 1.517e-05 [merge_forward]: 9.61e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.714e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.915e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 2.727e-05 [set_forward_comm_id_for_comm_node_pass]: 1.018e-05 [meta_fg_expand]: 0.00142174 [flash_sp_send_recv_attached]: 3.41001e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 
5.977e-05 [a_after_grad]: 8.028e-05 [renormalize]: 0.00244984 [add_forward_monad_depend]: 9.14e-06 [auto_monad_grad]: 5.37001e-06 [auto_monad_eliminator]: 5.593e-05 [cse]: 0.00016903 [a_3]: 0.00033537 [Cycle 2]: 0.00303653, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.675e-05 [loop_unroll]: 4.414e-05 [a_1]: 0.00157408 [with_stream_mark]: 1.166e-05 [recompute_prepare]: 1.11e-05 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 4.00998e-06 [parameter_eliminate]: 1.09003e-06 [a_2]: 0.00012606 [accelerated_algorithm]: 1.183e-05 [shard]: 8.80013e-07 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 6.84999e-06 [auto_parallel]: 7.81001e-06 [parallel]: 5.09998e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 5.25999e-06 [allreduce_fusion]: 4.52e-06 [matmul_add_comm_reduction]: 7.53e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 9.77001e-06 [virtual_dataset]: 8.60001e-06 [get_grad_eliminate_]: 8.90999e-06 [virtual_output]: 8.39998e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.614e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.82999e-06 [meta_fg_expand]: 7.345e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.12e-06 [after_resolve]: 1.612e-05 [a_after_grad]: 1.422e-05 [renormalize]: 0.00058534 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.26002e-06 [auto_monad_eliminator]: 1.415e-05 [cse]: 4.559e-05 [a_3]: 6.473e-05 [Cycle 3]: 0.00090028, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 1.072e-05 [loop_unroll]: 9.14e-06 [a_1]: 0.00025124 [with_stream_mark]: 1.02e-05 [recompute_prepare]: 9.04e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.73001e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012268 [accelerated_algorithm]: 1.142e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 7.13e-06 [parallel]: 4.29002e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.90999e-06 [allreduce_fusion]: 4.67e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 9.71e-06 [virtual_dataset]: 8.42998e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.2e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.41002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.561e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05999e-06 [meta_fg_expand]: 2.94001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.474e-05 [a_after_grad]: 1.476e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.124e-05 [cse]: 2.671e-05 [a_3]: 5.9e-05 [py_interpret_to_execute_after_opt_a]: 1.043e-05 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 4.781e-05 [convert_after_rewriter]: 9.27001e-06 [order_py_execute_after_rewriter]: 6.53e-06 [mutable_eliminate]: 0.00050299 [opt_b]: 0.00028581, [1] [Cycle 1]: 0.00027979, [7] [b_1]: 0.00018776 [b_2]: 1.057e-05 [updatestate_depend_eliminate]: 7.27997e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 3.87998e-06 [renormalize]: 4.19997e-07 [cse]: 3.159e-05 [optimize_parallel_all_gather_comm]: 2.097e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 1.998e-05 [loop_unroll]: 0.0004208 [opt_after_cconv]: 0.0001361, [1] [Cycle 1]: 0.00013034, [7] [c_1]: 4.887e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 6.96999e-06 [updatestate_assign_eliminate]: 4.10998e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 3.021e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 2.946e-05 [tuple_transform]: 0.00010166, [1] [Cycle 1]: 9.697e-05, [4] [d_1]: 6.633e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 5.696e-05 [cse_after_recomputation]: 3.306e-05, [1] [Cycle 1]: 2.833e-05, [1] [cse]: 2.283e-05 [environ_conv]: 8.82e-06 [swap_dp_allreduce_reducescatter]: 8.08999e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 4.09002e-06 [label_fine_grained_interleaved_index]: 2.43002e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.59998e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.27e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.709e-05 [grouped_pairwise_exchange_alltoall]: 1.82001e-06 [offloading_packed_experts]: 5.20001e-06 [overlap_recompute_and_grad_model_parallel]: 5.86e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 1.82001e-06 [overlap_grad_ring_attention]: 4.98001e-06 [overlap_grad_flash_sp]: 2.36e-05 [begin_end_overlap_inline]: 5.99975e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.885e-05, [1] [Cycle 1]: 9.477e-05, [6] [build]: 1.042e-05 [elim_shapecalc]: 1.307e-05 [elim_not_effective]: 1.869e-05 [opt_reshape]: 1.005e-05 [fold_const_symbol]: 1.427e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.41002e-06 [auto_monad_reorder]: 2.489e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00046843 [validate]: 4.567e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.117609 [execute]: 8.54e-06 Sums bootstrap : 0.000502s : 0.35% type_inference : 0.011392s : 7.99% event_method : 0.000049s : 0.03% auto_monad : 0.000119s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000144s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.09% optimize.opt_a.loop_unroll : 0.000115s : 0.08% optimize.opt_a.a_1 : 0.003315s : 2.32% optimize.opt_a.with_stream_mark : 0.000044s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001498s : 1.05% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.08% optimize.opt_a.renormalize : 0.003035s : 2.13% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.06% optimize.opt_a.cse : 0.000241s : 0.17% optimize.opt_a.a_3 : 0.000459s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000503s : 0.35% optimize.opt_b.b_1 : 0.000188s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000421s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000468s : 0.33% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.117609s : 82.45% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000766 222 5.94% : 0.000046s : 12: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.66% : 0.000426s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000014s : 3: substitution.less_batch_normalization 1.77% : 0.000014s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.19% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011319 2 86.22% : 0.009760s : 1: type_inference.infer 13.78% : 0.001560s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.80% : 0.000127s : 17: replace.inline 42.20% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.48% : 0.000417s : 17: match.inline 7.52% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.22% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000043s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.68% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001610 34 56.39% : 0.000908s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.61% : 0.000702s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.171736 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.79% : 0.003072s : 1: add_attr 1.78% : 0.003063s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000126s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000537s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000055s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.30% : 0.000513s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.90% : 0.004974s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000173s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.45% : 0.011082s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.28% : 0.000478s : 1: opt_after_jit_grad 0.17% : 0.000289s : 1: opt_b 7.79% : 0.013382s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000054s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.94% : 0.001607s : 2: renormalize.infer 0.82% : 0.001415s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000148s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 68.50% : 0.117631s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.64% : 0.011406s : 1: type_inference 0.04% : 0.000071s : 1: validate TotalTime = 0.106212, [24] [bootstrap]: 0.00046325 [type_inference]: 0.00435725 [event_method]: 1.147e-05 [auto_monad]: 4.961e-05 [graph_reusing]: 5.34e-06 [inline]: 1.84e-06 [add_attr]: 0.00296777, [1] [add_attr_with_inline]: 0.00296016, [1] [Cycle 1]: 4.613e-05, [2] [tag_attr]: 1.185e-05 [meta_addattr_fg_expand]: 3.28e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 2.252e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 1.01002e-06 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00368654, [53] [py_interpret_to_execute]: 1.58e-05 [rewriter_before_opt_a]: 3.87e-05 [opt_a]: 0.00188729, [2] [Cycle 1]: 0.00128737, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.464e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00028784 [with_stream_mark]: 1.337e-05 [recompute_prepare]: 8.18001e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.54998e-06 [a_2]: 7.734e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 7.85e-06 [auto_parallel]: 5.64e-06 [parallel]: 1.88e-05 [flash_sp]: 7.26001e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 8.35999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 5.61998e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 4.073e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.11e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.75002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.68e-06 
[after_resolve]: 1.024e-05 [a_after_grad]: 9.17001e-06 [renormalize]: 0.0003427 [add_forward_monad_depend]: 4.75001e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.714e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00059057, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.54998e-06 [a_1]: 0.00012558 [with_stream_mark]: 9.57001e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.733e-05 [accelerated_algorithm]: 5.49998e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.59998e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.32e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.25002e-06 [virtual_dataset]: 4.99e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 5.01997e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.82001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.30001e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 2.97002e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.95001e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.276e-05 [a_3]: 3.136e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.073e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.0004504 [opt_b]: 0.00017861, [1] [Cycle 1]: 0.00017259, [7] [b_1]: 0.00010583 
[b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.59985e-07 [cse]: 1.609e-05 [optimize_parallel_all_gather_comm]: 1.517e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.136e-05 [loop_unroll]: 0.00041445 [opt_after_cconv]: 9.449e-05, [1] [Cycle 1]: 8.886e-05, [7] [c_1]: 2.733e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.632e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.799e-05, [1] [Cycle 1]: 6.36e-05, [4] [d_1]: 3.82e-05 [none_parameter_eliminate]: 1.63002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.39999e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.37e-05 [cse_after_recomputation]: 2.015e-05, [1] [Cycle 1]: 1.556e-05, [1] [cse]: 1.053e-05 [environ_conv]: 4.57e-06 [swap_dp_allreduce_reducescatter]: 5.51e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.109e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.62998e-06 [overlap_recompute_and_grad_model_parallel]: 4.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.46998e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.32e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.749e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.96e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 6.788e-05, [1] [Cycle 1]: 6.365e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 7.92e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 9.04998e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.75001e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.484e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.00045037 [validate]: 3.25e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0939153 [execute]: 9.71e-06 Sums bootstrap : 0.000463s : 0.45% type_inference : 0.004357s : 4.26% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000413s : 0.40% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% 
optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000047s : 0.05% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000343s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000450s : 0.44% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000414s : 0.41% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.44% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.093915s : 91.83% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000115 26 18.25% : 0.000021s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.36% : 0.000005s : 4: substitution.graph_param_transform 65.19% : 0.000075s : 2: substitution.inline 2.55% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.39% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004319 2 90.76% : 0.003920s : 1: type_inference.infer 9.24% : 0.000399s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000074 2 100.00% : 0.000074s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.06% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.27% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.96% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.96% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.28% : 0.000002s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.20% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 42.01% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.99% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114125 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.60% : 0.002972s : 1: add_attr 2.60% : 0.002963s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000497s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.02% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.67% : 0.000765s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.66% : 0.001890s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.40% : 0.000459s : 1: opt_after_jit_grad 0.16% : 0.000182s : 1: opt_b 3.23% : 0.003690s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000186s : 1: renormalize.infer 0.13% : 0.000150s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.31% : 0.093937s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.83% : 0.004371s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.144354, [24] [bootstrap]: 0.00053574 [type_inference]: 0.0102344 [event_method]: 4.272e-05 [auto_monad]: 0.00011434 [graph_reusing]: 8.43999e-06 [inline]: 1.91998e-06 [add_attr]: 0.00302844, [1] [add_attr_with_inline]: 0.00302018, [1] [Cycle 1]: 6.796e-05, [2] [tag_attr]: 3.193e-05 [meta_addattr_fg_expand]: 8.20999e-06 [parallel-infer-symbol]: 3.37002e-06 [pre_auto_parallel]: 4.595e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.013061, [53] [py_interpret_to_execute]: 3.668e-05 [rewriter_before_opt_a]: 0.00012653 [opt_a]: 0.0108257, [3] [Cycle 1]: 0.0069489, [45] [expand_dump_flag]: 3.65998e-06 [switch_simplify]: 6.685e-05 [loop_unroll]: 5.573e-05 [a_1]: 0.00132629 [with_stream_mark]: 2.326e-05 [recompute_prepare]: 2.191e-05 [updatestate_depend_eliminate]: 8.82e-06 [updatestate_assign_eliminate]: 7.68999e-06 [updatestate_loads_eliminate]: 7.18998e-06 [parameter_eliminate]: 2.78998e-06 [a_2]: 0.00026936 [accelerated_algorithm]: 3.055e-05 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 3.28e-06 [shard_inline]: 1.61e-05 [merge_send_recv]: 1.61e-05 [auto_parallel]: 1.109e-05 [parallel]: 1.799e-05 [flash_sp]: 1.098e-05 [merge_comm]: 9.46e-06 [allreduce_fusion]: 9.12001e-06 [matmul_add_comm_reduction]: 2.603e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.765e-05 [virtual_dataset]: 1.529e-05 [get_grad_eliminate_]: 1.488e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.51e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.722e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.851e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 2.777e-05 [set_forward_comm_id_for_comm_node_pass]: 9.53002e-06 [meta_fg_expand]: 0.0013868 [flash_sp_send_recv_attached]: 3.38e-06 [receive_attached]: 2.28002e-06 [after_resolve]: 
5.904e-05 [a_after_grad]: 8.064e-05 [renormalize]: 0.00246475 [add_forward_monad_depend]: 9.69e-06 [auto_monad_grad]: 5.61e-06 [auto_monad_eliminator]: 5.629e-05 [cse]: 0.00016826 [a_3]: 0.00033382 [Cycle 2]: 0.00296691, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.7e-05 [loop_unroll]: 4.413e-05 [a_1]: 0.00151504 [with_stream_mark]: 1.178e-05 [recompute_prepare]: 1.07e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012627 [accelerated_algorithm]: 1.186e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 6.82002e-06 [auto_parallel]: 7.39002e-06 [parallel]: 4.57e-06 [flash_sp]: 3.20998e-06 [merge_comm]: 5.23002e-06 [allreduce_fusion]: 4.65999e-06 [matmul_add_comm_reduction]: 8.41002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.036e-05 [virtual_dataset]: 8.92e-06 [get_grad_eliminate_]: 8.75001e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.56002e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.628e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.413e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 3.488e-05 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.578e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.00057983 [add_forward_monad_depend]: 4.53999e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.455e-05 [cse]: 4.598e-05 [a_3]: 6.546e-05 [Cycle 3]: 0.0008958, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.08e-05 [loop_unroll]: 8.82999e-06 [a_1]: 0.0002484 [with_stream_mark]: 1.006e-05 [recompute_prepare]: 9.12001e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.90998e-06 
[parameter_eliminate]: 9.79984e-07 [a_2]: 0.00012308 [accelerated_algorithm]: 1.166e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.89001e-06 [auto_parallel]: 7.25e-06 [parallel]: 4.48999e-06 [flash_sp]: 1.09998e-06 [merge_comm]: 4.89e-06 [allreduce_fusion]: 5.07e-06 [matmul_add_comm_reduction]: 7.50998e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.48999e-06 [virtual_output]: 8.22998e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.42998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.594e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04998e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 1.29e-05 [a_after_grad]: 1.384e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.181e-05 [cse]: 2.6e-05 [a_3]: 5.869e-05 [py_interpret_to_execute_after_opt_a]: 1e-05 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 4.638e-05 [convert_after_rewriter]: 9.66e-06 [order_py_execute_after_rewriter]: 7.33999e-06 [mutable_eliminate]: 0.00046108 [opt_b]: 0.00028553, [1] [Cycle 1]: 0.00027961, [7] [b_1]: 0.0001885 [b_2]: 1.054e-05 [updatestate_depend_eliminate]: 6.85002e-06 [updatestate_assign_eliminate]: 4.04002e-06 [updatestate_loads_eliminate]: 3.83999e-06 [renormalize]: 4.59986e-07 [cse]: 3.091e-05 [optimize_parallel_all_gather_comm]: 2.099e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.042e-05 [loop_unroll]: 0.00041986 [opt_after_cconv]: 0.00013482, [1] [Cycle 1]: 0.00012898, [7] [c_1]: 4.831e-05 [parameter_eliminate]: 1.98002e-06 [updatestate_depend_eliminate]: 7.19001e-06 [updatestate_assign_eliminate]: 4.09002e-06 
[updatestate_loads_eliminate]: 3.83001e-06 [cse]: 2.982e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 2.894e-05 [tuple_transform]: 0.00010047, [1] [Cycle 1]: 9.595e-05, [4] [d_1]: 6.649e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.81e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 5.909e-05 [cse_after_recomputation]: 3.213e-05, [1] [Cycle 1]: 2.741e-05, [1] [cse]: 2.209e-05 [environ_conv]: 9.03002e-06 [swap_dp_allreduce_reducescatter]: 8.00999e-06 [bias_add_comm_swap]: 2.55002e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.57001e-06 [interleave_parallel_branches]: 1.14003e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.769e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 4.84e-06 [overlap_recompute_and_grad_model_parallel]: 5.56998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 5.00999e-06 [overlap_grad_flash_sp]: 2.333e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 2.13998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.791e-05, [1] [Cycle 1]: 9.357e-05, [6] [build]: 9.83002e-06 [elim_shapecalc]: 1.33e-05 [elim_not_effective]: 1.812e-05 [opt_reshape]: 9.72001e-06 [fold_const_symbol]: 1.458e-05 [renormalize]: 
3.09985e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.546e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.75998e-06 [opt_after_jit_grad]: 0.00046086 [validate]: 4.764e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.116503 [execute]: 9.91998e-06 Sums bootstrap : 0.000536s : 0.38% type_inference : 0.010234s : 7.31% event_method : 0.000043s : 0.03% auto_monad : 0.000114s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000109s : 0.08% optimize.opt_a.a_1 : 0.003090s : 2.21% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000519s : 0.37% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001425s : 1.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.08% optimize.opt_a.renormalize : 0.003045s : 2.17% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000240s : 0.17% optimize.opt_a.a_3 : 0.000458s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.03% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000461s : 0.33% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000420s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000461s : 0.33% validate : 0.000048s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.116503s : 83.19% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000720 218 5.85% : 0.000042s : 11: substitution.arithmetic_simplify 1.79% : 0.000013s : 2: substitution.cast_eliminate 0.43% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.31% : 0.000002s : 2: substitution.incorporate_call_switch 54.91% : 0.000396s : 16: substitution.inline 2.13% : 0.000015s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.81% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.87% : 0.000013s : 20: substitution.remove_not_recompute_node 3.21% : 0.000023s : 10: substitution.replace_applicator 1.42% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.49% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.51% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010168 2 86.94% : 0.008840s : 1: type_inference.infer 13.06% : 0.001328s : 1: type_inference.specialize ------[replace.] 0.000199 30 59.42% : 0.000118s : 16: replace.inline 40.58% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000418 30 92.70% : 0.000387s : 16: match.inline 7.30% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5663 1.06% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.03% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.10% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 3.67% : 0.000028s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_depend_swap 1.70% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.62% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.18% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.47% : 0.000041s : 244: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.61% : 0.000005s : 32: predicate.less_batch_normalization 1.58% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.61% : 0.000020s : 164: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.12% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.34% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 1.89% : 0.000014s : 97: predicate.partial_defer_inline 1.65% : 0.000013s : 89: predicate.partial_eliminate 1.04% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.24% : 0.000009s : 67: predicate.reduce_eliminate 2.59% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.82% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.75% : 0.000013s : 97: predicate.switch_defer_inline 2.83% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.71% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.44% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.69% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.57% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.15% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001570 32 56.19% : 0.000882s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.81% : 0.000688s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168570 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.80% : 0.003033s : 1: add_attr 1.79% : 0.003024s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000121s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.34% : 0.000570s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000049s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000470s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.82% : 0.004760s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.42% : 0.010829s : 1: opt_a 0.08% : 0.000138s : 1: opt_after_cconv 0.28% : 0.000470s : 1: opt_after_jit_grad 0.17% : 0.000289s : 1: opt_b 7.75% : 0.013065s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.93% : 0.001574s : 2: renormalize.infer 0.86% : 0.001458s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000050s : 1: rewriter_after_opt_a 0.08% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 69.13% : 0.116524s : 1: task_emit 0.06% : 0.000103s : 1: 
tuple_transform 6.08% : 0.010249s : 1: type_inference 0.04% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x2-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x3-pynative],max_mem:42.0M TotalTime = 0.0214808, [24] [bootstrap]: 0.00053699 [type_inference]: 0.00618365 [event_method]: 1.422e-05 [auto_monad]: 0.00010692 [graph_reusing]: 5.09e-06 [inline]: 1.55001e-06 [add_attr]: 0.0033744, [1] [add_attr_with_inline]: 0.00336416, [1] [Cycle 1]: 4.41e-05, [2] [tag_attr]: 1.467e-05 [meta_addattr_fg_expand]: 4.20999e-06 [parallel-infer-symbol]: 3.16999e-06 [pre_auto_parallel]: 2.792e-05 [insert-virtual-dataset]: 2.17001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00397715, [53] [py_interpret_to_execute]: 1.939e-05 [rewriter_before_opt_a]: 5.695e-05 [opt_a]: 0.00214715, [2] [Cycle 1]: 0.00154606, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 3.2e-05 [loop_unroll]: 2.089e-05 [a_1]: 0.00047594 [with_stream_mark]: 1.45e-05 [recompute_prepare]: 7.85e-06 [updatestate_depend_eliminate]: 3.69002e-06 [updatestate_assign_eliminate]: 3.25998e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.611e-05 [accelerated_algorithm]: 6.46999e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 5.75001e-06 [parallel]: 2.4e-05 [flash_sp]: 7.21999e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 8.78001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.37002e-06 [virtual_dataset]: 6.01e-06 
[get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.23998e-06 [receive_attached]: 2.24999e-06 [after_resolve]: 1.027e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00042285 [add_forward_monad_depend]: 4.48999e-06 [auto_monad_grad]: 2.27999e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.661e-05 [a_3]: 4.065e-05 [Cycle 2]: 0.00059154, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.24998e-06 [a_1]: 0.00012618 [with_stream_mark]: 9.61e-06 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.766e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.27999e-06 [parallel]: 4.27003e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 2.83e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.01998e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.82001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.17001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.22999e-06 [a_after_grad]: 7.87e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.27001e-06 [cse]: 1.374e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.76001e-06 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 2.946e-05 [convert_after_rewriter]: 6.76999e-06 [order_py_execute_after_rewriter]: 5.78002e-06 [mutable_eliminate]: 0.00045088 [opt_b]: 0.00018197, [1] [Cycle 1]: 0.00017618, [7] [b_1]: 0.00010872 [b_2]: 7.17002e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 5.19998e-07 [cse]: 1.622e-05 [optimize_parallel_all_gather_comm]: 1.544e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.206e-05 [loop_unroll]: 0.00041332 [opt_after_cconv]: 9.511e-05, [1] [Cycle 1]: 8.946e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.663e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.446e-05 [tuple_transform]: 6.941e-05, [1] [Cycle 1]: 6.467e-05, [4] [d_1]: 3.857e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.73e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.722e-05 [cse_after_recomputation]: 2.03e-05, [1] [Cycle 1]: 1.589e-05, [1] [cse]: 1.08e-05 [environ_conv]: 4.87998e-06 [swap_dp_allreduce_reducescatter]: 4.68001e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.58999e-06 [label_fine_grained_interleaved_index]: 2.92002e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 1.05001e-06 
[add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97001e-06 [control_data_broadcast_order]: 1.09e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.678e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.674e-05, [1] [Cycle 1]: 6.278e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 0.0001288 [opt_after_jit_grad]: 0.00047461 [validate]: 3.268e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.00637478 [execute]: 6.89001e-06 Sums bootstrap : 0.000537s : 3.13% type_inference : 0.006184s : 36.07% event_method : 0.000014s : 0.08% auto_monad : 0.000107s : 0.62% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000057s : 0.33% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000602s : 3.51% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000423s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.63% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000413s : 2.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000129s : 0.75% opt_after_jit_grad : 0.000475s : 2.77% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006375s : 37.19% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.74% : 0.000024s : 5: substitution.arithmetic_simplify 0.98% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000005s : 4: substitution.graph_param_transform 66.62% : 0.000109s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.45% : 0.000004s : 4: substitution.remove_not_recompute_node 2.61% : 0.000004s : 4: substitution.replace_old_param 6.88% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006140 2 90.55% : 0.005560s : 1: type_inference.infer 9.45% : 0.000580s : 1: type_inference.specialize ------[replace.] 0.000038 5 66.69% : 0.000026s : 3: replace.inline 33.31% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.31% : 0.000107s : 3: match.inline 8.69% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.10% : 0.000003s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.34% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.54% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.51% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.30% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.84% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000376 8 47.90% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.10% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030374 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.12% : 0.003379s : 1: add_attr 11.09% : 0.003368s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.37% : 0.000113s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000574s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.18% : 0.000967s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.08% : 0.002150s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.59% : 0.000484s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.11% : 0.003981s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.71% : 0.000216s : 1: renormalize.infer 0.66% : 0.000200s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000134s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000069s : 1: symbol_engine_optimizer 21.02% : 0.006384s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.40% : 0.006197s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0181864, [24] [bootstrap]: 0.00046331 [type_inference]: 0.00440644 [event_method]: 1.034e-05 [auto_monad]: 5.222e-05 [graph_reusing]: 4.96002e-06 [inline]: 2.12999e-06 [add_attr]: 0.00300385, [1] [add_attr_with_inline]: 0.00299602, [1] [Cycle 1]: 4.366e-05, [2] [tag_attr]: 1.155e-05 [meta_addattr_fg_expand]: 2.94999e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.216e-05 [insert-virtual-dataset]: 2.81999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00366441, [53] [py_interpret_to_execute]: 1.47e-05 [rewriter_before_opt_a]: 3.814e-05 [opt_a]: 0.00184502, [2] [Cycle 1]: 0.00124969, [45] [expand_dump_flag]: 2.97002e-06 [switch_simplify]: 2.402e-05 [loop_unroll]: 1.354e-05 [a_1]: 0.00029301 [with_stream_mark]: 1.357e-05 [recompute_prepare]: 7.21001e-06 [updatestate_depend_eliminate]: 3.48999e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.639e-05 [accelerated_algorithm]: 6.07001e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 5.57001e-06 [parallel]: 1.679e-05 [flash_sp]: 7.66999e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 8.70999e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 6.23e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 4.19002e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 9.47999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.094e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.60001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.93001e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.41e-06 
[after_resolve]: 1.035e-05 [a_after_grad]: 8.96002e-06 [renormalize]: 0.00033919 [add_forward_monad_depend]: 4.06001e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.313e-05 [cse]: 2.617e-05 [a_3]: 4.12e-05 [Cycle 2]: 0.00058601, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012409 [with_stream_mark]: 9.27001e-06 [recompute_prepare]: 5.47999e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.696e-05 [accelerated_algorithm]: 5.39998e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.12998e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.57998e-06 [flash_sp]: 3.17002e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.93998e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.99999e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 4.89003e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.78999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.29984e-07 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.02999e-06 [cse]: 1.229e-05 [a_3]: 3.229e-05 [py_interpret_to_execute_after_opt_a]: 7.59002e-06 [slice_cell_reuse_recomputed_activation]: 1.74998e-06 [rewriter_after_opt_a]: 3.069e-05 [convert_after_rewriter]: 6.53003e-06 [order_py_execute_after_rewriter]: 4.68999e-06 [mutable_eliminate]: 0.00044799 [opt_b]: 0.00017921, [1] [Cycle 1]: 0.00017289, 
[7] [b_1]: 0.00010735 [b_2]: 6.86999e-06 [updatestate_depend_eliminate]: 5.31998e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 3.09985e-07 [cse]: 1.562e-05 [optimize_parallel_all_gather_comm]: 1.539e-05 [overlap_param_gather]: 1.86003e-06 [cconv]: 2.127e-05 [loop_unroll]: 0.0004126 [opt_after_cconv]: 0.00012003, [1] [Cycle 1]: 0.00011432, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 4.139e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.313e-05 [tuple_transform]: 6.901e-05, [1] [Cycle 1]: 6.458e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 4.375e-05 [cse_after_recomputation]: 1.999e-05, [1] [Cycle 1]: 1.549e-05, [1] [cse]: 1.052e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.02999e-06 [micro_interleaved_order_control]: 2.75002e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.154e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.726e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 6.608e-05, [1] [Cycle 1]: 6.198e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 7.89002e-06 [elim_not_effective]: 1.117e-05 [opt_reshape]: 5.87999e-06 [fold_const_symbol]: 8.47e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.552e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044875 [validate]: 3.126e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00584442 [execute]: 6.29001e-06 Sums bootstrap : 0.000463s : 3.26% type_inference : 0.004406s : 30.96% event_method : 0.000010s : 0.07% auto_monad : 0.000052s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.15% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000413s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000041s : 0.29% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.15% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005844s : 41.06% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000118 26 18.89% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.25% : 0.000005s : 4: substitution.graph_param_transform 65.61% : 0.000077s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.98% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004365 2 91.87% : 0.004010s : 1: type_inference.infer 8.13% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000134 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.20% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.95% : 0.000001s : 9: predicate.cast_eliminate 1.01% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.84% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.48% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.81% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 42.12% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.88% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026117 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.52% : 0.003008s : 1: add_attr 11.48% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000499s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.75% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000030s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001848s : 1: opt_a 0.47% : 0.000124s : 1: opt_after_cconv 1.75% : 0.000458s : 1: opt_after_jit_grad 0.70% : 0.000182s : 1: opt_b 14.05% : 0.003668s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000185s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000069s : 1: symbol_engine_optimizer 22.42% : 0.005854s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.92% : 0.004419s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0195958, [24] [bootstrap]: 0.00047489 [type_inference]: 0.00557977 [event_method]: 1.363e-05 [auto_monad]: 5.422e-05 [graph_reusing]: 6.19001e-06 [inline]: 1.77999e-06 [add_attr]: 0.00295582, [1] [add_attr_with_inline]: 0.00294809, [1] [Cycle 1]: 4.486e-05, [2] [tag_attr]: 1.544e-05 [meta_addattr_fg_expand]: 4.03001e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.554e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00394426, [53] [py_interpret_to_execute]: 1.866e-05 [rewriter_before_opt_a]: 5.826e-05 [opt_a]: 0.00211623, [2] [Cycle 1]: 0.00151969, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 3.141e-05 [loop_unroll]: 2.056e-05 [a_1]: 0.00048254 [with_stream_mark]: 1.386e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.58999e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 3.05002e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.665e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 7.91001e-06 [auto_parallel]: 5.74e-06 [parallel]: 1.787e-05 [flash_sp]: 7.53e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.60001e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 5.80002e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.33002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.013e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 2.74001e-06 
[after_resolve]: 1.021e-05 [a_after_grad]: 8.67e-06 [renormalize]: 0.00040258 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.302e-05 [cse]: 2.7e-05 [a_3]: 3.979e-05 [Cycle 2]: 0.00058709, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.76e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012528 [with_stream_mark]: 9.44e-06 [recompute_prepare]: 5.69999e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 7.59988e-07 [a_2]: 6.75e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.05998e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 3.00998e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.44001e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.05001e-06 [merge_recompute_call_nodes]: 6.29982e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.97e-06 [a_after_grad]: 8.28001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 5.81e-06 [cse]: 1.499e-05 [a_3]: 3.166e-05 [py_interpret_to_execute_after_opt_a]: 7.66001e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 3.044e-05 [convert_after_rewriter]: 6.66e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00045198 [opt_b]: 0.00018155, [1] [Cycle 1]: 0.00017538, [7] [b_1]: 0.00010627 
[b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.4002e-07 [cse]: 1.692e-05 [optimize_parallel_all_gather_comm]: 1.602e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00041241 [opt_after_cconv]: 9.437e-05, [1] [Cycle 1]: 8.872e-05, [7] [c_1]: 2.796e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.518e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 6.818e-05, [1] [Cycle 1]: 6.4e-05, [4] [d_1]: 3.846e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.91998e-06 [add_recomputation]: 4.387e-05 [cse_after_recomputation]: 1.953e-05, [1] [Cycle 1]: 1.55e-05, [1] [cse]: 1.041e-05 [environ_conv]: 4.28001e-06 [swap_dp_allreduce_reducescatter]: 6.14999e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.46002e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.46e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.04003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.96998e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.68999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.771e-05 [begin_end_overlap_inline]: 8.39995e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.54e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.74e-05, [1] [Cycle 1]: 6.338e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 8.45001e-06 [elim_not_effective]: 1.131e-05 [opt_reshape]: 6.13002e-06 [fold_const_symbol]: 8.57e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00044944 [validate]: 3.137e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00582699 [execute]: 6.53998e-06 Sums bootstrap : 0.000475s : 3.03% type_inference : 0.005580s : 35.56% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000608s : 3.87% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000019s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000403s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.88% optimize.opt_b.b_1 : 0.000106s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.86% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005827s : 37.13% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000159 30 15.39% : 0.000024s : 5: substitution.arithmetic_simplify 1.24% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000005s : 4: substitution.graph_param_transform 65.92% : 0.000105s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005540 2 89.53% : 0.004959s : 1: type_inference.infer 10.47% : 0.000580s : 1: type_inference.specialize ------[replace.] 0.000037 5 70.35% : 0.000026s : 3: replace.inline 29.65% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000112 5 91.70% : 0.000103s : 3: match.inline 8.30% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000197 1131 0.71% : 0.000001s : 11: predicate.accumulaten_eliminater 0.71% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.48% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 11: predicate.addn_zero_filter 0.63% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.88% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000002s : 11: predicate.cast_eliminate 0.56% : 0.000001s : 8: predicate.check_bprop_eliminate 0.46% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.48% : 0.000001s : 8: predicate.depend_value_elim 0.69% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.85% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.66% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.95% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.31% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.90% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.85% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.86% : 0.000002s : 15: predicate.environ_get_depend_swap 1.44% : 0.000003s : 23: predicate.environ_get_eliminate 0.89% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.05% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.83% : 0.000004s : 16: predicate.float_depend_g_call 0.46% : 0.000001s : 8: predicate.float_environ_get_switch 0.70% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.57% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.57% : 0.000001s : 8: predicate.incorporate_call 0.48% : 0.000001s : 8: predicate.incorporate_call_switch 4.94% : 0.000010s : 51: predicate.inline 0.69% : 0.000001s : 8: predicate.inline_without_move 0.29% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.74% : 0.000001s : 8: predicate.less_batch_normalization 1.37% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.92% : 0.000004s : 32: predicate.load_eliminater 0.82% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.40% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.49% : 0.000001s : 8: predicate.merge_addn 0.51% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.52% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.62% : 0.000001s : 11: predicate.minmaximum_grad 0.90% : 0.000002s : 4: predicate.mutable_eliminate 0.32% : 0.000001s : 4: predicate.opt_reshape 0.32% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000003s : 16: predicate.partial_defer_inline 1.16% : 0.000002s : 17: predicate.partial_eliminate 0.67% : 0.000001s : 11: predicate.print_const_string_wrapper 0.51% : 0.000001s : 8: predicate.reduce_all_const_elim 0.82% : 0.000002s : 11: predicate.reduce_eliminate 1.87% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.40% : 0.000001s : 8: predicate.remove_not_recompute_node 1.07% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.27% : 0.000001s : 4: predicate.reset_defer_inline 0.68% : 0.000001s : 11: predicate.reshape_eliminate 0.55% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.33% : 0.000001s : 4: predicate.row_tensor_eliminate 0.64% : 0.000001s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.67% : 0.000001s : 8: predicate.shard_identity_eliminate 0.62% : 0.000001s : 8: predicate.special_op_eliminate 0.64% : 0.000001s : 8: predicate.specialize_transform 0.80% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.64% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.08% : 0.000002s : 16: predicate.switch_defer_inline 1.59% : 0.000003s : 24: predicate.switch_layer_defer_inline 24.02% : 0.000047s : 54: predicate.switch_simplify 0.64% 
: 0.000001s : 11: predicate.tile_eliminate 0.68% : 0.000001s : 11: predicate.transpose_eliminate 1.23% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.21% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.10% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.13% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.75% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.36% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.85% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.54% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.31% : 0.000001s : 4: predicate.value_based_eliminate 0.58% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.57% : 0.000001s : 8: predicate.virtual_output_eliminate 0.28% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000377 8 42.65% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.35% : 0.000216s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028024 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.56% : 0.002960s : 1: add_attr 10.53% : 0.002951s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.82% : 0.000509s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.46% : 0.000970s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.56% : 0.002119s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.64% : 0.000459s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 14.09% : 0.003948s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000206s : 1: renormalize.infer 0.68% : 0.000189s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 20.83% : 0.005837s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.96% : 0.005593s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.0359789, [24] [bootstrap]: 0.00051923 [type_inference]: 0.0112973 [event_method]: 4.771e-05 [auto_monad]: 0.00011852 [graph_reusing]: 8.45999e-06 [inline]: 1.87999e-06 [add_attr]: 0.00297821, [1] [add_attr_with_inline]: 0.00297001, [1] [Cycle 1]: 7.082e-05, [2] [tag_attr]: 3.44e-05 [meta_addattr_fg_expand]: 9.86e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 4.905e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.013005, [53] [py_interpret_to_execute]: 3.713e-05 [rewriter_before_opt_a]: 0.00017484 [opt_a]: 0.0104739, [3] [Cycle 1]: 0.00685285, [45] [expand_dump_flag]: 4.3e-06 [switch_simplify]: 7.476e-05 [loop_unroll]: 6.242e-05 [a_1]: 0.00144325 [with_stream_mark]: 2.284e-05 [recompute_prepare]: 2.151e-05 [updatestate_depend_eliminate]: 9.25999e-06 [updatestate_assign_eliminate]: 7.96001e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.49001e-06 [a_2]: 0.00024407 [accelerated_algorithm]: 3.097e-05 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.614e-05 [merge_send_recv]: 1.613e-05 [auto_parallel]: 1.096e-05 [parallel]: 1.839e-05 [flash_sp]: 1.103e-05 [merge_comm]: 1.017e-05 [allreduce_fusion]: 9.02e-06 [matmul_add_comm_reduction]: 2.545e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.825e-05 [virtual_dataset]: 1.578e-05 [get_grad_eliminate_]: 1.501e-05 [virtual_output]: 1.526e-05 [merge_forward]: 9.19e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 1.718e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.706e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.00138311 [flash_sp_send_recv_attached]: 3.70998e-06 [receive_attached]: 2.37001e-06 [after_resolve]: 
0.00010226 [a_after_grad]: 8.204e-05 [renormalize]: 0.00223864 [add_forward_monad_depend]: 8.94998e-06 [auto_monad_grad]: 5.29e-06 [auto_monad_eliminator]: 5.369e-05 [cse]: 0.00015753 [a_3]: 0.00032825 [Cycle 2]: 0.00281494, [45] [expand_dump_flag]: 1.54998e-06 [switch_simplify]: 4.609e-05 [loop_unroll]: 4.281e-05 [a_1]: 0.00150215 [with_stream_mark]: 1.128e-05 [recompute_prepare]: 1.007e-05 [updatestate_depend_eliminate]: 4.48001e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 0.00010909 [accelerated_algorithm]: 1.108e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 8e-06 [merge_send_recv]: 5.77001e-06 [auto_parallel]: 6.58e-06 [parallel]: 4.89e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 3.85e-06 [matmul_add_comm_reduction]: 7.05998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.34998e-06 [virtual_dataset]: 7.71001e-06 [get_grad_eliminate_]: 7.55e-06 [virtual_output]: 7.85998e-06 [merge_forward]: 4.69002e-06 [cell_reuse_recompute_pass]: 9.29984e-07 [offload_activation]: 8.30999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.428e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.226e-05 [set_forward_comm_id_for_comm_node_pass]: 4.45999e-06 [meta_fg_expand]: 6.788e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.487e-05 [a_after_grad]: 1.287e-05 [renormalize]: 0.00051423 [add_forward_monad_depend]: 3.98999e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.249e-05 [cse]: 2.102e-05 [a_3]: 5.614e-05 [Cycle 3]: 0.00079222, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 9.04e-06 [loop_unroll]: 7.58999e-06 [a_1]: 0.00021276 [with_stream_mark]: 8.80001e-06 [recompute_prepare]: 8.31002e-06 [updatestate_depend_eliminate]: 3.98001e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 3.24001e-06 
[parameter_eliminate]: 8.70001e-07 [a_2]: 0.00010678 [accelerated_algorithm]: 1.049e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 7.97e-06 [merge_send_recv]: 5.63997e-06 [auto_parallel]: 6.51999e-06 [parallel]: 4.90001e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 4.09002e-06 [allreduce_fusion]: 3.95e-06 [matmul_add_comm_reduction]: 6.55002e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 8.77999e-06 [virtual_dataset]: 7.52998e-06 [get_grad_eliminate_]: 7.43e-06 [virtual_output]: 7.31999e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.52999e-06 [offload_activation]: 7.5e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.415e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.168e-05 [set_forward_comm_id_for_comm_node_pass]: 4.27e-06 [meta_fg_expand]: 2.58e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.211e-05 [a_after_grad]: 1.197e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 8.65999e-06 [cse]: 1.822e-05 [a_3]: 4.943e-05 [py_interpret_to_execute_after_opt_a]: 8.75001e-06 [slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 4.061e-05 [convert_after_rewriter]: 8.37e-06 [order_py_execute_after_rewriter]: 6.21e-06 [mutable_eliminate]: 0.00046575 [opt_b]: 0.00025173, [1] [Cycle 1]: 0.00024548, [7] [b_1]: 0.00016556 [b_2]: 9.34e-06 [updatestate_depend_eliminate]: 6.44001e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 3.30998e-06 [renormalize]: 3.19997e-07 [cse]: 2.299e-05 [optimize_parallel_all_gather_comm]: 1.838e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 1.943e-05 [loop_unroll]: 0.00042237 [opt_after_cconv]: 0.00045535, [1] [Cycle 1]: 0.00044907, [7] [c_1]: 4.173e-05 [parameter_eliminate]: 2.54001e-06 [updatestate_depend_eliminate]: 8.23001e-06 [updatestate_assign_eliminate]: 3.58e-06 
[updatestate_loads_eliminate]: 3.28998e-06 [cse]: 2.364e-05 [renormalize]: 1.69995e-07 [remove_dup_value]: 2.595e-05 [tuple_transform]: 9.748e-05, [1] [Cycle 1]: 9.171e-05, [4] [d_1]: 6.319e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 8.64e-06 [partial_unused_args_eliminate]: 1.96003e-06 [add_recomputation]: 5.236e-05 [cse_after_recomputation]: 2.537e-05, [1] [Cycle 1]: 2.089e-05, [1] [cse]: 1.566e-05 [environ_conv]: 7.21999e-06 [swap_dp_allreduce_reducescatter]: 7.34002e-06 [bias_add_comm_swap]: 2.49001e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.33998e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.51e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.33002e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.436e-05 [grouped_pairwise_exchange_alltoall]: 1.70001e-06 [offloading_packed_experts]: 4.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.99e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.55999e-06 [overlap_grad_flash_sp]: 2.171e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.86998e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 8.96e-05, [1] [Cycle 1]: 8.542e-05, [6] [build]: 9.42999e-06 [elim_shapecalc]: 1.141e-05 [elim_not_effective]: 1.554e-05 [opt_reshape]: 8.67e-06 [fold_const_symbol]: 1.313e-05 [renormalize]: 
1.69995e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.137e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.86001e-06 [opt_after_jit_grad]: 0.00047712 [validate]: 3.978e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00718909 [execute]: 7.35e-06 Sums bootstrap : 0.000519s : 1.65% type_inference : 0.011297s : 35.94% event_method : 0.000048s : 0.15% auto_monad : 0.000119s : 0.38% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000175s : 0.56% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.41% optimize.opt_a.loop_unroll : 0.000113s : 0.36% optimize.opt_a.a_1 : 0.003158s : 10.05% optimize.opt_a.with_stream_mark : 0.000043s : 0.14% optimize.opt_a.recompute_prepare : 0.000040s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000460s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000006s : 0.02% optimize.opt_a.shard_inline : 0.000032s : 0.10% optimize.opt_a.merge_send_recv : 0.000028s : 0.09% optimize.opt_a.auto_parallel : 0.000024s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000019s : 0.06% optimize.opt_a.allreduce_fusion : 0.000017s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.12% optimize.opt_a.virtual_dataset : 0.000031s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.10% optimize.opt_a.virtual_output : 0.000030s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000033s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000051s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.06% optimize.opt_a.meta_fg_expand : 0.001454s : 4.62% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000129s : 0.41% optimize.opt_a.a_after_grad : 0.000107s : 0.34% optimize.opt_a.renormalize : 0.002753s : 8.76% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000075s : 0.24% optimize.opt_a.cse : 0.000197s : 0.63% optimize.opt_a.a_3 : 0.000434s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000041s : 0.13% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000466s : 1.48% optimize.opt_b.b_1 : 0.000166s : 0.53% optimize.opt_b.b_2 : 0.000009s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000023s : 0.07% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000422s : 1.34% optimize.opt_after_cconv.c_1 : 0.000042s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000024s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000026s : 0.08% optimize.tuple_transform.d_1 : 0.000063s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.17% optimize.cse_after_recomputation.cse : 0.000016s : 0.05% optimize.environ_conv : 0.000007s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000014s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000021s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000477s : 1.52% validate : 0.000040s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.007189s : 22.87% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000735 213 6.03% : 0.000044s : 12: substitution.arithmetic_simplify 0.33% : 0.000002s : 4: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 4: substitution.fold_const_symbol 0.96% : 0.000007s : 7: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 56.75% : 0.000417s : 17: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.23% : 0.000009s : 18: substitution.j_node_and_user_rematch 2.06% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.80% : 0.000006s : 5: substitution.partial_eliminate 1.71% : 0.000013s : 18: substitution.remove_not_recompute_node 3.21% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.73% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.90% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011224 2 87.07% : 0.009773s : 1: type_inference.infer 12.93% : 0.001451s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.51% : 0.000124s : 17: replace.inline 42.49% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 33 92.39% : 0.000408s : 17: match.inline 7.61% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000717 5530 1.08% : 0.000008s : 66: predicate.accumulaten_eliminater 0.28% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 30: predicate.addn_check_dump 1.06% : 0.000008s : 66: predicate.addn_zero_filter 1.06% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 1.97% : 0.000014s : 96: predicate.arithmetic_simplify 1.17% : 0.000008s : 66: predicate.cast_eliminate 1.13% : 0.000008s : 65: predicate.check_bprop_eliminate 0.51% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.49% : 0.000003s : 30: predicate.depend_value_elim 1.21% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 66: predicate.dict_get_item_eliminator 1.16% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.35% : 0.000002s : 14: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 7: predicate.elim_not_effective 0.15% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 73: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 73: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 73: predicate.environ_get_depend_swap 1.74% : 0.000012s : 103: predicate.environ_get_eliminate 1.20% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.78% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.36% : 0.000017s : 99: predicate.float_depend_g_call 0.50% : 0.000004s : 30: predicate.float_environ_get_switch 0.66% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 7: predicate.fold_const_symbol 0.53% : 0.000004s : 30: predicate.get_grad_eliminate 0.09% : 0.000001s : 7: predicate.graph_param_transform 0.53% : 0.000004s : 30: predicate.incorporate_call 0.48% : 0.000003s : 30: predicate.incorporate_call_switch 5.65% : 0.000041s : 239: predicate.inline 1.25% : 0.000009s : 53: predicate.inline_without_move 0.28% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 30: predicate.less_batch_normalization 1.64% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 2.70% : 0.000019s : 162: predicate.load_eliminater 0.29% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.33% : 0.000017s : 134: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 80: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 30: predicate.merge_addn 1.10% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 66: predicate.minmaximum_grad 0.32% : 0.000002s : 7: predicate.mutable_eliminate 0.15% : 0.000001s : 7: predicate.opt_reshape 0.14% : 0.000001s : 7: predicate.parallel_virtual_node 2.04% : 0.000015s : 99: predicate.partial_defer_inline 1.70% : 0.000012s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 66: predicate.print_const_string_wrapper 0.51% : 0.000004s : 30: predicate.reduce_all_const_elim 1.36% : 0.000010s : 66: predicate.reduce_eliminate 2.68% : 0.000019s : 162: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 30: predicate.remove_not_recompute_node 1.97% : 0.000014s : 147: predicate.replace_applicator 0.59% : 0.000004s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.09% : 0.000008s : 66: predicate.reshape_eliminate 1.14% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 7: predicate.row_tensor_eliminate 1.25% : 0.000009s : 65: predicate.same_eliminate 0.36% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.62% : 0.000004s : 30: predicate.shard_identity_eliminate 0.26% : 0.000002s : 14: predicate.special_op_eliminate 0.61% : 0.000004s : 30: predicate.specialize_transform 1.27% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.93% : 0.000014s : 99: predicate.switch_defer_inline 2.98% : 0.000021s : 164: predicate.switch_layer_defer_inline 5.15% : 0.000037s : 270: 
predicate.switch_simplify 1.08% : 0.000008s : 66: predicate.tile_eliminate 1.11% : 0.000008s : 66: predicate.transpose_eliminate 1.44% : 0.000010s : 80: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 80: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000009s : 80: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000020s : 126: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 80: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000014s : 110: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 2.67% : 0.000019s : 162: predicate.updatestate_pure_node_eliminater 3.27% : 0.000023s : 192: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 7: predicate.value_based_eliminate 0.55% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 30: predicate.virtual_output_eliminate 0.12% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001515 34 56.82% : 0.000861s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.18% : 0.000654s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059763 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.002983s : 1: add_attr 4.98% : 0.002974s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000125s : 1: auto_monad 0.04% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.93% : 0.000555s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000017s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000010s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000003s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000475s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000016s : 1: opt.transform.mutable_eliminate 7.99% : 0.004773s : 117: opt.transform.opt_a 0.07% : 0.000041s : 1: opt.transform.opt_after_cconv 0.05% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000148s : 28: opt.transform.opt_b 0.12% : 0.000069s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000046s : 4: opt.transform.symbol_engine_opt 17.53% : 0.010477s : 1: opt_a 0.77% : 0.000459s : 1: opt_after_cconv 0.81% : 0.000487s : 1: opt_after_jit_grad 0.43% : 0.000255s : 1: opt_b 21.77% : 0.013009s : 1: optimize 0.04% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000025s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000030s : 1: remove_dup_value 2.37% : 0.001417s : 2: renormalize.infer 2.21% : 0.001322s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000045s : 1: rewriter_after_opt_a 0.30% : 0.000180s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000092s : 1: symbol_engine_optimizer 12.05% : 0.007199s : 1: task_emit 0.17% : 0.000100s : 1: 
tuple_transform 18.93% : 0.011312s : 1: type_inference 0.12% : 0.000069s : 1: validate TotalTime = 0.0182984, [24] [bootstrap]: 0.00046279 [type_inference]: 0.00426743 [event_method]: 1.052e-05 [auto_monad]: 5.021e-05 [graph_reusing]: 5.17999e-06 [inline]: 2.04e-06 [add_attr]: 0.00297876, [1] [add_attr_with_inline]: 0.00297053, [1] [Cycle 1]: 4.601e-05, [2] [tag_attr]: 1.184e-05 [meta_addattr_fg_expand]: 3.09999e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.172e-05 [insert-virtual-dataset]: 2.79001e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00366923, [53] [py_interpret_to_execute]: 1.462e-05 [rewriter_before_opt_a]: 3.87e-05 [opt_a]: 0.00187083, [2] [Cycle 1]: 0.00124803, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.451e-05 [loop_unroll]: 1.379e-05 [a_1]: 0.00028908 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.04001e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.87001e-06 [a_2]: 7.551e-05 [accelerated_algorithm]: 6.39001e-06 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 1.44998e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.57998e-06 [auto_parallel]: 5.80002e-06 [parallel]: 1.796e-05 [flash_sp]: 6.84999e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 9.91998e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 7.44002e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.44e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.174e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.77001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.16e-06 
[after_resolve]: 1.086e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.0003386 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.238e-05 [cse]: 2.768e-05 [a_3]: 3.995e-05 [Cycle 2]: 0.00061363, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.62002e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012363 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.8e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.35001e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.26001e-06 [flash_sp]: 3.71001e-06 [merge_comm]: 2.97002e-06 [allreduce_fusion]: 2.58e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 5.86998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.24e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.69002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.34e-06 [a_after_grad]: 7.87e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.06998e-06 [cse]: 1.281e-05 [a_3]: 3.131e-05 [py_interpret_to_execute_after_opt_a]: 7.72998e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.11e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 4.76002e-06 [mutable_eliminate]: 0.00044885 [opt_b]: 0.00017961, [1] [Cycle 1]: 0.00017379, [7] [b_1]: 
0.00010788 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 3.69997e-07 [cse]: 1.568e-05 [optimize_parallel_all_gather_comm]: 1.538e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.189e-05 [loop_unroll]: 0.00041256 [opt_after_cconv]: 9.45e-05, [1] [Cycle 1]: 8.887e-05, [7] [c_1]: 2.822e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.547e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.257e-05 [tuple_transform]: 6.837e-05, [1] [Cycle 1]: 6.389e-05, [4] [d_1]: 3.812e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.357e-05 [cse_after_recomputation]: 2.001e-05, [1] [Cycle 1]: 1.562e-05, [1] [cse]: 1.054e-05 [environ_conv]: 4.50001e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.58001e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.48002e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.125e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.67998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.92002e-06 [overlap_grad_flash_sp]: 1.755e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06003e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 6.787e-05, [1] [Cycle 1]: 6.364e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.02e-06 [elim_not_effective]: 1.174e-05 [opt_reshape]: 5.91998e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.76998e-06 [pipeline_parallel_scheduler]: 1.30999e-06 [auto_monad_reorder]: 1.497e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00044378 [validate]: 3.005e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00612762 [execute]: 6.85002e-06 Sums bootstrap : 0.000463s : 3.23% type_inference : 0.004267s : 29.75% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000413s : 2.88% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 3.13% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000413s : 2.88% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 3.09% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006128s : 42.72% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.31% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.26% : 0.000001s : 2: substitution.fold_const_symbol 4.19% : 0.000005s : 4: substitution.graph_param_transform 64.93% : 0.000077s : 2: substitution.inline 2.57% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.94% : 0.000005s : 4: substitution.remove_not_recompute_node 3.35% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004227 2 92.00% : 0.003889s : 1: type_inference.infer 8.00% : 0.000338s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.88% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.38% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.92% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 1.01% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000000s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.94% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.86% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.14% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026207 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.38% : 0.002983s : 1: add_attr 11.35% : 0.002974s : 1: add_attr_with_inline 0.02% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000496s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.91% : 0.000762s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.15% : 0.001874s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000453s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.02% : 0.003673s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.42% : 0.006137s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.33% : 0.004281s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0341657, [24] [bootstrap]: 0.00049703 [type_inference]: 0.0101486 [event_method]: 4.054e-05 [auto_monad]: 0.00011131 [graph_reusing]: 8.26002e-06 [inline]: 2.33998e-06 [add_attr]: 0.00298746, [1] [add_attr_with_inline]: 0.00297931, [1] [Cycle 1]: 6.589e-05, [2] [tag_attr]: 3.073e-05 [meta_addattr_fg_expand]: 8.35001e-06 [parallel-infer-symbol]: 2.56998e-06 [pre_auto_parallel]: 4.529e-05 [insert-virtual-dataset]: 2.53003e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.0123921, [53] [py_interpret_to_execute]: 3.638e-05 [rewriter_before_opt_a]: 0.00012562 [opt_a]: 0.0102648, [3] [Cycle 1]: 0.00670213, [45] [expand_dump_flag]: 3.61001e-06 [switch_simplify]: 6.605e-05 [loop_unroll]: 5.502e-05 [a_1]: 0.00135548 [with_stream_mark]: 2.276e-05 [recompute_prepare]: 2.151e-05 [updatestate_depend_eliminate]: 9.10999e-06 [updatestate_assign_eliminate]: 8.13001e-06 [updatestate_loads_eliminate]: 7.55998e-06 [parameter_eliminate]: 2.86999e-06 [a_2]: 0.00024418 [accelerated_algorithm]: 3.105e-05 [shard]: 2.09e-06 [meta_shard_fg_expand]: 3.17002e-06 [shard_inline]: 1.591e-05 [merge_send_recv]: 1.634e-05 [auto_parallel]: 1.085e-05 [parallel]: 1.779e-05 [flash_sp]: 1.098e-05 [merge_comm]: 9.57999e-06 [allreduce_fusion]: 8.82999e-06 [matmul_add_comm_reduction]: 2.506e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.768e-05 [virtual_dataset]: 1.571e-05 [get_grad_eliminate_]: 1.558e-05 [virtual_output]: 1.558e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 1.753e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.908e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.792e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.00136656 [flash_sp_send_recv_attached]: 3.6e-06 [receive_attached]: 2.76999e-06 
[after_resolve]: 5.917e-05 [a_after_grad]: 8.043e-05 [renormalize]: 0.00224552 [add_forward_monad_depend]: 9.19e-06 [auto_monad_grad]: 5.54998e-06 [auto_monad_eliminator]: 5.396e-05 [cse]: 0.00016051 [a_3]: 0.00032844 [Cycle 2]: 0.00275783, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.569e-05 [loop_unroll]: 4.251e-05 [a_1]: 0.00148167 [with_stream_mark]: 1.077e-05 [recompute_prepare]: 1.016e-05 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 3.72002e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.02998e-06 [a_2]: 0.00010866 [accelerated_algorithm]: 1.093e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 8.1e-06 [merge_send_recv]: 5.89e-06 [auto_parallel]: 6.78e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 4.22003e-06 [allreduce_fusion]: 4.03001e-06 [matmul_add_comm_reduction]: 6.68e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 9.06002e-06 [virtual_dataset]: 7.77e-06 [get_grad_eliminate_]: 7.28e-06 [virtual_output]: 7.77e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 8.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.49e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.295e-05 [set_forward_comm_id_for_comm_node_pass]: 4.52998e-06 [meta_fg_expand]: 3.288e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.392e-05 [a_after_grad]: 1.28e-05 [renormalize]: 0.00051691 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 1.272e-05 [cse]: 2.151e-05 [a_3]: 5.617e-05 [Cycle 3]: 0.00079024, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 9.53997e-06 [loop_unroll]: 7.95998e-06 [a_1]: 0.00021207 [with_stream_mark]: 8.43999e-06 [recompute_prepare]: 8.07998e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.25002e-06 [updatestate_loads_eliminate]: 
3.29001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 0.00010585 [accelerated_algorithm]: 1.077e-05 [shard]: 8.70001e-07 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 7.93999e-06 [merge_send_recv]: 5.52999e-06 [auto_parallel]: 6.27001e-06 [parallel]: 4.43001e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.13999e-06 [allreduce_fusion]: 3.77998e-06 [matmul_add_comm_reduction]: 6.48998e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 8.70001e-06 [virtual_dataset]: 7.48e-06 [get_grad_eliminate_]: 7.39002e-06 [virtual_output]: 7.75998e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 7.57002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.419e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.201e-05 [set_forward_comm_id_for_comm_node_pass]: 4.34002e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 6.50005e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 1.249e-05 [a_after_grad]: 1.224e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 8.77999e-06 [cse]: 1.84e-05 [a_3]: 4.933e-05 [py_interpret_to_execute_after_opt_a]: 8.96998e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 4.197e-05 [convert_after_rewriter]: 8.22998e-06 [order_py_execute_after_rewriter]: 5.92001e-06 [mutable_eliminate]: 0.00045828 [opt_b]: 0.00025663, [1] [Cycle 1]: 0.00025007, [7] [b_1]: 0.00016267 [b_2]: 1.436e-05 [updatestate_depend_eliminate]: 6.66999e-06 [updatestate_assign_eliminate]: 3.73001e-06 [updatestate_loads_eliminate]: 3.73999e-06 [renormalize]: 4.50003e-07 [cse]: 2.331e-05 [optimize_parallel_all_gather_comm]: 1.804e-05 [overlap_param_gather]: 2.44001e-06 [cconv]: 1.915e-05 [loop_unroll]: 0.00041736 [opt_after_cconv]: 0.00011967, [1] [Cycle 1]: 0.00011408, [7] [c_1]: 4.224e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 6.48e-06 [updatestate_assign_eliminate]: 
3.66999e-06 [updatestate_loads_eliminate]: 3.23e-06 [cse]: 2.295e-05 [renormalize]: 2.59985e-07 [remove_dup_value]: 2.722e-05 [tuple_transform]: 9.237e-05, [1] [Cycle 1]: 8.786e-05, [4] [d_1]: 5.897e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 8.70001e-06 [partial_unused_args_eliminate]: 2.29999e-06 [add_recomputation]: 5.109e-05 [cse_after_recomputation]: 2.529e-05, [1] [Cycle 1]: 2.079e-05, [1] [cse]: 1.565e-05 [environ_conv]: 7.99002e-06 [swap_dp_allreduce_reducescatter]: 6.92002e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.528e-05 [grouped_pairwise_exchange_alltoall]: 1.39998e-06 [offloading_packed_experts]: 4.29002e-06 [overlap_recompute_and_grad_model_parallel]: 5.52001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 2.094e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 2.13002e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 8.868e-05, [1] [Cycle 1]: 8.466e-05, [6] [build]: 8.99e-06 [elim_shapecalc]: 1.157e-05 [elim_not_effective]: 1.627e-05 [opt_reshape]: 8.37998e-06 [fold_const_symbol]: 1.255e-05 
[renormalize]: 1.8999e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.29998e-06 [auto_monad_reorder]: 2.106e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00046157 [validate]: 3.949e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.00718565 [execute]: 6.38e-06 Sums bootstrap : 0.000497s : 1.66% type_inference : 0.010149s : 33.89% event_method : 0.000041s : 0.14% auto_monad : 0.000111s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.12% optimize.rewriter_before_opt_a : 0.000126s : 0.42% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000121s : 0.41% optimize.opt_a.loop_unroll : 0.000105s : 0.35% optimize.opt_a.a_1 : 0.003049s : 10.18% optimize.opt_a.with_stream_mark : 0.000042s : 0.14% optimize.opt_a.recompute_prepare : 0.000040s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000018s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000459s : 1.53% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.18% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000006s : 0.02% optimize.opt_a.shard_inline : 0.000032s : 0.11% optimize.opt_a.merge_send_recv : 0.000028s : 0.09% optimize.opt_a.auto_parallel : 0.000024s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% 
optimize.opt_a.merge_comm : 0.000018s : 0.06% optimize.opt_a.allreduce_fusion : 0.000017s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000038s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000035s : 0.12% optimize.opt_a.virtual_dataset : 0.000031s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.10% optimize.opt_a.merge_forward : 0.000017s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000033s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000058s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000053s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.06% optimize.opt_a.meta_fg_expand : 0.001402s : 4.68% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000086s : 0.29% optimize.opt_a.a_after_grad : 0.000105s : 0.35% optimize.opt_a.renormalize : 0.002763s : 9.23% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000075s : 0.25% optimize.opt_a.cse : 0.000200s : 0.67% optimize.opt_a.a_3 : 0.000434s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000042s : 0.14% optimize.convert_after_rewriter : 0.000008s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.53% optimize.opt_b.b_1 : 0.000163s : 0.54% optimize.opt_b.b_2 : 0.000014s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000023s : 0.08% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000417s : 1.39% optimize.opt_after_cconv.c_1 : 0.000042s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.cse : 0.000023s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000027s : 0.09% optimize.tuple_transform.d_1 : 0.000059s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.17% optimize.cse_after_recomputation.cse : 0.000016s : 0.05% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000015s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000021s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000021s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000462s : 1.54% validate : 0.000039s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.007186s : 24.00% execute : 0.000006s : 0.02% Time group info: ------[substitution.] 
0.000730 209 5.68% : 0.000041s : 11: substitution.arithmetic_simplify 0.32% : 0.000002s : 4: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 4: substitution.fold_const_symbol 0.91% : 0.000007s : 7: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 57.60% : 0.000421s : 16: substitution.inline 2.11% : 0.000015s : 2: substitution.inline_without_move 1.34% : 0.000010s : 18: substitution.j_node_and_user_rematch 2.05% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.88% : 0.000014s : 18: substitution.remove_not_recompute_node 3.31% : 0.000024s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.34% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010083 2 87.48% : 0.008821s : 1: type_inference.infer 12.52% : 0.001262s : 1: type_inference.specialize ------[replace.] 0.000197 30 59.03% : 0.000116s : 16: replace.inline 40.97% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 30 93.25% : 0.000412s : 16: match.inline 6.75% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000700 5429 1.10% : 0.000008s : 65: predicate.accumulaten_eliminater 0.26% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.10% : 0.000008s : 65: predicate.addn_zero_filter 1.08% : 0.000008s : 65: predicate.adjust_all_reduce_mul_add 2.06% : 0.000014s : 95: predicate.arithmetic_simplify 1.15% : 0.000008s : 65: predicate.cast_eliminate 1.17% : 0.000008s : 65: predicate.check_bprop_eliminate 0.52% : 0.000004s : 30: predicate.compare_switch_simplify 0.09% : 0.000001s : 7: predicate.const_output_eliminate 0.51% : 0.000004s : 30: predicate.depend_value_elim 1.21% : 0.000008s : 65: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 65: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 14: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 7: predicate.elim_not_effective 0.15% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000008s : 72: predicate.environ_add_const_eliminate 1.20% : 0.000008s : 72: predicate.environ_get_add_eliminate 1.20% : 0.000008s : 72: predicate.environ_get_depend_swap 1.75% : 0.000012s : 102: predicate.environ_get_eliminate 1.20% : 0.000008s : 72: predicate.environ_get_set_eliminate 1.73% : 0.000012s : 95: predicate.exchange_switch_depend_value 2.31% : 0.000016s : 95: predicate.float_depend_g_call 0.50% : 0.000004s : 30: predicate.float_environ_get_switch 0.66% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.55% : 0.000004s : 30: predicate.get_grad_eliminate 0.09% : 0.000001s : 7: predicate.graph_param_transform 0.52% : 0.000004s : 30: predicate.incorporate_call 0.49% : 0.000003s : 30: predicate.incorporate_call_switch 5.61% : 0.000039s : 234: predicate.inline 1.30% : 0.000009s : 53: predicate.inline_without_move 0.29% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 30: predicate.less_batch_normalization 1.61% 
: 0.000011s : 93: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 158: predicate.load_eliminater 0.33% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.24% : 0.000016s : 126: predicate.loop_unroll_before_grad 1.36% : 0.000009s : 79: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 30: predicate.merge_addn 1.14% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 65: predicate.minmaximum_grad 0.33% : 0.000002s : 7: predicate.mutable_eliminate 0.14% : 0.000001s : 7: predicate.opt_reshape 0.15% : 0.000001s : 7: predicate.parallel_virtual_node 2.03% : 0.000014s : 95: predicate.partial_defer_inline 1.73% : 0.000012s : 86: predicate.partial_eliminate 1.07% : 0.000008s : 65: predicate.print_const_string_wrapper 0.53% : 0.000004s : 30: predicate.reduce_all_const_elim 1.34% : 0.000009s : 65: predicate.reduce_eliminate 2.67% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 30: predicate.remove_not_recompute_node 1.90% : 0.000013s : 144: predicate.replace_applicator 0.65% : 0.000005s : 53: predicate.replace_old_param 0.09% : 0.000001s : 7: predicate.reset_defer_inline 1.10% : 0.000008s : 65: predicate.reshape_eliminate 1.15% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 7: predicate.row_tensor_eliminate 1.29% : 0.000009s : 65: predicate.same_eliminate 0.38% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 30: predicate.shard_identity_eliminate 0.30% : 0.000002s : 14: predicate.special_op_eliminate 0.62% : 0.000004s : 30: predicate.specialize_transform 1.25% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.84% : 0.000013s : 95: predicate.switch_defer_inline 2.95% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.97% : 0.000035s : 258: 
predicate.switch_simplify 1.08% : 0.000008s : 65: predicate.tile_eliminate 1.11% : 0.000008s : 65: predicate.transpose_eliminate 1.45% : 0.000010s : 79: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 79: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000009s : 79: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000019s : 123: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 79: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000014s : 109: predicate.tuple_list_set_item_eliminator 1.67% : 0.000012s : 93: predicate.tuple_to_list_eliminator_ 2.66% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.26% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 7: predicate.value_based_eliminate 0.55% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 30: predicate.virtual_output_eliminate 0.14% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001417 32 57.66% : 0.000817s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.34% : 0.000600s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.057192 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.23% : 0.002992s : 1: add_attr 5.22% : 0.002983s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000118s : 1: auto_monad 0.04% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.93% : 0.000531s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000018s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000028s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000016s : 1: opt.transform.mutable_eliminate 8.05% : 0.004607s : 117: opt.transform.opt_a 0.07% : 0.000041s : 1: opt.transform.opt_after_cconv 0.05% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000153s : 28: opt.transform.opt_b 0.11% : 0.000066s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000046s : 4: opt.transform.symbol_engine_opt 17.95% : 0.010268s : 1: opt_a 0.22% : 0.000123s : 1: opt_after_cconv 0.82% : 0.000471s : 1: opt_after_jit_grad 0.45% : 0.000260s : 1: opt_b 21.67% : 0.012396s : 1: optimize 0.04% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000031s : 1: remove_dup_value 2.53% : 0.001445s : 2: renormalize.infer 2.28% : 0.001304s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000046s : 1: rewriter_after_opt_a 0.23% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000091s : 1: symbol_engine_optimizer 12.58% : 0.007195s : 1: task_emit 0.17% : 0.000095s : 1: 
tuple_transform 17.77% : 0.010163s : 1: type_inference 0.12% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x3-kbk],max_mem:42.0M . TotalTime = 0.745332, [24] [bootstrap]: 0.00053397 [type_inference]: 0.00595251 [event_method]: 1.386e-05 [auto_monad]: 5.463e-05 [graph_reusing]: 5.02e-06 [inline]: 1.79e-06 [add_attr]: 0.00342057, [1] [add_attr_with_inline]: 0.00340997, [1] [Cycle 1]: 4.465e-05, [2] [tag_attr]: 1.54e-05 [meta_addattr_fg_expand]: 4.30999e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.731e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00397468, [53] [py_interpret_to_execute]: 2.086e-05 [rewriter_before_opt_a]: 5.734e-05 [opt_a]: 0.00213895, [2] [Cycle 1]: 0.00154702, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 3.3e-05 [loop_unroll]: 2.061e-05 [a_1]: 0.00044577 [with_stream_mark]: 1.304e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 3.66999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.66998e-06 [a_2]: 7.537e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 1.93002e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.07001e-06 [merge_send_recv]: 7.82998e-06 [auto_parallel]: 5.34e-06 [parallel]: 2.248e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 9.41e-06 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 6.24999e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.61003e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 9.19998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 
[merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 9.31998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.33002e-06 [after_resolve]: 1.005e-05 [a_after_grad]: 8.23001e-06 [renormalize]: 0.00040167 [add_forward_monad_depend]: 4.85001e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.37e-05 [cse]: 2.606e-05 [a_3]: 4.117e-05 [Cycle 2]: 0.00058254, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.83998e-06 [loop_unroll]: 5.34998e-06 [a_1]: 0.00012526 [with_stream_mark]: 9.57999e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.41998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.636e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.05e-06 [auto_parallel]: 5.02999e-06 [parallel]: 4.03001e-06 [flash_sp]: 2.88998e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.98002e-06 [virtual_dataset]: 5.49998e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 5.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.01001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 6.50005e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.01002e-06 [a_after_grad]: 7.88001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.24e-05 [a_3]: 3.144e-05 [py_interpret_to_execute_after_opt_a]: 7.92998e-06 
[slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.112e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00045135 [opt_b]: 0.00018367, [1] [Cycle 1]: 0.00017752, [7] [b_1]: 0.00010569 [b_2]: 1.061e-05 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 3.80009e-07 [cse]: 1.676e-05 [optimize_parallel_all_gather_comm]: 1.63e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.156e-05 [loop_unroll]: 0.00041426 [opt_after_cconv]: 9.486e-05, [1] [Cycle 1]: 8.902e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.643e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 6.851e-05, [1] [Cycle 1]: 6.416e-05, [4] [d_1]: 3.877e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.10002e-06 [partial_unused_args_eliminate]: 1.53002e-06 [add_recomputation]: 4.894e-05 [cse_after_recomputation]: 2.022e-05, [1] [Cycle 1]: 1.599e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.37998e-06 [swap_dp_allreduce_reducescatter]: 4.98001e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.50001e-06 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.89999e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 1.94e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 
[control_data_broadcast_order]: 1.131e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.54002e-06 [overlap_recompute_and_grad_model_parallel]: 4.22998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.61001e-06 [overlap_grad_flash_sp]: 1.647e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.801e-05, [1] [Cycle 1]: 6.394e-05, [6] [build]: 2.18002e-06 [elim_shapecalc]: 8.31002e-06 [elim_not_effective]: 1.129e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.523e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.27002e-06 [opt_after_jit_grad]: 0.00044867 [validate]: 3.11e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.730619 [execute]: 8.81002e-06 Sums bootstrap : 0.000534s : 0.07% type_inference : 0.005953s : 0.80% event_method : 0.000014s : 0.00% auto_monad : 0.000055s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000057s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.01% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000571s : 0.08% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000010s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000402s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000038s : 0.01% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.06% optimize.opt_b.b_1 : 0.000106s : 0.01% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000414s : 0.06% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.06% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.730619s : 98.61% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000160 30 15.04% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.38% : 0.000005s : 4: substitution.graph_param_transform 66.36% : 0.000106s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.44% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005907 2 90.93% : 0.005371s : 1: type_inference.infer 9.07% : 0.000536s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.91% : 0.000026s : 3: replace.inline 31.09% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.59% : 0.000104s : 3: match.inline 8.41% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.94% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.30% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.01% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.68% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.07% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.30% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000331 8 46.99% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.01% : 0.000175s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.754216 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.45% : 0.003425s : 1: add_attr 0.45% : 0.003414s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000060s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.08% : 0.000572s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.06% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000460s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.000934s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000092s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.28% : 0.002142s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.06% : 0.000458s : 1: opt_after_jit_grad 0.02% : 0.000187s : 1: opt_b 0.53% : 0.003978s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000006s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000015s : 1: remove_dup_value 0.03% : 0.000207s : 1: renormalize.infer 0.02% : 0.000188s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000071s : 1: symbol_engine_optimizer 96.87% : 0.730641s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.79% : 0.005966s : 1: type_inference 0.01% : 0.000051s : 1: validate TotalTime = 0.055254, [24] [bootstrap]: 0.00047827 [type_inference]: 0.00435185 [event_method]: 1.159e-05 [auto_monad]: 5.179e-05 [graph_reusing]: 5.05001e-06 [inline]: 2.24999e-06 [add_attr]: 0.00297842, [1] [add_attr_with_inline]: 0.00297054, [1] [Cycle 1]: 4.486e-05, [2] [tag_attr]: 1.125e-05 [meta_addattr_fg_expand]: 2.96001e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 2.131e-05 [insert-virtual-dataset]: 3.13998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00367269, [53] [py_interpret_to_execute]: 1.545e-05 [rewriter_before_opt_a]: 3.793e-05 [opt_a]: 0.0018865, [2] [Cycle 1]: 0.00124679, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 2.466e-05 [loop_unroll]: 1.354e-05 [a_1]: 0.00028915 [with_stream_mark]: 1.327e-05 [recompute_prepare]: 7.71001e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.61e-05 [accelerated_algorithm]: 6.66999e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 7.95e-06 [auto_parallel]: 5.44e-06 [parallel]: 1.747e-05 [flash_sp]: 7.54002e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 8.50001e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 5.78002e-06 [get_grad_eliminate_]: 6.25002e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.43002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.12e-05 [merge_recompute_call_nodes]: 1.70001e-06 [before_grad]: 9.44998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.37999e-06 
[after_resolve]: 1.087e-05 [a_after_grad]: 9.12999e-06 [renormalize]: 0.00034184 [add_forward_monad_depend]: 4.12e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.267e-05 [cse]: 2.637e-05 [a_3]: 3.964e-05 [Cycle 2]: 0.00063015, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.88998e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012285 [with_stream_mark]: 9.20001e-06 [recompute_prepare]: 5.56998e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.667e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.29002e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.17e-06 [flash_sp]: 2.84001e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.81999e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.55997e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.65002e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.99002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.51e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.28002e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.294e-05 [a_3]: 3.154e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.002e-05 [convert_after_rewriter]: 7.08e-06 [order_py_execute_after_rewriter]: 4.94998e-06 [mutable_eliminate]: 0.0004474 [opt_b]: 0.00017766, [1] [Cycle 1]: 0.00017171, [7] 
[b_1]: 0.00010553 [b_2]: 6.92002e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 3.19997e-07 [cse]: 1.566e-05 [optimize_parallel_all_gather_comm]: 1.547e-05 [overlap_param_gather]: 2.28002e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.00040849 [opt_after_cconv]: 9.299e-05, [1] [Cycle 1]: 8.75e-05, [7] [c_1]: 2.664e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.63e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.241e-05 [tuple_transform]: 6.834e-05, [1] [Cycle 1]: 6.405e-05, [4] [d_1]: 3.873e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.278e-05 [cse_after_recomputation]: 2.008e-05, [1] [Cycle 1]: 1.565e-05, [1] [cse]: 1.048e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.43002e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.40999e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 1.30001e-06 [interleave_split_concat_branches]: 1.26997e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.95001e-06 [control_data_broadcast_order]: 1.129e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.24998e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 3.69002e-06 [overlap_grad_flash_sp]: 1.688e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.672e-05, [1] [Cycle 1]: 6.23e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 7.87e-06 [elim_not_effective]: 1.133e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.52e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.66002e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.598e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.0004454 [validate]: 3.182e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0429597 [execute]: 8.40001e-06 Sums bootstrap : 0.000478s : 0.93% type_inference : 0.004352s : 8.49% event_method : 0.000012s : 0.02% auto_monad : 0.000052s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000021s : 0.04% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000038s : 0.07% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.04% optimize.opt_a.a_1 : 0.000412s : 0.80% optimize.opt_a.with_stream_mark : 0.000022s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000342s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.04% optimize.opt_a.cse : 0.000039s : 0.08% optimize.opt_a.a_3 : 0.000071s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.87% optimize.opt_b.b_1 : 0.000106s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000408s : 0.80% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.08% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000445s : 0.87% validate : 0.000032s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042960s : 83.79% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 0.000121 26 17.57% : 0.000021s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.13% : 0.000005s : 4: substitution.graph_param_transform 66.26% : 0.000080s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000004s : 4: substitution.remove_not_recompute_node 3.60% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004308 2 91.91% : 0.003960s : 1: type_inference.infer 8.09% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.96% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.83% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.90% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.24% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.72% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000254 6 42.56% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.44% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063160 196 0.01% : 0.000003s : 1: ForceFp32Comm 4.72% : 0.002983s : 1: add_attr 4.71% : 0.002974s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000058s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.81% : 0.000513s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000018s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.66% : 0.000417s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.72% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.21% : 0.000763s : 78: opt.transform.opt_a 0.04% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000088s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.99% : 0.001889s : 1: opt_a 0.15% : 0.000096s : 1: opt_after_cconv 0.72% : 0.000455s : 1: opt_after_jit_grad 0.29% : 0.000181s : 1: opt_b 5.82% : 0.003676s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000025s : 1: pre_auto_parallel 0.03% : 0.000019s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000016s : 1: remove_dup_value 0.29% : 0.000182s : 1: renormalize.infer 0.24% : 0.000153s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000069s : 1: symbol_engine_optimizer 68.04% : 0.042976s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 6.92% : 0.004368s : 1: type_inference 0.08% : 0.000052s : 1: validate TotalTime = 0.056345, [24] [bootstrap]: 0.00046206 [type_inference]: 0.00558902 [event_method]: 1.365e-05 [auto_monad]: 5.376e-05 [graph_reusing]: 5.37001e-06 [inline]: 1.78002e-06 [add_attr]: 0.00294258, [1] [add_attr_with_inline]: 0.00293506, [1] [Cycle 1]: 4.353e-05, [2] [tag_attr]: 1.437e-05 [meta_addattr_fg_expand]: 3.91001e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 2.464e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00395889, [53] [py_interpret_to_execute]: 2.012e-05 [rewriter_before_opt_a]: 5.747e-05 [opt_a]: 0.00214162, [2] [Cycle 1]: 0.00154257, [45] [expand_dump_flag]: 2.65002e-06 [switch_simplify]: 3.215e-05 [loop_unroll]: 2.05e-05 [a_1]: 0.00049278 [with_stream_mark]: 1.322e-05 [recompute_prepare]: 7.41001e-06 [updatestate_depend_eliminate]: 3.60998e-06 [updatestate_assign_eliminate]: 3.58e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.82999e-06 [a_2]: 7.521e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 8.06001e-06 [auto_parallel]: 6.26e-06 [parallel]: 1.657e-05 [flash_sp]: 7.14001e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.48999e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.29001e-06 
[after_resolve]: 1.027e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00041347 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.347e-05 [cse]: 2.763e-05 [a_3]: 4.072e-05 [Cycle 2]: 0.00058982, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.70998e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.00012573 [with_stream_mark]: 9.13002e-06 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.759e-05 [accelerated_algorithm]: 5.58002e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.33002e-06 [merge_send_recv]: 4.39998e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.73001e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 4.89e-06 [allreduce_slice_to_reducescatter]: 2.19996e-07 [virtual_shard_identity]: 5.86e-06 [virtual_dataset]: 5.21998e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.40001e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 1.09003e-06 [receive_attached]: 1.05999e-06 [after_resolve]: 8.83001e-06 [a_after_grad]: 7.92e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.33002e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.04999e-06 [cse]: 1.312e-05 [a_3]: 3.102e-05 [py_interpret_to_execute_after_opt_a]: 7.56001e-06 [slice_cell_reuse_recomputed_activation]: 2.21003e-06 [rewriter_after_opt_a]: 3.153e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.10001e-06 [mutable_eliminate]: 0.00044725 [opt_b]: 0.00017772, [1] [Cycle 1]: 
0.00017157, [7] [b_1]: 0.0001057 [b_2]: 7.06999e-06 [updatestate_depend_eliminate]: 4.76002e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.60014e-07 [cse]: 1.541e-05 [optimize_parallel_all_gather_comm]: 1.54e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.14e-05 [loop_unroll]: 0.00041191 [opt_after_cconv]: 9.517e-05, [1] [Cycle 1]: 8.91e-05, [7] [c_1]: 2.727e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.18002e-06 [cse]: 1.603e-05 [renormalize]: 5.70028e-07 [remove_dup_value]: 1.246e-05 [tuple_transform]: 6.846e-05, [1] [Cycle 1]: 6.431e-05, [4] [d_1]: 3.917e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.00002e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.29e-05 [cse_after_recomputation]: 1.97e-05, [1] [Cycle 1]: 1.53e-05, [1] [cse]: 1.023e-05 [environ_conv]: 4.73001e-06 [swap_dp_allreduce_reducescatter]: 5.28002e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.43e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.51e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.151e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.13999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.793e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.38002e-06 [symbol_engine_optimizer]: 6.754e-05, [1] [Cycle 1]: 6.36e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.05e-06 [elim_not_effective]: 1.109e-05 [opt_reshape]: 6.03002e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.58002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.518e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00046414 [validate]: 3.208e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0425576 [execute]: 8.27e-06 Sums bootstrap : 0.000462s : 0.88% type_inference : 0.005589s : 10.66% event_method : 0.000014s : 0.03% auto_monad : 0.000054s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000025s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.04% optimize.rewriter_before_opt_a : 0.000057s : 0.11% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.07% optimize.opt_a.loop_unroll : 0.000026s : 0.05% optimize.opt_a.a_1 : 0.000619s : 1.18% optimize.opt_a.with_stream_mark : 0.000022s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000020s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000414s : 0.79% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000041s : 0.08% optimize.opt_a.a_3 : 0.000072s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.85% optimize.opt_b.b_1 : 0.000106s : 0.20% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.04% optimize.loop_unroll : 0.000412s : 0.79% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.08% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000464s : 0.89% validate : 0.000032s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042558s : 81.15% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 0.000161 30 15.59% : 0.000025s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.38% : 0.000005s : 4: substitution.graph_param_transform 65.49% : 0.000106s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.97% : 0.000005s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.70% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005549 2 88.88% : 0.004932s : 1: type_inference.infer 11.12% : 0.000617s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.92% : 0.000026s : 3: replace.inline 31.08% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 5 91.40% : 0.000104s : 3: match.inline 8.60% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000205 1131 0.68% : 0.000001s : 11: predicate.accumulaten_eliminater 0.70% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.42% : 0.000001s : 8: predicate.addn_check_dump 0.63% : 0.000001s : 11: predicate.addn_zero_filter 0.61% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.99% : 0.000004s : 19: predicate.arithmetic_simplify 0.70% : 0.000001s : 11: predicate.cast_eliminate 0.55% : 0.000001s : 8: predicate.check_bprop_eliminate 0.43% : 0.000001s : 8: predicate.compare_switch_simplify 0.20% : 0.000000s : 4: predicate.const_output_eliminate 0.44% : 0.000001s : 8: predicate.depend_value_elim 0.67% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.72% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000002s : 11: predicate.dict_set_item_eliminator 0.86% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.20% : 0.000000s : 4: predicate.elim_not_effective 0.30% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.87% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.85% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.83% : 0.000002s : 15: predicate.environ_get_depend_swap 1.36% : 0.000003s : 23: predicate.environ_get_eliminate 0.83% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.97% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.69% : 0.000003s : 16: predicate.float_depend_g_call 0.44% : 0.000001s : 8: predicate.float_environ_get_switch 0.70% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.17% : 0.000000s : 4: predicate.fold_const_symbol 0.56% : 0.000001s : 8: predicate.get_grad_eliminate 0.19% : 0.000000s : 4: predicate.graph_param_transform 0.55% : 0.000001s : 8: predicate.incorporate_call 0.42% : 0.000001s : 8: predicate.incorporate_call_switch 4.56% : 0.000009s : 51: predicate.inline 0.66% : 0.000001s : 8: predicate.inline_without_move 0.29% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.67% : 0.000001s : 8: predicate.less_batch_normalization 1.29% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.85% : 0.000004s : 32: predicate.load_eliminater 0.78% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.30% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.45% : 0.000001s : 8: predicate.merge_addn 0.49% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.52% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.60% : 0.000001s : 11: predicate.minmaximum_grad 0.81% : 0.000002s : 4: predicate.mutable_eliminate 0.27% : 0.000001s : 4: predicate.opt_reshape 0.31% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 16: predicate.partial_defer_inline 1.16% : 0.000002s : 17: predicate.partial_eliminate 0.66% : 0.000001s : 11: predicate.print_const_string_wrapper 0.50% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 1.81% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000001s : 8: predicate.remove_not_recompute_node 1.12% : 0.000002s : 21: predicate.replace_applicator 0.40% : 0.000001s : 8: predicate.replace_old_param 0.20% : 0.000000s : 4: predicate.reset_defer_inline 0.67% : 0.000001s : 11: predicate.reshape_eliminate 0.57% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.30% : 0.000001s : 4: predicate.row_tensor_eliminate 0.62% : 0.000001s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.61% : 0.000001s : 8: predicate.shard_identity_eliminate 0.63% : 0.000001s : 8: predicate.special_op_eliminate 0.62% : 0.000001s : 8: predicate.specialize_transform 0.78% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.65% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.30% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000002s : 16: predicate.switch_defer_inline 1.58% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.97% : 0.000008s : 54: predicate.switch_simplify 0.62% : 
0.000001s : 11: predicate.tile_eliminate 0.65% : 0.000001s : 11: predicate.transpose_eliminate 1.12% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.17% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.03% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.71% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.09% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.73% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.21% : 0.000002s : 21: predicate.tuple_to_list_eliminator_ 25.19% : 0.000052s : 32: predicate.updatestate_pure_node_eliminater 2.33% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.30% : 0.000001s : 4: predicate.value_based_eliminate 0.56% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.56% : 0.000001s : 8: predicate.virtual_output_eliminate 0.25% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.38% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000351 8 47.29% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.71% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064796 196 0.01% : 0.000003s : 1: ForceFp32Comm 4.55% : 0.002947s : 1: add_attr 4.53% : 0.002938s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000059s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.77% : 0.000499s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.65% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.70% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.51% : 0.000981s : 78: opt.transform.opt_a 0.04% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 3.31% : 0.002145s : 1: opt_a 0.15% : 0.000098s : 1: opt_after_cconv 0.73% : 0.000474s : 1: opt_after_jit_grad 0.28% : 0.000181s : 1: opt_b 6.12% : 0.003963s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.31% : 0.000204s : 1: renormalize.infer 0.31% : 0.000203s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.09% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000070s : 1: symbol_engine_optimizer 65.70% : 0.042573s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 8.65% : 0.005603s : 1: type_inference 0.08% : 0.000053s : 1: validate TotalTime = 0.0811488, [24] [bootstrap]: 0.0004855 [type_inference]: 0.0114208 [event_method]: 4.936e-05 [auto_monad]: 0.000118 [graph_reusing]: 7.81001e-06 [inline]: 2.25002e-06 [add_attr]: 0.0029979, [1] [add_attr_with_inline]: 0.0029896, [1] [Cycle 1]: 7.101e-05, [2] [tag_attr]: 3.484e-05 [meta_addattr_fg_expand]: 9.02999e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 5.077e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.26998e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0128358, [53] [py_interpret_to_execute]: 3.887e-05 [rewriter_before_opt_a]: 0.00014394 [opt_a]: 0.0106403, [3] [Cycle 1]: 0.00699156, [45] [expand_dump_flag]: 3.86001e-06 [switch_simplify]: 0.00011596 [loop_unroll]: 6.338e-05 [a_1]: 0.00143822 [with_stream_mark]: 2.275e-05 [recompute_prepare]: 2.176e-05 [updatestate_depend_eliminate]: 9.09998e-06 [updatestate_assign_eliminate]: 8.17e-06 [updatestate_loads_eliminate]: 7.48999e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 0.00024581 [accelerated_algorithm]: 3.193e-05 [shard]: 1.82999e-06 [meta_shard_fg_expand]: 3.09999e-06 [shard_inline]: 1.605e-05 [merge_send_recv]: 1.598e-05 [auto_parallel]: 1.088e-05 [parallel]: 1.841e-05 [flash_sp]: 1.158e-05 [merge_comm]: 9.94999e-06 [allreduce_fusion]: 9.61e-06 [matmul_add_comm_reduction]: 2.532e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.801e-05 [virtual_dataset]: 1.567e-05 [get_grad_eliminate_]: 1.535e-05 [virtual_output]: 1.549e-05 [merge_forward]: 9.08002e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.758e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.873e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.748e-05 [set_forward_comm_id_for_comm_node_pass]: 9.47001e-06 [meta_fg_expand]: 0.0014039 [flash_sp_send_recv_attached]: 3.91999e-06 [receive_attached]: 2.19999e-06 
[after_resolve]: 5.97e-05 [a_after_grad]: 8.017e-05 [renormalize]: 0.00235275 [add_forward_monad_depend]: 9.35001e-06 [auto_monad_grad]: 5.47001e-06 [auto_monad_eliminator]: 5.557e-05 [cse]: 0.00016257 [a_3]: 0.00032544 [Cycle 2]: 0.00284119, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 4.577e-05 [loop_unroll]: 4.303e-05 [a_1]: 0.00151956 [with_stream_mark]: 1.184e-05 [recompute_prepare]: 9.77001e-06 [updatestate_depend_eliminate]: 4.32998e-06 [updatestate_assign_eliminate]: 3.82998e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 0.00010874 [accelerated_algorithm]: 1.067e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 8.04002e-06 [merge_send_recv]: 5.94999e-06 [auto_parallel]: 6.51e-06 [parallel]: 5.35999e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 5.42001e-06 [allreduce_fusion]: 4.55999e-06 [matmul_add_comm_reduction]: 7.18e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 8.88002e-06 [virtual_dataset]: 7.68999e-06 [get_grad_eliminate_]: 7.53e-06 [virtual_output]: 7.41001e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 8.40024e-07 [offload_activation]: 7.88999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.451e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.213e-05 [set_forward_comm_id_for_comm_node_pass]: 4.55001e-06 [meta_fg_expand]: 7.001e-05 [flash_sp_send_recv_attached]: 1.01002e-06 [receive_attached]: 1.10001e-06 [after_resolve]: 1.453e-05 [a_after_grad]: 1.285e-05 [renormalize]: 0.00051723 [add_forward_monad_depend]: 4e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 1.268e-05 [cse]: 2.197e-05 [a_3]: 5.704e-05 [Cycle 3]: 0.0007935, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 9.47001e-06 [loop_unroll]: 7.92998e-06 [a_1]: 0.00021288 [with_stream_mark]: 9.04e-06 [recompute_prepare]: 8.03999e-06 [updatestate_depend_eliminate]: 4.04002e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 
3.36999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 0.00010637 [accelerated_algorithm]: 1.063e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.44998e-06 [shard_inline]: 8.03999e-06 [merge_send_recv]: 5.77999e-06 [auto_parallel]: 6.33e-06 [parallel]: 4.80001e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 4.17e-06 [allreduce_fusion]: 3.92002e-06 [matmul_add_comm_reduction]: 6.61e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 8.69e-06 [virtual_dataset]: 7.48999e-06 [get_grad_eliminate_]: 7.46999e-06 [virtual_output]: 7.18e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 7.61001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.361e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.232e-05 [set_forward_comm_id_for_comm_node_pass]: 4.3e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.01997e-06 [after_resolve]: 1.236e-05 [a_after_grad]: 1.212e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 8.86997e-06 [cse]: 1.788e-05 [a_3]: 4.909e-05 [py_interpret_to_execute_after_opt_a]: 9.34998e-06 [slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 4.123e-05 [convert_after_rewriter]: 7.93999e-06 [order_py_execute_after_rewriter]: 6.07999e-06 [mutable_eliminate]: 0.00045546 [opt_b]: 0.0002608, [1] [Cycle 1]: 0.0002547, [7] [b_1]: 0.00017157 [b_2]: 9.57999e-06 [updatestate_depend_eliminate]: 6.85998e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.16999e-06 [renormalize]: 3.10014e-07 [cse]: 2.437e-05 [optimize_parallel_all_gather_comm]: 1.874e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 1.902e-05 [loop_unroll]: 0.00042102 [opt_after_cconv]: 0.00016163, [1] [Cycle 1]: 0.00011289, [7] [c_1]: 4.182e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 6.44001e-06 [updatestate_assign_eliminate]: 3.56001e-06 
[updatestate_loads_eliminate]: 3.16001e-06 [cse]: 2.293e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 2.725e-05 [tuple_transform]: 9.246e-05, [1] [Cycle 1]: 8.772e-05, [4] [d_1]: 5.859e-05 [none_parameter_eliminate]: 1.66002e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 8.72e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 5.281e-05 [cse_after_recomputation]: 2.654e-05, [1] [Cycle 1]: 2.217e-05, [1] [cse]: 1.707e-05 [environ_conv]: 7.75e-06 [swap_dp_allreduce_reducescatter]: 6.93e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.94999e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.464e-05 [grouped_pairwise_exchange_alltoall]: 1.97001e-06 [offloading_packed_experts]: 4.29997e-06 [overlap_recompute_and_grad_model_parallel]: 5.19998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.43999e-06 [overlap_grad_flash_sp]: 2.132e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 8.883e-05, [1] [Cycle 1]: 8.455e-05, [6] [build]: 9.89999e-06 [elim_shapecalc]: 1.131e-05 [elim_not_effective]: 1.532e-05 [opt_reshape]: 8.32e-06 [fold_const_symbol]: 1.266e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 2.206e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00046563 [validate]: 4.126e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.052421 [execute]: 7.70998e-06 Sums bootstrap : 0.000486s : 0.63% type_inference : 0.011421s : 14.86% event_method : 0.000049s : 0.06% auto_monad : 0.000118s : 0.15% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.07% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.05% optimize.rewriter_before_opt_a : 0.000144s : 0.19% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000171s : 0.22% optimize.opt_a.loop_unroll : 0.000114s : 0.15% optimize.opt_a.a_1 : 0.003171s : 4.12% optimize.opt_a.with_stream_mark : 0.000044s : 0.06% optimize.opt_a.recompute_prepare : 0.000040s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.02% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000461s : 0.60% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.07% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000006s : 0.01% optimize.opt_a.shard_inline : 0.000032s : 0.04% optimize.opt_a.merge_send_recv : 0.000028s : 0.04% optimize.opt_a.auto_parallel : 0.000024s : 0.03% optimize.opt_a.parallel : 0.000029s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.03% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.05% optimize.opt_a.virtual_dataset : 0.000031s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000030s : 0.04% optimize.opt_a.virtual_output : 0.000030s : 0.04% optimize.opt_a.merge_forward : 0.000017s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000033s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000018s : 0.02% optimize.opt_a.meta_fg_expand : 0.001476s : 1.92% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.11% optimize.opt_a.a_after_grad : 0.000105s : 0.14% optimize.opt_a.renormalize : 0.002870s : 3.73% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.02% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000077s : 0.10% optimize.opt_a.cse : 0.000202s : 0.26% optimize.opt_a.a_3 : 0.000432s : 0.56% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000455s : 0.59% optimize.opt_b.b_1 : 0.000172s : 0.22% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000024s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000421s : 0.55% optimize.opt_after_cconv.c_1 : 0.000042s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000023s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000027s : 0.04% optimize.tuple_transform.d_1 : 0.000059s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.07% optimize.cse_after_recomputation.cse : 0.000017s : 0.02% optimize.environ_conv : 0.000008s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000008s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000022s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000466s : 0.61% validate : 0.000041s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.052421s : 68.19% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000740 213 6.03% : 0.000045s : 12: substitution.arithmetic_simplify 0.31% : 0.000002s : 4: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 4: substitution.fold_const_symbol 0.90% : 0.000007s : 7: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 57.08% : 0.000422s : 17: substitution.inline 2.05% : 0.000015s : 2: substitution.inline_without_move 1.27% : 0.000009s : 18: substitution.j_node_and_user_rematch 2.15% : 0.000016s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.64% : 0.000012s : 18: substitution.remove_not_recompute_node 3.11% : 0.000023s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.95% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011346 2 86.41% : 0.009804s : 1: type_inference.infer 13.59% : 0.001542s : 1: type_inference.specialize ------[replace.] 0.000217 33 57.00% : 0.000124s : 17: replace.inline 43.00% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000447 33 92.51% : 0.000414s : 17: match.inline 7.49% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000716 5530 1.09% : 0.000008s : 66: predicate.accumulaten_eliminater 0.27% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 30: predicate.addn_check_dump 1.07% : 0.000008s : 66: predicate.addn_zero_filter 1.06% : 0.000008s : 66: predicate.adjust_all_reduce_mul_add 2.01% : 0.000014s : 96: predicate.arithmetic_simplify 1.13% : 0.000008s : 66: predicate.cast_eliminate 1.13% : 0.000008s : 65: predicate.check_bprop_eliminate 0.51% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.51% : 0.000004s : 30: predicate.depend_value_elim 1.22% : 0.000009s : 66: predicate.dict_get_item_const_eliminator 1.24% : 0.000009s : 66: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 66: predicate.dict_set_item_eliminator 0.34% : 0.000002s : 14: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 7: predicate.elim_not_effective 0.14% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 73: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 73: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 73: predicate.environ_get_depend_swap 1.76% : 0.000013s : 103: predicate.environ_get_eliminate 1.20% : 0.000009s : 73: predicate.environ_get_set_eliminate 1.77% : 0.000013s : 99: predicate.exchange_switch_depend_value 2.37% : 0.000017s : 99: predicate.float_depend_g_call 0.49% : 0.000004s : 30: predicate.float_environ_get_switch 0.64% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.54% : 0.000004s : 30: predicate.get_grad_eliminate 0.09% : 0.000001s : 7: predicate.graph_param_transform 0.54% : 0.000004s : 30: predicate.incorporate_call 0.47% : 0.000003s : 30: predicate.incorporate_call_switch 5.62% : 0.000040s : 239: predicate.inline 1.28% : 0.000009s : 53: predicate.inline_without_move 0.30% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 30: predicate.less_batch_normalization 1.61% 
: 0.000012s : 96: predicate.list_to_tuple_eliminator_ 2.67% : 0.000019s : 162: predicate.load_eliminater 0.31% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.32% : 0.000017s : 134: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 80: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 30: predicate.merge_addn 1.09% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 66: predicate.minmaximum_grad 0.32% : 0.000002s : 7: predicate.mutable_eliminate 0.13% : 0.000001s : 7: predicate.opt_reshape 0.14% : 0.000001s : 7: predicate.parallel_virtual_node 2.04% : 0.000015s : 99: predicate.partial_defer_inline 1.75% : 0.000012s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 66: predicate.print_const_string_wrapper 0.52% : 0.000004s : 30: predicate.reduce_all_const_elim 1.28% : 0.000009s : 66: predicate.reduce_eliminate 2.71% : 0.000019s : 162: predicate.redundant_stop_gradient_eliminater 0.41% : 0.000003s : 30: predicate.remove_not_recompute_node 1.90% : 0.000014s : 147: predicate.replace_applicator 0.60% : 0.000004s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.10% : 0.000008s : 66: predicate.reshape_eliminate 1.12% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 7: predicate.row_tensor_eliminate 1.22% : 0.000009s : 65: predicate.same_eliminate 0.38% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 30: predicate.shard_identity_eliminate 0.27% : 0.000002s : 14: predicate.special_op_eliminate 0.64% : 0.000005s : 30: predicate.specialize_transform 1.27% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.88% : 0.000013s : 99: predicate.switch_defer_inline 3.00% : 0.000021s : 164: predicate.switch_layer_defer_inline 5.11% : 0.000037s : 270: 
predicate.switch_simplify 1.07% : 0.000008s : 66: predicate.tile_eliminate 1.11% : 0.000008s : 66: predicate.transpose_eliminate 1.47% : 0.000011s : 80: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 80: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000009s : 80: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000021s : 126: predicate.tuple_list_get_item_eliminator 1.44% : 0.000010s : 80: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000014s : 110: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 96: predicate.tuple_to_list_eliminator_ 2.67% : 0.000019s : 162: predicate.updatestate_pure_node_eliminater 3.26% : 0.000023s : 192: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 7: predicate.value_based_eliminate 0.54% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 30: predicate.virtual_output_eliminate 0.13% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001590 34 56.74% : 0.000902s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.26% : 0.000688s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.104913 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.86% : 0.003003s : 1: add_attr 2.85% : 0.002994s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000125s : 1: auto_monad 0.02% : 0.000026s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.50% : 0.000520s : 1: bootstrap 0.02% : 0.000022s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000018s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000029s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000056s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.44% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 4.56% : 0.004784s : 117: opt.transform.opt_a 0.04% : 0.000041s : 1: opt.transform.opt_after_cconv 0.03% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000154s : 28: opt.transform.opt_b 0.06% : 0.000065s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000044s : 4: opt.transform.symbol_engine_opt 10.14% : 0.010643s : 1: opt_a 0.16% : 0.000165s : 1: opt_after_cconv 0.45% : 0.000475s : 1: opt_after_jit_grad 0.25% : 0.000264s : 1: opt_b 12.24% : 0.012840s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.05% : 0.000056s : 1: pre_auto_parallel 0.04% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000031s : 1: remove_dup_value 1.40% : 0.001474s : 2: renormalize.infer 1.32% : 0.001383s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000045s : 1: rewriter_after_opt_a 0.14% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000091s : 1: symbol_engine_optimizer 49.98% : 0.052437s : 1: task_emit 0.09% : 0.000095s : 1: 
tuple_transform 10.90% : 0.011435s : 1: type_inference 0.06% : 0.000064s : 1: validate TotalTime = 0.0583669, [24] [bootstrap]: 0.00046694 [type_inference]: 0.00430466 [event_method]: 1.055e-05 [auto_monad]: 5.231e-05 [graph_reusing]: 5.15999e-06 [inline]: 1.82999e-06 [add_attr]: 0.00303736, [1] [add_attr_with_inline]: 0.00302932, [1] [Cycle 1]: 4.461e-05, [2] [tag_attr]: 1.204e-05 [meta_addattr_fg_expand]: 3.04001e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.112e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.82999e-06 [optimize]: 0.00368601, [53] [py_interpret_to_execute]: 1.532e-05 [rewriter_before_opt_a]: 3.824e-05 [opt_a]: 0.00185213, [2] [Cycle 1]: 0.00124966, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 2.385e-05 [loop_unroll]: 1.339e-05 [a_1]: 0.00029237 [with_stream_mark]: 1.365e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.96001e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.711e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 1.96998e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 5.64e-06 [parallel]: 1.772e-05 [flash_sp]: 7.23999e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.28998e-06 [matmul_add_comm_reduction]: 8.51002e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.26001e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.094e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.10001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37997e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.33998e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.01e-05 [a_after_grad]: 8.59e-06 [renormalize]: 0.00033975 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.92001e-06 [auto_monad_eliminator]: 1.314e-05 [cse]: 2.793e-05 [a_3]: 3.972e-05 [Cycle 2]: 0.00059318, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.72002e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012607 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.783e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.40001e-06 [parallel]: 4.57e-06 [flash_sp]: 3.73001e-06 [merge_comm]: 3.03998e-06 [allreduce_fusion]: 2.85002e-06 [matmul_add_comm_reduction]: 5.61e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.04999e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.75e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 8.3e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.291e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.08998e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.203e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00044935 [opt_b]: 0.00020975, [1] [Cycle 1]: 0.000204, [7] [b_1]: 0.00013595 [b_2]: 7.18e-06 
[updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 4.00003e-07 [cse]: 1.595e-05 [optimize_parallel_all_gather_comm]: 1.563e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.138e-05 [loop_unroll]: 0.0004165 [opt_after_cconv]: 9.391e-05, [1] [Cycle 1]: 8.851e-05, [7] [c_1]: 2.735e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.586e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.275e-05 [tuple_transform]: 6.853e-05, [1] [Cycle 1]: 6.408e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 4.323e-05 [cse_after_recomputation]: 2.027e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.115e-05 [environ_conv]: 5.07999e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.49002e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.15001e-06 [ForceFp32Comm]: 9.40025e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.216e-05 [grouped_pairwise_exchange_alltoall]: 1.82001e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 3.86001e-06 [overlap_grad_flash_sp]: 1.614e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.839e-05, [1] [Cycle 1]: 6.415e-05, [6] [build]: 2.43002e-06 [elim_shapecalc]: 8.55999e-06 [elim_not_effective]: 1.151e-05 [opt_reshape]: 5.85002e-06 [fold_const_symbol]: 8.62e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.62002e-06 [opt_after_jit_grad]: 0.0004499 [validate]: 3.131e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0460597 [execute]: 8.46002e-06 Sums bootstrap : 0.000467s : 0.86% type_inference : 0.004305s : 7.92% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000021s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000038s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.77% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000340s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.04% optimize.opt_a.cse : 0.000041s : 0.08% optimize.opt_a.a_3 : 0.000072s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.83% optimize.opt_b.b_1 : 0.000136s : 0.25% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.04% optimize.loop_unroll : 0.000416s : 0.77% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.08% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000450s : 0.83% validate : 0.000031s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.046060s : 84.71% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 0.000119 26 18.60% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.18% : 0.000005s : 4: substitution.graph_param_transform 65.71% : 0.000079s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.62% : 0.000004s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004264 2 91.74% : 0.003912s : 1: type_inference.infer 8.26% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.91% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.83% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 1.01% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 42.18% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.82% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.066381 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.58% : 0.003041s : 1: add_attr 4.57% : 0.003033s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000057s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.76% : 0.000503s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.64% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.69% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.16% : 0.000768s : 78: opt.transform.opt_a 0.04% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.18% : 0.000119s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.79% : 0.001855s : 1: opt_a 0.15% : 0.000097s : 1: opt_after_cconv 0.69% : 0.000459s : 1: opt_after_jit_grad 0.32% : 0.000213s : 1: opt_b 5.56% : 0.003690s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000025s : 1: pre_auto_parallel 0.03% : 0.000019s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.28% : 0.000184s : 1: renormalize.infer 0.23% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000071s : 1: symbol_engine_optimizer 69.41% : 0.046076s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 6.50% : 0.004318s : 1: type_inference 0.08% : 0.000052s : 1: validate TotalTime = 0.0808613, [24] [bootstrap]: 0.00049047 [type_inference]: 0.0102274 [event_method]: 4.355e-05 [auto_monad]: 0.00011417 [graph_reusing]: 7.57002e-06 [inline]: 2.09999e-06 [add_attr]: 0.00300184, [1] [add_attr_with_inline]: 0.00299392, [1] [Cycle 1]: 6.597e-05, [2] [tag_attr]: 3.119e-05 [meta_addattr_fg_expand]: 8.37e-06 [parallel-infer-symbol]: 2.61999e-06 [pre_auto_parallel]: 4.622e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.0125452, [53] [py_interpret_to_execute]: 3.675e-05 [rewriter_before_opt_a]: 0.00012711 [opt_a]: 0.0103964, [3] [Cycle 1]: 0.00680893, [45] [expand_dump_flag]: 3.53e-06 [switch_simplify]: 6.693e-05 [loop_unroll]: 5.562e-05 [a_1]: 0.00135837 [with_stream_mark]: 2.297e-05 [recompute_prepare]: 2.17e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 7.88001e-06 [updatestate_loads_eliminate]: 7.68001e-06 [parameter_eliminate]: 2.78e-06 [a_2]: 0.00024392 [accelerated_algorithm]: 3.046e-05 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 3.10998e-06 [shard_inline]: 1.59e-05 [merge_send_recv]: 1.54e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.775e-05 [flash_sp]: 1.189e-05 [merge_comm]: 9.86e-06 [allreduce_fusion]: 8.85999e-06 [matmul_add_comm_reduction]: 2.563e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.748e-05 [virtual_dataset]: 1.568e-05 [get_grad_eliminate_]: 1.555e-05 [virtual_output]: 1.513e-05 [merge_forward]: 9.10001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.703e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.893e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.746e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67999e-06 [meta_fg_expand]: 0.00137141 [flash_sp_send_recv_attached]: 3.79002e-06 [receive_attached]: 2.17001e-06 [after_resolve]: 
5.894e-05 [a_after_grad]: 8.05e-05 [renormalize]: 0.00234683 [add_forward_monad_depend]: 8.95999e-06 [auto_monad_grad]: 5.13002e-06 [auto_monad_eliminator]: 5.464e-05 [cse]: 0.00016463 [a_3]: 0.00032547 [Cycle 2]: 0.00277664, [45] [expand_dump_flag]: 1.44998e-06 [switch_simplify]: 4.596e-05 [loop_unroll]: 4.294e-05 [a_1]: 0.0014948 [with_stream_mark]: 1.139e-05 [recompute_prepare]: 9.69e-06 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 3.73999e-06 [updatestate_loads_eliminate]: 3.20998e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 0.00010946 [accelerated_algorithm]: 1.089e-05 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 8.21002e-06 [merge_send_recv]: 6.04999e-06 [auto_parallel]: 6.69999e-06 [parallel]: 5.29998e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 4.62998e-06 [allreduce_fusion]: 4e-06 [matmul_add_comm_reduction]: 7e-06 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 9.24e-06 [virtual_dataset]: 7.75998e-06 [get_grad_eliminate_]: 8.63001e-06 [virtual_output]: 7.63001e-06 [merge_forward]: 3.87002e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 7.99002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.447e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.194e-05 [set_forward_comm_id_for_comm_node_pass]: 4.60001e-06 [meta_fg_expand]: 3.309e-05 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.47999e-06 [after_resolve]: 1.423e-05 [a_after_grad]: 1.279e-05 [renormalize]: 0.0005168 [add_forward_monad_depend]: 4.06001e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.289e-05 [cse]: 2.184e-05 [a_3]: 5.628e-05 [Cycle 3]: 0.00079743, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 9.05999e-06 [loop_unroll]: 7.75998e-06 [a_1]: 0.00021267 [with_stream_mark]: 8.37e-06 [recompute_prepare]: 8.35999e-06 [updatestate_depend_eliminate]: 3.96001e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 3.5e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00010756 [accelerated_algorithm]: 1.075e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 8.10999e-06 [merge_send_recv]: 5.82999e-06 [auto_parallel]: 6.53e-06 [parallel]: 4.42003e-06 [flash_sp]: 1.02e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 4.11001e-06 [matmul_add_comm_reduction]: 6.52001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.10999e-06 [virtual_dataset]: 7.85e-06 [get_grad_eliminate_]: 7.61999e-06 [virtual_output]: 7.42998e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 7.56999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.399e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.226e-05 [set_forward_comm_id_for_comm_node_pass]: 4.55001e-06 [meta_fg_expand]: 2.48998e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.268e-05 [a_after_grad]: 1.239e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 9.12999e-06 [cse]: 1.951e-05 [a_3]: 4.92e-05 [py_interpret_to_execute_after_opt_a]: 9.17001e-06 [slice_cell_reuse_recomputed_activation]: 2.24001e-06 [rewriter_after_opt_a]: 4.127e-05 [convert_after_rewriter]: 8.38001e-06 [order_py_execute_after_rewriter]: 6.46e-06 [mutable_eliminate]: 0.00046064 [opt_b]: 0.0002596, [1] [Cycle 1]: 0.00025378, [7] [b_1]: 0.00017124 [b_2]: 9.66e-06 [updatestate_depend_eliminate]: 6.53e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.34001e-06 [renormalize]: 4.30009e-07 [cse]: 2.458e-05 [optimize_parallel_all_gather_comm]: 1.765e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 1.936e-05 [loop_unroll]: 0.00042513 [opt_after_cconv]: 0.00012133, [1] [Cycle 1]: 0.0001156, [7] [c_1]: 4.259e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 6.17999e-06 [updatestate_assign_eliminate]: 3.43999e-06 
[updatestate_loads_eliminate]: 3.41999e-06 [cse]: 2.389e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 2.586e-05 [tuple_transform]: 9.263e-05, [1] [Cycle 1]: 8.798e-05, [4] [d_1]: 5.896e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 8.62998e-06 [partial_unused_args_eliminate]: 1.51998e-06 [add_recomputation]: 5.287e-05 [cse_after_recomputation]: 2.652e-05, [1] [Cycle 1]: 2.197e-05, [1] [cse]: 1.671e-05 [environ_conv]: 7.45998e-06 [swap_dp_allreduce_reducescatter]: 6.51e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.17003e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.40024e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.45002e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.50999e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.431e-05 [grouped_pairwise_exchange_alltoall]: 1.38002e-06 [offloading_packed_experts]: 4.43001e-06 [overlap_recompute_and_grad_model_parallel]: 6.07999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.48002e-06 [overlap_grad_ring_attention]: 4.74e-06 [overlap_grad_flash_sp]: 2.09e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 9.09989e-07 [symbol_engine_optimizer]: 8.95e-05, [1] [Cycle 1]: 8.558e-05, [6] [build]: 9.43002e-06 [elim_shapecalc]: 1.145e-05 [elim_not_effective]: 1.608e-05 [opt_reshape]: 8.53001e-06 [fold_const_symbol]: 1.284e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.147e-05 [get_jit_bprop_graph]: 9.30013e-07 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00046258 [validate]: 4.163e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.053627 [execute]: 8.15e-06 Sums bootstrap : 0.000490s : 0.64% type_inference : 0.010227s : 13.35% event_method : 0.000044s : 0.06% auto_monad : 0.000114s : 0.15% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.05% optimize.rewriter_before_opt_a : 0.000127s : 0.17% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000122s : 0.16% optimize.opt_a.loop_unroll : 0.000106s : 0.14% optimize.opt_a.a_1 : 0.003066s : 4.00% optimize.opt_a.with_stream_mark : 0.000043s : 0.06% optimize.opt_a.recompute_prepare : 0.000040s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000015s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.02% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000461s : 0.60% optimize.opt_a.accelerated_algorithm : 0.000052s : 0.07% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000006s : 0.01% optimize.opt_a.shard_inline : 0.000032s : 0.04% optimize.opt_a.merge_send_recv : 0.000027s : 0.04% optimize.opt_a.auto_parallel : 0.000024s : 0.03% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000019s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000017s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000039s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000036s : 0.05% optimize.opt_a.virtual_dataset : 0.000031s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.04% optimize.opt_a.virtual_output : 0.000030s : 0.04% optimize.opt_a.merge_forward : 0.000017s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000033s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000057s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000052s : 0.07% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000019s : 0.02% optimize.opt_a.meta_fg_expand : 0.001407s : 1.84% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000086s : 0.11% optimize.opt_a.a_after_grad : 0.000106s : 0.14% optimize.opt_a.renormalize : 0.002864s : 3.74% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.02% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000077s : 0.10% optimize.opt_a.cse : 0.000206s : 0.27% optimize.opt_a.a_3 : 0.000431s : 0.56% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000041s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000461s : 0.60% optimize.opt_b.b_1 : 0.000171s : 0.22% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.03% optimize.loop_unroll : 0.000425s : 0.55% optimize.opt_after_cconv.c_1 : 0.000043s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.cse : 0.000024s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000026s : 0.03% optimize.tuple_transform.d_1 : 0.000059s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.07% optimize.cse_after_recomputation.cse : 0.000017s : 0.02% optimize.environ_conv : 0.000007s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000021s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000016s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000013s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000021s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000463s : 0.60% validate : 0.000042s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.053627s : 69.99% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000715 209 6.05% : 0.000043s : 11: substitution.arithmetic_simplify 0.33% : 0.000002s : 4: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.25% : 0.000002s : 4: substitution.fold_const_symbol 0.98% : 0.000007s : 7: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 56.12% : 0.000401s : 16: substitution.inline 2.16% : 0.000015s : 2: substitution.inline_without_move 1.33% : 0.000010s : 18: substitution.j_node_and_user_rematch 2.05% : 0.000015s : 3: substitution.less_batch_normalization 1.79% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000013s : 18: substitution.remove_not_recompute_node 3.26% : 0.000023s : 10: substitution.replace_applicator 1.52% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.89% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.60% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010159 2 87.06% : 0.008844s : 1: type_inference.infer 12.94% : 0.001315s : 1: type_inference.specialize ------[replace.] 0.000199 30 59.11% : 0.000118s : 16: replace.inline 40.89% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.86% : 0.000392s : 16: match.inline 7.14% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000703 5429 1.11% : 0.000008s : 65: predicate.accumulaten_eliminater 0.28% : 0.000002s : 7: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 30: predicate.addn_check_dump 1.08% : 0.000008s : 65: predicate.addn_zero_filter 1.06% : 0.000007s : 65: predicate.adjust_all_reduce_mul_add 2.12% : 0.000015s : 95: predicate.arithmetic_simplify 1.13% : 0.000008s : 65: predicate.cast_eliminate 1.14% : 0.000008s : 65: predicate.check_bprop_eliminate 0.52% : 0.000004s : 30: predicate.compare_switch_simplify 0.08% : 0.000001s : 7: predicate.const_output_eliminate 0.51% : 0.000004s : 30: predicate.depend_value_elim 1.21% : 0.000008s : 65: predicate.dict_get_item_const_eliminator 1.21% : 0.000008s : 65: predicate.dict_get_item_eliminator 1.17% : 0.000008s : 65: predicate.dict_set_item_eliminator 0.35% : 0.000002s : 14: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 7: predicate.elim_not_effective 0.16% : 0.000001s : 7: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 72: predicate.environ_add_const_eliminate 1.20% : 0.000008s : 72: predicate.environ_get_add_eliminate 1.18% : 0.000008s : 72: predicate.environ_get_depend_swap 1.76% : 0.000012s : 102: predicate.environ_get_eliminate 1.20% : 0.000008s : 72: predicate.environ_get_set_eliminate 1.74% : 0.000012s : 95: predicate.exchange_switch_depend_value 2.33% : 0.000016s : 95: predicate.float_depend_g_call 0.50% : 0.000004s : 30: predicate.float_environ_get_switch 0.68% : 0.000005s : 37: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 7: predicate.fold_const_symbol 0.56% : 0.000004s : 30: predicate.get_grad_eliminate 0.08% : 0.000001s : 7: predicate.graph_param_transform 0.55% : 0.000004s : 30: predicate.incorporate_call 0.49% : 0.000003s : 30: predicate.incorporate_call_switch 5.59% : 0.000039s : 234: predicate.inline 1.26% : 0.000009s : 53: predicate.inline_without_move 0.31% : 0.000002s : 30: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 30: predicate.less_batch_normalization 1.66% 
: 0.000012s : 93: predicate.list_to_tuple_eliminator_ 2.67% : 0.000019s : 158: predicate.load_eliminater 0.32% : 0.000002s : 7: predicate.loop_unroll_after_grad 2.27% : 0.000016s : 126: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 79: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 30: predicate.merge_addn 1.11% : 0.000008s : 65: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 65: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 65: predicate.minmaximum_grad 0.32% : 0.000002s : 7: predicate.mutable_eliminate 0.14% : 0.000001s : 7: predicate.opt_reshape 0.18% : 0.000001s : 7: predicate.parallel_virtual_node 2.03% : 0.000014s : 95: predicate.partial_defer_inline 1.73% : 0.000012s : 86: predicate.partial_eliminate 1.09% : 0.000008s : 65: predicate.print_const_string_wrapper 0.52% : 0.000004s : 30: predicate.reduce_all_const_elim 1.27% : 0.000009s : 65: predicate.reduce_eliminate 2.68% : 0.000019s : 158: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 30: predicate.remove_not_recompute_node 1.93% : 0.000014s : 144: predicate.replace_applicator 0.62% : 0.000004s : 53: predicate.replace_old_param 0.10% : 0.000001s : 7: predicate.reset_defer_inline 1.08% : 0.000008s : 65: predicate.reshape_eliminate 1.15% : 0.000008s : 65: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 7: predicate.row_tensor_eliminate 1.26% : 0.000009s : 65: predicate.same_eliminate 0.40% : 0.000003s : 30: predicate.set_cell_output_no_recompute 0.64% : 0.000004s : 30: predicate.shard_identity_eliminate 0.27% : 0.000002s : 14: predicate.special_op_eliminate 0.64% : 0.000004s : 30: predicate.specialize_transform 1.21% : 0.000009s : 65: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 53: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 7: predicate.switch_call_monad_eliminater 1.87% : 0.000013s : 95: predicate.switch_defer_inline 2.96% : 0.000021s : 160: predicate.switch_layer_defer_inline 4.98% : 0.000035s : 258: 
predicate.switch_simplify 1.08% : 0.000008s : 65: predicate.tile_eliminate 1.07% : 0.000008s : 65: predicate.transpose_eliminate 1.44% : 0.000010s : 79: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 79: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000009s : 79: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000020s : 123: predicate.tuple_list_get_item_eliminator 1.40% : 0.000010s : 79: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000015s : 109: predicate.tuple_list_set_item_eliminator 1.61% : 0.000011s : 93: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 158: predicate.updatestate_pure_node_eliminater 3.28% : 0.000023s : 188: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 7: predicate.value_based_eliminate 0.57% : 0.000004s : 30: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 30: predicate.virtual_output_eliminate 0.12% : 0.000001s : 7: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 7: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001479 32 57.29% : 0.000847s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.71% : 0.000632s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.104173 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.89% : 0.003006s : 1: add_attr 2.88% : 0.002998s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000057s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.12% : 0.000121s : 1: auto_monad 0.02% : 0.000025s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.50% : 0.000524s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000029s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000011s : 1: environ_conv 0.05% : 0.000050s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.42% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 4.44% : 0.004624s : 117: opt.transform.opt_a 0.04% : 0.000041s : 1: opt.transform.opt_after_cconv 0.03% : 0.000030s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000155s : 28: opt.transform.opt_b 0.06% : 0.000066s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000046s : 4: opt.transform.symbol_engine_opt 9.98% : 0.010399s : 1: opt_a 0.12% : 0.000125s : 1: opt_after_cconv 0.45% : 0.000472s : 1: opt_after_jit_grad 0.25% : 0.000263s : 1: opt_b 12.05% : 0.012549s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000024s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000051s : 1: pre_auto_parallel 0.04% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000030s : 1: remove_dup_value 1.37% : 0.001427s : 2: renormalize.infer 1.37% : 0.001423s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000045s : 1: rewriter_after_opt_a 0.13% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000092s : 1: symbol_engine_optimizer 51.49% : 0.053643s : 1: task_emit 0.09% : 0.000096s : 1: 
tuple_transform 9.83% : 0.010242s : 1: type_inference 0.06% : 0.000065s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x3-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x4-pynative],max_mem:42.0M TotalTime = 0.0215076, [24] [bootstrap]: 0.0005439 [type_inference]: 0.00623564 [event_method]: 1.461e-05 [auto_monad]: 5.307e-05 [graph_reusing]: 5.30999e-06 [inline]: 1.94e-06 [add_attr]: 0.00344177, [1] [add_attr_with_inline]: 0.00343042, [1] [Cycle 1]: 4.356e-05, [2] [tag_attr]: 1.514e-05 [meta_addattr_fg_expand]: 4.19002e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.956e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.00397124, [53] [py_interpret_to_execute]: 1.923e-05 [rewriter_before_opt_a]: 5.936e-05 [opt_a]: 0.0021157, [2] [Cycle 1]: 0.00151635, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 3.294e-05 [loop_unroll]: 2.074e-05 [a_1]: 0.000452 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.26999e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.631e-05 [accelerated_algorithm]: 6.25002e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.77e-06 [auto_parallel]: 6.07001e-06 [parallel]: 2.419e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 4.28001e-06 [allreduce_fusion]: 3.63e-06 [matmul_add_comm_reduction]: 8.42e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.24001e-06 [virtual_dataset]: 5.97001e-06 
[get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.96001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.04998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.46998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.27999e-06 [after_resolve]: 1.036e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00042087 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.379e-05 [cse]: 2.561e-05 [a_3]: 4.049e-05 [Cycle 2]: 0.00058958, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012593 [with_stream_mark]: 9.89999e-06 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.23002e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.787e-05 [accelerated_algorithm]: 5.67001e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.39998e-06 [flash_sp]: 3.08998e-06 [merge_comm]: 3.00002e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.04999e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 4.92999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.24999e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.11998e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.95998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.73998e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.09988e-07 [receive_attached]: 1.24003e-06 [after_resolve]: 8.87999e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.29001e-06 [cse]: 1.547e-05 [a_3]: 3.143e-05 [py_interpret_to_execute_after_opt_a]: 7.38e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 2.882e-05 [convert_after_rewriter]: 6.94999e-06 [order_py_execute_after_rewriter]: 4.84e-06 [mutable_eliminate]: 0.00045254 [opt_b]: 0.00018238, [1] [Cycle 1]: 0.00017614, [7] [b_1]: 0.0001069 [b_2]: 7.89002e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.43002e-06 [renormalize]: 5.50004e-07 [cse]: 1.687e-05 [optimize_parallel_all_gather_comm]: 1.566e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.185e-05 [loop_unroll]: 0.00041402 [opt_after_cconv]: 9.374e-05, [1] [Cycle 1]: 8.806e-05, [7] [c_1]: 2.749e-05 [parameter_eliminate]: 2.07001e-06 [updatestate_depend_eliminate]: 4.94998e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.23002e-06 [cse]: 1.6e-05 [renormalize]: 4.99975e-07 [remove_dup_value]: 1.247e-05 [tuple_transform]: 6.937e-05, [1] [Cycle 1]: 6.467e-05, [4] [d_1]: 3.917e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.20002e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 6.339e-05 [cse_after_recomputation]: 2.137e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.156e-05 [environ_conv]: 4.97e-06 [swap_dp_allreduce_reducescatter]: 5.51002e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.64999e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.60019e-07 
[add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.16001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.48002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.13002e-06 [overlap_grad_ring_attention]: 3.75e-06 [overlap_grad_flash_sp]: 1.77e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.837e-05, [1] [Cycle 1]: 6.44e-05, [6] [build]: 2.79999e-06 [elim_shapecalc]: 8.89998e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.75999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.522e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 0.0001267 [opt_after_jit_grad]: 0.00045351 [validate]: 3.049e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00636332 [execute]: 7.36999e-06 Sums bootstrap : 0.000544s : 3.18% type_inference : 0.006236s : 36.45% event_method : 0.000015s : 0.09% auto_monad : 0.000053s : 0.31% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000578s : 3.38% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000421s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.65% optimize.opt_b.b_1 : 0.000107s : 0.62% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000414s : 2.42% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.37% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000127s : 0.74% opt_after_jit_grad : 0.000454s : 2.65% validate : 0.000030s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.006363s : 37.20% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.59% : 0.000024s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.23% : 0.000005s : 4: substitution.graph_param_transform 66.92% : 0.000109s : 3: substitution.inline 1.83% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.66% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006192 2 90.71% : 0.005617s : 1: type_inference.infer 9.29% : 0.000575s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.86% : 0.000027s : 3: replace.inline 30.14% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.63% : 0.000107s : 3: match.inline 8.37% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000002s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.57% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.18% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.55% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000365 8 46.99% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.01% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030436 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.32% : 0.003446s : 1: add_attr 11.28% : 0.003434s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.22% : 0.000068s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000058s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000580s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.10% : 0.000943s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002119s : 1: opt_a 0.32% : 0.000097s : 1: opt_after_cconv 1.52% : 0.000464s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.06% : 0.003975s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000215s : 1: renormalize.infer 0.65% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000132s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.94% : 0.006373s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.53% : 0.006249s : 1: type_inference 0.19% : 0.000057s : 1: validate TotalTime = 0.018085, [24] [bootstrap]: 0.00046622 [type_inference]: 0.00437551 [event_method]: 1.081e-05 [auto_monad]: 5.076e-05 [graph_reusing]: 4.68001e-06 [inline]: 1.94e-06 [add_attr]: 0.00298528, [1] [add_attr_with_inline]: 0.00297785, [1] [Cycle 1]: 4.574e-05, [2] [tag_attr]: 1.181e-05 [meta_addattr_fg_expand]: 2.97002e-06 [parallel-infer-symbol]: 2.63998e-06 [pre_auto_parallel]: 2.101e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 1.78002e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0036702, [53] [py_interpret_to_execute]: 1.579e-05 [rewriter_before_opt_a]: 3.904e-05 [opt_a]: 0.00183295, [2] [Cycle 1]: 0.00124006, [45] [expand_dump_flag]: 2.56998e-06 [switch_simplify]: 2.486e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.0002907 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.46001e-06 [updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 3.23998e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.53998e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 7.85e-06 [auto_parallel]: 5.47001e-06 [parallel]: 1.802e-05 [flash_sp]: 7.14001e-06 [merge_comm]: 3.57002e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 6.76e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.39e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 9.36998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.38998e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.31998e-06 
[after_resolve]: 9.94001e-06 [a_after_grad]: 9.07999e-06 [renormalize]: 0.00033701 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.347e-05 [cse]: 2.56e-05 [a_3]: 3.906e-05 [Cycle 2]: 0.00058335, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.51999e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012374 [with_stream_mark]: 9.56e-06 [recompute_prepare]: 5.58002e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.11e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.647e-05 [accelerated_algorithm]: 5.39998e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.23999e-06 [auto_parallel]: 5.09998e-06 [parallel]: 4.45e-06 [flash_sp]: 3.22002e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 4.95001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.93002e-06 [virtual_dataset]: 5.12999e-06 [get_grad_eliminate_]: 4.87e-06 [virtual_output]: 4.80999e-06 [merge_forward]: 2.53998e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.81997e-06 [a_after_grad]: 7.83999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.208e-05 [a_3]: 3.186e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.11e-05 [convert_after_rewriter]: 6.51e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00045012 [opt_b]: 0.00017966, [1] [Cycle 1]: 0.00017361, [7] [b_1]: 
0.00010548 [b_2]: 7.26001e-06 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.11e-06 [renormalize]: 4.89992e-07 [cse]: 1.652e-05 [optimize_parallel_all_gather_comm]: 1.658e-05 [overlap_param_gather]: 2.08002e-06 [cconv]: 2.257e-05 [loop_unroll]: 0.00041267 [opt_after_cconv]: 9.335e-05, [1] [Cycle 1]: 8.785e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.558e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.263e-05 [tuple_transform]: 6.92e-05, [1] [Cycle 1]: 6.488e-05, [4] [d_1]: 3.888e-05 [none_parameter_eliminate]: 1.95001e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.10002e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 7.726e-05 [cse_after_recomputation]: 2.008e-05, [1] [Cycle 1]: 1.568e-05, [1] [cse]: 1.051e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 5.13002e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.07998e-06 [label_fine_grained_interleaved_index]: 2.53998e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 1.96e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.23e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 3.89002e-06 [overlap_grad_flash_sp]: 1.714e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.28998e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.907e-05, [1] [Cycle 1]: 6.493e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.55999e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.71002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.66002e-06 [auto_monad_reorder]: 1.562e-05 [get_jit_bprop_graph]: 9.30013e-07 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00044602 [validate]: 3.041e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00579056 [execute]: 6.83e-06 Sums bootstrap : 0.000466s : 3.29% type_inference : 0.004376s : 30.92% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000414s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.18% optimize.opt_b.b_1 : 0.000105s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000413s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000077s : 0.55% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.09% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000446s : 3.15% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005791s : 40.92% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000117 26 18.14% : 0.000021s : 4: substitution.arithmetic_simplify 1.70% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.76% : 0.000006s : 4: substitution.graph_param_transform 65.33% : 0.000076s : 2: substitution.inline 2.23% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.59% : 0.000004s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004335 2 92.11% : 0.003993s : 1: type_inference.infer 7.89% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000135 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.71% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.57% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.92% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 9: predicate.reduce_eliminate 2.48% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.53% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.84% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.36% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.64% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025996 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.50% : 0.002989s : 1: add_attr 11.47% : 0.002981s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.31% : 0.000081s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000500s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000760s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.06% : 0.001836s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.75% : 0.000455s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.13% : 0.003674s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000185s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000072s : 1: symbol_engine_optimizer 22.31% : 0.005800s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.88% : 0.004389s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0197569, [24] [bootstrap]: 0.00048976 [type_inference]: 0.00564702 [event_method]: 1.329e-05 [auto_monad]: 5.507e-05 [graph_reusing]: 5.71e-06 [inline]: 1.82001e-06 [add_attr]: 0.00302236, [1] [add_attr_with_inline]: 0.00301467, [1] [Cycle 1]: 4.47e-05, [2] [tag_attr]: 1.523e-05 [meta_addattr_fg_expand]: 4.33001e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.423e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.00392239, [53] [py_interpret_to_execute]: 1.916e-05 [rewriter_before_opt_a]: 5.788e-05 [opt_a]: 0.00207668, [2] [Cycle 1]: 0.00148122, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 3.163e-05 [loop_unroll]: 2.076e-05 [a_1]: 0.00044375 [with_stream_mark]: 1.334e-05 [recompute_prepare]: 7.35e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.49998e-06 [a_2]: 7.626e-05 [accelerated_algorithm]: 6.50997e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 7.46001e-06 [auto_parallel]: 5.97001e-06 [parallel]: 1.702e-05 [flash_sp]: 7.29001e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 6.23002e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.48002e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.054e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 9.42001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 3.08998e-06 [receive_attached]: 
2.16e-06 [after_resolve]: 1.022e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00040198 [add_forward_monad_depend]: 5.32999e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.295e-05 [cse]: 2.703e-05 [a_3]: 4.007e-05 [Cycle 2]: 0.0005858, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.00012504 [with_stream_mark]: 9.64e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.744e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.33001e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.45e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 3.00998e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.04998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.93998e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 5.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.07002e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.12e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 8.1e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 5.92999e-06 [cse]: 1.516e-05 [a_3]: 3.17e-05 [py_interpret_to_execute_after_opt_a]: 8.11002e-06 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 3.047e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00044248 [opt_b]: 0.00018214, [1] [Cycle 1]: 
0.00017602, [7] [b_1]: 0.00010832 [b_2]: 6.55002e-06 [updatestate_depend_eliminate]: 5.48002e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 3.89991e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.486e-05 [overlap_param_gather]: 2.24001e-06 [cconv]: 2.215e-05 [loop_unroll]: 0.00044297 [opt_after_cconv]: 9.55e-05, [1] [Cycle 1]: 8.972e-05, [7] [c_1]: 2.92e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.07999e-06 [cse]: 1.497e-05 [renormalize]: 1.90019e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.794e-05, [1] [Cycle 1]: 6.372e-05, [4] [d_1]: 3.849e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.07001e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.374e-05 [cse_after_recomputation]: 1.917e-05, [1] [Cycle 1]: 1.495e-05, [1] [cse]: 1.001e-05 [environ_conv]: 4.73001e-06 [swap_dp_allreduce_reducescatter]: 5.27001e-06 [bias_add_comm_swap]: 2.49999e-06 [label_micro_interleaved_index]: 4.02998e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.53003e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81998e-06 [control_data_broadcast_order]: 1.188e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.23e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.42e-06 [overlap_grad_flash_sp]: 1.795e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 2.03002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.568e-05, [1] [Cycle 1]: 6.156e-05, [6] [build]: 2.39999e-06 [elim_shapecalc]: 7.85e-06 [elim_not_effective]: 1.077e-05 [opt_reshape]: 5.80002e-06 [fold_const_symbol]: 8.50001e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.60998e-06 [opt_after_jit_grad]: 0.00044318 [validate]: 3.044e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00587145 [execute]: 6.91999e-06 Sums bootstrap : 0.000490s : 3.10% type_inference : 0.005647s : 35.76% event_method : 0.000013s : 0.08% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000569s : 3.60% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate 
: 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000402s : 2.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000442s : 2.80% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000443s : 2.80% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000016s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.02%
opt_after_jit_grad : 0.000443s : 2.81%
validate : 0.000030s : 0.19%
backend_pass : 0.000001s : 0.01%
task_emit : 0.005871s : 37.18%
execute : 0.000007s : 0.04%
Time group info:
------[substitution.] 0.000161 30
    14.99% : 0.000024s : 5: substitution.arithmetic_simplify
    1.04% : 0.000002s : 2: substitution.elim_not_effective
    0.76% : 0.000001s : 2: substitution.fold_const_symbol
    3.10% : 0.000005s : 4: substitution.graph_param_transform
    66.38% : 0.000107s : 3: substitution.inline
    1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.69% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.44% : 0.000004s : 4: substitution.replace_old_param
    6.74% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005607 2
    90.34% : 0.005065s : 1: type_inference.infer
    9.66% : 0.000542s : 1: type_inference.specialize
------[replace.] 0.000038 5
    70.54% : 0.000027s : 3: replace.inline
    29.46% : 0.000011s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000115 5
    91.45% : 0.000105s : 3: match.inline
    8.55% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.42% : 0.000004s : 19: predicate.arithmetic_simplify 1.00% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.95% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.52% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.61% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 46.86% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.14% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028194 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.73% : 0.003027s : 1: add_attr 10.70% : 0.003018s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000524s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000018s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000452s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.60% : 0.000451s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.31% : 0.000932s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000030s : 4: opt.transform.symbol_engine_opt 7.38% : 0.002079s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.61% : 0.000453s : 1: opt_after_jit_grad 0.66% : 0.000186s : 1: opt_b 13.93% : 0.003926s : 1: optimize 0.06% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000207s : 1: renormalize.infer 0.67% : 0.000188s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000068s : 1: symbol_engine_optimizer 20.86% : 0.005882s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 20.08% : 0.005660s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0372157, [24] [bootstrap]: 0.00048896 [type_inference]: 0.011319 [event_method]: 4.66e-05 [auto_monad]: 0.00011955 [graph_reusing]: 7.77998e-06 [inline]: 1.89999e-06 [add_attr]: 0.00299599, [1] [add_attr_with_inline]: 0.00298738, [1] [Cycle 1]: 6.969e-05, [2] [tag_attr]: 3.46e-05 [meta_addattr_fg_expand]: 9.12001e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 4.997e-05 [insert-virtual-dataset]: 2.33998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0132637, [53] [py_interpret_to_execute]: 3.806e-05 [rewriter_before_opt_a]: 0.00014758 [opt_a]: 0.0109711, [3] [Cycle 1]: 0.00702849, [45] [expand_dump_flag]: 3.7e-06 [switch_simplify]: 7.51e-05 [loop_unroll]: 6.109e-05 [a_1]: 0.00144478 [with_stream_mark]: 2.263e-05 [recompute_prepare]: 2.102e-05 [updatestate_depend_eliminate]: 9.00001e-06 [updatestate_assign_eliminate]: 7.82002e-06 [updatestate_loads_eliminate]: 7.28999e-06 [parameter_eliminate]: 2.48e-06 [a_2]: 0.00024233 [accelerated_algorithm]: 3e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 3.3e-06 [shard_inline]: 1.641e-05 [merge_send_recv]: 1.602e-05 [auto_parallel]: 1.082e-05 [parallel]: 1.901e-05 [flash_sp]: 1.107e-05 [merge_comm]: 9.89001e-06 [allreduce_fusion]: 8.97999e-06 [matmul_add_comm_reduction]: 2.588e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.753e-05 [virtual_dataset]: 1.556e-05 [get_grad_eliminate_]: 1.528e-05 [virtual_output]: 1.525e-05 [merge_forward]: 9.61e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.78e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.899e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 2.749e-05 [set_forward_comm_id_for_comm_node_pass]: 9.69999e-06 [meta_fg_expand]: 0.0014226 [flash_sp_send_recv_attached]: 3.68999e-06 [receive_attached]: 2.16e-06 [after_resolve]: 
6.156e-05 [a_after_grad]: 8.572e-05 [renormalize]: 0.00239515 [add_forward_monad_depend]: 9.68997e-06 [auto_monad_grad]: 4.99e-06 [auto_monad_eliminator]: 5.499e-05 [cse]: 0.0001626 [a_3]: 0.00033392 [Cycle 2]: 0.0030359, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 9.03e-05 [loop_unroll]: 4.482e-05 [a_1]: 0.00152653 [with_stream_mark]: 1.184e-05 [recompute_prepare]: 1.083e-05 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 4.41002e-06 [updatestate_loads_eliminate]: 3.75998e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 0.00012513 [accelerated_algorithm]: 1.156e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 9.25001e-06 [merge_send_recv]: 6.66e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.32002e-06 [merge_comm]: 5.31002e-06 [allreduce_fusion]: 4.68001e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.04e-05 [virtual_dataset]: 9.00001e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.50001e-06 [merge_forward]: 4.37998e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 9.47001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.653e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.404e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32999e-06 [meta_fg_expand]: 6.947e-05 [flash_sp_send_recv_attached]: 1.04998e-06 [receive_attached]: 1.22e-06 [after_resolve]: 1.671e-05 [a_after_grad]: 1.449e-05 [renormalize]: 0.00058639 [add_forward_monad_depend]: 4.05e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 1.468e-05 [cse]: 4.545e-05 [a_3]: 6.446e-05 [Cycle 3]: 0.00089298, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 1.036e-05 [loop_unroll]: 8.85001e-06 [a_1]: 0.00024893 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 9.05999e-06 [updatestate_depend_eliminate]: 4.60999e-06 [updatestate_assign_eliminate]: 3.87998e-06 [updatestate_loads_eliminate]: 3.76999e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012238 [accelerated_algorithm]: 1.147e-05 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.13002e-06 [merge_send_recv]: 6.83e-06 [auto_parallel]: 7.46001e-06 [parallel]: 4.99e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 4.95999e-06 [allreduce_fusion]: 5.07e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 8.66997e-06 [get_grad_eliminate_]: 8.48001e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.04002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 8.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.557e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 4.87998e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.319e-05 [a_after_grad]: 1.403e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 1.065e-05 [cse]: 2.401e-05 [a_3]: 6.012e-05 [py_interpret_to_execute_after_opt_a]: 9.92999e-06 [slice_cell_reuse_recomputed_activation]: 1.73002e-06 [rewriter_after_opt_a]: 4.796e-05 [convert_after_rewriter]: 8.90999e-06 [order_py_execute_after_rewriter]: 6.46999e-06 [mutable_eliminate]: 0.00049236 [opt_b]: 0.00028555, [1] [Cycle 1]: 0.00027906, [7] [b_1]: 0.00018778 [b_2]: 1.036e-05 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 3.86001e-06 [renormalize]: 7.39994e-07 [cse]: 3.042e-05 [optimize_parallel_all_gather_comm]: 2.045e-05 [overlap_param_gather]: 1.88997e-06 [cconv]: 1.998e-05 [loop_unroll]: 0.00042195 [opt_after_cconv]: 0.00013555, [1] [Cycle 1]: 0.00012941, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 4.18001e-06 
[updatestate_loads_eliminate]: 4.03001e-06 [cse]: 2.887e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 2.969e-05 [tuple_transform]: 0.00010133, [1] [Cycle 1]: 9.678e-05, [4] [d_1]: 6.596e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.049e-05 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 6.05e-05 [cse_after_recomputation]: 3.131e-05, [1] [Cycle 1]: 2.669e-05, [1] [cse]: 2.118e-05 [environ_conv]: 8.18999e-06 [swap_dp_allreduce_reducescatter]: 7.48999e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.30999e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.32999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.724e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 4.65001e-06 [overlap_recompute_and_grad_model_parallel]: 5.84999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 5.32999e-06 [overlap_grad_flash_sp]: 2.385e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.22999e-06 [symbol_engine_optimizer]: 9.773e-05, [1] [Cycle 1]: 9.366e-05, [6] [build]: 9.56e-06 [elim_shapecalc]: 1.346e-05 [elim_not_effective]: 1.769e-05 [opt_reshape]: 1.009e-05 [fold_const_symbol]: 1.476e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 2.454e-05 [get_jit_bprop_graph]: 1.13001e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00046961 [validate]: 4.377e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00815655 [execute]: 6.09001e-06 Sums bootstrap : 0.000489s : 1.48% type_inference : 0.011319s : 34.33% event_method : 0.000047s : 0.14% auto_monad : 0.000120s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.12% optimize.rewriter_before_opt_a : 0.000148s : 0.45% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000176s : 0.53% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003220s : 9.77% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000490s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm 
: 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001495s : 4.53% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.28% optimize.opt_a.a_after_grad : 0.000114s : 0.35% optimize.opt_a.renormalize : 0.002982s : 9.04% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000232s : 0.70% optimize.opt_a.a_3 : 0.000458s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000492s : 1.49% optimize.opt_b.b_1 : 0.000188s : 0.57% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000422s : 1.28% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000061s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00%
optimize.interleave_parallel_branches : 0.000001s : 0.00%
optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00%
optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01%
optimize.control_data_broadcast_order : 0.000017s : 0.05%
optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00%
optimize.offloading_packed_experts : 0.000005s : 0.01%
optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02%
optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00%
optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00%
optimize.overlap_recompute_comm : 0.000002s : 0.01%
optimize.overlap_grad_ring_attention : 0.000005s : 0.02%
optimize.overlap_grad_flash_sp : 0.000024s : 0.07%
optimize.begin_end_overlap_inline : 0.000000s : 0.00%
optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01%
optimize.split_layernorm_comm : 0.000002s : 0.01%
optimize.handle_group_info : 0.000001s : 0.00%
optimize.symbol_engine_optimizer.build : 0.000010s : 0.03%
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04%
optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05%
optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03%
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04%
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.00%
auto_monad_reorder : 0.000025s : 0.07%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000004s : 0.01%
opt_after_jit_grad : 0.000470s : 1.42%
validate : 0.000044s : 0.13%
backend_pass : 0.000001s : 0.00%
task_emit : 0.008157s : 24.74%
execute : 0.000006s : 0.02%
Time group info:
------[substitution.]
0.000753 222 5.88% : 0.000044s : 12: substitution.arithmetic_simplify 1.91% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.74% : 0.000420s : 17: substitution.inline 2.07% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.81% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.73% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011245 2 86.96% : 0.009779s : 1: type_inference.infer 13.04% : 0.001466s : 1: type_inference.specialize ------[replace.] 0.000215 33 57.29% : 0.000123s : 17: replace.inline 42.71% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.37% : 0.000411s : 17: match.inline 7.63% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 249: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.65% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.23% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.05% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001554 34 57.79% : 0.000898s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.21% : 0.000656s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061708 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.86% : 0.003000s : 1: add_attr 4.85% : 0.002991s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.85% : 0.000522s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.81% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.98% : 0.004926s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.78% : 0.010974s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.78% : 0.000479s : 1: opt_after_jit_grad 0.47% : 0.000289s : 1: opt_b 21.50% : 0.013268s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.62% : 0.001614s : 2: renormalize.infer 2.20% : 0.001355s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.25% : 0.000152s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.24% : 0.008167s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.37% : 0.011333s : 1: type_inference 0.12% : 0.000075s : 1: validate TotalTime = 0.0184691, [24] [bootstrap]: 0.00045758 [type_inference]: 0.00444753 [event_method]: 1.019e-05 [auto_monad]: 5.077e-05 [graph_reusing]: 5.12999e-06 [inline]: 2.11e-06 [add_attr]: 0.00295303, [1] [add_attr_with_inline]: 0.00294506, [1] [Cycle 1]: 4.617e-05, [2] [tag_attr]: 1.242e-05 [meta_addattr_fg_expand]: 2.83998e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.162e-05 [insert-virtual-dataset]: 2.54999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.08002e-06 [pipeline_split]: 1.51998e-06 [optimize]: 0.00364565, [53] [py_interpret_to_execute]: 1.536e-05 [rewriter_before_opt_a]: 3.934e-05 [opt_a]: 0.00186204, [2] [Cycle 1]: 0.00126611, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.453e-05 [loop_unroll]: 1.327e-05 [a_1]: 0.00029107 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 6.82002e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 1.90001e-06 [a_2]: 7.585e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.56e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 7.39002e-06 [auto_parallel]: 5.51002e-06 [parallel]: 1.68e-05 [flash_sp]: 7.49002e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.86999e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.39e-06 [merge_forward]: 3.51999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 3.69e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 1.007e-05 [set_forward_comm_id_for_comm_node_pass]: 3.70998e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.86e-06 [receive_attached]: 
2.23002e-06 [after_resolve]: 1.073e-05 [a_after_grad]: 9.07001e-06 [renormalize]: 0.00033542 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 2.03002e-06 [auto_monad_eliminator]: 1.242e-05 [cse]: 2.526e-05 [a_3]: 4.052e-05 [Cycle 2]: 0.0005866, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012443 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.759e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.02e-06 [flash_sp]: 2.91999e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 5.75001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25002e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.22999e-06 [after_resolve]: 8.72998e-06 [a_after_grad]: 7.87e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 7.50006e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.237e-05 [a_3]: 3.176e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.047e-05 [convert_after_rewriter]: 6.72002e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00044299 [opt_b]: 0.00017815, [1] [Cycle 1]: 0.00017212, [7] 
[b_1]: 0.00010481 [b_2]: 7.16999e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.41998e-06 [renormalize]: 5.09986e-07 [cse]: 1.584e-05 [optimize_parallel_all_gather_comm]: 1.507e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00041026 [opt_after_cconv]: 9.338e-05, [1] [Cycle 1]: 8.758e-05, [7] [c_1]: 2.71e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.589e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.484e-05, [4] [d_1]: 3.842e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 4.307e-05 [cse_after_recomputation]: 1.933e-05, [1] [Cycle 1]: 1.509e-05, [1] [cse]: 1.039e-05 [environ_conv]: 4.74998e-06 [swap_dp_allreduce_reducescatter]: 4.99003e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.01001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.10002e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 8.19971e-07 [remove_cast_before_assign_add]: 1.36998e-06 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.2e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.72002e-06 [overlap_grad_flash_sp]: 1.564e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.893e-05, [1] [Cycle 1]: 6.487e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.75001e-06 [elim_not_effective]: 1.166e-05 [opt_reshape]: 6.33e-06 [fold_const_symbol]: 8.45999e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.515e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.95e-06 [opt_after_jit_grad]: 0.00044398 [validate]: 3.041e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00617146 [execute]: 6.63e-06 Sums bootstrap : 0.000458s : 3.14% type_inference : 0.004448s : 30.52% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000416s : 2.85% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000046s : 0.32% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000335s : 2.30% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000443s : 3.04% optimize.opt_b.b_1 : 0.000105s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000410s : 2.82% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000444s : 3.05% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006171s : 42.35% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 17.98% : 0.000021s : 4: substitution.arithmetic_simplify 1.58% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.47% : 0.000005s : 4: substitution.graph_param_transform 65.56% : 0.000078s : 2: substitution.inline 2.39% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.75% : 0.000004s : 4: substitution.remove_not_recompute_node 3.27% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004406 2 91.31% : 0.004023s : 1: type_inference.infer 8.69% : 0.000383s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.96% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.22% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.87% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 1.07% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 43.02% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.98% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026351 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.22% : 0.002957s : 1: add_attr 11.19% : 0.002949s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000491s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000452s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.00% : 0.000790s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000088s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001865s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.72% : 0.000453s : 1: opt_after_jit_grad 0.69% : 0.000181s : 1: opt_b 13.85% : 0.003649s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000186s : 1: renormalize.infer 0.54% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 23.46% : 0.006181s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.93% : 0.004461s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0359609, [24] [bootstrap]: 0.00050767 [type_inference]: 0.0102589 [event_method]: 4.197e-05 [auto_monad]: 0.00011366 [graph_reusing]: 7.95998e-06 [inline]: 1.81e-06 [add_attr]: 0.00299558, [1] [add_attr_with_inline]: 0.0029875, [1] [Cycle 1]: 6.638e-05, [2] [tag_attr]: 3.123e-05 [meta_addattr_fg_expand]: 8.22e-06 [parallel-infer-symbol]: 3.34001e-06 [pre_auto_parallel]: 4.698e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.0129469, [53] [py_interpret_to_execute]: 3.498e-05 [rewriter_before_opt_a]: 0.00012611 [opt_a]: 0.0107152, [3] [Cycle 1]: 0.00683955, [45] [expand_dump_flag]: 3.61999e-06 [switch_simplify]: 6.65e-05 [loop_unroll]: 5.433e-05 [a_1]: 0.00132632 [with_stream_mark]: 2.215e-05 [recompute_prepare]: 2.188e-05 [updatestate_depend_eliminate]: 9.31002e-06 [updatestate_assign_eliminate]: 7.54002e-06 [updatestate_loads_eliminate]: 7.85998e-06 [parameter_eliminate]: 2.36998e-06 [a_2]: 0.00024452 [accelerated_algorithm]: 3.042e-05 [shard]: 1.79e-06 [meta_shard_fg_expand]: 3.05002e-06 [shard_inline]: 1.583e-05 [merge_send_recv]: 1.573e-05 [auto_parallel]: 1.081e-05 [parallel]: 1.762e-05 [flash_sp]: 1.138e-05 [merge_comm]: 9.66998e-06 [allreduce_fusion]: 9.31e-06 [matmul_add_comm_reduction]: 2.543e-05 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.556e-05 [get_grad_eliminate_]: 1.488e-05 [virtual_output]: 1.498e-05 [merge_forward]: 9.37999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.702e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.839e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.725e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76e-06 [meta_fg_expand]: 0.00141566 [flash_sp_send_recv_attached]: 4.07998e-06 [receive_attached]: 2.58e-06 
[after_resolve]: 5.988e-05 [a_after_grad]: 8.057e-05 [renormalize]: 0.00233239 [add_forward_monad_depend]: 8.63001e-06 [auto_monad_grad]: 5.18002e-06 [auto_monad_eliminator]: 5.476e-05 [cse]: 0.00016023 [a_3]: 0.00036566 [Cycle 2]: 0.00293085, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 4.682e-05 [loop_unroll]: 4.364e-05 [a_1]: 0.00152232 [with_stream_mark]: 1.167e-05 [recompute_prepare]: 1.051e-05 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 0.00012464 [accelerated_algorithm]: 1.156e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.16999e-06 [parallel]: 4.77e-06 [flash_sp]: 3.38e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.031e-05 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.57998e-06 [virtual_output]: 8.80001e-06 [merge_forward]: 4.96002e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 9.41e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.664e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35999e-06 [meta_fg_expand]: 3.543e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.453e-05 [a_after_grad]: 1.43e-05 [renormalize]: 0.0005745 [add_forward_monad_depend]: 4.12e-06 [auto_monad_grad]: 1.09998e-06 [auto_monad_eliminator]: 1.4e-05 [cse]: 4.529e-05 [a_3]: 6.41e-05 [Cycle 3]: 0.00093112, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.032e-05 [loop_unroll]: 8.59002e-06 [a_1]: 0.00024748 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 9.25001e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 3.8e-06 
[parameter_eliminate]: 8.79983e-07 [a_2]: 0.0001228 [accelerated_algorithm]: 1.149e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 8.97999e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 6.91001e-06 [parallel]: 4.83001e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 4.94998e-06 [allreduce_fusion]: 4.75999e-06 [matmul_add_comm_reduction]: 7.72002e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 9.86998e-06 [virtual_dataset]: 8.59e-06 [get_grad_eliminate_]: 8.30999e-06 [virtual_output]: 8.13001e-06 [merge_forward]: 4.26001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 8.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.57e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 3.04001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.328e-05 [a_after_grad]: 5.386e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.43002e-06 [auto_monad_grad]: 1.04998e-06 [auto_monad_eliminator]: 1.123e-05 [cse]: 2.54e-05 [a_3]: 5.721e-05 [py_interpret_to_execute_after_opt_a]: 1.124e-05 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 4.96e-05 [convert_after_rewriter]: 9.03002e-06 [order_py_execute_after_rewriter]: 7.12002e-06 [mutable_eliminate]: 0.00045826 [opt_b]: 0.00028561, [1] [Cycle 1]: 0.0002791, [7] [b_1]: 0.00018782 [b_2]: 1.064e-05 [updatestate_depend_eliminate]: 7.06999e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 4.14002e-06 [renormalize]: 3.7998e-07 [cse]: 3.086e-05 [optimize_parallel_all_gather_comm]: 2.037e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 1.912e-05 [loop_unroll]: 0.0004234 [opt_after_cconv]: 0.00013333, [1] [Cycle 1]: 0.00012751, [7] [c_1]: 4.743e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 7.00998e-06 [updatestate_assign_eliminate]: 4.15e-06 
[updatestate_loads_eliminate]: 3.79002e-06 [cse]: 2.93e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 2.858e-05 [tuple_transform]: 9.927e-05, [1] [Cycle 1]: 9.473e-05, [4] [d_1]: 6.496e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 9.75002e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 5.632e-05 [cse_after_recomputation]: 3.152e-05, [1] [Cycle 1]: 2.702e-05, [1] [cse]: 2.145e-05 [environ_conv]: 8.54002e-06 [swap_dp_allreduce_reducescatter]: 7.63999e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.651e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.12e-06 [overlap_recompute_and_grad_model_parallel]: 5.79999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.28998e-06 [overlap_grad_ring_attention]: 4.92e-06 [overlap_grad_flash_sp]: 2.404e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 2.03002e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 9.79e-05, [1] [Cycle 1]: 9.377e-05, [6] [build]: 1.003e-05 [elim_shapecalc]: 1.327e-05 [elim_not_effective]: 1.776e-05 [opt_reshape]: 9.94999e-06 [fold_const_symbol]: 1.474e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.434e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.52997e-06 [opt_after_jit_grad]: 0.00046643 [validate]: 4.446e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00827469 [execute]: 6.54001e-06 Sums bootstrap : 0.000508s : 1.60% type_inference : 0.010259s : 32.34% event_method : 0.000042s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000126s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003096s : 9.76% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.55% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001454s : 4.58% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000149s : 0.47% optimize.opt_a.renormalize : 0.002907s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000231s : 0.73% optimize.opt_a.a_3 : 0.000487s : 1.54% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.44% optimize.opt_b.b_1 : 0.000188s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000423s : 1.33% optimize.opt_after_cconv.c_1 : 0.000047s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000065s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000466s : 1.47% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008275s : 26.09% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000718 218 5.96% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000013s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 54.63% : 0.000392s : 16: substitution.inline 2.13% : 0.000015s : 2: substitution.inline_without_move 1.40% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000014s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.91% : 0.000014s : 20: substitution.remove_not_recompute_node 3.28% : 0.000024s : 10: substitution.replace_applicator 1.47% : 0.000011s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.79% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.48% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010190 2 87.64% : 0.008930s : 1: type_inference.infer 12.36% : 0.001260s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.94% : 0.000119s : 16: replace.inline 41.06% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000414 30 92.82% : 0.000384s : 16: match.inline 7.18% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000730 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.13% : 0.000008s : 67: predicate.addn_zero_filter 1.07% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.19% : 0.000009s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.17% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.93% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.35% : 0.000003s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.42% : 0.000010s : 83: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001442 32 57.70% : 0.000832s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.30% : 0.000610s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059935 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.00% : 0.003000s : 1: add_attr 4.99% : 0.002991s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.90% : 0.000542s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.01% : 0.004802s : 117: opt.transform.opt_a 0.08% : 0.000046s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.88% : 0.010718s : 1: opt_a 0.23% : 0.000137s : 1: opt_after_cconv 0.79% : 0.000476s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.61% : 0.012951s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000039s : 1: py_interpret_to_execute 0.03% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.59% : 0.001553s : 2: renormalize.infer 2.24% : 0.001341s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.82% : 0.008285s : 1: task_emit 0.17% : 0.000102s : 1: 
tuple_transform 17.14% : 0.010273s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x4-kbk],max_mem:42.0M TotalTime = 0.943226, [24] [bootstrap]: 0.00057539 [type_inference]: 0.00605838 [event_method]: 1.391e-05 [auto_monad]: 5.354e-05 [graph_reusing]: 5.02e-06 [inline]: 2.17999e-06 [add_attr]: 0.00342832, [1] [add_attr_with_inline]: 0.00341726, [1] [Cycle 1]: 4.366e-05, [2] [tag_attr]: 1.462e-05 [meta_addattr_fg_expand]: 3.88001e-06 [parallel-infer-symbol]: 2.53e-06 [pre_auto_parallel]: 2.926e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00396062, [53] [py_interpret_to_execute]: 2.053e-05 [rewriter_before_opt_a]: 5.841e-05 [opt_a]: 0.00209444, [2] [Cycle 1]: 0.00149997, [45] [expand_dump_flag]: 2.66999e-06 [switch_simplify]: 3.263e-05 [loop_unroll]: 2.055e-05 [a_1]: 0.00045627 [with_stream_mark]: 1.272e-05 [recompute_prepare]: 7.80998e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 2.99999e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 1.52999e-06 [a_2]: 7.504e-05 [accelerated_algorithm]: 5.99e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.14001e-06 [merge_send_recv]: 7.51999e-06 [auto_parallel]: 6.03002e-06 [parallel]: 2.29e-05 [flash_sp]: 7.06999e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.38002e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.76001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.036e-05 
[merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.23002e-06 [receive_attached]: 2.34001e-06 [after_resolve]: 1.03e-05 [a_after_grad]: 8.59e-06 [renormalize]: 0.00040593 [add_forward_monad_depend]: 4.77998e-06 [auto_monad_grad]: 1.99e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.668e-05 [a_3]: 4.051e-05 [Cycle 2]: 0.0005853, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012448 [with_stream_mark]: 9.61e-06 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.675e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.29002e-06 [auto_parallel]: 5.02e-06 [parallel]: 4.32998e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.38998e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.83e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 5.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.60001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.81997e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.80013e-07 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.245e-05 [a_3]: 3.162e-05 [py_interpret_to_execute_after_opt_a]: 7.76001e-06 [slice_cell_reuse_recomputed_activation]: 
2.07999e-06 [rewriter_after_opt_a]: 2.985e-05 [convert_after_rewriter]: 7.95e-06 [order_py_execute_after_rewriter]: 5.24e-06 [mutable_eliminate]: 0.0004454 [opt_b]: 0.00018463, [1] [Cycle 1]: 0.0001781, [7] [b_1]: 0.00010679 [b_2]: 6.48e-06 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 3.7998e-07 [cse]: 2.107e-05 [optimize_parallel_all_gather_comm]: 1.653e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.133e-05 [loop_unroll]: 0.00041428 [opt_after_cconv]: 9.64e-05, [1] [Cycle 1]: 9.012e-05, [7] [c_1]: 2.806e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.642e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.276e-05 [tuple_transform]: 6.916e-05, [1] [Cycle 1]: 6.507e-05, [4] [d_1]: 3.936e-05 [none_parameter_eliminate]: 1.43002e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 5.145e-05 [cse_after_recomputation]: 1.957e-05, [1] [Cycle 1]: 1.536e-05, [1] [cse]: 1.044e-05 [environ_conv]: 4.52998e-06 [swap_dp_allreduce_reducescatter]: 4.94e-06 [bias_add_comm_swap]: 2.46998e-06 [label_micro_interleaved_index]: 4.63999e-06 [label_fine_grained_interleaved_index]: 3.14001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.23002e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.144e-05 
[grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.08999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 4.04002e-06 [overlap_grad_flash_sp]: 1.691e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 6.631e-05, [1] [Cycle 1]: 6.231e-05, [6] [build]: 2.07001e-06 [elim_shapecalc]: 8.06001e-06 [elim_not_effective]: 1.117e-05 [opt_reshape]: 6.00002e-06 [fold_const_symbol]: 8.78001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.513e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.88999e-06 [opt_after_jit_grad]: 0.00044742 [validate]: 3.067e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.928368 [execute]: 9.51e-06 Sums bootstrap : 0.000575s : 0.06% type_inference : 0.006058s : 0.65% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000581s : 0.06% optimize.opt_a.with_stream_mark : 0.000022s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000406s : 0.04% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.00% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000445s : 0.05% optimize.opt_b.b_1 : 0.000107s : 0.01% optimize.opt_b.b_2 : 0.000006s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000021s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.00% optimize.loop_unroll : 0.000414s : 0.04% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.05% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.928368s : 98.89% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000165 30 14.32% : 0.000024s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.21% : 0.000005s : 4: substitution.graph_param_transform 67.57% : 0.000111s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 6.49% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006005 2 91.02% : 0.005466s : 1: type_inference.infer 8.98% : 0.000539s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.98% : 0.000027s : 3: replace.inline 30.02% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.84% : 0.000109s : 3: match.inline 8.16% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.98% : 0.000002s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 1.00% : 0.000002s : 11: predicate.cast_eliminate 0.65% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000009s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.91% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.06% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000335 8 47.43% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.57% : 0.000176s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.952113 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.36% : 0.003433s : 1: add_attr 0.36% : 0.003421s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000058s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000611s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000454s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000012s : 1: opt.transform.mutable_eliminate 0.10% : 0.000945s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000089s : 28: opt.transform.opt_b 0.00% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.22% : 0.002097s : 1: opt_a 0.01% : 0.000100s : 1: opt_after_cconv 0.05% : 0.000457s : 1: opt_after_jit_grad 0.02% : 0.000188s : 1: opt_b 0.42% : 0.003964s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000209s : 1: renormalize.infer 0.02% : 0.000190s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000062s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000069s : 1: symbol_engine_optimizer 97.51% : 0.928390s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.64% : 0.006072s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.0721703, [24] [bootstrap]: 0.00053467 [type_inference]: 0.00436298 [event_method]: 1.067e-05 [auto_monad]: 4.959e-05 [graph_reusing]: 4.67e-06 [inline]: 1.70001e-06 [add_attr]: 0.00347441, [1] [add_attr_with_inline]: 0.0034654, [1] [Cycle 1]: 4.998e-05, [2] [tag_attr]: 1.275e-05 [meta_addattr_fg_expand]: 3.30998e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.297e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00374584, [53] [py_interpret_to_execute]: 1.561e-05 [rewriter_before_opt_a]: 4.02e-05 [opt_a]: 0.00191725, [2] [Cycle 1]: 0.00131394, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 2.334e-05 [loop_unroll]: 1.36e-05 [a_1]: 0.00029413 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 6.93e-06 [updatestate_depend_eliminate]: 4.09002e-06 [updatestate_assign_eliminate]: 3.63e-06 [updatestate_loads_eliminate]: 2.78998e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.523e-05 [accelerated_algorithm]: 6.51999e-06 [shard]: 2.28998e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 7.96001e-06 [auto_parallel]: 5.47001e-06 [parallel]: 2.096e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 9.62001e-06 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.018e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.035e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.09e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.96e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.23002e-06 
[after_resolve]: 1.027e-05 [a_after_grad]: 8.49998e-06 [renormalize]: 0.00040006 [add_forward_monad_depend]: 4.49002e-06 [auto_monad_grad]: 2.00002e-06 [auto_monad_eliminator]: 1.297e-05 [cse]: 2.737e-05 [a_3]: 4.019e-05 [Cycle 2]: 0.00059375, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.70998e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.00012449 [with_stream_mark]: 9.72001e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.805e-05 [accelerated_algorithm]: 5.65001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.71998e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.73997e-06 [parallel]: 4.32e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.51e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.16e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.78999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 1.55001e-06 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.07e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.57002e-06 [cse]: 1.305e-05 [a_3]: 3.157e-05 [py_interpret_to_execute_after_opt_a]: 7.51999e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 3.004e-05 [convert_after_rewriter]: 6.74001e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00045221 [opt_b]: 0.00018064, [1] [Cycle 1]: 0.00017458, 
[7] [b_1]: 0.00010731 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.54998e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 2.50002e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.19e-05 [loop_unroll]: 0.00041941 [opt_after_cconv]: 9.474e-05, [1] [Cycle 1]: 8.894e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.02999e-06 [cse]: 1.676e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 1.162e-05 [tuple_transform]: 6.838e-05, [1] [Cycle 1]: 6.404e-05, [4] [d_1]: 3.883e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 1.30007e-07 [switch_simplify]: 5.98998e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.458e-05 [cse_after_recomputation]: 2.159e-05, [1] [Cycle 1]: 1.73e-05, [1] [cse]: 1.209e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.67001e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.87e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.62999e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.37001e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.115e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.57002e-06 [overlap_recompute_and_grad_model_parallel]: 4.58001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.733e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 8.107e-05, [1] [Cycle 1]: 7.651e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 7.85998e-06 [elim_not_effective]: 1.153e-05 [opt_reshape]: 5.81e-06 [fold_const_symbol]: 2.112e-05 [renormalize]: 1.50001e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.544e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.0004535 [validate]: 3.244e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0592352 [execute]: 8.54e-06 Sums bootstrap : 0.000535s : 0.79% type_inference : 0.004363s : 6.44% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000025s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000400s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.67% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000419s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000021s : 0.03% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000453s : 0.67% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059235s : 87.45% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000124 26 17.59% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.15% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 66.75% : 0.000083s : 2: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.32% : 0.000004s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004323 2 91.62% : 0.003961s : 1: type_inference.infer 8.38% : 0.000362s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000135 984 0.95% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.22% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.47% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.27% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000264 6 42.29% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.71% : 0.000152s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080724 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.31% : 0.003479s : 1: add_attr 4.30% : 0.003469s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.71% : 0.000571s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.53% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000765s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000044s : 4: opt.transform.symbol_engine_opt 2.38% : 0.001920s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000464s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.64% : 0.003750s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.28% : 0.000225s : 1: renormalize.infer 0.21% : 0.000169s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000084s : 1: symbol_engine_optimizer 73.40% : 0.059250s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.42% : 0.004377s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.0724213, [24] [bootstrap]: 0.00046126 [type_inference]: 0.00551503 [event_method]: 1.385e-05 [auto_monad]: 5.335e-05 [graph_reusing]: 5.66003e-06 [inline]: 1.71e-06 [add_attr]: 0.00295748, [1] [add_attr_with_inline]: 0.00294915, [1] [Cycle 1]: 4.441e-05, [2] [tag_attr]: 1.493e-05 [meta_addattr_fg_expand]: 4.03001e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 2.526e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00395463, [53] [py_interpret_to_execute]: 2.16e-05 [rewriter_before_opt_a]: 5.816e-05 [opt_a]: 0.00209097, [2] [Cycle 1]: 0.00149228, [45] [expand_dump_flag]: 2.62001e-06 [switch_simplify]: 3.161e-05 [loop_unroll]: 2.055e-05 [a_1]: 0.00044688 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 7.54002e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.88998e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.432e-05 [accelerated_algorithm]: 6.09001e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 7.51001e-06 [auto_parallel]: 5.46e-06 [parallel]: 1.717e-05 [flash_sp]: 7.67998e-06 [merge_comm]: 3.49001e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 8.62e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.39998e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 8.63001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.093e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 
2.22999e-06 [after_resolve]: 1.068e-05 [a_after_grad]: 8.62998e-06 [renormalize]: 0.00041491 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.337e-05 [cse]: 2.627e-05 [a_3]: 3.981e-05 [Cycle 2]: 0.00058935, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012214 [with_stream_mark]: 9.36998e-06 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.68e-05 [accelerated_algorithm]: 5.37001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.94002e-06 [flash_sp]: 3.40003e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 4.87998e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.73999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.92999e-06 [a_after_grad]: 8.17e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.34e-05 [a_3]: 3.146e-05 [py_interpret_to_execute_after_opt_a]: 7.67998e-06 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 3.027e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.65001e-06 [mutable_eliminate]: 0.00044974 [opt_b]: 0.0001789, [1] [Cycle 1]: 
0.00017317, [7] [b_1]: 0.00010622 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 4.09986e-07 [cse]: 1.64e-05 [optimize_parallel_all_gather_comm]: 1.59e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00041718 [opt_after_cconv]: 9.516e-05, [1] [Cycle 1]: 8.946e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.679e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.241e-05 [tuple_transform]: 6.914e-05, [1] [Cycle 1]: 6.493e-05, [4] [d_1]: 3.855e-05 [none_parameter_eliminate]: 1.91e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 4.472e-05 [cse_after_recomputation]: 1.99e-05, [1] [Cycle 1]: 1.553e-05, [1] [cse]: 1.042e-05 [environ_conv]: 4.65999e-06 [swap_dp_allreduce_reducescatter]: 5.52999e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.33002e-06 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.67998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.81999e-06 [overlap_grad_flash_sp]: 1.728e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.791e-05, [1] [Cycle 1]: 6.405e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.40999e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 5.77999e-06 [fold_const_symbol]: 8.86002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.534e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.78001e-06 [opt_after_jit_grad]: 0.0004514 [validate]: 3.027e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.0587132 [execute]: 8.88002e-06 Sums bootstrap : 0.000461s : 0.67% type_inference : 0.005515s : 8.05% event_method : 0.000014s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000569s : 0.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000415s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.66% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000417s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000451s : 0.66% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058713s : 85.74% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000164 30 15.32% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.27% : 0.000109s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000005s : 4: substitution.remove_not_recompute_node 2.54% : 0.000004s : 4: substitution.replace_old_param 6.20% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005474 2 89.97% : 0.004925s : 1: type_inference.infer 10.03% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.81% : 0.000027s : 3: replace.inline 29.19% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 92.09% : 0.000107s : 3: match.inline 7.91% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.53% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.32% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000355 8 48.20% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.80% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080833 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.66% : 0.002962s : 1: add_attr 3.65% : 0.002953s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000497s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.04% : 0.000035s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000425s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.15% : 0.000930s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.59% : 0.002094s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.57% : 0.000461s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.90% : 0.003958s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000205s : 1: renormalize.infer 0.25% : 0.000203s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.08% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 72.66% : 0.058730s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.84% : 0.005528s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 1.09599, [24] [bootstrap]: 0.00050247 [type_inference]: 0.0114333 [event_method]: 4.868e-05 [auto_monad]: 0.00012054 [graph_reusing]: 8.35999e-06 [inline]: 2.14999e-06 [add_attr]: 0.00304907, [1] [add_attr_with_inline]: 0.00304096, [1] [Cycle 1]: 7.023e-05, [2] [tag_attr]: 3.429e-05 [meta_addattr_fg_expand]: 9.38002e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 4.948e-05 [insert-virtual-dataset]: 2.81e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.51002e-06 [optimize]: 0.0134365, [53] [py_interpret_to_execute]: 3.66e-05 [rewriter_before_opt_a]: 0.00014676 [opt_a]: 0.0111252, [3] [Cycle 1]: 0.00718936, [45] [expand_dump_flag]: 3.75998e-06 [switch_simplify]: 7.346e-05 [loop_unroll]: 6.169e-05 [a_1]: 0.00147907 [with_stream_mark]: 2.28e-05 [recompute_prepare]: 2.136e-05 [updatestate_depend_eliminate]: 9.05999e-06 [updatestate_assign_eliminate]: 7.5e-06 [updatestate_loads_eliminate]: 7.16001e-06 [parameter_eliminate]: 2.91e-06 [a_2]: 0.00024335 [accelerated_algorithm]: 3.12e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.45e-06 [shard_inline]: 1.612e-05 [merge_send_recv]: 1.537e-05 [auto_parallel]: 1.056e-05 [parallel]: 1.801e-05 [flash_sp]: 1.135e-05 [merge_comm]: 9.51e-06 [allreduce_fusion]: 8.68001e-06 [matmul_add_comm_reduction]: 2.672e-05 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 1.777e-05 [virtual_dataset]: 1.596e-05 [get_grad_eliminate_]: 1.546e-05 [virtual_output]: 1.543e-05 [merge_forward]: 9.46003e-06 [cell_reuse_recompute_pass]: 9.99979e-07 [offload_activation]: 1.774e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.872e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 2.788e-05 [set_forward_comm_id_for_comm_node_pass]: 9.54e-06 [meta_fg_expand]: 0.00141774 [flash_sp_send_recv_attached]: 3.73001e-06 [receive_attached]: 1.94999e-06 
[after_resolve]: 6.573e-05 [a_after_grad]: 9.142e-05 [renormalize]: 0.00250202 [add_forward_monad_depend]: 9.51e-06 [auto_monad_grad]: 5.32001e-06 [auto_monad_eliminator]: 5.801e-05 [cse]: 0.00017118 [a_3]: 0.00033662 [Cycle 2]: 0.00301793, [45] [expand_dump_flag]: 1.57001e-06 [switch_simplify]: 4.673e-05 [loop_unroll]: 4.364e-05 [a_1]: 0.00153071 [with_stream_mark]: 1.195e-05 [recompute_prepare]: 1.093e-05 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 4.51002e-06 [updatestate_loads_eliminate]: 3.58e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012656 [accelerated_algorithm]: 1.227e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.22999e-06 [merge_send_recv]: 6.98998e-06 [auto_parallel]: 7.45998e-06 [parallel]: 5.20999e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.09998e-06 [allreduce_fusion]: 4.75999e-06 [matmul_add_comm_reduction]: 1.64e-05 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 1.074e-05 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 9.15001e-06 [virtual_output]: 8.52998e-06 [merge_forward]: 5.02e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.683e-05 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 1.443e-05 [set_forward_comm_id_for_comm_node_pass]: 5.52999e-06 [meta_fg_expand]: 6.997e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.12999e-06 [after_resolve]: 1.598e-05 [a_after_grad]: 1.441e-05 [renormalize]: 0.00059418 [add_forward_monad_depend]: 3.91999e-06 [auto_monad_grad]: 1.14998e-06 [auto_monad_eliminator]: 1.449e-05 [cse]: 4.651e-05 [a_3]: 6.527e-05 [Cycle 3]: 0.00090378, [45] [expand_dump_flag]: 1.14e-06 [switch_simplify]: 1.085e-05 [loop_unroll]: 9.04e-06 [a_1]: 0.00025008 [with_stream_mark]: 9.79999e-06 [recompute_prepare]: 9.27001e-06 [updatestate_depend_eliminate]: 4.89003e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 
3.97e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 0.00012344 [accelerated_algorithm]: 1.177e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.18e-06 [parallel]: 4.47e-06 [flash_sp]: 1.06002e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 4.82e-06 [matmul_add_comm_reduction]: 7.43e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.89001e-06 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.59002e-06 [virtual_output]: 8.39998e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 8.28999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.593e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.474e-05 [set_forward_comm_id_for_comm_node_pass]: 5.79e-06 [meta_fg_expand]: 2.86e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.358e-05 [a_after_grad]: 1.404e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 1.073e-05 [cse]: 2.679e-05 [a_3]: 5.958e-05 [py_interpret_to_execute_after_opt_a]: 1.035e-05 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 4.721e-05 [convert_after_rewriter]: 9.46003e-06 [order_py_execute_after_rewriter]: 6.93e-06 [mutable_eliminate]: 0.00045494 [opt_b]: 0.00028923, [1] [Cycle 1]: 0.00028333, [7] [b_1]: 0.0001915 [b_2]: 1.098e-05 [updatestate_depend_eliminate]: 7.06999e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 4.69998e-07 [cse]: 3.103e-05 [optimize_parallel_all_gather_comm]: 2.004e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.009e-05 [loop_unroll]: 0.00042324 [opt_after_cconv]: 0.00013547, [1] [Cycle 1]: 0.00012949, [7] [c_1]: 4.786e-05 [parameter_eliminate]: 2.41998e-06 [updatestate_depend_eliminate]: 7.01001e-06 [updatestate_assign_eliminate]: 4.15e-06 
[updatestate_loads_eliminate]: 3.83001e-06 [cse]: 3.038e-05 [renormalize]: 2.29978e-07 [remove_dup_value]: 2.85e-05 [tuple_transform]: 0.00010278, [1] [Cycle 1]: 9.742e-05, [4] [d_1]: 6.621e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.032e-05 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.7e-05 [cse_after_recomputation]: 3.248e-05, [1] [Cycle 1]: 2.755e-05, [1] [cse]: 2.187e-05 [environ_conv]: 8.18999e-06 [swap_dp_allreduce_reducescatter]: 8.02998e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.65002e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.679e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.00001e-06 [overlap_recompute_and_grad_model_parallel]: 6.07001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.57001e-06 [overlap_grad_ring_attention]: 5.14e-06 [overlap_grad_flash_sp]: 2.459e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.87999e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 0.00015024, [1] [Cycle 1]: 0.00014598, [6] [build]: 9.57999e-06 [elim_shapecalc]: 1.347e-05 [elim_not_effective]: 1.802e-05 [opt_reshape]: 1.025e-05 [fold_const_symbol]: 1.593e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.533e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.60998e-06 [opt_after_jit_grad]: 0.00046821 [validate]: 4.545e-05 [backend_pass]: 9.29984e-07 [task_emit]: 1.06656 [execute]: 1e-05 Sums bootstrap : 0.000502s : 0.05% type_inference : 0.011433s : 1.05% event_method : 0.000049s : 0.00% auto_monad : 0.000121s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.00% optimize.rewriter_before_opt_a : 0.000147s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.01% optimize.opt_a.loop_unroll : 0.000114s : 0.01% optimize.opt_a.a_1 : 0.003260s : 0.30% optimize.opt_a.with_stream_mark : 0.000045s : 0.00% optimize.opt_a.recompute_prepare : 0.000042s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.00% optimize.opt_a.merge_send_recv : 0.000029s : 0.00% optimize.opt_a.auto_parallel : 0.000025s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000051s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001491s : 0.14% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000095s : 0.01% optimize.opt_a.a_after_grad : 0.000120s : 0.01% optimize.opt_a.renormalize : 0.003096s : 0.28% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.01% optimize.opt_a.cse : 0.000244s : 0.02% optimize.opt_a.a_3 : 0.000461s : 0.04% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000455s : 0.04% optimize.opt_b.b_1 : 0.000192s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000423s : 0.04% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.00% optimize.tuple_transform.d_1 : 0.000066s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000008s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000468s : 0.04% validate : 0.000045s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.066559s : 97.70% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000760 222 5.78% : 0.000044s : 12: substitution.arithmetic_simplify 1.76% : 0.000013s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.46% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.50% : 0.000422s : 17: substitution.inline 2.20% : 0.000017s : 2: substitution.inline_without_move 1.42% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.81% : 0.000014s : 20: substitution.remove_not_recompute_node 3.14% : 0.000024s : 10: substitution.replace_applicator 1.48% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.85% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011358 2 86.69% : 0.009846s : 1: type_inference.infer 13.31% : 0.001512s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.12% : 0.000125s : 17: replace.inline 42.88% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.20% : 0.000413s : 17: match.inline 7.80% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000773 5764 1.04% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 68: predicate.addn_zero_filter 1.01% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.91% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.09% : 0.000008s : 68: predicate.check_bprop_eliminate 0.49% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.66% : 0.000013s : 101: predicate.exchange_switch_depend_value 5.14% : 0.000040s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.48% : 0.000042s : 249: predicate.inline 1.33% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.56% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.60% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.07% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 101: predicate.partial_defer_inline 1.68% : 0.000013s : 92: predicate.partial_eliminate 1.04% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.23% : 0.000010s : 68: predicate.reduce_eliminate 2.61% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.05% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.22% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.19% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 101: predicate.switch_defer_inline 2.85% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.85% : 
0.000038s : 277: predicate.switch_simplify 1.03% : 0.000008s : 68: predicate.tile_eliminate 1.03% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.92% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.57% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.16% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001577 34 56.62% : 0.000893s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.38% : 0.000684s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.120845 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.27% : 0.003053s : 1: add_attr 0.27% : 0.003045s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000128s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.05% : 0.000536s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000011s : 1: environ_conv 0.00% : 0.000056s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000464s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.44% : 0.004943s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000177s : 28: opt.transform.opt_b 0.01% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000054s : 4: opt.transform.symbol_engine_opt 0.99% : 0.011128s : 1: opt_a 0.01% : 0.000139s : 1: opt_after_cconv 0.04% : 0.000478s : 1: opt_after_jit_grad 0.03% : 0.000293s : 1: opt_b 1.20% : 0.013440s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000054s : 1: pre_auto_parallel 0.00% : 0.000041s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000033s : 1: remove_dup_value 0.15% : 0.001665s : 2: renormalize.infer 0.13% : 0.001417s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000051s : 1: rewriter_after_opt_a 0.01% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000153s : 1: symbol_engine_optimizer 95.16% : 1.066582s : 1: task_emit 0.01% : 0.000106s : 1: 
tuple_transform 1.02% : 0.011448s : 1: type_inference 0.01% : 0.000069s : 1: validate TotalTime = 0.0720538, [24] [bootstrap]: 0.00045839 [type_inference]: 0.00437479 [event_method]: 1.108e-05 [auto_monad]: 5.056e-05 [graph_reusing]: 5.22e-06 [inline]: 2.01e-06 [add_attr]: 0.00311385, [1] [add_attr_with_inline]: 0.00310489, [1] [Cycle 1]: 5.044e-05, [2] [tag_attr]: 1.268e-05 [meta_addattr_fg_expand]: 3.28998e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.312e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.90001e-06 [optimize]: 0.00385803, [53] [py_interpret_to_execute]: 1.658e-05 [rewriter_before_opt_a]: 4.198e-05 [opt_a]: 0.00199967, [2] [Cycle 1]: 0.0013466, [45] [expand_dump_flag]: 2.62001e-06 [switch_simplify]: 2.524e-05 [loop_unroll]: 1.334e-05 [a_1]: 0.00029453 [with_stream_mark]: 1.524e-05 [recompute_prepare]: 7.11001e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 3.26999e-06 [parameter_eliminate]: 1.56998e-06 [a_2]: 7.533e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 5.37999e-06 [parallel]: 1.907e-05 [flash_sp]: 7.52002e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.68e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.84999e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.79e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.62999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.56998e-06 [receive_attached]: 2.51e-06 
[after_resolve]: 1.023e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00042802 [add_forward_monad_depend]: 4.30999e-06 [auto_monad_grad]: 1.94999e-06 [auto_monad_eliminator]: 1.389e-05 [cse]: 2.733e-05 [a_3]: 4.117e-05 [Cycle 2]: 0.00064348, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00012536 [with_stream_mark]: 9.17001e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 6.716e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.35001e-06 [parallel]: 4e-06 [flash_sp]: 3.55e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.27999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.22001e-06 [virtual_dataset]: 5.50001e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.26002e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.024e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 1.77999e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.34e-06 [a_after_grad]: 8.61002e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 7.2e-06 [cse]: 1.447e-05 [a_3]: 3.253e-05 [py_interpret_to_execute_after_opt_a]: 8.03001e-06 [slice_cell_reuse_recomputed_activation]: 2.42001e-06 [rewriter_after_opt_a]: 3.201e-05 [convert_after_rewriter]: 6.66999e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00047114 [opt_b]: 0.00018109, [1] [Cycle 1]: 0.00017496, [7] [b_1]: 
0.00010653 [b_2]: 6.84999e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.40002e-06 [renormalize]: 6.10016e-07 [cse]: 1.726e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00042683 [opt_after_cconv]: 9.681e-05, [1] [Cycle 1]: 9.088e-05, [7] [c_1]: 2.765e-05 [parameter_eliminate]: 2.33998e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.707e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.323e-05 [tuple_transform]: 6.898e-05, [1] [Cycle 1]: 6.464e-05, [4] [d_1]: 3.908e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.63e-05 [cse_after_recomputation]: 2.035e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.08e-05 [environ_conv]: 5.15001e-06 [swap_dp_allreduce_reducescatter]: 5.90002e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.57999e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.20999e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.13e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.77002e-06 [overlap_recompute_and_grad_model_parallel]: 4.82e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 4.12e-06 [overlap_grad_flash_sp]: 1.646e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 6.985e-05, [1] [Cycle 1]: 6.581e-05, [6] [build]: 2.49001e-06 [elim_shapecalc]: 8.1e-06 [elim_not_effective]: 1.212e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.39003e-06 [auto_monad_reorder]: 1.597e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00046304 [validate]: 3.322e-05 [backend_pass]: 1.21997e-06 [task_emit]: 0.059401 [execute]: 8.99e-06 Sums bootstrap : 0.000458s : 0.67% type_inference : 0.004375s : 6.44% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000042s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000428s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000471s : 0.69% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000427s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000463s : 0.68% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059401s : 87.47% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000126 26 18.71% : 0.000024s : 4: substitution.arithmetic_simplify 1.74% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000006s : 4: substitution.graph_param_transform 64.77% : 0.000082s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.92% : 0.000005s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004333 2 91.77% : 0.003976s : 1: type_inference.infer 8.23% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000138 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 9: predicate.addn_zero_filter 0.76% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 1.21% : 0.000002s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000008s : 44: predicate.inline 1.08% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.49% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.49% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.82% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.38% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000257 6 40.37% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.63% : 0.000153s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080376 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.88% : 0.003118s : 1: add_attr 3.87% : 0.003109s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.62% : 0.000495s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.60% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.96% : 0.000772s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.49% : 0.002003s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.59% : 0.000473s : 1: opt_after_jit_grad 0.23% : 0.000185s : 1: opt_b 4.80% : 0.003862s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.03% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.30% : 0.000242s : 1: renormalize.infer 0.22% : 0.000180s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.06% : 0.000046s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 73.93% : 0.059423s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.46% : 0.004390s : 1: type_inference 0.07% : 0.000057s : 1: validate TotalTime = 0.111463, [24] [bootstrap]: 0.00055006 [type_inference]: 0.0104369 [event_method]: 4.326e-05 [auto_monad]: 0.00011661 [graph_reusing]: 8e-06 [inline]: 1.93002e-06 [add_attr]: 0.0029829, [1] [add_attr_with_inline]: 0.00297441, [1] [Cycle 1]: 6.643e-05, [2] [tag_attr]: 3.147e-05 [meta_addattr_fg_expand]: 8.64e-06 [parallel-infer-symbol]: 3.26001e-06 [pre_auto_parallel]: 4.62e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0131236, [53] [py_interpret_to_execute]: 3.575e-05 [rewriter_before_opt_a]: 0.00013185 [opt_a]: 0.0108596, [3] [Cycle 1]: 0.00695658, [45] [expand_dump_flag]: 3.74002e-06 [switch_simplify]: 6.724e-05 [loop_unroll]: 5.433e-05 [a_1]: 0.00133916 [with_stream_mark]: 2.284e-05 [recompute_prepare]: 2.177e-05 [updatestate_depend_eliminate]: 9.11002e-06 [updatestate_assign_eliminate]: 7.6e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 2.64001e-06 [a_2]: 0.00024378 [accelerated_algorithm]: 3.123e-05 [shard]: 2.26e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.605e-05 [merge_send_recv]: 1.673e-05 [auto_parallel]: 1.059e-05 [parallel]: 2.037e-05 [flash_sp]: 1.174e-05 [merge_comm]: 9.48002e-06 [allreduce_fusion]: 8.69998e-06 [matmul_add_comm_reduction]: 2.646e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.821e-05 [virtual_dataset]: 1.571e-05 [get_grad_eliminate_]: 1.51e-05 [virtual_output]: 1.514e-05 [merge_forward]: 1.032e-05 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.809e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.857e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 2.72e-05 [set_forward_comm_id_for_comm_node_pass]: 9.57999e-06 [meta_fg_expand]: 0.00140763 [flash_sp_send_recv_attached]: 3.82002e-06 [receive_attached]: 2.62001e-06 [after_resolve]: 
6.073e-05 [a_after_grad]: 8.068e-05 [renormalize]: 0.00244983 [add_forward_monad_depend]: 9.71003e-06 [auto_monad_grad]: 5.15999e-06 [auto_monad_eliminator]: 5.709e-05 [cse]: 0.00017078 [a_3]: 0.00033652 [Cycle 2]: 0.00298783, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.724e-05 [loop_unroll]: 4.426e-05 [a_1]: 0.00156323 [with_stream_mark]: 1.177e-05 [recompute_prepare]: 1.103e-05 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 3.98001e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00012635 [accelerated_algorithm]: 1.191e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.81998e-06 [shard_inline]: 9.29998e-06 [merge_send_recv]: 6.74999e-06 [auto_parallel]: 7.29001e-06 [parallel]: 4.82998e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 5.93002e-06 [allreduce_fusion]: 5.04e-06 [matmul_add_comm_reduction]: 7.75e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.77999e-06 [get_grad_eliminate_]: 8.67998e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 8.90001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.582e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35999e-06 [meta_fg_expand]: 3.46e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.471e-05 [a_after_grad]: 1.402e-05 [renormalize]: 0.00058375 [add_forward_monad_depend]: 4.23999e-06 [auto_monad_grad]: 1.09998e-06 [auto_monad_eliminator]: 1.409e-05 [cse]: 4.58e-05 [a_3]: 6.527e-05 [Cycle 3]: 0.00090096, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.049e-05 [loop_unroll]: 8.93002e-06 [a_1]: 0.00024885 [with_stream_mark]: 9.93002e-06 [recompute_prepare]: 9.70002e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 3.98001e-06 [updatestate_loads_eliminate]: 4.23999e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012365 [accelerated_algorithm]: 1.167e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.03e-06 [auto_parallel]: 6.98e-06 [parallel]: 4.53999e-06 [flash_sp]: 1.03001e-06 [merge_comm]: 4.82998e-06 [allreduce_fusion]: 5.06002e-06 [matmul_add_comm_reduction]: 7.69002e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.99001e-06 [virtual_dataset]: 8.60001e-06 [get_grad_eliminate_]: 8.39002e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.78001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.616e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42999e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.342e-05 [a_after_grad]: 1.452e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 1.016e-05 [cse]: 2.806e-05 [a_3]: 6.012e-05 [py_interpret_to_execute_after_opt_a]: 1.012e-05 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 4.883e-05 [convert_after_rewriter]: 9.19998e-06 [order_py_execute_after_rewriter]: 7.01999e-06 [mutable_eliminate]: 0.00047273 [opt_b]: 0.00029095, [1] [Cycle 1]: 0.00028501, [7] [b_1]: 0.00019013 [b_2]: 1.059e-05 [updatestate_depend_eliminate]: 7.2e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 4.16001e-06 [renormalize]: 5.3001e-07 [cse]: 3.256e-05 [optimize_parallel_all_gather_comm]: 2.04e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.06e-05 [loop_unroll]: 0.000414 [opt_after_cconv]: 0.00013604, [1] [Cycle 1]: 0.00013026, [7] [c_1]: 4.797e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 7.15003e-06 [updatestate_assign_eliminate]: 4.12e-06 
[updatestate_loads_eliminate]: 3.81001e-06 [cse]: 2.993e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 3.051e-05 [tuple_transform]: 0.0001016, [1] [Cycle 1]: 9.704e-05, [4] [d_1]: 6.679e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 9.71e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 6.109e-05 [cse_after_recomputation]: 3.325e-05, [1] [Cycle 1]: 2.827e-05, [1] [cse]: 2.255e-05 [environ_conv]: 9.92999e-06 [swap_dp_allreduce_reducescatter]: 7.73001e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.27003e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.64001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.711e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4.94998e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.72999e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 5.27001e-06 [overlap_grad_flash_sp]: 2.476e-05 [begin_end_overlap_inline]: 8.70001e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 0.00010061, [1] [Cycle 1]: 9.619e-05, [6] [build]: 9.97001e-06 [elim_shapecalc]: 1.378e-05 [elim_not_effective]: 1.882e-05 [opt_reshape]: 1.034e-05 [fold_const_symbol]: 1.529e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 2.553e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.00045948 [validate]: 4.627e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.0833746 [execute]: 9.40001e-06 Sums bootstrap : 0.000550s : 0.51% type_inference : 0.010437s : 9.73% event_method : 0.000043s : 0.04% auto_monad : 0.000117s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000132s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003151s : 2.94% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.46% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001445s : 1.35% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.003034s : 2.83% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.08% optimize.opt_a.cse : 0.000245s : 0.23% optimize.opt_a.a_3 : 0.000462s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000473s : 0.44% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000414s : 0.39% optimize.opt_after_cconv.c_1 : 0.000048s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000061s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000459s : 0.43% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.083375s : 77.76% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000740 218 5.76% : 0.000043s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.04% : 0.000407s : 16: substitution.inline 2.06% : 0.000015s : 2: substitution.inline_without_move 1.40% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000015s : 3: substitution.less_batch_normalization 1.82% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000005s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.27% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.76% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.45% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010368 2 86.71% : 0.008990s : 1: type_inference.infer 13.29% : 0.001378s : 1: type_inference.specialize ------[replace.] 0.000204 30 58.57% : 0.000119s : 16: replace.inline 41.43% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000430 30 92.80% : 0.000399s : 16: match.inline 7.20% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000737 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001529 32 57.60% : 0.000881s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.40% : 0.000648s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.135730 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.20% : 0.002987s : 1: add_attr 2.19% : 0.002979s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000124s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000587s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000482s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.54% : 0.004803s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000055s : 4: opt.transform.symbol_engine_opt 8.00% : 0.010863s : 1: opt_a 0.10% : 0.000140s : 1: opt_after_cconv 0.35% : 0.000469s : 1: opt_after_jit_grad 0.22% : 0.000295s : 1: opt_b 9.67% : 0.013127s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000035s : 1: remove_dup_value 1.18% : 0.001600s : 2: renormalize.infer 1.05% : 0.001420s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000053s : 1: rewriter_after_opt_a 0.10% : 0.000136s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000103s : 1: symbol_engine_optimizer 61.44% : 0.083396s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 7.70% : 0.010452s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x4-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x5-pynative],max_mem:42.0M TotalTime = 0.0216259, [24] [bootstrap]: 0.00056486 [type_inference]: 0.00617826 [event_method]: 1.537e-05 [auto_monad]: 5.559e-05 [graph_reusing]: 5.40999e-06 [inline]: 1.91e-06 [add_attr]: 0.00336396, [1] [add_attr_with_inline]: 0.00335264, [1] [Cycle 1]: 4.491e-05, [2] [tag_attr]: 1.6e-05 [meta_addattr_fg_expand]: 3.91001e-06 [parallel-infer-symbol]: 3.10002e-06 [pre_auto_parallel]: 2.84e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.37999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00399546, [53] [py_interpret_to_execute]: 1.988e-05 [rewriter_before_opt_a]: 6.116e-05 [opt_a]: 0.00216261, [2] [Cycle 1]: 0.0015305, [45] [expand_dump_flag]: 2.85002e-06 [switch_simplify]: 3.193e-05 [loop_unroll]: 2.098e-05 [a_1]: 0.00045409 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.20998e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.55e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 2.11998e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 6.17999e-06 [merge_send_recv]: 7.86001e-06 [auto_parallel]: 5.81003e-06 [parallel]: 2.423e-05 [flash_sp]: 6.93e-06 [merge_comm]: 3.89002e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.26001e-06 [virtual_dataset]: 
6.12999e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.33002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.37001e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.46e-06 [after_resolve]: 1.032e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00042627 [add_forward_monad_depend]: 4.56002e-06 [auto_monad_grad]: 1.76998e-06 [auto_monad_eliminator]: 1.349e-05 [cse]: 3.102e-05 [a_3]: 4.108e-05 [Cycle 2]: 0.00062281, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.82002e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012514 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 5.51002e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 6.719e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.65001e-06 [auto_parallel]: 5.12999e-06 [parallel]: 4.65999e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.78998e-06 [matmul_add_comm_reduction]: 5.29e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.77999e-06 [virtual_dataset]: 5.18002e-06 [get_grad_eliminate_]: 4.90999e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 6.24001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.019e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.08999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.02999e-06 [a_after_grad]: 8.25e-06 [renormalize]: 
1.00001e-07 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.28002e-06 [cse]: 1.415e-05 [a_3]: 3.132e-05 [py_interpret_to_execute_after_opt_a]: 7.61001e-06 [slice_cell_reuse_recomputed_activation]: 2.26998e-06 [rewriter_after_opt_a]: 3e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.57999e-06 [mutable_eliminate]: 0.00044577 [opt_b]: 0.00018038, [1] [Cycle 1]: 0.00017437, [7] [b_1]: 0.00010716 [b_2]: 6.88998e-06 [updatestate_depend_eliminate]: 5.01002e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 4.50003e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.622e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.277e-05 [loop_unroll]: 0.00040505 [opt_after_cconv]: 9.512e-05, [1] [Cycle 1]: 8.951e-05, [7] [c_1]: 2.744e-05 [parameter_eliminate]: 2.40002e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.73e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.591e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.908e-05, [1] [Cycle 1]: 6.463e-05, [4] [d_1]: 3.934e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.855e-05 [cse_after_recomputation]: 2.108e-05, [1] [Cycle 1]: 1.639e-05, [1] [cse]: 1.133e-05 [environ_conv]: 5.07999e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.53003e-06 [micro_interleaved_order_control]: 2.74001e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.70002e-06 [comm_op_add_attrs]: 1.01002e-06 
[add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.81e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.226e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 3.97e-06 [overlap_recompute_and_grad_model_parallel]: 4.3e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.61002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.79001e-06 [overlap_grad_ring_attention]: 4.23999e-06 [overlap_grad_flash_sp]: 1.724e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.71998e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.881e-05, [1] [Cycle 1]: 6.469e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.188e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.79003e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.61e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 0.00010978 [opt_after_jit_grad]: 0.00044231 [validate]: 3.184e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.00658959 [execute]: 7.16001e-06 Sums bootstrap : 0.000565s : 3.27% type_inference : 0.006178s : 35.78% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000061s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000579s : 3.35% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000426s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000446s : 2.58% optimize.opt_b.b_1 : 0.000107s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000405s : 2.35% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000110s : 0.64% opt_after_jit_grad : 0.000442s : 2.56% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006590s : 38.16% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 14.71% : 0.000025s : 5: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.41% : 0.000006s : 4: substitution.graph_param_transform 66.82% : 0.000112s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006132 2 90.27% : 0.005536s : 1: type_inference.infer 9.73% : 0.000596s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.36% : 0.000028s : 3: replace.inline 28.64% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.60% : 0.000110s : 3: match.inline 8.40% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.87% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 1.03% : 0.000002s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.90% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 47.04% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.96% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030505 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.04% : 0.003368s : 1: add_attr 11.00% : 0.003356s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.97% : 0.000602s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000413s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.09% : 0.000943s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002166s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.48% : 0.000452s : 1: opt_after_jit_grad 0.60% : 0.000184s : 1: opt_b 13.11% : 0.003999s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.72% : 0.000221s : 1: renormalize.infer 0.65% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.38% : 0.000115s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 21.64% : 0.006600s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.30% : 0.006192s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0184847, [24] [bootstrap]: 0.00047623 [type_inference]: 0.00448241 [event_method]: 1.051e-05 [auto_monad]: 5.244e-05 [graph_reusing]: 5.46e-06 [inline]: 1.71e-06 [add_attr]: 0.00299254, [1] [add_attr_with_inline]: 0.00298495, [1] [Cycle 1]: 4.462e-05, [2] [tag_attr]: 1.241e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.155e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.99e-06 [optimize]: 0.00371406, [53] [py_interpret_to_execute]: 1.567e-05 [rewriter_before_opt_a]: 4.025e-05 [opt_a]: 0.00188271, [2] [Cycle 1]: 0.00126363, [45] [expand_dump_flag]: 2.55997e-06 [switch_simplify]: 2.516e-05 [loop_unroll]: 1.4e-05 [a_1]: 0.00029261 [with_stream_mark]: 1.409e-05 [recompute_prepare]: 7.65998e-06 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 7.699e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 6.34999e-06 [merge_send_recv]: 7.95e-06 [auto_parallel]: 6.22001e-06 [parallel]: 1.815e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.63002e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 6.81999e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.87001e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.028e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 9.93002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.30998e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.76999e-06 [receive_attached]: 2.27999e-06 
[after_resolve]: 1.118e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.0003436 [add_forward_monad_depend]: 4.62998e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.32e-05 [cse]: 2.831e-05 [a_3]: 4.006e-05 [Cycle 2]: 0.00060958, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00013872 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 6.03002e-06 [updatestate_depend_eliminate]: 2.90002e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.802e-05 [accelerated_algorithm]: 5.94e-06 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.22998e-06 [auto_parallel]: 5.35999e-06 [parallel]: 4.25e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.18998e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.69e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 5.24998e-06 [virtual_output]: 5.04998e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 6.43e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.81001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 9.49978e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.74e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.358e-05 [a_3]: 3.114e-05 [py_interpret_to_execute_after_opt_a]: 7.76001e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.13e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.81e-06 [mutable_eliminate]: 0.00045613 [opt_b]: 0.00018235, [1] [Cycle 1]: 0.00017618, [7] [b_1]: 
0.00010828 [b_2]: 7.1e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.59986e-07 [cse]: 1.605e-05 [optimize_parallel_all_gather_comm]: 1.633e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.253e-05 [loop_unroll]: 0.00042372 [opt_after_cconv]: 9.397e-05, [1] [Cycle 1]: 8.847e-05, [7] [c_1]: 2.825e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 4.94003e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.578e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.326e-05 [tuple_transform]: 6.971e-05, [1] [Cycle 1]: 6.56e-05, [4] [d_1]: 3.913e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.41998e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.333e-05 [cse_after_recomputation]: 1.932e-05, [1] [Cycle 1]: 1.526e-05, [1] [cse]: 1.025e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.93998e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.59001e-06 [micro_interleaved_order_control]: 2.62001e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.86999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.75001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.221e-05 [grouped_pairwise_exchange_alltoall]: 1.77999e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.46002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.26998e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.42e-06 [symbol_engine_optimizer]: 6.831e-05, [1] [Cycle 1]: 6.43e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.37998e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.32001e-06 [fold_const_symbol]: 8.69003e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.14999e-06 [opt_after_jit_grad]: 0.0004588 [validate]: 3.105e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00600184 [execute]: 7.45e-06 Sums bootstrap : 0.000476s : 3.28% type_inference : 0.004482s : 30.84% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000431s : 2.97% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000344s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.29% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000456s : 3.14% optimize.opt_b.b_1 : 0.000108s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000424s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000459s : 3.16% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006002s : 41.29% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.94% : 0.000023s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.31% : 0.000005s : 4: substitution.graph_param_transform 65.20% : 0.000080s : 2: substitution.inline 2.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.10% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004441 2 92.01% : 0.004086s : 1: type_inference.infer 7.99% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000153 984 1.00% : 0.000002s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.65% : 0.000001s : 9: predicate.addn_zero_filter 0.62% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.05% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.73% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 9.75% : 0.000015s : 9: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.05% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.00% : 0.000002s : 13: predicate.environ_get_add_eliminate 0.96% : 0.000001s : 13: predicate.environ_get_depend_swap 1.70% : 0.000003s : 21: predicate.environ_get_eliminate 0.97% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.84% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.70% : 0.000003s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.95% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.21% : 0.000008s : 44: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.51% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.99% : 0.000003s : 26: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.64% : 0.000003s : 18: predicate.loop_unroll_before_grad 2.06% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.62% : 0.000001s : 9: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.12% : 0.000002s : 11: predicate.partial_defer_inline 1.10% : 0.000002s : 13: predicate.partial_eliminate 0.72% : 0.000001s : 9: predicate.print_const_string_wrapper 0.95% : 0.000001s : 8: predicate.reduce_all_const_elim 0.89% : 0.000001s : 9: predicate.reduce_eliminate 1.89% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 8: predicate.remove_not_recompute_node 1.18% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.70% : 0.000001s : 9: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.04% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.93% : 0.000001s : 11: predicate.switch_defer_inline 1.55% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.00% : 0.000006s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.72% : 0.000001s : 9: predicate.transpose_eliminate 1.40% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.89% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.78% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 43.48% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.52% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026481 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.32% : 0.002997s : 1: add_attr 11.29% : 0.002988s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000512s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000432s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000465s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.96% : 0.000785s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.12% : 0.001886s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.77% : 0.000469s : 1: opt_after_jit_grad 0.70% : 0.000186s : 1: opt_b 14.04% : 0.003718s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.71% : 0.000189s : 1: renormalize.infer 0.56% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.70% : 0.006012s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.98% : 0.004497s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.0190575, [24] [bootstrap]: 0.00041044 [type_inference]: 0.00544232 [event_method]: 1.355e-05 [auto_monad]: 4.831e-05 [graph_reusing]: 5.07999e-06 [inline]: 1.66e-06 [add_attr]: 0.00287704, [1] [add_attr_with_inline]: 0.00286904, [1] [Cycle 1]: 4.281e-05, [2] [tag_attr]: 1.372e-05 [meta_addattr_fg_expand]: 4.50999e-06 [parallel-infer-symbol]: 2.43e-06 [pre_auto_parallel]: 2.319e-05 [insert-virtual-dataset]: 1.94e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.56002e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.00389563, [53] [py_interpret_to_execute]: 1.846e-05 [rewriter_before_opt_a]: 5.685e-05 [opt_a]: 0.00207377, [2] [Cycle 1]: 0.001472, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.009e-05 [loop_unroll]: 2.09e-05 [a_1]: 0.00043476 [with_stream_mark]: 1.207e-05 [recompute_prepare]: 7.71999e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 1.29998e-06 [a_2]: 7.55e-05 [accelerated_algorithm]: 6.35997e-06 [shard]: 1.55001e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 6.41e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.432e-05 [flash_sp]: 6.24001e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 6.88e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 6.12001e-06 [get_grad_eliminate_]: 5.69999e-06 [virtual_output]: 5.51002e-06 [merge_forward]: 3.25002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 8.62998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 9.50007e-07 [before_grad]: 9.54999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.07001e-06 [receive_attached]: 2.14e-06 
[after_resolve]: 1.001e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00039832 [add_forward_monad_depend]: 4.39002e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.233e-05 [cse]: 2.548e-05 [a_3]: 4.16e-05 [Cycle 2]: 0.00059268, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.00012634 [with_stream_mark]: 8.62e-06 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.99979e-07 [a_2]: 6.734e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.29997e-06 [auto_parallel]: 5.26998e-06 [parallel]: 4.08999e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.48998e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.81998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39998e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47997e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 6.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.49e-06 [a_after_grad]: 8.2e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.627e-05 [a_3]: 3.176e-05 [py_interpret_to_execute_after_opt_a]: 7.96001e-06 [slice_cell_reuse_recomputed_activation]: 1.64998e-06 [rewriter_after_opt_a]: 2.914e-05 [convert_after_rewriter]: 5.81e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00045816 [opt_b]: 0.00018297, [1] [Cycle 1]: 0.00017722, [7] [b_1]: 0.00011003 
[b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.39991e-07 [cse]: 1.611e-05 [optimize_parallel_all_gather_comm]: 1.317e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.051e-05 [loop_unroll]: 0.00042226 [opt_after_cconv]: 9.391e-05, [1] [Cycle 1]: 8.834e-05, [7] [c_1]: 2.77e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.605e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.043e-05 [tuple_transform]: 6.743e-05, [1] [Cycle 1]: 6.316e-05, [4] [d_1]: 3.778e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 5.99999e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.012e-05 [cse_after_recomputation]: 1.966e-05, [1] [Cycle 1]: 1.529e-05, [1] [cse]: 1.009e-05 [environ_conv]: 3.91999e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.00002e-06 [label_micro_interleaved_index]: 3.98999e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 9.20001e-07 [slice_recompute_activation]: 1.82999e-06 [micro_interleaved_order_control]: 1.90001e-06 [assign_add_opt]: 8.80013e-07 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.02001e-06 [comm_op_add_attrs]: 6.50005e-07 [add_comm_op_reuse_tag]: 8.39995e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.17e-06 [control_data_broadcast_order]: 1.117e-05 [grouped_pairwise_exchange_alltoall]: 1.22e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.32e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 4.12998e-06 [overlap_grad_flash_sp]: 1.698e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 1.40999e-06 [split_layernorm_comm]: 1.55001e-06 [handle_group_info]: 8.09989e-07 [symbol_engine_optimizer]: 6.687e-05, [1] [Cycle 1]: 6.248e-05, [6] [build]: 2.02999e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.133e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.86998e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.398e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.23e-06 [opt_after_jit_grad]: 0.00045557 [validate]: 2.85e-05 [backend_pass]: 7.10017e-07 [task_emit]: 0.00563653 [execute]: 6.63e-06 Sums bootstrap : 0.000410s : 2.70% type_inference : 0.005442s : 35.75% event_method : 0.000014s : 0.09% auto_monad : 0.000048s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000023s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.12% optimize.rewriter_before_opt_a : 0.000057s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000561s : 3.69% optimize.opt_a.with_stream_mark : 0.000021s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000011s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000018s : 0.12% optimize.opt_a.flash_sp : 0.000009s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000012s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000398s : 2.62% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.19% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000458s : 3.01% optimize.opt_b.b_1 : 0.000110s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000013s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.13% optimize.loop_unroll : 0.000422s : 2.77% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000010s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000040s : 0.26% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000001s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000014s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000456s : 2.99% validate : 0.000029s : 0.19% backend_pass : 0.000001s : 0.00% task_emit : 0.005637s : 37.03% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000154 30 15.02% : 0.000023s : 5: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000002s : 2: substitution.fold_const_symbol 2.97% : 0.000005s : 4: substitution.graph_param_transform 64.64% : 0.000100s : 3: substitution.inline 2.00% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000004s : 4: substitution.remove_not_recompute_node 2.79% : 0.000004s : 4: substitution.replace_old_param 7.47% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005405 2 89.15% : 0.004818s : 1: type_inference.infer 10.85% : 0.000586s : 1: type_inference.specialize ------[replace.] 0.000036 5 69.46% : 0.000025s : 3: replace.inline 30.54% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000108 5 90.28% : 0.000098s : 3: match.inline 9.72% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.41% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000001s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.91% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.10% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000330 8 44.47% : 0.000147s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.53% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027315 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.55% : 0.002881s : 1: add_attr 10.52% : 0.002873s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000044s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000053s : 1: auto_monad 0.06% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.60% : 0.000438s : 1: bootstrap 0.09% : 0.000024s : 1: cconv 0.01% : 0.000003s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000009s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000003s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000468s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.39% : 0.000926s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.60% : 0.002077s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.70% : 0.000465s : 1: opt_after_jit_grad 0.68% : 0.000187s : 1: opt_b 14.28% : 0.003900s : 1: optimize 0.06% : 0.000016s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000014s : 1: remove_dup_value 0.73% : 0.000200s : 1: renormalize.infer 0.70% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000033s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000004s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000069s : 1: symbol_engine_optimizer 20.67% : 0.005646s : 1: task_emit 0.26% : 0.000070s : 1: 
tuple_transform 19.97% : 0.005456s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0371742, [24] [bootstrap]: 0.00051899 [type_inference]: 0.0112602 [event_method]: 4.612e-05 [auto_monad]: 0.00011621 [graph_reusing]: 8.08999e-06 [inline]: 1.89e-06 [add_attr]: 0.00297426, [1] [add_attr_with_inline]: 0.00296629, [1] [Cycle 1]: 6.925e-05, [2] [tag_attr]: 3.408e-05 [meta_addattr_fg_expand]: 8.85001e-06 [parallel-infer-symbol]: 2.63998e-06 [pre_auto_parallel]: 4.713e-05 [insert-virtual-dataset]: 1.97999e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 1.72999e-06 [pipeline_split]: 1.38002e-06 [optimize]: 0.0132855, [53] [py_interpret_to_execute]: 3.543e-05 [rewriter_before_opt_a]: 0.00014621 [opt_a]: 0.0110087, [3] [Cycle 1]: 0.00705503, [45] [expand_dump_flag]: 4.03001e-06 [switch_simplify]: 7.251e-05 [loop_unroll]: 6.204e-05 [a_1]: 0.00147098 [with_stream_mark]: 2.284e-05 [recompute_prepare]: 2.197e-05 [updatestate_depend_eliminate]: 8.85999e-06 [updatestate_assign_eliminate]: 7.73001e-06 [updatestate_loads_eliminate]: 6.56999e-06 [parameter_eliminate]: 2.34001e-06 [a_2]: 0.00024632 [accelerated_algorithm]: 3.139e-05 [shard]: 1.58002e-06 [meta_shard_fg_expand]: 3.40998e-06 [shard_inline]: 1.625e-05 [merge_send_recv]: 1.501e-05 [auto_parallel]: 1.067e-05 [parallel]: 1.577e-05 [flash_sp]: 1.022e-05 [merge_comm]: 9.92001e-06 [allreduce_fusion]: 8.98002e-06 [matmul_add_comm_reduction]: 2.502e-05 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.773e-05 [virtual_dataset]: 1.554e-05 [get_grad_eliminate_]: 1.538e-05 [virtual_output]: 1.526e-05 [merge_forward]: 9.45001e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.716e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.875e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 2.868e-05 [set_forward_comm_id_for_comm_node_pass]: 9.56003e-06 [meta_fg_expand]: 0.00141644 [flash_sp_send_recv_attached]: 3.00998e-06 [receive_attached]: 
2.49001e-06 [after_resolve]: 5.991e-05 [a_after_grad]: 8.135e-05 [renormalize]: 0.00241185 [add_forward_monad_depend]: 8.62e-06 [auto_monad_grad]: 5.59998e-06 [auto_monad_eliminator]: 5.546e-05 [cse]: 0.00015937 [a_3]: 0.00033716 [Cycle 2]: 0.00303243, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.744e-05 [loop_unroll]: 4.399e-05 [a_1]: 0.0015359 [with_stream_mark]: 1.207e-05 [recompute_prepare]: 1.082e-05 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 4.22998e-06 [updatestate_loads_eliminate]: 4.03999e-06 [parameter_eliminate]: 1.17e-06 [a_2]: 0.00012634 [accelerated_algorithm]: 1.19e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.76998e-06 [shard_inline]: 9.39e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 7.63001e-06 [parallel]: 4.88001e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.10999e-06 [allreduce_fusion]: 4.72998e-06 [matmul_add_comm_reduction]: 7.41001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.048e-05 [virtual_dataset]: 8.86997e-06 [get_grad_eliminate_]: 8.68001e-06 [virtual_output]: 8.79e-06 [merge_forward]: 5.22e-06 [cell_reuse_recompute_pass]: 8.80013e-07 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.674e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 8.235e-05 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.42e-06 [after_resolve]: 1.77e-05 [a_after_grad]: 1.5e-05 [renormalize]: 0.00059055 [add_forward_monad_depend]: 3.83999e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.486e-05 [cse]: 4.688e-05 [a_3]: 6.631e-05 [Cycle 3]: 0.0009072, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 1.05e-05 [loop_unroll]: 8.77999e-06 [a_1]: 0.00025057 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 9.49e-06 [updatestate_depend_eliminate]: 4.91002e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 
3.85e-06 [parameter_eliminate]: 1.07998e-06 [a_2]: 0.00012349 [accelerated_algorithm]: 1.152e-05 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.04001e-06 [parallel]: 4.65001e-06 [flash_sp]: 1.16002e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.9e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.011e-05 [virtual_dataset]: 8.87999e-06 [get_grad_eliminate_]: 8.66002e-06 [virtual_output]: 8.42998e-06 [merge_forward]: 4.27998e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.67998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.616e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34998e-06 [meta_fg_expand]: 2.96001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.508e-05 [a_after_grad]: 1.516e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 1.131e-05 [cse]: 2.631e-05 [a_3]: 5.956e-05 [py_interpret_to_execute_after_opt_a]: 1.026e-05 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 4.764e-05 [convert_after_rewriter]: 9.02999e-06 [order_py_execute_after_rewriter]: 6.89999e-06 [mutable_eliminate]: 0.0004682 [opt_b]: 0.00028932, [1] [Cycle 1]: 0.00028276, [7] [b_1]: 0.00018977 [b_2]: 1.082e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 4.03001e-06 [renormalize]: 4.30009e-07 [cse]: 3.096e-05 [optimize_parallel_all_gather_comm]: 1.918e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 1.987e-05 [loop_unroll]: 0.00043348 [opt_after_cconv]: 0.00013659, [1] [Cycle 1]: 0.00013082, [7] [c_1]: 4.879e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 7.2e-06 [updatestate_assign_eliminate]: 4.21001e-06 
[updatestate_loads_eliminate]: 3.83001e-06 [cse]: 2.992e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 2.903e-05 [tuple_transform]: 0.00010186, [1] [Cycle 1]: 9.726e-05, [4] [d_1]: 6.712e-05 [none_parameter_eliminate]: 1.29e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.013e-05 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.411e-05 [cse_after_recomputation]: 3.219e-05, [1] [Cycle 1]: 2.75e-05, [1] [cse]: 2.206e-05 [environ_conv]: 8.55999e-06 [swap_dp_allreduce_reducescatter]: 8.18001e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 9.30013e-07 [slice_recompute_activation]: 1.82999e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.41998e-06 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.32999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.00006e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.16997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.698e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.84998e-06 [overlap_recompute_and_grad_model_parallel]: 5.27001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.14998e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 5.05001e-06 [overlap_grad_flash_sp]: 2.366e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.59001e-06 [split_layernorm_comm]: 1.50001e-06 [handle_group_info]: 7.89994e-07 [symbol_engine_optimizer]: 0.00010013, [1] [Cycle 1]: 9.588e-05, [6] [build]: 1.086e-05 [elim_shapecalc]: 1.341e-05 [elim_not_effective]: 1.811e-05 [opt_reshape]: 9.99001e-06 [fold_const_symbol]: 1.527e-05 
[renormalize]: 2.50002e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 2.414e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.28e-06 [opt_after_jit_grad]: 0.00049192 [validate]: 4.262e-05 [backend_pass]: 7.00005e-07 [task_emit]: 0.00813295 [execute]: 6.64001e-06 Sums bootstrap : 0.000519s : 1.58% type_inference : 0.011260s : 34.18% event_method : 0.000046s : 0.14% auto_monad : 0.000116s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.40% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003257s : 9.89% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000014s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.51% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000025s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.04% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001502s : 4.56% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000093s : 0.28% optimize.opt_a.a_after_grad : 0.000112s : 0.34% optimize.opt_a.renormalize : 0.003002s : 9.11% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.25% optimize.opt_a.cse : 0.000233s : 0.71% optimize.opt_a.a_3 : 0.000463s : 1.41% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000468s : 1.42% optimize.opt_b.b_1 : 0.000190s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000433s : 1.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000054s : 0.16% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000492s : 1.49% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008133s : 24.69% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000777 222
5.77% : 0.000045s : 12: substitution.arithmetic_simplify
1.77% : 0.000014s : 2: substitution.cast_eliminate
0.35% : 0.000003s : 5: substitution.elim_not_effective
0.50% : 0.000004s : 5: substitution.float_depend_g_call
0.62% : 0.000005s : 3: substitution.float_tuple_getitem_switch
0.28% : 0.000002s : 5: substitution.fold_const_symbol
0.98% : 0.000008s : 8: substitution.graph_param_transform
0.35% : 0.000003s : 2: substitution.incorporate_call
0.27% : 0.000002s : 2: substitution.incorporate_call_switch
53.71% : 0.000417s : 17: substitution.inline
2.04% : 0.000016s : 2: substitution.inline_without_move
1.38% : 0.000011s : 20: substitution.j_node_and_user_rematch
2.03% : 0.000016s : 3: substitution.less_batch_normalization
1.73% : 0.000013s : 11: substitution.minmaximum_grad
0.73% : 0.000006s : 5: substitution.partial_eliminate
1.69% : 0.000013s : 20: substitution.remove_not_recompute_node
3.00% : 0.000023s : 10: substitution.replace_applicator
1.34% : 0.000010s : 15: substitution.replace_old_param
0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute
6.33% : 0.000049s : 11: substitution.tuple_list_convert_item_index_to_positive
1.72% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator
2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder
8.46% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator
2.29% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.011188 2
87.18% : 0.009754s : 1: type_inference.infer
12.82% : 0.001435s : 1: type_inference.specialize
------[replace.] 0.000219 33
57.86% : 0.000126s : 17: replace.inline
42.14% : 0.000092s : 16: replace.tuple_list_get_item_eliminator
------[match.] 0.000443 33
92.30% : 0.000409s : 17: match.inline
7.70% : 0.000034s : 16: match.tuple_list_get_item_eliminator
------[predicate.]
0.000751 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.17% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.05% : 
0.000038s : 277: predicate.switch_simplify
1.07% : 0.000008s : 68: predicate.tile_eliminate
1.07% : 0.000008s : 68: predicate.transpose_eliminate
1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive
1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator
1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder
2.81% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator
1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator
2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator
1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_
2.68% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater
3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater
0.14% : 0.000001s : 8: predicate.value_based_eliminate
0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate
0.56% : 0.000004s : 32: predicate.virtual_output_eliminate
0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate
0.18% : 0.000001s : 8: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.001527 34
57.20% : 0.000874s : 13: func_graph_cloner_run.FuncGraphClonerGraph
42.80% : 0.000654s : 21: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.061700 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.83% : 0.002979s : 1: add_attr 4.81% : 0.002970s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.09% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000124s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.89% : 0.000548s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000003s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000478s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.99% : 0.004932s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.85% : 0.011012s : 1: opt_a 0.23% : 0.000140s : 1: opt_after_cconv 0.81% : 0.000502s : 1: opt_after_jit_grad 0.47% : 0.000293s : 1: opt_b 21.54% : 0.013289s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000052s : 1: pre_auto_parallel 0.06% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.66% : 0.001640s : 2: renormalize.infer 2.19% : 0.001350s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.25% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.20% : 0.008144s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.28% : 0.011276s : 1: type_inference 0.12% : 0.000073s : 1: validate TotalTime = 0.0186524, [24] [bootstrap]: 0.0004493 [type_inference]: 0.00434141 [event_method]: 1.033e-05 [auto_monad]: 5.277e-05 [graph_reusing]: 5.26998e-06 [inline]: 2.13002e-06 [add_attr]: 0.00302556, [1] [add_attr_with_inline]: 0.00301758, [1] [Cycle 1]: 4.391e-05, [2] [tag_attr]: 1.212e-05 [meta_addattr_fg_expand]: 3.03e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.092e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00370314, [53] [py_interpret_to_execute]: 1.531e-05 [rewriter_before_opt_a]: 3.843e-05 [opt_a]: 0.00188676, [2] [Cycle 1]: 0.00125779, [45] [expand_dump_flag]: 3.16001e-06 [switch_simplify]: 2.438e-05 [loop_unroll]: 1.354e-05 [a_1]: 0.00029289 [with_stream_mark]: 1.436e-05 [recompute_prepare]: 7.21001e-06 [updatestate_depend_eliminate]: 3.55998e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.81003e-06 [a_2]: 7.724e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 5.75001e-06 [parallel]: 1.78e-05 [flash_sp]: 7.46001e-06 [merge_comm]: 3.75998e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 9.56003e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 9.07999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 
2.37999e-06 [after_resolve]: 1.11e-05 [a_after_grad]: 9.03002e-06 [renormalize]: 0.00034021 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.344e-05 [cse]: 2.836e-05 [a_3]: 4.04e-05 [Cycle 2]: 0.00061983, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.61e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00012549 [with_stream_mark]: 3.422e-05 [recompute_prepare]: 6.23998e-06 [updatestate_depend_eliminate]: 2.64001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.901e-05 [accelerated_algorithm]: 5.68002e-06 [shard]: 1.14003e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.59998e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.08999e-06 [flash_sp]: 3.32002e-06 [merge_comm]: 3.18998e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.34998e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.12001e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 9.99979e-07 [after_resolve]: 8.89998e-06 [a_after_grad]: 8.54998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.71999e-06 [cse]: 1.291e-05 [a_3]: 3.162e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.01e-05 [convert_after_rewriter]: 7.24001e-06 [order_py_execute_after_rewriter]: 5.55001e-06 [mutable_eliminate]: 0.00045438 [opt_b]: 0.00018363, [1] [Cycle 1]: 0.00017755, [7] 
[b_1]: 0.00010997 [b_2]: 7.35998e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 6.19999e-07 [cse]: 1.569e-05 [optimize_parallel_all_gather_comm]: 1.597e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.255e-05 [loop_unroll]: 0.00041599 [opt_after_cconv]: 9.365e-05, [1] [Cycle 1]: 8.814e-05, [7] [c_1]: 2.788e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.532e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.175e-05 [tuple_transform]: 7.061e-05, [1] [Cycle 1]: 6.598e-05, [4] [d_1]: 3.962e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.91003e-06 [add_recomputation]: 4.627e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.611e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.3e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.26997e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 1.10999e-06 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.32999e-06 [comm_op_add_attrs]: 8.79983e-07 [add_comm_op_reuse_tag]: 1.14e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61998e-06 [control_data_broadcast_order]: 1.185e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.64002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.651e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.48e-06 [split_layernorm_comm]: 1.42e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 6.811e-05, [1] [Cycle 1]: 6.402e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 7.94997e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.99998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.577e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.06999e-06 [opt_after_jit_grad]: 0.00045237 [validate]: 3.134e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.00632358 [execute]: 7.82e-06 Sums bootstrap : 0.000449s : 3.06% type_inference : 0.004341s : 29.59% event_method : 0.000010s : 0.07% auto_monad : 0.000053s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000418s : 2.85% optimize.opt_a.with_stream_mark : 0.000049s : 0.33% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.32% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.49% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000454s : 3.10% optimize.opt_b.b_1 : 0.000110s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000416s : 2.84% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.32% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000001s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01%
auto_monad_reorder : 0.000016s : 0.11%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000452s : 3.08%
validate : 0.000031s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006324s : 43.10%
execute : 0.000008s : 0.05%
Time group info:
------[substitution.] 0.000122 26
19.25% : 0.000024s : 4: substitution.arithmetic_simplify
1.56% : 0.000002s : 2: substitution.elim_not_effective
1.08% : 0.000001s : 2: substitution.fold_const_symbol
4.69% : 0.000006s : 4: substitution.graph_param_transform
64.16% : 0.000078s : 2: substitution.inline
2.15% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.67% : 0.000004s : 4: substitution.remove_not_recompute_node
3.44% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004299 2
91.86% : 0.003949s : 1: type_inference.infer
8.14% : 0.000350s : 1: type_inference.specialize
------[replace.] 0.000019 2
100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000077 2
100.00% : 0.000077s : 2: match.inline
------[predicate.]
0.000137 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.48% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.71% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.58% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate
0.83% : 0.000001s : 9: predicate.transpose_eliminate
1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive
1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator
1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder
3.37% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator
1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator
2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator
1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_
2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater
3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater
0.58% : 0.000001s : 4: predicate.value_based_eliminate
0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate
0.88% : 0.000001s : 8: predicate.virtual_output_eliminate
0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate
0.58% : 0.000001s : 4: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.000240 6
41.61% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph
58.39% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.026654 196 0.02% : 0.000004s : 1: ForceFp32Comm 11.37% : 0.003030s : 1: add_attr 11.33% : 0.003021s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000050s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000482s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000464s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.89% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001890s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.73% : 0.000462s : 1: opt_after_jit_grad 0.70% : 0.000187s : 1: opt_b 13.91% : 0.003707s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.70% : 0.000187s : 1: renormalize.infer 0.55% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.76% : 0.006334s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.34% : 0.004356s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.035589, [24] [bootstrap]: 0.00044533 [type_inference]: 0.0102108 [event_method]: 4.026e-05 [auto_monad]: 0.00011281 [graph_reusing]: 7.42998e-06 [inline]: 2.14999e-06 [add_attr]: 0.00289815, [1] [add_attr_with_inline]: 0.0028899, [1] [Cycle 1]: 6.478e-05, [2] [tag_attr]: 2.989e-05 [meta_addattr_fg_expand]: 8.59e-06 [parallel-infer-symbol]: 2.59001e-06 [pre_auto_parallel]: 4.408e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.70001e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.0130754, [53] [py_interpret_to_execute]: 3.41e-05 [rewriter_before_opt_a]: 0.00012831 [opt_a]: 0.0108184, [3] [Cycle 1]: 0.00692309, [45] [expand_dump_flag]: 3.33e-06 [switch_simplify]: 6.509e-05 [loop_unroll]: 5.4e-05 [a_1]: 0.00138428 [with_stream_mark]: 2.225e-05 [recompute_prepare]: 2.282e-05 [updatestate_depend_eliminate]: 8.97e-06 [updatestate_assign_eliminate]: 7.8e-06 [updatestate_loads_eliminate]: 7.03998e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024427 [accelerated_algorithm]: 3.097e-05 [shard]: 1.50999e-06 [meta_shard_fg_expand]: 3.46999e-06 [shard_inline]: 1.579e-05 [merge_send_recv]: 1.543e-05 [auto_parallel]: 1.034e-05 [parallel]: 1.779e-05 [flash_sp]: 1.074e-05 [merge_comm]: 9.91e-06 [allreduce_fusion]: 8.75999e-06 [matmul_add_comm_reduction]: 2.508e-05 [allreduce_slice_to_reducescatter]: 5.19998e-07 [virtual_shard_identity]: 1.809e-05 [virtual_dataset]: 1.603e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.528e-05 [merge_forward]: 9.66003e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.708e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.812e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.707e-05 [set_forward_comm_id_for_comm_node_pass]: 9.59999e-06 [meta_fg_expand]: 0.00138969 [flash_sp_send_recv_attached]: 3.25e-06 [receive_attached]: 2.68003e-06 [after_resolve]: 
5.889e-05 [a_after_grad]: 8e-05 [renormalize]: 0.00241106 [add_forward_monad_depend]: 9.67999e-06 [auto_monad_grad]: 5.36998e-06 [auto_monad_eliminator]: 5.599e-05 [cse]: 0.0001601 [a_3]: 0.00033388 [Cycle 2]: 0.00298186, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 4.695e-05 [loop_unroll]: 4.317e-05 [a_1]: 0.00156342 [with_stream_mark]: 1.223e-05 [recompute_prepare]: 1.096e-05 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 4.51002e-06 [updatestate_loads_eliminate]: 3.55e-06 [parameter_eliminate]: 1.22e-06 [a_2]: 0.00012521 [accelerated_algorithm]: 1.17e-05 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 6.53e-06 [auto_parallel]: 7.45e-06 [parallel]: 4.83001e-06 [flash_sp]: 3.32002e-06 [merge_comm]: 5.20001e-06 [allreduce_fusion]: 4.64998e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.052e-05 [virtual_dataset]: 8.85999e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.612e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.404e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 3.523e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.485e-05 [a_after_grad]: 1.43e-05 [renormalize]: 0.00057945 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.501e-05 [cse]: 4.419e-05 [a_3]: 6.449e-05 [Cycle 3]: 0.00089962, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 1.045e-05 [loop_unroll]: 8.89e-06 [a_1]: 0.00024707 [with_stream_mark]: 9.79999e-06 [recompute_prepare]: 9.22001e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 4.03999e-06 
[parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012311 [accelerated_algorithm]: 1.176e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.86998e-06 [shard_inline]: 9.07001e-06 [merge_send_recv]: 7.03998e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.67998e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 4.80999e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.039e-05 [virtual_dataset]: 9.07001e-06 [get_grad_eliminate_]: 8.65001e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.31002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.59002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.606e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.411e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.311e-05 [a_after_grad]: 1.597e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 1.128e-05 [cse]: 2.561e-05 [a_3]: 5.916e-05 [py_interpret_to_execute_after_opt_a]: 1.073e-05 [slice_cell_reuse_recomputed_activation]: 2.48e-06 [rewriter_after_opt_a]: 4.725e-05 [convert_after_rewriter]: 8.87e-06 [order_py_execute_after_rewriter]: 6.63998e-06 [mutable_eliminate]: 0.00046036 [opt_b]: 0.00028529, [1] [Cycle 1]: 0.00027909, [7] [b_1]: 0.0001874 [b_2]: 1.025e-05 [updatestate_depend_eliminate]: 7.25e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 4.00998e-06 [renormalize]: 5.39992e-07 [cse]: 3.087e-05 [optimize_parallel_all_gather_comm]: 1.987e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 1.967e-05 [loop_unroll]: 0.00042653 [opt_after_cconv]: 0.00015892, [1] [Cycle 1]: 0.00015284, [7] [c_1]: 7.064e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 7.58999e-06 [updatestate_assign_eliminate]: 4.13001e-06 
[updatestate_loads_eliminate]: 3.94002e-06 [cse]: 2.896e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 2.826e-05 [tuple_transform]: 0.00010173, [1] [Cycle 1]: 9.691e-05, [4] [d_1]: 6.669e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 1.034e-05 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 5.457e-05 [cse_after_recomputation]: 3.104e-05, [1] [Cycle 1]: 2.647e-05, [1] [cse]: 2.108e-05 [environ_conv]: 7.91001e-06 [swap_dp_allreduce_reducescatter]: 7.93999e-06 [bias_add_comm_swap]: 2.02001e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.20002e-06 [merge_cast_opt]: 8.80013e-07 [slice_recompute_activation]: 2.02999e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 6.10016e-07 [remove_cast_before_assign_add]: 8.49977e-07 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 7.89994e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.71e-05 [grouped_pairwise_exchange_alltoall]: 1.77001e-06 [offloading_packed_experts]: 5.14e-06 [overlap_recompute_and_grad_model_parallel]: 5.40999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.24998e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 5.52999e-06 [overlap_grad_flash_sp]: 2.325e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 7.7e-07 [symbol_engine_optimizer]: 0.0001001, [1] [Cycle 1]: 9.566e-05, [6] [build]: 1.003e-05 [elim_shapecalc]: 1.353e-05 [elim_not_effective]: 1.838e-05 [opt_reshape]: 1.03e-05 [fold_const_symbol]: 1.521e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.26002e-06 [auto_monad_reorder]: 2.441e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00047322 [validate]: 4.327e-05 [backend_pass]: 6.49976e-07 [task_emit]: 0.00798773 [execute]: 6.28998e-06 Sums bootstrap : 0.000445s : 1.42% type_inference : 0.010211s : 32.48% event_method : 0.000040s : 0.13% auto_monad : 0.000113s : 0.36% graph_reusing : 0.000007s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000044s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000128s : 0.41% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000122s : 0.39% optimize.opt_a.loop_unroll : 0.000106s : 0.34% optimize.opt_a.a_1 : 0.003195s : 10.16% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.14% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001428s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002991s : 9.51% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.26% optimize.opt_a.cse : 0.000230s : 0.73% optimize.opt_a.a_3 : 0.000458s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000460s : 1.46% optimize.opt_b.b_1 : 0.000187s : 0.60% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000427s : 1.36% optimize.opt_after_cconv.c_1 : 0.000071s : 0.22% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000055s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000473s : 1.51% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.007988s : 25.41% execute : 0.000006s : 0.02% Time group info: ------[substitution.] 
0.000792 218 5.19% : 0.000041s : 11: substitution.arithmetic_simplify 1.73% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 57.93% : 0.000459s : 16: substitution.inline 1.95% : 0.000015s : 2: substitution.inline_without_move 1.25% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000014s : 11: substitution.minmaximum_grad 0.64% : 0.000005s : 5: substitution.partial_eliminate 1.63% : 0.000013s : 20: substitution.remove_not_recompute_node 3.05% : 0.000024s : 10: substitution.replace_applicator 1.29% : 0.000010s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.53% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.73% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.03% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.36% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010143 2 86.82% : 0.008806s : 1: type_inference.infer 13.18% : 0.001337s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.43% : 0.000117s : 16: replace.inline 41.57% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000481 30 93.68% : 0.000450s : 16: match.inline 6.32% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.19% : 0.000009s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.16% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000012s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.73% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.68% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.95% : 0.000037s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.59% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001484 32 58.70% : 0.000871s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.30% : 0.000613s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059741 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.86% : 0.002903s : 1: add_attr 4.84% : 0.002893s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000058s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000120s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.79% : 0.000473s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000003s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.09% : 0.004836s : 117: opt.transform.opt_a 0.12% : 0.000069s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000172s : 28: opt.transform.opt_b 0.13% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.11% : 0.010822s : 1: opt_a 0.27% : 0.000162s : 1: opt_after_cconv 0.81% : 0.000483s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.89% : 0.013079s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000049s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.73% : 0.001632s : 2: renormalize.infer 2.26% : 0.001348s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.39% : 0.007998s : 1: task_emit 0.18% : 0.000105s : 1: 
tuple_transform 17.12% : 0.010226s : 1: type_inference 0.13% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x5-kbk],max_mem:42.0M . TotalTime = 0.0812338, [24] [bootstrap]: 0.00056817 [type_inference]: 0.00620224 [event_method]: 1.427e-05 [auto_monad]: 5.803e-05 [graph_reusing]: 5.47001e-06 [inline]: 1.97001e-06 [add_attr]: 0.00342119, [1] [add_attr_with_inline]: 0.00340969, [1] [Cycle 1]: 4.708e-05, [2] [tag_attr]: 1.586e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 2.91e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.0039976, [53] [py_interpret_to_execute]: 2.007e-05 [rewriter_before_opt_a]: 5.953e-05 [opt_a]: 0.00212411, [2] [Cycle 1]: 0.00152931, [45] [expand_dump_flag]: 3.14999e-06 [switch_simplify]: 3.281e-05 [loop_unroll]: 2.074e-05 [a_1]: 0.00045963 [with_stream_mark]: 1.661e-05 [recompute_prepare]: 7.71999e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.71999e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 7.609e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 8.02998e-06 [auto_parallel]: 6.07999e-06 [parallel]: 2.37e-05 [flash_sp]: 7.34002e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 6.06998e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.51002e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.54999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.061e-05 
[merge_recompute_call_nodes]: 1.84998e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 1.01e-05 [a_after_grad]: 9.25999e-06 [renormalize]: 0.00041455 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.507e-05 [cse]: 2.846e-05 [a_3]: 4.021e-05 [Cycle 2]: 0.00058582, [45] [expand_dump_flag]: 8.09989e-07 [switch_simplify]: 6.68998e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012486 [with_stream_mark]: 1.018e-05 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.771e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.17e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.11002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.49001e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.01997e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 5.73997e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.78999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00998e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.12e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.15002e-06 [cse]: 1.284e-05 [a_3]: 3.193e-05 [py_interpret_to_execute_after_opt_a]: 7.78001e-06 
[slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.671e-05 [convert_after_rewriter]: 7.39002e-06 [order_py_execute_after_rewriter]: 5.48002e-06 [mutable_eliminate]: 0.0004381 [opt_b]: 0.00020931, [1] [Cycle 1]: 0.00020302, [7] [b_1]: 0.00013434 [b_2]: 7.52002e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.09986e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.946e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.346e-05 [loop_unroll]: 0.00041378 [opt_after_cconv]: 9.532e-05, [1] [Cycle 1]: 8.956e-05, [7] [c_1]: 2.803e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.584e-05 [renormalize]: 9.79984e-07 [remove_dup_value]: 1.328e-05 [tuple_transform]: 6.968e-05, [1] [Cycle 1]: 6.521e-05, [4] [d_1]: 3.966e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.151e-05 [cse_after_recomputation]: 2.03e-05, [1] [Cycle 1]: 1.601e-05, [1] [cse]: 1.108e-05 [environ_conv]: 4.3e-06 [swap_dp_allreduce_reducescatter]: 5.03002e-06 [bias_add_comm_swap]: 2.16e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.85998e-06 [comm_op_add_attrs]: 1.34e-06 [add_comm_op_reuse_tag]: 1.36998e-06 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01998e-06 
[control_data_broadcast_order]: 1.133e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.37e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 3.90998e-06 [overlap_grad_flash_sp]: 1.794e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 6.696e-05, [1] [Cycle 1]: 6.298e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.13001e-06 [elim_not_effective]: 1.104e-05 [opt_reshape]: 6.20002e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.631e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00044219 [validate]: 3.068e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0661994 [execute]: 8.79e-06 Sums bootstrap : 0.000568s : 0.74% type_inference : 0.006202s : 8.07% event_method : 0.000014s : 0.02% auto_monad : 0.000058s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000584s : 0.76% optimize.opt_a.with_stream_mark : 
0.000027s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000415s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000041s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000438s : 0.57% optimize.opt_b.b_1 : 0.000134s : 0.17% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000414s : 0.54% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000442s : 0.58% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066199s : 86.16% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000168 30 15.00% : 0.000025s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.40% : 0.000006s : 4: substitution.graph_param_transform 66.96% : 0.000112s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.24% : 0.000004s : 4: substitution.replace_old_param 6.50% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006153 2 90.85% : 0.005590s : 1: type_inference.infer 9.15% : 0.000563s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.38% : 0.000028s : 3: replace.inline 29.62% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.78% : 0.000110s : 3: match.inline 8.22% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 1.03% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000004s : 16: predicate.float_depend_g_call 0.75% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.40% : 0.000001s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000010s : 51: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.73% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000002s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 48.02% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.98% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.090192 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.80% : 0.003426s : 1: add_attr 3.78% : 0.003414s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000063s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000607s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.03% : 0.000030s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.47% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.50% : 0.000447s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.05% : 0.000950s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000116s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.36% : 0.002127s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.50% : 0.000452s : 1: opt_after_jit_grad 0.24% : 0.000213s : 1: opt_b 4.44% : 0.004001s : 1: optimize 0.03% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.24% : 0.000214s : 1: renormalize.infer 0.21% : 0.000194s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000041s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 73.42% : 0.066216s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 6.89% : 0.006217s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.0734203, [24] [bootstrap]: 0.00045199 [type_inference]: 0.00447018 [event_method]: 1.074e-05 [auto_monad]: 5.411e-05 [graph_reusing]: 5.61e-06 [inline]: 1.94999e-06 [add_attr]: 0.00296358, [1] [add_attr_with_inline]: 0.00295566, [1] [Cycle 1]: 4.4e-05, [2] [tag_attr]: 1.244e-05 [meta_addattr_fg_expand]: 3.68e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.215e-05 [insert-virtual-dataset]: 2.83e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00375327, [53] [py_interpret_to_execute]: 1.56e-05 [rewriter_before_opt_a]: 4.146e-05 [opt_a]: 0.00187501, [2] [Cycle 1]: 0.00126914, [45] [expand_dump_flag]: 3.16001e-06 [switch_simplify]: 2.402e-05 [loop_unroll]: 1.361e-05 [a_1]: 0.00029563 [with_stream_mark]: 1.445e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 7.63e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.62001e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.78997e-06 [merge_send_recv]: 7.56001e-06 [auto_parallel]: 6.28e-06 [parallel]: 1.867e-05 [flash_sp]: 7.90998e-06 [merge_comm]: 3.83999e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.75e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.85002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.46998e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.06998e-06 [flash_sp_send_recv_attached]: 2.55002e-06 [receive_attached]: 2.63e-06 
[after_resolve]: 1.087e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00034673 [add_forward_monad_depend]: 4.76997e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.402e-05 [cse]: 2.848e-05 [a_3]: 4.086e-05 [Cycle 2]: 0.00059679, [45] [expand_dump_flag]: 9.90025e-07 [switch_simplify]: 6.75002e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012628 [with_stream_mark]: 1.141e-05 [recompute_prepare]: 5.69e-06 [updatestate_depend_eliminate]: 2.64999e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.743e-05 [accelerated_algorithm]: 5.71998e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.23002e-06 [parallel]: 4.15e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 3.17002e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.42001e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 4.97999e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 5.86998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.13001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.84e-06 [a_after_grad]: 8.1e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.36e-06 [cse]: 1.322e-05 [a_3]: 3.198e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.206e-05 [convert_after_rewriter]: 6.77002e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00051531 [opt_b]: 0.00018323, [1] [Cycle 1]: 0.00017707, 
[7] [b_1]: 0.00010895 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 4.80009e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.61e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.328e-05 [loop_unroll]: 0.00041421 [opt_after_cconv]: 9.38e-05, [1] [Cycle 1]: 8.814e-05, [7] [c_1]: 2.809e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.588e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.31e-05 [tuple_transform]: 6.864e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.84998e-06 [add_recomputation]: 4.251e-05 [cse_after_recomputation]: 2.008e-05, [1] [Cycle 1]: 1.56e-05, [1] [cse]: 1.065e-05 [environ_conv]: 5.30999e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.50002e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 1.04003e-06 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.95002e-06 [comm_op_add_attrs]: 1.09998e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44998e-06 [overlap_recompute_comm]: 2.33002e-06 [overlap_grad_ring_attention]: 4.23001e-06 [overlap_grad_flash_sp]: 1.67e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 6.696e-05, [1] [Cycle 1]: 6.294e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.42e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.61002e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.597e-05 [get_jit_bprop_graph]: 9.90025e-07 [rewriter_after_jit_bprop_graph]: 3.50003e-06 [opt_after_jit_grad]: 0.00044754 [validate]: 3.18e-05 [backend_pass]: 1.30999e-06 [task_emit]: 0.0609672 [execute]: 8.72e-06 Sums bootstrap : 0.000452s : 0.65% type_inference : 0.004470s : 6.43% event_method : 0.000011s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000422s : 0.61% optimize.opt_a.with_stream_mark : 0.000026s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000347s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000515s : 0.74% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000414s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000448s : 0.64% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060967s : 87.72% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000124 26 18.67% : 0.000023s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.61% : 0.000006s : 4: substitution.graph_param_transform 65.52% : 0.000081s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.24% : 0.000004s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004428 2 91.48% : 0.004051s : 1: type_inference.infer 8.52% : 0.000377s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000140 984 1.10% : 0.000002s : 9: predicate.accumulaten_eliminater 1.07% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.95% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.97% : 0.000003s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.40% : 0.000001s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 8: predicate.less_batch_normalization 1.70% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.70% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.20% : 0.000002s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.28% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000276 6 42.28% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.72% : 0.000160s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081412 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.65% : 0.002968s : 1: add_attr 3.63% : 0.002959s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.59% : 0.000484s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.64% : 0.000524s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.95% : 0.000775s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.31% : 0.001878s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.56% : 0.000457s : 1: opt_after_jit_grad 0.23% : 0.000187s : 1: opt_b 4.62% : 0.003757s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.23% : 0.000190s : 1: renormalize.infer 0.18% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.06% : 0.000046s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.91% : 0.060984s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.51% : 0.004484s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.0747661, [24] [bootstrap]: 0.00044463 [type_inference]: 0.00560498 [event_method]: 1.44e-05 [auto_monad]: 5.577e-05 [graph_reusing]: 5.97999e-06 [inline]: 1.91e-06 [add_attr]: 0.00291579, [1] [add_attr_with_inline]: 0.00290768, [1] [Cycle 1]: 4.737e-05, [2] [tag_attr]: 1.599e-05 [meta_addattr_fg_expand]: 4.17998e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 2.535e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00400875, [53] [py_interpret_to_execute]: 2.151e-05 [rewriter_before_opt_a]: 5.979e-05 [opt_a]: 0.00215616, [2] [Cycle 1]: 0.00154519, [45] [expand_dump_flag]: 3.03003e-06 [switch_simplify]: 3.337e-05 [loop_unroll]: 5.429e-05 [a_1]: 0.00045038 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.49002e-06 [updatestate_depend_eliminate]: 3.80998e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 7.558e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 7.33e-06 [auto_parallel]: 5.97001e-06 [parallel]: 1.876e-05 [flash_sp]: 7.22002e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.80001e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 7.00998e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.51002e-06 [virtual_output]: 5.53002e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.63001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.45997e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.41e-06 
[after_resolve]: 1.006e-05 [a_after_grad]: 8.87e-06 [renormalize]: 0.00041982 [add_forward_monad_depend]: 4.74e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.448e-05 [cse]: 2.694e-05 [a_3]: 4.133e-05 [Cycle 2]: 0.00060164, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.0001254 [with_stream_mark]: 9.45001e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.783e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.38001e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.14001e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.31998e-06 [virtual_dataset]: 5.43002e-06 [get_grad_eliminate_]: 5.61998e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.23998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.86997e-06 [a_after_grad]: 7.89002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.29999e-06 [cse]: 1.407e-05 [a_3]: 3.451e-05 [py_interpret_to_execute_after_opt_a]: 7.64002e-06 [slice_cell_reuse_recomputed_activation]: 2.08002e-06 [rewriter_after_opt_a]: 3.107e-05 [convert_after_rewriter]: 6.94001e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.0004539 [opt_b]: 0.00018214, [1] [Cycle 1]: 0.00017616, 
[7] [b_1]: 0.00010856 [b_2]: 7.31999e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 5.50004e-07 [cse]: 1.563e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.298e-05 [loop_unroll]: 0.0004175 [opt_after_cconv]: 9.522e-05, [1] [Cycle 1]: 8.945e-05, [7] [c_1]: 2.766e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.651e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.262e-05 [tuple_transform]: 6.941e-05, [1] [Cycle 1]: 6.501e-05, [4] [d_1]: 3.879e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 4.299e-05 [cse_after_recomputation]: 2.037e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.066e-05 [environ_conv]: 5.29e-06 [swap_dp_allreduce_reducescatter]: 5.70001e-06 [bias_add_comm_swap]: 2.67001e-06 [label_micro_interleaved_index]: 4.53999e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.66999e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 1.97001e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.27999e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.207e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.59998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.27e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.58998e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.989e-05, [1] [Cycle 1]: 6.56e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.61002e-06 [elim_not_effective]: 1.178e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 9.25001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.63002e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.514e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.00050077 [validate]: 3.157e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0609203 [execute]: 8.54e-06 Sums bootstrap : 0.000445s : 0.63% type_inference : 0.005605s : 7.91% event_method : 0.000014s : 0.02% auto_monad : 0.000056s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000060s : 0.08% optimize.opt_a.a_1 : 0.000576s : 0.81% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000420s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000076s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.64% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000418s : 0.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000501s : 0.71% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.060920s : 85.94% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000164 30 15.33% : 0.000025s : 5: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000005s : 4: substitution.graph_param_transform 65.83% : 0.000108s : 3: substitution.inline 1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000004s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 4: substitution.replace_old_param 6.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005562 2 89.96% : 0.005004s : 1: type_inference.infer 10.04% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.11% : 0.000026s : 3: replace.inline 29.89% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.45% : 0.000106s : 3: match.inline 8.55% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.30% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.77% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.61% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 47.12% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.88% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.083244 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.51% : 0.002920s : 1: add_attr 3.50% : 0.002911s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.57% : 0.000472s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.17% : 0.000977s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.59% : 0.002159s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.61% : 0.000511s : 1: opt_after_jit_grad 0.22% : 0.000186s : 1: opt_b 4.82% : 0.004013s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000207s : 1: renormalize.infer 0.25% : 0.000206s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 73.20% : 0.060938s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.75% : 0.005620s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.109722, [24] [bootstrap]: 0.00048405 [type_inference]: 0.0113667 [event_method]: 8.682e-05 [auto_monad]: 0.00012348 [graph_reusing]: 8.43999e-06 [inline]: 2.01e-06 [add_attr]: 0.00295594, [1] [add_attr_with_inline]: 0.0029474, [1] [Cycle 1]: 7.033e-05, [2] [tag_attr]: 3.422e-05 [meta_addattr_fg_expand]: 9.71003e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 4.944e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.07999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0134776, [53] [py_interpret_to_execute]: 3.741e-05 [rewriter_before_opt_a]: 0.00014736 [opt_a]: 0.0111907, [3] [Cycle 1]: 0.00719116, [45] [expand_dump_flag]: 3.98001e-06 [switch_simplify]: 7.405e-05 [loop_unroll]: 6.2e-05 [a_1]: 0.00152287 [with_stream_mark]: 2.296e-05 [recompute_prepare]: 2.175e-05 [updatestate_depend_eliminate]: 9.20999e-06 [updatestate_assign_eliminate]: 8.42e-06 [updatestate_loads_eliminate]: 7.58999e-06 [parameter_eliminate]: 2.75997e-06 [a_2]: 0.00024501 [accelerated_algorithm]: 3.129e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.48e-06 [shard_inline]: 1.641e-05 [merge_send_recv]: 1.621e-05 [auto_parallel]: 1.053e-05 [parallel]: 1.999e-05 [flash_sp]: 1.192e-05 [merge_comm]: 9.47001e-06 [allreduce_fusion]: 8.78001e-06 [matmul_add_comm_reduction]: 2.731e-05 [allreduce_slice_to_reducescatter]: 7.89994e-07 [virtual_shard_identity]: 1.753e-05 [virtual_dataset]: 1.588e-05 [get_grad_eliminate_]: 1.531e-05 [virtual_output]: 1.512e-05 [merge_forward]: 9.24e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.768e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.949e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 2.777e-05 [set_forward_comm_id_for_comm_node_pass]: 9.60001e-06 [meta_fg_expand]: 0.00140377 [flash_sp_send_recv_attached]: 3.85e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 
5.92e-05 [a_after_grad]: 8.122e-05 [renormalize]: 0.00248658 [add_forward_monad_depend]: 9.29e-06 [auto_monad_grad]: 5.40999e-06 [auto_monad_eliminator]: 5.642e-05 [cse]: 0.0001693 [a_3]: 0.00033822 [Cycle 2]: 0.00307796, [45] [expand_dump_flag]: 1.43002e-06 [switch_simplify]: 4.705e-05 [loop_unroll]: 4.456e-05 [a_1]: 0.00159477 [with_stream_mark]: 1.207e-05 [recompute_prepare]: 1.046e-05 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 9.99979e-07 [a_2]: 0.00012723 [accelerated_algorithm]: 1.177e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.15998e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.23e-06 [merge_comm]: 5.16002e-06 [allreduce_fusion]: 4.70999e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.90999e-06 [virtual_output]: 8.47e-06 [merge_forward]: 4.49998e-06 [cell_reuse_recompute_pass]: 8.90024e-07 [offload_activation]: 9.66998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.616e-05 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 1.409e-05 [set_forward_comm_id_for_comm_node_pass]: 5.54e-06 [meta_fg_expand]: 6.914e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.619e-05 [a_after_grad]: 1.509e-05 [renormalize]: 0.00060092 [add_forward_monad_depend]: 4.12003e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.483e-05 [cse]: 4.667e-05 [a_3]: 6.635e-05 [Cycle 3]: 0.00090744, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 1.056e-05 [loop_unroll]: 8.92e-06 [a_1]: 0.00025069 [with_stream_mark]: 9.99001e-06 [recompute_prepare]: 9.02999e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 4.37998e-06 [updatestate_loads_eliminate]: 3.87998e-06 
[parameter_eliminate]: 1.00001e-06 [a_2]: 0.0001233 [accelerated_algorithm]: 1.146e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.84998e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 7.08998e-06 [auto_parallel]: 6.91001e-06 [parallel]: 4.55999e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 5.05999e-06 [matmul_add_comm_reduction]: 7.82e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.047e-05 [virtual_dataset]: 8.79e-06 [get_grad_eliminate_]: 8.60999e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.48999e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.648e-05 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 1.468e-05 [set_forward_comm_id_for_comm_node_pass]: 5.56e-06 [meta_fg_expand]: 3.09999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.358e-05 [a_after_grad]: 1.433e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 1.081e-05 [cse]: 2.674e-05 [a_3]: 5.983e-05 [py_interpret_to_execute_after_opt_a]: 1.016e-05 [slice_cell_reuse_recomputed_activation]: 2.17001e-06 [rewriter_after_opt_a]: 4.984e-05 [convert_after_rewriter]: 9.86998e-06 [order_py_execute_after_rewriter]: 6.93998e-06 [mutable_eliminate]: 0.00045442 [opt_b]: 0.00028938, [1] [Cycle 1]: 0.00028294, [7] [b_1]: 0.00018993 [b_2]: 1.083e-05 [updatestate_depend_eliminate]: 7.45e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 3.89002e-06 [renormalize]: 4.69998e-07 [cse]: 3.181e-05 [optimize_parallel_all_gather_comm]: 2.059e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.015e-05 [loop_unroll]: 0.00041906 [opt_after_cconv]: 0.00014124, [1] [Cycle 1]: 0.000135, [7] [c_1]: 4.977e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 7.53e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 
4.15e-06 [cse]: 3.134e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 3.04e-05 [tuple_transform]: 0.00010237, [1] [Cycle 1]: 9.766e-05, [4] [d_1]: 6.729e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 1.015e-05 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.96e-05 [cse_after_recomputation]: 3.221e-05, [1] [Cycle 1]: 2.741e-05, [1] [cse]: 2.192e-05 [environ_conv]: 9.37999e-06 [swap_dp_allreduce_reducescatter]: 7.82998e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.62998e-06 [label_fine_grained_interleaved_index]: 2.90998e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.56e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 1.92001e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.792e-05 [grouped_pairwise_exchange_alltoall]: 1.74998e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.58002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.60001e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 5.19e-06 [overlap_grad_flash_sp]: 2.501e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.936e-05, [1] [Cycle 1]: 9.515e-05, [6] [build]: 1.063e-05 [elim_shapecalc]: 1.343e-05 [elim_not_effective]: 1.842e-05 [opt_reshape]: 9.94999e-06 [fold_const_symbol]: 1.504e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.76998e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 2.524e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.75998e-06 [opt_after_jit_grad]: 0.00046924 [validate]: 4.734e-05 [backend_pass]: 1.15001e-06 [task_emit]: 0.080386 [execute]: 8.92e-06 Sums bootstrap : 0.000484s : 0.46% type_inference : 0.011367s : 10.77% event_method : 0.000087s : 0.08% auto_monad : 0.000123s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000147s : 0.14% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.12% optimize.opt_a.loop_unroll : 0.000115s : 0.11% optimize.opt_a.a_1 : 0.003368s : 3.19% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001476s : 1.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.003088s : 2.93% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000243s : 0.23% optimize.opt_a.a_3 : 0.000464s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.05% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.43% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000419s : 0.40% optimize.opt_after_cconv.c_1 : 0.000050s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.44% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080386s : 76.20% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000765 222 6.01% : 0.000046s : 12: substitution.arithmetic_simplify 1.74% : 0.000013s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 55.41% : 0.000424s : 17: substitution.inline 2.02% : 0.000015s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.00% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.86% : 0.000014s : 20: substitution.remove_not_recompute_node 3.07% : 0.000023s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.73% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011290 2 86.54% : 0.009770s : 1: type_inference.infer 13.46% : 0.001520s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.50% : 0.000126s : 17: replace.inline 42.50% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.22% : 0.000415s : 17: match.inline 7.78% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000828 5764 0.97% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.46% : 0.000004s : 32: predicate.addn_check_dump 0.98% : 0.000008s : 68: predicate.addn_zero_filter 0.96% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.86% : 0.000015s : 100: predicate.arithmetic_simplify 1.05% : 0.000009s : 68: predicate.cast_eliminate 1.04% : 0.000009s : 68: predicate.check_bprop_eliminate 0.46% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.02% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.14% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.08% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.09% : 0.000009s : 76: predicate.environ_get_depend_swap 1.62% : 0.000013s : 108: predicate.environ_get_eliminate 1.09% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.56% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.13% : 0.000018s : 101: predicate.float_depend_g_call 0.46% : 0.000004s : 32: predicate.float_environ_get_switch 0.60% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000005s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.49% : 0.000004s : 32: predicate.incorporate_call 0.43% : 0.000004s : 32: predicate.incorporate_call_switch 5.15% : 0.000043s : 249: predicate.inline 1.17% : 0.000010s : 55: predicate.inline_without_move 0.27% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.59% : 0.000005s : 32: predicate.less_batch_normalization 
1.49% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.41% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.12% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.29% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.49% : 0.000004s : 32: predicate.merge_addn 1.03% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.02% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000009s : 68: predicate.minmaximum_grad 0.31% : 0.000003s : 8: predicate.mutable_eliminate 0.13% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.83% : 0.000015s : 101: predicate.partial_defer_inline 1.64% : 0.000014s : 92: predicate.partial_eliminate 1.01% : 0.000008s : 68: predicate.print_const_string_wrapper 0.49% : 0.000004s : 32: predicate.reduce_all_const_elim 1.20% : 0.000010s : 68: predicate.reduce_eliminate 2.42% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000003s : 32: predicate.remove_not_recompute_node 1.71% : 0.000014s : 152: predicate.replace_applicator 0.55% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.01% : 0.000008s : 68: predicate.reshape_eliminate 1.02% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.16% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.57% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.57% : 0.000005s : 32: predicate.specialize_transform 1.17% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.69% : 0.000014s : 101: predicate.switch_defer_inline 2.67% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.61% : 
0.000038s : 277: predicate.switch_simplify 0.98% : 0.000008s : 68: predicate.tile_eliminate 0.97% : 0.000008s : 68: predicate.transpose_eliminate 1.34% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.41% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 9.85% : 0.000082s : 84: predicate.tuple_list_get_item_depend_reorder 2.61% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.30% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.83% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.51% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.40% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 2.95% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.51% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.51% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.14% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001598 34 56.82% : 0.000908s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.18% : 0.000690s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.134609 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.20% : 0.002960s : 1: add_attr 2.19% : 0.002951s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000131s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.38% : 0.000518s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.07% : 0.000096s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.74% : 0.005040s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000176s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.32% : 0.011194s : 1: opt_a 0.11% : 0.000145s : 1: opt_after_cconv 0.36% : 0.000479s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 10.02% : 0.013482s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000034s : 1: remove_dup_value 1.23% : 0.001656s : 2: renormalize.infer 1.05% : 0.001418s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000054s : 1: rewriter_after_opt_a 0.11% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000102s : 1: symbol_engine_optimizer 59.73% : 0.080404s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 8.46% : 0.011382s : 1: type_inference 0.05% : 0.000071s : 1: validate TotalTime = 0.0733384, [24] [bootstrap]: 0.00044092 [type_inference]: 0.00434908 [event_method]: 1.087e-05 [auto_monad]: 5.185e-05 [graph_reusing]: 5.79999e-06 [inline]: 2.04e-06 [add_attr]: 0.00296204, [1] [add_attr_with_inline]: 0.00295396, [1] [Cycle 1]: 4.332e-05, [2] [tag_attr]: 1.193e-05 [meta_addattr_fg_expand]: 3.4e-06 [parallel-infer-symbol]: 3.43e-06 [pre_auto_parallel]: 2.131e-05 [insert-virtual-dataset]: 2.47001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.0037453, [53] [py_interpret_to_execute]: 1.578e-05 [rewriter_before_opt_a]: 3.974e-05 [opt_a]: 0.00192788, [2] [Cycle 1]: 0.0013237, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 2.427e-05 [loop_unroll]: 1.401e-05 [a_1]: 0.00029469 [with_stream_mark]: 1.381e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.582e-05 [accelerated_algorithm]: 6.46999e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 5.98998e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 5.70001e-06 [parallel]: 1.784e-05 [flash_sp]: 7.58999e-06 [merge_comm]: 3.65998e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.15001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.33999e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.82002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 9.44998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.69999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.35002e-06 [receive_attached]: 2.91e-06 [after_resolve]: 
1.101e-05 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00040265 [add_forward_monad_depend]: 4.53001e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.367e-05 [cse]: 2.74e-05 [a_3]: 4.027e-05 [Cycle 2]: 0.00059495, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.64999e-06 [loop_unroll]: 5.51002e-06 [a_1]: 0.00012456 [with_stream_mark]: 1.17e-05 [recompute_prepare]: 6.02001e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.706e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.52001e-06 [parallel]: 3.99002e-06 [flash_sp]: 3.44001e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.01e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.96997e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 6.41998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.58002e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.38001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.83002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.12001e-06 [a_after_grad]: 7.8e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.32001e-06 [cse]: 1.344e-05 [a_3]: 3.183e-05 [py_interpret_to_execute_after_opt_a]: 7.87998e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.157e-05 [convert_after_rewriter]: 7.03998e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.0004497 [opt_b]: 0.00018279, [1] [Cycle 1]: 0.00017677, [7] [b_1]: 
0.00010793 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.45002e-06 [renormalize]: 2.80008e-07 [cse]: 1.694e-05 [optimize_parallel_all_gather_comm]: 1.684e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 2.356e-05 [loop_unroll]: 0.00041497 [opt_after_cconv]: 9.556e-05, [1] [Cycle 1]: 8.968e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.649e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.342e-05 [tuple_transform]: 6.954e-05, [1] [Cycle 1]: 6.516e-05, [4] [d_1]: 3.98e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 1.99972e-07 [switch_simplify]: 6.32001e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.327e-05 [cse_after_recomputation]: 2.045e-05, [1] [Cycle 1]: 1.613e-05, [1] [cse]: 1.12e-05 [environ_conv]: 4.83001e-06 [swap_dp_allreduce_reducescatter]: 6.10002e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.99999e-06 [comm_op_add_attrs]: 1.32e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.53002e-06 [overlap_opt_shard_grad_in_pipeline]: 2.05002e-06 [control_data_broadcast_order]: 1.147e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.68e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.692e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 6.818e-05, [1] [Cycle 1]: 6.377e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 7.98999e-06 [elim_not_effective]: 1.187e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.587e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00045091 [validate]: 3.173e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.0610282 [execute]: 8.84998e-06 Sums bootstrap : 0.000441s : 0.64% type_inference : 0.004349s : 6.26% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.60% optimize.opt_a.with_stream_mark : 0.000026s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 
0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000403s : 0.58% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000415s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000451s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.061028s : 87.91% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.06% : 0.000022s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.72% : 0.000006s : 4: substitution.graph_param_transform 65.66% : 0.000080s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.31% : 0.000004s : 4: substitution.remove_not_recompute_node 3.40% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004308 2 91.82% : 0.003956s : 1: type_inference.infer 8.18% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000140 984 1.03% : 0.000001s : 9: predicate.accumulaten_eliminater 1.18% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.00% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.99% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 5.60% : 0.000008s : 44: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.80% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.84% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.46% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.84% : 0.000001s : 9: predicate.print_const_string_wrapper 0.95% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 43.97% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.03% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081376 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.65% : 0.002966s : 1: add_attr 3.63% : 0.002958s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.58% : 0.000471s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000771s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.37% : 0.001931s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.57% : 0.000461s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.61% : 0.003749s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.30% : 0.000241s : 1: renormalize.infer 0.19% : 0.000155s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 75.02% : 0.061045s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.36% : 0.004363s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.106971, [24] [bootstrap]: 0.00047752 [type_inference]: 0.0103371 [event_method]: 4.372e-05 [auto_monad]: 0.00011862 [graph_reusing]: 7.99002e-06 [inline]: 2.01998e-06 [add_attr]: 0.00300029, [1] [add_attr_with_inline]: 0.00299224, [1] [Cycle 1]: 6.617e-05, [2] [tag_attr]: 3.148e-05 [meta_addattr_fg_expand]: 8.65999e-06 [parallel-infer-symbol]: 2.90998e-06 [pre_auto_parallel]: 4.55e-05 [insert-virtual-dataset]: 2.83003e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 2.29999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.0132481, [53] [py_interpret_to_execute]: 3.76e-05 [rewriter_before_opt_a]: 0.00012689 [opt_a]: 0.0109101, [3] [Cycle 1]: 0.00697104, [45] [expand_dump_flag]: 3.48e-06 [switch_simplify]: 6.672e-05 [loop_unroll]: 5.568e-05 [a_1]: 0.0013372 [with_stream_mark]: 2.306e-05 [recompute_prepare]: 2.115e-05 [updatestate_depend_eliminate]: 8.95001e-06 [updatestate_assign_eliminate]: 8.15999e-06 [updatestate_loads_eliminate]: 7.67998e-06 [parameter_eliminate]: 2.84001e-06 [a_2]: 0.0002475 [accelerated_algorithm]: 3.12e-05 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.588e-05 [auto_parallel]: 1.06e-05 [parallel]: 1.9e-05 [flash_sp]: 1.11e-05 [merge_comm]: 1.017e-05 [allreduce_fusion]: 8.90999e-06 [matmul_add_comm_reduction]: 2.701e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.78e-05 [virtual_dataset]: 1.579e-05 [get_grad_eliminate_]: 1.534e-05 [virtual_output]: 1.542e-05 [merge_forward]: 9.84001e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.826e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.886e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 2.71e-05 [set_forward_comm_id_for_comm_node_pass]: 9.47999e-06 [meta_fg_expand]: 0.0014346 [flash_sp_send_recv_attached]: 3.64002e-06 [receive_attached]: 2.73e-06 
[after_resolve]: 5.862e-05 [a_after_grad]: 8.072e-05 [renormalize]: 0.0024362 [add_forward_monad_depend]: 8.93002e-06 [auto_monad_grad]: 5.67001e-06 [auto_monad_eliminator]: 5.647e-05 [cse]: 0.00017439 [a_3]: 0.00033398 [Cycle 2]: 0.00302235, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 4.675e-05 [loop_unroll]: 4.357e-05 [a_1]: 0.00157749 [with_stream_mark]: 1.223e-05 [recompute_prepare]: 1.094e-05 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 3.70998e-06 [parameter_eliminate]: 1.19998e-06 [a_2]: 0.00012651 [accelerated_algorithm]: 1.194e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 2.01e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.68e-06 [auto_parallel]: 7.32997e-06 [parallel]: 4.77e-06 [flash_sp]: 3.48e-06 [merge_comm]: 4.90001e-06 [allreduce_fusion]: 4.62998e-06 [matmul_add_comm_reduction]: 7.61999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.71997e-06 [get_grad_eliminate_]: 8.80999e-06 [virtual_output]: 8.23001e-06 [merge_forward]: 4.15999e-06 [cell_reuse_recompute_pass]: 9.29984e-07 [offload_activation]: 9.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.681e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 3.565e-05 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 1.504e-05 [a_after_grad]: 1.443e-05 [renormalize]: 0.00059781 [add_forward_monad_depend]: 3.85998e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.472e-05 [cse]: 4.733e-05 [a_3]: 6.481e-05 [Cycle 3]: 0.0009029, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 1.059e-05 [loop_unroll]: 9.10999e-06 [a_1]: 0.00024916 [with_stream_mark]: 9.93002e-06 [recompute_prepare]: 9.41e-06 [updatestate_depend_eliminate]: 4.60001e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 
3.78001e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012315 [accelerated_algorithm]: 1.148e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 6.96001e-06 [auto_parallel]: 7.05998e-06 [parallel]: 4.72998e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.85999e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 7.8e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.018e-05 [virtual_dataset]: 8.59998e-06 [get_grad_eliminate_]: 8.44998e-06 [virtual_output]: 8.29998e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 8.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.565e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 1.526e-05 [set_forward_comm_id_for_comm_node_pass]: 5.82001e-06 [meta_fg_expand]: 3.06999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.38e-05 [a_after_grad]: 1.409e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09003e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 1.085e-05 [cse]: 2.808e-05 [a_3]: 5.917e-05 [py_interpret_to_execute_after_opt_a]: 1.024e-05 [slice_cell_reuse_recomputed_activation]: 2.61999e-06 [rewriter_after_opt_a]: 4.817e-05 [convert_after_rewriter]: 9.00001e-06 [order_py_execute_after_rewriter]: 7.08998e-06 [mutable_eliminate]: 0.0005258 [opt_b]: 0.00029338, [1] [Cycle 1]: 0.00028688, [7] [b_1]: 0.00019242 [b_2]: 1.094e-05 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4.27998e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 3.30008e-07 [cse]: 3.232e-05 [optimize_parallel_all_gather_comm]: 2.151e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.037e-05 [loop_unroll]: 0.0004285 [opt_after_cconv]: 0.00013756, [1] [Cycle 1]: 0.00013199, [7] [c_1]: 4.853e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 7.28e-06 [updatestate_assign_eliminate]: 4.17e-06 
[updatestate_loads_eliminate]: 4.06001e-06 [cse]: 3.138e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 3.05e-05 [tuple_transform]: 0.00010244, [1] [Cycle 1]: 9.712e-05, [4] [d_1]: 6.737e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.77001e-06 [partial_unused_args_eliminate]: 2.29999e-06 [add_recomputation]: 5.943e-05 [cse_after_recomputation]: 3.304e-05, [1] [Cycle 1]: 2.822e-05, [1] [cse]: 2.248e-05 [environ_conv]: 9.45001e-06 [swap_dp_allreduce_reducescatter]: 7.94997e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.93e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 2.96999e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.778e-05 [grouped_pairwise_exchange_alltoall]: 1.94e-06 [offloading_packed_experts]: 5.07999e-06 [overlap_recompute_and_grad_model_parallel]: 6.02999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.74998e-06 [overlap_recompute_comm]: 2.36998e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 2.407e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 2.17999e-06 [handle_group_info]: 1.43002e-06 [symbol_engine_optimizer]: 9.91e-05, [1] [Cycle 1]: 9.467e-05, [6] [build]: 1.036e-05 [elim_shapecalc]: 1.35e-05 [elim_not_effective]: 1.801e-05 [opt_reshape]: 9.84999e-06 [fold_const_symbol]: 1.529e-05 [renormalize]: 
1.8999e-07 [detach_backward]: 1.52999e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 2.542e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.61001e-06 [opt_after_jit_grad]: 0.00046702 [validate]: 4.667e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.07892 [execute]: 8.38999e-06 Sums bootstrap : 0.000478s : 0.46% type_inference : 0.010337s : 10.06% event_method : 0.000044s : 0.04% auto_monad : 0.000119s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000127s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.11% optimize.opt_a.a_1 : 0.003164s : 3.08% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000497s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001473s : 1.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.11% optimize.opt_a.renormalize : 0.003034s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000250s : 0.24% optimize.opt_a.a_3 : 0.000458s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000526s : 0.51% optimize.opt_b.b_1 : 0.000192s : 0.19% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000429s : 0.42% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.45% validate : 0.000047s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.078920s : 76.83% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000745 218 5.83% : 0.000043s : 11: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.42% : 0.000003s : 5: substitution.elim_not_effective 0.57% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 1.13% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 54.76% : 0.000408s : 16: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000015s : 3: substitution.less_batch_normalization 1.81% : 0.000014s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.84% : 0.000014s : 20: substitution.remove_not_recompute_node 3.22% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.72% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.55% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010268 2 86.63% : 0.008895s : 1: type_inference.infer 13.37% : 0.001372s : 1: type_inference.specialize ------[replace.] 0.000231 30 51.11% : 0.000118s : 16: replace.inline 48.89% : 0.000113s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000431 30 92.64% : 0.000400s : 16: match.inline 7.36% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000742 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 99: predicate.arithmetic_simplify 1.18% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.16% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.21% : 0.000016s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000042s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000012s : 89: predicate.partial_eliminate 1.14% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.21% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001556 32 56.45% : 0.000878s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.55% : 0.000678s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131394 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.29% : 0.003005s : 1: add_attr 2.28% : 0.002996s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000126s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000507s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000438s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.41% : 0.000536s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.66% : 0.004812s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000177s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.31% : 0.010913s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.36% : 0.000477s : 1: opt_after_jit_grad 0.23% : 0.000297s : 1: opt_b 10.09% : 0.013252s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000035s : 1: remove_dup_value 1.22% : 0.001604s : 2: renormalize.infer 1.08% : 0.001417s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000102s : 1: symbol_engine_optimizer 60.08% : 0.078937s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.88% : 0.010353s : 1: type_inference 0.05% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x5-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x6-pynative],max_mem:42.0M TotalTime = 0.0218494, [24] [bootstrap]: 0.00053627 [type_inference]: 0.00623097 [event_method]: 1.454e-05 [auto_monad]: 6.145e-05 [graph_reusing]: 6.06e-06 [inline]: 1.97999e-06 [add_attr]: 0.00349086, [1] [add_attr_with_inline]: 0.0034807, [1] [Cycle 1]: 4.529e-05, [2] [tag_attr]: 1.584e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.874e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00399617, [53] [py_interpret_to_execute]: 1.965e-05 [rewriter_before_opt_a]: 6.157e-05 [opt_a]: 0.00214461, [2] [Cycle 1]: 0.00153281, [45] [expand_dump_flag]: 2.91999e-06 [switch_simplify]: 3.217e-05 [loop_unroll]: 2.144e-05 [a_1]: 0.0004572 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 7.66001e-06 [updatestate_depend_eliminate]: 4.13001e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.601e-05 [accelerated_algorithm]: 6.56999e-06 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.75999e-06 [auto_parallel]: 5.59e-06 [parallel]: 2.526e-05 [flash_sp]: 7.85998e-06 [merge_comm]: 3.93999e-06 [allreduce_fusion]: 3.58999e-06 [matmul_add_comm_reduction]: 9.49e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 6.11e-06 
[get_grad_eliminate_]: 6.05002e-06 [virtual_output]: 5.94e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.064e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.33002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.61999e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.55002e-06 [after_resolve]: 9.94001e-06 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00042187 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.404e-05 [cse]: 2.712e-05 [a_3]: 4.203e-05 [Cycle 2]: 0.00060221, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012455 [with_stream_mark]: 9.91e-06 [recompute_prepare]: 5.47999e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.718e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.71002e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.49998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.21002e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.18998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.82e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 7.99773e-08 
[add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.652e-05 [a_3]: 3.198e-05 [py_interpret_to_execute_after_opt_a]: 7.53e-06 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.175e-05 [convert_after_rewriter]: 7.25998e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00045261 [opt_b]: 0.00018072, [1] [Cycle 1]: 0.00017499, [7] [b_1]: 0.00010704 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 2.50002e-07 [cse]: 1.626e-05 [optimize_parallel_all_gather_comm]: 1.684e-05 [overlap_param_gather]: 2.28998e-06 [cconv]: 2.255e-05 [loop_unroll]: 0.00041127 [opt_after_cconv]: 9.451e-05, [1] [Cycle 1]: 8.875e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.566e-05 [renormalize]: 1.79978e-07 [remove_dup_value]: 1.431e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.424e-05, [4] [d_1]: 3.9e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.24001e-06 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 5.252e-05 [cse_after_recomputation]: 2.061e-05, [1] [Cycle 1]: 1.619e-05, [1] [cse]: 1.118e-05 [environ_conv]: 5.42001e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.24002e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.57001e-06 [assign_add_opt]: 1.24003e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.33002e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 
9.50007e-07 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.95001e-06 [offloading_packed_experts]: 3.48999e-06 [overlap_recompute_and_grad_model_parallel]: 4.69002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.60997e-06 [overlap_grad_ring_attention]: 3.97998e-06 [overlap_grad_flash_sp]: 1.79e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.805e-05, [1] [Cycle 1]: 6.412e-05, [6] [build]: 2.32999e-06 [elim_shapecalc]: 8.04002e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.98997e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 0.00011421 [opt_after_jit_grad]: 0.00045019 [validate]: 3.483e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00660079 [execute]: 7.4e-06 Sums bootstrap : 0.000536s : 3.09% type_inference : 0.006231s : 35.92% event_method : 0.000015s : 0.08% auto_monad : 0.000061s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000062s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000582s : 3.35% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.61% optimize.opt_b.b_1 : 0.000107s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000411s : 2.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000114s : 0.66% opt_after_jit_grad : 0.000450s : 2.60% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006601s : 38.06% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000168 30 14.59% : 0.000024s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 67.60% : 0.000113s : 3: substitution.inline 1.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.06% : 0.000003s : 4: substitution.replace_old_param 6.55% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006186 2 90.48% : 0.005597s : 1: type_inference.infer 9.52% : 0.000589s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.65% : 0.000028s : 3: replace.inline 29.35% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.82% : 0.000112s : 3: match.inline 8.18% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.29% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.49% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.87% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.41% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 47.62% : 0.000182s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.38% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030861 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.33% : 0.003495s : 1: add_attr 11.29% : 0.003484s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.99% : 0.000615s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000948s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002147s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.49% : 0.000460s : 1: opt_after_jit_grad 0.60% : 0.000184s : 1: opt_b 12.96% : 0.004000s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.70% : 0.000215s : 1: renormalize.infer 0.64% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.39% : 0.000120s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000066s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.42% : 0.006611s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.24% : 0.006245s : 1: type_inference 0.21% : 0.000064s : 1: validate TotalTime = 0.0179004, [24] [bootstrap]: 0.0004243 [type_inference]: 0.00429251 [event_method]: 9.81e-06 [auto_monad]: 5.071e-05 [graph_reusing]: 4.95001e-06 [inline]: 1.72999e-06 [add_attr]: 0.00293851, [1] [add_attr_with_inline]: 0.00293063, [1] [Cycle 1]: 4.021e-05, [2] [tag_attr]: 1.138e-05 [meta_addattr_fg_expand]: 4.07998e-06 [parallel-infer-symbol]: 2.59999e-06 [pre_auto_parallel]: 2.065e-05 [insert-virtual-dataset]: 1.96998e-06 [parallel-infer-symbol-second]: 6.59988e-07 [dataset_repeat_opt]: 1.62001e-06 [pipeline_split]: 1.39998e-06 [optimize]: 0.00367973, [53] [py_interpret_to_execute]: 1.395e-05 [rewriter_before_opt_a]: 3.777e-05 [opt_a]: 0.0018545, [2] [Cycle 1]: 0.00125074, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 2.187e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00028919 [with_stream_mark]: 1.314e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.66999e-06 [updatestate_assign_eliminate]: 2.88e-06 [updatestate_loads_eliminate]: 2.33998e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.562e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.24002e-06 [auto_parallel]: 5.51e-06 [parallel]: 1.73e-05 [flash_sp]: 6.41e-06 [merge_comm]: 3.54002e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.64e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 7.40003e-06 [virtual_dataset]: 5.66998e-06 [get_grad_eliminate_]: 5.74999e-06 [virtual_output]: 5.91998e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.14998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.127e-05 [merge_recompute_call_nodes]: 1.20001e-06 [before_grad]: 9.51998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 
2.71999e-06 [after_resolve]: 1.11e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 0.00034784 [add_forward_monad_depend]: 4.23001e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.179e-05 [cse]: 2.662e-05 [a_3]: 4.039e-05 [Cycle 2]: 0.00059488, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012569 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.73998e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.792e-05 [accelerated_algorithm]: 5.56998e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.30999e-06 [auto_parallel]: 5.07999e-06 [parallel]: 4.13999e-06 [flash_sp]: 2.84999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.80002e-06 [matmul_add_comm_reduction]: 5.42999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.06998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95998e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.95e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.47001e-06 [cse]: 1.377e-05 [a_3]: 3.22e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.057e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.61e-06 [mutable_eliminate]: 0.00047462 [opt_b]: 0.00018072, [1] [Cycle 1]: 
0.0001749, [7] [b_1]: 0.00010708 [b_2]: 7.32002e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 3.9002e-07 [cse]: 1.635e-05 [optimize_parallel_all_gather_comm]: 1.506e-05 [overlap_param_gather]: 2.21e-06 [cconv]: 2.109e-05 [loop_unroll]: 0.00041398 [opt_after_cconv]: 9.59e-05, [1] [Cycle 1]: 9.032e-05, [7] [c_1]: 2.822e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.657e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.156e-05 [tuple_transform]: 7.001e-05, [1] [Cycle 1]: 6.594e-05, [4] [d_1]: 3.973e-05 [none_parameter_eliminate]: 1.39e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.58e-06 [partial_unused_args_eliminate]: 1.52001e-06 [add_recomputation]: 4.144e-05 [cse_after_recomputation]: 2.109e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.156e-05 [environ_conv]: 4.22e-06 [swap_dp_allreduce_reducescatter]: 5.11997e-06 [bias_add_comm_swap]: 1.86e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.09e-06 [slice_recompute_activation]: 1.84e-06 [micro_interleaved_order_control]: 2.39999e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.20001e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 7.59988e-07 [add_comm_op_reuse_tag]: 7.80012e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 6.90023e-07 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.253e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.12998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.609e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.79e-06 [split_layernorm_comm]: 1.50999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.853e-05, [1] [Cycle 1]: 6.455e-05, [6] [build]: 2.18002e-06 [elim_shapecalc]: 8.54e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.03998e-06 [fold_const_symbol]: 9.20999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.53002e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.42e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00044871 [validate]: 2.955e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.00577465 [execute]: 6.66e-06 Sums bootstrap : 0.000424s : 3.03% type_inference : 0.004293s : 30.64% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000029s : 0.20% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.96% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000009s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000348s : 2.48% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.13% optimize.opt_a.cse : 0.000040s : 0.29% optimize.opt_a.a_3 : 0.000073s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000475s : 3.39% optimize.opt_b.b_1 : 0.000107s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000414s : 2.96% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000041s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.09% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.07% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000014s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.20% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005775s : 41.23% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 19.10% : 0.000023s : 4: substitution.arithmetic_simplify 1.55% : 0.000002s : 2: substitution.elim_not_effective 1.28% : 0.000002s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 64.32% : 0.000077s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000004s : 4: substitution.remove_not_recompute_node 3.45% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004253 2 92.09% : 0.003916s : 1: type_inference.infer 7.91% : 0.000336s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.14% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.81% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.47% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.34% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.84% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 1.07% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.06% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000232 6 41.66% : 0.000097s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.34% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025795 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.41% : 0.002943s : 1: add_attr 11.38% : 0.002934s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000045s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.76% : 0.000455s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000422s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.88% : 0.000484s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.96% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.13% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.20% : 0.001857s : 1: opt_a 0.39% : 0.000099s : 1: opt_after_cconv 1.77% : 0.000458s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.28% : 0.003683s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000003s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.74% : 0.000192s : 1: renormalize.infer 0.58% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000004s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000071s : 1: symbol_engine_optimizer 22.43% : 0.005785s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.69% : 0.004306s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0187593, [24] [bootstrap]: 0.00039055 [type_inference]: 0.00527605 [event_method]: 1.384e-05 [auto_monad]: 4.84e-05 [graph_reusing]: 4.90999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00285766, [1] [add_attr_with_inline]: 0.00284992, [1] [Cycle 1]: 4.244e-05, [2] [tag_attr]: 1.405e-05 [meta_addattr_fg_expand]: 4.12e-06 [parallel-infer-symbol]: 2.08002e-06 [pre_auto_parallel]: 2.401e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00387822, [53] [py_interpret_to_execute]: 1.819e-05 [rewriter_before_opt_a]: 5.531e-05 [opt_a]: 0.00205821, [2] [Cycle 1]: 0.0014525, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 2.899e-05 [loop_unroll]: 2.077e-05 [a_1]: 0.00043479 [with_stream_mark]: 1.159e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.29999e-06 [parameter_eliminate]: 1.39998e-06 [a_2]: 7.418e-05 [accelerated_algorithm]: 6.42001e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 6.25002e-06 [auto_parallel]: 5.47999e-06 [parallel]: 1.362e-05 [flash_sp]: 6.53e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 6.70002e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 6.19999e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 7.90998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 1.09e-06 [before_grad]: 9.56e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 1.59e-06 [receive_attached]: 1.94e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 8.86997e-06 [renormalize]: 0.00040931 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.229e-05 [cse]: 2.085e-05 [a_3]: 4.061e-05 [Cycle 2]: 0.00059641, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012851 [with_stream_mark]: 8.65001e-06 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 2.61e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.876e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.38002e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 4.89e-06 [parallel]: 4.88001e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.56e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.06997e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.07999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 2.97002e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.44999e-06 [cse]: 1.732e-05 [a_3]: 3.27e-05 [py_interpret_to_execute_after_opt_a]: 7.72002e-06 [slice_cell_reuse_recomputed_activation]: 1.39998e-06 [rewriter_after_opt_a]: 2.923e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00046731 [opt_b]: 0.00018327, [1] [Cycle 1]: 0.00017731, [7] [b_1]: 
0.00011057 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 4.40021e-07 [cse]: 1.577e-05 [optimize_parallel_all_gather_comm]: 1.322e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2e-05 [loop_unroll]: 0.00041001 [opt_after_cconv]: 9.663e-05, [1] [Cycle 1]: 9.088e-05, [7] [c_1]: 2.85e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.612e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.039e-05 [tuple_transform]: 6.8e-05, [1] [Cycle 1]: 6.383e-05, [4] [d_1]: 3.855e-05 [none_parameter_eliminate]: 1.13001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 3.963e-05 [cse_after_recomputation]: 1.978e-05, [1] [Cycle 1]: 1.52e-05, [1] [cse]: 1.014e-05 [environ_conv]: 4.27e-06 [swap_dp_allreduce_reducescatter]: 4.70999e-06 [bias_add_comm_swap]: 2.16e-06 [label_micro_interleaved_index]: 3.91999e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 9.29984e-07 [slice_recompute_activation]: 1.99e-06 [micro_interleaved_order_control]: 2.79999e-06 [assign_add_opt]: 9.89996e-07 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 6.50005e-07 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.11e-06 [comm_op_add_attrs]: 7.7e-07 [add_comm_op_reuse_tag]: 7.89994e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.092e-05 [grouped_pairwise_exchange_alltoall]: 1.12999e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.20999e-06 [overlap_grad_matmul_and_grad_allreduce]: 9.80013e-07 
[overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.595e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.39998e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.7e-05, [1] [Cycle 1]: 6.288e-05, [6] [build]: 1.82001e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.112e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.44003e-06 [auto_monad_reorder]: 1.288e-05 [get_jit_bprop_graph]: 9.90025e-07 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00044427 [validate]: 2.734e-05 [backend_pass]: 6.90023e-07 [task_emit]: 0.00557348 [execute]: 6.73e-06 Sums bootstrap : 0.000391s : 2.61% type_inference : 0.005276s : 35.27% event_method : 0.000014s : 0.09% auto_monad : 0.000048s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000002s : 0.01% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.12% optimize.rewriter_before_opt_a : 0.000055s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.18% optimize.opt_a.a_1 : 0.000563s : 3.77% optimize.opt_a.with_stream_mark : 0.000020s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.96% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000002s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000011s : 0.07% optimize.opt_a.auto_parallel : 0.000010s : 0.07% optimize.opt_a.parallel : 0.000018s : 0.12% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000012s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000014s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000002s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000409s : 2.74% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000001s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000467s : 3.12% optimize.opt_b.b_1 : 0.000111s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000013s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.13% optimize.loop_unroll : 0.000410s : 2.74% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000010s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000040s : 0.26% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000001s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000013s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000444s : 2.97% validate : 0.000027s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.005573s : 37.26% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000149 30 14.28% : 0.000021s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 65.95% : 0.000098s : 3: substitution.inline 1.95% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000004s : 4: substitution.remove_not_recompute_node 2.61% : 0.000004s : 4: substitution.replace_old_param 7.29% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005240 2 89.84% : 0.004708s : 1: type_inference.infer 10.16% : 0.000533s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.34% : 0.000026s : 3: replace.inline 31.66% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000106 5 90.75% : 0.000096s : 3: match.inline 9.25% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 1.15% : 0.000002s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 11: predicate.addn_zero_filter 0.83% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 1.00% : 0.000002s : 11: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000002s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.96% : 0.000002s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000326 8 44.12% : 0.000144s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.88% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026991 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.60% : 0.002862s : 1: add_attr 10.57% : 0.002853s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000044s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000053s : 1: auto_monad 0.06% : 0.000017s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.54% : 0.000416s : 1: bootstrap 0.09% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.55% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000477s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.43% : 0.000926s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.64% : 0.002061s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.68% : 0.000454s : 1: opt_after_jit_grad 0.69% : 0.000187s : 1: opt_b 14.38% : 0.003882s : 1: optimize 0.06% : 0.000017s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000014s : 1: remove_dup_value 0.76% : 0.000206s : 1: renormalize.infer 0.73% : 0.000196s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000033s : 1: rewriter_after_opt_a 0.22% : 0.000060s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 20.69% : 0.005585s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.60% : 0.005290s : 1: type_inference 0.20% : 0.000053s : 1: validate TotalTime = 0.0374434, [24] [bootstrap]: 0.00047115 [type_inference]: 0.011441 [event_method]: 4.733e-05 [auto_monad]: 0.00012117 [graph_reusing]: 8.51002e-06 [inline]: 2.02001e-06 [add_attr]: 0.00298072, [1] [add_attr_with_inline]: 0.0029724, [1] [Cycle 1]: 7.051e-05, [2] [tag_attr]: 3.412e-05 [meta_addattr_fg_expand]: 9.20999e-06 [parallel-infer-symbol]: 3.34001e-06 [pre_auto_parallel]: 5.018e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0132943, [53] [py_interpret_to_execute]: 3.869e-05 [rewriter_before_opt_a]: 0.00014583 [opt_a]: 0.0110157, [3] [Cycle 1]: 0.00708029, [45] [expand_dump_flag]: 3.75998e-06 [switch_simplify]: 7.477e-05 [loop_unroll]: 6.122e-05 [a_1]: 0.00144503 [with_stream_mark]: 2.325e-05 [recompute_prepare]: 2.152e-05 [updatestate_depend_eliminate]: 9.10001e-06 [updatestate_assign_eliminate]: 7.90998e-06 [updatestate_loads_eliminate]: 7.21999e-06 [parameter_eliminate]: 2.62001e-06 [a_2]: 0.0002436 [accelerated_algorithm]: 3.087e-05 [shard]: 1.96e-06 [meta_shard_fg_expand]: 3.41999e-06 [shard_inline]: 1.611e-05 [merge_send_recv]: 1.684e-05 [auto_parallel]: 1.069e-05 [parallel]: 2.026e-05 [flash_sp]: 1.135e-05 [merge_comm]: 9.78998e-06 [allreduce_fusion]: 8.82999e-06 [matmul_add_comm_reduction]: 2.686e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.771e-05 [virtual_dataset]: 1.523e-05 [get_grad_eliminate_]: 1.498e-05 [virtual_output]: 1.5e-05 [merge_forward]: 9.42001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.78e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.818e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.649e-05 [set_forward_comm_id_for_comm_node_pass]: 9.78998e-06 [meta_fg_expand]: 0.00139562 [flash_sp_send_recv_attached]: 4.07e-06 [receive_attached]: 2.74999e-06 
[after_resolve]: 6e-05 [a_after_grad]: 0.00012821 [renormalize]: 0.00239705 [add_forward_monad_depend]: 9.38002e-06 [auto_monad_grad]: 5.40999e-06 [auto_monad_eliminator]: 5.561e-05 [cse]: 0.00016674 [a_3]: 0.0003343 [Cycle 2]: 0.00302438, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.715e-05 [loop_unroll]: 4.354e-05 [a_1]: 0.00155809 [with_stream_mark]: 1.229e-05 [recompute_prepare]: 1.08e-05 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 4.49998e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 0.00012638 [accelerated_algorithm]: 1.169e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.24998e-06 [merge_send_recv]: 6.84001e-06 [auto_parallel]: 7.44002e-06 [parallel]: 4.95999e-06 [flash_sp]: 3.22002e-06 [merge_comm]: 4.99998e-06 [allreduce_fusion]: 4.67998e-06 [matmul_add_comm_reduction]: 7.87e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 9.19e-06 [get_grad_eliminate_]: 8.94e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 8.30012e-07 [offload_activation]: 9.26998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.629e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.365e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44998e-06 [meta_fg_expand]: 7.02e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.07998e-06 [after_resolve]: 1.582e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00058551 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 1.483e-05 [cse]: 4.631e-05 [a_3]: 6.432e-05 [Cycle 3]: 0.00089736, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 1.035e-05 [loop_unroll]: 9.01002e-06 [a_1]: 0.00024865 [with_stream_mark]: 9.71e-06 [recompute_prepare]: 9.07001e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 
3.83999e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012285 [accelerated_algorithm]: 1.155e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 9.07999e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 6.84999e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 4.81997e-06 [allreduce_fusion]: 4.84998e-06 [matmul_add_comm_reduction]: 7.79002e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.62e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.18001e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.602e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 5.11997e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 7.40023e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.477e-05 [a_after_grad]: 1.478e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 1.048e-05 [cse]: 2.625e-05 [a_3]: 5.957e-05 [py_interpret_to_execute_after_opt_a]: 1.063e-05 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 4.749e-05 [convert_after_rewriter]: 9.05001e-06 [order_py_execute_after_rewriter]: 6.86999e-06 [mutable_eliminate]: 0.00045579 [opt_b]: 0.00028762, [1] [Cycle 1]: 0.00028166, [7] [b_1]: 0.00018908 [b_2]: 1.07e-05 [updatestate_depend_eliminate]: 7.25e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 3.91001e-06 [renormalize]: 3.49974e-07 [cse]: 3.174e-05 [optimize_parallel_all_gather_comm]: 2.244e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.188e-05 [loop_unroll]: 0.00043907 [opt_after_cconv]: 0.00013605, [1] [Cycle 1]: 0.00013006, [7] [c_1]: 4.85e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 
4.17e-06 [updatestate_loads_eliminate]: 3.7e-06 [cse]: 2.982e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 2.896e-05 [tuple_transform]: 0.00010199, [1] [Cycle 1]: 9.727e-05, [4] [d_1]: 6.725e-05 [none_parameter_eliminate]: 1.71998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.79e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 5.778e-05 [cse_after_recomputation]: 3.184e-05, [1] [Cycle 1]: 2.703e-05, [1] [cse]: 2.166e-05 [environ_conv]: 9.02999e-06 [swap_dp_allreduce_reducescatter]: 8.42998e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.37999e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 3.13e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.699e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 5.09998e-06 [overlap_recompute_and_grad_model_parallel]: 5.76e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.46998e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 5.05001e-06 [overlap_grad_flash_sp]: 2.446e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.38998e-06 [split_layernorm_comm]: 1.96998e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 9.691e-05, [1] [Cycle 1]: 9.268e-05, [6] [build]: 9.63997e-06 [elim_shapecalc]: 1.287e-05 [elim_not_effective]: 1.849e-05 [opt_reshape]: 9.71003e-06 [fold_const_symbol]: 1.487e-05 
[renormalize]: 2.50002e-07 [detach_backward]: 1.91003e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.564e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.73999e-06 [opt_after_jit_grad]: 0.0004614 [validate]: 4.534e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00825736 [execute]: 7.37997e-06 Sums bootstrap : 0.000471s : 1.42% type_inference : 0.011441s : 34.48% event_method : 0.000047s : 0.14% auto_monad : 0.000121s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003252s : 9.80% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000030s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.09% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000054s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001469s : 4.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.27% optimize.opt_a.a_after_grad : 0.000157s : 0.47% optimize.opt_a.renormalize : 0.002983s : 8.99% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000239s : 0.72% optimize.opt_a.a_3 : 0.000458s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000456s : 1.37% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.07% optimize.loop_unroll : 0.000439s : 1.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000461s : 1.39% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008257s : 24.89% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000762 222 5.92% : 0.000045s : 12: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.08% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 55.36% : 0.000422s : 17: substitution.inline 2.06% : 0.000016s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.73% : 0.000013s : 20: substitution.remove_not_recompute_node 3.19% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.71% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.62% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011367 2 86.99% : 0.009888s : 1: type_inference.infer 13.01% : 0.001478s : 1: type_inference.specialize ------[replace.] 0.000222 33 57.45% : 0.000128s : 17: replace.inline 42.55% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.30% : 0.000413s : 17: match.inline 7.70% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.16% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000042s : 249: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.46% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.24% : 0.000009s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001541 34 57.30% : 0.000883s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.70% : 0.000658s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061981 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.82% : 0.002985s : 1: add_attr 4.80% : 0.002976s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.82% : 0.000509s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000448s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.99% : 0.004954s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.78% : 0.011019s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.76% : 0.000471s : 1: opt_after_jit_grad 0.47% : 0.000291s : 1: opt_b 21.46% : 0.013298s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.61% : 0.001616s : 2: renormalize.infer 2.18% : 0.001353s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.34% : 0.008267s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.48% : 0.011457s : 1: type_inference 0.13% : 0.000078s : 1: validate TotalTime = 0.018492, [24] [bootstrap]: 0.0004264 [type_inference]: 0.00432666 [event_method]: 1.096e-05 [auto_monad]: 5.125e-05 [graph_reusing]: 5.15001e-06 [inline]: 1.68002e-06 [add_attr]: 0.00295669, [1] [add_attr_with_inline]: 0.00294827, [1] [Cycle 1]: 4.342e-05, [2] [tag_attr]: 1.211e-05 [meta_addattr_fg_expand]: 3.35003e-06 [parallel-infer-symbol]: 3.07002e-06 [pre_auto_parallel]: 2.164e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00368685, [53] [py_interpret_to_execute]: 1.541e-05 [rewriter_before_opt_a]: 4.042e-05 [opt_a]: 0.00186841, [2] [Cycle 1]: 0.00126193, [45] [expand_dump_flag]: 2.55002e-06 [switch_simplify]: 2.473e-05 [loop_unroll]: 1.42e-05 [a_1]: 0.00029505 [with_stream_mark]: 1.365e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 4.17003e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.35003e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.674e-05 [accelerated_algorithm]: 6.19999e-06 [shard]: 2.35002e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.76001e-06 [auto_parallel]: 6.39001e-06 [parallel]: 1.806e-05 [flash_sp]: 7.76001e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.23002e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.31999e-06 [virtual_dataset]: 6.29999e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.90002e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 9.49e-06 [set_forward_comm_id_for_comm_node_pass]: 3.78999e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 
2.39001e-06 [after_resolve]: 1.064e-05 [a_after_grad]: 9.22999e-06 [renormalize]: 0.00034049 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.393e-05 [cse]: 2.847e-05 [a_3]: 4.012e-05 [Cycle 2]: 0.00059734, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.90002e-06 [loop_unroll]: 5.76e-06 [a_1]: 0.00012531 [with_stream_mark]: 9.36e-06 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.88998e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.799e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.12999e-06 [parallel]: 4.15999e-06 [flash_sp]: 3.53e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.35999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.19999e-06 [virtual_dataset]: 5.31002e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.87998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 1.32e-06 [receive_attached]: 1.01002e-06 [after_resolve]: 8.70999e-06 [a_after_grad]: 8.1e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.54999e-06 [cse]: 1.316e-05 [a_3]: 3.407e-05 [py_interpret_to_execute_after_opt_a]: 7.70998e-06 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 3.123e-05 [convert_after_rewriter]: 7.14001e-06 [order_py_execute_after_rewriter]: 5.07e-06 [mutable_eliminate]: 0.00046032 [opt_b]: 0.00017892, [1] [Cycle 1]: 0.0001729, 
[7] [b_1]: 0.00010626 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.38998e-06 [renormalize]: 3.30008e-07 [cse]: 1.62e-05 [optimize_parallel_all_gather_comm]: 1.541e-05 [overlap_param_gather]: 2.46e-06 [cconv]: 2.313e-05 [loop_unroll]: 0.00041128 [opt_after_cconv]: 9.449e-05, [1] [Cycle 1]: 8.886e-05, [7] [c_1]: 2.769e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.21002e-06 [updatestate_assign_eliminate]: 2.63998e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 1.563e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 6.88e-05, [1] [Cycle 1]: 6.438e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.29e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.584e-05 [cse_after_recomputation]: 1.971e-05, [1] [Cycle 1]: 1.557e-05, [1] [cse]: 1.032e-05 [environ_conv]: 5.25999e-06 [swap_dp_allreduce_reducescatter]: 5.46e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.99001e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.51e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.44003e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 8.59989e-07 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 8.89995e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.77002e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.654e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.64001e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.761e-05, [1] [Cycle 1]: 6.349e-05, [6] [build]: 2.23002e-06 [elim_shapecalc]: 8.06001e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.07001e-06 [fold_const_symbol]: 8.77999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.82999e-06 [auto_monad_reorder]: 1.599e-05 [get_jit_bprop_graph]: 9.49978e-07 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044596 [validate]: 3.054e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.00628988 [execute]: 7.38e-06 Sums bootstrap : 0.000426s : 2.93% type_inference : 0.004327s : 29.68% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000420s : 2.88% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000341s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000042s : 0.29% optimize.opt_a.a_3 : 0.000074s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000460s : 3.16% optimize.opt_b.b_1 : 0.000106s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000411s : 2.82% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.06% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006290s : 43.15% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.35% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.78% : 0.000006s : 4: substitution.graph_param_transform 65.47% : 0.000080s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.28% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004284 2 91.86% : 0.003935s : 1: type_inference.infer 8.14% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000139 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.70% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.07% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.12% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.69% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.58% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.98% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 1.12% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.70% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 43.04% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.96% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026404 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.21% : 0.002961s : 1: add_attr 11.18% : 0.002952s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000050s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.75% : 0.000462s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.78% : 0.000470s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001871s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000456s : 1: opt_after_jit_grad 0.69% : 0.000182s : 1: opt_b 13.98% : 0.003691s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.56% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.86% : 0.006300s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.44% : 0.004341s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0360363, [24] [bootstrap]: 0.00050456 [type_inference]: 0.0103146 [event_method]: 4.13e-05 [auto_monad]: 0.00011502 [graph_reusing]: 8.02998e-06 [inline]: 1.94e-06 [add_attr]: 0.00303327, [1] [add_attr_with_inline]: 0.00302493, [1] [Cycle 1]: 6.752e-05, [2] [tag_attr]: 3.252e-05 [meta_addattr_fg_expand]: 8.76002e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 4.622e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.83997e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.0129769, [53] [py_interpret_to_execute]: 3.538e-05 [rewriter_before_opt_a]: 0.00012623 [opt_a]: 0.0107127, [3] [Cycle 1]: 0.00685738, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 6.706e-05 [loop_unroll]: 5.48e-05 [a_1]: 0.00133426 [with_stream_mark]: 2.267e-05 [recompute_prepare]: 2.139e-05 [updatestate_depend_eliminate]: 9.19e-06 [updatestate_assign_eliminate]: 7.68001e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.43e-06 [a_2]: 0.00024541 [accelerated_algorithm]: 3.049e-05 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 3.15998e-06 [shard_inline]: 1.588e-05 [merge_send_recv]: 1.528e-05 [auto_parallel]: 1.034e-05 [parallel]: 1.811e-05 [flash_sp]: 1.112e-05 [merge_comm]: 9.66e-06 [allreduce_fusion]: 9.05999e-06 [matmul_add_comm_reduction]: 2.57e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.766e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.513e-05 [virtual_output]: 1.509e-05 [merge_forward]: 9.42001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.74e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.888e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 2.797e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91e-06 [meta_fg_expand]: 0.00140215 [flash_sp_send_recv_attached]: 3.75e-06 [receive_attached]: 2.65002e-06 [after_resolve]: 
6.159e-05 [a_after_grad]: 8.421e-05 [renormalize]: 0.00237071 [add_forward_monad_depend]: 9.09e-06 [auto_monad_grad]: 5.34e-06 [auto_monad_eliminator]: 5.62e-05 [cse]: 0.0001638 [a_3]: 0.00033561 [Cycle 2]: 0.00295326, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.649e-05 [loop_unroll]: 4.338e-05 [a_1]: 0.00155258 [with_stream_mark]: 1.193e-05 [recompute_prepare]: 1.114e-05 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012525 [accelerated_algorithm]: 1.164e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.62002e-06 [auto_parallel]: 7.03e-06 [parallel]: 4.78001e-06 [flash_sp]: 3.10002e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.69998e-06 [get_grad_eliminate_]: 8.98002e-06 [virtual_output]: 8.42998e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.606e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.415e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.591e-05 [flash_sp_send_recv_attached]: 8.79983e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.481e-05 [a_after_grad]: 1.411e-05 [renormalize]: 0.00056899 [add_forward_monad_depend]: 3.76999e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.379e-05 [cse]: 4.52e-05 [a_3]: 6.415e-05 [Cycle 3]: 0.0008884, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.024e-05 [loop_unroll]: 8.98002e-06 [a_1]: 0.00024891 [with_stream_mark]: 9.67999e-06 [recompute_prepare]: 9.14e-06 [updatestate_depend_eliminate]: 4.74e-06 [updatestate_assign_eliminate]: 3.75e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 
8.00006e-07 [a_2]: 0.00012357 [accelerated_algorithm]: 1.18e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 6.86999e-06 [auto_parallel]: 6.93998e-06 [parallel]: 4.29002e-06 [flash_sp]: 1.09e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.97999e-06 [matmul_add_comm_reduction]: 7.73001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.78002e-06 [virtual_dataset]: 8.68001e-06 [get_grad_eliminate_]: 8.63001e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.32e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.55e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.357e-05 [set_forward_comm_id_for_comm_node_pass]: 5.06002e-06 [meta_fg_expand]: 2.87002e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.314e-05 [a_after_grad]: 1.421e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 1.026e-05 [cse]: 2.466e-05 [a_3]: 5.619e-05 [py_interpret_to_execute_after_opt_a]: 9.79e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 4.723e-05 [convert_after_rewriter]: 8.87e-06 [order_py_execute_after_rewriter]: 6.56999e-06 [mutable_eliminate]: 0.00045802 [opt_b]: 0.00031888, [1] [Cycle 1]: 0.00031276, [7] [b_1]: 0.00022048 [b_2]: 1.102e-05 [updatestate_depend_eliminate]: 7.43999e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 4.12998e-06 [renormalize]: 2.10013e-07 [cse]: 3.012e-05 [optimize_parallel_all_gather_comm]: 2.047e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 1.975e-05 [loop_unroll]: 0.00042306 [opt_after_cconv]: 0.00013443, [1] [Cycle 1]: 0.00012847, [7] [c_1]: 4.788e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 
4.05998e-06 [cse]: 2.88e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 2.965e-05 [tuple_transform]: 0.00010133, [1] [Cycle 1]: 9.661e-05, [4] [d_1]: 6.622e-05 [none_parameter_eliminate]: 1.91e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 9.85002e-06 [partial_unused_args_eliminate]: 2.15002e-06 [add_recomputation]: 5.889e-05 [cse_after_recomputation]: 3.13e-05, [1] [Cycle 1]: 2.667e-05, [1] [cse]: 2.123e-05 [environ_conv]: 8.74998e-06 [swap_dp_allreduce_reducescatter]: 7.45998e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 3.03998e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.40001e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.663e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 4.77e-06 [overlap_recompute_and_grad_model_parallel]: 5.49e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.457e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.734e-05, [1] [Cycle 1]: 9.32e-05, [6] [build]: 9.37999e-06 [elim_shapecalc]: 1.309e-05 [elim_not_effective]: 1.834e-05 [opt_reshape]: 1.009e-05 [fold_const_symbol]: 1.461e-05 [renormalize]: 1.70025e-07 [detach_backward]: 
1.55001e-06 [pipeline_parallel_scheduler]: 1.29e-06 [auto_monad_reorder]: 2.494e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.00046117 [validate]: 4.382e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.00823262 [execute]: 7.22002e-06 Sums bootstrap : 0.000505s : 1.59% type_inference : 0.010315s : 32.48% event_method : 0.000041s : 0.13% auto_monad : 0.000115s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000126s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003136s : 9.87% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000024s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001441s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.28% optimize.opt_a.a_after_grad : 0.000113s : 0.35% optimize.opt_a.renormalize : 0.002940s : 9.26% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000234s : 0.74% optimize.opt_a.a_3 : 0.000456s : 1.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.44% optimize.opt_b.b_1 : 0.000220s : 0.69% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000423s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000059s : 0.19% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000461s : 1.45% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008233s : 25.92% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000727 218 6.00% : 0.000044s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.73% : 0.000398s : 16: substitution.inline 2.21% : 0.000016s : 2: substitution.inline_without_move 1.45% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000015s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.77% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.49% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.71% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.42% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010247 2 87.15% : 0.008931s : 1: type_inference.infer 12.85% : 0.001316s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.83% : 0.000119s : 16: replace.inline 41.17% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000420 30 92.87% : 0.000390s : 16: match.inline 7.13% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000732 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.59% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.35% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.66% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000019s : 164: predicate.load_eliminater 0.34% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.43% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000010s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.21% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001478 32 57.41% : 0.000849s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.59% : 0.000630s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060127 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.05% : 0.003038s : 1: add_attr 5.04% : 0.003029s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000122s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.90% : 0.000539s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.96% : 0.004784s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000205s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.82% : 0.010716s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000470s : 1: opt_after_jit_grad 0.54% : 0.000323s : 1: opt_b 21.59% : 0.012981s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.60% : 0.001562s : 2: renormalize.infer 2.27% : 0.001365s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.71% : 0.008243s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.18% : 0.010329s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x6-kbk],max_mem:42.0M TotalTime = 0.844397, [24] [bootstrap]: 0.00053306 [type_inference]: 0.00610018 [event_method]: 1.362e-05 [auto_monad]: 5.394e-05 [graph_reusing]: 5.45001e-06 [inline]: 2.08998e-06 [add_attr]: 0.00344134, [1] [add_attr_with_inline]: 0.00343021, [1] [Cycle 1]: 4.331e-05, [2] [tag_attr]: 1.522e-05 [meta_addattr_fg_expand]: 4.03001e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.674e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.0039683, [53] [py_interpret_to_execute]: 1.922e-05 [rewriter_before_opt_a]: 5.924e-05 [opt_a]: 0.00209427, [2] [Cycle 1]: 0.00150323, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 3.183e-05 [loop_unroll]: 2.061e-05 [a_1]: 0.00045058 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 7.35998e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.15002e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.582e-05 [accelerated_algorithm]: 6.68998e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.44998e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.20998e-06 [auto_parallel]: 5.84999e-06 [parallel]: 2.227e-05 [flash_sp]: 6.91001e-06 [merge_comm]: 3.67002e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.35001e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.30003e-06 [virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.66003e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.051e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.26e-06 [after_resolve]: 1.062e-05 [a_after_grad]: 9.12001e-06 [renormalize]: 0.00041125 [add_forward_monad_depend]: 4.37998e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.732e-05 [a_3]: 3.983e-05 [Cycle 2]: 0.00058161, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.0001249 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 5.54998e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.45002e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 6.742e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.51998e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.09e-06 [parallel]: 4.12e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 2.86999e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.78002e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 4.89998e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.77999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.89002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95002e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.89e-06 [a_after_grad]: 7.74002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 5.90002e-06 [cse]: 1.216e-05 [a_3]: 3.123e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 
[slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.013e-05 [convert_after_rewriter]: 6.44999e-06 [order_py_execute_after_rewriter]: 4.65001e-06 [mutable_eliminate]: 0.00044556 [opt_b]: 0.0001877, [1] [Cycle 1]: 0.00018158, [7] [b_1]: 0.00011351 [b_2]: 6.97002e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.18002e-06 [renormalize]: 4.7998e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.546e-05 [overlap_param_gather]: 1.73002e-06 [cconv]: 2.165e-05 [loop_unroll]: 0.00041377 [opt_after_cconv]: 9.498e-05, [1] [Cycle 1]: 8.906e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.613e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.307e-05 [tuple_transform]: 6.871e-05, [1] [Cycle 1]: 6.433e-05, [4] [d_1]: 3.881e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.13998e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 5.158e-05 [cse_after_recomputation]: 1.988e-05, [1] [Cycle 1]: 1.531e-05, [1] [cse]: 1.033e-05 [environ_conv]: 4.55999e-06 [swap_dp_allreduce_reducescatter]: 4.95999e-06 [bias_add_comm_swap]: 2.51998e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 1.44e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.82999e-06 [control_data_broadcast_order]: 1.16e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.783e-05, [1] [Cycle 1]: 6.373e-05, [6] [build]: 2.11998e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.174e-05 [opt_reshape]: 5.88002e-06 [fold_const_symbol]: 8.79003e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.498e-05 [get_jit_bprop_graph]: 1.34998e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00044591 [validate]: 3.089e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.829523 [execute]: 9.35001e-06 Sums bootstrap : 0.000533s : 0.06% type_inference : 0.006100s : 0.73% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000575s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000411s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000071s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.00% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.05% optimize.opt_b.b_1 : 0.000114s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000414s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.05% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.829523s : 98.76% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000164 30 14.41% : 0.000024s : 5: substitution.arithmetic_simplify 1.28% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000005s : 4: substitution.graph_param_transform 66.82% : 0.000110s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.70% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006054 2 90.89% : 0.005503s : 1: type_inference.infer 9.11% : 0.000552s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.92% : 0.000027s : 3: replace.inline 29.08% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.53% : 0.000108s : 3: match.inline 8.47% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 1.01% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.92% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.65% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.87% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.40% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.40% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.86% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.14% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.853273 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.40% : 0.003446s : 1: add_attr 0.40% : 0.003434s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000570s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000005s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000455s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000939s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000094s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002097s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.05% : 0.000455s : 1: opt_after_jit_grad 0.02% : 0.000191s : 1: opt_b 0.47% : 0.003972s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000031s : 1: pre_auto_parallel 0.00% : 0.000023s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000212s : 1: renormalize.infer 0.02% : 0.000193s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000034s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.22% : 0.829546s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.72% : 0.006114s : 1: type_inference 0.01% : 0.000052s : 1: validate TotalTime = 0.0776119, [24] [bootstrap]: 0.00046002 [type_inference]: 0.0054075 [event_method]: 1.121e-05 [auto_monad]: 4.978e-05 [graph_reusing]: 4.76997e-06 [inline]: 2.41e-06 [add_attr]: 0.00305635, [1] [add_attr_with_inline]: 0.00304855, [1] [Cycle 1]: 4.494e-05, [2] [tag_attr]: 1.155e-05 [meta_addattr_fg_expand]: 3.27002e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.155e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.63002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00368503, [53] [py_interpret_to_execute]: 1.563e-05 [rewriter_before_opt_a]: 3.832e-05 [opt_a]: 0.00185004, [2] [Cycle 1]: 0.00125668, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.371e-05 [loop_unroll]: 1.4e-05 [a_1]: 0.00029138 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.18998e-06 [updatestate_depend_eliminate]: 3.41999e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.589e-05 [accelerated_algorithm]: 6.18002e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.56001e-06 [auto_parallel]: 5.79999e-06 [parallel]: 1.769e-05 [flash_sp]: 7.73999e-06 [merge_comm]: 3.40003e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 8.75001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 6.76999e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.17999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.92002e-06 [receive_attached]: 
2.66e-06 [after_resolve]: 1.037e-05 [a_after_grad]: 8.53001e-06 [renormalize]: 0.00035056 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.283e-05 [cse]: 2.736e-05 [a_3]: 4.05e-05 [Cycle 2]: 0.00058426, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.56999e-06 [loop_unroll]: 5.25999e-06 [a_1]: 0.00012221 [with_stream_mark]: 9.13002e-06 [recompute_prepare]: 5.68002e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.717e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 9.90025e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.30999e-06 [auto_parallel]: 5.22e-06 [parallel]: 4.42e-06 [flash_sp]: 3.12002e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.85002e-06 [virtual_dataset]: 5.05001e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.24998e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.63999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 1.56002e-06 [flash_sp_send_recv_attached]: 1.22999e-06 [receive_attached]: 1.07e-06 [after_resolve]: 8.80001e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.09999e-06 [cse]: 1.285e-05 [a_3]: 3.214e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 [slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 3.06e-05 [convert_after_rewriter]: 7.29001e-06 [order_py_execute_after_rewriter]: 5.28002e-06 [mutable_eliminate]: 0.00044953 [opt_b]: 0.00018081, [1] [Cycle 1]: 
0.00017475, [7] [b_1]: 0.00010734 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 3.60014e-07 [cse]: 1.538e-05 [optimize_parallel_all_gather_comm]: 1.522e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.3e-05 [loop_unroll]: 0.0004135 [opt_after_cconv]: 9.411e-05, [1] [Cycle 1]: 8.836e-05, [7] [c_1]: 2.678e-05 [parameter_eliminate]: 2.10002e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.544e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.215e-05 [tuple_transform]: 6.767e-05, [1] [Cycle 1]: 6.307e-05, [4] [d_1]: 3.806e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 5.99999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.679e-05 [cse_after_recomputation]: 2.014e-05, [1] [Cycle 1]: 1.583e-05, [1] [cse]: 1.071e-05 [environ_conv]: 5.08002e-06 [swap_dp_allreduce_reducescatter]: 5.20001e-06 [bias_add_comm_swap]: 3.06001e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.48002e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.28002e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.156e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.40999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.14998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.669e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.86003e-06 [split_layernorm_comm]: 2.04999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.611e-05, [1] [Cycle 1]: 9.214e-05, [6] [build]: 2.944e-05 [elim_shapecalc]: 8.35001e-06 [elim_not_effective]: 1.185e-05 [opt_reshape]: 5.77999e-06 [fold_const_symbol]: 8.39002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.547e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00044706 [validate]: 3.129e-05 [backend_pass]: 1.20999e-06 [task_emit]: 0.0641883 [execute]: 8.80999e-06 Sums bootstrap : 0.000460s : 0.63% type_inference : 0.005408s : 7.35% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000414s : 0.56% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000351s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.61% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000413s : 0.56% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000029s : 0.04% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064188s : 87.23% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.52% : 0.000022s : 4: substitution.arithmetic_simplify 1.72% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.46% : 0.000005s : 4: substitution.graph_param_transform 65.60% : 0.000078s : 2: substitution.inline 2.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.26% : 0.000004s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.005365 2 93.06% : 0.004993s : 1: type_inference.infer 6.94% : 0.000372s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 1.08% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.60% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.80% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.85% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000263 6 43.94% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.06% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.085613 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.58% : 0.003061s : 1: add_attr 3.56% : 0.003052s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.58% : 0.000495s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.89% : 0.000759s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.16% : 0.001853s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.53% : 0.000457s : 1: opt_after_jit_grad 0.22% : 0.000184s : 1: opt_b 4.31% : 0.003689s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000193s : 1: renormalize.infer 0.18% : 0.000152s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.12% : 0.000099s : 1: symbol_engine_optimizer 74.99% : 0.064205s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 6.34% : 0.005424s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.0758209, [24] [bootstrap]: 0.00052781 [type_inference]: 0.00570677 [event_method]: 1.455e-05 [auto_monad]: 5.349e-05 [graph_reusing]: 5.60001e-06 [inline]: 2.26e-06 [add_attr]: 0.00399414, [1] [add_attr_with_inline]: 0.00398578, [1] [Cycle 1]: 4.577e-05, [2] [tag_attr]: 1.558e-05 [meta_addattr_fg_expand]: 4.50001e-06 [parallel-infer-symbol]: 2.64999e-06 [pre_auto_parallel]: 2.545e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.00407388, [53] [py_interpret_to_execute]: 2.119e-05 [rewriter_before_opt_a]: 5.939e-05 [opt_a]: 0.0022297, [2] [Cycle 1]: 0.00155689, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 3.117e-05 [loop_unroll]: 2.112e-05 [a_1]: 0.00045963 [with_stream_mark]: 1.41e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 3.27002e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 8.169e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 1.92001e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 6.14001e-06 [merge_send_recv]: 7.7e-06 [auto_parallel]: 6.02001e-06 [parallel]: 1.715e-05 [flash_sp]: 7.87e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.75e-06 [matmul_add_comm_reduction]: 8.62e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.37002e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 3.99002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 8.96998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.072e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 1.244e-05 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.61e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.053e-05 [a_after_grad]: 9.16998e-06 [renormalize]: 0.00044419 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 2.21e-06 [auto_monad_eliminator]: 1.391e-05 [cse]: 2.717e-05 [a_3]: 4.141e-05 [Cycle 2]: 0.00066353, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.40001e-06 [a_1]: 0.00012591 [with_stream_mark]: 9.92999e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.88998e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 0.00012207 [accelerated_algorithm]: 5.79999e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.58999e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.71002e-06 [flash_sp]: 3.38e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 3.11999e-06 [matmul_add_comm_reduction]: 8.73001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.44001e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.13002e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.74e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 8.25e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 8.32998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 8.69972e-07 [auto_monad_eliminator]: 6.81001e-06 [cse]: 1.502e-05 [a_3]: 3.246e-05 [py_interpret_to_execute_after_opt_a]: 7.70998e-06 [slice_cell_reuse_recomputed_activation]: 1.75001e-06 [rewriter_after_opt_a]: 3.139e-05 [convert_after_rewriter]: 6.77002e-06 [order_py_execute_after_rewriter]: 5.30999e-06 [mutable_eliminate]: 0.00045395 [opt_b]: 0.00018303, [1] [Cycle 1]: 
0.00017669, [7] [b_1]: 0.00010818 [b_2]: 7.18998e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.89993e-07 [cse]: 1.697e-05 [optimize_parallel_all_gather_comm]: 1.621e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 2.231e-05 [loop_unroll]: 0.00041529 [opt_after_cconv]: 9.61e-05, [1] [Cycle 1]: 9.033e-05, [7] [c_1]: 2.827e-05 [parameter_eliminate]: 2.38998e-06 [updatestate_depend_eliminate]: 4.88001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.642e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.195e-05 [tuple_transform]: 6.947e-05, [1] [Cycle 1]: 6.505e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.81998e-06 [add_recomputation]: 4.246e-05 [cse_after_recomputation]: 2.123e-05, [1] [Cycle 1]: 1.684e-05, [1] [cse]: 1.134e-05 [environ_conv]: 5.04003e-06 [swap_dp_allreduce_reducescatter]: 5.31998e-06 [bias_add_comm_swap]: 2.93e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 3.03e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.34999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66998e-06 [control_data_broadcast_order]: 1.13e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.653e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.88997e-06 [handle_group_info]: 1.44e-06 [symbol_engine_optimizer]: 6.808e-05, [1] [Cycle 1]: 6.371e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.46002e-06 [elim_not_effective]: 1.122e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 1.613e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00045268 [validate]: 3.156e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0606892 [execute]: 7.78999e-06 Sums bootstrap : 0.000528s : 0.74% type_inference : 0.005707s : 8.05% event_method : 0.000015s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000586s : 0.83% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000204s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000444s : 0.63% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.64% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.64% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.060689s : 85.66% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.47% : 0.000024s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.00% : 0.000005s : 4: substitution.graph_param_transform 67.76% : 0.000114s : 3: substitution.inline 1.74% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.37% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005666 2 90.23% : 0.005113s : 1: type_inference.infer 9.77% : 0.000554s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.75% : 0.000027s : 3: replace.inline 30.25% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 92.03% : 0.000111s : 3: match.inline 7.97% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 1.03% : 0.000002s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.11% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.28% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.94% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.89% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.81% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 46.31% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.69% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.085499 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.68% : 0.003998s : 1: add_attr 4.67% : 0.003989s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.66% : 0.000562s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.19% : 0.001016s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.61% : 0.002233s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.54% : 0.000462s : 1: opt_after_jit_grad 0.22% : 0.000186s : 1: opt_b 4.77% : 0.004078s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000220s : 1: renormalize.infer 0.25% : 0.000218s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 71.00% : 0.060705s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.69% : 0.005719s : 1: type_inference 0.06% : 0.000054s : 1: validate TotalTime = 0.866138, [24] [bootstrap]: 0.00058895 [type_inference]: 0.0116887 [event_method]: 4.944e-05 [auto_monad]: 0.00012136 [graph_reusing]: 8.74003e-06 [inline]: 2.21e-06 [add_attr]: 0.00303662, [1] [add_attr_with_inline]: 0.00302858, [1] [Cycle 1]: 7.041e-05, [2] [tag_attr]: 3.483e-05 [meta_addattr_fg_expand]: 9.34e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 4.926e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.0148006, [53] [py_interpret_to_execute]: 4.005e-05 [rewriter_before_opt_a]: 0.00014853 [opt_a]: 0.0113016, [3] [Cycle 1]: 0.00727967, [45] [expand_dump_flag]: 4.25999e-06 [switch_simplify]: 7.698e-05 [loop_unroll]: 6.384e-05 [a_1]: 0.00151288 [with_stream_mark]: 2.311e-05 [recompute_prepare]: 2.238e-05 [updatestate_depend_eliminate]: 9.25001e-06 [updatestate_assign_eliminate]: 7.57998e-06 [updatestate_loads_eliminate]: 7.12002e-06 [parameter_eliminate]: 2.44001e-06 [a_2]: 0.00024951 [accelerated_algorithm]: 3.096e-05 [shard]: 1.84998e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.658e-05 [merge_send_recv]: 1.54e-05 [auto_parallel]: 1.096e-05 [parallel]: 1.932e-05 [flash_sp]: 1.176e-05 [merge_comm]: 9.42001e-06 [allreduce_fusion]: 8.73001e-06 [matmul_add_comm_reduction]: 2.697e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.835e-05 [virtual_dataset]: 1.606e-05 [get_grad_eliminate_]: 1.557e-05 [virtual_output]: 1.546e-05 [merge_forward]: 9.47001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 1.743e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.853e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.799e-05 [set_forward_comm_id_for_comm_node_pass]: 9.84001e-06 [meta_fg_expand]: 0.00141205 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 2.16998e-06 
[after_resolve]: 6.167e-05 [a_after_grad]: 8.228e-05 [renormalize]: 0.00255611 [add_forward_monad_depend]: 9.12001e-06 [auto_monad_grad]: 5.62999e-06 [auto_monad_eliminator]: 5.682e-05 [cse]: 0.00017026 [a_3]: 0.00034234 [Cycle 2]: 0.00309231, [45] [expand_dump_flag]: 1.58002e-06 [switch_simplify]: 4.802e-05 [loop_unroll]: 4.401e-05 [a_1]: 0.00158677 [with_stream_mark]: 1.248e-05 [recompute_prepare]: 1.115e-05 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.53999e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012894 [accelerated_algorithm]: 1.207e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 9.42999e-06 [merge_send_recv]: 6.74999e-06 [auto_parallel]: 7.30998e-06 [parallel]: 4.83001e-06 [flash_sp]: 3.08e-06 [merge_comm]: 5.27001e-06 [allreduce_fusion]: 4.72998e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.066e-05 [virtual_dataset]: 8.87999e-06 [get_grad_eliminate_]: 9.09998e-06 [virtual_output]: 8.62e-06 [merge_forward]: 5.12e-06 [cell_reuse_recompute_pass]: 8.09989e-07 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.66e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.431e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42001e-06 [meta_fg_expand]: 7.233e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.14998e-06 [after_resolve]: 1.693e-05 [a_after_grad]: 1.483e-05 [renormalize]: 0.00060644 [add_forward_monad_depend]: 4.2e-06 [auto_monad_grad]: 1.17999e-06 [auto_monad_eliminator]: 1.46e-05 [cse]: 4.826e-05 [a_3]: 6.656e-05 [Cycle 3]: 0.00091569, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 1.063e-05 [loop_unroll]: 9.09998e-06 [a_1]: 0.0002542 [with_stream_mark]: 9.94999e-06 [recompute_prepare]: 9.46e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 3.95998e-06 
[parameter_eliminate]: 8.2e-07 [a_2]: 0.00012605 [accelerated_algorithm]: 1.181e-05 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 7e-06 [auto_parallel]: 7.33e-06 [parallel]: 4.45e-06 [flash_sp]: 1.03001e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.95001e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.025e-05 [virtual_dataset]: 8.81002e-06 [get_grad_eliminate_]: 8.80001e-06 [virtual_output]: 8.41002e-06 [merge_forward]: 3.99002e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.43001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.596e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.54e-05 [set_forward_comm_id_for_comm_node_pass]: 6.12001e-06 [meta_fg_expand]: 3.12002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.479e-05 [a_after_grad]: 1.424e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.069e-05 [cse]: 2.628e-05 [a_3]: 6.167e-05 [py_interpret_to_execute_after_opt_a]: 1.093e-05 [slice_cell_reuse_recomputed_activation]: 2.25002e-06 [rewriter_after_opt_a]: 4.722e-05 [convert_after_rewriter]: 9.30001e-06 [order_py_execute_after_rewriter]: 7.18998e-06 [mutable_eliminate]: 0.00046605 [opt_b]: 0.00029344, [1] [Cycle 1]: 0.00028716, [7] [b_1]: 0.00019294 [b_2]: 1.071e-05 [updatestate_depend_eliminate]: 6.97002e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 3.97002e-06 [renormalize]: 6.79982e-07 [cse]: 3.278e-05 [optimize_parallel_all_gather_comm]: 2.031e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.07e-05 [loop_unroll]: 0.00158427 [opt_after_cconv]: 0.00015171, [1] [Cycle 1]: 0.00014494, [7] [c_1]: 5.257e-05 [parameter_eliminate]: 2.71999e-06 [updatestate_depend_eliminate]: 7.62002e-06 [updatestate_assign_eliminate]: 4.27998e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 3.498e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 3.163e-05 [tuple_transform]: 0.0001096, [1] [Cycle 1]: 0.00010458, [4] [d_1]: 7.184e-05 [none_parameter_eliminate]: 1.88002e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.026e-05 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 6.05e-05 [cse_after_recomputation]: 3.434e-05, [1] [Cycle 1]: 2.92e-05, [1] [cse]: 2.343e-05 [environ_conv]: 1.047e-05 [swap_dp_allreduce_reducescatter]: 8.28999e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.66002e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.86999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 9.90025e-07 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.24998e-06 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.792e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 4.84e-06 [overlap_recompute_and_grad_model_parallel]: 5.58002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 6.12001e-06 [overlap_grad_flash_sp]: 2.487e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 0.00010626, [1] [Cycle 1]: 0.00010201, [6] [build]: 1.035e-05 [elim_shapecalc]: 1.631e-05 [elim_not_effective]: 1.843e-05 [opt_reshape]: 1.231e-05 [fold_const_symbol]: 1.55e-05 
[renormalize]: 2.09984e-07 [detach_backward]: 2.15002e-06 [pipeline_parallel_scheduler]: 1.70001e-06 [auto_monad_reorder]: 2.691e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00048488 [validate]: 4.611e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.834982 [execute]: 9.97999e-06 Sums bootstrap : 0.000589s : 0.07% type_inference : 0.011689s : 1.36% event_method : 0.000049s : 0.01% auto_monad : 0.000121s : 0.01% graph_reusing : 0.000009s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.00% optimize.rewriter_before_opt_a : 0.000149s : 0.02% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000136s : 0.02% optimize.opt_a.loop_unroll : 0.000117s : 0.01% optimize.opt_a.a_1 : 0.003354s : 0.39% optimize.opt_a.with_stream_mark : 0.000046s : 0.01% optimize.opt_a.recompute_prepare : 0.000043s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000504s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000029s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% 
optimize.opt_a.merge_comm : 0.000020s : 0.00% optimize.opt_a.allreduce_fusion : 0.000018s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001488s : 0.17% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000093s : 0.01% optimize.opt_a.a_after_grad : 0.000111s : 0.01% optimize.opt_a.renormalize : 0.003163s : 0.37% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.00% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.01% optimize.opt_a.cse : 0.000245s : 0.03% optimize.opt_a.a_3 : 0.000471s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000466s : 0.05% optimize.opt_b.b_1 : 0.000193s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.00% optimize.loop_unroll : 0.001584s : 0.18% optimize.opt_after_cconv.c_1 : 0.000053s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000035s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.00% optimize.tuple_transform.d_1 : 0.000072s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000485s : 0.06% validate : 0.000046s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.834982s : 96.89% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000773 222 5.90% : 0.000046s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000003s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.18% : 0.000434s : 17: substitution.inline 2.01% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000015s : 3: substitution.less_batch_normalization 1.69% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000013s : 20: substitution.remove_not_recompute_node 3.11% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.60% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.49% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011576 2 86.86% : 0.010055s : 1: type_inference.infer 13.14% : 0.001521s : 1: type_inference.specialize ------[replace.] 0.000262 33 62.58% : 0.000164s : 17: replace.inline 37.42% : 0.000098s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000459 33 92.66% : 0.000425s : 17: match.inline 7.34% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000760 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.55% : 0.000004s : 8: predicate.loop_unroll_after_grad 2.33% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.03% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.62% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.07% : 
0.000039s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.50% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000012s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.53% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001627 34 57.67% : 0.000938s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.33% : 0.000689s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.892535 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.34% : 0.003041s : 1: add_attr 0.34% : 0.003032s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000065s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000128s : 1: auto_monad 0.00% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000626s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000014s : 1: environ_conv 0.01% : 0.000056s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.18% : 0.001598s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.05% : 0.000475s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000021s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.57% : 0.005056s : 117: opt.transform.opt_a 0.01% : 0.000051s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000178s : 28: opt.transform.opt_b 0.01% : 0.000080s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000059s : 4: opt.transform.symbol_engine_opt 1.27% : 0.011305s : 1: opt_a 0.02% : 0.000155s : 1: opt_after_cconv 0.06% : 0.000495s : 1: opt_after_jit_grad 0.03% : 0.000297s : 1: opt_b 1.66% : 0.014805s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000054s : 1: pre_auto_parallel 0.00% : 0.000044s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000036s : 1: remove_dup_value 0.19% : 0.001723s : 2: renormalize.infer 0.16% : 0.001427s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000051s : 1: rewriter_after_opt_a 0.02% : 0.000153s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000109s : 1: symbol_engine_optimizer 93.55% : 0.835003s : 1: task_emit 0.01% : 0.000113s : 1: 
tuple_transform 1.31% : 0.011703s : 1: type_inference 0.01% : 0.000073s : 1: validate TotalTime = 0.0725618, [24] [bootstrap]: 0.00046822 [type_inference]: 0.00432562 [event_method]: 1.101e-05 [auto_monad]: 5.039e-05 [graph_reusing]: 5.15001e-06 [inline]: 1.81e-06 [add_attr]: 0.00300043, [1] [add_attr_with_inline]: 0.00299272, [1] [Cycle 1]: 4.494e-05, [2] [tag_attr]: 1.214e-05 [meta_addattr_fg_expand]: 2.86999e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.113e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00372272, [53] [py_interpret_to_execute]: 1.454e-05 [rewriter_before_opt_a]: 4.058e-05 [opt_a]: 0.00187183, [2] [Cycle 1]: 0.00126542, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.365e-05 [loop_unroll]: 1.397e-05 [a_1]: 0.00029214 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.51001e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.791e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 2.23002e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 6.10002e-06 [merge_send_recv]: 8.43999e-06 [auto_parallel]: 5.88998e-06 [parallel]: 1.867e-05 [flash_sp]: 7.78999e-06 [merge_comm]: 3.33e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.62998e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.31999e-06 [virtual_dataset]: 6.01998e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.31998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.081e-05 [merge_recompute_call_nodes]: 1.69998e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63999e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.063e-05 [a_after_grad]: 9.29e-06 [renormalize]: 0.00034633 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.384e-05 [cse]: 2.718e-05 [a_3]: 4.057e-05 [Cycle 2]: 0.00059731, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012567 [with_stream_mark]: 1.098e-05 [recompute_prepare]: 5.78997e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.842e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.39998e-06 [merge_send_recv]: 4.08999e-06 [auto_parallel]: 5.24e-06 [parallel]: 4e-06 [flash_sp]: 3.02002e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.27001e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 5.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.99999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.93001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.92001e-06 [a_after_grad]: 8.63001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.293e-05 [a_3]: 3.262e-05 [py_interpret_to_execute_after_opt_a]: 7.46999e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.019e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.14998e-06 [mutable_eliminate]: 0.00045418 [opt_b]: 0.00018367, [1] [Cycle 1]: 
0.00017777, [7] [b_1]: 0.00010859 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.4002e-07 [cse]: 1.681e-05 [optimize_parallel_all_gather_comm]: 1.529e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.211e-05 [loop_unroll]: 0.00041939 [opt_after_cconv]: 9.518e-05, [1] [Cycle 1]: 8.958e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 4.88001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.668e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.254e-05 [tuple_transform]: 6.858e-05, [1] [Cycle 1]: 6.425e-05, [4] [d_1]: 3.857e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.48e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 6.919e-05 [cse_after_recomputation]: 2.268e-05, [1] [Cycle 1]: 1.824e-05, [1] [cse]: 1.3e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 4.95999e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61998e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.39e-06 [offloading_packed_experts]: 3.60998e-06 [overlap_recompute_and_grad_model_parallel]: 4.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.63998e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.987e-05, [1] [Cycle 1]: 6.578e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.49998e-06 [elim_not_effective]: 1.2e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.583e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00045604 [validate]: 3.146e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0602265 [execute]: 8.65001e-06 Sums bootstrap : 0.000468s : 0.68% type_inference : 0.004326s : 6.31% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.61% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000346s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.66% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000419s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000069s : 0.10% optimize.cse_after_recomputation.cse : 0.000013s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000456s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060227s : 87.80% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000119 26 17.61% : 0.000021s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000005s : 4: substitution.graph_param_transform 66.03% : 0.000079s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.84% : 0.000005s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004285 2 91.60% : 0.003926s : 1: type_inference.infer 8.40% : 0.000360s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.54% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.70% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.70% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.87% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 42.19% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.81% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080562 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.73% : 0.003005s : 1: add_attr 3.72% : 0.002996s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000074s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.63% : 0.000504s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000026s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.96% : 0.000773s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.33% : 0.001875s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000465s : 1: opt_after_jit_grad 0.23% : 0.000187s : 1: opt_b 4.63% : 0.003727s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 74.78% : 0.060243s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.39% : 0.004339s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.108637, [24] [bootstrap]: 0.00049837 [type_inference]: 0.0103703 [event_method]: 4.258e-05 [auto_monad]: 0.00011441 [graph_reusing]: 7.7e-06 [inline]: 1.72999e-06 [add_attr]: 0.00304557, [1] [add_attr_with_inline]: 0.00303749, [1] [Cycle 1]: 6.636e-05, [2] [tag_attr]: 3.105e-05 [meta_addattr_fg_expand]: 8.74003e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 4.618e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.68002e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0131845, [53] [py_interpret_to_execute]: 3.62e-05 [rewriter_before_opt_a]: 0.00012765 [opt_a]: 0.0109246, [3] [Cycle 1]: 0.0070023, [45] [expand_dump_flag]: 3.53e-06 [switch_simplify]: 6.701e-05 [loop_unroll]: 5.617e-05 [a_1]: 0.00139178 [with_stream_mark]: 2.263e-05 [recompute_prepare]: 2.19e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 8.15999e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 2.74001e-06 [a_2]: 0.000246 [accelerated_algorithm]: 3.072e-05 [shard]: 1.84e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.557e-05 [auto_parallel]: 1.092e-05 [parallel]: 1.716e-05 [flash_sp]: 1.075e-05 [merge_comm]: 9.56e-06 [allreduce_fusion]: 8.99e-06 [matmul_add_comm_reduction]: 2.532e-05 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 1.779e-05 [virtual_dataset]: 1.593e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.643e-05 [merge_forward]: 9.46998e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.733e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.048e-05 [merge_recompute_call_nodes]: 1.49998e-06 [before_grad]: 2.764e-05 [set_forward_comm_id_for_comm_node_pass]: 9.39e-06 [meta_fg_expand]: 0.00138984 [flash_sp_send_recv_attached]: 3.63e-06 [receive_attached]: 2.54001e-06 [after_resolve]: 5.968e-05 
[a_after_grad]: 8.244e-05 [renormalize]: 0.0024644 [add_forward_monad_depend]: 9.10999e-06 [auto_monad_grad]: 5.30001e-06 [auto_monad_eliminator]: 5.675e-05 [cse]: 0.000169 [a_3]: 0.00033512 [Cycle 2]: 0.00300638, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 4.7e-05 [loop_unroll]: 4.411e-05 [a_1]: 0.00153799 [with_stream_mark]: 1.176e-05 [recompute_prepare]: 1.055e-05 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012587 [accelerated_algorithm]: 1.22e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.81998e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.83998e-06 [auto_parallel]: 7.28e-06 [parallel]: 4.62998e-06 [flash_sp]: 3.18998e-06 [merge_comm]: 5.72001e-06 [allreduce_fusion]: 4.75999e-06 [matmul_add_comm_reduction]: 7.87003e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 8.89998e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 8.58001e-06 [merge_forward]: 4.29997e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 8.73001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.654e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.419e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 3.485e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.548e-05 [a_after_grad]: 1.463e-05 [renormalize]: 0.00062675 [add_forward_monad_depend]: 3.92998e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.513e-05 [cse]: 4.748e-05 [a_3]: 6.488e-05 [Cycle 3]: 0.00090123, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 1.079e-05 [loop_unroll]: 9.01002e-06 [a_1]: 0.00025008 [with_stream_mark]: 1.016e-05 [recompute_prepare]: 9.44998e-06 [updatestate_depend_eliminate]: 4.66002e-06 [updatestate_assign_eliminate]: 3.80998e-06 [updatestate_loads_eliminate]: 3.85e-06 
[parameter_eliminate]: 9.50007e-07 [a_2]: 0.00012324 [accelerated_algorithm]: 1.166e-05 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.75998e-06 [auto_parallel]: 7.16001e-06 [parallel]: 4.72e-06 [flash_sp]: 9.79984e-07 [merge_comm]: 5.10001e-06 [allreduce_fusion]: 5.02e-06 [matmul_add_comm_reduction]: 7.73999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.025e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.22998e-06 [merge_forward]: 4.38001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.52998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.615e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.413e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42001e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.313e-05 [a_after_grad]: 1.424e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 1.123e-05 [cse]: 2.66e-05 [a_3]: 5.827e-05 [py_interpret_to_execute_after_opt_a]: 1.038e-05 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 4.696e-05 [convert_after_rewriter]: 9.37001e-06 [order_py_execute_after_rewriter]: 7.12002e-06 [mutable_eliminate]: 0.00046559 [opt_b]: 0.00028912, [1] [Cycle 1]: 0.00028314, [7] [b_1]: 0.00019011 [b_2]: 1.08e-05 [updatestate_depend_eliminate]: 7.04001e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.95998e-06 [renormalize]: 2.09984e-07 [cse]: 3.169e-05 [optimize_parallel_all_gather_comm]: 2.086e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 1.936e-05 [loop_unroll]: 0.00043253 [opt_after_cconv]: 0.00013578, [1] [Cycle 1]: 0.00012951, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 6.94001e-06 [updatestate_assign_eliminate]: 4.17998e-06 
[updatestate_loads_eliminate]: 3.98001e-06 [cse]: 2.967e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 2.865e-05 [tuple_transform]: 0.00010269, [1] [Cycle 1]: 9.789e-05, [4] [d_1]: 6.764e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.92999e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 5.732e-05 [cse_after_recomputation]: 3.27e-05, [1] [Cycle 1]: 2.786e-05, [1] [cse]: 2.232e-05 [environ_conv]: 9.20999e-06 [swap_dp_allreduce_reducescatter]: 7.7e-06 [bias_add_comm_swap]: 2.30002e-06 [label_micro_interleaved_index]: 4.73001e-06 [label_fine_grained_interleaved_index]: 2.40002e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.30002e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.698e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 5.08002e-06 [overlap_recompute_and_grad_model_parallel]: 5.82001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 5.47001e-06 [overlap_grad_flash_sp]: 2.388e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.21002e-06 [symbol_engine_optimizer]: 9.765e-05, [1] [Cycle 1]: 9.355e-05, [6] [build]: 9.66e-06 [elim_shapecalc]: 1.34e-05 [elim_not_effective]: 1.795e-05 [opt_reshape]: 9.91e-06 [fold_const_symbol]: 1.495e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 2.387e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00049159 [validate]: 4.586e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0805268 [execute]: 8.18001e-06 Sums bootstrap : 0.000498s : 0.48% type_inference : 0.010370s : 9.94% event_method : 0.000043s : 0.04% auto_monad : 0.000114s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000128s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000109s : 0.10% optimize.opt_a.a_1 : 0.003180s : 3.05% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001428s : 1.37% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.11% optimize.opt_a.renormalize : 0.003091s : 2.96% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000243s : 0.23% optimize.opt_a.a_3 : 0.000458s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000466s : 0.45% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000433s : 0.41% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000492s : 0.47% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080527s : 77.18% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000733 218 5.75% : 0.000042s : 11: substitution.arithmetic_simplify 1.80% : 0.000013s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.33% : 0.000406s : 16: substitution.inline 2.21% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000015s : 3: substitution.less_batch_normalization 1.79% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.20% : 0.000023s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.73% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.56% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010300 2 87.13% : 0.008975s : 1: type_inference.infer 12.87% : 0.001325s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.79% : 0.000119s : 16: replace.inline 41.21% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000428 30 92.73% : 0.000397s : 16: match.inline 7.27% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000741 5663 1.13% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.65% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.63% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001531 32 57.42% : 0.000879s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.58% : 0.000652s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.133117 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.29% : 0.003050s : 1: add_attr 2.28% : 0.003041s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000121s : 1: auto_monad 0.02% : 0.000027s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000533s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000049s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.33% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000475s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.63% : 0.004834s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.21% : 0.010928s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.38% : 0.000501s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 9.91% : 0.013188s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.23% : 0.001634s : 2: renormalize.infer 1.08% : 0.001444s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000100s : 1: symbol_engine_optimizer 60.51% : 0.080545s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 7.80% : 0.010385s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x6-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x7-pynative],max_mem:42.0M TotalTime = 0.0240913, [24] [bootstrap]: 0.00057803 [type_inference]: 0.00641481 [event_method]: 1.487e-05 [auto_monad]: 5.658e-05 [graph_reusing]: 5.39998e-06 [inline]: 1.94e-06 [add_attr]: 0.00339973, [1] [add_attr_with_inline]: 0.00338961, [1] [Cycle 1]: 4.518e-05, [2] [tag_attr]: 1.558e-05 [meta_addattr_fg_expand]: 4.17e-06 [parallel-infer-symbol]: 2.96999e-06 [pre_auto_parallel]: 2.834e-05 [insert-virtual-dataset]: 2.84999e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00398182, [53] [py_interpret_to_execute]: 2e-05 [rewriter_before_opt_a]: 5.772e-05 [opt_a]: 0.00212119, [2] [Cycle 1]: 0.00152075, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 3.192e-05 [loop_unroll]: 2.117e-05 [a_1]: 0.00045384 [with_stream_mark]: 1.332e-05 [recompute_prepare]: 7.85e-06 [updatestate_depend_eliminate]: 4.35e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.553e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 7.46001e-06 [auto_parallel]: 5.99999e-06 [parallel]: 2.283e-05 [flash_sp]: 7.56001e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 9.11002e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.44002e-06 [virtual_dataset]: 5.89999e-06 
[get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 4.95999e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.56998e-06 [before_grad]: 9.38997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 2.98e-06 [receive_attached]: 2.63e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 8.55999e-06 [renormalize]: 0.00041894 [add_forward_monad_depend]: 4.41002e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.388e-05 [cse]: 2.83e-05 [a_3]: 4.167e-05 [Cycle 2]: 0.00059098, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.66e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.0001279 [with_stream_mark]: 9.89999e-06 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.765e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.23001e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.28001e-06 [flash_sp]: 2.89999e-06 [merge_comm]: 2.77002e-06 [allreduce_fusion]: 2.49001e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.83001e-06 [merge_forward]: 2.53003e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.24e-06 [a_after_grad]: 7.92e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.615e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.83001e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 2.874e-05 [convert_after_rewriter]: 7.02002e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00045077 [opt_b]: 0.0001818, [1] [Cycle 1]: 0.00017571, [7] [b_1]: 0.00010812 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 8.49977e-07 [cse]: 1.685e-05 [optimize_parallel_all_gather_comm]: 1.59e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.243e-05 [loop_unroll]: 0.00043544 [opt_after_cconv]: 9.564e-05, [1] [Cycle 1]: 8.987e-05, [7] [c_1]: 2.792e-05 [parameter_eliminate]: 1.98002e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.31998e-06 [cse]: 1.673e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.284e-05 [tuple_transform]: 6.919e-05, [1] [Cycle 1]: 6.443e-05, [4] [d_1]: 3.885e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.006e-05 [cse_after_recomputation]: 2.064e-05, [1] [Cycle 1]: 1.643e-05, [1] [cse]: 1.153e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.32999e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.02002e-06 [label_fine_grained_interleaved_index]: 2.48002e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 
1.15001e-06 [interleave_split_concat_branches]: 1.39e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.38e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.13998e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.657e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 6.785e-05, [1] [Cycle 1]: 6.356e-05, [6] [build]: 2.12001e-06 [elim_shapecalc]: 8.17e-06 [elim_not_effective]: 1.133e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 0.00012835 [opt_after_jit_grad]: 0.00045775 [validate]: 3.125e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00874591 [execute]: 7e-06 Sums bootstrap : 0.000578s : 2.93% type_inference : 0.006415s : 32.52% event_method : 0.000015s : 0.08% auto_monad : 0.000057s : 0.29% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.10% optimize.rewriter_before_opt_a : 0.000058s : 0.29% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.20% optimize.opt_a.loop_unroll : 0.000027s : 0.13% optimize.opt_a.a_1 : 0.000582s : 2.95% optimize.opt_a.with_stream_mark : 0.000023s : 0.12% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.73% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.06% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.05% optimize.opt_a.merge_comm : 0.000006s : 0.03% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.05% optimize.opt_a.virtual_output : 0.000010s : 0.05% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.03% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.10% optimize.opt_a.a_after_grad : 0.000016s : 0.08% optimize.opt_a.renormalize : 0.000419s : 2.12% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.10% optimize.opt_a.cse : 0.000044s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.37% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.15% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.29% optimize.opt_b.b_1 : 0.000108s : 0.55% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.11% optimize.loop_unroll : 0.000435s : 2.21% optimize.opt_after_cconv.c_1 : 0.000028s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.08% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.25% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000005s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000128s : 0.65% opt_after_jit_grad : 0.000458s : 2.32% validate : 0.000031s : 0.16% backend_pass : 0.000001s : 0.01% task_emit : 0.008746s : 44.34% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.60% : 0.000024s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 67.05% : 0.000110s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006367 2 90.76% : 0.005778s : 1: type_inference.infer 9.24% : 0.000588s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.17% : 0.000027s : 3: replace.inline 29.83% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.59% : 0.000108s : 3: match.inline 8.41% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.06% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.52% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000000s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 46.92% : 0.000180s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.08% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032990 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.32% : 0.003404s : 1: add_attr 10.29% : 0.003393s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000062s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000614s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000445s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.39% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.87% : 0.000945s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000090s : 28: opt.transform.opt_b 0.13% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.44% : 0.002124s : 1: opt_a 0.30% : 0.000099s : 1: opt_after_cconv 1.42% : 0.000467s : 1: opt_after_jit_grad 0.56% : 0.000185s : 1: opt_b 12.08% : 0.003986s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.07% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.64% : 0.000211s : 1: renormalize.infer 0.61% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.41% : 0.000134s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000033s : 1: rewriter_after_opt_a 0.19% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.21% : 0.000070s : 1: symbol_engine_optimizer 26.55% : 0.008758s : 1: task_emit 0.22% : 0.000072s : 1: 
tuple_transform 19.48% : 0.006428s : 1: type_inference 0.19% : 0.000062s : 1: validate TotalTime = 0.0190581, [24] [bootstrap]: 0.00043841 [type_inference]: 0.00494966 [event_method]: 1.061e-05 [auto_monad]: 5.02e-05 [graph_reusing]: 5.01002e-06 [inline]: 1.89e-06 [add_attr]: 0.00308421, [1] [add_attr_with_inline]: 0.00307584, [1] [Cycle 1]: 4.215e-05, [2] [tag_attr]: 1.22e-05 [meta_addattr_fg_expand]: 3.37002e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 2.143e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.00373487, [53] [py_interpret_to_execute]: 1.527e-05 [rewriter_before_opt_a]: 3.951e-05 [opt_a]: 0.0019242, [2] [Cycle 1]: 0.00127837, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 2.394e-05 [loop_unroll]: 1.392e-05 [a_1]: 0.00029178 [with_stream_mark]: 1.371e-05 [recompute_prepare]: 7.35e-06 [updatestate_depend_eliminate]: 3.74002e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 3.48999e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 7.37997e-06 [auto_parallel]: 6.26998e-06 [parallel]: 1.717e-05 [flash_sp]: 7.24001e-06 [merge_comm]: 3.35e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.39998e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.98999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.131e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.61e-06 [receive_attached]: 2.21998e-06 
[after_resolve]: 1.091e-05 [a_after_grad]: 9.24e-06 [renormalize]: 0.0003672 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.289e-05 [cse]: 2.639e-05 [a_3]: 3.98e-05 [Cycle 2]: 0.00063619, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.54998e-06 [a_1]: 0.00012393 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.953e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.34997e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.50001e-06 [flash_sp]: 3.20002e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.01e-06 [virtual_dataset]: 5.08002e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 4.89998e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.55001e-06 [a_after_grad]: 7.92e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14003e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.22001e-06 [cse]: 1.35e-05 [a_3]: 3.359e-05 [py_interpret_to_execute_after_opt_a]: 7.53e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 3.1e-05 [convert_after_rewriter]: 6.86001e-06 [order_py_execute_after_rewriter]: 5.44998e-06 [mutable_eliminate]: 0.0004504 [opt_b]: 0.00018021, [1] [Cycle 1]: 0.00017413, [7] [b_1]: 0.00010621 
[b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.19997e-07 [cse]: 1.665e-05 [optimize_parallel_all_gather_comm]: 1.521e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.182e-05 [loop_unroll]: 0.00041402 [opt_after_cconv]: 9.51e-05, [1] [Cycle 1]: 8.921e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.08998e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.624e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.184e-05 [tuple_transform]: 6.942e-05, [1] [Cycle 1]: 6.5e-05, [4] [d_1]: 3.919e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 2.02001e-06 [add_recomputation]: 4.522e-05 [cse_after_recomputation]: 2.1e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.53001e-06 [swap_dp_allreduce_reducescatter]: 4.78001e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.05002e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.63998e-06 [comm_op_add_attrs]: 9.90025e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.32e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.81999e-06 [overlap_recompute_and_grad_model_parallel]: 4.32998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 3.88999e-06 [overlap_grad_flash_sp]: 1.695e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.13998e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.842e-05, [1] [Cycle 1]: 6.448e-05, [6] [build]: 2.13998e-06 [elim_shapecalc]: 8.07003e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.52001e-06 [fold_const_symbol]: 8.84e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.544e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00045207 [validate]: 3.189e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00603801 [execute]: 6.73e-06 Sums bootstrap : 0.000438s : 2.93% type_inference : 0.004950s : 33.07% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000416s : 2.78% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000367s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.01% optimize.opt_b.b_1 : 0.000106s : 0.71% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000414s : 2.77% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 3.02% validate : 0.000032s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006038s : 40.35% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000120 26 18.13% : 0.000022s : 4: substitution.arithmetic_simplify 1.68% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.56% : 0.000005s : 4: substitution.graph_param_transform 65.27% : 0.000078s : 2: substitution.inline 2.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004903 2 92.75% : 0.004547s : 1: type_inference.infer 7.25% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.68% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.10% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.94% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.58% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.18% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 41.49% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.51% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027169 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.37% : 0.003089s : 1: add_attr 11.34% : 0.003080s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.75% : 0.000474s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.56% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.83% : 0.000768s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001927s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.70% : 0.000462s : 1: opt_after_jit_grad 0.68% : 0.000184s : 1: opt_b 13.76% : 0.003739s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.76% : 0.000206s : 1: renormalize.infer 0.57% : 0.000154s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 22.26% : 0.006048s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 18.28% : 0.004966s : 1: type_inference 0.22% : 0.000059s : 1: validate TotalTime = 0.0196116, [24] [bootstrap]: 0.00046 [type_inference]: 0.00556122 [event_method]: 1.432e-05 [auto_monad]: 5.44e-05 [graph_reusing]: 6.52001e-06 [inline]: 2.22999e-06 [add_attr]: 0.00294411, [1] [add_attr_with_inline]: 0.00293603, [1] [Cycle 1]: 4.523e-05, [2] [tag_attr]: 1.483e-05 [meta_addattr_fg_expand]: 4.47998e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.545e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.18998e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.00391754, [53] [py_interpret_to_execute]: 2.044e-05 [rewriter_before_opt_a]: 5.755e-05 [opt_a]: 0.00208791, [2] [Cycle 1]: 0.00149568, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.224e-05 [loop_unroll]: 2.09e-05 [a_1]: 0.00044236 [with_stream_mark]: 1.386e-05 [recompute_prepare]: 7.36999e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.79998e-06 [a_2]: 7.562e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.28999e-06 [auto_parallel]: 5.90002e-06 [parallel]: 1.724e-05 [flash_sp]: 7.25998e-06 [merge_comm]: 3.41001e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.50001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.47998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63999e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 
2.66999e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.43999e-06 [renormalize]: 0.00040277 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.92001e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.666e-05 [a_3]: 4.052e-05 [Cycle 2]: 0.00058245, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.32999e-06 [a_1]: 0.00012446 [with_stream_mark]: 9.14e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.74e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.51998e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.04e-06 [parallel]: 4.08999e-06 [flash_sp]: 3.04001e-06 [merge_comm]: 2.84999e-06 [allreduce_fusion]: 2.56998e-06 [matmul_add_comm_reduction]: 5.37999e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 6.20997e-06 [virtual_dataset]: 5.02999e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.90002e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 8.84e-06 [a_after_grad]: 7.97e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.241e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.028e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00045398 [opt_b]: 0.00018249, [1] [Cycle 1]: 0.0001766, [7] [b_1]: 
0.00010935 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 5.50004e-07 [cse]: 1.591e-05 [optimize_parallel_all_gather_comm]: 1.524e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.175e-05 [loop_unroll]: 0.00041394 [opt_after_cconv]: 9.397e-05, [1] [Cycle 1]: 8.825e-05, [7] [c_1]: 2.725e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.621e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.274e-05 [tuple_transform]: 6.964e-05, [1] [Cycle 1]: 6.543e-05, [4] [d_1]: 3.927e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.178e-05 [cse_after_recomputation]: 1.97e-05, [1] [Cycle 1]: 1.549e-05, [1] [cse]: 1.029e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.38998e-06 [label_micro_interleaved_index]: 4.02e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.15001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.203e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.60998e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.95998e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 2.20002e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.742e-05, [1] [Cycle 1]: 6.325e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.48999e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 5.86998e-06 [fold_const_symbol]: 8.37998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.595e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.0004513 [validate]: 3.089e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00591199 [execute]: 6.36998e-06 Sums bootstrap : 0.000460s : 2.93% type_inference : 0.005561s : 35.41% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.35% graph_reusing : 0.000007s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000567s : 3.61% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000014s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000403s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000039s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000454s : 2.89% optimize.opt_b.b_1 : 0.000109s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000414s : 2.64% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000451s : 2.87% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005912s : 37.65% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000162 30 14.50% : 0.000023s : 5: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.50% : 0.000006s : 4: substitution.graph_param_transform 67.12% : 0.000109s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.42% : 0.000004s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.53% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005519 2 89.30% : 0.004929s : 1: type_inference.infer 10.70% : 0.000591s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.16% : 0.000027s : 3: replace.inline 28.84% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.75% : 0.000107s : 3: match.inline 8.25% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.95% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.53% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.30% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000336 8 46.52% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.48% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027961 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.54% : 0.002948s : 1: add_attr 10.51% : 0.002940s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000495s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.66% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.32% : 0.000928s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.48% : 0.002091s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.65% : 0.000461s : 1: opt_after_jit_grad 0.67% : 0.000186s : 1: opt_b 14.02% : 0.003921s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000029s : 1: pre_auto_parallel 0.09% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000206s : 1: renormalize.infer 0.68% : 0.000191s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.18% : 0.005921s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 19.94% : 0.005575s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0371512, [24] [bootstrap]: 0.00050661 [type_inference]: 0.0112939 [event_method]: 4.719e-05 [auto_monad]: 0.00011932 [graph_reusing]: 8.63001e-06 [inline]: 1.87999e-06 [add_attr]: 0.00299456, [1] [add_attr_with_inline]: 0.00298608, [1] [Cycle 1]: 7.106e-05, [2] [tag_attr]: 3.481e-05 [meta_addattr_fg_expand]: 9.69e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 5.019e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0132338, [53] [py_interpret_to_execute]: 3.795e-05 [rewriter_before_opt_a]: 0.00014592 [opt_a]: 0.0109834, [3] [Cycle 1]: 0.00701766, [45] [expand_dump_flag]: 4.27e-06 [switch_simplify]: 7.362e-05 [loop_unroll]: 6.103e-05 [a_1]: 0.00147341 [with_stream_mark]: 2.254e-05 [recompute_prepare]: 2.121e-05 [updatestate_depend_eliminate]: 9.20999e-06 [updatestate_assign_eliminate]: 7.77998e-06 [updatestate_loads_eliminate]: 7.06999e-06 [parameter_eliminate]: 2.41998e-06 [a_2]: 0.0002421 [accelerated_algorithm]: 3.092e-05 [shard]: 2.09e-06 [meta_shard_fg_expand]: 3.45998e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.57e-05 [auto_parallel]: 1.095e-05 [parallel]: 1.746e-05 [flash_sp]: 1.189e-05 [merge_comm]: 9.55001e-06 [allreduce_fusion]: 8.68001e-06 [matmul_add_comm_reduction]: 2.614e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.811e-05 [virtual_dataset]: 1.58e-05 [get_grad_eliminate_]: 1.523e-05 [virtual_output]: 1.504e-05 [merge_forward]: 9.34998e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.794e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.867e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 2.749e-05 [set_forward_comm_id_for_comm_node_pass]: 9.64999e-06 [meta_fg_expand]: 0.00138347 [flash_sp_send_recv_attached]: 3.58e-06 [receive_attached]: 2.88e-06 
[after_resolve]: 5.95e-05 [a_after_grad]: 8.162e-05 [renormalize]: 0.00240532 [add_forward_monad_depend]: 8.93002e-06 [auto_monad_grad]: 5.44e-06 [auto_monad_eliminator]: 5.582e-05 [cse]: 0.00016304 [a_3]: 0.00033525 [Cycle 2]: 0.00304584, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.769e-05 [loop_unroll]: 4.365e-05 [a_1]: 0.00152986 [with_stream_mark]: 1.225e-05 [recompute_prepare]: 1.118e-05 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 4.36002e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012588 [accelerated_algorithm]: 1.168e-05 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 6.59001e-06 [auto_parallel]: 7.28999e-06 [parallel]: 5.05999e-06 [flash_sp]: 3.66001e-06 [merge_comm]: 5.64998e-06 [allreduce_fusion]: 4.75001e-06 [matmul_add_comm_reduction]: 7.61001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 8.97e-06 [get_grad_eliminate_]: 8.54998e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 8.39995e-07 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.639e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 5.792e-05 [set_forward_comm_id_for_comm_node_pass]: 5.57999e-06 [meta_fg_expand]: 7.065e-05 [flash_sp_send_recv_attached]: 1.14e-06 [receive_attached]: 1.08001e-06 [after_resolve]: 1.635e-05 [a_after_grad]: 1.473e-05 [renormalize]: 0.0005865 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.34998e-06 [auto_monad_eliminator]: 1.441e-05 [cse]: 4.699e-05 [a_3]: 6.486e-05 [Cycle 3]: 0.00090586, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 1.063e-05 [loop_unroll]: 8.77e-06 [a_1]: 0.00024969 [with_stream_mark]: 9.49999e-06 [recompute_prepare]: 9.10001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 
4.03001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012394 [accelerated_algorithm]: 1.198e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.04998e-06 [merge_send_recv]: 7.19001e-06 [auto_parallel]: 7.30998e-06 [parallel]: 4.96002e-06 [flash_sp]: 1.19e-06 [merge_comm]: 4.94003e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 7.38999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 9.97001e-06 [virtual_dataset]: 8.79998e-06 [get_grad_eliminate_]: 8.55001e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.33001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.47e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.698e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.46e-05 [set_forward_comm_id_for_comm_node_pass]: 5.77999e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.373e-05 [a_after_grad]: 1.435e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.29998e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 1.039e-05 [cse]: 2.638e-05 [a_3]: 5.977e-05 [py_interpret_to_execute_after_opt_a]: 1.023e-05 [slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 4.64e-05 [convert_after_rewriter]: 8.98002e-06 [order_py_execute_after_rewriter]: 6.58e-06 [mutable_eliminate]: 0.00045715 [opt_b]: 0.00028638, [1] [Cycle 1]: 0.00028027, [7] [b_1]: 0.00018922 [b_2]: 1.057e-05 [updatestate_depend_eliminate]: 7.19001e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 3.83001e-06 [renormalize]: 3.09985e-07 [cse]: 3.126e-05 [optimize_parallel_all_gather_comm]: 2.043e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 1.951e-05 [loop_unroll]: 0.00042476 [opt_after_cconv]: 0.00013498, [1] [Cycle 1]: 0.00012906, [7] [c_1]: 4.877e-05 [parameter_eliminate]: 2.16998e-06 [updatestate_depend_eliminate]: 7.36001e-06 [updatestate_assign_eliminate]: 
4.18999e-06 [updatestate_loads_eliminate]: 3.80998e-06 [cse]: 2.923e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 2.78e-05 [tuple_transform]: 0.0001007, [1] [Cycle 1]: 9.597e-05, [4] [d_1]: 6.544e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 9.94999e-06 [partial_unused_args_eliminate]: 1.64998e-06 [add_recomputation]: 5.664e-05 [cse_after_recomputation]: 3.175e-05, [1] [Cycle 1]: 2.688e-05, [1] [cse]: 2.122e-05 [environ_conv]: 8.23999e-06 [swap_dp_allreduce_reducescatter]: 7.66999e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.17998e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.699e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.39e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 4.99e-06 [overlap_grad_flash_sp]: 2.323e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.93997e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 9.798e-05, [1] [Cycle 1]: 9.36e-05, [6] [build]: 8.99998e-06 [elim_shapecalc]: 1.393e-05 [elim_not_effective]: 1.807e-05 [opt_reshape]: 1.014e-05 [fold_const_symbol]: 1.449e-05 
[renormalize]: 2.09984e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.487e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00048277 [validate]: 4.299e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00811313 [execute]: 7.31001e-06 Sums bootstrap : 0.000507s : 1.54% type_inference : 0.011294s : 34.32% event_method : 0.000047s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.12% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.40% optimize.opt_a.loop_unroll : 0.000113s : 0.34% optimize.opt_a.a_1 : 0.003253s : 9.89% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000017s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000100s : 0.30% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001457s : 4.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000111s : 0.34% optimize.opt_a.renormalize : 0.002992s : 9.09% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000236s : 0.72% optimize.opt_a.a_3 : 0.000460s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000457s : 1.39% optimize.opt_b.b_1 : 0.000189s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000425s : 1.29% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000065s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000483s : 1.47% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008113s : 24.66% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000759 222 5.90% : 0.000045s : 12: substitution.arithmetic_simplify 1.80% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000007s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.63% : 0.000423s : 17: substitution.inline 2.00% : 0.000015s : 2: substitution.inline_without_move 1.42% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.84% : 0.000014s : 20: substitution.remove_not_recompute_node 3.14% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.82% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.71% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011221 2 87.07% : 0.009771s : 1: type_inference.infer 12.93% : 0.001450s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.61% : 0.000127s : 17: replace.inline 42.39% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.40% : 0.000414s : 17: match.inline 7.60% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.22% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000037s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001539 34 57.23% : 0.000881s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.77% : 0.000658s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061660 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.86% : 0.002999s : 1: add_attr 4.85% : 0.002990s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000540s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.04% : 0.004960s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.82% : 0.010986s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.80% : 0.000492s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.47% : 0.013238s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.63% : 0.001619s : 2: renormalize.infer 2.21% : 0.001361s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.17% : 0.008124s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.34% : 0.011309s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0184329, [24] [bootstrap]: 0.00045494 [type_inference]: 0.00430167 [event_method]: 1.075e-05 [auto_monad]: 5.21e-05 [graph_reusing]: 5.24998e-06 [inline]: 2.06e-06 [add_attr]: 0.00302261, [1] [add_attr_with_inline]: 0.00301411, [1] [Cycle 1]: 4.694e-05, [2] [tag_attr]: 1.168e-05 [meta_addattr_fg_expand]: 3.4e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.089e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00368335, [53] [py_interpret_to_execute]: 1.576e-05 [rewriter_before_opt_a]: 3.837e-05 [opt_a]: 0.00185248, [2] [Cycle 1]: 0.00125086, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 2.419e-05 [loop_unroll]: 1.372e-05 [a_1]: 0.00029216 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.81001e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.782e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.08998e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 5.79e-06 [parallel]: 1.854e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.86999e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.67001e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.30001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 9.83002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.02001e-06 
[after_resolve]: 1.07e-05 [a_after_grad]: 8.95999e-06 [renormalize]: 0.00033694 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.301e-05 [cse]: 2.773e-05 [a_3]: 4.028e-05 [Cycle 2]: 0.00059276, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012459 [with_stream_mark]: 1.113e-05 [recompute_prepare]: 5.94999e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 7.99977e-07 [a_2]: 6.816e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.48002e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.14e-06 [parallel]: 4e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.42999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.34002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.04998e-06 [a_after_grad]: 8.22e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.09001e-06 [cse]: 1.259e-05 [a_3]: 3.227e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 3.043e-05 [convert_after_rewriter]: 6.95998e-06 [order_py_execute_after_rewriter]: 4.90001e-06 [mutable_eliminate]: 0.00047778 [opt_b]: 0.00018242, [1] [Cycle 1]: 0.00017658, [7] [b_1]: 
0.00010902 [b_2]: 7.53e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.48e-06 [renormalize]: 3.50003e-07 [cse]: 1.606e-05 [optimize_parallel_all_gather_comm]: 1.591e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.174e-05 [loop_unroll]: 0.00041585 [opt_after_cconv]: 9.36e-05, [1] [Cycle 1]: 8.778e-05, [7] [c_1]: 2.749e-05 [parameter_eliminate]: 2.18002e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.567e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.222e-05 [tuple_transform]: 6.86e-05, [1] [Cycle 1]: 6.422e-05, [4] [d_1]: 3.88e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 4.359e-05 [cse_after_recomputation]: 1.993e-05, [1] [Cycle 1]: 1.556e-05, [1] [cse]: 1.028e-05 [environ_conv]: 4.47e-06 [swap_dp_allreduce_reducescatter]: 4.90001e-06 [bias_add_comm_swap]: 2.26998e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 3.06001e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.45997e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.113e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.31002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 3.72998e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 7.7e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 6.722e-05, [1] [Cycle 1]: 6.316e-05, [6] [build]: 2.05002e-06 [elim_shapecalc]: 8.43999e-06 [elim_not_effective]: 1.111e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00045241 [validate]: 3.216e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00616419 [execute]: 6.56e-06 Sums bootstrap : 0.000455s : 3.15% type_inference : 0.004302s : 29.75% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.88% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.33% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000478s : 3.30% optimize.opt_b.b_1 : 0.000109s : 0.75% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.88% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000452s : 3.13% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006164s : 42.63% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 17.64% : 0.000021s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.25% : 0.000001s : 2: substitution.fold_const_symbol 4.41% : 0.000005s : 4: substitution.graph_param_transform 65.94% : 0.000079s : 2: substitution.inline 2.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.45% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004261 2 92.01% : 0.003920s : 1: type_inference.infer 7.99% : 0.000340s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.73% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.73% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.76% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.82% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.71% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.66% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000231 6 42.41% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.59% : 0.000133s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026404 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.46% : 0.003027s : 1: add_attr 11.43% : 0.003018s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000490s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000425s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.85% : 0.000487s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.03% : 0.001855s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.75% : 0.000462s : 1: opt_after_jit_grad 0.70% : 0.000186s : 1: opt_b 13.96% : 0.003687s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.38% : 0.006174s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.34% : 0.004315s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0355733, [24] [bootstrap]: 0.00049586 [type_inference]: 0.0101999 [event_method]: 4.14e-05 [auto_monad]: 0.00011469 [graph_reusing]: 7.96001e-06 [inline]: 2.13002e-06 [add_attr]: 0.00298022, [1] [add_attr_with_inline]: 0.00297203, [1] [Cycle 1]: 6.737e-05, [2] [tag_attr]: 3.211e-05 [meta_addattr_fg_expand]: 9.13002e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 4.548e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 9.40025e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.0128829, [53] [py_interpret_to_execute]: 3.375e-05 [rewriter_before_opt_a]: 0.00012742 [opt_a]: 0.0106505, [3] [Cycle 1]: 0.00682569, [45] [expand_dump_flag]: 3.55998e-06 [switch_simplify]: 6.661e-05 [loop_unroll]: 5.421e-05 [a_1]: 0.00132282 [with_stream_mark]: 2.22e-05 [recompute_prepare]: 2.121e-05 [updatestate_depend_eliminate]: 8.88002e-06 [updatestate_assign_eliminate]: 7.55e-06 [updatestate_loads_eliminate]: 7.23e-06 [parameter_eliminate]: 2.54999e-06 [a_2]: 0.00024312 [accelerated_algorithm]: 3.079e-05 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.583e-05 [merge_send_recv]: 1.555e-05 [auto_parallel]: 1.109e-05 [parallel]: 1.738e-05 [flash_sp]: 1.095e-05 [merge_comm]: 9.87999e-06 [allreduce_fusion]: 8.78001e-06 [matmul_add_comm_reduction]: 2.55e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.832e-05 [virtual_dataset]: 1.554e-05 [get_grad_eliminate_]: 1.535e-05 [virtual_output]: 1.583e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.801e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.861e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 2.724e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46003e-06 [meta_fg_expand]: 0.00139912 [flash_sp_send_recv_attached]: 4.05998e-06 [receive_attached]: 2.71e-06 
[after_resolve]: 5.967e-05 [a_after_grad]: 8.069e-05 [renormalize]: 0.00234077 [add_forward_monad_depend]: 9.04e-06 [auto_monad_grad]: 5.07999e-06 [auto_monad_eliminator]: 5.442e-05 [cse]: 0.0001606 [a_3]: 0.00033259 [Cycle 2]: 0.00292249, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.728e-05 [loop_unroll]: 4.363e-05 [a_1]: 0.00151669 [with_stream_mark]: 1.183e-05 [recompute_prepare]: 1.047e-05 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 4.29002e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 0.00012622 [accelerated_algorithm]: 1.195e-05 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.81998e-06 [shard_inline]: 9.19e-06 [merge_send_recv]: 6.51e-06 [auto_parallel]: 7.26999e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.59002e-06 [matmul_add_comm_reduction]: 7.4e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 1.005e-05 [virtual_dataset]: 8.72998e-06 [get_grad_eliminate_]: 8.89e-06 [virtual_output]: 8.25999e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 8.81002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.615e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 1.497e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49e-06 [meta_fg_expand]: 3.304e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.01997e-06 [after_resolve]: 1.511e-05 [a_after_grad]: 1.425e-05 [renormalize]: 0.00057333 [add_forward_monad_depend]: 3.91001e-06 [auto_monad_grad]: 1.09998e-06 [auto_monad_eliminator]: 1.443e-05 [cse]: 4.519e-05 [a_3]: 6.437e-05 [Cycle 3]: 0.00088879, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.059e-05 [loop_unroll]: 8.90001e-06 [a_1]: 0.00024697 [with_stream_mark]: 9.69e-06 [recompute_prepare]: 9.36e-06 [updatestate_depend_eliminate]: 4.79998e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 3.76001e-06 
[parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012268 [accelerated_algorithm]: 1.167e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 6.91001e-06 [auto_parallel]: 6.78e-06 [parallel]: 4.33001e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.87e-06 [allreduce_fusion]: 4.92e-06 [matmul_add_comm_reduction]: 7.43e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.65999e-06 [get_grad_eliminate_]: 8.31002e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 8.38001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.568e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.38e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 3.06999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.22e-06 [after_resolve]: 1.303e-05 [a_after_grad]: 1.379e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 1.102e-05 [cse]: 2.518e-05 [a_3]: 5.76e-05 [py_interpret_to_execute_after_opt_a]: 1.081e-05 [slice_cell_reuse_recomputed_activation]: 1.81003e-06 [rewriter_after_opt_a]: 4.65e-05 [convert_after_rewriter]: 9.09e-06 [order_py_execute_after_rewriter]: 6.47001e-06 [mutable_eliminate]: 0.00047368 [opt_b]: 0.00028383, [1] [Cycle 1]: 0.00027757, [7] [b_1]: 0.00018726 [b_2]: 1.048e-05 [updatestate_depend_eliminate]: 7.28e-06 [updatestate_assign_eliminate]: 3.99002e-06 [updatestate_loads_eliminate]: 3.81999e-06 [renormalize]: 3.00002e-07 [cse]: 3.018e-05 [optimize_parallel_all_gather_comm]: 2.054e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 1.997e-05 [loop_unroll]: 0.00041864 [opt_after_cconv]: 0.00013424, [1] [Cycle 1]: 0.0001282, [7] [c_1]: 4.699e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 7.14001e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 
3.76999e-06 [cse]: 2.983e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 2.803e-05 [tuple_transform]: 9.968e-05, [1] [Cycle 1]: 9.506e-05, [4] [d_1]: 6.518e-05 [none_parameter_eliminate]: 1.71998e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 9.52999e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 5.622e-05 [cse_after_recomputation]: 3.129e-05, [1] [Cycle 1]: 2.66e-05, [1] [cse]: 2.122e-05 [environ_conv]: 8.3e-06 [swap_dp_allreduce_reducescatter]: 7.26001e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.79001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.08001e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66998e-06 [control_data_broadcast_order]: 1.687e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.12e-06 [overlap_recompute_and_grad_model_parallel]: 5.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.69999e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.431e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 9.65e-05, [1] [Cycle 1]: 9.237e-05, [6] [build]: 9.15001e-06 [elim_shapecalc]: 1.292e-05 [elim_not_effective]: 1.797e-05 [opt_reshape]: 9.73998e-06 [fold_const_symbol]: 1.5e-05 [renormalize]: 2.59985e-07 [detach_backward]: 
1.75001e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 2.549e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00045874 [validate]: 4.414e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00804316 [execute]: 7.11999e-06 Sums bootstrap : 0.000496s : 1.58% type_inference : 0.010200s : 32.56% event_method : 0.000041s : 0.13% auto_monad : 0.000115s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.41% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.40% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003086s : 9.85% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001435s : 4.58% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000109s : 0.35% optimize.opt_a.renormalize : 0.002914s : 9.30% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000231s : 0.74% optimize.opt_a.a_3 : 0.000455s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000474s : 1.51% optimize.opt_b.b_1 : 0.000187s : 0.60% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000419s : 1.34% optimize.opt_after_cconv.c_1 : 0.000047s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000065s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000459s : 1.46% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008043s : 25.68% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000724 218 5.84% : 0.000042s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 54.49% : 0.000395s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.41% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000005s : 5: substitution.partial_eliminate 1.86% : 0.000013s : 20: substitution.remove_not_recompute_node 3.26% : 0.000024s : 10: substitution.replace_applicator 1.48% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.76% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.51% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.69% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010133 2 87.52% : 0.008868s : 1: type_inference.infer 12.48% : 0.001265s : 1: type_inference.specialize ------[replace.] 0.000201 30 59.08% : 0.000119s : 16: replace.inline 40.92% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.53% : 0.000386s : 16: match.inline 7.47% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.14% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.16% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.56% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.65% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.20% : 0.000001s : 8: predicate.parallel_virtual_node 1.95% : 0.000014s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.68% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.96% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.40% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001452 32 57.57% : 0.000836s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.43% : 0.000616s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059395 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.03% : 0.002985s : 1: add_attr 5.01% : 0.002976s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000122s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.89% : 0.000530s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.81% : 0.000483s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.95% : 0.004725s : 117: opt.transform.opt_a 0.08% : 0.000046s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.94% : 0.010653s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.79% : 0.000468s : 1: opt_after_jit_grad 0.48% : 0.000287s : 1: opt_b 21.70% : 0.012887s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.63% : 0.001561s : 2: renormalize.infer 2.26% : 0.001340s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000099s : 1: symbol_engine_optimizer 13.56% : 0.008053s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.20% : 0.010215s : 1: type_inference 0.13% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x7-kbk],max_mem:42.0M . TotalTime = 0.841422, [24] [bootstrap]: 0.00059058 [type_inference]: 0.00609373 [event_method]: 1.423e-05 [auto_monad]: 5.377e-05 [graph_reusing]: 5.82001e-06 [inline]: 1.69e-06 [add_attr]: 0.00346644, [1] [add_attr_with_inline]: 0.00345462, [1] [Cycle 1]: 4.632e-05, [2] [tag_attr]: 1.513e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.805e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 9.50007e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.46998e-06 [optimize]: 0.00399479, [53] [py_interpret_to_execute]: 2.085e-05 [rewriter_before_opt_a]: 5.845e-05 [opt_a]: 0.00215194, [2] [Cycle 1]: 0.00150175, [45] [expand_dump_flag]: 2.98998e-06 [switch_simplify]: 3.228e-05 [loop_unroll]: 2.064e-05 [a_1]: 0.00045201 [with_stream_mark]: 1.359e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.98001e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.629e-05 [accelerated_algorithm]: 6.59999e-06 [shard]: 1.94999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 7.49002e-06 [auto_parallel]: 5.48002e-06 [parallel]: 2.338e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 8.15e-06 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 6.18002e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.83002e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 
[merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 9.44998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47002e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.91e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 8.77e-06 [renormalize]: 0.00040611 [add_forward_monad_depend]: 4.27998e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.585e-05 [a_3]: 4.11e-05 [Cycle 2]: 0.00064111, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.00018072 [with_stream_mark]: 9.34998e-06 [recompute_prepare]: 5.71003e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.774e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.33001e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.57e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 4.99998e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.02999e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 4.89003e-06 [merge_forward]: 2.36e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.26e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.01002e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.83e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.55001e-06 [a_after_grad]: 8.25e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.275e-05 [a_3]: 3.132e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 
[slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.228e-05 [convert_after_rewriter]: 6.37001e-06 [order_py_execute_after_rewriter]: 5.43002e-06 [mutable_eliminate]: 0.00044782 [opt_b]: 0.0001858, [1] [Cycle 1]: 0.00017954, [7] [b_1]: 0.00011305 [b_2]: 6.79999e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.50003e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.562e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.278e-05 [loop_unroll]: 0.00041791 [opt_after_cconv]: 9.543e-05, [1] [Cycle 1]: 8.967e-05, [7] [c_1]: 2.785e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.566e-05 [renormalize]: 1.80007e-07 [remove_dup_value]: 1.325e-05 [tuple_transform]: 6.861e-05, [1] [Cycle 1]: 6.427e-05, [4] [d_1]: 3.871e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.55999e-06 [add_recomputation]: 4.847e-05 [cse_after_recomputation]: 2.047e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.107e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.61999e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.01998e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.87999e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.67001e-06 [offloading_packed_experts]: 3.35e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.67002e-06 [overlap_grad_flash_sp]: 1.692e-05 [begin_end_overlap_inline]: 7.29982e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.54e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.698e-05, [1] [Cycle 1]: 6.294e-05, [6] [build]: 2.15002e-06 [elim_shapecalc]: 8.43001e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.74003e-06 [renormalize]: 1.70025e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.92001e-06 [auto_monad_reorder]: 1.562e-05 [get_jit_bprop_graph]: 9.30013e-07 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00045128 [validate]: 3.045e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.82643 [execute]: 9.30001e-06 Sums bootstrap : 0.000591s : 0.07% type_inference : 0.006094s : 0.73% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000633s : 0.08% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000406s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000448s : 0.05% optimize.opt_b.b_1 : 0.000113s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000418s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000451s : 0.05% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.826430s : 98.74% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000164 30 15.00% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.65% : 0.000109s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.27% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006046 2 90.92% : 0.005497s : 1: type_inference.infer 9.08% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.29% : 0.000026s : 3: replace.inline 30.71% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.55% : 0.000107s : 3: match.inline 8.45% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 1.00% : 0.000002s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.55% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000335 8 47.20% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.80% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.850437 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.41% : 0.003471s : 1: add_attr 0.41% : 0.003458s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000630s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000009s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000457s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000012s : 1: opt.transform.mutable_eliminate 0.12% : 0.000997s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000095s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002155s : 1: opt_a 0.01% : 0.000099s : 1: opt_after_cconv 0.05% : 0.000461s : 1: opt_after_jit_grad 0.02% : 0.000189s : 1: opt_b 0.47% : 0.003999s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.02% : 0.000209s : 1: renormalize.infer 0.02% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.18% : 0.826453s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.72% : 0.006108s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.0713657, [24] [bootstrap]: 0.00040789 [type_inference]: 0.00433711 [event_method]: 1.111e-05 [auto_monad]: 5.098e-05 [graph_reusing]: 5.14998e-06 [inline]: 1.74e-06 [add_attr]: 0.00295632, [1] [add_attr_with_inline]: 0.00294876, [1] [Cycle 1]: 4.324e-05, [2] [tag_attr]: 1.138e-05 [meta_addattr_fg_expand]: 3.60998e-06 [parallel-infer-symbol]: 3.03998e-06 [pre_auto_parallel]: 2.06e-05 [insert-virtual-dataset]: 2.21998e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.0036261, [53] [py_interpret_to_execute]: 1.48e-05 [rewriter_before_opt_a]: 3.983e-05 [opt_a]: 0.00184349, [2] [Cycle 1]: 0.00124746, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.408e-05 [loop_unroll]: 1.363e-05 [a_1]: 0.00029056 [with_stream_mark]: 1.299e-05 [recompute_prepare]: 7.28999e-06 [updatestate_depend_eliminate]: 3.85998e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 7.55e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.75e-06 [auto_parallel]: 5.63002e-06 [parallel]: 1.65e-05 [flash_sp]: 7.5e-06 [merge_comm]: 3.47002e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.00999e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.73e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.33002e-06 [virtual_output]: 5.46998e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 8.67998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.94001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 2.21e-06 
[after_resolve]: 1.063e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00034825 [add_forward_monad_depend]: 4.36002e-06 [auto_monad_grad]: 1.91e-06 [auto_monad_eliminator]: 1.268e-05 [cse]: 2.635e-05 [a_3]: 3.915e-05 [Cycle 2]: 0.00058688, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00012421 [with_stream_mark]: 1.061e-05 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.757e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.23999e-06 [auto_parallel]: 4.75999e-06 [parallel]: 4.17e-06 [flash_sp]: 3.4e-06 [merge_comm]: 2.88998e-06 [allreduce_fusion]: 2.76999e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 8.83001e-06 [a_after_grad]: 8.3e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 9.79984e-07 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.23e-06 [cse]: 1.283e-05 [a_3]: 3.128e-05 [py_interpret_to_execute_after_opt_a]: 7.31001e-06 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 3.126e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 5.34e-06 [mutable_eliminate]: 0.00043786 [opt_b]: 0.00018044, [1] [Cycle 1]: 0.00017468, [7] [b_1]: 0.00010794 
[b_2]: 6.79001e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 4.50003e-07 [cse]: 1.571e-05 [optimize_parallel_all_gather_comm]: 1.539e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 2.2e-05 [loop_unroll]: 0.00040646 [opt_after_cconv]: 9.478e-05, [1] [Cycle 1]: 8.928e-05, [7] [c_1]: 2.773e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.28998e-06 [cse]: 1.646e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.143e-05 [tuple_transform]: 6.841e-05, [1] [Cycle 1]: 6.41e-05, [4] [d_1]: 3.844e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 5.97001e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.331e-05 [cse_after_recomputation]: 1.96e-05, [1] [Cycle 1]: 1.498e-05, [1] [cse]: 1.004e-05 [environ_conv]: 4.65999e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.79002e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.07001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.23002e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.18001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.167e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 4.00998e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.783e-05, [1] [Cycle 1]: 6.376e-05, [6] [build]: 2.24999e-06 [elim_shapecalc]: 8.14002e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.33e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.469e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00046817 [validate]: 3.058e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0592086 [execute]: 9.54999e-06 Sums bootstrap : 0.000408s : 0.60% type_inference : 0.004337s : 6.43% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000415s : 0.61% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000010s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000348s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000070s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000438s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000406s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000468s : 0.69% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059209s : 87.77% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.18% : 0.000022s : 4: substitution.arithmetic_simplify 1.66% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.46% : 0.000005s : 4: substitution.graph_param_transform 65.26% : 0.000079s : 2: substitution.inline 2.55% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.43% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004297 2 91.61% : 0.003936s : 1: type_inference.infer 8.39% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.93% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.81% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.38% : 0.000003s : 17: predicate.arithmetic_simplify 0.94% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.97% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.06% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.73% : 0.000001s : 9: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.31% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.67% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.75% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.16% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 6 42.00% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.00% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079212 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.74% : 0.002961s : 1: add_attr 3.73% : 0.002952s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.56% : 0.000443s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000415s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000446s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.96% : 0.000761s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.33% : 0.001846s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.60% : 0.000478s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.58% : 0.003630s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.23% : 0.000185s : 1: renormalize.infer 0.20% : 0.000156s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.77% : 0.059226s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.49% : 0.004351s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.0722737, [24] [bootstrap]: 0.00042453 [type_inference]: 0.00551795 [event_method]: 1.432e-05 [auto_monad]: 5.373e-05 [graph_reusing]: 5.35999e-06 [inline]: 1.89e-06 [add_attr]: 0.0029209, [1] [add_attr_with_inline]: 0.00291234, [1] [Cycle 1]: 4.553e-05, [2] [tag_attr]: 1.522e-05 [meta_addattr_fg_expand]: 4.42998e-06 [parallel-infer-symbol]: 2.49999e-06 [pre_auto_parallel]: 2.489e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0039827, [53] [py_interpret_to_execute]: 2.055e-05 [rewriter_before_opt_a]: 5.781e-05 [opt_a]: 0.00216099, [2] [Cycle 1]: 0.00155448, [45] [expand_dump_flag]: 2.57001e-06 [switch_simplify]: 3.246e-05 [loop_unroll]: 2.097e-05 [a_1]: 0.0004442 [with_stream_mark]: 1.376e-05 [recompute_prepare]: 7.26001e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 0.00012236 [accelerated_algorithm]: 6.46999e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 6.06998e-06 [merge_send_recv]: 8.43001e-06 [auto_parallel]: 6.39999e-06 [parallel]: 1.842e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.43999e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.18998e-06 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.64998e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 8.81997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 9.69e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 
2.43e-06 [after_resolve]: 1.062e-05 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00042232 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.676e-05 [a_3]: 4.081e-05 [Cycle 2]: 0.00059707, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.32999e-06 [a_1]: 0.00012536 [with_stream_mark]: 9.36e-06 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.38998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.824e-05 [accelerated_algorithm]: 5.34e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.11002e-06 [shard_inline]: 5.32001e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.05e-06 [flash_sp]: 3.54002e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 3.08e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.29001e-06 [virtual_dataset]: 5.43002e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.41e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.63002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.60001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.10001e-06 [a_after_grad]: 8.21002e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.35997e-06 [cse]: 1.328e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 [slice_cell_reuse_recomputed_activation]: 1.88002e-06 [rewriter_after_opt_a]: 3.102e-05 [convert_after_rewriter]: 7.05e-06 [order_py_execute_after_rewriter]: 5.23002e-06 [mutable_eliminate]: 0.00044749 [opt_b]: 0.00018275, [1] [Cycle 1]: 0.00017666, [7] 
[b_1]: 0.00010803 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 6.89994e-07 [cse]: 1.619e-05 [optimize_parallel_all_gather_comm]: 1.531e-05 [overlap_param_gather]: 1.96003e-06 [cconv]: 2.227e-05 [loop_unroll]: 0.00041139 [opt_after_cconv]: 9.443e-05, [1] [Cycle 1]: 8.883e-05, [7] [c_1]: 2.777e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.628e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 6.849e-05, [1] [Cycle 1]: 6.407e-05, [4] [d_1]: 3.899e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.259e-05 [cse_after_recomputation]: 2.075e-05, [1] [Cycle 1]: 1.596e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.58001e-06 [swap_dp_allreduce_reducescatter]: 5.22e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.37001e-06 [merge_cast_opt]: 1.11002e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.01e-06 [assign_add_opt]: 1.54998e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.30999e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.11002e-06 [interleave_split_concat_branches]: 1.07998e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.646e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.57001e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 6.632e-05, [1] [Cycle 1]: 6.215e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 8.07e-06 [elim_not_effective]: 1.089e-05 [opt_reshape]: 5.87999e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.56e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.30998e-06 [opt_after_jit_grad]: 0.0004437 [validate]: 3.06e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0586087 [execute]: 8.78001e-06 Sums bootstrap : 0.000425s : 0.62% type_inference : 0.005518s : 8.07% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000002s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000570s : 0.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000191s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000422s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000411s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.65% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058609s : 85.71% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 15.82% : 0.000026s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.21% : 0.000005s : 4: substitution.graph_param_transform 65.86% : 0.000108s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000004s : 4: substitution.remove_not_recompute_node 2.60% : 0.000004s : 4: substitution.replace_old_param 6.15% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005478 2 89.20% : 0.004887s : 1: type_inference.infer 10.80% : 0.000591s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.59% : 0.000028s : 3: replace.inline 29.41% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 92.12% : 0.000105s : 3: match.inline 7.88% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.97% : 0.000002s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 0.94% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.12% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 1.10% : 0.000002s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.42% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 46.30% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.70% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080736 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.62% : 0.002925s : 1: add_attr 3.61% : 0.002916s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.57% : 0.000459s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.22% : 0.000983s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.68% : 0.002164s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.56% : 0.000453s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.94% : 0.003987s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000210s : 1: renormalize.infer 0.26% : 0.000206s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000069s : 1: symbol_engine_optimizer 72.62% : 0.058627s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.85% : 0.005532s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.865024, [24] [bootstrap]: 0.00045347 [type_inference]: 0.0113386 [event_method]: 4.9e-05 [auto_monad]: 0.00012093 [graph_reusing]: 7.71001e-06 [inline]: 2.01e-06 [add_attr]: 0.00302617, [1] [add_attr_with_inline]: 0.00301796, [1] [Cycle 1]: 6.97e-05, [2] [tag_attr]: 3.4e-05 [meta_addattr_fg_expand]: 9.13002e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 4.95e-05 [insert-virtual-dataset]: 2.33998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.0133945, [53] [py_interpret_to_execute]: 3.783e-05 [rewriter_before_opt_a]: 0.00014507 [opt_a]: 0.0111289, [3] [Cycle 1]: 0.00717838, [45] [expand_dump_flag]: 3.76001e-06 [switch_simplify]: 7.433e-05 [loop_unroll]: 6.17e-05 [a_1]: 0.00145636 [with_stream_mark]: 2.348e-05 [recompute_prepare]: 2.093e-05 [updatestate_depend_eliminate]: 8.87999e-06 [updatestate_assign_eliminate]: 8.22e-06 [updatestate_loads_eliminate]: 7.65998e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.00024386 [accelerated_algorithm]: 3.06e-05 [shard]: 1.76e-06 [meta_shard_fg_expand]: 3.23e-06 [shard_inline]: 1.595e-05 [merge_send_recv]: 1.535e-05 [auto_parallel]: 1.123e-05 [parallel]: 1.855e-05 [flash_sp]: 1.192e-05 [merge_comm]: 9.51e-06 [allreduce_fusion]: 8.84e-06 [matmul_add_comm_reduction]: 2.703e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.8e-05 [virtual_dataset]: 1.561e-05 [get_grad_eliminate_]: 1.513e-05 [virtual_output]: 1.516e-05 [merge_forward]: 9.39e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 1.718e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.866e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 2.844e-05 [set_forward_comm_id_for_comm_node_pass]: 9.86e-06 [meta_fg_expand]: 0.00144974 [flash_sp_send_recv_attached]: 3.7e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 5.937e-05 [a_after_grad]: 
8.05e-05 [renormalize]: 0.0024757 [add_forward_monad_depend]: 9.06998e-06 [auto_monad_grad]: 5.62001e-06 [auto_monad_eliminator]: 5.743e-05 [cse]: 0.00019877 [a_3]: 0.00033437 [Cycle 2]: 0.00300589, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 4.757e-05 [loop_unroll]: 4.404e-05 [a_1]: 0.00152446 [with_stream_mark]: 1.17e-05 [recompute_prepare]: 1.065e-05 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.55998e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012542 [accelerated_algorithm]: 1.176e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.27999e-06 [merge_send_recv]: 7.01999e-06 [auto_parallel]: 7.22997e-06 [parallel]: 4.70001e-06 [flash_sp]: 3.00998e-06 [merge_comm]: 5.19003e-06 [allreduce_fusion]: 4.67998e-06 [matmul_add_comm_reduction]: 7.64002e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.82999e-06 [get_grad_eliminate_]: 8.84003e-06 [virtual_output]: 8.42e-06 [merge_forward]: 5.01997e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.624e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.406e-05 [set_forward_comm_id_for_comm_node_pass]: 5.28002e-06 [meta_fg_expand]: 7.006e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.612e-05 [a_after_grad]: 1.461e-05 [renormalize]: 0.00060176 [add_forward_monad_depend]: 4.17e-06 [auto_monad_grad]: 1.51998e-06 [auto_monad_eliminator]: 1.455e-05 [cse]: 4.713e-05 [a_3]: 6.539e-05 [Cycle 3]: 0.00093095, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.061e-05 [loop_unroll]: 8.99e-06 [a_1]: 0.00024894 [with_stream_mark]: 9.57001e-06 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.70001e-06 [updatestate_assign_eliminate]: 3.84002e-06 [updatestate_loads_eliminate]: 3.83001e-06 [parameter_eliminate]: 
9.80013e-07 [a_2]: 0.00012263 [accelerated_algorithm]: 1.157e-05 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 9.29e-06 [merge_send_recv]: 7.40998e-06 [auto_parallel]: 7.10998e-06 [parallel]: 4.66002e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 5.02999e-06 [allreduce_fusion]: 3.034e-05 [matmul_add_comm_reduction]: 7.91001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.75999e-06 [get_grad_eliminate_]: 8.57998e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.18999e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.627e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.375e-05 [set_forward_comm_id_for_comm_node_pass]: 5.81e-06 [meta_fg_expand]: 3.02002e-06 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.459e-05 [a_after_grad]: 1.407e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.47001e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 1.094e-05 [cse]: 2.715e-05 [a_3]: 5.922e-05 [py_interpret_to_execute_after_opt_a]: 1.025e-05 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 4.59e-05 [convert_after_rewriter]: 9.44e-06 [order_py_execute_after_rewriter]: 6.94999e-06 [mutable_eliminate]: 0.00045964 [opt_b]: 0.00028662, [1] [Cycle 1]: 0.00028068, [7] [b_1]: 0.00018816 [b_2]: 1.075e-05 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.19002e-06 [updatestate_loads_eliminate]: 3.97e-06 [renormalize]: 1.8999e-07 [cse]: 3.161e-05 [optimize_parallel_all_gather_comm]: 2.003e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.059e-05 [loop_unroll]: 0.00042573 [opt_after_cconv]: 0.00013735, [1] [Cycle 1]: 0.00013117, [7] [c_1]: 4.812e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 7.22002e-06 [updatestate_assign_eliminate]: 4.13999e-06 [updatestate_loads_eliminate]: 
3.86999e-06 [cse]: 3.088e-05 [renormalize]: 2.99973e-07 [remove_dup_value]: 2.88e-05 [tuple_transform]: 0.0001024, [1] [Cycle 1]: 9.755e-05, [4] [d_1]: 6.711e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 1.005e-05 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 5.629e-05 [cse_after_recomputation]: 3.257e-05, [1] [Cycle 1]: 2.79e-05, [1] [cse]: 2.264e-05 [environ_conv]: 8.95001e-06 [swap_dp_allreduce_reducescatter]: 7.94997e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.75002e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 1.96e-06 [micro_interleaved_order_control]: 2.60002e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.712e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 4.99e-06 [overlap_recompute_and_grad_model_parallel]: 5.84999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.64e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 5.20999e-06 [overlap_grad_flash_sp]: 2.463e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 1.61998e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 9.871e-05, [1] [Cycle 1]: 9.413e-05, [6] [build]: 9.37001e-06 [elim_shapecalc]: 1.349e-05 [elim_not_effective]: 1.82e-05 [opt_reshape]: 1.033e-05 [fold_const_symbol]: 1.441e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.84e-06 
[pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.541e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.12002e-06 [opt_after_jit_grad]: 0.00046626 [validate]: 4.527e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.835801 [execute]: 8.59e-06 Sums bootstrap : 0.000453s : 0.05% type_inference : 0.011339s : 1.32% event_method : 0.000049s : 0.01% auto_monad : 0.000121s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.01% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000145s : 0.02% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000133s : 0.02% optimize.opt_a.loop_unroll : 0.000115s : 0.01% optimize.opt_a.a_1 : 0.003230s : 0.38% optimize.opt_a.with_stream_mark : 0.000045s : 0.01% optimize.opt_a.recompute_prepare : 0.000041s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000492s : 0.06% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000026s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% optimize.opt_a.allreduce_fusion 
: 0.000044s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000033s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001523s : 0.18% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.01% optimize.opt_a.a_after_grad : 0.000109s : 0.01% optimize.opt_a.renormalize : 0.003078s : 0.36% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000008s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.01% optimize.opt_a.cse : 0.000273s : 0.03% optimize.opt_a.a_3 : 0.000459s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.01% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000460s : 0.05% optimize.opt_b.b_1 : 0.000188s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 
0.00% optimize.opt_b.cse : 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.00% optimize.loop_unroll : 0.000426s : 0.05% optimize.opt_after_cconv.c_1 : 0.000048s : 0.01% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.01% optimize.cse_after_recomputation.cse : 0.000023s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000466s : 0.05% validate : 0.000045s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.835801s : 97.10% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000762 222 5.88% : 0.000045s : 12: substitution.arithmetic_simplify 1.81% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.71% : 0.000425s : 17: substitution.inline 2.09% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.66% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.10% : 0.000024s : 10: substitution.replace_applicator 1.34% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011264 2 86.73% : 0.009769s : 1: type_inference.infer 13.27% : 0.001494s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.58% : 0.000126s : 17: replace.inline 42.42% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 33 92.56% : 0.000416s : 17: match.inline 7.44% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.03% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.22% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.72% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.11% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.21% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001585 34 56.47% : 0.000895s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.53% : 0.000690s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.889733 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.34% : 0.003031s : 1: add_attr 0.34% : 0.003022s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000128s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000487s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000056s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000469s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.55% : 0.004891s : 117: opt.transform.opt_a 0.01% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000173s : 28: opt.transform.opt_b 0.01% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.25% : 0.011132s : 1: opt_a 0.02% : 0.000141s : 1: opt_after_cconv 0.05% : 0.000476s : 1: opt_after_jit_grad 0.03% : 0.000290s : 1: opt_b 1.51% : 0.013398s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000054s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000033s : 1: remove_dup_value 0.18% : 0.001640s : 2: renormalize.infer 0.16% : 0.001425s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.01% : 0.000050s : 1: rewriter_after_opt_a 0.02% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 93.94% : 0.835823s : 1: task_emit 0.01% : 0.000105s : 1: 
tuple_transform 1.28% : 0.011353s : 1: type_inference 0.01% : 0.000070s : 1: validate TotalTime = 0.0708605, [24] [bootstrap]: 0.00044636 [type_inference]: 0.00433025 [event_method]: 1.067e-05 [auto_monad]: 4.968e-05 [graph_reusing]: 5.95002e-06 [inline]: 1.60999e-06 [add_attr]: 0.00296085, [1] [add_attr_with_inline]: 0.00295307, [1] [Cycle 1]: 4.743e-05, [2] [tag_attr]: 1.203e-05 [meta_addattr_fg_expand]: 3.82998e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.153e-05 [insert-virtual-dataset]: 2.17999e-06 [parallel-infer-symbol-second]: 6.30011e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00376942, [53] [py_interpret_to_execute]: 1.447e-05 [rewriter_before_opt_a]: 3.714e-05 [opt_a]: 0.0019377, [2] [Cycle 1]: 0.00128799, [45] [expand_dump_flag]: 2.70002e-06 [switch_simplify]: 2.415e-05 [loop_unroll]: 1.339e-05 [a_1]: 0.00028878 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.51001e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.505e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.29999e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.83997e-06 [merge_send_recv]: 7.24001e-06 [auto_parallel]: 5.27001e-06 [parallel]: 1.777e-05 [flash_sp]: 7.39002e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 9.22999e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 5.458e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.75002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.46e-06 
[receive_attached]: 2.33002e-06 [after_resolve]: 1.077e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.00033969 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.306e-05 [cse]: 2.71e-05 [a_3]: 3.942e-05 [Cycle 2]: 0.00064044, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.53e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00013087 [with_stream_mark]: 1.133e-05 [recompute_prepare]: 6.24001e-06 [updatestate_depend_eliminate]: 3.06001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 7.554e-05 [accelerated_algorithm]: 6.42001e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.92999e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.25998e-06 [merge_comm]: 3.32002e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 5.39e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.96001e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 6.51e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 1.76003e-06 [flash_sp_send_recv_attached]: 1.15999e-06 [receive_attached]: 1.24e-06 [after_resolve]: 1.096e-05 [a_after_grad]: 9.33002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.78e-06 [cse]: 1.294e-05 [a_3]: 3.593e-05 [py_interpret_to_execute_after_opt_a]: 8.84e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.452e-05 [convert_after_rewriter]: 7.2e-06 [order_py_execute_after_rewriter]: 5.41998e-06 [mutable_eliminate]: 0.00048633 [opt_b]: 0.00017892, [1] [Cycle 
1]: 0.00017291, [7] [b_1]: 0.00010597 [b_2]: 7.09001e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.59985e-07 [cse]: 1.614e-05 [optimize_parallel_all_gather_comm]: 1.505e-05 [overlap_param_gather]: 2.37001e-06 [cconv]: 2.168e-05 [loop_unroll]: 0.00040816 [opt_after_cconv]: 9.37e-05, [1] [Cycle 1]: 8.795e-05, [7] [c_1]: 2.706e-05 [parameter_eliminate]: 1.95001e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.627e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 6.768e-05, [1] [Cycle 1]: 6.336e-05, [4] [d_1]: 3.852e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.50001e-06 [add_recomputation]: 4.452e-05 [cse_after_recomputation]: 1.992e-05, [1] [Cycle 1]: 1.567e-05, [1] [cse]: 1.071e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.39998e-06 [bias_add_comm_swap]: 2.22999e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 3.69002e-06 [overlap_recompute_and_grad_model_parallel]: 4.27998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.687e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.787e-05, [1] [Cycle 1]: 6.372e-05, [6] [build]: 2.04e-06 [elim_shapecalc]: 8.28001e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.59998e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.517e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00044607 [validate]: 3.004e-05 [backend_pass]: 1.19e-06 [task_emit]: 0.058551 [execute]: 8.27998e-06 Sums bootstrap : 0.000446s : 0.67% type_inference : 0.004330s : 6.47% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000037s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.63% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.10% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000022s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000340s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000486s : 0.73% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000408s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000446s : 0.67% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058551s : 87.48% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.23% : 0.000022s : 4: substitution.arithmetic_simplify 1.69% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000005s : 4: substitution.graph_param_transform 64.98% : 0.000078s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.84% : 0.000005s : 4: substitution.remove_not_recompute_node 3.66% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004290 2 91.18% : 0.003912s : 1: type_inference.infer 8.82% : 0.000378s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.90% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.85% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.72% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.86% : 0.000001s : 8: predicate.incorporate_call 0.70% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 1.08% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.13% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.07% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.13% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.87% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.04% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.97% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.63% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 42.23% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.77% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078917 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.76% : 0.002965s : 1: add_attr 3.75% : 0.002956s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000484s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000416s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.63% : 0.000496s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.05% : 0.000830s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.46% : 0.001941s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000455s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.78% : 0.003773s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000003s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000186s : 1: renormalize.infer 0.19% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000039s : 1: rewriter_after_opt_a 0.05% : 0.000041s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.21% : 0.058567s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.50% : 0.004344s : 1: type_inference 0.06% : 0.000051s : 1: validate TotalTime = 0.112123, [24] [bootstrap]: 0.0042418 [type_inference]: 0.0106606 [event_method]: 4.33e-05 [auto_monad]: 0.00011586 [graph_reusing]: 8.06001e-06 [inline]: 1.79e-06 [add_attr]: 0.00302379, [1] [add_attr_with_inline]: 0.0030154, [1] [Cycle 1]: 6.54e-05, [2] [tag_attr]: 3.093e-05 [meta_addattr_fg_expand]: 8.42e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 4.539e-05 [insert-virtual-dataset]: 2.78003e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.67999e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0130821, [53] [py_interpret_to_execute]: 3.456e-05 [rewriter_before_opt_a]: 0.00012598 [opt_a]: 0.0108612, [3] [Cycle 1]: 0.00694029, [45] [expand_dump_flag]: 3.9e-06 [switch_simplify]: 6.495e-05 [loop_unroll]: 5.486e-05 [a_1]: 0.00133416 [with_stream_mark]: 2.304e-05 [recompute_prepare]: 2.139e-05 [updatestate_depend_eliminate]: 9.12001e-06 [updatestate_assign_eliminate]: 7.87e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.55002e-06 [a_2]: 0.00024481 [accelerated_algorithm]: 3.044e-05 [shard]: 1.76e-06 [meta_shard_fg_expand]: 3.36001e-06 [shard_inline]: 1.593e-05 [merge_send_recv]: 3.134e-05 [auto_parallel]: 1.148e-05 [parallel]: 2.183e-05 [flash_sp]: 1.123e-05 [merge_comm]: 1.022e-05 [allreduce_fusion]: 9.36e-06 [matmul_add_comm_reduction]: 2.612e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.837e-05 [virtual_dataset]: 1.553e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.525e-05 [merge_forward]: 9.15999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.733e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.967e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.765e-05 [set_forward_comm_id_for_comm_node_pass]: 9.51e-06 [meta_fg_expand]: 0.00139529 [flash_sp_send_recv_attached]: 3.88999e-06 [receive_attached]: 2.24001e-06 [after_resolve]: 
5.993e-05 [a_after_grad]: 8.186e-05 [renormalize]: 0.00243855 [add_forward_monad_depend]: 9.39998e-06 [auto_monad_grad]: 5.22e-06 [auto_monad_eliminator]: 5.612e-05 [cse]: 0.00016665 [a_3]: 0.00033509 [Cycle 2]: 0.00296107, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 4.723e-05 [loop_unroll]: 4.38e-05 [a_1]: 0.00152914 [with_stream_mark]: 1.168e-05 [recompute_prepare]: 1.092e-05 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 4.45e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 0.0001264 [accelerated_algorithm]: 1.214e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 6.86001e-06 [auto_parallel]: 7.3e-06 [parallel]: 4.50001e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 5.80002e-06 [allreduce_fusion]: 4.86002e-06 [matmul_add_comm_reduction]: 7.64002e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.041e-05 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.46002e-06 [virtual_output]: 8.45999e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 9.80013e-07 [offload_activation]: 8.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.616e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.399e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 3.461e-05 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.487e-05 [a_after_grad]: 1.441e-05 [renormalize]: 0.00058787 [add_forward_monad_depend]: 4.15e-06 [auto_monad_grad]: 1.38002e-06 [auto_monad_eliminator]: 1.465e-05 [cse]: 4.641e-05 [a_3]: 6.52e-05 [Cycle 3]: 0.00094572, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 1.176e-05 [loop_unroll]: 9.05001e-06 [a_1]: 0.00025153 [with_stream_mark]: 1.031e-05 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.72998e-06 [updatestate_assign_eliminate]: 3.83999e-06 [updatestate_loads_eliminate]: 3.80998e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012286 [accelerated_algorithm]: 1.169e-05 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 7e-06 [auto_parallel]: 7.06001e-06 [parallel]: 4.77e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 4.92e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.82e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 9.84001e-06 [virtual_dataset]: 8.48999e-06 [get_grad_eliminate_]: 8.33999e-06 [virtual_output]: 8.15999e-06 [merge_forward]: 4.34997e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.42e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.566e-05 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 1.385e-05 [set_forward_comm_id_for_comm_node_pass]: 5.21002e-06 [meta_fg_expand]: 2.79999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.336e-05 [a_after_grad]: 1.407e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 1.169e-05 [cse]: 2.657e-05 [a_3]: 5.905e-05 [py_interpret_to_execute_after_opt_a]: 1.098e-05 [slice_cell_reuse_recomputed_activation]: 2.37001e-06 [rewriter_after_opt_a]: 4.605e-05 [convert_after_rewriter]: 9.84001e-06 [order_py_execute_after_rewriter]: 6.61999e-06 [mutable_eliminate]: 0.00044903 [opt_b]: 0.00028809, [1] [Cycle 1]: 0.00028198, [7] [b_1]: 0.00018843 [b_2]: 1.023e-05 [updatestate_depend_eliminate]: 7.38999e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 4.10015e-07 [cse]: 3.175e-05 [optimize_parallel_all_gather_comm]: 2.104e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 1.953e-05 [loop_unroll]: 0.00041684 [opt_after_cconv]: 0.00013631, [1] [Cycle 1]: 0.00013045, [7] [c_1]: 4.888e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 6.91001e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 
3.88999e-06 [cse]: 3.003e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 2.938e-05 [tuple_transform]: 0.00010252, [1] [Cycle 1]: 9.791e-05, [4] [d_1]: 6.672e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.056e-05 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 5.738e-05 [cse_after_recomputation]: 3.192e-05, [1] [Cycle 1]: 2.716e-05, [1] [cse]: 2.155e-05 [environ_conv]: 9.32999e-06 [swap_dp_allreduce_reducescatter]: 7.7e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 1.99999e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.49977e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.14003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.619e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 4.90001e-06 [overlap_recompute_and_grad_model_parallel]: 6.02001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 5.27999e-06 [overlap_grad_flash_sp]: 2.421e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.89e-06 [split_layernorm_comm]: 2.09999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 9.862e-05, [1] [Cycle 1]: 9.435e-05, [6] [build]: 9.12999e-06 [elim_shapecalc]: 1.387e-05 [elim_not_effective]: 1.865e-05 [opt_reshape]: 9.84001e-06 [fold_const_symbol]: 1.469e-05 [renormalize]: 2.00002e-07 [detach_backward]: 
1.57001e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 2.475e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.00046195 [validate]: 4.604e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0801267 [execute]: 8.32e-06 Sums bootstrap : 0.004242s : 3.94% type_inference : 0.010661s : 9.89% event_method : 0.000043s : 0.04% auto_monad : 0.000116s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000126s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.11% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003115s : 2.89% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.46% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000045s : 0.04% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000031s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001433s : 1.33% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003026s : 2.81% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000240s : 0.22% optimize.opt_a.a_3 : 0.000459s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.42% optimize.opt_b.b_1 : 0.000188s : 0.17% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000417s : 0.39% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000462s : 0.43% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080127s : 74.34% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000732 218 5.93% : 0.000043s : 11: substitution.arithmetic_simplify 1.85% : 0.000014s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.75% : 0.000401s : 16: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000014s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.93% : 0.000014s : 20: substitution.remove_not_recompute_node 3.25% : 0.000024s : 10: substitution.replace_applicator 1.46% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.76% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.57% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.48% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010590 2 87.57% : 0.009274s : 1: type_inference.infer 12.43% : 0.001316s : 1: type_inference.specialize ------[replace.] 0.000199 30 59.41% : 0.000118s : 16: replace.inline 40.59% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000424 30 92.62% : 0.000393s : 16: match.inline 7.38% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.19% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.10% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.52% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.29% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.94% : 0.000014s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.96% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.90% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.13% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001675 32 60.62% : 0.001016s : 12: func_graph_cloner_run.FuncGraphClonerGraph 39.38% : 0.000660s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.136342 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.22% : 0.003028s : 1: add_attr 2.21% : 0.003019s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 3.14% : 0.004280s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000049s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.49% : 0.004762s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 7.97% : 0.010864s : 1: opt_a 0.10% : 0.000140s : 1: opt_after_cconv 0.35% : 0.000472s : 1: opt_after_jit_grad 0.21% : 0.000292s : 1: opt_b 9.60% : 0.013086s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.15% : 0.001568s : 2: renormalize.infer 1.06% : 0.001445s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000130s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000101s : 1: symbol_engine_optimizer 58.78% : 0.080145s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.83% : 0.010676s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x7-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x8-pynative],max_mem:42.0M TotalTime = 0.0214866, [24] [bootstrap]: 0.00051437 [type_inference]: 0.00614048 [event_method]: 1.53e-05 [auto_monad]: 5.892e-05 [graph_reusing]: 5.39998e-06 [inline]: 1.64e-06 [add_attr]: 0.00342221, [1] [add_attr_with_inline]: 0.00341139, [1] [Cycle 1]: 4.432e-05, [2] [tag_attr]: 1.593e-05 [meta_addattr_fg_expand]: 3.82002e-06 [parallel-infer-symbol]: 2.53e-06 [pre_auto_parallel]: 2.821e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00398091, [53] [py_interpret_to_execute]: 2.007e-05 [rewriter_before_opt_a]: 6.024e-05 [opt_a]: 0.00210975, [2] [Cycle 1]: 0.00150465, [45] [expand_dump_flag]: 3.10002e-06 [switch_simplify]: 3.299e-05 [loop_unroll]: 2.06e-05 [a_1]: 0.00045347 [with_stream_mark]: 1.327e-05 [recompute_prepare]: 7.78001e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.34001e-06 [updatestate_loads_eliminate]: 2.80002e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.538e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.35002e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 7.09001e-06 [auto_parallel]: 6.03002e-06 [parallel]: 2.494e-05 [flash_sp]: 6.98998e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.15998e-06 [matmul_add_comm_reduction]: 8.55001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 
5.74999e-06 [get_grad_eliminate_]: 5.72001e-06 [virtual_output]: 6.10002e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.051e-05 [merge_recompute_call_nodes]: 1.70001e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.26998e-06 [receive_attached]: 2.20002e-06 [after_resolve]: 9.90002e-06 [a_after_grad]: 8.75999e-06 [renormalize]: 0.00041055 [add_forward_monad_depend]: 4.91002e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.374e-05 [cse]: 2.537e-05 [a_3]: 4.108e-05 [Cycle 2]: 0.00059574, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012742 [with_stream_mark]: 9.47001e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.45002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.859e-05 [accelerated_algorithm]: 5.89e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.27e-06 [flash_sp]: 2.94001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.48e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.25999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.92e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 9.05999e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 
7.00238e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.44999e-06 [cse]: 1.343e-05 [a_3]: 3.183e-05 [py_interpret_to_execute_after_opt_a]: 7.36001e-06 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 [rewriter_after_opt_a]: 3.013e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.07e-06 [mutable_eliminate]: 0.0004896 [opt_b]: 0.00018032, [1] [Cycle 1]: 0.00017422, [7] [b_1]: 0.00010647 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 5.51002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.19997e-07 [cse]: 1.65e-05 [optimize_parallel_all_gather_comm]: 1.527e-05 [overlap_param_gather]: 1.86003e-06 [cconv]: 2.29e-05 [loop_unroll]: 0.00040972 [opt_after_cconv]: 9.507e-05, [1] [Cycle 1]: 8.943e-05, [7] [c_1]: 2.787e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.612e-05 [renormalize]: 5.8001e-07 [remove_dup_value]: 1.224e-05 [tuple_transform]: 6.9e-05, [1] [Cycle 1]: 6.492e-05, [4] [d_1]: 3.962e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.851e-05 [cse_after_recomputation]: 2.019e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 5.16002e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.46998e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 7.50006e-07 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 9.80013e-07 
[add_comm_op_reuse_tag]: 1.23002e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96003e-06 [control_data_broadcast_order]: 1.156e-05 [grouped_pairwise_exchange_alltoall]: 1.77999e-06 [offloading_packed_experts]: 3.52997e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 3.69002e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.55001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.826e-05, [1] [Cycle 1]: 6.44e-05, [6] [build]: 2.20002e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.581e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 0.00012542 [opt_after_jit_grad]: 0.0004522 [validate]: 3.236e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00647046 [execute]: 6.86999e-06 Sums bootstrap : 0.000514s : 3.01% type_inference : 0.006140s : 35.90% event_method : 0.000015s : 0.09% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 
0.000060s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000581s : 3.40% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000011s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000411s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000490s : 2.86% optimize.opt_b.b_1 : 0.000106s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000410s : 2.40% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000125s : 0.73% opt_after_jit_grad : 0.000452s : 2.64% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006470s : 37.83% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.28% : 0.000023s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 3.10% : 0.000005s : 4: substitution.graph_param_transform 67.58% : 0.000110s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 6.42% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006096 2 90.61% : 0.005523s : 1: type_inference.infer 9.39% : 0.000572s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.92% : 0.000027s : 3: replace.inline 30.08% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.98% : 0.000108s : 3: match.inline 8.02% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.20% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.95% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.62% : 0.000004s : 32: predicate.load_eliminater 0.97% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.53% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.87% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000365 8 47.90% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.10% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030402 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.27% : 0.003426s : 1: add_attr 11.23% : 0.003415s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000551s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000499s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.11% : 0.000947s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.95% : 0.002112s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.52% : 0.000462s : 1: opt_after_jit_grad 0.60% : 0.000184s : 1: opt_b 13.11% : 0.003985s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.69% : 0.000211s : 1: renormalize.infer 0.64% : 0.000193s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.43% : 0.000131s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.32% : 0.006480s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.24% : 0.006155s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.018183, [24] [bootstrap]: 0.00052668 [type_inference]: 0.00432092 [event_method]: 1.095e-05 [auto_monad]: 5.148e-05 [graph_reusing]: 4.92e-06 [inline]: 1.65001e-06 [add_attr]: 0.00295648, [1] [add_attr_with_inline]: 0.00294859, [1] [Cycle 1]: 4.785e-05, [2] [tag_attr]: 1.193e-05 [meta_addattr_fg_expand]: 3.19001e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.147e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.26e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00369396, [53] [py_interpret_to_execute]: 1.592e-05 [rewriter_before_opt_a]: 3.734e-05 [opt_a]: 0.0018907, [2] [Cycle 1]: 0.00129197, [45] [expand_dump_flag]: 2.62001e-06 [switch_simplify]: 2.379e-05 [loop_unroll]: 1.362e-05 [a_1]: 0.00034006 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 7.83999e-06 [updatestate_depend_eliminate]: 3.74002e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 2.72001e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.723e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.38998e-06 [meta_shard_fg_expand]: 1.40001e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 5.82001e-06 [parallel]: 1.662e-05 [flash_sp]: 7.19001e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 9.20999e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 6.59999e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.78001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.75001e-06 [before_grad]: 9.69e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 2.49999e-06 [receive_attached]: 
2.37999e-06 [after_resolve]: 1.077e-05 [a_after_grad]: 8.79003e-06 [renormalize]: 0.0003351 [add_forward_monad_depend]: 4.17998e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.34e-05 [cse]: 2.768e-05 [a_3]: 3.977e-05 [Cycle 2]: 0.00058961, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012529 [with_stream_mark]: 1.088e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.761e-05 [accelerated_algorithm]: 5.58997e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.19002e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 2.85002e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 5.09998e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.14003e-06 [get_grad_eliminate_]: 4.96002e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.77002e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 5.72999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.19e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 2.88e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.69999e-06 [a_after_grad]: 7.92998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 9.99979e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.34001e-06 [cse]: 1.275e-05 [a_3]: 3.208e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 3.028e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00045084 [opt_b]: 0.00017837, [1] [Cycle 1]: 
0.00017217, [7] [b_1]: 0.00010619 [b_2]: 7.18e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.46998e-06 [renormalize]: 3.19997e-07 [cse]: 1.54e-05 [optimize_parallel_all_gather_comm]: 1.574e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.25e-05 [loop_unroll]: 0.0004165 [opt_after_cconv]: 9.497e-05, [1] [Cycle 1]: 8.934e-05, [7] [c_1]: 2.845e-05 [parameter_eliminate]: 2.04e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.6e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.191e-05 [tuple_transform]: 6.808e-05, [1] [Cycle 1]: 6.368e-05, [4] [d_1]: 3.834e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.01998e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.281e-05 [cse_after_recomputation]: 1.986e-05, [1] [Cycle 1]: 1.545e-05, [1] [cse]: 1.04e-05 [environ_conv]: 5.20999e-06 [swap_dp_allreduce_reducescatter]: 4.87998e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.13998e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.46998e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.06997e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.162e-05 [grouped_pairwise_exchange_alltoall]: 1.46998e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 4.59998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 1.82001e-06 [overlap_grad_ring_attention]: 4.20999e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.819e-05, [1] [Cycle 1]: 6.402e-05, [6] [build]: 2.37001e-06 [elim_shapecalc]: 8e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 8.61002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.63002e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.596e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00044969 [validate]: 3.071e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00587809 [execute]: 7.31001e-06 Sums bootstrap : 0.000527s : 3.69% type_inference : 0.004321s : 30.27% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000465s : 3.26% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000335s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000451s : 3.16% optimize.opt_b.b_1 : 0.000106s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000416s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.15% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005878s : 41.18% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.52% : 0.000022s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000005s : 4: substitution.graph_param_transform 64.73% : 0.000077s : 2: substitution.inline 2.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.81% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004281 2 92.03% : 0.003940s : 1: type_inference.infer 7.97% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000067 2 100.00% : 0.000067s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.25% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 1.03% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.40% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.32% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.66% : 0.000001s : 4: predicate.parallel_virtual_node 1.30% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.97% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.02% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 42.44% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.56% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026141 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.33% : 0.002961s : 1: add_attr 11.29% : 0.002952s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.15% : 0.000562s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000425s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000460s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.12% : 0.000815s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.24% : 0.001894s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.76% : 0.000459s : 1: opt_after_jit_grad 0.70% : 0.000182s : 1: opt_b 14.15% : 0.003698s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.70% : 0.000183s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.52% : 0.005888s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.59% : 0.004336s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0198443, [24] [bootstrap]: 0.00045331 [type_inference]: 0.00552395 [event_method]: 1.382e-05 [auto_monad]: 5.592e-05 [graph_reusing]: 4.89e-06 [inline]: 1.99e-06 [add_attr]: 0.00293766, [1] [add_attr_with_inline]: 0.00292967, [1] [Cycle 1]: 4.491e-05, [2] [tag_attr]: 1.455e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.527e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00417516, [53] [py_interpret_to_execute]: 1.946e-05 [rewriter_before_opt_a]: 5.667e-05 [opt_a]: 0.00225376, [2] [Cycle 1]: 0.00160288, [45] [expand_dump_flag]: 3.06001e-06 [switch_simplify]: 3.152e-05 [loop_unroll]: 2.089e-05 [a_1]: 0.00050107 [with_stream_mark]: 1.424e-05 [recompute_prepare]: 8.80999e-06 [updatestate_depend_eliminate]: 3.78999e-06 [updatestate_assign_eliminate]: 3.34001e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 8.277e-05 [accelerated_algorithm]: 7.51999e-06 [shard]: 2.41e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.78e-06 [merge_send_recv]: 8.23999e-06 [auto_parallel]: 6.48e-06 [parallel]: 1.838e-05 [flash_sp]: 7.85e-06 [merge_comm]: 4.28999e-06 [allreduce_fusion]: 3.54002e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.83001e-06 [virtual_dataset]: 6.50002e-06 [get_grad_eliminate_]: 6.12001e-06 [virtual_output]: 5.97001e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.69e-06 [offload_activation]: 9.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.189e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 1.002e-05 [set_forward_comm_id_for_comm_node_pass]: 3.99002e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 2.66999e-06 [receive_attached]: 2.95002e-06 
[after_resolve]: 1.142e-05 [a_after_grad]: 9.67001e-06 [renormalize]: 0.00042201 [add_forward_monad_depend]: 4.53001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 2.701e-05 [a_3]: 4.392e-05 [Cycle 2]: 0.0006411, [45] [expand_dump_flag]: 1.04003e-06 [switch_simplify]: 7.58999e-06 [loop_unroll]: 6.14001e-06 [a_1]: 0.00014141 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 6.12001e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.72001e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 7.514e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 5.02e-06 [auto_parallel]: 5.61998e-06 [parallel]: 4.73001e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 3.20998e-06 [allreduce_fusion]: 3.06001e-06 [matmul_add_comm_reduction]: 5.62001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.26002e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.89003e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59e-06 [merge_recompute_call_nodes]: 8.49977e-07 [before_grad]: 8.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.02e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.297e-05 [a_3]: 3.665e-05 [py_interpret_to_execute_after_opt_a]: 8.17e-06 [slice_cell_reuse_recomputed_activation]: 2.54999e-06 [rewriter_after_opt_a]: 3.493e-05 [convert_after_rewriter]: 7.55e-06 [order_py_execute_after_rewriter]: 5.57001e-06 [mutable_eliminate]: 0.00050092 [opt_b]: 0.00019605, [1] [Cycle 1]: 0.00018967, 
[7] [b_1]: 0.00012102 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.30002e-06 [renormalize]: 4.39992e-07 [cse]: 1.57e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.252e-05 [loop_unroll]: 0.00043438 [opt_after_cconv]: 9.446e-05, [1] [Cycle 1]: 8.878e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.47001e-06 [cse]: 1.54e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 1.187e-05 [tuple_transform]: 6.876e-05, [1] [Cycle 1]: 6.417e-05, [4] [d_1]: 3.89e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.185e-05 [cse_after_recomputation]: 1.966e-05, [1] [Cycle 1]: 1.528e-05, [1] [cse]: 1.027e-05 [environ_conv]: 4.42e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.45002e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.44998e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 2.07999e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.15e-06 [overlap_grad_flash_sp]: 1.736e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.92999e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 6.845e-05, [1] [Cycle 1]: 6.404e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.61002e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00048219 [validate]: 3.023e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00590843 [execute]: 7.48e-06 Sums bootstrap : 0.000453s : 2.85% type_inference : 0.005524s : 34.69% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000057s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000642s : 4.04% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000158s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000022s : 0.14% optimize.opt_a.a_after_grad : 0.000019s : 0.12% optimize.opt_a.renormalize : 0.000422s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000040s : 0.25% optimize.opt_a.a_3 : 0.000081s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000035s : 0.22% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000501s : 3.15% optimize.opt_b.b_1 : 0.000121s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000434s : 2.73% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.26% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000482s : 3.03% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005908s : 37.11% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000169 30 15.05% : 0.000025s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 65.89% : 0.000111s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.83% : 0.000005s : 4: substitution.remove_not_recompute_node 2.79% : 0.000005s : 4: substitution.replace_old_param 6.73% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005484 2 89.82% : 0.004926s : 1: type_inference.infer 10.18% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000059 5 79.62% : 0.000047s : 3: replace.inline 20.38% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.50% : 0.000109s : 3: match.inline 8.50% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000175 1131 0.96% : 0.000002s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 1.04% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.33% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.29% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.47% : 0.000011s : 51: predicate.inline 0.95% : 0.000002s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000002s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.68% : 0.000005s : 32: predicate.load_eliminater 1.01% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.37% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000002s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.46% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000002s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.66% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000002s : 8: predicate.specialize_transform 0.86% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000004s : 24: predicate.switch_layer_defer_inline 4.77% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.92% : 0.000002s : 11: predicate.transpose_eliminate 1.45% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 43.83% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.17% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028587 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.29% : 0.002942s : 1: add_attr 10.26% : 0.002933s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.71% : 0.000489s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.55% : 0.000443s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.78% : 0.000510s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 3.63% : 0.001038s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000099s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.89% : 0.002257s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.72% : 0.000492s : 1: opt_after_jit_grad 0.70% : 0.000199s : 1: opt_b 14.62% : 0.004179s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.74% : 0.000212s : 1: renormalize.infer 0.71% : 0.000203s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000039s : 1: rewriter_after_opt_a 0.21% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 20.70% : 0.005918s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.37% : 0.005538s : 1: type_inference 0.19% : 0.000055s : 1: validate TotalTime = 0.037162, [24] [bootstrap]: 0.00050105 [type_inference]: 0.0113714 [event_method]: 4.569e-05 [auto_monad]: 0.00011888 [graph_reusing]: 7.92e-06 [inline]: 2.47001e-06 [add_attr]: 0.00302759, [1] [add_attr_with_inline]: 0.00301939, [1] [Cycle 1]: 6.924e-05, [2] [tag_attr]: 3.432e-05 [meta_addattr_fg_expand]: 9.64e-06 [parallel-infer-symbol]: 2.75997e-06 [pre_auto_parallel]: 4.971e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0131239, [53] [py_interpret_to_execute]: 3.743e-05 [rewriter_before_opt_a]: 0.00014595 [opt_a]: 0.0108962, [3] [Cycle 1]: 0.00698317, [45] [expand_dump_flag]: 3.95e-06 [switch_simplify]: 7.409e-05 [loop_unroll]: 6.105e-05 [a_1]: 0.00142782 [with_stream_mark]: 2.244e-05 [recompute_prepare]: 2.143e-05 [updatestate_depend_eliminate]: 9.27999e-06 [updatestate_assign_eliminate]: 7.70998e-06 [updatestate_loads_eliminate]: 7.39002e-06 [parameter_eliminate]: 2.57001e-06 [a_2]: 0.0002435 [accelerated_algorithm]: 2.999e-05 [shard]: 1.69e-06 [meta_shard_fg_expand]: 3.66999e-06 [shard_inline]: 1.588e-05 [merge_send_recv]: 1.578e-05 [auto_parallel]: 1.119e-05 [parallel]: 1.831e-05 [flash_sp]: 1.153e-05 [merge_comm]: 9.77999e-06 [allreduce_fusion]: 8.97999e-06 [matmul_add_comm_reduction]: 2.569e-05 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 1.82e-05 [virtual_dataset]: 1.567e-05 [get_grad_eliminate_]: 1.52e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.02e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.735e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.858e-05 [merge_recompute_call_nodes]: 1.41002e-06 [before_grad]: 2.693e-05 [set_forward_comm_id_for_comm_node_pass]: 9.32001e-06 [meta_fg_expand]: 0.0014097 [flash_sp_send_recv_attached]: 3.75e-06 [receive_attached]: 2.72001e-06 [after_resolve]: 
5.927e-05 [a_after_grad]: 8.058e-05 [renormalize]: 0.00235683 [add_forward_monad_depend]: 9.14998e-06 [auto_monad_grad]: 5.24998e-06 [auto_monad_eliminator]: 9.07e-05 [cse]: 0.00016312 [a_3]: 0.00033281 [Cycle 2]: 0.00298913, [45] [expand_dump_flag]: 1.51998e-06 [switch_simplify]: 4.647e-05 [loop_unroll]: 4.327e-05 [a_1]: 0.00152544 [with_stream_mark]: 1.143e-05 [recompute_prepare]: 1.106e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.42e-06 [updatestate_loads_eliminate]: 3.75998e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.0001256 [accelerated_algorithm]: 1.186e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.23002e-06 [merge_send_recv]: 6.68e-06 [auto_parallel]: 7.16001e-06 [parallel]: 5.32001e-06 [flash_sp]: 4.2e-06 [merge_comm]: 5.91e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.95e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.043e-05 [virtual_dataset]: 8.84e-06 [get_grad_eliminate_]: 8.60999e-06 [virtual_output]: 9.39e-06 [merge_forward]: 5.04003e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.718e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 6.837e-05 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.18001e-06 [after_resolve]: 1.541e-05 [a_after_grad]: 1.447e-05 [renormalize]: 0.00058477 [add_forward_monad_depend]: 3.9e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.42e-05 [cse]: 4.685e-05 [a_3]: 6.394e-05 [Cycle 3]: 0.00090997, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.043e-05 [loop_unroll]: 8.87999e-06 [a_1]: 0.00024718 [with_stream_mark]: 9.61e-06 [recompute_prepare]: 9.41998e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 3.91001e-06 [parameter_eliminate]: 
9.80013e-07 [a_2]: 0.00013239 [accelerated_algorithm]: 1.227e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.01001e-06 [parallel]: 4.90001e-06 [flash_sp]: 1.09998e-06 [merge_comm]: 4.75999e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.57998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 9.98002e-06 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.21002e-06 [merge_forward]: 4.45999e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.578e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.387e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.495e-05 [a_after_grad]: 1.501e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.46998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.086e-05 [cse]: 2.626e-05 [a_3]: 5.979e-05 [py_interpret_to_execute_after_opt_a]: 1.03e-05 [slice_cell_reuse_recomputed_activation]: 1.74998e-06 [rewriter_after_opt_a]: 4.609e-05 [convert_after_rewriter]: 9.47001e-06 [order_py_execute_after_rewriter]: 6.79999e-06 [mutable_eliminate]: 0.00045207 [opt_b]: 0.00028172, [1] [Cycle 1]: 0.00027575, [7] [b_1]: 0.00018622 [b_2]: 1.081e-05 [updatestate_depend_eliminate]: 7.1e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 4.17e-06 [renormalize]: 3.09985e-07 [cse]: 3.004e-05 [optimize_parallel_all_gather_comm]: 1.988e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 1.89e-05 [loop_unroll]: 0.00041696 [opt_after_cconv]: 0.00013469, [1] [Cycle 1]: 0.00012885, [7] [c_1]: 4.813e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 7.04001e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 4.08001e-06 [cse]: 
2.938e-05 [renormalize]: 6.10016e-07 [remove_dup_value]: 2.873e-05 [tuple_transform]: 0.00010128, [1] [Cycle 1]: 9.645e-05, [4] [d_1]: 6.586e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 9.82999e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.574e-05 [cse_after_recomputation]: 3.214e-05, [1] [Cycle 1]: 2.728e-05, [1] [cse]: 2.203e-05 [environ_conv]: 8.87e-06 [swap_dp_allreduce_reducescatter]: 7.99002e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.55002e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.03997e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.695e-05 [grouped_pairwise_exchange_alltoall]: 1.42e-06 [offloading_packed_experts]: 5.00001e-06 [overlap_recompute_and_grad_model_parallel]: 5.29998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 5.07e-06 [overlap_grad_flash_sp]: 2.323e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.06997e-06 [symbol_engine_optimizer]: 9.651e-05, [1] [Cycle 1]: 9.233e-05, [6] [build]: 9.27001e-06 [elim_shapecalc]: 1.325e-05 [elim_not_effective]: 1.808e-05 [opt_reshape]: 9.61998e-06 [fold_const_symbol]: 1.487e-05 [renormalize]: 2.09984e-07 [detach_backward]: 1.49e-06 
[pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 2.459e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.00046005 [validate]: 4.309e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.00815648 [execute]: 6.89999e-06 Sums bootstrap : 0.000501s : 1.52% type_inference : 0.011371s : 34.57% event_method : 0.000046s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000146s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000113s : 0.34% optimize.opt_a.a_1 : 0.003200s : 9.73% optimize.opt_a.with_stream_mark : 0.000043s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000501s : 1.52% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001481s : 4.50% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.002942s : 8.94% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000116s : 0.35% optimize.opt_a.cse : 0.000236s : 0.72% optimize.opt_a.a_3 : 0.000457s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000452s : 1.37% optimize.opt_b.b_1 : 0.000186s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000417s : 1.27% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000460s : 1.40% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008156s : 24.80% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000753 222 5.88% : 0.000044s : 12: substitution.arithmetic_simplify 1.83% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.63% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.22% : 0.000416s : 17: substitution.inline 2.02% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000014s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.81% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.82% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011299 2 87.23% : 0.009856s : 1: type_inference.infer 12.77% : 0.001443s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.87% : 0.000126s : 17: replace.inline 42.13% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 33 92.14% : 0.000407s : 17: match.inline 7.86% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000748 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.78% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.71% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.94% : 
0.000037s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001535 34 57.55% : 0.000883s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.45% : 0.000651s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061445 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.93% : 0.003032s : 1: add_attr 4.92% : 0.003023s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000126s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.87% : 0.000536s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000052s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.92% : 0.004866s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000172s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.74% : 0.010899s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.76% : 0.000470s : 1: opt_after_jit_grad 0.46% : 0.000285s : 1: opt_b 21.36% : 0.013128s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.57% : 0.001580s : 2: renormalize.infer 2.19% : 0.001349s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000099s : 1: symbol_engine_optimizer 13.29% : 0.008167s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.53% : 0.011387s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0185415, [24] [bootstrap]: 0.00045436 [type_inference]: 0.00428928 [event_method]: 1.057e-05 [auto_monad]: 5.191e-05 [graph_reusing]: 5.70001e-06 [inline]: 1.91e-06 [add_attr]: 0.00298837, [1] [add_attr_with_inline]: 0.00297918, [1] [Cycle 1]: 4.795e-05, [2] [tag_attr]: 1.164e-05 [meta_addattr_fg_expand]: 3.57002e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 2.207e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0036861, [53] [py_interpret_to_execute]: 1.613e-05 [rewriter_before_opt_a]: 3.843e-05 [opt_a]: 0.00184782, [2] [Cycle 1]: 0.00125263, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 2.645e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.00029245 [with_stream_mark]: 1.567e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.82998e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.42002e-06 [parameter_eliminate]: 2.02001e-06 [a_2]: 7.73e-05 [accelerated_algorithm]: 6.44001e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.33999e-06 [auto_parallel]: 6.06e-06 [parallel]: 1.803e-05 [flash_sp]: 6.94001e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 9.35001e-06 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 6.19001e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 9.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.102e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 8.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.87998e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.49001e-06 
[after_resolve]: 9.92999e-06 [a_after_grad]: 8.77e-06 [renormalize]: 0.00033557 [add_forward_monad_depend]: 4.52998e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.603e-05 [a_3]: 3.951e-05 [Cycle 2]: 0.00058594, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.22e-06 [a_1]: 0.0001247 [with_stream_mark]: 1.103e-05 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.682e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.26998e-06 [parallel]: 3.93999e-06 [flash_sp]: 2.97002e-06 [merge_comm]: 2.86999e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 2.99973e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.26998e-06 [get_grad_eliminate_]: 5.06002e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.92998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.87002e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 8.80999e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.08998e-06 [cse]: 1.183e-05 [a_3]: 3.131e-05 [py_interpret_to_execute_after_opt_a]: 7.29001e-06 [slice_cell_reuse_recomputed_activation]: 1.75001e-06 [rewriter_after_opt_a]: 3.532e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.51e-06 [mutable_eliminate]: 0.00044007 [opt_b]: 0.00022065, [1] [Cycle 1]: 
0.00021485, [7] [b_1]: 0.00010698 [b_2]: 7.02002e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.59001e-06 [renormalize]: 3.60014e-07 [cse]: 5.628e-05 [optimize_parallel_all_gather_comm]: 1.989e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.298e-05 [loop_unroll]: 0.00040958 [opt_after_cconv]: 9.328e-05, [1] [Cycle 1]: 8.76e-05, [7] [c_1]: 2.739e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.10002e-06 [cse]: 1.555e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.213e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.42e-05, [4] [d_1]: 3.916e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 5.94999e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.239e-05 [cse_after_recomputation]: 1.965e-05, [1] [Cycle 1]: 1.543e-05, [1] [cse]: 1.038e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.32999e-06 [bias_add_comm_swap]: 2.26003e-06 [label_micro_interleaved_index]: 4.53001e-06 [label_fine_grained_interleaved_index]: 2.37999e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.61999e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.30999e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.15e-05 [grouped_pairwise_exchange_alltoall]: 1.81998e-06 [offloading_packed_experts]: 4.07998e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.49998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.592e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.758e-05, [1] [Cycle 1]: 6.359e-05, [6] [build]: 2.49001e-06 [elim_shapecalc]: 8.13001e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 5.95002e-06 [fold_const_symbol]: 8.92999e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.63002e-06 [pipeline_parallel_scheduler]: 1.69998e-06 [auto_monad_reorder]: 1.492e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00043768 [validate]: 2.965e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00633369 [execute]: 6.83998e-06 Sums bootstrap : 0.000454s : 3.11% type_inference : 0.004289s : 29.38% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000033s : 0.23% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.86% optimize.opt_a.with_stream_mark : 0.000027s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000336s : 2.30% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.24% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000440s : 3.01% optimize.opt_b.b_1 : 0.000107s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000056s : 0.39% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.14% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000410s : 2.81% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000438s : 3.00% validate : 0.000030s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006334s : 43.38% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.66% : 0.000023s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.64% : 0.000006s : 4: substitution.graph_param_transform 65.68% : 0.000079s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.84% : 0.000003s : 4: substitution.replace_old_param ------[type_inference.] 0.004247 2 91.72% : 0.003895s : 1: type_inference.infer 8.28% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.76% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.33% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.97% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.71% : 0.000001s : 8: predicate.incorporate_call_switch 6.12% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.51% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.79% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.46% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.85% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.73% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.27% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026472 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.31% : 0.002993s : 1: add_attr 11.27% : 0.002983s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.85% : 0.000489s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000449s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.99% : 0.001851s : 1: opt_a 0.36% : 0.000097s : 1: opt_after_cconv 1.69% : 0.000447s : 1: opt_after_jit_grad 0.85% : 0.000224s : 1: opt_b 13.94% : 0.003690s : 1: optimize 0.09% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.69% : 0.000184s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.15% : 0.000039s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.96% : 0.006343s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.26% : 0.004303s : 1: type_inference 0.21% : 0.000055s : 1: validate TotalTime = 0.0358689, [24] [bootstrap]: 0.00048076 [type_inference]: 0.0101782 [event_method]: 4.088e-05 [auto_monad]: 0.00011388 [graph_reusing]: 7.82998e-06 [inline]: 1.89999e-06 [add_attr]: 0.00304957, [1] [add_attr_with_inline]: 0.00304125, [1] [Cycle 1]: 6.604e-05, [2] [tag_attr]: 3.131e-05 [meta_addattr_fg_expand]: 8.25e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 4.692e-05 [insert-virtual-dataset]: 2.79999e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.36e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.0129687, [53] [py_interpret_to_execute]: 3.516e-05 [rewriter_before_opt_a]: 0.00012408 [opt_a]: 0.0107422, [3] [Cycle 1]: 0.00686903, [45] [expand_dump_flag]: 3.66999e-06 [switch_simplify]: 6.647e-05 [loop_unroll]: 5.523e-05 [a_1]: 0.00138075 [with_stream_mark]: 2.226e-05 [recompute_prepare]: 2.197e-05 [updatestate_depend_eliminate]: 9.05999e-06 [updatestate_assign_eliminate]: 7.85e-06 [updatestate_loads_eliminate]: 7.65e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.00024356 [accelerated_algorithm]: 3.059e-05 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 3.61001e-06 [shard_inline]: 1.657e-05 [merge_send_recv]: 1.578e-05 [auto_parallel]: 1.09e-05 [parallel]: 1.72e-05 [flash_sp]: 1.103e-05 [merge_comm]: 9.95002e-06 [allreduce_fusion]: 8.87999e-06 [matmul_add_comm_reduction]: 2.571e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 1.793e-05 [virtual_dataset]: 1.544e-05 [get_grad_eliminate_]: 1.553e-05 [virtual_output]: 1.521e-05 [merge_forward]: 9.42999e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 1.726e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.897e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.733e-05 [set_forward_comm_id_for_comm_node_pass]: 9.53002e-06 [meta_fg_expand]: 0.00137674 [flash_sp_send_recv_attached]: 3.93001e-06 [receive_attached]: 2.86e-06 
[after_resolve]: 5.933e-05 [a_after_grad]: 8.242e-05 [renormalize]: 0.00236632 [add_forward_monad_depend]: 8.81997e-06 [auto_monad_grad]: 5.12999e-06 [auto_monad_eliminator]: 5.585e-05 [cse]: 0.00016149 [a_3]: 0.00033791 [Cycle 2]: 0.00296816, [45] [expand_dump_flag]: 1.57001e-06 [switch_simplify]: 4.747e-05 [loop_unroll]: 4.425e-05 [a_1]: 0.00153834 [with_stream_mark]: 1.205e-05 [recompute_prepare]: 1.114e-05 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 3.66999e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012738 [accelerated_algorithm]: 1.213e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 9.40001e-06 [merge_send_recv]: 6.56999e-06 [auto_parallel]: 7.29001e-06 [parallel]: 4.80001e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.26002e-06 [allreduce_fusion]: 4.91002e-06 [matmul_add_comm_reduction]: 7.48999e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 1.05e-05 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 8.95001e-06 [virtual_output]: 8.41002e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.04998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.661e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.405e-05 [set_forward_comm_id_for_comm_node_pass]: 5.50001e-06 [meta_fg_expand]: 3.407e-05 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 1.636e-05 [a_after_grad]: 1.568e-05 [renormalize]: 0.00057476 [add_forward_monad_depend]: 4.20999e-06 [auto_monad_grad]: 1.18001e-06 [auto_monad_eliminator]: 1.391e-05 [cse]: 4.471e-05 [a_3]: 6.508e-05 [Cycle 3]: 0.00089093, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 1.057e-05 [loop_unroll]: 8.80001e-06 [a_1]: 0.00025009 [with_stream_mark]: 9.88998e-06 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.85001e-06 [updatestate_assign_eliminate]: 3.88999e-06 
[updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.0001232 [accelerated_algorithm]: 1.187e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 9.03002e-06 [merge_send_recv]: 6.84001e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.45999e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.14e-06 [allreduce_fusion]: 4.83001e-06 [matmul_add_comm_reduction]: 7.68999e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 9.79e-06 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 8.40999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.587e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12e-06 [meta_fg_expand]: 3.11001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.356e-05 [a_after_grad]: 1.439e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.24998e-06 [auto_monad_grad]: 9.99979e-07 [auto_monad_eliminator]: 9.82001e-06 [cse]: 2.435e-05 [a_3]: 5.618e-05 [py_interpret_to_execute_after_opt_a]: 1e-05 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 5.015e-05 [convert_after_rewriter]: 9.22001e-06 [order_py_execute_after_rewriter]: 7.29001e-06 [mutable_eliminate]: 0.00045097 [opt_b]: 0.00028606, [1] [Cycle 1]: 0.00027977, [7] [b_1]: 0.00018873 [b_2]: 1.065e-05 [updatestate_depend_eliminate]: 7.05002e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 4.02e-06 [renormalize]: 4.70027e-07 [cse]: 2.979e-05 [optimize_parallel_all_gather_comm]: 2.052e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 1.978e-05 [loop_unroll]: 0.00041817 [opt_after_cconv]: 0.00013513, [1] [Cycle 1]: 0.00012949, [7] [c_1]: 4.854e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 7.14001e-06 
[updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 3.73999e-06 [cse]: 2.98e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.847e-05 [tuple_transform]: 0.00010121, [1] [Cycle 1]: 9.65e-05, [4] [d_1]: 6.642e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.014e-05 [partial_unused_args_eliminate]: 2.14e-06 [add_recomputation]: 6.029e-05 [cse_after_recomputation]: 3.234e-05, [1] [Cycle 1]: 2.756e-05, [1] [cse]: 2.18e-05 [environ_conv]: 8.47998e-06 [swap_dp_allreduce_reducescatter]: 8.32998e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.45999e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.21998e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.32e-06 [overlap_opt_shard_in_pipeline]: 1.00999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.632e-05 [grouped_pairwise_exchange_alltoall]: 1.67999e-06 [offloading_packed_experts]: 4.85001e-06 [overlap_recompute_and_grad_model_parallel]: 5.71e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.84e-06 [overlap_grad_ring_attention]: 5.58002e-06 [overlap_grad_flash_sp]: 2.417e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.18002e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.869e-05, [1] [Cycle 1]: 9.448e-05, [6] [build]: 9.93002e-06 [elim_shapecalc]: 1.33e-05 [elim_not_effective]: 1.797e-05 [opt_reshape]: 1.028e-05 
[fold_const_symbol]: 1.496e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.41998e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.476e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.23e-06 [opt_after_jit_grad]: 0.00048635 [validate]: 4.591e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00819449 [execute]: 6.89001e-06 Sums bootstrap : 0.000481s : 1.52% type_inference : 0.010178s : 32.25% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000124s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003169s : 10.04% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp 
: 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001414s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.28% optimize.opt_a.a_after_grad : 0.000112s : 0.36% optimize.opt_a.renormalize : 0.002941s : 9.32% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000231s : 0.73% optimize.opt_a.a_3 : 0.000459s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000451s : 1.43% optimize.opt_b.b_1 : 0.000189s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000418s : 1.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.19% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000486s : 1.54% validate : 0.000046s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008194s : 25.96% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000732 218 5.71% : 0.000042s : 11: substitution.arithmetic_simplify 1.93% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 54.73% : 0.000400s : 16: substitution.inline 2.24% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000015s : 3: substitution.less_batch_normalization 1.85% : 0.000014s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.25% : 0.000024s : 10: substitution.replace_applicator 1.49% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.79% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010111 2 87.50% : 0.008848s : 1: type_inference.infer 12.50% : 0.001264s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.82% : 0.000119s : 16: replace.inline 41.18% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.75% : 0.000392s : 16: match.inline 7.25% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000737 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.21% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.66% : 0.000042s : 244: predicate.inline 1.29% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.64% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.95% : 0.000014s : 97: predicate.partial_defer_inline 1.69% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.69% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001450 32 57.62% : 0.000836s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.38% : 0.000614s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059980 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.09% : 0.003054s : 1: add_attr 5.08% : 0.003045s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000120s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.86% : 0.000514s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.04% : 0.004823s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.91% : 0.010745s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.83% : 0.000496s : 1: opt_after_jit_grad 0.48% : 0.000290s : 1: opt_b 21.63% : 0.012973s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000052s : 1: pre_auto_parallel 0.06% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.64% : 0.001582s : 2: renormalize.infer 2.25% : 0.001348s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.21% : 0.000129s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.68% : 0.008205s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 16.99% : 0.010193s : 1: type_inference 0.13% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x8-kbk],max_mem:42.0M TotalTime = 0.875369, [24] [bootstrap]: 0.00060696 [type_inference]: 0.00679428 [event_method]: 1.399e-05 [auto_monad]: 5.438e-05 [graph_reusing]: 5.08002e-06 [inline]: 1.92001e-06 [add_attr]: 0.00433634, [1] [add_attr_with_inline]: 0.00432184, [1] [Cycle 1]: 4.483e-05, [2] [tag_attr]: 1.542e-05 [meta_addattr_fg_expand]: 4.07e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.922e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00399201, [53] [py_interpret_to_execute]: 2.01e-05 [rewriter_before_opt_a]: 5.874e-05 [opt_a]: 0.00213364, [2] [Cycle 1]: 0.00152884, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 3.12e-05 [loop_unroll]: 2.127e-05 [a_1]: 0.00045644 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 2.91e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.567e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.17999e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.821e-05 [flash_sp]: 7.12997e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.58001e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.33002e-06 [virtual_output]: 6.22001e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.43001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 
[merge_recompute_call_nodes]: 1.76e-06 [before_grad]: 9.21002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.84002e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.43998e-06 [receive_attached]: 2.49999e-06 [after_resolve]: 1.102e-05 [a_after_grad]: 8.73001e-06 [renormalize]: 0.00040823 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 2.17999e-06 [auto_monad_eliminator]: 1.301e-05 [cse]: 2.618e-05 [a_3]: 4.033e-05 [Cycle 2]: 0.0005956, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 7e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.0001339 [with_stream_mark]: 9.62999e-06 [recompute_prepare]: 5.61003e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.778e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.03002e-06 [parallel]: 4.4e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.19e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.12001e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.49999e-06 [cse]: 1.276e-05 [a_3]: 3.141e-05 [py_interpret_to_execute_after_opt_a]: 7.28999e-06 [slice_cell_reuse_recomputed_activation]: 
1.91e-06 [rewriter_after_opt_a]: 2.983e-05 [convert_after_rewriter]: 6.61e-06 [order_py_execute_after_rewriter]: 5.57001e-06 [mutable_eliminate]: 0.00044343 [opt_b]: 0.00018635, [1] [Cycle 1]: 0.00018032, [7] [b_1]: 0.00011284 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 4.60015e-07 [cse]: 1.586e-05 [optimize_parallel_all_gather_comm]: 1.563e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.215e-05 [loop_unroll]: 0.00041025 [opt_after_cconv]: 9.29e-05, [1] [Cycle 1]: 8.713e-05, [7] [c_1]: 2.737e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.526e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 6.817e-05, [1] [Cycle 1]: 6.381e-05, [4] [d_1]: 3.842e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 7.902e-05 [cse_after_recomputation]: 2.033e-05, [1] [Cycle 1]: 1.599e-05, [1] [cse]: 1.088e-05 [environ_conv]: 4.60001e-06 [swap_dp_allreduce_reducescatter]: 4.98001e-06 [bias_add_comm_swap]: 2.35997e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.78e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.14998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.122e-05 
[grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.653e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.78e-05, [1] [Cycle 1]: 6.388e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.453e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00044122 [validate]: 2.995e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.858816 [execute]: 9.21002e-06 Sums bootstrap : 0.000607s : 0.07% type_inference : 0.006794s : 0.78% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000590s : 0.07% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000010s : 0.00% optimize.opt_a.parallel : 0.000053s : 0.01% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000408s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000443s : 0.05% optimize.opt_b.b_1 : 0.000113s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000410s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000079s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000441s : 0.05% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.858816s : 98.71% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000164 30 15.05% : 0.000025s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.03% : 0.000005s : 4: substitution.graph_param_transform 66.32% : 0.000109s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.46% : 0.000004s : 4: substitution.remove_not_recompute_node 2.52% : 0.000004s : 4: substitution.replace_old_param 6.99% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006751 2 91.61% : 0.006185s : 1: type_inference.infer 8.39% : 0.000566s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.33% : 0.000027s : 3: replace.inline 29.67% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.06% : 0.000107s : 3: match.inline 8.94% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.95% : 0.000002s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.91% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.18% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.50% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.38% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.51% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.59% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.86% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000344 8 45.12% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.88% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.885205 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.49% : 0.004341s : 1: add_attr 0.49% : 0.004325s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000083s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000642s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000418s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000452s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000012s : 1: opt.transform.mutable_eliminate 0.11% : 0.000954s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000095s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002136s : 1: opt_a 0.01% : 0.000096s : 1: opt_after_cconv 0.05% : 0.000450s : 1: opt_after_jit_grad 0.02% : 0.000190s : 1: opt_b 0.45% : 0.003996s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000212s : 1: renormalize.infer 0.02% : 0.000189s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000033s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.02% : 0.858837s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.77% : 0.006808s : 1: type_inference 0.01% : 0.000051s : 1: validate TotalTime = 0.0707595, [24] [bootstrap]: 0.00042264 [type_inference]: 0.00442168 [event_method]: 1.045e-05 [auto_monad]: 4.969e-05 [graph_reusing]: 5.09e-06 [inline]: 1.77999e-06 [add_attr]: 0.00291358, [1] [add_attr_with_inline]: 0.00290564, [1] [Cycle 1]: 4.207e-05, [2] [tag_attr]: 1.209e-05 [meta_addattr_fg_expand]: 3.42002e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.092e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00362591, [53] [py_interpret_to_execute]: 1.47e-05 [rewriter_before_opt_a]: 3.778e-05 [opt_a]: 0.00184249, [2] [Cycle 1]: 0.00125001, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 2.334e-05 [loop_unroll]: 1.333e-05 [a_1]: 0.00028712 [with_stream_mark]: 1.261e-05 [recompute_prepare]: 7.35998e-06 [updatestate_depend_eliminate]: 3.97e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 7.486e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 6.19999e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 5.44e-06 [parallel]: 1.699e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 9.10999e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.28999e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 9.14e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.46998e-06 [flash_sp_send_recv_attached]: 2.24999e-06 [receive_attached]: 2.42001e-06 
[after_resolve]: 1.102e-05 [a_after_grad]: 8.94998e-06 [renormalize]: 0.00033429 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.292e-05 [cse]: 2.853e-05 [a_3]: 3.855e-05 [Cycle 2]: 0.0005834, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.70002e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012261 [with_stream_mark]: 9.09e-06 [recompute_prepare]: 5.91998e-06 [updatestate_depend_eliminate]: 2.62001e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.752e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.24997e-06 [auto_parallel]: 5.00999e-06 [parallel]: 3.78999e-06 [flash_sp]: 3.14999e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.62001e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 5.89999e-06 [virtual_dataset]: 5.11997e-06 [get_grad_eliminate_]: 4.82e-06 [virtual_output]: 4.77998e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.71999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.60001e-06 [a_after_grad]: 7.82e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.259e-05 [a_3]: 3.139e-05 [py_interpret_to_execute_after_opt_a]: 7.43999e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.182e-05 [convert_after_rewriter]: 7.21001e-06 [order_py_execute_after_rewriter]: 4.96997e-06 [mutable_eliminate]: 0.00044158 [opt_b]: 0.00017944, [1] [Cycle 1]: 0.00017352, [7] 
[b_1]: 0.00010671 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 8.80013e-07 [cse]: 1.609e-05 [optimize_parallel_all_gather_comm]: 1.549e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.166e-05 [loop_unroll]: 0.00040771 [opt_after_cconv]: 9.551e-05, [1] [Cycle 1]: 8.971e-05, [7] [c_1]: 2.759e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.652e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.209e-05 [tuple_transform]: 6.931e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.901e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.268e-05 [cse_after_recomputation]: 1.98e-05, [1] [Cycle 1]: 1.545e-05, [1] [cse]: 1.025e-05 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.89999e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.70997e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 [control_data_broadcast_order]: 1.125e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.53999e-06 [overlap_recompute_and_grad_model_parallel]: 4.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.683e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.71002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.767e-05, [1] [Cycle 1]: 6.373e-05, [6] [build]: 1.87001e-06 [elim_shapecalc]: 8.15e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.61e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.26001e-06 [opt_after_jit_grad]: 0.00044362 [validate]: 3.128e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.0585816 [execute]: 7.93999e-06 Sums bootstrap : 0.000423s : 0.63% type_inference : 0.004422s : 6.61% event_method : 0.000010s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000410s : 0.61% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000010s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000334s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000070s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000442s : 0.66% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000408s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058582s : 87.58% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000118 26 18.33% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.39% : 0.000005s : 4: substitution.graph_param_transform 65.17% : 0.000077s : 2: substitution.inline 2.39% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.50% : 0.000004s : 4: substitution.remove_not_recompute_node 3.67% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004383 2 90.94% : 0.003986s : 1: type_inference.infer 9.06% : 0.000397s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.98% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.95% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.31% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.44% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.91% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 1.05% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000253 6 43.90% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.10% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078545 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.71% : 0.002918s : 1: add_attr 3.70% : 0.002909s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.57% : 0.000451s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000416s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000451s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.96% : 0.000756s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.35% : 0.001845s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000453s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.62% : 0.003630s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.23% : 0.000182s : 1: renormalize.infer 0.19% : 0.000145s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.60% : 0.058597s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.65% : 0.004436s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.0718031, [24] [bootstrap]: 0.00048817 [type_inference]: 0.00550885 [event_method]: 1.42e-05 [auto_monad]: 5.489e-05 [graph_reusing]: 5.44e-06 [inline]: 1.67001e-06 [add_attr]: 0.00294011, [1] [add_attr_with_inline]: 0.00293237, [1] [Cycle 1]: 4.521e-05, [2] [tag_attr]: 1.571e-05 [meta_addattr_fg_expand]: 4.27e-06 [parallel-infer-symbol]: 2.90002e-06 [pre_auto_parallel]: 2.433e-05 [insert-virtual-dataset]: 2.22001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.16998e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00395956, [53] [py_interpret_to_execute]: 2.04e-05 [rewriter_before_opt_a]: 5.718e-05 [opt_a]: 0.00209146, [2] [Cycle 1]: 0.00149015, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 3.166e-05 [loop_unroll]: 2.061e-05 [a_1]: 0.00044349 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 7.98999e-06 [updatestate_depend_eliminate]: 3.80998e-06 [updatestate_assign_eliminate]: 3.27002e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 7.619e-05 [accelerated_algorithm]: 6.52001e-06 [shard]: 2.02999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.19002e-06 [auto_parallel]: 5.71e-06 [parallel]: 1.633e-05 [flash_sp]: 7.38999e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.36999e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.91002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 2.64999e-06 
[after_resolve]: 9.74e-06 [a_after_grad]: 8.83001e-06 [renormalize]: 0.00041373 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.627e-05 [a_3]: 4.086e-05 [Cycle 2]: 0.00059173, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.00012418 [with_stream_mark]: 9.49999e-06 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.754e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.32001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.07e-06 [parallel]: 3.75e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 2.88003e-06 [allreduce_fusion]: 2.76999e-06 [matmul_add_comm_reduction]: 6.28e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.19998e-06 [get_grad_eliminate_]: 4.82e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 7.30998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36998e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 7.75998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 7.94002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.313e-05 [a_3]: 3.341e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 3.09e-05 [convert_after_rewriter]: 7.16999e-06 [order_py_execute_after_rewriter]: 4.84003e-06 [mutable_eliminate]: 0.00044986 [opt_b]: 0.00017972, [1] [Cycle 1]: 0.00017378, [7] 
[b_1]: 0.00010655 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 4.2998e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.534e-05 [overlap_param_gather]: 1.96998e-06 [cconv]: 2.063e-05 [loop_unroll]: 0.00045725 [opt_after_cconv]: 9.461e-05, [1] [Cycle 1]: 8.882e-05, [7] [c_1]: 2.743e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.621e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.201e-05 [tuple_transform]: 6.94e-05, [1] [Cycle 1]: 6.517e-05, [4] [d_1]: 3.926e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.29978e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.414e-05 [cse_after_recomputation]: 2.009e-05, [1] [Cycle 1]: 1.578e-05, [1] [cse]: 1.061e-05 [environ_conv]: 4.28001e-06 [swap_dp_allreduce_reducescatter]: 5.36998e-06 [bias_add_comm_swap]: 3.06001e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.65001e-06 [ForceFp32Comm]: 9.99979e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.28998e-06 [reorder_send_recv_between_fp_bp]: 2.30002e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.203e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.39001e-06 [overlap_grad_ring_attention]: 3.73999e-06 [overlap_grad_flash_sp]: 1.647e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 6.672e-05, [1] [Cycle 1]: 6.257e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.136e-05 [opt_reshape]: 5.84999e-06 [fold_const_symbol]: 8.53001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.564e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00044701 [validate]: 3.314e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0580882 [execute]: 8.00999e-06 Sums bootstrap : 0.000488s : 0.72% type_inference : 0.005509s : 8.11% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000568s : 0.84% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000020s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000414s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.66% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000457s : 0.67% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000447s : 0.66% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058088s : 85.54% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000160 30 14.59% : 0.000023s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 66.95% : 0.000107s : 3: substitution.inline 1.82% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.51% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005468 2 90.17% : 0.004931s : 1: type_inference.infer 9.83% : 0.000538s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.43% : 0.000026s : 3: replace.inline 30.57% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.84% : 0.000105s : 3: match.inline 8.16% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.08% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.53% : 0.000002s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 47.44% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.56% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080202 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.67% : 0.002944s : 1: add_attr 3.66% : 0.002936s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000523s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.58% : 0.000466s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.16% : 0.000932s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.61% : 0.002095s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000456s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.94% : 0.003963s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000028s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000205s : 1: renormalize.infer 0.25% : 0.000202s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000069s : 1: symbol_engine_optimizer 72.45% : 0.058104s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.88% : 0.005522s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 1.01893, [24] [bootstrap]: 0.0004729 [type_inference]: 0.0113687 [event_method]: 4.735e-05 [auto_monad]: 0.00011888 [graph_reusing]: 8.07e-06 [inline]: 2.01e-06 [add_attr]: 0.00301138, [1] [add_attr_with_inline]: 0.00300304, [1] [Cycle 1]: 6.917e-05, [2] [tag_attr]: 3.457e-05 [meta_addattr_fg_expand]: 9.08002e-06 [parallel-infer-symbol]: 2.85002e-06 [pre_auto_parallel]: 4.878e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0133343, [53] [py_interpret_to_execute]: 3.796e-05 [rewriter_before_opt_a]: 0.00014492 [opt_a]: 0.0110765, [3] [Cycle 1]: 0.00711759, [45] [expand_dump_flag]: 3.63999e-06 [switch_simplify]: 7.35e-05 [loop_unroll]: 6.129e-05 [a_1]: 0.00145261 [with_stream_mark]: 2.338e-05 [recompute_prepare]: 2.192e-05 [updatestate_depend_eliminate]: 8.92999e-06 [updatestate_assign_eliminate]: 8.27998e-06 [updatestate_loads_eliminate]: 7.26001e-06 [parameter_eliminate]: 2.56998e-06 [a_2]: 0.00024905 [accelerated_algorithm]: 2.993e-05 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 3.39001e-06 [shard_inline]: 1.626e-05 [merge_send_recv]: 1.627e-05 [auto_parallel]: 1.094e-05 [parallel]: 1.913e-05 [flash_sp]: 1.122e-05 [merge_comm]: 9.76e-06 [allreduce_fusion]: 8.94e-06 [matmul_add_comm_reduction]: 2.763e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.869e-05 [virtual_dataset]: 1.596e-05 [get_grad_eliminate_]: 1.544e-05 [virtual_output]: 1.533e-05 [merge_forward]: 9.24e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.775e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.928e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 2.73e-05 [set_forward_comm_id_for_comm_node_pass]: 9.98998e-06 [meta_fg_expand]: 0.00139781 [flash_sp_send_recv_attached]: 3.76999e-06 [receive_attached]: 2.09e-06 [after_resolve]: 
6.026e-05 [a_after_grad]: 8.278e-05 [renormalize]: 0.00246954 [add_forward_monad_depend]: 9.37999e-06 [auto_monad_grad]: 4.99e-06 [auto_monad_eliminator]: 5.747e-05 [cse]: 0.00016852 [a_3]: 0.00033746 [Cycle 2]: 0.00304514, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.745e-05 [loop_unroll]: 4.389e-05 [a_1]: 0.0015341 [with_stream_mark]: 1.168e-05 [recompute_prepare]: 1.073e-05 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 3.59002e-06 [parameter_eliminate]: 9.90025e-07 [a_2]: 0.00012676 [accelerated_algorithm]: 1.201e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.30001e-06 [merge_send_recv]: 7.16001e-06 [auto_parallel]: 7.25e-06 [parallel]: 5.00999e-06 [flash_sp]: 3.63e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.66001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.062e-05 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 9.04e-06 [virtual_output]: 8.51997e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.645e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.414e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 6.813e-05 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.19998e-06 [after_resolve]: 1.62e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.00059775 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.23002e-06 [auto_monad_eliminator]: 1.438e-05 [cse]: 4.606e-05 [a_3]: 9.895e-05 [Cycle 3]: 0.00090004, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.093e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00024969 [with_stream_mark]: 9.79e-06 [recompute_prepare]: 9.48997e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 3.83999e-06 [parameter_eliminate]: 
9.69972e-07 [a_2]: 0.00012318 [accelerated_algorithm]: 1.156e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 8.98002e-06 [merge_send_recv]: 7.03e-06 [auto_parallel]: 7.14001e-06 [parallel]: 4.54002e-06 [flash_sp]: 1.01997e-06 [merge_comm]: 4.97e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 7.53e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 9.72999e-06 [virtual_dataset]: 8.62e-06 [get_grad_eliminate_]: 8.48001e-06 [virtual_output]: 8.19002e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 8.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.573e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.324e-05 [a_after_grad]: 1.402e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 1.118e-05 [cse]: 2.768e-05 [a_3]: 5.873e-05 [py_interpret_to_execute_after_opt_a]: 1.046e-05 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 4.732e-05 [convert_after_rewriter]: 8.95001e-06 [order_py_execute_after_rewriter]: 7.29001e-06 [mutable_eliminate]: 0.00045408 [opt_b]: 0.00028833, [1] [Cycle 1]: 0.00028186, [7] [b_1]: 0.00018878 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.3e-06 [updatestate_assign_eliminate]: 4.07998e-06 [updatestate_loads_eliminate]: 3.9e-06 [renormalize]: 4.59986e-07 [cse]: 3.171e-05 [optimize_parallel_all_gather_comm]: 2.068e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.017e-05 [loop_unroll]: 0.00042369 [opt_after_cconv]: 0.00013703, [1] [Cycle 1]: 0.00013106, [7] [c_1]: 4.85e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 7.33999e-06 [updatestate_assign_eliminate]: 4.12998e-06 [updatestate_loads_eliminate]: 4.13001e-06 
[cse]: 3.076e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.828e-05 [tuple_transform]: 0.00010209, [1] [Cycle 1]: 9.747e-05, [4] [d_1]: 6.737e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.79e-06 [partial_unused_args_eliminate]: 1.73997e-06 [add_recomputation]: 5.593e-05 [cse_after_recomputation]: 3.222e-05, [1] [Cycle 1]: 2.757e-05, [1] [cse]: 2.213e-05 [environ_conv]: 9.75002e-06 [swap_dp_allreduce_reducescatter]: 7.41001e-06 [bias_add_comm_swap]: 2.98998e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.63998e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.741e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 5.05001e-06 [overlap_recompute_and_grad_model_parallel]: 5.67001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.53998e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.377e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 9.892e-05, [1] [Cycle 1]: 9.473e-05, [6] [build]: 1.027e-05 [elim_shapecalc]: 1.303e-05 [elim_not_effective]: 1.813e-05 [opt_reshape]: 1.006e-05 [fold_const_symbol]: 1.486e-05 [renormalize]: 2.29978e-07 [detach_backward]: 
1.47001e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.465e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00046303 [validate]: 4.449e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.989749 [execute]: 9.29998e-06 Sums bootstrap : 0.000473s : 0.05% type_inference : 0.011369s : 1.12% event_method : 0.000047s : 0.00% auto_monad : 0.000119s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.00% optimize.rewriter_before_opt_a : 0.000145s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.01% optimize.opt_a.loop_unroll : 0.000114s : 0.01% optimize.opt_a.a_1 : 0.003236s : 0.32% optimize.opt_a.with_stream_mark : 0.000045s : 0.00% optimize.opt_a.recompute_prepare : 0.000042s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000025s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000020s : 0.00% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000018s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.00% optimize.opt_a.meta_fg_expand : 0.001469s : 0.14% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.01% optimize.opt_a.a_after_grad : 0.000111s : 0.01% optimize.opt_a.renormalize : 0.003067s : 0.30% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.00% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.01% optimize.opt_a.cse : 0.000242s : 0.02% optimize.opt_a.a_3 : 0.000495s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.00% optimize.convert_after_rewriter : 0.000009s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000454s : 0.04% optimize.opt_b.b_1 : 0.000189s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000424s : 0.04% optimize.opt_after_cconv.c_1 : 0.000049s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.00% optimize.tuple_transform.d_1 : 0.000067s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000010s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000463s : 0.05% validate : 0.000044s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.989749s : 97.55% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 
0.000760 222 5.99% : 0.000046s : 12: substitution.arithmetic_simplify 1.79% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.90% : 0.000425s : 17: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.86% : 0.000014s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.14% : 0.000024s : 10: substitution.replace_applicator 1.35% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.59% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011295 2 86.77% : 0.009801s : 1: type_inference.infer 13.23% : 0.001494s : 1: type_inference.specialize ------[replace.] 0.000223 33 57.78% : 0.000129s : 17: replace.inline 42.22% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.51% : 0.000416s : 17: match.inline 7.49% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.10% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 100: predicate.arithmetic_simplify 1.17% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.43% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.35% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.66% : 0.000043s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.09% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001591 34 56.92% : 0.000906s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.08% : 0.000686s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.043609 237 0.00% : 0.000004s : 1: ForceFp32Comm 0.29% : 0.003016s : 1: add_attr 0.29% : 0.003007s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000126s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000500s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000021s : 1: control_data_broadcast_order 0.00% : 0.000012s : 1: convert_after_rewriter 0.00% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000013s : 1: environ_conv 0.01% : 0.000054s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000463s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.47% : 0.004940s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000175s : 28: opt.transform.opt_b 0.01% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.01% : 0.000053s : 4: opt.transform.symbol_engine_opt 1.06% : 0.011080s : 1: opt_a 0.01% : 0.000140s : 1: opt_after_cconv 0.05% : 0.000472s : 1: opt_after_jit_grad 0.03% : 0.000292s : 1: opt_b 1.28% : 0.013338s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.01% : 0.000053s : 1: pre_auto_parallel 0.00% : 0.000042s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.00% : 0.000032s : 1: remove_dup_value 0.15% : 0.001614s : 2: renormalize.infer 0.14% : 0.001440s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000051s : 1: rewriter_after_opt_a 0.01% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000102s : 1: symbol_engine_optimizer 94.84% : 0.989771s : 1: task_emit 0.01% : 0.000105s : 1: 
tuple_transform 1.09% : 0.011384s : 1: type_inference 0.01% : 0.000070s : 1: validate TotalTime = 0.0703521, [24] [bootstrap]: 0.0004198 [type_inference]: 0.00423557 [event_method]: 1.066e-05 [auto_monad]: 5.109e-05 [graph_reusing]: 4.72e-06 [inline]: 1.96e-06 [add_attr]: 0.00296367, [1] [add_attr_with_inline]: 0.00295542, [1] [Cycle 1]: 4.169e-05, [2] [tag_attr]: 1.191e-05 [meta_addattr_fg_expand]: 3.35998e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.084e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.46002e-06 [optimize]: 0.00363949, [53] [py_interpret_to_execute]: 1.505e-05 [rewriter_before_opt_a]: 3.888e-05 [opt_a]: 0.00184907, [2] [Cycle 1]: 0.00125136, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.603e-05 [loop_unroll]: 1.43e-05 [a_1]: 0.00029081 [with_stream_mark]: 1.275e-05 [recompute_prepare]: 7.15998e-06 [updatestate_depend_eliminate]: 3.60998e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.736e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.70997e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 7.9e-06 [auto_parallel]: 5.84999e-06 [parallel]: 1.799e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 9.10001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.92998e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.84001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.53e-06 
[after_resolve]: 1.052e-05 [a_after_grad]: 8.23001e-06 [renormalize]: 0.00034193 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.79998e-06 [auto_monad_eliminator]: 1.283e-05 [cse]: 2.664e-05 [a_3]: 4.115e-05 [Cycle 2]: 0.00058811, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00012409 [with_stream_mark]: 9.39e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.765e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.32001e-06 [merge_send_recv]: 4.09997e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.15e-06 [flash_sp]: 3.03998e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 4.96002e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.54001e-06 [virtual_dataset]: 5.34998e-06 [get_grad_eliminate_]: 5.06002e-06 [virtual_output]: 5.07999e-06 [merge_forward]: 2.71999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.79e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.70998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.82001e-06 [a_after_grad]: 8.27e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 5.86e-06 [cse]: 1.237e-05 [a_3]: 3.162e-05 [py_interpret_to_execute_after_opt_a]: 7.27002e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.028e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 4.79e-06 [mutable_eliminate]: 0.0004507 [opt_b]: 0.00017869, [1] [Cycle 1]: 0.00017243, [7] 
[b_1]: 0.00010631 [b_2]: 7.15998e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.4002e-07 [cse]: 1.598e-05 [optimize_parallel_all_gather_comm]: 1.592e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.162e-05 [loop_unroll]: 0.00040238 [opt_after_cconv]: 9.326e-05, [1] [Cycle 1]: 8.765e-05, [7] [c_1]: 2.712e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.591e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.23e-05 [tuple_transform]: 6.831e-05, [1] [Cycle 1]: 6.419e-05, [4] [d_1]: 3.844e-05 [none_parameter_eliminate]: 1.49003e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.43998e-06 [partial_unused_args_eliminate]: 1.50999e-06 [add_recomputation]: 4.462e-05 [cse_after_recomputation]: 2.127e-05, [1] [Cycle 1]: 1.684e-05, [1] [cse]: 1.156e-05 [environ_conv]: 5.17999e-06 [swap_dp_allreduce_reducescatter]: 5.51e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.68998e-06 [merge_cast_opt]: 1.51998e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.31002e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.136e-05 [grouped_pairwise_exchange_alltoall]: 1.75001e-06 [offloading_packed_experts]: 3.45998e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.53999e-06 [overlap_grad_flash_sp]: 1.654e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.93002e-06 [split_layernorm_comm]: 1.69998e-06 [handle_group_info]: 1.26997e-06 [symbol_engine_optimizer]: 6.811e-05, [1] [Cycle 1]: 6.402e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.67998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.66998e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.0004421 [validate]: 3.021e-05 [backend_pass]: 1.04003e-06 [task_emit]: 0.0583006 [execute]: 8.44002e-06 Sums bootstrap : 0.000420s : 0.63% type_inference : 0.004236s : 6.37% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000033s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000415s : 0.62% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000342s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.68% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000402s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000442s : 0.67% validate : 0.000030s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058301s : 87.75% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.21% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.29% : 0.000005s : 4: substitution.graph_param_transform 65.73% : 0.000079s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.65% : 0.000004s : 4: substitution.remove_not_recompute_node 3.18% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004195 2 91.66% : 0.003845s : 1: type_inference.infer 8.34% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.00% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.03% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.04% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.95% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.89% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.55% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.94% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 42.14% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.86% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078219 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.79% : 0.002968s : 1: add_attr 3.78% : 0.002959s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.57% : 0.000447s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000411s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.98% : 0.000768s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.37% : 0.001852s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000451s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.66% : 0.003643s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000185s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.56% : 0.058317s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.43% : 0.004250s : 1: type_inference 0.07% : 0.000051s : 1: validate TotalTime = 0.10992, [24] [bootstrap]: 0.00045009 [type_inference]: 0.0101053 [event_method]: 4.277e-05 [auto_monad]: 0.00011525 [graph_reusing]: 7.98001e-06 [inline]: 1.86003e-06 [add_attr]: 0.00298117, [1] [add_attr_with_inline]: 0.00297313, [1] [Cycle 1]: 6.606e-05, [2] [tag_attr]: 3.126e-05 [meta_addattr_fg_expand]: 8.81002e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 4.583e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0130742, [53] [py_interpret_to_execute]: 3.45e-05 [rewriter_before_opt_a]: 0.00012562 [opt_a]: 0.0108478, [3] [Cycle 1]: 0.00692974, [45] [expand_dump_flag]: 3.88001e-06 [switch_simplify]: 6.613e-05 [loop_unroll]: 5.503e-05 [a_1]: 0.00133192 [with_stream_mark]: 2.259e-05 [recompute_prepare]: 2.136e-05 [updatestate_depend_eliminate]: 8.94998e-06 [updatestate_assign_eliminate]: 7.34002e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 2.51998e-06 [a_2]: 0.00026984 [accelerated_algorithm]: 3.144e-05 [shard]: 1.84e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.656e-05 [merge_send_recv]: 1.67e-05 [auto_parallel]: 1.084e-05 [parallel]: 1.857e-05 [flash_sp]: 1.101e-05 [merge_comm]: 9.42999e-06 [allreduce_fusion]: 8.90001e-06 [matmul_add_comm_reduction]: 2.602e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.806e-05 [virtual_dataset]: 1.607e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.541e-05 [merge_forward]: 9.55001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.775e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.83e-05 [merge_recompute_call_nodes]: 1.41002e-06 [before_grad]: 2.711e-05 [set_forward_comm_id_for_comm_node_pass]: 1.005e-05 [meta_fg_expand]: 0.00138069 [flash_sp_send_recv_attached]: 3.83001e-06 [receive_attached]: 2.98e-06 
[after_resolve]: 5.886e-05 [a_after_grad]: 8.07e-05 [renormalize]: 0.00244606 [add_forward_monad_depend]: 8.75999e-06 [auto_monad_grad]: 4.95001e-06 [auto_monad_eliminator]: 5.599e-05 [cse]: 0.00016545 [a_3]: 0.00033341 [Cycle 2]: 0.00300052, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.678e-05 [loop_unroll]: 4.439e-05 [a_1]: 0.00152205 [with_stream_mark]: 1.156e-05 [recompute_prepare]: 1.09e-05 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012864 [accelerated_algorithm]: 1.197e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.30001e-06 [merge_send_recv]: 7.03e-06 [auto_parallel]: 7.28e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.40998e-06 [merge_comm]: 5.77999e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 7.81001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.025e-05 [virtual_dataset]: 9.09e-06 [get_grad_eliminate_]: 8.68001e-06 [virtual_output]: 8.52998e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 9.29984e-07 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.633e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.15999e-06 [meta_fg_expand]: 3.552e-05 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.491e-05 [a_after_grad]: 1.396e-05 [renormalize]: 0.00058452 [add_forward_monad_depend]: 4.52998e-06 [auto_monad_grad]: 1.09003e-06 [auto_monad_eliminator]: 1.426e-05 [cse]: 9.113e-05 [a_3]: 6.538e-05 [Cycle 3]: 0.00090331, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.043e-05 [loop_unroll]: 8.90001e-06 [a_1]: 0.00025008 [with_stream_mark]: 1.022e-05 [recompute_prepare]: 9.44998e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 3.86999e-06 
[updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012345 [accelerated_algorithm]: 1.139e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 8.97e-06 [merge_send_recv]: 6.96001e-06 [auto_parallel]: 7.03998e-06 [parallel]: 4.77e-06 [flash_sp]: 1.11997e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.78001e-06 [matmul_add_comm_reduction]: 7.70998e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 9.79999e-06 [virtual_dataset]: 8.85001e-06 [get_grad_eliminate_]: 8.62e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.27003e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 8.57e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.595e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 2.88998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.481e-05 [a_after_grad]: 1.476e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.31002e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.079e-05 [cse]: 2.622e-05 [a_3]: 6.012e-05 [py_interpret_to_execute_after_opt_a]: 1.022e-05 [slice_cell_reuse_recomputed_activation]: 2.37999e-06 [rewriter_after_opt_a]: 4.852e-05 [convert_after_rewriter]: 9.15999e-06 [order_py_execute_after_rewriter]: 6.74999e-06 [mutable_eliminate]: 0.00045735 [opt_b]: 0.00028687, [1] [Cycle 1]: 0.00028093, [7] [b_1]: 0.00018899 [b_2]: 1.074e-05 [updatestate_depend_eliminate]: 7.01001e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 4.15e-06 [renormalize]: 3.89991e-07 [cse]: 3.14e-05 [optimize_parallel_all_gather_comm]: 2.002e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.034e-05 [loop_unroll]: 0.0004211 [opt_after_cconv]: 0.00013552, [1] [Cycle 1]: 0.00012981, [7] [c_1]: 4.839e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.21999e-06 
[updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 3.85e-06 [cse]: 2.989e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.887e-05 [tuple_transform]: 0.00010057, [1] [Cycle 1]: 9.588e-05, [4] [d_1]: 6.625e-05 [none_parameter_eliminate]: 1.61998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.87999e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 5.623e-05 [cse_after_recomputation]: 3.216e-05, [1] [Cycle 1]: 2.754e-05, [1] [cse]: 2.239e-05 [environ_conv]: 9.61e-06 [swap_dp_allreduce_reducescatter]: 8.01001e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.40999e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.59999e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 9.29984e-07 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.61999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.672e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.10999e-06 [overlap_recompute_and_grad_model_parallel]: 5.37001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.05999e-06 [overlap_grad_flash_sp]: 2.323e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 9.657e-05, [1] [Cycle 1]: 9.226e-05, [6] [build]: 1.011e-05 [elim_shapecalc]: 1.283e-05 [elim_not_effective]: 1.759e-05 [opt_reshape]: 
9.81e-06 [fold_const_symbol]: 1.473e-05 [renormalize]: 1.69995e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.69998e-06 [auto_monad_reorder]: 2.409e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 4.16001e-06 [opt_after_jit_grad]: 0.00045901 [validate]: 4.504e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0823396 [execute]: 8.72998e-06 Sums bootstrap : 0.000450s : 0.43% type_inference : 0.010105s : 9.56% event_method : 0.000043s : 0.04% auto_monad : 0.000115s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000126s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000123s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003104s : 2.94% optimize.opt_a.with_stream_mark : 0.000044s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000522s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% 
optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001419s : 1.34% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.003031s : 2.87% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.08% optimize.opt_a.cse : 0.000283s : 0.27% optimize.opt_a.a_3 : 0.000459s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.43% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 
0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000421s : 0.40% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000459s : 0.43% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.082340s : 77.90% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000730 218 5.88% : 0.000043s : 11: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.99% : 0.000401s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000014s : 3: substitution.less_batch_normalization 1.83% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.43% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.70% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.50% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010037 2 86.98% : 0.008730s : 1: type_inference.infer 13.02% : 0.001307s : 1: type_inference.specialize ------[replace.] 0.000198 30 59.06% : 0.000117s : 16: replace.inline 40.94% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.85% : 0.000393s : 16: match.inline 7.15% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000737 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.20% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.80% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.68% : 0.000042s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.18% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 149: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.84% : 0.000036s : 265: 
predicate.switch_simplify 1.11% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001507 32 57.34% : 0.000864s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.66% : 0.000643s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.134105 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.23% : 0.002986s : 1: add_attr 2.22% : 0.002977s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.36% : 0.000477s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000049s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.56% : 0.004778s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000052s : 4: opt.transform.symbol_engine_opt 8.09% : 0.010851s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.35% : 0.000468s : 1: opt_after_jit_grad 0.22% : 0.000290s : 1: opt_b 9.75% : 0.013078s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000038s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.17% : 0.001570s : 2: renormalize.infer 1.08% : 0.001447s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000130s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000099s : 1: symbol_engine_optimizer 61.41% : 0.082356s : 1: task_emit 0.08% : 0.000103s : 1: 
tuple_transform 7.55% : 0.010120s : 1: type_inference 0.05% : 0.000068s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x8-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x9-pynative],max_mem:42.0M TotalTime = 0.0210948, [24] [bootstrap]: 0.00050541 [type_inference]: 0.00600482 [event_method]: 1.448e-05 [auto_monad]: 5.931e-05 [graph_reusing]: 5.19e-06 [inline]: 1.93997e-06 [add_attr]: 0.00336547, [1] [add_attr_with_inline]: 0.00335546, [1] [Cycle 1]: 4.416e-05, [2] [tag_attr]: 1.527e-05 [meta_addattr_fg_expand]: 4.27e-06 [parallel-infer-symbol]: 2.71999e-06 [pre_auto_parallel]: 2.643e-05 [insert-virtual-dataset]: 2.61999e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 1.91998e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00394689, [53] [py_interpret_to_execute]: 2.019e-05 [rewriter_before_opt_a]: 5.626e-05 [opt_a]: 0.00210269, [2] [Cycle 1]: 0.0015038, [45] [expand_dump_flag]: 2.99001e-06 [switch_simplify]: 3.133e-05 [loop_unroll]: 2.076e-05 [a_1]: 0.00045075 [with_stream_mark]: 1.293e-05 [recompute_prepare]: 8.17998e-06 [updatestate_depend_eliminate]: 3.60998e-06 [updatestate_assign_eliminate]: 3.40998e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.643e-05 [accelerated_algorithm]: 6.73e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.66998e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.49002e-06 [auto_parallel]: 5.90002e-06 [parallel]: 2.229e-05 [flash_sp]: 6.69001e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 8.14002e-06 [allreduce_slice_to_reducescatter]: 7.89994e-07 [virtual_shard_identity]: 7.37002e-06 [virtual_dataset]: 
5.87001e-06 [get_grad_eliminate_]: 5.56002e-06 [virtual_output]: 5.89999e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.76e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.84999e-06 [receive_attached]: 2.59001e-06 [after_resolve]: 1.058e-05 [a_after_grad]: 8.76002e-06 [renormalize]: 0.00041188 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.303e-05 [cse]: 2.771e-05 [a_3]: 3.968e-05 [Cycle 2]: 0.00058969, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012489 [with_stream_mark]: 9.04e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.88998e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.735e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.44998e-06 [merge_send_recv]: 4.32003e-06 [auto_parallel]: 5.36998e-06 [parallel]: 4.07e-06 [flash_sp]: 3.16999e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.81999e-06 [matmul_add_comm_reduction]: 5.05999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.50002e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.81998e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.74002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.73001e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 
6.99947e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.19999e-06 [cse]: 1.593e-05 [a_3]: 3.351e-05 [py_interpret_to_execute_after_opt_a]: 7.6e-06 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 2.949e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 4.95999e-06 [mutable_eliminate]: 0.00044374 [opt_b]: 0.00017972, [1] [Cycle 1]: 0.00017367, [7] [b_1]: 0.00010646 [b_2]: 6.76999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 4.19997e-07 [cse]: 1.617e-05 [optimize_parallel_all_gather_comm]: 1.53e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.154e-05 [loop_unroll]: 0.0004333 [opt_after_cconv]: 9.521e-05, [1] [Cycle 1]: 8.947e-05, [7] [c_1]: 2.783e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.45002e-06 [cse]: 1.652e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.306e-05 [tuple_transform]: 6.866e-05, [1] [Cycle 1]: 6.448e-05, [4] [d_1]: 3.874e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 4.858e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.61e-05, [1] [cse]: 1.082e-05 [environ_conv]: 5.15999e-06 [swap_dp_allreduce_reducescatter]: 5.46998e-06 [bias_add_comm_swap]: 2.12999e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.00999e-06 
[add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.11e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 3.38e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 1.87001e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.689e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 6.891e-05, [1] [Cycle 1]: 6.478e-05, [6] [build]: 2.50997e-06 [elim_shapecalc]: 8.64998e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.50001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.481e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 0.00012609 [opt_after_jit_grad]: 0.00044716 [validate]: 3.137e-05 [backend_pass]: 1.27e-06 [task_emit]: 0.00632192 [execute]: 6.54999e-06 Sums bootstrap : 0.000505s : 3.01% type_inference : 0.006005s : 35.81% event_method : 0.000014s : 0.09% auto_monad : 0.000059s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000056s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000576s : 3.43% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000412s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000044s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.65% optimize.opt_b.b_1 : 0.000106s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000433s : 2.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000126s : 0.75% opt_after_jit_grad : 0.000447s : 2.67% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006322s : 37.70% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.84% : 0.000024s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 3.40% : 0.000006s : 4: substitution.graph_param_transform 66.60% : 0.000109s : 3: substitution.inline 1.92% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.47% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005960 2 90.33% : 0.005383s : 1: type_inference.infer 9.67% : 0.000576s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.82% : 0.000028s : 3: replace.inline 29.18% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.79% : 0.000107s : 3: match.inline 8.21% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.15% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.36% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.27% : 0.000000s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000408 8 52.19% : 0.000213s : 3: func_graph_cloner_run.FuncGraphClonerGraph 47.81% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029911 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.27% : 0.003370s : 1: add_attr 11.23% : 0.003359s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000065s : 1: auto_monad 0.06% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000542s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000940s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000088s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.04% : 0.002106s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.53% : 0.000457s : 1: opt_after_jit_grad 0.61% : 0.000183s : 1: opt_b 13.21% : 0.003951s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000209s : 1: renormalize.infer 0.66% : 0.000196s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000132s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 21.17% : 0.006332s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.12% : 0.006018s : 1: type_inference 0.21% : 0.000062s : 1: validate TotalTime = 0.0180328, [24] [bootstrap]: 0.0004605 [type_inference]: 0.0043191 [event_method]: 1.1e-05 [auto_monad]: 5.311e-05 [graph_reusing]: 4.82e-06 [inline]: 1.82001e-06 [add_attr]: 0.00296101, [1] [add_attr_with_inline]: 0.00295365, [1] [Cycle 1]: 4.32e-05, [2] [tag_attr]: 1.117e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 2.70997e-06 [pre_auto_parallel]: 2.083e-05 [insert-virtual-dataset]: 2.79999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00366116, [53] [py_interpret_to_execute]: 1.486e-05 [rewriter_before_opt_a]: 3.652e-05 [opt_a]: 0.00183074, [2] [Cycle 1]: 0.00124004, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 2.393e-05 [loop_unroll]: 1.316e-05 [a_1]: 0.00029082 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 7.56999e-06 [updatestate_depend_eliminate]: 3.35998e-06 [updatestate_assign_eliminate]: 3.09001e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.95e-05 [accelerated_algorithm]: 6.18998e-06 [shard]: 1.98002e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 7.77e-06 [auto_parallel]: 5.99e-06 [parallel]: 1.815e-05 [flash_sp]: 7.31999e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.42e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.79e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.02001e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.49001e-06 
[after_resolve]: 1.066e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00033279 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.359e-05 [cse]: 2.664e-05 [a_3]: 3.942e-05 [Cycle 2]: 0.00058174, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00011979 [with_stream_mark]: 9.24e-06 [recompute_prepare]: 5.49e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.668e-05 [accelerated_algorithm]: 5.30999e-06 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.42998e-06 [flash_sp]: 3.4e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.02999e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 4.96002e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.68003e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.34998e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.68999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25998e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 6.80011e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 7.8e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 9.49978e-07 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.259e-05 [a_3]: 3.188e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.07e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00049441 [opt_b]: 0.00017694, [1] [Cycle 1]: 0.00017109, [7] 
[b_1]: 0.00010575 [b_2]: 6.79001e-06 [updatestate_depend_eliminate]: 5.09003e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 3.30008e-07 [cse]: 1.521e-05 [optimize_parallel_all_gather_comm]: 1.507e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.193e-05 [loop_unroll]: 0.00040984 [opt_after_cconv]: 9.335e-05, [1] [Cycle 1]: 8.794e-05, [7] [c_1]: 2.792e-05 [parameter_eliminate]: 2.04e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.508e-05 [renormalize]: 4.20026e-07 [remove_dup_value]: 1.306e-05 [tuple_transform]: 6.873e-05, [1] [Cycle 1]: 6.422e-05, [4] [d_1]: 3.84e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 4.219e-05 [cse_after_recomputation]: 2.005e-05, [1] [Cycle 1]: 1.574e-05, [1] [cse]: 1.06e-05 [environ_conv]: 4.60999e-06 [swap_dp_allreduce_reducescatter]: 4.78001e-06 [bias_add_comm_swap]: 3.10998e-06 [label_micro_interleaved_index]: 3.93999e-06 [label_fine_grained_interleaved_index]: 2.58998e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.08998e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.60018e-07 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 8.39995e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.137e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.53e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.653e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.13998e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.698e-05, [1] [Cycle 1]: 6.271e-05, [6] [build]: 2.07001e-06 [elim_shapecalc]: 7.78001e-06 [elim_not_effective]: 1.111e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.52999e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.487e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00044973 [validate]: 2.991e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00582859 [execute]: 6.52001e-06 Sums bootstrap : 0.000460s : 3.26% type_inference : 0.004319s : 30.57% event_method : 0.000011s : 0.08% auto_monad : 0.000053s : 0.38% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000411s : 2.91% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000333s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000494s : 3.50% optimize.opt_b.b_1 : 0.000106s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000410s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.18% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005829s : 41.26% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.62% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.54% : 0.000005s : 4: substitution.graph_param_transform 64.82% : 0.000078s : 2: substitution.inline 2.47% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.70% : 0.000004s : 4: substitution.remove_not_recompute_node 3.36% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004279 2 92.15% : 0.003943s : 1: type_inference.infer 7.85% : 0.000336s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.99% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 17: predicate.arithmetic_simplify 0.92% : 0.000001s : 9: predicate.cast_eliminate 0.89% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.53% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.10% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.05% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 1.02% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.79% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 43.51% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.49% : 0.000132s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025903 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.45% : 0.002965s : 1: add_attr 11.42% : 0.002957s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.92% : 0.000497s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.03% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.94% : 0.000503s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000759s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000088s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001834s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.77% : 0.000459s : 1: opt_after_jit_grad 0.70% : 0.000180s : 1: opt_b 14.15% : 0.003665s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000182s : 1: renormalize.infer 0.56% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000040s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000069s : 1: symbol_engine_optimizer 22.54% : 0.005839s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.72% : 0.004332s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0200121, [24] [bootstrap]: 0.00045342 [type_inference]: 0.00545694 [event_method]: 1.396e-05 [auto_monad]: 5.534e-05 [graph_reusing]: 5.35001e-06 [inline]: 1.72001e-06 [add_attr]: 0.00339779, [1] [add_attr_with_inline]: 0.00338933, [1] [Cycle 1]: 5.077e-05, [2] [tag_attr]: 1.633e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 2.56998e-06 [pre_auto_parallel]: 2.667e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00404782, [53] [py_interpret_to_execute]: 2.086e-05 [rewriter_before_opt_a]: 5.825e-05 [opt_a]: 0.00218373, [2] [Cycle 1]: 0.00158402, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.077e-05 [loop_unroll]: 2.107e-05 [a_1]: 0.00045217 [with_stream_mark]: 4.009e-05 [recompute_prepare]: 7.95e-06 [updatestate_depend_eliminate]: 3.94002e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 1.89999e-06 [a_2]: 7.64e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 6.06e-06 [parallel]: 1.714e-05 [flash_sp]: 7.65e-06 [merge_comm]: 3.48999e-06 [allreduce_fusion]: 3.50003e-06 [matmul_add_comm_reduction]: 9.12999e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 
2.17001e-06 [after_resolve]: 1.06e-05 [a_after_grad]: 8.63001e-06 [renormalize]: 0.0004681 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.33e-05 [cse]: 2.731e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00059047, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012604 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.741e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.70001e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.10001e-06 [parallel]: 4.38999e-06 [flash_sp]: 2.98e-06 [merge_comm]: 2.84999e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 4.87e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 6.52001e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 4.79e-06 [virtual_output]: 5.06997e-06 [merge_forward]: 2.93e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.26998e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.08001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.612e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.143e-05 [convert_after_rewriter]: 6.74999e-06 [order_py_execute_after_rewriter]: 4.74002e-06 [mutable_eliminate]: 0.00048219 [opt_b]: 0.00018159, [1] [Cycle 1]: 0.00017563, [7] [b_1]: 
0.00010877 [b_2]: 6.93e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.16998e-06 [renormalize]: 5.69999e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.55e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.125e-05 [loop_unroll]: 0.00042264 [opt_after_cconv]: 9.397e-05, [1] [Cycle 1]: 8.824e-05, [7] [c_1]: 2.728e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 4.99003e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.09999e-06 [cse]: 1.628e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.198e-05 [tuple_transform]: 6.805e-05, [1] [Cycle 1]: 6.392e-05, [4] [d_1]: 3.869e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.78002e-06 [add_recomputation]: 4.216e-05 [cse_after_recomputation]: 2.004e-05, [1] [Cycle 1]: 1.571e-05, [1] [cse]: 1.059e-05 [environ_conv]: 5.07e-06 [swap_dp_allreduce_reducescatter]: 5.38002e-06 [bias_add_comm_swap]: 2.75997e-06 [label_micro_interleaved_index]: 4.52998e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.69001e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.25999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.43002e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 4.05998e-06 [overlap_grad_flash_sp]: 1.705e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.756e-05, [1] [Cycle 1]: 6.328e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.116e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.88002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.482e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.0004586 [validate]: 3.113e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00583264 [execute]: 6.88998e-06 Sums bootstrap : 0.000453s : 2.89% type_inference : 0.005457s : 34.82% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000578s : 3.69% optimize.opt_a.with_stream_mark : 0.000049s : 0.31% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000468s : 2.99% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000482s : 3.08% optimize.opt_b.b_1 : 0.000109s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.14% optimize.loop_unroll : 0.000423s : 2.70% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000459s : 2.93% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005833s : 37.22% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 14.74% : 0.000025s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.11% : 0.000005s : 4: substitution.graph_param_transform 67.03% : 0.000112s : 3: substitution.inline 1.81% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.59% : 0.000004s : 4: substitution.replace_old_param 6.32% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005417 2 89.94% : 0.004873s : 1: type_inference.infer 10.06% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.99% : 0.000027s : 3: replace.inline 30.01% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 92.06% : 0.000110s : 3: match.inline 7.94% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.94% : 0.000001s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 19: predicate.arithmetic_simplify 1.01% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 45.46% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.54% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029025 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.72% : 0.003402s : 1: add_attr 11.69% : 0.003393s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.06% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.68% : 0.000486s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000491s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.24% : 0.000941s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.53% : 0.002187s : 1: opt_a 0.34% : 0.000097s : 1: opt_after_cconv 1.61% : 0.000468s : 1: opt_after_jit_grad 0.64% : 0.000185s : 1: opt_b 13.96% : 0.004052s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.85% : 0.000247s : 1: renormalize.infer 0.74% : 0.000214s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000070s : 1: symbol_engine_optimizer 20.13% : 0.005843s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 18.85% : 0.005470s : 1: type_inference 0.20% : 0.000058s : 1: validate TotalTime = 0.0369966, [24] [bootstrap]: 0.00050309 [type_inference]: 0.011244 [event_method]: 5.94e-05 [auto_monad]: 0.0001198 [graph_reusing]: 8.22998e-06 [inline]: 2.16e-06 [add_attr]: 0.00296365, [1] [add_attr_with_inline]: 0.00295582, [1] [Cycle 1]: 6.809e-05, [2] [tag_attr]: 3.366e-05 [meta_addattr_fg_expand]: 9.07999e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 4.841e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0132119, [53] [py_interpret_to_execute]: 3.942e-05 [rewriter_before_opt_a]: 0.00014323 [opt_a]: 0.010912, [3] [Cycle 1]: 0.00701105, [45] [expand_dump_flag]: 3.78999e-06 [switch_simplify]: 7.375e-05 [loop_unroll]: 6.135e-05 [a_1]: 0.00147875 [with_stream_mark]: 2.279e-05 [recompute_prepare]: 2.194e-05 [updatestate_depend_eliminate]: 8.87999e-06 [updatestate_assign_eliminate]: 8.23999e-06 [updatestate_loads_eliminate]: 7.65e-06 [parameter_eliminate]: 2.63e-06 [a_2]: 0.00024445 [accelerated_algorithm]: 3.028e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.19001e-06 [shard_inline]: 1.617e-05 [merge_send_recv]: 1.578e-05 [auto_parallel]: 1.087e-05 [parallel]: 1.801e-05 [flash_sp]: 1.111e-05 [merge_comm]: 9.57999e-06 [allreduce_fusion]: 8.92e-06 [matmul_add_comm_reduction]: 2.613e-05 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 1.78e-05 [virtual_dataset]: 1.56e-05 [get_grad_eliminate_]: 1.542e-05 [virtual_output]: 1.491e-05 [merge_forward]: 9.14e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.735e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.826e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.723e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.00138757 [flash_sp_send_recv_attached]: 3.91001e-06 [receive_attached]: 2.55002e-06 [after_resolve]: 
5.857e-05 [a_after_grad]: 8.229e-05 [renormalize]: 0.00239263 [add_forward_monad_depend]: 8.80999e-06 [auto_monad_grad]: 5.44998e-06 [auto_monad_eliminator]: 5.493e-05 [cse]: 0.00016055 [a_3]: 0.00033583 [Cycle 2]: 0.00298939, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.692e-05 [loop_unroll]: 4.351e-05 [a_1]: 0.00153323 [with_stream_mark]: 1.227e-05 [recompute_prepare]: 1.085e-05 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 4.39002e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 9.60019e-07 [a_2]: 0.00012599 [accelerated_algorithm]: 1.193e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 6.61e-06 [auto_parallel]: 7.09001e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.61002e-06 [matmul_add_comm_reduction]: 8.05e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 9.99001e-06 [virtual_dataset]: 8.62998e-06 [get_grad_eliminate_]: 9.12001e-06 [virtual_output]: 8.37998e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.624e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.505e-05 [set_forward_comm_id_for_comm_node_pass]: 5.91003e-06 [meta_fg_expand]: 6.967e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.606e-05 [a_after_grad]: 1.465e-05 [renormalize]: 0.00058007 [add_forward_monad_depend]: 4.30999e-06 [auto_monad_grad]: 1.25999e-06 [auto_monad_eliminator]: 1.445e-05 [cse]: 4.541e-05 [a_3]: 6.53e-05 [Cycle 3]: 0.00089737, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 1.061e-05 [loop_unroll]: 8.87e-06 [a_1]: 0.00024866 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 9.15999e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 3.97998e-06 [updatestate_loads_eliminate]: 
3.88001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012216 [accelerated_algorithm]: 1.145e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.71002e-06 [shard_inline]: 8.95999e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 6.88e-06 [parallel]: 4.25e-06 [flash_sp]: 1.12999e-06 [merge_comm]: 4.99003e-06 [allreduce_fusion]: 4.86002e-06 [matmul_add_comm_reduction]: 7.65e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.021e-05 [virtual_dataset]: 8.45001e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.661e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.495e-05 [set_forward_comm_id_for_comm_node_pass]: 5.95002e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.426e-05 [a_after_grad]: 1.431e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.048e-05 [cse]: 2.507e-05 [a_3]: 5.934e-05 [py_interpret_to_execute_after_opt_a]: 9.69999e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 4.627e-05 [convert_after_rewriter]: 9.34e-06 [order_py_execute_after_rewriter]: 6.79001e-06 [mutable_eliminate]: 0.00046044 [opt_b]: 0.00028562, [1] [Cycle 1]: 0.00027952, [7] [b_1]: 0.00018815 [b_2]: 1.047e-05 [updatestate_depend_eliminate]: 7.26001e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 4.12e-06 [renormalize]: 4.69998e-07 [cse]: 3.049e-05 [optimize_parallel_all_gather_comm]: 1.985e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 1.936e-05 [loop_unroll]: 0.00042756 [opt_after_cconv]: 0.00013473, [1] [Cycle 1]: 0.00012874, [7] [c_1]: 4.818e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 7.11999e-06 [updatestate_assign_eliminate]: 4.16001e-06 
[updatestate_loads_eliminate]: 4.16001e-06 [cse]: 2.917e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.837e-05 [tuple_transform]: 0.00010068, [1] [Cycle 1]: 9.619e-05, [4] [d_1]: 6.593e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.84999e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 9.997e-05 [cse_after_recomputation]: 3.339e-05, [1] [Cycle 1]: 2.866e-05, [1] [cse]: 2.301e-05 [environ_conv]: 8.99e-06 [swap_dp_allreduce_reducescatter]: 8.13999e-06 [bias_add_comm_swap]: 2.69999e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.55997e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.689e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.10999e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.56002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 5.09e-06 [overlap_grad_flash_sp]: 2.394e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 9.831e-05, [1] [Cycle 1]: 9.42e-05, [6] [build]: 9.42999e-06 [elim_shapecalc]: 1.36e-05 [elim_not_effective]: 1.837e-05 [opt_reshape]: 1e-05 [fold_const_symbol]: 1.442e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.92999e-06 [pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 2.447e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.32002e-06 [opt_after_jit_grad]: 0.00046921 [validate]: 4.383e-05 [backend_pass]: 1.29998e-06 [task_emit]: 0.00807082 [execute]: 7.10998e-06 Sums bootstrap : 0.000503s : 1.53% type_inference : 0.011244s : 34.29% event_method : 0.000059s : 0.18% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000143s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003261s : 9.94% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.07% optimize.opt_a.meta_fg_expand : 0.001460s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000111s : 0.34% optimize.opt_a.renormalize : 0.002973s : 9.07% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000231s : 0.70% optimize.opt_a.a_3 : 0.000460s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000460s : 1.40% optimize.opt_b.b_1 : 0.000188s : 0.57% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000428s : 1.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000100s : 0.30% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000469s : 1.43% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008071s : 24.61% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000779 222 5.51% : 0.000043s : 12: substitution.arithmetic_simplify 1.74% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 57.38% : 0.000447s : 17: substitution.inline 2.03% : 0.000016s : 2: substitution.inline_without_move 1.40% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.82% : 0.000014s : 3: substitution.less_batch_normalization 1.67% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.65% : 0.000013s : 20: substitution.remove_not_recompute_node 3.00% : 0.000023s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.46% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.32% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011172 2 87.18% : 0.009740s : 1: type_inference.infer 12.82% : 0.001432s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.73% : 0.000125s : 17: replace.inline 42.27% : 0.000091s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000472 33 92.86% : 0.000438s : 17: match.inline 7.14% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000764 5764 1.04% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 3.02% : 0.000023s : 68: predicate.adjust_all_reduce_mul_add 1.94% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.11% : 0.000008s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.07% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.72% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.50% : 0.000042s : 249: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.58% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000015s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.21% : 0.000009s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 101: predicate.switch_defer_inline 2.89% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.92% : 
0.000038s : 277: predicate.switch_simplify 1.04% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.40% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001527 34 57.47% : 0.000878s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.53% : 0.000650s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061393 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.83% : 0.002968s : 1: add_attr 4.82% : 0.002959s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000105s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000537s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.11% : 0.000067s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.02% : 0.004923s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.78% : 0.010915s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000478s : 1: opt_after_jit_grad 0.47% : 0.000289s : 1: opt_b 21.53% : 0.013216s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.64% : 0.001618s : 2: renormalize.infer 2.19% : 0.001342s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000148s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.16% : 0.008081s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.34% : 0.011257s : 1: type_inference 0.12% : 0.000073s : 1: validate TotalTime = 0.0183229, [24] [bootstrap]: 0.00045699 [type_inference]: 0.00424077 [event_method]: 1.075e-05 [auto_monad]: 5.021e-05 [graph_reusing]: 5.42001e-06 [inline]: 1.80001e-06 [add_attr]: 0.00300869, [1] [add_attr_with_inline]: 0.00300083, [1] [Cycle 1]: 4.486e-05, [2] [tag_attr]: 1.181e-05 [meta_addattr_fg_expand]: 2.99001e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.091e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00366447, [53] [py_interpret_to_execute]: 1.573e-05 [rewriter_before_opt_a]: 3.876e-05 [opt_a]: 0.00182795, [2] [Cycle 1]: 0.00123298, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 2.471e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00028681 [with_stream_mark]: 1.311e-05 [recompute_prepare]: 7.18e-06 [updatestate_depend_eliminate]: 3.25e-06 [updatestate_assign_eliminate]: 3.01999e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.68e-05 [accelerated_algorithm]: 6.09999e-06 [shard]: 2.00002e-06 [meta_shard_fg_expand]: 1.93997e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 7.79002e-06 [auto_parallel]: 5.66998e-06 [parallel]: 1.681e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.55001e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.46e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.84001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.24e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 
2.31998e-06 [after_resolve]: 1.065e-05 [a_after_grad]: 9.05001e-06 [renormalize]: 0.00033312 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.297e-05 [cse]: 2.564e-05 [a_3]: 3.986e-05 [Cycle 2]: 0.00058579, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.66e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012315 [with_stream_mark]: 9.27001e-06 [recompute_prepare]: 5.65001e-06 [updatestate_depend_eliminate]: 2.63e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.732e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.31002e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.69998e-06 [flash_sp]: 3.11999e-06 [merge_comm]: 2.90002e-06 [allreduce_fusion]: 2.76999e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.19e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.81998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.66e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.05001e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 5.84e-06 [cse]: 1.24e-05 [a_3]: 3.158e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.218e-05 [convert_after_rewriter]: 7.33e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00044807 [opt_b]: 0.00017946, [1] [Cycle 1]: 
0.00017354, [7] [b_1]: 0.00010734 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 3.59985e-07 [cse]: 1.532e-05 [optimize_parallel_all_gather_comm]: 1.491e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.28e-05 [loop_unroll]: 0.00045324 [opt_after_cconv]: 9.287e-05, [1] [Cycle 1]: 8.718e-05, [7] [c_1]: 2.716e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.543e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.247e-05 [tuple_transform]: 6.755e-05, [1] [Cycle 1]: 6.325e-05, [4] [d_1]: 3.811e-05 [none_parameter_eliminate]: 1.42999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.03002e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 4.417e-05 [cse_after_recomputation]: 1.958e-05, [1] [Cycle 1]: 1.522e-05, [1] [cse]: 9.96e-06 [environ_conv]: 5.19e-06 [swap_dp_allreduce_reducescatter]: 5.46e-06 [bias_add_comm_swap]: 2.31998e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.03997e-06 [micro_interleaved_order_control]: 2.84001e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 1.11997e-06 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.094e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.69002e-06 [overlap_recompute_and_grad_model_parallel]: 4.06001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.632e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.709e-05, [1] [Cycle 1]: 6.294e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.13999e-06 [elim_not_effective]: 1.138e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.66997e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.97001e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.22002e-06 [opt_after_jit_grad]: 0.00044543 [validate]: 2.994e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00616041 [execute]: 6.28998e-06 Sums bootstrap : 0.000457s : 3.18% type_inference : 0.004241s : 29.51% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000410s : 2.85% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000333s : 2.32% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.12% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000453s : 3.15% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 3.10% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006160s : 42.87% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000117 26 18.18% : 0.000021s : 4: substitution.arithmetic_simplify 1.59% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.46% : 0.000005s : 4: substitution.graph_param_transform 65.17% : 0.000076s : 2: substitution.inline 2.57% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004201 2 92.03% : 0.003867s : 1: type_inference.infer 7.97% : 0.000335s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000136 984 1.07% : 0.000001s : 9: predicate.accumulaten_eliminater 1.12% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.47% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.98% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.33% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.65% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000232 6 43.67% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.33% : 0.000131s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026246 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.48% : 0.003013s : 1: add_attr 11.45% : 0.003004s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000491s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.76% : 0.000462s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.74% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.89% : 0.000759s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.98% : 0.001831s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.73% : 0.000455s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 13.98% : 0.003668s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000183s : 1: renormalize.infer 0.55% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.51% : 0.006169s : 1: task_emit 0.27% : 0.000070s : 1: 
tuple_transform 16.21% : 0.004254s : 1: type_inference 0.21% : 0.000055s : 1: validate TotalTime = 0.0355854, [24] [bootstrap]: 0.0004957 [type_inference]: 0.010228 [event_method]: 4.066e-05 [auto_monad]: 0.00011397 [graph_reusing]: 7.66999e-06 [inline]: 1.96e-06 [add_attr]: 0.002999, [1] [add_attr_with_inline]: 0.00299047, [1] [Cycle 1]: 0.00010235, [2] [tag_attr]: 3.055e-05 [meta_addattr_fg_expand]: 8.90001e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 4.51e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0128928, [53] [py_interpret_to_execute]: 3.377e-05 [rewriter_before_opt_a]: 0.0001265 [opt_a]: 0.0106405, [3] [Cycle 1]: 0.00676706, [45] [expand_dump_flag]: 3.6e-06 [switch_simplify]: 6.563e-05 [loop_unroll]: 5.505e-05 [a_1]: 0.00132288 [with_stream_mark]: 2.278e-05 [recompute_prepare]: 2.123e-05 [updatestate_depend_eliminate]: 9.31998e-06 [updatestate_assign_eliminate]: 7.44002e-06 [updatestate_loads_eliminate]: 7.34002e-06 [parameter_eliminate]: 2.54999e-06 [a_2]: 0.00024202 [accelerated_algorithm]: 3.07e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.21001e-06 [shard_inline]: 1.576e-05 [merge_send_recv]: 1.554e-05 [auto_parallel]: 1.038e-05 [parallel]: 1.83e-05 [flash_sp]: 1.076e-05 [merge_comm]: 1.002e-05 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 2.6e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 1.785e-05 [virtual_dataset]: 1.52e-05 [get_grad_eliminate_]: 1.508e-05 [virtual_output]: 1.501e-05 [merge_forward]: 9.36e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.707e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.818e-05 [merge_recompute_call_nodes]: 1.77999e-06 [before_grad]: 2.676e-05 [set_forward_comm_id_for_comm_node_pass]: 9.59999e-06 [meta_fg_expand]: 0.00135463 [flash_sp_send_recv_attached]: 3.6e-06 [receive_attached]: 2.52001e-06 [after_resolve]: 0.00010848 
[a_after_grad]: 8.151e-05 [renormalize]: 0.00231756 [add_forward_monad_depend]: 8.80999e-06 [auto_monad_grad]: 5.17e-06 [auto_monad_eliminator]: 5.506e-05 [cse]: 0.00015767 [a_3]: 0.00033356 [Cycle 2]: 0.00297099, [45] [expand_dump_flag]: 1.49998e-06 [switch_simplify]: 4.656e-05 [loop_unroll]: 4.355e-05 [a_1]: 0.00157542 [with_stream_mark]: 1.174e-05 [recompute_prepare]: 1.121e-05 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 4.28001e-06 [updatestate_loads_eliminate]: 3.63999e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 0.00012494 [accelerated_algorithm]: 1.166e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 6.68e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.41002e-06 [flash_sp]: 3.48e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.66002e-06 [matmul_add_comm_reduction]: 7.65998e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.008e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.85001e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 9.09989e-07 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.566e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.356e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24998e-06 [meta_fg_expand]: 3.405e-05 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.487e-05 [a_after_grad]: 1.415e-05 [renormalize]: 0.00056775 [add_forward_monad_depend]: 4.00998e-06 [auto_monad_grad]: 1.19998e-06 [auto_monad_eliminator]: 1.411e-05 [cse]: 4.494e-05 [a_3]: 6.424e-05 [Cycle 3]: 0.00088846, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.038e-05 [loop_unroll]: 8.80001e-06 [a_1]: 0.0002479 [with_stream_mark]: 9.52001e-06 [recompute_prepare]: 9.36e-06 [updatestate_depend_eliminate]: 4.62e-06 [updatestate_assign_eliminate]: 3.83999e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 
9.39996e-07 [a_2]: 0.00012246 [accelerated_algorithm]: 1.156e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 8.84e-06 [merge_send_recv]: 6.87002e-06 [auto_parallel]: 6.76e-06 [parallel]: 4.47003e-06 [flash_sp]: 1.02e-06 [merge_comm]: 4.82e-06 [allreduce_fusion]: 4.75001e-06 [matmul_add_comm_reduction]: 7.49002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 9.91e-06 [virtual_dataset]: 8.64003e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.26002e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.24002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.546e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.386e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.321e-05 [a_after_grad]: 1.427e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 1.01e-05 [cse]: 2.54e-05 [a_3]: 5.881e-05 [py_interpret_to_execute_after_opt_a]: 9.97999e-06 [slice_cell_reuse_recomputed_activation]: 2.18002e-06 [rewriter_after_opt_a]: 4.674e-05 [convert_after_rewriter]: 9.35001e-06 [order_py_execute_after_rewriter]: 6.69001e-06 [mutable_eliminate]: 0.00046115 [opt_b]: 0.00028476, [1] [Cycle 1]: 0.00027882, [7] [b_1]: 0.00018679 [b_2]: 1.101e-05 [updatestate_depend_eliminate]: 6.98e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 4e-06 [renormalize]: 3.29979e-07 [cse]: 3.088e-05 [optimize_parallel_all_gather_comm]: 1.982e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.06e-05 [loop_unroll]: 0.00045071 [opt_after_cconv]: 0.00013524, [1] [Cycle 1]: 0.00012936, [7] [c_1]: 4.821e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 7.11999e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 4.02e-06 [cse]: 
2.939e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 2.839e-05 [tuple_transform]: 0.00010016, [1] [Cycle 1]: 9.542e-05, [4] [d_1]: 6.604e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.74999e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.507e-05 [cse_after_recomputation]: 3.128e-05, [1] [Cycle 1]: 2.659e-05, [1] [cse]: 2.121e-05 [environ_conv]: 8.33999e-06 [swap_dp_allreduce_reducescatter]: 7.58001e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.00002e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.93998e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.656e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.85001e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 5.80002e-06 [overlap_grad_flash_sp]: 2.382e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 1.98002e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.41002e-06 [symbol_engine_optimizer]: 9.693e-05, [1] [Cycle 1]: 9.28e-05, [6] [build]: 9.24e-06 [elim_shapecalc]: 1.259e-05 [elim_not_effective]: 1.783e-05 [opt_reshape]: 1.009e-05 [fold_const_symbol]: 1.518e-05 [renormalize]: 2.59985e-07 [detach_backward]: 1.66e-06 
[pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.495e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00046722 [validate]: 4.336e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00799765 [execute]: 7.45e-06 Sums bootstrap : 0.000496s : 1.58% type_inference : 0.010228s : 32.62% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003146s : 10.03% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000489s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000059s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000054s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001392s : 4.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000137s : 0.44% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002885s : 9.20% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.25% optimize.opt_a.cse : 0.000228s : 0.73% optimize.opt_a.a_3 : 0.000457s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000461s : 1.47% optimize.opt_b.b_1 : 0.000187s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000451s : 1.44% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000055s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000467s : 1.49% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.007998s : 25.51% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000722 218 5.81% : 0.000042s : 11: substitution.arithmetic_simplify 1.79% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.09% : 0.000008s : 8: substitution.graph_param_transform 0.41% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 54.60% : 0.000394s : 16: substitution.inline 2.20% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 20: substitution.remove_not_recompute_node 3.38% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.85% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.58% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.49% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010160 2 87.35% : 0.008875s : 1: type_inference.infer 12.65% : 0.001285s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.75% : 0.000119s : 16: replace.inline 41.25% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000416 30 92.72% : 0.000386s : 16: match.inline 7.28% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000732 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.19% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000008s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.23% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000041s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.64% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.95% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.69% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.10% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001476 32 56.39% : 0.000832s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.61% : 0.000644s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059514 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.05% : 0.003003s : 1: add_attr 5.03% : 0.002994s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.10% : 0.000059s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.89% : 0.000529s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.77% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.11% : 0.004827s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.88% : 0.010643s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.80% : 0.000477s : 1: opt_after_jit_grad 0.48% : 0.000288s : 1: opt_b 21.67% : 0.012897s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.58% : 0.001534s : 2: renormalize.infer 2.25% : 0.001339s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.45% : 0.008007s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.21% : 0.010242s : 1: type_inference 0.12% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x9-kbk],max_mem:42.0M . TotalTime = 0.864037, [24] [bootstrap]: 0.00060049 [type_inference]: 0.00597331 [event_method]: 1.361e-05 [auto_monad]: 5.712e-05 [graph_reusing]: 5.59e-06 [inline]: 1.87001e-06 [add_attr]: 0.00346091, [1] [add_attr_with_inline]: 0.00345047, [1] [Cycle 1]: 4.476e-05, [2] [tag_attr]: 1.552e-05 [meta_addattr_fg_expand]: 4.08001e-06 [parallel-infer-symbol]: 2.87002e-06 [pre_auto_parallel]: 2.833e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00394542, [53] [py_interpret_to_execute]: 2.097e-05 [rewriter_before_opt_a]: 5.786e-05 [opt_a]: 0.00211695, [2] [Cycle 1]: 0.00150261, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 3.161e-05 [loop_unroll]: 2.07e-05 [a_1]: 0.00045468 [with_stream_mark]: 1.326e-05 [recompute_prepare]: 7.97e-06 [updatestate_depend_eliminate]: 3.48999e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.94e-06 [a_2]: 7.545e-05 [accelerated_algorithm]: 6.14999e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 8.2e-06 [auto_parallel]: 6.16e-06 [parallel]: 2.242e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 4.19002e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.16999e-06 [virtual_dataset]: 6.02999e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 8.90001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 
[merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.81e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 2.16998e-06 [after_resolve]: 1.028e-05 [a_after_grad]: 9.00001e-06 [renormalize]: 0.00040558 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.712e-05 [a_3]: 4.026e-05 [Cycle 2]: 0.00060518, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012536 [with_stream_mark]: 9.41e-06 [recompute_prepare]: 5.86998e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.67e-05 [accelerated_algorithm]: 5.34e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 2.479e-05 [auto_parallel]: 5.67999e-06 [parallel]: 4.35e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.70002e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 4.92999e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.42001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.02002e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.64e-06 [a_after_grad]: 7.85e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.225e-05 [a_3]: 3.162e-05 [py_interpret_to_execute_after_opt_a]: 7.75e-06 
[slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 3.016e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00044859 [opt_b]: 0.00018593, [1] [Cycle 1]: 0.00017991, [7] [b_1]: 0.0001118 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 8.39995e-07 [cse]: 1.586e-05 [optimize_parallel_all_gather_comm]: 1.502e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.248e-05 [loop_unroll]: 0.00041281 [opt_after_cconv]: 9.269e-05, [1] [Cycle 1]: 8.709e-05, [7] [c_1]: 2.74e-05 [parameter_eliminate]: 2.16003e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.532e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.275e-05 [tuple_transform]: 6.791e-05, [1] [Cycle 1]: 6.357e-05, [4] [d_1]: 3.848e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.00002e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.918e-05 [cse_after_recomputation]: 2.005e-05, [1] [Cycle 1]: 1.577e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.70001e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.31998e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.21998e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.01997e-06 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.84e-06 [control_data_broadcast_order]: 1.081e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.83002e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.619e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.603e-05, [1] [Cycle 1]: 6.202e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.12e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.539e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.23998e-06 [opt_after_jit_grad]: 0.00044963 [validate]: 3.011e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.849222 [execute]: 9.00001e-06 Sums bootstrap : 0.000600s : 0.07% type_inference : 0.005973s : 0.69% event_method : 0.000014s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000580s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000033s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000406s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.05% optimize.opt_b.b_1 : 0.000112s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000413s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.05% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.849222s : 98.79% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000166 30 15.04% : 0.000025s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 66.34% : 0.000110s : 3: substitution.inline 1.81% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.45% : 0.000004s : 4: substitution.remove_not_recompute_node 2.48% : 0.000004s : 4: substitution.replace_old_param 6.80% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005927 2 90.89% : 0.005387s : 1: type_inference.infer 9.11% : 0.000540s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.68% : 0.000028s : 3: replace.inline 29.32% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.37% : 0.000108s : 3: match.inline 8.63% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.76% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 0.94% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 2.24% : 0.000004s : 23: predicate.environ_get_eliminate 1.18% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.42% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.28% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000331 8 45.60% : 0.000151s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.40% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.872944 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.40% : 0.003465s : 1: add_attr 0.40% : 0.003454s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000062s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000636s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000018s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000457s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000943s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000094s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002120s : 1: opt_a 0.01% : 0.000096s : 1: opt_after_cconv 0.05% : 0.000459s : 1: opt_after_jit_grad 0.02% : 0.000189s : 1: opt_b 0.45% : 0.003949s : 1: optimize 0.00% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000206s : 1: renormalize.infer 0.02% : 0.000193s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000034s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000069s : 1: symbol_engine_optimizer 97.29% : 0.849244s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.69% : 0.005987s : 1: type_inference 0.01% : 0.000051s : 1: validate TotalTime = 0.0713594, [24] [bootstrap]: 0.00050423 [type_inference]: 0.00440084 [event_method]: 1.05e-05 [auto_monad]: 4.883e-05 [graph_reusing]: 5.29e-06 [inline]: 2.02001e-06 [add_attr]: 0.00293297, [1] [add_attr_with_inline]: 0.00292511, [1] [Cycle 1]: 4.619e-05, [2] [tag_attr]: 1.233e-05 [meta_addattr_fg_expand]: 3.3e-06 [parallel-infer-symbol]: 2.59001e-06 [pre_auto_parallel]: 2.059e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00368154, [53] [py_interpret_to_execute]: 1.47e-05 [rewriter_before_opt_a]: 3.797e-05 [opt_a]: 0.00189004, [2] [Cycle 1]: 0.00129093, [45] [expand_dump_flag]: 2.39001e-06 [switch_simplify]: 2.364e-05 [loop_unroll]: 1.383e-05 [a_1]: 0.00033602 [with_stream_mark]: 1.446e-05 [recompute_prepare]: 7.05002e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.51e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 2.60002e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.43e-06 [auto_parallel]: 5.81e-06 [parallel]: 1.744e-05 [flash_sp]: 7.28e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 8.58001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.77002e-06 [virtual_dataset]: 5.66998e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.5e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 8.81997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.05e-05 [merge_recompute_call_nodes]: 1.30001e-06 [before_grad]: 8.84998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 3.14001e-06 [receive_attached]: 3.23e-06 
[after_resolve]: 1.003e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00034397 [add_forward_monad_depend]: 4.64002e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.277e-05 [cse]: 2.639e-05 [a_3]: 4.004e-05 [Cycle 2]: 0.00059007, [45] [expand_dump_flag]: 7.39994e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012367 [with_stream_mark]: 1.06e-05 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.70997e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.638e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.51e-06 [parallel]: 4.3e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.45001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 5.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.18999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.10001e-06 [a_after_grad]: 7.92998e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 7.39994e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.258e-05 [a_3]: 3.19e-05 [py_interpret_to_execute_after_opt_a]: 7.36001e-06 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 3.046e-05 [convert_after_rewriter]: 7.16999e-06 [order_py_execute_after_rewriter]: 5.49e-06 [mutable_eliminate]: 0.0004448 [opt_b]: 0.00017856, [1] [Cycle 1]: 0.00017263, [7] [b_1]: 0.00010563 [b_2]: 
6.96001e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.60015e-07 [cse]: 1.649e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.273e-05 [loop_unroll]: 0.00041096 [opt_after_cconv]: 9.458e-05, [1] [Cycle 1]: 8.899e-05, [7] [c_1]: 2.784e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.689e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.202e-05 [tuple_transform]: 6.861e-05, [1] [Cycle 1]: 6.414e-05, [4] [d_1]: 3.849e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.221e-05 [cse_after_recomputation]: 2.017e-05, [1] [Cycle 1]: 1.601e-05, [1] [cse]: 1.095e-05 [environ_conv]: 4.78001e-06 [swap_dp_allreduce_reducescatter]: 4.97999e-06 [bias_add_comm_swap]: 3.16999e-06 [label_micro_interleaved_index]: 4.62998e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.44998e-06 [full_micro_interleaved_order_control]: 2.46e-06 [reorder_send_recv_between_fp_bp]: 2.41998e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.091e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.58001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.694e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.56998e-06 [split_layernorm_comm]: 1.86003e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 6.785e-05, [1] [Cycle 1]: 6.381e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.494e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.25e-06 [opt_after_jit_grad]: 0.00044904 [validate]: 3.056e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0590388 [execute]: 8.37e-06 Sums bootstrap : 0.000504s : 0.75% type_inference : 0.004401s : 6.52% event_method : 0.000010s : 0.02% auto_monad : 0.000049s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000460s : 0.68% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000344s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.66% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000411s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059039s : 87.50% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000117 26 18.27% : 0.000021s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.30% : 0.000002s : 2: substitution.fold_const_symbol 4.33% : 0.000005s : 4: substitution.graph_param_transform 65.70% : 0.000077s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.35% : 0.000004s : 4: substitution.remove_not_recompute_node 3.14% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004360 2 91.90% : 0.004006s : 1: type_inference.infer 8.10% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 17: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.50% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.42% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.29% : 0.000003s : 26: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.05% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.98% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.95% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 42.28% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.72% : 0.000149s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079279 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.70% : 0.002937s : 1: add_attr 3.69% : 0.002928s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000054s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.68% : 0.000538s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000015s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.53% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.02% : 0.000805s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000088s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001893s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.58% : 0.000458s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.65% : 0.003685s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.23% : 0.000183s : 1: renormalize.infer 0.20% : 0.000155s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.49% : 0.059055s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.57% : 0.004414s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.0728818, [24] [bootstrap]: 0.00046401 [type_inference]: 0.00557009 [event_method]: 1.481e-05 [auto_monad]: 5.455e-05 [graph_reusing]: 5.13002e-06 [inline]: 1.99e-06 [add_attr]: 0.00294363, [1] [add_attr_with_inline]: 0.00293573, [1] [Cycle 1]: 4.512e-05, [2] [tag_attr]: 1.53e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.494e-05 [insert-virtual-dataset]: 2.23002e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.47999e-06 [optimize]: 0.0039482, [53] [py_interpret_to_execute]: 2.064e-05 [rewriter_before_opt_a]: 5.685e-05 [opt_a]: 0.00209668, [2] [Cycle 1]: 0.00149273, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 3.144e-05 [loop_unroll]: 2.088e-05 [a_1]: 0.00044926 [with_stream_mark]: 1.456e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.95998e-06 [updatestate_assign_eliminate]: 2.99001e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 2.32999e-06 [a_2]: 7.497e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 7.21999e-06 [auto_parallel]: 5.78002e-06 [parallel]: 1.718e-05 [flash_sp]: 7.14001e-06 [merge_comm]: 3.55998e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.26998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.67998e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.66999e-06 
[after_resolve]: 1.015e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00040684 [add_forward_monad_depend]: 4.75001e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.316e-05 [cse]: 2.729e-05 [a_3]: 4.082e-05 [Cycle 2]: 0.00059431, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.80002e-06 [loop_unroll]: 5.17e-06 [a_1]: 0.00012479 [with_stream_mark]: 9.10999e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.32999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.819e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.91001e-06 [flash_sp]: 3.23998e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 3.03e-06 [matmul_add_comm_reduction]: 4.91002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.02001e-06 [virtual_dataset]: 5.48002e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 6.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54999e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.75e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.66997e-06 [a_after_grad]: 7.77002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.35002e-06 [cse]: 1.307e-05 [a_3]: 3.309e-05 [py_interpret_to_execute_after_opt_a]: 7.56001e-06 [slice_cell_reuse_recomputed_activation]: 2.28002e-06 [rewriter_after_opt_a]: 3.26e-05 [convert_after_rewriter]: 6.64999e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00044797 [opt_b]: 0.00018209, [1] [Cycle 1]: 0.00017623, [7] 
[b_1]: 0.00010814 [b_2]: 7.29001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.33002e-06 [renormalize]: 3.19997e-07 [cse]: 1.638e-05 [optimize_parallel_all_gather_comm]: 1.487e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.316e-05 [loop_unroll]: 0.00041141 [opt_after_cconv]: 0.00011902, [1] [Cycle 1]: 0.00011306, [7] [c_1]: 2.733e-05 [parameter_eliminate]: 2.78e-06 [updatestate_depend_eliminate]: 4.94998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.28002e-06 [cse]: 3.86e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 1.29e-05 [tuple_transform]: 6.983e-05, [1] [Cycle 1]: 6.524e-05, [4] [d_1]: 3.969e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 4.351e-05 [cse_after_recomputation]: 2.027e-05, [1] [Cycle 1]: 1.583e-05, [1] [cse]: 1.061e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 2.53998e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 1.42e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.12e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.2e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.83001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.60001e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.725e-05 [begin_end_overlap_inline]: 4.70027e-07 [split_matmul_comm_elemetwise]: 1.82999e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.715e-05, [1] [Cycle 1]: 6.309e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.03001e-06 [elim_not_effective]: 1.112e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 9.19998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.523e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00044956 [validate]: 3.156e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0591349 [execute]: 8.27998e-06 Sums bootstrap : 0.000464s : 0.67% type_inference : 0.005570s : 8.08% event_method : 0.000015s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000574s : 0.83% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000407s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000411s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000039s : 0.06% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000450s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059135s : 85.73% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000163 30 15.35% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.23% : 0.000108s : 3: substitution.inline 1.61% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.38% : 0.000004s : 4: substitution.replace_old_param 6.99% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005529 2 90.14% : 0.004984s : 1: type_inference.infer 9.86% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.78% : 0.000026s : 3: replace.inline 30.22% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.06% : 0.000106s : 3: match.inline 8.94% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.61% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.38% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000002s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.48% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000002s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.70% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 47.13% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.87% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081273 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.63% : 0.002948s : 1: add_attr 3.62% : 0.002939s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000500s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.15% : 0.000936s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.58% : 0.002100s : 1: opt_a 0.15% : 0.000123s : 1: opt_after_cconv 0.56% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.86% : 0.003952s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000202s : 1: renormalize.infer 0.24% : 0.000198s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.08% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 72.78% : 0.059151s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 6.87% : 0.005584s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 1.07124, [24] [bootstrap]: 0.00050844 [type_inference]: 0.0117696 [event_method]: 5.006e-05 [auto_monad]: 0.00012332 [graph_reusing]: 8.33999e-06 [inline]: 2.07999e-06 [add_attr]: 0.00306142, [1] [add_attr_with_inline]: 0.00305338, [1] [Cycle 1]: 7.105e-05, [2] [tag_attr]: 3.424e-05 [meta_addattr_fg_expand]: 1.023e-05 [parallel-infer-symbol]: 2.85002e-06 [pre_auto_parallel]: 5.054e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0136932, [53] [py_interpret_to_execute]: 3.977e-05 [rewriter_before_opt_a]: 0.00014704 [opt_a]: 0.0114048, [3] [Cycle 1]: 0.00738182, [45] [expand_dump_flag]: 3.93999e-06 [switch_simplify]: 7.506e-05 [loop_unroll]: 6.305e-05 [a_1]: 0.00146933 [with_stream_mark]: 2.301e-05 [recompute_prepare]: 2.19e-05 [updatestate_depend_eliminate]: 9.41998e-06 [updatestate_assign_eliminate]: 7.83999e-06 [updatestate_loads_eliminate]: 7.41001e-06 [parameter_eliminate]: 2.45002e-06 [a_2]: 0.00024675 [accelerated_algorithm]: 3.193e-05 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 3.41999e-06 [shard_inline]: 1.62e-05 [merge_send_recv]: 1.617e-05 [auto_parallel]: 3.214e-05 [parallel]: 1.865e-05 [flash_sp]: 1.146e-05 [merge_comm]: 1.029e-05 [allreduce_fusion]: 8.84e-06 [matmul_add_comm_reduction]: 2.64e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.84e-05 [virtual_dataset]: 1.612e-05 [get_grad_eliminate_]: 1.574e-05 [virtual_output]: 1.546e-05 [merge_forward]: 9.52999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.754e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.933e-05 [merge_recompute_call_nodes]: 1.71002e-06 [before_grad]: 2.753e-05 [set_forward_comm_id_for_comm_node_pass]: 1e-05 [meta_fg_expand]: 0.0014731 [flash_sp_send_recv_attached]: 4.21001e-06 [receive_attached]: 2.38998e-06 
[after_resolve]: 6.034e-05 [a_after_grad]: 8.275e-05 [renormalize]: 0.00261692 [add_forward_monad_depend]: 8.72998e-06 [auto_monad_grad]: 5.17e-06 [auto_monad_eliminator]: 5.875e-05 [cse]: 0.00017586 [a_3]: 0.00034356 [Cycle 2]: 0.00309176, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.783e-05 [loop_unroll]: 4.465e-05 [a_1]: 0.0015603 [with_stream_mark]: 1.293e-05 [recompute_prepare]: 1.112e-05 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 0.00012721 [accelerated_algorithm]: 1.189e-05 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.98997e-06 [shard_inline]: 9.41e-06 [merge_send_recv]: 6.91001e-06 [auto_parallel]: 8e-06 [parallel]: 4.88001e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 5.57999e-06 [allreduce_fusion]: 4.99e-06 [matmul_add_comm_reduction]: 7.84002e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.024e-05 [virtual_dataset]: 8.92e-06 [get_grad_eliminate_]: 8.92999e-06 [virtual_output]: 8.59002e-06 [merge_forward]: 4.65999e-06 [cell_reuse_recompute_pass]: 9.60019e-07 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.657e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.433e-05 [set_forward_comm_id_for_comm_node_pass]: 5.51e-06 [meta_fg_expand]: 7.225e-05 [flash_sp_send_recv_attached]: 1.12999e-06 [receive_attached]: 1.34e-06 [after_resolve]: 1.664e-05 [a_after_grad]: 1.482e-05 [renormalize]: 0.000633 [add_forward_monad_depend]: 4.10998e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.498e-05 [cse]: 4.951e-05 [a_3]: 6.671e-05 [Cycle 3]: 0.00091748, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 9.02999e-06 [a_1]: 0.00025217 [with_stream_mark]: 9.71e-06 [recompute_prepare]: 9.89001e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 4.20999e-06 
[parameter_eliminate]: 9.09989e-07 [a_2]: 0.00012469 [accelerated_algorithm]: 1.201e-05 [shard]: 8.70001e-07 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 9.13002e-06 [merge_send_recv]: 7.33e-06 [auto_parallel]: 7.24001e-06 [parallel]: 4.63001e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 5.12999e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.86001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.063e-05 [virtual_dataset]: 8.89e-06 [get_grad_eliminate_]: 8.58001e-06 [virtual_output]: 8.33001e-06 [merge_forward]: 4.39002e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.655e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.439e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.35e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.507e-05 [a_after_grad]: 1.522e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 1.12e-05 [cse]: 2.756e-05 [a_3]: 6.064e-05 [py_interpret_to_execute_after_opt_a]: 1.094e-05 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 4.665e-05 [convert_after_rewriter]: 9.71e-06 [order_py_execute_after_rewriter]: 7.21999e-06 [mutable_eliminate]: 0.00046356 [opt_b]: 0.00029122, [1] [Cycle 1]: 0.00028492, [7] [b_1]: 0.00019089 [b_2]: 1.1e-05 [updatestate_depend_eliminate]: 7.29001e-06 [updatestate_assign_eliminate]: 4.13999e-06 [updatestate_loads_eliminate]: 4.27e-06 [renormalize]: 2.89991e-07 [cse]: 3.258e-05 [optimize_parallel_all_gather_comm]: 2.016e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.976e-05 [loop_unroll]: 0.00043094 [opt_after_cconv]: 0.00013828, [1] [Cycle 1]: 0.00013218, [7] [c_1]: 4.846e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 7.32997e-06 [updatestate_assign_eliminate]: 4.28001e-06 
[updatestate_loads_eliminate]: 4.45e-06 [cse]: 3.166e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.901e-05 [tuple_transform]: 0.00010387, [1] [Cycle 1]: 9.893e-05, [4] [d_1]: 6.786e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.026e-05 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 5.681e-05 [cse_after_recomputation]: 3.249e-05, [1] [Cycle 1]: 2.781e-05, [1] [cse]: 2.235e-05 [environ_conv]: 8.65001e-06 [swap_dp_allreduce_reducescatter]: 8.95999e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.39001e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.24e-06 [full_micro_interleaved_order_control]: 2.40997e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.715e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.97e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 5.10001e-06 [overlap_grad_flash_sp]: 2.415e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.96e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 9.10019e-07 [symbol_engine_optimizer]: 9.861e-05, [1] [Cycle 1]: 9.426e-05, [6] [build]: 9.28002e-06 [elim_shapecalc]: 1.354e-05 [elim_not_effective]: 1.843e-05 [opt_reshape]: 1.029e-05 [fold_const_symbol]: 1.496e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 1.55999e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 2.512e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.83001e-06 [opt_after_jit_grad]: 0.00047203 [validate]: 4.631e-05 [backend_pass]: 3.21001e-06 [task_emit]: 1.0379 [execute]: 9.79e-06 Sums bootstrap : 0.000508s : 0.05% type_inference : 0.011770s : 1.11% event_method : 0.000050s : 0.00% auto_monad : 0.000123s : 0.01% graph_reusing : 0.000008s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.00% optimize.rewriter_before_opt_a : 0.000147s : 0.01% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000134s : 0.01% optimize.opt_a.loop_unroll : 0.000117s : 0.01% optimize.opt_a.a_1 : 0.003282s : 0.31% optimize.opt_a.with_stream_mark : 0.000046s : 0.00% optimize.opt_a.recompute_prepare : 0.000043s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.00% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.05% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.00% optimize.opt_a.merge_send_recv : 0.000030s : 0.00% optimize.opt_a.auto_parallel : 0.000047s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000016s : 0.00% optimize.opt_a.merge_comm : 0.000021s : 
0.00% optimize.opt_a.allreduce_fusion : 0.000019s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.00% optimize.opt_a.virtual_dataset : 0.000034s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.00% optimize.opt_a.virtual_output : 0.000032s : 0.00% optimize.opt_a.merge_forward : 0.000019s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.00% optimize.opt_a.meta_fg_expand : 0.001549s : 0.15% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.01% optimize.opt_a.a_after_grad : 0.000113s : 0.01% optimize.opt_a.renormalize : 0.003250s : 0.31% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.00% optimize.opt_a.auto_monad_grad : 0.000007s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.01% optimize.opt_a.cse : 0.000253s : 0.02% optimize.opt_a.a_3 : 0.000471s : 0.04% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.00% optimize.convert_after_rewriter : 0.000010s : 0.00% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000464s : 0.04% optimize.opt_b.b_1 : 0.000191s : 0.02% optimize.opt_b.b_2 : 0.000011s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.00% optimize.loop_unroll : 0.000431s : 0.04% optimize.opt_after_cconv.c_1 : 0.000048s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.00% optimize.tuple_transform.d_1 : 0.000068s : 0.01% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.01% optimize.cse_after_recomputation.cse : 0.000022s : 0.00% optimize.environ_conv : 0.000009s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000472s : 0.04% validate : 0.000046s : 0.00% backend_pass : 0.000003s : 0.00% task_emit : 1.037897s : 97.58% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 
0.000777 222 5.77% : 0.000045s : 12: substitution.arithmetic_simplify 1.88% : 0.000015s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.44% : 0.000439s : 17: substitution.inline 2.01% : 0.000016s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.66% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000006s : 5: substitution.partial_eliminate 1.76% : 0.000014s : 20: substitution.remove_not_recompute_node 3.07% : 0.000024s : 10: substitution.replace_applicator 1.32% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.48% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011693 2 85.74% : 0.010026s : 1: type_inference.infer 14.26% : 0.001667s : 1: type_inference.specialize ------[replace.] 0.000231 33 56.95% : 0.000131s : 17: replace.inline 43.05% : 0.000099s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000464 33 92.68% : 0.000430s : 17: match.inline 7.32% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.40% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.13% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000016s : 101: predicate.partial_defer_inline 1.80% : 0.000014s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.63% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.98% : 
0.000038s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001751 34 54.83% : 0.000960s : 13: func_graph_cloner_run.FuncGraphClonerGraph 45.17% : 0.000791s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.096548 237 0.00% : 0.000003s : 1: ForceFp32Comm 0.28% : 0.003066s : 1: add_attr 0.28% : 0.003057s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000130s : 1: auto_monad 0.00% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000024s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.05% : 0.000543s : 1: bootstrap 0.00% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000020s : 1: control_data_broadcast_order 0.00% : 0.000013s : 1: convert_after_rewriter 0.00% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000012s : 1: environ_conv 0.01% : 0.000057s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000473s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.00% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000017s : 1: opt.transform.mutable_eliminate 0.45% : 0.004977s : 117: opt.transform.opt_a 0.00% : 0.000047s : 1: opt.transform.opt_after_cconv 0.00% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.02% : 0.000177s : 28: opt.transform.opt_b 0.01% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000054s : 4: opt.transform.symbol_engine_opt 1.04% : 0.011408s : 1: opt_a 0.01% : 0.000142s : 1: opt_after_cconv 0.04% : 0.000482s : 1: opt_after_jit_grad 0.03% : 0.000295s : 1: opt_b 1.25% : 0.013697s : 1: optimize 0.00% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.00% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.01% : 0.000055s : 1: pre_auto_parallel 0.00% : 0.000044s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000033s : 1: remove_dup_value 0.15% : 0.001644s : 2: renormalize.infer 0.15% : 0.001592s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000050s : 1: rewriter_after_opt_a 0.01% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000101s : 1: symbol_engine_optimizer 94.65% : 1.037918s : 1: task_emit 0.01% : 0.000107s : 1: 
tuple_transform 1.07% : 0.011784s : 1: type_inference 0.30% : 0.003341s : 1: validate TotalTime = 0.0712837, [24] [bootstrap]: 0.00050371 [type_inference]: 0.0043056 [event_method]: 1.137e-05 [auto_monad]: 5.026e-05 [graph_reusing]: 5.20999e-06 [inline]: 1.84998e-06 [add_attr]: 0.00303481, [1] [add_attr_with_inline]: 0.00302714, [1] [Cycle 1]: 4.433e-05, [2] [tag_attr]: 1.14e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.93003e-06 [pre_auto_parallel]: 2.163e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.15002e-06 [pipeline_split]: 1.88002e-06 [optimize]: 0.00374161, [53] [py_interpret_to_execute]: 1.431e-05 [rewriter_before_opt_a]: 3.927e-05 [opt_a]: 0.00192068, [2] [Cycle 1]: 0.00130976, [45] [expand_dump_flag]: 2.57001e-06 [switch_simplify]: 2.395e-05 [loop_unroll]: 1.434e-05 [a_1]: 0.00033413 [with_stream_mark]: 1.362e-05 [recompute_prepare]: 7.54002e-06 [updatestate_depend_eliminate]: 3.66999e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.869e-05 [accelerated_algorithm]: 6.57002e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.4e-06 [auto_parallel]: 6.71999e-06 [parallel]: 1.894e-05 [flash_sp]: 7.55998e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.08998e-06 [virtual_dataset]: 5.75001e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.44e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 9.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 
2.09e-06 [after_resolve]: 1.039e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.00035053 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.281e-05 [cse]: 2.661e-05 [a_3]: 4.034e-05 [Cycle 2]: 0.0006018, [45] [expand_dump_flag]: 7.7e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012635 [with_stream_mark]: 1.138e-05 [recompute_prepare]: 6.04001e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.886e-05 [accelerated_algorithm]: 5.82999e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.66998e-06 [parallel]: 3.95998e-06 [flash_sp]: 3.13998e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 2.78998e-06 [matmul_add_comm_reduction]: 5.27999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.33002e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.19998e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 1.79998e-06 [flash_sp_send_recv_attached]: 1.22999e-06 [receive_attached]: 1.00001e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.59999e-06 [cse]: 1.252e-05 [a_3]: 3.283e-05 [py_interpret_to_execute_after_opt_a]: 7.9e-06 [slice_cell_reuse_recomputed_activation]: 1.83002e-06 [rewriter_after_opt_a]: 3.093e-05 [convert_after_rewriter]: 7.11001e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00044926 [opt_b]: 0.00018435, [1] [Cycle 1]: 0.00017828, [7] 
[b_1]: 0.00010942 [b_2]: 7.41999e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 4.90021e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.617e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.179e-05 [loop_unroll]: 0.00041915 [opt_after_cconv]: 9.664e-05, [1] [Cycle 1]: 9.112e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.55001e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.31998e-06 [cse]: 1.699e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.265e-05 [tuple_transform]: 7.048e-05, [1] [Cycle 1]: 6.617e-05, [4] [d_1]: 3.982e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.32e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.584e-05, [1] [cse]: 1.057e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.40001e-06 [bias_add_comm_swap]: 2.46998e-06 [label_micro_interleaved_index]: 4.02e-06 [label_fine_grained_interleaved_index]: 3.04001e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.52999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.145e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 1.89999e-06 [overlap_grad_ring_attention]: 3.82002e-06 [overlap_grad_flash_sp]: 1.657e-05 [begin_end_overlap_inline]: 6.50005e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 7.043e-05, [1] [Cycle 1]: 6.617e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.80001e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.07001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 1.593e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00047809 [validate]: 3.286e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0588595 [execute]: 7.45e-06 Sums bootstrap : 0.000504s : 0.75% type_inference : 0.004306s : 6.40% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000460s : 0.68% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000351s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.67% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000419s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000478s : 0.71% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058860s : 87.48% execute : 0.000007s : 0.01% Time group info: ------[substitution.] 0.000159 26 13.78% : 0.000022s : 4: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000005s : 4: substitution.graph_param_transform 74.21% : 0.000118s : 2: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.44% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004265 2 91.68% : 0.003910s : 1: type_inference.infer 8.32% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000116 2 100.00% : 0.000116s : 2: match.inline ------[predicate.] 
0.000162 984 0.75% : 0.000001s : 9: predicate.accumulaten_eliminater 15.36% : 0.000025s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.67% : 0.000001s : 9: predicate.addn_zero_filter 0.58% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.05% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.68% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.76% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.35% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.94% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.88% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.88% : 0.000001s : 13: predicate.environ_get_depend_swap 1.69% : 0.000003s : 21: predicate.environ_get_eliminate 0.88% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.80% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000003s : 11: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.11% : 0.000008s : 44: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.36% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.83% : 0.000003s : 26: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.46% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.43% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.59% : 0.000001s : 9: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.06% : 0.000002s : 11: predicate.partial_defer_inline 1.03% : 0.000002s : 13: predicate.partial_eliminate 0.72% : 0.000001s : 9: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 0.84% : 0.000001s : 9: predicate.reduce_eliminate 1.84% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.11% : 0.000002s : 17: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.67% : 0.000001s : 9: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.89% : 0.000001s : 11: predicate.switch_defer_inline 1.47% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.76% : 0.000006s : 41: predicate.switch_simplify 0.64% : 
0.000001s : 9: predicate.tile_eliminate 0.65% : 0.000001s : 9: predicate.transpose_eliminate 1.31% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.24% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.24% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.26% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.77% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.59% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 42.10% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.90% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079413 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.83% : 0.003039s : 1: add_attr 3.82% : 0.003031s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000537s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.03% : 0.000817s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.06% : 0.000046s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.42% : 0.001924s : 1: opt_a 0.13% : 0.000100s : 1: opt_after_cconv 0.61% : 0.000488s : 1: opt_after_jit_grad 0.24% : 0.000188s : 1: opt_b 4.72% : 0.003745s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.20% : 0.000155s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 74.14% : 0.058875s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.44% : 0.004319s : 1: type_inference 0.07% : 0.000055s : 1: validate TotalTime = 0.109274, [24] [bootstrap]: 0.00050233 [type_inference]: 0.0106731 [event_method]: 4.608e-05 [auto_monad]: 0.00012118 [graph_reusing]: 8.37e-06 [inline]: 2.53e-06 [add_attr]: 0.00305606, [1] [add_attr_with_inline]: 0.00304716, [1] [Cycle 1]: 6.81e-05, [2] [tag_attr]: 3.234e-05 [meta_addattr_fg_expand]: 9.10001e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 4.711e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.67999e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0136527, [53] [py_interpret_to_execute]: 3.65e-05 [rewriter_before_opt_a]: 0.00012985 [opt_a]: 0.0113669, [3] [Cycle 1]: 0.00725828, [45] [expand_dump_flag]: 3.61999e-06 [switch_simplify]: 6.831e-05 [loop_unroll]: 5.668e-05 [a_1]: 0.00138062 [with_stream_mark]: 2.341e-05 [recompute_prepare]: 2.278e-05 [updatestate_depend_eliminate]: 9.39e-06 [updatestate_assign_eliminate]: 7.60998e-06 [updatestate_loads_eliminate]: 7.74002e-06 [parameter_eliminate]: 2.37001e-06 [a_2]: 0.00024914 [accelerated_algorithm]: 3.877e-05 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 3.80998e-06 [shard_inline]: 1.625e-05 [merge_send_recv]: 1.683e-05 [auto_parallel]: 1.135e-05 [parallel]: 1.874e-05 [flash_sp]: 1.176e-05 [merge_comm]: 9.99001e-06 [allreduce_fusion]: 9.51998e-06 [matmul_add_comm_reduction]: 2.723e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.825e-05 [virtual_dataset]: 1.607e-05 [get_grad_eliminate_]: 1.531e-05 [virtual_output]: 1.543e-05 [merge_forward]: 9.59999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.788e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.687e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 2.878e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.00148871 [flash_sp_send_recv_attached]: 4.28999e-06 [receive_attached]: 2.83e-06 
[after_resolve]: 6.102e-05 [a_after_grad]: 8.386e-05 [renormalize]: 0.00258206 [add_forward_monad_depend]: 9.12999e-06 [auto_monad_grad]: 5.44e-06 [auto_monad_eliminator]: 5.796e-05 [cse]: 0.00017537 [a_3]: 0.00034271 [Cycle 2]: 0.00308911, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.793e-05 [loop_unroll]: 4.661e-05 [a_1]: 0.00156246 [with_stream_mark]: 1.234e-05 [recompute_prepare]: 1.108e-05 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 4.34997e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012902 [accelerated_algorithm]: 1.217e-05 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.31998e-06 [merge_send_recv]: 6.64999e-06 [auto_parallel]: 7.50998e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 5.33002e-06 [allreduce_fusion]: 4.89998e-06 [matmul_add_comm_reduction]: 7.92998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.077e-05 [virtual_dataset]: 9.53002e-06 [get_grad_eliminate_]: 9.00999e-06 [virtual_output]: 9.19e-06 [merge_forward]: 5.20999e-06 [cell_reuse_recompute_pass]: 9.00007e-07 [offload_activation]: 9.19998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.67e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.462e-05 [set_forward_comm_id_for_comm_node_pass]: 5.62001e-06 [meta_fg_expand]: 3.62e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.549e-05 [a_after_grad]: 1.531e-05 [renormalize]: 0.00061988 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.461e-05 [cse]: 5.016e-05 [a_3]: 6.585e-05 [Cycle 3]: 0.0010056, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 1.074e-05 [loop_unroll]: 9.16998e-06 [a_1]: 0.00025466 [with_stream_mark]: 1.119e-05 [recompute_prepare]: 1.01e-05 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 4.26001e-06 [updatestate_loads_eliminate]: 
4.03001e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 0.00018009 [accelerated_algorithm]: 1.335e-05 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.91998e-06 [shard_inline]: 9.71e-06 [merge_send_recv]: 7.73999e-06 [auto_parallel]: 7.69002e-06 [parallel]: 4.87e-06 [flash_sp]: 1.20001e-06 [merge_comm]: 5.17999e-06 [allreduce_fusion]: 5.25999e-06 [matmul_add_comm_reduction]: 8.02e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.066e-05 [virtual_dataset]: 9.00001e-06 [get_grad_eliminate_]: 9.00001e-06 [virtual_output]: 8.54e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 9.10999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.657e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.426e-05 [set_forward_comm_id_for_comm_node_pass]: 6.37001e-06 [meta_fg_expand]: 3.61001e-06 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.509e-05 [a_after_grad]: 1.494e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.74998e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.141e-05 [cse]: 2.788e-05 [a_3]: 7.463e-05 [py_interpret_to_execute_after_opt_a]: 1.1e-05 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 4.737e-05 [convert_after_rewriter]: 9.74e-06 [order_py_execute_after_rewriter]: 7.55e-06 [mutable_eliminate]: 0.00046388 [opt_b]: 0.00029818, [1] [Cycle 1]: 0.00029169, [7] [b_1]: 0.000194 [b_2]: 1.152e-05 [updatestate_depend_eliminate]: 7.65e-06 [updatestate_assign_eliminate]: 4.26001e-06 [updatestate_loads_eliminate]: 4.13001e-06 [renormalize]: 5.39992e-07 [cse]: 3.375e-05 [optimize_parallel_all_gather_comm]: 2.125e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.061e-05 [loop_unroll]: 0.00042827 [opt_after_cconv]: 0.00013895, [1] [Cycle 1]: 0.00013314, [7] [c_1]: 4.949e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 7.28e-06 [updatestate_assign_eliminate]: 4.53999e-06 
[updatestate_loads_eliminate]: 3.91999e-06 [cse]: 3.087e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 3.024e-05 [tuple_transform]: 0.00010344, [1] [Cycle 1]: 9.872e-05, [4] [d_1]: 6.784e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.009e-05 [partial_unused_args_eliminate]: 2.21e-06 [add_recomputation]: 5.877e-05 [cse_after_recomputation]: 3.382e-05, [1] [Cycle 1]: 2.901e-05, [1] [cse]: 2.322e-05 [environ_conv]: 9.77001e-06 [swap_dp_allreduce_reducescatter]: 8.12e-06 [bias_add_comm_swap]: 2.22999e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.16997e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.06997e-06 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.24e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.77e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 4.96997e-06 [overlap_recompute_and_grad_model_parallel]: 5.82001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.25002e-06 [overlap_grad_ring_attention]: 5.30999e-06 [overlap_grad_flash_sp]: 2.366e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 1.16002e-06 [symbol_engine_optimizer]: 0.00010184, [1] [Cycle 1]: 9.741e-05, [6] [build]: 1.064e-05 [elim_shapecalc]: 1.383e-05 [elim_not_effective]: 1.893e-05 [opt_reshape]: 1.02e-05 [fold_const_symbol]: 1.573e-05 
[renormalize]: 2.29978e-07 [detach_backward]: 1.53002e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.565e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00047456 [validate]: 4.573e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0803784 [execute]: 9.81e-06 Sums bootstrap : 0.000502s : 0.48% type_inference : 0.010673s : 10.17% event_method : 0.000046s : 0.04% auto_monad : 0.000121s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000130s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000127s : 0.12% optimize.opt_a.loop_unroll : 0.000112s : 0.11% optimize.opt_a.a_1 : 0.003198s : 3.05% optimize.opt_a.with_stream_mark : 0.000047s : 0.04% optimize.opt_a.recompute_prepare : 0.000044s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000558s : 0.53% optimize.opt_a.accelerated_algorithm : 0.000064s : 0.06% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000027s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% 
optimize.opt_a.merge_comm : 0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.04% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000070s : 0.07% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.02% optimize.opt_a.meta_fg_expand : 0.001529s : 1.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.09% optimize.opt_a.a_after_grad : 0.000114s : 0.11% optimize.opt_a.renormalize : 0.003202s : 3.05% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.08% optimize.opt_a.cse : 0.000253s : 0.24% optimize.opt_a.a_3 : 0.000483s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000008s : 0.01% optimize.mutable_eliminate : 0.000464s : 0.44% optimize.opt_b.b_1 : 0.000194s : 0.18% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000034s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000428s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.45% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080378s : 76.62% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000772 218 5.47% : 0.000042s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 54.71% : 0.000422s : 16: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.42% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.85% : 0.000022s : 3: substitution.less_batch_normalization 1.68% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 2.74% : 0.000021s : 20: substitution.remove_not_recompute_node 3.19% : 0.000025s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.51% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.31% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.29% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010603 2 86.03% : 0.009121s : 1: type_inference.infer 13.97% : 0.001481s : 1: type_inference.specialize ------[replace.] 0.000213 30 59.03% : 0.000126s : 16: replace.inline 40.97% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 30 92.89% : 0.000413s : 16: match.inline 7.11% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.01% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 99: predicate.arithmetic_simplify 1.13% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.07% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.63% : 0.000005s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000042s : 244: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.61% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.84% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.20% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000037s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.54% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.58% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001713 32 55.65% : 0.000953s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.35% : 0.000760s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.134491 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.28% : 0.003061s : 1: add_attr 2.27% : 0.003051s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000128s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000538s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000053s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000473s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.70% : 0.004973s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000179s : 28: opt.transform.opt_b 0.06% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000055s : 4: opt.transform.symbol_engine_opt 8.45% : 0.011370s : 1: opt_a 0.11% : 0.000143s : 1: opt_after_cconv 0.36% : 0.000485s : 1: opt_after_jit_grad 0.22% : 0.000302s : 1: opt_b 10.15% : 0.013657s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000035s : 1: remove_dup_value 1.22% : 0.001642s : 2: renormalize.infer 1.15% : 0.001547s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.10% : 0.000135s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000105s : 1: symbol_engine_optimizer 59.78% : 0.080396s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 7.95% : 0.010688s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y3-dtype_x9-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x0-pynative],max_mem:42.0M TotalTime = 0.0221807, [24] [bootstrap]: 0.00056911 [type_inference]: 0.00645891 [event_method]: 1.543e-05 [auto_monad]: 5.497e-05 [graph_reusing]: 5.71e-06 [inline]: 1.67999e-06 [add_attr]: 0.00350512, [1] [add_attr_with_inline]: 0.00349462, [1] [Cycle 1]: 4.361e-05, [2] [tag_attr]: 1.567e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.905e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.98002e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00406444, [53] [py_interpret_to_execute]: 2.114e-05 [rewriter_before_opt_a]: 5.998e-05 [opt_a]: 0.00220067, [2] [Cycle 1]: 0.00155422, [45] [expand_dump_flag]: 3.35998e-06 [switch_simplify]: 3.122e-05 [loop_unroll]: 2.148e-05 [a_1]: 0.00046303 [with_stream_mark]: 1.304e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.997e-05 [accelerated_algorithm]: 6.70002e-06 [shard]: 2.23998e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.32001e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 6.34001e-06 [parallel]: 2.147e-05 [flash_sp]: 7.3e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.72002e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 8.15e-06 [virtual_dataset]: 6.17999e-06 
[get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 6.23e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.35002e-06 [receive_attached]: 2.11998e-06 [after_resolve]: 1.131e-05 [a_after_grad]: 8.64e-06 [renormalize]: 0.00043627 [add_forward_monad_depend]: 4.62998e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.296e-05 [cse]: 2.746e-05 [a_3]: 4.184e-05 [Cycle 2]: 0.00063672, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.77999e-06 [a_1]: 0.00012894 [with_stream_mark]: 9.86e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.918e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.10998e-06 [flash_sp]: 2.98e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 3.17002e-06 [matmul_add_comm_reduction]: 4.85999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 3.261e-05 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.89999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.14999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.13999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.002e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.758e-05 [a_3]: 3.272e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.038e-05 [convert_after_rewriter]: 7.26999e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00045326 [opt_b]: 0.00018549, [1] [Cycle 1]: 0.00017934, [7] [b_1]: 0.00011107 [b_2]: 7.42998e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 3.39991e-07 [cse]: 1.64e-05 [optimize_parallel_all_gather_comm]: 1.54e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.139e-05 [loop_unroll]: 0.00041969 [opt_after_cconv]: 9.754e-05, [1] [Cycle 1]: 9.151e-05, [7] [c_1]: 2.903e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.618e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.338e-05 [tuple_transform]: 7.006e-05, [1] [Cycle 1]: 6.562e-05, [4] [d_1]: 3.959e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.30002e-06 [partial_unused_args_eliminate]: 2.06998e-06 [add_recomputation]: 4.761e-05 [cse_after_recomputation]: 2.072e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.053e-05 [environ_conv]: 5.30999e-06 [swap_dp_allreduce_reducescatter]: 4.87998e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.68e-06 [micro_interleaved_order_control]: 2.07001e-06 [assign_add_opt]: 1.49998e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 1.96e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.70002e-07 
[add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.40001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66998e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.56002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 7.044e-05, [1] [Cycle 1]: 6.613e-05, [6] [build]: 2.63e-06 [elim_shapecalc]: 8.75999e-06 [elim_not_effective]: 1.174e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 9.32999e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.557e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 0.00013154 [opt_after_jit_grad]: 0.00045989 [validate]: 3.193e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.0066081 [execute]: 6.66e-06 Sums bootstrap : 0.000569s : 3.22% type_inference : 0.006459s : 36.51% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 
0.000060s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000592s : 3.35% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000149s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000039s : 0.22% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000436s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000045s : 0.25% optimize.opt_a.a_3 : 0.000075s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.56% optimize.opt_b.b_1 : 0.000111s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.12% optimize.loop_unroll : 0.000420s : 2.37% optimize.opt_after_cconv.c_1 : 0.000029s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000132s : 0.74% opt_after_jit_grad : 0.000460s : 2.60% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006608s : 37.35% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000169 30 14.10% : 0.000024s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 67.52% : 0.000114s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000004s : 4: substitution.remove_not_recompute_node 2.79% : 0.000005s : 4: substitution.replace_old_param 6.62% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006409 2 90.35% : 0.005790s : 1: type_inference.infer 9.65% : 0.000618s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.33% : 0.000028s : 3: replace.inline 30.67% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.75% : 0.000112s : 3: match.inline 8.25% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.93% : 0.000002s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.86% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.86% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.63% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.06% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000404 8 46.44% : 0.000188s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.56% : 0.000217s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031337 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.20% : 0.003510s : 1: add_attr 11.16% : 0.003498s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000605s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.48% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.17% : 0.000995s : 78: opt.transform.opt_a 0.09% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.03% : 0.002204s : 1: opt_a 0.32% : 0.000101s : 1: opt_after_cconv 1.50% : 0.000470s : 1: opt_after_jit_grad 0.60% : 0.000189s : 1: opt_b 12.98% : 0.004068s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.69% : 0.000217s : 1: renormalize.infer 0.68% : 0.000212s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000137s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000073s : 1: symbol_engine_optimizer 21.12% : 0.006618s : 1: task_emit 0.23% : 0.000073s : 1: 
tuple_transform 20.65% : 0.006473s : 1: type_inference 0.20% : 0.000064s : 1: validate TotalTime = 0.0185865, [24] [bootstrap]: 0.00045618 [type_inference]: 0.00440356 [event_method]: 1.086e-05 [auto_monad]: 5.354e-05 [graph_reusing]: 5.25999e-06 [inline]: 1.67999e-06 [add_attr]: 0.00305311, [1] [add_attr_with_inline]: 0.003045, [1] [Cycle 1]: 4.698e-05, [2] [tag_attr]: 1.222e-05 [meta_addattr_fg_expand]: 3.48e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 2.234e-05 [insert-virtual-dataset]: 2.45002e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00378741, [53] [py_interpret_to_execute]: 1.68e-05 [rewriter_before_opt_a]: 3.849e-05 [opt_a]: 0.00194192, [2] [Cycle 1]: 0.00132229, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 2.426e-05 [loop_unroll]: 1.45e-05 [a_1]: 0.00033896 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 8.18999e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 2.94999e-06 [parameter_eliminate]: 2.22001e-06 [a_2]: 7.848e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.57002e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.79e-05 [flash_sp]: 7.21999e-06 [merge_comm]: 3.55003e-06 [allreduce_fusion]: 3.45003e-06 [matmul_add_comm_reduction]: 8.90001e-06 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 7.33999e-06 [virtual_dataset]: 6.29001e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.147e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.66e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.21e-06 
[after_resolve]: 1.07e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00035189 [add_forward_monad_depend]: 4.72998e-06 [auto_monad_grad]: 2.13002e-06 [auto_monad_eliminator]: 1.289e-05 [cse]: 2.68e-05 [a_3]: 4.018e-05 [Cycle 2]: 0.00061031, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 7.09001e-06 [loop_unroll]: 5.82001e-06 [a_1]: 0.00012937 [with_stream_mark]: 1.183e-05 [recompute_prepare]: 6.26e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.60002e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.982e-05 [accelerated_algorithm]: 5.94999e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.88002e-06 [merge_send_recv]: 4.62998e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.64998e-06 [flash_sp]: 3.2e-06 [merge_comm]: 3.59002e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.16002e-06 [virtual_output]: 5.15001e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 6.19001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 7.80012e-07 [before_grad]: 8.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32002e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.82999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.96001e-06 [cse]: 1.346e-05 [a_3]: 3.271e-05 [py_interpret_to_execute_after_opt_a]: 8e-06 [slice_cell_reuse_recomputed_activation]: 1.68002e-06 [rewriter_after_opt_a]: 3.176e-05 [convert_after_rewriter]: 6.68998e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00045394 [opt_b]: 0.00018581, [1] [Cycle 1]: 0.00017959, [7] 
[b_1]: 0.00011019 [b_2]: 7.4e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.48002e-06 [renormalize]: 3.09985e-07 [cse]: 1.664e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 2.296e-05 [loop_unroll]: 0.00042627 [opt_after_cconv]: 9.691e-05, [1] [Cycle 1]: 9.107e-05, [7] [c_1]: 2.813e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.705e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.221e-05 [tuple_transform]: 7.077e-05, [1] [Cycle 1]: 6.633e-05, [4] [d_1]: 4.004e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 4.502e-05 [cse_after_recomputation]: 2.129e-05, [1] [Cycle 1]: 1.676e-05, [1] [cse]: 1.145e-05 [environ_conv]: 4.60001e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.58002e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.66999e-06 [comm_op_add_attrs]: 1.21002e-06 [add_comm_op_reuse_tag]: 1.30001e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.93002e-06 [control_data_broadcast_order]: 1.205e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.683e-05 [begin_end_overlap_inline]: 5.79981e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.104e-05, [1] [Cycle 1]: 6.696e-05, [6] [build]: 2.54001e-06 [elim_shapecalc]: 8.92999e-06 [elim_not_effective]: 1.228e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.627e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.77002e-06 [opt_after_jit_grad]: 0.00045808 [validate]: 5.903e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00600498 [execute]: 6.83e-06 Sums bootstrap : 0.000456s : 3.14% type_inference : 0.004404s : 30.31% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.12% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000468s : 3.22% optimize.opt_a.with_stream_mark : 0.000026s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000148s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.09% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000352s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000454s : 3.12% optimize.opt_b.b_1 : 0.000110s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000426s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000458s : 3.15% validate : 0.000059s : 0.41% backend_pass : 0.000001s : 0.01% task_emit : 0.006005s : 41.34% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.00% : 0.000022s : 4: substitution.arithmetic_simplify 1.78% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.26% : 0.000005s : 4: substitution.graph_param_transform 65.67% : 0.000080s : 2: substitution.inline 2.39% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 3.32% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004362 2 92.05% : 0.004015s : 1: type_inference.infer 7.95% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000139 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.92% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.71% : 0.000004s : 17: predicate.arithmetic_simplify 0.73% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.38% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.95% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.83% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.64% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.73% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.82% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 43.44% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.56% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026771 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.003057s : 1: add_attr 11.39% : 0.003048s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000524s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000435s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000829s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000093s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.26% : 0.001945s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.75% : 0.000467s : 1: opt_after_jit_grad 0.71% : 0.000189s : 1: opt_b 14.16% : 0.003791s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000021s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000193s : 1: renormalize.infer 0.57% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000074s : 1: symbol_engine_optimizer 22.47% : 0.006015s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.50% : 0.004417s : 1: type_inference 0.32% : 0.000086s : 1: validate TotalTime = 0.0199938, [24] [bootstrap]: 0.00046389 [type_inference]: 0.00560709 [event_method]: 1.47e-05 [auto_monad]: 5.545e-05 [graph_reusing]: 5.49e-06 [inline]: 1.55001e-06 [add_attr]: 0.00302588, [1] [add_attr_with_inline]: 0.00301795, [1] [Cycle 1]: 4.468e-05, [2] [tag_attr]: 1.575e-05 [meta_addattr_fg_expand]: 4.53001e-06 [parallel-infer-symbol]: 2.64999e-06 [pre_auto_parallel]: 2.538e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00401092, [53] [py_interpret_to_execute]: 1.943e-05 [rewriter_before_opt_a]: 5.88e-05 [opt_a]: 0.00213513, [2] [Cycle 1]: 0.00151961, [45] [expand_dump_flag]: 2.69999e-06 [switch_simplify]: 3.33e-05 [loop_unroll]: 2.244e-05 [a_1]: 0.00045899 [with_stream_mark]: 1.272e-05 [recompute_prepare]: 8.24998e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.719e-05 [accelerated_algorithm]: 6.65002e-06 [shard]: 1.91e-06 [meta_shard_fg_expand]: 1.99999e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 5.95002e-06 [parallel]: 1.696e-05 [flash_sp]: 7.16999e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.42002e-06 [matmul_add_comm_reduction]: 8.80999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.54002e-06 [virtual_dataset]: 6.04001e-06 [get_grad_eliminate_]: 5.53002e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.51999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.102e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.29e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.55002e-06 [flash_sp_send_recv_attached]: 2.14999e-06 [receive_attached]: 
2.16998e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.0004146 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.377e-05 [cse]: 2.7e-05 [a_3]: 4.177e-05 [Cycle 2]: 0.00060587, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.60001e-06 [a_1]: 0.00012793 [with_stream_mark]: 1.048e-05 [recompute_prepare]: 6.06998e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.899e-05 [accelerated_algorithm]: 6.04001e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 4.23001e-06 [auto_parallel]: 5.21002e-06 [parallel]: 4.14002e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.27001e-06 [virtual_dataset]: 5.60001e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.67001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.03002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39998e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 8.38001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.09003e-06 [after_resolve]: 9.46e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.32e-05 [a_3]: 3.32e-05 [py_interpret_to_execute_after_opt_a]: 7.97e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 2.941e-05 [convert_after_rewriter]: 7.83999e-06 [order_py_execute_after_rewriter]: 5.49e-06 [mutable_eliminate]: 0.00044853 [opt_b]: 0.00018631, [1] [Cycle 1]: 
0.00018002, [7] [b_1]: 0.00011142 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.00004e-07 [cse]: 1.643e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 2.21e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00042195 [opt_after_cconv]: 9.793e-05, [1] [Cycle 1]: 9.179e-05, [7] [c_1]: 2.826e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.61998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.647e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.316e-05 [tuple_transform]: 7.233e-05, [1] [Cycle 1]: 6.776e-05, [4] [d_1]: 4.053e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.70002e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.423e-05 [cse_after_recomputation]: 2.163e-05, [1] [Cycle 1]: 1.725e-05, [1] [cse]: 1.175e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.45002e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.59999e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.65002e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 1.15999e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.13999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 3.77002e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.047e-05, [1] [Cycle 1]: 6.594e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 9.17999e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.43e-06 [fold_const_symbol]: 9.41e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.621e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.00045825 [validate]: 3.089e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00605644 [execute]: 7.18e-06 Sums bootstrap : 0.000464s : 2.90% type_inference : 0.005607s : 35.10% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000028s : 0.18% optimize.opt_a.a_1 : 0.000587s : 3.67% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000415s : 2.60% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000040s : 0.25% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.18% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.81% optimize.opt_b.b_1 : 0.000111s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000422s : 2.64% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000016s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.02%
opt_after_jit_grad : 0.000458s : 2.87%
validate : 0.000031s : 0.19%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006056s : 37.91%
execute : 0.000007s : 0.04%
Time group info:
------[substitution.] 0.000165 30
14.80% : 0.000024s : 5: substitution.arithmetic_simplify
1.12% : 0.000002s : 2: substitution.elim_not_effective
0.84% : 0.000001s : 2: substitution.fold_const_symbol
3.09% : 0.000005s : 4: substitution.graph_param_transform
67.29% : 0.000111s : 3: substitution.inline
1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch
2.78% : 0.000005s : 4: substitution.remove_not_recompute_node
2.29% : 0.000004s : 4: substitution.replace_old_param
6.13% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005566 2
89.90% : 0.005004s : 1: type_inference.infer
10.10% : 0.000562s : 1: type_inference.specialize
------[replace.] 0.000040 5
69.44% : 0.000027s : 3: replace.inline
30.56% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000118 5
92.33% : 0.000109s : 3: match.inline
7.67% : 0.000009s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000162 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 1.06% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.52% : 0.000002s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.86% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.15% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.57% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 45.77% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.23% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028552 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.61% : 0.003030s : 1: add_attr 10.58% : 0.003021s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.75% : 0.000499s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.60% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.37% : 0.000962s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000093s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.49% : 0.002138s : 1: opt_a 0.36% : 0.000101s : 1: opt_after_cconv 1.64% : 0.000468s : 1: opt_after_jit_grad 0.66% : 0.000190s : 1: opt_b 14.06% : 0.004015s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.73% : 0.000210s : 1: renormalize.infer 0.69% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000033s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000073s : 1: symbol_engine_optimizer 21.25% : 0.006067s : 1: task_emit 0.26% : 0.000075s : 1: 
tuple_transform 19.69% : 0.005621s : 1: type_inference 0.20% : 0.000058s : 1: validate TotalTime = 0.0430715, [24] [bootstrap]: 0.00050781 [type_inference]: 0.0115785 [event_method]: 4.932e-05 [auto_monad]: 0.00012524 [graph_reusing]: 8.56002e-06 [inline]: 2.16998e-06 [add_attr]: 0.00701772, [1] [add_attr_with_inline]: 0.00700804, [1] [Cycle 1]: 8.19e-05, [2] [tag_attr]: 3.939e-05 [meta_addattr_fg_expand]: 1.064e-05 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 5.537e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0143407, [53] [py_interpret_to_execute]: 4.004e-05 [rewriter_before_opt_a]: 0.0001561 [opt_a]: 0.0119303, [3] [Cycle 1]: 0.00774067, [45] [expand_dump_flag]: 5.20001e-06 [switch_simplify]: 7.72e-05 [loop_unroll]: 6.324e-05 [a_1]: 0.00151863 [with_stream_mark]: 2.456e-05 [recompute_prepare]: 2.242e-05 [updatestate_depend_eliminate]: 9.63997e-06 [updatestate_assign_eliminate]: 7.98001e-06 [updatestate_loads_eliminate]: 8.02998e-06 [parameter_eliminate]: 2.59001e-06 [a_2]: 0.00025032 [accelerated_algorithm]: 3.186e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 4.47e-06 [shard_inline]: 1.665e-05 [merge_send_recv]: 1.677e-05 [auto_parallel]: 1.121e-05 [parallel]: 1.848e-05 [flash_sp]: 1.169e-05 [merge_comm]: 1.035e-05 [allreduce_fusion]: 9.44e-06 [matmul_add_comm_reduction]: 2.827e-05 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 1.872e-05 [virtual_dataset]: 1.617e-05 [get_grad_eliminate_]: 1.554e-05 [virtual_output]: 1.543e-05 [merge_forward]: 9.57001e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 1.782e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.995e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 2.807e-05 [set_forward_comm_id_for_comm_node_pass]: 1e-05 [meta_fg_expand]: 0.00161092 [flash_sp_send_recv_attached]: 3.95998e-06 [receive_attached]: 2.47001e-06 
[after_resolve]: 6.305e-05 [a_after_grad]: 8.55e-05 [renormalize]: 0.00276318 [add_forward_monad_depend]: 1.02e-05 [auto_monad_grad]: 5.72999e-06 [auto_monad_eliminator]: 5.895e-05 [cse]: 0.00018617 [a_3]: 0.00034632 [Cycle 2]: 0.00320109, [45] [expand_dump_flag]: 1.81e-06 [switch_simplify]: 4.845e-05 [loop_unroll]: 4.55e-05 [a_1]: 0.00159153 [with_stream_mark]: 1.257e-05 [recompute_prepare]: 1.174e-05 [updatestate_depend_eliminate]: 5.50001e-06 [updatestate_assign_eliminate]: 4.63999e-06 [updatestate_loads_eliminate]: 3.88001e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00012988 [accelerated_algorithm]: 1.264e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 2.07001e-06 [shard_inline]: 9.52001e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 7.76001e-06 [parallel]: 5.02e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 5.32999e-06 [allreduce_fusion]: 4.82998e-06 [matmul_add_comm_reduction]: 7.98999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.083e-05 [virtual_dataset]: 8.90999e-06 [get_grad_eliminate_]: 9.44998e-06 [virtual_output]: 8.64998e-06 [merge_forward]: 4.57e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 9.55001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.671e-05 [merge_recompute_call_nodes]: 6.29982e-07 [before_grad]: 1.462e-05 [set_forward_comm_id_for_comm_node_pass]: 6.24001e-06 [meta_fg_expand]: 9.513e-05 [flash_sp_send_recv_attached]: 1.10001e-06 [receive_attached]: 1.19e-06 [after_resolve]: 1.811e-05 [a_after_grad]: 1.59e-05 [renormalize]: 0.00066655 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.26002e-06 [auto_monad_eliminator]: 1.562e-05 [cse]: 4.79e-05 [a_3]: 6.798e-05 [Cycle 3]: 0.00097478, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 1.096e-05 [loop_unroll]: 9.29e-06 [a_1]: 0.00028524 [with_stream_mark]: 1.096e-05 [recompute_prepare]: 9.94001e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 
3.95998e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012694 [accelerated_algorithm]: 1.255e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 9.61998e-06 [merge_send_recv]: 7.45998e-06 [auto_parallel]: 7.43e-06 [parallel]: 4.65001e-06 [flash_sp]: 1.21002e-06 [merge_comm]: 5.14998e-06 [allreduce_fusion]: 5.19e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 9.05999e-06 [get_grad_eliminate_]: 8.86002e-06 [virtual_output]: 8.47998e-06 [merge_forward]: 4.58999e-06 [cell_reuse_recompute_pass]: 1.50999e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.8e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.52e-05 [set_forward_comm_id_for_comm_node_pass]: 6.16998e-06 [meta_fg_expand]: 3.61999e-06 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 1.17999e-06 [after_resolve]: 1.424e-05 [a_after_grad]: 1.442e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.49e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.11e-05 [cse]: 2.859e-05 [a_3]: 6.164e-05 [py_interpret_to_execute_after_opt_a]: 1.11e-05 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 4.879e-05 [convert_after_rewriter]: 9.40001e-06 [order_py_execute_after_rewriter]: 7.1e-06 [mutable_eliminate]: 0.00050017 [opt_b]: 0.00030264, [1] [Cycle 1]: 0.00029648, [7] [b_1]: 0.00019654 [b_2]: 1.184e-05 [updatestate_depend_eliminate]: 7.71999e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 4.04002e-06 [renormalize]: 4.40021e-07 [cse]: 3.534e-05 [optimize_parallel_all_gather_comm]: 2.191e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.021e-05 [loop_unroll]: 0.00046531 [opt_after_cconv]: 0.00014233, [1] [Cycle 1]: 0.00013611, [7] [c_1]: 5.019e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 7.97e-06 [updatestate_assign_eliminate]: 4.27e-06 
[updatestate_loads_eliminate]: 4.27003e-06 [cse]: 3.267e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 3.125e-05 [tuple_transform]: 0.00010703, [1] [Cycle 1]: 0.000102, [4] [d_1]: 7.005e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.069e-05 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.957e-05 [cse_after_recomputation]: 3.341e-05, [1] [Cycle 1]: 2.865e-05, [1] [cse]: 2.29e-05 [environ_conv]: 8.55999e-06 [swap_dp_allreduce_reducescatter]: 8.35001e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.22003e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.46e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.37e-06 [add_comm_op_reuse_tag]: 1.20999e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.63002e-06 [control_data_broadcast_order]: 1.78e-05 [grouped_pairwise_exchange_alltoall]: 1.98002e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 6.02001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.57999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 5.15999e-06 [overlap_grad_flash_sp]: 2.383e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 1.96003e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 0.00010068, [1] [Cycle 1]: 9.62e-05, [6] [build]: 9.77999e-06 [elim_shapecalc]: 1.424e-05 [elim_not_effective]: 1.886e-05 [opt_reshape]: 1.006e-05 [fold_const_symbol]: 1.52e-05 [renormalize]: 2.50002e-07 
[detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 2.689e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00051458 [validate]: 4.622e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00855811 [execute]: 7.03e-06 Sums bootstrap : 0.000508s : 1.46% type_inference : 0.011578s : 33.32% event_method : 0.000049s : 0.14% auto_monad : 0.000125s : 0.36% graph_reusing : 0.000009s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000039s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000055s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.12% optimize.rewriter_before_opt_a : 0.000156s : 0.45% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000137s : 0.39% optimize.opt_a.loop_unroll : 0.000118s : 0.34% optimize.opt_a.a_1 : 0.003395s : 9.77% optimize.opt_a.with_stream_mark : 0.000048s : 0.14% optimize.opt_a.recompute_prepare : 0.000044s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000507s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.10% optimize.opt_a.merge_send_recv : 0.000032s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000044s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000058s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.06% optimize.opt_a.meta_fg_expand : 0.001710s : 4.92% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.27% optimize.opt_a.a_after_grad : 0.000116s : 0.33% optimize.opt_a.renormalize : 0.003430s : 9.87% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.25% optimize.opt_a.cse : 0.000263s : 0.76% optimize.opt_a.a_3 : 0.000476s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000500s : 1.44% optimize.opt_b.b_1 : 0.000197s : 0.57% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000035s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000465s : 1.34% optimize.opt_after_cconv.c_1 : 0.000050s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000033s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000070s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.17% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000515s : 1.48% validate : 0.000046s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008558s : 24.63% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000829 222 5.83% : 0.000048s : 12: substitution.arithmetic_simplify 1.76% : 0.000015s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 57.60% : 0.000478s : 17: substitution.inline 2.04% : 0.000017s : 2: substitution.inline_without_move 1.30% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.87% : 0.000016s : 3: substitution.less_batch_normalization 1.60% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000006s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.05% : 0.000025s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.41% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.67% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.22% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.19% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.19% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011504 2 86.04% : 0.009899s : 1: type_inference.infer 13.96% : 0.001605s : 1: type_inference.specialize ------[replace.] 0.000240 33 56.53% : 0.000135s : 17: replace.inline 43.47% : 0.000104s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000503 33 93.04% : 0.000468s : 17: match.inline 6.96% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000779 5764 1.05% : 0.000008s : 68: predicate.accumulaten_eliminater 0.34% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.00% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.16% : 0.000017s : 100: predicate.arithmetic_simplify 1.16% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_depend_swap 1.71% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.38% : 0.000019s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.66% : 0.000044s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000006s : 32: predicate.less_batch_normalization 
1.62% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 168: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000009s : 68: predicate.minmaximum_grad 0.40% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.11% : 0.000016s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 68: predicate.reduce_eliminate 2.61% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000009s : 68: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.98% : 
0.000039s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.51% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.53% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.55% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.20% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.16% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001738 34 56.31% : 0.000978s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.69% : 0.000759s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.073316 237 0.00% : 0.000004s : 1: ForceFp32Comm 9.58% : 0.007022s : 1: add_attr 9.56% : 0.007013s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000133s : 1: auto_monad 0.04% : 0.000031s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.74% : 0.000541s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000057s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.65% : 0.000475s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.70% : 0.000511s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 6.98% : 0.005117s : 117: opt.transform.opt_a 0.07% : 0.000049s : 1: opt.transform.opt_after_cconv 0.05% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000181s : 28: opt.transform.opt_b 0.11% : 0.000079s : 2: 
opt.transform.opt_trans_graph 0.07% : 0.000055s : 4: opt.transform.symbol_engine_opt 16.28% : 0.011933s : 1: opt_a 0.20% : 0.000146s : 1: opt_after_cconv 0.72% : 0.000525s : 1: opt_after_jit_grad 0.42% : 0.000306s : 1: opt_b 19.57% : 0.014345s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000061s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000035s : 1: remove_dup_value 2.52% : 0.001846s : 2: renormalize.infer 2.14% : 0.001569s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000053s : 1: rewriter_after_opt_a 0.22% : 0.000161s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.14% : 0.000103s : 1: symbol_engine_optimizer 11.69% : 0.008568s : 1: task_emit 0.15% : 0.000110s : 1: 
tuple_transform 15.81% : 0.011593s : 1: type_inference 0.11% : 0.000081s : 1: validate TotalTime = 0.0188478, [24] [bootstrap]: 0.00048199 [type_inference]: 0.00439831 [event_method]: 1.059e-05 [auto_monad]: 5.154e-05 [graph_reusing]: 5.46e-06 [inline]: 1.62001e-06 [add_attr]: 0.00304821, [1] [add_attr_with_inline]: 0.00303999, [1] [Cycle 1]: 4.597e-05, [2] [tag_attr]: 1.219e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 2.76999e-06 [pre_auto_parallel]: 2.088e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00375846, [53] [py_interpret_to_execute]: 1.512e-05 [rewriter_before_opt_a]: 3.937e-05 [opt_a]: 0.00190237, [2] [Cycle 1]: 0.00128225, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.598e-05 [loop_unroll]: 1.421e-05 [a_1]: 0.00029847 [with_stream_mark]: 1.364e-05 [recompute_prepare]: 7.55998e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.99001e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.722e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.49001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 7.93999e-06 [auto_parallel]: 5.61e-06 [parallel]: 2.113e-05 [flash_sp]: 7.73001e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.71999e-06 [matmul_add_comm_reduction]: 9.36e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 5.83002e-06 [get_grad_eliminate_]: 6.20002e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.173e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.95002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.051e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 0.00034938 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 2.711e-05 [a_3]: 4.184e-05 [Cycle 2]: 0.00061059, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.00012894 [with_stream_mark]: 1.017e-05 [recompute_prepare]: 6.05002e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.40997e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.922e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.69999e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.51e-06 [parallel]: 4.56002e-06 [flash_sp]: 3.34001e-06 [merge_comm]: 3.46999e-06 [allreduce_fusion]: 2.87002e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.39999e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.77002e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.81e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.99002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 1.84e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.06997e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.59e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 6.02999e-06 [cse]: 1.335e-05 [a_3]: 3.434e-05 [py_interpret_to_execute_after_opt_a]: 7.92e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.208e-05 [convert_after_rewriter]: 7.36999e-06 [order_py_execute_after_rewriter]: 5.67001e-06 [mutable_eliminate]: 0.00045263 [opt_b]: 0.00018639, [1] [Cycle 1]: 0.00018046, [7] 
[b_1]: 0.00011149 [b_2]: 7.44002e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 3.69997e-07 [cse]: 1.658e-05 [optimize_parallel_all_gather_comm]: 1.617e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.203e-05 [loop_unroll]: 0.00043849 [opt_after_cconv]: 9.703e-05, [1] [Cycle 1]: 9.112e-05, [7] [c_1]: 2.886e-05 [parameter_eliminate]: 2.63003e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.581e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.257e-05 [tuple_transform]: 7.18e-05, [1] [Cycle 1]: 6.688e-05, [4] [d_1]: 4.054e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.63e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.508e-05 [cse_after_recomputation]: 2.043e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.071e-05 [environ_conv]: 4.78001e-06 [swap_dp_allreduce_reducescatter]: 5.26998e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.60999e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 8.80013e-07 [remove_cast_before_assign_add]: 7.59988e-07 [full_micro_interleaved_order_control]: 2.03002e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.51998e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.77998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 4.00998e-06 [overlap_grad_flash_sp]: 1.664e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.027e-05, [1] [Cycle 1]: 6.611e-05, [6] [build]: 2.12001e-06 [elim_shapecalc]: 8.84003e-06 [elim_not_effective]: 1.212e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.11002e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 3.44001e-06 [opt_after_jit_grad]: 0.00045947 [validate]: 3.084e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00633983 [execute]: 6.34999e-06 Sums bootstrap : 0.000482s : 3.25% type_inference : 0.004398s : 29.68% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000033s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000427s : 2.88% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000349s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000076s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000453s : 3.05% optimize.opt_b.b_1 : 0.000111s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000438s : 2.96% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.01%
auto_monad_reorder : 0.000016s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000459s : 3.10%
validate : 0.000031s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006340s : 42.78%
execute : 0.000006s : 0.04%
Time group info:
------[substitution.] 0.000122 26
17.71% : 0.000022s : 4: substitution.arithmetic_simplify
1.58% : 0.000002s : 2: substitution.elim_not_effective
1.06% : 0.000001s : 2: substitution.fold_const_symbol
4.41% : 0.000005s : 4: substitution.graph_param_transform
65.97% : 0.000081s : 2: substitution.inline
2.52% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.51% : 0.000004s : 4: substitution.remove_not_recompute_node
3.23% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004359 2
91.74% : 0.003998s : 1: type_inference.infer
8.26% : 0.000360s : 1: type_inference.specialize
------[replace.] 0.000019 2
100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000079 2
100.00% : 0.000079s : 2: match.inline
------[predicate.]
0.000141 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.62% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.79% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.78% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.38% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.92% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 9: predicate.reduce_eliminate 2.05% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.69% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 1.24% : 0.000002s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 41.19% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.81% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026957 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.33% : 0.003053s : 1: add_attr 11.29% : 0.003043s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000520s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.66% : 0.000448s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.92% : 0.000788s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000094s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.07% : 0.001905s : 1: opt_a 0.37% : 0.000101s : 1: opt_after_cconv 1.74% : 0.000469s : 1: opt_after_jit_grad 0.70% : 0.000190s : 1: opt_b 13.96% : 0.003762s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000189s : 1: renormalize.infer 0.57% : 0.000154s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000073s : 1: symbol_engine_optimizer 23.56% : 0.006350s : 1: task_emit 0.28% : 0.000075s : 1: 
tuple_transform 16.37% : 0.004412s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0381619, [24] [bootstrap]: 0.00049838 [type_inference]: 0.010384 [event_method]: 4.319e-05 [auto_monad]: 0.0001201 [graph_reusing]: 8.28999e-06 [inline]: 1.97999e-06 [add_attr]: 0.00301939, [1] [add_attr_with_inline]: 0.0030109, [1] [Cycle 1]: 6.987e-05, [2] [tag_attr]: 3.372e-05 [meta_addattr_fg_expand]: 9.59999e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 4.831e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.90001e-06 [optimize]: 0.0135034, [53] [py_interpret_to_execute]: 3.715e-05 [rewriter_before_opt_a]: 0.00013106 [opt_a]: 0.0112148, [3] [Cycle 1]: 0.0071941, [45] [expand_dump_flag]: 3.58e-06 [switch_simplify]: 6.896e-05 [loop_unroll]: 5.732e-05 [a_1]: 0.00137655 [with_stream_mark]: 2.359e-05 [recompute_prepare]: 2.219e-05 [updatestate_depend_eliminate]: 9.44e-06 [updatestate_assign_eliminate]: 7.92998e-06 [updatestate_loads_eliminate]: 8.40001e-06 [parameter_eliminate]: 2.46998e-06 [a_2]: 0.00025045 [accelerated_algorithm]: 3.194e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 3.68e-06 [shard_inline]: 1.64e-05 [merge_send_recv]: 1.675e-05 [auto_parallel]: 1.15e-05 [parallel]: 1.777e-05 [flash_sp]: 1.193e-05 [merge_comm]: 1.069e-05 [allreduce_fusion]: 9.29998e-06 [matmul_add_comm_reduction]: 2.612e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.866e-05 [virtual_dataset]: 1.617e-05 [get_grad_eliminate_]: 1.564e-05 [virtual_output]: 1.554e-05 [merge_forward]: 1.007e-05 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 1.792e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.937e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.862e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89999e-06 [meta_fg_expand]: 0.00149292 [flash_sp_send_recv_attached]: 3.86001e-06 [receive_attached]: 2.58e-06 
[after_resolve]: 6.227e-05 [a_after_grad]: 8.431e-05 [renormalize]: 0.00249421 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 5.04998e-06 [auto_monad_eliminator]: 5.736e-05 [cse]: 0.00017442 [a_3]: 0.00037069 [Cycle 2]: 0.00305208, [45] [expand_dump_flag]: 1.74998e-06 [switch_simplify]: 4.829e-05 [loop_unroll]: 4.486e-05 [a_1]: 0.00157951 [with_stream_mark]: 1.262e-05 [recompute_prepare]: 1.141e-05 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 4.63001e-06 [updatestate_loads_eliminate]: 3.82002e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012879 [accelerated_algorithm]: 1.24e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 2.11e-06 [shard_inline]: 9.41998e-06 [merge_send_recv]: 6.93998e-06 [auto_parallel]: 7.48e-06 [parallel]: 5.37999e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 5.29e-06 [allreduce_fusion]: 4.74e-06 [matmul_add_comm_reduction]: 8.2e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.04e-05 [virtual_dataset]: 9.00001e-06 [get_grad_eliminate_]: 9.12999e-06 [virtual_output]: 8.39998e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.64e-05 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 1.473e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40999e-06 [meta_fg_expand]: 3.601e-05 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.574e-05 [a_after_grad]: 1.471e-05 [renormalize]: 0.00060745 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.21002e-06 [auto_monad_eliminator]: 1.489e-05 [cse]: 4.951e-05 [a_3]: 6.763e-05 [Cycle 3]: 0.00095429, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.118e-05 [loop_unroll]: 9.05001e-06 [a_1]: 0.00025475 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 1.032e-05 [updatestate_depend_eliminate]: 5.04003e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 
4.23001e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012562 [accelerated_algorithm]: 1.214e-05 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.89999e-06 [shard_inline]: 9.27999e-06 [merge_send_recv]: 7.06999e-06 [auto_parallel]: 7.5e-06 [parallel]: 4.58999e-06 [flash_sp]: 1.22e-06 [merge_comm]: 5.35001e-06 [allreduce_fusion]: 5.27001e-06 [matmul_add_comm_reduction]: 7.88999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 1.07e-05 [virtual_dataset]: 8.99998e-06 [get_grad_eliminate_]: 8.80001e-06 [virtual_output]: 8.89998e-06 [merge_forward]: 4.48999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.632e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.424e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44e-06 [meta_fg_expand]: 3.40998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.18001e-06 [after_resolve]: 1.455e-05 [a_after_grad]: 4.013e-05 [renormalize]: 1.50001e-07 [add_forward_monad_depend]: 1.66002e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 1.196e-05 [cse]: 2.74e-05 [a_3]: 6.091e-05 [py_interpret_to_execute_after_opt_a]: 1.106e-05 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 4.816e-05 [convert_after_rewriter]: 9.46e-06 [order_py_execute_after_rewriter]: 7.36999e-06 [mutable_eliminate]: 0.00046191 [opt_b]: 0.00029418, [1] [Cycle 1]: 0.00028802, [7] [b_1]: 0.0001935 [b_2]: 1.14e-05 [updatestate_depend_eliminate]: 7.63001e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 3.98001e-06 [renormalize]: 3.7998e-07 [cse]: 3.167e-05 [optimize_parallel_all_gather_comm]: 2.149e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 1.933e-05 [loop_unroll]: 0.00043123 [opt_after_cconv]: 0.00013845, [1] [Cycle 1]: 0.00013254, [7] [c_1]: 4.917e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 7.62998e-06 [updatestate_assign_eliminate]: 4.35999e-06 
[updatestate_loads_eliminate]: 4.11001e-06 [cse]: 3.027e-05 [renormalize]: 1.80007e-07 [remove_dup_value]: 3.062e-05 [tuple_transform]: 0.00010427, [1] [Cycle 1]: 9.963e-05, [4] [d_1]: 6.847e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.034e-05 [partial_unused_args_eliminate]: 1.52999e-06 [add_recomputation]: 5.916e-05 [cse_after_recomputation]: 3.272e-05, [1] [Cycle 1]: 2.79e-05, [1] [cse]: 2.215e-05 [environ_conv]: 8.82999e-06 [swap_dp_allreduce_reducescatter]: 8.23001e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 2.48998e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.726e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 5.00001e-06 [overlap_recompute_and_grad_model_parallel]: 5.57999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 5.52001e-06 [overlap_grad_flash_sp]: 2.444e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 1.98002e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 0.00010299, [1] [Cycle 1]: 9.881e-05, [6] [build]: 8.73001e-06 [elim_shapecalc]: 1.504e-05 [elim_not_effective]: 2e-05 [opt_reshape]: 1.06e-05 [fold_const_symbol]: 1.575e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.98002e-06 [pipeline_parallel_scheduler]: 1.32999e-06 [auto_monad_reorder]: 2.563e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00047614 [validate]: 4.524e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00975139 [execute]: 7.3e-06 Sums bootstrap : 0.000498s : 1.47% type_inference : 0.010384s : 30.67% event_method : 0.000043s : 0.13% auto_monad : 0.000120s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000131s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.38% optimize.opt_a.loop_unroll : 0.000111s : 0.33% optimize.opt_a.a_1 : 0.003211s : 9.48% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000044s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000505s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000021s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000058s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001532s : 4.53% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000093s : 0.27% optimize.opt_a.a_after_grad : 0.000139s : 0.41% optimize.opt_a.renormalize : 0.003102s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.25% optimize.opt_a.cse : 0.000251s : 0.74% optimize.opt_a.a_3 : 0.000499s : 1.47% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000462s : 1.36% optimize.opt_b.b_1 : 0.000194s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000431s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000476s : 1.41% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.009751s : 28.80% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000761 218 5.60% : 0.000043s : 11: substitution.arithmetic_simplify 1.92% : 0.000015s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000007s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 55.60% : 0.000423s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.05% : 0.000016s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.29% : 0.000025s : 10: substitution.replace_applicator 1.52% : 0.000012s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.74% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.13% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010313 2 87.25% : 0.008998s : 1: type_inference.infer 12.75% : 0.001315s : 1: type_inference.specialize ------[replace.] 0.000216 30 58.84% : 0.000127s : 16: replace.inline 41.16% : 0.000089s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000444 30 93.25% : 0.000414s : 16: match.inline 6.75% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5663 1.05% : 0.000008s : 67: predicate.accumulaten_eliminater 0.33% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.19% : 0.000017s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.65% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000043s : 244: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.59% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000010s : 67: predicate.reduce_eliminate 2.62% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.82% : 0.000037s : 265: 
predicate.switch_simplify 1.04% : 0.000008s : 67: predicate.tile_eliminate 1.05% : 0.000008s : 67: predicate.transpose_eliminate 1.57% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.19% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001577 32 58.19% : 0.000918s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.81% : 0.000659s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063081 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.79% : 0.003024s : 1: add_attr 4.78% : 0.003015s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000127s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000533s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000050s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.86% : 0.004958s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000179s : 28: opt.transform.opt_b 0.12% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000058s : 4: opt.transform.symbol_engine_opt 17.78% : 0.011218s : 1: opt_a 0.22% : 0.000142s : 1: opt_after_cconv 0.77% : 0.000486s : 1: opt_after_jit_grad 0.47% : 0.000298s : 1: opt_b 21.41% : 0.013507s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000035s : 1: remove_dup_value 2.59% : 0.001631s : 2: renormalize.infer 2.31% : 0.001458s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000136s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000106s : 1: symbol_engine_optimizer 15.47% : 0.009762s : 1: task_emit 0.17% : 0.000107s : 1: 
tuple_transform 16.49% : 0.010399s : 1: type_inference 0.12% : 0.000079s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x0-kbk],max_mem:42.0M TotalTime = 0.0796571, [24] [bootstrap]: 0.00065918 [type_inference]: 0.00645131 [event_method]: 1.442e-05 [auto_monad]: 5.495e-05 [graph_reusing]: 5.88998e-06 [inline]: 1.77001e-06 [add_attr]: 0.00345572, [1] [add_attr_with_inline]: 0.00344501, [1] [Cycle 1]: 4.643e-05, [2] [tag_attr]: 1.605e-05 [meta_addattr_fg_expand]: 4.52e-06 [parallel-infer-symbol]: 2.71999e-06 [pre_auto_parallel]: 2.792e-05 [insert-virtual-dataset]: 2.43002e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00406128, [53] [py_interpret_to_execute]: 2.041e-05 [rewriter_before_opt_a]: 5.927e-05 [opt_a]: 0.00218172, [2] [Cycle 1]: 0.00156689, [45] [expand_dump_flag]: 2.91999e-06 [switch_simplify]: 3.304e-05 [loop_unroll]: 2.235e-05 [a_1]: 0.0004679 [with_stream_mark]: 1.405e-05 [recompute_prepare]: 7.98999e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.29001e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.834e-05 [accelerated_algorithm]: 6.52001e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 8.20999e-06 [auto_parallel]: 7.16001e-06 [parallel]: 2.125e-05 [flash_sp]: 7.72002e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 8.99e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 7.92e-06 [virtual_dataset]: 6.50002e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.19003e-06 [offload_activation]: 9.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 
[merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.89001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.44999e-06 [receive_attached]: 2.44999e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 9.30001e-06 [renormalize]: 0.00044101 [add_forward_monad_depend]: 4.69998e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.36e-05 [cse]: 2.686e-05 [a_3]: 4.076e-05 [Cycle 2]: 0.00060524, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012915 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.882e-05 [accelerated_algorithm]: 5.48002e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.37999e-06 [parallel]: 4.62e-06 [flash_sp]: 3.03998e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.62002e-06 [virtual_dataset]: 5.48997e-06 [get_grad_eliminate_]: 5.70001e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.21997e-06 [offload_activation]: 6.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.004e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.42001e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.19999e-06 [cse]: 1.306e-05 [a_3]: 3.321e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 
[slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.232e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 5.05001e-06 [mutable_eliminate]: 0.00045245 [opt_b]: 0.00019568, [1] [Cycle 1]: 0.00018959, [7] [b_1]: 0.00011976 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 2.89991e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.59e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.157e-05 [loop_unroll]: 0.00042376 [opt_after_cconv]: 9.806e-05, [1] [Cycle 1]: 9.203e-05, [7] [c_1]: 2.874e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.35002e-06 [cse]: 1.681e-05 [renormalize]: 4.59986e-07 [remove_dup_value]: 1.217e-05 [tuple_transform]: 7.08e-05, [1] [Cycle 1]: 6.626e-05, [4] [d_1]: 3.986e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.78e-06 [partial_unused_args_eliminate]: 1.99e-06 [add_recomputation]: 5.128e-05 [cse_after_recomputation]: 2.06e-05, [1] [Cycle 1]: 1.612e-05, [1] [cse]: 1.095e-05 [environ_conv]: 4.55999e-06 [swap_dp_allreduce_reducescatter]: 5.46e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.60999e-06 [control_data_broadcast_order]: 1.149e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.38001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.97e-05, [1] [Cycle 1]: 6.562e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 8.74e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 7.03e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.61998e-06 [pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 1.594e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00045625 [validate]: 3.16e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0641892 [execute]: 8.47e-06 Sums bootstrap : 0.000659s : 0.88% type_inference : 0.006451s : 8.58% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000028s : 0.04% optimize.opt_a.a_1 : 0.000597s : 0.79% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000441s : 0.59% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.60% optimize.opt_b.b_1 : 0.000120s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000424s : 0.56% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000456s : 0.61% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064189s : 85.34% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000169 30 14.16% : 0.000024s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 67.94% : 0.000115s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.46% : 0.000004s : 4: substitution.remove_not_recompute_node 2.48% : 0.000004s : 4: substitution.replace_old_param 6.38% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006401 2 90.72% : 0.005806s : 1: type_inference.infer 9.28% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000041 5 69.30% : 0.000028s : 3: replace.inline 30.70% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.04% : 0.000113s : 3: match.inline 7.96% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.50% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.93% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 11: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.68% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.80% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000370 8 45.65% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.35% : 0.000201s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088750 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.90% : 0.003460s : 1: add_attr 3.89% : 0.003449s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.79% : 0.000698s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.10% : 0.000976s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000094s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.46% : 0.002185s : 1: opt_a 0.11% : 0.000101s : 1: opt_after_cconv 0.53% : 0.000466s : 1: opt_after_jit_grad 0.22% : 0.000199s : 1: opt_b 4.58% : 0.004065s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000221s : 1: renormalize.infer 0.24% : 0.000214s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 72.34% : 0.064206s : 1: task_emit 0.08% : 0.000074s : 1: 
tuple_transform 7.29% : 0.006466s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.0719149, [24] [bootstrap]: 0.00047187 [type_inference]: 0.0044672 [event_method]: 1.137e-05 [auto_monad]: 5.031e-05 [graph_reusing]: 5.23002e-06 [inline]: 2.53003e-06 [add_attr]: 0.00303507, [1] [add_attr_with_inline]: 0.0030269, [1] [Cycle 1]: 4.498e-05, [2] [tag_attr]: 1.202e-05 [meta_addattr_fg_expand]: 3.31001e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.141e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00378326, [53] [py_interpret_to_execute]: 1.489e-05 [rewriter_before_opt_a]: 3.884e-05 [opt_a]: 0.00190446, [2] [Cycle 1]: 0.00128074, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.483e-05 [loop_unroll]: 1.436e-05 [a_1]: 0.00029759 [with_stream_mark]: 1.315e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.68e-05 [accelerated_algorithm]: 6.78998e-06 [shard]: 2.51e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.77e-06 [auto_parallel]: 6.34999e-06 [parallel]: 1.757e-05 [flash_sp]: 7.09001e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.42002e-06 [matmul_add_comm_reduction]: 9.49e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 5.97999e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.14998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 [merge_recompute_call_nodes]: 1.92999e-06 [before_grad]: 9.51e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47002e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 2.38998e-06 
[after_resolve]: 1.083e-05 [a_after_grad]: 9.47999e-06 [renormalize]: 0.00035645 [add_forward_monad_depend]: 4.52998e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.37e-05 [cse]: 2.664e-05 [a_3]: 4.118e-05 [Cycle 2]: 0.00061414, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.55001e-06 [a_1]: 0.0001285 [with_stream_mark]: 1.125e-05 [recompute_prepare]: 5.87999e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 7.08e-05 [accelerated_algorithm]: 6.04999e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.26002e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 4.64998e-06 [auto_parallel]: 5.86e-06 [parallel]: 4.42e-06 [flash_sp]: 3.10998e-06 [merge_comm]: 3.2e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 5.35001e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.66e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.61999e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 6.28002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.67001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 1.83002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.33001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.322e-05 [a_3]: 3.364e-05 [py_interpret_to_execute_after_opt_a]: 7.78001e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.104e-05 [convert_after_rewriter]: 6.91999e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00045427 [opt_b]: 0.0001863, [1] [Cycle 1]: 0.00017998, 
[7] [b_1]: 0.00010984 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 6.69999e-07 [cse]: 1.666e-05 [optimize_parallel_all_gather_comm]: 1.613e-05 [overlap_param_gather]: 1.67001e-06 [cconv]: 2.267e-05 [loop_unroll]: 0.00046981 [opt_after_cconv]: 9.72e-05, [1] [Cycle 1]: 9.136e-05, [7] [c_1]: 2.832e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.666e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.186e-05 [tuple_transform]: 7.017e-05, [1] [Cycle 1]: 6.584e-05, [4] [d_1]: 3.96e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.33002e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.22e-05 [cse_after_recomputation]: 2.055e-05, [1] [Cycle 1]: 1.622e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.92999e-06 [swap_dp_allreduce_reducescatter]: 4.99998e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.43998e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.96001e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.01998e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.689e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.01e-05, [1] [Cycle 1]: 6.601e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.56002e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 9.29e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.555e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.0004576 [validate]: 3.136e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0593342 [execute]: 8.05999e-06 Sums bootstrap : 0.000472s : 0.69% type_inference : 0.004467s : 6.58% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000426s : 0.63% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000357s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.67% optimize.opt_b.b_1 : 0.000110s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000470s : 0.69% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000458s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059334s : 87.38% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000123 26 19.33% : 0.000024s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.19% : 0.000001s : 2: substitution.fold_const_symbol 4.32% : 0.000005s : 4: substitution.graph_param_transform 64.71% : 0.000079s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.44% : 0.000004s : 4: substitution.remove_not_recompute_node 3.23% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004424 2 91.41% : 0.004044s : 1: type_inference.infer 8.59% : 0.000380s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000142 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.14% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.56% : 0.000004s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.38% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.38% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000009s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.49% : 0.000004s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.62% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.13% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000274 6 42.96% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.04% : 0.000156s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080036 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.80% : 0.003039s : 1: add_attr 3.79% : 0.003030s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.63% : 0.000507s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.60% : 0.000479s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000787s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.38% : 0.001907s : 1: opt_a 0.13% : 0.000101s : 1: opt_after_cconv 0.58% : 0.000467s : 1: opt_after_jit_grad 0.24% : 0.000190s : 1: opt_b 4.73% : 0.003787s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000193s : 1: renormalize.infer 0.20% : 0.000157s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 74.16% : 0.059352s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.60% : 0.004481s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.0729129, [24] [bootstrap]: 0.00046327 [type_inference]: 0.0055799 [event_method]: 1.457e-05 [auto_monad]: 5.364e-05 [graph_reusing]: 5.91e-06 [inline]: 1.98002e-06 [add_attr]: 0.00300659, [1] [add_attr_with_inline]: 0.00299862, [1] [Cycle 1]: 4.538e-05, [2] [tag_attr]: 1.588e-05 [meta_addattr_fg_expand]: 4.53999e-06 [parallel-infer-symbol]: 2.80997e-06 [pre_auto_parallel]: 2.478e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.47999e-06 [optimize]: 0.0041413, [53] [py_interpret_to_execute]: 2.436e-05 [rewriter_before_opt_a]: 5.907e-05 [opt_a]: 0.00225865, [2] [Cycle 1]: 0.00157489, [45] [expand_dump_flag]: 2.60997e-06 [switch_simplify]: 3.288e-05 [loop_unroll]: 2.11e-05 [a_1]: 0.00047179 [with_stream_mark]: 1.424e-05 [recompute_prepare]: 8.15e-06 [updatestate_depend_eliminate]: 3.72002e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.743e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 2.51998e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 8.08001e-06 [auto_parallel]: 5.99999e-06 [parallel]: 2.647e-05 [flash_sp]: 7.09001e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.73001e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 6.21e-06 [get_grad_eliminate_]: 5.51002e-06 [virtual_output]: 5.66998e-06 [merge_forward]: 3.67998e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.19999e-06 [receive_attached]: 
2.17999e-06 [after_resolve]: 1.261e-05 [a_after_grad]: 9.21998e-06 [renormalize]: 0.00043851 [add_forward_monad_depend]: 4.47e-06 [auto_monad_grad]: 2.04e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 2.981e-05 [a_3]: 4.246e-05 [Cycle 2]: 0.00067428, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.54998e-06 [a_1]: 0.00012805 [with_stream_mark]: 1.047e-05 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 2.98e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.999e-05 [accelerated_algorithm]: 5.99e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 6.11e-06 [parallel]: 4.02e-06 [flash_sp]: 3.39001e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 2.96999e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.86001e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.71e-06 [merge_forward]: 2.72001e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 6.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.042e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.95998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.82001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.62998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.80998e-06 [cse]: 1.465e-05 [a_3]: 3.281e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.037e-05 [convert_after_rewriter]: 7.05e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00045959 [opt_b]: 0.00018787, [1] [Cycle 1]: 0.00018186, [7] 
[b_1]: 0.00011155 [b_2]: 7.5e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.69998e-07 [cse]: 1.699e-05 [optimize_parallel_all_gather_comm]: 1.522e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.244e-05 [loop_unroll]: 0.00042295 [opt_after_cconv]: 9.678e-05, [1] [Cycle 1]: 9.094e-05, [7] [c_1]: 2.859e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.7e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.413e-05 [tuple_transform]: 7.094e-05, [1] [Cycle 1]: 6.652e-05, [4] [d_1]: 3.986e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.69999e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 4.365e-05 [cse_after_recomputation]: 2.192e-05, [1] [Cycle 1]: 1.761e-05, [1] [cse]: 1.203e-05 [environ_conv]: 4.91002e-06 [swap_dp_allreduce_reducescatter]: 5.67001e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.69002e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.51002e-06 [slice_recompute_activation]: 2.66e-06 [micro_interleaved_order_control]: 2.83e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 2.43e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 4.12e-06 [overlap_grad_flash_sp]: 1.758e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 7.05e-05, [1] [Cycle 1]: 6.623e-05, [6] [build]: 2.84001e-06 [elim_shapecalc]: 8.89e-06 [elim_not_effective]: 1.199e-05 [opt_reshape]: 6.25002e-06 [fold_const_symbol]: 9.09998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.543e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00046019 [validate]: 3.111e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0588868 [execute]: 8.70001e-06 Sums bootstrap : 0.000463s : 0.67% type_inference : 0.005580s : 8.10% event_method : 0.000015s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.04% optimize.rewriter_before_opt_a : 0.000059s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000600s : 0.87% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000439s : 0.64% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000044s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.67% optimize.opt_b.b_1 : 0.000112s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000423s : 0.61% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000460s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058887s : 85.52% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000175 30 13.64% : 0.000024s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.84% : 0.000001s : 2: substitution.fold_const_symbol 3.33% : 0.000006s : 4: substitution.graph_param_transform 68.47% : 0.000120s : 3: substitution.inline 1.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000005s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 4: substitution.replace_old_param 6.17% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005539 2 89.57% : 0.004961s : 1: type_inference.infer 10.43% : 0.000577s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.50% : 0.000028s : 3: replace.inline 30.50% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000128 5 92.39% : 0.000118s : 3: match.inline 7.61% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000010s : 51: predicate.inline 0.93% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.83% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.77% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000361 8 44.87% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.13% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081633 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.69% : 0.003011s : 1: add_attr 3.68% : 0.003002s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000499s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.53% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.57% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.20% : 0.000977s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000093s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.77% : 0.002262s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.58% : 0.000470s : 1: opt_after_jit_grad 0.23% : 0.000191s : 1: opt_b 5.08% : 0.004145s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.04% : 0.000029s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.26% : 0.000215s : 1: renormalize.infer 0.26% : 0.000216s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 72.16% : 0.058903s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 6.85% : 0.005593s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.118004, [24] [bootstrap]: 0.00049247 [type_inference]: 0.0118718 [event_method]: 5.328e-05 [auto_monad]: 0.00012876 [graph_reusing]: 8.32e-06 [inline]: 2.29001e-06 [add_attr]: 0.0031724, [1] [add_attr_with_inline]: 0.00316372, [1] [Cycle 1]: 7.201e-05, [2] [tag_attr]: 3.387e-05 [meta_addattr_fg_expand]: 1.038e-05 [parallel-infer-symbol]: 2.68998e-06 [pre_auto_parallel]: 5.159e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.0139751, [53] [py_interpret_to_execute]: 3.89e-05 [rewriter_before_opt_a]: 0.00015022 [opt_a]: 0.01164, [3] [Cycle 1]: 0.00750489, [45] [expand_dump_flag]: 3.98001e-06 [switch_simplify]: 7.732e-05 [loop_unroll]: 6.437e-05 [a_1]: 0.00148743 [with_stream_mark]: 2.356e-05 [recompute_prepare]: 2.23e-05 [updatestate_depend_eliminate]: 9.66003e-06 [updatestate_assign_eliminate]: 7.65e-06 [updatestate_loads_eliminate]: 7.43999e-06 [parameter_eliminate]: 2.39001e-06 [a_2]: 0.00024801 [accelerated_algorithm]: 3.119e-05 [shard]: 2.03002e-06 [meta_shard_fg_expand]: 3.75e-06 [shard_inline]: 1.705e-05 [merge_send_recv]: 1.576e-05 [auto_parallel]: 1.082e-05 [parallel]: 1.874e-05 [flash_sp]: 1.137e-05 [merge_comm]: 1.013e-05 [allreduce_fusion]: 9.12001e-06 [matmul_add_comm_reduction]: 2.625e-05 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 1.814e-05 [virtual_dataset]: 1.636e-05 [get_grad_eliminate_]: 1.553e-05 [virtual_output]: 1.556e-05 [merge_forward]: 9.56e-06 [cell_reuse_recompute_pass]: 1.09003e-06 [offload_activation]: 1.864e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.9e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 2.801e-05 [set_forward_comm_id_for_comm_node_pass]: 1.005e-05 [meta_fg_expand]: 0.0015448 [flash_sp_send_recv_attached]: 3.86999e-06 [receive_attached]: 2.60002e-06 [after_resolve]: 
6.179e-05 [a_after_grad]: 8.408e-05 [renormalize]: 0.00261468 [add_forward_monad_depend]: 9.39e-06 [auto_monad_grad]: 5.55001e-06 [auto_monad_eliminator]: 5.892e-05 [cse]: 0.00018041 [a_3]: 0.0003453 [Cycle 2]: 0.00314463, [45] [expand_dump_flag]: 1.60001e-06 [switch_simplify]: 4.811e-05 [loop_unroll]: 4.468e-05 [a_1]: 0.00158896 [with_stream_mark]: 1.264e-05 [recompute_prepare]: 1.115e-05 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 4.55001e-06 [updatestate_loads_eliminate]: 3.84002e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012861 [accelerated_algorithm]: 1.243e-05 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 2.11e-06 [shard_inline]: 9.64e-06 [merge_send_recv]: 6.74999e-06 [auto_parallel]: 7.45e-06 [parallel]: 5.44e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 5.28002e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.87e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.067e-05 [virtual_dataset]: 9.15999e-06 [get_grad_eliminate_]: 9.14998e-06 [virtual_output]: 8.46002e-06 [merge_forward]: 5.35001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.015e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.69e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.426e-05 [set_forward_comm_id_for_comm_node_pass]: 5.56e-06 [meta_fg_expand]: 7.367e-05 [flash_sp_send_recv_attached]: 1.01002e-06 [receive_attached]: 1.09003e-06 [after_resolve]: 1.66e-05 [a_after_grad]: 1.477e-05 [renormalize]: 0.00064024 [add_forward_monad_depend]: 4.75001e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.565e-05 [cse]: 5.267e-05 [a_3]: 6.769e-05 [Cycle 3]: 0.00097608, [45] [expand_dump_flag]: 1.15001e-06 [switch_simplify]: 1.101e-05 [loop_unroll]: 9.31e-06 [a_1]: 0.00028921 [with_stream_mark]: 1.115e-05 [recompute_prepare]: 9.82999e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.24997e-06 [updatestate_loads_eliminate]: 4.33001e-06 [parameter_eliminate]: 
9.39996e-07 [a_2]: 0.00012774 [accelerated_algorithm]: 1.223e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.83002e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 7.60998e-06 [auto_parallel]: 7.38999e-06 [parallel]: 4.89e-06 [flash_sp]: 1.08001e-06 [merge_comm]: 5.27999e-06 [allreduce_fusion]: 5.21002e-06 [matmul_add_comm_reduction]: 8.03001e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 1.131e-05 [virtual_dataset]: 9.41e-06 [get_grad_eliminate_]: 9.00999e-06 [virtual_output]: 8.69e-06 [merge_forward]: 4.39002e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 9.20001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.666e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.463e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.41999e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.39998e-06 [after_resolve]: 1.591e-05 [a_after_grad]: 1.539e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.41002e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.138e-05 [cse]: 2.937e-05 [a_3]: 6.251e-05 [py_interpret_to_execute_after_opt_a]: 1.115e-05 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 4.817e-05 [convert_after_rewriter]: 9.29e-06 [order_py_execute_after_rewriter]: 7.2e-06 [mutable_eliminate]: 0.00046687 [opt_b]: 0.00030021, [1] [Cycle 1]: 0.00029408, [7] [b_1]: 0.00019581 [b_2]: 1.134e-05 [updatestate_depend_eliminate]: 7.66001e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 4.10998e-06 [renormalize]: 4.39992e-07 [cse]: 3.445e-05 [optimize_parallel_all_gather_comm]: 2.169e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.015e-05 [loop_unroll]: 0.00043626 [opt_after_cconv]: 0.00014242, [1] [Cycle 1]: 0.00013617, [7] [c_1]: 5.043e-05 [parameter_eliminate]: 2.48e-06 [updatestate_depend_eliminate]: 7.59002e-06 [updatestate_assign_eliminate]: 4.70001e-06 [updatestate_loads_eliminate]: 4.14997e-06 [cse]: 
3.202e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.945e-05 [tuple_transform]: 0.00010615, [1] [Cycle 1]: 0.00010105, [4] [d_1]: 6.99e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.019e-05 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 5.843e-05 [cse_after_recomputation]: 3.447e-05, [1] [Cycle 1]: 2.958e-05, [1] [cse]: 2.401e-05 [environ_conv]: 9.49e-06 [swap_dp_allreduce_reducescatter]: 8.40999e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.05002e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.03002e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.30999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.785e-05 [grouped_pairwise_exchange_alltoall]: 1.43002e-06 [offloading_packed_experts]: 5.76e-06 [overlap_recompute_and_grad_model_parallel]: 5.63002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.64999e-06 [overlap_grad_ring_attention]: 5.19e-06 [overlap_grad_flash_sp]: 2.448e-05 [begin_end_overlap_inline]: 6.79982e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 0.00010347, [1] [Cycle 1]: 9.91e-05, [6] [build]: 9.59e-06 [elim_shapecalc]: 1.407e-05 [elim_not_effective]: 1.955e-05 [opt_reshape]: 1.101e-05 [fold_const_symbol]: 1.55e-05 [renormalize]: 2.29978e-07 [detach_backward]: 1.76e-06 
[pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 2.667e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00048531 [validate]: 4.634e-05 [backend_pass]: 1.17999e-06 [task_emit]: 0.0874472 [execute]: 8.21002e-06 Sums bootstrap : 0.000492s : 0.43% type_inference : 0.011872s : 10.46% event_method : 0.000053s : 0.05% auto_monad : 0.000129s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000052s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000150s : 0.13% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000136s : 0.12% optimize.opt_a.loop_unroll : 0.000118s : 0.10% optimize.opt_a.a_1 : 0.003366s : 2.97% optimize.opt_a.with_stream_mark : 0.000047s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000504s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000036s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.04% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001622s : 1.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000094s : 0.08% optimize.opt_a.a_after_grad : 0.000114s : 0.10% optimize.opt_a.renormalize : 0.003255s : 2.87% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.08% optimize.opt_a.cse : 0.000262s : 0.23% optimize.opt_a.a_3 : 0.000476s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.41% optimize.opt_b.b_1 : 0.000196s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000436s : 0.38% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000070s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000485s : 0.43% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.087447s : 77.05% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000799 222 5.90% : 0.000047s : 12: substitution.arithmetic_simplify 1.75% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.49% : 0.000004s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.70% : 0.000453s : 17: substitution.inline 1.99% : 0.000016s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.86% : 0.000015s : 3: substitution.less_batch_normalization 1.69% : 0.000014s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000014s : 20: substitution.remove_not_recompute_node 3.12% : 0.000025s : 10: substitution.replace_applicator 1.37% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.50% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.45% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011791 2 85.52% : 0.010084s : 1: type_inference.infer 14.48% : 0.001707s : 1: type_inference.specialize ------[replace.] 0.000234 33 56.88% : 0.000133s : 17: replace.inline 43.12% : 0.000101s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000479 33 92.67% : 0.000444s : 17: match.inline 7.33% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000774 5764 1.05% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.19% : 0.000017s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.71% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000043s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.60% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000009s : 68: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000016s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.03% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 68: predicate.reduce_eliminate 2.61% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 152: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.92% : 0.000023s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000039s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.53% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.55% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.56% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.20% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.17% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001757 34 55.95% : 0.000983s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.05% : 0.000774s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.143824 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.21% : 0.003177s : 1: add_attr 2.20% : 0.003168s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000136s : 1: auto_monad 0.02% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.37% : 0.000527s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000061s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000445s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.53% : 0.005081s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000181s : 28: opt.transform.opt_b 0.05% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000057s : 4: opt.transform.symbol_engine_opt 8.10% : 0.011643s : 1: opt_a 0.10% : 0.000146s : 1: opt_after_cconv 0.34% : 0.000495s : 1: opt_after_jit_grad 0.21% : 0.000304s : 1: opt_b 9.72% : 0.013979s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000056s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.18% : 0.001695s : 2: renormalize.infer 1.07% : 0.001546s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.11% : 0.000155s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000106s : 1: symbol_engine_optimizer 60.81% : 0.087465s : 1: task_emit 0.08% : 0.000109s : 1: 
tuple_transform 8.27% : 0.011887s : 1: type_inference 0.05% : 0.000071s : 1: validate TotalTime = 0.0720451, [24] [bootstrap]: 0.00046373 [type_inference]: 0.0044105 [event_method]: 1.133e-05 [auto_monad]: 5.167e-05 [graph_reusing]: 6.07999e-06 [inline]: 1.74e-06 [add_attr]: 0.00307553, [1] [add_attr_with_inline]: 0.00306726, [1] [Cycle 1]: 4.565e-05, [2] [tag_attr]: 1.228e-05 [meta_addattr_fg_expand]: 3.35998e-06 [parallel-infer-symbol]: 2.68998e-06 [pre_auto_parallel]: 2.196e-05 [insert-virtual-dataset]: 2.70997e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00379272, [53] [py_interpret_to_execute]: 1.48e-05 [rewriter_before_opt_a]: 3.831e-05 [opt_a]: 0.00194311, [2] [Cycle 1]: 0.00132281, [45] [expand_dump_flag]: 3.03998e-06 [switch_simplify]: 2.554e-05 [loop_unroll]: 1.424e-05 [a_1]: 0.00029939 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.91001e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.768e-05 [accelerated_algorithm]: 6.56999e-06 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 8.38001e-06 [auto_parallel]: 6.18998e-06 [parallel]: 1.764e-05 [flash_sp]: 7.11999e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 8.90999e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.53e-06 [virtual_dataset]: 6.23e-06 [get_grad_eliminate_]: 5.90002e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 3.70998e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 9.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.175e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 1.021e-05 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.83e-06 [receive_attached]: 
2.34001e-06 [after_resolve]: 1.116e-05 [a_after_grad]: 9.07001e-06 [renormalize]: 0.00038432 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 2.06e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.904e-05 [a_3]: 4.211e-05 [Cycle 2]: 0.00061099, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.10998e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00012831 [with_stream_mark]: 1.184e-05 [recompute_prepare]: 5.96e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.949e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.39e-06 [meta_shard_fg_expand]: 1.23002e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.31002e-06 [auto_parallel]: 5.19998e-06 [parallel]: 4.32e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 4.98001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.72002e-06 [virtual_dataset]: 5.46002e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.83e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.67001e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.09002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 1.24e-06 [receive_attached]: 1.04998e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.70002e-06 [cse]: 1.357e-05 [a_3]: 3.329e-05 [py_interpret_to_execute_after_opt_a]: 7.79002e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.057e-05 [convert_after_rewriter]: 7.23e-06 [order_py_execute_after_rewriter]: 5.19998e-06 [mutable_eliminate]: 0.00045841 [opt_b]: 0.00018729, [1] [Cycle 1]: 0.00018076, [7] [b_1]: 
0.00011157 [b_2]: 7.5e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 2.50002e-07 [cse]: 1.643e-05 [optimize_parallel_all_gather_comm]: 1.645e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.236e-05 [loop_unroll]: 0.00042438 [opt_after_cconv]: 9.76e-05, [1] [Cycle 1]: 9.174e-05, [7] [c_1]: 2.888e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.744e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.307e-05 [tuple_transform]: 7.095e-05, [1] [Cycle 1]: 6.649e-05, [4] [d_1]: 4.027e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.44999e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.51e-05 [cse_after_recomputation]: 2.132e-05, [1] [Cycle 1]: 1.681e-05, [1] [cse]: 1.15e-05 [environ_conv]: 5.05001e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.63998e-06 [label_micro_interleaved_index]: 4.65999e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.58998e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.99977e-07 [remove_cast_before_assign_add]: 8.70001e-07 [full_micro_interleaved_order_control]: 2.13002e-06 [reorder_send_recv_between_fp_bp]: 2.41e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.204e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.81001e-06 [overlap_recompute_and_grad_model_parallel]: 4.65999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 1.713e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.40002e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 7.058e-05, [1] [Cycle 1]: 6.63e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.78001e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 6.11998e-06 [fold_const_symbol]: 9.15999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.589e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00046329 [validate]: 3.123e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.0594726 [execute]: 8.06001e-06 Sums bootstrap : 0.000464s : 0.68% type_inference : 0.004410s : 6.49% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000033s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000428s : 0.63% optimize.opt_a.with_stream_mark : 0.000026s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000384s : 0.57% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000043s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.67% optimize.opt_b.b_1 : 0.000112s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000424s : 0.62% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000463s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059473s : 87.47% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000124 26 17.71% : 0.000022s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.36% : 0.000005s : 4: substitution.graph_param_transform 65.33% : 0.000081s : 2: substitution.inline 2.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.85% : 0.000005s : 4: substitution.remove_not_recompute_node 3.50% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004368 2 91.57% : 0.004000s : 1: type_inference.infer 8.43% : 0.000368s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000142 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.29% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.76% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.96% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.98% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.68% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.74% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.76% : 0.000003s : 17: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.62% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.57% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.69% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.98% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 41.07% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.93% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080250 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.84% : 0.003080s : 1: add_attr 3.83% : 0.003071s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000498s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.54% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.99% : 0.000793s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000094s : 28: opt.transform.opt_b 0.06% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.42% : 0.001946s : 1: opt_a 0.13% : 0.000101s : 1: opt_after_cconv 0.59% : 0.000473s : 1: opt_after_jit_grad 0.24% : 0.000191s : 1: opt_b 4.73% : 0.003797s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.27% : 0.000215s : 1: renormalize.infer 0.20% : 0.000163s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 74.13% : 0.059489s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 5.51% : 0.004424s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.112348, [24] [bootstrap]: 0.00050308 [type_inference]: 0.0106894 [event_method]: 4.744e-05 [auto_monad]: 0.00012105 [graph_reusing]: 8.79998e-06 [inline]: 2.06e-06 [add_attr]: 0.00304465, [1] [add_attr_with_inline]: 0.00303609, [1] [Cycle 1]: 6.812e-05, [2] [tag_attr]: 3.224e-05 [meta_addattr_fg_expand]: 9.71e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 4.718e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.51998e-06 [optimize]: 0.0138139, [53] [py_interpret_to_execute]: 3.598e-05 [rewriter_before_opt_a]: 0.00016013 [opt_a]: 0.01143, [3] [Cycle 1]: 0.00735888, [45] [expand_dump_flag]: 3.93999e-06 [switch_simplify]: 6.827e-05 [loop_unroll]: 5.723e-05 [a_1]: 0.0013835 [with_stream_mark]: 2.429e-05 [recompute_prepare]: 2.232e-05 [updatestate_depend_eliminate]: 9.29e-06 [updatestate_assign_eliminate]: 8.58001e-06 [updatestate_loads_eliminate]: 7.48e-06 [parameter_eliminate]: 2.53998e-06 [a_2]: 0.000248 [accelerated_algorithm]: 3.13e-05 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 3.55e-06 [shard_inline]: 1.662e-05 [merge_send_recv]: 1.671e-05 [auto_parallel]: 1.172e-05 [parallel]: 1.816e-05 [flash_sp]: 1.136e-05 [merge_comm]: 1.02e-05 [allreduce_fusion]: 9.16998e-06 [matmul_add_comm_reduction]: 2.7e-05 [allreduce_slice_to_reducescatter]: 7.89994e-07 [virtual_shard_identity]: 1.814e-05 [virtual_dataset]: 1.721e-05 [get_grad_eliminate_]: 1.546e-05 [virtual_output]: 1.548e-05 [merge_forward]: 9.57999e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 1.821e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.999e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.833e-05 [set_forward_comm_id_for_comm_node_pass]: 1.032e-05 [meta_fg_expand]: 0.00149006 [flash_sp_send_recv_attached]: 4.38999e-06 [receive_attached]: 2.54999e-06 [after_resolve]: 6.165e-05 
[a_after_grad]: 8.623e-05 [renormalize]: 0.00261281 [add_forward_monad_depend]: 9.22001e-06 [auto_monad_grad]: 6.04001e-06 [auto_monad_eliminator]: 5.811e-05 [cse]: 0.00018082 [a_3]: 0.00034464 [Cycle 2]: 0.00312269, [45] [expand_dump_flag]: 1.58002e-06 [switch_simplify]: 4.844e-05 [loop_unroll]: 4.477e-05 [a_1]: 0.00161491 [with_stream_mark]: 1.262e-05 [recompute_prepare]: 1.179e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.50001e-06 [updatestate_loads_eliminate]: 3.88001e-06 [parameter_eliminate]: 1.09003e-06 [a_2]: 0.00013041 [accelerated_algorithm]: 1.239e-05 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 2.06e-06 [shard_inline]: 9.42001e-06 [merge_send_recv]: 7.02002e-06 [auto_parallel]: 7.37002e-06 [parallel]: 4.89e-06 [flash_sp]: 3.56999e-06 [merge_comm]: 5.20999e-06 [allreduce_fusion]: 4.79e-06 [matmul_add_comm_reduction]: 8.29998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.087e-05 [virtual_dataset]: 9.31e-06 [get_grad_eliminate_]: 9.20999e-06 [virtual_output]: 8.72998e-06 [merge_forward]: 4.72e-06 [cell_reuse_recompute_pass]: 9.5999e-07 [offload_activation]: 9.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.762e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 1.483e-05 [set_forward_comm_id_for_comm_node_pass]: 5.50001e-06 [meta_fg_expand]: 3.677e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.569e-05 [a_after_grad]: 1.529e-05 [renormalize]: 0.00062783 [add_forward_monad_depend]: 4.23999e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.569e-05 [cse]: 5.264e-05 [a_3]: 6.868e-05 [Cycle 3]: 0.00093403, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 1.119e-05 [loop_unroll]: 9.21002e-06 [a_1]: 0.0002567 [with_stream_mark]: 1.06e-05 [recompute_prepare]: 1.007e-05 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 4.02e-06 
[parameter_eliminate]: 1.04e-06 [a_2]: 0.00012678 [accelerated_algorithm]: 1.223e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.91998e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 7.34002e-06 [auto_parallel]: 7.29001e-06 [parallel]: 4.58999e-06 [flash_sp]: 1.12999e-06 [merge_comm]: 5.14e-06 [allreduce_fusion]: 5.20999e-06 [matmul_add_comm_reduction]: 7.83999e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 9.02e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.57998e-06 [merge_forward]: 4.51002e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.772e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.564e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30999e-06 [meta_fg_expand]: 3.33998e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.431e-05 [a_after_grad]: 1.487e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.45999e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 1.161e-05 [cse]: 2.888e-05 [a_3]: 6.228e-05 [py_interpret_to_execute_after_opt_a]: 1.056e-05 [slice_cell_reuse_recomputed_activation]: 1.86003e-06 [rewriter_after_opt_a]: 4.706e-05 [convert_after_rewriter]: 9.87999e-06 [order_py_execute_after_rewriter]: 7.55998e-06 [mutable_eliminate]: 0.00050529 [opt_b]: 0.00030235, [1] [Cycle 1]: 0.00029596, [7] [b_1]: 0.00019818 [b_2]: 1.133e-05 [updatestate_depend_eliminate]: 7.47002e-06 [updatestate_assign_eliminate]: 4.28001e-06 [updatestate_loads_eliminate]: 4.41002e-06 [renormalize]: 7.09988e-07 [cse]: 3.419e-05 [optimize_parallel_all_gather_comm]: 2.168e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.095e-05 [loop_unroll]: 0.00043864 [opt_after_cconv]: 0.0001413, [1] [Cycle 1]: 0.00013542, [7] [c_1]: 5.032e-05 [parameter_eliminate]: 2.38002e-06 [updatestate_depend_eliminate]: 7.62998e-06 [updatestate_assign_eliminate]: 4.43001e-06 
[updatestate_loads_eliminate]: 3.98001e-06 [cse]: 3.169e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 2.973e-05 [tuple_transform]: 0.00010653, [1] [Cycle 1]: 0.00010162, [4] [d_1]: 7.02e-05 [none_parameter_eliminate]: 1.88997e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.024e-05 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 5.826e-05 [cse_after_recomputation]: 3.534e-05, [1] [Cycle 1]: 3.046e-05, [1] [cse]: 2.43e-05 [environ_conv]: 9.52001e-06 [swap_dp_allreduce_reducescatter]: 8.2e-06 [bias_add_comm_swap]: 2.69999e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.80002e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.13002e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.76e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.74999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.38002e-06 [overlap_grad_ring_attention]: 5.35999e-06 [overlap_grad_flash_sp]: 2.377e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.93002e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.24998e-06 [symbol_engine_optimizer]: 0.00010377, [1] [Cycle 1]: 9.885e-05, [6] [build]: 9.98998e-06 [elim_shapecalc]: 1.516e-05 [elim_not_effective]: 1.947e-05 [opt_reshape]: 1.056e-05 [fold_const_symbol]: 1.504e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 2.603e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.00048534 [validate]: 4.684e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0832737 [execute]: 8.65001e-06 Sums bootstrap : 0.000503s : 0.47% type_inference : 0.010689s : 9.90% event_method : 0.000047s : 0.04% auto_monad : 0.000121s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000160s : 0.15% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000128s : 0.12% optimize.opt_a.loop_unroll : 0.000111s : 0.10% optimize.opt_a.a_1 : 0.003255s : 3.02% optimize.opt_a.with_stream_mark : 0.000048s : 0.04% optimize.opt_a.recompute_prepare : 0.000044s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000505s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000036s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000059s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001530s : 1.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.08% optimize.opt_a.a_after_grad : 0.000116s : 0.11% optimize.opt_a.renormalize : 0.003241s : 3.00% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.08% optimize.opt_a.cse : 0.000262s : 0.24% optimize.opt_a.a_3 : 0.000476s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000008s : 0.01% optimize.mutable_eliminate : 0.000505s : 0.47% optimize.opt_b.b_1 : 0.000198s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000034s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000439s : 0.41% optimize.opt_after_cconv.c_1 : 0.000050s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000070s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000485s : 0.45% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.083274s : 77.14% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000768 218 5.66% : 0.000043s : 11: substitution.arithmetic_simplify 1.74% : 0.000013s : 2: substitution.cast_eliminate 0.41% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.02% : 0.000430s : 16: substitution.inline 2.18% : 0.000017s : 2: substitution.inline_without_move 1.41% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000014s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.91% : 0.000015s : 20: substitution.remove_not_recompute_node 3.24% : 0.000025s : 10: substitution.replace_applicator 1.35% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.71% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.18% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010619 2 85.60% : 0.009089s : 1: type_inference.infer 14.40% : 0.001530s : 1: type_inference.specialize ------[replace.] 0.000219 30 58.19% : 0.000127s : 16: replace.inline 41.81% : 0.000091s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000453 30 93.12% : 0.000422s : 16: match.inline 6.88% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000793 5663 1.02% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 67: predicate.addn_zero_filter 0.98% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 99: predicate.arithmetic_simplify 1.08% : 0.000009s : 67: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.03% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.12% : 0.000009s : 75: predicate.environ_get_depend_swap 1.65% : 0.000013s : 107: predicate.environ_get_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.58% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.18% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.35% : 0.000042s : 244: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.70% 
: 0.000013s : 97: predicate.list_to_tuple_eliminator_ 5.78% : 0.000046s : 164: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.14% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.33% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000009s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000016s : 97: predicate.partial_defer_inline 1.65% : 0.000013s : 89: predicate.partial_eliminate 1.03% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000011s : 67: predicate.reduce_eliminate 2.55% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.78% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.03% : 0.000008s : 67: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.27% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.74% : 0.000014s : 97: predicate.switch_defer_inline 2.77% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.65% : 0.000037s : 265: 
predicate.switch_simplify 1.14% : 0.000009s : 67: predicate.tile_eliminate 1.03% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.49% : 0.000012s : 83: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000017s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000013s : 97: predicate.tuple_to_list_eliminator_ 2.46% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.11% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.16% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001733 32 54.51% : 0.000945s : 12: func_graph_cloner_run.FuncGraphClonerGraph 45.49% : 0.000788s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.137747 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.21% : 0.003049s : 1: add_attr 2.21% : 0.003040s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000128s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.39% : 0.000536s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000055s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000447s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000515s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.60% : 0.004959s : 117: opt.transform.opt_a 0.04% : 0.000049s : 1: opt.transform.opt_after_cconv 0.03% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000183s : 28: opt.transform.opt_b 0.06% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000057s : 4: opt.transform.symbol_engine_opt 8.30% : 0.011433s : 1: opt_a 0.11% : 0.000145s : 1: opt_after_cconv 0.36% : 0.000495s : 1: opt_after_jit_grad 0.22% : 0.000306s : 1: opt_b 10.03% : 0.013818s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.21% : 0.001661s : 2: renormalize.infer 1.14% : 0.001566s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.12% : 0.000166s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000107s : 1: symbol_engine_optimizer 60.47% : 0.083291s : 1: task_emit 0.08% : 0.000110s : 1: 
tuple_transform 7.77% : 0.010705s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x0-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x1-pynative],max_mem:42.0M TotalTime = 0.0228763, [24] [bootstrap]: 0.00063058 [type_inference]: 0.00686127 [event_method]: 1.544e-05 [auto_monad]: 5.598e-05 [graph_reusing]: 5.86998e-06 [inline]: 1.62001e-06 [add_attr]: 0.00353988, [1] [add_attr_with_inline]: 0.00352868, [1] [Cycle 1]: 4.513e-05, [2] [tag_attr]: 1.548e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 2.733e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.81003e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00413899, [53] [py_interpret_to_execute]: 2.074e-05 [rewriter_before_opt_a]: 6.013e-05 [opt_a]: 0.00224744, [2] [Cycle 1]: 0.00162877, [45] [expand_dump_flag]: 2.42001e-06 [switch_simplify]: 3.279e-05 [loop_unroll]: 7.594e-05 [a_1]: 0.00046928 [with_stream_mark]: 1.375e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 4.17e-06 [updatestate_assign_eliminate]: 3.44001e-06 [updatestate_loads_eliminate]: 3.24001e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 8.12e-05 [accelerated_algorithm]: 6.76e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 6.03998e-06 [parallel]: 2.292e-05 [flash_sp]: 7.05002e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 8.79003e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.71999e-06 [virtual_dataset]: 6.24999e-06 
[get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.79e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 9.08002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.134e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.21998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.37001e-06 [after_resolve]: 1.115e-05 [a_after_grad]: 9.41e-06 [renormalize]: 0.00044189 [add_forward_monad_depend]: 4.86002e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.747e-05 [a_3]: 4.275e-05 [Cycle 2]: 0.00060933, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012772 [with_stream_mark]: 9.92001e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 3.02002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 6.868e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.61998e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.09e-06 [parallel]: 4.3e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 2.73998e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.66002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.17002e-06 [meta_fg_expand]: 1.79998e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 8.65999e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.96999e-06 [cse]: 1.35e-05 [a_3]: 3.34e-05 [py_interpret_to_execute_after_opt_a]: 7.78001e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.089e-05 [convert_after_rewriter]: 7.75998e-06 [order_py_execute_after_rewriter]: 5.34e-06 [mutable_eliminate]: 0.00045651 [opt_b]: 0.00018842, [1] [Cycle 1]: 0.0001821, [7] [b_1]: 0.00011207 [b_2]: 7.58001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.51998e-06 [renormalize]: 3.69997e-07 [cse]: 1.668e-05 [optimize_parallel_all_gather_comm]: 1.676e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.203e-05 [loop_unroll]: 0.00042958 [opt_after_cconv]: 9.768e-05, [1] [Cycle 1]: 9.207e-05, [7] [c_1]: 2.895e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.67e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.229e-05 [tuple_transform]: 7.167e-05, [1] [Cycle 1]: 6.72e-05, [4] [d_1]: 4.084e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.942e-05 [cse_after_recomputation]: 2.15e-05, [1] [Cycle 1]: 1.694e-05, [1] [cse]: 1.152e-05 [environ_conv]: 5.68997e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.15002e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.79999e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.61999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 
8.80013e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.211e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.62998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 7.202e-05, [1] [Cycle 1]: 6.765e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 9.52999e-06 [elim_not_effective]: 1.255e-05 [opt_reshape]: 6.37001e-06 [fold_const_symbol]: 9.32001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 4.336e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 0.00013823 [opt_after_jit_grad]: 0.00046478 [validate]: 3.171e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00668954 [execute]: 7.51999e-06 Sums bootstrap : 0.000631s : 3.44% type_inference : 0.006861s : 37.41% event_method : 0.000015s : 0.08% auto_monad : 0.000056s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000027s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.11% optimize.rewriter_before_opt_a : 0.000060s : 0.33% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.22% optimize.opt_a.loop_unroll : 0.000081s : 0.44% optimize.opt_a.a_1 : 0.000597s : 3.26% optimize.opt_a.with_stream_mark : 0.000024s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000150s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000442s : 2.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000041s : 0.22% optimize.opt_a.a_3 : 0.000076s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.17% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000457s : 2.49% optimize.opt_b.b_1 : 0.000112s : 0.61% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000430s : 2.34% optimize.opt_after_cconv.c_1 : 0.000029s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000043s : 0.24% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000138s : 0.75% opt_after_jit_grad : 0.000465s : 2.53% validate : 0.000032s : 0.17% backend_pass : 0.000001s : 0.01% task_emit : 0.006690s : 36.48% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000169 30 14.59% : 0.000025s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.24% : 0.000005s : 4: substitution.graph_param_transform 67.66% : 0.000114s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.28% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006805 2 89.69% : 0.006103s : 1: type_inference.infer 10.31% : 0.000701s : 1: type_inference.specialize ------[replace.] 0.000042 5 68.52% : 0.000029s : 3: replace.inline 31.48% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.16% : 0.000112s : 3: match.inline 7.84% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.96% : 0.000002s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000002s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.36% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.41% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 21: predicate.replace_applicator 0.50% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.89% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.47% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.11% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.65% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000433 8 45.22% : 0.000196s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.78% : 0.000237s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032186 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.01% : 0.003544s : 1: add_attr 10.97% : 0.003532s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000061s : 1: auto_monad 0.15% : 0.000048s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.08% : 0.000668s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.45% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.20% : 0.001031s : 78: opt.transform.opt_a 0.09% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 6.99% : 0.002251s : 1: opt_a 0.31% : 0.000101s : 1: opt_after_cconv 1.47% : 0.000475s : 1: opt_after_jit_grad 0.60% : 0.000192s : 1: opt_b 12.87% : 0.004143s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.69% : 0.000222s : 1: renormalize.infer 0.66% : 0.000213s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000145s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000075s : 1: symbol_engine_optimizer 20.82% : 0.006700s : 1: task_emit 0.23% : 0.000075s : 1: 
tuple_transform 21.36% : 0.006875s : 1: type_inference 0.20% : 0.000065s : 1: validate TotalTime = 0.0186769, [24] [bootstrap]: 0.00046025 [type_inference]: 0.00448722 [event_method]: 1.089e-05 [auto_monad]: 8.396e-05 [graph_reusing]: 5.66e-06 [inline]: 2.11e-06 [add_attr]: 0.00301904, [1] [add_attr_with_inline]: 0.00301093, [1] [Cycle 1]: 4.468e-05, [2] [tag_attr]: 1.209e-05 [meta_addattr_fg_expand]: 3.18998e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.17e-05 [insert-virtual-dataset]: 2.73e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.53e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00380931, [53] [py_interpret_to_execute]: 1.534e-05 [rewriter_before_opt_a]: 4.04e-05 [opt_a]: 0.00195636, [2] [Cycle 1]: 0.00133737, [45] [expand_dump_flag]: 2.63003e-06 [switch_simplify]: 2.477e-05 [loop_unroll]: 1.461e-05 [a_1]: 0.00030118 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.69002e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 2.94999e-06 [parameter_eliminate]: 2.19001e-06 [a_2]: 7.884e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.54999e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 5.83002e-06 [parallel]: 1.817e-05 [flash_sp]: 7.05e-06 [merge_comm]: 3.36001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.99998e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.71001e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.63997e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.60001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76999e-06 [meta_fg_expand]: 2.55002e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 1.086e-05 [a_after_grad]: 9.64999e-06 [renormalize]: 0.00035648 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 1.76998e-06 [auto_monad_eliminator]: 1.31e-05 [cse]: 2.743e-05 [a_3]: 4.286e-05 [Cycle 2]: 0.00060956, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.71999e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.0001298 [with_stream_mark]: 9.62999e-06 [recompute_prepare]: 6.05002e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.999e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.28002e-06 [shard_inline]: 5.73997e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.78001e-06 [flash_sp]: 3.44001e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.42999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.32001e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 5.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.54e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.89e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.69972e-07 [after_resolve]: 9.17001e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.323e-05 [a_3]: 3.355e-05 [py_interpret_to_execute_after_opt_a]: 7.61999e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.096e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00045635 [opt_b]: 0.00018708, [1] [Cycle 1]: 0.00018089, 
[7] [b_1]: 0.00011125 [b_2]: 7.51001e-06 [updatestate_depend_eliminate]: 5.48002e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.48e-06 [renormalize]: 3.69997e-07 [cse]: 1.683e-05 [optimize_parallel_all_gather_comm]: 1.61e-05 [overlap_param_gather]: 2.28998e-06 [cconv]: 2.274e-05 [loop_unroll]: 0.00042938 [opt_after_cconv]: 9.745e-05, [1] [Cycle 1]: 9.164e-05, [7] [c_1]: 2.913e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.605e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.277e-05 [tuple_transform]: 7.182e-05, [1] [Cycle 1]: 6.721e-05, [4] [d_1]: 4.067e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.379e-05 [cse_after_recomputation]: 1.982e-05, [1] [Cycle 1]: 1.521e-05, [1] [cse]: 1.007e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.17e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.37001e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.30013e-07 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.213e-05 [grouped_pairwise_exchange_alltoall]: 1.90001e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 1.92999e-06 [overlap_grad_ring_attention]: 4.3e-06 [overlap_grad_flash_sp]: 1.636e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 7.121e-05, [1] [Cycle 1]: 6.695e-05, [6] [build]: 2.35002e-06 [elim_shapecalc]: 8.58001e-06 [elim_not_effective]: 1.199e-05 [opt_reshape]: 6.53e-06 [fold_const_symbol]: 9.24998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.70998e-06 [opt_after_jit_grad]: 0.00046392 [validate]: 3.141e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00604148 [execute]: 6.84999e-06 Sums bootstrap : 0.000460s : 3.15% type_inference : 0.004487s : 30.66% event_method : 0.000011s : 0.07% auto_monad : 0.000084s : 0.57% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000431s : 2.95% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000149s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000357s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000076s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000456s : 3.12% optimize.opt_b.b_1 : 0.000111s : 0.76% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000429s : 2.93% optimize.opt_after_cconv.c_1 : 0.000029s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000464s : 3.17% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006041s : 41.29% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000123 26 17.87% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.17% : 0.000001s : 2: substitution.fold_const_symbol 4.26% : 0.000005s : 4: substitution.graph_param_transform 66.40% : 0.000082s : 2: substitution.inline 2.46% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 2.97% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004447 2 91.76% : 0.004081s : 1: type_inference.infer 8.24% : 0.000366s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000142 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.75% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.60% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.08% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.76% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.99% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.69% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.68% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000255 6 41.14% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.86% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026818 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.27% : 0.003023s : 1: add_attr 11.24% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.33% : 0.000090s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.85% : 0.000497s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000006s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000438s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000794s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000093s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.31% : 0.001959s : 1: opt_a 0.38% : 0.000101s : 1: opt_after_cconv 1.77% : 0.000473s : 1: opt_after_jit_grad 0.71% : 0.000191s : 1: opt_b 14.22% : 0.003813s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000193s : 1: renormalize.infer 0.59% : 0.000157s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000074s : 1: symbol_engine_optimizer 22.56% : 0.006051s : 1: task_emit 0.28% : 0.000075s : 1: 
tuple_transform 16.78% : 0.004501s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0201214, [24] [bootstrap]: 0.00045983 [type_inference]: 0.00560334 [event_method]: 1.43e-05 [auto_monad]: 5.67e-05 [graph_reusing]: 5.79999e-06 [inline]: 1.84998e-06 [add_attr]: 0.00309576, [1] [add_attr_with_inline]: 0.00308726, [1] [Cycle 1]: 4.801e-05, [2] [tag_attr]: 1.653e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 3.3e-06 [pre_auto_parallel]: 2.592e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00408935, [53] [py_interpret_to_execute]: 2.027e-05 [rewriter_before_opt_a]: 5.852e-05 [opt_a]: 0.00216476, [2] [Cycle 1]: 0.00154017, [45] [expand_dump_flag]: 2.41e-06 [switch_simplify]: 3.269e-05 [loop_unroll]: 2.237e-05 [a_1]: 0.00045915 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 8.09002e-06 [updatestate_depend_eliminate]: 3.87002e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.78002e-06 [a_2]: 7.826e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 2.28002e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 6.17001e-06 [merge_send_recv]: 7.63001e-06 [auto_parallel]: 6.63998e-06 [parallel]: 1.689e-05 [flash_sp]: 7.66001e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 7.60998e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.51e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.151e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.42001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 
2.08002e-06 [after_resolve]: 1.034e-05 [a_after_grad]: 9.63002e-06 [renormalize]: 0.00042772 [add_forward_monad_depend]: 5.01002e-06 [auto_monad_grad]: 1.68997e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.775e-05 [a_3]: 4.189e-05 [Cycle 2]: 0.00061458, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 7.51001e-06 [loop_unroll]: 5.78997e-06 [a_1]: 0.00012921 [with_stream_mark]: 9.97001e-06 [recompute_prepare]: 6.16e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 7.234e-05 [accelerated_algorithm]: 5.95002e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.26997e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 4.58999e-06 [auto_parallel]: 5.37999e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 5.14003e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 6.29982e-07 [before_grad]: 8.46002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.92999e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.56e-06 [a_after_grad]: 8.47e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.13998e-06 [cse]: 1.832e-05 [a_3]: 3.323e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.042e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.75001e-06 [mutable_eliminate]: 0.00045604 [opt_b]: 0.00018732, [1] [Cycle 1]: 
0.00018121, [7] [b_1]: 0.00011204 [b_2]: 7.31999e-06 [updatestate_depend_eliminate]: 5.16998e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.48e-06 [renormalize]: 3.50003e-07 [cse]: 1.617e-05 [optimize_parallel_all_gather_comm]: 1.541e-05 [overlap_param_gather]: 1.78002e-06 [cconv]: 2.152e-05 [loop_unroll]: 0.00047955 [opt_after_cconv]: 9.844e-05, [1] [Cycle 1]: 9.262e-05, [7] [c_1]: 2.882e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.719e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.186e-05 [tuple_transform]: 7.106e-05, [1] [Cycle 1]: 6.657e-05, [4] [d_1]: 4.059e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.69999e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.468e-05 [cse_after_recomputation]: 2.156e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.158e-05 [environ_conv]: 5.04003e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.94001e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.13002e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.66999e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 1.11002e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.149e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.015e-05, [1] [Cycle 1]: 6.608e-05, [6] [build]: 2.56e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.34001e-06 [fold_const_symbol]: 9.34e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.594e-05 [get_jit_bprop_graph]: 1.14998e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00046033 [validate]: 3.125e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00603978 [execute]: 7.26001e-06 Sums bootstrap : 0.000460s : 2.86% type_inference : 0.005603s : 34.90% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.36% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000028s : 0.18% optimize.opt_a.a_1 : 0.000588s : 3.67% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate 
: 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000428s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000046s : 0.29% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000456s : 2.84% optimize.opt_b.b_1 : 0.000112s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000480s : 2.99% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000460s : 2.87% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006040s : 37.62% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000168 30 14.08% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.44% : 0.000006s : 4: substitution.graph_param_transform 67.14% : 0.000113s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.68% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005562 2 89.80% : 0.004995s : 1: type_inference.infer 10.20% : 0.000567s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.50% : 0.000029s : 3: replace.inline 29.50% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.65% : 0.000110s : 3: match.inline 8.35% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.61% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.78% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 46.04% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.96% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028866 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.74% : 0.003100s : 1: add_attr 10.71% : 0.003091s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.72% : 0.000495s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.69% : 0.000488s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000465s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.36% : 0.000969s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000094s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.51% : 0.002168s : 1: opt_a 0.35% : 0.000102s : 1: opt_after_cconv 1.63% : 0.000470s : 1: opt_after_jit_grad 0.66% : 0.000191s : 1: opt_b 14.18% : 0.004093s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.75% : 0.000216s : 1: renormalize.infer 0.71% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000073s : 1: symbol_engine_optimizer 20.96% : 0.006050s : 1: task_emit 0.26% : 0.000074s : 1: 
tuple_transform 19.46% : 0.005617s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0386312, [24] [bootstrap]: 0.00049941 [type_inference]: 0.0117272 [event_method]: 5.061e-05 [auto_monad]: 0.00012654 [graph_reusing]: 9.37999e-06 [inline]: 1.86e-06 [add_attr]: 0.00310469, [1] [add_attr_with_inline]: 0.00309619, [1] [Cycle 1]: 7.235e-05, [2] [tag_attr]: 3.644e-05 [meta_addattr_fg_expand]: 1.043e-05 [parallel-infer-symbol]: 2.83003e-06 [pre_auto_parallel]: 5.144e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.83002e-06 [optimize]: 0.0138575, [53] [py_interpret_to_execute]: 3.88e-05 [rewriter_before_opt_a]: 0.00015181 [opt_a]: 0.0115272, [3] [Cycle 1]: 0.00742617, [45] [expand_dump_flag]: 3.94002e-06 [switch_simplify]: 7.772e-05 [loop_unroll]: 6.621e-05 [a_1]: 0.00150497 [with_stream_mark]: 4.231e-05 [recompute_prepare]: 2.343e-05 [updatestate_depend_eliminate]: 1.012e-05 [updatestate_assign_eliminate]: 8.43001e-06 [updatestate_loads_eliminate]: 8.12e-06 [parameter_eliminate]: 3.26001e-06 [a_2]: 0.00025259 [accelerated_algorithm]: 3.133e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 3.93999e-06 [shard_inline]: 1.677e-05 [merge_send_recv]: 1.623e-05 [auto_parallel]: 1.129e-05 [parallel]: 1.897e-05 [flash_sp]: 1.219e-05 [merge_comm]: 9.89999e-06 [allreduce_fusion]: 9.04998e-06 [matmul_add_comm_reduction]: 2.672e-05 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 1.83e-05 [virtual_dataset]: 1.648e-05 [get_grad_eliminate_]: 1.586e-05 [virtual_output]: 1.6e-05 [merge_forward]: 9.77999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.858e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.916e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 2.807e-05 [set_forward_comm_id_for_comm_node_pass]: 1.017e-05 [meta_fg_expand]: 0.00147434 [flash_sp_send_recv_attached]: 3.99002e-06 [receive_attached]: 2.26998e-06 
[after_resolve]: 6.184e-05 [a_after_grad]: 8.328e-05 [renormalize]: 0.0025926 [add_forward_monad_depend]: 9.82999e-06 [auto_monad_grad]: 5.34e-06 [auto_monad_eliminator]: 5.694e-05 [cse]: 0.00017615 [a_3]: 0.00034867 [Cycle 2]: 0.00315097, [45] [expand_dump_flag]: 1.74e-06 [switch_simplify]: 4.796e-05 [loop_unroll]: 4.516e-05 [a_1]: 0.00158263 [with_stream_mark]: 1.257e-05 [recompute_prepare]: 1.187e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.57e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 1.09998e-06 [a_2]: 0.00013213 [accelerated_algorithm]: 1.265e-05 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 2.12999e-06 [shard_inline]: 9.51998e-06 [merge_send_recv]: 7e-06 [auto_parallel]: 7.58999e-06 [parallel]: 4.75999e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 6.17999e-06 [allreduce_fusion]: 5.32999e-06 [matmul_add_comm_reduction]: 8.23999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.064e-05 [virtual_dataset]: 9.57999e-06 [get_grad_eliminate_]: 9.10999e-06 [virtual_output]: 8.65999e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 9.90025e-07 [offload_activation]: 9.14998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.706e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.457e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 7.222e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.25999e-06 [after_resolve]: 1.797e-05 [a_after_grad]: 1.503e-05 [renormalize]: 0.00062505 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 1.40001e-06 [auto_monad_eliminator]: 1.529e-05 [cse]: 5.172e-05 [a_3]: 6.861e-05 [Cycle 3]: 0.00093564, [45] [expand_dump_flag]: 1.17e-06 [switch_simplify]: 1.11e-05 [loop_unroll]: 9.38002e-06 [a_1]: 0.00025645 [with_stream_mark]: 1.079e-05 [recompute_prepare]: 1.013e-05 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 
4.24002e-06 [parameter_eliminate]: 1.04998e-06 [a_2]: 0.000127 [accelerated_algorithm]: 1.172e-05 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.34998e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 7.36001e-06 [parallel]: 4.70001e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 5.05001e-06 [allreduce_fusion]: 5.37999e-06 [matmul_add_comm_reduction]: 8.28001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.045e-05 [virtual_dataset]: 9.12001e-06 [get_grad_eliminate_]: 8.82e-06 [virtual_output]: 8.48999e-06 [merge_forward]: 4.62e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.684e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.444e-05 [set_forward_comm_id_for_comm_node_pass]: 5.21002e-06 [meta_fg_expand]: 3.45e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.479e-05 [a_after_grad]: 1.573e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.40001e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 1.149e-05 [cse]: 2.858e-05 [a_3]: 6.264e-05 [py_interpret_to_execute_after_opt_a]: 1.072e-05 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 [rewriter_after_opt_a]: 4.844e-05 [convert_after_rewriter]: 1.045e-05 [order_py_execute_after_rewriter]: 7.08e-06 [mutable_eliminate]: 0.00046721 [opt_b]: 0.00029917, [1] [Cycle 1]: 0.00029316, [7] [b_1]: 0.00019587 [b_2]: 1.136e-05 [updatestate_depend_eliminate]: 7.41001e-06 [updatestate_assign_eliminate]: 4.53999e-06 [updatestate_loads_eliminate]: 4.28001e-06 [renormalize]: 3.7998e-07 [cse]: 3.31e-05 [optimize_parallel_all_gather_comm]: 2.016e-05 [overlap_param_gather]: 2.31e-06 [cconv]: 1.97e-05 [loop_unroll]: 0.00044116 [opt_after_cconv]: 0.00013856, [1] [Cycle 1]: 0.00013261, [7] [c_1]: 4.963e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 7.42998e-06 [updatestate_assign_eliminate]: 4.32998e-06 
[updatestate_loads_eliminate]: 4.08001e-06 [cse]: 3.055e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 3.003e-05 [tuple_transform]: 0.00010491, [1] [Cycle 1]: 9.983e-05, [4] [d_1]: 6.86e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.98002e-06 [partial_unused_args_eliminate]: 1.87001e-06 [add_recomputation]: 5.844e-05 [cse_after_recomputation]: 3.384e-05, [1] [Cycle 1]: 2.896e-05, [1] [cse]: 2.307e-05 [environ_conv]: 9.04e-06 [swap_dp_allreduce_reducescatter]: 8.48999e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.29998e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.712e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 4.90999e-06 [overlap_recompute_and_grad_model_parallel]: 5.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 5.05999e-06 [overlap_grad_flash_sp]: 2.398e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.81003e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 0.00010217, [1] [Cycle 1]: 9.79e-05, [6] [build]: 8.99e-06 [elim_shapecalc]: 1.493e-05 [elim_not_effective]: 1.91e-05 [opt_reshape]: 1.055e-05 [fold_const_symbol]: 1.557e-05 [renormalize]: 
2.29978e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 2.574e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00048322 [validate]: 4.451e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.0083658 [execute]: 6.84001e-06 Sums bootstrap : 0.000499s : 1.46% type_inference : 0.011727s : 34.32% event_method : 0.000051s : 0.15% auto_monad : 0.000127s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000152s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000137s : 0.40% optimize.opt_a.loop_unroll : 0.000121s : 0.35% optimize.opt_a.a_1 : 0.003344s : 9.79% optimize.opt_a.with_stream_mark : 0.000066s : 0.19% optimize.opt_a.recompute_prepare : 0.000045s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000512s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 
0.000021s : 0.06% optimize.opt_a.allreduce_fusion : 0.000020s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001550s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.28% optimize.opt_a.a_after_grad : 0.000114s : 0.33% optimize.opt_a.renormalize : 0.003218s : 9.42% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.25% optimize.opt_a.cse : 0.000256s : 0.75% optimize.opt_a.a_3 : 0.000480s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000467s : 1.37% optimize.opt_b.b_1 : 0.000196s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000441s : 1.29% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000069s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000483s : 1.41% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008366s : 24.48% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000802 222 5.76% : 0.000046s : 12: substitution.arithmetic_simplify 1.62% : 0.000013s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 57.00% : 0.000457s : 17: substitution.inline 2.07% : 0.000017s : 2: substitution.inline_without_move 1.24% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.85% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000014s : 11: substitution.minmaximum_grad 0.62% : 0.000005s : 5: substitution.partial_eliminate 1.68% : 0.000013s : 20: substitution.remove_not_recompute_node 3.07% : 0.000025s : 10: substitution.replace_applicator 1.44% : 0.000012s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.64% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.22% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.55% : 0.000069s : 30: substitution.tuple_list_get_item_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011651 2 86.36% : 0.010062s : 1: type_inference.infer 13.64% : 0.001590s : 1: type_inference.specialize ------[replace.] 0.000236 33 57.54% : 0.000136s : 17: replace.inline 42.46% : 0.000100s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000482 33 92.79% : 0.000447s : 17: match.inline 7.21% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000773 5764 1.11% : 0.000009s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.10% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.72% : 0.000013s : 108: predicate.environ_get_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000044s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.61% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.32% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000016s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.04% : 0.000008s : 68: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 68: predicate.reduce_eliminate 2.62% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.09% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 101: predicate.switch_defer_inline 2.91% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.08% : 
0.000039s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.54% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.58% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.18% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001678 34 55.98% : 0.000939s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.02% : 0.000739s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064220 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.84% : 0.003109s : 1: add_attr 4.83% : 0.003100s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000134s : 1: auto_monad 0.12% : 0.000079s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.83% : 0.000533s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000014s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000058s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000014s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000450s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.90% : 0.005071s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000180s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000057s : 4: opt.transform.symbol_engine_opt 17.95% : 0.011530s : 1: opt_a 0.22% : 0.000142s : 1: opt_after_cconv 0.77% : 0.000493s : 1: opt_after_jit_grad 0.47% : 0.000303s : 1: opt_b 21.58% : 0.013862s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000056s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.58% : 0.001658s : 2: renormalize.infer 2.41% : 0.001546s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000053s : 1: rewriter_after_opt_a 0.24% : 0.000157s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000105s : 1: symbol_engine_optimizer 13.04% : 0.008376s : 1: task_emit 0.17% : 0.000108s : 1: 
tuple_transform 18.28% : 0.011742s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.018847, [24] [bootstrap]: 0.00045811 [type_inference]: 0.00437368 [event_method]: 1.026e-05 [auto_monad]: 4.95e-05 [graph_reusing]: 5.17e-06 [inline]: 2.24001e-06 [add_attr]: 0.00306914, [1] [add_attr_with_inline]: 0.00306012, [1] [Cycle 1]: 4.593e-05, [2] [tag_attr]: 1.207e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 2.117e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.10002e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00381815, [53] [py_interpret_to_execute]: 1.58e-05 [rewriter_before_opt_a]: 3.896e-05 [opt_a]: 0.00191449, [2] [Cycle 1]: 0.00129523, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.602e-05 [loop_unroll]: 1.433e-05 [a_1]: 0.00030282 [with_stream_mark]: 1.441e-05 [recompute_prepare]: 7.87998e-06 [updatestate_depend_eliminate]: 3.78999e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.836e-05 [accelerated_algorithm]: 6.50002e-06 [shard]: 2.13002e-06 [meta_shard_fg_expand]: 1.61002e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 5.83997e-06 [parallel]: 1.74e-05 [flash_sp]: 6.76e-06 [merge_comm]: 3.89002e-06 [allreduce_fusion]: 3.88001e-06 [matmul_add_comm_reduction]: 9.37999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 6.37001e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.149e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.81001e-06 [meta_fg_expand]: 2.54999e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 
2.50002e-06 [after_resolve]: 1.08e-05 [a_after_grad]: 9.46e-06 [renormalize]: 0.00035633 [add_forward_monad_depend]: 4.47e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.306e-05 [cse]: 2.626e-05 [a_3]: 4.241e-05 [Cycle 2]: 0.00060977, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.03998e-06 [loop_unroll]: 5.91e-06 [a_1]: 0.00012784 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 7.029e-05 [accelerated_algorithm]: 5.73002e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.25001e-06 [shard_inline]: 5.73002e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.47998e-06 [flash_sp]: 3.33e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.19999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.021e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.18999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.84998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.34e-06 [a_after_grad]: 8.88002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.29e-05 [a_3]: 3.436e-05 [py_interpret_to_execute_after_opt_a]: 7.83001e-06 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 3.176e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.00999e-06 [mutable_eliminate]: 0.000516 [opt_b]: 0.00018714, [1] [Cycle 1]: 0.00018087, [7] 
[b_1]: 0.00011102 [b_2]: 7.73001e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.69997e-07 [cse]: 1.667e-05 [optimize_parallel_all_gather_comm]: 1.558e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.182e-05 [loop_unroll]: 0.00042397 [opt_after_cconv]: 9.701e-05, [1] [Cycle 1]: 9.142e-05, [7] [c_1]: 2.872e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.606e-05 [renormalize]: 2.70025e-07 [remove_dup_value]: 1.284e-05 [tuple_transform]: 7.162e-05, [1] [Cycle 1]: 6.711e-05, [4] [d_1]: 4.005e-05 [none_parameter_eliminate]: 1.79e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.57002e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.3e-05 [cse_after_recomputation]: 2.057e-05, [1] [Cycle 1]: 1.601e-05, [1] [cse]: 1.086e-05 [environ_conv]: 5.01002e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.67001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.70002e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.21002e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.232e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.51999e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.33999e-06 [overlap_grad_flash_sp]: 1.664e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 6.973e-05, [1] [Cycle 1]: 6.542e-05, [6] [build]: 2.59001e-06 [elim_shapecalc]: 8.60999e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 6.33e-06 [fold_const_symbol]: 8.99e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.584e-05 [get_jit_bprop_graph]: 1.11002e-06 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00045907 [validate]: 3.087e-05 [backend_pass]: 1.10001e-06 [task_emit]: 0.00631186 [execute]: 6.29999e-06 Sums bootstrap : 0.000458s : 3.10% type_inference : 0.004374s : 29.55% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000033s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000431s : 2.91% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000149s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000356s : 2.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.26% optimize.opt_a.a_3 : 0.000077s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000516s : 3.49% optimize.opt_b.b_1 : 0.000111s : 0.75% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000424s : 2.86% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000459s : 3.10% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006312s : 42.64% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000124 26 17.63% : 0.000022s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.47% : 0.000006s : 4: substitution.graph_param_transform 66.35% : 0.000083s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.77% : 0.000005s : 4: substitution.remove_not_recompute_node 2.89% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004331 2 91.56% : 0.003966s : 1: type_inference.infer 8.44% : 0.000366s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000140 984 0.77% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.60% : 0.000004s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 1.03% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.35% : 0.000002s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.67% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.99% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.49% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.15% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 40.90% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.10% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027052 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.36% : 0.003074s : 1: add_attr 11.33% : 0.003064s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.20% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.83% : 0.000495s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000016s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000433s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.94% : 0.000525s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.95% : 0.000797s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000093s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001917s : 1: opt_a 0.37% : 0.000101s : 1: opt_after_cconv 1.73% : 0.000469s : 1: opt_after_jit_grad 0.71% : 0.000191s : 1: opt_b 14.13% : 0.003822s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000192s : 1: renormalize.infer 0.58% : 0.000158s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 23.37% : 0.006322s : 1: task_emit 0.28% : 0.000075s : 1: 
tuple_transform 16.22% : 0.004388s : 1: type_inference 0.21% : 0.000058s : 1: validate TotalTime = 0.036588, [24] [bootstrap]: 0.00047438 [type_inference]: 0.0103362 [event_method]: 4.385e-05 [auto_monad]: 0.00012022 [graph_reusing]: 8.50001e-06 [inline]: 1.90001e-06 [add_attr]: 0.00301873, [1] [add_attr_with_inline]: 0.00301048, [1] [Cycle 1]: 7.019e-05, [2] [tag_attr]: 3.468e-05 [meta_addattr_fg_expand]: 9.24998e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 4.68e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.013477, [53] [py_interpret_to_execute]: 3.724e-05 [rewriter_before_opt_a]: 0.00013233 [opt_a]: 0.0111868, [3] [Cycle 1]: 0.00715141, [45] [expand_dump_flag]: 3.55998e-06 [switch_simplify]: 6.922e-05 [loop_unroll]: 5.696e-05 [a_1]: 0.00137533 [with_stream_mark]: 2.364e-05 [recompute_prepare]: 2.265e-05 [updatestate_depend_eliminate]: 9.22999e-06 [updatestate_assign_eliminate]: 7.98999e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.64999e-06 [a_2]: 0.00024962 [accelerated_algorithm]: 3.193e-05 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 3.7e-06 [shard_inline]: 1.65e-05 [merge_send_recv]: 1.63e-05 [auto_parallel]: 1.118e-05 [parallel]: 1.861e-05 [flash_sp]: 1.224e-05 [merge_comm]: 1.014e-05 [allreduce_fusion]: 9.44e-06 [matmul_add_comm_reduction]: 2.663e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.851e-05 [virtual_dataset]: 1.618e-05 [get_grad_eliminate_]: 1.555e-05 [virtual_output]: 1.534e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.57999e-06 [offload_activation]: 1.812e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.941e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 2.831e-05 [set_forward_comm_id_for_comm_node_pass]: 9.97999e-06 [meta_fg_expand]: 0.00145077 [flash_sp_send_recv_attached]: 3.86001e-06 [receive_attached]: 2.26998e-06 
[after_resolve]: 6.086e-05 [a_after_grad]: 8.519e-05 [renormalize]: 0.00249512 [add_forward_monad_depend]: 9.31e-06 [auto_monad_grad]: 5.59e-06 [auto_monad_eliminator]: 5.706e-05 [cse]: 0.0001984 [a_3]: 0.00034641 [Cycle 2]: 0.00305802, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 4.829e-05 [loop_unroll]: 4.514e-05 [a_1]: 0.00158248 [with_stream_mark]: 1.198e-05 [recompute_prepare]: 1.133e-05 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 4.68999e-06 [updatestate_loads_eliminate]: 3.79002e-06 [parameter_eliminate]: 1.15999e-06 [a_2]: 0.00012942 [accelerated_algorithm]: 1.283e-05 [shard]: 1.27e-06 [meta_shard_fg_expand]: 2.04e-06 [shard_inline]: 9.61e-06 [merge_send_recv]: 7.06001e-06 [auto_parallel]: 7.95e-06 [parallel]: 5.00001e-06 [flash_sp]: 3.48e-06 [merge_comm]: 5.49e-06 [allreduce_fusion]: 4.89e-06 [matmul_add_comm_reduction]: 8.43999e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.041e-05 [virtual_dataset]: 8.90999e-06 [get_grad_eliminate_]: 9.24e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.46998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.666e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.411e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35001e-06 [meta_fg_expand]: 3.742e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.568e-05 [a_after_grad]: 1.489e-05 [renormalize]: 0.00060518 [add_forward_monad_depend]: 3.88001e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.519e-05 [cse]: 5.063e-05 [a_3]: 6.859e-05 [Cycle 3]: 0.00096312, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 1.086e-05 [loop_unroll]: 9.21002e-06 [a_1]: 0.0002573 [with_stream_mark]: 1.042e-05 [recompute_prepare]: 9.87999e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 4.15999e-06 
[parameter_eliminate]: 9.70002e-07 [a_2]: 0.00015304 [accelerated_algorithm]: 1.265e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.96998e-06 [shard_inline]: 9.51e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 7.75e-06 [parallel]: 4.73001e-06 [flash_sp]: 1.23002e-06 [merge_comm]: 5.24e-06 [allreduce_fusion]: 5.07e-06 [matmul_add_comm_reduction]: 8.22e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.081e-05 [virtual_dataset]: 9.41998e-06 [get_grad_eliminate_]: 9.04e-06 [virtual_output]: 8.59e-06 [merge_forward]: 4.64002e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.653e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.474e-05 [set_forward_comm_id_for_comm_node_pass]: 6.29999e-06 [meta_fg_expand]: 3.45998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.527e-05 [a_after_grad]: 1.47e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.38002e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.128e-05 [cse]: 2.729e-05 [a_3]: 6.122e-05 [py_interpret_to_execute_after_opt_a]: 1.122e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 4.804e-05 [convert_after_rewriter]: 9.51e-06 [order_py_execute_after_rewriter]: 7.34002e-06 [mutable_eliminate]: 0.0004629 [opt_b]: 0.00029502, [1] [Cycle 1]: 0.00028892, [7] [b_1]: 0.00019452 [b_2]: 1.089e-05 [updatestate_depend_eliminate]: 7.32997e-06 [updatestate_assign_eliminate]: 4.38001e-06 [updatestate_loads_eliminate]: 4.21001e-06 [renormalize]: 4.09986e-07 [cse]: 3.131e-05 [optimize_parallel_all_gather_comm]: 2.07e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.088e-05 [loop_unroll]: 0.00042705 [opt_after_cconv]: 0.00014036, [1] [Cycle 1]: 0.00013456, [7] [c_1]: 5.01e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.92998e-06 [updatestate_assign_eliminate]: 4.56002e-06 [updatestate_loads_eliminate]: 
4.21001e-06 [cse]: 3.051e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 3.02e-05 [tuple_transform]: 0.00010675, [1] [Cycle 1]: 0.00010157, [4] [d_1]: 6.996e-05 [none_parameter_eliminate]: 1.86998e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.031e-05 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 5.702e-05 [cse_after_recomputation]: 3.369e-05, [1] [Cycle 1]: 2.908e-05, [1] [cse]: 2.325e-05 [environ_conv]: 8.54e-06 [swap_dp_allreduce_reducescatter]: 8.25e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.12e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 8.09989e-07 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.23002e-06 [overlap_opt_shard_in_pipeline]: 1.00999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50001e-06 [control_data_broadcast_order]: 1.824e-05 [grouped_pairwise_exchange_alltoall]: 1.41998e-06 [offloading_packed_experts]: 5.11002e-06 [overlap_recompute_and_grad_model_parallel]: 6.35002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 5.24998e-06 [overlap_grad_flash_sp]: 2.45e-05 [begin_end_overlap_inline]: 8.50006e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010088, [1] [Cycle 1]: 9.66e-05, [6] [build]: 9.12001e-06 [elim_shapecalc]: 1.408e-05 [elim_not_effective]: 1.929e-05 [opt_reshape]: 1.002e-05 [fold_const_symbol]: 1.538e-05 [renormalize]: 2.00002e-07 [detach_backward]: 
1.84e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.52e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00047304 [validate]: 4.488e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00828159 [execute]: 7.16001e-06 Sums bootstrap : 0.000474s : 1.47% type_inference : 0.010336s : 32.02% event_method : 0.000044s : 0.14% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000132s : 0.41% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000128s : 0.40% optimize.opt_a.loop_unroll : 0.000111s : 0.34% optimize.opt_a.a_1 : 0.003215s : 9.96% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000044s : 0.14% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000532s : 1.65% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.18% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.07% optimize.opt_a.meta_fg_expand : 0.001492s : 4.62% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000092s : 0.28% optimize.opt_a.a_after_grad : 0.000115s : 0.36% optimize.opt_a.renormalize : 0.003100s : 9.60% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.26% optimize.opt_a.cse : 0.000276s : 0.86% optimize.opt_a.a_3 : 0.000476s : 1.48% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000463s : 1.43% optimize.opt_b.b_1 : 0.000195s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000427s : 1.32% optimize.opt_after_cconv.c_1 : 0.000050s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000070s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.18% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000473s : 1.47% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008282s : 25.65% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000764 218 5.82% : 0.000044s : 11: substitution.arithmetic_simplify 1.72% : 0.000013s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.59% : 0.000424s : 16: substitution.inline 2.26% : 0.000017s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000016s : 3: substitution.less_batch_normalization 1.81% : 0.000014s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.76% : 0.000013s : 20: substitution.remove_not_recompute_node 3.21% : 0.000025s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.18% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010265 2 86.99% : 0.008930s : 1: type_inference.infer 13.01% : 0.001335s : 1: type_inference.specialize ------[replace.] 0.000217 30 58.21% : 0.000126s : 16: replace.inline 41.79% : 0.000091s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000447 30 93.07% : 0.000416s : 16: match.inline 6.93% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.17% : 0.000016s : 99: predicate.arithmetic_simplify 1.10% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.23% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.72% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.59% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.35% : 0.000010s : 67: predicate.reduce_eliminate 2.63% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.82% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.04% : 0.000008s : 67: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 97: predicate.switch_defer_inline 2.85% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000037s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.53% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000023s : 129: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.18% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001537 32 56.72% : 0.000872s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.28% : 0.000665s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061463 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.92% : 0.003023s : 1: add_attr 4.90% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.83% : 0.000509s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000051s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.04% : 0.004943s : 117: opt.transform.opt_a 0.08% : 0.000049s : 1: opt.transform.opt_after_cconv 0.06% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000179s : 28: opt.transform.opt_b 0.13% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000055s : 4: opt.transform.symbol_engine_opt 18.21% : 0.011190s : 1: opt_a 0.23% : 0.000144s : 1: opt_after_cconv 0.79% : 0.000483s : 1: opt_after_jit_grad 0.49% : 0.000299s : 1: opt_b 21.93% : 0.013481s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000035s : 1: remove_dup_value 2.64% : 0.001622s : 2: renormalize.infer 2.38% : 0.001465s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000137s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000104s : 1: symbol_engine_optimizer 13.49% : 0.008292s : 1: task_emit 0.18% : 0.000110s : 1: 
tuple_transform 16.84% : 0.010350s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x1-kbk],max_mem:42.0M . TotalTime = 0.0818388, [24] [bootstrap]: 0.00059012 [type_inference]: 0.00659583 [event_method]: 1.396e-05 [auto_monad]: 5.565e-05 [graph_reusing]: 5.08002e-06 [inline]: 1.86e-06 [add_attr]: 0.00354965, [1] [add_attr_with_inline]: 0.00353861, [1] [Cycle 1]: 4.591e-05, [2] [tag_attr]: 1.609e-05 [meta_addattr_fg_expand]: 4.50001e-06 [parallel-infer-symbol]: 2.63998e-06 [pre_auto_parallel]: 2.852e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00411943, [53] [py_interpret_to_execute]: 2.155e-05 [rewriter_before_opt_a]: 5.928e-05 [opt_a]: 0.00218875, [2] [Cycle 1]: 0.00157635, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 3.244e-05 [loop_unroll]: 2.204e-05 [a_1]: 0.00047082 [with_stream_mark]: 1.35e-05 [recompute_prepare]: 8.21002e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.917e-05 [accelerated_algorithm]: 6.69001e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 9.09e-06 [auto_parallel]: 6.44999e-06 [parallel]: 2.441e-05 [flash_sp]: 7.75e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.47002e-06 [matmul_add_comm_reduction]: 8.69003e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.31999e-06 [virtual_dataset]: 6.51999e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.84e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.129e-05 
[merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 1.337e-05 [set_forward_comm_id_for_comm_node_pass]: 3.47997e-06 [meta_fg_expand]: 2.48002e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.55002e-06 [after_resolve]: 1.118e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 0.00043719 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.382e-05 [cse]: 2.648e-05 [a_3]: 4.314e-05 [Cycle 2]: 0.00060309, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 7.13998e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012938 [with_stream_mark]: 9.80002e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.942e-05 [accelerated_algorithm]: 5.86e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.29998e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 5.47001e-06 [parallel]: 4.07998e-06 [flash_sp]: 3.43999e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.38002e-06 [allreduce_slice_to_reducescatter]: 2.3999e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 6.11e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.001e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.13999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 9.72001e-06 [a_after_grad]: 8.1e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.68e-06 [cse]: 1.261e-05 [a_3]: 3.283e-05 [py_interpret_to_execute_after_opt_a]: 8.31002e-06 
[slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.166e-05 [convert_after_rewriter]: 6.98998e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00045792 [opt_b]: 0.00018989, [1] [Cycle 1]: 0.00018379, [7] [b_1]: 0.00011395 [b_2]: 7.76001e-06 [updatestate_depend_eliminate]: 5.06997e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 3.80009e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00046789 [opt_after_cconv]: 9.829e-05, [1] [Cycle 1]: 9.233e-05, [7] [c_1]: 2.948e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 1.628e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.251e-05 [tuple_transform]: 7.233e-05, [1] [Cycle 1]: 6.762e-05, [4] [d_1]: 4.09e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 5.228e-05 [cse_after_recomputation]: 2.168e-05, [1] [Cycle 1]: 1.707e-05, [1] [cse]: 1.204e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.42999e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.52e-06 [label_fine_grained_interleaved_index]: 2.43e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 
[control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 6.975e-05, [1] [Cycle 1]: 6.561e-05, [6] [build]: 2.17999e-06 [elim_shapecalc]: 9.50001e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 6.46e-06 [fold_const_symbol]: 9.05001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.93997e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.588e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.68999e-06 [opt_after_jit_grad]: 0.00045766 [validate]: 3.013e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.066124 [execute]: 8.69e-06 Sums bootstrap : 0.000590s : 0.76% type_inference : 0.006596s : 8.53% event_method : 0.000014s : 0.02% auto_monad : 0.000056s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000028s : 0.04% optimize.opt_a.a_1 : 0.000600s : 0.78% optimize.opt_a.with_stream_mark : 
0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000149s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000022s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000437s : 0.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000076s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.59% optimize.opt_b.b_1 : 0.000114s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000468s : 0.61% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000041s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000458s : 0.59% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066124s : 85.56% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000172 30 14.54% : 0.000025s : 5: substitution.arithmetic_simplify 1.00% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000006s : 4: substitution.graph_param_transform 66.98% : 0.000115s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000005s : 4: substitution.remove_not_recompute_node 2.64% : 0.000005s : 4: substitution.replace_old_param 6.54% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006541 2 90.82% : 0.005941s : 1: type_inference.infer 9.18% : 0.000601s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.25% : 0.000029s : 3: replace.inline 29.75% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.74% : 0.000113s : 3: match.inline 8.26% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 1.00% : 0.000002s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.95% : 0.000002s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.08% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.55% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000384 8 47.10% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.90% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.091084 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.90% : 0.003554s : 1: add_attr 3.89% : 0.003543s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000061s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.70% : 0.000642s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000477s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.08% : 0.000981s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000093s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.41% : 0.002192s : 1: opt_a 0.11% : 0.000102s : 1: opt_after_cconv 0.51% : 0.000467s : 1: opt_after_jit_grad 0.21% : 0.000193s : 1: opt_b 4.53% : 0.004123s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000224s : 1: renormalize.infer 0.23% : 0.000206s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 72.62% : 0.066141s : 1: task_emit 0.08% : 0.000075s : 1: 
tuple_transform 7.26% : 0.006609s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.072611, [24] [bootstrap]: 0.00047252 [type_inference]: 0.00447214 [event_method]: 1.09e-05 [auto_monad]: 5.348e-05 [graph_reusing]: 4.75999e-06 [inline]: 1.69e-06 [add_attr]: 0.00305856, [1] [add_attr_with_inline]: 0.00304899, [1] [Cycle 1]: 4.571e-05, [2] [tag_attr]: 1.172e-05 [meta_addattr_fg_expand]: 3.09999e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.172e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.00383667, [53] [py_interpret_to_execute]: 1.709e-05 [rewriter_before_opt_a]: 3.895e-05 [opt_a]: 0.00192562, [2] [Cycle 1]: 0.00130596, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 2.481e-05 [loop_unroll]: 1.426e-05 [a_1]: 0.00030003 [with_stream_mark]: 1.622e-05 [recompute_prepare]: 7.56999e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.05002e-06 [parameter_eliminate]: 1.66002e-06 [a_2]: 7.887e-05 [accelerated_algorithm]: 6.38998e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.35999e-06 [auto_parallel]: 5.85002e-06 [parallel]: 1.798e-05 [flash_sp]: 7.38999e-06 [merge_comm]: 3.54002e-06 [allreduce_fusion]: 3.54002e-06 [matmul_add_comm_reduction]: 9.14998e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.32002e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 4.1e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.61e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.54999e-06 [flash_sp_send_recv_attached]: 2.44999e-06 [receive_attached]: 
3.26001e-06 [after_resolve]: 1.084e-05 [a_after_grad]: 9.09e-06 [renormalize]: 0.00037015 [add_forward_monad_depend]: 4.44998e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.387e-05 [cse]: 2.714e-05 [a_3]: 4.187e-05 [Cycle 2]: 0.00060991, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.16001e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012802 [with_stream_mark]: 1.405e-05 [recompute_prepare]: 6.01e-06 [updatestate_depend_eliminate]: 3.13e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.74999e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.911e-05 [accelerated_algorithm]: 5.48002e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.45e-06 [flash_sp]: 3.21999e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86e-06 [meta_fg_expand]: 1.84e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.22999e-06 [after_resolve]: 9.20001e-06 [a_after_grad]: 8.27e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.344e-05 [a_3]: 3.288e-05 [py_interpret_to_execute_after_opt_a]: 7.79002e-06 [slice_cell_reuse_recomputed_activation]: 2.43002e-06 [rewriter_after_opt_a]: 3.736e-05 [convert_after_rewriter]: 7.55e-06 [order_py_execute_after_rewriter]: 5.48002e-06 [mutable_eliminate]: 0.00044866 [opt_b]: 0.00018733, [1] [Cycle 1]: 0.00018132, [7] 
[b_1]: 0.0001116 [b_2]: 7.46001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.68998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 3.69997e-07 [cse]: 1.67e-05 [optimize_parallel_all_gather_comm]: 2.011e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.218e-05 [loop_unroll]: 0.00047844 [opt_after_cconv]: 9.874e-05, [1] [Cycle 1]: 9.292e-05, [7] [c_1]: 2.859e-05 [parameter_eliminate]: 2.69999e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.726e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.293e-05 [tuple_transform]: 7.05e-05, [1] [Cycle 1]: 6.633e-05, [4] [d_1]: 4.013e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.35002e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.291e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 5.25001e-06 [bias_add_comm_swap]: 3.12002e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.13001e-06 [remove_cast_before_assign_add]: 1.35999e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.63003e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.11002e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.223e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 2.12999e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 6.966e-05, [1] [Cycle 1]: 6.506e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.179e-05 [opt_reshape]: 6.38003e-06 [fold_const_symbol]: 8.93002e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.574e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00045155 [validate]: 3.221e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.0599518 [execute]: 8.08999e-06 Sums bootstrap : 0.000473s : 0.69% type_inference : 0.004472s : 6.52% event_method : 0.000011s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000428s : 0.62% optimize.opt_a.with_stream_mark : 0.000030s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000370s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.65% optimize.opt_b.b_1 : 0.000112s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000478s : 0.70% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000452s : 0.66% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059952s : 87.44% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000123 26 17.72% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.68% : 0.000006s : 4: substitution.graph_param_transform 66.17% : 0.000081s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004428 2 91.47% : 0.004051s : 1: type_inference.infer 8.53% : 0.000378s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000142 984 1.02% : 0.000001s : 9: predicate.accumulaten_eliminater 1.14% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.80% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000001s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.79% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.14% : 0.000002s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.62% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.20% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.90% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000270 6 41.91% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.09% : 0.000157s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080877 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.79% : 0.003063s : 1: add_attr 3.77% : 0.003053s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.63% : 0.000507s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.60% : 0.000488s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.09% : 0.000070s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000787s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000093s : 28: opt.transform.opt_b 0.06% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.38% : 0.001929s : 1: opt_a 0.13% : 0.000102s : 1: opt_after_cconv 0.57% : 0.000461s : 1: opt_after_jit_grad 0.24% : 0.000191s : 1: opt_b 4.75% : 0.003840s : 1: optimize 0.03% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.03% : 0.000021s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000192s : 1: renormalize.infer 0.21% : 0.000171s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000043s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 74.15% : 0.059969s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.55% : 0.004486s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.0732204, [24] [bootstrap]: 0.00046327 [type_inference]: 0.0056582 [event_method]: 1.437e-05 [auto_monad]: 5.469e-05 [graph_reusing]: 5.32999e-06 [inline]: 1.72999e-06 [add_attr]: 0.0030452, [1] [add_attr_with_inline]: 0.00303733, [1] [Cycle 1]: 4.477e-05, [2] [tag_attr]: 1.58e-05 [meta_addattr_fg_expand]: 4.4e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.539e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.68002e-06 [optimize]: 0.00407684, [53] [py_interpret_to_execute]: 2.348e-05 [rewriter_before_opt_a]: 6.015e-05 [opt_a]: 0.00221532, [2] [Cycle 1]: 0.00158836, [45] [expand_dump_flag]: 2.88003e-06 [switch_simplify]: 3.265e-05 [loop_unroll]: 2.136e-05 [a_1]: 0.00046278 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.98999e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.607e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 9.34e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.60001e-06 [parallel]: 1.731e-05 [flash_sp]: 7.11999e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.69e-06 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 7.55e-06 [virtual_dataset]: 6.02001e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.109e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.60002e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.74999e-06 
[after_resolve]: 1.019e-05 [a_after_grad]: 9.19e-06 [renormalize]: 0.00047404 [add_forward_monad_depend]: 4.61002e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.577e-05 [a_3]: 4.273e-05 [Cycle 2]: 0.00061704, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 7.58001e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012672 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 6.04001e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.972e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.19998e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.57e-06 [auto_parallel]: 5.56002e-06 [parallel]: 4.12e-06 [flash_sp]: 3.08998e-06 [merge_comm]: 3.2e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.49998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.23002e-06 [virtual_output]: 5.14003e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 6.20002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.71999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13998e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 9.17001e-06 [a_after_grad]: 8.92e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.29998e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.94001e-06 [cse]: 1.4e-05 [a_3]: 3.448e-05 [py_interpret_to_execute_after_opt_a]: 7.7e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.157e-05 [convert_after_rewriter]: 7.05998e-06 [order_py_execute_after_rewriter]: 5.21002e-06 [mutable_eliminate]: 0.00045055 [opt_b]: 0.00018733, [1] [Cycle 1]: 0.00018128, [7] [b_1]: 
0.00011142 [b_2]: 7.48e-06 [updatestate_depend_eliminate]: 5.66e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 2.89991e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.53e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00042022 [opt_after_cconv]: 9.616e-05, [1] [Cycle 1]: 9.04e-05, [7] [c_1]: 2.785e-05 [parameter_eliminate]: 2.53e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.632e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 7.238e-05, [1] [Cycle 1]: 6.797e-05, [4] [d_1]: 4.062e-05 [none_parameter_eliminate]: 1.86003e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.83e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.307e-05 [cse_after_recomputation]: 2.083e-05, [1] [Cycle 1]: 1.653e-05, [1] [cse]: 1.135e-05 [environ_conv]: 5.24e-06 [swap_dp_allreduce_reducescatter]: 5.03002e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 3.41999e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 3.60998e-06 [overlap_grad_flash_sp]: 1.753e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.95e-05, [1] [Cycle 1]: 6.518e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 9.05001e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.70025e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.32999e-06 [auto_monad_reorder]: 1.597e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00045746 [validate]: 3.181e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0591422 [execute]: 8.59e-06 Sums bootstrap : 0.000463s : 0.67% type_inference : 0.005658s : 8.18% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000590s : 0.85% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000015s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000474s : 0.69% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000077s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.65% optimize.opt_b.b_1 : 0.000111s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000420s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000041s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000457s : 0.66% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059142s : 85.48% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000169 30 14.38% : 0.000024s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000006s : 4: substitution.graph_param_transform 67.72% : 0.000115s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.76% : 0.000005s : 4: substitution.remove_not_recompute_node 2.24% : 0.000004s : 4: substitution.replace_old_param 6.15% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005617 2 88.70% : 0.004982s : 1: type_inference.infer 11.30% : 0.000635s : 1: type_inference.specialize ------[replace.] 0.000044 5 71.03% : 0.000031s : 3: replace.inline 28.97% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.36% : 0.000113s : 3: match.inline 7.64% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.34% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.77% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.80% : 0.000003s : 16: predicate.partial_defer_inline 1.40% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 1.05% : 
0.000002s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.25% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 44.23% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.77% : 0.000208s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081943 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.72% : 0.003050s : 1: add_attr 3.71% : 0.003041s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.61% : 0.000498s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.18% : 0.000967s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000093s : 28: opt.transform.opt_b 0.06% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.71% : 0.002218s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.57% : 0.000467s : 1: opt_after_jit_grad 0.23% : 0.000191s : 1: opt_b 4.98% : 0.004081s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000006s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000216s : 1: renormalize.infer 0.31% : 0.000251s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 72.19% : 0.059158s : 1: task_emit 0.09% : 0.000075s : 1: 
tuple_transform 6.92% : 0.005671s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.119586, [24] [bootstrap]: 0.00045812 [type_inference]: 0.0118708 [event_method]: 5.283e-05 [auto_monad]: 0.00012673 [graph_reusing]: 9.20999e-06 [inline]: 1.75001e-06 [add_attr]: 0.0031054, [1] [add_attr_with_inline]: 0.00309663, [1] [Cycle 1]: 7.233e-05, [2] [tag_attr]: 3.492e-05 [meta_addattr_fg_expand]: 1.036e-05 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 5.135e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.0142602, [53] [py_interpret_to_execute]: 3.846e-05 [rewriter_before_opt_a]: 0.00014964 [opt_a]: 0.0118181, [3] [Cycle 1]: 0.007592, [45] [expand_dump_flag]: 4.22e-06 [switch_simplify]: 7.655e-05 [loop_unroll]: 6.391e-05 [a_1]: 0.00154133 [with_stream_mark]: 2.413e-05 [recompute_prepare]: 2.247e-05 [updatestate_depend_eliminate]: 9.09998e-06 [updatestate_assign_eliminate]: 7.73001e-06 [updatestate_loads_eliminate]: 7.18998e-06 [parameter_eliminate]: 2.96001e-06 [a_2]: 0.00024976 [accelerated_algorithm]: 3.175e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.88999e-06 [shard_inline]: 1.697e-05 [merge_send_recv]: 1.638e-05 [auto_parallel]: 1.097e-05 [parallel]: 1.91e-05 [flash_sp]: 1.187e-05 [merge_comm]: 9.90002e-06 [allreduce_fusion]: 9.61e-06 [matmul_add_comm_reduction]: 2.785e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.864e-05 [virtual_dataset]: 1.638e-05 [get_grad_eliminate_]: 1.652e-05 [virtual_output]: 1.58e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.839e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.932e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.835e-05 [set_forward_comm_id_for_comm_node_pass]: 1.019e-05 [meta_fg_expand]: 0.00152416 [flash_sp_send_recv_attached]: 4.3e-06 [receive_attached]: 2.48e-06 [after_resolve]: 
6.243e-05 [a_after_grad]: 8.319e-05 [renormalize]: 0.00269886 [add_forward_monad_depend]: 1.005e-05 [auto_monad_grad]: 5.46998e-06 [auto_monad_eliminator]: 5.906e-05 [cse]: 0.00017702 [a_3]: 0.00034646 [Cycle 2]: 0.00327771, [45] [expand_dump_flag]: 1.65001e-06 [switch_simplify]: 4.759e-05 [loop_unroll]: 4.565e-05 [a_1]: 0.00159393 [with_stream_mark]: 1.397e-05 [recompute_prepare]: 1.17e-05 [updatestate_depend_eliminate]: 5.69999e-06 [updatestate_assign_eliminate]: 4.55001e-06 [updatestate_loads_eliminate]: 4.35e-06 [parameter_eliminate]: 1.15001e-06 [a_2]: 0.00013224 [accelerated_algorithm]: 1.263e-05 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 2.46e-06 [shard_inline]: 9.70002e-06 [merge_send_recv]: 7.82998e-06 [auto_parallel]: 9.09e-06 [parallel]: 5.08002e-06 [flash_sp]: 3.73001e-06 [merge_comm]: 5.71998e-06 [allreduce_fusion]: 5.17e-06 [matmul_add_comm_reduction]: 8.62998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.131e-05 [virtual_dataset]: 9.32001e-06 [get_grad_eliminate_]: 9.54e-06 [virtual_output]: 8.78001e-06 [merge_forward]: 5.09e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 1.067e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.689e-05 [merge_recompute_call_nodes]: 1.04e-06 [before_grad]: 1.506e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37001e-06 [meta_fg_expand]: 8.04e-05 [flash_sp_send_recv_attached]: 1.15999e-06 [receive_attached]: 1.32e-06 [after_resolve]: 1.72e-05 [a_after_grad]: 1.514e-05 [renormalize]: 0.00070748 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.42e-06 [auto_monad_eliminator]: 1.599e-05 [cse]: 5.279e-05 [a_3]: 6.773e-05 [Cycle 3]: 0.00093335, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.077e-05 [loop_unroll]: 9.57001e-06 [a_1]: 0.0002578 [with_stream_mark]: 1.036e-05 [recompute_prepare]: 1.001e-05 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 4.23001e-06 [parameter_eliminate]: 
1.04e-06 [a_2]: 0.00012693 [accelerated_algorithm]: 1.257e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 9.34e-06 [merge_send_recv]: 7.15e-06 [auto_parallel]: 7.46999e-06 [parallel]: 5.34e-06 [flash_sp]: 1.19e-06 [merge_comm]: 5.34998e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 8.1e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.073e-05 [virtual_dataset]: 9.20999e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.60001e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 9.09998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.657e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.472e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40999e-06 [meta_fg_expand]: 3.31999e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.364e-05 [a_after_grad]: 1.491e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.52999e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 1.132e-05 [cse]: 2.857e-05 [a_3]: 6.164e-05 [py_interpret_to_execute_after_opt_a]: 1.256e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 5.024e-05 [convert_after_rewriter]: 9.69999e-06 [order_py_execute_after_rewriter]: 7.15e-06 [mutable_eliminate]: 0.00049602 [opt_b]: 0.00029942, [1] [Cycle 1]: 0.00029312, [7] [b_1]: 0.00019655 [b_2]: 1.158e-05 [updatestate_depend_eliminate]: 7.53e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 4.25e-06 [renormalize]: 3.10014e-07 [cse]: 3.346e-05 [optimize_parallel_all_gather_comm]: 2.136e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.112e-05 [loop_unroll]: 0.00043653 [opt_after_cconv]: 0.00021408, [1] [Cycle 1]: 0.00020808, [7] [c_1]: 5.084e-05 [parameter_eliminate]: 2.63998e-06 [updatestate_depend_eliminate]: 7.45998e-06 [updatestate_assign_eliminate]: 4.38999e-06 [updatestate_loads_eliminate]: 4.27e-06 [cse]: 
3.19e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 3.104e-05 [tuple_transform]: 0.00010672, [1] [Cycle 1]: 0.00010161, [4] [d_1]: 7.031e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 3.10014e-07 [switch_simplify]: 1.038e-05 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 5.825e-05 [cse_after_recomputation]: 3.523e-05, [1] [Cycle 1]: 3.036e-05, [1] [cse]: 2.46e-05 [environ_conv]: 9.40001e-06 [swap_dp_allreduce_reducescatter]: 8.25999e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.33001e-06 [label_fine_grained_interleaved_index]: 3.21001e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.798e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.72999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.458e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 0.00010347, [1] [Cycle 1]: 9.931e-05, [6] [build]: 1.011e-05 [elim_shapecalc]: 1.456e-05 [elim_not_effective]: 1.9e-05 [opt_reshape]: 1.1e-05 [fold_const_symbol]: 1.552e-05 [renormalize]: 3.4002e-07 [detach_backward]: 1.54e-06 
[pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 2.678e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.96001e-06 [opt_after_jit_grad]: 0.00048156 [validate]: 4.806e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0888362 [execute]: 9.24998e-06 Sums bootstrap : 0.000458s : 0.40% type_inference : 0.011871s : 10.32% event_method : 0.000053s : 0.05% auto_monad : 0.000127s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000150s : 0.13% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000135s : 0.12% optimize.opt_a.loop_unroll : 0.000119s : 0.10% optimize.opt_a.a_1 : 0.003393s : 2.95% optimize.opt_a.with_stream_mark : 0.000048s : 0.04% optimize.opt_a.recompute_prepare : 0.000044s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000509s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000036s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000028s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000045s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.04% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001608s : 1.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000093s : 0.08% optimize.opt_a.a_after_grad : 0.000113s : 0.10% optimize.opt_a.renormalize : 0.003406s : 2.96% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.08% optimize.opt_a.cse : 0.000258s : 0.22% optimize.opt_a.a_3 : 0.000476s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000496s : 0.43% optimize.opt_b.b_1 : 0.000197s : 0.17% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000437s : 0.38% optimize.opt_after_cconv.c_1 : 0.000051s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.03% optimize.tuple_transform.d_1 : 0.000070s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000482s : 0.42% validate : 0.000048s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.088836s : 77.20% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000818 222 5.81% : 0.000048s : 12: substitution.arithmetic_simplify 1.80% : 0.000015s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 57.40% : 0.000469s : 17: substitution.inline 1.98% : 0.000016s : 2: substitution.inline_without_move 1.25% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.82% : 0.000015s : 3: substitution.less_batch_normalization 1.63% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000006s : 5: substitution.partial_eliminate 1.66% : 0.000014s : 20: substitution.remove_not_recompute_node 3.05% : 0.000025s : 10: substitution.replace_applicator 1.37% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.41% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.71% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.33% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.31% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011792 2 85.72% : 0.010109s : 1: type_inference.infer 14.28% : 0.001684s : 1: type_inference.specialize ------[replace.] 0.000243 33 57.72% : 0.000140s : 17: replace.inline 42.28% : 0.000103s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000494 33 93.00% : 0.000459s : 17: match.inline 7.00% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000803 5764 1.02% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 1.02% : 0.000008s : 68: predicate.addn_zero_filter 1.01% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000017s : 100: predicate.arithmetic_simplify 1.09% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.13% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_depend_swap 1.67% : 0.000013s : 108: predicate.environ_get_eliminate 1.11% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000018s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.63% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.39% : 0.000043s : 249: predicate.inline 1.20% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.56% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.49% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.06% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.05% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000016s : 101: predicate.partial_defer_inline 5.38% : 0.000043s : 92: predicate.partial_eliminate 1.02% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.48% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.81% : 0.000015s : 152: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.03% : 0.000008s : 68: predicate.reshape_eliminate 1.07% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.22% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 101: predicate.switch_defer_inline 2.80% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.74% : 
0.000038s : 277: predicate.switch_simplify 1.03% : 0.000008s : 68: predicate.tile_eliminate 1.04% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.56% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.46% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.08% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001809 34 54.86% : 0.000993s : 13: func_graph_cloner_run.FuncGraphClonerGraph 45.14% : 0.000817s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.145804 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.13% : 0.003110s : 1: add_attr 2.13% : 0.003101s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000134s : 1: auto_monad 0.02% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.34% : 0.000494s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000061s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000445s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000505s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000019s : 1: opt.transform.mutable_eliminate 3.51% : 0.005111s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000182s : 28: opt.transform.opt_b 0.05% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000057s : 4: opt.transform.symbol_engine_opt 8.11% : 0.011821s : 1: opt_a 0.15% : 0.000218s : 1: opt_after_cconv 0.34% : 0.000492s : 1: opt_after_jit_grad 0.21% : 0.000303s : 1: opt_b 9.78% : 0.014264s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000056s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000036s : 1: remove_dup_value 1.22% : 0.001785s : 2: renormalize.infer 1.10% : 0.001607s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000054s : 1: rewriter_after_opt_a 0.11% : 0.000154s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000106s : 1: symbol_engine_optimizer 60.94% : 0.088858s : 1: task_emit 0.08% : 0.000110s : 1: 
tuple_transform 8.15% : 0.011886s : 1: type_inference 0.05% : 0.000076s : 1: validate TotalTime = 0.0713732, [24] [bootstrap]: 0.00050742 [type_inference]: 0.00435184 [event_method]: 1.091e-05 [auto_monad]: 5.287e-05 [graph_reusing]: 4.82998e-06 [inline]: 1.94e-06 [add_attr]: 0.00304071, [1] [add_attr_with_inline]: 0.0030323, [1] [Cycle 1]: 4.772e-05, [2] [tag_attr]: 1.261e-05 [meta_addattr_fg_expand]: 3.35e-06 [parallel-infer-symbol]: 2.89001e-06 [pre_auto_parallel]: 2.104e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00377765, [53] [py_interpret_to_execute]: 1.523e-05 [rewriter_before_opt_a]: 3.963e-05 [opt_a]: 0.00193544, [2] [Cycle 1]: 0.00131408, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.487e-05 [loop_unroll]: 1.433e-05 [a_1]: 0.00031885 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 8.12998e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.10002e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.878e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 6.01003e-06 [merge_send_recv]: 8.27e-06 [auto_parallel]: 6.31e-06 [parallel]: 1.844e-05 [flash_sp]: 7.78001e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.21002e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 6.96001e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 8.69998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.09e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.76e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.88e-06 
[after_resolve]: 1.102e-05 [a_after_grad]: 9.09998e-06 [renormalize]: 0.00036364 [add_forward_monad_depend]: 5.10001e-06 [auto_monad_grad]: 1.76998e-06 [auto_monad_eliminator]: 1.31e-05 [cse]: 2.589e-05 [a_3]: 4.184e-05 [Cycle 2]: 0.00061205, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012803 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 6.31e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.975e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.24998e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 4.48001e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.08001e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.36e-06 [virtual_dataset]: 5.81998e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 6.11998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.91e-06 [merge_recompute_call_nodes]: 6.30011e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.89001e-06 [a_after_grad]: 9.44e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.68e-06 [cse]: 1.294e-05 [a_3]: 3.462e-05 [py_interpret_to_execute_after_opt_a]: 7.75998e-06 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 3.121e-05 [convert_after_rewriter]: 7.17997e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.0004552 [opt_b]: 0.00018638, [1] [Cycle 1]: 0.0001803, [7] [b_1]: 
0.00011206 [b_2]: 7.42002e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.00003e-07 [cse]: 1.629e-05 [optimize_parallel_all_gather_comm]: 1.625e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.243e-05 [loop_unroll]: 0.00042346 [opt_after_cconv]: 9.695e-05, [1] [Cycle 1]: 9.104e-05, [7] [c_1]: 2.907e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.621e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.188e-05 [tuple_transform]: 7.085e-05, [1] [Cycle 1]: 6.638e-05, [4] [d_1]: 4.049e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.36e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.517e-05 [cse_after_recomputation]: 2.052e-05, [1] [Cycle 1]: 1.577e-05, [1] [cse]: 1.056e-05 [environ_conv]: 4.83001e-06 [swap_dp_allreduce_reducescatter]: 5.08002e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.70001e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.48998e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.66e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.231e-05 [grouped_pairwise_exchange_alltoall]: 1.66998e-06 [offloading_packed_experts]: 3.88001e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 1.86998e-06 [overlap_grad_ring_attention]: 4.05998e-06 [overlap_grad_flash_sp]: 1.694e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.01003e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.988e-05, [1] [Cycle 1]: 6.57e-05, [6] [build]: 2.48998e-06 [elim_shapecalc]: 8.58001e-06 [elim_not_effective]: 1.184e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.06998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.543e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.78001e-06 [opt_after_jit_grad]: 0.00047757 [validate]: 3.086e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0588522 [execute]: 9.12001e-06 Sums bootstrap : 0.000507s : 0.75% type_inference : 0.004352s : 6.46% event_method : 0.000011s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000447s : 0.66% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000149s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000019s : 0.03% optimize.opt_a.renormalize : 0.000364s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000076s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000455s : 0.68% optimize.opt_b.b_1 : 0.000112s : 0.17% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000423s : 0.63% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000478s : 0.71% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058852s : 87.38% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000128 26 17.38% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.14% : 0.000001s : 2: substitution.fold_const_symbol 4.59% : 0.000006s : 4: substitution.graph_param_transform 66.84% : 0.000085s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.15% : 0.000004s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004311 2 91.53% : 0.003946s : 1: type_inference.infer 8.47% : 0.000365s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000083 2 100.00% : 0.000083s : 2: match.inline ------[predicate.] 
0.000142 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.58% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.38% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.24% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.68% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000252 6 41.23% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.77% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079523 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.83% : 0.003045s : 1: add_attr 3.82% : 0.003036s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000542s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.54% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.02% : 0.000809s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000093s : 28: opt.transform.opt_b 0.06% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.44% : 0.001938s : 1: opt_a 0.13% : 0.000100s : 1: opt_after_cconv 0.61% : 0.000487s : 1: opt_after_jit_grad 0.24% : 0.000190s : 1: opt_b 4.76% : 0.003781s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.25% : 0.000195s : 1: renormalize.infer 0.20% : 0.000161s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 74.03% : 0.058868s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 5.49% : 0.004365s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.118479, [24] [bootstrap]: 0.00049103 [type_inference]: 0.0106875 [event_method]: 4.651e-05 [auto_monad]: 0.00012072 [graph_reusing]: 8.45001e-06 [inline]: 1.62999e-06 [add_attr]: 0.0030778, [1] [add_attr_with_inline]: 0.00306901, [1] [Cycle 1]: 6.872e-05, [2] [tag_attr]: 3.157e-05 [meta_addattr_fg_expand]: 9.49999e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 4.758e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0137782, [53] [py_interpret_to_execute]: 3.65e-05 [rewriter_before_opt_a]: 0.00013077 [opt_a]: 0.0114313, [3] [Cycle 1]: 0.00735732, [45] [expand_dump_flag]: 4.01001e-06 [switch_simplify]: 6.817e-05 [loop_unroll]: 5.624e-05 [a_1]: 0.001402 [with_stream_mark]: 2.392e-05 [recompute_prepare]: 2.224e-05 [updatestate_depend_eliminate]: 9.69e-06 [updatestate_assign_eliminate]: 7.82e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.51e-06 [a_2]: 0.00024743 [accelerated_algorithm]: 3.102e-05 [shard]: 1.99e-06 [meta_shard_fg_expand]: 3.73999e-06 [shard_inline]: 1.656e-05 [merge_send_recv]: 1.545e-05 [auto_parallel]: 1.149e-05 [parallel]: 1.825e-05 [flash_sp]: 1.207e-05 [merge_comm]: 9.79e-06 [allreduce_fusion]: 9.12999e-06 [matmul_add_comm_reduction]: 2.608e-05 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 2.543e-05 [virtual_dataset]: 1.65e-05 [get_grad_eliminate_]: 1.561e-05 [virtual_output]: 1.556e-05 [merge_forward]: 9.72999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 1.871e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.911e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 2.75e-05 [set_forward_comm_id_for_comm_node_pass]: 1.007e-05 [meta_fg_expand]: 0.00148732 [flash_sp_send_recv_attached]: 3.81999e-06 [receive_attached]: 2.52001e-06 [after_resolve]: 
6.152e-05 [a_after_grad]: 8.521e-05 [renormalize]: 0.00265983 [add_forward_monad_depend]: 1.001e-05 [auto_monad_grad]: 5.52999e-06 [auto_monad_eliminator]: 5.805e-05 [cse]: 0.00017903 [a_3]: 0.00034585 [Cycle 2]: 0.00313143, [45] [expand_dump_flag]: 1.63002e-06 [switch_simplify]: 4.789e-05 [loop_unroll]: 4.485e-05 [a_1]: 0.00162849 [with_stream_mark]: 1.27e-05 [recompute_prepare]: 1.184e-05 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 4.57e-06 [updatestate_loads_eliminate]: 3.81999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012872 [accelerated_algorithm]: 1.233e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 2.00002e-06 [shard_inline]: 1.076e-05 [merge_send_recv]: 7.88999e-06 [auto_parallel]: 7.68999e-06 [parallel]: 5.25999e-06 [flash_sp]: 3.03e-06 [merge_comm]: 5.52001e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 8.3e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.051e-05 [virtual_dataset]: 9.02999e-06 [get_grad_eliminate_]: 8.99998e-06 [virtual_output]: 8.69e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.688e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.457e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 3.639e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.02e-06 [after_resolve]: 1.506e-05 [a_after_grad]: 1.453e-05 [renormalize]: 0.00063108 [add_forward_monad_depend]: 4.08999e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 1.498e-05 [cse]: 5.149e-05 [a_3]: 6.728e-05 [Cycle 3]: 0.00092866, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 1.108e-05 [loop_unroll]: 9.56e-06 [a_1]: 0.00025666 [with_stream_mark]: 1.053e-05 [recompute_prepare]: 9.67999e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 4.02998e-06 [parameter_eliminate]: 
9.70002e-07 [a_2]: 0.00012625 [accelerated_algorithm]: 1.23e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 9.34e-06 [merge_send_recv]: 7.31001e-06 [auto_parallel]: 7.6e-06 [parallel]: 4.51002e-06 [flash_sp]: 1.14998e-06 [merge_comm]: 4.94998e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 7.97003e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.072e-05 [virtual_dataset]: 8.82e-06 [get_grad_eliminate_]: 8.67e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.51002e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.625e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 1.462e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40001e-06 [meta_fg_expand]: 3.26001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 1.431e-05 [a_after_grad]: 1.562e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 1.202e-05 [cse]: 2.876e-05 [a_3]: 6.135e-05 [py_interpret_to_execute_after_opt_a]: 1.135e-05 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 4.825e-05 [convert_after_rewriter]: 9.62001e-06 [order_py_execute_after_rewriter]: 6.88e-06 [mutable_eliminate]: 0.000467 [opt_b]: 0.00030055, [1] [Cycle 1]: 0.0002942, [7] [b_1]: 0.00019534 [b_2]: 1.16e-05 [updatestate_depend_eliminate]: 7.82e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 4.25e-06 [renormalize]: 4.19997e-07 [cse]: 3.478e-05 [optimize_parallel_all_gather_comm]: 2.168e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 1.944e-05 [loop_unroll]: 0.00043407 [opt_after_cconv]: 0.00014168, [1] [Cycle 1]: 0.00013577, [7] [c_1]: 4.998e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.83999e-06 [updatestate_assign_eliminate]: 4.26001e-06 [updatestate_loads_eliminate]: 4.32e-06 [cse]: 
3.197e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 6.619e-05 [tuple_transform]: 0.00010712, [1] [Cycle 1]: 0.00010191, [4] [d_1]: 7.038e-05 [none_parameter_eliminate]: 1.83002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.03e-05 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 5.808e-05 [cse_after_recomputation]: 3.574e-05, [1] [Cycle 1]: 3.075e-05, [1] [cse]: 2.526e-05 [environ_conv]: 9.77999e-06 [swap_dp_allreduce_reducescatter]: 9.03002e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 1.20001e-06 [remove_cast_before_assign_add]: 1.20001e-06 [full_micro_interleaved_order_control]: 1.94999e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.16997e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.81e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 5.68997e-06 [overlap_recompute_and_grad_model_parallel]: 6.10002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 5.43002e-06 [overlap_grad_flash_sp]: 2.494e-05 [begin_end_overlap_inline]: 7.09988e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 0.00010082, [1] [Cycle 1]: 9.634e-05, [6] [build]: 9.20999e-06 [elim_shapecalc]: 1.44e-05 [elim_not_effective]: 1.892e-05 [opt_reshape]: 1.027e-05 [fold_const_symbol]: 1.506e-05 [renormalize]: 2.10013e-07 [detach_backward]: 1.85001e-06 
[pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 2.601e-05 [get_jit_bprop_graph]: 1.15999e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.00048178 [validate]: 4.698e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.089426 [execute]: 8.20999e-06 Sums bootstrap : 0.000491s : 0.43% type_inference : 0.010688s : 9.37% event_method : 0.000047s : 0.04% auto_monad : 0.000121s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000131s : 0.11% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000127s : 0.11% optimize.opt_a.loop_unroll : 0.000111s : 0.10% optimize.opt_a.a_1 : 0.003287s : 2.88% optimize.opt_a.with_stream_mark : 0.000047s : 0.04% optimize.opt_a.recompute_prepare : 0.000044s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000502s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000037s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000027s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000047s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001527s : 1.34% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.08% optimize.opt_a.a_after_grad : 0.000115s : 0.10% optimize.opt_a.renormalize : 0.003291s : 2.88% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.07% optimize.opt_a.cse : 0.000259s : 0.23% optimize.opt_a.a_3 : 0.000474s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.41% optimize.opt_b.b_1 : 0.000195s : 0.17% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000035s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000434s : 0.38% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000066s : 0.06% optimize.tuple_transform.d_1 : 0.000070s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000482s : 0.42% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.089426s : 78.36% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000773 218 5.76% : 0.000045s : 11: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.38% : 0.000436s : 16: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000015s : 3: substitution.less_batch_normalization 1.68% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.75% : 0.000014s : 20: substitution.remove_not_recompute_node 3.26% : 0.000025s : 10: substitution.replace_applicator 1.43% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.57% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.22% : 0.000064s : 28: substitution.tuple_list_get_item_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010616 2 86.13% : 0.009144s : 1: type_inference.infer 13.87% : 0.001473s : 1: type_inference.specialize ------[replace.] 0.000218 30 59.63% : 0.000130s : 16: replace.inline 40.37% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000458 30 93.14% : 0.000426s : 16: match.inline 6.86% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000769 5663 1.03% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 99: predicate.arithmetic_simplify 1.10% : 0.000008s : 67: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_depend_swap 1.71% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.61% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.52% : 0.000042s : 244: predicate.inline 1.24% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.14% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 3.37% : 0.000026s : 97: predicate.partial_defer_inline 1.68% : 0.000013s : 89: predicate.partial_eliminate 1.03% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.56% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.13% : 0.000009s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 97: predicate.switch_defer_inline 2.84% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.78% : 0.000037s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.04% : 0.000008s : 67: predicate.transpose_eliminate 1.52% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.50% : 0.000012s : 83: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.53% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.17% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001676 32 55.39% : 0.000928s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.61% : 0.000748s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.143949 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.14% : 0.003082s : 1: add_attr 2.13% : 0.003073s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000128s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.37% : 0.000526s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000054s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000443s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.47% : 0.004989s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000180s : 28: opt.transform.opt_b 0.05% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000055s : 4: opt.transform.symbol_engine_opt 7.94% : 0.011434s : 1: opt_a 0.10% : 0.000145s : 1: opt_after_cconv 0.34% : 0.000491s : 1: opt_after_jit_grad 0.21% : 0.000304s : 1: opt_b 9.57% : 0.013782s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000071s : 1: remove_dup_value 1.19% : 0.001716s : 2: renormalize.infer 1.08% : 0.001560s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000136s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000104s : 1: symbol_engine_optimizer 62.14% : 0.089443s : 1: task_emit 0.08% : 0.000110s : 1: 
tuple_transform 7.43% : 0.010702s : 1: type_inference 0.05% : 0.000072s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x1-ge],max_mem:42.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x2-pynative],max_mem:42.0M TotalTime = 0.0234425, [24] [bootstrap]: 0.00061652 [type_inference]: 0.00680354 [event_method]: 1.44e-05 [auto_monad]: 5.711e-05 [graph_reusing]: 6.11e-06 [inline]: 1.89e-06 [add_attr]: 0.00356875, [1] [add_attr_with_inline]: 0.00355813, [1] [Cycle 1]: 4.632e-05, [2] [tag_attr]: 1.617e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 2.50002e-06 [pre_auto_parallel]: 2.825e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.16998e-06 [pipeline_split]: 1.45001e-06 [optimize]: 0.00410659, [53] [py_interpret_to_execute]: 2.098e-05 [rewriter_before_opt_a]: 5.911e-05 [opt_a]: 0.00223104, [2] [Cycle 1]: 0.00158126, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 3.369e-05 [loop_unroll]: 2.264e-05 [a_1]: 0.00046931 [with_stream_mark]: 1.333e-05 [recompute_prepare]: 8.57e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.806e-05 [accelerated_algorithm]: 6.53998e-06 [shard]: 2.54999e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 6.14001e-06 [merge_send_recv]: 8.36002e-06 [auto_parallel]: 6.36e-06 [parallel]: 2.444e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 4.27e-06 [matmul_add_comm_reduction]: 9.03002e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.53999e-06 [virtual_dataset]: 6.39001e-06 
[get_grad_eliminate_]: 5.94e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 3.72002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.13e-05 [merge_recompute_call_nodes]: 1.41998e-06 [before_grad]: 9.79e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.32999e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 9.47001e-06 [renormalize]: 0.00044201 [add_forward_monad_depend]: 4.92999e-06 [auto_monad_grad]: 2.16e-06 [auto_monad_eliminator]: 1.459e-05 [cse]: 2.827e-05 [a_3]: 4.226e-05 [Cycle 2]: 0.00064032, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00013059 [with_stream_mark]: 1.026e-05 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.58003e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 6.864e-05 [accelerated_algorithm]: 5.73002e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.53001e-06 [flash_sp]: 2.93998e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.34001e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.13002e-06 [merge_forward]: 2.96001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 6.39001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.70002e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.54998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 1.84998e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.87999e-06 [a_after_grad]: 8.27003e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.84001e-06 [cse]: 4.346e-05 [a_3]: 3.396e-05 [py_interpret_to_execute_after_opt_a]: 8.12e-06 [slice_cell_reuse_recomputed_activation]: 2.27001e-06 [rewriter_after_opt_a]: 2.992e-05 [convert_after_rewriter]: 7.85998e-06 [order_py_execute_after_rewriter]: 5.37999e-06 [mutable_eliminate]: 0.00045517 [opt_b]: 0.00018551, [1] [Cycle 1]: 0.00017954, [7] [b_1]: 0.0001106 [b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [renormalize]: 3.00002e-07 [cse]: 1.672e-05 [optimize_parallel_all_gather_comm]: 1.626e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.182e-05 [loop_unroll]: 0.0004221 [opt_after_cconv]: 9.645e-05, [1] [Cycle 1]: 9.072e-05, [7] [c_1]: 2.857e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.63e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 7.086e-05, [1] [Cycle 1]: 6.641e-05, [4] [d_1]: 4.038e-05 [none_parameter_eliminate]: 1.92001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.56002e-06 [add_recomputation]: 5.07e-05 [cse_after_recomputation]: 2.174e-05, [1] [Cycle 1]: 1.73e-05, [1] [cse]: 1.198e-05 [environ_conv]: 4.90999e-06 [swap_dp_allreduce_reducescatter]: 5.72001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.60997e-06 [assign_add_opt]: 1.21997e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 1.45999e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 9.80013e-07 
[add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.56002e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.46002e-06 [offloading_packed_experts]: 3.95e-06 [overlap_recompute_and_grad_model_parallel]: 4.38001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.48998e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.665e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 7.147e-05, [1] [Cycle 1]: 6.728e-05, [6] [build]: 2.49999e-06 [elim_shapecalc]: 9.29e-06 [elim_not_effective]: 1.222e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 9.35001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.654e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 0.00013989 [opt_after_jit_grad]: 0.00046312 [validate]: 3.256e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00735022 [execute]: 7.33e-06 Sums bootstrap : 0.000617s : 3.27% type_inference : 0.006804s : 36.04% event_method : 0.000014s : 0.08% auto_monad : 0.000057s : 0.30% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000028s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 
0.31% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.22% optimize.opt_a.loop_unroll : 0.000028s : 0.15% optimize.opt_a.a_1 : 0.000600s : 3.18% optimize.opt_a.with_stream_mark : 0.000024s : 0.12% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.78% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.05% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000018s : 0.09% optimize.opt_a.renormalize : 0.000442s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.11% optimize.opt_a.cse : 0.000072s : 0.38% optimize.opt_a.a_3 : 0.000076s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.16% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000455s : 2.41% optimize.opt_b.b_1 : 0.000111s : 0.59% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000422s : 2.24% optimize.opt_after_cconv.c_1 : 0.000029s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.09% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000140s : 0.74% opt_after_jit_grad : 0.000463s : 2.45% validate : 0.000033s : 0.17% backend_pass : 0.000001s : 0.00% task_emit : 0.007350s : 38.94% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000170 30 13.96% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.23% : 0.000005s : 4: substitution.graph_param_transform 67.47% : 0.000115s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.62% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006752 2 90.59% : 0.006117s : 1: type_inference.infer 9.41% : 0.000635s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.59% : 0.000028s : 3: replace.inline 30.41% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.76% : 0.000113s : 3: match.inline 8.24% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.88% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.49% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.97% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.70% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.53% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.91% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.33% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000419 8 48.20% : 0.000202s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.80% : 0.000217s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032696 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.93% : 0.003573s : 1: add_attr 10.89% : 0.003562s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000063s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.00% : 0.000655s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.32% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.42% : 0.000464s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 3.00% : 0.000980s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000034s : 4: opt.transform.symbol_engine_opt 6.83% : 0.002234s : 1: opt_a 0.31% : 0.000100s : 1: opt_after_cconv 1.45% : 0.000473s : 1: opt_after_jit_grad 0.58% : 0.000189s : 1: opt_b 12.57% : 0.004111s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.68% : 0.000223s : 1: renormalize.infer 0.65% : 0.000212s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000146s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000034s : 1: rewriter_after_opt_a 0.19% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000074s : 1: symbol_engine_optimizer 22.52% : 0.007363s : 1: task_emit 0.23% : 0.000074s : 1: 
tuple_transform 20.85% : 0.006817s : 1: type_inference 0.20% : 0.000064s : 1: validate TotalTime = 0.0192864, [24] [bootstrap]: 0.00049317 [type_inference]: 0.00451851 [event_method]: 1.076e-05 [auto_monad]: 5.441e-05 [graph_reusing]: 5.07999e-06 [inline]: 1.61002e-06 [add_attr]: 0.00332868, [1] [add_attr_with_inline]: 0.00332048, [1] [Cycle 1]: 4.675e-05, [2] [tag_attr]: 1.246e-05 [meta_addattr_fg_expand]: 3.26001e-06 [parallel-infer-symbol]: 2.54001e-06 [pre_auto_parallel]: 2.232e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00382142, [53] [py_interpret_to_execute]: 1.529e-05 [rewriter_before_opt_a]: 3.991e-05 [opt_a]: 0.00195543, [2] [Cycle 1]: 0.00133613, [45] [expand_dump_flag]: 3.01001e-06 [switch_simplify]: 2.46e-05 [loop_unroll]: 1.411e-05 [a_1]: 0.00030006 [with_stream_mark]: 1.378e-05 [recompute_prepare]: 8.32998e-06 [updatestate_depend_eliminate]: 3.56001e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 8.171e-05 [accelerated_algorithm]: 6.57002e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 7.41001e-06 [auto_parallel]: 6.10002e-06 [parallel]: 6.151e-05 [flash_sp]: 7.6e-06 [merge_comm]: 4.24002e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 8.99e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.70998e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.87001e-06 [merge_forward]: 3.72998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 8.77e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.136e-05 [merge_recompute_call_nodes]: 1.84e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 2.37001e-06 [flash_sp_send_recv_attached]: 2.34999e-06 [receive_attached]: 
2.63998e-06 [after_resolve]: 1.09e-05 [a_after_grad]: 9.09998e-06 [renormalize]: 0.00035788 [add_forward_monad_depend]: 4.89e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.4e-05 [cse]: 2.646e-05 [a_3]: 4.114e-05 [Cycle 2]: 0.0006098, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 6.90002e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.0001292 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 6.09001e-06 [updatestate_depend_eliminate]: 2.93e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.916e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.49e-06 [parallel]: 4.62e-06 [flash_sp]: 3.20998e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 2.86e-06 [matmul_add_comm_reduction]: 5.26002e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 6.58998e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.24998e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 8.27e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 8.69972e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.28002e-06 [a_after_grad]: 8.35001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.328e-05 [a_3]: 3.444e-05 [py_interpret_to_execute_after_opt_a]: 7.46001e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 5.536e-05 [convert_after_rewriter]: 7.83001e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00045242 [opt_b]: 0.00018554, [1] [Cycle 1]: 0.00017958, [7] 
[b_1]: 0.0001111 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.43998e-06 [renormalize]: 4.69998e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.631e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.329e-05 [loop_unroll]: 0.00042271 [opt_after_cconv]: 9.65e-05, [1] [Cycle 1]: 9.092e-05, [7] [c_1]: 2.976e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.562e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.239e-05 [tuple_transform]: 7.165e-05, [1] [Cycle 1]: 6.719e-05, [4] [d_1]: 4.077e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.53998e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.693e-05 [cse_after_recomputation]: 2.099e-05, [1] [Cycle 1]: 1.668e-05, [1] [cse]: 1.118e-05 [environ_conv]: 5.57999e-06 [swap_dp_allreduce_reducescatter]: 5.54e-06 [bias_add_comm_swap]: 2.19999e-06 [label_micro_interleaved_index]: 4.14997e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 1.92999e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.962e-05, [1] [Cycle 1]: 6.541e-05, [6] [build]: 2.28998e-06 [elim_shapecalc]: 8.79003e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 8.93002e-06 [renormalize]: 3.39991e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.59e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00045682 [validate]: 3.417e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00629729 [execute]: 7.28999e-06 Sums bootstrap : 0.000493s : 3.29% type_inference : 0.004519s : 30.16% event_method : 0.000011s : 0.07% auto_monad : 0.000054s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000429s : 2.87% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000151s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000066s : 0.44% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000358s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.14% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000076s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000055s : 0.37% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 3.02% optimize.opt_b.b_1 : 0.000111s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000423s : 2.82% optimize.opt_after_cconv.c_1 : 0.000030s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000457s : 3.05% validate : 0.000034s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006297s : 42.04% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000123 26 17.94% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 65.79% : 0.000081s : 2: substitution.inline 2.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.74% : 0.000005s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004476 2 92.04% : 0.004120s : 1: type_inference.infer 7.96% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000141 984 0.97% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.54% : 0.000004s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000009s : 44: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.89% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.81% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.74% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.17% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000247 6 41.42% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.58% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027750 196 0.01% : 0.000004s : 1: ForceFp32Comm 12.01% : 0.003333s : 1: add_attr 11.98% : 0.003324s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000530s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.56% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.66% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.85% : 0.000791s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000093s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.06% : 0.001958s : 1: opt_a 0.36% : 0.000100s : 1: opt_after_cconv 1.68% : 0.000467s : 1: opt_after_jit_grad 0.68% : 0.000189s : 1: opt_b 13.78% : 0.003825s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000195s : 1: renormalize.infer 0.56% : 0.000156s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.22% : 0.000061s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000072s : 1: symbol_engine_optimizer 22.73% : 0.006307s : 1: task_emit 0.27% : 0.000075s : 1: 
tuple_transform 16.33% : 0.004532s : 1: type_inference 0.23% : 0.000063s : 1: validate TotalTime = 0.0202645, [24] [bootstrap]: 0.00047389 [type_inference]: 0.00575715 [event_method]: 1.444e-05 [auto_monad]: 5.465e-05 [graph_reusing]: 5.87999e-06 [inline]: 2.19001e-06 [add_attr]: 0.00302836, [1] [add_attr_with_inline]: 0.00301978, [1] [Cycle 1]: 4.473e-05, [2] [tag_attr]: 1.513e-05 [meta_addattr_fg_expand]: 4.13999e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 2.664e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.00407251, [53] [py_interpret_to_execute]: 2.082e-05 [rewriter_before_opt_a]: 5.859e-05 [opt_a]: 0.00215283, [2] [Cycle 1]: 0.00153632, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 3.183e-05 [loop_unroll]: 2.161e-05 [a_1]: 0.00045893 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.84002e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.651e-05 [accelerated_algorithm]: 6.49001e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.58999e-06 [auto_parallel]: 5.98998e-06 [parallel]: 1.63e-05 [flash_sp]: 7.25e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 8.99003e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.97002e-06 [virtual_dataset]: 6.60997e-06 [get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 4.07003e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.30001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66001e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.32001e-06 
[receive_attached]: 2.12999e-06 [after_resolve]: 1.032e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00043329 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.793e-05 [a_3]: 4.173e-05 [Cycle 2]: 0.00060723, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 6.95002e-06 [loop_unroll]: 5.76e-06 [a_1]: 0.00012813 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.943e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.27999e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.19998e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.56001e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.43002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.51998e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 1e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.90001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 9.25001e-06 [a_after_grad]: 8.72998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 7.60017e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.734e-05 [a_3]: 3.272e-05 [py_interpret_to_execute_after_opt_a]: 7.79997e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.154e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.34998e-06 [mutable_eliminate]: 0.00046008 [opt_b]: 0.00018786, [1] 
[Cycle 1]: 0.00018185, [7] [b_1]: 0.00011395 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 2.69996e-07 [cse]: 1.56e-05 [optimize_parallel_all_gather_comm]: 1.563e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00042671 [opt_after_cconv]: 9.732e-05, [1] [Cycle 1]: 9.156e-05, [7] [c_1]: 2.878e-05 [parameter_eliminate]: 2.42001e-06 [updatestate_depend_eliminate]: 5.48002e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.633e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 0.00011327, [1] [Cycle 1]: 0.00010876, [4] [d_1]: 8.021e-05 [none_parameter_eliminate]: 2.16e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.56999e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.67e-05 [cse_after_recomputation]: 2.15e-05, [1] [Cycle 1]: 1.693e-05, [1] [cse]: 1.179e-05 [environ_conv]: 4.65001e-06 [swap_dp_allreduce_reducescatter]: 4.84e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.51998e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.01997e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.04003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.212e-05 [grouped_pairwise_exchange_alltoall]: 1.73997e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.60001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.696e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.996e-05, [1] [Cycle 1]: 6.578e-05, [6] [build]: 2.65002e-06 [elim_shapecalc]: 9.14998e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 8.84e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.99002e-06 [opt_after_jit_grad]: 0.00046234 [validate]: 3.277e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.00609243 [execute]: 7.01001e-06 Sums bootstrap : 0.000474s : 2.91% type_inference : 0.005757s : 35.40% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000587s : 3.61% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% 
optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000146s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000433s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000045s : 0.28% 
optimize.opt_a.a_3 : 0.000074s : 0.46% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000460s : 2.83% optimize.opt_b.b_1 : 0.000114s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000427s : 2.62% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000080s : 0.49% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% 
optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000462s : 2.84% validate : 0.000033s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006092s : 37.46% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000168 30 14.60% : 0.000024s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.07% : 0.000005s : 4: substitution.graph_param_transform 67.82% : 0.000114s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.16% : 0.000004s : 4: substitution.replace_old_param 6.44% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005714 2 90.06% : 0.005146s : 1: type_inference.infer 9.94% : 0.000568s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.61% : 0.000027s : 3: replace.inline 30.39% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.98% : 0.000111s : 3: match.inline 8.02% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000004s : 19: predicate.arithmetic_simplify 0.94% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.31% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.90% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.76% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.61% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.41% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.99% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.67% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000357 8 46.27% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.73% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028960 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.47% : 0.003033s : 1: add_attr 10.44% : 0.003024s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.76% : 0.000511s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.62% : 0.000469s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.31% : 0.000959s : 78: opt.transform.opt_a 0.09% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000093s : 28: opt.transform.opt_b 0.29% : 0.000085s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.44% : 0.002156s : 1: opt_a 0.35% : 0.000101s : 1: opt_after_cconv 1.63% : 0.000472s : 1: opt_after_jit_grad 0.66% : 0.000191s : 1: opt_b 14.08% : 0.004076s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000031s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.76% : 0.000221s : 1: renormalize.infer 0.71% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000073s : 1: symbol_engine_optimizer 21.07% : 0.006103s : 1: task_emit 0.40% : 0.000116s : 1: 
tuple_transform 19.93% : 0.005771s : 1: type_inference 0.21% : 0.000060s : 1: validate TotalTime = 0.0387016, [24] [bootstrap]: 0.0005065 [type_inference]: 0.0116665 [event_method]: 5.142e-05 [auto_monad]: 0.0001263 [graph_reusing]: 8.87e-06 [inline]: 2.03002e-06 [add_attr]: 0.00310052, [1] [add_attr_with_inline]: 0.00309215, [1] [Cycle 1]: 7.181e-05, [2] [tag_attr]: 3.526e-05 [meta_addattr_fg_expand]: 1.043e-05 [parallel-infer-symbol]: 2.54001e-06 [pre_auto_parallel]: 5.138e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0139123, [53] [py_interpret_to_execute]: 3.91e-05 [rewriter_before_opt_a]: 0.00014953 [opt_a]: 0.0115218, [3] [Cycle 1]: 0.00739967, [45] [expand_dump_flag]: 3.91999e-06 [switch_simplify]: 7.59e-05 [loop_unroll]: 6.374e-05 [a_1]: 0.00151625 [with_stream_mark]: 2.451e-05 [recompute_prepare]: 2.215e-05 [updatestate_depend_eliminate]: 9.63997e-06 [updatestate_assign_eliminate]: 8.47e-06 [updatestate_loads_eliminate]: 7.92e-06 [parameter_eliminate]: 2.53e-06 [a_2]: 0.00024834 [accelerated_algorithm]: 3.113e-05 [shard]: 1.86e-06 [meta_shard_fg_expand]: 3.68e-06 [shard_inline]: 1.627e-05 [merge_send_recv]: 1.589e-05 [auto_parallel]: 1.142e-05 [parallel]: 1.907e-05 [flash_sp]: 1.195e-05 [merge_comm]: 9.96998e-06 [allreduce_fusion]: 9.20001e-06 [matmul_add_comm_reduction]: 2.694e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.818e-05 [virtual_dataset]: 1.607e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.518e-05 [merge_forward]: 9.49999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 1.835e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.993e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.789e-05 [set_forward_comm_id_for_comm_node_pass]: 1.006e-05 [meta_fg_expand]: 0.00150039 [flash_sp_send_recv_attached]: 3.8e-06 [receive_attached]: 2.64999e-06 [after_resolve]: 6.258e-05 
[a_after_grad]: 8.421e-05 [renormalize]: 0.00256845 [add_forward_monad_depend]: 9.63997e-06 [auto_monad_grad]: 5.30001e-06 [auto_monad_eliminator]: 5.808e-05 [cse]: 0.00017574 [a_3]: 0.00034313 [Cycle 2]: 0.00316612, [45] [expand_dump_flag]: 1.74998e-06 [switch_simplify]: 4.862e-05 [loop_unroll]: 4.526e-05 [a_1]: 0.0015773 [with_stream_mark]: 1.276e-05 [recompute_prepare]: 1.154e-05 [updatestate_depend_eliminate]: 5.57999e-06 [updatestate_assign_eliminate]: 4.53001e-06 [updatestate_loads_eliminate]: 3.81001e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00012962 [accelerated_algorithm]: 1.228e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 2.04999e-06 [shard_inline]: 9.49e-06 [merge_send_recv]: 4.649e-05 [auto_parallel]: 8.67998e-06 [parallel]: 5.27001e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 5.67001e-06 [allreduce_fusion]: 4.93001e-06 [matmul_add_comm_reduction]: 8.23001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.149e-05 [virtual_dataset]: 9.31e-06 [get_grad_eliminate_]: 9.39998e-06 [virtual_output]: 8.84e-06 [merge_forward]: 4.77e-06 [cell_reuse_recompute_pass]: 9.79984e-07 [offload_activation]: 9.81998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.732e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.424e-05 [set_forward_comm_id_for_comm_node_pass]: 5.73997e-06 [meta_fg_expand]: 7.431e-05 [flash_sp_send_recv_attached]: 1.14998e-06 [receive_attached]: 1.25999e-06 [after_resolve]: 1.67e-05 [a_after_grad]: 1.513e-05 [renormalize]: 0.0006319 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.35001e-06 [auto_monad_eliminator]: 1.541e-05 [cse]: 4.734e-05 [a_3]: 6.702e-05 [Cycle 3]: 0.00094131, [45] [expand_dump_flag]: 1.09003e-06 [switch_simplify]: 1.101e-05 [loop_unroll]: 8.99e-06 [a_1]: 0.0002545 [with_stream_mark]: 1.046e-05 [recompute_prepare]: 9.77999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 4.10998e-06 
[parameter_eliminate]: 9.09989e-07 [a_2]: 0.00013056 [accelerated_algorithm]: 1.222e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 7.19001e-06 [auto_parallel]: 7.48e-06 [parallel]: 4.85001e-06 [flash_sp]: 1.18001e-06 [merge_comm]: 5.14e-06 [allreduce_fusion]: 5.22e-06 [matmul_add_comm_reduction]: 7.86001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.058e-05 [virtual_dataset]: 9.04e-06 [get_grad_eliminate_]: 8.75001e-06 [virtual_output]: 8.49998e-06 [merge_forward]: 4.58001e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 1.657e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.656e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.433e-05 [set_forward_comm_id_for_comm_node_pass]: 5.53002e-06 [meta_fg_expand]: 3.38e-06 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.533e-05 [a_after_grad]: 1.527e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.37e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 1.115e-05 [cse]: 2.773e-05 [a_3]: 6.17e-05 [py_interpret_to_execute_after_opt_a]: 1.081e-05 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 4.858e-05 [convert_after_rewriter]: 9.32001e-06 [order_py_execute_after_rewriter]: 7.25998e-06 [mutable_eliminate]: 0.0004676 [opt_b]: 0.00029969, [1] [Cycle 1]: 0.0002933, [7] [b_1]: 0.00019605 [b_2]: 1.138e-05 [updatestate_depend_eliminate]: 7.39002e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 4.2e-06 [renormalize]: 3.50003e-07 [cse]: 3.38e-05 [optimize_parallel_all_gather_comm]: 2.076e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.054e-05 [loop_unroll]: 0.00043553 [opt_after_cconv]: 0.00014276, [1] [Cycle 1]: 0.00013651, [7] [c_1]: 5.065e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 7.69002e-06 [updatestate_assign_eliminate]: 4.54998e-06 
[updatestate_loads_eliminate]: 4.07e-06 [cse]: 3.156e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 2.967e-05 [tuple_transform]: 0.00010507, [1] [Cycle 1]: 0.00010033, [4] [d_1]: 6.866e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 1.037e-05 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.713e-05 [cse_after_recomputation]: 3.498e-05, [1] [Cycle 1]: 3.011e-05, [1] [cse]: 2.42e-05 [environ_conv]: 8.82999e-06 [swap_dp_allreduce_reducescatter]: 7.98001e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.33001e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.52001e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.30999e-06 [add_comm_op_reuse_tag]: 9.40025e-07 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.767e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 5.02999e-06 [overlap_recompute_and_grad_model_parallel]: 5.56998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.71998e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 5.82001e-06 [overlap_grad_flash_sp]: 2.478e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.96e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 0.00010356, [1] [Cycle 1]: 9.905e-05, [6] [build]: 9.94001e-06 [elim_shapecalc]: 1.458e-05 [elim_not_effective]: 1.955e-05 [opt_reshape]: 1.043e-05 [fold_const_symbol]: 1.543e-05 [renormalize]: 1.8999e-07 
[detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.39003e-06 [auto_monad_reorder]: 2.575e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00048264 [validate]: 4.472e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00848591 [execute]: 7.15e-06 Sums bootstrap : 0.000507s : 1.48% type_inference : 0.011666s : 34.06% event_method : 0.000051s : 0.15% auto_monad : 0.000126s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000051s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.11% optimize.rewriter_before_opt_a : 0.000150s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000136s : 0.40% optimize.opt_a.loop_unroll : 0.000118s : 0.34% optimize.opt_a.a_1 : 0.003348s : 9.77% optimize.opt_a.with_stream_mark : 0.000048s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000509s : 1.48% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000070s : 0.20% optimize.opt_a.auto_parallel : 0.000028s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000045s : 0.13% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001578s : 4.61% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000095s : 0.28% optimize.opt_a.a_after_grad : 0.000115s : 0.33% optimize.opt_a.renormalize : 0.003200s : 9.34% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.25% optimize.opt_a.cse : 0.000251s : 0.73% optimize.opt_a.a_3 : 0.000472s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000468s : 1.37% optimize.opt_b.b_1 : 0.000196s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000436s : 1.27% optimize.opt_after_cconv.c_1 : 0.000051s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000069s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000024s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000483s : 1.41% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008486s : 24.77% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000794 222 5.64% : 0.000045s : 12: substitution.arithmetic_simplify 1.77% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 56.86% : 0.000451s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000015s : 3: substitution.less_batch_normalization 1.65% : 0.000013s : 11: substitution.minmaximum_grad 0.66% : 0.000005s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.14% : 0.000025s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.53% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.61% : 0.000068s : 30: substitution.tuple_list_get_item_eliminator 2.23% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011590 2 86.43% : 0.010018s : 1: type_inference.infer 13.57% : 0.001572s : 1: type_inference.specialize ------[replace.] 0.000236 33 57.43% : 0.000135s : 17: replace.inline 42.57% : 0.000100s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000478 33 92.52% : 0.000442s : 17: match.inline 7.48% : 0.000036s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000802 5764 1.01% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 0.99% : 0.000008s : 68: predicate.addn_zero_filter 0.99% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000016s : 100: predicate.arithmetic_simplify 1.08% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.12% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.04% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_depend_swap 1.66% : 0.000013s : 108: predicate.environ_get_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.62% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000018s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.51% : 0.000044s : 249: predicate.inline 1.23% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.51% : 0.000020s : 168: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000018s : 136: predicate.loop_unroll_before_grad 4.82% : 0.000039s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.05% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.06% : 0.000009s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000016s : 101: predicate.partial_defer_inline 1.69% : 0.000014s : 92: predicate.partial_eliminate 1.00% : 0.000008s : 68: predicate.print_const_string_wrapper 0.50% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000011s : 68: predicate.reduce_eliminate 2.53% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.81% : 0.000014s : 152: predicate.replace_applicator 0.57% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.02% : 0.000008s : 68: predicate.reshape_eliminate 1.09% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.20% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.78% : 0.000014s : 101: predicate.switch_defer_inline 2.79% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.78% : 
0.000038s : 277: predicate.switch_simplify 1.00% : 0.000008s : 68: predicate.tile_eliminate 1.03% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000013s : 84: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.49% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.12% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001665 34 56.77% : 0.000945s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.23% : 0.000720s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064318 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.83% : 0.003105s : 1: add_attr 4.81% : 0.003096s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000134s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000542s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000038s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000058s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000445s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.74% : 0.000477s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.87% : 0.005062s : 117: opt.transform.opt_a 0.08% : 0.000049s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000181s : 28: opt.transform.opt_b 0.12% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000056s : 4: opt.transform.symbol_engine_opt 17.92% : 0.011525s : 1: opt_a 0.23% : 0.000146s : 1: opt_after_cconv 0.77% : 0.000493s : 1: opt_after_jit_grad 0.47% : 0.000303s : 1: opt_b 21.64% : 0.013916s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000056s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.66% : 0.001710s : 2: renormalize.infer 2.30% : 0.001477s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000053s : 1: rewriter_after_opt_a 0.24% : 0.000154s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.10% : 0.000064s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000106s : 1: symbol_engine_optimizer 13.21% : 0.008496s : 1: task_emit 0.17% : 0.000108s : 1: 
tuple_transform 18.16% : 0.011681s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0189941, [24] [bootstrap]: 0.00051028 [type_inference]: 0.00437074 [event_method]: 1.14e-05 [auto_monad]: 5.198e-05 [graph_reusing]: 5.51e-06 [inline]: 1.85001e-06 [add_attr]: 0.00304274, [1] [add_attr_with_inline]: 0.00303455, [1] [Cycle 1]: 4.684e-05, [2] [tag_attr]: 1.255e-05 [meta_addattr_fg_expand]: 3.09001e-06 [parallel-infer-symbol]: 2.70002e-06 [pre_auto_parallel]: 2.248e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.83002e-06 [optimize]: 0.00383005, [53] [py_interpret_to_execute]: 1.619e-05 [rewriter_before_opt_a]: 3.776e-05 [opt_a]: 0.00198654, [2] [Cycle 1]: 0.00136692, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 9.427e-05 [loop_unroll]: 1.484e-05 [a_1]: 0.00030418 [with_stream_mark]: 1.35e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 1.63002e-06 [a_2]: 7.771e-05 [accelerated_algorithm]: 7.13e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 6.03998e-06 [merge_send_recv]: 7.95998e-06 [auto_parallel]: 6.67002e-06 [parallel]: 1.862e-05 [flash_sp]: 7.39002e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.43999e-06 [matmul_add_comm_reduction]: 9.08002e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 6.56e-06 [get_grad_eliminate_]: 5.70001e-06 [virtual_output]: 5.94e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.007e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.163e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 9.62001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.74002e-06 [meta_fg_expand]: 2.74999e-06 [flash_sp_send_recv_attached]: 2.86999e-06 [receive_attached]: 
2.43e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 9.22999e-06 [renormalize]: 0.00035648 [add_forward_monad_depend]: 4.18999e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.625e-05 [a_3]: 4.145e-05 [Cycle 2]: 0.00061018, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.42998e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012992 [with_stream_mark]: 1.027e-05 [recompute_prepare]: 5.76998e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.961e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.71998e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.37e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.32999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.24e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.26e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.22e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 1.87999e-06 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 8.62e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.42001e-06 [cse]: 1.262e-05 [a_3]: 3.251e-05 [py_interpret_to_execute_after_opt_a]: 8.00999e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.19e-05 [convert_after_rewriter]: 7.33e-06 [order_py_execute_after_rewriter]: 5.59e-06 [mutable_eliminate]: 0.00045647 [opt_b]: 0.00018504, [1] [Cycle 1]: 0.00017913, 
[7] [b_1]: 0.00011099 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.59985e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.555e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.242e-05 [loop_unroll]: 0.00042492 [opt_after_cconv]: 9.769e-05, [1] [Cycle 1]: 9.182e-05, [7] [c_1]: 2.898e-05 [parameter_eliminate]: 2.38002e-06 [updatestate_depend_eliminate]: 5.36002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.641e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.249e-05 [tuple_transform]: 7.143e-05, [1] [Cycle 1]: 6.708e-05, [4] [d_1]: 4.065e-05 [none_parameter_eliminate]: 1.87999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.51998e-06 [add_recomputation]: 4.431e-05 [cse_after_recomputation]: 1.96e-05, [1] [Cycle 1]: 1.529e-05, [1] [cse]: 1.03e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.17e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.27e-06 [full_micro_interleaved_order_control]: 2.51e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 1.34998e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.34e-06 [overlap_opt_shard_grad_in_pipeline]: 2.06e-06 [control_data_broadcast_order]: 1.135e-05 [grouped_pairwise_exchange_alltoall]: 1.81998e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.52998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 4.06001e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.916e-05, [1] [Cycle 1]: 6.496e-05, [6] [build]: 2.16e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.197e-05 [opt_reshape]: 6.26e-06 [fold_const_symbol]: 8.95001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.69998e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00050077 [validate]: 3.169e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.00637751 [execute]: 6.44999e-06 Sums bootstrap : 0.000510s : 3.41% type_inference : 0.004371s : 29.19% event_method : 0.000011s : 0.08% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.25% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000102s : 0.68% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000434s : 2.90% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000357s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000039s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000456s : 3.05% optimize.opt_b.b_1 : 0.000111s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000425s : 2.84% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000501s : 3.34% validate : 0.000032s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006378s : 42.59% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000125 26 17.78% : 0.000022s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.54% : 0.000006s : 4: substitution.graph_param_transform 66.52% : 0.000083s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.99% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004328 2 91.53% : 0.003962s : 1: type_inference.infer 8.47% : 0.000367s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000142 984 1.04% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.64% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.22% : 0.000009s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.40% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.74% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.64% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.44% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000253 6 40.47% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.53% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027252 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.18% : 0.003047s : 1: add_attr 11.15% : 0.003038s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.00% : 0.000545s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000434s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.17% : 0.000864s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000093s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.30% : 0.001989s : 1: opt_a 0.37% : 0.000101s : 1: opt_after_cconv 1.87% : 0.000511s : 1: opt_after_jit_grad 0.69% : 0.000189s : 1: opt_b 14.07% : 0.003834s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000192s : 1: renormalize.infer 0.58% : 0.000158s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.15% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000072s : 1: symbol_engine_optimizer 23.44% : 0.006388s : 1: task_emit 0.27% : 0.000074s : 1: 
tuple_transform 16.09% : 0.004384s : 1: type_inference 0.21% : 0.000058s : 1: validate TotalTime = 0.0370839, [24] [bootstrap]: 0.00049381 [type_inference]: 0.0104655 [event_method]: 4.558e-05 [auto_monad]: 0.00012268 [graph_reusing]: 8.23999e-06 [inline]: 1.72999e-06 [add_attr]: 0.00304374, [1] [add_attr_with_inline]: 0.00303526, [1] [Cycle 1]: 6.876e-05, [2] [tag_attr]: 3.269e-05 [meta_addattr_fg_expand]: 9.20999e-06 [parallel-infer-symbol]: 3.3e-06 [pre_auto_parallel]: 4.744e-05 [insert-virtual-dataset]: 2.72001e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.0136492, [53] [py_interpret_to_execute]: 3.64e-05 [rewriter_before_opt_a]: 0.00012848 [opt_a]: 0.0113566, [3] [Cycle 1]: 0.00726934, [45] [expand_dump_flag]: 4.13001e-06 [switch_simplify]: 6.91e-05 [loop_unroll]: 5.808e-05 [a_1]: 0.00140511 [with_stream_mark]: 2.367e-05 [recompute_prepare]: 2.144e-05 [updatestate_depend_eliminate]: 9.66e-06 [updatestate_assign_eliminate]: 7.9e-06 [updatestate_loads_eliminate]: 8.27998e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.00025395 [accelerated_algorithm]: 3.229e-05 [shard]: 1.81003e-06 [meta_shard_fg_expand]: 3.98001e-06 [shard_inline]: 1.679e-05 [merge_send_recv]: 1.642e-05 [auto_parallel]: 1.134e-05 [parallel]: 1.732e-05 [flash_sp]: 1.192e-05 [merge_comm]: 9.74e-06 [allreduce_fusion]: 8.82999e-06 [matmul_add_comm_reduction]: 2.676e-05 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 1.846e-05 [virtual_dataset]: 1.65e-05 [get_grad_eliminate_]: 1.583e-05 [virtual_output]: 1.567e-05 [merge_forward]: 1.027e-05 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.784e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.039e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 2.889e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91e-06 [meta_fg_expand]: 0.00148198 [flash_sp_send_recv_attached]: 3.75e-06 [receive_attached]: 2.53e-06 [after_resolve]: 
6.191e-05 [a_after_grad]: 8.478e-05 [renormalize]: 0.00252433 [add_forward_monad_depend]: 9.42999e-06 [auto_monad_grad]: 5.34e-06 [auto_monad_eliminator]: 5.807e-05 [cse]: 0.00017251 [a_3]: 0.0003956 [Cycle 2]: 0.00310248, [45] [expand_dump_flag]: 1.73002e-06 [switch_simplify]: 5.004e-05 [loop_unroll]: 4.607e-05 [a_1]: 0.00160795 [with_stream_mark]: 1.242e-05 [recompute_prepare]: 1.101e-05 [updatestate_depend_eliminate]: 5.60001e-06 [updatestate_assign_eliminate]: 4.86002e-06 [updatestate_loads_eliminate]: 3.96001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00013207 [accelerated_algorithm]: 1.269e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 9.66e-06 [merge_send_recv]: 7.21001e-06 [auto_parallel]: 7.71001e-06 [parallel]: 4.82e-06 [flash_sp]: 3.49001e-06 [merge_comm]: 5.37999e-06 [allreduce_fusion]: 4.63001e-06 [matmul_add_comm_reduction]: 7.80998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.087e-05 [virtual_dataset]: 9.23002e-06 [get_grad_eliminate_]: 9.20999e-06 [virtual_output]: 8.84003e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 8.79983e-07 [offload_activation]: 9.47999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.653e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.459e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24998e-06 [meta_fg_expand]: 3.702e-05 [flash_sp_send_recv_attached]: 1.12e-06 [receive_attached]: 1.10001e-06 [after_resolve]: 1.591e-05 [a_after_grad]: 1.498e-05 [renormalize]: 0.00061589 [add_forward_monad_depend]: 4.41002e-06 [auto_monad_grad]: 1.19998e-06 [auto_monad_eliminator]: 1.518e-05 [cse]: 5.03e-05 [a_3]: 6.85e-05 [Cycle 3]: 0.00097048, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 1.084e-05 [loop_unroll]: 9.34e-06 [a_1]: 0.00026242 [with_stream_mark]: 1.054e-05 [recompute_prepare]: 9.88998e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 4.06001e-06 
[parameter_eliminate]: 8.99978e-07 [a_2]: 0.00012971 [accelerated_algorithm]: 1.25e-05 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.55001e-06 [merge_send_recv]: 7.73001e-06 [auto_parallel]: 7.87e-06 [parallel]: 4.57998e-06 [flash_sp]: 1.19003e-06 [merge_comm]: 5.13002e-06 [allreduce_fusion]: 5.07999e-06 [matmul_add_comm_reduction]: 8e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.058e-05 [virtual_dataset]: 9.09e-06 [get_grad_eliminate_]: 8.87999e-06 [virtual_output]: 8.55001e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 8.85999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.647e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.428e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34e-06 [meta_fg_expand]: 3.34001e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.21002e-06 [after_resolve]: 1.423e-05 [a_after_grad]: 1.47e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.40999e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 3.586e-05 [cse]: 2.945e-05 [a_3]: 6.39e-05 [py_interpret_to_execute_after_opt_a]: 1.084e-05 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 4.739e-05 [convert_after_rewriter]: 9.51e-06 [order_py_execute_after_rewriter]: 6.76999e-06 [mutable_eliminate]: 0.0004617 [opt_b]: 0.00030119, [1] [Cycle 1]: 0.00029481, [7] [b_1]: 0.00019817 [b_2]: 1.161e-05 [updatestate_depend_eliminate]: 7.78001e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 4.38999e-06 [renormalize]: 3.50003e-07 [cse]: 3.259e-05 [optimize_parallel_all_gather_comm]: 2.161e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.089e-05 [loop_unroll]: 0.00042637 [opt_after_cconv]: 0.00014256, [1] [Cycle 1]: 0.00013638, [7] [c_1]: 5.153e-05 [parameter_eliminate]: 2.15002e-06 [updatestate_depend_eliminate]: 7.93001e-06 [updatestate_assign_eliminate]: 4.62e-06 
[updatestate_loads_eliminate]: 3.95998e-06 [cse]: 3.129e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 2.977e-05 [tuple_transform]: 0.00010641, [1] [Cycle 1]: 0.00010156, [4] [d_1]: 6.999e-05 [none_parameter_eliminate]: 1.85001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.033e-05 [partial_unused_args_eliminate]: 2.51998e-06 [add_recomputation]: 5.871e-05 [cse_after_recomputation]: 3.298e-05, [1] [Cycle 1]: 2.815e-05, [1] [cse]: 2.283e-05 [environ_conv]: 8.82e-06 [swap_dp_allreduce_reducescatter]: 8.45999e-06 [bias_add_comm_swap]: 2.22999e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.03997e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.51e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 1.17e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.825e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 5.15999e-06 [overlap_recompute_and_grad_model_parallel]: 5.76e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 5.27999e-06 [overlap_grad_flash_sp]: 2.443e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.39001e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 0.00010199, [1] [Cycle 1]: 9.759e-05, [6] [build]: 9.54e-06 [elim_shapecalc]: 1.428e-05 [elim_not_effective]: 1.89e-05 [opt_reshape]: 1.062e-05 [fold_const_symbol]: 1.517e-05 [renormalize]: 2.29978e-07 
[detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 2.608e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.68999e-06 [opt_after_jit_grad]: 0.000473 [validate]: 4.469e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00842702 [execute]: 6.86999e-06 Sums bootstrap : 0.000494s : 1.51% type_inference : 0.010465s : 31.95% event_method : 0.000046s : 0.14% auto_monad : 0.000123s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000047s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000128s : 0.39% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.40% optimize.opt_a.loop_unroll : 0.000113s : 0.35% optimize.opt_a.a_1 : 0.003275s : 10.00% optimize.opt_a.with_stream_mark : 0.000047s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000516s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.18% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.11% optimize.opt_a.merge_send_recv : 0.000031s : 0.10% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000058s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001522s : 4.65% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000092s : 0.28% optimize.opt_a.a_after_grad : 0.000114s : 0.35% optimize.opt_a.renormalize : 0.003140s : 9.59% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000109s : 0.33% optimize.opt_a.cse : 0.000252s : 0.77% optimize.opt_a.a_3 : 0.000528s : 1.61% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000462s : 1.41% optimize.opt_b.b_1 : 0.000198s : 0.60% optimize.opt_b.b_2 : 0.000012s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000426s : 1.30% optimize.opt_after_cconv.c_1 : 0.000052s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000070s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000003s : 0.01% optimize.add_recomputation : 0.000059s : 0.18% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000473s : 1.44% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008427s : 25.73% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000765 218
    5.59% : 0.000043s : 11: substitution.arithmetic_simplify
    1.96% : 0.000015s : 2: substitution.cast_eliminate
    0.37% : 0.000003s : 5: substitution.elim_not_effective
    0.52% : 0.000004s : 5: substitution.float_depend_g_call
    0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch
    0.27% : 0.000002s : 5: substitution.fold_const_symbol
    0.95% : 0.000007s : 8: substitution.graph_param_transform
    0.35% : 0.000003s : 2: substitution.incorporate_call
    0.27% : 0.000002s : 2: substitution.incorporate_call_switch
    56.15% : 0.000430s : 16: substitution.inline
    2.14% : 0.000016s : 2: substitution.inline_without_move
    1.40% : 0.000011s : 20: substitution.j_node_and_user_rematch
    1.96% : 0.000015s : 3: substitution.less_batch_normalization
    1.75% : 0.000013s : 11: substitution.minmaximum_grad
    0.69% : 0.000005s : 5: substitution.partial_eliminate
    1.81% : 0.000014s : 20: substitution.remove_not_recompute_node
    3.22% : 0.000025s : 10: substitution.replace_applicator
    1.35% : 0.000010s : 15: substitution.replace_old_param
    0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute
    3.72% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive
    1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator
    2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder
    8.19% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator
    2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.010395 2
    86.64% : 0.009006s : 1: type_inference.infer
    13.36% : 0.001389s : 1: type_inference.specialize
------[replace.] 0.000216 30
    58.31% : 0.000126s : 16: replace.inline
    41.69% : 0.000090s : 14: replace.tuple_list_get_item_eliminator
------[match.] 0.000451 30
    93.17% : 0.000420s : 16: match.inline
    6.83% : 0.000031s : 14: match.tuple_list_get_item_eliminator
------[predicate.]
0.000759 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.15% : 0.000016s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.55% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_depend_swap 1.72% : 0.000013s : 107: predicate.environ_get_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.63% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.52% : 0.000004s : 32: predicate.incorporate_call_switch 5.70% : 0.000043s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.03% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.12% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.83% : 0.000037s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.55% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.52% : 0.000012s : 83: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001563 32 56.91% : 0.000890s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.09% : 0.000674s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062303 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.89% : 0.003048s : 1: add_attr 4.88% : 0.003039s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000130s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.85% : 0.000527s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.09% : 0.005042s : 117: opt.transform.opt_a 0.08% : 0.000050s : 1: opt.transform.opt_after_cconv 0.06% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000183s : 28: opt.transform.opt_b 0.13% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000056s : 4: opt.transform.symbol_engine_opt 18.23% : 0.011359s : 1: opt_a 0.23% : 0.000146s : 1: opt_after_cconv 0.77% : 0.000482s : 1: opt_after_jit_grad 0.49% : 0.000305s : 1: opt_b 21.91% : 0.013653s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000006s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.65% : 0.001650s : 2: renormalize.infer 2.37% : 0.001477s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.21% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000105s : 1: symbol_engine_optimizer 13.54% : 0.008437s : 1: task_emit 0.18% : 0.000110s : 1: 
tuple_transform 16.82% : 0.010480s : 1: type_inference 0.12% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x2-kbk],max_mem:42.0M TotalTime = 0.124666, [24] [bootstrap]: 0.00059606 [type_inference]: 0.00675997 [event_method]: 1.455e-05 [auto_monad]: 5.713e-05 [graph_reusing]: 5.57001e-06 [inline]: 2.09e-06 [add_attr]: 0.00352654, [1] [add_attr_with_inline]: 0.00351553, [1] [Cycle 1]: 4.6e-05, [2] [tag_attr]: 1.585e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.899e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 4.05998e-06 [optimize]: 0.00408362, [53] [py_interpret_to_execute]: 2.11e-05 [rewriter_before_opt_a]: 6.085e-05 [opt_a]: 0.00217971, [2] [Cycle 1]: 0.00156866, [45] [expand_dump_flag]: 3.50998e-06 [switch_simplify]: 3.252e-05 [loop_unroll]: 2.237e-05 [a_1]: 0.00046714 [with_stream_mark]: 1.408e-05 [recompute_prepare]: 8.05e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 2.01998e-06 [a_2]: 7.79e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 6.06998e-06 [merge_send_recv]: 8.38001e-06 [auto_parallel]: 6.53998e-06 [parallel]: 2.248e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.72998e-06 [allreduce_fusion]: 3.63999e-06 [matmul_add_comm_reduction]: 8.67998e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.55998e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 5.80002e-06 [virtual_output]: 5.91998e-06 [merge_forward]: 3.72998e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.47999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.154e-05 
[merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.49e-06 [set_forward_comm_id_for_comm_node_pass]: 3.82998e-06 [meta_fg_expand]: 2.60002e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 2.41998e-06 [after_resolve]: 1.072e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00044015 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 1.81998e-06 [auto_monad_eliminator]: 1.345e-05 [cse]: 2.583e-05 [a_3]: 4.169e-05 [Cycle 2]: 0.00060147, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.28e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00012813 [with_stream_mark]: 9.73998e-06 [recompute_prepare]: 6.04999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.18998e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.842e-05 [accelerated_algorithm]: 5.78002e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.66e-06 [parallel]: 4.25e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 3.05998e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.18002e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.58998e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.23002e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64999e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.89e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.22001e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.274e-05 [a_3]: 3.249e-05 [py_interpret_to_execute_after_opt_a]: 7.82e-06 
[slice_cell_reuse_recomputed_activation]: 2.51e-06 [rewriter_after_opt_a]: 3.044e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00046876 [opt_b]: 0.00019561, [1] [Cycle 1]: 0.00018948, [7] [b_1]: 0.0001181 [b_2]: 7.77e-06 [updatestate_depend_eliminate]: 5.31002e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 3.60014e-07 [cse]: 1.685e-05 [optimize_parallel_all_gather_comm]: 1.596e-05 [overlap_param_gather]: 1.64e-06 [cconv]: 2.256e-05 [loop_unroll]: 0.00042548 [opt_after_cconv]: 9.818e-05, [1] [Cycle 1]: 9.225e-05, [7] [c_1]: 2.845e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.648e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 1.267e-05 [tuple_transform]: 7.208e-05, [1] [Cycle 1]: 6.75e-05, [4] [d_1]: 4.061e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.64001e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.927e-05 [cse_after_recomputation]: 2.077e-05, [1] [Cycle 1]: 1.616e-05, [1] [cse]: 1.072e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.60001e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 8.50006e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 1.17e-06 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 
[control_data_broadcast_order]: 1.207e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.81001e-06 [overlap_recompute_and_grad_model_parallel]: 4.59998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.26998e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.745e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.08e-05, [1] [Cycle 1]: 6.647e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 9.32001e-06 [elim_not_effective]: 1.174e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 9.59999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.535e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.62002e-06 [opt_after_jit_grad]: 0.00045769 [validate]: 3.236e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.108831 [execute]: 8.67998e-06 Sums bootstrap : 0.000596s : 0.50% type_inference : 0.006760s : 5.63% event_method : 0.000015s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000004s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000061s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000028s : 0.02% optimize.opt_a.a_1 : 0.000595s : 0.50% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000440s : 0.37% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000039s : 0.03% optimize.opt_a.a_3 : 0.000074s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000469s : 0.39% optimize.opt_b.b_1 : 0.000118s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000425s : 0.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000458s : 0.38% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.108831s : 90.59% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000170 30 14.57% : 0.000025s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000002s : 2: substitution.fold_const_symbol 3.01% : 0.000005s : 4: substitution.graph_param_transform 67.44% : 0.000115s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.38% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006707 2 91.06% : 0.006108s : 1: type_inference.infer 8.94% : 0.000600s : 1: type_inference.specialize ------[replace.] 0.000042 5 68.93% : 0.000029s : 3: replace.inline 31.07% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.05% : 0.000112s : 3: match.inline 7.95% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.83% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.54% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.21% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.23% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.73% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.29% : 0.000002s : 11: predicate.reduce_eliminate 2.28% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.91% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000403 8 47.63% : 0.000192s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.37% : 0.000211s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.133845 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.64% : 0.003531s : 1: add_attr 2.63% : 0.003519s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000063s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.48% : 0.000640s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000003s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000478s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.72% : 0.000970s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000099s : 28: opt.transform.opt_b 0.03% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.63% : 0.002183s : 1: opt_a 0.08% : 0.000102s : 1: opt_after_cconv 0.35% : 0.000467s : 1: opt_after_jit_grad 0.15% : 0.000199s : 1: opt_b 3.05% : 0.004088s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000007s : 1: pipeline_split 0.03% : 0.000034s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000221s : 1: renormalize.infer 0.16% : 0.000212s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000065s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000074s : 1: symbol_engine_optimizer 81.33% : 0.108853s : 1: task_emit 0.06% : 0.000075s : 1: 
tuple_transform 5.06% : 0.006773s : 1: type_inference 0.04% : 0.000059s : 1: validate TotalTime = 0.111968, [24] [bootstrap]: 0.00046707 [type_inference]: 0.00448466 [event_method]: 1.069e-05 [auto_monad]: 5.223e-05 [graph_reusing]: 5.34e-06 [inline]: 1.77001e-06 [add_attr]: 0.00299922, [1] [add_attr_with_inline]: 0.00299126, [1] [Cycle 1]: 4.499e-05, [2] [tag_attr]: 1.171e-05 [meta_addattr_fg_expand]: 3.7e-06 [parallel-infer-symbol]: 3.06999e-06 [pre_auto_parallel]: 2.263e-05 [insert-virtual-dataset]: 2.74999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.20002e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.00376515, [53] [py_interpret_to_execute]: 1.549e-05 [rewriter_before_opt_a]: 4.033e-05 [opt_a]: 0.0019235, [2] [Cycle 1]: 0.0013047, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 2.436e-05 [loop_unroll]: 1.407e-05 [a_1]: 0.00029838 [with_stream_mark]: 1.402e-05 [recompute_prepare]: 7.63999e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.55999e-06 [a_2]: 7.82e-05 [accelerated_algorithm]: 6.49001e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.96998e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 6.27001e-06 [parallel]: 1.773e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.92998e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.52999e-06 [allreduce_slice_to_reducescatter]: 9.70002e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.48999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.00001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 2.53998e-06 [receive_attached]: 
2.49999e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 9.17001e-06 [renormalize]: 0.00036351 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 2.12001e-06 [auto_monad_eliminator]: 2.629e-05 [cse]: 2.549e-05 [a_3]: 4.163e-05 [Cycle 2]: 0.00060967, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7.13e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00012812 [with_stream_mark]: 1.141e-05 [recompute_prepare]: 5.99e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.865e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.11997e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 4.57e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.16999e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.83002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.39999e-06 [virtual_dataset]: 5.67001e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.19998e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.54002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 1.77999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 9.24e-06 [a_after_grad]: 8.27e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.40997e-06 [cse]: 1.331e-05 [a_3]: 3.34e-05 [py_interpret_to_execute_after_opt_a]: 8.07e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.103e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00045368 [opt_b]: 0.0001872, [1] [Cycle 1]: 0.00018134, 
[7] [b_1]: 0.00011131 [b_2]: 7.70998e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.49999e-06 [renormalize]: 5.3001e-07 [cse]: 1.642e-05 [optimize_parallel_all_gather_comm]: 1.605e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.178e-05 [loop_unroll]: 0.0004237 [opt_after_cconv]: 9.747e-05, [1] [Cycle 1]: 9.188e-05, [7] [c_1]: 2.89e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.67e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 7.168e-05, [1] [Cycle 1]: 6.714e-05, [4] [d_1]: 4.033e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.523e-05 [cse_after_recomputation]: 2.066e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.102e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 5.75001e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.42002e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.39e-06 [symbol_engine_optimizer]: 6.995e-05, [1] [Cycle 1]: 6.594e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 8.69e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.71e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.09e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 1.597e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00045354 [validate]: 3.227e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0994206 [execute]: 9.41e-06 Sums bootstrap : 0.000467s : 0.43% type_inference : 0.004485s : 4.15% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000426s : 0.39% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000364s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000033s : 0.03% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000454s : 0.42% optimize.opt_b.b_1 : 0.000111s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000424s : 0.39% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000454s : 0.42% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099421s : 92.07% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.17% : 0.000022s : 4: substitution.arithmetic_simplify 1.35% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.40% : 0.000005s : 4: substitution.graph_param_transform 65.86% : 0.000080s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.53% : 0.000004s : 4: substitution.remove_not_recompute_node 3.33% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004442 2 91.28% : 0.004055s : 1: type_inference.infer 8.72% : 0.000388s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000141 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.54% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.71% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.16% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.92% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.55% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000278 6 41.56% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.44% : 0.000162s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120040 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.50% : 0.003003s : 1: add_attr 2.49% : 0.002995s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000501s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.65% : 0.000785s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000093s : 28: opt.transform.opt_b 0.04% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.60% : 0.001926s : 1: opt_a 0.08% : 0.000101s : 1: opt_after_cconv 0.39% : 0.000463s : 1: opt_after_jit_grad 0.16% : 0.000190s : 1: opt_b 3.14% : 0.003769s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000197s : 1: renormalize.infer 0.13% : 0.000160s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000045s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000073s : 1: symbol_engine_optimizer 82.84% : 0.099443s : 1: task_emit 0.06% : 0.000075s : 1: 
tuple_transform 3.75% : 0.004498s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.111396, [24] [bootstrap]: 0.00046567 [type_inference]: 0.00572452 [event_method]: 1.452e-05 [auto_monad]: 5.574e-05 [graph_reusing]: 5.50001e-06 [inline]: 2.21e-06 [add_attr]: 0.0029639, [1] [add_attr_with_inline]: 0.00295547, [1] [Cycle 1]: 4.552e-05, [2] [tag_attr]: 1.524e-05 [meta_addattr_fg_expand]: 4.27003e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.506e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.0040738, [53] [py_interpret_to_execute]: 2.076e-05 [rewriter_before_opt_a]: 5.8e-05 [opt_a]: 0.00220254, [2] [Cycle 1]: 0.00158396, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 3.208e-05 [loop_unroll]: 2.134e-05 [a_1]: 0.00050098 [with_stream_mark]: 1.354e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.676e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 7.75e-06 [auto_parallel]: 5.92999e-06 [parallel]: 1.709e-05 [flash_sp]: 7.38e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.45999e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.29001e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.89999e-06 [merge_forward]: 4.02998e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.68e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.42001e-06 [after_resolve]: 
1.045e-05 [a_after_grad]: 8.54998e-06 [renormalize]: 0.00043698 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.75e-05 [a_3]: 4.182e-05 [Cycle 2]: 0.00060905, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012681 [with_stream_mark]: 9.64e-06 [recompute_prepare]: 5.80002e-06 [updatestate_depend_eliminate]: 2.85998e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 6.885e-05 [accelerated_algorithm]: 5.87001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.35e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 4.80999e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 5.96998e-06 [virtual_dataset]: 7.66999e-06 [get_grad_eliminate_]: 4.89998e-06 [virtual_output]: 5.18002e-06 [merge_forward]: 2.60002e-06 [cell_reuse_recompute_pass]: 1.41002e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59999e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.03999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 7.95e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 6.48e-06 [cse]: 1.44e-05 [a_3]: 3.253e-05 [py_interpret_to_execute_after_opt_a]: 8.08001e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.32e-05 [convert_after_rewriter]: 7.51999e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00045901 [opt_b]: 0.00018493, [1] [Cycle 1]: 0.00017905, [7] [b_1]: 0.00010988 
[b_2]: 7.44002e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.35002e-06 [renormalize]: 4.50003e-07 [cse]: 1.653e-05 [optimize_parallel_all_gather_comm]: 1.648e-05 [overlap_param_gather]: 1.96998e-06 [cconv]: 2.272e-05 [loop_unroll]: 0.00042123 [opt_after_cconv]: 9.786e-05, [1] [Cycle 1]: 9.205e-05, [7] [c_1]: 2.926e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.673e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.328e-05 [tuple_transform]: 7.009e-05, [1] [Cycle 1]: 6.547e-05, [4] [d_1]: 3.959e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 4.472e-05 [cse_after_recomputation]: 2.134e-05, [1] [Cycle 1]: 1.676e-05, [1] [cse]: 1.131e-05 [environ_conv]: 5.19998e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 2.30002e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.41998e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.68e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.50007e-07 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 3.01999e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.12e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.2e-05 [grouped_pairwise_exchange_alltoall]: 1.94e-06 [offloading_packed_experts]: 3.47002e-06 [overlap_recompute_and_grad_model_parallel]: 4.24002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.87001e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.866e-05, [1] [Cycle 1]: 6.453e-05, [6] [build]: 2.84999e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.573e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.00048167 [validate]: 3.185e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.097295 [execute]: 8.73001e-06 Sums bootstrap : 0.000466s : 0.43% type_inference : 0.005725s : 5.33% event_method : 0.000015s : 0.01% auto_monad : 0.000056s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000628s : 0.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000014s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000437s : 0.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000459s : 0.43% optimize.opt_b.b_1 : 0.000110s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000421s : 0.39% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000482s : 0.45% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.097295s : 90.56% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000168 30 15.10% : 0.000025s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.69% : 0.000112s : 3: substitution.inline 1.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.49% : 0.000004s : 4: substitution.replace_old_param 6.49% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005684 2 89.69% : 0.005098s : 1: type_inference.infer 10.31% : 0.000586s : 1: type_inference.specialize ------[replace.] 0.000042 5 68.41% : 0.000028s : 3: replace.inline 31.59% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.78% : 0.000110s : 3: match.inline 8.22% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.08% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 0.94% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.31% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.59% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.33% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.90% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 45.23% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.77% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120022 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.47% : 0.002968s : 1: add_attr 2.47% : 0.002959s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000500s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.83% : 0.000999s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.84% : 0.002206s : 1: opt_a 0.08% : 0.000101s : 1: opt_after_cconv 0.41% : 0.000492s : 1: opt_after_jit_grad 0.16% : 0.000188s : 1: opt_b 3.40% : 0.004078s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.18% : 0.000215s : 1: renormalize.infer 0.18% : 0.000215s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 81.08% : 0.097318s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.78% : 0.005739s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.153163, [24] [bootstrap]: 0.00044957 [type_inference]: 0.011923 [event_method]: 5.593e-05 [auto_monad]: 0.00012565 [graph_reusing]: 7.95998e-06 [inline]: 2.37001e-06 [add_attr]: 0.00333666, [1] [add_attr_with_inline]: 0.00332588, [1] [Cycle 1]: 8.338e-05, [2] [tag_attr]: 3.726e-05 [meta_addattr_fg_expand]: 9.29e-06 [parallel-infer-symbol]: 3.44001e-06 [pre_auto_parallel]: 5.46e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 2.41e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.0149386, [53] [py_interpret_to_execute]: 4.199e-05 [rewriter_before_opt_a]: 0.00015431 [opt_a]: 0.0124151, [3] [Cycle 1]: 0.00810089, [45] [expand_dump_flag]: 4.67998e-06 [switch_simplify]: 7.74e-05 [loop_unroll]: 6.345e-05 [a_1]: 0.00155781 [with_stream_mark]: 2.846e-05 [recompute_prepare]: 2.372e-05 [updatestate_depend_eliminate]: 9.59e-06 [updatestate_assign_eliminate]: 8.21002e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 2.67001e-06 [a_2]: 0.00024593 [accelerated_algorithm]: 3.266e-05 [shard]: 2.36e-06 [meta_shard_fg_expand]: 3.47002e-06 [shard_inline]: 1.647e-05 [merge_send_recv]: 1.715e-05 [auto_parallel]: 1.226e-05 [parallel]: 1.98e-05 [flash_sp]: 1.24e-05 [merge_comm]: 1.037e-05 [allreduce_fusion]: 8.87e-06 [matmul_add_comm_reduction]: 3.131e-05 [allreduce_slice_to_reducescatter]: 7.10017e-07 [virtual_shard_identity]: 1.874e-05 [virtual_dataset]: 1.603e-05 [get_grad_eliminate_]: 1.537e-05 [virtual_output]: 1.558e-05 [merge_forward]: 9.86e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 1.878e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.929e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 2.811e-05 [set_forward_comm_id_for_comm_node_pass]: 9.69e-06 [meta_fg_expand]: 0.00164965 [flash_sp_send_recv_attached]: 4.52e-06 [receive_attached]: 2.66e-06 [after_resolve]: 6.475e-05 
[a_after_grad]: 8.606e-05 [renormalize]: 0.00300407 [add_forward_monad_depend]: 1.183e-05 [auto_monad_grad]: 5.66e-06 [auto_monad_eliminator]: 5.949e-05 [cse]: 0.00018259 [a_3]: 0.00035571 [Cycle 2]: 0.00337533, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 4.776e-05 [loop_unroll]: 4.475e-05 [a_1]: 0.00160962 [with_stream_mark]: 1.848e-05 [recompute_prepare]: 1.12e-05 [updatestate_depend_eliminate]: 5.91e-06 [updatestate_assign_eliminate]: 4.88001e-06 [updatestate_loads_eliminate]: 4.47e-06 [parameter_eliminate]: 2.15002e-06 [a_2]: 0.00012868 [accelerated_algorithm]: 1.295e-05 [shard]: 2.14e-06 [meta_shard_fg_expand]: 2.88e-06 [shard_inline]: 9.31e-06 [merge_send_recv]: 9.72001e-06 [auto_parallel]: 1.139e-05 [parallel]: 9.74999e-06 [flash_sp]: 4.57998e-06 [merge_comm]: 6.38e-06 [allreduce_fusion]: 5.56998e-06 [matmul_add_comm_reduction]: 1.194e-05 [allreduce_slice_to_reducescatter]: 4.90021e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 9.12999e-06 [get_grad_eliminate_]: 8.97e-06 [virtual_output]: 8.97999e-06 [merge_forward]: 5.55001e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.242e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.709e-05 [merge_recompute_call_nodes]: 1.02e-06 [before_grad]: 1.48e-05 [set_forward_comm_id_for_comm_node_pass]: 5.53002e-06 [meta_fg_expand]: 9.836e-05 [flash_sp_send_recv_attached]: 1.82999e-06 [receive_attached]: 2.36e-06 [after_resolve]: 1.71e-05 [a_after_grad]: 1.481e-05 [renormalize]: 0.00075956 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 3.821e-05 [cse]: 5.348e-05 [a_3]: 6.713e-05 [Cycle 3]: 0.00092029, [45] [expand_dump_flag]: 1.30999e-06 [switch_simplify]: 1.057e-05 [loop_unroll]: 9.02999e-06 [a_1]: 0.00025608 [with_stream_mark]: 1.06e-05 [recompute_prepare]: 9.76e-06 [updatestate_depend_eliminate]: 4.86002e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 
0.00012456 [accelerated_algorithm]: 1.165e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 9.02e-06 [merge_send_recv]: 7.31001e-06 [auto_parallel]: 7.47002e-06 [parallel]: 4.85999e-06 [flash_sp]: 1.04998e-06 [merge_comm]: 5.25001e-06 [allreduce_fusion]: 5.27001e-06 [matmul_add_comm_reduction]: 7.87998e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 9.82999e-06 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.47e-06 [virtual_output]: 8.51002e-06 [merge_forward]: 4.36002e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 9.14998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.606e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.419e-05 [set_forward_comm_id_for_comm_node_pass]: 6.21998e-06 [meta_fg_expand]: 3.14001e-06 [flash_sp_send_recv_attached]: 9.49978e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.493e-05 [a_after_grad]: 1.538e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 1.083e-05 [cse]: 2.814e-05 [a_3]: 6.053e-05 [py_interpret_to_execute_after_opt_a]: 1.414e-05 [slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 5.031e-05 [convert_after_rewriter]: 9.32999e-06 [order_py_execute_after_rewriter]: 7.65e-06 [mutable_eliminate]: 0.00063311 [opt_b]: 0.00029702, [1] [Cycle 1]: 0.00028986, [7] [b_1]: 0.00019392 [b_2]: 1.11e-05 [updatestate_depend_eliminate]: 7.53999e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 3.99002e-06 [renormalize]: 4.09986e-07 [cse]: 3.322e-05 [optimize_parallel_all_gather_comm]: 2.196e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.357e-05 [loop_unroll]: 0.00044207 [opt_after_cconv]: 0.00014157, [1] [Cycle 1]: 0.00013516, [7] [c_1]: 4.918e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 7.61001e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.97e-06 [cse]: 
3.247e-05 [renormalize]: 2.59985e-07 [remove_dup_value]: 3.262e-05 [tuple_transform]: 0.00010604, [1] [Cycle 1]: 0.00010106, [4] [d_1]: 6.949e-05 [none_parameter_eliminate]: 1.94999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.005e-05 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 5.866e-05 [cse_after_recomputation]: 3.406e-05, [1] [Cycle 1]: 2.905e-05, [1] [cse]: 2.312e-05 [environ_conv]: 8.90001e-06 [swap_dp_allreduce_reducescatter]: 8.35001e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.92e-06 [label_fine_grained_interleaved_index]: 3.01999e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.762e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 5.46e-06 [overlap_recompute_and_grad_model_parallel]: 6.04001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 5.19e-06 [overlap_grad_flash_sp]: 2.72e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 2.01998e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 0.00010203, [1] [Cycle 1]: 9.743e-05, [6] [build]: 1.017e-05 [elim_shapecalc]: 1.386e-05 [elim_not_effective]: 1.869e-05 [opt_reshape]: 1.032e-05 [fold_const_symbol]: 1.464e-05 [renormalize]: 2.09984e-07 [detach_backward]: 2.33998e-06 
[pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.612e-05 [get_jit_bprop_graph]: 1.72999e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00049456 [validate]: 9.151e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.121379 [execute]: 9.09e-06 Sums bootstrap : 0.000450s : 0.30% type_inference : 0.011923s : 8.03% event_method : 0.000056s : 0.04% auto_monad : 0.000126s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000037s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000055s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.03% optimize.rewriter_before_opt_a : 0.000154s : 0.10% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000136s : 0.09% optimize.opt_a.loop_unroll : 0.000117s : 0.08% optimize.opt_a.a_1 : 0.003424s : 2.31% optimize.opt_a.with_stream_mark : 0.000058s : 0.04% optimize.opt_a.recompute_prepare : 0.000045s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000006s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000034s : 0.02% optimize.opt_a.auto_parallel : 0.000031s : 0.02% optimize.opt_a.parallel : 0.000034s : 0.02% optimize.opt_a.flash_sp : 0.000018s : 0.01% optimize.opt_a.merge_comm : 0.000022s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000051s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000020s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000040s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001751s : 1.18% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.00% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000097s : 0.07% optimize.opt_a.a_after_grad : 0.000116s : 0.08% optimize.opt_a.renormalize : 0.003764s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000109s : 0.07% optimize.opt_a.cse : 0.000264s : 0.18% optimize.opt_a.a_3 : 0.000483s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000008s : 0.01% optimize.mutable_eliminate : 0.000633s : 0.43% optimize.opt_b.b_1 : 0.000194s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000442s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000033s : 0.02% optimize.tuple_transform.d_1 : 0.000069s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000027s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000495s : 0.33% validate : 0.000092s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.121379s : 81.76% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000899 222 5.84% : 0.000053s : 12: substitution.arithmetic_simplify 1.64% : 0.000015s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000004s : 5: substitution.float_depend_g_call 0.46% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 5: substitution.fold_const_symbol 0.85% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 58.18% : 0.000523s : 17: substitution.inline 2.11% : 0.000019s : 2: substitution.inline_without_move 1.25% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000017s : 3: substitution.less_batch_normalization 1.54% : 0.000014s : 11: substitution.minmaximum_grad 0.71% : 0.000006s : 5: substitution.partial_eliminate 1.56% : 0.000014s : 20: substitution.remove_not_recompute_node 3.37% : 0.000030s : 10: substitution.replace_applicator 1.24% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.35% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.60% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.16% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.21% : 0.000074s : 30: substitution.tuple_list_get_item_eliminator 2.13% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011839 2 85.36% : 0.010106s : 1: type_inference.infer 14.64% : 0.001733s : 1: type_inference.specialize ------[replace.] 0.000244 33 59.69% : 0.000146s : 17: replace.inline 40.31% : 0.000098s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000551 33 93.28% : 0.000513s : 17: match.inline 6.72% : 0.000037s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000770 5764 1.06% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.01% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_depend_swap 1.72% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000043s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.60% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 68: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.24% : 0.000017s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.63% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.36% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.93% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.98% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.04% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000024s : 132: predicate.tuple_list_get_item_eliminator 1.54% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.18% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001753 34 56.01% : 0.000982s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.99% : 0.000771s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.180657 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.85% : 0.003342s : 1: add_attr 1.84% : 0.003330s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000133s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.27% : 0.000485s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000064s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.25% : 0.000451s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000643s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.84% : 0.005139s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000178s : 28: opt.transform.opt_b 0.04% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.87% : 0.012418s : 1: opt_a 0.08% : 0.000145s : 1: opt_after_cconv 0.28% : 0.000505s : 1: opt_after_jit_grad 0.17% : 0.000301s : 1: opt_b 8.27% : 0.014945s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000061s : 1: pre_auto_parallel 0.03% : 0.000047s : 1: py_interpret_to_execute 0.01% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000037s : 1: remove_dup_value 1.15% : 0.002086s : 2: renormalize.infer 0.92% : 0.001660s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000054s : 1: rewriter_after_opt_a 0.09% : 0.000160s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000105s : 1: symbol_engine_optimizer 67.20% : 0.121402s : 1: task_emit 0.06% : 0.000109s : 1: 
tuple_transform 6.61% : 0.011943s : 1: type_inference 0.07% : 0.000125s : 1: validate TotalTime = 0.105513, [24] [bootstrap]: 0.00044117 [type_inference]: 0.004294 [event_method]: 1.069e-05 [auto_monad]: 4.965e-05 [graph_reusing]: 5.12e-06 [inline]: 1.86e-06 [add_attr]: 0.00297176, [1] [add_attr_with_inline]: 0.00296324, [1] [Cycle 1]: 4.52e-05, [2] [tag_attr]: 1.245e-05 [meta_addattr_fg_expand]: 2.92002e-06 [parallel-infer-symbol]: 2.83998e-06 [pre_auto_parallel]: 2.158e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.00368642, [53] [py_interpret_to_execute]: 1.568e-05 [rewriter_before_opt_a]: 3.798e-05 [opt_a]: 0.00187145, [2] [Cycle 1]: 0.00126878, [45] [expand_dump_flag]: 2.58003e-06 [switch_simplify]: 2.507e-05 [loop_unroll]: 1.379e-05 [a_1]: 0.0002934 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.88999e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 3.41001e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 7.627e-05 [accelerated_algorithm]: 6.71999e-06 [shard]: 2.43e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 6.19999e-06 [merge_send_recv]: 7.37002e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.78e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 8.74998e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.65001e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 6.02001e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.53002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 
2.43002e-06 [after_resolve]: 1.084e-05 [a_after_grad]: 8.68001e-06 [renormalize]: 0.00035484 [add_forward_monad_depend]: 4.60999e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.312e-05 [cse]: 2.766e-05 [a_3]: 4.007e-05 [Cycle 2]: 0.00059329, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012469 [with_stream_mark]: 9.45001e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.798e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.02e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.10001e-06 [parallel]: 4.76002e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.29e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.37001e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.06997e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.006e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 8.23999e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.21e-06 [cse]: 1.277e-05 [a_3]: 3.213e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.302e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00044645 [opt_b]: 0.00018155, [1] [Cycle 1]: 0.00017543, [7] [b_1]: 0.00010792 
[b_2]: 7.29001e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 3.50003e-07 [cse]: 1.628e-05 [optimize_parallel_all_gather_comm]: 1.521e-05 [overlap_param_gather]: 1.73997e-06 [cconv]: 2.263e-05 [loop_unroll]: 0.00041417 [opt_after_cconv]: 9.549e-05, [1] [Cycle 1]: 8.996e-05, [7] [c_1]: 2.797e-05 [parameter_eliminate]: 2.13998e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.13998e-06 [cse]: 1.639e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 6.903e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.87e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.64999e-06 [partial_unused_args_eliminate]: 1.84998e-06 [add_recomputation]: 4.224e-05 [cse_after_recomputation]: 1.933e-05, [1] [Cycle 1]: 1.497e-05, [1] [cse]: 9.77001e-06 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 5.21002e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.68998e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 1.25999e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.174e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.23998e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 7.798e-05, [1] [Cycle 1]: 7.391e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.92999e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.548e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.98999e-06 [opt_after_jit_grad]: 0.0004466 [validate]: 3.125e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0933036 [execute]: 8.98002e-06 Sums bootstrap : 0.000441s : 0.43% type_inference : 0.004294s : 4.23% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000418s : 0.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000355s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.44% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000414s : 0.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.44% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.093304s : 91.87% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.18% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 65.59% : 0.000079s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.91% : 0.000005s : 4: substitution.remove_not_recompute_node 3.09% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004253 2 91.49% : 0.003891s : 1: type_inference.infer 8.51% : 0.000362s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 0.98% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.00% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 2.07% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.57% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.32% : 0.000002s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.84% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.69% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.48% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.06% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 41.71% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.29% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.113453 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.62% : 0.002976s : 1: add_attr 2.62% : 0.002967s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000478s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.68% : 0.000772s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.65% : 0.001874s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.40% : 0.000456s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.25% : 0.003690s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000196s : 1: renormalize.infer 0.13% : 0.000153s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.04% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000081s : 1: symbol_engine_optimizer 82.26% : 0.093326s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.80% : 0.004307s : 1: type_inference 0.05% : 0.000052s : 1: validate TotalTime = 0.150048, [24] [bootstrap]: 0.00044748 [type_inference]: 0.0102292 [event_method]: 4.317e-05 [auto_monad]: 0.00011807 [graph_reusing]: 7.78001e-06 [inline]: 1.94999e-06 [add_attr]: 0.00305398, [1] [add_attr_with_inline]: 0.00304541, [1] [Cycle 1]: 6.607e-05, [2] [tag_attr]: 3.052e-05 [meta_addattr_fg_expand]: 8.54e-06 [parallel-infer-symbol]: 2.67001e-06 [pre_auto_parallel]: 4.608e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.013149, [53] [py_interpret_to_execute]: 3.565e-05 [rewriter_before_opt_a]: 0.00012833 [opt_a]: 0.0109067, [3] [Cycle 1]: 0.0069735, [45] [expand_dump_flag]: 3.75e-06 [switch_simplify]: 6.555e-05 [loop_unroll]: 6.043e-05 [a_1]: 0.00134216 [with_stream_mark]: 2.289e-05 [recompute_prepare]: 2.142e-05 [updatestate_depend_eliminate]: 9.25001e-06 [updatestate_assign_eliminate]: 7.85e-06 [updatestate_loads_eliminate]: 7.71999e-06 [parameter_eliminate]: 2.64999e-06 [a_2]: 0.00024473 [accelerated_algorithm]: 3.099e-05 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 3.36001e-06 [shard_inline]: 1.629e-05 [merge_send_recv]: 1.566e-05 [auto_parallel]: 1.097e-05 [parallel]: 1.863e-05 [flash_sp]: 1.209e-05 [merge_comm]: 1.002e-05 [allreduce_fusion]: 9.02e-06 [matmul_add_comm_reduction]: 2.604e-05 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 1.785e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.545e-05 [virtual_output]: 1.566e-05 [merge_forward]: 9.21998e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.804e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.855e-05 [merge_recompute_call_nodes]: 1.92001e-06 [before_grad]: 2.766e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61998e-06 [meta_fg_expand]: 0.00139462 [flash_sp_send_recv_attached]: 4.15e-06 [receive_attached]: 2.64001e-06 
[after_resolve]: 5.914e-05 [a_after_grad]: 8.05e-05 [renormalize]: 0.00244273 [add_forward_monad_depend]: 9.05999e-06 [auto_monad_grad]: 6.09001e-06 [auto_monad_eliminator]: 5.597e-05 [cse]: 0.0002007 [a_3]: 0.00033496 [Cycle 2]: 0.00297587, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.664e-05 [loop_unroll]: 4.357e-05 [a_1]: 0.00153202 [with_stream_mark]: 1.234e-05 [recompute_prepare]: 1.089e-05 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 3.65003e-06 [parameter_eliminate]: 1.17e-06 [a_2]: 0.00012538 [accelerated_algorithm]: 1.18e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.92999e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 7.16999e-06 [auto_parallel]: 7.38e-06 [parallel]: 4.93001e-06 [flash_sp]: 2.93e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.60999e-06 [matmul_add_comm_reduction]: 8.02003e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.052e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.80999e-06 [virtual_output]: 8.50001e-06 [merge_forward]: 5.21002e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.65e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.412e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.503e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.533e-05 [a_after_grad]: 1.434e-05 [renormalize]: 0.00059748 [add_forward_monad_depend]: 4.02e-06 [auto_monad_grad]: 1.37e-06 [auto_monad_eliminator]: 1.481e-05 [cse]: 4.769e-05 [a_3]: 6.564e-05 [Cycle 3]: 0.0009433, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 1.051e-05 [loop_unroll]: 9.05999e-06 [a_1]: 0.00025242 [with_stream_mark]: 9.74999e-06 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.75999e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 
3.95998e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012381 [accelerated_algorithm]: 1.17e-05 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 3.89e-05 [auto_parallel]: 7.56999e-06 [parallel]: 4.75999e-06 [flash_sp]: 1.12e-06 [merge_comm]: 5.54e-06 [allreduce_fusion]: 5.08002e-06 [matmul_add_comm_reduction]: 7.71999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.042e-05 [virtual_dataset]: 8.74998e-06 [get_grad_eliminate_]: 8.73001e-06 [virtual_output]: 8.24002e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 8.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.617e-05 [merge_recompute_call_nodes]: 8.09989e-07 [before_grad]: 1.386e-05 [set_forward_comm_id_for_comm_node_pass]: 5.31002e-06 [meta_fg_expand]: 3.08998e-06 [flash_sp_send_recv_attached]: 7.49977e-07 [receive_attached]: 1.19998e-06 [after_resolve]: 1.344e-05 [a_after_grad]: 1.424e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.40999e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.143e-05 [cse]: 2.821e-05 [a_3]: 5.937e-05 [py_interpret_to_execute_after_opt_a]: 1.065e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 4.682e-05 [convert_after_rewriter]: 9.27999e-06 [order_py_execute_after_rewriter]: 6.66e-06 [mutable_eliminate]: 0.00045616 [opt_b]: 0.00028969, [1] [Cycle 1]: 0.00028353, [7] [b_1]: 0.00018984 [b_2]: 1.093e-05 [updatestate_depend_eliminate]: 7.05e-06 [updatestate_assign_eliminate]: 4.41002e-06 [updatestate_loads_eliminate]: 3.98001e-06 [renormalize]: 4.89992e-07 [cse]: 3.189e-05 [optimize_parallel_all_gather_comm]: 2.071e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.016e-05 [loop_unroll]: 0.00042042 [opt_after_cconv]: 0.00013637, [1] [Cycle 1]: 0.00013058, [7] [c_1]: 4.868e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 7.08998e-06 [updatestate_assign_eliminate]: 4.23999e-06 
[updatestate_loads_eliminate]: 3.88001e-06 [cse]: 3.044e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 2.882e-05 [tuple_transform]: 0.00010255, [1] [Cycle 1]: 9.779e-05, [4] [d_1]: 6.758e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.52999e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 5.785e-05 [cse_after_recomputation]: 3.287e-05, [1] [Cycle 1]: 2.814e-05, [1] [cse]: 2.23e-05 [environ_conv]: 9.59e-06 [swap_dp_allreduce_reducescatter]: 7.75998e-06 [bias_add_comm_swap]: 2.43998e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.32999e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.677e-05 [grouped_pairwise_exchange_alltoall]: 1.87999e-06 [offloading_packed_experts]: 4.59002e-06 [overlap_recompute_and_grad_model_parallel]: 5.39998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.60001e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 5.19998e-06 [overlap_grad_flash_sp]: 2.441e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.971e-05, [1] [Cycle 1]: 9.534e-05, [6] [build]: 9.00999e-06 [elim_shapecalc]: 1.36e-05 [elim_not_effective]: 1.829e-05 [opt_reshape]: 1.085e-05 [fold_const_symbol]: 1.489e-05 
[renormalize]: 2.19996e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 2.541e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00046239 [validate]: 4.632e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.122168 [execute]: 9.34998e-06 Sums bootstrap : 0.000447s : 0.31% type_inference : 0.010229s : 7.02% event_method : 0.000043s : 0.03% auto_monad : 0.000118s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.02% optimize.rewriter_before_opt_a : 0.000128s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000123s : 0.08% optimize.opt_a.loop_unroll : 0.000113s : 0.08% optimize.opt_a.a_1 : 0.003127s : 2.15% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000062s : 0.04% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000021s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001433s : 0.98% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.07% optimize.opt_a.renormalize : 0.003040s : 2.09% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.06% optimize.opt_a.cse : 0.000277s : 0.19% optimize.opt_a.a_3 : 0.000460s : 0.32% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000456s : 0.31% optimize.opt_b.b_1 : 0.000190s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000420s : 0.29% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000068s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000462s : 0.32% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.122168s : 83.84% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000735 218 5.93% : 0.000044s : 11: substitution.arithmetic_simplify 1.97% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.06% : 0.000404s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.03% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000013s : 20: substitution.remove_not_recompute_node 3.27% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.74% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.47% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010162 2 86.74% : 0.008814s : 1: type_inference.infer 13.26% : 0.001348s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.95% : 0.000118s : 16: replace.inline 41.05% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000427 30 92.88% : 0.000396s : 16: match.inline 7.12% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000744 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.12% : 0.000016s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.23% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.23% : 0.000009s : 75: predicate.environ_get_depend_swap 1.73% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000042s : 244: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.47% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.76% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 67: predicate.reduce_eliminate 2.62% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.54% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.53% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001522 32 56.95% : 0.000867s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.05% : 0.000655s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.174392 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.75% : 0.003059s : 1: add_attr 1.75% : 0.003049s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000125s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.27% : 0.000475s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000050s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000465s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.74% : 0.004780s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.26% : 0.010910s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.27% : 0.000472s : 1: opt_after_jit_grad 0.17% : 0.000293s : 1: opt_b 7.54% : 0.013153s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.92% : 0.001601s : 2: renormalize.infer 0.82% : 0.001426s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.08% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000102s : 1: symbol_engine_optimizer 70.07% : 0.122195s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 5.87% : 0.010245s : 1: type_inference 0.04% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x2-ge],max_mem:44.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x3-pynative],max_mem:44.0M TotalTime = 0.0236088, [24] [bootstrap]: 0.0005861 [type_inference]: 0.00732723 [event_method]: 1.45e-05 [auto_monad]: 5.731e-05 [graph_reusing]: 5.69e-06 [inline]: 1.89999e-06 [add_attr]: 0.0039599, [1] [add_attr_with_inline]: 0.00394943, [1] [Cycle 1]: 4.441e-05, [2] [tag_attr]: 1.55e-05 [meta_addattr_fg_expand]: 4.11001e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.879e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.32999e-06 [pipeline_split]: 1.76003e-06 [optimize]: 0.00399393, [53] [py_interpret_to_execute]: 1.974e-05 [rewriter_before_opt_a]: 5.854e-05 [opt_a]: 0.00213073, [2] [Cycle 1]: 0.00152769, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 3.244e-05 [loop_unroll]: 2.121e-05 [a_1]: 0.00045092 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.05998e-06 [updatestate_loads_eliminate]: 3.21999e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.639e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 7.84002e-06 [auto_parallel]: 5.92001e-06 [parallel]: 2.456e-05 [flash_sp]: 7.31999e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 8.10999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.81e-06 
[get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.96e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.28997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.46003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 2.76e-06 [after_resolve]: 1.033e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00042837 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.269e-05 [cse]: 2.751e-05 [a_3]: 4.036e-05 [Cycle 2]: 0.00059334, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012553 [with_stream_mark]: 9.54e-06 [recompute_prepare]: 5.67999e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.38002e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.813e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.55999e-06 [flash_sp]: 2.98e-06 [merge_comm]: 3.34001e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.94e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.99001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51003e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 8.08001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 7.9e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.348e-05 [a_3]: 3.158e-05 [py_interpret_to_execute_after_opt_a]: 7.8e-06 [slice_cell_reuse_recomputed_activation]: 1.69998e-06 [rewriter_after_opt_a]: 2.853e-05 [convert_after_rewriter]: 6.64001e-06 [order_py_execute_after_rewriter]: 5.00999e-06 [mutable_eliminate]: 0.00047405 [opt_b]: 0.00018123, [1] [Cycle 1]: 0.0001752, [7] [b_1]: 0.00010728 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.80009e-07 [cse]: 1.636e-05 [optimize_parallel_all_gather_comm]: 1.524e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.229e-05 [loop_unroll]: 0.00041274 [opt_after_cconv]: 9.699e-05, [1] [Cycle 1]: 9.093e-05, [7] [c_1]: 2.85e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 5.55001e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.18998e-06 [cse]: 1.659e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.276e-05 [tuple_transform]: 7.006e-05, [1] [Cycle 1]: 6.577e-05, [4] [d_1]: 4e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.18002e-06 [partial_unused_args_eliminate]: 1.52001e-06 [add_recomputation]: 4.837e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.61e-05, [1] [cse]: 1.097e-05 [environ_conv]: 5.00999e-06 [swap_dp_allreduce_reducescatter]: 5.79e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.95001e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 1.98002e-06 [micro_interleaved_order_control]: 2.52001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.21002e-06 [add_comm_op_reuse_tag]: 
1.08001e-06 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.40999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.84002e-06 [overlap_recompute_and_grad_model_parallel]: 4.27e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.76e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.868e-05, [1] [Cycle 1]: 6.456e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.129e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.90001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.587e-05 [get_jit_bprop_graph]: 1.20999e-06 [rewriter_after_jit_bprop_graph]: 0.00012115 [opt_after_jit_grad]: 0.00045054 [validate]: 3.182e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00679204 [execute]: 8.02e-06 Sums bootstrap : 0.000586s : 3.14% type_inference : 0.007327s : 39.22% event_method : 0.000014s : 0.08% auto_monad : 0.000057s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.31% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.21% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000576s : 3.09% optimize.opt_a.with_stream_mark : 0.000023s : 0.12% optimize.opt_a.recompute_prepare : 0.000014s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.77% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.03% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000428s : 2.29% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.10% optimize.opt_a.cse : 0.000041s : 0.22% optimize.opt_a.a_3 : 0.000072s : 0.39% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.15% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000474s : 2.54% optimize.opt_b.b_1 : 0.000107s : 0.57% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000413s : 2.21% optimize.opt_after_cconv.c_1 : 0.000029s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.26% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.09% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.08% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000121s : 0.65% opt_after_jit_grad : 0.000451s : 2.41% validate : 0.000032s : 0.17% backend_pass : 0.000001s : 0.01% task_emit : 0.006792s : 36.35% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.92% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.36% : 0.000006s : 4: substitution.graph_param_transform 66.49% : 0.000109s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007273 2 90.46% : 0.006579s : 1: type_inference.infer 9.54% : 0.000694s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.34% : 0.000026s : 3: replace.inline 30.66% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.45% : 0.000107s : 3: match.inline 8.55% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.20% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000009s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.27% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.94% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000579 8 54.19% : 0.000314s : 3: func_graph_cloner_run.FuncGraphClonerGraph 45.81% : 0.000265s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033087 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.98% : 0.003964s : 1: add_attr 11.95% : 0.003953s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000063s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000615s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.27% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.46% : 0.000483s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.85% : 0.000943s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.45% : 0.002133s : 1: opt_a 0.30% : 0.000100s : 1: opt_after_cconv 1.39% : 0.000460s : 1: opt_after_jit_grad 0.56% : 0.000185s : 1: opt_b 12.08% : 0.003998s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.07% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.67% : 0.000221s : 1: renormalize.infer 0.60% : 0.000200s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.38% : 0.000127s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.19% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000071s : 1: symbol_engine_optimizer 20.56% : 0.006802s : 1: task_emit 0.22% : 0.000073s : 1: 
tuple_transform 22.19% : 0.007340s : 1: type_inference 0.20% : 0.000065s : 1: validate TotalTime = 0.0183914, [24] [bootstrap]: 0.00046903 [type_inference]: 0.0044847 [event_method]: 1.186e-05 [auto_monad]: 5.187e-05 [graph_reusing]: 4.80999e-06 [inline]: 2.06998e-06 [add_attr]: 0.00302031, [1] [add_attr_with_inline]: 0.00301265, [1] [Cycle 1]: 4.471e-05, [2] [tag_attr]: 1.222e-05 [meta_addattr_fg_expand]: 3.57002e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.236e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.0037183, [53] [py_interpret_to_execute]: 1.584e-05 [rewriter_before_opt_a]: 3.886e-05 [opt_a]: 0.00192175, [2] [Cycle 1]: 0.0012518, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.353e-05 [loop_unroll]: 1.347e-05 [a_1]: 0.00029207 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 4.16001e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.51998e-06 [a_2]: 7.569e-05 [accelerated_algorithm]: 6.63e-06 [shard]: 2.18002e-06 [meta_shard_fg_expand]: 1.42999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.08001e-06 [auto_parallel]: 6.00002e-06 [parallel]: 1.824e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.20999e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 6.85002e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.40999e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.83001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.35997e-06 [receive_attached]: 2.87002e-06 
[after_resolve]: 1.04e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00034378 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.282e-05 [cse]: 2.544e-05 [a_3]: 4.036e-05 [Cycle 2]: 0.00066069, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012548 [with_stream_mark]: 9.39e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 7.10017e-07 [a_2]: 0.00013358 [accelerated_algorithm]: 5.86998e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.69999e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.22003e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.50001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.23002e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.90998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.73002e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.14e-06 [after_resolve]: 8.90999e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 9.90025e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.263e-05 [a_3]: 3.246e-05 [py_interpret_to_execute_after_opt_a]: 7.62998e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 3.076e-05 [convert_after_rewriter]: 7.01001e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00044626 [opt_b]: 0.00018118, [1] [Cycle 1]: 0.00017518, 
[7] [b_1]: 0.00010816 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.39991e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.568e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.192e-05 [loop_unroll]: 0.00040793 [opt_after_cconv]: 9.399e-05, [1] [Cycle 1]: 8.806e-05, [7] [c_1]: 2.788e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.533e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.243e-05 [tuple_transform]: 7.005e-05, [1] [Cycle 1]: 6.558e-05, [4] [d_1]: 3.956e-05 [none_parameter_eliminate]: 1.86998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.56002e-06 [add_recomputation]: 4.257e-05 [cse_after_recomputation]: 2.044e-05, [1] [Cycle 1]: 1.576e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.91002e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.34002e-06 [label_fine_grained_interleaved_index]: 3.05998e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.89995e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 9.40025e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.35999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.38999e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.742e-05, [1] [Cycle 1]: 6.321e-05, [6] [build]: 2.24999e-06 [elim_shapecalc]: 7.96001e-06 [elim_not_effective]: 1.129e-05 [opt_reshape]: 5.78002e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.533e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00044702 [validate]: 3.068e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00589144 [execute]: 6.39001e-06 Sums bootstrap : 0.000469s : 3.25% type_inference : 0.004485s : 31.12% event_method : 0.000012s : 0.08% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000418s : 2.90% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000209s : 1.45% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000344s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 3.10% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000408s : 2.83% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000447s : 3.10% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005891s : 40.88% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000121 26 18.57% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.59% : 0.000006s : 4: substitution.graph_param_transform 65.44% : 0.000079s : 2: substitution.inline 2.46% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.97% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004441 2 91.87% : 0.004080s : 1: type_inference.infer 8.13% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.98% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.11% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.71% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 0.98% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.62% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.71% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.55% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000244 6 42.66% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.34% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026464 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.43% : 0.003025s : 1: add_attr 11.40% : 0.003016s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000504s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000018s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000416s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000831s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.27% : 0.001925s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.72% : 0.000456s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.06% : 0.003722s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000190s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 22.30% : 0.005902s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 17.00% : 0.004499s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0196095, [24] [bootstrap]: 0.00049221 [type_inference]: 0.00551908 [event_method]: 1.396e-05 [auto_monad]: 6.249e-05 [graph_reusing]: 5.76998e-06 [inline]: 2.04e-06 [add_attr]: 0.00299279, [1] [add_attr_with_inline]: 0.00298508, [1] [Cycle 1]: 4.399e-05, [2] [tag_attr]: 1.487e-05 [meta_addattr_fg_expand]: 4.09002e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 2.567e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 1.66002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00396344, [53] [py_interpret_to_execute]: 1.963e-05 [rewriter_before_opt_a]: 5.757e-05 [opt_a]: 0.00208937, [2] [Cycle 1]: 0.00149137, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 3.226e-05 [loop_unroll]: 2.124e-05 [a_1]: 0.0004481 [with_stream_mark]: 1.262e-05 [recompute_prepare]: 7.42998e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.22002e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.56e-05 [accelerated_algorithm]: 6.30002e-06 [shard]: 2.62001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 7.63001e-06 [auto_parallel]: 5.72001e-06 [parallel]: 1.65e-05 [flash_sp]: 7.05998e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.03e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 9.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.093e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.13002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.33002e-06 [receive_attached]: 2.12999e-06 
[after_resolve]: 1.01e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.00041026 [add_forward_monad_depend]: 4.2e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.688e-05 [a_3]: 3.969e-05 [Cycle 2]: 0.0005887, [45] [expand_dump_flag]: 8.90024e-07 [switch_simplify]: 6.48998e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012775 [with_stream_mark]: 9.19998e-06 [recompute_prepare]: 5.90002e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.793e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.35001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.20999e-06 [parallel]: 4.63999e-06 [flash_sp]: 3.23998e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.14003e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.70002e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.58002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.21998e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86999e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 9.09998e-06 [a_after_grad]: 7.75e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.563e-05 [a_3]: 3.094e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 3.062e-05 [convert_after_rewriter]: 6.74999e-06 [order_py_execute_after_rewriter]: 4.62e-06 [mutable_eliminate]: 0.00045166 [opt_b]: 0.00018141, [1] [Cycle 1]: 0.00017538, [7] 
[b_1]: 0.00010706 [b_2]: 6.63e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 5.10016e-07 [cse]: 1.631e-05 [optimize_parallel_all_gather_comm]: 1.655e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 6.438e-05 [loop_unroll]: 0.00041808 [opt_after_cconv]: 9.373e-05, [1] [Cycle 1]: 8.805e-05, [7] [c_1]: 2.778e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.514e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.221e-05 [tuple_transform]: 6.868e-05, [1] [Cycle 1]: 6.442e-05, [4] [d_1]: 3.854e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.50001e-06 [add_recomputation]: 4.323e-05 [cse_after_recomputation]: 1.936e-05, [1] [Cycle 1]: 1.496e-05, [1] [cse]: 9.90002e-06 [environ_conv]: 4.55999e-06 [swap_dp_allreduce_reducescatter]: 5.46e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.14002e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.53e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 2.36e-06 [offloading_packed_experts]: 3.80998e-06 [overlap_recompute_and_grad_model_parallel]: 4.17e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.27001e-06 [overlap_grad_ring_attention]: 3.68e-06 [overlap_grad_flash_sp]: 1.665e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.78002e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.741e-05, [1] [Cycle 1]: 6.342e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.15999e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00044676 [validate]: 3.018e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0058214 [execute]: 7.25e-06 Sums bootstrap : 0.000492s : 3.14% type_inference : 0.005519s : 35.23% event_method : 0.000014s : 0.09% auto_monad : 0.000062s : 0.40% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000576s : 3.68% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000410s : 2.62% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.45% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.88% optimize.opt_b.b_1 : 0.000107s : 0.68% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000064s : 0.41% optimize.loop_unroll : 0.000418s : 2.67% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.02% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 2.85% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005821s : 37.16% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000164 30 15.03% : 0.000025s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000005s : 4: substitution.graph_param_transform 66.51% : 0.000109s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 6.80% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005479 2 90.02% : 0.004932s : 1: type_inference.infer 9.98% : 0.000547s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.17% : 0.000027s : 3: replace.inline 29.83% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.37% : 0.000107s : 3: match.inline 8.63% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.65% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.91% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.99% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.36% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.91% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 1.01% : 0.000002s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 45.83% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.17% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028072 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.68% : 0.002997s : 1: add_attr 10.65% : 0.002989s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.24% : 0.000068s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.88% : 0.000526s : 1: bootstrap 0.24% : 0.000069s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.52% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.34% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.45% : 0.002092s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.63% : 0.000456s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 14.13% : 0.003967s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000209s : 1: renormalize.infer 0.69% : 0.000195s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 20.77% : 0.005831s : 1: task_emit 0.25% : 0.000072s : 1: 
tuple_transform 19.71% : 0.005532s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0374116, [24] [bootstrap]: 0.000525 [type_inference]: 0.0112417 [event_method]: 4.716e-05 [auto_monad]: 0.00011887 [graph_reusing]: 7.90998e-06 [inline]: 2.27999e-06 [add_attr]: 0.00303937, [1] [add_attr_with_inline]: 0.0030311, [1] [Cycle 1]: 6.913e-05, [2] [tag_attr]: 3.446e-05 [meta_addattr_fg_expand]: 9.40001e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 4.934e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.0134647, [53] [py_interpret_to_execute]: 3.744e-05 [rewriter_before_opt_a]: 0.0001453 [opt_a]: 0.0111627, [3] [Cycle 1]: 0.0071419, [45] [expand_dump_flag]: 3.55e-06 [switch_simplify]: 7.375e-05 [loop_unroll]: 6.171e-05 [a_1]: 0.0014627 [with_stream_mark]: 2.221e-05 [recompute_prepare]: 2.109e-05 [updatestate_depend_eliminate]: 9.52999e-06 [updatestate_assign_eliminate]: 8.27e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 2.49001e-06 [a_2]: 0.00024572 [accelerated_algorithm]: 3.11e-05 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 3.47997e-06 [shard_inline]: 1.591e-05 [merge_send_recv]: 1.604e-05 [auto_parallel]: 1.069e-05 [parallel]: 1.773e-05 [flash_sp]: 1.129e-05 [merge_comm]: 1.011e-05 [allreduce_fusion]: 9.37999e-06 [matmul_add_comm_reduction]: 2.628e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.8e-05 [virtual_dataset]: 1.61e-05 [get_grad_eliminate_]: 1.518e-05 [virtual_output]: 1.55e-05 [merge_forward]: 9.82001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.726e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.823e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 2.699e-05 [set_forward_comm_id_for_comm_node_pass]: 9.53002e-06 [meta_fg_expand]: 0.001396 [flash_sp_send_recv_attached]: 3.68e-06 [receive_attached]: 2.51e-06 [after_resolve]: 
5.944e-05 [a_after_grad]: 8.101e-05 [renormalize]: 0.00249409 [add_forward_monad_depend]: 9.81e-06 [auto_monad_grad]: 5.67999e-06 [auto_monad_eliminator]: 5.963e-05 [cse]: 0.00017281 [a_3]: 0.00034381 [Cycle 2]: 0.00310134, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 5.207e-05 [loop_unroll]: 4.942e-05 [a_1]: 0.00158507 [with_stream_mark]: 1.205e-05 [recompute_prepare]: 1.09e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.30999e-06 [updatestate_loads_eliminate]: 4e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 0.00012667 [accelerated_algorithm]: 4.282e-05 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 2.08002e-06 [shard_inline]: 9.83998e-06 [merge_send_recv]: 6.84999e-06 [auto_parallel]: 7.46001e-06 [parallel]: 4.92e-06 [flash_sp]: 3.18998e-06 [merge_comm]: 5.12999e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.93001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 1.061e-05 [virtual_dataset]: 8.79e-06 [get_grad_eliminate_]: 8.95001e-06 [virtual_output]: 8.52e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 9.49978e-07 [offload_activation]: 9.66998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.699e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34998e-06 [meta_fg_expand]: 6.994e-05 [flash_sp_send_recv_attached]: 9.49978e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.582e-05 [a_after_grad]: 1.464e-05 [renormalize]: 0.00059188 [add_forward_monad_depend]: 4.11001e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.434e-05 [cse]: 4.636e-05 [a_3]: 6.479e-05 [Cycle 3]: 0.00090518, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.057e-05 [loop_unroll]: 8.94e-06 [a_1]: 0.00025014 [with_stream_mark]: 1.032e-05 [recompute_prepare]: 9.36e-06 [updatestate_depend_eliminate]: 4.96997e-06 [updatestate_assign_eliminate]: 4.01001e-06 [updatestate_loads_eliminate]: 4.08999e-06 
[parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012492 [accelerated_algorithm]: 1.181e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 9.32999e-06 [merge_send_recv]: 7.23999e-06 [auto_parallel]: 7.28999e-06 [parallel]: 4.32e-06 [flash_sp]: 9.80013e-07 [merge_comm]: 5.13002e-06 [allreduce_fusion]: 5.00999e-06 [matmul_add_comm_reduction]: 7.61001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 1.001e-05 [virtual_dataset]: 8.60001e-06 [get_grad_eliminate_]: 8.42e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.45999e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 8.60999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.6e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.39e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17e-06 [meta_fg_expand]: 3.06999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.318e-05 [a_after_grad]: 1.427e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 1.052e-05 [cse]: 2.658e-05 [a_3]: 6.058e-05 [py_interpret_to_execute_after_opt_a]: 1.002e-05 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 4.587e-05 [convert_after_rewriter]: 8.77e-06 [order_py_execute_after_rewriter]: 6.46999e-06 [mutable_eliminate]: 0.0004589 [opt_b]: 0.00028954, [1] [Cycle 1]: 0.00028362, [7] [b_1]: 0.00018928 [b_2]: 1.116e-05 [updatestate_depend_eliminate]: 7.43999e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4.03001e-06 [renormalize]: 3.80009e-07 [cse]: 3.307e-05 [optimize_parallel_all_gather_comm]: 2.025e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 1.953e-05 [loop_unroll]: 0.0004273 [opt_after_cconv]: 0.0001359, [1] [Cycle 1]: 0.00013008, [7] [c_1]: 4.845e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 7.28999e-06 [updatestate_assign_eliminate]: 4.28999e-06 
[updatestate_loads_eliminate]: 4.03001e-06 [cse]: 2.999e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 2.945e-05 [tuple_transform]: 0.00010195, [1] [Cycle 1]: 9.724e-05, [4] [d_1]: 6.687e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.017e-05 [partial_unused_args_eliminate]: 1.66002e-06 [add_recomputation]: 5.802e-05 [cse_after_recomputation]: 3.243e-05, [1] [Cycle 1]: 2.775e-05, [1] [cse]: 2.248e-05 [environ_conv]: 8.13999e-06 [swap_dp_allreduce_reducescatter]: 7.70998e-06 [bias_add_comm_swap]: 2.84999e-06 [label_micro_interleaved_index]: 4.72e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.76999e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 7.10017e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 9.90025e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.697e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 5.17e-06 [overlap_recompute_and_grad_model_parallel]: 5.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 4.88001e-06 [overlap_grad_flash_sp]: 2.666e-05 [begin_end_overlap_inline]: 5.79981e-07 [split_matmul_comm_elemetwise]: 2.16003e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 9.802e-05, [1] [Cycle 1]: 9.357e-05, [6] [build]: 9.15999e-06 [elim_shapecalc]: 1.343e-05 [elim_not_effective]: 1.817e-05 [opt_reshape]: 9.80002e-06 [fold_const_symbol]: 1.457e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 2.502e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00046949 [validate]: 4.331e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.0081478 [execute]: 6.68e-06 Sums bootstrap : 0.000525s : 1.59% type_inference : 0.011242s : 33.98% event_method : 0.000047s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000145s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000136s : 0.41% optimize.opt_a.loop_unroll : 0.000120s : 0.36% optimize.opt_a.a_1 : 0.003298s : 9.97% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000497s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000086s : 0.26% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001469s : 4.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003086s : 9.33% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.26% optimize.opt_a.cse : 0.000246s : 0.74% optimize.opt_a.a_3 : 0.000469s : 1.42% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000006s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.39% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000427s : 1.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000469s : 1.42% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008148s : 24.63% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000764 222 5.88% : 0.000045s : 12: substitution.arithmetic_simplify 1.77% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.83% : 0.000426s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.00% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.70% : 0.000013s : 20: substitution.remove_not_recompute_node 3.36% : 0.000026s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.62% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.82% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.69% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011169 2 86.82% : 0.009697s : 1: type_inference.infer 13.18% : 0.001472s : 1: type_inference.specialize ------[replace.] 0.000222 33 57.55% : 0.000128s : 17: replace.inline 42.45% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000452 33 92.51% : 0.000418s : 17: match.inline 7.49% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000765 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 68: predicate.addn_zero_filter 1.07% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000016s : 100: predicate.arithmetic_simplify 1.16% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000043s : 249: predicate.inline 1.24% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.08% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000016s : 101: predicate.partial_defer_inline 1.77% : 0.000014s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000010s : 68: predicate.reduce_eliminate 2.71% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.19% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000023s : 169: predicate.switch_layer_defer_inline 5.10% : 
0.000039s : 277: predicate.switch_simplify 1.10% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.66% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001572 34 56.46% : 0.000888s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.54% : 0.000685s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062345 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.88% : 0.003044s : 1: add_attr 4.87% : 0.003035s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000126s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.90% : 0.000560s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.70% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.04% : 0.005015s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.91% : 0.011166s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000479s : 1: opt_after_jit_grad 0.47% : 0.000293s : 1: opt_b 21.60% : 0.013469s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.05% : 0.000031s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.07% : 0.000043s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.68% : 0.001671s : 2: renormalize.infer 2.25% : 0.001402s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.09% : 0.008158s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.05% : 0.011256s : 1: type_inference 0.12% : 0.000075s : 1: validate TotalTime = 0.0184288, [24] [bootstrap]: 0.0004574 [type_inference]: 0.00430679 [event_method]: 1.125e-05 [auto_monad]: 5.001e-05 [graph_reusing]: 4.76002e-06 [inline]: 1.77001e-06 [add_attr]: 0.00295945, [1] [add_attr_with_inline]: 0.00295123, [1] [Cycle 1]: 4.504e-05, [2] [tag_attr]: 1.141e-05 [meta_addattr_fg_expand]: 2.99999e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 2.114e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00366479, [53] [py_interpret_to_execute]: 1.518e-05 [rewriter_before_opt_a]: 3.95e-05 [opt_a]: 0.00187704, [2] [Cycle 1]: 0.0012801, [45] [expand_dump_flag]: 2.56998e-06 [switch_simplify]: 2.348e-05 [loop_unroll]: 1.404e-05 [a_1]: 0.00029321 [with_stream_mark]: 1.288e-05 [recompute_prepare]: 7.78999e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 1.818e-05 [parameter_eliminate]: 1.76e-06 [a_2]: 7.8e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 5.98998e-06 [merge_send_recv]: 8.13999e-06 [auto_parallel]: 6.01e-06 [parallel]: 1.849e-05 [flash_sp]: 7.42002e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.63999e-06 [matmul_add_comm_reduction]: 8.99998e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.18998e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.59002e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.151e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25e-06 [meta_fg_expand]: 2.02001e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.27001e-06 
[after_resolve]: 1.077e-05 [a_after_grad]: 9.51998e-06 [renormalize]: 0.0003483 [add_forward_monad_depend]: 4.19002e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.302e-05 [cse]: 2.619e-05 [a_3]: 4.008e-05 [Cycle 2]: 0.00058817, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012592 [with_stream_mark]: 8.75001e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.75002e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.34002e-06 [auto_parallel]: 5.09998e-06 [parallel]: 4.27e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 5.28002e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.31998e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.92998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18998e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 1.03001e-06 [receive_attached]: 1.07e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.1e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 5.93998e-06 [cse]: 1.192e-05 [a_3]: 3.165e-05 [py_interpret_to_execute_after_opt_a]: 7.31999e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 3.022e-05 [convert_after_rewriter]: 6.59999e-06 [order_py_execute_after_rewriter]: 4.58999e-06 [mutable_eliminate]: 0.00044783 [opt_b]: 0.00018014, [1] [Cycle 1]: 0.00017416, [7] 
[b_1]: 0.00010716 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.09e-06 [renormalize]: 4.2998e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.271e-05 [loop_unroll]: 0.00040914 [opt_after_cconv]: 9.307e-05, [1] [Cycle 1]: 8.725e-05, [7] [c_1]: 2.785e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.479e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.177e-05 [tuple_transform]: 6.822e-05, [1] [Cycle 1]: 6.361e-05, [4] [d_1]: 3.842e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.52001e-06 [add_recomputation]: 4.195e-05 [cse_after_recomputation]: 1.904e-05, [1] [Cycle 1]: 1.482e-05, [1] [cse]: 9.99001e-06 [environ_conv]: 4.45e-06 [swap_dp_allreduce_reducescatter]: 4.90001e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.48998e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.61999e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 4.03999e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29003e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 3.92998e-06 [overlap_grad_flash_sp]: 1.755e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 1.86e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.725e-05, [1] [Cycle 1]: 6.332e-05, [6] [build]: 2.17001e-06 [elim_shapecalc]: 8.35001e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.62e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.502e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.0004432 [validate]: 3.021e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00624419 [execute]: 7.14001e-06 Sums bootstrap : 0.000457s : 3.15% type_inference : 0.004307s : 29.67% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000419s : 2.89% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000021s : 0.14% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000348s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.09% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000409s : 2.82% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000443s : 3.05% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006244s : 43.02% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 17.96% : 0.000022s : 4: substitution.arithmetic_simplify 1.40% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.21% : 0.000005s : 4: substitution.graph_param_transform 65.48% : 0.000080s : 2: substitution.inline 2.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.94% : 0.000005s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004266 2 90.63% : 0.003866s : 1: type_inference.infer 9.37% : 0.000400s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 1.04% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 13: predicate.environ_get_depend_swap 1.97% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.68% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.95% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.88% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 1.05% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.67% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000296 6 34.98% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.02% : 0.000192s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026331 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.26% : 0.002964s : 1: add_attr 11.22% : 0.002955s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000492s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000456s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.14% : 0.001880s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.72% : 0.000453s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 13.93% : 0.003669s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.73% : 0.000191s : 1: renormalize.infer 0.57% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.75% : 0.006254s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.41% : 0.004320s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0364179, [24] [bootstrap]: 0.00050021 [type_inference]: 0.0102821 [event_method]: 4.059e-05 [auto_monad]: 0.0001162 [graph_reusing]: 8.45999e-06 [inline]: 1.99e-06 [add_attr]: 0.00303077, [1] [add_attr_with_inline]: 0.00302227, [1] [Cycle 1]: 6.55e-05, [2] [tag_attr]: 3.131e-05 [meta_addattr_fg_expand]: 8.38001e-06 [parallel-infer-symbol]: 2.61999e-06 [pre_auto_parallel]: 4.641e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.08998e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.0135726, [53] [py_interpret_to_execute]: 3.407e-05 [rewriter_before_opt_a]: 0.0001254 [opt_a]: 0.0112955, [3] [Cycle 1]: 0.00738793, [45] [expand_dump_flag]: 3.66999e-06 [switch_simplify]: 6.644e-05 [loop_unroll]: 5.42e-05 [a_1]: 0.00133886 [with_stream_mark]: 2.242e-05 [recompute_prepare]: 2.164e-05 [updatestate_depend_eliminate]: 8.89998e-06 [updatestate_assign_eliminate]: 8.23999e-06 [updatestate_loads_eliminate]: 7.58999e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 0.00024526 [accelerated_algorithm]: 3.023e-05 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 3.21001e-06 [shard_inline]: 1.636e-05 [merge_send_recv]: 1.577e-05 [auto_parallel]: 1.11e-05 [parallel]: 1.814e-05 [flash_sp]: 1.212e-05 [merge_comm]: 9.96e-06 [allreduce_fusion]: 8.77999e-06 [matmul_add_comm_reduction]: 2.776e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.813e-05 [virtual_dataset]: 1.561e-05 [get_grad_eliminate_]: 1.491e-05 [virtual_output]: 1.507e-05 [merge_forward]: 9.54999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.754e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.824e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.682e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67001e-06 [meta_fg_expand]: 0.00191541 [flash_sp_send_recv_attached]: 3.93001e-06 [receive_attached]: 2.58998e-06 
[after_resolve]: 6.093e-05 [a_after_grad]: 8.081e-05 [renormalize]: 0.00238444 [add_forward_monad_depend]: 9.75002e-06 [auto_monad_grad]: 5.46e-06 [auto_monad_eliminator]: 5.543e-05 [cse]: 0.00016525 [a_3]: 0.0003356 [Cycle 2]: 0.00299499, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.647e-05 [loop_unroll]: 9.537e-05 [a_1]: 0.00152408 [with_stream_mark]: 1.249e-05 [recompute_prepare]: 1.067e-05 [updatestate_depend_eliminate]: 5.64e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012626 [accelerated_algorithm]: 1.193e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.27999e-06 [merge_send_recv]: 6.61e-06 [auto_parallel]: 7.3e-06 [parallel]: 4.75999e-06 [flash_sp]: 3.04001e-06 [merge_comm]: 5.26002e-06 [allreduce_fusion]: 4.74e-06 [matmul_add_comm_reduction]: 7.73001e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.033e-05 [virtual_dataset]: 8.90001e-06 [get_grad_eliminate_]: 8.70001e-06 [virtual_output]: 8.37e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.602e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44e-06 [meta_fg_expand]: 3.507e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.492e-05 [a_after_grad]: 1.431e-05 [renormalize]: 0.00057881 [add_forward_monad_depend]: 3.73999e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.455e-05 [cse]: 4.568e-05 [a_3]: 6.469e-05 [Cycle 3]: 0.00089856, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 1.07e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.0002491 [with_stream_mark]: 1.003e-05 [recompute_prepare]: 8.99003e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 3.97998e-06 [updatestate_loads_eliminate]: 3.85e-06 
[parameter_eliminate]: 8.30012e-07 [a_2]: 0.00012253 [accelerated_algorithm]: 1.207e-05 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 7.15e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.54002e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 4.87998e-06 [allreduce_fusion]: 4.95001e-06 [matmul_add_comm_reduction]: 7.85e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.80001e-06 [get_grad_eliminate_]: 8.50999e-06 [virtual_output]: 8.29002e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 8.62e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.564e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.427e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 1.321e-05 [a_after_grad]: 1.409e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 1.099e-05 [cse]: 2.611e-05 [a_3]: 6.008e-05 [py_interpret_to_execute_after_opt_a]: 1.045e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 4.708e-05 [convert_after_rewriter]: 8.96002e-06 [order_py_execute_after_rewriter]: 6.91001e-06 [mutable_eliminate]: 0.00049842 [opt_b]: 0.00028652, [1] [Cycle 1]: 0.00028039, [7] [b_1]: 0.00018836 [b_2]: 1.07e-05 [updatestate_depend_eliminate]: 7.33999e-06 [updatestate_assign_eliminate]: 4.24002e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.69997e-07 [cse]: 3.073e-05 [optimize_parallel_all_gather_comm]: 2.083e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.041e-05 [loop_unroll]: 0.00042338 [opt_after_cconv]: 0.00013496, [1] [Cycle 1]: 0.00012896, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 7.34002e-06 [updatestate_assign_eliminate]: 4.27e-06 
[updatestate_loads_eliminate]: 3.86001e-06 [cse]: 2.895e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 2.911e-05 [tuple_transform]: 0.00010225, [1] [Cycle 1]: 9.742e-05, [4] [d_1]: 6.705e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.67999e-06 [partial_unused_args_eliminate]: 1.66002e-06 [add_recomputation]: 5.738e-05 [cse_after_recomputation]: 3.257e-05, [1] [Cycle 1]: 2.733e-05, [1] [cse]: 2.155e-05 [environ_conv]: 8.40999e-06 [swap_dp_allreduce_reducescatter]: 7.89997e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.02002e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.42001e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.29984e-07 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.42999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.722e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 4.87e-06 [overlap_recompute_and_grad_model_parallel]: 5.46e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 5.00999e-06 [overlap_grad_flash_sp]: 2.421e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 9.851e-05, [1] [Cycle 1]: 9.444e-05, [6] [build]: 9.74e-06 [elim_shapecalc]: 1.355e-05 [elim_not_effective]: 1.767e-05 [opt_reshape]: 1.013e-05 [fold_const_symbol]: 1.472e-05 [renormalize]: 
1.69995e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.416e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00046839 [validate]: 4.44e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00805307 [execute]: 6.90002e-06 Sums bootstrap : 0.000500s : 1.56% type_inference : 0.010282s : 31.99% event_method : 0.000041s : 0.13% auto_monad : 0.000116s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000125s : 0.39% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.38% optimize.opt_a.loop_unroll : 0.000158s : 0.49% optimize.opt_a.a_1 : 0.003112s : 9.68% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001953s : 6.08% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.28% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.002963s : 9.22% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000237s : 0.74% optimize.opt_a.a_3 : 0.000460s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000498s : 1.55% optimize.opt_b.b_1 : 0.000188s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000423s : 1.32% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000468s : 1.46% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008053s : 25.06% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000724 218 5.88% : 0.000043s : 11: substitution.arithmetic_simplify 1.89% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.70% : 0.000396s : 16: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.07% : 0.000015s : 3: substitution.less_batch_normalization 1.80% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000005s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.28% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.78% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.48% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010214 2 87.25% : 0.008911s : 1: type_inference.infer 12.75% : 0.001302s : 1: type_inference.specialize ------[replace.] 0.000198 30 59.61% : 0.000118s : 16: replace.inline 40.39% : 0.000080s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000418 30 92.83% : 0.000388s : 16: match.inline 7.17% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000741 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 99: predicate.arithmetic_simplify 1.18% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.20% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.51% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.65% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.96% : 0.000015s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.10% : 0.000008s : 67: predicate.transpose_eliminate 1.51% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001459 32 57.55% : 0.000840s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.45% : 0.000619s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061121 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.97% : 0.003035s : 1: add_attr 4.95% : 0.003026s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000123s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.87% : 0.000534s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.83% : 0.000507s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.87% : 0.004810s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 18.49% : 0.011299s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000478s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 22.21% : 0.013576s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.58% : 0.001575s : 2: renormalize.infer 2.25% : 0.001375s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.21% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.19% : 0.008063s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 16.85% : 0.010296s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x3-kbk],max_mem:44.0M TotalTime = 0.0794448, [24] [bootstrap]: 0.00055467 [type_inference]: 0.0062512 [event_method]: 1.402e-05 [auto_monad]: 5.696e-05 [graph_reusing]: 5.76e-06 [inline]: 2.21e-06 [add_attr]: 0.00344708, [1] [add_attr_with_inline]: 0.00343617, [1] [Cycle 1]: 4.471e-05, [2] [tag_attr]: 1.545e-05 [meta_addattr_fg_expand]: 4.4e-06 [parallel-infer-symbol]: 2.64999e-06 [pre_auto_parallel]: 2.733e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 2.11e-06 [optimize]: 0.00399213, [53] [py_interpret_to_execute]: 1.962e-05 [rewriter_before_opt_a]: 5.743e-05 [opt_a]: 0.00215325, [2] [Cycle 1]: 0.00155934, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 3.134e-05 [loop_unroll]: 2.15e-05 [a_1]: 0.00045202 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.95998e-06 [updatestate_assign_eliminate]: 3.25998e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.588e-05 [accelerated_algorithm]: 6.69001e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 7.75e-06 [auto_parallel]: 5.63002e-06 [parallel]: 2.323e-05 [flash_sp]: 6.86999e-06 [merge_comm]: 4.07998e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.70999e-06 [allreduce_slice_to_reducescatter]: 7.40023e-07 [virtual_shard_identity]: 7.46999e-06 [virtual_dataset]: 6.04001e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.94e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.098e-05 
[merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.14998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 9.99999e-06 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00041441 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.386e-05 [cse]: 2.64e-05 [a_3]: 4.038e-05 [Cycle 2]: 0.0005844, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012459 [with_stream_mark]: 9.17001e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.70997e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.714e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.12998e-06 [auto_parallel]: 5.06002e-06 [parallel]: 4.63001e-06 [flash_sp]: 2.93998e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.99003e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.44998e-06 [offload_activation]: 5.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.10001e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.73999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.10002e-06 [meta_fg_expand]: 1.71002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.94002e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 5.86998e-06 [cse]: 1.285e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 6.96001e-06 
[slice_cell_reuse_recomputed_activation]: 1.86003e-06 [rewriter_after_opt_a]: 3.071e-05 [convert_after_rewriter]: 6.64999e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.00044975 [opt_b]: 0.00018477, [1] [Cycle 1]: 0.00017859, [7] [b_1]: 0.00010678 [b_2]: 7.01999e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 5.69999e-07 [cse]: 1.98e-05 [optimize_parallel_all_gather_comm]: 1.555e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.248e-05 [loop_unroll]: 0.0004151 [opt_after_cconv]: 9.548e-05, [1] [Cycle 1]: 8.978e-05, [7] [c_1]: 2.797e-05 [parameter_eliminate]: 2.48e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.589e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.217e-05 [tuple_transform]: 6.877e-05, [1] [Cycle 1]: 6.456e-05, [4] [d_1]: 3.9e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.909e-05 [cse_after_recomputation]: 2.04e-05, [1] [Cycle 1]: 1.586e-05, [1] [cse]: 1.045e-05 [environ_conv]: 4.50001e-06 [swap_dp_allreduce_reducescatter]: 4.87998e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.14998e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 1.24998e-06 [interleave_split_concat_branches]: 1.44e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 
[control_data_broadcast_order]: 1.119e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 7.7e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.79e-05, [1] [Cycle 1]: 6.401e-05, [6] [build]: 2.02999e-06 [elim_shapecalc]: 8.25999e-06 [elim_not_effective]: 1.227e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.77999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.459e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00045164 [validate]: 3.012e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.0643622 [execute]: 9.09e-06 Sums bootstrap : 0.000555s : 0.74% type_inference : 0.006251s : 8.34% event_method : 0.000014s : 0.02% auto_monad : 0.000057s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000577s : 0.77% optimize.opt_a.with_stream_mark : 
0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000414s : 0.55% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.60% optimize.opt_b.b_1 : 0.000107s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000020s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.07% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000452s : 0.60% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064362s : 85.84% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000161 30 14.57% : 0.000023s : 5: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 66.93% : 0.000108s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.53% : 0.000004s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 6.70% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006203 2 90.97% : 0.005642s : 1: type_inference.infer 9.03% : 0.000560s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.20% : 0.000027s : 3: replace.inline 29.80% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.56% : 0.000106s : 3: match.inline 8.44% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.09% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.92% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000009s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.65% : 0.000003s : 21: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.46% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.23% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 46.77% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.23% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088391 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.90% : 0.003451s : 1: add_attr 3.89% : 0.003440s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000062s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000592s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.06% : 0.000941s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.44% : 0.002156s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.52% : 0.000461s : 1: opt_after_jit_grad 0.21% : 0.000188s : 1: opt_b 4.52% : 0.003996s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000212s : 1: renormalize.infer 0.22% : 0.000195s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 72.83% : 0.064378s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 7.09% : 0.006265s : 1: type_inference 0.07% : 0.000059s : 1: validate TotalTime = 0.0712744, [24] [bootstrap]: 0.00047226 [type_inference]: 0.00439115 [event_method]: 1.104e-05 [auto_monad]: 4.958e-05 [graph_reusing]: 4.90999e-06 [inline]: 1.71998e-06 [add_attr]: 0.00298181, [1] [add_attr_with_inline]: 0.00297318, [1] [Cycle 1]: 8.425e-05, [2] [tag_attr]: 1.18e-05 [meta_addattr_fg_expand]: 3.08e-06 [parallel-infer-symbol]: 2.50002e-06 [pre_auto_parallel]: 2.139e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00364766, [53] [py_interpret_to_execute]: 1.527e-05 [rewriter_before_opt_a]: 3.956e-05 [opt_a]: 0.00185995, [2] [Cycle 1]: 0.00125426, [45] [expand_dump_flag]: 2.96001e-06 [switch_simplify]: 2.381e-05 [loop_unroll]: 1.409e-05 [a_1]: 0.00029247 [with_stream_mark]: 1.333e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.34001e-06 [updatestate_loads_eliminate]: 2.90998e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.464e-05 [accelerated_algorithm]: 6.12001e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 8.19998e-06 [auto_parallel]: 5.62999e-06 [parallel]: 1.83e-05 [flash_sp]: 7.51999e-06 [merge_comm]: 3.50998e-06 [allreduce_fusion]: 3.27002e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 7.09001e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.26002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.91e-06 [before_grad]: 9.15001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.21e-06 
[receive_attached]: 2.14e-06 [after_resolve]: 1.052e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00034607 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.318e-05 [cse]: 2.63e-05 [a_3]: 4.056e-05 [Cycle 2]: 0.00059679, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.00012463 [with_stream_mark]: 1.086e-05 [recompute_prepare]: 5.98002e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.754e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.03001e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.18999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.05998e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.89e-06 [a_after_grad]: 8e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 5.97999e-06 [cse]: 1.377e-05 [a_3]: 3.33e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 2.19999e-06 [rewriter_after_opt_a]: 3.045e-05 [convert_after_rewriter]: 6.95002e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00044009 [opt_b]: 0.00018091, [1] [Cycle 1]: 
0.00017483, [7] [b_1]: 0.00010798 [b_2]: 6.96999e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 3.30008e-07 [cse]: 1.64e-05 [optimize_parallel_all_gather_comm]: 1.604e-05 [overlap_param_gather]: 2.28998e-06 [cconv]: 2.217e-05 [loop_unroll]: 0.00040786 [opt_after_cconv]: 9.435e-05, [1] [Cycle 1]: 8.864e-05, [7] [c_1]: 2.801e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.65002e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.599e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.204e-05 [tuple_transform]: 6.874e-05, [1] [Cycle 1]: 6.432e-05, [4] [d_1]: 3.906e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 5.96998e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 4.245e-05 [cse_after_recomputation]: 1.982e-05, [1] [Cycle 1]: 1.547e-05, [1] [cse]: 1.057e-05 [environ_conv]: 4.64002e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.14998e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.164e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.723e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.782e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.105e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 8.48001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.54e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.13e-06 [opt_after_jit_grad]: 0.00046647 [validate]: 3.098e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.0589558 [execute]: 8.57998e-06 Sums bootstrap : 0.000472s : 0.70% type_inference : 0.004391s : 6.52% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000417s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000346s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000440s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000408s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000466s : 0.69% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058956s : 87.56% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.29% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.60% : 0.000006s : 4: substitution.graph_param_transform 66.12% : 0.000079s : 2: substitution.inline 2.23% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 2.97% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004352 2 91.71% : 0.003991s : 1: type_inference.infer 8.29% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000139 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.08% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.76% : 0.000002s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.97% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.82% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.90% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 1.05% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.67% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.79% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.62% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 1.08% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.00% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000260 6 43.74% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.26% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079175 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.77% : 0.002986s : 1: add_attr 3.76% : 0.002977s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.64% : 0.000506s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000417s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000449s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000768s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.35% : 0.001863s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.60% : 0.000476s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.61% : 0.003651s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000184s : 1: renormalize.infer 0.20% : 0.000156s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.48% : 0.058973s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.56% : 0.004405s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.072296, [24] [bootstrap]: 0.00046296 [type_inference]: 0.0055461 [event_method]: 1.429e-05 [auto_monad]: 5.304e-05 [graph_reusing]: 5.99999e-06 [inline]: 1.95001e-06 [add_attr]: 0.00295319, [1] [add_attr_with_inline]: 0.00294529, [1] [Cycle 1]: 4.396e-05, [2] [tag_attr]: 1.537e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.431e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.83002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0039725, [53] [py_interpret_to_execute]: 1.995e-05 [rewriter_before_opt_a]: 5.832e-05 [opt_a]: 0.00211957, [2] [Cycle 1]: 0.0015127, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 3.115e-05 [loop_unroll]: 2.113e-05 [a_1]: 0.00044833 [with_stream_mark]: 1.356e-05 [recompute_prepare]: 7.69002e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.611e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.73001e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.816e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.67998e-06 [allreduce_fusion]: 3.80998e-06 [matmul_add_comm_reduction]: 8.25999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.13998e-06 [virtual_dataset]: 5.73002e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 1.67999e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.20002e-06 
[after_resolve]: 1.042e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.00042258 [add_forward_monad_depend]: 4.68001e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.756e-05 [a_3]: 4.264e-05 [Cycle 2]: 0.00059756, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.66001e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012508 [with_stream_mark]: 9.62999e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.774e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.30001e-06 [parallel]: 3.81001e-06 [flash_sp]: 3.21999e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.09998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.48e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.53998e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 5.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62001e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.70001e-06 [a_after_grad]: 8.05e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.357e-05 [a_3]: 3.16e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.041e-05 [convert_after_rewriter]: 7.48999e-06 [order_py_execute_after_rewriter]: 5.65001e-06 [mutable_eliminate]: 0.00044982 [opt_b]: 0.00018101, [1] [Cycle 1]: 0.0001751, [7] [b_1]: 
0.00010795 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.10015e-07 [cse]: 1.641e-05 [optimize_parallel_all_gather_comm]: 1.592e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.212e-05 [loop_unroll]: 0.00041162 [opt_after_cconv]: 9.606e-05, [1] [Cycle 1]: 9.014e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.26003e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.657e-05 [renormalize]: 4.59986e-07 [remove_dup_value]: 1.22e-05 [tuple_transform]: 7.046e-05, [1] [Cycle 1]: 6.609e-05, [4] [d_1]: 3.993e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 2.03997e-06 [add_recomputation]: 4.394e-05 [cse_after_recomputation]: 2.083e-05, [1] [Cycle 1]: 1.645e-05, [1] [cse]: 1.143e-05 [environ_conv]: 4.53999e-06 [swap_dp_allreduce_reducescatter]: 5.54e-06 [bias_add_comm_swap]: 2.88e-06 [label_micro_interleaved_index]: 4.17998e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.90002e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.47001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 2.11e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.21998e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 1.702e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.19998e-06 [symbol_engine_optimizer]: 7.032e-05, [1] [Cycle 1]: 6.593e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 9.34998e-06 [elim_not_effective]: 1.174e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 1.70025e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.56e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.86999e-06 [opt_after_jit_grad]: 0.00044475 [validate]: 3.179e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0585442 [execute]: 8.23999e-06 Sums bootstrap : 0.000463s : 0.68% type_inference : 0.005546s : 8.11% event_method : 0.000014s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000573s : 0.84% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000423s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.66% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000445s : 0.65% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058544s : 85.64% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000163 30 15.25% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.54% : 0.000006s : 4: substitution.graph_param_transform 66.50% : 0.000108s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 6.44% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005505 2 90.19% : 0.004965s : 1: type_inference.infer 9.81% : 0.000540s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.42% : 0.000027s : 3: replace.inline 29.58% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.84% : 0.000106s : 3: match.inline 8.16% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.03% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.81% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.61% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.31% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.52% : 0.000002s : 21: predicate.replace_applicator 0.90% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.20% : 0.000008s : 54: predicate.switch_simplify 0.78% : 
0.000001s : 11: predicate.tile_eliminate 1.03% : 0.000002s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.54% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.46% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080741 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.66% : 0.002957s : 1: add_attr 3.65% : 0.002949s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.62% : 0.000498s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.03% : 0.000021s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000420s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.16% : 0.000939s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.63% : 0.002123s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.56% : 0.000454s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.92% : 0.003976s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000208s : 1: renormalize.infer 0.26% : 0.000208s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.08% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 72.53% : 0.058562s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 6.89% : 0.005560s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.108953, [24] [bootstrap]: 0.00048043 [type_inference]: 0.0114018 [event_method]: 4.849e-05 [auto_monad]: 0.00011945 [graph_reusing]: 8.19998e-06 [inline]: 1.82999e-06 [add_attr]: 0.00304043, [1] [add_attr_with_inline]: 0.00303191, [1] [Cycle 1]: 7.029e-05, [2] [tag_attr]: 3.366e-05 [meta_addattr_fg_expand]: 9.48002e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 4.906e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.83002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0133437, [53] [py_interpret_to_execute]: 3.835e-05 [rewriter_before_opt_a]: 0.00014474 [opt_a]: 0.0110759, [3] [Cycle 1]: 0.00712763, [45] [expand_dump_flag]: 3.92002e-06 [switch_simplify]: 7.357e-05 [loop_unroll]: 6.195e-05 [a_1]: 0.00144265 [with_stream_mark]: 2.36e-05 [recompute_prepare]: 2.174e-05 [updatestate_depend_eliminate]: 9.06002e-06 [updatestate_assign_eliminate]: 7.7e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.57001e-06 [a_2]: 0.00024437 [accelerated_algorithm]: 2.99e-05 [shard]: 1.82999e-06 [meta_shard_fg_expand]: 3.24001e-06 [shard_inline]: 1.587e-05 [merge_send_recv]: 1.622e-05 [auto_parallel]: 1.066e-05 [parallel]: 1.783e-05 [flash_sp]: 1.094e-05 [merge_comm]: 9.87999e-06 [allreduce_fusion]: 8.64998e-06 [matmul_add_comm_reduction]: 2.628e-05 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 1.751e-05 [virtual_dataset]: 1.574e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.492e-05 [merge_forward]: 3.262e-05 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 1.795e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.859e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 2.722e-05 [set_forward_comm_id_for_comm_node_pass]: 9.39e-06 [meta_fg_expand]: 0.00140503 [flash_sp_send_recv_attached]: 3.78001e-06 [receive_attached]: 2.39999e-06 
[after_resolve]: 5.846e-05 [a_after_grad]: 8.07e-05 [renormalize]: 0.00249371 [add_forward_monad_depend]: 9.32001e-06 [auto_monad_grad]: 5.10999e-06 [auto_monad_eliminator]: 5.645e-05 [cse]: 0.00016661 [a_3]: 0.00033808 [Cycle 2]: 0.00300782, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.708e-05 [loop_unroll]: 4.424e-05 [a_1]: 0.00153156 [with_stream_mark]: 1.206e-05 [recompute_prepare]: 1.115e-05 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 4.37e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 9.60019e-07 [a_2]: 0.00012667 [accelerated_algorithm]: 1.239e-05 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 9.12999e-06 [merge_send_recv]: 7.00998e-06 [auto_parallel]: 7.27002e-06 [parallel]: 4.92e-06 [flash_sp]: 3.45998e-06 [merge_comm]: 5.16002e-06 [allreduce_fusion]: 4.65999e-06 [matmul_add_comm_reduction]: 7.4e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.022e-05 [virtual_dataset]: 8.77999e-06 [get_grad_eliminate_]: 8.94998e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.53001e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.11002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.652e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.374e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 6.76e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.598e-05 [a_after_grad]: 1.461e-05 [renormalize]: 0.00059867 [add_forward_monad_depend]: 4.18999e-06 [auto_monad_grad]: 1.33002e-06 [auto_monad_eliminator]: 1.454e-05 [cse]: 4.618e-05 [a_3]: 6.586e-05 [Cycle 3]: 0.00092511, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.052e-05 [loop_unroll]: 9.14e-06 [a_1]: 0.00026947 [with_stream_mark]: 1.016e-05 [recompute_prepare]: 9.49e-06 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 
3.8e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 0.00012416 [accelerated_algorithm]: 1.152e-05 [shard]: 9.10019e-07 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 6.99001e-06 [auto_parallel]: 7.1e-06 [parallel]: 4.48001e-06 [flash_sp]: 9.79984e-07 [merge_comm]: 4.90001e-06 [allreduce_fusion]: 4.74e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.84999e-06 [virtual_dataset]: 8.74998e-06 [get_grad_eliminate_]: 8.48999e-06 [virtual_output]: 8.29002e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 8.45999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.58e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.424e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.00998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.361e-05 [a_after_grad]: 1.447e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.124e-05 [cse]: 2.713e-05 [a_3]: 5.982e-05 [py_interpret_to_execute_after_opt_a]: 1.019e-05 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 4.768e-05 [convert_after_rewriter]: 9.54999e-06 [order_py_execute_after_rewriter]: 6.68e-06 [mutable_eliminate]: 0.00046095 [opt_b]: 0.00028994, [1] [Cycle 1]: 0.00028373, [7] [b_1]: 0.000191 [b_2]: 1.082e-05 [updatestate_depend_eliminate]: 6.96001e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.96001e-06 [renormalize]: 2.00002e-07 [cse]: 3.182e-05 [optimize_parallel_all_gather_comm]: 2.103e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 1.952e-05 [loop_unroll]: 0.00042492 [opt_after_cconv]: 0.00013619, [1] [Cycle 1]: 0.00013038, [7] [c_1]: 4.882e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 6.98998e-06 [updatestate_assign_eliminate]: 4.08999e-06 
[updatestate_loads_eliminate]: 3.91001e-06 [cse]: 3.003e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 2.822e-05 [tuple_transform]: 0.00010288, [1] [Cycle 1]: 9.797e-05, [4] [d_1]: 6.79e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.74999e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.648e-05 [cse_after_recomputation]: 3.316e-05, [1] [Cycle 1]: 2.839e-05, [1] [cse]: 2.256e-05 [environ_conv]: 8.85001e-06 [swap_dp_allreduce_reducescatter]: 7.90998e-06 [bias_add_comm_swap]: 2.14999e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.55999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.18001e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.81e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91998e-06 [control_data_broadcast_order]: 1.634e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.99998e-06 [overlap_recompute_and_grad_model_parallel]: 6.17999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.448e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.81998e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 9.851e-05, [1] [Cycle 1]: 9.41e-05, [6] [build]: 9.64999e-06 [elim_shapecalc]: 1.395e-05 [elim_not_effective]: 1.826e-05 [opt_reshape]: 9.94001e-06 [fold_const_symbol]: 1.418e-05 
[renormalize]: 2.29978e-07 [detach_backward]: 1.99999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.55e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.0004688 [validate]: 4.45e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0796848 [execute]: 8.51002e-06 Sums bootstrap : 0.000480s : 0.46% type_inference : 0.011402s : 10.89% event_method : 0.000048s : 0.05% auto_monad : 0.000119s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000145s : 0.14% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.13% optimize.opt_a.loop_unroll : 0.000115s : 0.11% optimize.opt_a.a_1 : 0.003244s : 3.10% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000031s : 0.03% optimize.opt_a.merge_forward : 0.000041s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001476s : 1.41% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003092s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000240s : 0.23% optimize.opt_a.a_3 : 0.000464s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.05% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000461s : 0.44% optimize.opt_b.b_1 : 0.000191s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000425s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.45% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079685s : 76.14% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000754 222 5.89% : 0.000044s : 12: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.64% : 0.000419s : 17: substitution.inline 2.05% : 0.000015s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.88% : 0.000014s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000013s : 20: substitution.remove_not_recompute_node 3.10% : 0.000023s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.81% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011327 2 86.65% : 0.009815s : 1: type_inference.infer 13.35% : 0.001512s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.44% : 0.000126s : 17: replace.inline 42.56% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.25% : 0.000410s : 17: match.inline 7.75% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000757 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.26% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.81% : 0.000014s : 108: predicate.environ_get_eliminate 1.24% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000042s : 249: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.69% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.72% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001599 34 56.78% : 0.000908s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.22% : 0.000691s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.133664 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.28% : 0.003045s : 1: add_attr 2.27% : 0.003036s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000127s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000515s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000019s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000056s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.67% : 0.004908s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000176s : 28: opt.transform.opt_b 0.06% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.29% : 0.011079s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.36% : 0.000478s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 9.99% : 0.013348s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.21% : 0.001618s : 2: renormalize.infer 1.09% : 0.001461s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.11% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 59.63% : 0.079702s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 8.54% : 0.011416s : 1: type_inference 0.05% : 0.000068s : 1: validate TotalTime = 0.0702974, [24] [bootstrap]: 0.0004527 [type_inference]: 0.00429338 [event_method]: 1.007e-05 [auto_monad]: 5.219e-05 [graph_reusing]: 5.22999e-06 [inline]: 2.06e-06 [add_attr]: 0.00300179, [1] [add_attr_with_inline]: 0.00299417, [1] [Cycle 1]: 4.624e-05, [2] [tag_attr]: 1.19e-05 [meta_addattr_fg_expand]: 3.38e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.123e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.55999e-06 [pipeline_split]: 1.46002e-06 [optimize]: 0.00369601, [53] [py_interpret_to_execute]: 1.494e-05 [rewriter_before_opt_a]: 3.928e-05 [opt_a]: 0.0018951, [2] [Cycle 1]: 0.00129687, [45] [expand_dump_flag]: 3.01001e-06 [switch_simplify]: 2.314e-05 [loop_unroll]: 1.369e-05 [a_1]: 0.00029445 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 7.602e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.50997e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 7.66999e-06 [auto_parallel]: 6.01e-06 [parallel]: 1.776e-05 [flash_sp]: 7.57998e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 3.27002e-06 [matmul_add_comm_reduction]: 9.01002e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 6.87002e-06 [virtual_dataset]: 5.57999e-06 [get_grad_eliminate_]: 5.73002e-06 [virtual_output]: 5.44e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.53002e-06 [offload_activation]: 9.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.68e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.09e-06 
[after_resolve]: 1.057e-05 [a_after_grad]: 9.00001e-06 [renormalize]: 0.00034347 [add_forward_monad_depend]: 4.31002e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.312e-05 [cse]: 6.624e-05 [a_3]: 4.15e-05 [Cycle 2]: 0.00058907, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012486 [with_stream_mark]: 9.58002e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.782e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.25001e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.15999e-06 [auto_parallel]: 4.95999e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.13998e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.54999e-06 [matmul_add_comm_reduction]: 5.17999e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.07001e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 4.82998e-06 [virtual_output]: 5.20999e-06 [merge_forward]: 2.48998e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95998e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 6.80011e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 8.66997e-06 [a_after_grad]: 7.78999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 6.49001e-06 [cse]: 1.281e-05 [a_3]: 3.193e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 3.118e-05 [convert_after_rewriter]: 6.89999e-06 [order_py_execute_after_rewriter]: 5.62999e-06 [mutable_eliminate]: 0.00044626 [opt_b]: 0.00018127, [1] [Cycle 1]: 0.0001751, 
[7] [b_1]: 0.00010757 [b_2]: 7.38e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.21998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 3.4002e-07 [cse]: 1.599e-05 [optimize_parallel_all_gather_comm]: 1.592e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.118e-05 [loop_unroll]: 0.00041105 [opt_after_cconv]: 9.512e-05, [1] [Cycle 1]: 8.939e-05, [7] [c_1]: 2.794e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.639e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.271e-05 [tuple_transform]: 6.923e-05, [1] [Cycle 1]: 6.496e-05, [4] [d_1]: 3.902e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 4.378e-05 [cse_after_recomputation]: 2.088e-05, [1] [Cycle 1]: 1.629e-05, [1] [cse]: 1.124e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 5.48002e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.03002e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.64001e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.22999e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.147e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.3e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.759e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.33998e-06 [split_layernorm_comm]: 1.58002e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.891e-05, [1] [Cycle 1]: 6.471e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.70999e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.89998e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 1.568e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.0004455 [validate]: 3.203e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.0580461 [execute]: 8.37998e-06 Sums bootstrap : 0.000453s : 0.68% type_inference : 0.004293s : 6.47% event_method : 0.000010s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.63% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000344s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000079s : 0.12% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.67% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000411s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000446s : 0.67% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058046s : 87.51% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.33% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.10% : 0.000001s : 2: substitution.fold_const_symbol 4.11% : 0.000005s : 4: substitution.graph_param_transform 65.88% : 0.000080s : 2: substitution.inline 2.39% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.18% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004253 2 91.72% : 0.003901s : 1: type_inference.infer 8.28% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.98% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000008s : 44: predicate.inline 1.08% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.06% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 1.06% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.24% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.21% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.02% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.98% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078263 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.84% : 0.003006s : 1: add_attr 3.83% : 0.002997s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000489s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000015s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000455s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000768s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.43% : 0.001898s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.58% : 0.000455s : 1: opt_after_jit_grad 0.24% : 0.000184s : 1: opt_b 4.73% : 0.003700s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000186s : 1: renormalize.infer 0.19% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.19% : 0.058062s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.50% : 0.004306s : 1: type_inference 0.07% : 0.000053s : 1: validate . TotalTime = 0.1088, [24] [bootstrap]: 0.00049229 [type_inference]: 0.010281 [event_method]: 4.265e-05 [auto_monad]: 0.00011508 [graph_reusing]: 7.88001e-06 [inline]: 2.04e-06 [add_attr]: 0.00297384, [1] [add_attr_with_inline]: 0.0029656, [1] [Cycle 1]: 9.951e-05, [2] [tag_attr]: 6.445e-05 [meta_addattr_fg_expand]: 8.29998e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 4.495e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.0130248, [53] [py_interpret_to_execute]: 3.529e-05 [rewriter_before_opt_a]: 0.00012489 [opt_a]: 0.0107766, [3] [Cycle 1]: 0.0068705, [45] [expand_dump_flag]: 3.47997e-06 [switch_simplify]: 6.667e-05 [loop_unroll]: 5.503e-05 [a_1]: 0.00132433 [with_stream_mark]: 2.228e-05 [recompute_prepare]: 2.142e-05 [updatestate_depend_eliminate]: 8.83001e-06 [updatestate_assign_eliminate]: 7.90998e-06 [updatestate_loads_eliminate]: 7.10998e-06 [parameter_eliminate]: 2.45002e-06 [a_2]: 0.0002445 [accelerated_algorithm]: 3.028e-05 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 3.18e-06 [shard_inline]: 1.685e-05 [merge_send_recv]: 1.569e-05 [auto_parallel]: 1.068e-05 [parallel]: 1.805e-05 [flash_sp]: 1.216e-05 [merge_comm]: 9.34e-06 [allreduce_fusion]: 8.82999e-06 [matmul_add_comm_reduction]: 2.566e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.774e-05 [virtual_dataset]: 1.575e-05 [get_grad_eliminate_]: 1.53e-05 [virtual_output]: 1.557e-05 [merge_forward]: 9.14998e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.725e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.829e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 2.839e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67999e-06 [meta_fg_expand]: 0.00137498 [flash_sp_send_recv_attached]: 3.83999e-06 [receive_attached]: 2.17999e-06 
[after_resolve]: 5.943e-05 [a_after_grad]: 8.048e-05 [renormalize]: 0.0024004 [add_forward_monad_depend]: 9.60001e-06 [auto_monad_grad]: 5.18002e-06 [auto_monad_eliminator]: 5.645e-05 [cse]: 0.00016898 [a_3]: 0.00033456 [Cycle 2]: 0.00299378, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.668e-05 [loop_unroll]: 4.341e-05 [a_1]: 0.00156979 [with_stream_mark]: 1.202e-05 [recompute_prepare]: 1.093e-05 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 4.63001e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012734 [accelerated_algorithm]: 1.184e-05 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 9.22999e-06 [merge_send_recv]: 6.59999e-06 [auto_parallel]: 7.61999e-06 [parallel]: 4.90999e-06 [flash_sp]: 2.88e-06 [merge_comm]: 5.30999e-06 [allreduce_fusion]: 4.58001e-06 [matmul_add_comm_reduction]: 7.47002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.97e-06 [get_grad_eliminate_]: 9.24998e-06 [virtual_output]: 8.68001e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.622e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.431e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 3.377e-05 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.465e-05 [a_after_grad]: 1.416e-05 [renormalize]: 0.00058643 [add_forward_monad_depend]: 4.05e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.457e-05 [cse]: 4.516e-05 [a_3]: 6.479e-05 [Cycle 3]: 0.00089865, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 1.079e-05 [loop_unroll]: 9.10001e-06 [a_1]: 0.00024913 [with_stream_mark]: 9.75002e-06 [recompute_prepare]: 9.57999e-06 [updatestate_depend_eliminate]: 4.76002e-06 [updatestate_assign_eliminate]: 3.81001e-06 [updatestate_loads_eliminate]: 
3.77998e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 0.00012314 [accelerated_algorithm]: 1.157e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 6.70002e-06 [auto_parallel]: 6.84999e-06 [parallel]: 4.53001e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.86002e-06 [allreduce_fusion]: 4.85001e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.024e-05 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.32e-06 [virtual_output]: 8.17e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 8.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.372e-05 [set_forward_comm_id_for_comm_node_pass]: 5.16998e-06 [meta_fg_expand]: 2.94001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.486e-05 [a_after_grad]: 1.445e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.26002e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 1.046e-05 [cse]: 2.616e-05 [a_3]: 5.935e-05 [py_interpret_to_execute_after_opt_a]: 1.027e-05 [slice_cell_reuse_recomputed_activation]: 1.98002e-06 [rewriter_after_opt_a]: 4.676e-05 [convert_after_rewriter]: 9.10999e-06 [order_py_execute_after_rewriter]: 6.63003e-06 [mutable_eliminate]: 0.00045435 [opt_b]: 0.00028857, [1] [Cycle 1]: 0.00028236, [7] [b_1]: 0.00018947 [b_2]: 1.072e-05 [updatestate_depend_eliminate]: 7e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 4.02e-06 [renormalize]: 8.30012e-07 [cse]: 3.167e-05 [optimize_parallel_all_gather_comm]: 2.003e-05 [overlap_param_gather]: 2.37999e-06 [cconv]: 2.041e-05 [loop_unroll]: 0.00042204 [opt_after_cconv]: 0.00013608, [1] [Cycle 1]: 0.00013038, [7] [c_1]: 4.841e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 6.87002e-06 [updatestate_assign_eliminate]: 4.28001e-06 
[updatestate_loads_eliminate]: 3.85e-06 [cse]: 3.11e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 2.863e-05 [tuple_transform]: 0.00010091, [1] [Cycle 1]: 9.628e-05, [4] [d_1]: 6.664e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.39998e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 5.643e-05 [cse_after_recomputation]: 3.286e-05, [1] [Cycle 1]: 2.819e-05, [1] [cse]: 2.249e-05 [environ_conv]: 9.35001e-06 [swap_dp_allreduce_reducescatter]: 8.52e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.47001e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.47999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.723e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.62e-06 [overlap_recompute_and_grad_model_parallel]: 5.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.56998e-06 [overlap_grad_ring_attention]: 5.19998e-06 [overlap_grad_flash_sp]: 2.415e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.86e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 9.815e-05, [1] [Cycle 1]: 9.347e-05, [6] [build]: 9.76e-06 [elim_shapecalc]: 1.325e-05 [elim_not_effective]: 1.82e-05 [opt_reshape]: 9.56e-06 [fold_const_symbol]: 1.44e-05 [renormalize]: 2.10013e-07 
[detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 2.375e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00046929 [validate]: 4.525e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.0810462 [execute]: 8.03001e-06 Sums bootstrap : 0.000492s : 0.47% type_inference : 0.010281s : 9.83% event_method : 0.000043s : 0.04% auto_monad : 0.000115s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000064s : 0.06% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000125s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003143s : 3.01% optimize.opt_a.with_stream_mark : 0.000044s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001412s : 1.35% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.002987s : 2.86% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.08% optimize.opt_a.cse : 0.000240s : 0.23% optimize.opt_a.a_3 : 0.000459s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.43% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000422s : 0.40% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000009s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.45% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.081046s : 77.50% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000732 218 5.85% : 0.000043s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 54.87% : 0.000402s : 16: substitution.inline 2.11% : 0.000015s : 2: substitution.inline_without_move 1.50% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.03% : 0.000015s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.78% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000013s : 20: substitution.remove_not_recompute_node 3.26% : 0.000024s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.49% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010212 2 86.54% : 0.008838s : 1: type_inference.infer 13.46% : 0.001374s : 1: type_inference.specialize ------[replace.] 0.000199 30 58.79% : 0.000117s : 16: replace.inline 41.21% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000424 30 92.80% : 0.000393s : 16: match.inline 7.20% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.14% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.64% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.18% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.69% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.33% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.96% : 0.000014s : 115: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.31% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001511 32 57.21% : 0.000864s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.79% : 0.000646s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.132903 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.24% : 0.002978s : 1: add_attr 2.23% : 0.002969s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000527s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000049s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.61% : 0.004791s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000052s : 4: opt.transform.symbol_engine_opt 8.11% : 0.010779s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.36% : 0.000479s : 1: opt_after_jit_grad 0.22% : 0.000292s : 1: opt_b 9.80% : 0.013029s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000023s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000049s : 1: pre_auto_parallel 0.03% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.18% : 0.001567s : 2: renormalize.infer 1.06% : 0.001407s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.10% : 0.000129s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 60.99% : 0.081062s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 7.75% : 0.010295s : 1: type_inference 0.05% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x3-ge],max_mem:44.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x4-pynative],max_mem:44.0M TotalTime = 0.0212564, [24] [bootstrap]: 0.00053639 [type_inference]: 0.00612709 [event_method]: 1.438e-05 [auto_monad]: 5.455e-05 [graph_reusing]: 4.97e-06 [inline]: 1.77999e-06 [add_attr]: 0.00341565, [1] [add_attr_with_inline]: 0.00340422, [1] [Cycle 1]: 4.326e-05, [2] [tag_attr]: 1.496e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 2.47001e-06 [pre_auto_parallel]: 2.711e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00397304, [53] [py_interpret_to_execute]: 1.936e-05 [rewriter_before_opt_a]: 5.834e-05 [opt_a]: 0.00213496, [2] [Cycle 1]: 0.00153655, [45] [expand_dump_flag]: 3.20002e-06 [switch_simplify]: 3.276e-05 [loop_unroll]: 2.096e-05 [a_1]: 0.00044819 [with_stream_mark]: 1.323e-05 [recompute_prepare]: 7.68001e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.40998e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.438e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 7.78999e-06 [auto_parallel]: 5.65001e-06 [parallel]: 2.283e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.48999e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 8.18001e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 6.94999e-06 [virtual_dataset]: 
5.87999e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 3.82002e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.064e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 8.92999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.28998e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 9.84001e-06 [a_after_grad]: 8.58001e-06 [renormalize]: 0.00044769 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 2.12999e-06 [auto_monad_eliminator]: 1.469e-05 [cse]: 2.818e-05 [a_3]: 3.979e-05 [Cycle 2]: 0.00058888, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012705 [with_stream_mark]: 9.49e-06 [recompute_prepare]: 5.61998e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.681e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.13001e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.37e-06 [flash_sp]: 2.91e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.82e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.90002e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 5.68002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95002e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.90999e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.25002e-06 [cse]: 1.3e-05 [a_3]: 3.107e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 2.95e-05 [convert_after_rewriter]: 7.21001e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00044897 [opt_b]: 0.00018144, [1] [Cycle 1]: 0.00017561, [7] [b_1]: 0.00010774 [b_2]: 7.15998e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 4.2998e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.578e-05 [overlap_param_gather]: 1.73997e-06 [cconv]: 2.211e-05 [loop_unroll]: 0.00041696 [opt_after_cconv]: 9.622e-05, [1] [Cycle 1]: 9.041e-05, [7] [c_1]: 2.811e-05 [parameter_eliminate]: 2.10002e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.636e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.263e-05 [tuple_transform]: 6.856e-05, [1] [Cycle 1]: 6.415e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.03998e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.754e-05 [cse_after_recomputation]: 2.038e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.77998e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.75999e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.10999e-06 [slice_recompute_activation]: 2.65997e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 
9.50007e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.112e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.20999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.761e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.40002e-06 [split_layernorm_comm]: 2.26e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 6.853e-05, [1] [Cycle 1]: 6.434e-05, [6] [build]: 2.49999e-06 [elim_shapecalc]: 8.71002e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.71997e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.61998e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00044978 [validate]: 3.057e-05 [backend_pass]: 9.19972e-07 [task_emit]: 0.00638348 [execute]: 6.29999e-06 Sums bootstrap : 0.000536s : 3.18% type_inference : 0.006127s : 36.30% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000002s : 0.01% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% 
optimize.rewriter_before_opt_a : 0.000058s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000575s : 3.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000141s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% 
optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000448s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000071s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.66% optimize.opt_b.b_1 : 0.000108s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000417s : 2.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% 
optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% 
optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 2.66% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006383s : 37.82% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000162 30 14.95% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 4: substitution.graph_param_transform 66.49% : 0.000108s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.75% : 0.000004s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 6.75% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006083 2 90.52% : 0.005506s : 1: type_inference.infer 9.48% : 0.000576s : 1: type_inference.specialize ------[replace.] 0.000039 5 68.98% : 0.000027s : 3: replace.inline 31.02% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.47% : 0.000105s : 3: match.inline 8.53% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.97% : 0.000002s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.95% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.34% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.55% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000385 8 48.21% : 0.000186s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.79% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030180 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.33% : 0.003420s : 1: add_attr 11.29% : 0.003408s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.90% : 0.000572s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.41% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000935s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.08% : 0.002138s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.52% : 0.000459s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.18% : 0.003977s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000213s : 1: renormalize.infer 0.75% : 0.000228s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 21.19% : 0.006394s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.34% : 0.006140s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.018098, [24] [bootstrap]: 0.00047491 [type_inference]: 0.00431355 [event_method]: 1.084e-05 [auto_monad]: 4.982e-05 [graph_reusing]: 5.13002e-06 [inline]: 1.99e-06 [add_attr]: 0.00293284, [1] [add_attr_with_inline]: 0.00292481, [1] [Cycle 1]: 4.535e-05, [2] [tag_attr]: 1.204e-05 [meta_addattr_fg_expand]: 3.25e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.14e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 9.39996e-07 [dataset_repeat_opt]: 1.83997e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.00366953, [53] [py_interpret_to_execute]: 1.516e-05 [rewriter_before_opt_a]: 3.735e-05 [opt_a]: 0.00187802, [2] [Cycle 1]: 0.00128281, [45] [expand_dump_flag]: 2.32999e-06 [switch_simplify]: 2.321e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00028664 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 7.47998e-06 [updatestate_depend_eliminate]: 3.44001e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.71998e-06 [a_2]: 7.669e-05 [accelerated_algorithm]: 6.86999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.46002e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 7.58001e-06 [auto_parallel]: 5.83002e-06 [parallel]: 1.71e-05 [flash_sp]: 7.25998e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 9.42001e-06 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 7.42998e-06 [virtual_dataset]: 5.71998e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.44e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.29003e-06 [offload_activation]: 9.53002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.71e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.13002e-06 [receive_attached]: 2.38002e-06 
[after_resolve]: 1.06e-05 [a_after_grad]: 8.50001e-06 [renormalize]: 0.00033514 [add_forward_monad_depend]: 4.72998e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.253e-05 [cse]: 2.537e-05 [a_3]: 3.987e-05 [Cycle 2]: 0.0005858, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.64001e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012374 [with_stream_mark]: 9.17001e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.73e-05 [accelerated_algorithm]: 5.34e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.00999e-06 [parallel]: 3.98999e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.63998e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.13998e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.81003e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93998e-06 [meta_fg_expand]: 1.89e-06 [flash_sp_send_recv_attached]: 1.15001e-06 [receive_attached]: 1.05001e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.215e-05 [a_3]: 3.115e-05 [py_interpret_to_execute_after_opt_a]: 7.32002e-06 [slice_cell_reuse_recomputed_activation]: 1.91998e-06 [rewriter_after_opt_a]: 3.064e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00044968 [opt_b]: 0.00017732, [1] [Cycle 1]: 0.00017151, 
[7] [b_1]: 0.00010556 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 4.2998e-07 [cse]: 1.511e-05 [optimize_parallel_all_gather_comm]: 1.482e-05 [overlap_param_gather]: 2.25002e-06 [cconv]: 2.24e-05 [loop_unroll]: 0.00041249 [opt_after_cconv]: 9.494e-05, [1] [Cycle 1]: 8.932e-05, [7] [c_1]: 2.829e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.553e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.166e-05 [tuple_transform]: 6.823e-05, [1] [Cycle 1]: 6.405e-05, [4] [d_1]: 3.867e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.13002e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.278e-05 [cse_after_recomputation]: 1.966e-05, [1] [Cycle 1]: 1.531e-05, [1] [cse]: 1.014e-05 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 4.79e-06 [bias_add_comm_swap]: 2.11998e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.44999e-06 [merge_cast_opt]: 1.10999e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 1.94e-06 [offloading_packed_experts]: 4.08999e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.05002e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.625e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.807e-05, [1] [Cycle 1]: 6.398e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.21002e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.57e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.553e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.69002e-06 [opt_after_jit_grad]: 0.00044859 [validate]: 2.884e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0059069 [execute]: 6.93e-06 Sums bootstrap : 0.000475s : 3.35% type_inference : 0.004314s : 30.45% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000410s : 2.90% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000016s : 0.12% optimize.opt_a.renormalize : 0.000335s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.17% optimize.opt_b.b_1 : 0.000106s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000412s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000449s : 3.17% validate : 0.000029s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005907s : 41.69% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000117 26 18.84% : 0.000022s : 4: substitution.arithmetic_simplify 1.83% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.48% : 0.000005s : 4: substitution.graph_param_transform 64.27% : 0.000075s : 2: substitution.inline 2.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.62% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004274 2 92.00% : 0.003932s : 1: type_inference.infer 8.00% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000074 2 100.00% : 0.000074s : 2: match.inline ------[predicate.] 
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.00% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.09% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.99% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.81% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.47% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.39% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.61% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025951 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.32% : 0.002937s : 1: add_attr 11.28% : 0.002928s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.97% : 0.000511s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000759s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.25% : 0.001881s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.76% : 0.000458s : 1: opt_after_jit_grad 0.70% : 0.000181s : 1: opt_b 14.16% : 0.003674s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000184s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.80% : 0.005917s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.67% : 0.004327s : 1: type_inference 0.21% : 0.000054s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x4-kbk],max_mem:44.0M TotalTime = 0.945034, [24] [bootstrap]: 0.00053295 [type_inference]: 0.00590849 [event_method]: 1.367e-05 [auto_monad]: 5.369e-05 [graph_reusing]: 5.49e-06 [inline]: 1.76998e-06 [add_attr]: 0.00336101, [1] [add_attr_with_inline]: 0.00335067, [1] [Cycle 1]: 4.579e-05, [2] [tag_attr]: 1.573e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 2.687e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00393076, [53] [py_interpret_to_execute]: 1.89e-05 [rewriter_before_opt_a]: 5.619e-05 [opt_a]: 0.00212194, [2] [Cycle 1]: 0.00149649, [45] [expand_dump_flag]: 3.11999e-06 [switch_simplify]: 3.106e-05 [loop_unroll]: 2.041e-05 [a_1]: 0.00044871 [with_stream_mark]: 1.307e-05 [recompute_prepare]: 8.17e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.75002e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.568e-05 [accelerated_algorithm]: 6.47001e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.61001e-06 [auto_parallel]: 6.06e-06 [parallel]: 2.209e-05 [flash_sp]: 7.13e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 8.35999e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.47998e-06 [virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.098e-05 
[merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 9.41998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.28998e-06 [flash_sp_send_recv_attached]: 2.37001e-06 [receive_attached]: 2.26e-06 [after_resolve]: 1.092e-05 [a_after_grad]: 8.79998e-06 [renormalize]: 0.00040756 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.633e-05 [a_3]: 4.018e-05 [Cycle 2]: 0.00061635, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.0001249 [with_stream_mark]: 9.22999e-06 [recompute_prepare]: 5.70001e-06 [updatestate_depend_eliminate]: 2.99001e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.42001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.693e-05 [accelerated_algorithm]: 5.31002e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.16002e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.39002e-06 [auto_parallel]: 5.49e-06 [parallel]: 4.47e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 2.70025e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 4.77e-06 [virtual_output]: 4.89998e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.86998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.006e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.09002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 8e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.26998e-06 [cse]: 1.553e-05 [a_3]: 3.27e-05 [py_interpret_to_execute_after_opt_a]: 7.27002e-06 
[slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 3.04e-05 [convert_after_rewriter]: 6.47001e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00043965 [opt_b]: 0.00018062, [1] [Cycle 1]: 0.00017454, [7] [b_1]: 0.00010629 [b_2]: 7.77e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 3.39991e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.561e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.244e-05 [loop_unroll]: 0.00040884 [opt_after_cconv]: 9.416e-05, [1] [Cycle 1]: 8.825e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.04e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.499e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.249e-05 [tuple_transform]: 6.837e-05, [1] [Cycle 1]: 6.416e-05, [4] [d_1]: 3.877e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.593e-05 [cse_after_recomputation]: 2.01e-05, [1] [Cycle 1]: 1.584e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.70001e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.16e-06 [label_micro_interleaved_index]: 3.99997e-06 [label_fine_grained_interleaved_index]: 2.42001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.24e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 1.23002e-06 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81998e-06 
[control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.15e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.649e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.719e-05, [1] [Cycle 1]: 6.307e-05, [6] [build]: 2.14999e-06 [elim_shapecalc]: 8.55999e-06 [elim_not_effective]: 1.131e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.62998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.436e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.19001e-06 [opt_after_jit_grad]: 0.00044379 [validate]: 2.973e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.93047 [execute]: 8.48999e-06 Sums bootstrap : 0.000533s : 0.06% type_inference : 0.005908s : 0.63% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.00% optimize.rewriter_before_opt_a : 0.000056s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000574s : 0.06% optimize.opt_a.with_stream_mark : 0.000022s : 
0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000408s : 0.04% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad 
: 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000042s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.00% optimize.convert_after_rewriter : 0.000006s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000440s : 0.05% optimize.opt_b.b_1 : 0.000106s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000409s : 0.04% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000014s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.05% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.930470s : 98.92% execute : 0.000008s : 0.00% Time group info: ------[substitution.] 0.000164 30 15.08% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.39% : 0.000006s : 4: substitution.graph_param_transform 66.25% : 0.000109s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000004s : 4: substitution.remove_not_recompute_node 2.44% : 0.000004s : 4: substitution.replace_old_param 6.60% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005866 2 90.62% : 0.005316s : 1: type_inference.infer 9.38% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.00% : 0.000027s : 3: replace.inline 29.00% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.57% : 0.000107s : 3: match.inline 8.43% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 1.03% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.88% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.91% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.21% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.31% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.47% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.60% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000002s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 46.47% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.53% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.953820 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.35% : 0.003365s : 1: add_attr 0.35% : 0.003354s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000573s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000417s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000449s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000012s : 1: opt.transform.mutable_eliminate 0.10% : 0.000938s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.22% : 0.002125s : 1: opt_a 0.01% : 0.000097s : 1: opt_after_cconv 0.05% : 0.000453s : 1: opt_after_jit_grad 0.02% : 0.000184s : 1: opt_b 0.41% : 0.003935s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000023s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000210s : 1: renormalize.infer 0.02% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000034s : 1: rewriter_after_opt_a 0.01% : 0.000060s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.55% : 0.930493s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.62% : 0.005922s : 1: type_inference 0.01% : 0.000053s : 1: validate TotalTime = 0.0544557, [24] [bootstrap]: 0.00044715 [type_inference]: 0.00433864 [event_method]: 1.126e-05 [auto_monad]: 5.025e-05 [graph_reusing]: 5.22e-06 [inline]: 2.02999e-06 [add_attr]: 0.00294593, [1] [add_attr_with_inline]: 0.00293762, [1] [Cycle 1]: 4.453e-05, [2] [tag_attr]: 1.137e-05 [meta_addattr_fg_expand]: 3.35998e-06 [parallel-infer-symbol]: 2.67001e-06 [pre_auto_parallel]: 2.171e-05 [insert-virtual-dataset]: 2.81e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00363007, [53] [py_interpret_to_execute]: 1.471e-05 [rewriter_before_opt_a]: 3.728e-05 [opt_a]: 0.00183716, [2] [Cycle 1]: 0.00123629, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 2.387e-05 [loop_unroll]: 1.337e-05 [a_1]: 0.00028808 [with_stream_mark]: 1.335e-05 [recompute_prepare]: 7.28999e-06 [updatestate_depend_eliminate]: 3.53999e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.71002e-06 [a_2]: 7.577e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 7.96001e-06 [auto_parallel]: 5.64998e-06 [parallel]: 1.717e-05 [flash_sp]: 6.73e-06 [merge_comm]: 3.36001e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 8.70001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.046e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 1.97001e-06 [flash_sp_send_recv_attached]: 2.68998e-06 [receive_attached]: 
2.21e-06 [after_resolve]: 1.021e-05 [a_after_grad]: 8.43001e-06 [renormalize]: 0.00034055 [add_forward_monad_depend]: 4.18001e-06 [auto_monad_grad]: 1.66002e-06 [auto_monad_eliminator]: 1.294e-05 [cse]: 2.572e-05 [a_3]: 4e-05 [Cycle 2]: 0.00059209, [45] [expand_dump_flag]: 8.30012e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012431 [with_stream_mark]: 1.014e-05 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.769e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.31002e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.12e-06 [parallel]: 3.9e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 3.19001e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.26002e-06 [get_grad_eliminate_]: 5.89999e-06 [virtual_output]: 5.35001e-06 [merge_forward]: 2.65002e-06 [cell_reuse_recompute_pass]: 1.52999e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82999e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.25999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.47999e-06 [flash_sp_send_recv_attached]: 8.40024e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 8.1e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.2e-05 [a_3]: 3.198e-05 [py_interpret_to_execute_after_opt_a]: 7.8e-06 [slice_cell_reuse_recomputed_activation]: 2.03002e-06 [rewriter_after_opt_a]: 3.025e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.0004559 [opt_b]: 0.00017867, [1] [Cycle 1]: 0.00017272, [7] 
[b_1]: 0.00010657 [b_2]: 7e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.40002e-06 [renormalize]: 6.19999e-07 [cse]: 1.587e-05 [optimize_parallel_all_gather_comm]: 1.49e-05 [overlap_param_gather]: 2.31998e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00040583 [opt_after_cconv]: 9.334e-05, [1] [Cycle 1]: 8.771e-05, [7] [c_1]: 2.729e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.552e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.239e-05 [tuple_transform]: 6.837e-05, [1] [Cycle 1]: 6.41e-05, [4] [d_1]: 3.875e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.327e-05 [cse_after_recomputation]: 1.976e-05, [1] [Cycle 1]: 1.539e-05, [1] [cse]: 1.04e-05 [environ_conv]: 4.97e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 3.11999e-06 [label_micro_interleaved_index]: 4.52e-06 [label_fine_grained_interleaved_index]: 2.39001e-06 [merge_cast_opt]: 1.19003e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.20999e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.08001e-06 [interleave_parallel_branches]: 1.01997e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.611e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.643e-05, [1] [Cycle 1]: 6.237e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 7.85e-06 [elim_not_effective]: 1.118e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.88997e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 1.494e-05 [get_jit_bprop_graph]: 9.90025e-07 [rewriter_after_jit_bprop_graph]: 3.21001e-06 [opt_after_jit_grad]: 0.00044135 [validate]: 3.165e-05 [backend_pass]: 1.22e-06 [task_emit]: 0.0422955 [execute]: 8.70001e-06 Sums bootstrap : 0.000447s : 0.88% type_inference : 0.004339s : 8.58% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000022s : 0.04% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000037s : 0.07% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.04% optimize.opt_a.a_1 : 0.000412s : 0.82% optimize.opt_a.with_stream_mark : 0.000023s : 0.05% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000014s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000003s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000341s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.04% optimize.opt_a.cse : 0.000038s : 0.07% optimize.opt_a.a_3 : 0.000072s : 0.14% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000456s : 0.90% optimize.opt_b.b_1 : 0.000107s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000406s : 0.80% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.09% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000441s : 0.87% validate : 0.000032s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042295s : 83.65% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 0.000119 26 18.28% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.49% : 0.000005s : 4: substitution.graph_param_transform 65.26% : 0.000077s : 2: substitution.inline 2.53% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.67% : 0.000004s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004298 2 91.61% : 0.003937s : 1: type_inference.infer 8.39% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.20% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 1.02% : 0.000001s : 9: predicate.cast_eliminate 0.94% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.16% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.57% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.32% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 42.66% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.34% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062286 196 0.01% : 0.000003s : 1: ForceFp32Comm 4.74% : 0.002950s : 1: add_attr 4.72% : 0.002941s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000055s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.77% : 0.000482s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.67% : 0.000414s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.22% : 0.000759s : 78: opt.transform.opt_a 0.04% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000089s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.95% : 0.001840s : 1: opt_a 0.16% : 0.000097s : 1: opt_after_cconv 0.72% : 0.000451s : 1: opt_after_jit_grad 0.29% : 0.000182s : 1: opt_b 5.83% : 0.003634s : 1: optimize 0.03% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000026s : 1: pre_auto_parallel 0.03% : 0.000018s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000016s : 1: remove_dup_value 0.30% : 0.000186s : 1: renormalize.infer 0.24% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000041s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000069s : 1: symbol_engine_optimizer 67.93% : 0.042311s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 6.99% : 0.004352s : 1: type_inference 0.08% : 0.000052s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x4-ge],max_mem:44.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x5-pynative],max_mem:44.0M TotalTime = 0.0209225, [24] [bootstrap]: 0.00052834 [type_inference]: 0.00609138 [event_method]: 1.44e-05 [auto_monad]: 5.644e-05 [graph_reusing]: 5.77001e-06 [inline]: 1.78002e-06 [add_attr]: 0.00332774, [1] [add_attr_with_inline]: 0.00331749, [1] [Cycle 1]: 4.421e-05, [2] [tag_attr]: 1.492e-05 [meta_addattr_fg_expand]: 4.21001e-06 [parallel-infer-symbol]: 2.75997e-06 [pre_auto_parallel]: 2.752e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.29999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00396923, [53] [py_interpret_to_execute]: 2.042e-05 [rewriter_before_opt_a]: 5.843e-05 [opt_a]: 0.00212407, [2] [Cycle 1]: 0.00151971, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 3.171e-05 [loop_unroll]: 2.119e-05 [a_1]: 0.00045184 [with_stream_mark]: 1.307e-05 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.643e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 5.73002e-06 [merge_send_recv]: 7.98001e-06 [auto_parallel]: 5.72999e-06 [parallel]: 2.424e-05 [flash_sp]: 7.20003e-06 [merge_comm]: 3.65998e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.19998e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.33999e-06 
[virtual_dataset]: 6.10002e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.71998e-06 [before_grad]: 9.53002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 2.88998e-06 [receive_attached]: 2.51e-06 [after_resolve]: 1.015e-05 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00042183 [add_forward_monad_depend]: 4.74998e-06 [auto_monad_grad]: 2.20002e-06 [auto_monad_eliminator]: 1.378e-05 [cse]: 2.64e-05 [a_3]: 4.032e-05 [Cycle 2]: 0.00059529, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.34998e-06 [a_1]: 0.00012446 [with_stream_mark]: 9.59999e-06 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.33998e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.73e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.05999e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 4.55999e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 6.44001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.36998e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.21997e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.57001e-06 [a_after_grad]: 
8.72e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 7.90023e-07 [auto_monad_eliminator]: 6.74999e-06 [cse]: 1.28e-05 [a_3]: 3.182e-05 [py_interpret_to_execute_after_opt_a]: 7.44002e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.123e-05 [convert_after_rewriter]: 6.72002e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00044364 [opt_b]: 0.00018027, [1] [Cycle 1]: 0.00017382, [7] [b_1]: 0.00010738 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.19997e-07 [cse]: 1.569e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.249e-05 [loop_unroll]: 0.0004333 [opt_after_cconv]: 9.501e-05, [1] [Cycle 1]: 8.944e-05, [7] [c_1]: 2.783e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.654e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.186e-05 [tuple_transform]: 6.819e-05, [1] [Cycle 1]: 6.395e-05, [4] [d_1]: 3.853e-05 [none_parameter_eliminate]: 1.87999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.92001e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.611e-05 [cse_after_recomputation]: 1.987e-05, [1] [Cycle 1]: 1.561e-05, [1] [cse]: 1.058e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 5.06002e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.59999e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.15001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 
[comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.199e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.715e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.694e-05, [1] [Cycle 1]: 6.282e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.23999e-06 [elim_not_effective]: 1.12e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.61997e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.626e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.06001e-06 [opt_after_jit_grad]: 0.00044132 [validate]: 3.222e-05 [backend_pass]: 8.09989e-07 [task_emit]: 0.00618439 [execute]: 6.44001e-06 Sums bootstrap : 0.000528s : 3.18% type_inference : 0.006091s : 36.63% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% 
optimize.rewriter_before_opt_a : 0.000058s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000576s : 3.47% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% 
optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000422s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000039s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.67% optimize.opt_b.b_1 : 0.000107s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000433s : 2.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% 
optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% 
optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000441s : 2.65% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.00% task_emit : 0.006184s : 37.19% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.75% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000006s : 4: substitution.graph_param_transform 66.74% : 0.000111s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006046 2 90.74% : 0.005487s : 1: type_inference.infer 9.26% : 0.000560s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.33% : 0.000027s : 3: replace.inline 29.67% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.57% : 0.000109s : 3: match.inline 8.43% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.83% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000364 8 48.08% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.92% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029737 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.20% : 0.003332s : 1: add_attr 11.17% : 0.003321s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000062s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000568s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.49% : 0.000442s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.17% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.15% : 0.002127s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.52% : 0.000451s : 1: opt_after_jit_grad 0.62% : 0.000184s : 1: opt_b 13.36% : 0.003973s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.70% : 0.000210s : 1: renormalize.infer 0.69% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000070s : 1: symbol_engine_optimizer 20.83% : 0.006195s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.53% : 0.006105s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0180157, [24] [bootstrap]: 0.00043758 [type_inference]: 0.00434267 [event_method]: 1.092e-05 [auto_monad]: 5.02e-05 [graph_reusing]: 4.79e-06 [inline]: 1.77999e-06 [add_attr]: 0.00295908, [1] [add_attr_with_inline]: 0.00295064, [1] [Cycle 1]: 4.545e-05, [2] [tag_attr]: 1.2e-05 [meta_addattr_fg_expand]: 2.98e-06 [parallel-infer-symbol]: 2.51998e-06 [pre_auto_parallel]: 2.135e-05 [insert-virtual-dataset]: 2.86999e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.13002e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.00368463, [53] [py_interpret_to_execute]: 1.587e-05 [rewriter_before_opt_a]: 3.838e-05 [opt_a]: 0.00184567, [2] [Cycle 1]: 0.00124485, [45] [expand_dump_flag]: 2.55997e-06 [switch_simplify]: 2.453e-05 [loop_unroll]: 1.388e-05 [a_1]: 0.0002904 [with_stream_mark]: 1.333e-05 [recompute_prepare]: 7.48999e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.20998e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 2.24001e-06 [a_2]: 7.711e-05 [accelerated_algorithm]: 6.08002e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 7.61999e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.791e-05 [flash_sp]: 7.18e-06 [merge_comm]: 4.02998e-06 [allreduce_fusion]: 3.42002e-06 [matmul_add_comm_reduction]: 9.22001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.26001e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.46e-06 [merge_forward]: 3.97998e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.09998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.66e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.12001e-06 
[after_resolve]: 1.008e-05 [a_after_grad]: 8.69998e-06 [renormalize]: 0.00033707 [add_forward_monad_depend]: 4.16001e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.32e-05 [cse]: 2.581e-05 [a_3]: 3.968e-05 [Cycle 2]: 0.00059148, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.93998e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012461 [with_stream_mark]: 1.12e-05 [recompute_prepare]: 5.69e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.759e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.15e-06 [auto_parallel]: 5.10999e-06 [parallel]: 4.02998e-06 [flash_sp]: 3.08e-06 [merge_comm]: 2.83003e-06 [allreduce_fusion]: 2.63998e-06 [matmul_add_comm_reduction]: 5.01002e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 6.16998e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.78997e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.14002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 9.54999e-06 [a_after_grad]: 8.17998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.41e-06 [cse]: 1.191e-05 [a_3]: 3.173e-05 [py_interpret_to_execute_after_opt_a]: 7.18e-06 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 3.092e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00044233 [opt_b]: 0.00022574, [1] [Cycle 1]: 0.00021973, [7] 
[b_1]: 0.00015109 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 2.19996e-07 [cse]: 1.634e-05 [optimize_parallel_all_gather_comm]: 1.567e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.178e-05 [loop_unroll]: 0.00041109 [opt_after_cconv]: 9.548e-05, [1] [Cycle 1]: 8.962e-05, [7] [c_1]: 2.794e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 2.50997e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.612e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.87e-05, [1] [Cycle 1]: 6.415e-05, [4] [d_1]: 3.831e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 4.339e-05 [cse_after_recomputation]: 2.023e-05, [1] [Cycle 1]: 1.578e-05, [1] [cse]: 1.032e-05 [environ_conv]: 4.72998e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.54001e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.98e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.42999e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.122e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.25e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.618e-05 [begin_end_overlap_inline]: 8.59989e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.69972e-07 [symbol_engine_optimizer]: 6.799e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.28998e-06 [elim_shapecalc]: 8.49998e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 5.82999e-06 [fold_const_symbol]: 8.70001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.78997e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.482e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00044178 [validate]: 3.108e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00579961 [execute]: 6.16e-06 Sums bootstrap : 0.000438s : 3.10% type_inference : 0.004343s : 30.79% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.94% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000442s : 3.14% optimize.opt_b.b_1 : 0.000151s : 1.07% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000411s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000043s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000442s : 3.13% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005800s : 41.13% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000119 26 17.84% : 0.000021s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 65.87% : 0.000078s : 2: substitution.inline 2.53% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.82% : 0.000005s : 4: substitution.remove_not_recompute_node 3.12% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004301 2 92.08% : 0.003960s : 1: type_inference.infer 7.92% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 2.11% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.31% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.86% : 0.000001s : 8: predicate.incorporate_call 0.70% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.53% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.84% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.86% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.57% : 0.000006s : 41: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.06% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.68% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.32% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025966 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.41% : 0.002964s : 1: add_attr 11.38% : 0.002954s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.82% : 0.000473s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.74% : 0.000452s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.95% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.51% : 0.000133s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.12% : 0.001849s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.74% : 0.000451s : 1: opt_after_jit_grad 0.88% : 0.000229s : 1: opt_b 14.20% : 0.003688s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000187s : 1: renormalize.infer 0.55% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.37% : 0.005809s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.78% : 0.004357s : 1: type_inference 0.22% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x5-kbk],max_mem:44.0M TotalTime = 0.119403, [24] [bootstrap]: 0.00052112 [type_inference]: 0.00596756 [event_method]: 1.369e-05 [auto_monad]: 5.336e-05 [graph_reusing]: 5.67001e-06 [inline]: 1.96e-06 [add_attr]: 0.00342187, [1] [add_attr_with_inline]: 0.00341081, [1] [Cycle 1]: 4.695e-05, [2] [tag_attr]: 1.583e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.827e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 9.89996e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00397645, [53] [py_interpret_to_execute]: 2.031e-05 [rewriter_before_opt_a]: 6.016e-05 [opt_a]: 0.002154, [2] [Cycle 1]: 0.0015555, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 3.262e-05 [loop_unroll]: 2.112e-05 [a_1]: 0.00045621 [with_stream_mark]: 1.362e-05 [recompute_prepare]: 7.7e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 1.94e-06 [a_2]: 7.564e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 8.54e-06 [auto_parallel]: 5.87001e-06 [parallel]: 2.487e-05 [flash_sp]: 7.28e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.43e-06 [matmul_add_comm_reduction]: 2.905e-05 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 7.98001e-06 [virtual_dataset]: 6.34999e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.128e-05 
[merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.14e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.40002e-06 [receive_attached]: 2.32999e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 8.73001e-06 [renormalize]: 0.00042968 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.76998e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 2.659e-05 [a_3]: 4.059e-05 [Cycle 2]: 0.00058899, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00012657 [with_stream_mark]: 8.87999e-06 [recompute_prepare]: 5.63002e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.8e-05 [accelerated_algorithm]: 5.74999e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.73001e-06 [auto_parallel]: 5.20001e-06 [parallel]: 4.24002e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.11002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.89999e-06 [virtual_dataset]: 5.19998e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 5.37001e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.03998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.68999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.09998e-06 [a_after_grad]: 7.8e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 8.89995e-07 [auto_monad_grad]: 6.80011e-07 [auto_monad_eliminator]: 6.04001e-06 [cse]: 1.579e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 
[slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.018e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 4.74998e-06 [mutable_eliminate]: 0.00044588 [opt_b]: 0.00017977, [1] [Cycle 1]: 0.00017376, [7] [b_1]: 0.0001068 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.45002e-06 [renormalize]: 2.80008e-07 [cse]: 1.571e-05 [optimize_parallel_all_gather_comm]: 1.574e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.214e-05 [loop_unroll]: 0.00040838 [opt_after_cconv]: 9.495e-05, [1] [Cycle 1]: 8.894e-05, [7] [c_1]: 2.795e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.604e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.187e-05 [tuple_transform]: 6.914e-05, [1] [Cycle 1]: 6.491e-05, [4] [d_1]: 3.9e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.66002e-06 [add_recomputation]: 4.641e-05 [cse_after_recomputation]: 2.016e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.095e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 2.80002e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.53998e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.59e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.50003e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.43002e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.686e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.09999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.73e-05, [1] [Cycle 1]: 6.32e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 7.95e-06 [elim_not_effective]: 1.178e-05 [opt_reshape]: 6.31998e-06 [fold_const_symbol]: 8.94998e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.66e-06 [auto_monad_reorder]: 1.596e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00044567 [validate]: 3.114e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.104682 [execute]: 9.29e-06 Sums bootstrap : 0.000521s : 0.45% type_inference : 0.005968s : 5.19% event_method : 0.000014s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000060s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000583s : 0.51% 
optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000034s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000430s : 0.37% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.39% optimize.opt_b.b_1 : 0.000107s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000408s : 0.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01%
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01%
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000002s : 0.00%
auto_monad_reorder : 0.000016s : 0.01%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000003s : 0.00%
opt_after_jit_grad : 0.000446s : 0.39%
validate : 0.000031s : 0.03%
backend_pass : 0.000001s : 0.00%
task_emit : 0.104682s : 91.02%
execute : 0.000009s : 0.01%
Time group info:
------[substitution.] 0.000164 30
    14.78% : 0.000024s : 5: substitution.arithmetic_simplify
    1.13% : 0.000002s : 2: substitution.elim_not_effective
    0.78% : 0.000001s : 2: substitution.fold_const_symbol
    3.19% : 0.000005s : 4: substitution.graph_param_transform
    66.62% : 0.000109s : 3: substitution.inline
    1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.90% : 0.000005s : 4: substitution.remove_not_recompute_node
    2.27% : 0.000004s : 4: substitution.replace_old_param
    6.55% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005925 2
    90.84% : 0.005382s : 1: type_inference.infer
    9.16% : 0.000543s : 1: type_inference.specialize
------[replace.] 0.000039 5
    69.44% : 0.000027s : 3: replace.inline
    30.56% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000117 5
    91.66% : 0.000108s : 3: match.inline
    8.34% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000158 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.84% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 1.00% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000001s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.21% : 0.000010s : 51: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.93% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.92% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.41% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 46.68% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.32% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.128330 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.67% : 0.003426s : 1: add_attr 2.66% : 0.003415s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000558s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000417s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000455s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.74% : 0.000950s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.68% : 0.002157s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.35% : 0.000455s : 1: opt_after_jit_grad 0.14% : 0.000183s : 1: opt_b 3.10% : 0.003980s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.18% : 0.000226s : 1: renormalize.infer 0.15% : 0.000197s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 81.59% : 0.104703s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.66% : 0.005981s : 1: type_inference 0.04% : 0.000056s : 1: validate TotalTime = 0.11259, [24] [bootstrap]: 0.00047684 [type_inference]: 0.00434204 [event_method]: 1.104e-05 [auto_monad]: 5.058e-05 [graph_reusing]: 5.27001e-06 [inline]: 1.88002e-06 [add_attr]: 0.00298911, [1] [add_attr_with_inline]: 0.00298152, [1] [Cycle 1]: 4.496e-05, [2] [tag_attr]: 1.21e-05 [meta_addattr_fg_expand]: 3.32002e-06 [parallel-infer-symbol]: 2.60002e-06 [pre_auto_parallel]: 2.053e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00366897, [53] [py_interpret_to_execute]: 1.496e-05 [rewriter_before_opt_a]: 3.914e-05 [opt_a]: 0.00187652, [2] [Cycle 1]: 0.00124168, [45] [expand_dump_flag]: 2.43e-06 [switch_simplify]: 2.48e-05 [loop_unroll]: 1.346e-05 [a_1]: 0.00029034 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.62002e-06 [updatestate_assign_eliminate]: 3.19001e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 1.54998e-06 [a_2]: 7.583e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 6.15002e-06 [merge_send_recv]: 7.77e-06 [auto_parallel]: 5.77001e-06 [parallel]: 1.725e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.13998e-06 [matmul_add_comm_reduction]: 8.73001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.08002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.07001e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 
2.81e-06 [after_resolve]: 1.029e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00034102 [add_forward_monad_depend]: 4.29997e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.279e-05 [cse]: 2.579e-05 [a_3]: 3.984e-05 [Cycle 2]: 0.00062565, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.53e-06 [loop_unroll]: 5.22999e-06 [a_1]: 0.00012393 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.833e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.48002e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.07999e-06 [parallel]: 4.2e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.68998e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 5.98002e-06 [virtual_dataset]: 5.04e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 5.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.52998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.69999e-06 [a_after_grad]: 7.95e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.229e-05 [a_3]: 3.209e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 3.036e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.00044299 [opt_b]: 0.00017921, [1] [Cycle 1]: 0.00017321, [7] 
[b_1]: 0.00010743 [b_2]: 7.22002e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.10015e-07 [cse]: 1.565e-05 [optimize_parallel_all_gather_comm]: 1.568e-05 [overlap_param_gather]: 2.11998e-06 [cconv]: 2.197e-05 [loop_unroll]: 0.00041221 [opt_after_cconv]: 9.458e-05, [1] [Cycle 1]: 8.88e-05, [7] [c_1]: 2.754e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.73003e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.554e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.246e-05 [tuple_transform]: 6.884e-05, [1] [Cycle 1]: 6.453e-05, [4] [d_1]: 3.905e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.382e-05 [cse_after_recomputation]: 2.002e-05, [1] [Cycle 1]: 1.559e-05, [1] [cse]: 1.042e-05 [environ_conv]: 4.66002e-06 [swap_dp_allreduce_reducescatter]: 5.64e-06 [bias_add_comm_swap]: 2.91999e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.39e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.70002e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.171e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.50998e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.06998e-06 [overlap_grad_ring_attention]: 4.22003e-06 [overlap_grad_flash_sp]: 1.625e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.73e-05, [1] [Cycle 1]: 6.314e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.23999e-06 [elim_not_effective]: 1.131e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 8.40001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.555e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.19001e-06 [opt_after_jit_grad]: 0.00044627 [validate]: 2.993e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.100302 [execute]: 9.58002e-06 Sums bootstrap : 0.000477s : 0.44% type_inference : 0.004342s : 4.00% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000414s : 0.38% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000341s : 0.31% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000038s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000443s : 0.41% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000412s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.41% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.100302s : 92.35% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.49% : 0.000022s : 4: substitution.arithmetic_simplify 1.59% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 65.15% : 0.000078s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.51% : 0.000004s : 4: substitution.remove_not_recompute_node 3.36% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004304 2 91.66% : 0.003945s : 1: type_inference.infer 8.34% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.33% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.01% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.72% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.23% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 42.98% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.02% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120509 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.48% : 0.002993s : 1: add_attr 2.48% : 0.002985s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000512s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000452s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.63% : 0.000764s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.56% : 0.001879s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.38% : 0.000455s : 1: opt_after_jit_grad 0.15% : 0.000182s : 1: opt_b 3.05% : 0.003673s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000188s : 1: renormalize.infer 0.12% : 0.000146s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 83.25% : 0.100324s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.61% : 0.004355s : 1: type_inference 0.04% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x5-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x6-pynative],max_mem:46.0M TotalTime = 0.0209486, [24] [bootstrap]: 0.0005415 [type_inference]: 0.00606677 [event_method]: 1.483e-05 [auto_monad]: 5.858e-05 [graph_reusing]: 5.30999e-06 [inline]: 2.07001e-06 [add_attr]: 0.00334792, [1] [add_attr_with_inline]: 0.00333747, [1] [Cycle 1]: 4.308e-05, [2] [tag_attr]: 1.491e-05 [meta_addattr_fg_expand]: 3.78001e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 2.773e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00399141, [53] [py_interpret_to_execute]: 1.951e-05 [rewriter_before_opt_a]: 5.834e-05 [opt_a]: 0.00215277, [2] [Cycle 1]: 0.00152419, [45] [expand_dump_flag]: 3.11001e-06 [switch_simplify]: 3.294e-05 [loop_unroll]: 2.118e-05 [a_1]: 0.00045306 [with_stream_mark]: 1.288e-05 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.652e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 8.18999e-06 [auto_parallel]: 5.67999e-06 [parallel]: 2.33e-05 [flash_sp]: 6.93998e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.07001e-06 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 7.9e-06 [virtual_dataset]: 
5.96998e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.51999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.47999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 2.26998e-06 [receive_attached]: 2.24001e-06 [after_resolve]: 1.007e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00042636 [add_forward_monad_depend]: 4.34002e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.329e-05 [cse]: 2.668e-05 [a_3]: 4.099e-05 [Cycle 2]: 0.0006191, [45] [expand_dump_flag]: 8.29983e-07 [switch_simplify]: 6.77002e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.00014231 [with_stream_mark]: 9.92999e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.775e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.06997e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.34002e-06 [auto_parallel]: 5.50001e-06 [parallel]: 4.17e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.80997e-06 [matmul_add_comm_reduction]: 5.04998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.30002e-06 [virtual_dataset]: 5.30999e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.61999e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.55001e-06 
[renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 6.87002e-06 [cse]: 1.339e-05 [a_3]: 3.209e-05 [py_interpret_to_execute_after_opt_a]: 7.51001e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 2.929e-05 [convert_after_rewriter]: 6.74001e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00044972 [opt_b]: 0.00018258, [1] [Cycle 1]: 0.00017644, [7] [b_1]: 0.00010818 [b_2]: 6.95002e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 3.50003e-07 [cse]: 1.627e-05 [optimize_parallel_all_gather_comm]: 1.636e-05 [overlap_param_gather]: 1.73002e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00041597 [opt_after_cconv]: 9.508e-05, [1] [Cycle 1]: 8.952e-05, [7] [c_1]: 2.747e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.613e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.249e-05 [tuple_transform]: 7.05e-05, [1] [Cycle 1]: 6.585e-05, [4] [d_1]: 3.991e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.02999e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 4.841e-05 [cse_after_recomputation]: 2.122e-05, [1] [Cycle 1]: 1.691e-05, [1] [cse]: 1.171e-05 [environ_conv]: 4.68999e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.88003e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.38002e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 
9.80013e-07 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 1.15e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.48999e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 3.78999e-06 [overlap_grad_flash_sp]: 1.652e-05 [begin_end_overlap_inline]: 8.49977e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.79e-05, [1] [Cycle 1]: 6.395e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.28999e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 6.11998e-06 [fold_const_symbol]: 8.94e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.56002e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.13e-06 [opt_after_jit_grad]: 0.00044945 [validate]: 3.183e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00617315 [execute]: 6.51e-06 Sums bootstrap : 0.000541s : 3.26% type_inference : 0.006067s : 36.48% event_method : 0.000015s : 0.09% auto_monad : 0.000059s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 
0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000595s : 3.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000426s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.70% optimize.opt_b.b_1 : 0.000108s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000416s : 2.50% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.70% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006173s : 37.12% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.42% : 0.000024s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.46% : 0.000006s : 4: substitution.graph_param_transform 66.82% : 0.000110s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.55% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006026 2 90.55% : 0.005457s : 1: type_inference.infer 9.45% : 0.000569s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.70% : 0.000026s : 3: replace.inline 31.30% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.72% : 0.000108s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.36% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.84% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.57% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000002s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000366 8 45.55% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.45% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029833 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.24% : 0.003352s : 1: add_attr 11.20% : 0.003341s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000577s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.23% : 0.000963s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.23% : 0.002156s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.54% : 0.000458s : 1: opt_after_jit_grad 0.62% : 0.000186s : 1: opt_b 13.39% : 0.003995s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000213s : 1: renormalize.infer 0.69% : 0.000206s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.72% : 0.006183s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 20.38% : 0.006081s : 1: type_inference 0.21% : 0.000061s : 1: validate TotalTime = 0.0181897, [24] [bootstrap]: 0.00046657 [type_inference]: 0.00438224 [event_method]: 1.018e-05 [auto_monad]: 4.978e-05 [graph_reusing]: 4.89998e-06 [inline]: 1.66e-06 [add_attr]: 0.00293916, [1] [add_attr_with_inline]: 0.00293102, [1] [Cycle 1]: 4.617e-05, [2] [tag_attr]: 1.151e-05 [meta_addattr_fg_expand]: 3.09999e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.117e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.00366651, [53] [py_interpret_to_execute]: 1.518e-05 [rewriter_before_opt_a]: 3.777e-05 [opt_a]: 0.00187704, [2] [Cycle 1]: 0.00127698, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 2.411e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.00029152 [with_stream_mark]: 1.273e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 2.72001e-06 [parameter_eliminate]: 1.54998e-06 [a_2]: 7.693e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 5.52001e-06 [parallel]: 1.826e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.52002e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 8.39002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.75e-06 [virtual_dataset]: 5.89999e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.77001e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.45001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 
2.33998e-06 [after_resolve]: 1.028e-05 [a_after_grad]: 8.54e-06 [renormalize]: 0.00033395 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.61002e-06 [auto_monad_eliminator]: 1.251e-05 [cse]: 2.718e-05 [a_3]: 4.006e-05 [Cycle 2]: 0.00059055, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012541 [with_stream_mark]: 9.09e-06 [recompute_prepare]: 5.44998e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.53003e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.748e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.06001e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.02998e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.10999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.28998e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.94998e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.007e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 8.37e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.00005e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.84999e-06 [a_after_grad]: 8.52e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.01998e-06 [cse]: 1.158e-05 [a_3]: 3.164e-05 [py_interpret_to_execute_after_opt_a]: 7.49002e-06 [slice_cell_reuse_recomputed_activation]: 1.73002e-06 [rewriter_after_opt_a]: 3.139e-05 [convert_after_rewriter]: 6.66999e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00044781 [opt_b]: 0.00017952, [1] [Cycle 1]: 
0.00017363, [7] [b_1]: 0.00010677 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 4.30009e-07 [cse]: 1.578e-05 [optimize_parallel_all_gather_comm]: 1.635e-05 [overlap_param_gather]: 1.81003e-06 [cconv]: 2.207e-05 [loop_unroll]: 0.00041101 [opt_after_cconv]: 9.325e-05, [1] [Cycle 1]: 8.774e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.558e-05 [renormalize]: 2.99973e-07 [remove_dup_value]: 1.171e-05 [tuple_transform]: 6.824e-05, [1] [Cycle 1]: 6.402e-05, [4] [d_1]: 3.874e-05 [none_parameter_eliminate]: 1.41002e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.26e-05 [cse_after_recomputation]: 1.938e-05, [1] [Cycle 1]: 1.507e-05, [1] [cse]: 9.98998e-06 [environ_conv]: 4.16001e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.15002e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.15001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.09003e-06 [full_micro_interleaved_order_control]: 1.99e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.93002e-06 [control_data_broadcast_order]: 1.118e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.39e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.92001e-06 [overlap_grad_ring_attention]: 3.90998e-06 [overlap_grad_flash_sp]: 1.611e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 1.58002e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.834e-05, [1] [Cycle 1]: 6.415e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 8.49998e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 5.79999e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.518e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.81001e-06 [opt_after_jit_grad]: 0.00044538 [validate]: 3.062e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00593844 [execute]: 7.03e-06 Sums bootstrap : 0.000467s : 3.27% type_inference : 0.004382s : 30.73% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.92% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000334s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.14% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000411s : 2.88% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000445s : 3.12% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005938s : 41.64% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.64% : 0.000022s : 4: substitution.arithmetic_simplify 1.55% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.35% : 0.000005s : 4: substitution.graph_param_transform 64.95% : 0.000077s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.52% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004341 2 91.18% : 0.003958s : 1: type_inference.infer 8.82% : 0.000383s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.06% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000002s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 1.18% : 0.000002s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.24% : 0.000002s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.69% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000273 6 36.57% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.43% : 0.000173s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026056 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.30% : 0.002943s : 1: add_attr 11.26% : 0.002934s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000503s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000769s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.21% : 0.001880s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.75% : 0.000455s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.09% : 0.003670s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000185s : 1: renormalize.infer 0.55% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.83% : 0.005948s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.87% : 0.004396s : 1: type_inference 0.21% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x6-kbk],max_mem:46.0M TotalTime = 0.887353, [24] [bootstrap]: 0.00054611 [type_inference]: 0.00596825 [event_method]: 1.391e-05 [auto_monad]: 5.294e-05 [graph_reusing]: 5.61e-06 [inline]: 1.54e-06 [add_attr]: 0.00338739, [1] [add_attr_with_inline]: 0.00337643, [1] [Cycle 1]: 4.479e-05, [2] [tag_attr]: 1.601e-05 [meta_addattr_fg_expand]: 3.93001e-06 [parallel-infer-symbol]: 2.65997e-06 [pre_auto_parallel]: 2.888e-05 [insert-virtual-dataset]: 2.69001e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00397584, [53] [py_interpret_to_execute]: 1.969e-05 [rewriter_before_opt_a]: 5.869e-05 [opt_a]: 0.00210829, [2] [Cycle 1]: 0.00150683, [45] [expand_dump_flag]: 3.20002e-06 [switch_simplify]: 3.222e-05 [loop_unroll]: 2.077e-05 [a_1]: 0.00045362 [with_stream_mark]: 1.337e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.69002e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.626e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 7.53e-06 [auto_parallel]: 6.43e-06 [parallel]: 2.354e-05 [flash_sp]: 7.06001e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.25002e-06 [matmul_add_comm_reduction]: 8.20999e-06 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 7.03998e-06 [virtual_dataset]: 5.91998e-06 [get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.62998e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.25001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 
[merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.14e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.33002e-06 [receive_attached]: 2.44999e-06 [after_resolve]: 1.06e-05 [a_after_grad]: 8.86002e-06 [renormalize]: 0.00041129 [add_forward_monad_depend]: 4.33001e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.294e-05 [cse]: 2.813e-05 [a_3]: 3.976e-05 [Cycle 2]: 0.00059184, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.87002e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012556 [with_stream_mark]: 9.85002e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.66e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.14003e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.16001e-06 [auto_parallel]: 5.22999e-06 [parallel]: 4.13999e-06 [flash_sp]: 2.95998e-06 [merge_comm]: 3.15002e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 6.08002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.87998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.05002e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.37999e-06 [a_after_grad]: 8.14002e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.15002e-06 [cse]: 1.27e-05 [a_3]: 3.172e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 
[slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.129e-05 [convert_after_rewriter]: 7.33e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00048355 [opt_b]: 0.00018221, [1] [Cycle 1]: 0.00017604, [7] [b_1]: 0.00010841 [b_2]: 7.02002e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 5.20027e-07 [cse]: 1.631e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 2.12999e-06 [cconv]: 2.185e-05 [loop_unroll]: 0.00041345 [opt_after_cconv]: 9.371e-05, [1] [Cycle 1]: 8.795e-05, [7] [c_1]: 2.715e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.581e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.24e-05 [tuple_transform]: 6.914e-05, [1] [Cycle 1]: 6.495e-05, [4] [d_1]: 3.959e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.21998e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.75e-05 [cse_after_recomputation]: 2.056e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.62998e-06 [swap_dp_allreduce_reducescatter]: 5.39e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.55002e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.16997e-06 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.59e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.77001e-06 [offloading_packed_experts]: 3.78001e-06 [overlap_recompute_and_grad_model_parallel]: 4.15e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.722e-05, [1] [Cycle 1]: 6.324e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 8.19998e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.489e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.57002e-06 [opt_after_jit_grad]: 0.00044945 [validate]: 3.056e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.872636 [execute]: 9.99001e-06 Sums bootstrap : 0.000546s : 0.06% type_inference : 0.005968s : 0.68% event_method : 0.000014s : 0.00% auto_monad : 0.000053s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000579s : 0.07% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000411s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000041s : 0.00% optimize.opt_a.a_3 : 0.000071s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000484s : 0.05% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000413s : 0.05% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000449s : 0.05% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.872636s : 98.83% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000164 30 14.71% : 0.000024s : 5: substitution.arithmetic_simplify 1.24% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 67.08% : 0.000110s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.74% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005923 2 90.91% : 0.005385s : 1: type_inference.infer 9.09% : 0.000538s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.31% : 0.000027s : 3: replace.inline 29.69% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.96% : 0.000108s : 3: match.inline 8.04% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.86% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 1.06% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.92% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.07% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 1.00% : 0.000002s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.30% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.97% : 0.000002s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 46.71% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.29% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.896223 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.38% : 0.003391s : 1: add_attr 0.38% : 0.003380s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000058s : 1: auto_monad 0.00% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000585s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000493s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000944s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000031s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002111s : 1: opt_a 0.01% : 0.000097s : 1: opt_after_cconv 0.05% : 0.000459s : 1: opt_after_jit_grad 0.02% : 0.000186s : 1: opt_b 0.44% : 0.003980s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000212s : 1: renormalize.infer 0.02% : 0.000193s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.37% : 0.872659s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.67% : 0.005981s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.0693753, [24] [bootstrap]: 0.00048284 [type_inference]: 0.00440159 [event_method]: 1.127e-05 [auto_monad]: 5.098e-05 [graph_reusing]: 4.98001e-06 [inline]: 1.89e-06 [add_attr]: 0.00298743, [1] [add_attr_with_inline]: 0.00297954, [1] [Cycle 1]: 4.591e-05, [2] [tag_attr]: 1.2e-05 [meta_addattr_fg_expand]: 3.08e-06 [parallel-infer-symbol]: 2.85998e-06 [pre_auto_parallel]: 2.127e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00367798, [53] [py_interpret_to_execute]: 1.441e-05 [rewriter_before_opt_a]: 3.893e-05 [opt_a]: 0.00184661, [2] [Cycle 1]: 0.00124373, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 2.396e-05 [loop_unroll]: 1.365e-05 [a_1]: 0.00028892 [with_stream_mark]: 1.334e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.549e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.67002e-06 [auto_parallel]: 5.59e-06 [parallel]: 1.675e-05 [flash_sp]: 6.84999e-06 [merge_comm]: 3.81999e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 5.68997e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.51001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 8.85999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.84001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 
2.59999e-06 [after_resolve]: 1.032e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00034149 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.335e-05 [cse]: 2.718e-05 [a_3]: 4.015e-05 [Cycle 2]: 0.00059381, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012545 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.81e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.11003e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.648e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 9.49978e-07 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.36998e-06 [merge_send_recv]: 4.30999e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.35e-06 [flash_sp]: 3.38e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.08002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.83e-06 [virtual_dataset]: 5.57001e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.028e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.64998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.15998e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.00999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.173e-05 [a_3]: 3.198e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 1.71998e-06 [rewriter_after_opt_a]: 3.021e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.0004507 [opt_b]: 0.00018026, [1] [Cycle 1]: 0.00017432, [7] [b_1]: 
0.00010792 [b_2]: 6.91999e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 2.50002e-07 [cse]: 1.594e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00044653 [opt_after_cconv]: 9.491e-05, [1] [Cycle 1]: 8.903e-05, [7] [c_1]: 2.726e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.632e-05 [renormalize]: 6.09987e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 6.879e-05, [1] [Cycle 1]: 6.455e-05, [4] [d_1]: 3.902e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.332e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.566e-05, [1] [cse]: 1.064e-05 [environ_conv]: 4.90999e-06 [swap_dp_allreduce_reducescatter]: 5.24998e-06 [bias_add_comm_swap]: 2.22999e-06 [label_micro_interleaved_index]: 4.44998e-06 [label_fine_grained_interleaved_index]: 2.39999e-06 [merge_cast_opt]: 1.12e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.35e-06 [overlap_recompute_and_grad_model_parallel]: 4.41002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.86998e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.675e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.44999e-06 [split_layernorm_comm]: 1.75001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.836e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 8.09002e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 6.31998e-06 [fold_const_symbol]: 8.95999e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.523e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.19001e-06 [opt_after_jit_grad]: 0.00044557 [validate]: 3.127e-05 [backend_pass]: 1.48002e-06 [task_emit]: 0.0570222 [execute]: 8.13001e-06 Sums bootstrap : 0.000483s : 0.74% type_inference : 0.004402s : 6.73% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000414s : 0.63% optimize.opt_a.with_stream_mark : 0.000023s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000342s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.69% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000447s : 0.68% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057022s : 87.15% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 19.06% : 0.000023s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 64.26% : 0.000077s : 2: substitution.inline 2.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.78% : 0.000005s : 4: substitution.remove_not_recompute_node 3.27% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004361 2 91.89% : 0.004008s : 1: type_inference.infer 8.11% : 0.000354s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.47% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.15% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.91% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.99% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 1.22% : 0.000002s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 1.02% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.07% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000253 6 43.27% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.73% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077304 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.87% : 0.002992s : 1: add_attr 3.86% : 0.002983s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000520s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.59% : 0.000455s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.99% : 0.000764s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001849s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000455s : 1: opt_after_jit_grad 0.24% : 0.000184s : 1: opt_b 4.76% : 0.003682s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000190s : 1: renormalize.infer 0.19% : 0.000145s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 73.78% : 0.057037s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.71% : 0.004415s : 1: type_inference 0.07% : 0.000052s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x6-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x7-pynative],max_mem:46.0M TotalTime = 0.0211301, [24] [bootstrap]: 0.00054402 [type_inference]: 0.00614994 [event_method]: 1.387e-05 [auto_monad]: 5.441e-05 [graph_reusing]: 5.49e-06 [inline]: 1.79e-06 [add_attr]: 0.00342292, [1] [add_attr_with_inline]: 0.00341184, [1] [Cycle 1]: 4.462e-05, [2] [tag_attr]: 1.5e-05 [meta_addattr_fg_expand]: 4.25e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.76e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.08002e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00400332, [53] [py_interpret_to_execute]: 2.147e-05 [rewriter_before_opt_a]: 5.77e-05 [opt_a]: 0.00216838, [2] [Cycle 1]: 0.00156036, [45] [expand_dump_flag]: 2.87002e-06 [switch_simplify]: 3.131e-05 [loop_unroll]: 2.061e-05 [a_1]: 0.00045671 [with_stream_mark]: 1.364e-05 [recompute_prepare]: 8.25999e-06 [updatestate_depend_eliminate]: 3.74002e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.83002e-06 [a_2]: 7.594e-05 [accelerated_algorithm]: 6.30002e-06 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 8.01001e-06 [auto_parallel]: 6.45002e-06 [parallel]: 2.385e-05 [flash_sp]: 7.20998e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 8.65001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.88001e-06 [virtual_dataset]: 
5.92999e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.98999e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.53e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 8.89998e-06 [renormalize]: 0.00042149 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.302e-05 [cse]: 2.641e-05 [a_3]: 4.108e-05 [Cycle 2]: 0.00059873, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012628 [with_stream_mark]: 9.97001e-06 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.21e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.755e-05 [accelerated_algorithm]: 5.63002e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 7.46999e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.2e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 7.31001e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 4.87e-06 [virtual_output]: 5.30001e-06 [merge_forward]: 2.74999e-06 [cell_reuse_recompute_pass]: 1.41002e-06 [offload_activation]: 6.26e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.011e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.18001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.30001e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.69999e-06 [cse]: 1.276e-05 [a_3]: 3.211e-05 [py_interpret_to_execute_after_opt_a]: 7.45003e-06 [slice_cell_reuse_recomputed_activation]: 1.83002e-06 [rewriter_after_opt_a]: 3.042e-05 [convert_after_rewriter]: 7.27002e-06 [order_py_execute_after_rewriter]: 5.26002e-06 [mutable_eliminate]: 0.00044866 [opt_b]: 0.00018056, [1] [Cycle 1]: 0.0001745, [7] [b_1]: 0.00010687 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 3.00002e-07 [cse]: 1.67e-05 [optimize_parallel_all_gather_comm]: 1.499e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.313e-05 [loop_unroll]: 0.00041485 [opt_after_cconv]: 9.449e-05, [1] [Cycle 1]: 8.897e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.12001e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.642e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.975e-05, [1] [Cycle 1]: 6.549e-05, [4] [d_1]: 3.948e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.43e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.526e-05 [cse_after_recomputation]: 2.064e-05, [1] [Cycle 1]: 1.636e-05, [1] [cse]: 1.128e-05 [environ_conv]: 5.36002e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.09e-06 [label_micro_interleaved_index]: 3.90998e-06 [label_fine_grained_interleaved_index]: 2.43002e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.78002e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 7.7e-07 [full_micro_interleaved_order_control]: 2.55002e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.22e-06 
[add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.43002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49998e-06 [control_data_broadcast_order]: 1.204e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.57997e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 1.89e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.811e-05, [1] [Cycle 1]: 6.397e-05, [6] [build]: 2.52001e-06 [elim_shapecalc]: 8.74998e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.54002e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.34e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.535e-05 [get_jit_bprop_graph]: 9.40025e-07 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00044831 [validate]: 3.217e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00618918 [execute]: 6.41e-06 Sums bootstrap : 0.000544s : 3.26% type_inference : 0.006150s : 36.81% event_method : 0.000014s : 0.08% auto_monad : 0.000054s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 
0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000583s : 3.49% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.69% optimize.opt_b.b_1 : 0.000107s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000415s : 2.48% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000448s : 2.68% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006189s : 37.04% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000164 30 15.06% : 0.000025s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.41% : 0.000006s : 4: substitution.graph_param_transform 65.90% : 0.000108s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.44% : 0.000004s : 4: substitution.replace_old_param 6.98% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006105 2 90.75% : 0.005540s : 1: type_inference.infer 9.25% : 0.000565s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.85% : 0.000027s : 3: replace.inline 30.15% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.11% : 0.000106s : 3: match.inline 8.89% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.39% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.13% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.01% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.47% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.92% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000002s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.43% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000369 8 45.92% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.08% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030082 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.39% : 0.003427s : 1: add_attr 11.35% : 0.003416s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000049s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.20% : 0.000059s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000582s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.41% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.17% : 0.000952s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.22% : 0.002171s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.52% : 0.000458s : 1: opt_after_jit_grad 0.61% : 0.000184s : 1: opt_b 13.32% : 0.004007s : 1: optimize 0.06% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.69% : 0.000208s : 1: renormalize.infer 0.69% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.60% : 0.006198s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.49% : 0.006163s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0181453, [24] [bootstrap]: 0.00052293 [type_inference]: 0.00431455 [event_method]: 1.094e-05 [auto_monad]: 5.034e-05 [graph_reusing]: 4.97999e-06 [inline]: 2.12999e-06 [add_attr]: 0.00292729, [1] [add_attr_with_inline]: 0.0029195, [1] [Cycle 1]: 4.585e-05, [2] [tag_attr]: 1.152e-05 [meta_addattr_fg_expand]: 3.14001e-06 [parallel-infer-symbol]: 2.49001e-06 [pre_auto_parallel]: 2.121e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.00366946, [53] [py_interpret_to_execute]: 1.536e-05 [rewriter_before_opt_a]: 6.05e-05 [opt_a]: 0.00185322, [2] [Cycle 1]: 0.00124858, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.513e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.0002889 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.13e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 2.03002e-06 [a_2]: 7.721e-05 [accelerated_algorithm]: 6.04001e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.40001e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 8.33001e-06 [auto_parallel]: 5.96e-06 [parallel]: 1.894e-05 [flash_sp]: 6.88e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 8.50999e-06 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 7.23999e-06 [virtual_dataset]: 5.61998e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 9.04003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.53002e-06 [before_grad]: 9.99999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.33002e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 
2.34999e-06 [after_resolve]: 1.111e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00033772 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.367e-05 [cse]: 2.742e-05 [a_3]: 4.023e-05 [Cycle 2]: 0.00059517, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.93998e-06 [loop_unroll]: 5.34998e-06 [a_1]: 0.00012528 [with_stream_mark]: 8.69998e-06 [recompute_prepare]: 5.87001e-06 [updatestate_depend_eliminate]: 2.88003e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.765e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.29e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.23002e-06 [parallel]: 4.12e-06 [flash_sp]: 3.23e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.56e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 2.10013e-07 [virtual_shard_identity]: 6.26998e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.61999e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.85002e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.48001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 9.42999e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.77002e-06 [cse]: 1.208e-05 [a_3]: 3.231e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.086e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 4.82998e-06 [mutable_eliminate]: 0.00044679 [opt_b]: 0.00018142, [1] [Cycle 
1]: 0.00017556, [7] [b_1]: 0.00010773 [b_2]: 7.18998e-06 [updatestate_depend_eliminate]: 5.36998e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 2.89991e-07 [cse]: 1.65e-05 [optimize_parallel_all_gather_comm]: 1.544e-05 [overlap_param_gather]: 1.73002e-06 [cconv]: 2.12e-05 [loop_unroll]: 0.00040999 [opt_after_cconv]: 9.376e-05, [1] [Cycle 1]: 8.826e-05, [7] [c_1]: 2.809e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.574e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.193e-05 [tuple_transform]: 6.878e-05, [1] [Cycle 1]: 6.443e-05, [4] [d_1]: 3.885e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.324e-05 [cse_after_recomputation]: 2.004e-05, [1] [Cycle 1]: 1.553e-05, [1] [cse]: 1.032e-05 [environ_conv]: 4.75999e-06 [swap_dp_allreduce_reducescatter]: 5.52001e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.67998e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.61e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.183e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.91001e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.37e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 3.81999e-06 [overlap_grad_flash_sp]: 1.669e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 1.94999e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.74e-05, [1] [Cycle 1]: 6.344e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.26002e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 5.89999e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.586e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 3.79002e-06 [opt_after_jit_grad]: 0.00047696 [validate]: 3.027e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00588005 [execute]: 7.13e-06 Sums bootstrap : 0.000523s : 3.67% type_inference : 0.004315s : 30.25% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000002s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000060s : 0.42% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000414s : 2.90% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000338s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 3.13% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000410s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000477s : 3.34% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005880s : 41.23% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 17.50% : 0.000021s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000001s : 2: substitution.fold_const_symbol 4.64% : 0.000006s : 4: substitution.graph_param_transform 64.86% : 0.000078s : 2: substitution.inline 2.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.80% : 0.000005s : 4: substitution.remove_not_recompute_node 3.79% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004275 2 92.00% : 0.003933s : 1: type_inference.infer 8.00% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.88% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.86% : 0.000007s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.05% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.57% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.43% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026009 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.27% : 0.002932s : 1: add_attr 11.24% : 0.002923s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.15% : 0.000559s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.61% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000456s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000769s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.14% : 0.001856s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.87% : 0.000487s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.12% : 0.003673s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000186s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.25% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.65% : 0.005890s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.64% : 0.004328s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x7-kbk],max_mem:46.0M TotalTime = 0.963893, [24] [bootstrap]: 0.00053585 [type_inference]: 0.00592813 [event_method]: 1.403e-05 [auto_monad]: 8.29e-05 [graph_reusing]: 5.22e-06 [inline]: 1.62001e-06 [add_attr]: 0.00333778, [1] [add_attr_with_inline]: 0.00332648, [1] [Cycle 1]: 4.466e-05, [2] [tag_attr]: 1.45e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.772e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 1.04e-06 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00398797, [53] [py_interpret_to_execute]: 2.048e-05 [rewriter_before_opt_a]: 5.829e-05 [opt_a]: 0.00215186, [2] [Cycle 1]: 0.0015483, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 3.145e-05 [loop_unroll]: 2.09e-05 [a_1]: 0.00050152 [with_stream_mark]: 1.323e-05 [recompute_prepare]: 8.00999e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.31001e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.736e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 6.00002e-06 [merge_send_recv]: 7.87998e-06 [auto_parallel]: 5.82999e-06 [parallel]: 2.083e-05 [flash_sp]: 6.81999e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 8.15e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.045e-05 
[merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 2.54001e-06 [after_resolve]: 1.01e-05 [a_after_grad]: 8.67e-06 [renormalize]: 0.00040727 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.37e-05 [cse]: 2.562e-05 [a_3]: 4.066e-05 [Cycle 2]: 0.00059436, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012639 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.80997e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.755e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.00999e-06 [parallel]: 4.42e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 2.98998e-06 [allreduce_fusion]: 3.03e-06 [matmul_add_comm_reduction]: 4.90999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.18002e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.39001e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.95002e-06 [a_after_grad]: 8.67e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.248e-05 [a_3]: 3.224e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 
1.79e-06 [rewriter_after_opt_a]: 3.074e-05 [convert_after_rewriter]: 7.23e-06 [order_py_execute_after_rewriter]: 5.59e-06 [mutable_eliminate]: 0.00045296 [opt_b]: 0.00018106, [1] [Cycle 1]: 0.00017499, [7] [b_1]: 0.00010831 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 2.50002e-07 [cse]: 1.605e-05 [optimize_parallel_all_gather_comm]: 1.537e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.273e-05 [loop_unroll]: 0.00041733 [opt_after_cconv]: 9.265e-05, [1] [Cycle 1]: 8.71e-05, [7] [c_1]: 2.73e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.13998e-06 [cse]: 1.523e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.259e-05 [tuple_transform]: 6.868e-05, [1] [Cycle 1]: 6.437e-05, [4] [d_1]: 3.872e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.537e-05 [cse_after_recomputation]: 1.981e-05, [1] [Cycle 1]: 1.551e-05, [1] [cse]: 1.043e-05 [environ_conv]: 4.33999e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.78003e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.097e-05 
[grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.53999e-06 [overlap_recompute_and_grad_model_parallel]: 4.37998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.674e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.835e-05, [1] [Cycle 1]: 6.41e-05, [6] [build]: 2.11998e-06 [elim_shapecalc]: 8.84003e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.77999e-06 [auto_monad_reorder]: 1.486e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00044975 [validate]: 5.403e-05 [backend_pass]: 1.20999e-06 [task_emit]: 0.949212 [execute]: 8.42998e-06 Sums bootstrap : 0.000536s : 0.06% type_inference : 0.005928s : 0.62% event_method : 0.000014s : 0.00% auto_monad : 0.000083s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000628s : 0.07% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000025s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000407s : 0.04% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000038s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000453s : 0.05% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000417s : 0.04% optimize.opt_after_cconv.c_1 : 0.000027s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.00% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000004s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.05% validate : 0.000054s : 0.01% backend_pass : 0.000001s : 0.00% task_emit : 0.949212s : 98.92% execute : 0.000008s : 0.00% Time group info: ------[substitution.] 0.000208 30 11.77% : 0.000024s : 5: substitution.arithmetic_simplify 0.84% : 0.000002s : 2: substitution.elim_not_effective 0.59% : 0.000001s : 2: substitution.fold_const_symbol 2.61% : 0.000005s : 4: substitution.graph_param_transform 52.15% : 0.000108s : 3: substitution.inline 1.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.02% : 0.000004s : 4: substitution.remove_not_recompute_node 1.93% : 0.000004s : 4: substitution.replace_old_param 26.74% : 0.000056s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005885 2 90.73% : 0.005340s : 1: type_inference.infer 9.27% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000041 5 69.86% : 0.000029s : 3: replace.inline 30.14% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000161 5 66.11% : 0.000106s : 3: match.inline 33.89% : 0.000055s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 1.13% : 0.000002s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.08% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.44% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.06% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.70% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.92% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.05% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.65% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 45.72% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.28% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.972768 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.34% : 0.003342s : 1: add_attr 0.34% : 0.003330s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000088s : 1: auto_monad 0.00% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000572s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000462s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.10% : 0.000995s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000091s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.22% : 0.002155s : 1: opt_a 0.01% : 0.000096s : 1: opt_after_cconv 0.05% : 0.000459s : 1: opt_after_jit_grad 0.02% : 0.000184s : 1: opt_b 0.41% : 0.003992s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000209s : 1: renormalize.infer 0.02% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000071s : 1: symbol_engine_optimizer 97.58% : 0.949233s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.61% : 0.005942s : 1: type_inference 0.01% : 0.000079s : 1: validate TotalTime = 0.069946, [24] [bootstrap]: 0.00048508 [type_inference]: 0.00442128 [event_method]: 1.052e-05 [auto_monad]: 5.068e-05 [graph_reusing]: 4.65001e-06 [inline]: 1.99999e-06 [add_attr]: 0.00297701, [1] [add_attr_with_inline]: 0.00296886, [1] [Cycle 1]: 4.608e-05, [2] [tag_attr]: 1.201e-05 [meta_addattr_fg_expand]: 3.01999e-06 [parallel-infer-symbol]: 3.12002e-06 [pre_auto_parallel]: 2.194e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.00368597, [53] [py_interpret_to_execute]: 1.475e-05 [rewriter_before_opt_a]: 3.918e-05 [opt_a]: 0.00185688, [2] [Cycle 1]: 0.00125608, [45] [expand_dump_flag]: 3.43e-06 [switch_simplify]: 2.453e-05 [loop_unroll]: 1.393e-05 [a_1]: 0.00029186 [with_stream_mark]: 1.438e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.91999e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 2.84001e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.716e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 2.46998e-06 [meta_shard_fg_expand]: 1.61002e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 8.23999e-06 [auto_parallel]: 5.56998e-06 [parallel]: 1.746e-05 [flash_sp]: 7.97998e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.69002e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.2e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.46998e-06 [before_grad]: 9.23002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 
2.32999e-06 [after_resolve]: 1.038e-05 [a_after_grad]: 8.64e-06 [renormalize]: 0.00034115 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 1.71998e-06 [auto_monad_eliminator]: 1.328e-05 [cse]: 2.634e-05 [a_3]: 4.073e-05 [Cycle 2]: 0.00059182, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.31998e-06 [a_1]: 0.00012428 [with_stream_mark]: 1.113e-05 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 7.30011e-07 [a_2]: 6.77e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.26998e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.33002e-06 [parallel]: 3.65998e-06 [flash_sp]: 3.45998e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.33e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.92999e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.61002e-06 [a_after_grad]: 8.42e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.09001e-06 [cse]: 1.187e-05 [a_3]: 3.44e-05 [py_interpret_to_execute_after_opt_a]: 6.91001e-06 [slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.002e-05 [convert_after_rewriter]: 7.00998e-06 [order_py_execute_after_rewriter]: 4.87998e-06 [mutable_eliminate]: 0.00044965 [opt_b]: 0.00017828, [1] [Cycle 1]: 
0.00017238, [7] [b_1]: 0.00010616 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [renormalize]: 2.60014e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.699e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.226e-05 [loop_unroll]: 0.00040969 [opt_after_cconv]: 9.399e-05, [1] [Cycle 1]: 8.798e-05, [7] [c_1]: 2.719e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.61e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.258e-05 [tuple_transform]: 6.948e-05, [1] [Cycle 1]: 6.523e-05, [4] [d_1]: 3.973e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.239e-05 [cse_after_recomputation]: 2.045e-05, [1] [Cycle 1]: 1.59e-05, [1] [cse]: 1.081e-05 [environ_conv]: 3.605e-05 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.19999e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.52001e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.41998e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.09003e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.79001e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.09997e-06 [overlap_grad_flash_sp]: 1.666e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.865e-05, [1] [Cycle 1]: 6.466e-05, [6] [build]: 2.03997e-06 [elim_shapecalc]: 8.77999e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 6.25002e-06 [fold_const_symbol]: 8.88002e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00045585 [validate]: 3.093e-05 [backend_pass]: 1.20999e-06 [task_emit]: 0.0575592 [execute]: 7.95998e-06 Sums bootstrap : 0.000485s : 0.73% type_inference : 0.004421s : 6.70% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.63% optimize.opt_a.with_stream_mark : 0.000026s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000341s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000038s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.68% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000410s : 0.62% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000036s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000456s : 0.69% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057559s : 87.20% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000119 26 17.98% : 0.000021s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000005s : 4: substitution.graph_param_transform 66.00% : 0.000079s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.12% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004381 2 91.82% : 0.004023s : 1: type_inference.infer 8.18% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 1.02% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.06% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.39% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.62% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.63% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.82% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.35% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.96% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.97% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000262 6 43.78% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.22% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077876 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.83% : 0.002981s : 1: add_attr 3.82% : 0.002973s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000522s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.05% : 0.000040s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.99% : 0.000768s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001860s : 1: opt_a 0.13% : 0.000097s : 1: opt_after_cconv 0.60% : 0.000465s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.74% : 0.003690s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000188s : 1: renormalize.infer 0.19% : 0.000146s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 73.93% : 0.057574s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.69% : 0.004435s : 1: type_inference 0.07% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x7-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x8-pynative],max_mem:46.0M TotalTime = 0.0210246, [24] [bootstrap]: 0.00052242 [type_inference]: 0.00607322 [event_method]: 1.439e-05 [auto_monad]: 5.435e-05 [graph_reusing]: 6.06e-06 [inline]: 1.75001e-06 [add_attr]: 0.00332364, [1] [add_attr_with_inline]: 0.00331276, [1] [Cycle 1]: 4.485e-05, [2] [tag_attr]: 1.572e-05 [meta_addattr_fg_expand]: 4.03999e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 2.9e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00394644, [53] [py_interpret_to_execute]: 2.014e-05 [rewriter_before_opt_a]: 5.862e-05 [opt_a]: 0.00211367, [2] [Cycle 1]: 0.00150816, [45] [expand_dump_flag]: 3.16999e-06 [switch_simplify]: 3.114e-05 [loop_unroll]: 2.097e-05 [a_1]: 0.00045035 [with_stream_mark]: 1.356e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.81001e-06 [updatestate_assign_eliminate]: 3.04999e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.584e-05 [accelerated_algorithm]: 6.06998e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 7.72998e-06 [auto_parallel]: 5.34e-06 [parallel]: 2.272e-05 [flash_sp]: 7.33999e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 8.85001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.30003e-06 [virtual_dataset]: 5.81e-06 
[get_grad_eliminate_]: 5.89e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 3.45998e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.11002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.054e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.52999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.34999e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.038e-05 [a_after_grad]: 8.75999e-06 [renormalize]: 0.00042128 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.313e-05 [cse]: 2.551e-05 [a_3]: 4.089e-05 [Cycle 2]: 0.00059621, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.73998e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.00012642 [with_stream_mark]: 9.46e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.17999e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.784e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.53997e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 6.69999e-06 [parallel]: 4.47e-06 [flash_sp]: 2.98998e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.17999e-06 [get_grad_eliminate_]: 6.29001e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.68001e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.36e-06 [cse]: 1.253e-05 [a_3]: 3.204e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 2.994e-05 [convert_after_rewriter]: 6.72002e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00044137 [opt_b]: 0.00019451, [1] [Cycle 1]: 0.00018837, [7] [b_1]: 0.00011947 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 2.79979e-07 [cse]: 1.593e-05 [optimize_parallel_all_gather_comm]: 1.622e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.158e-05 [loop_unroll]: 0.00040941 [opt_after_cconv]: 9.396e-05, [1] [Cycle 1]: 8.828e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.602e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.309e-05 [tuple_transform]: 6.82e-05, [1] [Cycle 1]: 6.405e-05, [4] [d_1]: 3.896e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.99999e-06 [add_recomputation]: 4.581e-05 [cse_after_recomputation]: 1.988e-05, [1] [Cycle 1]: 1.549e-05, [1] [cse]: 1.055e-05 [environ_conv]: 4.94e-06 [swap_dp_allreduce_reducescatter]: 4.65999e-06 [bias_add_comm_swap]: 2.65002e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 3.10998e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.24999e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 
9.39996e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.125e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 4.05998e-06 [overlap_grad_flash_sp]: 1.733e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 2.02001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.806e-05, [1] [Cycle 1]: 6.387e-05, [6] [build]: 2.40002e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.72998e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.95001e-06 [auto_monad_reorder]: 1.485e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00044619 [validate]: 3.208e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00629144 [execute]: 7.05002e-06 Sums bootstrap : 0.000522s : 3.13% type_inference : 0.006073s : 36.37% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000577s : 3.45% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000421s : 2.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000038s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000441s : 2.64% optimize.opt_b.b_1 : 0.000119s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000409s : 2.45% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 2.67% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006291s : 37.68% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.69% : 0.000024s : 5: substitution.arithmetic_simplify 1.28% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.82% : 0.000110s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.47% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006029 2 90.77% : 0.005472s : 1: type_inference.infer 9.23% : 0.000556s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.51% : 0.000026s : 3: replace.inline 30.49% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.84% : 0.000108s : 3: match.inline 8.16% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.96% : 0.000002s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 0.95% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 1.07% : 0.000002s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.55% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.93% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.39% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000379 8 50.25% : 0.000190s : 3: func_graph_cloner_run.FuncGraphClonerGraph 49.75% : 0.000188s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029779 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.18% : 0.003328s : 1: add_attr 11.14% : 0.003316s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.88% : 0.000558s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000450s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.16% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000102s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.11% : 0.002117s : 1: opt_a 0.33% : 0.000097s : 1: opt_after_cconv 1.53% : 0.000455s : 1: opt_after_jit_grad 0.67% : 0.000198s : 1: opt_b 13.27% : 0.003950s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000212s : 1: renormalize.infer 0.68% : 0.000203s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 21.16% : 0.006302s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.44% : 0.006087s : 1: type_inference 0.21% : 0.000061s : 1: validate TotalTime = 0.0180875, [24] [bootstrap]: 0.00045664 [type_inference]: 0.00433433 [event_method]: 1.063e-05 [auto_monad]: 5.131e-05 [graph_reusing]: 5.27999e-06 [inline]: 2.26e-06 [add_attr]: 0.00297249, [1] [add_attr_with_inline]: 0.00296414, [1] [Cycle 1]: 4.369e-05, [2] [tag_attr]: 1.198e-05 [meta_addattr_fg_expand]: 2.98998e-06 [parallel-infer-symbol]: 3.40003e-06 [pre_auto_parallel]: 2.168e-05 [insert-virtual-dataset]: 2.16e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.00365119, [53] [py_interpret_to_execute]: 1.604e-05 [rewriter_before_opt_a]: 3.838e-05 [opt_a]: 0.00185435, [2] [Cycle 1]: 0.00124381, [45] [expand_dump_flag]: 2.46e-06 [switch_simplify]: 2.383e-05 [loop_unroll]: 1.412e-05 [a_1]: 0.00029115 [with_stream_mark]: 1.351e-05 [recompute_prepare]: 7.57998e-06 [updatestate_depend_eliminate]: 3.56001e-06 [updatestate_assign_eliminate]: 3.14001e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.49998e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.50002e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.36001e-06 [auto_parallel]: 5.75001e-06 [parallel]: 1.748e-05 [flash_sp]: 7.64002e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.56e-06 [allreduce_slice_to_reducescatter]: 8.40024e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 6.14999e-06 [get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.65001e-06 [merge_forward]: 3.91001e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 
2.31e-06 [after_resolve]: 1.037e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.00033797 [add_forward_monad_depend]: 4.35999e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.469e-05 [a_3]: 3.925e-05 [Cycle 2]: 0.00060164, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012392 [with_stream_mark]: 9.24e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.63e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.794e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 1.635e-05 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.83002e-06 [parallel]: 3.86001e-06 [flash_sp]: 2.88998e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.20997e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 4.89003e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.18001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 8.16002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 5.94e-06 [cse]: 1.271e-05 [a_3]: 3.166e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 3.127e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.18002e-06 [mutable_eliminate]: 0.00044395 [opt_b]: 0.00018225, [1] [Cycle 1]: 0.00017634, [7] [b_1]: 
0.00010922 [b_2]: 7e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 4.60015e-07 [cse]: 1.587e-05 [optimize_parallel_all_gather_comm]: 1.571e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.246e-05 [loop_unroll]: 0.00041214 [opt_after_cconv]: 9.425e-05, [1] [Cycle 1]: 8.868e-05, [7] [c_1]: 2.784e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.555e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 6.853e-05, [1] [Cycle 1]: 6.395e-05, [4] [d_1]: 3.857e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.10002e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.165e-05 [cse_after_recomputation]: 1.954e-05, [1] [Cycle 1]: 1.514e-05, [1] [cse]: 1.01e-05 [environ_conv]: 4.39002e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.46998e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.10017e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.38998e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.01997e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.56998e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 3.91999e-06 [overlap_recompute_and_grad_model_parallel]: 4.82e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 1.92001e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.908e-05, [1] [Cycle 1]: 6.485e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.153e-05 [opt_reshape]: 6.01998e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.16e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.596e-05 [get_jit_bprop_graph]: 9.49978e-07 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00044525 [validate]: 3.119e-05 [backend_pass]: 8.09989e-07 [task_emit]: 0.00587306 [execute]: 6.59001e-06 Sums bootstrap : 0.000457s : 3.22% type_inference : 0.004334s : 30.61% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000022s : 0.16% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000338s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000037s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000444s : 3.13% optimize.opt_b.b_1 : 0.000109s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000412s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.02% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 3.14% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005873s : 41.47% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 19.23% : 0.000023s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.29% : 0.000002s : 2: substitution.fold_const_symbol 4.11% : 0.000005s : 4: substitution.graph_param_transform 64.17% : 0.000076s : 2: substitution.inline 2.50% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.87% : 0.000005s : 4: substitution.remove_not_recompute_node 3.33% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004292 2 92.14% : 0.003955s : 1: type_inference.infer 7.86% : 0.000337s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000074 2 100.00% : 0.000074s : 2: match.inline ------[predicate.] 
0.000147 984 0.74% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.67% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.73% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.98% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.97% : 0.000001s : 13: predicate.environ_get_depend_swap 1.70% : 0.000002s : 21: predicate.environ_get_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.89% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.68% : 0.000002s : 11: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 12.68% : 0.000019s : 44: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.62% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.63% : 0.000001s : 4: predicate.parallel_virtual_node 1.14% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.73% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 0.86% : 0.000001s : 9: predicate.reduce_eliminate 2.06% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.21% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.96% : 0.000001s : 11: predicate.switch_defer_inline 1.62% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.06% : 0.000006s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.73% : 0.000001s : 9: predicate.transpose_eliminate 1.38% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.97% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.27% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.98% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.88% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 42.26% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.74% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025986 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.46% : 0.002977s : 1: add_attr 11.42% : 0.002968s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000492s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000453s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.99% : 0.000776s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.15% : 0.001857s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000454s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 14.07% : 0.003655s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000188s : 1: renormalize.infer 0.55% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000072s : 1: symbol_engine_optimizer 22.64% : 0.005883s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.73% : 0.004348s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x8-kbk],max_mem:46.0M TotalTime = 0.140132, [24] [bootstrap]: 0.00052377 [type_inference]: 0.00596992 [event_method]: 1.388e-05 [auto_monad]: 5.403e-05 [graph_reusing]: 5.20001e-06 [inline]: 1.52999e-06 [add_attr]: 0.00340936, [1] [add_attr_with_inline]: 0.00339861, [1] [Cycle 1]: 4.522e-05, [2] [tag_attr]: 1.527e-05 [meta_addattr_fg_expand]: 3.91999e-06 [parallel-infer-symbol]: 2.53e-06 [pre_auto_parallel]: 2.797e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00395808, [53] [py_interpret_to_execute]: 2.116e-05 [rewriter_before_opt_a]: 5.771e-05 [opt_a]: 0.00210384, [2] [Cycle 1]: 0.0015023, [45] [expand_dump_flag]: 2.77002e-06 [switch_simplify]: 3.176e-05 [loop_unroll]: 2.112e-05 [a_1]: 0.00045169 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 7.85e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.24001e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.584e-05 [accelerated_algorithm]: 6.91001e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 6.02001e-06 [merge_send_recv]: 7.71001e-06 [auto_parallel]: 6.07999e-06 [parallel]: 2.136e-05 [flash_sp]: 6.98e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 6.34999e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.46002e-06 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 
[merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.93998e-06 [receive_attached]: 2.68003e-06 [after_resolve]: 9.98002e-06 [a_after_grad]: 8.97e-06 [renormalize]: 0.00040896 [add_forward_monad_depend]: 4.69002e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.543e-05 [a_3]: 4.11e-05 [Cycle 2]: 0.00059223, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 6.93998e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00012566 [with_stream_mark]: 9.69999e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.75997e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.742e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.14e-06 [parallel]: 3.86999e-06 [flash_sp]: 2.91999e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.91999e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.98998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.77998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.94998e-06 [a_after_grad]: 8.02e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.285e-05 [a_3]: 3.285e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 
[slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 3.033e-05 [convert_after_rewriter]: 7.31999e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00044469 [opt_b]: 0.00017972, [1] [Cycle 1]: 0.00017367, [7] [b_1]: 0.00010717 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 5.52001e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.39991e-07 [cse]: 1.503e-05 [optimize_parallel_all_gather_comm]: 1.519e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00043929 [opt_after_cconv]: 9.368e-05, [1] [Cycle 1]: 8.818e-05, [7] [c_1]: 2.739e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.567e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.329e-05 [tuple_transform]: 6.904e-05, [1] [Cycle 1]: 6.481e-05, [4] [d_1]: 3.866e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.817e-05 [cse_after_recomputation]: 1.971e-05, [1] [Cycle 1]: 1.533e-05, [1] [cse]: 1.041e-05 [environ_conv]: 4.2e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.12998e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.19003e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.23002e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.31002e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.57001e-06 [control_data_broadcast_order]: 1.181e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 1.91e-06 [overlap_grad_ring_attention]: 3.66001e-06 [overlap_grad_flash_sp]: 1.648e-05 [begin_end_overlap_inline]: 7.99977e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.756e-05, [1] [Cycle 1]: 6.33e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.21e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.518e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00044859 [validate]: 3.128e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.125436 [execute]: 9.42999e-06 Sums bootstrap : 0.000524s : 0.39% type_inference : 0.005970s : 4.40% event_method : 0.000014s : 0.01% auto_monad : 0.000054s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000577s : 0.43% optimize.opt_a.with_stream_mark 
: 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000025s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000409s : 0.30% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.01% optimize.opt_a.cse : 0.000038s : 0.03% optimize.opt_a.a_3 : 0.000074s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000445s : 0.33% optimize.opt_b.b_1 : 0.000107s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000439s : 0.32% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000004s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.33% validate : 0.000031s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.125436s : 92.40% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000161 30 14.60% : 0.000024s : 5: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.23% : 0.000005s : 4: substitution.graph_param_transform 66.95% : 0.000108s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005928 2 90.87% : 0.005387s : 1: type_inference.infer 9.13% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.04% : 0.000026s : 3: replace.inline 30.96% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.61% : 0.000105s : 3: match.inline 8.39% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 0.98% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.16% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.61% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000344 8 46.73% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.27% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.148999 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.29% : 0.003414s : 1: add_attr 2.28% : 0.003402s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000059s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.38% : 0.000559s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000448s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.30% : 0.000454s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.63% : 0.000943s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.41% : 0.002107s : 1: opt_a 0.07% : 0.000097s : 1: opt_after_cconv 0.31% : 0.000458s : 1: opt_after_jit_grad 0.12% : 0.000183s : 1: opt_b 2.66% : 0.003963s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.14% : 0.000214s : 1: renormalize.infer 0.13% : 0.000188s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 84.20% : 0.125457s : 1: task_emit 0.05% : 0.000072s : 1: 
tuple_transform 4.02% : 0.005983s : 1: type_inference 0.04% : 0.000056s : 1: validate TotalTime = 0.135608, [24] [bootstrap]: 0.00045409 [type_inference]: 0.00438222 [event_method]: 1.096e-05 [auto_monad]: 5.03e-05 [graph_reusing]: 5.25001e-06 [inline]: 1.97999e-06 [add_attr]: 0.0029924, [1] [add_attr_with_inline]: 0.00298425, [1] [Cycle 1]: 4.615e-05, [2] [tag_attr]: 1.181e-05 [meta_addattr_fg_expand]: 3.40998e-06 [parallel-infer-symbol]: 2.50002e-06 [pre_auto_parallel]: 2.156e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.78002e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00366945, [53] [py_interpret_to_execute]: 1.521e-05 [rewriter_before_opt_a]: 3.794e-05 [opt_a]: 0.00183729, [2] [Cycle 1]: 0.00123947, [45] [expand_dump_flag]: 2.46998e-06 [switch_simplify]: 2.365e-05 [loop_unroll]: 1.363e-05 [a_1]: 0.00029029 [with_stream_mark]: 1.372e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 2.09999e-06 [a_2]: 7.558e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 7.70998e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.727e-05 [flash_sp]: 7.01001e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.2e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 6.81001e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.70001e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 8.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.109e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 
2.16998e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 8.41002e-06 [renormalize]: 0.00033892 [add_forward_monad_depend]: 4.29997e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.619e-05 [a_3]: 3.956e-05 [Cycle 2]: 0.00058842, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 6.93998e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012463 [with_stream_mark]: 1.045e-05 [recompute_prepare]: 5.42001e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.48998e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.725e-05 [accelerated_algorithm]: 5.32001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.24998e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.15001e-06 [parallel]: 3.75e-06 [flash_sp]: 3.37997e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.56998e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.90001e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36998e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.84999e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.86e-06 [a_after_grad]: 8.08999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.07001e-06 [cse]: 1.238e-05 [a_3]: 3.128e-05 [py_interpret_to_execute_after_opt_a]: 7.25e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 3.047e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00044577 [opt_b]: 0.00017915, [1] [Cycle 1]: 0.00017324, 
[7] [b_1]: 0.00010648 [b_2]: 6.93e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.50003e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.585e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.229e-05 [loop_unroll]: 0.00041309 [opt_after_cconv]: 9.397e-05, [1] [Cycle 1]: 8.847e-05, [7] [c_1]: 2.785e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.10002e-06 [cse]: 1.633e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.299e-05 [tuple_transform]: 0.00010222, [1] [Cycle 1]: 9.792e-05, [4] [d_1]: 7.089e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.61e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.315e-05 [cse_after_recomputation]: 2.073e-05, [1] [Cycle 1]: 1.646e-05, [1] [cse]: 1.123e-05 [environ_conv]: 5.17999e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.39001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.36998e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.60002e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.35001e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.06003e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.884e-05, [1] [Cycle 1]: 6.461e-05, [6] [build]: 2.59001e-06 [elim_shapecalc]: 8.57e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.551e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00044882 [validate]: 2.999e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.123293 [execute]: 8.82e-06 Sums bootstrap : 0.000454s : 0.34% type_inference : 0.004382s : 3.33% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000038s : 0.03% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.02% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000415s : 0.32% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000339s : 0.26% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.01% optimize.opt_a.cse : 0.000039s : 0.03% optimize.opt_a.a_3 : 0.000071s : 0.05% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.34% optimize.opt_b.b_1 : 0.000106s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000413s : 0.31% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000071s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.34% validate : 0.000030s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.123293s : 93.65% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.72% : 0.000023s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.63% : 0.000006s : 4: substitution.graph_param_transform 64.90% : 0.000078s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.64% : 0.000004s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004342 2 91.76% : 0.003985s : 1: type_inference.infer 8.24% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.71% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.98% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.34% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000257 6 42.85% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.15% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.143556 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.09% : 0.002996s : 1: add_attr 2.08% : 0.002988s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000055s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.34% : 0.000490s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.29% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.32% : 0.000455s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.53% : 0.000761s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.28% : 0.001840s : 1: opt_a 0.07% : 0.000097s : 1: opt_after_cconv 0.32% : 0.000458s : 1: opt_after_jit_grad 0.13% : 0.000182s : 1: opt_b 2.56% : 0.003673s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.01% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.13% : 0.000186s : 1: renormalize.infer 0.10% : 0.000147s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000034s : 1: rewriter_after_opt_a 0.03% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000071s : 1: symbol_engine_optimizer 85.90% : 0.123315s : 1: task_emit 0.07% : 0.000105s : 1: 
tuple_transform 3.06% : 0.004396s : 1: type_inference 0.04% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x8-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x9-pynative],max_mem:46.0M TotalTime = 0.0217018, [24] [bootstrap]: 0.0005395 [type_inference]: 0.00634903 [event_method]: 1.422e-05 [auto_monad]: 5.762e-05 [graph_reusing]: 5.20999e-06 [inline]: 1.45999e-06 [add_attr]: 0.00354747, [1] [add_attr_with_inline]: 0.00353567, [1] [Cycle 1]: 4.378e-05, [2] [tag_attr]: 1.506e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.766e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.51002e-06 [optimize]: 0.00405548, [53] [py_interpret_to_execute]: 2.372e-05 [rewriter_before_opt_a]: 5.795e-05 [opt_a]: 0.00220577, [2] [Cycle 1]: 0.00159724, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 3.197e-05 [loop_unroll]: 2.116e-05 [a_1]: 0.00050886 [with_stream_mark]: 1.502e-05 [recompute_prepare]: 8.60001e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.614e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 6.21998e-06 [parallel]: 2.498e-05 [flash_sp]: 7.82e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.56002e-06 [allreduce_slice_to_reducescatter]: 1.01997e-06 [virtual_shard_identity]: 9.87999e-06 [virtual_dataset]: 6.11e-06 
[get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.84e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.051e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.53002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.43e-06 [flash_sp_send_recv_attached]: 2.33002e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 9.84001e-06 [a_after_grad]: 8.98002e-06 [renormalize]: 0.00042663 [add_forward_monad_depend]: 4.63001e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.412e-05 [cse]: 2.793e-05 [a_3]: 4.093e-05 [Cycle 2]: 0.00059898, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 7.15003e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012566 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.31998e-06 [parameter_eliminate]: 9.60019e-07 [a_2]: 6.773e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.37999e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.23002e-06 [parallel]: 6.41998e-06 [flash_sp]: 3.03e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 4.93001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.07e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 6.31998e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 5.76998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.015e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 8.89e-06 [a_after_grad]: 7.97e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.23002e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.50002e-06 [cse]: 1.307e-05 [a_3]: 3.221e-05 [py_interpret_to_execute_after_opt_a]: 7.32002e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.371e-05 [convert_after_rewriter]: 7.28999e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00044428 [opt_b]: 0.0001832, [1] [Cycle 1]: 0.00017744, [7] [b_1]: 0.00010863 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.51e-06 [renormalize]: 4.30009e-07 [cse]: 1.654e-05 [optimize_parallel_all_gather_comm]: 1.875e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.232e-05 [loop_unroll]: 0.00041351 [opt_after_cconv]: 9.637e-05, [1] [Cycle 1]: 9.017e-05, [7] [c_1]: 2.815e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.643e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.276e-05 [tuple_transform]: 6.941e-05, [1] [Cycle 1]: 6.509e-05, [4] [d_1]: 3.955e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 5.97001e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.7e-05 [cse_after_recomputation]: 2.105e-05, [1] [Cycle 1]: 1.645e-05, [1] [cse]: 1.138e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 5.54e-06 [bias_add_comm_swap]: 2.21998e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.64001e-06 [micro_interleaved_order_control]: 2.72001e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.18998e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 9.89996e-07 
[add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.27e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 1.86e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.962e-05, [1] [Cycle 1]: 6.537e-05, [6] [build]: 2.49999e-06 [elim_shapecalc]: 8.99e-06 [elim_not_effective]: 1.199e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.87e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.27002e-06 [opt_after_jit_grad]: 0.00047586 [validate]: 3.163e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0063546 [execute]: 7.45e-06 Sums bootstrap : 0.000540s : 3.14% type_inference : 0.006349s : 36.97% event_method : 0.000014s : 0.08% auto_monad : 0.000058s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000001s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000635s : 3.69% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000031s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000427s : 2.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.59% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000414s : 2.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000476s : 2.77% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006355s : 37.00% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000169 30 14.82% : 0.000025s : 5: substitution.arithmetic_simplify 1.28% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.38% : 0.000006s : 4: substitution.graph_param_transform 66.04% : 0.000112s : 3: substitution.inline 1.83% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.57% : 0.000004s : 4: substitution.remove_not_recompute_node 2.25% : 0.000004s : 4: substitution.replace_old_param 7.07% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006306 2 90.86% : 0.005730s : 1: type_inference.infer 9.14% : 0.000577s : 1: type_inference.specialize ------[replace.] 0.000082 5 33.83% : 0.000028s : 3: replace.inline 66.17% : 0.000054s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 90.97% : 0.000110s : 3: match.inline 9.03% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.70% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.75% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.33% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000362 8 49.10% : 0.000178s : 3: func_graph_cloner_run.FuncGraphClonerGraph 50.90% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030917 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.49% : 0.003552s : 1: add_attr 11.45% : 0.003539s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.87% : 0.000577s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.25% : 0.001006s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.16% : 0.000051s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002209s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.57% : 0.000486s : 1: opt_after_jit_grad 0.60% : 0.000187s : 1: opt_b 13.13% : 0.004059s : 1: optimize 0.07% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000216s : 1: renormalize.infer 0.66% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000038s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 20.59% : 0.006364s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.58% : 0.006363s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.018258, [24] [bootstrap]: 0.00047118 [type_inference]: 0.00434331 [event_method]: 1.028e-05 [auto_monad]: 5.183e-05 [graph_reusing]: 5.50001e-06 [inline]: 1.90001e-06 [add_attr]: 0.00295648, [1] [add_attr_with_inline]: 0.00294826, [1] [Cycle 1]: 4.648e-05, [2] [tag_attr]: 1.225e-05 [meta_addattr_fg_expand]: 3.14001e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.178e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.23002e-06 [pipeline_split]: 1.82001e-06 [optimize]: 0.00370954, [53] [py_interpret_to_execute]: 1.472e-05 [rewriter_before_opt_a]: 3.881e-05 [opt_a]: 0.00189554, [2] [Cycle 1]: 0.00126056, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 2.578e-05 [loop_unroll]: 1.374e-05 [a_1]: 0.00029686 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 7.612e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.27001e-06 [parallel]: 1.787e-05 [flash_sp]: 7.32002e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 9.77001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.20998e-06 [virtual_dataset]: 6.27001e-06 [get_grad_eliminate_]: 5.53997e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 9.78998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.82001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50003e-06 [meta_fg_expand]: 2.19999e-06 [flash_sp_send_recv_attached]: 2.34999e-06 
[receive_attached]: 2.23998e-06 [after_resolve]: 1.122e-05 [a_after_grad]: 9.34e-06 [renormalize]: 0.00034151 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.64998e-06 [auto_monad_eliminator]: 1.306e-05 [cse]: 2.748e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00062589, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 7.08e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00015384 [with_stream_mark]: 1.154e-05 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.848e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.58001e-06 [auto_parallel]: 5.30999e-06 [parallel]: 3.91999e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.18998e-06 [allreduce_fusion]: 2.98998e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.09998e-06 [virtual_output]: 5.04003e-06 [merge_forward]: 2.50997e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.96003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.74e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 6.37001e-06 [cse]: 1.295e-05 [a_3]: 3.212e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 1.96998e-06 [rewriter_after_opt_a]: 3.011e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 4.77e-06 [mutable_eliminate]: 0.00045433 [opt_b]: 
0.0001832, [1] [Cycle 1]: 0.00017732, [7] [b_1]: 0.00010839 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.61999e-06 [renormalize]: 4.80009e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.63e-05 [overlap_param_gather]: 1.97001e-06 [cconv]: 2.158e-05 [loop_unroll]: 0.00041535 [opt_after_cconv]: 9.525e-05, [1] [Cycle 1]: 8.959e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.609e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.314e-05 [tuple_transform]: 6.935e-05, [1] [Cycle 1]: 6.51e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.64998e-06 [add_recomputation]: 4.353e-05 [cse_after_recomputation]: 2.027e-05, [1] [Cycle 1]: 1.576e-05, [1] [cse]: 1.068e-05 [environ_conv]: 4.60001e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.70002e-06 [comm_op_add_attrs]: 1.29998e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.654e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.87001e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 6.895e-05, [1] [Cycle 1]: 6.485e-05, [6] [build]: 2.40002e-06 [elim_shapecalc]: 8.31002e-06 [elim_not_effective]: 1.151e-05 [opt_reshape]: 6.20002e-06 [fold_const_symbol]: 9.25999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 1.06997e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044432 [validate]: 3.262e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00597304 [execute]: 7.18998e-06 Sums bootstrap : 0.000471s : 3.28% type_inference : 0.004343s : 30.28% event_method : 0.000010s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000033s : 0.23% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000451s : 3.14% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.15% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000342s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000454s : 3.17% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000415s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 3.10% validate : 0.000033s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.005973s : 41.64% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.45% : 0.000023s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 65.47% : 0.000080s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.41% : 0.000004s : 4: substitution.remove_not_recompute_node 3.42% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004303 2 92.08% : 0.003962s : 1: type_inference.infer 7.92% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000139 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.63% : 0.000004s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.80% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.78% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.63% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.50% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.69% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.02% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 1.00% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.90% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.67% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.33% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026237 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.28% : 0.002961s : 1: add_attr 11.25% : 0.002952s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000508s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.08% : 0.000808s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.24% : 0.001898s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.73% : 0.000454s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 14.15% : 0.003713s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000189s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.80% : 0.005983s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.60% : 0.004356s : 1: type_inference 0.23% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x9-kbk],max_mem:46.0M . TotalTime = 1.1232, [24] [bootstrap]: 0.00058547 [type_inference]: 0.00621113 [event_method]: 1.376e-05 [auto_monad]: 6.797e-05 [graph_reusing]: 5.10001e-06 [inline]: 1.82999e-06 [add_attr]: 0.00345314, [1] [add_attr_with_inline]: 0.00344215, [1] [Cycle 1]: 4.463e-05, [2] [tag_attr]: 1.455e-05 [meta_addattr_fg_expand]: 4.55001e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.752e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.0040965, [53] [py_interpret_to_execute]: 2.135e-05 [rewriter_before_opt_a]: 5.797e-05 [opt_a]: 0.00223009, [2] [Cycle 1]: 0.00161955, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 3.251e-05 [loop_unroll]: 2.184e-05 [a_1]: 0.0005221 [with_stream_mark]: 1.539e-05 [recompute_prepare]: 8.01001e-06 [updatestate_depend_eliminate]: 3.96001e-06 [updatestate_assign_eliminate]: 3.63999e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 7.867e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.74001e-06 [merge_send_recv]: 8.53001e-06 [auto_parallel]: 5.79e-06 [parallel]: 2.491e-05 [flash_sp]: 8.52e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.40998e-06 [matmul_add_comm_reduction]: 8.79e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.48e-06 [virtual_dataset]: 6.31e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.034e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 
[merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 9.69999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.75998e-06 [meta_fg_expand]: 2.63998e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.69001e-06 [after_resolve]: 1.065e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00042893 [add_forward_monad_depend]: 4.54998e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.383e-05 [cse]: 2.653e-05 [a_3]: 4.242e-05 [Cycle 2]: 0.00060133, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012962 [with_stream_mark]: 9.67999e-06 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.92e-05 [accelerated_algorithm]: 5.71998e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.21997e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.44998e-06 [flash_sp]: 3.97998e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 5.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 7.99977e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.73997e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.566e-05 [a_3]: 3.21e-05 [py_interpret_to_execute_after_opt_a]: 7.50998e-06 
[slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 3.185e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.0004539 [opt_b]: 0.00018397, [1] [Cycle 1]: 0.00017804, [7] [b_1]: 0.00010899 [b_2]: 7.53999e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.4002e-07 [cse]: 1.602e-05 [optimize_parallel_all_gather_comm]: 1.573e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00042667 [opt_after_cconv]: 9.751e-05, [1] [Cycle 1]: 9.145e-05, [7] [c_1]: 2.915e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.35002e-06 [cse]: 1.571e-05 [renormalize]: 5.29981e-07 [remove_dup_value]: 1.265e-05 [tuple_transform]: 6.937e-05, [1] [Cycle 1]: 6.502e-05, [4] [d_1]: 3.92e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.42001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.761e-05 [cse_after_recomputation]: 2.032e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.064e-05 [environ_conv]: 4.16001e-06 [swap_dp_allreduce_reducescatter]: 5.06002e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.18001e-06 [overlap_opt_shard_in_pipeline]: 1.71e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.62001e-06 [control_data_broadcast_order]: 1.169e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.48999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.28998e-06 [overlap_grad_ring_attention]: 3.87002e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 7e-05, [1] [Cycle 1]: 6.595e-05, [6] [build]: 2.10002e-06 [elim_shapecalc]: 8.75999e-06 [elim_not_effective]: 1.201e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.37999e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.593e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00049819 [validate]: 3.124e-05 [backend_pass]: 9.39996e-07 [task_emit]: 1.10794 [execute]: 9.18002e-06 Sums bootstrap : 0.000585s : 0.05% type_inference : 0.006211s : 0.56% event_method : 0.000014s : 0.00% auto_monad : 0.000068s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000652s : 0.06% 
optimize.opt_a.with_stream_mark : 0.000025s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000013s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000029s : 0.00% optimize.opt_a.flash_sp : 0.000012s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000429s : 0.04% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000042s : 0.00% optimize.opt_a.a_3 : 0.000075s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000454s : 0.04% optimize.opt_b.b_1 : 0.000109s : 0.01% optimize.opt_b.b_2 : 0.000008s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000427s : 0.04% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000498s : 0.04% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.107943s : 99.03% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000229 30 10.82% : 0.000025s : 5: substitution.arithmetic_simplify 0.83% : 0.000002s : 2: substitution.elim_not_effective 0.60% : 0.000001s : 2: substitution.fold_const_symbol 2.22% : 0.000005s : 4: substitution.graph_param_transform 75.47% : 0.000173s : 3: substitution.inline 1.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 1.94% : 0.000004s : 4: substitution.remove_not_recompute_node 1.72% : 0.000004s : 4: substitution.replace_old_param 5.07% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006167 2 90.86% : 0.005603s : 1: type_inference.infer 9.14% : 0.000564s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.75% : 0.000027s : 3: replace.inline 30.25% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000181 5 94.19% : 0.000171s : 3: match.inline 5.81% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.95% : 0.000002s : 11: predicate.accumulaten_eliminater 1.04% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.89% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 2.04% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.20% : 0.000010s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.28% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.86% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.94% : 0.000002s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.89% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000370 8 44.94% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.06% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.132362 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.31% : 0.003458s : 1: add_attr 0.30% : 0.003446s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000074s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000630s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000463s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.09% : 0.001027s : 78: opt.transform.opt_a 0.00% : 0.000028s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000091s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.20% : 0.002233s : 1: opt_a 0.01% : 0.000101s : 1: opt_after_cconv 0.04% : 0.000508s : 1: opt_after_jit_grad 0.02% : 0.000188s : 1: opt_b 0.36% : 0.004100s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000218s : 1: renormalize.infer 0.02% : 0.000204s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000073s : 1: symbol_engine_optimizer 97.85% : 1.107965s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.55% : 0.006226s : 1: type_inference 0.01% : 0.000058s : 1: validate TotalTime = 0.12752, [24] [bootstrap]: 0.00048554 [type_inference]: 0.00441219 [event_method]: 1.035e-05 [auto_monad]: 5.011e-05 [graph_reusing]: 5.07999e-06 [inline]: 1.76e-06 [add_attr]: 0.00300287, [1] [add_attr_with_inline]: 0.00299448, [1] [Cycle 1]: 4.784e-05, [2] [tag_attr]: 1.228e-05 [meta_addattr_fg_expand]: 3.53e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 2.175e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.78002e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00372871, [53] [py_interpret_to_execute]: 1.511e-05 [rewriter_before_opt_a]: 3.941e-05 [opt_a]: 0.00186416, [2] [Cycle 1]: 0.00125701, [45] [expand_dump_flag]: 2.55002e-06 [switch_simplify]: 2.448e-05 [loop_unroll]: 1.394e-05 [a_1]: 0.00029124 [with_stream_mark]: 1.301e-05 [recompute_prepare]: 7.7e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 3.02002e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 1.49998e-06 [a_2]: 7.623e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.67001e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 8.02003e-06 [auto_parallel]: 5.93998e-06 [parallel]: 1.833e-05 [flash_sp]: 7.46001e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 5.75001e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.37001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.01002e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.15001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.17001e-06 [receive_attached]: 2.34999e-06 
[after_resolve]: 1.093e-05 [a_after_grad]: 9.14998e-06 [renormalize]: 0.00034643 [add_forward_monad_depend]: 4.53001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.232e-05 [cse]: 2.622e-05 [a_3]: 3.996e-05 [Cycle 2]: 0.00059815, [45] [expand_dump_flag]: 8.30012e-07 [switch_simplify]: 7.07002e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012604 [with_stream_mark]: 1.07e-05 [recompute_prepare]: 5.73997e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.829e-05 [accelerated_algorithm]: 5.56998e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 4.05e-06 [auto_parallel]: 5.25001e-06 [parallel]: 4.13001e-06 [flash_sp]: 2.88e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 1.03001e-06 [after_resolve]: 9.68002e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.303e-05 [a_3]: 3.218e-05 [py_interpret_to_execute_after_opt_a]: 7.37002e-06 [slice_cell_reuse_recomputed_activation]: 1.76003e-06 [rewriter_after_opt_a]: 3.061e-05 [convert_after_rewriter]: 6.54001e-06 [order_py_execute_after_rewriter]: 4.91002e-06 [mutable_eliminate]: 0.00045129 [opt_b]: 0.00018468, [1] [Cycle 1]: 0.00017881, [7] [b_1]: 
0.00010978 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 5.00004e-07 [cse]: 1.639e-05 [optimize_parallel_all_gather_comm]: 1.526e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.202e-05 [loop_unroll]: 0.0004668 [opt_after_cconv]: 9.736e-05, [1] [Cycle 1]: 9.146e-05, [7] [c_1]: 2.869e-05 [parameter_eliminate]: 2.66e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.674e-05 [renormalize]: 1.69995e-07 [remove_dup_value]: 1.201e-05 [tuple_transform]: 6.907e-05, [1] [Cycle 1]: 6.49e-05, [4] [d_1]: 3.926e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.33e-06 [partial_unused_args_eliminate]: 1.81e-06 [add_recomputation]: 4.268e-05 [cse_after_recomputation]: 2.041e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.057e-05 [environ_conv]: 4.83001e-06 [swap_dp_allreduce_reducescatter]: 4.80999e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.19997e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.38002e-06 [micro_interleaved_order_control]: 2.74001e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.32999e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.168e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.31002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.71e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.67e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.946e-05, [1] [Cycle 1]: 6.532e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 8.33001e-06 [elim_not_effective]: 1.177e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.0004504 [validate]: 3.157e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.115076 [execute]: 8.3e-06 Sums bootstrap : 0.000486s : 0.39% type_inference : 0.004412s : 3.57% event_method : 0.000010s : 0.01% auto_monad : 0.000050s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.03% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000417s : 0.34% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000347s : 0.28% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.37% optimize.opt_b.b_1 : 0.000110s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000467s : 0.38% optimize.opt_after_cconv.c_1 : 0.000029s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.36% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.115076s : 93.14% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.61% : 0.000021s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 65.62% : 0.000079s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.88% : 0.000005s : 4: substitution.remove_not_recompute_node 3.70% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004370 2 91.71% : 0.004008s : 1: type_inference.infer 8.29% : 0.000362s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000001s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.86% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.66% : 0.000001s : 4: predicate.parallel_virtual_node 1.31% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.62% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 6 43.18% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.82% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.135529 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.22% : 0.003007s : 1: add_attr 2.21% : 0.002998s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000055s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.39% : 0.000522s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000476s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.57% : 0.000770s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000092s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.38% : 0.001867s : 1: opt_a 0.07% : 0.000101s : 1: opt_after_cconv 0.34% : 0.000460s : 1: opt_after_jit_grad 0.14% : 0.000188s : 1: opt_b 2.75% : 0.003732s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.01% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.14% : 0.000191s : 1: renormalize.infer 0.11% : 0.000148s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.03% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000072s : 1: symbol_engine_optimizer 84.92% : 0.115094s : 1: task_emit 0.05% : 0.000072s : 1: 
tuple_transform 3.27% : 0.004426s : 1: type_inference 0.04% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y4-dtype_x9-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x0-pynative],max_mem:46.0M TotalTime = 0.0217674, [24] [bootstrap]: 0.00057645 [type_inference]: 0.0063402 [event_method]: 1.399e-05 [auto_monad]: 5.765e-05 [graph_reusing]: 5.23002e-06 [inline]: 1.79e-06 [add_attr]: 0.00351834, [1] [add_attr_with_inline]: 0.00350721, [1] [Cycle 1]: 4.514e-05, [2] [tag_attr]: 1.548e-05 [meta_addattr_fg_expand]: 4.33999e-06 [parallel-infer-symbol]: 3.11001e-06 [pre_auto_parallel]: 2.842e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.0040239, [53] [py_interpret_to_execute]: 2.144e-05 [rewriter_before_opt_a]: 5.785e-05 [opt_a]: 0.00214398, [2] [Cycle 1]: 0.00152864, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 3.19e-05 [loop_unroll]: 2.143e-05 [a_1]: 0.00045816 [with_stream_mark]: 1.378e-05 [recompute_prepare]: 7.55998e-06 [updatestate_depend_eliminate]: 3.87998e-06 [updatestate_assign_eliminate]: 3.48999e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.586e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 5.87999e-06 [parallel]: 2.425e-05 [flash_sp]: 6.83e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.43999e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.36999e-06 [virtual_dataset]: 
5.96e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.49e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.16998e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00042232 [add_forward_monad_depend]: 4.83001e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.331e-05 [cse]: 2.745e-05 [a_3]: 4.15e-05 [Cycle 2]: 0.00060576, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012726 [with_stream_mark]: 1.017e-05 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.37001e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 6.785e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.05999e-06 [parallel]: 4.64998e-06 [flash_sp]: 2.94999e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 4.98001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.48e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.20999e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.19001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.99998e-06 [a_after_grad]: 8.23999e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.84999e-06 [cse]: 1.358e-05 [a_3]: 3.292e-05 [py_interpret_to_execute_after_opt_a]: 7.67998e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 3.17e-05 [convert_after_rewriter]: 7.31999e-06 [order_py_execute_after_rewriter]: 5.15999e-06 [mutable_eliminate]: 0.00045208 [opt_b]: 0.00018257, [1] [Cycle 1]: 0.00017671, [7] [b_1]: 0.00010813 [b_2]: 6.97002e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.50003e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.597e-05 [overlap_param_gather]: 1.72001e-06 [cconv]: 2.158e-05 [loop_unroll]: 0.00044641 [opt_after_cconv]: 9.546e-05, [1] [Cycle 1]: 8.99e-05, [7] [c_1]: 2.789e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.659e-05 [renormalize]: 6.00005e-07 [remove_dup_value]: 1.203e-05 [tuple_transform]: 6.787e-05, [1] [Cycle 1]: 6.381e-05, [4] [d_1]: 3.822e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.83e-05 [cse_after_recomputation]: 2.096e-05, [1] [Cycle 1]: 1.664e-05, [1] [cse]: 1.124e-05 [environ_conv]: 4.86997e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 
1.07e-06 [interleave_split_concat_branches]: 1.07998e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.09003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.225e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.51999e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.87001e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.797e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 1.90001e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.866e-05, [1] [Cycle 1]: 6.465e-05, [6] [build]: 2.49001e-06 [elim_shapecalc]: 8.54998e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.73002e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.554e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 0.00012236 [opt_after_jit_grad]: 0.00045875 [validate]: 3.504e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00634165 [execute]: 6.87002e-06 Sums bootstrap : 0.000576s : 3.34% type_inference : 0.006340s : 36.72% event_method : 0.000014s : 0.08% auto_monad : 0.000058s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000585s : 3.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.45% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.62% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000446s : 2.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000122s : 0.71% opt_after_jit_grad : 0.000459s : 2.66% validate : 0.000035s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006342s : 36.73% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 14.73% : 0.000025s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 3.10% : 0.000005s : 4: substitution.graph_param_transform 67.48% : 0.000113s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000005s : 4: substitution.remove_not_recompute_node 2.27% : 0.000004s : 4: substitution.replace_old_param 6.16% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006298 2 90.68% : 0.005711s : 1: type_inference.infer 9.32% : 0.000587s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.43% : 0.000027s : 3: replace.inline 30.57% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 92.27% : 0.000111s : 3: match.inline 7.73% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.97% : 0.000002s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.37% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.17% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.86% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.86% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000002s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.83% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000002s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.56% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000371 8 46.99% : 0.000174s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.01% : 0.000197s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030838 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.003522s : 1: add_attr 11.39% : 0.003511s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.00% : 0.000617s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000455s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000955s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002147s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.52% : 0.000469s : 1: opt_after_jit_grad 0.60% : 0.000186s : 1: opt_b 13.06% : 0.004028s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.68% : 0.000210s : 1: renormalize.infer 0.67% : 0.000206s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.41% : 0.000127s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.60% : 0.006352s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.60% : 0.006354s : 1: type_inference 0.21% : 0.000065s : 1: validate TotalTime = 0.0181936, [24] [bootstrap]: 0.00047266 [type_inference]: 0.00440036 [event_method]: 1.03e-05 [auto_monad]: 5.271e-05 [graph_reusing]: 4.71002e-06 [inline]: 2.00002e-06 [add_attr]: 0.00297676, [1] [add_attr_with_inline]: 0.00296898, [1] [Cycle 1]: 4.595e-05, [2] [tag_attr]: 1.161e-05 [meta_addattr_fg_expand]: 3.2e-06 [parallel-infer-symbol]: 2.57001e-06 [pre_auto_parallel]: 2.189e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.72999e-06 [pipeline_split]: 1.86998e-06 [optimize]: 0.00370806, [53] [py_interpret_to_execute]: 1.506e-05 [rewriter_before_opt_a]: 3.841e-05 [opt_a]: 0.0018634, [2] [Cycle 1]: 0.00126162, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 2.532e-05 [loop_unroll]: 1.372e-05 [a_1]: 0.00029589 [with_stream_mark]: 1.338e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.46001e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.599e-05 [accelerated_algorithm]: 6.28998e-06 [shard]: 2.38002e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.37001e-06 [merge_send_recv]: 7.28e-06 [auto_parallel]: 5.77999e-06 [parallel]: 1.714e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.15998e-06 [virtual_dataset]: 5.79999e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.81998e-06 [merge_forward]: 3.60998e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.94001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 1.97001e-06 [before_grad]: 9.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61999e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.86e-06 [receive_attached]: 
2.35002e-06 [after_resolve]: 1.051e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.0003444 [add_forward_monad_depend]: 4.23001e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.254e-05 [cse]: 2.782e-05 [a_3]: 3.987e-05 [Cycle 2]: 0.00059267, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012483 [with_stream_mark]: 1.05e-05 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.839e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.16002e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 2.84001e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 5.22999e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.31998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.004e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 1.07998e-06 [receive_attached]: 9.99979e-07 [after_resolve]: 8.78001e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.289e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.079e-05 [convert_after_rewriter]: 7.34002e-06 [order_py_execute_after_rewriter]: 5.09998e-06 [mutable_eliminate]: 0.00045074 [opt_b]: 0.00018264, [1] 
[Cycle 1]: 0.00017642, [7] [b_1]: 0.00010852 [b_2]: 6.78e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 5.10016e-07 [cse]: 1.627e-05 [optimize_parallel_all_gather_comm]: 1.649e-05 [overlap_param_gather]: 2.21998e-06 [cconv]: 2.192e-05 [loop_unroll]: 0.00041679 [opt_after_cconv]: 0.0001205, [1] [Cycle 1]: 8.926e-05, [7] [c_1]: 2.812e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.542e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.333e-05 [tuple_transform]: 7.028e-05, [1] [Cycle 1]: 6.581e-05, [4] [d_1]: 3.926e-05 [none_parameter_eliminate]: 1.88002e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.49999e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 4.379e-05 [cse_after_recomputation]: 2.132e-05, [1] [Cycle 1]: 1.684e-05, [1] [cse]: 1.15e-05 [environ_conv]: 5.11002e-06 [swap_dp_allreduce_reducescatter]: 5.67001e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.36002e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.40999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.162e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.46002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.705e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 6.836e-05, [1] [Cycle 1]: 6.434e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.78002e-06 [pipeline_parallel_scheduler]: 1.75001e-06 [auto_monad_reorder]: 1.585e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00045393 [validate]: 3.054e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00582437 [execute]: 6.69001e-06 Sums bootstrap : 0.000473s : 3.32% type_inference : 0.004400s : 30.92% event_method : 0.000010s : 0.07% auto_monad : 0.000053s : 0.37% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.23% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000421s : 2.96% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000344s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000451s : 3.17% optimize.opt_b.b_1 : 0.000109s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000417s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000454s : 3.19% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005824s : 40.93% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 17.81% : 0.000021s : 4: substitution.arithmetic_simplify 1.69% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.30% : 0.000005s : 4: substitution.graph_param_transform 66.17% : 0.000079s : 2: substitution.inline 2.18% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000004s : 4: substitution.remove_not_recompute_node 3.05% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004358 2 92.01% : 0.004010s : 1: type_inference.infer 7.99% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_depend_swap 2.00% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 1.00% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.24% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.13% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.72% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.28% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026154 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.40% : 0.002981s : 1: add_attr 11.36% : 0.002972s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000508s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.14% : 0.001866s : 1: opt_a 0.48% : 0.000125s : 1: opt_after_cconv 1.77% : 0.000463s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 14.19% : 0.003712s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.72% : 0.000190s : 1: renormalize.infer 0.57% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.31% : 0.005834s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.88% : 0.004414s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0196863, [24] [bootstrap]: 0.00042417 [type_inference]: 0.00557681 [event_method]: 1.429e-05 [auto_monad]: 5.358e-05 [graph_reusing]: 5.74999e-06 [inline]: 1.97001e-06 [add_attr]: 0.00300399, [1] [add_attr_with_inline]: 0.00299565, [1] [Cycle 1]: 4.501e-05, [2] [tag_attr]: 1.525e-05 [meta_addattr_fg_expand]: 4.15999e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.471e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.75001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00397703, [53] [py_interpret_to_execute]: 1.941e-05 [rewriter_before_opt_a]: 5.878e-05 [opt_a]: 0.00212754, [2] [Cycle 1]: 0.00152274, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 3.221e-05 [loop_unroll]: 2.117e-05 [a_1]: 0.00046464 [with_stream_mark]: 1.374e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.44001e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.654e-05 [accelerated_algorithm]: 6.58003e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.6e-06 [auto_parallel]: 5.98002e-06 [parallel]: 1.743e-05 [flash_sp]: 7.24001e-06 [merge_comm]: 3.46999e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 9.50007e-07 [virtual_shard_identity]: 7.11001e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.41998e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 9.14998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.58002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 2.60002e-06 
[receive_attached]: 2.58e-06 [after_resolve]: 1e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00041784 [add_forward_monad_depend]: 4.69002e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 2.596e-05 [a_3]: 4.041e-05 [Cycle 2]: 0.00059559, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.00012631 [with_stream_mark]: 9.21998e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.954e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 5.68997e-06 [merge_send_recv]: 4.20999e-06 [auto_parallel]: 5.39998e-06 [parallel]: 4.35e-06 [flash_sp]: 3.71001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.36002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.41e-06 [cell_reuse_recompute_pass]: 1.46002e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.06001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91999e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 1.08001e-06 [receive_attached]: 1.09003e-06 [after_resolve]: 9.07999e-06 [a_after_grad]: 8.1e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 9.69972e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.528e-05 [a_3]: 3.182e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 1.88002e-06 [rewriter_after_opt_a]: 3.183e-05 [convert_after_rewriter]: 6.88998e-06 [order_py_execute_after_rewriter]: 5.15999e-06 [mutable_eliminate]: 0.00045212 [opt_b]: 0.0001834, [1] [Cycle 
1]: 0.00017713, [7] [b_1]: 0.00010982 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.41002e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.48e-06 [renormalize]: 3.19997e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.581e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.219e-05 [loop_unroll]: 0.00042253 [opt_after_cconv]: 9.563e-05, [1] [Cycle 1]: 8.989e-05, [7] [c_1]: 2.811e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.77002e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.617e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.274e-05 [tuple_transform]: 6.946e-05, [1] [Cycle 1]: 6.498e-05, [4] [d_1]: 3.942e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 4.437e-05 [cse_after_recomputation]: 1.975e-05, [1] [Cycle 1]: 1.519e-05, [1] [cse]: 1.022e-05 [environ_conv]: 4.58999e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.59999e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.33002e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.172e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.74e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.10001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.39001e-06 [overlap_grad_ring_attention]: 4.48001e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.861e-05, [1] [Cycle 1]: 6.439e-05, [6] [build]: 2.68e-06 [elim_shapecalc]: 8.18001e-06 [elim_not_effective]: 1.178e-05 [opt_reshape]: 6.15002e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.61998e-06 [auto_monad_reorder]: 1.535e-05 [get_jit_bprop_graph]: 1.14998e-06 [rewriter_after_jit_bprop_graph]: 4.15999e-06 [opt_after_jit_grad]: 0.0004535 [validate]: 3.065e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.0058832 [execute]: 7.34002e-06 Sums bootstrap : 0.000424s : 2.70% type_inference : 0.005577s : 35.47% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000591s : 3.76% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000418s : 2.66% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.88% optimize.opt_b.b_1 : 0.000110s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000423s : 2.69% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000453s : 2.88% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005883s : 37.42% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000166 30 15.47% : 0.000026s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.30% : 0.000005s : 4: substitution.graph_param_transform 66.37% : 0.000110s : 3: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.28% : 0.000004s : 4: substitution.replace_old_param 6.34% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005536 2 89.17% : 0.004937s : 1: type_inference.infer 10.83% : 0.000599s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.42% : 0.000028s : 3: replace.inline 28.58% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.92% : 0.000108s : 3: match.inline 8.08% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.71% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.59% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.51% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.58% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.83% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 46.97% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.03% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028205 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.67% : 0.003008s : 1: add_attr 10.63% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.63% : 0.000460s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.53% : 0.000431s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.40% : 0.000959s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000092s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.55% : 0.002130s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.64% : 0.000463s : 1: opt_after_jit_grad 0.66% : 0.000187s : 1: opt_b 14.11% : 0.003981s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.77% : 0.000216s : 1: renormalize.infer 0.69% : 0.000195s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 20.89% : 0.005893s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.82% : 0.005591s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0376197, [24] [bootstrap]: 0.00045809 [type_inference]: 0.0114806 [event_method]: 4.698e-05 [auto_monad]: 0.00012234 [graph_reusing]: 7.93001e-06 [inline]: 1.96e-06 [add_attr]: 0.00301531, [1] [add_attr_with_inline]: 0.0030068, [1] [Cycle 1]: 7.245e-05, [2] [tag_attr]: 3.522e-05 [meta_addattr_fg_expand]: 9.04e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 4.942e-05 [insert-virtual-dataset]: 2.94001e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.0134616, [53] [py_interpret_to_execute]: 3.843e-05 [rewriter_before_opt_a]: 0.00014413 [opt_a]: 0.0111485, [3] [Cycle 1]: 0.007132, [45] [expand_dump_flag]: 4.3e-06 [switch_simplify]: 7.298e-05 [loop_unroll]: 6.197e-05 [a_1]: 0.00147255 [with_stream_mark]: 2.311e-05 [recompute_prepare]: 2.207e-05 [updatestate_depend_eliminate]: 9.20001e-06 [updatestate_assign_eliminate]: 8.52e-06 [updatestate_loads_eliminate]: 7.51999e-06 [parameter_eliminate]: 2.47001e-06 [a_2]: 0.00024477 [accelerated_algorithm]: 3.037e-05 [shard]: 2.40002e-06 [meta_shard_fg_expand]: 3.48e-06 [shard_inline]: 1.624e-05 [merge_send_recv]: 1.545e-05 [auto_parallel]: 1.14e-05 [parallel]: 1.815e-05 [flash_sp]: 1.088e-05 [merge_comm]: 9.77999e-06 [allreduce_fusion]: 8.90999e-06 [matmul_add_comm_reduction]: 2.609e-05 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 1.812e-05 [virtual_dataset]: 1.548e-05 [get_grad_eliminate_]: 1.579e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.60001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.796e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.139e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 2.946e-05 [set_forward_comm_id_for_comm_node_pass]: 1.009e-05 [meta_fg_expand]: 0.00141297 [flash_sp_send_recv_attached]: 4.2e-06 [receive_attached]: 2.46e-06 [after_resolve]: 
6.084e-05 [a_after_grad]: 8.353e-05 [renormalize]: 0.00247067 [add_forward_monad_depend]: 9.49999e-06 [auto_monad_grad]: 5.29e-06 [auto_monad_eliminator]: 5.609e-05 [cse]: 0.00016521 [a_3]: 0.00033649 [Cycle 2]: 0.00308966, [45] [expand_dump_flag]: 1.75001e-06 [switch_simplify]: 4.807e-05 [loop_unroll]: 4.476e-05 [a_1]: 0.00159749 [with_stream_mark]: 1.286e-05 [recompute_prepare]: 1.133e-05 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 1.10999e-06 [a_2]: 0.00012671 [accelerated_algorithm]: 1.239e-05 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 1.041e-05 [merge_send_recv]: 7.13e-06 [auto_parallel]: 8.03001e-06 [parallel]: 5.37999e-06 [flash_sp]: 3.64002e-06 [merge_comm]: 5.34e-06 [allreduce_fusion]: 4.68999e-06 [matmul_add_comm_reduction]: 7.84002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 9.95002e-06 [virtual_dataset]: 9.04998e-06 [get_grad_eliminate_]: 9.00999e-06 [virtual_output]: 8.52e-06 [merge_forward]: 4.66997e-06 [cell_reuse_recompute_pass]: 8.60018e-07 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.634e-05 [merge_recompute_call_nodes]: 8.49977e-07 [before_grad]: 1.448e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 7.096e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.672e-05 [a_after_grad]: 1.48e-05 [renormalize]: 0.00059639 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.508e-05 [cse]: 4.753e-05 [a_3]: 6.589e-05 [Cycle 3]: 0.00091211, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.071e-05 [loop_unroll]: 9.12001e-06 [a_1]: 0.00025572 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 9.69999e-06 [updatestate_depend_eliminate]: 4.70999e-06 [updatestate_assign_eliminate]: 3.85998e-06 [updatestate_loads_eliminate]: 3.86999e-06 
[parameter_eliminate]: 9.29984e-07 [a_2]: 0.00012461 [accelerated_algorithm]: 1.167e-05 [shard]: 9.49978e-07 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.38999e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 4.86002e-06 [allreduce_fusion]: 4.74998e-06 [matmul_add_comm_reduction]: 7.55e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.94e-06 [get_grad_eliminate_]: 8.62e-06 [virtual_output]: 8.2e-06 [merge_forward]: 4.14002e-06 [cell_reuse_recompute_pass]: 1.24003e-06 [offload_activation]: 8.94998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.587e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.52e-05 [set_forward_comm_id_for_comm_node_pass]: 5.89999e-06 [meta_fg_expand]: 3.09001e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.399e-05 [a_after_grad]: 1.447e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.37e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 1.065e-05 [cse]: 2.527e-05 [a_3]: 6.044e-05 [py_interpret_to_execute_after_opt_a]: 1.121e-05 [slice_cell_reuse_recomputed_activation]: 2.24999e-06 [rewriter_after_opt_a]: 4.757e-05 [convert_after_rewriter]: 9.36998e-06 [order_py_execute_after_rewriter]: 6.76e-06 [mutable_eliminate]: 0.0004694 [opt_b]: 0.00028817, [1] [Cycle 1]: 0.00028181, [7] [b_1]: 0.0001899 [b_2]: 1.094e-05 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.22998e-06 [updatestate_loads_eliminate]: 3.9e-06 [renormalize]: 2.3999e-07 [cse]: 3.086e-05 [optimize_parallel_all_gather_comm]: 2.031e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.032e-05 [loop_unroll]: 0.00045398 [opt_after_cconv]: 0.00013782, [1] [Cycle 1]: 0.00013234, [7] [c_1]: 5.106e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.15999e-06 
[updatestate_loads_eliminate]: 4.00998e-06 [cse]: 2.991e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 2.92e-05 [tuple_transform]: 0.00010281, [1] [Cycle 1]: 9.802e-05, [4] [d_1]: 6.735e-05 [none_parameter_eliminate]: 1.84998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 1.019e-05 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 6.198e-05 [cse_after_recomputation]: 3.371e-05, [1] [Cycle 1]: 2.868e-05, [1] [cse]: 2.305e-05 [environ_conv]: 9.02999e-06 [swap_dp_allreduce_reducescatter]: 7.84002e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.05998e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.62001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.65002e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.697e-05 [grouped_pairwise_exchange_alltoall]: 1.88002e-06 [offloading_packed_experts]: 4.80001e-06 [overlap_recompute_and_grad_model_parallel]: 5.34e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 5.12999e-06 [overlap_grad_flash_sp]: 2.486e-05 [begin_end_overlap_inline]: 7.00005e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 9.816e-05, [1] [Cycle 1]: 9.375e-05, [6] [build]: 9.70002e-06 [elim_shapecalc]: 1.327e-05 [elim_not_effective]: 1.804e-05 [opt_reshape]: 9.96e-06 [fold_const_symbol]: 1.493e-05 
[renormalize]: 2.19996e-07 [detach_backward]: 2.11e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 2.431e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00047181 [validate]: 4.478e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00819992 [execute]: 7.61999e-06 Sums bootstrap : 0.000458s : 1.37% type_inference : 0.011481s : 34.43% event_method : 0.000047s : 0.14% auto_monad : 0.000122s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.12% optimize.rewriter_before_opt_a : 0.000144s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.40% optimize.opt_a.loop_unroll : 0.000116s : 0.35% optimize.opt_a.a_1 : 0.003326s : 9.97% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000059s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001487s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000092s : 0.27% optimize.opt_a.a_after_grad : 0.000113s : 0.34% optimize.opt_a.renormalize : 0.003067s : 9.20% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.25% optimize.opt_a.cse : 0.000238s : 0.71% optimize.opt_a.a_3 : 0.000463s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000469s : 1.41% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000454s : 1.36% optimize.opt_after_cconv.c_1 : 0.000051s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000062s : 0.19% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000472s : 1.41% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008200s : 24.59% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000771 222 6.02% : 0.000046s : 12: substitution.arithmetic_simplify 1.77% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.64% : 0.000429s : 17: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.43% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.80% : 0.000014s : 20: substitution.remove_not_recompute_node 3.00% : 0.000023s : 10: substitution.replace_applicator 1.43% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.63% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.41% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011405 2 86.83% : 0.009903s : 1: type_inference.infer 13.17% : 0.001502s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.95% : 0.000128s : 17: replace.inline 42.05% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000454 33 92.46% : 0.000420s : 17: match.inline 7.54% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000806 5764 1.02% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 0.98% : 0.000008s : 68: predicate.addn_zero_filter 0.98% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.94% : 0.000016s : 100: predicate.arithmetic_simplify 1.07% : 0.000009s : 68: predicate.cast_eliminate 1.05% : 0.000008s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.11% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.14% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 7.72% : 0.000062s : 76: predicate.environ_add_const_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_depend_swap 1.61% : 0.000013s : 108: predicate.environ_get_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.18% : 0.000018s : 101: predicate.float_depend_g_call 0.46% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.25% : 0.000042s : 249: predicate.inline 1.16% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.60% : 0.000005s : 32: predicate.less_batch_normalization 
1.57% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.48% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.15% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.33% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.02% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.03% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.05% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.93% : 0.000016s : 101: predicate.partial_defer_inline 1.64% : 0.000013s : 92: predicate.partial_eliminate 0.99% : 0.000008s : 68: predicate.print_const_string_wrapper 0.49% : 0.000004s : 32: predicate.reduce_all_const_elim 1.22% : 0.000010s : 68: predicate.reduce_eliminate 2.47% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.74% : 0.000014s : 152: predicate.replace_applicator 0.56% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.01% : 0.000008s : 68: predicate.reshape_eliminate 1.07% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.13% : 0.000009s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.56% : 0.000005s : 32: predicate.shard_identity_eliminate 0.26% : 0.000002s : 16: predicate.special_op_eliminate 0.58% : 0.000005s : 32: predicate.specialize_transform 1.15% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.74% : 0.000014s : 101: predicate.switch_defer_inline 2.77% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.70% : 
0.000038s : 277: predicate.switch_simplify 1.01% : 0.000008s : 68: predicate.tile_eliminate 1.01% : 0.000008s : 68: predicate.transpose_eliminate 1.38% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.43% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.66% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.33% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.87% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.50% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.48% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.06% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.52% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.51% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001569 34 57.36% : 0.000900s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.64% : 0.000669s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062506 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.83% : 0.003020s : 1: add_attr 4.82% : 0.003011s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000129s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.79% : 0.000492s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000463s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000478s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.01% : 0.005009s : 117: opt.transform.opt_a 0.08% : 0.000050s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.84% : 0.011151s : 1: opt_a 0.23% : 0.000141s : 1: opt_after_cconv 0.77% : 0.000481s : 1: opt_after_jit_grad 0.47% : 0.000292s : 1: opt_b 21.54% : 0.013465s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.70% : 0.001686s : 2: renormalize.infer 2.19% : 0.001369s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.14% : 0.008210s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.39% : 0.011496s : 1: type_inference 0.12% : 0.000078s : 1: validate TotalTime = 0.0191524, [24] [bootstrap]: 0.00042814 [type_inference]: 0.00438274 [event_method]: 1.105e-05 [auto_monad]: 5.18e-05 [graph_reusing]: 5.35999e-06 [inline]: 1.86e-06 [add_attr]: 0.00304725, [1] [add_attr_with_inline]: 0.00303836, [1] [Cycle 1]: 4.838e-05, [2] [tag_attr]: 1.254e-05 [meta_addattr_fg_expand]: 2.97002e-06 [parallel-infer-symbol]: 3.83001e-06 [pre_auto_parallel]: 2.361e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00388687, [53] [py_interpret_to_execute]: 1.531e-05 [rewriter_before_opt_a]: 3.924e-05 [opt_a]: 0.00201201, [2] [Cycle 1]: 0.00141257, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 2.492e-05 [loop_unroll]: 1.353e-05 [a_1]: 0.00029656 [with_stream_mark]: 1.455e-05 [recompute_prepare]: 8.11002e-06 [updatestate_depend_eliminate]: 3.97e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.27002e-06 [parameter_eliminate]: 1.71998e-06 [a_2]: 0.00012802 [accelerated_algorithm]: 6.83e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 6.39001e-06 [merge_send_recv]: 8.92999e-06 [auto_parallel]: 6.59999e-06 [parallel]: 1.909e-05 [flash_sp]: 7.99002e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 9.47999e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 9.94001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.111e-05 [merge_recompute_call_nodes]: 1.67999e-06 [before_grad]: 9.39998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.40002e-06 [flash_sp_send_recv_attached]: 2.37001e-06 
[receive_attached]: 2.41e-06 [after_resolve]: 1.104e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00042988 [add_forward_monad_depend]: 4.80001e-06 [auto_monad_grad]: 2.11e-06 [auto_monad_eliminator]: 1.301e-05 [cse]: 2.846e-05 [a_3]: 4.006e-05 [Cycle 2]: 0.00058997, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012537 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 6.846e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.48002e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.23002e-06 [parallel]: 4.05998e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.35001e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.88e-06 [meta_fg_expand]: 1.63997e-06 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 7.76001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 1.04998e-06 [auto_monad_eliminator]: 6.56999e-06 [cse]: 1.292e-05 [a_3]: 3.177e-05 [py_interpret_to_execute_after_opt_a]: 1.004e-05 [slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 3.226e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 5.11002e-06 [mutable_eliminate]: 0.00048334 [opt_b]: 0.00018369, [1] 
[Cycle 1]: 0.00017735, [7] [b_1]: 0.00010911 [b_2]: 7.19001e-06 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 4.10015e-07 [cse]: 1.654e-05 [optimize_parallel_all_gather_comm]: 1.611e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.422e-05 [loop_unroll]: 0.00042806 [opt_after_cconv]: 9.966e-05, [1] [Cycle 1]: 9.394e-05, [7] [c_1]: 2.877e-05 [parameter_eliminate]: 2.75002e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.822e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.283e-05 [tuple_transform]: 6.915e-05, [1] [Cycle 1]: 6.477e-05, [4] [d_1]: 3.932e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.02001e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.371e-05 [cse_after_recomputation]: 2.016e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.092e-05 [environ_conv]: 5.07e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 3.89002e-06 [label_fine_grained_interleaved_index]: 2.75997e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.99977e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.47001e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.49998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.145e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.42002e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 4.27003e-06 [overlap_grad_flash_sp]: 1.77e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.787e-05, [1] [Cycle 1]: 6.397e-05, [6] [build]: 3.04001e-06 [elim_shapecalc]: 8.35001e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 5.89999e-06 [fold_const_symbol]: 8.48999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.661e-05 [get_jit_bprop_graph]: 1.39e-06 [rewriter_after_jit_bprop_graph]: 3.35003e-06 [opt_after_jit_grad]: 0.00045517 [validate]: 3.384e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00656347 [execute]: 8.35001e-06 Sums bootstrap : 0.000428s : 2.83% type_inference : 0.004383s : 28.98% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000004s : 0.03% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000422s : 2.79% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000196s : 1.30% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000430s : 2.84% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000483s : 3.20% optimize.opt_b.b_1 : 0.000109s : 0.72% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000428s : 2.83% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000018s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000455s : 3.01% validate : 0.000034s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006563s : 43.41% execute : 0.000008s : 0.06% Time group info: ------[substitution.] 0.000125 26 18.51% : 0.000023s : 4: substitution.arithmetic_simplify 1.54% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.32% : 0.000005s : 4: substitution.graph_param_transform 65.61% : 0.000082s : 2: substitution.inline 2.12% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.42% : 0.000004s : 4: substitution.remove_not_recompute_node 3.50% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004339 2 91.45% : 0.003968s : 1: type_inference.infer 8.55% : 0.000371s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000139 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.93% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.79% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.57% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.83% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.56% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.28% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000256 6 41.35% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.65% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027495 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.10% : 0.003052s : 1: add_attr 11.06% : 0.003042s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.68% : 0.000463s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000493s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.00% : 0.000825s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.33% : 0.002015s : 1: opt_a 0.38% : 0.000103s : 1: opt_after_cconv 1.69% : 0.000465s : 1: opt_after_jit_grad 0.68% : 0.000187s : 1: opt_b 14.15% : 0.003891s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.05% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.90% : 0.000248s : 1: renormalize.infer 0.64% : 0.000175s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.93% : 0.006580s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 16.00% : 0.004400s : 1: type_inference 0.24% : 0.000067s : 1: validate TotalTime = 0.0393378, [24] [bootstrap]: 0.00047757 [type_inference]: 0.0110968 [event_method]: 4.424e-05 [auto_monad]: 0.00011875 [graph_reusing]: 7.84002e-06 [inline]: 2.09e-06 [add_attr]: 0.00323519, [1] [add_attr_with_inline]: 0.00322483, [1] [Cycle 1]: 8.14e-05, [2] [tag_attr]: 3.65e-05 [meta_addattr_fg_expand]: 9.00999e-06 [parallel-infer-symbol]: 3.44001e-06 [pre_auto_parallel]: 5.234e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.0146698, [53] [py_interpret_to_execute]: 3.805e-05 [rewriter_before_opt_a]: 0.00013042 [opt_a]: 0.0120247, [3] [Cycle 1]: 0.00765665, [45] [expand_dump_flag]: 4.02e-06 [switch_simplify]: 6.696e-05 [loop_unroll]: 5.549e-05 [a_1]: 0.00140688 [with_stream_mark]: 2.855e-05 [recompute_prepare]: 2.345e-05 [updatestate_depend_eliminate]: 1.002e-05 [updatestate_assign_eliminate]: 7.65e-06 [updatestate_loads_eliminate]: 7.04001e-06 [parameter_eliminate]: 2.76e-06 [a_2]: 0.00024869 [accelerated_algorithm]: 3.127e-05 [shard]: 2.35002e-06 [meta_shard_fg_expand]: 3.85e-06 [shard_inline]: 1.664e-05 [merge_send_recv]: 1.699e-05 [auto_parallel]: 1.217e-05 [parallel]: 2.022e-05 [flash_sp]: 1.208e-05 [merge_comm]: 9.57999e-06 [allreduce_fusion]: 9.15001e-06 [matmul_add_comm_reduction]: 3.151e-05 [allreduce_slice_to_reducescatter]: 8.39995e-07 [virtual_shard_identity]: 1.823e-05 [virtual_dataset]: 1.632e-05 [get_grad_eliminate_]: 1.558e-05 [virtual_output]: 1.519e-05 [merge_forward]: 9.37001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 2.014e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.967e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.82e-05 [set_forward_comm_id_for_comm_node_pass]: 9.71e-06 [meta_fg_expand]: 0.00156571 [flash_sp_send_recv_attached]: 3.85998e-06 [receive_attached]: 2.29001e-06 [after_resolve]: 
6.275e-05 [a_after_grad]: 8.386e-05 [renormalize]: 0.00285318 [add_forward_monad_depend]: 1.013e-05 [auto_monad_grad]: 6.11998e-06 [auto_monad_eliminator]: 5.69e-05 [cse]: 0.00017244 [a_3]: 0.00034528 [Cycle 2]: 0.00341767, [45] [expand_dump_flag]: 3.23e-06 [switch_simplify]: 4.777e-05 [loop_unroll]: 4.438e-05 [a_1]: 0.00160648 [with_stream_mark]: 1.797e-05 [recompute_prepare]: 1.241e-05 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 5.12e-06 [updatestate_loads_eliminate]: 4.16001e-06 [parameter_eliminate]: 1.91998e-06 [a_2]: 0.00020399 [accelerated_algorithm]: 1.363e-05 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 2.94999e-06 [shard_inline]: 1.016e-05 [merge_send_recv]: 1.086e-05 [auto_parallel]: 1.158e-05 [parallel]: 8.69e-06 [flash_sp]: 4.56002e-06 [merge_comm]: 6.01998e-06 [allreduce_fusion]: 5.39998e-06 [matmul_add_comm_reduction]: 1.127e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.038e-05 [virtual_dataset]: 9.34e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.77e-06 [merge_forward]: 5.28002e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 1.363e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.712e-05 [merge_recompute_call_nodes]: 1.25001e-06 [before_grad]: 1.45e-05 [set_forward_comm_id_for_comm_node_pass]: 6.12999e-06 [meta_fg_expand]: 5.069e-05 [flash_sp_send_recv_attached]: 1.44e-06 [receive_attached]: 2.53998e-06 [after_resolve]: 1.684e-05 [a_after_grad]: 1.482e-05 [renormalize]: 0.00079668 [add_forward_monad_depend]: 4.92999e-06 [auto_monad_grad]: 1.98002e-06 [auto_monad_eliminator]: 1.707e-05 [cse]: 5.567e-05 [a_3]: 6.734e-05 [Cycle 3]: 0.00093202, [45] [expand_dump_flag]: 1.66998e-06 [switch_simplify]: 1.087e-05 [loop_unroll]: 9.24e-06 [a_1]: 0.0002612 [with_stream_mark]: 1.162e-05 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.85e-06 [parameter_eliminate]: 1.07e-06 
[a_2]: 0.0001287 [accelerated_algorithm]: 1.209e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 2.14999e-06 [shard_inline]: 9.09998e-06 [merge_send_recv]: 7.83999e-06 [auto_parallel]: 7.68999e-06 [parallel]: 5.76e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 4.91002e-06 [allreduce_fusion]: 5.00999e-06 [matmul_add_comm_reduction]: 9.03002e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 9.99001e-06 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.52998e-06 [virtual_output]: 8.50999e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 2.26998e-06 [offload_activation]: 9.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.623e-05 [merge_recompute_call_nodes]: 8.59989e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10001e-06 [meta_fg_expand]: 3.31999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.52001e-06 [after_resolve]: 1.406e-05 [a_after_grad]: 1.41e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.59e-06 [auto_monad_grad]: 1.39e-06 [auto_monad_eliminator]: 1.139e-05 [cse]: 2.63e-05 [a_3]: 5.943e-05 [py_interpret_to_execute_after_opt_a]: 1.613e-05 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 5.054e-05 [convert_after_rewriter]: 9.44998e-06 [order_py_execute_after_rewriter]: 6.53e-06 [mutable_eliminate]: 0.0006804 [opt_b]: 0.00030169, [1] [Cycle 1]: 0.0002938, [7] [b_1]: 0.00019219 [b_2]: 1.152e-05 [updatestate_depend_eliminate]: 9.15999e-06 [updatestate_assign_eliminate]: 4.18999e-06 [updatestate_loads_eliminate]: 4.05e-06 [renormalize]: 8.39995e-07 [cse]: 3.555e-05 [optimize_parallel_all_gather_comm]: 2.159e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.993e-05 [loop_unroll]: 0.00051077 [opt_after_cconv]: 0.00014415, [1] [Cycle 1]: 0.00013746, [7] [c_1]: 5.023e-05 [parameter_eliminate]: 4.1e-06 [updatestate_depend_eliminate]: 8.16002e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 3.76999e-06 [cse]: 
3.13e-05 [renormalize]: 5.8001e-07 [remove_dup_value]: 4.251e-05 [tuple_transform]: 0.00010556, [1] [Cycle 1]: 0.00010039, [4] [d_1]: 6.945e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.007e-05 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 6.264e-05 [cse_after_recomputation]: 3.301e-05, [1] [Cycle 1]: 2.783e-05, [1] [cse]: 2.187e-05 [environ_conv]: 1.019e-05 [swap_dp_allreduce_reducescatter]: 8.43999e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.94e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.43002e-06 [reorder_send_recv_between_fp_bp]: 3.03998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.772e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 5.15999e-06 [overlap_recompute_and_grad_model_parallel]: 5.61998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 2.32999e-06 [overlap_grad_ring_attention]: 5.81e-06 [overlap_grad_flash_sp]: 2.773e-05 [begin_end_overlap_inline]: 4.30009e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 0.0001037, [1] [Cycle 1]: 9.914e-05, [6] [build]: 1.106e-05 [elim_shapecalc]: 1.465e-05 [elim_not_effective]: 1.908e-05 [opt_reshape]: 1.001e-05 [fold_const_symbol]: 1.526e-05 [renormalize]: 2.30008e-07 [detach_backward]: 2.21e-06 
[pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 2.591e-05 [get_jit_bprop_graph]: 1.97999e-06 [rewriter_after_jit_bprop_graph]: 5.15999e-06 [opt_after_jit_grad]: 0.00048836 [validate]: 5.448e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00877583 [execute]: 9.28002e-06 Sums bootstrap : 0.000478s : 1.37% type_inference : 0.011097s : 31.94% event_method : 0.000044s : 0.13% auto_monad : 0.000119s : 0.34% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000052s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000130s : 0.38% optimize.opt_a.expand_dump_flag : 0.000009s : 0.03% optimize.opt_a.switch_simplify : 0.000126s : 0.36% optimize.opt_a.loop_unroll : 0.000109s : 0.31% optimize.opt_a.a_1 : 0.003275s : 9.43% optimize.opt_a.with_stream_mark : 0.000058s : 0.17% optimize.opt_a.recompute_prepare : 0.000045s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000581s : 1.67% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.16% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.03% optimize.opt_a.shard_inline : 0.000036s : 0.10% optimize.opt_a.merge_send_recv : 0.000036s : 0.10% optimize.opt_a.auto_parallel : 0.000031s : 0.09% optimize.opt_a.parallel : 0.000035s : 0.10% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000052s : 0.15% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000019s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000043s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001620s : 4.66% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000094s : 0.27% optimize.opt_a.a_after_grad : 0.000113s : 0.32% optimize.opt_a.renormalize : 0.003650s : 10.51% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000085s : 0.25% optimize.opt_a.cse : 0.000254s : 0.73% optimize.opt_a.a_3 : 0.000472s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000680s : 1.96% optimize.opt_b.b_1 : 0.000192s : 0.55% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000036s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000030s : 0.09% optimize.loop_unroll : 0.000511s : 1.47% optimize.opt_after_cconv.c_1 : 0.000050s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000043s : 0.12% optimize.tuple_transform.d_1 : 0.000069s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000063s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000028s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000026s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000488s : 1.41% validate : 0.000054s : 0.16% backend_pass : 0.000001s : 0.00% task_emit : 0.008776s : 25.26% execute : 0.000009s : 0.03% Time group info: ------[substitution.] 
0.000833 218 6.55% : 0.000055s : 11: substitution.arithmetic_simplify 2.13% : 0.000018s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.59% : 0.000005s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.93% : 0.000008s : 8: substitution.graph_param_transform 0.31% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.34% : 0.000461s : 16: substitution.inline 2.14% : 0.000018s : 2: substitution.inline_without_move 1.27% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000016s : 3: substitution.less_batch_normalization 1.78% : 0.000015s : 11: substitution.minmaximum_grad 0.85% : 0.000007s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.27% : 0.000027s : 10: substitution.replace_applicator 1.45% : 0.000012s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.53% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 8.12% : 0.000068s : 28: substitution.tuple_list_get_item_eliminator 2.30% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011014 2 87.35% : 0.009620s : 1: type_inference.infer 12.65% : 0.001394s : 1: type_inference.specialize ------[replace.] 0.000215 30 59.55% : 0.000128s : 16: replace.inline 40.45% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000483 30 93.59% : 0.000452s : 16: match.inline 6.41% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000745 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000016s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.57% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.21% : 0.000002s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.25% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.50% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 67: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.69% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.20% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.35% : 0.000003s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.35% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.26% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.18% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.89% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.79% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.42% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001725 32 59.67% : 0.001029s : 12: func_graph_cloner_run.FuncGraphClonerGraph 40.33% : 0.000696s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.066254 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.89% : 0.003240s : 1: add_attr 4.87% : 0.003229s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000068s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000125s : 1: auto_monad 0.04% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.78% : 0.000516s : 1: bootstrap 0.05% : 0.000033s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.79% : 0.000520s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.04% : 0.000691s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.60% : 0.005035s : 117: opt.transform.opt_a 0.07% : 0.000049s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000177s : 28: opt.transform.opt_b 0.12% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000056s : 4: opt.transform.symbol_engine_opt 18.15% : 0.012028s : 1: opt_a 0.22% : 0.000148s : 1: opt_after_cconv 0.75% : 0.000499s : 1: opt_after_jit_grad 0.46% : 0.000305s : 1: opt_b 22.15% : 0.014675s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000031s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000057s : 1: pre_auto_parallel 0.06% : 0.000043s : 1: py_interpret_to_execute 0.03% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.07% : 0.000047s : 1: remove_dup_value 3.11% : 0.002060s : 2: renormalize.infer 2.38% : 0.001574s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000055s : 1: rewriter_after_opt_a 0.20% : 0.000135s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000106s : 1: symbol_engine_optimizer 13.28% : 0.008796s : 1: task_emit 0.16% : 0.000109s : 1: 
tuple_transform 16.79% : 0.011123s : 1: type_inference 0.15% : 0.000097s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x0-kbk],max_mem:46.0M TotalTime = 0.0965682, [24] [bootstrap]: 0.00056869 [type_inference]: 0.00677028 [event_method]: 1.499e-05 [auto_monad]: 6.057e-05 [graph_reusing]: 5.96e-06 [inline]: 2.19999e-06 [add_attr]: 0.00408769, [1] [add_attr_with_inline]: 0.00407255, [1] [Cycle 1]: 6.566e-05, [2] [tag_attr]: 1.985e-05 [meta_addattr_fg_expand]: 4.19997e-06 [parallel-infer-symbol]: 3.55e-06 [pre_auto_parallel]: 3.667e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 2.01998e-06 [pipeline_split]: 1.89e-06 [optimize]: 0.00451774, [53] [py_interpret_to_execute]: 2.637e-05 [rewriter_before_opt_a]: 6.611e-05 [opt_a]: 0.00249958, [2] [Cycle 1]: 0.00187939, [45] [expand_dump_flag]: 3.01001e-06 [switch_simplify]: 3.446e-05 [loop_unroll]: 2.121e-05 [a_1]: 0.00054768 [with_stream_mark]: 1.589e-05 [recompute_prepare]: 8.02e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.37997e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.745e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.22001e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 6.57002e-06 [parallel]: 2.666e-05 [flash_sp]: 8.28999e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 9.24e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 6.14999e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.81998e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 1.044e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 
[merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 1.018e-05 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.28002e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 2.17001e-06 [after_resolve]: 1.119e-05 [a_after_grad]: 8.97e-06 [renormalize]: 0.00065841 [add_forward_monad_depend]: 5.72999e-06 [auto_monad_grad]: 2.58e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 2.759e-05 [a_3]: 4.284e-05 [Cycle 2]: 0.00060931, [45] [expand_dump_flag]: 1.67001e-06 [switch_simplify]: 6.90998e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012937 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 5.87999e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.48002e-06 [parameter_eliminate]: 1.01997e-06 [a_2]: 6.799e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.25999e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.38002e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.19e-06 [parallel]: 5.14998e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 2.70025e-07 [virtual_shard_identity]: 6.28002e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 4.94998e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.88e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 7.33e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.004e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.47e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 2.01e-06 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 9.59e-06 [a_after_grad]: 8.38001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 6.54999e-06 [cse]: 1.381e-05 [a_3]: 3.418e-05 [py_interpret_to_execute_after_opt_a]: 8.70999e-06 [slice_cell_reuse_recomputed_activation]: 
1.87999e-06 [rewriter_after_opt_a]: 3.271e-05 [convert_after_rewriter]: 7.26001e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00053531 [opt_b]: 0.00018439, [1] [Cycle 1]: 0.00017709, [7] [b_1]: 0.00010787 [b_2]: 7.15003e-06 [updatestate_depend_eliminate]: 6.09001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 5.19998e-07 [cse]: 1.694e-05 [optimize_parallel_all_gather_comm]: 1.688e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.501e-05 [loop_unroll]: 0.00042351 [opt_after_cconv]: 9.459e-05, [1] [Cycle 1]: 8.878e-05, [7] [c_1]: 2.782e-05 [parameter_eliminate]: 2.64999e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.566e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 7.033e-05, [1] [Cycle 1]: 6.534e-05, [4] [d_1]: 3.985e-05 [none_parameter_eliminate]: 1.91e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.03002e-06 [partial_unused_args_eliminate]: 2.17001e-06 [add_recomputation]: 5.343e-05 [cse_after_recomputation]: 2.061e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.084e-05 [environ_conv]: 4.98001e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.43998e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.21997e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.16997e-06 [full_micro_interleaved_order_control]: 2.66e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.33002e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 
[control_data_broadcast_order]: 1.202e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.951e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.54001e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.014e-05, [1] [Cycle 1]: 6.627e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 9.22001e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 9.17001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.575e-05 [get_jit_bprop_graph]: 1.55001e-06 [rewriter_after_jit_bprop_graph]: 3.97e-06 [opt_after_jit_grad]: 0.00045339 [validate]: 3.711e-05 [backend_pass]: 1.31002e-06 [task_emit]: 0.0797292 [execute]: 9.94001e-06 Sums bootstrap : 0.000569s : 0.62% type_inference : 0.006770s : 7.41% event_method : 0.000015s : 0.02% auto_monad : 0.000061s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000020s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000037s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000026s : 0.03% optimize.rewriter_before_opt_a : 0.000066s : 0.07% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000677s : 0.74% 
optimize.opt_a.with_stream_mark : 0.000026s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000032s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000659s : 0.72% optimize.opt_a.add_forward_monad_depend : 
0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000041s : 0.05% optimize.opt_a.a_3 : 0.000077s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000535s : 0.59% optimize.opt_b.b_1 : 0.000108s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000424s : 0.46% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000053s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000453s : 0.50% validate : 0.000037s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079729s : 87.21% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000193 30 14.20% : 0.000027s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.66% : 0.000001s : 2: substitution.fold_const_symbol 2.90% : 0.000006s : 4: substitution.graph_param_transform 69.25% : 0.000134s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.28% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 5.62% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006708 2 91.02% : 0.006106s : 1: type_inference.infer 8.98% : 0.000602s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.94% : 0.000029s : 3: replace.inline 29.06% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000141 5 93.02% : 0.000131s : 3: match.inline 6.98% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.09% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.19% : 0.000002s : 15: predicate.environ_get_depend_swap 1.99% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.83% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.55% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.22% : 0.000008s : 54: predicate.switch_simplify 0.93% : 
0.000001s : 11: predicate.tile_eliminate 1.10% : 0.000002s : 11: predicate.transpose_eliminate 1.57% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.68% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.25% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.27% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000401 8 43.74% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.26% : 0.000226s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.107022 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.82% : 0.004093s : 1: add_attr 3.81% : 0.004077s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000066s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.56% : 0.000604s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.06% : 0.000066s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.40% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000545s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.001050s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.34% : 0.002503s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.43% : 0.000463s : 1: opt_after_jit_grad 0.18% : 0.000188s : 1: opt_b 4.23% : 0.004523s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000041s : 1: pre_auto_parallel 0.03% : 0.000030s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.34% : 0.000362s : 1: renormalize.infer 0.27% : 0.000288s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.07% : 0.000070s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000073s : 1: symbol_engine_optimizer 74.52% : 0.079752s : 1: task_emit 0.07% : 0.000073s : 1: 
tuple_transform 6.35% : 0.006792s : 1: type_inference 0.07% : 0.000072s : 1: validate TotalTime = 0.077154, [24] [bootstrap]: 0.00043148 [type_inference]: 0.0045951 [event_method]: 1.171e-05 [auto_monad]: 5.224e-05 [graph_reusing]: 4.82e-06 [inline]: 2.42001e-06 [add_attr]: 0.00315472, [1] [add_attr_with_inline]: 0.0031459, [1] [Cycle 1]: 4.887e-05, [2] [tag_attr]: 1.344e-05 [meta_addattr_fg_expand]: 3.29001e-06 [parallel-infer-symbol]: 3.78001e-06 [pre_auto_parallel]: 2.544e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00408277, [53] [py_interpret_to_execute]: 1.841e-05 [rewriter_before_opt_a]: 4.136e-05 [opt_a]: 0.00215649, [2] [Cycle 1]: 0.00152811, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 2.463e-05 [loop_unroll]: 1.374e-05 [a_1]: 0.00030448 [with_stream_mark]: 1.53e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 4.23999e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.643e-05 [accelerated_algorithm]: 6.91001e-06 [shard]: 2.56e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 5.86998e-06 [parallel]: 1.784e-05 [flash_sp]: 9.15999e-06 [merge_comm]: 3.44001e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 8.69998e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.99e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.073e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.08e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 4e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.37999e-06 
[after_resolve]: 1.128e-05 [a_after_grad]: 9.12001e-06 [renormalize]: 0.00058398 [add_forward_monad_depend]: 5.71e-06 [auto_monad_grad]: 2.27999e-06 [auto_monad_eliminator]: 1.405e-05 [cse]: 2.899e-05 [a_3]: 4.261e-05 [Cycle 2]: 0.00061782, [45] [expand_dump_flag]: 1.69998e-06 [switch_simplify]: 7.51999e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00012796 [with_stream_mark]: 1.175e-05 [recompute_prepare]: 5.87999e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.84e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.27e-06 [meta_shard_fg_expand]: 1.32999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.84e-06 [auto_parallel]: 5.62001e-06 [parallel]: 4.12998e-06 [flash_sp]: 3.08998e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 5.75001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.82002e-06 [cell_reuse_recompute_pass]: 1.70001e-06 [offload_activation]: 6.58998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.024e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 1.90001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.38002e-06 [after_resolve]: 9.84999e-06 [a_after_grad]: 8.22e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 1.25001e-06 [auto_monad_eliminator]: 8.01001e-06 [cse]: 1.484e-05 [a_3]: 3.246e-05 [py_interpret_to_execute_after_opt_a]: 9.14e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.443e-05 [convert_after_rewriter]: 7.8e-06 [order_py_execute_after_rewriter]: 5.66998e-06 [mutable_eliminate]: 0.00051608 [opt_b]: 0.00019007, [1] [Cycle 1]: 0.00018279, [7] [b_1]: 0.0001101 [b_2]: 
7.78001e-06 [updatestate_depend_eliminate]: 6.01e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 5.59987e-07 [cse]: 1.835e-05 [optimize_parallel_all_gather_comm]: 1.578e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.633e-05 [loop_unroll]: 0.00042229 [opt_after_cconv]: 9.857e-05, [1] [Cycle 1]: 9.159e-05, [7] [c_1]: 2.785e-05 [parameter_eliminate]: 2.88e-06 [updatestate_depend_eliminate]: 5.62001e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.42001e-06 [cse]: 1.652e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 1.248e-05 [tuple_transform]: 7.023e-05, [1] [Cycle 1]: 6.57e-05, [4] [d_1]: 4.034e-05 [none_parameter_eliminate]: 1.32999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.25002e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.522e-05 [cse_after_recomputation]: 2.005e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.07e-05 [environ_conv]: 5.17e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.53002e-06 [slice_recompute_activation]: 1.99e-06 [micro_interleaved_order_control]: 2.24999e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 1.20999e-06 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.61999e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.44998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.2e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.30999e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.825e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.961e-05, [1] [Cycle 1]: 6.535e-05, [6] [build]: 2.93e-06 [elim_shapecalc]: 8.49998e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.88997e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.516e-05 [get_jit_bprop_graph]: 1.60999e-06 [rewriter_after_jit_bprop_graph]: 4.32e-06 [opt_after_jit_grad]: 0.00044724 [validate]: 3.627e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0640409 [execute]: 9.99001e-06 Sums bootstrap : 0.000431s : 0.59% type_inference : 0.004595s : 6.30% event_method : 0.000012s : 0.02% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000025s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000018s : 0.03% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000432s : 0.59% optimize.opt_a.with_stream_mark : 0.000027s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000584s : 0.80% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.03% optimize.opt_a.cse : 0.000044s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000516s : 0.71% optimize.opt_b.b_1 : 0.000110s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.04% optimize.loop_unroll : 0.000422s : 0.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000447s : 0.61% validate : 0.000036s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.064041s : 87.74% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000134 26 17.40% : 0.000023s : 4: substitution.arithmetic_simplify 1.32% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 3.95% : 0.000005s : 4: substitution.graph_param_transform 67.11% : 0.000090s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.30% : 0.000004s : 4: substitution.remove_not_recompute_node 3.78% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004546 2 91.18% : 0.004145s : 1: type_inference.infer 8.82% : 0.000401s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000088 2 100.00% : 0.000088s : 2: match.inline ------[predicate.] 
0.000140 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.82% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.10% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.99% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.63% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.97% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.04% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.56% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.67% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.35% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.72% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000304 6 38.81% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.19% : 0.000186s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.085923 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.68% : 0.003160s : 1: add_attr 3.67% : 0.003149s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.54% : 0.000463s : 1: bootstrap 0.04% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.61% : 0.000527s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 0.92% : 0.000792s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.51% : 0.002159s : 1: opt_a 0.12% : 0.000102s : 1: opt_after_cconv 0.53% : 0.000458s : 1: opt_after_jit_grad 0.23% : 0.000194s : 1: opt_b 4.76% : 0.004087s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000022s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.36% : 0.000305s : 1: renormalize.infer 0.32% : 0.000272s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000039s : 1: rewriter_after_opt_a 0.05% : 0.000046s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 74.56% : 0.064063s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.37% : 0.004614s : 1: type_inference 0.07% : 0.000064s : 1: validate TotalTime = 0.0793858, [24] [bootstrap]: 0.00042017 [type_inference]: 0.00559942 [event_method]: 1.492e-05 [auto_monad]: 5.622e-05 [graph_reusing]: 5.38002e-06 [inline]: 2.06e-06 [add_attr]: 0.00319216, [1] [add_attr_with_inline]: 0.00318137, [1] [Cycle 1]: 5.718e-05, [2] [tag_attr]: 1.774e-05 [meta_addattr_fg_expand]: 4.07998e-06 [parallel-infer-symbol]: 3.53e-06 [pre_auto_parallel]: 3.047e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.89999e-06 [optimize]: 0.00447977, [53] [py_interpret_to_execute]: 2.435e-05 [rewriter_before_opt_a]: 6.298e-05 [opt_a]: 0.00246696, [2] [Cycle 1]: 0.0017562, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 3.269e-05 [loop_unroll]: 2.09e-05 [a_1]: 0.00046465 [with_stream_mark]: 1.761e-05 [recompute_prepare]: 8.23999e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.01999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.6e-05 [accelerated_algorithm]: 6.61999e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 2.08002e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.94002e-06 [auto_parallel]: 6.71e-06 [parallel]: 1.901e-05 [flash_sp]: 7.82002e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.66999e-06 [matmul_add_comm_reduction]: 9.36998e-06 [allreduce_slice_to_reducescatter]: 9.10019e-07 [virtual_shard_identity]: 7.17002e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.92002e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.48002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.63998e-06 [receive_attached]: 
2.49999e-06 [after_resolve]: 1.079e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00062462 [add_forward_monad_depend]: 4.79e-06 [auto_monad_grad]: 2.63998e-06 [auto_monad_eliminator]: 1.45e-05 [cse]: 2.984e-05 [a_3]: 4.777e-05 [Cycle 2]: 0.00070011, [45] [expand_dump_flag]: 1.27999e-06 [switch_simplify]: 7.75e-06 [loop_unroll]: 6.29001e-06 [a_1]: 0.00019294 [with_stream_mark]: 1.205e-05 [recompute_prepare]: 7.09001e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 7.098e-05 [accelerated_algorithm]: 5.76003e-06 [shard]: 1.25999e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 4.98001e-06 [auto_parallel]: 5.60001e-06 [parallel]: 4.97999e-06 [flash_sp]: 3.88001e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 8.42e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.53e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.39e-06 [merge_forward]: 3.30998e-06 [cell_reuse_recompute_pass]: 2.31e-06 [offload_activation]: 7.05e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.02e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 8.50999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 1.85001e-06 [flash_sp_send_recv_attached]: 1.25001e-06 [receive_attached]: 1.42999e-06 [after_resolve]: 9.59999e-06 [a_after_grad]: 7.75e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.46998e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 7.01001e-06 [cse]: 1.598e-05 [a_3]: 3.437e-05 [py_interpret_to_execute_after_opt_a]: 9.26998e-06 [slice_cell_reuse_recomputed_activation]: 2.51998e-06 [rewriter_after_opt_a]: 3.555e-05 [convert_after_rewriter]: 7.89002e-06 [order_py_execute_after_rewriter]: 5.71e-06 [mutable_eliminate]: 0.00055382 [opt_b]: 0.00019063, [1] [Cycle 1]: 0.00018331, [7] 
[b_1]: 0.00010927 [b_2]: 7.71999e-06 [updatestate_depend_eliminate]: 6.26998e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.68e-06 [renormalize]: 6.69999e-07 [cse]: 1.816e-05 [optimize_parallel_all_gather_comm]: 1.744e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.628e-05 [loop_unroll]: 0.00043432 [opt_after_cconv]: 9.939e-05, [1] [Cycle 1]: 9.298e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 3.58e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.831e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.186e-05 [tuple_transform]: 7.183e-05, [1] [Cycle 1]: 6.75e-05, [4] [d_1]: 4.157e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.719e-05 [cse_after_recomputation]: 2.09e-05, [1] [Cycle 1]: 1.637e-05, [1] [cse]: 1.096e-05 [environ_conv]: 5.82999e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.68998e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 9.30013e-07 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.44001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 9.19972e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.214e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.81001e-06 [overlap_recompute_and_grad_model_parallel]: 4.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.56e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.818e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.44999e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 7.058e-05, [1] [Cycle 1]: 6.583e-05, [6] [build]: 3.13e-06 [elim_shapecalc]: 8.97999e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.95999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.15002e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.627e-05 [get_jit_bprop_graph]: 1.81e-06 [rewriter_after_jit_bprop_graph]: 4.36002e-06 [opt_after_jit_grad]: 0.00046967 [validate]: 4.026e-05 [backend_pass]: 8.90024e-07 [task_emit]: 0.0648011 [execute]: 9.20999e-06 Sums bootstrap : 0.000420s : 0.56% type_inference : 0.005599s : 7.45% event_method : 0.000015s : 0.02% auto_monad : 0.000056s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000030s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.03% optimize.rewriter_before_opt_a : 0.000063s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000658s : 0.87% optimize.opt_a.with_stream_mark : 0.000030s : 0.04% optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000018s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000625s : 0.83% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.03% optimize.opt_a.cse : 0.000046s : 0.06% optimize.opt_a.a_3 : 0.000082s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000554s : 0.74% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.03% optimize.loop_unroll : 0.000434s : 0.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000042s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 0.62% validate : 0.000040s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.064801s : 86.20% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000184 30 14.79% : 0.000027s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.66% : 0.000001s : 2: substitution.fold_const_symbol 3.01% : 0.000006s : 4: substitution.graph_param_transform 67.82% : 0.000125s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.43% : 0.000004s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 5.99% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005554 2 89.87% : 0.004991s : 1: type_inference.infer 10.13% : 0.000562s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.97% : 0.000028s : 3: replace.inline 28.03% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000132 5 92.46% : 0.000122s : 3: match.inline 7.54% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.97% : 0.000002s : 11: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.72% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.79% : 0.000009s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000002s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.90% : 0.000003s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.62% : 0.000001s : 4: predicate.row_tensor_eliminate 1.07% : 0.000002s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.30% : 0.000002s : 16: predicate.switch_defer_inline 2.06% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.39% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 44.28% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.72% : 0.000215s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088873 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.60% : 0.003197s : 1: add_attr 3.58% : 0.003185s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000062s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.51% : 0.000451s : 1: bootstrap 0.03% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.64% : 0.000565s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000016s : 1: opt.transform.mutable_eliminate 1.17% : 0.001037s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.78% : 0.002470s : 1: opt_a 0.12% : 0.000103s : 1: opt_after_cconv 0.54% : 0.000481s : 1: opt_after_jit_grad 0.22% : 0.000194s : 1: opt_b 5.05% : 0.004485s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000035s : 1: pre_auto_parallel 0.03% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.38% : 0.000336s : 1: renormalize.infer 0.31% : 0.000280s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000040s : 1: rewriter_after_opt_a 0.08% : 0.000068s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000073s : 1: symbol_engine_optimizer 72.94% : 0.064826s : 1: task_emit 0.08% : 0.000075s : 1: 
tuple_transform 6.32% : 0.005617s : 1: type_inference 0.08% : 0.000069s : 1: validate TotalTime = 0.131293, [24] [bootstrap]: 0.00047389 [type_inference]: 0.0134457 [event_method]: 5.877e-05 [auto_monad]: 0.00013249 [graph_reusing]: 8.13001e-06 [inline]: 3.83001e-06 [add_attr]: 0.00389036, [1] [add_attr_with_inline]: 0.00387717, [1] [Cycle 1]: 0.00010734, [2] [tag_attr]: 4.955e-05 [meta_addattr_fg_expand]: 9.89999e-06 [parallel-infer-symbol]: 3.84002e-06 [pre_auto_parallel]: 6.317e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 1.02998e-06 [dataset_repeat_opt]: 2.55002e-06 [pipeline_split]: 1.93002e-06 [optimize]: 0.0173934, [53] [py_interpret_to_execute]: 5.013e-05 [rewriter_before_opt_a]: 0.00017562 [opt_a]: 0.0145022, [3] [Cycle 1]: 0.00931898, [45] [expand_dump_flag]: 6.17001e-06 [switch_simplify]: 7.74e-05 [loop_unroll]: 6.267e-05 [a_1]: 0.00169606 [with_stream_mark]: 3.943e-05 [recompute_prepare]: 3.058e-05 [updatestate_depend_eliminate]: 1.039e-05 [updatestate_assign_eliminate]: 8.2e-06 [updatestate_loads_eliminate]: 8.35001e-06 [parameter_eliminate]: 3.56999e-06 [a_2]: 0.00028458 [accelerated_algorithm]: 4.012e-05 [shard]: 2.43e-06 [meta_shard_fg_expand]: 5.72001e-06 [shard_inline]: 1.759e-05 [merge_send_recv]: 2.075e-05 [auto_parallel]: 1.635e-05 [parallel]: 2.148e-05 [flash_sp]: 1.61e-05 [merge_comm]: 1.029e-05 [allreduce_fusion]: 9.41e-06 [matmul_add_comm_reduction]: 3.369e-05 [allreduce_slice_to_reducescatter]: 9.5999e-07 [virtual_shard_identity]: 2.243e-05 [virtual_dataset]: 1.605e-05 [get_grad_eliminate_]: 1.606e-05 [virtual_output]: 1.598e-05 [merge_forward]: 1.16e-05 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 1.961e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.601e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 3.055e-05 [set_forward_comm_id_for_comm_node_pass]: 1.245e-05 [meta_fg_expand]: 0.00201923 [flash_sp_send_recv_attached]: 5.29e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 
7.845e-05 [a_after_grad]: 8.913e-05 [renormalize]: 0.00346582 [add_forward_monad_depend]: 1.653e-05 [auto_monad_grad]: 7.26001e-06 [auto_monad_eliminator]: 6.713e-05 [cse]: 0.00019145 [a_3]: 0.00036147 [Cycle 2]: 0.00407905, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 5.136e-05 [loop_unroll]: 4.552e-05 [a_1]: 0.00180581 [with_stream_mark]: 2.688e-05 [recompute_prepare]: 1.748e-05 [updatestate_depend_eliminate]: 6.58998e-06 [updatestate_assign_eliminate]: 5.10001e-06 [updatestate_loads_eliminate]: 4.59998e-06 [parameter_eliminate]: 2.29999e-06 [a_2]: 0.00013328 [accelerated_algorithm]: 1.45e-05 [shard]: 2.73e-06 [meta_shard_fg_expand]: 3.41001e-06 [shard_inline]: 9.77999e-06 [merge_send_recv]: 1.151e-05 [auto_parallel]: 1.22e-05 [parallel]: 1.105e-05 [flash_sp]: 4.60001e-06 [merge_comm]: 6.36e-06 [allreduce_fusion]: 5.91998e-06 [matmul_add_comm_reduction]: 1.297e-05 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 1.447e-05 [virtual_dataset]: 1.015e-05 [get_grad_eliminate_]: 8.90001e-06 [virtual_output]: 9.12999e-06 [merge_forward]: 6.36998e-06 [cell_reuse_recompute_pass]: 2.27001e-06 [offload_activation]: 1.439e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.076e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 1.92e-05 [set_forward_comm_id_for_comm_node_pass]: 6.47001e-06 [meta_fg_expand]: 0.00014129 [flash_sp_send_recv_attached]: 2.71999e-06 [receive_attached]: 3.18998e-06 [after_resolve]: 2.329e-05 [a_after_grad]: 1.525e-05 [renormalize]: 0.00109429 [add_forward_monad_depend]: 8.34998e-06 [auto_monad_grad]: 2.46e-06 [auto_monad_eliminator]: 2.214e-05 [cse]: 6.759e-05 [a_3]: 7.744e-05 [Cycle 3]: 0.00108125, [45] [expand_dump_flag]: 2.36998e-06 [switch_simplify]: 1.25e-05 [loop_unroll]: 9.17001e-06 [a_1]: 0.00027394 [with_stream_mark]: 1.603e-05 [recompute_prepare]: 1.055e-05 [updatestate_depend_eliminate]: 5.90002e-06 [updatestate_assign_eliminate]: 4.57998e-06 [updatestate_loads_eliminate]: 4.62e-06 
[parameter_eliminate]: 1.71998e-06 [a_2]: 0.00012713 [accelerated_algorithm]: 1.552e-05 [shard]: 1.44e-06 [meta_shard_fg_expand]: 2.27001e-06 [shard_inline]: 9.42001e-06 [merge_send_recv]: 1.074e-05 [auto_parallel]: 1.094e-05 [parallel]: 8.69e-06 [flash_sp]: 1.35999e-06 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 5.17999e-06 [matmul_add_comm_reduction]: 1.087e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.172e-05 [virtual_dataset]: 9.27999e-06 [get_grad_eliminate_]: 8.60999e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 6.12999e-06 [cell_reuse_recompute_pass]: 2.10002e-06 [offload_activation]: 1.243e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.756e-05 [merge_recompute_call_nodes]: 1.96e-06 [before_grad]: 6.523e-05 [set_forward_comm_id_for_comm_node_pass]: 7.09001e-06 [meta_fg_expand]: 4.77e-06 [flash_sp_send_recv_attached]: 1.33002e-06 [receive_attached]: 1.84998e-06 [after_resolve]: 1.932e-05 [a_after_grad]: 1.61e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.97001e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.569e-05 [cse]: 3.548e-05 [a_3]: 6.27e-05 [py_interpret_to_execute_after_opt_a]: 2.048e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 6.008e-05 [convert_after_rewriter]: 9.10001e-06 [order_py_execute_after_rewriter]: 6.71999e-06 [mutable_eliminate]: 0.00074843 [opt_b]: 0.00032423, [1] [Cycle 1]: 0.00031497, [7] [b_1]: 0.00019577 [b_2]: 1.194e-05 [updatestate_depend_eliminate]: 1.222e-05 [updatestate_assign_eliminate]: 4.27998e-06 [updatestate_loads_eliminate]: 5.13002e-06 [renormalize]: 7.99977e-07 [cse]: 4.531e-05 [optimize_parallel_all_gather_comm]: 2.619e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 3.453e-05 [loop_unroll]: 0.00051274 [opt_after_cconv]: 0.00015348, [1] [Cycle 1]: 0.00014655, [7] [c_1]: 5e-05 [parameter_eliminate]: 5.08002e-06 [updatestate_depend_eliminate]: 9.41998e-06 [updatestate_assign_eliminate]: 4.35999e-06 
[updatestate_loads_eliminate]: 3.92998e-06 [cse]: 3.732e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 4.765e-05 [tuple_transform]: 0.00011284, [1] [Cycle 1]: 0.00010708, [4] [d_1]: 7.509e-05 [none_parameter_eliminate]: 1.83002e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 1.038e-05 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 7.291e-05 [cse_after_recomputation]: 3.569e-05, [1] [Cycle 1]: 3.084e-05, [1] [cse]: 2.489e-05 [environ_conv]: 1.335e-05 [swap_dp_allreduce_reducescatter]: 8.58001e-06 [bias_add_comm_swap]: 3.06999e-06 [label_micro_interleaved_index]: 6.06e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.71998e-06 [slice_recompute_activation]: 2.74999e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.64001e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.11997e-06 [add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.26997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 [control_data_broadcast_order]: 1.904e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 5.79999e-06 [overlap_recompute_and_grad_model_parallel]: 6.25002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 5.95002e-06 [overlap_grad_flash_sp]: 2.848e-05 [begin_end_overlap_inline]: 7.7e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 2.08002e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 0.00011101, [1] [Cycle 1]: 0.00010603, [6] [build]: 1.278e-05 [elim_shapecalc]: 1.701e-05 [elim_not_effective]: 1.956e-05 [opt_reshape]: 1.147e-05 [fold_const_symbol]: 1.514e-05 [renormalize]: 
1.59984e-07 [detach_backward]: 2.76999e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 2.929e-05 [get_jit_bprop_graph]: 2.06e-06 [rewriter_after_jit_bprop_graph]: 5.79e-06 [opt_after_jit_grad]: 0.00061521 [validate]: 6.448e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0948074 [execute]: 9.49e-06 Sums bootstrap : 0.000474s : 0.38% type_inference : 0.013446s : 10.69% event_method : 0.000059s : 0.05% auto_monad : 0.000132s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000004s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000050s : 0.04% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000063s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000003s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000050s : 0.04% optimize.rewriter_before_opt_a : 0.000176s : 0.14% optimize.opt_a.expand_dump_flag : 0.000012s : 0.01% optimize.opt_a.switch_simplify : 0.000141s : 0.11% optimize.opt_a.loop_unroll : 0.000117s : 0.09% optimize.opt_a.a_1 : 0.003776s : 3.00% optimize.opt_a.with_stream_mark : 0.000082s : 0.07% optimize.opt_a.recompute_prepare : 0.000059s : 0.05% optimize.opt_a.updatestate_depend_eliminate : 0.000023s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000018s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000545s : 0.43% optimize.opt_a.accelerated_algorithm : 0.000070s : 0.06% optimize.opt_a.shard : 0.000007s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000011s : 0.01% optimize.opt_a.shard_inline : 0.000037s : 0.03% optimize.opt_a.merge_send_recv : 0.000043s : 0.03% optimize.opt_a.auto_parallel : 0.000039s : 0.03% optimize.opt_a.parallel : 0.000041s : 0.03% optimize.opt_a.flash_sp : 0.000022s : 0.02% optimize.opt_a.merge_comm : 
0.000022s : 0.02% optimize.opt_a.allreduce_fusion : 0.000021s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000058s : 0.05% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000049s : 0.04% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000024s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000046s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000074s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000115s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000026s : 0.02% optimize.opt_a.meta_fg_expand : 0.002165s : 1.72% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.01% optimize.opt_a.receive_attached : 0.000008s : 0.01% optimize.opt_a.after_resolve : 0.000121s : 0.10% optimize.opt_a.a_after_grad : 0.000120s : 0.10% optimize.opt_a.renormalize : 0.004560s : 3.62% optimize.opt_a.add_forward_monad_depend : 0.000027s : 0.02% optimize.opt_a.auto_monad_grad : 0.000011s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000105s : 0.08% optimize.opt_a.cse : 0.000295s : 0.23% optimize.opt_a.a_3 : 0.000502s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000060s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000748s : 0.59% optimize.opt_b.b_1 : 0.000196s : 0.16% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000012s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000045s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.03% optimize.loop_unroll : 0.000513s : 0.41% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000037s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000048s : 0.04% optimize.tuple_transform.d_1 : 0.000075s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000073s : 0.06% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000013s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000017s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000029s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000615s : 0.49% validate : 0.000064s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.094807s : 75.35% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.001134 222 5.94% : 0.000067s : 12: substitution.arithmetic_simplify 2.08% : 0.000024s : 2: substitution.cast_eliminate 0.24% : 0.000003s : 5: substitution.elim_not_effective 0.55% : 0.000006s : 5: substitution.float_depend_g_call 0.42% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 5: substitution.fold_const_symbol 0.74% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000004s : 2: substitution.incorporate_call 0.20% : 0.000002s : 2: substitution.incorporate_call_switch 60.52% : 0.000686s : 17: substitution.inline 2.06% : 0.000023s : 2: substitution.inline_without_move 1.10% : 0.000012s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000022s : 3: substitution.less_batch_normalization 1.38% : 0.000016s : 11: substitution.minmaximum_grad 0.66% : 0.000007s : 5: substitution.partial_eliminate 1.41% : 0.000016s : 20: substitution.remove_not_recompute_node 3.16% : 0.000036s : 10: substitution.replace_applicator 1.41% : 0.000016s : 15: substitution.replace_old_param 0.41% : 0.000005s : 1: substitution.set_cell_output_no_recompute 2.85% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.35% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.84% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 7.50% : 0.000085s : 30: substitution.tuple_list_get_item_eliminator 1.73% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013339 2 86.30% : 0.011512s : 1: type_inference.infer 13.70% : 0.001827s : 1: type_inference.specialize ------[replace.] 0.000275 33 59.72% : 0.000164s : 17: replace.inline 40.28% : 0.000111s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000717 33 94.10% : 0.000675s : 17: match.inline 5.90% : 0.000042s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000797 5764 1.13% : 0.000009s : 68: predicate.accumulaten_eliminater 0.36% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 1.03% : 0.000008s : 68: predicate.addn_zero_filter 1.02% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.20% : 0.000018s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.43% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.19% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000010s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.70% : 0.000014s : 108: predicate.environ_get_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.65% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000019s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.60% : 0.000005s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.73% : 0.000046s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.73% : 0.000006s : 32: predicate.less_batch_normalization 
1.63% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.57% : 0.000020s : 168: predicate.load_eliminater 0.39% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.24% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.05% : 0.000008s : 68: predicate.minmaximum_grad 0.58% : 0.000005s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.23% : 0.000018s : 101: predicate.partial_defer_inline 1.61% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000010s : 68: predicate.reduce_eliminate 2.59% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.41% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000015s : 152: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 68: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.38% : 0.000011s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.40% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.77% : 0.000014s : 101: predicate.switch_defer_inline 2.81% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.96% : 
0.000040s : 277: predicate.switch_simplify 1.16% : 0.000009s : 68: predicate.tile_eliminate 1.03% : 0.000008s : 68: predicate.transpose_eliminate 1.40% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.51% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000026s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002095 34 57.30% : 0.001200s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.70% : 0.000895s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.163128 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.39% : 0.003897s : 1: add_attr 2.38% : 0.003882s : 1: add_attr_with_inline 0.00% : 0.000005s : 1: add_comm_op_reuse_tag 0.05% : 0.000077s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000142s : 1: auto_monad 0.02% : 0.000034s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000511s : 1: bootstrap 0.02% : 0.000038s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000023s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000017s : 1: environ_conv 0.04% : 0.000067s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.32% : 0.000524s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.47% : 0.000763s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000026s : 1: opt.transform.mutable_eliminate 3.47% : 0.005662s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000039s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000178s : 28: opt.transform.opt_b 0.05% : 0.000083s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000060s : 4: opt.transform.symbol_engine_opt 8.89% : 0.014506s : 1: opt_a 0.10% : 0.000157s : 1: opt_after_cconv 0.39% : 0.000629s : 1: opt_after_jit_grad 0.20% : 0.000328s : 1: opt_b 10.67% : 0.017399s : 1: optimize 0.02% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000069s : 1: pre_auto_parallel 0.03% : 0.000055s : 1: py_interpret_to_execute 0.02% : 0.000025s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000052s : 1: remove_dup_value 1.59% : 0.002592s : 2: renormalize.infer 1.19% : 0.001943s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000065s : 1: rewriter_after_opt_a 0.11% : 0.000183s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000114s : 1: symbol_engine_optimizer 58.13% : 0.094831s : 1: task_emit 0.07% : 0.000116s : 1: 
tuple_transform 8.26% : 0.013476s : 1: type_inference 0.06% : 0.000101s : 1: validate TotalTime = 0.0994508, [24] [bootstrap]: 0.00042375 [type_inference]: 0.00467036 [event_method]: 1.188e-05 [auto_monad]: 5.219e-05 [graph_reusing]: 4.97e-06 [inline]: 2.89999e-06 [add_attr]: 0.0033183, [1] [add_attr_with_inline]: 0.00330688, [1] [Cycle 1]: 5.217e-05, [2] [tag_attr]: 1.372e-05 [meta_addattr_fg_expand]: 3.01999e-06 [parallel-infer-symbol]: 4.28001e-06 [pre_auto_parallel]: 2.812e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.15002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00415629, [53] [py_interpret_to_execute]: 1.897e-05 [rewriter_before_opt_a]: 4.404e-05 [opt_a]: 0.00215083, [2] [Cycle 1]: 0.00153287, [45] [expand_dump_flag]: 2.77002e-06 [switch_simplify]: 2.502e-05 [loop_unroll]: 1.357e-05 [a_1]: 0.00030848 [with_stream_mark]: 1.751e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.32002e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.816e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 8.06001e-06 [auto_parallel]: 5.96998e-06 [parallel]: 1.908e-05 [flash_sp]: 8.58001e-06 [merge_comm]: 3.92998e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 9.10001e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.148e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 1.055e-05 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.90002e-06 [flash_sp_send_recv_attached]: 2.64999e-06 [receive_attached]: 
2.44999e-06 [after_resolve]: 1.103e-05 [a_after_grad]: 9.31e-06 [renormalize]: 0.00057848 [add_forward_monad_depend]: 4.55001e-06 [auto_monad_grad]: 2.14999e-06 [auto_monad_eliminator]: 1.449e-05 [cse]: 3.001e-05 [a_3]: 4.207e-05 [Cycle 2]: 0.00060815, [45] [expand_dump_flag]: 1.24e-06 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012776 [with_stream_mark]: 1.185e-05 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.85998e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.8e-05 [accelerated_algorithm]: 5.61003e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.41002e-06 [shard_inline]: 5.30999e-06 [merge_send_recv]: 5.30001e-06 [auto_parallel]: 5.52001e-06 [parallel]: 4.67998e-06 [flash_sp]: 3.51999e-06 [merge_comm]: 3.49001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.46e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.40002e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.81e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 6.39001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.38001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 1.14e-06 [receive_attached]: 1.38002e-06 [after_resolve]: 1.008e-05 [a_after_grad]: 8.82e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 7.16001e-06 [cse]: 1.376e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 9.42001e-06 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 3.403e-05 [convert_after_rewriter]: 7.43999e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00058825 [opt_b]: 0.00018534, [1] [Cycle 1]: 0.00017868, [7] 
[b_1]: 0.00010874 [b_2]: 7.16999e-06 [updatestate_depend_eliminate]: 5.61e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.62001e-06 [renormalize]: 4.69998e-07 [cse]: 1.72e-05 [optimize_parallel_all_gather_comm]: 1.611e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.456e-05 [loop_unroll]: 0.00043131 [opt_after_cconv]: 9.729e-05, [1] [Cycle 1]: 9.137e-05, [7] [c_1]: 2.815e-05 [parameter_eliminate]: 2.44999e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.712e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 1.325e-05 [tuple_transform]: 6.904e-05, [1] [Cycle 1]: 6.482e-05, [4] [d_1]: 3.978e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.758e-05 [cse_after_recomputation]: 2.098e-05, [1] [Cycle 1]: 1.608e-05, [1] [cse]: 1.087e-05 [environ_conv]: 5.56e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 2.70002e-06 [label_micro_interleaved_index]: 4.65001e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.39996e-07 [remove_cast_before_assign_add]: 1.55001e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.269e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 4.3e-06 [overlap_recompute_and_grad_model_parallel]: 4.83001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.59e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.873e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.89e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 7.079e-05, [1] [Cycle 1]: 6.632e-05, [6] [build]: 3.09001e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.211e-05 [opt_reshape]: 6.22001e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.96998e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.634e-05 [get_jit_bprop_graph]: 2.09e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00046362 [validate]: 3.791e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0859947 [execute]: 9.16998e-06 Sums bootstrap : 0.000424s : 0.45% type_inference : 0.004670s : 4.91% event_method : 0.000012s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000028s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000044s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000436s : 0.46% optimize.opt_a.with_stream_mark : 0.000029s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.02% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000579s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.02% optimize.opt_a.cse : 0.000044s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000588s : 0.62% optimize.opt_b.b_1 : 0.000109s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000431s : 0.45% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000464s : 0.49% validate : 0.000038s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.085995s : 90.41% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000138 26 17.43% : 0.000024s : 4: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.07% : 0.000006s : 4: substitution.graph_param_transform 67.50% : 0.000093s : 2: substitution.inline 1.97% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.26% : 0.000004s : 4: substitution.remove_not_recompute_node 3.56% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004617 2 91.82% : 0.004239s : 1: type_inference.infer 8.18% : 0.000378s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000091 2 100.00% : 0.000091s : 2: match.inline ------[predicate.] 
0.000140 984 0.77% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.88% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.86% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.01% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 1.01% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.38% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.86% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 9: predicate.minmaximum_grad 1.53% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.16% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.68% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.82% : 
0.000001s : 9: predicate.tile_eliminate 1.02% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.97% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000299 6 39.07% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.93% : 0.000182s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.108446 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.06% : 0.003323s : 1: add_attr 3.05% : 0.003311s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000464s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.41% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000598s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 0.73% : 0.000793s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.99% : 0.002154s : 1: opt_a 0.09% : 0.000101s : 1: opt_after_cconv 0.44% : 0.000474s : 1: opt_after_jit_grad 0.17% : 0.000189s : 1: opt_b 3.84% : 0.004161s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.31% : 0.000337s : 1: renormalize.infer 0.22% : 0.000235s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.04% : 0.000048s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000073s : 1: symbol_engine_optimizer 79.32% : 0.086017s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 4.33% : 0.004694s : 1: type_inference 0.06% : 0.000068s : 1: validate TotalTime = 0.137376, [24] [bootstrap]: 0.00050219 [type_inference]: 0.0122206 [event_method]: 4.973e-05 [auto_monad]: 0.00017436 [graph_reusing]: 8.48999e-06 [inline]: 3.46999e-06 [add_attr]: 0.00357882, [1] [add_attr_with_inline]: 0.00356697, [1] [Cycle 1]: 9.088e-05, [2] [tag_attr]: 3.999e-05 [meta_addattr_fg_expand]: 9.07001e-06 [parallel-infer-symbol]: 3.91999e-06 [pre_auto_parallel]: 5.656e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0160746, [53] [py_interpret_to_execute]: 4.342e-05 [rewriter_before_opt_a]: 0.00014541 [opt_a]: 0.0133559, [3] [Cycle 1]: 0.00868116, [45] [expand_dump_flag]: 5.04e-06 [switch_simplify]: 0.00011905 [loop_unroll]: 5.621e-05 [a_1]: 0.00146807 [with_stream_mark]: 3.724e-05 [recompute_prepare]: 2.768e-05 [updatestate_depend_eliminate]: 1.067e-05 [updatestate_assign_eliminate]: 7.88001e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 2.52001e-06 [a_2]: 0.00025391 [accelerated_algorithm]: 3.833e-05 [shard]: 2.73e-06 [meta_shard_fg_expand]: 4.89e-06 [shard_inline]: 1.686e-05 [merge_send_recv]: 1.915e-05 [auto_parallel]: 1.552e-05 [parallel]: 2.178e-05 [flash_sp]: 1.556e-05 [merge_comm]: 1.038e-05 [allreduce_fusion]: 8.77e-06 [matmul_add_comm_reduction]: 3.397e-05 [allreduce_slice_to_reducescatter]: 1.33002e-06 [virtual_shard_identity]: 2.114e-05 [virtual_dataset]: 1.665e-05 [get_grad_eliminate_]: 1.584e-05 [virtual_output]: 1.578e-05 [merge_forward]: 1.044e-05 [cell_reuse_recompute_pass]: 1.86998e-06 [offload_activation]: 1.988e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.122e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.946e-05 [set_forward_comm_id_for_comm_node_pass]: 1.028e-05 [meta_fg_expand]: 0.00193983 [flash_sp_send_recv_attached]: 4.79e-06 [receive_attached]: 2.94001e-06 [after_resolve]: 7.274e-05 
[a_after_grad]: 8.727e-05 [renormalize]: 0.00312557 [add_forward_monad_depend]: 1.383e-05 [auto_monad_grad]: 7.25e-06 [auto_monad_eliminator]: 6.391e-05 [cse]: 0.00027616 [a_3]: 0.00036305 [Cycle 2]: 0.00369558, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 5.007e-05 [loop_unroll]: 4.546e-05 [a_1]: 0.00167265 [with_stream_mark]: 2.438e-05 [recompute_prepare]: 1.279e-05 [updatestate_depend_eliminate]: 7.08e-06 [updatestate_assign_eliminate]: 5.96e-06 [updatestate_loads_eliminate]: 4.42e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 0.00013321 [accelerated_algorithm]: 1.526e-05 [shard]: 2.35002e-06 [meta_shard_fg_expand]: 2.84999e-06 [shard_inline]: 9.59e-06 [merge_send_recv]: 1.096e-05 [auto_parallel]: 1.25e-05 [parallel]: 9.78002e-06 [flash_sp]: 3.80998e-06 [merge_comm]: 6.26e-06 [allreduce_fusion]: 5.84999e-06 [matmul_add_comm_reduction]: 1.199e-05 [allreduce_slice_to_reducescatter]: 1.07e-06 [virtual_shard_identity]: 1.15e-05 [virtual_dataset]: 9.46003e-06 [get_grad_eliminate_]: 9.09e-06 [virtual_output]: 8.79e-06 [merge_forward]: 5.60001e-06 [cell_reuse_recompute_pass]: 1.88002e-06 [offload_activation]: 1.345e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.798e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 1.536e-05 [set_forward_comm_id_for_comm_node_pass]: 5.94e-06 [meta_fg_expand]: 6.42e-05 [flash_sp_send_recv_attached]: 1.35001e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 1.8e-05 [a_after_grad]: 1.551e-05 [renormalize]: 0.00099612 [add_forward_monad_depend]: 5.90002e-06 [auto_monad_grad]: 2.66999e-06 [auto_monad_eliminator]: 1.977e-05 [cse]: 6.546e-05 [a_3]: 7.206e-05 [Cycle 3]: 0.0009588, [45] [expand_dump_flag]: 2.21e-06 [switch_simplify]: 1.169e-05 [loop_unroll]: 9.19998e-06 [a_1]: 0.0002659 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 4.12998e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 
0.00012571 [accelerated_algorithm]: 1.258e-05 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 8.27e-06 [parallel]: 6.58e-06 [flash_sp]: 1.39e-06 [merge_comm]: 5.19e-06 [allreduce_fusion]: 5.27001e-06 [matmul_add_comm_reduction]: 9.32999e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 1.056e-05 [virtual_dataset]: 8.75001e-06 [get_grad_eliminate_]: 8.65999e-06 [virtual_output]: 8.37e-06 [merge_forward]: 5.51e-06 [cell_reuse_recompute_pass]: 1.92001e-06 [offload_activation]: 1.083e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.695e-05 [merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 1.46e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44998e-06 [meta_fg_expand]: 3.58e-06 [flash_sp_send_recv_attached]: 1.20999e-06 [receive_attached]: 1.39998e-06 [after_resolve]: 1.482e-05 [a_after_grad]: 1.481e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.52001e-06 [auto_monad_grad]: 1.24998e-06 [auto_monad_eliminator]: 1.268e-05 [cse]: 2.88e-05 [a_3]: 6.138e-05 [py_interpret_to_execute_after_opt_a]: 1.926e-05 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 5.483e-05 [convert_after_rewriter]: 9.89999e-06 [order_py_execute_after_rewriter]: 6.69999e-06 [mutable_eliminate]: 0.00070638 [opt_b]: 0.00030833, [1] [Cycle 1]: 0.00029967, [7] [b_1]: 0.00019472 [b_2]: 1.095e-05 [updatestate_depend_eliminate]: 8.77999e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 4.19997e-06 [renormalize]: 5.69999e-07 [cse]: 3.862e-05 [optimize_parallel_all_gather_comm]: 2.294e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 3.201e-05 [loop_unroll]: 0.00048566 [opt_after_cconv]: 0.00014549, [1] [Cycle 1]: 0.00013886, [7] [c_1]: 5.068e-05 [parameter_eliminate]: 3.57002e-06 [updatestate_depend_eliminate]: 7.75e-06 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 4.05998e-06 [cse]: 3.273e-05 
[renormalize]: 3.4002e-07 [remove_dup_value]: 4.586e-05 [tuple_transform]: 0.00010663, [1] [Cycle 1]: 0.00010136, [4] [d_1]: 7.024e-05 [none_parameter_eliminate]: 1.85001e-06 [renormalize]: 1.30007e-07 [switch_simplify]: 9.78998e-06 [partial_unused_args_eliminate]: 1.66002e-06 [add_recomputation]: 7.751e-05 [cse_after_recomputation]: 3.534e-05, [1] [Cycle 1]: 3.032e-05, [1] [cse]: 2.45e-05 [environ_conv]: 1.123e-05 [swap_dp_allreduce_reducescatter]: 7.71999e-06 [bias_add_comm_swap]: 2.93e-06 [label_micro_interleaved_index]: 5.27999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.35002e-06 [reorder_send_recv_between_fp_bp]: 2.92002e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.19e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.86e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 5.47999e-06 [overlap_recompute_and_grad_model_parallel]: 5.69e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 5.47999e-06 [overlap_grad_flash_sp]: 2.93e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 0.00010687, [1] [Cycle 1]: 0.00010214, [6] [build]: 1.108e-05 [elim_shapecalc]: 1.437e-05 [elim_not_effective]: 2.099e-05 [opt_reshape]: 1.079e-05 [fold_const_symbol]: 1.535e-05 [renormalize]: 1.59984e-07 [detach_backward]: 2.39999e-06 
[pipeline_parallel_scheduler]: 1.41998e-06 [auto_monad_reorder]: 2.522e-05 [get_jit_bprop_graph]: 1.94999e-06 [rewriter_after_jit_bprop_graph]: 4.60999e-06 [opt_after_jit_grad]: 0.0006095 [validate]: 5.841e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.103711 [execute]: 1.07e-05 Sums bootstrap : 0.000502s : 0.38% type_inference : 0.012221s : 9.24% event_method : 0.000050s : 0.04% auto_monad : 0.000174s : 0.13% graph_reusing : 0.000008s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000040s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000057s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000043s : 0.03% optimize.rewriter_before_opt_a : 0.000145s : 0.11% optimize.opt_a.expand_dump_flag : 0.000010s : 0.01% optimize.opt_a.switch_simplify : 0.000181s : 0.14% optimize.opt_a.loop_unroll : 0.000111s : 0.08% optimize.opt_a.a_1 : 0.003407s : 2.57% optimize.opt_a.with_stream_mark : 0.000075s : 0.06% optimize.opt_a.recompute_prepare : 0.000050s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000023s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000513s : 0.39% optimize.opt_a.accelerated_algorithm : 0.000066s : 0.05% optimize.opt_a.shard : 0.000007s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.01% optimize.opt_a.shard_inline : 0.000036s : 0.03% optimize.opt_a.merge_send_recv : 0.000038s : 0.03% optimize.opt_a.auto_parallel : 0.000036s : 0.03% optimize.opt_a.parallel : 0.000038s : 0.03% optimize.opt_a.flash_sp : 0.000021s : 0.02% optimize.opt_a.merge_comm : 0.000022s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000055s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.03% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000022s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000044s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000066s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000059s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.02% optimize.opt_a.meta_fg_expand : 0.002008s : 1.52% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000007s : 0.01% optimize.opt_a.after_resolve : 0.000106s : 0.08% optimize.opt_a.a_after_grad : 0.000118s : 0.09% optimize.opt_a.renormalize : 0.004122s : 3.12% optimize.opt_a.add_forward_monad_depend : 0.000021s : 0.02% optimize.opt_a.auto_monad_grad : 0.000011s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000096s : 0.07% optimize.opt_a.cse : 0.000370s : 0.28% optimize.opt_a.a_3 : 0.000496s : 0.38% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000055s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000706s : 0.53% optimize.opt_b.b_1 : 0.000195s : 0.15% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000039s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000032s : 0.02% optimize.loop_unroll : 0.000486s : 0.37% optimize.opt_after_cconv.c_1 : 0.000051s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000046s : 0.03% optimize.tuple_transform.d_1 : 0.000070s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000078s : 0.06% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000011s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000021s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000610s : 0.46% validate : 0.000058s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.103711s : 78.38% execute : 0.000011s : 0.01% Time group info: ------[substitution.] 
0.000961 218 6.56% : 0.000063s : 11: substitution.arithmetic_simplify 2.12% : 0.000020s : 2: substitution.cast_eliminate 0.30% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000005s : 5: substitution.float_depend_g_call 0.51% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 5: substitution.fold_const_symbol 0.85% : 0.000008s : 8: substitution.graph_param_transform 0.30% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.35% : 0.000551s : 16: substitution.inline 2.10% : 0.000020s : 2: substitution.inline_without_move 1.28% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.27% : 0.000022s : 3: substitution.less_batch_normalization 1.52% : 0.000015s : 11: substitution.minmaximum_grad 0.73% : 0.000007s : 5: substitution.partial_eliminate 1.54% : 0.000015s : 20: substitution.remove_not_recompute_node 3.30% : 0.000032s : 10: substitution.replace_applicator 1.39% : 0.000013s : 15: substitution.replace_old_param 0.44% : 0.000004s : 1: substitution.set_cell_output_no_recompute 3.37% : 0.000032s : 11: substitution.tuple_list_convert_item_index_to_positive 1.47% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.01% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.51% : 0.000072s : 28: substitution.tuple_list_get_item_eliminator 2.08% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012126 2 86.93% : 0.010541s : 1: type_inference.infer 13.07% : 0.001585s : 1: type_inference.specialize ------[replace.] 0.000228 30 60.71% : 0.000139s : 16: replace.inline 39.29% : 0.000090s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000576 30 93.96% : 0.000541s : 16: match.inline 6.04% : 0.000035s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000772 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.33% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.17% : 0.000009s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.17% : 0.000017s : 99: predicate.arithmetic_simplify 1.15% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.26% : 0.000010s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.45% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.68% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.64% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.31% : 0.000018s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.51% : 0.000043s : 244: predicate.inline 1.33% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000006s : 32: predicate.less_batch_normalization 1.57% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 164: predicate.load_eliminater 0.37% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.16% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000009s : 67: predicate.minmaximum_grad 0.39% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000016s : 97: predicate.partial_defer_inline 1.63% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000021s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000015s : 149: predicate.replace_applicator 0.66% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000009s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.66% : 0.000005s : 32: predicate.shard_identity_eliminate 0.38% : 0.000003s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.42% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.77% : 0.000014s : 97: predicate.switch_defer_inline 2.83% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.85% : 0.000037s : 265: 
predicate.switch_simplify 1.10% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.15% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.62% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000005s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001887 32 58.14% : 0.001097s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.86% : 0.000790s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.166682 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.15% : 0.003584s : 1: add_attr 2.14% : 0.003571s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000082s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.11% : 0.000182s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.32% : 0.000542s : 1: bootstrap 0.02% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.04% : 0.000060s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000007s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000495s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.43% : 0.000717s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000021s : 1: opt.transform.mutable_eliminate 3.12% : 0.005207s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000178s : 28: opt.transform.opt_b 0.05% : 0.000078s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000058s : 4: opt.transform.symbol_engine_opt 8.01% : 0.013359s : 1: opt_a 0.09% : 0.000149s : 1: opt_after_cconv 0.37% : 0.000620s : 1: opt_after_jit_grad 0.19% : 0.000312s : 1: opt_b 9.65% : 0.016080s : 1: optimize 0.02% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000033s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000061s : 1: pre_auto_parallel 0.03% : 0.000048s : 1: py_interpret_to_execute 0.01% : 0.000024s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000050s : 1: remove_dup_value 1.37% : 0.002288s : 2: renormalize.infer 1.09% : 0.001815s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000060s : 1: rewriter_after_opt_a 0.09% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000110s : 1: symbol_engine_optimizer 62.24% : 0.103736s : 1: task_emit 0.07% : 0.000110s : 1: 
tuple_transform 7.35% : 0.012254s : 1: type_inference 0.06% : 0.000093s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x0-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x1-pynative],max_mem:46.0M TotalTime = 0.0244077, [24] [bootstrap]: 0.00059006 [type_inference]: 0.00678412 [event_method]: 1.625e-05 [auto_monad]: 5.825e-05 [graph_reusing]: 5.72999e-06 [inline]: 2.27999e-06 [add_attr]: 0.00402057, [1] [add_attr_with_inline]: 0.00400589, [1] [Cycle 1]: 6.359e-05, [2] [tag_attr]: 2.054e-05 [meta_addattr_fg_expand]: 4.13999e-06 [parallel-infer-symbol]: 3.70998e-06 [pre_auto_parallel]: 4.107e-05 [insert-virtual-dataset]: 2.74001e-06 [parallel-infer-symbol-second]: 1.02e-06 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.00473077, [53] [py_interpret_to_execute]: 2.498e-05 [rewriter_before_opt_a]: 7.041e-05 [opt_a]: 0.00259622, [2] [Cycle 1]: 0.00194427, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 3.587e-05 [loop_unroll]: 2.151e-05 [a_1]: 0.00049551 [with_stream_mark]: 1.767e-05 [recompute_prepare]: 8.36002e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.29001e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 2.03997e-06 [a_2]: 8.085e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 2.02001e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 8.57e-06 [auto_parallel]: 7.23e-06 [parallel]: 2.76e-05 [flash_sp]: 8.79e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 9.91e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 8.44998e-06 [virtual_dataset]: 5.89e-06 
[get_grad_eliminate_]: 6.06998e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 4.50999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.031e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.222e-05 [merge_recompute_call_nodes]: 2.21e-06 [before_grad]: 1.13e-05 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.37001e-06 [flash_sp_send_recv_attached]: 3.04001e-06 [receive_attached]: 3.06999e-06 [after_resolve]: 1.146e-05 [a_after_grad]: 9.02999e-06 [renormalize]: 0.0007156 [add_forward_monad_depend]: 5.71003e-06 [auto_monad_grad]: 2.81999e-06 [auto_monad_eliminator]: 1.637e-05 [cse]: 3.019e-05 [a_3]: 4.545e-05 [Cycle 2]: 0.00064057, [45] [expand_dump_flag]: 1.29e-06 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.46998e-06 [a_1]: 0.00013194 [with_stream_mark]: 1.194e-05 [recompute_prepare]: 6.42001e-06 [updatestate_depend_eliminate]: 3.31999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.17999e-06 [a_2]: 6.991e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 1.59e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 5.74e-06 [auto_parallel]: 6.53003e-06 [parallel]: 6.02999e-06 [flash_sp]: 3.4e-06 [merge_comm]: 3.41999e-06 [allreduce_fusion]: 3.16999e-06 [matmul_add_comm_reduction]: 7.11001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.73003e-06 [virtual_dataset]: 5.58002e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 4.79e-06 [merge_forward]: 3.08998e-06 [cell_reuse_recompute_pass]: 1.92999e-06 [offload_activation]: 8.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.77001e-06 [merge_recompute_call_nodes]: 1.03001e-06 [before_grad]: 8.23999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.96001e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 1.10999e-06 [receive_attached]: 1.77001e-06 [after_resolve]: 1.077e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.74e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 8e-06 [cse]: 1.764e-05 [a_3]: 3.287e-05 [py_interpret_to_execute_after_opt_a]: 1.113e-05 [slice_cell_reuse_recomputed_activation]: 2.39001e-06 [rewriter_after_opt_a]: 3.564e-05 [convert_after_rewriter]: 8.27e-06 [order_py_execute_after_rewriter]: 5.92999e-06 [mutable_eliminate]: 0.00061746 [opt_b]: 0.00019422, [1] [Cycle 1]: 0.00018663, [7] [b_1]: 0.0001113 [b_2]: 7.38e-06 [updatestate_depend_eliminate]: 6.28002e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 1.07e-06 [cse]: 2.204e-05 [optimize_parallel_all_gather_comm]: 1.723e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.84e-05 [loop_unroll]: 0.00044388 [opt_after_cconv]: 0.00010391, [1] [Cycle 1]: 9.698e-05, [7] [c_1]: 2.845e-05 [parameter_eliminate]: 4.13001e-06 [updatestate_depend_eliminate]: 5.95002e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.823e-05 [renormalize]: 6.50005e-07 [remove_dup_value]: 1.284e-05 [tuple_transform]: 7.181e-05, [1] [Cycle 1]: 6.724e-05, [4] [d_1]: 4.126e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.20002e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.961e-05 [cse_after_recomputation]: 2.144e-05, [1] [Cycle 1]: 1.662e-05, [1] [cse]: 1.131e-05 [environ_conv]: 5.89e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 3.06001e-06 [label_micro_interleaved_index]: 4.95001e-06 [label_fine_grained_interleaved_index]: 3.22002e-06 [merge_cast_opt]: 1.55999e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.05002e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.33002e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.5999e-07 
[interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.22999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.234e-05 [grouped_pairwise_exchange_alltoall]: 1.63002e-06 [offloading_packed_experts]: 3.65998e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.994e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.38e-05, [1] [Cycle 1]: 6.955e-05, [6] [build]: 2.93e-06 [elim_shapecalc]: 9.60001e-06 [elim_not_effective]: 1.232e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 9.83998e-06 [renormalize]: 4.39992e-07 [detach_backward]: 2.37999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.669e-05 [get_jit_bprop_graph]: 1.87001e-06 [rewriter_after_jit_bprop_graph]: 0.00013313 [opt_after_jit_grad]: 0.00054178 [validate]: 3.945e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.00715446 [execute]: 8.42e-06 Sums bootstrap : 0.000590s : 3.06% type_inference : 0.006784s : 35.13% event_method : 0.000016s : 0.08% auto_monad : 0.000058s : 0.30% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000021s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000041s : 0.21% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000025s : 0.13% optimize.rewriter_before_opt_a : 0.000070s : 0.36% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000043s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.14% optimize.opt_a.a_1 : 0.000627s : 3.25% optimize.opt_a.with_stream_mark : 0.000030s : 0.15% optimize.opt_a.recompute_prepare : 0.000015s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.78% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000014s : 0.07% optimize.opt_a.auto_parallel : 0.000014s : 0.07% optimize.opt_a.parallel : 0.000034s : 0.17% optimize.opt_a.flash_sp : 0.000012s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.05% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000019s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000020s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.03% optimize.opt_a.after_resolve : 
0.000022s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.09% optimize.opt_a.renormalize : 0.000716s : 3.71% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.13% optimize.opt_a.cse : 0.000048s : 0.25% optimize.opt_a.a_3 : 0.000078s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.18% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000617s : 3.20% optimize.opt_b.b_1 : 0.000111s : 0.58% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.01% optimize.opt_b.cse : 0.000022s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.15% optimize.loop_unroll : 0.000444s : 2.30% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.03% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000006s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000133s : 0.69% opt_after_jit_grad : 0.000542s : 2.81% validate : 0.000039s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.007154s : 37.05% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000202 30 14.67% : 0.000030s : 5: substitution.arithmetic_simplify 0.98% : 0.000002s : 2: substitution.elim_not_effective 0.61% : 0.000001s : 2: substitution.fold_const_symbol 2.91% : 0.000006s : 4: substitution.graph_param_transform 68.52% : 0.000138s : 3: substitution.inline 1.79% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.14% : 0.000004s : 4: substitution.remove_not_recompute_node 3.01% : 0.000006s : 4: substitution.replace_old_param 5.37% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006729 2 90.63% : 0.006099s : 1: type_inference.infer 9.37% : 0.000630s : 1: type_inference.specialize ------[replace.] 0.000044 5 72.20% : 0.000032s : 3: replace.inline 27.80% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000146 5 93.33% : 0.000136s : 3: match.inline 6.67% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 1131 0.79% : 0.000001s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 11: predicate.addn_zero_filter 0.73% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.03% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.99% : 0.000002s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000010s : 51: predicate.inline 0.94% : 0.000002s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.27% : 0.000004s : 32: predicate.load_eliminater 1.41% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.29% : 0.000002s : 11: predicate.reduce_eliminate 2.55% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000002s : 8: predicate.same_eliminate 0.67% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.87% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 1.91% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.19% : 0.000009s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.42% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.18% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.64% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000453 8 43.90% : 0.000199s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.10% : 0.000254s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.035040 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.49% : 0.004026s : 1: add_attr 11.44% : 0.004010s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000063s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.80% : 0.000629s : 1: bootstrap 0.09% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000012s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000023s : 1: event_method 0.04% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000008s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.30% : 0.000454s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000628s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000015s : 1: opt.transform.mutable_eliminate 2.89% : 0.001014s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000025s : 1: opt.transform.opt_after_jit_grad 0.26% : 0.000092s : 28: opt.transform.opt_b 0.13% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000035s : 4: opt.transform.symbol_engine_opt 7.42% : 0.002599s : 1: opt_a 0.31% : 0.000107s : 1: opt_after_cconv 1.58% : 0.000553s : 1: opt_after_jit_grad 0.56% : 0.000198s : 1: opt_b 13.52% : 0.004736s : 1: optimize 0.06% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.13% : 0.000047s : 1: pre_auto_parallel 0.08% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 1.12% : 0.000394s : 1: renormalize.infer 0.90% : 0.000314s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000139s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000040s : 1: rewriter_after_opt_a 0.21% : 0.000075s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000077s : 1: symbol_engine_optimizer 20.47% : 0.007173s : 1: task_emit 0.21% : 0.000075s : 1: 
tuple_transform 19.42% : 0.006805s : 1: type_inference 0.22% : 0.000077s : 1: validate TotalTime = 0.0203649, [24] [bootstrap]: 0.00042917 [type_inference]: 0.00468856 [event_method]: 1.171e-05 [auto_monad]: 5.406e-05 [graph_reusing]: 4.75001e-06 [inline]: 2.35002e-06 [add_attr]: 0.00350702, [1] [add_attr_with_inline]: 0.00349548, [1] [Cycle 1]: 5.447e-05, [2] [tag_attr]: 1.39e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 3.41999e-06 [pre_auto_parallel]: 2.857e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.66e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00436585, [53] [py_interpret_to_execute]: 1.948e-05 [rewriter_before_opt_a]: 4.693e-05 [opt_a]: 0.00224672, [2] [Cycle 1]: 0.00160865, [45] [expand_dump_flag]: 2.75002e-06 [switch_simplify]: 2.488e-05 [loop_unroll]: 1.423e-05 [a_1]: 0.00031851 [with_stream_mark]: 1.947e-05 [recompute_prepare]: 8.52e-06 [updatestate_depend_eliminate]: 4.21001e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 8.023e-05 [accelerated_algorithm]: 6.51999e-06 [shard]: 2.85998e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.82e-06 [auto_parallel]: 6.84001e-06 [parallel]: 1.959e-05 [flash_sp]: 8.58001e-06 [merge_comm]: 4.07998e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.46002e-06 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 7.91001e-06 [virtual_dataset]: 6.01998e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.71003e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.074e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.203e-05 [merge_recompute_call_nodes]: 1.41002e-06 [before_grad]: 9.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.02e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.11998e-06 [receive_attached]: 2.41998e-06 
[after_resolve]: 1.193e-05 [a_after_grad]: 9.15999e-06 [renormalize]: 0.00062051 [add_forward_monad_depend]: 5.78997e-06 [auto_monad_grad]: 2.64999e-06 [auto_monad_eliminator]: 1.435e-05 [cse]: 3.094e-05 [a_3]: 4.275e-05 [Cycle 2]: 0.00062704, [45] [expand_dump_flag]: 1.70001e-06 [switch_simplify]: 6.99001e-06 [loop_unroll]: 5.60001e-06 [a_1]: 0.00012828 [with_stream_mark]: 1.351e-05 [recompute_prepare]: 5.69e-06 [updatestate_depend_eliminate]: 3.13998e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.55997e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 6.931e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 1.24998e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 5.33002e-06 [auto_parallel]: 5.99999e-06 [parallel]: 6.02001e-06 [flash_sp]: 4.17998e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 6.59999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 5.58002e-06 [get_grad_eliminate_]: 5.25001e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.68002e-06 [offload_activation]: 7.5e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.92999e-06 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 3.93001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 1.71998e-06 [receive_attached]: 1.58002e-06 [after_resolve]: 1.015e-05 [a_after_grad]: 7.90998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.91e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 6.36998e-06 [cse]: 1.468e-05 [a_3]: 3.454e-05 [py_interpret_to_execute_after_opt_a]: 1.01e-05 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 3.457e-05 [convert_after_rewriter]: 7.37997e-06 [order_py_execute_after_rewriter]: 5.17999e-06 [mutable_eliminate]: 0.0006774 [opt_b]: 0.0001879, [1] [Cycle 1]: 0.0001809, [7] 
[b_1]: 0.00010914 [b_2]: 6.91001e-06 [updatestate_depend_eliminate]: 6.57002e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 3.50003e-07 [cse]: 1.895e-05 [optimize_parallel_all_gather_comm]: 1.86e-05 [overlap_param_gather]: 2.16998e-06 [cconv]: 2.89e-05 [loop_unroll]: 0.00043455 [opt_after_cconv]: 9.868e-05, [1] [Cycle 1]: 9.306e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 3.40003e-06 [updatestate_depend_eliminate]: 5.31002e-06 [updatestate_assign_eliminate]: 2.48002e-06 [updatestate_loads_eliminate]: 2.08998e-06 [cse]: 1.78e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.264e-05 [tuple_transform]: 7.048e-05, [1] [Cycle 1]: 6.627e-05, [4] [d_1]: 4.079e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.696e-05 [cse_after_recomputation]: 2.15e-05, [1] [Cycle 1]: 1.7e-05, [1] [cse]: 1.138e-05 [environ_conv]: 5.29e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.42999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.213e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.13001e-06 [overlap_recompute_and_grad_model_parallel]: 4.91997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.51002e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 2.042e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 2.14999e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 7.148e-05, [1] [Cycle 1]: 6.689e-05, [6] [build]: 3.27997e-06 [elim_shapecalc]: 8.65999e-06 [elim_not_effective]: 1.185e-05 [opt_reshape]: 6.49001e-06 [fold_const_symbol]: 8.99e-06 [renormalize]: 1.80007e-07 [detach_backward]: 2.36e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.66e-05 [get_jit_bprop_graph]: 2.16e-06 [rewriter_after_jit_bprop_graph]: 4.42e-06 [opt_after_jit_grad]: 0.00047058 [validate]: 3.76e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00647776 [execute]: 8.62998e-06 Sums bootstrap : 0.000429s : 2.71% type_inference : 0.004689s : 29.63% event_method : 0.000012s : 0.07% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.18% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000047s : 0.30% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.20% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000447s : 2.82% optimize.opt_a.with_stream_mark : 0.000033s : 0.21% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000150s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.16% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000022s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000621s : 3.92% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.05% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000046s : 0.29% optimize.opt_a.a_3 : 0.000077s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000677s : 4.28% optimize.opt_b.b_1 : 0.000109s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000029s : 0.18% optimize.loop_unroll : 0.000435s : 2.75% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.13% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000471s : 2.97% validate : 0.000038s : 0.24% backend_pass : 0.000001s : 0.01% task_emit : 0.006478s : 40.94% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000146 26 17.76% : 0.000026s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.02% : 0.000006s : 4: substitution.graph_param_transform 67.34% : 0.000098s : 2: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.99% : 0.000004s : 4: substitution.remove_not_recompute_node 3.36% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004634 2 91.77% : 0.004253s : 1: type_inference.infer 8.23% : 0.000381s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000097 2 100.00% : 0.000097s : 2: match.inline ------[predicate.] 
0.000141 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.83% : 0.000001s : 8: predicate.depend_value_elim 0.75% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.75% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.98% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.20% : 0.000002s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.16% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.07% : 0.000003s : 26: predicate.load_eliminater 1.91% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.91% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.58% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.82% : 0.000003s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 1.09% : 0.000002s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.53% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.94% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000297 6 37.30% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.70% : 0.000186s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029822 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.78% : 0.003512s : 1: add_attr 11.74% : 0.003500s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000059s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.56% : 0.000467s : 1: bootstrap 0.11% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000444s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.31% : 0.000689s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.72% : 0.000812s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.54% : 0.002250s : 1: opt_a 0.34% : 0.000102s : 1: opt_after_cconv 1.62% : 0.000482s : 1: opt_after_jit_grad 0.64% : 0.000191s : 1: opt_b 14.66% : 0.004371s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.05% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 1.21% : 0.000362s : 1: renormalize.infer 0.83% : 0.000249s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000038s : 1: rewriter_after_opt_a 0.17% : 0.000051s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000074s : 1: symbol_engine_optimizer 21.78% : 0.006497s : 1: task_emit 0.25% : 0.000073s : 1: 
tuple_transform 15.81% : 0.004713s : 1: type_inference 0.24% : 0.000073s : 1: validate TotalTime = 0.0215432, [24] [bootstrap]: 0.00041735 [type_inference]: 0.0058155 [event_method]: 1.528e-05 [auto_monad]: 5.961e-05 [graph_reusing]: 5.02e-06 [inline]: 2.70997e-06 [add_attr]: 0.00333514, [1] [add_attr_with_inline]: 0.00332423, [1] [Cycle 1]: 5.604e-05, [2] [tag_attr]: 1.729e-05 [meta_addattr_fg_expand]: 3.91999e-06 [parallel-infer-symbol]: 3.57002e-06 [pre_auto_parallel]: 3.194e-05 [insert-virtual-dataset]: 2.93998e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00457157, [53] [py_interpret_to_execute]: 2.228e-05 [rewriter_before_opt_a]: 6.405e-05 [opt_a]: 0.00249834, [2] [Cycle 1]: 0.00186394, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 3.413e-05 [loop_unroll]: 2.078e-05 [a_1]: 0.00046907 [with_stream_mark]: 1.666e-05 [recompute_prepare]: 8.37998e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.756e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 1.91e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.75999e-06 [auto_parallel]: 6.69999e-06 [parallel]: 1.833e-05 [flash_sp]: 8.39002e-06 [merge_comm]: 4.05e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 7.58001e-06 [virtual_dataset]: 6.84001e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.66e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 1.055e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.91e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.73e-06 [flash_sp_send_recv_attached]: 2.87002e-06 [receive_attached]: 2.83e-06 
[after_resolve]: 1.065e-05 [a_after_grad]: 8.65999e-06 [renormalize]: 0.00065646 [add_forward_monad_depend]: 5.22999e-06 [auto_monad_grad]: 2.63e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 3.067e-05 [a_3]: 4.262e-05 [Cycle 2]: 0.00062388, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012608 [with_stream_mark]: 1.124e-05 [recompute_prepare]: 6.06e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.726e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 4.62e-06 [auto_parallel]: 6.38003e-06 [parallel]: 5.54998e-06 [flash_sp]: 3.55e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.08998e-06 [matmul_add_comm_reduction]: 6.51999e-06 [allreduce_slice_to_reducescatter]: 2.3999e-07 [virtual_shard_identity]: 7.2e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 3.04999e-06 [cell_reuse_recompute_pass]: 1.59998e-06 [offload_activation]: 6.59001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 [merge_recompute_call_nodes]: 9.80013e-07 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.42001e-06 [flash_sp_send_recv_attached]: 1.05001e-06 [receive_attached]: 1.31998e-06 [after_resolve]: 1.056e-05 [a_after_grad]: 9.37999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 1.02998e-06 [auto_monad_eliminator]: 7.53999e-06 [cse]: 1.337e-05 [a_3]: 3.174e-05 [py_interpret_to_execute_after_opt_a]: 1.114e-05 [slice_cell_reuse_recomputed_activation]: 2.26998e-06 [rewriter_after_opt_a]: 3.472e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.56002e-06 [mutable_eliminate]: 0.00056912 [opt_b]: 0.00018925, [1] [Cycle 1]: 0.00018232, [7] [b_1]: 
0.00011096 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 6.30002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 5.39992e-07 [cse]: 1.85e-05 [optimize_parallel_all_gather_comm]: 1.795e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.818e-05 [loop_unroll]: 0.00044824 [opt_after_cconv]: 0.00010067, [1] [Cycle 1]: 9.387e-05, [7] [c_1]: 2.861e-05 [parameter_eliminate]: 3.51999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.33002e-06 [cse]: 1.779e-05 [renormalize]: 9.50007e-07 [remove_dup_value]: 1.371e-05 [tuple_transform]: 7.393e-05, [1] [Cycle 1]: 6.86e-05, [4] [d_1]: 4.118e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 7.3e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.987e-05 [cse_after_recomputation]: 2.375e-05, [1] [Cycle 1]: 1.876e-05, [1] [cse]: 1.207e-05 [environ_conv]: 5.39e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 5.62999e-06 [label_fine_grained_interleaved_index]: 2.95002e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.50002e-06 [micro_interleaved_order_control]: 2.70997e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 1.14998e-06 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.32e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.214e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.82998e-06 [overlap_recompute_and_grad_model_parallel]: 5.37999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.24999e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 2.02e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.41998e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 7.342e-05, [1] [Cycle 1]: 6.869e-05, [6] [build]: 3.09999e-06 [elim_shapecalc]: 8.92e-06 [elim_not_effective]: 1.198e-05 [opt_reshape]: 6.83e-06 [fold_const_symbol]: 9.71998e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.14e-06 [pipeline_parallel_scheduler]: 1.92999e-06 [auto_monad_reorder]: 1.71e-05 [get_jit_bprop_graph]: 1.72999e-06 [rewriter_after_jit_bprop_graph]: 4.90001e-06 [opt_after_jit_grad]: 0.00049869 [validate]: 3.846e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.00646453 [execute]: 9.40001e-06 Sums bootstrap : 0.000417s : 2.44% type_inference : 0.005816s : 34.02% event_method : 0.000015s : 0.09% auto_monad : 0.000060s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000032s : 0.19% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000064s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000041s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000595s : 3.48% optimize.opt_a.with_stream_mark : 0.000028s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000008s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000657s : 3.84% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000044s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.44% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000569s : 3.33% optimize.opt_b.b_1 : 0.000111s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.16% optimize.loop_unroll : 0.000448s : 2.62% optimize.opt_after_cconv.c_1 : 0.000029s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.01% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000006s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.03% opt_after_jit_grad : 0.000499s : 2.92% validate : 0.000038s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006465s : 37.82% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000187 30 14.19% : 0.000027s : 5: substitution.arithmetic_simplify 0.97% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.02% : 0.000006s : 4: substitution.graph_param_transform 68.15% : 0.000128s : 3: substitution.inline 2.08% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000005s : 4: substitution.remove_not_recompute_node 2.54% : 0.000005s : 4: substitution.replace_old_param 5.63% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005763 2 89.68% : 0.005168s : 1: type_inference.infer 10.32% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.73% : 0.000029s : 3: replace.inline 29.27% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000135 5 92.91% : 0.000126s : 3: match.inline 7.09% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 1.10% : 0.000002s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.38% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.39% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.29% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.06% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.93% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.81% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.02% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000396 8 43.51% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.49% : 0.000224s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031232 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.70% : 0.003341s : 1: add_attr 10.66% : 0.003329s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.45% : 0.000453s : 1: bootstrap 0.10% : 0.000032s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000027s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.05% : 0.000017s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000009s : 1: label_micro_interleaved_index 1.47% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.85% : 0.000579s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 3.11% : 0.000972s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 8.01% : 0.002502s : 1: opt_a 0.33% : 0.000104s : 1: opt_after_cconv 1.63% : 0.000510s : 1: opt_after_jit_grad 0.62% : 0.000193s : 1: opt_b 14.65% : 0.004577s : 1: optimize 0.07% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.12% : 0.000036s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.05% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 1.16% : 0.000362s : 1: renormalize.infer 0.92% : 0.000288s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000039s : 1: rewriter_after_opt_a 0.22% : 0.000068s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000076s : 1: symbol_engine_optimizer 20.76% : 0.006483s : 1: task_emit 0.25% : 0.000077s : 1: 
tuple_transform 18.69% : 0.005837s : 1: type_inference 0.23% : 0.000072s : 1: validate TotalTime = 0.0423779, [24] [bootstrap]: 0.00047439 [type_inference]: 0.0125707 [event_method]: 5.331e-05 [auto_monad]: 0.00012631 [graph_reusing]: 8.21002e-06 [inline]: 2.37999e-06 [add_attr]: 0.00350061, [1] [add_attr_with_inline]: 0.00348931, [1] [Cycle 1]: 9.203e-05, [2] [tag_attr]: 4.298e-05 [meta_addattr_fg_expand]: 9.45001e-06 [parallel-infer-symbol]: 4.20999e-06 [pre_auto_parallel]: 5.973e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 1.25999e-06 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0155128, [53] [py_interpret_to_execute]: 4.468e-05 [rewriter_before_opt_a]: 0.00016482 [opt_a]: 0.0128297, [3] [Cycle 1]: 0.00828424, [45] [expand_dump_flag]: 5.01002e-06 [switch_simplify]: 7.534e-05 [loop_unroll]: 6.243e-05 [a_1]: 0.00157285 [with_stream_mark]: 3.151e-05 [recompute_prepare]: 2.553e-05 [updatestate_depend_eliminate]: 1.01e-05 [updatestate_assign_eliminate]: 8.06001e-06 [updatestate_loads_eliminate]: 7.45e-06 [parameter_eliminate]: 2.69999e-06 [a_2]: 0.00025535 [accelerated_algorithm]: 3.449e-05 [shard]: 2.14e-06 [meta_shard_fg_expand]: 4.12998e-06 [shard_inline]: 1.656e-05 [merge_send_recv]: 1.825e-05 [auto_parallel]: 1.365e-05 [parallel]: 2.232e-05 [flash_sp]: 1.308e-05 [merge_comm]: 9.84001e-06 [allreduce_fusion]: 9.22001e-06 [matmul_add_comm_reduction]: 3.308e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.878e-05 [virtual_dataset]: 1.589e-05 [get_grad_eliminate_]: 1.57e-05 [virtual_output]: 1.567e-05 [merge_forward]: 1.141e-05 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 2.016e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.049e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 2.878e-05 [set_forward_comm_id_for_comm_node_pass]: 1.035e-05 [meta_fg_expand]: 0.00177398 [flash_sp_send_recv_attached]: 4.01001e-06 [receive_attached]: 2.38998e-06 
[after_resolve]: 6.865e-05 [a_after_grad]: 8.565e-05 [renormalize]: 0.00301671 [add_forward_monad_depend]: 1.202e-05 [auto_monad_grad]: 6.31e-06 [auto_monad_eliminator]: 5.888e-05 [cse]: 0.00017594 [a_3]: 0.00035074 [Cycle 2]: 0.00358905, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 4.818e-05 [loop_unroll]: 4.406e-05 [a_1]: 0.00164979 [with_stream_mark]: 2.303e-05 [recompute_prepare]: 1.315e-05 [updatestate_depend_eliminate]: 6.27001e-06 [updatestate_assign_eliminate]: 4.84003e-06 [updatestate_loads_eliminate]: 4.06001e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 0.00013106 [accelerated_algorithm]: 1.429e-05 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 2.80997e-06 [shard_inline]: 9.02e-06 [merge_send_recv]: 1.092e-05 [auto_parallel]: 1.169e-05 [parallel]: 9.42999e-06 [flash_sp]: 3.95e-06 [merge_comm]: 5.34998e-06 [allreduce_fusion]: 4.98001e-06 [matmul_add_comm_reduction]: 1.182e-05 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 1.182e-05 [virtual_dataset]: 9.44e-06 [get_grad_eliminate_]: 9.68002e-06 [virtual_output]: 8.66002e-06 [merge_forward]: 5.47001e-06 [cell_reuse_recompute_pass]: 1.79e-06 [offload_activation]: 1.27e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.819e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 1.476e-05 [set_forward_comm_id_for_comm_node_pass]: 5.56e-06 [meta_fg_expand]: 0.00011949 [flash_sp_send_recv_attached]: 1.52999e-06 [receive_attached]: 1.72999e-06 [after_resolve]: 1.831e-05 [a_after_grad]: 1.53e-05 [renormalize]: 0.00090196 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 2.22001e-06 [auto_monad_eliminator]: 1.856e-05 [cse]: 6.085e-05 [a_3]: 6.922e-05 [Cycle 3]: 0.0009386, [45] [expand_dump_flag]: 1.96e-06 [switch_simplify]: 1.093e-05 [loop_unroll]: 8.83001e-06 [a_1]: 0.000257 [with_stream_mark]: 1.281e-05 [recompute_prepare]: 9.17001e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.83001e-06 
[parameter_eliminate]: 1.13001e-06 [a_2]: 0.0001234 [accelerated_algorithm]: 1.258e-05 [shard]: 1.23002e-06 [meta_shard_fg_expand]: 2.46998e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 8.97e-06 [auto_parallel]: 7.57002e-06 [parallel]: 6.18002e-06 [flash_sp]: 1.11002e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 5.39e-06 [matmul_add_comm_reduction]: 9.92999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 1.057e-05 [virtual_dataset]: 8.76002e-06 [get_grad_eliminate_]: 8.47998e-06 [virtual_output]: 8.20999e-06 [merge_forward]: 4.68001e-06 [cell_reuse_recompute_pass]: 2.22001e-06 [offload_activation]: 1.063e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.586e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 1.443e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 3.2e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.37e-06 [after_resolve]: 1.551e-05 [a_after_grad]: 1.509e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.71e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 2.822e-05 [a_3]: 5.987e-05 [py_interpret_to_execute_after_opt_a]: 1.686e-05 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 5.145e-05 [convert_after_rewriter]: 9.69e-06 [order_py_execute_after_rewriter]: 6.61e-06 [mutable_eliminate]: 0.00074265 [opt_b]: 0.0002986, [1] [Cycle 1]: 0.00029073, [7] [b_1]: 0.00019267 [b_2]: 1.118e-05 [updatestate_depend_eliminate]: 8.07e-06 [updatestate_assign_eliminate]: 4.32003e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 3.89991e-07 [cse]: 3.476e-05 [optimize_parallel_all_gather_comm]: 2.172e-05 [overlap_param_gather]: 2.27999e-06 [cconv]: 2.788e-05 [loop_unroll]: 0.00046311 [opt_after_cconv]: 0.00013874, [1] [Cycle 1]: 0.00013254, [7] [c_1]: 4.878e-05 [parameter_eliminate]: 3.16001e-06 [updatestate_depend_eliminate]: 7.73001e-06 [updatestate_assign_eliminate]: 4.35e-06 
[updatestate_loads_eliminate]: 4.05e-06 [cse]: 3.008e-05 [renormalize]: 6.50005e-07 [remove_dup_value]: 4.129e-05 [tuple_transform]: 0.00010504, [1] [Cycle 1]: 0.00010001, [4] [d_1]: 6.94e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 1.02e-05 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 6.257e-05 [cse_after_recomputation]: 3.311e-05, [1] [Cycle 1]: 2.774e-05, [1] [cse]: 2.202e-05 [environ_conv]: 1.051e-05 [swap_dp_allreduce_reducescatter]: 8.17e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.94003e-06 [label_fine_grained_interleaved_index]: 2.53998e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.783e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.99e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 5.79999e-06 [overlap_grad_flash_sp]: 2.686e-05 [begin_end_overlap_inline]: 7.09988e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 0.0001011, [1] [Cycle 1]: 9.692e-05, [6] [build]: 1.059e-05 [elim_shapecalc]: 1.388e-05 [elim_not_effective]: 1.906e-05 [opt_reshape]: 9.87999e-06 [fold_const_symbol]: 1.577e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 2.11e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 2.637e-05 [get_jit_bprop_graph]: 2.12999e-06 [rewriter_after_jit_bprop_graph]: 4.43999e-06 [opt_after_jit_grad]: 0.00049451 [validate]: 5.505e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.009215 [execute]: 8.77e-06 Sums bootstrap : 0.000474s : 1.26% type_inference : 0.012571s : 33.52% event_method : 0.000053s : 0.14% auto_monad : 0.000126s : 0.34% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000043s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000060s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000045s : 0.12% optimize.rewriter_before_opt_a : 0.000165s : 0.44% optimize.opt_a.expand_dump_flag : 0.000010s : 0.03% optimize.opt_a.switch_simplify : 0.000134s : 0.36% optimize.opt_a.loop_unroll : 0.000115s : 0.31% optimize.opt_a.a_1 : 0.003480s : 9.28% optimize.opt_a.with_stream_mark : 0.000067s : 0.18% optimize.opt_a.recompute_prepare : 0.000048s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000006s : 0.01% optimize.opt_a.a_2 : 0.000510s : 1.36% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.16% optimize.opt_a.shard : 0.000006s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.03% optimize.opt_a.shard_inline : 0.000035s : 0.09% optimize.opt_a.merge_send_recv : 0.000038s : 0.10% optimize.opt_a.auto_parallel : 0.000033s : 0.09% optimize.opt_a.parallel : 0.000038s : 0.10% optimize.opt_a.flash_sp : 0.000018s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.05% optimize.opt_a.allreduce_fusion : 0.000020s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000055s : 0.15% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.09% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.09% optimize.opt_a.virtual_output : 0.000033s : 0.09% optimize.opt_a.merge_forward : 0.000022s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.01% optimize.opt_a.offload_activation : 0.000043s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.17% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000058s : 0.15% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001897s : 5.06% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000102s : 0.27% optimize.opt_a.a_after_grad : 0.000116s : 0.31% optimize.opt_a.renormalize : 0.003919s : 10.45% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.05% optimize.opt_a.auto_monad_grad : 0.000010s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000090s : 0.24% optimize.opt_a.cse : 0.000265s : 0.71% optimize.opt_a.a_3 : 0.000480s : 1.28% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000743s : 1.98% optimize.opt_b.b_1 : 0.000193s : 0.51% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000035s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.07% optimize.loop_unroll : 0.000463s : 1.23% optimize.opt_after_cconv.c_1 : 0.000049s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.08% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000041s : 0.11% optimize.tuple_transform.d_1 : 0.000069s : 0.19% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000011s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000027s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000495s : 1.32% validate : 0.000055s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.009215s : 24.57% execute : 0.000009s : 0.02% Time group info: ------[substitution.] 
0.000918 222 6.28% : 0.000058s : 12: substitution.arithmetic_simplify 1.96% : 0.000018s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.57% : 0.000005s : 5: substitution.float_depend_g_call 0.50% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.86% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 57.11% : 0.000524s : 17: substitution.inline 2.14% : 0.000020s : 2: substitution.inline_without_move 1.28% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000018s : 3: substitution.less_batch_normalization 1.66% : 0.000015s : 11: substitution.minmaximum_grad 0.71% : 0.000007s : 5: substitution.partial_eliminate 1.53% : 0.000014s : 20: substitution.remove_not_recompute_node 3.18% : 0.000029s : 10: substitution.replace_applicator 1.40% : 0.000013s : 15: substitution.replace_old_param 0.29% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.39% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.64% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 2.27% : 0.000021s : 11: substitution.tuple_list_get_item_depend_reorder 7.93% : 0.000073s : 30: substitution.tuple_list_get_item_eliminator 2.13% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012484 2 87.20% : 0.010885s : 1: type_inference.infer 12.80% : 0.001598s : 1: type_inference.specialize ------[replace.] 0.000257 33 60.42% : 0.000155s : 17: replace.inline 39.58% : 0.000102s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000552 33 93.29% : 0.000515s : 17: match.inline 6.71% : 0.000037s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000781 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.35% : 0.000003s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000009s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.14% : 0.000017s : 100: predicate.arithmetic_simplify 1.17% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.47% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000014s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.40% : 0.000019s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000044s : 249: predicate.inline 1.22% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.72% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.08% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 68: predicate.minmaximum_grad 0.42% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.09% : 0.000016s : 101: predicate.partial_defer_inline 1.69% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000009s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.34% : 0.000010s : 68: predicate.reduce_eliminate 2.62% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.38% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000015s : 152: predicate.replace_applicator 0.68% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.09% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.35% : 0.000003s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 101: predicate.switch_defer_inline 2.90% : 0.000023s : 169: predicate.switch_layer_defer_inline 4.92% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.21% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.20% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001818 34 57.32% : 0.001042s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.68% : 0.000776s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.070840 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.95% : 0.003506s : 1: add_attr 4.93% : 0.003494s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000067s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000134s : 1: auto_monad 0.04% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.72% : 0.000507s : 1: bootstrap 0.04% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.09% : 0.000060s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.67% : 0.000472s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.06% : 0.000754s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000020s : 1: opt.transform.mutable_eliminate 7.35% : 0.005207s : 117: opt.transform.opt_a 0.07% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000037s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000176s : 28: opt.transform.opt_b 0.11% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000055s : 4: opt.transform.symbol_engine_opt 18.12% : 0.012833s : 1: opt_a 0.20% : 0.000142s : 1: opt_after_cconv 0.71% : 0.000505s : 1: opt_after_jit_grad 0.43% : 0.000302s : 1: opt_b 21.91% : 0.015518s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000006s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000065s : 1: pre_auto_parallel 0.07% : 0.000049s : 1: py_interpret_to_execute 0.03% : 0.000020s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000045s : 1: remove_dup_value 3.12% : 0.002211s : 2: renormalize.infer 2.39% : 0.001691s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000056s : 1: rewriter_after_opt_a 0.24% : 0.000169s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000104s : 1: symbol_engine_optimizer 13.04% : 0.009235s : 1: task_emit 0.15% : 0.000108s : 1: 
tuple_transform 17.78% : 0.012592s : 1: type_inference 0.14% : 0.000096s : 1: validate TotalTime = 0.02009, [24] [bootstrap]: 0.00043417 [type_inference]: 0.00458149 [event_method]: 1.191e-05 [auto_monad]: 5.272e-05 [graph_reusing]: 5.70001e-06 [inline]: 2.95002e-06 [add_attr]: 0.00328975, [1] [add_attr_with_inline]: 0.00327827, [1] [Cycle 1]: 5.313e-05, [2] [tag_attr]: 1.422e-05 [meta_addattr_fg_expand]: 3.41999e-06 [parallel-infer-symbol]: 3.28998e-06 [pre_auto_parallel]: 2.848e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00416922, [53] [py_interpret_to_execute]: 1.827e-05 [rewriter_before_opt_a]: 4.12e-05 [opt_a]: 0.00216666, [2] [Cycle 1]: 0.00153605, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 2.454e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00030834 [with_stream_mark]: 1.752e-05 [recompute_prepare]: 7.43e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 3.32002e-06 [parameter_eliminate]: 1.98997e-06 [a_2]: 7.81e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.89001e-06 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 8.07e-06 [auto_parallel]: 5.79e-06 [parallel]: 1.905e-05 [flash_sp]: 8.77e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 9.77001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 6.44999e-06 [get_grad_eliminate_]: 5.68997e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.58002e-06 [offload_activation]: 1.062e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.162e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.15001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 2.76e-06 
[after_resolve]: 1.068e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00057915 [add_forward_monad_depend]: 5.10999e-06 [auto_monad_grad]: 2.47001e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.979e-05 [a_3]: 4.272e-05 [Cycle 2]: 0.00061938, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 6.66999e-06 [loop_unroll]: 5.46002e-06 [a_1]: 0.0001263 [with_stream_mark]: 1.316e-05 [recompute_prepare]: 5.91003e-06 [updatestate_depend_eliminate]: 3.06999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.35002e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 6.82e-05 [accelerated_algorithm]: 5.77999e-06 [shard]: 1.39e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 6.41e-06 [auto_parallel]: 5.29e-06 [parallel]: 5.67001e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 4.14002e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 6.49999e-06 [allreduce_slice_to_reducescatter]: 2.29978e-07 [virtual_shard_identity]: 6.46999e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.69999e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 7.91001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.006e-05 [merge_recompute_call_nodes]: 1.05999e-06 [before_grad]: 8.38999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.17999e-06 [after_resolve]: 9.39e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.78002e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 7.78999e-06 [cse]: 1.436e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 1.133e-05 [slice_cell_reuse_recomputed_activation]: 2.18998e-06 [rewriter_after_opt_a]: 3.621e-05 [convert_after_rewriter]: 7.6e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00055079 [opt_b]: 0.0001914, [1] [Cycle 1]: 0.00018442, [7] 
[b_1]: 0.00011138 [b_2]: 7.62998e-06 [updatestate_depend_eliminate]: 5.99e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.50003e-07 [cse]: 1.983e-05 [optimize_parallel_all_gather_comm]: 1.72e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.749e-05 [loop_unroll]: 0.00043031 [opt_after_cconv]: 9.885e-05, [1] [Cycle 1]: 9.259e-05, [7] [c_1]: 2.805e-05 [parameter_eliminate]: 3.43999e-06 [updatestate_depend_eliminate]: 5.56998e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.46e-06 [cse]: 1.759e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 1.348e-05 [tuple_transform]: 7.423e-05, [1] [Cycle 1]: 6.948e-05, [4] [d_1]: 4.354e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 4.863e-05 [cse_after_recomputation]: 2.023e-05, [1] [Cycle 1]: 1.573e-05, [1] [cse]: 1.056e-05 [environ_conv]: 5.37999e-06 [swap_dp_allreduce_reducescatter]: 5.58002e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.57998e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.25002e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.92998e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.28002e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 2.039e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.28002e-06 [symbol_engine_optimizer]: 6.857e-05, [1] [Cycle 1]: 6.448e-05, [6] [build]: 2.93e-06 [elim_shapecalc]: 8.43001e-06 [elim_not_effective]: 1.113e-05 [opt_reshape]: 6.04999e-06 [fold_const_symbol]: 8.82999e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.11e-06 [pipeline_parallel_scheduler]: 1.56998e-06 [auto_monad_reorder]: 1.667e-05 [get_jit_bprop_graph]: 1.59e-06 [rewriter_after_jit_bprop_graph]: 4.03001e-06 [opt_after_jit_grad]: 0.00046194 [validate]: 3.882e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0067276 [execute]: 8.03001e-06 Sums bootstrap : 0.000434s : 2.76% type_inference : 0.004581s : 29.08% event_method : 0.000012s : 0.08% auto_monad : 0.000053s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000003s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.18% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.12% optimize.rewriter_before_opt_a : 0.000041s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.20% optimize.opt_a.loop_unroll : 0.000019s : 0.12% optimize.opt_a.a_1 : 0.000435s : 2.76% optimize.opt_a.with_stream_mark : 0.000031s : 0.19% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000025s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000019s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000579s : 3.68% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000036s : 0.23% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000551s : 3.50% optimize.opt_b.b_1 : 0.000111s : 0.71% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000020s : 0.13% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000027s : 0.17% optimize.loop_unroll : 0.000430s : 2.73% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000018s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000044s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.13% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.11% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000462s : 2.93% validate : 0.000039s : 0.25% backend_pass : 0.000001s : 0.01% task_emit : 0.006728s : 42.70% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000136 26 17.94% : 0.000024s : 4: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 4.25% : 0.000006s : 4: substitution.graph_param_transform 67.01% : 0.000091s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.09% : 0.000004s : 4: substitution.remove_not_recompute_node 3.12% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004531 2 91.59% : 0.004150s : 1: type_inference.infer 8.41% : 0.000381s : 1: type_inference.specialize ------[replace.] 0.000021 2 100.00% : 0.000021s : 2: replace.inline ------[match.] 0.000090 2 100.00% : 0.000090s : 2: match.inline ------[predicate.] 
0.000141 984 0.77% : 0.000001s : 9: predicate.accumulaten_eliminater 1.21% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.75% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.74% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.38% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.86% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.67% : 0.000001s : 9: predicate.minmaximum_grad 1.56% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.97% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 17: predicate.replace_applicator 0.91% : 0.000001s : 8: predicate.replace_old_param 0.52% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.10% : 0.000002s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.56% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.55% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.93% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.77% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000290 6 38.39% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.61% : 0.000179s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029075 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.33% : 0.003295s : 1: add_attr 11.29% : 0.003283s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.63% : 0.000474s : 1: bootstrap 0.16% : 0.000046s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.52% : 0.000441s : 1: loop_unroll 0.02% : 0.000005s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.93% : 0.000562s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.72% : 0.000791s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000093s : 28: opt.transform.opt_b 0.16% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.46% : 0.002170s : 1: opt_a 0.35% : 0.000102s : 1: opt_after_cconv 1.62% : 0.000472s : 1: opt_after_jit_grad 0.67% : 0.000195s : 1: opt_b 14.36% : 0.004174s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000024s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.05% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 1.18% : 0.000343s : 1: renormalize.infer 0.79% : 0.000229s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000040s : 1: rewriter_after_opt_a 0.16% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 23.20% : 0.006744s : 1: task_emit 0.26% : 0.000077s : 1: 
tuple_transform 15.84% : 0.004607s : 1: type_inference 0.26% : 0.000075s : 1: validate TotalTime = 0.0419101, [24] [bootstrap]: 0.00046063 [type_inference]: 0.0116037 [event_method]: 4.67e-05 [auto_monad]: 0.00011906 [graph_reusing]: 7.82e-06 [inline]: 2.92002e-06 [add_attr]: 0.00358237, [1] [add_attr_with_inline]: 0.00356954, [1] [Cycle 1]: 9.011e-05, [2] [tag_attr]: 3.838e-05 [meta_addattr_fg_expand]: 8.3e-06 [parallel-infer-symbol]: 4.27998e-06 [pre_auto_parallel]: 5.799e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.95001e-06 [optimize]: 0.016097, [53] [py_interpret_to_execute]: 4.017e-05 [rewriter_before_opt_a]: 0.00014732 [opt_a]: 0.0133784, [3] [Cycle 1]: 0.00866189, [45] [expand_dump_flag]: 5.07e-06 [switch_simplify]: 7.303e-05 [loop_unroll]: 6.007e-05 [a_1]: 0.00158933 [with_stream_mark]: 3.39e-05 [recompute_prepare]: 2.941e-05 [updatestate_depend_eliminate]: 9.67999e-06 [updatestate_assign_eliminate]: 7.92e-06 [updatestate_loads_eliminate]: 7.74002e-06 [parameter_eliminate]: 3.38999e-06 [a_2]: 0.0002688 [accelerated_algorithm]: 3.652e-05 [shard]: 2.79001e-06 [meta_shard_fg_expand]: 3.96001e-06 [shard_inline]: 1.646e-05 [merge_send_recv]: 2.039e-05 [auto_parallel]: 1.329e-05 [parallel]: 2.356e-05 [flash_sp]: 1.435e-05 [merge_comm]: 1.058e-05 [allreduce_fusion]: 1.03e-05 [matmul_add_comm_reduction]: 3.482e-05 [allreduce_slice_to_reducescatter]: 1.12999e-06 [virtual_shard_identity]: 2.318e-05 [virtual_dataset]: 1.788e-05 [get_grad_eliminate_]: 1.563e-05 [virtual_output]: 1.645e-05 [merge_forward]: 1.001e-05 [cell_reuse_recompute_pass]: 2.01e-06 [offload_activation]: 1.932e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.534e-05 [merge_recompute_call_nodes]: 1.73997e-06 [before_grad]: 3.076e-05 [set_forward_comm_id_for_comm_node_pass]: 1.205e-05 [meta_fg_expand]: 0.00184171 [flash_sp_send_recv_attached]: 5.57999e-06 [receive_attached]: 3.58999e-06 [after_resolve]: 
7.762e-05 [a_after_grad]: 9.494e-05 [renormalize]: 0.00313349 [add_forward_monad_depend]: 1.405e-05 [auto_monad_grad]: 6.71e-06 [auto_monad_eliminator]: 6.289e-05 [cse]: 0.00018428 [a_3]: 0.00039098 [Cycle 2]: 0.00371241, [45] [expand_dump_flag]: 3.09999e-06 [switch_simplify]: 4.89e-05 [loop_unroll]: 4.415e-05 [a_1]: 0.00170119 [with_stream_mark]: 2.468e-05 [recompute_prepare]: 1.518e-05 [updatestate_depend_eliminate]: 6.17001e-06 [updatestate_assign_eliminate]: 5.52999e-06 [updatestate_loads_eliminate]: 4.75001e-06 [parameter_eliminate]: 2.10002e-06 [a_2]: 0.00013009 [accelerated_algorithm]: 1.404e-05 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 2.69999e-06 [shard_inline]: 9.57001e-06 [merge_send_recv]: 1.161e-05 [auto_parallel]: 1.164e-05 [parallel]: 9.09e-06 [flash_sp]: 4.35999e-06 [merge_comm]: 5.44e-06 [allreduce_fusion]: 5.01002e-06 [matmul_add_comm_reduction]: 1.155e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.264e-05 [virtual_dataset]: 1.002e-05 [get_grad_eliminate_]: 9.01002e-06 [virtual_output]: 8.94998e-06 [merge_forward]: 6.06e-06 [cell_reuse_recompute_pass]: 1.55999e-06 [offload_activation]: 1.43e-05 [cell_reuse_handle_not_recompute_node_pass]: 2e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 1.591e-05 [set_forward_comm_id_for_comm_node_pass]: 5.89e-06 [meta_fg_expand]: 6.33e-05 [flash_sp_send_recv_attached]: 1.79e-06 [receive_attached]: 2.33002e-06 [after_resolve]: 1.907e-05 [a_after_grad]: 1.501e-05 [renormalize]: 0.00097582 [add_forward_monad_depend]: 6.76e-06 [auto_monad_grad]: 2.89999e-06 [auto_monad_eliminator]: 1.963e-05 [cse]: 6.617e-05 [a_3]: 7.843e-05 [Cycle 3]: 0.00098272, [45] [expand_dump_flag]: 2.63003e-06 [switch_simplify]: 1.1e-05 [loop_unroll]: 1.003e-05 [a_1]: 0.00027876 [with_stream_mark]: 1.49e-05 [recompute_prepare]: 9.47001e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 4.05998e-06 [parameter_eliminate]: 
1.56998e-06 [a_2]: 0.00012501 [accelerated_algorithm]: 1.254e-05 [shard]: 1.77001e-06 [meta_shard_fg_expand]: 2.19999e-06 [shard_inline]: 9.12999e-06 [merge_send_recv]: 9.22999e-06 [auto_parallel]: 8.67e-06 [parallel]: 7.71001e-06 [flash_sp]: 1.01997e-06 [merge_comm]: 5.05001e-06 [allreduce_fusion]: 5.24e-06 [matmul_add_comm_reduction]: 9.99001e-06 [allreduce_slice_to_reducescatter]: 4.69998e-07 [virtual_shard_identity]: 1.186e-05 [virtual_dataset]: 9.72001e-06 [get_grad_eliminate_]: 8.70999e-06 [virtual_output]: 8.50999e-06 [merge_forward]: 5.66998e-06 [cell_reuse_recompute_pass]: 2.65002e-06 [offload_activation]: 1.131e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.789e-05 [merge_recompute_call_nodes]: 1.24e-06 [before_grad]: 1.452e-05 [set_forward_comm_id_for_comm_node_pass]: 5.65001e-06 [meta_fg_expand]: 3.29001e-06 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.78997e-06 [after_resolve]: 1.517e-05 [a_after_grad]: 1.428e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.49998e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.21e-05 [cse]: 3.083e-05 [a_3]: 5.759e-05 [py_interpret_to_execute_after_opt_a]: 1.857e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 5.726e-05 [convert_after_rewriter]: 9.91998e-06 [order_py_execute_after_rewriter]: 8.02e-06 [mutable_eliminate]: 0.00072869 [opt_b]: 0.00031005, [1] [Cycle 1]: 0.00030105, [7] [b_1]: 0.00019506 [b_2]: 1.2e-05 [updatestate_depend_eliminate]: 1.005e-05 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4.50001e-06 [renormalize]: 7.39994e-07 [cse]: 3.686e-05 [optimize_parallel_all_gather_comm]: 2.361e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 3.191e-05 [loop_unroll]: 0.00046463 [opt_after_cconv]: 0.00014458, [1] [Cycle 1]: 0.00013697, [7] [c_1]: 4.887e-05 [parameter_eliminate]: 3.66999e-06 [updatestate_depend_eliminate]: 7.66999e-06 [updatestate_assign_eliminate]: 4.64002e-06 [updatestate_loads_eliminate]: 
4.10998e-06 [cse]: 3.174e-05 [renormalize]: 4.7998e-07 [remove_dup_value]: 4.636e-05 [tuple_transform]: 0.00010801, [1] [Cycle 1]: 0.00010318, [4] [d_1]: 7.213e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.001e-05 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 7.1e-05 [cse_after_recomputation]: 3.419e-05, [1] [Cycle 1]: 2.95e-05, [1] [cse]: 2.392e-05 [environ_conv]: 1.073e-05 [swap_dp_allreduce_reducescatter]: 8.13001e-06 [bias_add_comm_swap]: 3.44001e-06 [label_micro_interleaved_index]: 4.77e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 2.22001e-06 [assign_add_opt]: 1.59e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.32999e-06 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.47001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.88997e-06 [control_data_broadcast_order]: 1.804e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 5.56998e-06 [overlap_recompute_and_grad_model_parallel]: 5.92001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 5.53002e-06 [overlap_grad_flash_sp]: 2.864e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.53e-06 [split_layernorm_comm]: 1.66998e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 0.00010716, [1] [Cycle 1]: 0.00010275, [6] [build]: 1.272e-05 [elim_shapecalc]: 1.461e-05 [elim_not_effective]: 1.898e-05 [opt_reshape]: 1.143e-05 [fold_const_symbol]: 1.488e-05 [renormalize]: 1.59984e-07 [detach_backward]: 
2.63e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.652e-05 [get_jit_bprop_graph]: 2.14e-06 [rewriter_after_jit_bprop_graph]: 5.12e-06 [opt_after_jit_grad]: 0.00048938 [validate]: 5.755e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.00899594 [execute]: 9.36e-06 Sums bootstrap : 0.000461s : 1.25% type_inference : 0.011604s : 31.57% event_method : 0.000047s : 0.13% auto_monad : 0.000119s : 0.32% graph_reusing : 0.000008s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000038s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.02% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000058s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000040s : 0.11% optimize.rewriter_before_opt_a : 0.000147s : 0.40% optimize.opt_a.expand_dump_flag : 0.000011s : 0.03% optimize.opt_a.switch_simplify : 0.000133s : 0.36% optimize.opt_a.loop_unroll : 0.000114s : 0.31% optimize.opt_a.a_1 : 0.003569s : 9.71% optimize.opt_a.with_stream_mark : 0.000073s : 0.20% optimize.opt_a.recompute_prepare : 0.000054s : 0.15% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000017s : 0.05% optimize.opt_a.parameter_eliminate : 0.000007s : 0.02% optimize.opt_a.a_2 : 0.000524s : 1.43% optimize.opt_a.accelerated_algorithm : 0.000063s : 0.17% optimize.opt_a.shard : 0.000007s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000041s : 0.11% optimize.opt_a.auto_parallel : 0.000034s : 0.09% optimize.opt_a.parallel : 0.000040s : 0.11% optimize.opt_a.flash_sp : 0.000020s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000021s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000056s : 0.15% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000048s : 0.13% optimize.opt_a.virtual_dataset : 0.000038s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.09% optimize.opt_a.virtual_output : 0.000034s : 0.09% optimize.opt_a.merge_forward : 0.000022s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.02% optimize.opt_a.offload_activation : 0.000045s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000073s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.01% optimize.opt_a.before_grad : 0.000061s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000024s : 0.06% optimize.opt_a.meta_fg_expand : 0.001908s : 5.19% optimize.opt_a.flash_sp_send_recv_attached : 0.000009s : 0.02% optimize.opt_a.receive_attached : 0.000008s : 0.02% optimize.opt_a.after_resolve : 0.000112s : 0.30% optimize.opt_a.a_after_grad : 0.000124s : 0.34% optimize.opt_a.renormalize : 0.004109s : 11.18% optimize.opt_a.add_forward_monad_depend : 0.000022s : 0.06% optimize.opt_a.auto_monad_grad : 0.000011s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000095s : 0.26% optimize.opt_a.cse : 0.000281s : 0.77% optimize.opt_a.a_3 : 0.000527s : 1.43% optimize.py_interpret_to_execute_after_opt_a : 0.000019s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000057s : 0.16% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000008s : 0.02% optimize.mutable_eliminate : 0.000729s : 1.98% optimize.opt_b.b_1 : 0.000195s : 0.53% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000005s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000037s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000032s : 0.09% optimize.loop_unroll : 0.000465s : 1.26% optimize.opt_after_cconv.c_1 : 0.000049s : 0.13% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000032s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000046s : 0.13% optimize.tuple_transform.d_1 : 0.000072s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000071s : 0.19% optimize.cse_after_recomputation.cse : 0.000024s : 0.07% optimize.environ_conv : 0.000011s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000029s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000013s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000027s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000489s : 1.33% validate : 0.000058s : 0.16% backend_pass : 0.000001s : 0.00% task_emit : 0.008996s : 24.47% execute : 0.000009s : 0.03% Time group info: ------[substitution.] 
0.001041 218 5.98% : 0.000062s : 11: substitution.arithmetic_simplify 2.03% : 0.000021s : 2: substitution.cast_eliminate 0.29% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000005s : 5: substitution.float_depend_g_call 0.41% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.20% : 0.000002s : 5: substitution.fold_const_symbol 0.85% : 0.000009s : 8: substitution.graph_param_transform 0.31% : 0.000003s : 2: substitution.incorporate_call 0.18% : 0.000002s : 2: substitution.incorporate_call_switch 60.10% : 0.000625s : 16: substitution.inline 1.98% : 0.000021s : 2: substitution.inline_without_move 1.20% : 0.000013s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000020s : 3: substitution.less_batch_normalization 1.41% : 0.000015s : 11: substitution.minmaximum_grad 0.72% : 0.000008s : 5: substitution.partial_eliminate 1.51% : 0.000016s : 20: substitution.remove_not_recompute_node 3.11% : 0.000032s : 10: substitution.replace_applicator 1.45% : 0.000015s : 15: substitution.replace_old_param 0.42% : 0.000004s : 1: substitution.set_cell_output_no_recompute 3.13% : 0.000033s : 11: substitution.tuple_list_convert_item_index_to_positive 1.44% : 0.000015s : 11: substitution.tuple_list_get_item_const_eliminator 1.90% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.04% : 0.000073s : 28: substitution.tuple_list_get_item_eliminator 1.97% : 0.000021s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011512 2 87.70% : 0.010096s : 1: type_inference.infer 12.30% : 0.001417s : 1: type_inference.specialize ------[replace.] 0.000225 30 60.82% : 0.000137s : 16: replace.inline 39.18% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000650 30 94.74% : 0.000615s : 16: match.inline 5.26% : 0.000034s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000809 5663 1.05% : 0.000009s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.12% : 0.000009s : 67: predicate.addn_zero_filter 1.00% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000017s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.30% : 0.000011s : 67: predicate.dict_get_item_eliminator 2.61% : 0.000021s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000010s : 75: predicate.environ_get_depend_swap 1.76% : 0.000014s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000014s : 97: predicate.exchange_switch_depend_value 2.21% : 0.000018s : 97: predicate.float_depend_g_call 0.47% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.60% : 0.000005s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.40% : 0.000044s : 244: predicate.inline 1.33% : 0.000011s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000006s : 32: predicate.less_batch_normalization 1.55% 
: 0.000013s : 97: predicate.list_to_tuple_eliminator_ 2.60% : 0.000021s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.16% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.50% : 0.000004s : 32: predicate.merge_addn 1.06% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.07% : 0.000009s : 67: predicate.minmaximum_grad 0.38% : 0.000003s : 8: predicate.mutable_eliminate 0.19% : 0.000002s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.10% : 0.000017s : 97: predicate.partial_defer_inline 1.60% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000009s : 67: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000011s : 67: predicate.reduce_eliminate 2.63% : 0.000021s : 164: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000015s : 149: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.14% : 0.000001s : 8: predicate.reset_defer_inline 1.04% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.47% : 0.000012s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.83% : 0.000007s : 32: predicate.shard_identity_eliminate 0.34% : 0.000003s : 16: predicate.special_op_eliminate 0.56% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 97: predicate.switch_defer_inline 2.89% : 0.000023s : 165: predicate.switch_layer_defer_inline 4.75% : 0.000038s : 265: 
predicate.switch_simplify 1.08% : 0.000009s : 67: predicate.tile_eliminate 1.06% : 0.000009s : 67: predicate.transpose_eliminate 1.43% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.44% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.63% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.37% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.93% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.55% : 0.000013s : 97: predicate.tuple_to_list_eliminator_ 2.52% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.13% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000005s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001799 32 58.56% : 0.001054s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.44% : 0.000745s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.071414 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.02% : 0.003588s : 1: add_attr 5.00% : 0.003574s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000075s : 1: add_recomputation 0.01% : 0.000005s : 1: assign_add_opt 0.18% : 0.000127s : 1: auto_monad 0.04% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.70% : 0.000499s : 1: bootstrap 0.05% : 0.000036s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000014s : 1: environ_conv 0.08% : 0.000055s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000007s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.66% : 0.000474s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.04% : 0.000742s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000021s : 1: opt.transform.mutable_eliminate 7.55% : 0.005393s : 117: opt.transform.opt_a 0.07% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.25% : 0.000179s : 28: opt.transform.opt_b 0.11% : 0.000080s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000057s : 4: opt.transform.symbol_engine_opt 18.74% : 0.013382s : 1: opt_a 0.21% : 0.000148s : 1: opt_after_cconv 0.79% : 0.000565s : 1: opt_after_jit_grad 0.44% : 0.000314s : 1: opt_b 22.55% : 0.016103s : 1: optimize 0.04% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000032s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000009s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000063s : 1: pre_auto_parallel 0.06% : 0.000044s : 1: py_interpret_to_execute 0.03% : 0.000022s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000051s : 1: remove_dup_value 3.24% : 0.002316s : 2: renormalize.infer 2.48% : 0.001773s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000062s : 1: rewriter_after_opt_a 0.21% : 0.000153s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.15% : 0.000110s : 1: symbol_engine_optimizer 12.62% : 0.009016s : 1: task_emit 0.16% : 0.000111s : 1: 
tuple_transform 16.29% : 0.011635s : 1: type_inference 0.14% : 0.000101s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x1-kbk],max_mem:46.0M TotalTime = 0.110628, [24] [bootstrap]: 0.00056308 [type_inference]: 0.00814992 [event_method]: 1.781e-05 [auto_monad]: 6.158e-05 [graph_reusing]: 6.02999e-06 [inline]: 3.16999e-06 [add_attr]: 0.00477764, [1] [add_attr_with_inline]: 0.00475986, [1] [Cycle 1]: 7.267e-05, [2] [tag_attr]: 2.251e-05 [meta_addattr_fg_expand]: 4.66002e-06 [parallel-infer-symbol]: 4.48999e-06 [pre_auto_parallel]: 3.94e-05 [insert-virtual-dataset]: 2.67001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.44001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00540272, [53] [py_interpret_to_execute]: 2.905e-05 [rewriter_before_opt_a]: 9.329e-05 [opt_a]: 0.0028614, [2] [Cycle 1]: 0.00219177, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.474e-05 [loop_unroll]: 2.119e-05 [a_1]: 0.00052227 [with_stream_mark]: 2.035e-05 [recompute_prepare]: 1.092e-05 [updatestate_depend_eliminate]: 4.18001e-06 [updatestate_assign_eliminate]: 3.47002e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 7.862e-05 [accelerated_algorithm]: 7.08e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 6.49999e-06 [merge_send_recv]: 9.51e-06 [auto_parallel]: 8.52e-06 [parallel]: 3.094e-05 [flash_sp]: 1.041e-05 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 1.046e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 9.61e-06 [virtual_dataset]: 6.60002e-06 [get_grad_eliminate_]: 6.02001e-06 [virtual_output]: 5.84e-06 [merge_forward]: 4.57e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.108e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.349e-05 
[merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 9.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.46998e-06 [flash_sp_send_recv_attached]: 2.65002e-06 [receive_attached]: 2.29001e-06 [after_resolve]: 1.137e-05 [a_after_grad]: 9.42001e-06 [renormalize]: 0.00090501 [add_forward_monad_depend]: 8.28001e-06 [auto_monad_grad]: 2.99999e-06 [auto_monad_eliminator]: 1.977e-05 [cse]: 3.042e-05 [a_3]: 4.908e-05 [Cycle 2]: 0.00065557, [45] [expand_dump_flag]: 1.76e-06 [switch_simplify]: 8.05e-06 [loop_unroll]: 5.72999e-06 [a_1]: 0.00013494 [with_stream_mark]: 1.556e-05 [recompute_prepare]: 5.81998e-06 [updatestate_depend_eliminate]: 4.04002e-06 [updatestate_assign_eliminate]: 2.77002e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 1.08001e-06 [a_2]: 7.059e-05 [accelerated_algorithm]: 5.87999e-06 [shard]: 1.75001e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 6.42001e-06 [auto_parallel]: 7.45e-06 [parallel]: 7.62002e-06 [flash_sp]: 3.98999e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 2.95002e-06 [matmul_add_comm_reduction]: 6.96999e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 5.40999e-06 [merge_forward]: 3.16999e-06 [cell_reuse_recompute_pass]: 2.42001e-06 [offload_activation]: 1.014e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.125e-05 [merge_recompute_call_nodes]: 1.05001e-06 [before_grad]: 8.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 1.17999e-06 [receive_attached]: 1.67999e-06 [after_resolve]: 1.116e-05 [a_after_grad]: 8.72e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 2.22999e-06 [auto_monad_grad]: 1.53002e-06 [auto_monad_eliminator]: 9.25999e-06 [cse]: 1.519e-05 [a_3]: 3.293e-05 [py_interpret_to_execute_after_opt_a]: 1.476e-05 
[slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 4.245e-05 [convert_after_rewriter]: 7.06999e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00076899 [opt_b]: 0.00024601, [1] [Cycle 1]: 0.00023695, [7] [b_1]: 0.00012495 [b_2]: 7.95e-06 [updatestate_depend_eliminate]: 9.17001e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 3.20998e-06 [renormalize]: 6.30011e-07 [cse]: 4.691e-05 [optimize_parallel_all_gather_comm]: 2.146e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 3.542e-05 [loop_unroll]: 0.00053386 [opt_after_cconv]: 0.00011591, [1] [Cycle 1]: 0.00010758, [7] [c_1]: 3.055e-05 [parameter_eliminate]: 5.87999e-06 [updatestate_depend_eliminate]: 7.35003e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 2.24e-05 [renormalize]: 8.99978e-07 [remove_dup_value]: 1.545e-05 [tuple_transform]: 8.007e-05, [1] [Cycle 1]: 7.475e-05, [4] [d_1]: 4.777e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 4.09986e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 6.409e-05 [cse_after_recomputation]: 2.222e-05, [1] [Cycle 1]: 1.677e-05, [1] [cse]: 1.112e-05 [environ_conv]: 5.74999e-06 [swap_dp_allreduce_reducescatter]: 5.27001e-06 [bias_add_comm_swap]: 3.13e-06 [label_micro_interleaved_index]: 5.89e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.64998e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 8.39995e-07 [full_micro_interleaved_order_control]: 2.19001e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.55001e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.34998e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.57999e-06 [control_data_broadcast_order]: 1.394e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.93999e-06 [overlap_recompute_and_grad_model_parallel]: 5.28002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 2.231e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.46e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 8.209e-05, [1] [Cycle 1]: 7.71e-05, [6] [build]: 3.86001e-06 [elim_shapecalc]: 1.172e-05 [elim_not_effective]: 1.458e-05 [opt_reshape]: 6.67002e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.83e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.881e-05 [get_jit_bprop_graph]: 1.89e-06 [rewriter_after_jit_bprop_graph]: 7.1e-06 [opt_after_jit_grad]: 0.00060259 [validate]: 4.648e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0906232 [execute]: 1.008e-05 Sums bootstrap : 0.000563s : 0.54% type_inference : 0.008150s : 7.78% event_method : 0.000018s : 0.02% auto_monad : 0.000062s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000023s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000039s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000029s : 0.03% optimize.rewriter_before_opt_a : 0.000093s : 0.09% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000043s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000657s : 0.63% 
optimize.opt_a.with_stream_mark : 0.000036s : 0.03% optimize.opt_a.recompute_prepare : 0.000017s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000149s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000016s : 0.02% optimize.opt_a.auto_parallel : 0.000016s : 0.02% optimize.opt_a.parallel : 0.000039s : 0.04% optimize.opt_a.flash_sp : 0.000014s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000008s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000021s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000023s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000905s : 0.86% optimize.opt_a.add_forward_monad_depend : 
0.000011s : 0.01% optimize.opt_a.auto_monad_grad : 0.000005s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000029s : 0.03% optimize.opt_a.cse : 0.000046s : 0.04% optimize.opt_a.a_3 : 0.000082s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000042s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000769s : 0.73% optimize.opt_b.b_1 : 0.000125s : 0.12% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000047s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.03% optimize.loop_unroll : 0.000534s : 0.51% optimize.opt_after_cconv.c_1 : 0.000031s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000015s : 0.01% optimize.tuple_transform.d_1 : 0.000048s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000064s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000014s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000022s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000015s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000003s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000019s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000007s : 0.01% opt_after_jit_grad : 0.000603s : 0.58% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.090623s : 86.56% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000222 30 14.17% : 0.000031s : 5: substitution.arithmetic_simplify 0.91% : 0.000002s : 2: substitution.elim_not_effective 0.54% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000007s : 4: substitution.graph_param_transform 69.17% : 0.000153s : 3: substitution.inline 1.56% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.34% : 0.000005s : 4: substitution.remove_not_recompute_node 2.52% : 0.000006s : 4: substitution.replace_old_param 5.65% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.008084 2 91.42% : 0.007390s : 1: type_inference.infer 8.58% : 0.000694s : 1: type_inference.specialize ------[replace.] 0.000045 5 71.46% : 0.000032s : 3: replace.inline 28.54% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000163 5 92.93% : 0.000151s : 3: match.inline 7.07% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000180 1131 0.76% : 0.000001s : 11: predicate.accumulaten_eliminater 1.24% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.51% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 11: predicate.addn_zero_filter 0.68% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.97% : 0.000002s : 11: predicate.cast_eliminate 0.59% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.72% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.76% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.46% : 0.000001s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.05% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.98% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.95% : 0.000002s : 15: predicate.environ_get_depend_swap 1.67% : 0.000003s : 23: predicate.environ_get_eliminate 0.93% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.14% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000004s : 16: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.78% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.67% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.59% : 0.000001s : 8: predicate.incorporate_call 0.48% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000011s : 51: predicate.inline 0.87% : 0.000002s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.28% : 0.000004s : 32: predicate.load_eliminater 2.28% : 0.000004s : 4: predicate.loop_unroll_after_grad 1.96% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.57% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.57% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 11: predicate.minmaximum_grad 2.11% : 0.000004s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.25% : 0.000002s : 17: predicate.partial_eliminate 0.74% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.27% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.26% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000002s : 8: predicate.same_eliminate 0.44% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.25% : 0.000002s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 1.48% : 0.000003s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 16: predicate.switch_defer_inline 1.82% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000009s : 54: predicate.switch_simplify 0.98% : 
0.000002s : 11: predicate.tile_eliminate 0.77% : 0.000001s : 11: predicate.transpose_eliminate 1.35% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.43% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.53% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.05% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.90% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.62% : 0.000001s : 4: predicate.value_based_eliminate 0.60% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000483 8 45.37% : 0.000219s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.63% : 0.000264s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.122938 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.89% : 0.004784s : 1: add_attr 3.88% : 0.004764s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000069s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.06% : 0.000068s : 1: auto_monad 0.02% : 0.000024s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.49% : 0.000607s : 1: bootstrap 0.03% : 0.000040s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000017s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000024s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000009s : 1: label_micro_interleaved_index 0.45% : 0.000548s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.64% : 0.000786s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000020s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000023s : 1: opt.transform.mutable_eliminate 0.86% : 0.001052s : 78: opt.transform.opt_a 0.02% : 0.000029s : 1: opt.transform.opt_after_cconv 0.02% : 0.000029s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000103s : 28: opt.transform.opt_b 0.04% : 0.000052s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000038s : 4: opt.transform.symbol_engine_opt 2.33% : 0.002865s : 1: opt_a 0.10% : 0.000119s : 1: opt_after_cconv 0.50% : 0.000619s : 1: opt_after_jit_grad 0.20% : 0.000250s : 1: opt_b 4.40% : 0.005408s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000044s : 1: pre_auto_parallel 0.03% : 0.000033s : 1: py_interpret_to_execute 0.01% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000019s : 1: remove_dup_value 0.44% : 0.000538s : 1: renormalize.infer 0.29% : 0.000356s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000011s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000048s : 1: rewriter_after_opt_a 0.08% : 0.000099s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000085s : 1: symbol_engine_optimizer 73.73% : 0.090646s : 1: task_emit 0.07% : 0.000083s : 1: 
tuple_transform 6.65% : 0.008178s : 1: type_inference 0.07% : 0.000088s : 1: validate TotalTime = 0.108952, [24] [bootstrap]: 0.00044646 [type_inference]: 0.00511433 [event_method]: 1.31e-05 [auto_monad]: 5.569e-05 [graph_reusing]: 5.00001e-06 [inline]: 2.37999e-06 [add_attr]: 0.00349908, [1] [add_attr_with_inline]: 0.00348672, [1] [Cycle 1]: 5.849e-05, [2] [tag_attr]: 1.495e-05 [meta_addattr_fg_expand]: 3.18998e-06 [parallel-infer-symbol]: 3.76999e-06 [pre_auto_parallel]: 3.057e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 6.30011e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00455814, [53] [py_interpret_to_execute]: 2.184e-05 [rewriter_before_opt_a]: 4.893e-05 [opt_a]: 0.00233494, [2] [Cycle 1]: 0.00167897, [45] [expand_dump_flag]: 2.91999e-06 [switch_simplify]: 2.562e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.0003207 [with_stream_mark]: 2.006e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 4.23999e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.944e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 2.93e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.25999e-06 [auto_parallel]: 7.24001e-06 [parallel]: 1.93e-05 [flash_sp]: 8.74e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.40998e-06 [matmul_add_comm_reduction]: 1.017e-05 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.50998e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 1.124e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.269e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.52001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 1.118e-05 [a_after_grad]: 9.33002e-06 [renormalize]: 0.0006753 [add_forward_monad_depend]: 6.69001e-06 [auto_monad_grad]: 2.86e-06 [auto_monad_eliminator]: 1.633e-05 [cse]: 3.085e-05 [a_3]: 4.62e-05 [Cycle 2]: 0.00064421, [45] [expand_dump_flag]: 1.76e-06 [switch_simplify]: 8.2e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00013332 [with_stream_mark]: 1.483e-05 [recompute_prepare]: 6.19999e-06 [updatestate_depend_eliminate]: 3.98999e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.15001e-06 [a_2]: 6.838e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.42999e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 6.59001e-06 [auto_parallel]: 7.41001e-06 [parallel]: 6.83998e-06 [flash_sp]: 3.30998e-06 [merge_comm]: 3.95998e-06 [allreduce_fusion]: 3.09001e-06 [matmul_add_comm_reduction]: 7.27002e-06 [allreduce_slice_to_reducescatter]: 5.10016e-07 [virtual_shard_identity]: 6.50002e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.92001e-06 [offload_activation]: 8.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 8.89995e-07 [before_grad]: 8.03999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 1.97001e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 2.28002e-06 [after_resolve]: 9.95002e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 1.48002e-06 [auto_monad_eliminator]: 8.59e-06 [cse]: 1.589e-05 [a_3]: 3.177e-05 [py_interpret_to_execute_after_opt_a]: 1.221e-05 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.858e-05 [convert_after_rewriter]: 7.13998e-06 [order_py_execute_after_rewriter]: 5.78997e-06 [mutable_eliminate]: 0.00069761 [opt_b]: 0.00019764, [1] [Cycle 1]: 0.00018947, [7] 
[b_1]: 0.00011061 [b_2]: 7.6e-06 [updatestate_depend_eliminate]: 8.20999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.91e-06 [renormalize]: 6.09987e-07 [cse]: 2.179e-05 [optimize_parallel_all_gather_comm]: 1.851e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.887e-05 [loop_unroll]: 0.00046497 [opt_after_cconv]: 0.00010325, [1] [Cycle 1]: 9.643e-05, [7] [c_1]: 2.856e-05 [parameter_eliminate]: 3.61001e-06 [updatestate_depend_eliminate]: 6.63e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.82e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.364e-05 [tuple_transform]: 7.359e-05, [1] [Cycle 1]: 6.892e-05, [4] [d_1]: 4.301e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 5.97999e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.995e-05 [cse_after_recomputation]: 2.183e-05, [1] [Cycle 1]: 1.682e-05, [1] [cse]: 1.125e-05 [environ_conv]: 5.32001e-06 [swap_dp_allreduce_reducescatter]: 5.91e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 5.33002e-06 [label_fine_grained_interleaved_index]: 3.08e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.68002e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.19e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 1.539e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 4.62e-06 [overlap_recompute_and_grad_model_parallel]: 4.59002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.62999e-06 [overlap_recompute_comm]: 2.38002e-06 [overlap_grad_ring_attention]: 4.02002e-06 [overlap_grad_flash_sp]: 2.137e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 7.571e-05, [1] [Cycle 1]: 7.082e-05, [6] [build]: 3.26001e-06 [elim_shapecalc]: 9.64e-06 [elim_not_effective]: 1.223e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 9.64e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.28002e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.689e-05 [get_jit_bprop_graph]: 1.70001e-06 [rewriter_after_jit_bprop_graph]: 4.49998e-06 [opt_after_jit_grad]: 0.00050631 [validate]: 4.352e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.094374 [execute]: 1.016e-05 Sums bootstrap : 0.000446s : 0.43% type_inference : 0.005114s : 4.90% event_method : 0.000013s : 0.01% auto_monad : 0.000056s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000031s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000049s : 0.05% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000034s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000454s : 0.43% optimize.opt_a.with_stream_mark : 0.000035s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000015s : 0.01% optimize.opt_a.auto_parallel : 0.000015s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000675s : 0.65% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.02% optimize.opt_a.cse : 0.000047s : 0.04% optimize.opt_a.a_3 : 0.000078s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000698s : 0.67% optimize.opt_b.b_1 : 0.000111s : 0.11% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000022s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.03% optimize.loop_unroll : 0.000465s : 0.45% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.01% optimize.tuple_transform.d_1 : 0.000043s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000015s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000506s : 0.49% validate : 0.000044s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.094374s : 90.42% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000151 26 18.51% : 0.000028s : 4: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000002s : 2: substitution.fold_const_symbol 4.39% : 0.000007s : 4: substitution.graph_param_transform 65.91% : 0.000099s : 2: substitution.inline 2.04% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.73% : 0.000006s : 4: substitution.remove_not_recompute_node 3.12% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.005059 2 91.73% : 0.004641s : 1: type_inference.infer 8.27% : 0.000418s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000098 2 100.00% : 0.000098s : 2: match.inline ------[predicate.] 
0.000150 984 1.15% : 0.000002s : 9: predicate.accumulaten_eliminater 1.36% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.67% : 0.000001s : 9: predicate.addn_zero_filter 0.65% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.81% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.01% : 0.000002s : 9: predicate.dict_set_item_eliminator 1.74% : 0.000003s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000001s : 4: predicate.elim_not_effective 0.73% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 0.98% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.36% : 0.000001s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000009s : 44: predicate.inline 1.19% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000002s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.90% : 0.000003s : 4: predicate.loop_unroll_after_grad 1.61% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.64% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.64% : 0.000001s : 9: predicate.minmaximum_grad 1.89% : 0.000003s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.12% : 0.000002s : 11: predicate.partial_defer_inline 1.13% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.04% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.48% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.19% : 0.000002s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.68% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000007s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.43% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.82% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.91% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000352 6 41.05% : 0.000145s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.95% : 0.000208s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118657 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.95% : 0.003505s : 1: add_attr 2.94% : 0.003491s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000055s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.02% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000485s : 1: bootstrap 0.03% : 0.000032s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.40% : 0.000476s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.60% : 0.000711s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000018s : 1: opt.transform.mutable_eliminate 0.69% : 0.000816s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.97% : 0.002338s : 1: opt_a 0.09% : 0.000106s : 1: opt_after_cconv 0.44% : 0.000517s : 1: opt_after_jit_grad 0.17% : 0.000201s : 1: opt_b 3.85% : 0.004563s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000035s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.32% : 0.000378s : 1: renormalize.infer 0.24% : 0.000287s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000044s : 1: rewriter_after_opt_a 0.05% : 0.000053s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000078s : 1: symbol_engine_optimizer 79.55% : 0.094398s : 1: task_emit 0.06% : 0.000077s : 1: 
tuple_transform 4.33% : 0.005143s : 1: type_inference 0.07% : 0.000077s : 1: validate TotalTime = 0.100404, [24] [bootstrap]: 0.00046421 [type_inference]: 0.00652372 [event_method]: 1.595e-05 [auto_monad]: 5.725e-05 [graph_reusing]: 5.77001e-06 [inline]: 2.79001e-06 [add_attr]: 0.00355005, [1] [add_attr_with_inline]: 0.00353579, [1] [Cycle 1]: 9.305e-05, [2] [tag_attr]: 2.306e-05 [meta_addattr_fg_expand]: 4.42e-06 [parallel-infer-symbol]: 3.85e-06 [pre_auto_parallel]: 4.02e-05 [insert-virtual-dataset]: 2.59001e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.32001e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.00524346, [53] [py_interpret_to_execute]: 2.962e-05 [rewriter_before_opt_a]: 7.672e-05 [opt_a]: 0.00282375, [2] [Cycle 1]: 0.00214222, [45] [expand_dump_flag]: 2.78998e-06 [switch_simplify]: 3.393e-05 [loop_unroll]: 2.223e-05 [a_1]: 0.00052231 [with_stream_mark]: 2.129e-05 [recompute_prepare]: 1.111e-05 [updatestate_depend_eliminate]: 4.10998e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 8.03e-05 [accelerated_algorithm]: 7.27997e-06 [shard]: 2.53998e-06 [meta_shard_fg_expand]: 2.06998e-06 [shard_inline]: 6.33e-06 [merge_send_recv]: 9.87999e-06 [auto_parallel]: 8.37e-06 [parallel]: 2.092e-05 [flash_sp]: 9.81e-06 [merge_comm]: 4.50999e-06 [allreduce_fusion]: 3.57002e-06 [matmul_add_comm_reduction]: 1.022e-05 [allreduce_slice_to_reducescatter]: 8.29983e-07 [virtual_shard_identity]: 8.65999e-06 [virtual_dataset]: 6.53e-06 [get_grad_eliminate_]: 6.18002e-06 [virtual_output]: 6.01998e-06 [merge_forward]: 5.17e-06 [cell_reuse_recompute_pass]: 1.64e-06 [offload_activation]: 1.081e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.241e-05 [merge_recompute_call_nodes]: 1.41002e-06 [before_grad]: 9.92001e-06 [set_forward_comm_id_for_comm_node_pass]: 4.53999e-06 [meta_fg_expand]: 2.42001e-06 [flash_sp_send_recv_attached]: 3.14001e-06 [receive_attached]: 2.54001e-06 
[after_resolve]: 1.172e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00084471 [add_forward_monad_depend]: 7.56999e-06 [auto_monad_grad]: 2.68e-06 [auto_monad_eliminator]: 2.001e-05 [cse]: 3.314e-05 [a_3]: 5.182e-05 [Cycle 2]: 0.00066744, [45] [expand_dump_flag]: 2.47001e-06 [switch_simplify]: 8.92e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.00013669 [with_stream_mark]: 1.679e-05 [recompute_prepare]: 6.38e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.76999e-06 [parameter_eliminate]: 1.42999e-06 [a_2]: 7.018e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 5.51002e-06 [merge_send_recv]: 6.83e-06 [auto_parallel]: 7.57998e-06 [parallel]: 6.44001e-06 [flash_sp]: 3.70998e-06 [merge_comm]: 3.39001e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.55001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.35003e-06 [virtual_dataset]: 5.80002e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 5.09e-06 [merge_forward]: 3.71001e-06 [cell_reuse_recompute_pass]: 1.96e-06 [offload_activation]: 9.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.99e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 8.10999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.64e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 9.03002e-06 [cse]: 1.564e-05 [a_3]: 3.273e-05 [py_interpret_to_execute_after_opt_a]: 1.412e-05 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.885e-05 [convert_after_rewriter]: 7.33e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00078963 [opt_b]: 0.00022435, [1] [Cycle 1]: 0.00021555, [7] [b_1]: 0.00011296 
[b_2]: 7.28e-06 [updatestate_depend_eliminate]: 8.84998e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.79999e-06 [renormalize]: 2.50002e-07 [cse]: 2.55e-05 [optimize_parallel_all_gather_comm]: 1.842e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 3.244e-05 [loop_unroll]: 0.00048758 [opt_after_cconv]: 0.00010459, [1] [Cycle 1]: 9.774e-05, [7] [c_1]: 2.876e-05 [parameter_eliminate]: 4.63999e-06 [updatestate_depend_eliminate]: 6.27001e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.927e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.473e-05 [tuple_transform]: 7.734e-05, [1] [Cycle 1]: 7.249e-05, [4] [d_1]: 4.517e-05 [none_parameter_eliminate]: 1.96e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.63e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 5.358e-05 [cse_after_recomputation]: 2.114e-05, [1] [Cycle 1]: 1.668e-05, [1] [cse]: 1.123e-05 [environ_conv]: 5.37001e-06 [swap_dp_allreduce_reducescatter]: 5.67001e-06 [bias_add_comm_swap]: 3.09999e-06 [label_micro_interleaved_index]: 5.34e-06 [label_fine_grained_interleaved_index]: 3.13e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.52001e-06 [reorder_send_recv_between_fp_bp]: 2.89001e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.35999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.24998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.10002e-06 [control_data_broadcast_order]: 1.327e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.78999e-06 [overlap_recompute_and_grad_model_parallel]: 5.40001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 2.137e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 8.006e-05, [1] [Cycle 1]: 7.552e-05, [6] [build]: 3.97998e-06 [elim_shapecalc]: 1.066e-05 [elim_not_effective]: 1.265e-05 [opt_reshape]: 7.05e-06 [fold_const_symbol]: 9.72999e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.14999e-06 [pipeline_parallel_scheduler]: 1.63002e-06 [auto_monad_reorder]: 1.768e-05 [get_jit_bprop_graph]: 1.90001e-06 [rewriter_after_jit_bprop_graph]: 6.24001e-06 [opt_after_jit_grad]: 0.00052019 [validate]: 4.407e-05 [backend_pass]: 1.04003e-06 [task_emit]: 0.0836327 [execute]: 9.52999e-06 Sums bootstrap : 0.000464s : 0.48% type_inference : 0.006524s : 6.82% event_method : 0.000016s : 0.02% auto_monad : 0.000057s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000023s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000040s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000030s : 0.03% optimize.rewriter_before_opt_a : 0.000077s : 0.08% optimize.opt_a.expand_dump_flag : 0.000005s : 0.01% optimize.opt_a.switch_simplify : 0.000043s : 0.04% optimize.opt_a.loop_unroll : 0.000028s : 0.03% optimize.opt_a.a_1 : 0.000659s : 0.69% optimize.opt_a.with_stream_mark : 0.000038s : 0.04% optimize.opt_a.recompute_prepare : 0.000017s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000017s : 0.02% optimize.opt_a.auto_parallel : 0.000016s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000014s : 0.01% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000020s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000016s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000009s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000020s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000019s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000845s : 0.88% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000029s : 0.03% optimize.opt_a.cse : 0.000049s : 0.05% optimize.opt_a.a_3 : 0.000085s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000039s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000790s : 0.82% optimize.opt_b.b_1 : 0.000113s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000009s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000025s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000032s : 0.03% optimize.loop_unroll : 0.000488s : 0.51% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000015s : 0.02% optimize.tuple_transform.d_1 : 0.000045s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000054s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000021s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000018s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.01% opt_after_jit_grad : 0.000520s : 0.54% validate : 0.000044s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.083633s : 87.38% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000228 30 14.52% : 0.000033s : 5: substitution.arithmetic_simplify 0.76% : 0.000002s : 2: substitution.elim_not_effective 0.61% : 0.000001s : 2: substitution.fold_const_symbol 3.02% : 0.000007s : 4: substitution.graph_param_transform 69.40% : 0.000159s : 3: substitution.inline 1.49% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.08% : 0.000005s : 4: substitution.remove_not_recompute_node 2.29% : 0.000005s : 4: substitution.replace_old_param 5.83% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006467 2 87.80% : 0.005678s : 1: type_inference.infer 12.20% : 0.000789s : 1: type_inference.specialize ------[replace.] 0.000045 5 73.18% : 0.000033s : 3: replace.inline 26.82% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000168 5 92.67% : 0.000156s : 3: match.inline 7.33% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000175 1131 0.93% : 0.000002s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.73% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.50% : 0.000004s : 19: predicate.arithmetic_simplify 1.09% : 0.000002s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.97% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.99% : 0.000002s : 15: predicate.environ_get_depend_swap 1.67% : 0.000003s : 23: predicate.environ_get_eliminate 0.97% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.15% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000001s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.51% : 0.000001s : 8: predicate.incorporate_call_switch 6.21% : 0.000011s : 51: predicate.inline 0.93% : 0.000002s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000002s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.21% : 0.000004s : 32: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.54% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.83% : 0.000003s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.39% : 0.000002s : 16: predicate.partial_defer_inline 1.36% : 0.000002s : 17: predicate.partial_eliminate 0.77% : 0.000001s : 11: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.22% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.43% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000002s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.25% : 0.000002s : 16: predicate.switch_defer_inline 1.84% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.29% : 0.000009s : 54: predicate.switch_simplify 1.08% : 
0.000002s : 11: predicate.tile_eliminate 0.90% : 0.000002s : 11: predicate.transpose_eliminate 1.48% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000003s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.25% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.19% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.89% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.96% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000427 8 41.65% : 0.000178s : 3: func_graph_cloner_run.FuncGraphClonerGraph 58.35% : 0.000249s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.111255 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.20% : 0.003557s : 1: add_attr 3.18% : 0.003540s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000058s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000063s : 1: auto_monad 0.02% : 0.000022s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.45% : 0.000505s : 1: bootstrap 0.03% : 0.000036s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000023s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.45% : 0.000498s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.72% : 0.000804s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000019s : 1: opt.transform.mutable_eliminate 0.95% : 0.001053s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000050s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000036s : 4: opt.transform.symbol_engine_opt 2.54% : 0.002827s : 1: opt_a 0.10% : 0.000108s : 1: opt_after_cconv 0.48% : 0.000533s : 1: opt_after_jit_grad 0.21% : 0.000228s : 1: opt_b 4.72% : 0.005249s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000025s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000045s : 1: pre_auto_parallel 0.03% : 0.000033s : 1: py_interpret_to_execute 0.02% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.42% : 0.000464s : 1: renormalize.infer 0.33% : 0.000370s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000044s : 1: rewriter_after_opt_a 0.07% : 0.000082s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000083s : 1: symbol_engine_optimizer 75.19% : 0.083656s : 1: task_emit 0.07% : 0.000080s : 1: 
tuple_transform 5.88% : 0.006546s : 1: type_inference 0.07% : 0.000077s : 1: validate . TotalTime = 0.153877, [24] [bootstrap]: 0.00054086 [type_inference]: 0.0134054 [event_method]: 5.966e-05 [auto_monad]: 0.00013429 [graph_reusing]: 8.91002e-06 [inline]: 2.32001e-06 [add_attr]: 0.00384227, [1] [add_attr_with_inline]: 0.00382978, [1] [Cycle 1]: 9.605e-05, [2] [tag_attr]: 4.39e-05 [meta_addattr_fg_expand]: 9.42999e-06 [parallel-infer-symbol]: 4.45e-06 [pre_auto_parallel]: 6.207e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0167566, [53] [py_interpret_to_execute]: 4.49e-05 [rewriter_before_opt_a]: 0.00016834 [opt_a]: 0.0138513, [3] [Cycle 1]: 0.00881539, [45] [expand_dump_flag]: 6.24999e-06 [switch_simplify]: 7.582e-05 [loop_unroll]: 6.292e-05 [a_1]: 0.0015628 [with_stream_mark]: 3.274e-05 [recompute_prepare]: 2.575e-05 [updatestate_depend_eliminate]: 1.023e-05 [updatestate_assign_eliminate]: 8.01001e-06 [updatestate_loads_eliminate]: 7.65e-06 [parameter_eliminate]: 3.26999e-06 [a_2]: 0.00025003 [accelerated_algorithm]: 3.524e-05 [shard]: 2.11e-06 [meta_shard_fg_expand]: 4.32998e-06 [shard_inline]: 1.657e-05 [merge_send_recv]: 1.883e-05 [auto_parallel]: 1.322e-05 [parallel]: 2.24e-05 [flash_sp]: 1.526e-05 [merge_comm]: 1.054e-05 [allreduce_fusion]: 8.89e-06 [matmul_add_comm_reduction]: 3.354e-05 [allreduce_slice_to_reducescatter]: 1.17e-06 [virtual_shard_identity]: 1.86e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.524e-05 [virtual_output]: 1.538e-05 [merge_forward]: 1.053e-05 [cell_reuse_recompute_pass]: 1.68002e-06 [offload_activation]: 1.846e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.271e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 2.845e-05 [set_forward_comm_id_for_comm_node_pass]: 1.058e-05 [meta_fg_expand]: 0.00197581 [flash_sp_send_recv_attached]: 4.63999e-06 [receive_attached]: 2.59001e-06 [after_resolve]: 
7.157e-05 [a_after_grad]: 8.812e-05 [renormalize]: 0.00320865 [add_forward_monad_depend]: 6.893e-05 [auto_monad_grad]: 7.40998e-06 [auto_monad_eliminator]: 6.586e-05 [cse]: 0.00018801 [a_3]: 0.00036336 [Cycle 2]: 0.00400089, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 4.942e-05 [loop_unroll]: 4.454e-05 [a_1]: 0.00168393 [with_stream_mark]: 2.805e-05 [recompute_prepare]: 1.541e-05 [updatestate_depend_eliminate]: 6.66999e-06 [updatestate_assign_eliminate]: 5.04e-06 [updatestate_loads_eliminate]: 4.32e-06 [parameter_eliminate]: 3.04001e-06 [a_2]: 0.00013466 [accelerated_algorithm]: 1.596e-05 [shard]: 2.84001e-06 [meta_shard_fg_expand]: 3.09001e-06 [shard_inline]: 9.72999e-06 [merge_send_recv]: 1.03e-05 [auto_parallel]: 1.277e-05 [parallel]: 1.121e-05 [flash_sp]: 4.94e-06 [merge_comm]: 5.62001e-06 [allreduce_fusion]: 5.14e-06 [matmul_add_comm_reduction]: 1.316e-05 [allreduce_slice_to_reducescatter]: 1.30999e-06 [virtual_shard_identity]: 1.394e-05 [virtual_dataset]: 8.96002e-06 [get_grad_eliminate_]: 1.172e-05 [virtual_output]: 9.54e-06 [merge_forward]: 7.03e-06 [cell_reuse_recompute_pass]: 1.82999e-06 [offload_activation]: 1.312e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.843e-05 [merge_recompute_call_nodes]: 1.94999e-06 [before_grad]: 1.591e-05 [set_forward_comm_id_for_comm_node_pass]: 6.87002e-06 [meta_fg_expand]: 0.00014896 [flash_sp_send_recv_attached]: 1.77999e-06 [receive_attached]: 2.91e-06 [after_resolve]: 2.246e-05 [a_after_grad]: 1.666e-05 [renormalize]: 0.00113236 [add_forward_monad_depend]: 7.41999e-06 [auto_monad_grad]: 2.74999e-06 [auto_monad_eliminator]: 2.373e-05 [cse]: 7.494e-05 [a_3]: 7.771e-05 [Cycle 3]: 0.00101252, [45] [expand_dump_flag]: 1.99e-06 [switch_simplify]: 1.121e-05 [loop_unroll]: 9.55001e-06 [a_1]: 0.00027323 [with_stream_mark]: 1.612e-05 [recompute_prepare]: 1.061e-05 [updatestate_depend_eliminate]: 5.79e-06 [updatestate_assign_eliminate]: 4.54002e-06 [updatestate_loads_eliminate]: 4.00998e-06 [parameter_eliminate]: 
1.69e-06 [a_2]: 0.00012572 [accelerated_algorithm]: 1.49e-05 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 2.49001e-06 [shard_inline]: 9.24998e-06 [merge_send_recv]: 1.054e-05 [auto_parallel]: 1.051e-05 [parallel]: 9.56998e-06 [flash_sp]: 1.45999e-06 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 5.61e-06 [matmul_add_comm_reduction]: 1.092e-05 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 1.085e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.54998e-06 [virtual_output]: 9.22999e-06 [merge_forward]: 4.99e-06 [cell_reuse_recompute_pass]: 2.69999e-06 [offload_activation]: 1.387e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.908e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 1.577e-05 [set_forward_comm_id_for_comm_node_pass]: 6.16998e-06 [meta_fg_expand]: 4e-06 [flash_sp_send_recv_attached]: 1.40001e-06 [receive_attached]: 1.69e-06 [after_resolve]: 1.63e-05 [a_after_grad]: 1.412e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.99999e-06 [auto_monad_grad]: 1.48002e-06 [auto_monad_eliminator]: 1.387e-05 [cse]: 3.752e-05 [a_3]: 6.124e-05 [py_interpret_to_execute_after_opt_a]: 2.452e-05 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 6.048e-05 [convert_after_rewriter]: 9.31002e-06 [order_py_execute_after_rewriter]: 6.84999e-06 [mutable_eliminate]: 0.00074195 [opt_b]: 0.00031395, [1] [Cycle 1]: 0.00030557, [7] [b_1]: 0.00019668 [b_2]: 1.109e-05 [updatestate_depend_eliminate]: 9.82001e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 4.45999e-06 [renormalize]: 4.50003e-07 [cse]: 4.264e-05 [optimize_parallel_all_gather_comm]: 2.589e-05 [overlap_param_gather]: 2.14e-06 [cconv]: 3.49e-05 [loop_unroll]: 0.00048583 [opt_after_cconv]: 0.00014835, [1] [Cycle 1]: 0.00014145, [7] [c_1]: 5.152e-05 [parameter_eliminate]: 4.28001e-06 [updatestate_depend_eliminate]: 7.88001e-06 [updatestate_assign_eliminate]: 4.26001e-06 [updatestate_loads_eliminate]: 3.8e-06 [cse]: 
3.421e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 4.71e-05 [tuple_transform]: 0.00011012, [1] [Cycle 1]: 0.00010503, [4] [d_1]: 7.444e-05 [none_parameter_eliminate]: 1.77999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.64e-06 [partial_unused_args_eliminate]: 2.18998e-06 [add_recomputation]: 7.351e-05 [cse_after_recomputation]: 3.498e-05, [1] [Cycle 1]: 3.022e-05, [1] [cse]: 2.424e-05 [environ_conv]: 1.229e-05 [swap_dp_allreduce_reducescatter]: 8.92e-06 [bias_add_comm_swap]: 3.06001e-06 [label_micro_interleaved_index]: 5.82001e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.66e-06 [slice_recompute_activation]: 2.26998e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.90024e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.34998e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.891e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 5.97999e-06 [overlap_recompute_and_grad_model_parallel]: 5.92001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 5.32001e-06 [overlap_grad_flash_sp]: 2.852e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 0.0001122, [1] [Cycle 1]: 0.00010713, [6] [build]: 1.431e-05 [elim_shapecalc]: 1.517e-05 [elim_not_effective]: 1.898e-05 [opt_reshape]: 1.151e-05 [fold_const_symbol]: 1.509e-05 [renormalize]: 1.69995e-07 [detach_backward]: 2.49999e-06 
[pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 2.817e-05 [get_jit_bprop_graph]: 2.39999e-06 [rewriter_after_jit_bprop_graph]: 5.90002e-06 [opt_after_jit_grad]: 0.00051449 [validate]: 6.183e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.118156 [execute]: 9.76e-06 Sums bootstrap : 0.000541s : 0.36% type_inference : 0.013405s : 9.03% event_method : 0.000060s : 0.04% auto_monad : 0.000134s : 0.09% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000044s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000062s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000045s : 0.03% optimize.rewriter_before_opt_a : 0.000168s : 0.11% optimize.opt_a.expand_dump_flag : 0.000011s : 0.01% optimize.opt_a.switch_simplify : 0.000136s : 0.09% optimize.opt_a.loop_unroll : 0.000117s : 0.08% optimize.opt_a.a_1 : 0.003520s : 2.37% optimize.opt_a.with_stream_mark : 0.000077s : 0.05% optimize.opt_a.recompute_prepare : 0.000052s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000023s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000008s : 0.01% optimize.opt_a.a_2 : 0.000510s : 0.34% optimize.opt_a.accelerated_algorithm : 0.000066s : 0.04% optimize.opt_a.shard : 0.000007s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.01% optimize.opt_a.shard_inline : 0.000036s : 0.02% optimize.opt_a.merge_send_recv : 0.000040s : 0.03% optimize.opt_a.auto_parallel : 0.000036s : 0.02% optimize.opt_a.parallel : 0.000043s : 0.03% optimize.opt_a.flash_sp : 0.000022s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000058s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000003s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000043s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000036s : 0.02% optimize.opt_a.virtual_output : 0.000034s : 0.02% optimize.opt_a.merge_forward : 0.000023s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000006s : 0.00% optimize.opt_a.offload_activation : 0.000045s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000070s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000005s : 0.00% optimize.opt_a.before_grad : 0.000060s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000024s : 0.02% optimize.opt_a.meta_fg_expand : 0.002129s : 1.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000008s : 0.01% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000110s : 0.07% optimize.opt_a.a_after_grad : 0.000119s : 0.08% optimize.opt_a.renormalize : 0.004341s : 2.92% optimize.opt_a.add_forward_monad_depend : 0.000078s : 0.05% optimize.opt_a.auto_monad_grad : 0.000012s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000103s : 0.07% optimize.opt_a.cse : 0.000300s : 0.20% optimize.opt_a.a_3 : 0.000502s : 0.34% optimize.py_interpret_to_execute_after_opt_a : 0.000025s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000060s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000742s : 0.50% optimize.opt_b.b_1 : 0.000197s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000010s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000043s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000026s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000035s : 0.02% optimize.loop_unroll : 0.000486s : 0.33% optimize.opt_after_cconv.c_1 : 0.000052s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000047s : 0.03% optimize.tuple_transform.d_1 : 0.000074s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000074s : 0.05% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000012s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000009s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000012s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000028s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000006s : 0.00% opt_after_jit_grad : 0.000514s : 0.35% validate : 0.000062s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.118156s : 79.61% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.001000 222 6.55% : 0.000065s : 12: substitution.arithmetic_simplify 2.49% : 0.000025s : 2: substitution.cast_eliminate 0.27% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000005s : 5: substitution.float_depend_g_call 0.47% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000010s : 8: substitution.graph_param_transform 0.29% : 0.000003s : 2: substitution.incorporate_call 0.20% : 0.000002s : 2: substitution.incorporate_call_switch 57.48% : 0.000575s : 17: substitution.inline 2.11% : 0.000021s : 2: substitution.inline_without_move 1.33% : 0.000013s : 20: substitution.j_node_and_user_rematch 2.10% : 0.000021s : 3: substitution.less_batch_normalization 1.52% : 0.000015s : 11: substitution.minmaximum_grad 0.75% : 0.000007s : 5: substitution.partial_eliminate 1.66% : 0.000017s : 20: substitution.remove_not_recompute_node 3.08% : 0.000031s : 10: substitution.replace_applicator 1.55% : 0.000015s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.29% : 0.000033s : 11: substitution.tuple_list_convert_item_index_to_positive 1.43% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 1.96% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.50% : 0.000075s : 30: substitution.tuple_list_get_item_eliminator 2.01% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.013292 2 86.04% : 0.011436s : 1: type_inference.infer 13.96% : 0.001856s : 1: type_inference.specialize ------[replace.] 0.000245 33 59.09% : 0.000145s : 17: replace.inline 40.91% : 0.000100s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000601 33 93.71% : 0.000563s : 17: match.inline 6.29% : 0.000038s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000786 5764 1.11% : 0.000009s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.01% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.16% : 0.000017s : 100: predicate.arithmetic_simplify 1.29% : 0.000010s : 68: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.25% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000014s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.65% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000018s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.52% : 0.000043s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.72% : 0.000006s : 32: predicate.less_batch_normalization 
1.71% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 168: predicate.load_eliminater 0.39% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 68: predicate.minmaximum_grad 0.48% : 0.000004s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000016s : 101: predicate.partial_defer_inline 1.68% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000009s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.60% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000015s : 152: predicate.replace_applicator 0.68% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000009s : 68: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.36% : 0.000011s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.41% : 0.000011s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000010s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 101: predicate.switch_defer_inline 2.84% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.88% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.11% : 0.000009s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.69% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.52% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.17% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.22% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.002080 34 56.44% : 0.001174s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.56% : 0.000906s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.184441 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.09% : 0.003849s : 1: add_attr 2.08% : 0.003834s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000078s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000143s : 1: auto_monad 0.02% : 0.000033s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000579s : 1: bootstrap 0.02% : 0.000039s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.05% : 0.000096s : 1: environ_conv 0.04% : 0.000068s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000009s : 1: label_micro_interleaved_index 0.27% : 0.000496s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000756s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000024s : 1: opt.transform.mutable_eliminate 2.87% : 0.005296s : 117: opt.transform.opt_a 0.03% : 0.000050s : 1: opt.transform.opt_after_cconv 0.02% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000178s : 28: opt.transform.opt_b 0.04% : 0.000082s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000058s : 4: opt.transform.symbol_engine_opt 7.51% : 0.013855s : 1: opt_a 0.08% : 0.000152s : 1: opt_after_cconv 0.29% : 0.000526s : 1: opt_after_jit_grad 0.17% : 0.000317s : 1: opt_b 9.09% : 0.016762s : 1: optimize 0.02% : 0.000030s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000067s : 1: pre_auto_parallel 0.03% : 0.000049s : 1: py_interpret_to_execute 0.02% : 0.000029s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000052s : 1: remove_dup_value 1.29% : 0.002383s : 2: renormalize.infer 1.05% : 0.001937s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000009s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000066s : 1: rewriter_after_opt_a 0.09% : 0.000174s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000115s : 1: symbol_engine_optimizer 64.08% : 0.118180s : 1: task_emit 0.06% : 0.000113s : 1: 
tuple_transform 7.29% : 0.013440s : 1: type_inference 0.05% : 0.000096s : 1: validate TotalTime = 0.0916781, [24] [bootstrap]: 0.00042521 [type_inference]: 0.00462245 [event_method]: 1.219e-05 [auto_monad]: 5.338e-05 [graph_reusing]: 5.62001e-06 [inline]: 2.37001e-06 [add_attr]: 0.00344875, [1] [add_attr_with_inline]: 0.00343876, [1] [Cycle 1]: 5.863e-05, [2] [tag_attr]: 1.498e-05 [meta_addattr_fg_expand]: 3.29001e-06 [parallel-infer-symbol]: 4.35e-06 [pre_auto_parallel]: 3e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.12999e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00426729, [53] [py_interpret_to_execute]: 1.918e-05 [rewriter_before_opt_a]: 4.527e-05 [opt_a]: 0.00221217, [2] [Cycle 1]: 0.00158585, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 2.518e-05 [loop_unroll]: 1.388e-05 [a_1]: 0.00031748 [with_stream_mark]: 1.764e-05 [recompute_prepare]: 7.62002e-06 [updatestate_depend_eliminate]: 4.22e-06 [updatestate_assign_eliminate]: 3.32002e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.832e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 8.14002e-06 [auto_parallel]: 6.39001e-06 [parallel]: 1.933e-05 [flash_sp]: 8.30999e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 9.42999e-06 [allreduce_slice_to_reducescatter]: 9.30013e-07 [virtual_shard_identity]: 7.65e-06 [virtual_dataset]: 6.28998e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.002e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.31998e-06 [before_grad]: 9.56e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.61e-06 [flash_sp_send_recv_attached]: 2.73e-06 [receive_attached]: 2.17001e-06 
[after_resolve]: 1.119e-05 [a_after_grad]: 9.77999e-06 [renormalize]: 0.00061632 [add_forward_monad_depend]: 5.32001e-06 [auto_monad_grad]: 2.43e-06 [auto_monad_eliminator]: 1.501e-05 [cse]: 2.968e-05 [a_3]: 4.343e-05 [Cycle 2]: 0.00061566, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00013154 [with_stream_mark]: 1.07e-05 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 3.10998e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 3.29001e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.826e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.27999e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.72999e-06 [parallel]: 5.12e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 5.99e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.39999e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 3.11999e-06 [cell_reuse_recompute_pass]: 1.59e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.18001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.00002e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.39e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 7.99997e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 7.01999e-06 [cse]: 1.45e-05 [a_3]: 3.227e-05 [py_interpret_to_execute_after_opt_a]: 1.029e-05 [slice_cell_reuse_recomputed_activation]: 2.64001e-06 [rewriter_after_opt_a]: 3.554e-05 [convert_after_rewriter]: 7.05002e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00060737 [opt_b]: 0.00018898, [1] [Cycle 1]: 0.00018183, [7] [b_1]: 
0.0001096 [b_2]: 7.87e-06 [updatestate_depend_eliminate]: 5.95002e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.17001e-06 [renormalize]: 3.00002e-07 [cse]: 1.868e-05 [optimize_parallel_all_gather_comm]: 1.759e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.643e-05 [loop_unroll]: 0.00043926 [opt_after_cconv]: 9.877e-05, [1] [Cycle 1]: 9.203e-05, [7] [c_1]: 2.855e-05 [parameter_eliminate]: 2.79001e-06 [updatestate_depend_eliminate]: 5.59998e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.09e-06 [cse]: 1.628e-05 [renormalize]: 9.00007e-07 [remove_dup_value]: 1.219e-05 [tuple_transform]: 7.17e-05, [1] [Cycle 1]: 6.678e-05, [4] [d_1]: 4.077e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.569e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.626e-05, [1] [cse]: 1.081e-05 [environ_conv]: 5.76998e-06 [swap_dp_allreduce_reducescatter]: 5.42999e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.51998e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.63e-06 [reorder_send_recv_between_fp_bp]: 2.85002e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91998e-06 [control_data_broadcast_order]: 1.235e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 4.31002e-06 [overlap_recompute_and_grad_model_parallel]: 5.23002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 4.50999e-06 [overlap_grad_flash_sp]: 1.944e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.50002e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 1.03001e-06 [symbol_engine_optimizer]: 7.304e-05, [1] [Cycle 1]: 6.864e-05, [6] [build]: 4.23999e-06 [elim_shapecalc]: 9.48002e-06 [elim_not_effective]: 1.169e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.36998e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.645e-05 [get_jit_bprop_graph]: 1.96e-06 [rewriter_after_jit_bprop_graph]: 4.42e-06 [opt_after_jit_grad]: 0.00049932 [validate]: 4.384e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.077975 [execute]: 1.018e-05 Sums bootstrap : 0.000425s : 0.49% type_inference : 0.004622s : 5.30% event_method : 0.000012s : 0.01% auto_monad : 0.000053s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000030s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000045s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000449s : 0.51% optimize.opt_a.with_stream_mark : 0.000028s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000024s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000616s : 0.71% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.03% optimize.opt_a.cse : 0.000044s : 0.05% optimize.opt_a.a_3 : 0.000076s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000607s : 0.70% optimize.opt_b.b_1 : 0.000110s : 0.13% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000018s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000026s : 0.03% optimize.loop_unroll : 0.000439s : 0.50% optimize.opt_after_cconv.c_1 : 0.000029s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000041s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000499s : 0.57% validate : 0.000044s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.077975s : 89.42% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000142 26 17.89% : 0.000025s : 4: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.04% : 0.000006s : 4: substitution.graph_param_transform 67.52% : 0.000096s : 2: substitution.inline 2.02% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.00% : 0.000004s : 4: substitution.remove_not_recompute_node 3.29% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004572 2 91.69% : 0.004193s : 1: type_inference.infer 8.31% : 0.000380s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000094 2 100.00% : 0.000094s : 2: match.inline ------[predicate.] 
0.000142 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.98% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.90% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.73% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.39% : 0.000001s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.08% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.64% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.59% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.84% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 1.34% : 0.000002s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 1.10% : 0.000002s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 1.01% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.97% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.45% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.98% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.47% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000304 6 37.48% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 62.52% : 0.000190s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.100969 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.42% : 0.003454s : 1: add_attr 3.41% : 0.003443s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000058s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.46% : 0.000465s : 1: bootstrap 0.03% : 0.000030s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000018s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.44% : 0.000449s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.61% : 0.000618s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000015s : 1: opt.transform.mutable_eliminate 0.80% : 0.000807s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.19% : 0.002215s : 1: opt_a 0.10% : 0.000102s : 1: opt_after_cconv 0.51% : 0.000511s : 1: opt_after_jit_grad 0.19% : 0.000193s : 1: opt_b 4.23% : 0.004272s : 1: optimize 0.02% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000035s : 1: pre_auto_parallel 0.02% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.36% : 0.000368s : 1: renormalize.infer 0.24% : 0.000241s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000040s : 1: rewriter_after_opt_a 0.05% : 0.000050s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000076s : 1: symbol_engine_optimizer 77.25% : 0.077997s : 1: task_emit 0.07% : 0.000075s : 1: 
tuple_transform 4.60% : 0.004644s : 1: type_inference 0.08% : 0.000077s : 1: validate TotalTime = 0.138032, [24] [bootstrap]: 0.00049791 [type_inference]: 0.0111387 [event_method]: 4.789e-05 [auto_monad]: 0.00011924 [graph_reusing]: 8.43999e-06 [inline]: 2.46998e-06 [add_attr]: 0.00332445, [1] [add_attr_with_inline]: 0.00331452, [1] [Cycle 1]: 7.57e-05, [2] [tag_attr]: 3.343e-05 [meta_addattr_fg_expand]: 8.11002e-06 [parallel-infer-symbol]: 3.23e-06 [pre_auto_parallel]: 5.044e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.0149404, [53] [py_interpret_to_execute]: 3.798e-05 [rewriter_before_opt_a]: 0.00013403 [opt_a]: 0.0122127, [3] [Cycle 1]: 0.00790582, [45] [expand_dump_flag]: 3.95e-06 [switch_simplify]: 6.849e-05 [loop_unroll]: 5.504e-05 [a_1]: 0.00143366 [with_stream_mark]: 2.78e-05 [recompute_prepare]: 2.247e-05 [updatestate_depend_eliminate]: 9.52999e-06 [updatestate_assign_eliminate]: 7.65e-06 [updatestate_loads_eliminate]: 7.15003e-06 [parameter_eliminate]: 2.66999e-06 [a_2]: 0.00024752 [accelerated_algorithm]: 3.307e-05 [shard]: 2.21e-06 [meta_shard_fg_expand]: 3.79002e-06 [shard_inline]: 1.581e-05 [merge_send_recv]: 1.767e-05 [auto_parallel]: 1.169e-05 [parallel]: 2.082e-05 [flash_sp]: 1.323e-05 [merge_comm]: 1.018e-05 [allreduce_fusion]: 9.56e-06 [matmul_add_comm_reduction]: 2.977e-05 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 1.77e-05 [virtual_dataset]: 1.545e-05 [get_grad_eliminate_]: 1.518e-05 [virtual_output]: 1.509e-05 [merge_forward]: 9.87001e-06 [cell_reuse_recompute_pass]: 1.65001e-06 [offload_activation]: 1.87e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.002e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 2.775e-05 [set_forward_comm_id_for_comm_node_pass]: 1.019e-05 [meta_fg_expand]: 0.00163847 [flash_sp_send_recv_attached]: 3.99002e-06 [receive_attached]: 2.64001e-06 [after_resolve]: 
6.364e-05 [a_after_grad]: 8.398e-05 [renormalize]: 0.0029987 [add_forward_monad_depend]: 1.013e-05 [auto_monad_grad]: 6.33e-06 [auto_monad_eliminator]: 5.772e-05 [cse]: 0.00018053 [a_3]: 0.000344 [Cycle 2]: 0.00332647, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 4.777e-05 [loop_unroll]: 4.428e-05 [a_1]: 0.00161534 [with_stream_mark]: 1.476e-05 [recompute_prepare]: 1.121e-05 [updatestate_depend_eliminate]: 5.76998e-06 [updatestate_assign_eliminate]: 5.44e-06 [updatestate_loads_eliminate]: 3.79002e-06 [parameter_eliminate]: 1.89999e-06 [a_2]: 0.00012743 [accelerated_algorithm]: 1.33e-05 [shard]: 2.26e-06 [meta_shard_fg_expand]: 2.95998e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 9.39e-06 [auto_parallel]: 1.07e-05 [parallel]: 9.28002e-06 [flash_sp]: 3.8e-06 [merge_comm]: 6.33e-06 [allreduce_fusion]: 5.47999e-06 [matmul_add_comm_reduction]: 1.104e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.016e-05 [virtual_dataset]: 9.20001e-06 [get_grad_eliminate_]: 8.50999e-06 [virtual_output]: 8.55001e-06 [merge_forward]: 5.37001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.217e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.718e-05 [merge_recompute_call_nodes]: 1.27e-06 [before_grad]: 1.507e-05 [set_forward_comm_id_for_comm_node_pass]: 5.83002e-06 [meta_fg_expand]: 5.07e-05 [flash_sp_send_recv_attached]: 1.66002e-06 [receive_attached]: 2.38002e-06 [after_resolve]: 1.634e-05 [a_after_grad]: 1.436e-05 [renormalize]: 0.00077605 [add_forward_monad_depend]: 4.87e-06 [auto_monad_grad]: 2.82002e-06 [auto_monad_eliminator]: 1.798e-05 [cse]: 5.917e-05 [a_3]: 6.94e-05 [Cycle 3]: 0.00096138, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 1.08e-05 [loop_unroll]: 9.26998e-06 [a_1]: 0.00026454 [with_stream_mark]: 1.204e-05 [recompute_prepare]: 9.22001e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 3.84002e-06 [parameter_eliminate]: 
9.30013e-07 [a_2]: 0.00012549 [accelerated_algorithm]: 1.31e-05 [shard]: 1.23002e-06 [meta_shard_fg_expand]: 2.34999e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 8.35999e-06 [parallel]: 6.19999e-06 [flash_sp]: 1.33002e-06 [merge_comm]: 5.19e-06 [allreduce_fusion]: 5.54998e-06 [matmul_add_comm_reduction]: 9.74e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 1.021e-05 [virtual_dataset]: 9.17999e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 5.49e-06 [cell_reuse_recompute_pass]: 2.31e-06 [offload_activation]: 1.089e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.751e-05 [merge_recompute_call_nodes]: 1.05999e-06 [before_grad]: 1.51e-05 [set_forward_comm_id_for_comm_node_pass]: 5.61003e-06 [meta_fg_expand]: 3.08e-06 [flash_sp_send_recv_attached]: 1.07998e-06 [receive_attached]: 1.42e-06 [after_resolve]: 1.446e-05 [a_after_grad]: 1.452e-05 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.88002e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 3.277e-05 [a_3]: 6.333e-05 [py_interpret_to_execute_after_opt_a]: 1.973e-05 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 5.676e-05 [convert_after_rewriter]: 9.74999e-06 [order_py_execute_after_rewriter]: 7.44002e-06 [mutable_eliminate]: 0.00069835 [opt_b]: 0.00034158, [1] [Cycle 1]: 0.00033287, [7] [b_1]: 0.00022341 [b_2]: 1.312e-05 [updatestate_depend_eliminate]: 8.35999e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 4.48001e-06 [renormalize]: 2.60014e-07 [cse]: 3.859e-05 [optimize_parallel_all_gather_comm]: 2.302e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.919e-05 [loop_unroll]: 0.00048007 [opt_after_cconv]: 0.00014595, [1] [Cycle 1]: 0.00013915, [7] [c_1]: 5.055e-05 [parameter_eliminate]: 2.96999e-06 [updatestate_depend_eliminate]: 8e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 3.85e-06 [cse]: 3.374e-05 
[renormalize]: 4.09986e-07 [remove_dup_value]: 4.428e-05 [tuple_transform]: 0.0001104, [1] [Cycle 1]: 0.00010556, [4] [d_1]: 7.387e-05 [none_parameter_eliminate]: 1.80001e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 1.027e-05 [partial_unused_args_eliminate]: 1.79e-06 [add_recomputation]: 6.889e-05 [cse_after_recomputation]: 3.609e-05, [1] [Cycle 1]: 3.099e-05, [1] [cse]: 2.47e-05 [environ_conv]: 1.074e-05 [swap_dp_allreduce_reducescatter]: 8.37998e-06 [bias_add_comm_swap]: 2.56998e-06 [label_micro_interleaved_index]: 5.34e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.81e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 1.23002e-06 [remove_cast_before_assign_add]: 1.24e-06 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 3.08e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.932e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 5.44e-06 [overlap_recompute_and_grad_model_parallel]: 5.97001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 6.01e-06 [overlap_grad_flash_sp]: 2.891e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.19001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00011277, [1] [Cycle 1]: 0.0001076, [6] [build]: 1.18e-05 [elim_shapecalc]: 1.631e-05 [elim_not_effective]: 1.96e-05 [opt_reshape]: 1.064e-05 [fold_const_symbol]: 1.721e-05 [renormalize]: 3.10014e-07 [detach_backward]: 2.19999e-06 [pipeline_parallel_scheduler]: 
1.89e-06 [auto_monad_reorder]: 2.633e-05 [get_jit_bprop_graph]: 2.51998e-06 [rewriter_after_jit_bprop_graph]: 4.77998e-06 [opt_after_jit_grad]: 0.00052062 [validate]: 5.885e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.107009 [execute]: 9.00999e-06 Sums bootstrap : 0.000498s : 0.37% type_inference : 0.011139s : 8.35% event_method : 0.000048s : 0.04% auto_monad : 0.000119s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000134s : 0.10% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000127s : 0.10% optimize.opt_a.loop_unroll : 0.000109s : 0.08% optimize.opt_a.a_1 : 0.003314s : 2.49% optimize.opt_a.with_stream_mark : 0.000055s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000500s : 0.38% optimize.opt_a.accelerated_algorithm : 0.000059s : 0.04% optimize.opt_a.shard : 0.000006s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000035s : 0.03% optimize.opt_a.auto_parallel : 0.000031s : 0.02% optimize.opt_a.parallel : 0.000036s : 0.03% optimize.opt_a.flash_sp : 0.000018s : 0.01% optimize.opt_a.merge_comm : 0.000022s : 0.02% optimize.opt_a.allreduce_fusion : 0.000021s : 0.02% 
optimize.opt_a.matmul_add_comm_reduction : 0.000051s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000021s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000042s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000065s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.02% optimize.opt_a.meta_fg_expand : 0.001692s : 1.27% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.01% optimize.opt_a.receive_attached : 0.000006s : 0.00% optimize.opt_a.after_resolve : 0.000094s : 0.07% optimize.opt_a.a_after_grad : 0.000113s : 0.08% optimize.opt_a.renormalize : 0.003775s : 2.83% optimize.opt_a.add_forward_monad_depend : 0.000017s : 0.01% optimize.opt_a.auto_monad_grad : 0.000010s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000089s : 0.07% optimize.opt_a.cse : 0.000272s : 0.20% optimize.opt_a.a_3 : 0.000477s : 0.36% optimize.py_interpret_to_execute_after_opt_a : 0.000020s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000057s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000698s : 0.52% optimize.opt_b.b_1 : 0.000223s : 0.17% optimize.opt_b.b_2 : 0.000013s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse 
: 0.000039s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000023s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000029s : 0.02% optimize.loop_unroll : 0.000480s : 0.36% optimize.opt_after_cconv.c_1 : 0.000051s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000044s : 0.03% optimize.tuple_transform.d_1 : 0.000074s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000069s : 0.05% optimize.cse_after_recomputation.cse : 0.000025s : 0.02% optimize.environ_conv : 0.000011s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000019s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000029s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000012s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000016s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000020s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000017s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000003s : 0.00% rewriter_after_jit_bprop_graph : 0.000005s : 0.00% opt_after_jit_grad : 0.000521s : 0.39% validate : 0.000059s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.107009s : 80.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000837 218 6.60% : 0.000055s : 11: substitution.arithmetic_simplify 2.07% : 0.000017s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.50% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000003s : 5: substitution.fold_const_symbol 1.03% : 0.000009s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.31% : 0.000463s : 16: substitution.inline 2.11% : 0.000018s : 2: substitution.inline_without_move 1.40% : 0.000012s : 20: substitution.j_node_and_user_rematch 2.18% : 0.000018s : 3: substitution.less_batch_normalization 1.70% : 0.000014s : 11: substitution.minmaximum_grad 0.79% : 0.000007s : 5: substitution.partial_eliminate 1.70% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000027s : 10: substitution.replace_applicator 1.48% : 0.000012s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 1.69% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.22% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 7.91% : 0.000066s : 28: substitution.tuple_list_get_item_eliminator 2.24% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011055 2 87.05% : 0.009624s : 1: type_inference.infer 12.95% : 0.001431s : 1: type_inference.specialize ------[replace.] 0.000211 30 58.99% : 0.000125s : 16: replace.inline 41.01% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000486 30 93.56% : 0.000454s : 16: match.inline 6.44% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000800 5663 1.03% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 1.01% : 0.000008s : 67: predicate.addn_zero_filter 0.98% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 99: predicate.arithmetic_simplify 1.07% : 0.000009s : 67: predicate.cast_eliminate 1.11% : 0.000009s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.12% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.05% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.12% : 0.000009s : 75: predicate.environ_get_depend_swap 1.66% : 0.000013s : 107: predicate.environ_get_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.56% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.13% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.30% : 0.000042s : 244: predicate.inline 1.18% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 3.65% 
: 0.000029s : 97: predicate.list_to_tuple_eliminator_ 2.48% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.04% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.35% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.07% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.06% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 67: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.86% : 0.000015s : 97: predicate.partial_defer_inline 1.58% : 0.000013s : 89: predicate.partial_eliminate 1.00% : 0.000008s : 67: predicate.print_const_string_wrapper 0.57% : 0.000005s : 32: predicate.reduce_all_const_elim 1.26% : 0.000010s : 67: predicate.reduce_eliminate 5.62% : 0.000045s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.80% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.13% : 0.000001s : 8: predicate.reset_defer_inline 1.03% : 0.000008s : 67: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000003s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.70% : 0.000014s : 97: predicate.switch_defer_inline 2.74% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.58% : 0.000037s : 265: 
predicate.switch_simplify 1.03% : 0.000008s : 67: predicate.tile_eliminate 1.05% : 0.000008s : 67: predicate.transpose_eliminate 1.42% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.65% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.38% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.51% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.46% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.05% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001714 32 57.71% : 0.000989s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.29% : 0.000725s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.165433 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.01% : 0.003329s : 1: add_attr 2.01% : 0.003319s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000074s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000127s : 1: auto_monad 0.02% : 0.000031s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.32% : 0.000535s : 1: bootstrap 0.02% : 0.000033s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000022s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.02% : 0.000039s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000014s : 1: environ_conv 0.03% : 0.000056s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000007s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.30% : 0.000491s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.43% : 0.000710s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000019s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000020s : 1: opt.transform.mutable_eliminate 3.02% : 0.004997s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000036s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000206s : 28: opt.transform.opt_b 0.05% : 0.000082s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000060s : 4: opt.transform.symbol_engine_opt 7.38% : 0.012216s : 1: opt_a 0.09% : 0.000150s : 1: opt_after_cconv 0.32% : 0.000532s : 1: opt_after_jit_grad 0.21% : 0.000345s : 1: opt_b 9.03% : 0.014946s : 1: optimize 0.02% : 0.000027s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000011s : 1: order_py_execute_after_rewriter 0.02% : 0.000032s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000023s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000049s : 1: remove_dup_value 1.27% : 0.002101s : 2: renormalize.infer 1.00% : 0.001658s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000061s : 1: rewriter_after_opt_a 0.08% : 0.000139s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000012s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000116s : 1: symbol_engine_optimizer 64.70% : 0.107032s : 1: task_emit 0.07% : 0.000113s : 1: 
tuple_transform 6.75% : 0.011164s : 1: type_inference 0.06% : 0.000093s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x1-ge],max_mem:46.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x2-pynative],max_mem:46.0M TotalTime = 0.0228782, [24] [bootstrap]: 0.00055686 [type_inference]: 0.00650548 [event_method]: 1.441e-05 [auto_monad]: 5.764e-05 [graph_reusing]: 5.15001e-06 [inline]: 2.04e-06 [add_attr]: 0.00376912, [1] [add_attr_with_inline]: 0.0036842, [1] [Cycle 1]: 4.711e-05, [2] [tag_attr]: 1.604e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 3.042e-05 [insert-virtual-dataset]: 2.99999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.10002e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.0041428, [53] [py_interpret_to_execute]: 2.148e-05 [rewriter_before_opt_a]: 5.964e-05 [opt_a]: 0.00227596, [2] [Cycle 1]: 0.00166573, [45] [expand_dump_flag]: 3.21001e-06 [switch_simplify]: 3.193e-05 [loop_unroll]: 8.277e-05 [a_1]: 0.0004623 [with_stream_mark]: 1.35e-05 [recompute_prepare]: 7.84002e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.76003e-06 [a_2]: 7.545e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.84998e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 6.09001e-06 [parallel]: 2.573e-05 [flash_sp]: 6.89999e-06 [merge_comm]: 3.79002e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.29e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 6.38003e-06 
[get_grad_eliminate_]: 5.31998e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.48999e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 1.016e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 8.95001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61001e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.39999e-06 [after_resolve]: 9.87999e-06 [a_after_grad]: 8.67e-06 [renormalize]: 0.00049392 [add_forward_monad_depend]: 4.69998e-06 [auto_monad_grad]: 2.19999e-06 [auto_monad_eliminator]: 1.343e-05 [cse]: 2.718e-05 [a_3]: 4.087e-05 [Cycle 2]: 0.00060004, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.0001296 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 5.92999e-06 [updatestate_depend_eliminate]: 2.98e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.788e-05 [accelerated_algorithm]: 5.32999e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.50999e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.13999e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 2.95998e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.19998e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 5.01002e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.85002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.97e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95002e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.90001e-06 [a_after_grad]: 7.84002e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.09998e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.51999e-06 [cse]: 1.401e-05 [a_3]: 3.235e-05 [py_interpret_to_execute_after_opt_a]: 7.59002e-06 [slice_cell_reuse_recomputed_activation]: 2.02001e-06 [rewriter_after_opt_a]: 2.998e-05 [convert_after_rewriter]: 7.51001e-06 [order_py_execute_after_rewriter]: 5.17999e-06 [mutable_eliminate]: 0.00046368 [opt_b]: 0.0001823, [1] [Cycle 1]: 0.00017534, [7] [b_1]: 0.00010733 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.35002e-06 [renormalize]: 4.69998e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.656e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.309e-05 [loop_unroll]: 0.00041914 [opt_after_cconv]: 9.466e-05, [1] [Cycle 1]: 8.903e-05, [7] [c_1]: 2.72e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.612e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 6.994e-05, [1] [Cycle 1]: 6.595e-05, [4] [d_1]: 4.002e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 4.878e-05 [cse_after_recomputation]: 2.129e-05, [1] [Cycle 1]: 1.683e-05, [1] [cse]: 1.147e-05 [environ_conv]: 5.12e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 4.55999e-06 [label_fine_grained_interleaved_index]: 2.45002e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.57001e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.24998e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.19e-06 
[add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.176e-05 [grouped_pairwise_exchange_alltoall]: 1.79998e-06 [offloading_packed_experts]: 3.40998e-06 [overlap_recompute_and_grad_model_parallel]: 4.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.759e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.71002e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 6.897e-05, [1] [Cycle 1]: 6.486e-05, [6] [build]: 2.55002e-06 [elim_shapecalc]: 8.70999e-06 [elim_not_effective]: 1.124e-05 [opt_reshape]: 6.25002e-06 [fold_const_symbol]: 9.31e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 4.49e-05 [get_jit_bprop_graph]: 1.14003e-06 [rewriter_after_jit_bprop_graph]: 0.00011549 [opt_after_jit_grad]: 0.00045697 [validate]: 3.382e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00690048 [execute]: 8.84e-06 Sums bootstrap : 0.000557s : 3.07% type_inference : 0.006505s : 35.90% event_method : 0.000014s : 0.08% auto_monad : 0.000058s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 
0.33% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000088s : 0.49% optimize.opt_a.a_1 : 0.000592s : 3.27% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.06% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000030s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.07% optimize.opt_a.virtual_dataset : 0.000012s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.09% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 
0.02% optimize.opt_a.after_resolve : 0.000019s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000494s : 2.73% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000041s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000464s : 2.56% optimize.opt_b.b_1 : 0.000107s : 0.59% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000419s : 2.31% optimize.opt_after_cconv.c_1 : 0.000027s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000045s : 0.25% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000115s : 0.64% opt_after_jit_grad : 0.000457s : 2.52% validate : 0.000034s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006900s : 38.08% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000173 30 14.23% : 0.000025s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000006s : 4: substitution.graph_param_transform 68.21% : 0.000118s : 3: substitution.inline 1.59% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.15% : 0.000004s : 4: substitution.replace_old_param 6.21% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006458 2 90.89% : 0.005869s : 1: type_inference.infer 9.11% : 0.000588s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.00% : 0.000028s : 3: replace.inline 29.00% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000126 5 92.29% : 0.000116s : 3: match.inline 7.71% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.88% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000001s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.06% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000002s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.94% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.71% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000403 8 47.22% : 0.000190s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.78% : 0.000212s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032379 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.65% : 0.003773s : 1: add_attr 11.39% : 0.003688s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000063s : 1: auto_monad 0.15% : 0.000049s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.85% : 0.000598s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.32% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.46% : 0.000472s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.15% : 0.001019s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.04% : 0.002279s : 1: opt_a 0.30% : 0.000098s : 1: opt_after_cconv 1.44% : 0.000467s : 1: opt_after_jit_grad 0.57% : 0.000186s : 1: opt_b 12.81% : 0.004147s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.81% : 0.000262s : 1: renormalize.infer 0.69% : 0.000225s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.37% : 0.000121s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000072s : 1: symbol_engine_optimizer 21.35% : 0.006912s : 1: task_emit 0.22% : 0.000073s : 1: 
tuple_transform 20.14% : 0.006521s : 1: type_inference 0.21% : 0.000069s : 1: validate TotalTime = 0.0186479, [24] [bootstrap]: 0.00042067 [type_inference]: 0.00437317 [event_method]: 1.064e-05 [auto_monad]: 5.031e-05 [graph_reusing]: 4.94e-06 [inline]: 1.88002e-06 [add_attr]: 0.00310604, [1] [add_attr_with_inline]: 0.00309784, [1] [Cycle 1]: 4.462e-05, [2] [tag_attr]: 1.185e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.216e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00383609, [53] [py_interpret_to_execute]: 1.592e-05 [rewriter_before_opt_a]: 3.946e-05 [opt_a]: 0.00192239, [2] [Cycle 1]: 0.00131966, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.406e-05 [loop_unroll]: 1.351e-05 [a_1]: 0.00030201 [with_stream_mark]: 1.462e-05 [recompute_prepare]: 7e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.55e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 5.77001e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 6.36e-06 [parallel]: 1.806e-05 [flash_sp]: 7.86001e-06 [merge_comm]: 3.70998e-06 [allreduce_fusion]: 3.54002e-06 [matmul_add_comm_reduction]: 9.32999e-06 [allreduce_slice_to_reducescatter]: 6.49976e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 3.62998e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.117e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.71001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.37001e-06 
[after_resolve]: 1.098e-05 [a_after_grad]: 8.56002e-06 [renormalize]: 0.00039305 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.826e-05 [a_3]: 3.948e-05 [Cycle 2]: 0.00059292, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 6.53e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.0001257 [with_stream_mark]: 1.014e-05 [recompute_prepare]: 5.49998e-06 [updatestate_depend_eliminate]: 2.64999e-06 [updatestate_assign_eliminate]: 2.14e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.705e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.28002e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.89e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 8.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.96001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 6.93998e-06 [cse]: 1.232e-05 [a_3]: 3.168e-05 [py_interpret_to_execute_after_opt_a]: 8.23001e-06 [slice_cell_reuse_recomputed_activation]: 2.03002e-06 [rewriter_after_opt_a]: 3.168e-05 [convert_after_rewriter]: 7.26999e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00054786 [opt_b]: 0.00018301, [1] [Cycle 1]: 0.00017666, [7] [b_1]: 
0.00010861 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.28002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 5.59987e-07 [cse]: 1.599e-05 [optimize_parallel_all_gather_comm]: 1.648e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.308e-05 [loop_unroll]: 0.000415 [opt_after_cconv]: 9.423e-05, [1] [Cycle 1]: 8.834e-05, [7] [c_1]: 2.73e-05 [parameter_eliminate]: 2.80002e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.552e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.309e-05 [tuple_transform]: 6.915e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.934e-05 [none_parameter_eliminate]: 1.30001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.94999e-06 [add_recomputation]: 4.292e-05 [cse_after_recomputation]: 2.012e-05, [1] [Cycle 1]: 1.562e-05, [1] [cse]: 1.05e-05 [environ_conv]: 4.77998e-06 [swap_dp_allreduce_reducescatter]: 4.93001e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.70002e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.70002e-06 [comm_op_add_attrs]: 1.56002e-06 [add_comm_op_reuse_tag]: 9.10019e-07 [interleave_split_concat_branches]: 1.21002e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66998e-06 [control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.794e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.913e-05, [1] [Cycle 1]: 6.49e-05, [6] [build]: 2.91999e-06 [elim_shapecalc]: 8.51997e-06 [elim_not_effective]: 1.161e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.60999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 2.26e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.499e-05 [get_jit_bprop_graph]: 1.37e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00045165 [validate]: 3.453e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00609211 [execute]: 7.05e-06 Sums bootstrap : 0.000421s : 2.89% type_inference : 0.004373s : 30.01% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000428s : 2.93% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000012s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000393s : 2.70% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000548s : 3.76% optimize.opt_b.b_1 : 0.000109s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000415s : 2.85% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000002s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.02%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000015s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000452s : 3.10%
validate : 0.000035s : 0.24%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006092s : 41.80%
execute : 0.000007s : 0.05%

Time group info:
------[substitution.] 0.000131 26
    17.66% : 0.000023s : 4: substitution.arithmetic_simplify
    1.55% : 0.000002s : 2: substitution.elim_not_effective
    0.90% : 0.000001s : 2: substitution.fold_const_symbol
    3.94% : 0.000005s : 4: substitution.graph_param_transform
    67.48% : 0.000088s : 2: substitution.inline
    2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch
    3.37% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.93% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004330 2
    91.89% : 0.003979s : 1: type_inference.infer
    8.11% : 0.000351s : 1: type_inference.specialize
------[replace.] 0.000019 2
    100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000087 2
    100.00% : 0.000087s : 2: match.inline
------[predicate.]
0.000141 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.67% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.52% : 0.000004s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.76% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.02% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.52% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.80% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.97% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.39% : 0.000001s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.52% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.81% : 0.000003s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.36% : 0.000002s : 11: predicate.partial_defer_inline 1.16% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.91% : 0.000001s : 8: predicate.remove_not_recompute_node 1.54% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.49% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.14% : 0.000002s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.26% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.95% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000247 6 40.91% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.09% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026921 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.55% : 0.003111s : 1: add_attr 11.52% : 0.003101s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.70% : 0.000457s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.07% : 0.000557s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000015s : 1: opt.transform.mutable_eliminate 2.89% : 0.000777s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.15% : 0.001925s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000461s : 1: opt_after_jit_grad 0.69% : 0.000186s : 1: opt_b 14.26% : 0.003840s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.82% : 0.000220s : 1: renormalize.infer 0.62% : 0.000166s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.67% : 0.006103s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.30% : 0.004389s : 1: type_inference 0.24% : 0.000063s : 1: validate TotalTime = 0.0198129, [24] [bootstrap]: 0.00041097 [type_inference]: 0.00550135 [event_method]: 1.416e-05 [auto_monad]: 5.388e-05 [graph_reusing]: 5.96998e-06 [inline]: 1.57999e-06 [add_attr]: 0.00301462, [1] [add_attr_with_inline]: 0.00300601, [1] [Cycle 1]: 4.899e-05, [2] [tag_attr]: 1.607e-05 [meta_addattr_fg_expand]: 3.87002e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.769e-05 [insert-virtual-dataset]: 2.66999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00406663, [53] [py_interpret_to_execute]: 2.035e-05 [rewriter_before_opt_a]: 5.768e-05 [opt_a]: 0.00215157, [2] [Cycle 1]: 0.00155309, [45] [expand_dump_flag]: 2.62001e-06 [switch_simplify]: 3.257e-05 [loop_unroll]: 2.056e-05 [a_1]: 0.00045079 [with_stream_mark]: 1.43e-05 [recompute_prepare]: 7.84002e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 7.564e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.51998e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 7.94002e-06 [auto_parallel]: 5.99999e-06 [parallel]: 1.688e-05 [flash_sp]: 7.33999e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 8.94e-06 [allreduce_slice_to_reducescatter]: 1.09e-06 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.89001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.37999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.35997e-06 [receive_attached]: 
2.64999e-06 [after_resolve]: 9.94001e-06 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00046261 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.462e-05 [cse]: 2.773e-05 [a_3]: 4.052e-05 [Cycle 2]: 0.00058813, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 7.13e-06 [loop_unroll]: 5.28002e-06 [a_1]: 0.00012461 [with_stream_mark]: 9.36e-06 [recompute_prepare]: 5.73997e-06 [updatestate_depend_eliminate]: 2.95002e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.687e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.35001e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.24002e-06 [auto_parallel]: 5.10999e-06 [parallel]: 4.35999e-06 [flash_sp]: 3.55998e-06 [merge_comm]: 2.91999e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.84999e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.22999e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.469e-05 [a_3]: 3.181e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 3.305e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00047375 [opt_b]: 0.00018449, [1] [Cycle 1]: 
0.000178, [7] [b_1]: 0.00010996 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 4.39992e-07 [cse]: 1.583e-05 [optimize_parallel_all_gather_comm]: 1.619e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.246e-05 [loop_unroll]: 0.00041518 [opt_after_cconv]: 9.477e-05, [1] [Cycle 1]: 8.874e-05, [7] [c_1]: 2.762e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.24003e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.547e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.207e-05 [tuple_transform]: 0.00012174, [1] [Cycle 1]: 0.00011745, [4] [d_1]: 9.06e-05 [none_parameter_eliminate]: 2.04e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.39999e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.464e-05 [cse_after_recomputation]: 2.064e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.90999e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.05002e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.33002e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 9.10019e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 1.91998e-06 [offloading_packed_experts]: 3.69002e-06 [overlap_recompute_and_grad_model_parallel]: 4.2e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.55001e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.692e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.762e-05, [1] [Cycle 1]: 6.34e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.54002e-06 [elim_not_effective]: 1.076e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.62e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.629e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.84002e-06 [opt_after_jit_grad]: 0.00044818 [validate]: 3.142e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00599449 [execute]: 7.58001e-06 Sums bootstrap : 0.000411s : 2.60% type_inference : 0.005501s : 34.74% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000575s : 3.63% optimize.opt_a.with_stream_mark : 0.000024s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.90% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000463s : 2.92% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000474s : 2.99% optimize.opt_b.b_1 : 0.000110s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000415s : 2.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000091s : 0.57% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000016s : 0.10%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.02%
opt_after_jit_grad : 0.000448s : 2.83%
validate : 0.000031s : 0.20%
backend_pass : 0.000001s : 0.01%
task_emit : 0.005994s : 37.85%
execute : 0.000008s : 0.05%

Time group info:
------[substitution.] 0.000169 30
    14.96% : 0.000025s : 5: substitution.arithmetic_simplify
    1.06% : 0.000002s : 2: substitution.elim_not_effective
    0.73% : 0.000001s : 2: substitution.fold_const_symbol
    3.21% : 0.000005s : 4: substitution.graph_param_transform
    67.05% : 0.000113s : 3: substitution.inline
    1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.48% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.37% : 0.000004s : 4: substitution.replace_old_param
    6.50% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005459 2
    90.00% : 0.004913s : 1: type_inference.infer
    10.00% : 0.000546s : 1: type_inference.specialize
------[replace.] 0.000039 5
    69.44% : 0.000027s : 3: replace.inline
    30.56% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000121 5
    91.80% : 0.000111s : 3: match.inline
    8.20% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000157 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.77% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.51% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 1.10% : 0.000002s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.23% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 45.75% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.25% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028501 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.59% : 0.003019s : 1: add_attr 10.56% : 0.003010s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.56% : 0.000446s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.69% : 0.000483s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.29% : 0.000938s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.33% : 0.000095s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.56% : 0.002155s : 1: opt_a 0.34% : 0.000098s : 1: opt_after_cconv 1.61% : 0.000458s : 1: opt_after_jit_grad 0.66% : 0.000188s : 1: opt_b 14.28% : 0.004070s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.86% : 0.000245s : 1: renormalize.infer 0.74% : 0.000211s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.07% : 0.006005s : 1: task_emit 0.44% : 0.000125s : 1: 
tuple_transform 19.35% : 0.005516s : 1: type_inference 0.21% : 0.000060s : 1: validate TotalTime = 0.0380702, [24] [bootstrap]: 0.00046332 [type_inference]: 0.0115936 [event_method]: 4.627e-05 [auto_monad]: 0.00011915 [graph_reusing]: 8.46002e-06 [inline]: 1.77999e-06 [add_attr]: 0.00307171, [1] [add_attr_with_inline]: 0.00306322, [1] [Cycle 1]: 7.322e-05, [2] [tag_attr]: 3.521e-05 [meta_addattr_fg_expand]: 9.09e-06 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 5.171e-05 [insert-virtual-dataset]: 2.43002e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.0135, [53] [py_interpret_to_execute]: 3.994e-05 [rewriter_before_opt_a]: 0.00014686 [opt_a]: 0.0111876, [3] [Cycle 1]: 0.00715703, [45] [expand_dump_flag]: 4.58001e-06 [switch_simplify]: 7.415e-05 [loop_unroll]: 6.153e-05 [a_1]: 0.00144533 [with_stream_mark]: 2.371e-05 [recompute_prepare]: 2.192e-05 [updatestate_depend_eliminate]: 9.25999e-06 [updatestate_assign_eliminate]: 7.61999e-06 [updatestate_loads_eliminate]: 7.23e-06 [parameter_eliminate]: 2.81999e-06 [a_2]: 0.00024343 [accelerated_algorithm]: 3.037e-05 [shard]: 2.21e-06 [meta_shard_fg_expand]: 4e-06 [shard_inline]: 1.584e-05 [merge_send_recv]: 1.589e-05 [auto_parallel]: 1.066e-05 [parallel]: 1.901e-05 [flash_sp]: 1.102e-05 [merge_comm]: 9.81998e-06 [allreduce_fusion]: 9.15999e-06 [matmul_add_comm_reduction]: 2.654e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.791e-05 [virtual_dataset]: 1.55e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.54e-05 [merge_forward]: 9.66e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.791e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.931e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.712e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76e-06 [meta_fg_expand]: 0.00146692 [flash_sp_send_recv_attached]: 4.17e-06 [receive_attached]: 2.76999e-06 [after_resolve]: 
5.974e-05 [a_after_grad]: 8.067e-05 [renormalize]: 0.00244803 [add_forward_monad_depend]: 9.51003e-06 [auto_monad_grad]: 5.51e-06 [auto_monad_eliminator]: 5.627e-05 [cse]: 0.00019453 [a_3]: 0.00033509 [Cycle 2]: 0.00307163, [45] [expand_dump_flag]: 1.90001e-06 [switch_simplify]: 4.691e-05 [loop_unroll]: 4.345e-05 [a_1]: 0.00153525 [with_stream_mark]: 1.396e-05 [recompute_prepare]: 1.152e-05 [updatestate_depend_eliminate]: 5.94e-06 [updatestate_assign_eliminate]: 4.48001e-06 [updatestate_loads_eliminate]: 3.79002e-06 [parameter_eliminate]: 1.16002e-06 [a_2]: 0.00012683 [accelerated_algorithm]: 1.228e-05 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 9.21002e-06 [merge_send_recv]: 7.58999e-06 [auto_parallel]: 7.70998e-06 [parallel]: 5.66e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 5.22e-06 [allreduce_fusion]: 4.63001e-06 [matmul_add_comm_reduction]: 8.72998e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 1.027e-05 [virtual_dataset]: 8.91997e-06 [get_grad_eliminate_]: 8.68001e-06 [virtual_output]: 8.82e-06 [merge_forward]: 5.05999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.031e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.728e-05 [merge_recompute_call_nodes]: 9.70002e-07 [before_grad]: 1.457e-05 [set_forward_comm_id_for_comm_node_pass]: 5.54e-06 [meta_fg_expand]: 7.469e-05 [flash_sp_send_recv_attached]: 1.27999e-06 [receive_attached]: 1.40001e-06 [after_resolve]: 1.638e-05 [a_after_grad]: 1.466e-05 [renormalize]: 0.0006346 [add_forward_monad_depend]: 4.34002e-06 [auto_monad_grad]: 1.54e-06 [auto_monad_eliminator]: 1.472e-05 [cse]: 4.906e-05 [a_3]: 6.475e-05 [Cycle 3]: 0.00094375, [45] [expand_dump_flag]: 1.42e-06 [switch_simplify]: 1.06e-05 [loop_unroll]: 8.87e-06 [a_1]: 0.00024987 [with_stream_mark]: 1.032e-05 [recompute_prepare]: 9.23002e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.9e-06 [parameter_eliminate]: 
9.70002e-07 [a_2]: 0.00012303 [accelerated_algorithm]: 1.156e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 4.733e-05 [merge_send_recv]: 7.05e-06 [auto_parallel]: 7.87e-06 [parallel]: 4.57003e-06 [flash_sp]: 1.12e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 5.14e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 9.97001e-06 [virtual_dataset]: 8.54002e-06 [get_grad_eliminate_]: 8.44998e-06 [virtual_output]: 8.36002e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.649e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.476e-05 [a_after_grad]: 1.445e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 1.096e-05 [cse]: 2.596e-05 [a_3]: 5.899e-05 [py_interpret_to_execute_after_opt_a]: 1.181e-05 [slice_cell_reuse_recomputed_activation]: 2.08002e-06 [rewriter_after_opt_a]: 4.843e-05 [convert_after_rewriter]: 8.92999e-06 [order_py_execute_after_rewriter]: 6.89999e-06 [mutable_eliminate]: 0.00049655 [opt_b]: 0.00028788, [1] [Cycle 1]: 0.00028128, [7] [b_1]: 0.00018986 [b_2]: 1.06e-05 [updatestate_depend_eliminate]: 7.1e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 4.05998e-06 [renormalize]: 5.09986e-07 [cse]: 3.096e-05 [optimize_parallel_all_gather_comm]: 2.114e-05 [overlap_param_gather]: 2.02001e-06 [cconv]: 2.157e-05 [loop_unroll]: 0.0004242 [opt_after_cconv]: 0.00013454, [1] [Cycle 1]: 0.00012865, [7] [c_1]: 4.837e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 7.01001e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 3.85998e-06 [cse]: 
2.901e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 3.126e-05 [tuple_transform]: 0.00010291, [1] [Cycle 1]: 9.776e-05, [4] [d_1]: 6.755e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.84001e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 6.005e-05 [cse_after_recomputation]: 3.211e-05, [1] [Cycle 1]: 2.753e-05, [1] [cse]: 2.171e-05 [environ_conv]: 8.27e-06 [swap_dp_allreduce_reducescatter]: 7.76001e-06 [bias_add_comm_swap]: 3.13998e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.78998e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 1.08001e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.06002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.648e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 4.73001e-06 [overlap_recompute_and_grad_model_parallel]: 5.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 5.14e-06 [overlap_grad_flash_sp]: 2.544e-05 [begin_end_overlap_inline]: 6.70028e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 1.01002e-06 [symbol_engine_optimizer]: 9.76e-05, [1] [Cycle 1]: 9.334e-05, [6] [build]: 9.97999e-06 [elim_shapecalc]: 1.329e-05 [elim_not_effective]: 1.804e-05 [opt_reshape]: 9.62999e-06 [fold_const_symbol]: 1.481e-05 [renormalize]: 2.10013e-07 [detach_backward]: 1.91998e-06 
[pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 2.462e-05 [get_jit_bprop_graph]: 1.29e-06 [rewriter_after_jit_bprop_graph]: 3.92002e-06 [opt_after_jit_grad]: 0.00046954 [validate]: 4.389e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00843918 [execute]: 7.32002e-06 Sums bootstrap : 0.000463s : 1.37% type_inference : 0.011594s : 34.36% event_method : 0.000046s : 0.14% auto_monad : 0.000119s : 0.35% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000052s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.12% optimize.rewriter_before_opt_a : 0.000147s : 0.44% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003230s : 9.57% optimize.opt_a.with_stream_mark : 0.000048s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.46% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000072s : 0.21% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001544s : 4.58% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000091s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003083s : 9.14% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.24% optimize.opt_a.cse : 0.000270s : 0.80% optimize.opt_a.a_3 : 0.000459s : 1.36% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000497s : 1.47% optimize.opt_b.b_1 : 0.000190s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.06% optimize.loop_unroll : 0.000424s : 1.26% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.39% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008439s : 25.01% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000773 222 6.19% : 0.000048s : 12: substitution.arithmetic_simplify 1.78% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.47% : 0.000429s : 17: substitution.inline 2.06% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000015s : 3: substitution.less_batch_normalization 1.69% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.15% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.54% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.41% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011517 2 87.35% : 0.010060s : 1: type_inference.infer 12.65% : 0.001457s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.64% : 0.000127s : 17: replace.inline 42.36% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000454 33 92.44% : 0.000420s : 17: match.inline 7.56% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000755 5764 1.06% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000043s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.58% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.11% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.19% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.18% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.05% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.42% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001570 34 57.35% : 0.000900s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.65% : 0.000669s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062982 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.88% : 0.003076s : 1: add_attr 4.87% : 0.003067s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000064s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000126s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.79% : 0.000498s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.80% : 0.000506s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.83% : 0.004932s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.77% : 0.011191s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.76% : 0.000479s : 1: opt_after_jit_grad 0.46% : 0.000291s : 1: opt_b 21.44% : 0.013504s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000057s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000036s : 1: remove_dup_value 2.66% : 0.001677s : 2: renormalize.infer 2.21% : 0.001392s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.42% : 0.008450s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.43% : 0.011609s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.018385, [24] [bootstrap]: 0.00042492 [type_inference]: 0.00425221 [event_method]: 1.052e-05 [auto_monad]: 5.151e-05 [graph_reusing]: 4.58999e-06 [inline]: 1.99e-06 [add_attr]: 0.00297293, [1] [add_attr_with_inline]: 0.0029646, [1] [Cycle 1]: 4.059e-05, [2] [tag_attr]: 1.153e-05 [meta_addattr_fg_expand]: 3.15002e-06 [parallel-infer-symbol]: 3.14999e-06 [pre_auto_parallel]: 2.244e-05 [insert-virtual-dataset]: 2.71e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00366954, [53] [py_interpret_to_execute]: 1.64e-05 [rewriter_before_opt_a]: 3.871e-05 [opt_a]: 0.00184525, [2] [Cycle 1]: 0.00124841, [45] [expand_dump_flag]: 2.26998e-06 [switch_simplify]: 2.407e-05 [loop_unroll]: 1.341e-05 [a_1]: 0.00029057 [with_stream_mark]: 1.312e-05 [recompute_prepare]: 7.61001e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.75e-05 [accelerated_algorithm]: 6.84001e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.97999e-06 [merge_send_recv]: 8.33999e-06 [auto_parallel]: 5.82001e-06 [parallel]: 1.699e-05 [flash_sp]: 7.58999e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.65003e-06 [matmul_add_comm_reduction]: 9.25999e-06 [allreduce_slice_to_reducescatter]: 8.00006e-07 [virtual_shard_identity]: 7.03998e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 3.51999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 8.85999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.88998e-06 [receive_attached]: 
3.11001e-06 [after_resolve]: 1.044e-05 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00034192 [add_forward_monad_depend]: 4.15999e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.222e-05 [cse]: 2.732e-05 [a_3]: 3.881e-05 [Cycle 2]: 0.00058692, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 6.54999e-06 [loop_unroll]: 5.39998e-06 [a_1]: 0.00012405 [with_stream_mark]: 9.37001e-06 [recompute_prepare]: 5.63002e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.656e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 4.87e-06 [parallel]: 4.26001e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.21002e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.26002e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 7.60998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.88999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 7.30011e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.84e-06 [a_after_grad]: 8.02e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.227e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 7.13998e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.031e-05 [convert_after_rewriter]: 7.05002e-06 [order_py_execute_after_rewriter]: 5.57999e-06 [mutable_eliminate]: 0.0004439 [opt_b]: 0.00017808, [1] [Cycle 
1]: 0.00017212, [7] [b_1]: 0.00010553 [b_2]: 7.19001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 4.09986e-07 [cse]: 1.527e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.35e-05 [loop_unroll]: 0.00040929 [opt_after_cconv]: 9.365e-05, [1] [Cycle 1]: 8.789e-05, [7] [c_1]: 2.717e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.40997e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.585e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 6.809e-05, [1] [Cycle 1]: 6.382e-05, [4] [d_1]: 3.853e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 7.873e-05 [cse_after_recomputation]: 2.059e-05, [1] [Cycle 1]: 1.616e-05, [1] [cse]: 1.083e-05 [environ_conv]: 4.80001e-06 [swap_dp_allreduce_reducescatter]: 5.16998e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 3.83001e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.56998e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.77002e-06 [comm_op_add_attrs]: 9.69972e-07 [add_comm_op_reuse_tag]: 8.90024e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.36998e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.86003e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.635e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.91003e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.666e-05, [1] [Cycle 1]: 6.257e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 7.9e-06 [elim_not_effective]: 1.137e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 1.30999e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00044372 [validate]: 3.094e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00626673 [execute]: 7.15e-06 Sums bootstrap : 0.000425s : 2.94% type_inference : 0.004252s : 29.40% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.87% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s 
: 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000342s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000070s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000444s : 3.07% optimize.opt_b.b_1 : 0.000106s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000409s : 2.83% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000079s : 0.54% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 3.07% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006267s : 43.33% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.45% : 0.000022s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.21% : 0.000001s : 2: substitution.fold_const_symbol 4.29% : 0.000005s : 4: substitution.graph_param_transform 65.53% : 0.000077s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004212 2 91.80% : 0.003867s : 1: type_inference.infer 8.20% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.95% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.03% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 1.11% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.44% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 1.01% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.56% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.67% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 40.97% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.03% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026288 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.33% : 0.002977s : 1: add_attr 11.29% : 0.002968s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.32% : 0.000083s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.75% : 0.000460s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000763s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.03% : 0.001848s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.72% : 0.000453s : 1: opt_after_jit_grad 0.69% : 0.000181s : 1: opt_b 13.97% : 0.003673s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000185s : 1: renormalize.infer 0.57% : 0.000151s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000069s : 1: symbol_engine_optimizer 23.88% : 0.006277s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.23% : 0.004266s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0359944, [24] [bootstrap]: 0.00045762 [type_inference]: 0.0102363 [event_method]: 4.061e-05 [auto_monad]: 0.00011319 [graph_reusing]: 7.71999e-06 [inline]: 2.09999e-06 [add_attr]: 0.00296049, [1] [add_attr_with_inline]: 0.00295155, [1] [Cycle 1]: 6.576e-05, [2] [tag_attr]: 3.039e-05 [meta_addattr_fg_expand]: 8.84998e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 4.63e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0130967, [53] [py_interpret_to_execute]: 3.619e-05 [rewriter_before_opt_a]: 0.00015199 [opt_a]: 0.0107808, [3] [Cycle 1]: 0.00686381, [45] [expand_dump_flag]: 4.01001e-06 [switch_simplify]: 6.677e-05 [loop_unroll]: 5.492e-05 [a_1]: 0.00132048 [with_stream_mark]: 2.278e-05 [recompute_prepare]: 2.108e-05 [updatestate_depend_eliminate]: 8.88002e-06 [updatestate_assign_eliminate]: 7.95e-06 [updatestate_loads_eliminate]: 7.26999e-06 [parameter_eliminate]: 2.82002e-06 [a_2]: 0.00024379 [accelerated_algorithm]: 3.038e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 3.06001e-06 [shard_inline]: 1.595e-05 [merge_send_recv]: 1.547e-05 [auto_parallel]: 1.082e-05 [parallel]: 1.78e-05 [flash_sp]: 1.136e-05 [merge_comm]: 9.65002e-06 [allreduce_fusion]: 9.02999e-06 [matmul_add_comm_reduction]: 2.592e-05 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 1.757e-05 [virtual_dataset]: 1.599e-05 [get_grad_eliminate_]: 1.504e-05 [virtual_output]: 1.506e-05 [merge_forward]: 9.54e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.75e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.855e-05 [merge_recompute_call_nodes]: 1.56002e-06 [before_grad]: 2.736e-05 [set_forward_comm_id_for_comm_node_pass]: 9.78998e-06 [meta_fg_expand]: 0.00136819 [flash_sp_send_recv_attached]: 3.38e-06 [receive_attached]: 2.02001e-06 
[after_resolve]: 5.899e-05 [a_after_grad]: 8.04e-05 [renormalize]: 0.00243578 [add_forward_monad_depend]: 9.65002e-06 [auto_monad_grad]: 5.62001e-06 [auto_monad_eliminator]: 5.479e-05 [cse]: 0.00016295 [a_3]: 0.00033478 [Cycle 2]: 0.00300734, [45] [expand_dump_flag]: 1.65001e-06 [switch_simplify]: 4.692e-05 [loop_unroll]: 4.347e-05 [a_1]: 0.00157843 [with_stream_mark]: 1.256e-05 [recompute_prepare]: 1.136e-05 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 4.47998e-06 [updatestate_loads_eliminate]: 3.66999e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012609 [accelerated_algorithm]: 1.188e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.99999e-06 [shard_inline]: 9.08002e-06 [merge_send_recv]: 7.15e-06 [auto_parallel]: 7.77e-06 [parallel]: 5.84e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.04e-06 [allreduce_fusion]: 4.68999e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 2.99973e-07 [virtual_shard_identity]: 1.005e-05 [virtual_dataset]: 8.93002e-06 [get_grad_eliminate_]: 8.72e-06 [virtual_output]: 8.36002e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 8.2e-07 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.575e-05 [merge_recompute_call_nodes]: 8.2e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99e-06 [meta_fg_expand]: 3.473e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.465e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00059087 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.489e-05 [cse]: 4.5e-05 [a_3]: 6.499e-05 [Cycle 3]: 0.00089436, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 1.057e-05 [loop_unroll]: 8.98002e-06 [a_1]: 0.00024957 [with_stream_mark]: 1.042e-05 [recompute_prepare]: 9.27999e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 3.97e-06 [updatestate_loads_eliminate]: 3.8e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012226 [accelerated_algorithm]: 1.154e-05 [shard]: 8.99978e-07 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 8.85999e-06 [merge_send_recv]: 7.06001e-06 [auto_parallel]: 7.25e-06 [parallel]: 4.36002e-06 [flash_sp]: 1.14e-06 [merge_comm]: 4.93001e-06 [allreduce_fusion]: 4.94998e-06 [matmul_add_comm_reduction]: 7.70998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 9.69999e-06 [virtual_dataset]: 8.54e-06 [get_grad_eliminate_]: 8.40999e-06 [virtual_output]: 8.08001e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 8.42998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.599e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.366e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.309e-05 [a_after_grad]: 1.415e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 1.051e-05 [cse]: 2.427e-05 [a_3]: 5.993e-05 [py_interpret_to_execute_after_opt_a]: 1.089e-05 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 4.537e-05 [convert_after_rewriter]: 8.98002e-06 [order_py_execute_after_rewriter]: 6.56999e-06 [mutable_eliminate]: 0.00047721 [opt_b]: 0.00028558, [1] [Cycle 1]: 0.00027906, [7] [b_1]: 0.00018926 [b_2]: 1.054e-05 [updatestate_depend_eliminate]: 7.01001e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 4.18999e-06 [renormalize]: 3.60014e-07 [cse]: 2.949e-05 [optimize_parallel_all_gather_comm]: 2.009e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 1.968e-05 [loop_unroll]: 0.00046319 [opt_after_cconv]: 0.00013437, [1] [Cycle 1]: 0.00012866, [7] [c_1]: 4.852e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.25e-06 
[updatestate_loads_eliminate]: 3.8e-06 [cse]: 2.844e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.972e-05 [tuple_transform]: 0.00010233, [1] [Cycle 1]: 9.762e-05, [4] [d_1]: 6.779e-05 [none_parameter_eliminate]: 1.68002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.73002e-06 [partial_unused_args_eliminate]: 2.19999e-06 [add_recomputation]: 5.727e-05 [cse_after_recomputation]: 3.11e-05, [1] [Cycle 1]: 2.636e-05, [1] [cse]: 2.116e-05 [environ_conv]: 8.42e-06 [swap_dp_allreduce_reducescatter]: 7.75e-06 [bias_add_comm_swap]: 2.46998e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.75002e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 9.60019e-07 [full_micro_interleaved_order_control]: 2.55002e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52001e-06 [control_data_broadcast_order]: 1.683e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 4.77e-06 [overlap_recompute_and_grad_model_parallel]: 5.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 1.92999e-06 [overlap_grad_ring_attention]: 5.24e-06 [overlap_grad_flash_sp]: 2.407e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.685e-05, [1] [Cycle 1]: 9.27e-05, [6] [build]: 9.47001e-06 [elim_shapecalc]: 1.324e-05 [elim_not_effective]: 1.852e-05 [opt_reshape]: 9.95002e-06 [fold_const_symbol]: 1.432e-05 [renormalize]: 2.29978e-07 
[detach_backward]: 1.96e-06 [pipeline_parallel_scheduler]: 1.32999e-06 [auto_monad_reorder]: 2.417e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00046856 [validate]: 4.427e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00826728 [execute]: 6.91999e-06 Sums bootstrap : 0.000458s : 1.44% type_inference : 0.010236s : 32.20% event_method : 0.000041s : 0.13% auto_monad : 0.000113s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000152s : 0.48% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003148s : 9.90% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000492s : 1.55% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s 
: 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001406s : 4.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.34% optimize.opt_a.renormalize : 0.003027s : 9.52% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000232s : 0.73% optimize.opt_a.a_3 : 0.000460s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000045s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000477s : 1.50% optimize.opt_b.b_1 : 0.000189s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000029s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000463s : 1.46% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000028s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000469s : 1.47% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008267s : 26.01% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000730 218 6.13% : 0.000045s : 11: substitution.arithmetic_simplify 1.79% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.57% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 54.53% : 0.000398s : 16: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000015s : 3: substitution.less_batch_normalization 1.83% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000005s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.23% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.83% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.90% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.48% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010168 2 87.26% : 0.008872s : 1: type_inference.infer 12.74% : 0.001296s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.31% : 0.000119s : 16: replace.inline 40.69% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000421 30 92.58% : 0.000390s : 16: match.inline 7.42% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.21% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.23% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.16% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000009s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 97: predicate.switch_defer_inline 2.98% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.81% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001497 32 56.62% : 0.000847s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.38% : 0.000649s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060195 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.93% : 0.002965s : 1: add_attr 4.91% : 0.002955s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000120s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.82% : 0.000492s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.78% : 0.000472s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.81% : 0.000486s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.95% : 0.004788s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.13% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.91% : 0.010784s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.79% : 0.000478s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.76% : 0.013100s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.73% : 0.001643s : 2: renormalize.infer 2.28% : 0.001371s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000049s : 1: rewriter_after_opt_a 0.26% : 0.000157s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.75% : 0.008277s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 17.03% : 0.010252s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x2-kbk],max_mem:46.0M TotalTime = 0.118154, [24] [bootstrap]: 0.0004833 [type_inference]: 0.00596322 [event_method]: 1.386e-05 [auto_monad]: 5.453e-05 [graph_reusing]: 5.34e-06 [inline]: 1.55001e-06 [add_attr]: 0.00342999, [1] [add_attr_with_inline]: 0.00341949, [1] [Cycle 1]: 4.531e-05, [2] [tag_attr]: 1.556e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.697e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00395119, [53] [py_interpret_to_execute]: 1.897e-05 [rewriter_before_opt_a]: 5.736e-05 [opt_a]: 0.00210071, [2] [Cycle 1]: 0.00150785, [45] [expand_dump_flag]: 2.87002e-06 [switch_simplify]: 3.166e-05 [loop_unroll]: 2.052e-05 [a_1]: 0.00045292 [with_stream_mark]: 1.26e-05 [recompute_prepare]: 7.88001e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.52002e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 7.58e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 5.74e-06 [parallel]: 2.31e-05 [flash_sp]: 6.73e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 9.01998e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.67998e-06 [virtual_dataset]: 6.13002e-06 [get_grad_eliminate_]: 5.70001e-06 [virtual_output]: 5.53002e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 8.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 
[merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.09998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.49999e-06 [receive_attached]: 2.21998e-06 [after_resolve]: 1.023e-05 [a_after_grad]: 8.77e-06 [renormalize]: 0.00041563 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.66e-05 [a_3]: 4.033e-05 [Cycle 2]: 0.00058371, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00012534 [with_stream_mark]: 9.24998e-06 [recompute_prepare]: 5.99999e-06 [updatestate_depend_eliminate]: 2.67001e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.744e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 5.64e-06 [parallel]: 4.22e-06 [flash_sp]: 3.28998e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.41e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99001e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.49977e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.36998e-06 [a_after_grad]: 8.54002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 5.76e-06 [cse]: 1.157e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 7.43e-06 [slice_cell_reuse_recomputed_activation]: 
1.93997e-06 [rewriter_after_opt_a]: 3.115e-05 [convert_after_rewriter]: 6.94001e-06 [order_py_execute_after_rewriter]: 5.56e-06 [mutable_eliminate]: 0.00044878 [opt_b]: 0.00019883, [1] [Cycle 1]: 0.00019256, [7] [b_1]: 0.00012321 [b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.67001e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 4.00003e-07 [cse]: 1.579e-05 [optimize_parallel_all_gather_comm]: 1.484e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.249e-05 [loop_unroll]: 0.00041479 [opt_after_cconv]: 9.414e-05, [1] [Cycle 1]: 8.847e-05, [7] [c_1]: 2.768e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.58e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.227e-05 [tuple_transform]: 6.842e-05, [1] [Cycle 1]: 6.375e-05, [4] [d_1]: 3.842e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.962e-05 [cse_after_recomputation]: 2.066e-05, [1] [Cycle 1]: 1.613e-05, [1] [cse]: 1.096e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 5.07999e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 8.29983e-07 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 
1.193e-05 [grouped_pairwise_exchange_alltoall]: 2.00002e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67999e-06 [overlap_recompute_comm]: 1.88002e-06 [overlap_grad_ring_attention]: 4.09002e-06 [overlap_grad_flash_sp]: 1.737e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.41e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.763e-05, [1] [Cycle 1]: 6.351e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.60999e-06 [elim_not_effective]: 1.135e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.78001e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.527e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.46999e-06 [opt_after_jit_grad]: 0.00044628 [validate]: 3.1e-05 [backend_pass]: 1.30999e-06 [task_emit]: 0.103493 [execute]: 9.10999e-06 Sums bootstrap : 0.000483s : 0.42% type_inference : 0.005963s : 5.24% event_method : 0.000014s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000057s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000578s : 0.51% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000416s : 0.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000038s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.39% optimize.opt_b.b_1 : 0.000123s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000415s : 0.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.39% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.103493s : 90.98% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.82% : 0.000024s : 5: substitution.arithmetic_simplify 1.23% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.40% : 0.000006s : 4: substitution.graph_param_transform 66.23% : 0.000108s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.85% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005918 2 90.90% : 0.005379s : 1: type_inference.infer 9.10% : 0.000539s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.98% : 0.000027s : 3: replace.inline 29.02% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.25% : 0.000106s : 3: match.inline 8.75% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000171 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.73% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000004s : 19: predicate.arithmetic_simplify 0.81% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.03% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.02% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.01% : 0.000002s : 15: predicate.environ_get_depend_swap 1.67% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.17% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.05% : 0.000003s : 16: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.80% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.60% : 0.000010s : 51: predicate.inline 0.74% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.20% : 0.000004s : 32: predicate.load_eliminater 0.89% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.96% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.59% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.49% : 0.000003s : 16: predicate.partial_defer_inline 1.36% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.28% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.70% : 0.000001s : 8: predicate.specialize_transform 0.87% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.74% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.24% : 0.000002s : 16: predicate.switch_defer_inline 1.86% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.72% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.80% : 0.000001s : 11: predicate.transpose_eliminate 1.43% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.24% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.49% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.14% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.96% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.33% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 7.27% : 0.000012s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000334 8 47.15% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.85% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.127057 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.70% : 0.003434s : 1: add_attr 2.69% : 0.003423s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000520s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.74% : 0.000942s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000105s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.66% : 0.002104s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.36% : 0.000456s : 1: opt_after_jit_grad 0.16% : 0.000202s : 1: opt_b 3.11% : 0.003955s : 1: optimize 0.01% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000214s : 1: renormalize.infer 0.15% : 0.000195s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.47% : 0.103515s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.70% : 0.005976s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.112268, [24] [bootstrap]: 0.00041802 [type_inference]: 0.00435711 [event_method]: 1.055e-05 [auto_monad]: 4.845e-05 [graph_reusing]: 4.89003e-06 [inline]: 2.13002e-06 [add_attr]: 0.00294815, [1] [add_attr_with_inline]: 0.0029405, [1] [Cycle 1]: 4.038e-05, [2] [tag_attr]: 1.133e-05 [meta_addattr_fg_expand]: 3.25002e-06 [parallel-infer-symbol]: 2.70997e-06 [pre_auto_parallel]: 2.046e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.83002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00366218, [53] [py_interpret_to_execute]: 1.484e-05 [rewriter_before_opt_a]: 3.958e-05 [opt_a]: 0.00184875, [2] [Cycle 1]: 0.00125238, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 2.491e-05 [loop_unroll]: 1.39e-05 [a_1]: 0.00029075 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.63002e-06 [a_2]: 7.608e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.14e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.03999e-06 [auto_parallel]: 6.46e-06 [parallel]: 1.859e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 8.70999e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.03998e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.46001e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 9.21998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 2.66999e-06 
[after_resolve]: 1.06e-05 [a_after_grad]: 8.47e-06 [renormalize]: 0.0003439 [add_forward_monad_depend]: 4.42998e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 2.813e-05 [a_3]: 3.95e-05 [Cycle 2]: 0.00058718, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.46002e-06 [a_1]: 0.00012488 [with_stream_mark]: 1.06e-05 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.71e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.61999e-06 [matmul_add_comm_reduction]: 5.23002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.09998e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.37999e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 5.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.40023e-07 [receive_attached]: 9.29984e-07 [after_resolve]: 8.97e-06 [a_after_grad]: 7.94997e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 6.79982e-07 [auto_monad_eliminator]: 6.09001e-06 [cse]: 1.261e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.14001e-06 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 3.017e-05 [convert_after_rewriter]: 7.58999e-06 [order_py_execute_after_rewriter]: 5.28002e-06 [mutable_eliminate]: 0.0004431 [opt_b]: 0.00017904, [1] [Cycle 1]: 0.0001731, 
[7] [b_1]: 0.00010611 [b_2]: 7.07002e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 4.49974e-07 [cse]: 1.556e-05 [optimize_parallel_all_gather_comm]: 3.307e-05 [overlap_param_gather]: 2.24999e-06 [cconv]: 2.259e-05 [loop_unroll]: 0.00041337 [opt_after_cconv]: 9.398e-05, [1] [Cycle 1]: 8.844e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.603e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.165e-05 [tuple_transform]: 6.904e-05, [1] [Cycle 1]: 6.468e-05, [4] [d_1]: 3.95e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.29978e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.411e-05 [cse_after_recomputation]: 2.051e-05, [1] [Cycle 1]: 1.62e-05, [1] [cse]: 1.076e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.64999e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.60997e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.45999e-06 [overlap_opt_shard_in_pipeline]: 1.25001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.128e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.641e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.61002e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 6.721e-05, [1] [Cycle 1]: 6.324e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.04997e-06 [elim_not_effective]: 1.112e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00044448 [validate]: 3.102e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.100072 [execute]: 9.45001e-06 Sums bootstrap : 0.000418s : 0.39% type_inference : 0.004357s : 4.02% event_method : 0.000011s : 0.01% auto_monad : 0.000048s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000020s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000416s : 0.38% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000344s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000443s : 0.41% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000033s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000413s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.41% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.100072s : 92.36% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000119 26 17.58% : 0.000021s : 4: substitution.arithmetic_simplify 1.35% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.63% : 0.000006s : 4: substitution.graph_param_transform 65.84% : 0.000078s : 2: substitution.inline 2.49% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.57% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004316 2 91.64% : 0.003955s : 1: type_inference.infer 8.36% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.98% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.06% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.95% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.88% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.70% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 42.41% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.59% : 0.000149s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120142 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.46% : 0.002952s : 1: add_attr 2.45% : 0.002944s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000053s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.38% : 0.000453s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000452s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000764s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.54% : 0.001852s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000454s : 1: opt_after_jit_grad 0.15% : 0.000182s : 1: opt_b 3.05% : 0.003666s : 1: optimize 0.03% : 0.000037s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.16% : 0.000188s : 1: renormalize.infer 0.12% : 0.000150s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 83.31% : 0.100095s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.64% : 0.004371s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.111972, [24] [bootstrap]: 0.00045238 [type_inference]: 0.00554009 [event_method]: 1.449e-05 [auto_monad]: 5.523e-05 [graph_reusing]: 5.65001e-06 [inline]: 2.02999e-06 [add_attr]: 0.00292078, [1] [add_attr_with_inline]: 0.00291267, [1] [Cycle 1]: 4.492e-05, [2] [tag_attr]: 1.524e-05 [meta_addattr_fg_expand]: 3.87002e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 2.511e-05 [insert-virtual-dataset]: 2.67001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00397847, [53] [py_interpret_to_execute]: 2.176e-05 [rewriter_before_opt_a]: 5.816e-05 [opt_a]: 0.00214595, [2] [Cycle 1]: 0.00154125, [45] [expand_dump_flag]: 2.94001e-06 [switch_simplify]: 3.209e-05 [loop_unroll]: 5.163e-05 [a_1]: 0.00044861 [with_stream_mark]: 1.352e-05 [recompute_prepare]: 7.70998e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.604e-05 [accelerated_algorithm]: 6.44999e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 7.72998e-06 [auto_parallel]: 5.89e-06 [parallel]: 1.739e-05 [flash_sp]: 7.41001e-06 [merge_comm]: 3.41999e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.96002e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.01999e-06 [virtual_dataset]: 5.97999e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.053e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.16998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.51998e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.037e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.0004235 [add_forward_monad_depend]: 4.49002e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.675e-05 [a_3]: 4.124e-05 [Cycle 2]: 0.00059508, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.66998e-06 [a_1]: 0.00012534 [with_stream_mark]: 9.14e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.719e-05 [accelerated_algorithm]: 5.61003e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.02e-06 [parallel]: 3.92002e-06 [flash_sp]: 2.96001e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.94001e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.58998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.31002e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.90001e-06 [a_after_grad]: 8.44998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.17001e-06 [cse]: 1.336e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.43999e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 3.083e-05 [convert_after_rewriter]: 7.12002e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00044652 [opt_b]: 0.00018405, [1] [Cycle 1]: 
0.00017795, [7] [b_1]: 0.00010997 [b_2]: 6.91001e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 2.19996e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.531e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.309e-05 [loop_unroll]: 0.00041426 [opt_after_cconv]: 9.503e-05, [1] [Cycle 1]: 8.917e-05, [7] [c_1]: 2.779e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.04999e-06 [cse]: 1.613e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.899e-05, [1] [Cycle 1]: 6.446e-05, [4] [d_1]: 3.918e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.08002e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.212e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.619e-05, [1] [cse]: 1.099e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.01002e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.12003e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.18002e-06 [reorder_send_recv_between_fp_bp]: 2.36e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97001e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.35e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.48998e-06 [overlap_grad_ring_attention]: 4.21001e-06 [overlap_grad_flash_sp]: 1.712e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 2.11e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.766e-05, [1] [Cycle 1]: 6.366e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.20002e-06 [fold_const_symbol]: 8.45999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 2.17001e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.529e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00251327 [validate]: 3.342e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0961777 [execute]: 8.38999e-06 Sums bootstrap : 0.000452s : 0.42% type_inference : 0.005540s : 5.13% event_method : 0.000014s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000057s : 0.05% optimize.opt_a.a_1 : 0.000574s : 0.53% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000424s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.41% optimize.opt_b.b_1 : 0.000110s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000414s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.002513s : 2.33% validate : 0.000033s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096178s : 89.00% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.86% : 0.000024s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.21% : 0.000005s : 4: substitution.graph_param_transform 66.38% : 0.000109s : 3: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.49% : 0.000004s : 4: substitution.replace_old_param 6.87% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005498 2 90.14% : 0.004956s : 1: type_inference.infer 9.86% : 0.000542s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.82% : 0.000027s : 3: replace.inline 30.18% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.26% : 0.000107s : 3: match.inline 8.74% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000189 1131 0.83% : 0.000002s : 11: predicate.accumulaten_eliminater 1.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.49% : 0.000001s : 8: predicate.addn_check_dump 0.67% : 0.000001s : 11: predicate.addn_zero_filter 0.67% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.91% : 0.000004s : 19: predicate.arithmetic_simplify 0.71% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.47% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.52% : 0.000001s : 8: predicate.depend_value_elim 0.73% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.77% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.71% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.96% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.33% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.97% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.92% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.89% : 0.000002s : 15: predicate.environ_get_depend_swap 1.49% : 0.000003s : 23: predicate.environ_get_eliminate 0.88% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.06% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.73% : 0.000003s : 16: predicate.float_depend_g_call 0.48% : 0.000001s : 8: predicate.float_environ_get_switch 0.74% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 4: predicate.fold_const_symbol 0.58% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.57% : 0.000001s : 8: predicate.incorporate_call 0.50% : 0.000001s : 8: predicate.incorporate_call_switch 5.16% : 0.000010s : 51: predicate.inline 0.75% : 0.000001s : 8: predicate.inline_without_move 0.30% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 8: predicate.less_batch_normalization 1.47% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.96% : 0.000004s : 32: predicate.load_eliminater 0.95% : 0.000002s : 4: predicate.loop_unroll_after_grad 17.72% : 0.000033s : 26: predicate.loop_unroll_before_grad 1.44% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.52% : 0.000001s : 8: predicate.merge_addn 0.54% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.56% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.66% : 0.000001s : 11: predicate.minmaximum_grad 0.90% : 0.000002s : 4: predicate.mutable_eliminate 0.29% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.38% : 0.000003s : 16: predicate.partial_defer_inline 1.20% : 0.000002s : 17: predicate.partial_eliminate 0.69% : 0.000001s : 11: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 0.89% : 0.000002s : 11: predicate.reduce_eliminate 1.96% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000001s : 8: predicate.remove_not_recompute_node 1.14% : 0.000002s : 21: predicate.replace_applicator 0.46% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000001s : 4: predicate.reset_defer_inline 0.66% : 0.000001s : 11: predicate.reshape_eliminate 0.56% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.31% : 0.000001s : 4: predicate.row_tensor_eliminate 0.70% : 0.000001s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.70% : 0.000001s : 8: predicate.shard_identity_eliminate 0.66% : 0.000001s : 8: predicate.special_op_eliminate 0.69% : 0.000001s : 8: predicate.specialize_transform 0.75% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.73% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.32% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.13% : 0.000002s : 16: predicate.switch_defer_inline 1.69% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.24% : 0.000008s : 54: predicate.switch_simplify 0.69% 
: 0.000001s : 11: predicate.tile_eliminate 0.70% : 0.000001s : 11: predicate.transpose_eliminate 1.30% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.36% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.16% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.20% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.83% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.41% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.89% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.59% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.61% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.60% : 0.000001s : 8: predicate.virtual_output_eliminate 0.26% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 46.37% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.63% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120425 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.43% : 0.002925s : 1: add_attr 2.42% : 0.002916s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000487s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.80% : 0.000968s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000027s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.78% : 0.002149s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 2.10% : 0.002527s : 1: opt_after_jit_grad 0.16% : 0.000188s : 1: opt_b 3.31% : 0.003982s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.18% : 0.000213s : 1: renormalize.infer 0.17% : 0.000204s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 79.88% : 0.096199s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.61% : 0.005554s : 1: type_inference 0.05% : 0.000057s : 1: validate TotalTime = 0.176649, [24] [bootstrap]: 0.00055697 [type_inference]: 0.0124235 [event_method]: 5.36e-05 [auto_monad]: 0.00012682 [graph_reusing]: 8.40001e-06 [inline]: 2.50002e-06 [add_attr]: 0.00334876, [1] [add_attr_with_inline]: 0.00333766, [1] [Cycle 1]: 9.515e-05, [2] [tag_attr]: 4.119e-05 [meta_addattr_fg_expand]: 9.59e-06 [parallel-infer-symbol]: 3.58999e-06 [pre_auto_parallel]: 5.723e-05 [insert-virtual-dataset]: 2.60002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.99e-06 [optimize]: 0.0153556, [53] [py_interpret_to_execute]: 4.452e-05 [rewriter_before_opt_a]: 0.00016625 [opt_a]: 0.0126083, [3] [Cycle 1]: 0.00804775, [45] [expand_dump_flag]: 4.85999e-06 [switch_simplify]: 7.637e-05 [loop_unroll]: 6.294e-05 [a_1]: 0.00152737 [with_stream_mark]: 2.892e-05 [recompute_prepare]: 2.397e-05 [updatestate_depend_eliminate]: 9.94001e-06 [updatestate_assign_eliminate]: 7.93001e-06 [updatestate_loads_eliminate]: 7.43e-06 [parameter_eliminate]: 2.96001e-06 [a_2]: 0.00024939 [accelerated_algorithm]: 3.4e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 4.03999e-06 [shard_inline]: 1.659e-05 [merge_send_recv]: 1.725e-05 [auto_parallel]: 1.214e-05 [parallel]: 2.158e-05 [flash_sp]: 1.24e-05 [merge_comm]: 1.027e-05 [allreduce_fusion]: 8.85999e-06 [matmul_add_comm_reduction]: 3.094e-05 [allreduce_slice_to_reducescatter]: 7.99977e-07 [virtual_shard_identity]: 1.811e-05 [virtual_dataset]: 1.589e-05 [get_grad_eliminate_]: 1.507e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.19998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.814e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.913e-05 [merge_recompute_call_nodes]: 1.48002e-06 [before_grad]: 2.692e-05 [set_forward_comm_id_for_comm_node_pass]: 9.74e-06 [meta_fg_expand]: 0.00174787 [flash_sp_send_recv_attached]: 3.75e-06 [receive_attached]: 2.63998e-06 [after_resolve]: 
6.232e-05 [a_after_grad]: 8.368e-05 [renormalize]: 0.00292688 [add_forward_monad_depend]: 1.01e-05 [auto_monad_grad]: 6.68e-06 [auto_monad_eliminator]: 5.834e-05 [cse]: 0.00017729 [a_3]: 0.00033859 [Cycle 2]: 0.00360424, [45] [expand_dump_flag]: 2.14999e-06 [switch_simplify]: 4.744e-05 [loop_unroll]: 4.36e-05 [a_1]: 0.00156957 [with_stream_mark]: 1.768e-05 [recompute_prepare]: 1.202e-05 [updatestate_depend_eliminate]: 6.32001e-06 [updatestate_assign_eliminate]: 5.76e-06 [updatestate_loads_eliminate]: 4.25e-06 [parameter_eliminate]: 2.46e-06 [a_2]: 0.00012928 [accelerated_algorithm]: 1.427e-05 [shard]: 2.36e-06 [meta_shard_fg_expand]: 2.38002e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 9.92001e-06 [auto_parallel]: 1.063e-05 [parallel]: 9.92999e-06 [flash_sp]: 4.08999e-06 [merge_comm]: 5.24e-06 [allreduce_fusion]: 5.00001e-06 [matmul_add_comm_reduction]: 1.14e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.122e-05 [virtual_dataset]: 9.24e-06 [get_grad_eliminate_]: 9.57999e-06 [virtual_output]: 9.00999e-06 [merge_forward]: 5.46e-06 [cell_reuse_recompute_pass]: 2.47001e-06 [offload_activation]: 1.294e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.765e-05 [merge_recompute_call_nodes]: 1.27999e-06 [before_grad]: 1.527e-05 [set_forward_comm_id_for_comm_node_pass]: 5.57999e-06 [meta_fg_expand]: 0.00012198 [flash_sp_send_recv_attached]: 1.63002e-06 [receive_attached]: 2.56e-06 [after_resolve]: 2.022e-05 [a_after_grad]: 1.513e-05 [renormalize]: 0.00098919 [add_forward_monad_depend]: 6.45002e-06 [auto_monad_grad]: 2.83e-06 [auto_monad_eliminator]: 2.101e-05 [cse]: 6.643e-05 [a_3]: 6.86e-05 [Cycle 3]: 0.00093779, [45] [expand_dump_flag]: 1.66e-06 [switch_simplify]: 1.165e-05 [loop_unroll]: 9.47001e-06 [a_1]: 0.00025997 [with_stream_mark]: 1.33e-05 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.76999e-06 [parameter_eliminate]: 1.14e-06 
[a_2]: 0.00012364 [accelerated_algorithm]: 1.244e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.92001e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.33e-06 [auto_parallel]: 8.07998e-06 [parallel]: 6.12999e-06 [flash_sp]: 1.06002e-06 [merge_comm]: 5.02999e-06 [allreduce_fusion]: 5.35001e-06 [matmul_add_comm_reduction]: 8.09002e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.048e-05 [virtual_dataset]: 8.80001e-06 [get_grad_eliminate_]: 8.62e-06 [virtual_output]: 8.53001e-06 [merge_forward]: 4.63001e-06 [cell_reuse_recompute_pass]: 1.61002e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.619e-05 [merge_recompute_call_nodes]: 9.30013e-07 [before_grad]: 1.474e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27001e-06 [meta_fg_expand]: 2.94001e-06 [flash_sp_send_recv_attached]: 6.89994e-07 [receive_attached]: 1.50001e-06 [after_resolve]: 1.605e-05 [a_after_grad]: 1.54e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.171e-05 [cse]: 2.894e-05 [a_3]: 5.976e-05 [py_interpret_to_execute_after_opt_a]: 1.581e-05 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 5.303e-05 [convert_after_rewriter]: 9.14998e-06 [order_py_execute_after_rewriter]: 7.03e-06 [mutable_eliminate]: 0.00073482 [opt_b]: 0.00029494, [1] [Cycle 1]: 0.00028742, [7] [b_1]: 0.00018932 [b_2]: 1.139e-05 [updatestate_depend_eliminate]: 7.17002e-06 [updatestate_assign_eliminate]: 4.04002e-06 [updatestate_loads_eliminate]: 4.17998e-06 [renormalize]: 5.60016e-07 [cse]: 3.54e-05 [optimize_parallel_all_gather_comm]: 2.161e-05 [overlap_param_gather]: 2.21e-06 [cconv]: 2.537e-05 [loop_unroll]: 0.00045286 [opt_after_cconv]: 0.00014186, [1] [Cycle 1]: 0.00013579, [7] [c_1]: 4.98e-05 [parameter_eliminate]: 2.90002e-06 [updatestate_depend_eliminate]: 7.45003e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 4e-06 [cse]: 
3.261e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 3.788e-05 [tuple_transform]: 0.00010442, [1] [Cycle 1]: 9.963e-05, [4] [d_1]: 6.866e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 1.005e-05 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 6.218e-05 [cse_after_recomputation]: 3.539e-05, [1] [Cycle 1]: 3.081e-05, [1] [cse]: 2.483e-05 [environ_conv]: 9.50001e-06 [swap_dp_allreduce_reducescatter]: 8.07998e-06 [bias_add_comm_swap]: 3.05002e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.43998e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 8.79983e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.90025e-07 [overlap_opt_shard_in_pipeline]: 1.45001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 9.562e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 5.95002e-06 [overlap_recompute_and_grad_model_parallel]: 5.98998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.73002e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 5.52999e-06 [overlap_grad_flash_sp]: 2.805e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 0.00010387, [1] [Cycle 1]: 9.914e-05, [6] [build]: 1.104e-05 [elim_shapecalc]: 1.507e-05 [elim_not_effective]: 1.866e-05 [opt_reshape]: 1.033e-05 [fold_const_symbol]: 1.475e-05 [renormalize]: 2.10013e-07 [detach_backward]: 2.01e-06 
[pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.536e-05 [get_jit_bprop_graph]: 2.16e-06 [rewriter_after_jit_bprop_graph]: 3.81001e-06 [opt_after_jit_grad]: 0.00048975 [validate]: 5.275e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.143871 [execute]: 9.10001e-06 Sums bootstrap : 0.000557s : 0.32% type_inference : 0.012423s : 7.23% event_method : 0.000054s : 0.03% auto_monad : 0.000127s : 0.07% graph_reusing : 0.000008s : 0.00% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000041s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000057s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000045s : 0.03% optimize.rewriter_before_opt_a : 0.000166s : 0.10% optimize.opt_a.expand_dump_flag : 0.000009s : 0.01% optimize.opt_a.switch_simplify : 0.000135s : 0.08% optimize.opt_a.loop_unroll : 0.000116s : 0.07% optimize.opt_a.a_1 : 0.003357s : 1.95% optimize.opt_a.with_stream_mark : 0.000060s : 0.03% optimize.opt_a.recompute_prepare : 0.000045s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000022s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000018s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000007s : 0.00% optimize.opt_a.a_2 : 0.000502s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.04% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000035s : 0.02% optimize.opt_a.auto_parallel : 0.000031s : 0.02% optimize.opt_a.parallel : 0.000038s : 0.02% optimize.opt_a.flash_sp : 0.000018s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.01% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000050s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.02% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000005s : 0.00% optimize.opt_a.offload_activation : 0.000040s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001873s : 1.09% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000007s : 0.00% optimize.opt_a.after_resolve : 0.000099s : 0.06% optimize.opt_a.a_after_grad : 0.000114s : 0.07% optimize.opt_a.renormalize : 0.003916s : 2.28% optimize.opt_a.add_forward_monad_depend : 0.000018s : 0.01% optimize.opt_a.auto_monad_grad : 0.000011s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000091s : 0.05% optimize.opt_a.cse : 0.000273s : 0.16% optimize.opt_a.a_3 : 0.000467s : 0.27% optimize.py_interpret_to_execute_after_opt_a : 0.000016s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000053s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000735s : 0.43% optimize.opt_b.b_1 : 0.000189s : 0.11% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000035s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.01% optimize.loop_unroll : 0.000453s : 0.26% optimize.opt_after_cconv.c_1 : 0.000050s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000033s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000038s : 0.02% optimize.tuple_transform.d_1 : 0.000069s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.04% optimize.cse_after_recomputation.cse : 0.000025s : 0.01% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000096s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000028s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000015s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.01% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000490s : 0.28% validate : 0.000053s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.143871s : 83.67% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000868 222 6.45% : 0.000056s : 12: substitution.arithmetic_simplify 2.10% : 0.000018s : 2: substitution.cast_eliminate 0.31% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.91% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.73% : 0.000492s : 17: substitution.inline 1.99% : 0.000017s : 2: substitution.inline_without_move 1.28% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.14% : 0.000019s : 3: substitution.less_batch_normalization 1.52% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.63% : 0.000014s : 20: substitution.remove_not_recompute_node 2.99% : 0.000026s : 10: substitution.replace_applicator 1.42% : 0.000012s : 15: substitution.replace_old_param 0.27% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.46% : 0.000030s : 11: substitution.tuple_list_convert_item_index_to_positive 1.64% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.12% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.14% : 0.000071s : 30: substitution.tuple_list_get_item_eliminator 2.26% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012336 2 86.52% : 0.010673s : 1: type_inference.infer 13.48% : 0.001663s : 1: type_inference.specialize ------[replace.] 0.000228 33 58.25% : 0.000133s : 17: replace.inline 41.75% : 0.000095s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000519 33 93.03% : 0.000483s : 17: match.inline 6.97% : 0.000036s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000766 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.39% : 0.000018s : 101: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000005s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000043s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.61% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000009s : 68: predicate.minmaximum_grad 0.39% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000015s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.11% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.19% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 101: predicate.switch_defer_inline 2.90% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000039s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.23% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001803 34 56.06% : 0.001011s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.94% : 0.000792s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.204646 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.64% : 0.003354s : 1: add_attr 1.63% : 0.003342s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000066s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000134s : 1: auto_monad 0.01% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.29% : 0.000595s : 1: bootstrap 0.01% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000100s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000038s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000062s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.23% : 0.000462s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000745s : 1: mutable_eliminate 0.00% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000019s : 1: opt.transform.mutable_eliminate 2.47% : 0.005061s : 117: opt.transform.opt_a 0.02% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000055s : 4: opt.transform.symbol_engine_opt 6.16% : 0.012612s : 1: opt_a 0.07% : 0.000145s : 1: opt_after_cconv 0.24% : 0.000499s : 1: opt_after_jit_grad 0.15% : 0.000299s : 1: opt_b 7.51% : 0.015361s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000031s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000062s : 1: pre_auto_parallel 0.02% : 0.000049s : 1: py_interpret_to_execute 0.01% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000042s : 1: remove_dup_value 1.08% : 0.002203s : 2: renormalize.infer 0.83% : 0.001695s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000057s : 1: rewriter_after_opt_a 0.08% : 0.000172s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000107s : 1: symbol_engine_optimizer 70.31% : 0.143894s : 1: task_emit 0.05% : 0.000108s : 1: 
tuple_transform 6.08% : 0.012445s : 1: type_inference 0.04% : 0.000085s : 1: validate TotalTime = 0.104892, [24] [bootstrap]: 0.00047267 [type_inference]: 0.00431 [event_method]: 1.08e-05 [auto_monad]: 5.026e-05 [graph_reusing]: 4.85001e-06 [inline]: 1.69e-06 [add_attr]: 0.00298183, [1] [add_attr_with_inline]: 0.00297364, [1] [Cycle 1]: 4.406e-05, [2] [tag_attr]: 1.154e-05 [meta_addattr_fg_expand]: 3.28e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.075e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00366007, [53] [py_interpret_to_execute]: 1.543e-05 [rewriter_before_opt_a]: 3.832e-05 [opt_a]: 0.00185186, [2] [Cycle 1]: 0.0012514, [45] [expand_dump_flag]: 2.46e-06 [switch_simplify]: 2.44e-05 [loop_unroll]: 1.363e-05 [a_1]: 0.00029089 [with_stream_mark]: 1.352e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.41999e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.645e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 7.48e-06 [auto_parallel]: 5.65001e-06 [parallel]: 1.687e-05 [flash_sp]: 7.56001e-06 [merge_comm]: 3.77998e-06 [allreduce_fusion]: 3.16999e-06 [matmul_add_comm_reduction]: 8.50999e-06 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.51e-06 [merge_forward]: 4.02998e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.68001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.047e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.22001e-06 [receive_attached]: 2.62001e-06 
[after_resolve]: 1.087e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00034377 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 1.88002e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.767e-05 [a_3]: 3.982e-05 [Cycle 2]: 0.00059147, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.67999e-06 [a_1]: 0.0001255 [with_stream_mark]: 9.04e-06 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.739e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.39002e-06 [auto_parallel]: 5.10999e-06 [parallel]: 3.97998e-06 [flash_sp]: 3.00998e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.80002e-06 [matmul_add_comm_reduction]: 5.35999e-06 [allreduce_slice_to_reducescatter]: 2.70025e-07 [virtual_shard_identity]: 6.29001e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.44e-06 [merge_forward]: 2.43002e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.94999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.024e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 8.08001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.88e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 8.80001e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.48e-06 [cse]: 1.29e-05 [a_3]: 3.198e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 1.83002e-06 [rewriter_after_opt_a]: 3.188e-05 [convert_after_rewriter]: 6.80998e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00045074 [opt_b]: 0.00017922, [1] [Cycle 1]: 0.00017312, 
[7] [b_1]: 0.000107 [b_2]: 7.19001e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.7998e-07 [cse]: 1.541e-05 [optimize_parallel_all_gather_comm]: 1.502e-05 [overlap_param_gather]: 2.27999e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00041571 [opt_after_cconv]: 9.551e-05, [1] [Cycle 1]: 8.991e-05, [7] [c_1]: 2.731e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.67e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.259e-05 [tuple_transform]: 6.812e-05, [1] [Cycle 1]: 6.385e-05, [4] [d_1]: 3.849e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.90001e-06 [add_recomputation]: 4.404e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.078e-05 [environ_conv]: 4.68999e-06 [swap_dp_allreduce_reducescatter]: 5.58002e-06 [bias_add_comm_swap]: 2.18998e-06 [label_micro_interleaved_index]: 3.95998e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.33002e-06 [interleave_parallel_branches]: 1.01997e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.83002e-06 [control_data_broadcast_order]: 1.174e-05 [grouped_pairwise_exchange_alltoall]: 1.99999e-06 [offloading_packed_experts]: 3.75003e-06 [overlap_recompute_and_grad_model_parallel]: 4.44002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.53003e-06 [overlap_grad_ring_attention]: 4.25e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.756e-05, [1] [Cycle 1]: 6.356e-05, [6] [build]: 2.28002e-06 [elim_shapecalc]: 8.29998e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.49998e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.95e-06 [opt_after_jit_grad]: 0.00047493 [validate]: 3.084e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0926217 [execute]: 9.34e-06 Sums bootstrap : 0.000473s : 0.47% type_inference : 0.004310s : 4.27% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000416s : 0.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000344s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.45% optimize.opt_b.b_1 : 0.000107s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000416s : 0.41% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.47% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.092622s : 91.76% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.25% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.58% : 0.000006s : 4: substitution.graph_param_transform 65.48% : 0.000080s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.68% : 0.000005s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004270 2 91.64% : 0.003914s : 1: type_inference.infer 8.36% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.12% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.79% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.52% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.98% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 8: predicate.remove_not_recompute_node 1.26% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 1.08% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.21% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.68% : 0.000006s : 41: predicate.switch_simplify 0.79% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.60% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.46% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 41.08% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.92% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.112797 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.65% : 0.002986s : 1: add_attr 2.64% : 0.002977s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000508s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.68% : 0.000767s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.64% : 0.001855s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.43% : 0.000484s : 1: opt_after_jit_grad 0.16% : 0.000183s : 1: opt_b 3.25% : 0.003664s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000188s : 1: renormalize.infer 0.13% : 0.000149s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.13% : 0.092644s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.83% : 0.004324s : 1: type_inference 0.05% : 0.000053s : 1: validate TotalTime = 0.144403, [24] [bootstrap]: 0.00049328 [type_inference]: 0.0102546 [event_method]: 4.377e-05 [auto_monad]: 0.00011605 [graph_reusing]: 8.03001e-06 [inline]: 2.22999e-06 [add_attr]: 0.00302063, [1] [add_attr_with_inline]: 0.00301188, [1] [Cycle 1]: 6.555e-05, [2] [tag_attr]: 3.005e-05 [meta_addattr_fg_expand]: 8.78001e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 4.654e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 2.30002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0130849, [53] [py_interpret_to_execute]: 3.531e-05 [rewriter_before_opt_a]: 0.00012673 [opt_a]: 0.0108459, [3] [Cycle 1]: 0.00696029, [45] [expand_dump_flag]: 3.77998e-06 [switch_simplify]: 6.559e-05 [loop_unroll]: 5.437e-05 [a_1]: 0.00139649 [with_stream_mark]: 2.337e-05 [recompute_prepare]: 2.122e-05 [updatestate_depend_eliminate]: 9.10999e-06 [updatestate_assign_eliminate]: 7.95e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 2.61999e-06 [a_2]: 0.00024648 [accelerated_algorithm]: 2.997e-05 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 3.16999e-06 [shard_inline]: 1.606e-05 [merge_send_recv]: 1.588e-05 [auto_parallel]: 1.066e-05 [parallel]: 1.808e-05 [flash_sp]: 1.1e-05 [merge_comm]: 9.44e-06 [allreduce_fusion]: 8.69e-06 [matmul_add_comm_reduction]: 2.535e-05 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 1.756e-05 [virtual_dataset]: 1.559e-05 [get_grad_eliminate_]: 1.548e-05 [virtual_output]: 1.546e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.761e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.846e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 2.696e-05 [set_forward_comm_id_for_comm_node_pass]: 9.37999e-06 [meta_fg_expand]: 0.0013827 [flash_sp_send_recv_attached]: 4.05998e-06 [receive_attached]: 2.48e-06 [after_resolve]: 
5.98e-05 [a_after_grad]: 8.023e-05 [renormalize]: 0.00243365 [add_forward_monad_depend]: 8.89e-06 [auto_monad_grad]: 5.39e-06 [auto_monad_eliminator]: 5.828e-05 [cse]: 0.00016777 [a_3]: 0.00033323 [Cycle 2]: 0.00297912, [45] [expand_dump_flag]: 1.59e-06 [switch_simplify]: 4.691e-05 [loop_unroll]: 4.416e-05 [a_1]: 0.00152254 [with_stream_mark]: 1.212e-05 [recompute_prepare]: 1.129e-05 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 3.67002e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012595 [accelerated_algorithm]: 1.208e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.57999e-06 [merge_send_recv]: 6.48e-06 [auto_parallel]: 7.45998e-06 [parallel]: 4.78001e-06 [flash_sp]: 2.96001e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.49998e-06 [matmul_add_comm_reduction]: 7.46999e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 1.016e-05 [virtual_dataset]: 8.73001e-06 [get_grad_eliminate_]: 8.83001e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.32998e-06 [cell_reuse_recompute_pass]: 9.60019e-07 [offload_activation]: 9.16002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.602e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 4.499e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30001e-06 [meta_fg_expand]: 3.409e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.448e-05 [a_after_grad]: 1.416e-05 [renormalize]: 0.00058793 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.488e-05 [cse]: 4.555e-05 [a_3]: 6.524e-05 [Cycle 3]: 0.0008924, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 1.064e-05 [loop_unroll]: 8.92999e-06 [a_1]: 0.0002492 [with_stream_mark]: 9.30001e-06 [recompute_prepare]: 9.50001e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.72998e-06 
[parameter_eliminate]: 9.99979e-07 [a_2]: 0.00012207 [accelerated_algorithm]: 1.163e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.61002e-06 [shard_inline]: 9.17999e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.31999e-06 [parallel]: 4.65001e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.89998e-06 [matmul_add_comm_reduction]: 7.53e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.99999e-06 [virtual_dataset]: 8.92999e-06 [get_grad_eliminate_]: 8.54998e-06 [virtual_output]: 8.31002e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.34998e-06 [offload_activation]: 8.26002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.563e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 4.97e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.315e-05 [a_after_grad]: 1.439e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.99979e-07 [auto_monad_eliminator]: 1.05e-05 [cse]: 2.573e-05 [a_3]: 5.92e-05 [py_interpret_to_execute_after_opt_a]: 9.94999e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 4.777e-05 [convert_after_rewriter]: 8.90001e-06 [order_py_execute_after_rewriter]: 7.1e-06 [mutable_eliminate]: 0.00046039 [opt_b]: 0.00028545, [1] [Cycle 1]: 0.00027926, [7] [b_1]: 0.00018693 [b_2]: 1.088e-05 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 4.09002e-06 [updatestate_loads_eliminate]: 3.88999e-06 [renormalize]: 4.19997e-07 [cse]: 3.129e-05 [optimize_parallel_all_gather_comm]: 2.03e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 1.966e-05 [loop_unroll]: 0.00042514 [opt_after_cconv]: 0.00013552, [1] [Cycle 1]: 0.00012961, [7] [c_1]: 4.8e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 6.86001e-06 [updatestate_assign_eliminate]: 4.46002e-06 
[updatestate_loads_eliminate]: 3.88001e-06 [cse]: 2.977e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 2.992e-05 [tuple_transform]: 0.00010055, [1] [Cycle 1]: 9.589e-05, [4] [d_1]: 6.614e-05 [none_parameter_eliminate]: 1.61998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.69e-06 [partial_unused_args_eliminate]: 1.54e-06 [add_recomputation]: 5.681e-05 [cse_after_recomputation]: 3.222e-05, [1] [Cycle 1]: 2.776e-05, [1] [cse]: 2.209e-05 [environ_conv]: 9.10001e-06 [swap_dp_allreduce_reducescatter]: 7.8e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.03002e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 8.49977e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.739e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 5.39e-06 [overlap_recompute_and_grad_model_parallel]: 5.47999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 5.37999e-06 [overlap_grad_flash_sp]: 2.363e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 9.737e-05, [1] [Cycle 1]: 9.306e-05, [6] [build]: 9.31998e-06 [elim_shapecalc]: 1.306e-05 [elim_not_effective]: 1.789e-05 [opt_reshape]: 1.013e-05 [fold_const_symbol]: 1.46e-05 [renormalize]: 
1.8999e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 2.525e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00051784 [validate]: 4.605e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.1165 [execute]: 9.89999e-06 Sums bootstrap : 0.000493s : 0.35% type_inference : 0.010255s : 7.32% event_method : 0.000044s : 0.03% auto_monad : 0.000116s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000123s : 0.09% optimize.opt_a.loop_unroll : 0.000107s : 0.08% optimize.opt_a.a_1 : 0.003168s : 2.26% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 
0.000019s : 0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000086s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.01% optimize.opt_a.meta_fg_expand : 0.001420s : 1.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.08% optimize.opt_a.renormalize : 0.003022s : 2.16% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.06% optimize.opt_a.cse : 0.000239s : 0.17% optimize.opt_a.a_3 : 0.000458s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.33% optimize.opt_b.b_1 : 0.000187s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000425s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000518s : 0.37% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.116500s : 83.14% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000755 218 5.67% : 0.000043s : 11: substitution.arithmetic_simplify 1.99% : 0.000015s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 52.35% : 0.000395s : 16: substitution.inline 2.05% : 0.000015s : 2: substitution.inline_without_move 5.40% : 0.000041s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000014s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.74% : 0.000013s : 20: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.43% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.57% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.05% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010185 2 86.97% : 0.008858s : 1: type_inference.infer 13.03% : 0.001327s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.24% : 0.000119s : 16: replace.inline 40.76% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.78% : 0.000387s : 16: match.inline 7.22% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000803 5663 0.99% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 0.99% : 0.000008s : 67: predicate.addn_zero_filter 9.27% : 0.000074s : 67: predicate.adjust_all_reduce_mul_add 1.89% : 0.000015s : 99: predicate.arithmetic_simplify 1.06% : 0.000009s : 67: predicate.cast_eliminate 1.07% : 0.000009s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.08% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.13% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.03% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_depend_swap 1.62% : 0.000013s : 107: predicate.environ_get_eliminate 1.10% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.55% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.05% : 0.000016s : 97: predicate.float_depend_g_call 0.47% : 0.000004s : 32: predicate.float_environ_get_switch 0.61% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.51% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.45% : 0.000004s : 32: predicate.incorporate_call_switch 5.17% : 0.000041s : 244: predicate.inline 1.17% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.59% : 0.000005s : 32: predicate.less_batch_normalization 1.49% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.45% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.06% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.29% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.02% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.03% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.02% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.80% : 0.000014s : 97: predicate.partial_defer_inline 1.56% : 0.000013s : 89: predicate.partial_eliminate 0.97% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.19% : 0.000010s : 67: predicate.reduce_eliminate 2.45% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.75% : 0.000014s : 149: predicate.replace_applicator 0.56% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.01% : 0.000008s : 67: predicate.reshape_eliminate 1.06% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.16% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.58% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.14% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.04% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.69% : 0.000014s : 97: predicate.switch_defer_inline 2.68% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.52% : 0.000036s : 265: 
predicate.switch_simplify 0.99% : 0.000008s : 67: predicate.tile_eliminate 1.04% : 0.000008s : 67: predicate.transpose_eliminate 1.34% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.43% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.63% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.33% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.88% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.45% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.41% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 2.97% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.52% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001510 32 57.15% : 0.000863s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.85% : 0.000647s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168691 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.79% : 0.003025s : 1: add_attr 1.79% : 0.003016s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000123s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.31% : 0.000529s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000051s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.26% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000469s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.87% : 0.004841s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000173s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.43% : 0.010849s : 1: opt_a 0.08% : 0.000139s : 1: opt_after_cconv 0.31% : 0.000527s : 1: opt_after_jit_grad 0.17% : 0.000289s : 1: opt_b 7.76% : 0.013089s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 0.95% : 0.001610s : 2: renormalize.infer 0.83% : 0.001399s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.08% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000100s : 1: symbol_engine_optimizer 69.07% : 0.116521s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.09% : 0.010269s : 1: type_inference 0.04% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x2-ge],max_mem:50.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x3-pynative],max_mem:50.0M TotalTime = 0.0222048, [24] [bootstrap]: 0.00053487 [type_inference]: 0.00643934 [event_method]: 1.569e-05 [auto_monad]: 5.92e-05 [graph_reusing]: 5.15999e-06 [inline]: 1.69e-06 [add_attr]: 0.00364628, [1] [add_attr_with_inline]: 0.00363489, [1] [Cycle 1]: 4.874e-05, [2] [tag_attr]: 1.547e-05 [meta_addattr_fg_expand]: 3.97002e-06 [parallel-infer-symbol]: 3.84997e-06 [pre_auto_parallel]: 3.016e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00405986, [53] [py_interpret_to_execute]: 2.058e-05 [rewriter_before_opt_a]: 5.867e-05 [opt_a]: 0.00220177, [2] [Cycle 1]: 0.0016004, [45] [expand_dump_flag]: 2.62001e-06 [switch_simplify]: 3.232e-05 [loop_unroll]: 2.081e-05 [a_1]: 0.00045552 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 4.28999e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 7.424e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 7.56001e-06 [auto_parallel]: 5.84e-06 [parallel]: 2.269e-05 [flash_sp]: 7.7e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.61999e-06 [matmul_add_comm_reduction]: 8.67e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.84e-06 
[get_grad_eliminate_]: 5.42999e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.29998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.049e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.76999e-06 [after_resolve]: 1.069e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00049928 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.825e-05 [a_3]: 4.061e-05 [Cycle 2]: 0.00059199, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 6.58e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012515 [with_stream_mark]: 9.71998e-06 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.16998e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 6.713e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.09e-06 [parallel]: 4.74e-06 [flash_sp]: 3.39001e-06 [merge_comm]: 3.26001e-06 [allreduce_fusion]: 2.75002e-06 [matmul_add_comm_reduction]: 5.32999e-06 [allreduce_slice_to_reducescatter]: 2.3999e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.46002e-06 [get_grad_eliminate_]: 4.99003e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.06998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 7.87e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.47001e-06 [cse]: 1.307e-05 [a_3]: 3.247e-05 [py_interpret_to_execute_after_opt_a]: 7.61001e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 2.768e-05 [convert_after_rewriter]: 6.73998e-06 [order_py_execute_after_rewriter]: 5.17999e-06 [mutable_eliminate]: 0.00045487 [opt_b]: 0.00018275, [1] [Cycle 1]: 0.00017672, [7] [b_1]: 0.00010927 [b_2]: 7.02002e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 3.30008e-07 [cse]: 1.641e-05 [optimize_parallel_all_gather_comm]: 1.523e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.128e-05 [loop_unroll]: 0.00043208 [opt_after_cconv]: 9.419e-05, [1] [Cycle 1]: 8.874e-05, [7] [c_1]: 2.78e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.607e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.229e-05 [tuple_transform]: 6.818e-05, [1] [Cycle 1]: 6.388e-05, [4] [d_1]: 3.829e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 4.886e-05 [cse_after_recomputation]: 2.083e-05, [1] [Cycle 1]: 1.635e-05, [1] [cse]: 1.104e-05 [environ_conv]: 4.75999e-06 [swap_dp_allreduce_reducescatter]: 5.47999e-06 [bias_add_comm_swap]: 2.54001e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 1.15001e-06 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 
7.90023e-07 [interleave_split_concat_branches]: 1.40001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.29e-06 [overlap_opt_shard_grad_in_pipeline]: 1.51002e-06 [control_data_broadcast_order]: 1.204e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 3.77002e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.74e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.18002e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.834e-05, [1] [Cycle 1]: 6.45e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.65999e-06 [elim_not_effective]: 1.165e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.92e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.56998e-06 [auto_monad_reorder]: 1.515e-05 [get_jit_bprop_graph]: 1.17e-06 [rewriter_after_jit_bprop_graph]: 0.00012961 [opt_after_jit_grad]: 0.00045604 [validate]: 3.144e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00654585 [execute]: 7.2e-06 Sums bootstrap : 0.000535s : 3.04% type_inference : 0.006439s : 36.62% event_method : 0.000016s : 0.09% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.33% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000581s : 3.30% optimize.opt_a.with_stream_mark : 0.000024s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000141s : 0.80% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000016s : 0.09% optimize.opt_a.renormalize : 0.000499s : 2.84% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000041s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000455s : 2.59% optimize.opt_b.b_1 : 0.000109s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.12% optimize.loop_unroll : 0.000432s : 2.46% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000130s : 0.74% opt_after_jit_grad : 0.000456s : 2.59% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006546s : 37.23% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 14.54% : 0.000024s : 5: substitution.arithmetic_simplify 1.26% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000005s : 4: substitution.graph_param_transform 66.97% : 0.000112s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.34% : 0.000004s : 4: substitution.replace_old_param 6.57% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006388 2 89.96% : 0.005747s : 1: type_inference.infer 10.04% : 0.000642s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.23% : 0.000029s : 3: replace.inline 28.77% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.71% : 0.000110s : 3: match.inline 8.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.96% : 0.000002s : 11: predicate.accumulaten_eliminater 1.03% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.93% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.51% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000009s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.40% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000405 8 45.11% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.89% : 0.000222s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031504 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.59% : 0.003650s : 1: add_attr 11.55% : 0.003638s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000570s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000441s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000464s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 2.99% : 0.000942s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.00% : 0.002205s : 1: opt_a 0.31% : 0.000098s : 1: opt_after_cconv 1.48% : 0.000466s : 1: opt_after_jit_grad 0.59% : 0.000186s : 1: opt_b 12.90% : 0.004064s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.85% : 0.000267s : 1: renormalize.infer 0.72% : 0.000226s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.43% : 0.000136s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.81% : 0.006556s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.50% : 0.006457s : 1: type_inference 0.21% : 0.000065s : 1: validate TotalTime = 0.0181823, [24] [bootstrap]: 0.00047064 [type_inference]: 0.00436805 [event_method]: 1.035e-05 [auto_monad]: 5.029e-05 [graph_reusing]: 5.07e-06 [inline]: 2.16998e-06 [add_attr]: 0.00300241, [1] [add_attr_with_inline]: 0.00299389, [1] [Cycle 1]: 4.419e-05, [2] [tag_attr]: 1.201e-05 [meta_addattr_fg_expand]: 2.89999e-06 [parallel-infer-symbol]: 2.78998e-06 [pre_auto_parallel]: 2.123e-05 [insert-virtual-dataset]: 2.58998e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.71002e-06 [optimize]: 0.00369213, [53] [py_interpret_to_execute]: 1.516e-05 [rewriter_before_opt_a]: 3.802e-05 [opt_a]: 0.00188913, [2] [Cycle 1]: 0.00124209, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 2.295e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.00029178 [with_stream_mark]: 1.386e-05 [recompute_prepare]: 7.40998e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 7.576e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 5.69999e-06 [parallel]: 1.683e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.79002e-06 [allreduce_fusion]: 3.08998e-06 [matmul_add_comm_reduction]: 9.10001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 6.87002e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 8.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.48002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.05002e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 
2.17999e-06 [after_resolve]: 1.077e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00033974 [add_forward_monad_depend]: 4.36002e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.324e-05 [cse]: 2.57e-05 [a_3]: 3.966e-05 [Cycle 2]: 0.00063777, [45] [expand_dump_flag]: 9.90025e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012396 [with_stream_mark]: 1.106e-05 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 6.77e-05 [accelerated_algorithm]: 5.61998e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.22999e-06 [parallel]: 4.13999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 2.93998e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 4.96002e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.19999e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.81003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.17999e-06 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 7.77998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 5.92001e-06 [cse]: 1.263e-05 [a_3]: 8.085e-05 [py_interpret_to_execute_after_opt_a]: 7.71999e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.22e-05 [convert_after_rewriter]: 6.63998e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00045129 [opt_b]: 0.00018081, [1] [Cycle 1]: 
0.00017495, [7] [b_1]: 0.00010704 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 2.50002e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.543e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.238e-05 [loop_unroll]: 0.00041256 [opt_after_cconv]: 9.374e-05, [1] [Cycle 1]: 8.807e-05, [7] [c_1]: 2.762e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.61e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.233e-05 [tuple_transform]: 6.869e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.844e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.318e-05 [cse_after_recomputation]: 2.012e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.066e-05 [environ_conv]: 4.1e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 3.93999e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.03997e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 9.49978e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 8.2e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.91001e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.87001e-06 [overlap_recompute_comm]: 1.84998e-06 [overlap_grad_ring_attention]: 4.16001e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.824e-05, [1] [Cycle 1]: 6.413e-05, [6] [build]: 2.18002e-06 [elim_shapecalc]: 8.58001e-06 [elim_not_effective]: 1.109e-05 [opt_reshape]: 6.03002e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 1.526e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044583 [validate]: 3.084e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00584291 [execute]: 6.94001e-06 Sums bootstrap : 0.000471s : 3.31% type_inference : 0.004368s : 30.72% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000416s : 2.92% optimize.opt_a.with_stream_mark : 0.000025s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000121s : 0.85% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 3.17% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000413s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.14% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005843s : 41.09% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.69% : 0.000023s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.54% : 0.000005s : 4: substitution.graph_param_transform 64.82% : 0.000078s : 2: substitution.inline 2.48% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.75% : 0.000005s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004327 2 92.04% : 0.003983s : 1: type_inference.infer 7.96% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.52% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.50% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.23% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.08% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.97% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.68% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 43.30% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.70% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026186 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.48% : 0.003007s : 1: add_attr 11.45% : 0.002998s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000514s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000460s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 3.11% : 0.000813s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.23% : 0.001892s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000455s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.11% : 0.003696s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.56% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.35% : 0.005852s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.73% : 0.004382s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0195595, [24] [bootstrap]: 0.00048197 [type_inference]: 0.00548204 [event_method]: 1.444e-05 [auto_monad]: 5.351e-05 [graph_reusing]: 5.15999e-06 [inline]: 1.71e-06 [add_attr]: 0.00295972, [1] [add_attr_with_inline]: 0.00295176, [1] [Cycle 1]: 4.437e-05, [2] [tag_attr]: 1.473e-05 [meta_addattr_fg_expand]: 4.22998e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.562e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00391207, [53] [py_interpret_to_execute]: 1.893e-05 [rewriter_before_opt_a]: 5.634e-05 [opt_a]: 0.00209211, [2] [Cycle 1]: 0.00148197, [45] [expand_dump_flag]: 3.07002e-06 [switch_simplify]: 3.155e-05 [loop_unroll]: 2.049e-05 [a_1]: 0.00044582 [with_stream_mark]: 1.3e-05 [recompute_prepare]: 8.03999e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.533e-05 [accelerated_algorithm]: 6.39001e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 7.33e-06 [auto_parallel]: 6.41998e-06 [parallel]: 1.763e-05 [flash_sp]: 6.56e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 8.50001e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 6.79001e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.41002e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.01002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22002e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 2.59001e-06 
[after_resolve]: 9.89999e-06 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00040491 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.332e-05 [cse]: 2.732e-05 [a_3]: 4.025e-05 [Cycle 2]: 0.00060079, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012102 [with_stream_mark]: 9.45001e-06 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.23002e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.759e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.27998e-06 [auto_parallel]: 4.97e-06 [parallel]: 1.445e-05 [flash_sp]: 3.25e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.93e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.80002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.87998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.05002e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.67e-05 [a_3]: 3.188e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.086e-05 [convert_after_rewriter]: 7.28e-06 [order_py_execute_after_rewriter]: 4.79e-06 [mutable_eliminate]: 0.00045074 [opt_b]: 0.00018143, [1] [Cycle 1]: 0.00017514, [7] [b_1]: 
0.00010804 [b_2]: 7.47998e-06 [updatestate_depend_eliminate]: 5.57999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.05002e-06 [renormalize]: 3.69997e-07 [cse]: 1.61e-05 [optimize_parallel_all_gather_comm]: 1.538e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00041094 [opt_after_cconv]: 9.385e-05, [1] [Cycle 1]: 8.802e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.16003e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.526e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.295e-05 [tuple_transform]: 6.837e-05, [1] [Cycle 1]: 6.406e-05, [4] [d_1]: 3.823e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 2.17001e-06 [add_recomputation]: 4.233e-05 [cse_after_recomputation]: 1.949e-05, [1] [Cycle 1]: 1.516e-05, [1] [cse]: 1.011e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 4.88001e-06 [bias_add_comm_swap]: 2.77002e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.01e-06 [assign_add_opt]: 1.69e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.092e-05 [grouped_pairwise_exchange_alltoall]: 1.82999e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.709e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 6.737e-05, [1] [Cycle 1]: 6.323e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 5.77999e-06 [fold_const_symbol]: 8.94e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.544e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00045272 [validate]: 3.015e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00590748 [execute]: 7.06999e-06 Sums bootstrap : 0.000482s : 3.08% type_inference : 0.005482s : 35.03% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000056s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000567s : 3.62% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000032s : 0.21% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000405s : 2.59% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.88% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000411s : 2.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000453s : 2.89% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005907s : 37.75% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000161 30 14.91% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.65% : 0.000107s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005441 2 90.02% : 0.004898s : 1: type_inference.infer 9.98% : 0.000543s : 1: type_inference.specialize ------[replace.] 0.000039 5 68.96% : 0.000027s : 3: replace.inline 31.04% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.70% : 0.000105s : 3: match.inline 8.30% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000335 8 45.71% : 0.000153s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.29% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027923 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.62% : 0.002964s : 1: add_attr 10.58% : 0.002955s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.86% : 0.000519s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.65% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.33% : 0.000929s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.50% : 0.002095s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.66% : 0.000462s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 14.02% : 0.003916s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000205s : 1: renormalize.infer 0.69% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.19% : 0.005917s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.68% : 0.005495s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0378228, [24] [bootstrap]: 0.00054037 [type_inference]: 0.0112798 [event_method]: 4.605e-05 [auto_monad]: 0.00011981 [graph_reusing]: 8.03001e-06 [inline]: 1.97001e-06 [add_attr]: 0.00304111, [1] [add_attr_with_inline]: 0.00303235, [1] [Cycle 1]: 7.337e-05, [2] [tag_attr]: 3.57e-05 [meta_addattr_fg_expand]: 9.64e-06 [parallel-infer-symbol]: 2.91999e-06 [pre_auto_parallel]: 4.999e-05 [insert-virtual-dataset]: 2.25002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.0134983, [53] [py_interpret_to_execute]: 3.864e-05 [rewriter_before_opt_a]: 0.00014696 [opt_a]: 0.0111835, [3] [Cycle 1]: 0.00716937, [45] [expand_dump_flag]: 3.68e-06 [switch_simplify]: 7.358e-05 [loop_unroll]: 6.165e-05 [a_1]: 0.00147337 [with_stream_mark]: 2.295e-05 [recompute_prepare]: 2.143e-05 [updatestate_depend_eliminate]: 9.27999e-06 [updatestate_assign_eliminate]: 8.27e-06 [updatestate_loads_eliminate]: 7.6e-06 [parameter_eliminate]: 2.98e-06 [a_2]: 0.00024304 [accelerated_algorithm]: 3.106e-05 [shard]: 2.02999e-06 [meta_shard_fg_expand]: 3.31999e-06 [shard_inline]: 1.617e-05 [merge_send_recv]: 1.548e-05 [auto_parallel]: 1.122e-05 [parallel]: 1.738e-05 [flash_sp]: 1.166e-05 [merge_comm]: 1.017e-05 [allreduce_fusion]: 9.04e-06 [matmul_add_comm_reduction]: 2.689e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.771e-05 [virtual_dataset]: 1.549e-05 [get_grad_eliminate_]: 1.534e-05 [virtual_output]: 1.517e-05 [merge_forward]: 9.41998e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.79e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.883e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.816e-05 [set_forward_comm_id_for_comm_node_pass]: 9.42999e-06 [meta_fg_expand]: 0.00140769 [flash_sp_send_recv_attached]: 3.78001e-06 [receive_attached]: 3.13e-06 [after_resolve]: 
6.051e-05 [a_after_grad]: 8.16e-05 [renormalize]: 0.00252249 [add_forward_monad_depend]: 9.50001e-06 [auto_monad_grad]: 5.92999e-06 [auto_monad_eliminator]: 5.581e-05 [cse]: 0.00016225 [a_3]: 0.00033528 [Cycle 2]: 0.00309328, [45] [expand_dump_flag]: 1.96e-06 [switch_simplify]: 4.677e-05 [loop_unroll]: 4.353e-05 [a_1]: 0.00153488 [with_stream_mark]: 1.23e-05 [recompute_prepare]: 1.083e-05 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00015075 [accelerated_algorithm]: 1.278e-05 [shard]: 1.52999e-06 [meta_shard_fg_expand]: 2.07999e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 6.76999e-06 [auto_parallel]: 7.3e-06 [parallel]: 5.01997e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 4.52998e-06 [matmul_add_comm_reduction]: 7.9e-06 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 1.022e-05 [virtual_dataset]: 9.15999e-06 [get_grad_eliminate_]: 9.40001e-06 [virtual_output]: 8.35999e-06 [merge_forward]: 4.38999e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.625e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.436e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40001e-06 [meta_fg_expand]: 7.518e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.34e-06 [after_resolve]: 1.605e-05 [a_after_grad]: 1.435e-05 [renormalize]: 0.000644 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.45999e-06 [auto_monad_eliminator]: 1.452e-05 [cse]: 4.805e-05 [a_3]: 6.536e-05 [Cycle 3]: 0.00090609, [45] [expand_dump_flag]: 1.20999e-06 [switch_simplify]: 1.051e-05 [loop_unroll]: 8.65001e-06 [a_1]: 0.00025141 [with_stream_mark]: 9.91e-06 [recompute_prepare]: 9.61e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 3.81001e-06 
[parameter_eliminate]: 9.50007e-07 [a_2]: 0.00012313 [accelerated_algorithm]: 1.174e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.89e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 7.11001e-06 [auto_parallel]: 7.38e-06 [parallel]: 5.10999e-06 [flash_sp]: 1.05999e-06 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 4.92e-06 [matmul_add_comm_reduction]: 7.6e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 9.89001e-06 [virtual_dataset]: 8.62e-06 [get_grad_eliminate_]: 8.44998e-06 [virtual_output]: 8.24002e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 8.56002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.599e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.366e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.08998e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.349e-05 [a_after_grad]: 1.396e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 1.134e-05 [cse]: 2.669e-05 [a_3]: 6.015e-05 [py_interpret_to_execute_after_opt_a]: 1.097e-05 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 4.615e-05 [convert_after_rewriter]: 8.99e-06 [order_py_execute_after_rewriter]: 6.79999e-06 [mutable_eliminate]: 0.00048 [opt_b]: 0.00028737, [1] [Cycle 1]: 0.00028074, [7] [b_1]: 0.0001887 [b_2]: 1.051e-05 [updatestate_depend_eliminate]: 7.08e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.98001e-06 [renormalize]: 4.50003e-07 [cse]: 3.158e-05 [optimize_parallel_all_gather_comm]: 2.071e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.04e-05 [loop_unroll]: 0.00042464 [opt_after_cconv]: 0.00013625, [1] [Cycle 1]: 0.00013014, [7] [c_1]: 4.852e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 7.06999e-06 [updatestate_assign_eliminate]: 4.06001e-06 [updatestate_loads_eliminate]: 
4.02998e-06 [cse]: 2.963e-05 [renormalize]: 6.69999e-07 [remove_dup_value]: 3.022e-05 [tuple_transform]: 0.00010044, [1] [Cycle 1]: 9.552e-05, [4] [d_1]: 6.589e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.67001e-06 [partial_unused_args_eliminate]: 2.12999e-06 [add_recomputation]: 5.723e-05 [cse_after_recomputation]: 3.089e-05, [1] [Cycle 1]: 2.627e-05, [1] [cse]: 2.114e-05 [environ_conv]: 3.47e-05 [swap_dp_allreduce_reducescatter]: 7.71001e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.65997e-06 [assign_add_opt]: 1.35999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.23002e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.714e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.75001e-06 [overlap_recompute_and_grad_model_parallel]: 6.10002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 5.32999e-06 [overlap_grad_flash_sp]: 2.491e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.13002e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 9.847e-05, [1] [Cycle 1]: 9.435e-05, [6] [build]: 9.66998e-06 [elim_shapecalc]: 1.362e-05 [elim_not_effective]: 1.811e-05 [opt_reshape]: 9.98002e-06 [fold_const_symbol]: 1.464e-05 [renormalize]: 2.09984e-07 [detach_backward]: 
1.84e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 2.443e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00047038 [validate]: 4.45e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.00847016 [execute]: 7.51999e-06 Sums bootstrap : 0.000540s : 1.61% type_inference : 0.011280s : 33.64% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000147s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.39% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003260s : 9.72% optimize.opt_a.with_stream_mark : 0.000045s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000517s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001486s : 4.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003167s : 9.44% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.24% optimize.opt_a.cse : 0.000237s : 0.71% optimize.opt_a.a_3 : 0.000461s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000480s : 1.43% optimize.opt_b.b_1 : 0.000189s : 0.56% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000425s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000035s : 0.10% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.40% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008470s : 25.26% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000763 222 6.04% : 0.000046s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000008s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.65% : 0.000425s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.73% : 0.000013s : 20: substitution.remove_not_recompute_node 3.10% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.62% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.44% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011206 2 86.93% : 0.009741s : 1: type_inference.infer 13.07% : 0.001465s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.67% : 0.000127s : 17: replace.inline 42.33% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.52% : 0.000416s : 17: match.inline 7.48% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.12% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.24% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 
1.61% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.17% : 0.000009s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.50% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001576 34 57.10% : 0.000900s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.90% : 0.000676s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062799 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.85% : 0.003045s : 1: add_attr 4.83% : 0.003036s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000127s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.90% : 0.000568s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.05% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.06% : 0.000038s : 1: environ_conv 0.08% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000489s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.87% : 0.004945s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000073s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.81% : 0.011187s : 1: opt_a 0.22% : 0.000140s : 1: opt_after_cconv 0.76% : 0.000480s : 1: opt_after_jit_grad 0.46% : 0.000291s : 1: opt_b 21.50% : 0.013502s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.79% : 0.001754s : 2: renormalize.infer 2.23% : 0.001399s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000151s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000102s : 1: symbol_engine_optimizer 13.50% : 0.008481s : 1: task_emit 0.16% : 0.000103s : 1: 
tuple_transform 17.99% : 0.011295s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0187371, [24] [bootstrap]: 0.0004845 [type_inference]: 0.00438779 [event_method]: 1.032e-05 [auto_monad]: 4.984e-05 [graph_reusing]: 4.93001e-06 [inline]: 1.81003e-06 [add_attr]: 0.00295326, [1] [add_attr_with_inline]: 0.00294572, [1] [Cycle 1]: 4.499e-05, [2] [tag_attr]: 1.235e-05 [meta_addattr_fg_expand]: 3.24001e-06 [parallel-infer-symbol]: 3.13998e-06 [pre_auto_parallel]: 2.117e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00368302, [53] [py_interpret_to_execute]: 1.565e-05 [rewriter_before_opt_a]: 3.803e-05 [opt_a]: 0.0018861, [2] [Cycle 1]: 0.00128859, [45] [expand_dump_flag]: 2.36e-06 [switch_simplify]: 2.355e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00029095 [with_stream_mark]: 1.289e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.711e-05 [accelerated_algorithm]: 6.19999e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.88002e-06 [merge_send_recv]: 7.82002e-06 [auto_parallel]: 5.59998e-06 [parallel]: 1.669e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 4.07e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 9.66998e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 7.82998e-06 [virtual_dataset]: 6.05002e-06 [get_grad_eliminate_]: 5.77001e-06 [virtual_output]: 5.59e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.96002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.13e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 1.029e-05 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.49001e-06 [receive_attached]: 2.68003e-06 
[after_resolve]: 1.104e-05 [a_after_grad]: 9.19e-06 [renormalize]: 0.00035015 [add_forward_monad_depend]: 4.68999e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.292e-05 [cse]: 2.675e-05 [a_3]: 3.919e-05 [Cycle 2]: 0.00058834, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.46002e-06 [a_1]: 0.00012573 [with_stream_mark]: 9.18002e-06 [recompute_prepare]: 5.54998e-06 [updatestate_depend_eliminate]: 2.90998e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 6.765e-05 [accelerated_algorithm]: 5.67999e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.51e-06 [parallel]: 4.40999e-06 [flash_sp]: 3.02002e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.56e-06 [matmul_add_comm_reduction]: 5.29e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 5.94e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 4.82e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 3.12002e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 8.80001e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 5.84e-06 [cse]: 1.26e-05 [a_3]: 3.138e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.095e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 5.25999e-06 [mutable_eliminate]: 0.00044531 [opt_b]: 0.00018191, [1] [Cycle 1]: 0.00017611, [7] [b_1]: 
0.00010962 [b_2]: 6.91999e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.80009e-07 [cse]: 1.561e-05 [optimize_parallel_all_gather_comm]: 1.669e-05 [overlap_param_gather]: 2.03002e-06 [cconv]: 2.214e-05 [loop_unroll]: 0.00041048 [opt_after_cconv]: 9.419e-05, [1] [Cycle 1]: 8.85e-05, [7] [c_1]: 2.762e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.479e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.28e-05 [tuple_transform]: 6.904e-05, [1] [Cycle 1]: 6.486e-05, [4] [d_1]: 3.96e-05 [none_parameter_eliminate]: 1.35999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.187e-05 [cse_after_recomputation]: 1.981e-05, [1] [Cycle 1]: 1.547e-05, [1] [cse]: 1.032e-05 [environ_conv]: 4.80001e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 2.30002e-06 [label_micro_interleaved_index]: 3.88001e-06 [label_fine_grained_interleaved_index]: 2.37999e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.21e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.168e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.88999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.732e-05 [begin_end_overlap_inline]: 8.09989e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 6.922e-05, [1] [Cycle 1]: 6.514e-05, [6] [build]: 2.55002e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.178e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 9.07999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.645e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.76999e-06 [opt_after_jit_grad]: 0.00044321 [validate]: 3.104e-05 [backend_pass]: 1.14998e-06 [task_emit]: 0.00643612 [execute]: 7.23e-06 Sums bootstrap : 0.000484s : 3.27% type_inference : 0.004388s : 29.63% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.20% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.81% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000350s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.48% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000445s : 3.01% optimize.opt_b.b_1 : 0.000110s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000410s : 2.77% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000443s : 2.99% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006436s : 43.46% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 17.72% : 0.000021s : 4: substitution.arithmetic_simplify 1.59% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.44% : 0.000005s : 4: substitution.graph_param_transform 65.44% : 0.000079s : 2: substitution.inline 2.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 3.35% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004347 2 90.68% : 0.003942s : 1: type_inference.infer 9.32% : 0.000405s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000134 984 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.43% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.52% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.85% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 1.05% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.09% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000294 6 34.94% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 65.06% : 0.000192s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026652 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.10% : 0.002957s : 1: add_attr 11.06% : 0.002949s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000514s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.71% : 0.000454s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.88% : 0.000769s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001889s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.70% : 0.000453s : 1: opt_after_jit_grad 0.70% : 0.000185s : 1: opt_b 13.83% : 0.003687s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000194s : 1: renormalize.infer 0.56% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 24.19% : 0.006446s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.52% : 0.004402s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0386936, [24] [bootstrap]: 0.00052455 [type_inference]: 0.010272 [event_method]: 4.24e-05 [auto_monad]: 0.00011655 [graph_reusing]: 7.98999e-06 [inline]: 1.67999e-06 [add_attr]: 0.00313587, [1] [add_attr_with_inline]: 0.00312648, [1] [Cycle 1]: 7.269e-05, [2] [tag_attr]: 3.267e-05 [meta_addattr_fg_expand]: 8.77e-06 [parallel-infer-symbol]: 3.71999e-06 [pre_auto_parallel]: 4.924e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0146603, [53] [py_interpret_to_execute]: 4.242e-05 [rewriter_before_opt_a]: 0.00013233 [opt_a]: 0.011979, [3] [Cycle 1]: 0.00762858, [45] [expand_dump_flag]: 4.68999e-06 [switch_simplify]: 6.706e-05 [loop_unroll]: 5.522e-05 [a_1]: 0.0014338 [with_stream_mark]: 2.573e-05 [recompute_prepare]: 2.352e-05 [updatestate_depend_eliminate]: 9.58002e-06 [updatestate_assign_eliminate]: 7.66001e-06 [updatestate_loads_eliminate]: 7.38999e-06 [parameter_eliminate]: 2.92002e-06 [a_2]: 0.00024403 [accelerated_algorithm]: 3.116e-05 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 3.46001e-06 [shard_inline]: 1.589e-05 [merge_send_recv]: 1.686e-05 [auto_parallel]: 1.25e-05 [parallel]: 1.815e-05 [flash_sp]: 1.182e-05 [merge_comm]: 9.84001e-06 [allreduce_fusion]: 8.70001e-06 [matmul_add_comm_reduction]: 2.98e-05 [allreduce_slice_to_reducescatter]: 9.20001e-07 [virtual_shard_identity]: 1.831e-05 [virtual_dataset]: 1.529e-05 [get_grad_eliminate_]: 1.519e-05 [virtual_output]: 1.525e-05 [merge_forward]: 9.72001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.802e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.899e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.734e-05 [set_forward_comm_id_for_comm_node_pass]: 9.72999e-06 [meta_fg_expand]: 0.00153424 [flash_sp_send_recv_attached]: 3.92002e-06 [receive_attached]: 2.58998e-06 
[after_resolve]: 6.39e-05 [a_after_grad]: 8.336e-05 [renormalize]: 0.00283041 [add_forward_monad_depend]: 1.321e-05 [auto_monad_grad]: 6.44999e-06 [auto_monad_eliminator]: 5.908e-05 [cse]: 0.00017279 [a_3]: 0.00035027 [Cycle 2]: 0.00341515, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 4.852e-05 [loop_unroll]: 4.353e-05 [a_1]: 0.00164841 [with_stream_mark]: 1.787e-05 [recompute_prepare]: 1.244e-05 [updatestate_depend_eliminate]: 6.33e-06 [updatestate_assign_eliminate]: 5.20999e-06 [updatestate_loads_eliminate]: 4.31002e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 0.00012835 [accelerated_algorithm]: 1.343e-05 [shard]: 2.43e-06 [meta_shard_fg_expand]: 2.81e-06 [shard_inline]: 9.10999e-06 [merge_send_recv]: 9.69999e-06 [auto_parallel]: 1.223e-05 [parallel]: 9.27999e-06 [flash_sp]: 3.81999e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 5.05001e-06 [matmul_add_comm_reduction]: 9.99001e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.116e-05 [virtual_dataset]: 9.12001e-06 [get_grad_eliminate_]: 1.034e-05 [virtual_output]: 8.79e-06 [merge_forward]: 6.17001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.214e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.721e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 1.477e-05 [set_forward_comm_id_for_comm_node_pass]: 5.77001e-06 [meta_fg_expand]: 5.056e-05 [flash_sp_send_recv_attached]: 1.37e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 1.723e-05 [a_after_grad]: 1.464e-05 [renormalize]: 0.00083506 [add_forward_monad_depend]: 5.20999e-06 [auto_monad_grad]: 2.11e-06 [auto_monad_eliminator]: 1.649e-05 [cse]: 5.21e-05 [a_3]: 6.756e-05 [Cycle 3]: 0.00091503, [45] [expand_dump_flag]: 1.22999e-06 [switch_simplify]: 1.075e-05 [loop_unroll]: 8.82e-06 [a_1]: 0.0002547 [with_stream_mark]: 1.105e-05 [recompute_prepare]: 9.51998e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 3.81999e-06 
[parameter_eliminate]: 1.12999e-06 [a_2]: 0.00012306 [accelerated_algorithm]: 1.222e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 2.04e-06 [shard_inline]: 8.97e-06 [merge_send_recv]: 7.46999e-06 [auto_parallel]: 7.78001e-06 [parallel]: 5.17e-06 [flash_sp]: 1.08001e-06 [merge_comm]: 4.82e-06 [allreduce_fusion]: 5.46e-06 [matmul_add_comm_reduction]: 8e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.87999e-06 [virtual_dataset]: 9.12999e-06 [get_grad_eliminate_]: 8.38001e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.82e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 9.37999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.609e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.451e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39998e-06 [meta_fg_expand]: 3.12002e-06 [flash_sp_send_recv_attached]: 1.27e-06 [receive_attached]: 1.30001e-06 [after_resolve]: 1.42e-05 [a_after_grad]: 1.472e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.90024e-07 [auto_monad_eliminator]: 1.155e-05 [cse]: 2.562e-05 [a_3]: 5.746e-05 [py_interpret_to_execute_after_opt_a]: 1.68e-05 [slice_cell_reuse_recomputed_activation]: 2.78998e-06 [rewriter_after_opt_a]: 5.492e-05 [convert_after_rewriter]: 9.39e-06 [order_py_execute_after_rewriter]: 6.89999e-06 [mutable_eliminate]: 0.00073823 [opt_b]: 0.00033103, [1] [Cycle 1]: 0.00032293, [7] [b_1]: 0.00022376 [b_2]: 1.151e-05 [updatestate_depend_eliminate]: 7.68001e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 4.32e-06 [renormalize]: 3.00002e-07 [cse]: 3.443e-05 [optimize_parallel_all_gather_comm]: 2.227e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.783e-05 [loop_unroll]: 0.00046129 [opt_after_cconv]: 0.00013872, [1] [Cycle 1]: 0.00013246, [7] [c_1]: 4.909e-05 [parameter_eliminate]: 3.24001e-06 [updatestate_depend_eliminate]: 7.13998e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 
3.89002e-06 [cse]: 3.061e-05 [renormalize]: 4.70027e-07 [remove_dup_value]: 3.88e-05 [tuple_transform]: 0.00010372, [1] [Cycle 1]: 9.88e-05, [4] [d_1]: 6.822e-05 [none_parameter_eliminate]: 1.94e-06 [renormalize]: 2.79979e-07 [switch_simplify]: 1.022e-05 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 6.278e-05 [cse_after_recomputation]: 3.23e-05, [1] [Cycle 1]: 2.759e-05, [1] [cse]: 2.168e-05 [environ_conv]: 9.87999e-06 [swap_dp_allreduce_reducescatter]: 8.03999e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.49998e-06 [label_fine_grained_interleaved_index]: 2.46998e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.46998e-06 [micro_interleaved_order_control]: 2.37001e-06 [assign_add_opt]: 1.29003e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.45002e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.795e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 5.20999e-06 [overlap_recompute_and_grad_model_parallel]: 6.27001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.39999e-06 [overlap_grad_ring_attention]: 5.59998e-06 [overlap_grad_flash_sp]: 2.818e-05 [begin_end_overlap_inline]: 6.10016e-07 [split_matmul_comm_elemetwise]: 2.49001e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 0.00010313, [1] [Cycle 1]: 9.868e-05, [6] [build]: 1.091e-05 [elim_shapecalc]: 1.415e-05 [elim_not_effective]: 1.852e-05 [opt_reshape]: 1.018e-05 [fold_const_symbol]: 1.556e-05 [renormalize]: 1.70025e-07 [detach_backward]: 
2.36998e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.535e-05 [get_jit_bprop_graph]: 2.09999e-06 [rewriter_after_jit_bprop_graph]: 4.89998e-06 [opt_after_jit_grad]: 0.00052428 [validate]: 5.585e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.009012 [execute]: 8.91997e-06 Sums bootstrap : 0.000525s : 1.53% type_inference : 0.010272s : 30.02% event_method : 0.000042s : 0.12% auto_monad : 0.000117s : 0.34% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.12% optimize.rewriter_before_opt_a : 0.000132s : 0.39% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000126s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.31% optimize.opt_a.a_1 : 0.003337s : 9.75% optimize.opt_a.with_stream_mark : 0.000055s : 0.16% optimize.opt_a.recompute_prepare : 0.000045s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000006s : 0.02% optimize.opt_a.a_2 : 0.000495s : 1.45% optimize.opt_a.accelerated_algorithm : 0.000057s : 0.17% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000034s : 0.10% optimize.opt_a.auto_parallel : 0.000033s : 0.10% optimize.opt_a.parallel : 0.000033s : 0.10% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.09% optimize.opt_a.merge_forward : 0.000021s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000040s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001588s : 4.64% optimize.opt_a.flash_sp_send_recv_attached : 0.000007s : 0.02% optimize.opt_a.receive_attached : 0.000006s : 0.02% optimize.opt_a.after_resolve : 0.000095s : 0.28% optimize.opt_a.a_after_grad : 0.000113s : 0.33% optimize.opt_a.renormalize : 0.003666s : 10.71% optimize.opt_a.add_forward_monad_depend : 0.000020s : 0.06% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000087s : 0.25% optimize.opt_a.cse : 0.000251s : 0.73% optimize.opt_a.a_3 : 0.000475s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000017s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000055s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000738s : 2.16% optimize.opt_b.b_1 : 0.000224s : 0.65% optimize.opt_b.b_2 : 0.000012s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.08% optimize.loop_unroll : 0.000461s : 1.35% optimize.opt_after_cconv.c_1 : 0.000049s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000039s : 0.11% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000063s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000010s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000028s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000005s : 0.01% opt_after_jit_grad : 0.000524s : 1.53% validate : 0.000056s : 0.16% backend_pass : 0.000001s : 0.00% task_emit : 0.009012s : 26.34% execute : 0.000009s : 0.03% Time group info: ------[substitution.] 
0.000924 218 5.90% : 0.000055s : 11: substitution.arithmetic_simplify 1.58% : 0.000015s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000005s : 5: substitution.float_depend_g_call 0.47% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.24% : 0.000002s : 5: substitution.fold_const_symbol 0.85% : 0.000008s : 8: substitution.graph_param_transform 0.31% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 55.00% : 0.000508s : 16: substitution.inline 1.91% : 0.000018s : 2: substitution.inline_without_move 1.16% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.76% : 0.000016s : 3: substitution.less_batch_normalization 1.56% : 0.000014s : 11: substitution.minmaximum_grad 0.67% : 0.000006s : 5: substitution.partial_eliminate 1.51% : 0.000014s : 20: substitution.remove_not_recompute_node 3.16% : 0.000029s : 10: substitution.replace_applicator 1.39% : 0.000013s : 15: substitution.replace_old_param 0.31% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.31% : 0.000031s : 11: substitution.tuple_list_convert_item_index_to_positive 6.42% : 0.000059s : 11: substitution.tuple_list_get_item_const_eliminator 2.20% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 7.18% : 0.000066s : 28: substitution.tuple_list_get_item_eliminator 2.08% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010200 2 87.27% : 0.008902s : 1: type_inference.infer 12.73% : 0.001298s : 1: type_inference.specialize ------[replace.] 0.000220 30 59.80% : 0.000132s : 16: replace.inline 40.20% : 0.000089s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000530 30 94.23% : 0.000500s : 16: match.inline 5.77% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.33% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.15% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.52% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.80% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.19% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.52% : 0.000041s : 244: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 164: predicate.load_eliminater 0.40% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.41% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.66% : 0.000012s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.58% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.20% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.38% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.30% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.24% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000037s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.41% : 0.000010s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.61% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001572 32 57.24% : 0.000900s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.76% : 0.000672s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065528 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.79% : 0.003141s : 1: add_attr 4.78% : 0.003131s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000067s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000123s : 1: auto_monad 0.04% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.84% : 0.000552s : 1: bootstrap 0.05% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.05% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.72% : 0.000470s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.14% : 0.000748s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000021s : 1: opt.transform.mutable_eliminate 7.65% : 0.005012s : 117: opt.transform.opt_a 0.07% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000038s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000207s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000055s : 4: opt.transform.symbol_engine_opt 18.29% : 0.011982s : 1: opt_a 0.22% : 0.000142s : 1: opt_after_cconv 0.82% : 0.000536s : 1: opt_after_jit_grad 0.51% : 0.000335s : 1: opt_b 22.38% : 0.014665s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000031s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000047s : 1: py_interpret_to_execute 0.03% : 0.000021s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.07% : 0.000043s : 1: remove_dup_value 3.15% : 0.002061s : 2: renormalize.infer 2.42% : 0.001589s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000060s : 1: rewriter_after_opt_a 0.21% : 0.000138s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000106s : 1: symbol_engine_optimizer 13.78% : 0.009031s : 1: task_emit 0.16% : 0.000107s : 1: 
tuple_transform 15.70% : 0.010290s : 1: type_inference 0.15% : 0.000099s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x3-kbk],max_mem:50.0M TotalTime = 0.0885999, [24] [bootstrap]: 0.00057045 [type_inference]: 0.00608187 [event_method]: 1.374e-05 [auto_monad]: 5.722e-05 [graph_reusing]: 6.23e-06 [inline]: 1.86003e-06 [add_attr]: 0.00351529, [1] [add_attr_with_inline]: 0.00350387, [1] [Cycle 1]: 4.56e-05, [2] [tag_attr]: 1.553e-05 [meta_addattr_fg_expand]: 4.25e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 2.899e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00401316, [53] [py_interpret_to_execute]: 1.98e-05 [rewriter_before_opt_a]: 5.906e-05 [opt_a]: 0.00217338, [2] [Cycle 1]: 0.00158238, [45] [expand_dump_flag]: 3.23998e-06 [switch_simplify]: 3.243e-05 [loop_unroll]: 2.094e-05 [a_1]: 0.00045397 [with_stream_mark]: 1.422e-05 [recompute_prepare]: 8.36002e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.36999e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.509e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 7.87998e-06 [auto_parallel]: 5.82999e-06 [parallel]: 2.289e-05 [flash_sp]: 7.21001e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.03002e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.84999e-06 [virtual_output]: 5.44e-06 [merge_forward]: 3.52002e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.108e-05 
[merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.44999e-06 [after_resolve]: 9.87001e-06 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00043513 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.374e-05 [cse]: 7.309e-05 [a_3]: 4.113e-05 [Cycle 2]: 0.00058162, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012413 [with_stream_mark]: 9.62001e-06 [recompute_prepare]: 5.57999e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.43002e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.716e-05 [accelerated_algorithm]: 5.35999e-06 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.2e-06 [flash_sp]: 3.08e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.89999e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.60001e-06 [get_grad_eliminate_]: 4.92999e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.60002e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.79002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.50999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.17e-06 [after_resolve]: 9.27999e-06 [a_after_grad]: 7.78999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.66e-06 [cse]: 1.166e-05 [a_3]: 3.132e-05 [py_interpret_to_execute_after_opt_a]: 7.60998e-06 
[slice_cell_reuse_recomputed_activation]: 2.18998e-06 [rewriter_after_opt_a]: 3.056e-05 [convert_after_rewriter]: 7.31001e-06 [order_py_execute_after_rewriter]: 5.52001e-06 [mutable_eliminate]: 0.00045355 [opt_b]: 0.00018326, [1] [Cycle 1]: 0.00017714, [7] [b_1]: 0.00010601 [b_2]: 1.143e-05 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.39992e-07 [cse]: 1.594e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.91998e-06 [cconv]: 2.144e-05 [loop_unroll]: 0.00041397 [opt_after_cconv]: 9.255e-05, [1] [Cycle 1]: 8.682e-05, [7] [c_1]: 2.72e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.487e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.183e-05 [tuple_transform]: 6.891e-05, [1] [Cycle 1]: 6.466e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.15997e-06 [partial_unused_args_eliminate]: 1.64998e-06 [add_recomputation]: 4.973e-05 [cse_after_recomputation]: 1.971e-05, [1] [Cycle 1]: 1.547e-05, [1] [cse]: 1.038e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.51998e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 1.01002e-06 [remove_cast_before_assign_add]: 1.45999e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.145e-05 [grouped_pairwise_exchange_alltoall]: 1.51998e-06 [offloading_packed_experts]: 3.89002e-06 [overlap_recompute_and_grad_model_parallel]: 4.27e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 3.91001e-06 [overlap_grad_flash_sp]: 1.69e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.96998e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.753e-05, [1] [Cycle 1]: 6.352e-05, [6] [build]: 2.58e-06 [elim_shapecalc]: 8.65001e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 5.84e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.492e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.00045035 [validate]: 3.113e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.0735799 [execute]: 8.38001e-06 Sums bootstrap : 0.000570s : 0.68% type_inference : 0.006082s : 7.23% event_method : 0.000014s : 0.02% auto_monad : 0.000057s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 
0.000578s : 0.69% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000435s : 0.52% 
optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000085s : 0.10% optimize.opt_a.a_3 : 0.000072s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.54% optimize.opt_b.b_1 : 0.000106s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000414s : 0.49% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.06% optimize.cse_after_recomputation.cse 
: 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.54% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.073580s : 87.47% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.45% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.30% : 0.000005s : 4: substitution.graph_param_transform 66.90% : 0.000110s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.78% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006034 2 90.63% : 0.005469s : 1: type_inference.infer 9.37% : 0.000566s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.12% : 0.000029s : 3: replace.inline 28.88% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.45% : 0.000108s : 3: match.inline 8.55% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000010s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.51% : 0.000002s : 17: predicate.partial_eliminate 0.91% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.61% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.92% : 0.000001s : 11: predicate.transpose_eliminate 1.47% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 44.95% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.05% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.097656 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.60% : 0.003519s : 1: add_attr 3.59% : 0.003508s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000062s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000608s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.43% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.47% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.96% : 0.000941s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000093s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.23% : 0.002176s : 1: opt_a 0.10% : 0.000096s : 1: opt_after_cconv 0.47% : 0.000460s : 1: opt_after_jit_grad 0.19% : 0.000187s : 1: opt_b 4.11% : 0.004017s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000034s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000231s : 1: renormalize.infer 0.20% : 0.000198s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000070s : 1: symbol_engine_optimizer 75.37% : 0.073601s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 6.24% : 0.006095s : 1: type_inference 0.05% : 0.000053s : 1: validate TotalTime = 0.0749821, [24] [bootstrap]: 0.00046815 [type_inference]: 0.00442731 [event_method]: 1.054e-05 [auto_monad]: 5.139e-05 [graph_reusing]: 5.21998e-06 [inline]: 2.18002e-06 [add_attr]: 0.00473021, [1] [add_attr_with_inline]: 0.00472262, [1] [Cycle 1]: 4.124e-05, [2] [tag_attr]: 1.241e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 2.69001e-06 [pre_auto_parallel]: 2.228e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00371555, [53] [py_interpret_to_execute]: 1.441e-05 [rewriter_before_opt_a]: 3.935e-05 [opt_a]: 0.00186256, [2] [Cycle 1]: 0.00126692, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 2.388e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00028925 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 3.05998e-06 [parameter_eliminate]: 1.54998e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.53e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.08999e-06 [auto_parallel]: 5.91e-06 [parallel]: 1.756e-05 [flash_sp]: 7.48999e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 8.66002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 9.07999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32997e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.46e-06 
[after_resolve]: 1.019e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.0003638 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 2.637e-05 [a_3]: 3.956e-05 [Cycle 2]: 0.00058622, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012372 [with_stream_mark]: 1.048e-05 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.09999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.708e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.03001e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.10001e-06 [parallel]: 4.22e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.60002e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 5.89999e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.60001e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.01998e-06 [a_after_grad]: 7.82e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 6.00002e-06 [cse]: 1.348e-05 [a_3]: 3.164e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.034e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 4.72998e-06 [mutable_eliminate]: 0.0004488 [opt_b]: 0.00018026, [1] [Cycle 1]: 0.0001744, [7] [b_1]: 
0.00010684 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 2.19996e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.493e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.226e-05 [loop_unroll]: 0.00041655 [opt_after_cconv]: 0.00013976, [1] [Cycle 1]: 0.00013337, [7] [c_1]: 2.825e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 4.99003e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.09e-06 [cse]: 1.64e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.259e-05 [tuple_transform]: 7.067e-05, [1] [Cycle 1]: 6.618e-05, [4] [d_1]: 3.951e-05 [none_parameter_eliminate]: 1.70001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.38998e-06 [partial_unused_args_eliminate]: 1.56998e-06 [add_recomputation]: 4.435e-05 [cse_after_recomputation]: 2.116e-05, [1] [Cycle 1]: 1.665e-05, [1] [cse]: 1.157e-05 [environ_conv]: 5.24e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.18002e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.13002e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.25001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.23999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.37e-06 [overlap_grad_flash_sp]: 1.695e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.795e-05, [1] [Cycle 1]: 6.387e-05, [6] [build]: 2.10002e-06 [elim_shapecalc]: 8.13001e-06 [elim_not_effective]: 1.19e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.551e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00044754 [validate]: 3.257e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0608226 [execute]: 9.19e-06 Sums bootstrap : 0.000468s : 0.68% type_inference : 0.004427s : 6.39% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000413s : 0.60% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000364s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.65% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000417s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.65% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060823s : 87.84% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.44% : 0.000022s : 4: substitution.arithmetic_simplify 1.58% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000005s : 4: substitution.graph_param_transform 65.60% : 0.000079s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.87% : 0.000005s : 4: substitution.remove_not_recompute_node 3.04% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004386 2 91.94% : 0.004033s : 1: type_inference.infer 8.06% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.72% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.03% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 1.03% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.15% : 0.000002s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.92% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.88% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.17% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.19% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.31% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.82% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 6 44.84% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.16% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084707 196 0.00% : 0.000004s : 1: ForceFp32Comm 5.59% : 0.004734s : 1: add_attr 5.58% : 0.004726s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.59% : 0.000503s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.90% : 0.000761s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.20% : 0.001866s : 1: opt_a 0.17% : 0.000143s : 1: opt_after_cconv 0.54% : 0.000457s : 1: opt_after_jit_grad 0.22% : 0.000184s : 1: opt_b 4.39% : 0.003719s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000202s : 1: renormalize.infer 0.18% : 0.000155s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 71.83% : 0.060841s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 5.24% : 0.004441s : 1: type_inference 0.07% : 0.000056s : 1: validate TotalTime = 0.0741024, [24] [bootstrap]: 0.000471 [type_inference]: 0.00546387 [event_method]: 1.399e-05 [auto_monad]: 5.39e-05 [graph_reusing]: 5.56e-06 [inline]: 1.91e-06 [add_attr]: 0.0029431, [1] [add_attr_with_inline]: 0.00293542, [1] [Cycle 1]: 4.448e-05, [2] [tag_attr]: 1.524e-05 [meta_addattr_fg_expand]: 3.93001e-06 [parallel-infer-symbol]: 2.62001e-06 [pre_auto_parallel]: 2.466e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0039677, [53] [py_interpret_to_execute]: 2.094e-05 [rewriter_before_opt_a]: 6.023e-05 [opt_a]: 0.0021098, [2] [Cycle 1]: 0.00150955, [45] [expand_dump_flag]: 2.93998e-06 [switch_simplify]: 3.137e-05 [loop_unroll]: 2.158e-05 [a_1]: 0.00044581 [with_stream_mark]: 1.3e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.45998e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.677e-05 [accelerated_algorithm]: 6.49999e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.3e-06 [auto_parallel]: 6.02999e-06 [parallel]: 1.788e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 3.43999e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 8.75001e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.18998e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.21998e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 8.66002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.091e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.7e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.98998e-06 [receive_attached]: 2.32001e-06 
[after_resolve]: 1.012e-05 [a_after_grad]: 9.13002e-06 [renormalize]: 0.0004244 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.336e-05 [cse]: 2.663e-05 [a_3]: 4.094e-05 [Cycle 2]: 0.000591, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.60002e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00011964 [with_stream_mark]: 9.62999e-06 [recompute_prepare]: 5.49e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 7.10017e-07 [a_2]: 6.714e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.48001e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.03e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 4.45e-06 [matmul_add_comm_reduction]: 4.81002e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.02999e-06 [virtual_dataset]: 5.45001e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.95998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 9.06002e-06 [a_after_grad]: 8.47e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 8.80013e-07 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.38003e-06 [cse]: 1.396e-05 [a_3]: 3.285e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 [slice_cell_reuse_recomputed_activation]: 1.85001e-06 [rewriter_after_opt_a]: 3.126e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 5.46e-06 [mutable_eliminate]: 0.00044849 [opt_b]: 0.0001809, [1] [Cycle 1]: 0.00017455, [7] [b_1]: 
0.00010699 [b_2]: 7.01999e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 4.20026e-07 [cse]: 1.631e-05 [optimize_parallel_all_gather_comm]: 1.528e-05 [overlap_param_gather]: 2.26e-06 [cconv]: 2.173e-05 [loop_unroll]: 0.00041414 [opt_after_cconv]: 9.582e-05, [1] [Cycle 1]: 8.989e-05, [7] [c_1]: 2.743e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.667e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.241e-05 [tuple_transform]: 6.877e-05, [1] [Cycle 1]: 6.453e-05, [4] [d_1]: 3.907e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.371e-05 [cse_after_recomputation]: 2.02e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.086e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.10001e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.01001e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.53e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.52999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.30999e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.135e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26997e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.677e-05 [begin_end_overlap_inline]: 6.69999e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.56002e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 6.864e-05, [1] [Cycle 1]: 6.451e-05, [6] [build]: 2.54001e-06 [elim_shapecalc]: 8.34998e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.511e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00044539 [validate]: 3.168e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0604422 [execute]: 8.85001e-06 Sums bootstrap : 0.000471s : 0.67% type_inference : 0.005464s : 7.79% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000565s : 0.81% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000008s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000424s : 0.60% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.64% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000414s : 0.59% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000445s : 0.63% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060442s : 86.14% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.45% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000002s : 2: substitution.fold_const_symbol 3.23% : 0.000005s : 4: substitution.graph_param_transform 66.08% : 0.000108s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.77% : 0.000005s : 4: substitution.remove_not_recompute_node 2.58% : 0.000004s : 4: substitution.replace_old_param 7.05% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005424 2 89.74% : 0.004867s : 1: type_inference.infer 10.26% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.83% : 0.000026s : 3: replace.inline 30.17% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.02% : 0.000106s : 3: match.inline 8.98% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.01% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 1.10% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000010s : 51: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.29% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.55% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.92% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000355 8 45.96% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.04% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.082519 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.57% : 0.002947s : 1: add_attr 3.56% : 0.002939s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000500s : 1: bootstrap 0.06% : 0.000051s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.13% : 0.000930s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.56% : 0.002113s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.55% : 0.000455s : 1: opt_after_jit_grad 0.22% : 0.000184s : 1: opt_b 4.81% : 0.003971s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000209s : 1: renormalize.infer 0.25% : 0.000208s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 73.27% : 0.060461s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.64% : 0.005477s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.113828, [24] [bootstrap]: 0.00050762 [type_inference]: 0.0113716 [event_method]: 4.785e-05 [auto_monad]: 0.0001209 [graph_reusing]: 8.52998e-06 [inline]: 2.04999e-06 [add_attr]: 0.00297615, [1] [add_attr_with_inline]: 0.00296735, [1] [Cycle 1]: 6.913e-05, [2] [tag_attr]: 3.372e-05 [meta_addattr_fg_expand]: 9.06002e-06 [parallel-infer-symbol]: 3.26999e-06 [pre_auto_parallel]: 8.55e-05 [insert-virtual-dataset]: 2.76999e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 2.20002e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0134399, [53] [py_interpret_to_execute]: 3.958e-05 [rewriter_before_opt_a]: 0.00014555 [opt_a]: 0.0111056, [3] [Cycle 1]: 0.00710921, [45] [expand_dump_flag]: 3.68e-06 [switch_simplify]: 7.345e-05 [loop_unroll]: 6.296e-05 [a_1]: 0.00144981 [with_stream_mark]: 2.287e-05 [recompute_prepare]: 2.21e-05 [updatestate_depend_eliminate]: 9.38997e-06 [updatestate_assign_eliminate]: 7.79002e-06 [updatestate_loads_eliminate]: 7.20998e-06 [parameter_eliminate]: 2.48e-06 [a_2]: 0.0002428 [accelerated_algorithm]: 3.086e-05 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 3.17997e-06 [shard_inline]: 1.598e-05 [merge_send_recv]: 1.64e-05 [auto_parallel]: 1.079e-05 [parallel]: 1.783e-05 [flash_sp]: 1.201e-05 [merge_comm]: 9.70002e-06 [allreduce_fusion]: 8.87999e-06 [matmul_add_comm_reduction]: 2.636e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.575e-05 [get_grad_eliminate_]: 1.513e-05 [virtual_output]: 1.497e-05 [merge_forward]: 9.50001e-06 [cell_reuse_recompute_pass]: 1.04003e-06 [offload_activation]: 1.734e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.915e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.732e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89999e-06 [meta_fg_expand]: 0.00142476 [flash_sp_send_recv_attached]: 3.61999e-06 [receive_attached]: 2.53003e-06 
[after_resolve]: 6.061e-05 [a_after_grad]: 8.097e-05 [renormalize]: 0.00245858 [add_forward_monad_depend]: 9.73998e-06 [auto_monad_grad]: 5.20001e-06 [auto_monad_eliminator]: 5.73e-05 [cse]: 0.00017139 [a_3]: 0.00033793 [Cycle 2]: 0.00307659, [45] [expand_dump_flag]: 1.40999e-06 [switch_simplify]: 4.651e-05 [loop_unroll]: 4.446e-05 [a_1]: 0.00157193 [with_stream_mark]: 1.237e-05 [recompute_prepare]: 1.121e-05 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 4.24002e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012791 [accelerated_algorithm]: 1.243e-05 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 2.01998e-06 [shard_inline]: 9.32999e-06 [merge_send_recv]: 6.75998e-06 [auto_parallel]: 7.47998e-06 [parallel]: 4.83001e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 4.50999e-06 [matmul_add_comm_reduction]: 7.92998e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.042e-05 [virtual_dataset]: 9.10001e-06 [get_grad_eliminate_]: 8.79e-06 [virtual_output]: 8.77999e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 8.30012e-07 [offload_activation]: 8.90999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.66e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 1.447e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19998e-06 [meta_fg_expand]: 7.336e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.17e-06 [after_resolve]: 1.613e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00061543 [add_forward_monad_depend]: 3.93001e-06 [auto_monad_grad]: 1.22e-06 [auto_monad_eliminator]: 1.484e-05 [cse]: 4.63e-05 [a_3]: 6.551e-05 [Cycle 3]: 0.00090575, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 9.22001e-06 [a_1]: 0.00025231 [with_stream_mark]: 1.005e-05 [recompute_prepare]: 9.37001e-06 [updatestate_depend_eliminate]: 4.72e-06 [updatestate_assign_eliminate]: 3.9e-06 [updatestate_loads_eliminate]: 
3.80998e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012397 [accelerated_algorithm]: 1.183e-05 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.42003e-06 [flash_sp]: 9.90025e-07 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 9.91998e-06 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.35001e-06 [virtual_output]: 8.13999e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.62e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.96998e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.474e-05 [a_after_grad]: 1.46e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 1.043e-05 [cse]: 2.632e-05 [a_3]: 5.888e-05 [py_interpret_to_execute_after_opt_a]: 9.99999e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 4.665e-05 [convert_after_rewriter]: 8.92999e-06 [order_py_execute_after_rewriter]: 6.59999e-06 [mutable_eliminate]: 0.00047208 [opt_b]: 0.00033742, [1] [Cycle 1]: 0.00033089, [7] [b_1]: 0.00023635 [b_2]: 1.091e-05 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 4.17998e-06 [renormalize]: 3.69997e-07 [cse]: 3.214e-05 [optimize_parallel_all_gather_comm]: 2.167e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 1.977e-05 [loop_unroll]: 0.00043266 [opt_after_cconv]: 0.00013761, [1] [Cycle 1]: 0.00013147, [7] [c_1]: 4.92e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 7.22002e-06 [updatestate_assign_eliminate]: 4.41002e-06 
[updatestate_loads_eliminate]: 3.85e-06 [cse]: 3.044e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 2.875e-05 [tuple_transform]: 0.00010348, [1] [Cycle 1]: 9.871e-05, [4] [d_1]: 6.838e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.89999e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.507e-05 [cse_after_recomputation]: 3.256e-05, [1] [Cycle 1]: 2.784e-05, [1] [cse]: 2.221e-05 [environ_conv]: 9.07001e-06 [swap_dp_allreduce_reducescatter]: 7.83001e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.672e-05 [grouped_pairwise_exchange_alltoall]: 1.42e-06 [offloading_packed_experts]: 4.90999e-06 [overlap_recompute_and_grad_model_parallel]: 5.83997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.466e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.892e-05, [1] [Cycle 1]: 9.47e-05, [6] [build]: 9.61e-06 [elim_shapecalc]: 1.332e-05 [elim_not_effective]: 1.859e-05 [opt_reshape]: 1.05e-05 [fold_const_symbol]: 1.488e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.457e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00048017 [validate]: 4.548e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.084478 [execute]: 8.90001e-06 Sums bootstrap : 0.000508s : 0.46% type_inference : 0.011372s : 10.38% event_method : 0.000048s : 0.04% auto_monad : 0.000121s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000086s : 0.08% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.04% optimize.rewriter_before_opt_a : 0.000146s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.12% optimize.opt_a.loop_unroll : 0.000117s : 0.11% optimize.opt_a.a_1 : 0.003274s : 2.99% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.45% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001501s : 1.37% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003074s : 2.81% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000244s : 0.22% optimize.opt_a.a_3 : 0.000462s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000472s : 0.43% optimize.opt_b.b_1 : 0.000236s : 0.22% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000433s : 0.39% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000480s : 0.44% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.084478s : 77.09% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000760 222 5.87% : 0.000045s : 12: substitution.arithmetic_simplify 1.80% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000002s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.73% : 0.000424s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.83% : 0.000014s : 20: substitution.remove_not_recompute_node 3.09% : 0.000024s : 10: substitution.replace_applicator 1.43% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.77% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011299 2 86.39% : 0.009761s : 1: type_inference.infer 13.61% : 0.001538s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.33% : 0.000126s : 17: replace.inline 42.67% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.27% : 0.000415s : 17: match.inline 7.73% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000755 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.36% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.94% : 0.000015s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.09% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001596 34 56.00% : 0.000894s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.00% : 0.000702s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.138636 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.15% : 0.002981s : 1: add_attr 2.14% : 0.002971s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000128s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000541s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000054s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.57% : 0.004944s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000221s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.01% : 0.011109s : 1: opt_a 0.10% : 0.000141s : 1: opt_after_cconv 0.35% : 0.000490s : 1: opt_after_jit_grad 0.25% : 0.000341s : 1: opt_b 9.70% : 0.013444s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.07% : 0.000091s : 1: pre_auto_parallel 0.03% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.17% : 0.001624s : 2: renormalize.infer 1.04% : 0.001436s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.11% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000102s : 1: symbol_engine_optimizer 60.95% : 0.084495s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 8.21% : 0.011386s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.0789126, [24] [bootstrap]: 0.00052175 [type_inference]: 0.00430461 [event_method]: 1.11e-05 [auto_monad]: 5.113e-05 [graph_reusing]: 5.00001e-06 [inline]: 1.62999e-06 [add_attr]: 0.0029532, [1] [add_attr_with_inline]: 0.00294498, [1] [Cycle 1]: 4.659e-05, [2] [tag_attr]: 1.143e-05 [meta_addattr_fg_expand]: 3.27002e-06 [parallel-infer-symbol]: 2.70002e-06 [pre_auto_parallel]: 2.047e-05 [insert-virtual-dataset]: 2.45002e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00372313, [53] [py_interpret_to_execute]: 1.436e-05 [rewriter_before_opt_a]: 3.791e-05 [opt_a]: 0.00191227, [2] [Cycle 1]: 0.00131139, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 2.446e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00033955 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.75002e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.59e-05 [accelerated_algorithm]: 6.89999e-06 [shard]: 2.58e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 7.35998e-06 [auto_parallel]: 5.76e-06 [parallel]: 1.74e-05 [flash_sp]: 8.37e-06 [merge_comm]: 3.62998e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 8.72e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 5.58002e-06 [get_grad_eliminate_]: 5.53002e-06 [virtual_output]: 5.51002e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.93002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.28997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42997e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.62001e-06 [receive_attached]: 
2.53e-06 [after_resolve]: 1.057e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00035009 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.269e-05 [cse]: 2.827e-05 [a_3]: 4.011e-05 [Cycle 2]: 0.00059153, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.90998e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.0001246 [with_stream_mark]: 9.19998e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.693e-05 [accelerated_algorithm]: 5.61003e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.05001e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.39998e-06 [auto_parallel]: 5.20999e-06 [parallel]: 4.38001e-06 [flash_sp]: 3.3e-06 [merge_comm]: 2.84999e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 4.70999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.93998e-06 [virtual_dataset]: 5.10001e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.92001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.82002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 1.07998e-06 [receive_attached]: 1.05999e-06 [after_resolve]: 8.94998e-06 [a_after_grad]: 8.14997e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.21997e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.309e-05 [a_3]: 3.28e-05 [py_interpret_to_execute_after_opt_a]: 7.53999e-06 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 3.063e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.17999e-06 [mutable_eliminate]: 0.00045143 [opt_b]: 
0.00018298, [1] [Cycle 1]: 0.00017685, [7] [b_1]: 0.0001079 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 2.59985e-07 [cse]: 1.684e-05 [optimize_parallel_all_gather_comm]: 1.645e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.0004145 [opt_after_cconv]: 9.517e-05, [1] [Cycle 1]: 8.913e-05, [7] [c_1]: 2.754e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.622e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.156e-05 [tuple_transform]: 6.812e-05, [1] [Cycle 1]: 6.38e-05, [4] [d_1]: 3.83e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.466e-05 [cse_after_recomputation]: 1.998e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.053e-05 [environ_conv]: 4.51002e-06 [swap_dp_allreduce_reducescatter]: 5.01002e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.24998e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.68002e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.69972e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.105e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.804e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.766e-05, [1] [Cycle 1]: 6.367e-05, [6] [build]: 2.15002e-06 [elim_shapecalc]: 8.03999e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.551e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.47997e-06 [opt_after_jit_grad]: 0.00047861 [validate]: 3.278e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.0665588 [execute]: 8.73001e-06 Sums bootstrap : 0.000522s : 0.70% type_inference : 0.004305s : 5.74% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000020s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000464s : 0.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate 
: 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000350s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.60% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000414s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000479s : 0.64% validate : 0.000033s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066559s : 88.77% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.08% : 0.000022s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.57% : 0.000005s : 4: substitution.graph_param_transform 65.69% : 0.000079s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004265 2 91.86% : 0.003918s : 1: type_inference.infer 8.14% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 1.03% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.45% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000001s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000002s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.02% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.51% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 1.02% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.84% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.60% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.47% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 42.62% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.38% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.086909 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.40% : 0.002958s : 1: add_attr 3.39% : 0.002948s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000558s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.94% : 0.000814s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.20% : 0.001915s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.56% : 0.000488s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.29% : 0.003727s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.22% : 0.000193s : 1: renormalize.infer 0.17% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.61% : 0.066581s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 4.97% : 0.004319s : 1: type_inference 0.06% : 0.000055s : 1: validate TotalTime = 0.114477, [24] [bootstrap]: 0.00051696 [type_inference]: 0.0103698 [event_method]: 4.269e-05 [auto_monad]: 0.00011605 [graph_reusing]: 8.43999e-06 [inline]: 2.49999e-06 [add_attr]: 0.00303479, [1] [add_attr_with_inline]: 0.0030269, [1] [Cycle 1]: 6.708e-05, [2] [tag_attr]: 3.15e-05 [meta_addattr_fg_expand]: 8.38999e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 4.536e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.0132236, [53] [py_interpret_to_execute]: 3.657e-05 [rewriter_before_opt_a]: 0.00012778 [opt_a]: 0.0109404, [3] [Cycle 1]: 0.00698495, [45] [expand_dump_flag]: 3.62002e-06 [switch_simplify]: 6.774e-05 [loop_unroll]: 5.549e-05 [a_1]: 0.00135696 [with_stream_mark]: 2.291e-05 [recompute_prepare]: 2.119e-05 [updatestate_depend_eliminate]: 9.29e-06 [updatestate_assign_eliminate]: 8.25999e-06 [updatestate_loads_eliminate]: 7.51001e-06 [parameter_eliminate]: 2.51e-06 [a_2]: 0.00024597 [accelerated_algorithm]: 3.097e-05 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 3.34001e-06 [shard_inline]: 1.606e-05 [merge_send_recv]: 1.632e-05 [auto_parallel]: 1.051e-05 [parallel]: 1.907e-05 [flash_sp]: 1.195e-05 [merge_comm]: 9.88002e-06 [allreduce_fusion]: 8.80999e-06 [matmul_add_comm_reduction]: 2.611e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.782e-05 [virtual_dataset]: 1.577e-05 [get_grad_eliminate_]: 1.521e-05 [virtual_output]: 1.509e-05 [merge_forward]: 1.036e-05 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.825e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.881e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.713e-05 [set_forward_comm_id_for_comm_node_pass]: 1.003e-05 [meta_fg_expand]: 0.00141875 [flash_sp_send_recv_attached]: 3.66001e-06 [receive_attached]: 2.54001e-06 
[after_resolve]: 5.937e-05 [a_after_grad]: 8.185e-05 [renormalize]: 0.00244157 [add_forward_monad_depend]: 9.32999e-06 [auto_monad_grad]: 6.34999e-06 [auto_monad_eliminator]: 5.608e-05 [cse]: 0.00017193 [a_3]: 0.00033832 [Cycle 2]: 0.00303346, [45] [expand_dump_flag]: 1.61002e-06 [switch_simplify]: 4.741e-05 [loop_unroll]: 4.419e-05 [a_1]: 0.00159273 [with_stream_mark]: 1.194e-05 [recompute_prepare]: 1.094e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.45e-06 [updatestate_loads_eliminate]: 3.88999e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012722 [accelerated_algorithm]: 1.236e-05 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 9.35001e-06 [merge_send_recv]: 6.78e-06 [auto_parallel]: 7.52998e-06 [parallel]: 4.77998e-06 [flash_sp]: 3.03e-06 [merge_comm]: 4.99998e-06 [allreduce_fusion]: 4.57998e-06 [matmul_add_comm_reduction]: 8.54e-06 [allreduce_slice_to_reducescatter]: 4.60015e-07 [virtual_shard_identity]: 1e-05 [virtual_dataset]: 9.24998e-06 [get_grad_eliminate_]: 9.59999e-06 [virtual_output]: 8.52e-06 [merge_forward]: 4.74998e-06 [cell_reuse_recompute_pass]: 8.80013e-07 [offload_activation]: 8.99998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.668e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.402e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20999e-06 [meta_fg_expand]: 3.373e-05 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.469e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00058773 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 1.502e-05 [cse]: 4.612e-05 [a_3]: 6.592e-05 [Cycle 3]: 0.00090811, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.039e-05 [loop_unroll]: 8.93002e-06 [a_1]: 0.00025426 [with_stream_mark]: 1.035e-05 [recompute_prepare]: 9.27999e-06 [updatestate_depend_eliminate]: 4.78001e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 
3.78999e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 0.00012389 [accelerated_algorithm]: 1.153e-05 [shard]: 8.80013e-07 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 9.02999e-06 [merge_send_recv]: 7.31999e-06 [auto_parallel]: 7.25e-06 [parallel]: 4.75001e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.90999e-06 [allreduce_fusion]: 4.99e-06 [matmul_add_comm_reduction]: 7.53999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 9.96998e-06 [virtual_dataset]: 8.88002e-06 [get_grad_eliminate_]: 8.57998e-06 [virtual_output]: 8.57998e-06 [merge_forward]: 4.18999e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.605e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27001e-06 [meta_fg_expand]: 2.97002e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.464e-05 [a_after_grad]: 1.458e-05 [renormalize]: 1.19995e-07 [add_forward_monad_depend]: 1.27999e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 1.066e-05 [cse]: 2.621e-05 [a_3]: 5.944e-05 [py_interpret_to_execute_after_opt_a]: 1.036e-05 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 4.751e-05 [convert_after_rewriter]: 9.25999e-06 [order_py_execute_after_rewriter]: 7.4e-06 [mutable_eliminate]: 0.00048041 [opt_b]: 0.00028978, [1] [Cycle 1]: 0.00028335, [7] [b_1]: 0.00019043 [b_2]: 1.088e-05 [updatestate_depend_eliminate]: 7.28999e-06 [updatestate_assign_eliminate]: 4.07998e-06 [updatestate_loads_eliminate]: 4.25e-06 [renormalize]: 3.10014e-07 [cse]: 3.075e-05 [optimize_parallel_all_gather_comm]: 2.017e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 2.133e-05 [loop_unroll]: 0.00043244 [opt_after_cconv]: 0.00013869, [1] [Cycle 1]: 0.00013271, [7] [c_1]: 4.948e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 7.10998e-06 [updatestate_assign_eliminate]: 
4.42998e-06 [updatestate_loads_eliminate]: 3.97e-06 [cse]: 3.049e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.912e-05 [tuple_transform]: 0.00010284, [1] [Cycle 1]: 9.82e-05, [4] [d_1]: 6.759e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.039e-05 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 5.742e-05 [cse_after_recomputation]: 3.242e-05, [1] [Cycle 1]: 2.779e-05, [1] [cse]: 2.215e-05 [environ_conv]: 9.07999e-06 [swap_dp_allreduce_reducescatter]: 7.71001e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.59998e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 1.42e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.67999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.704e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.89999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 5.02999e-06 [overlap_grad_flash_sp]: 2.487e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 9.883e-05, [1] [Cycle 1]: 9.46e-05, [6] [build]: 9.23002e-06 [elim_shapecalc]: 1.361e-05 [elim_not_effective]: 1.84e-05 [opt_reshape]: 9.93002e-06 [fold_const_symbol]: 1.529e-05 
[renormalize]: 2.60014e-07 [detach_backward]: 2.07001e-06 [pipeline_parallel_scheduler]: 1.51998e-06 [auto_monad_reorder]: 2.412e-05 [get_jit_bprop_graph]: 1.21002e-06 [rewriter_after_jit_bprop_graph]: 3.71999e-06 [opt_after_jit_grad]: 0.0004753 [validate]: 4.736e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0863338 [execute]: 7.63999e-06 Sums bootstrap : 0.000517s : 0.47% type_inference : 0.010370s : 9.41% event_method : 0.000043s : 0.04% auto_monad : 0.000116s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000128s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.11% optimize.opt_a.loop_unroll : 0.000109s : 0.10% optimize.opt_a.a_1 : 0.003204s : 2.91% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000497s : 0.45% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001455s : 1.32% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.003029s : 2.75% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.07% optimize.opt_a.cse : 0.000244s : 0.22% optimize.opt_a.a_3 : 0.000464s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000480s : 0.44% optimize.opt_b.b_1 : 0.000190s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000432s : 0.39% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.43% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.086334s : 78.36% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000737 218 6.01% : 0.000044s : 11: substitution.arithmetic_simplify 1.91% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.71% : 0.000403s : 16: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.08% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.28% : 0.000024s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.82% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010303 2 86.81% : 0.008945s : 1: type_inference.infer 13.19% : 0.001359s : 1: type_inference.specialize ------[replace.] 0.000246 30 65.91% : 0.000162s : 16: replace.inline 34.09% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000428 30 92.47% : 0.000395s : 16: match.inline 7.53% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.12% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.69% : 0.000043s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.95% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.25% : 0.000009s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.95% : 0.000037s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001525 32 57.39% : 0.000875s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.61% : 0.000650s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.138959 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.19% : 0.003039s : 1: add_attr 2.18% : 0.003031s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000553s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000441s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000490s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.50% : 0.004864s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000176s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 7.88% : 0.010943s : 1: opt_a 0.10% : 0.000142s : 1: opt_after_cconv 0.35% : 0.000485s : 1: opt_after_jit_grad 0.21% : 0.000294s : 1: opt_b 9.52% : 0.013228s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.15% : 0.001598s : 2: renormalize.infer 1.02% : 0.001418s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000102s : 1: symbol_engine_optimizer 62.14% : 0.086351s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 7.47% : 0.010384s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x3-ge],max_mem:50.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x4-pynative],max_mem:50.0M TotalTime = 0.0213994, [24] [bootstrap]: 0.00055607 [type_inference]: 0.00617947 [event_method]: 1.382e-05 [auto_monad]: 5.724e-05 [graph_reusing]: 5.39e-06 [inline]: 2.41e-06 [add_attr]: 0.0034249, [1] [add_attr_with_inline]: 0.00341388, [1] [Cycle 1]: 4.611e-05, [2] [tag_attr]: 1.56e-05 [meta_addattr_fg_expand]: 4.25999e-06 [parallel-infer-symbol]: 3.01999e-06 [pre_auto_parallel]: 2.856e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00399112, [53] [py_interpret_to_execute]: 1.927e-05 [rewriter_before_opt_a]: 5.775e-05 [opt_a]: 0.00215151, [2] [Cycle 1]: 0.0015494, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 3.191e-05 [loop_unroll]: 2.104e-05 [a_1]: 0.000493 [with_stream_mark]: 1.392e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.577e-05 [accelerated_algorithm]: 6.31998e-06 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 5.99e-06 [parallel]: 2.317e-05 [flash_sp]: 7.68001e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.23001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.36999e-06 [virtual_dataset]: 5.94999e-06 
[get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.03e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 9.18002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47997e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.41998e-06 [receive_attached]: 2.41e-06 [after_resolve]: 9.78002e-06 [a_after_grad]: 8.66002e-06 [renormalize]: 0.00041525 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.331e-05 [cse]: 2.772e-05 [a_3]: 3.982e-05 [Cycle 2]: 0.00059244, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.000126 [with_stream_mark]: 9.47999e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.735e-05 [accelerated_algorithm]: 5.40001e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.68997e-06 [merge_send_recv]: 4.54002e-06 [auto_parallel]: 5.76e-06 [parallel]: 4.35e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 2.74999e-06 [allreduce_fusion]: 3.16999e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45003e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 8.30999e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.24999e-06 [cse]: 1.271e-05 [a_3]: 3.598e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 [slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.057e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00044798 [opt_b]: 0.0001803, [1] [Cycle 1]: 0.0001745, [7] [b_1]: 0.00010744 [b_2]: 6.71999e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.80009e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.535e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.209e-05 [loop_unroll]: 0.00041678 [opt_after_cconv]: 9.585e-05, [1] [Cycle 1]: 9.004e-05, [7] [c_1]: 2.796e-05 [parameter_eliminate]: 2.63998e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.693e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.22e-05 [tuple_transform]: 6.772e-05, [1] [Cycle 1]: 6.332e-05, [4] [d_1]: 3.82e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.933e-05 [cse_after_recomputation]: 2.107e-05, [1] [Cycle 1]: 1.676e-05, [1] [cse]: 1.152e-05 [environ_conv]: 4.51002e-06 [swap_dp_allreduce_reducescatter]: 5.56e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.40999e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.35001e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 
9.70002e-07 [interleave_split_concat_branches]: 1.19998e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 3.67002e-06 [overlap_grad_flash_sp]: 1.636e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 6.881e-05, [1] [Cycle 1]: 6.475e-05, [6] [build]: 2.18002e-06 [elim_shapecalc]: 8.69998e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 9.27001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00049477 [validate]: 3.201e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.00637614 [execute]: 6.58e-06 Sums bootstrap : 0.000556s : 3.27% type_inference : 0.006179s : 36.31% event_method : 0.000014s : 0.08% auto_monad : 0.000057s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000619s : 3.64% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000415s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000076s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.63% optimize.opt_b.b_1 : 0.000107s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000417s : 2.45% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000495s : 2.91% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006376s : 37.47% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000204 30 11.87% : 0.000024s : 5: substitution.arithmetic_simplify 0.83% : 0.000002s : 2: substitution.elim_not_effective 0.64% : 0.000001s : 2: substitution.fold_const_symbol 2.38% : 0.000005s : 4: substitution.graph_param_transform 73.65% : 0.000150s : 3: substitution.inline 1.52% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.03% : 0.000004s : 4: substitution.remove_not_recompute_node 1.88% : 0.000004s : 4: substitution.replace_old_param 5.19% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006136 2 89.93% : 0.005518s : 1: type_inference.infer 10.07% : 0.000618s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.30% : 0.000027s : 3: replace.inline 28.70% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000158 5 93.96% : 0.000148s : 3: match.inline 6.04% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 1.01% : 0.000002s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.22% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000364 8 45.38% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.62% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030364 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.29% : 0.003429s : 1: add_attr 11.25% : 0.003417s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.95% : 0.000593s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.23% : 0.000982s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.002154s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.66% : 0.000504s : 1: opt_after_jit_grad 0.61% : 0.000184s : 1: opt_b 13.16% : 0.003995s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000214s : 1: renormalize.infer 0.64% : 0.000194s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 21.03% : 0.006386s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.40% : 0.006193s : 1: type_inference 0.19% : 0.000059s : 1: validate TotalTime = 0.0184728, [24] [bootstrap]: 0.00047279 [type_inference]: 0.00437808 [event_method]: 1.141e-05 [auto_monad]: 5.352e-05 [graph_reusing]: 5.21002e-06 [inline]: 2.21e-06 [add_attr]: 0.00310481, [1] [add_attr_with_inline]: 0.00309641, [1] [Cycle 1]: 4.572e-05, [2] [tag_attr]: 1.301e-05 [meta_addattr_fg_expand]: 2.73e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 2.519e-05 [insert-virtual-dataset]: 2.74001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00383203, [53] [py_interpret_to_execute]: 1.65e-05 [rewriter_before_opt_a]: 3.958e-05 [opt_a]: 0.00199064, [2] [Cycle 1]: 0.00134991, [45] [expand_dump_flag]: 2.44001e-06 [switch_simplify]: 2.426e-05 [loop_unroll]: 1.333e-05 [a_1]: 0.00030135 [with_stream_mark]: 1.631e-05 [recompute_prepare]: 7.25e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.735e-05 [accelerated_algorithm]: 7.1e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.83997e-06 [merge_send_recv]: 7.95998e-06 [auto_parallel]: 6.24999e-06 [parallel]: 1.737e-05 [flash_sp]: 7.28999e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.09998e-06 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 7.32997e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.51003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.79002e-06 [meta_fg_expand]: 2.47001e-06 [flash_sp_send_recv_attached]: 2.75002e-06 [receive_attached]: 
2.49999e-06 [after_resolve]: 1.052e-05 [a_after_grad]: 8.42e-06 [renormalize]: 0.00041946 [add_forward_monad_depend]: 4.63001e-06 [auto_monad_grad]: 1.94999e-06 [auto_monad_eliminator]: 1.345e-05 [cse]: 2.727e-05 [a_3]: 4.062e-05 [Cycle 2]: 0.00063082, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012569 [with_stream_mark]: 1.094e-05 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.09e-06 [updatestate_loads_eliminate]: 2.36998e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.981e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.21002e-06 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 5.81998e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.18999e-06 [flash_sp]: 2.98998e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.58e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.72001e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.34999e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.96001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91999e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 8.15e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 7e-06 [cse]: 1.323e-05 [a_3]: 3.258e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 3.086e-05 [convert_after_rewriter]: 6.88998e-06 [order_py_execute_after_rewriter]: 4.79002e-06 [mutable_eliminate]: 0.0004799 [opt_b]: 0.0001809, [1] [Cycle 1]: 0.00017459, 
[7] [b_1]: 0.00010803 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.28002e-06 [renormalize]: 5.49975e-07 [cse]: 1.545e-05 [optimize_parallel_all_gather_comm]: 1.614e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.236e-05 [loop_unroll]: 0.00041719 [opt_after_cconv]: 9.444e-05, [1] [Cycle 1]: 8.875e-05, [7] [c_1]: 2.744e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.575e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.164e-05 [tuple_transform]: 6.833e-05, [1] [Cycle 1]: 6.366e-05, [4] [d_1]: 3.866e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.448e-05 [cse_after_recomputation]: 2.049e-05, [1] [Cycle 1]: 1.576e-05, [1] [cse]: 1.075e-05 [environ_conv]: 4.85999e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 3.91999e-06 [label_fine_grained_interleaved_index]: 2.99001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.10017e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.30999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.176e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.66002e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.671e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.87999e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.902e-05, [1] [Cycle 1]: 6.483e-05, [6] [build]: 2.81999e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.136e-05 [opt_reshape]: 5.90002e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.552e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.19001e-06 [opt_after_jit_grad]: 0.00044855 [validate]: 3.209e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00587876 [execute]: 6.51e-06 Sums bootstrap : 0.000473s : 3.29% type_inference : 0.004378s : 30.45% event_method : 0.000011s : 0.08% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.18% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000017s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000427s : 2.97% optimize.opt_a.with_stream_mark : 0.000027s : 0.19% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000420s : 2.92% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000480s : 3.34% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000417s : 2.90% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.12% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005879s : 40.88% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000127 26 17.62% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 3.98% : 0.000005s : 4: substitution.graph_param_transform 67.46% : 0.000086s : 2: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.26% : 0.000004s : 4: substitution.remove_not_recompute_node 3.16% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004339 2 92.17% : 0.003999s : 1: type_inference.infer 7.83% : 0.000340s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000084 2 100.00% : 0.000084s : 2: match.inline ------[predicate.] 
0.000138 984 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 2.12% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000009s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.65% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.99% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.48% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.67% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 41.60% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.40% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026768 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.61% : 0.003109s : 1: add_attr 11.58% : 0.003100s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.87% : 0.000500s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.83% : 0.000489s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000782s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.45% : 0.001993s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000458s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 14.33% : 0.003836s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000029s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.89% : 0.000239s : 1: renormalize.infer 0.65% : 0.000174s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.00% : 0.005888s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.41% : 0.004393s : 1: type_inference 0.22% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x4-kbk],max_mem:50.0M TotalTime = 0.122702, [24] [bootstrap]: 0.00053786 [type_inference]: 0.00598737 [event_method]: 1.418e-05 [auto_monad]: 5.456e-05 [graph_reusing]: 5.52999e-06 [inline]: 1.62999e-06 [add_attr]: 0.00337535, [1] [add_attr_with_inline]: 0.00336469, [1] [Cycle 1]: 4.371e-05, [2] [tag_attr]: 1.504e-05 [meta_addattr_fg_expand]: 3.85998e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.784e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.80001e-06 [optimize]: 0.00395306, [53] [py_interpret_to_execute]: 1.991e-05 [rewriter_before_opt_a]: 5.76e-05 [opt_a]: 0.00212264, [2] [Cycle 1]: 0.00152375, [45] [expand_dump_flag]: 3.06001e-06 [switch_simplify]: 3.189e-05 [loop_unroll]: 2.071e-05 [a_1]: 0.0004488 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.93001e-06 [updatestate_depend_eliminate]: 4.27e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.6e-05 [accelerated_algorithm]: 6.25002e-06 [shard]: 1.96998e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 5.97001e-06 [merge_send_recv]: 7.56001e-06 [auto_parallel]: 6.26e-06 [parallel]: 2.247e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.63001e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.50998e-06 [virtual_dataset]: 6.05002e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.04003e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 
[merge_recompute_call_nodes]: 1.41998e-06 [before_grad]: 8.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.74001e-06 [flash_sp_send_recv_attached]: 3.02002e-06 [receive_attached]: 2.78998e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 8.85999e-06 [renormalize]: 0.0004093 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.294e-05 [cse]: 2.576e-05 [a_3]: 6.271e-05 [Cycle 2]: 0.00058948, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.71e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.00012562 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 5.61003e-06 [updatestate_depend_eliminate]: 2.83003e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.669e-05 [accelerated_algorithm]: 5.36002e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.13001e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.62001e-06 [matmul_add_comm_reduction]: 4.90001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.14998e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.53998e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.14998e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.09003e-06 [after_resolve]: 8.79e-06 [a_after_grad]: 7.98001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.409e-05 [a_3]: 3.16e-05 [py_interpret_to_execute_after_opt_a]: 7.24001e-06 [slice_cell_reuse_recomputed_activation]: 
1.77001e-06 [rewriter_after_opt_a]: 3.095e-05 [convert_after_rewriter]: 6.53e-06 [order_py_execute_after_rewriter]: 5.00999e-06 [mutable_eliminate]: 0.00045045 [opt_b]: 0.00018218, [1] [Cycle 1]: 0.00017586, [7] [b_1]: 0.0001073 [b_2]: 7.4e-06 [updatestate_depend_eliminate]: 5.09003e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.46998e-06 [renormalize]: 5.3001e-07 [cse]: 1.588e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.198e-05 [loop_unroll]: 0.00041392 [opt_after_cconv]: 9.441e-05, [1] [Cycle 1]: 8.884e-05, [7] [c_1]: 2.754e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.622e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 6.762e-05, [1] [Cycle 1]: 6.338e-05, [4] [d_1]: 3.798e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 4.874e-05 [cse_after_recomputation]: 2.033e-05, [1] [Cycle 1]: 1.593e-05, [1] [cse]: 1.094e-05 [environ_conv]: 4.55999e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.19999e-06 [label_micro_interleaved_index]: 4.29997e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 9.49978e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.31998e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.33002e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61002e-06 [control_data_broadcast_order]: 
1.135e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.23002e-06 [overlap_grad_ring_attention]: 3.93999e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.87999e-06 [split_layernorm_comm]: 1.82999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.673e-05, [1] [Cycle 1]: 6.268e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.45999e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.61002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.512e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.70998e-06 [opt_after_jit_grad]: 0.00044754 [validate]: 3.044e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.108015 [execute]: 9.09e-06 Sums bootstrap : 0.000538s : 0.45% type_inference : 0.005987s : 5.06% event_method : 0.000014s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000574s : 0.49% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000409s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000094s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000450s : 0.38% optimize.opt_b.b_1 : 0.000107s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000414s : 0.35% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000448s : 0.38% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.108015s : 91.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000162 30 14.50% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 67.36% : 0.000109s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.73% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005944 2 90.73% : 0.005393s : 1: type_inference.infer 9.27% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.13% : 0.000027s : 3: replace.inline 29.87% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.56% : 0.000107s : 3: match.inline 8.44% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 47.17% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.83% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131549 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.57% : 0.003380s : 1: add_attr 2.56% : 0.003368s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000574s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.73% : 0.000960s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.62% : 0.002126s : 1: opt_a 0.07% : 0.000098s : 1: opt_after_cconv 0.35% : 0.000457s : 1: opt_after_jit_grad 0.14% : 0.000186s : 1: opt_b 3.01% : 0.003957s : 1: optimize 0.01% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000210s : 1: renormalize.infer 0.15% : 0.000193s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000069s : 1: symbol_engine_optimizer 82.13% : 0.108037s : 1: task_emit 0.05% : 0.000070s : 1: 
tuple_transform 4.56% : 0.006001s : 1: type_inference 0.04% : 0.000056s : 1: validate TotalTime = 0.112483, [24] [bootstrap]: 0.00053665 [type_inference]: 0.00441298 [event_method]: 1.12e-05 [auto_monad]: 5.242e-05 [graph_reusing]: 5.07e-06 [inline]: 2.21e-06 [add_attr]: 0.00292208, [1] [add_attr_with_inline]: 0.00291397, [1] [Cycle 1]: 4.283e-05, [2] [tag_attr]: 1.155e-05 [meta_addattr_fg_expand]: 3.15998e-06 [parallel-infer-symbol]: 3.26001e-06 [pre_auto_parallel]: 2.187e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00371165, [53] [py_interpret_to_execute]: 1.479e-05 [rewriter_before_opt_a]: 3.975e-05 [opt_a]: 0.0019111, [2] [Cycle 1]: 0.00130736, [45] [expand_dump_flag]: 2.80997e-06 [switch_simplify]: 2.408e-05 [loop_unroll]: 1.331e-05 [a_1]: 0.00033019 [with_stream_mark]: 1.39e-05 [recompute_prepare]: 7.39002e-06 [updatestate_depend_eliminate]: 3.93001e-06 [updatestate_assign_eliminate]: 3.08998e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.10002e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 5.71998e-06 [parallel]: 1.689e-05 [flash_sp]: 7.36001e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.30998e-06 [matmul_add_comm_reduction]: 8.79e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 7.13998e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 9.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.93e-06 [receive_attached]: 2.19001e-06 
[after_resolve]: 1.056e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 0.00036016 [add_forward_monad_depend]: 4.60999e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.274e-05 [cse]: 2.738e-05 [a_3]: 3.98e-05 [Cycle 2]: 0.00059413, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.15e-06 [loop_unroll]: 5.40001e-06 [a_1]: 0.00012533 [with_stream_mark]: 1.088e-05 [recompute_prepare]: 5.47999e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.31998e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.775e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.18999e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.58998e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.25002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.98998e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 2.81e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.005e-05 [a_after_grad]: 8.14997e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.04001e-06 [cse]: 1.273e-05 [a_3]: 3.239e-05 [py_interpret_to_execute_after_opt_a]: 7.31001e-06 [slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 3.086e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.42999e-06 [mutable_eliminate]: 0.000449 [opt_b]: 0.00018258, [1] [Cycle 1]: 0.00017695, 
[7] [b_1]: 0.00010856 [b_2]: 7.68999e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.45002e-06 [renormalize]: 3.50003e-07 [cse]: 1.649e-05 [optimize_parallel_all_gather_comm]: 1.592e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.258e-05 [loop_unroll]: 0.00040996 [opt_after_cconv]: 9.459e-05, [1] [Cycle 1]: 8.911e-05, [7] [c_1]: 2.78e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.635e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.287e-05 [tuple_transform]: 6.893e-05, [1] [Cycle 1]: 6.463e-05, [4] [d_1]: 3.907e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 4.329e-05 [cse_after_recomputation]: 2.042e-05, [1] [Cycle 1]: 1.591e-05, [1] [cse]: 1.083e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.32999e-06 [symbol_engine_optimizer]: 6.854e-05, [1] [Cycle 1]: 6.459e-05, [6] [build]: 2.43e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.466e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.32002e-06 [opt_after_jit_grad]: 0.00047205 [validate]: 3.162e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.10006 [execute]: 9.04e-06 Sums bootstrap : 0.000537s : 0.49% type_inference : 0.004413s : 4.06% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000456s : 0.42% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000360s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.41% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000410s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000472s : 0.43% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.100060s : 92.14% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000124 26 18.04% : 0.000022s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.60% : 0.000006s : 4: substitution.graph_param_transform 65.32% : 0.000081s : 2: substitution.inline 2.10% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.88% : 0.000005s : 4: substitution.remove_not_recompute_node 3.51% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004374 2 91.82% : 0.004016s : 1: type_inference.infer 8.18% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.94% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.96% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.63% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.93% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000260 6 42.18% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.82% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120466 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.43% : 0.002926s : 1: add_attr 2.42% : 0.002918s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.47% : 0.000565s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000418s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.67% : 0.000808s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.04% : 0.000047s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.59% : 0.001914s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.40% : 0.000482s : 1: opt_after_jit_grad 0.15% : 0.000186s : 1: opt_b 3.08% : 0.003716s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000196s : 1: renormalize.infer 0.13% : 0.000158s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 83.08% : 0.100083s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.67% : 0.004427s : 1: type_inference 0.04% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x4-ge],max_mem:50.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x5-pynative],max_mem:50.0M TotalTime = 0.0212417, [24] [bootstrap]: 0.00057877 [type_inference]: 0.00618336 [event_method]: 1.392e-05 [auto_monad]: 8.611e-05 [graph_reusing]: 5.71e-06 [inline]: 1.75001e-06 [add_attr]: 0.00342286, [1] [add_attr_with_inline]: 0.00341286, [1] [Cycle 1]: 4.584e-05, [2] [tag_attr]: 1.558e-05 [meta_addattr_fg_expand]: 3.88001e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.708e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.06998e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.00401862, [53] [py_interpret_to_execute]: 2.13e-05 [rewriter_before_opt_a]: 5.769e-05 [opt_a]: 0.00216491, [2] [Cycle 1]: 0.00154894, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 3.142e-05 [loop_unroll]: 2.097e-05 [a_1]: 0.00046543 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 7.67002e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.68999e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.86e-06 [a_2]: 7.68e-05 [accelerated_algorithm]: 6.50002e-06 [shard]: 1.77001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.86001e-06 [auto_parallel]: 5.86998e-06 [parallel]: 2.212e-05 [flash_sp]: 6.99001e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.51999e-06 [matmul_add_comm_reduction]: 8.54e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.33e-06 [virtual_dataset]: 5.67001e-06 
[get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.73997e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.29998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.054e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 9.09e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.62001e-06 [after_resolve]: 1.044e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00043342 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.404e-05 [cse]: 2.794e-05 [a_3]: 4.183e-05 [Cycle 2]: 0.00060656, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00012669 [with_stream_mark]: 1.015e-05 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.32999e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.873e-05 [accelerated_algorithm]: 5.67999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.42999e-06 [parallel]: 3.93001e-06 [flash_sp]: 5.20999e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.59999e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.15999e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.03002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.011e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89001e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.91998e-06 [a_after_grad]: 8.79e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.351e-05 [a_3]: 3.232e-05 [py_interpret_to_execute_after_opt_a]: 8.10999e-06 [slice_cell_reuse_recomputed_activation]: 2.34001e-06 [rewriter_after_opt_a]: 3.159e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00045772 [opt_b]: 0.00018206, [1] [Cycle 1]: 0.00017607, [7] [b_1]: 0.00010858 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 3.80009e-07 [cse]: 1.667e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 2.12999e-06 [cconv]: 2.178e-05 [loop_unroll]: 0.0004149 [opt_after_cconv]: 9.6e-05, [1] [Cycle 1]: 9.009e-05, [7] [c_1]: 2.745e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.704e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.258e-05 [tuple_transform]: 6.965e-05, [1] [Cycle 1]: 6.533e-05, [4] [d_1]: 3.895e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.735e-05 [cse_after_recomputation]: 2.02e-05, [1] [Cycle 1]: 1.586e-05, [1] [cse]: 1.078e-05 [environ_conv]: 4.92999e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.36998e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 9.50007e-07 [full_micro_interleaved_order_control]: 2.42001e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.15999e-06 
[add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.83999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.681e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.95001e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 6.907e-05, [1] [Cycle 1]: 6.49e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.27998e-06 [elim_not_effective]: 1.192e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.94999e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.525e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00049683 [validate]: 3.216e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00613607 [execute]: 7.38e-06 Sums bootstrap : 0.000579s : 3.44% type_inference : 0.006183s : 36.70% event_method : 0.000014s : 0.08% auto_monad : 0.000086s : 0.51% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 
0.000058s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000592s : 3.51% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000012s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000433s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000041s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000458s : 2.72% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000415s : 2.46% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000497s : 2.95% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006136s : 36.42% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 15.36% : 0.000026s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000002s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.07% : 0.000110s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.51% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006138 2 90.93% : 0.005582s : 1: type_inference.infer 9.07% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.63% : 0.000028s : 3: replace.inline 30.37% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.72% : 0.000108s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.95% : 0.000002s : 11: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.91% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.03% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.38% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.70% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.73% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000357 8 48.08% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.92% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030236 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.33% : 0.003427s : 1: add_attr 11.30% : 0.003417s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.30% : 0.000092s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.02% : 0.000610s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000467s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.19% : 0.000964s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.17% : 0.002168s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.68% : 0.000507s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.30% : 0.004022s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.72% : 0.000218s : 1: renormalize.infer 0.69% : 0.000209s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 20.33% : 0.006146s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.50% : 0.006197s : 1: type_inference 0.21% : 0.000064s : 1: validate TotalTime = 0.0181947, [24] [bootstrap]: 0.00047505 [type_inference]: 0.00435588 [event_method]: 1.048e-05 [auto_monad]: 4.908e-05 [graph_reusing]: 5.15001e-06 [inline]: 1.92999e-06 [add_attr]: 0.00299813, [1] [add_attr_with_inline]: 0.00299035, [1] [Cycle 1]: 4.487e-05, [2] [tag_attr]: 1.187e-05 [meta_addattr_fg_expand]: 3.7e-06 [parallel-infer-symbol]: 2.97002e-06 [pre_auto_parallel]: 2.226e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00374055, [53] [py_interpret_to_execute]: 1.573e-05 [rewriter_before_opt_a]: 3.816e-05 [opt_a]: 0.00192896, [2] [Cycle 1]: 0.00128404, [45] [expand_dump_flag]: 2.46e-06 [switch_simplify]: 2.444e-05 [loop_unroll]: 1.382e-05 [a_1]: 0.0002933 [with_stream_mark]: 1.31e-05 [recompute_prepare]: 7.63001e-06 [updatestate_depend_eliminate]: 3.74002e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.3e-06 [parameter_eliminate]: 1.84998e-06 [a_2]: 7.634e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.73997e-06 [merge_send_recv]: 7.67998e-06 [auto_parallel]: 6.24001e-06 [parallel]: 1.78e-05 [flash_sp]: 7.41001e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.69e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.96e-06 [merge_forward]: 3.41001e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.07001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.16e-06 [receive_attached]: 2.64001e-06 
[after_resolve]: 1.033e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00037426 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.86998e-06 [auto_monad_eliminator]: 1.252e-05 [cse]: 2.753e-05 [a_3]: 4.037e-05 [Cycle 2]: 0.00063565, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.16001e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012509 [with_stream_mark]: 9.49999e-06 [recompute_prepare]: 5.77001e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 0.00010027 [accelerated_algorithm]: 5.86e-06 [shard]: 1.25999e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.44998e-06 [auto_parallel]: 5.77999e-06 [parallel]: 3.9e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.24999e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94999e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.3e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 9.79984e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 9.66e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 7.04001e-06 [cse]: 1.269e-05 [a_3]: 3.209e-05 [py_interpret_to_execute_after_opt_a]: 7.54002e-06 [slice_cell_reuse_recomputed_activation]: 1.96998e-06 [rewriter_after_opt_a]: 3.072e-05 [convert_after_rewriter]: 7.29001e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00045393 [opt_b]: 0.00018482, [1] [Cycle 1]: 
0.00017865, [7] [b_1]: 0.00010753 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.57001e-06 [renormalize]: 4.80009e-07 [cse]: 1.744e-05 [optimize_parallel_all_gather_comm]: 1.574e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.316e-05 [loop_unroll]: 0.00041448 [opt_after_cconv]: 9.471e-05, [1] [Cycle 1]: 8.859e-05, [7] [c_1]: 2.759e-05 [parameter_eliminate]: 2.16998e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.575e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.233e-05 [tuple_transform]: 6.852e-05, [1] [Cycle 1]: 6.415e-05, [4] [d_1]: 3.839e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 1.61002e-06 [add_recomputation]: 4.231e-05 [cse_after_recomputation]: 1.966e-05, [1] [Cycle 1]: 1.519e-05, [1] [cse]: 1.012e-05 [environ_conv]: 4.58001e-06 [swap_dp_allreduce_reducescatter]: 5.57999e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.14003e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.39998e-06 [offloading_packed_experts]: 3.85e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 3.91001e-06 [overlap_grad_flash_sp]: 1.763e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.843e-05, [1] [Cycle 1]: 6.428e-05, [6] [build]: 2.35002e-06 [elim_shapecalc]: 8.40001e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 9.25001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.78997e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.552e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00044827 [validate]: 3.124e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00582259 [execute]: 6.76999e-06 Sums bootstrap : 0.000475s : 3.34% type_inference : 0.004356s : 30.60% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000418s : 2.94% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000177s : 1.24% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.15% optimize.opt_a.a_after_grad : 0.000019s : 0.13% optimize.opt_a.renormalize : 0.000374s : 2.63% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000454s : 3.19% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000414s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000448s : 3.15% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005823s : 40.90% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.43% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.31% : 0.000002s : 2: substitution.fold_const_symbol 4.08% : 0.000005s : 4: substitution.graph_param_transform 65.42% : 0.000080s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.30% : 0.000004s : 4: substitution.remove_not_recompute_node 3.64% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004317 2 92.10% : 0.003976s : 1: type_inference.infer 7.90% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.83% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.89% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.86% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 1.07% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.92% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.58% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.10% : 0.000002s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 41.69% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.31% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026272 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.43% : 0.003002s : 1: add_attr 11.39% : 0.002994s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.95% : 0.000511s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000463s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000806s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.35% : 0.001932s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.74% : 0.000458s : 1: opt_after_jit_grad 0.72% : 0.000188s : 1: opt_b 14.25% : 0.003744s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.80% : 0.000209s : 1: renormalize.infer 0.60% : 0.000158s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.20% : 0.005832s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.63% : 0.004369s : 1: type_inference 0.22% : 0.000058s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x5-kbk],max_mem:50.0M TotalTime = 0.105238, [24] [bootstrap]: 0.00052842 [type_inference]: 0.00602865 [event_method]: 1.423e-05 [auto_monad]: 5.767e-05 [graph_reusing]: 6.21998e-06 [inline]: 1.82001e-06 [add_attr]: 0.00347523, [1] [add_attr_with_inline]: 0.00346381, [1] [Cycle 1]: 4.617e-05, [2] [tag_attr]: 1.504e-05 [meta_addattr_fg_expand]: 4.40999e-06 [parallel-infer-symbol]: 3.08998e-06 [pre_auto_parallel]: 2.888e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.00402677, [53] [py_interpret_to_execute]: 1.922e-05 [rewriter_before_opt_a]: 5.696e-05 [opt_a]: 0.00212907, [2] [Cycle 1]: 0.00152955, [45] [expand_dump_flag]: 2.46998e-06 [switch_simplify]: 3.103e-05 [loop_unroll]: 2.157e-05 [a_1]: 0.00045436 [with_stream_mark]: 1.37e-05 [recompute_prepare]: 7.88999e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.49001e-06 [updatestate_loads_eliminate]: 3.16999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.396e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.02998e-06 [auto_parallel]: 5.76e-06 [parallel]: 2.431e-05 [flash_sp]: 7.34002e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.41002e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 7.23e-06 [virtual_dataset]: 5.71998e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.139e-05 
[merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 1.034e-05 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.28998e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 2.46998e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 8.79e-06 [renormalize]: 0.00043029 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.99e-06 [auto_monad_eliminator]: 1.399e-05 [cse]: 2.554e-05 [a_3]: 4.187e-05 [Cycle 2]: 0.00058998, [45] [expand_dump_flag]: 7.90023e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012486 [with_stream_mark]: 1.031e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.726e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 1.32e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.32003e-06 [auto_parallel]: 5.13002e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.88998e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.10001e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.12999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.45001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 7.80998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.42001e-06 [cse]: 1.585e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.18998e-06 
[slice_cell_reuse_recomputed_activation]: 2.33998e-06 [rewriter_after_opt_a]: 2.978e-05 [convert_after_rewriter]: 6.86001e-06 [order_py_execute_after_rewriter]: 5.23002e-06 [mutable_eliminate]: 0.0004561 [opt_b]: 0.00018192, [1] [Cycle 1]: 0.00017601, [7] [b_1]: 0.00010818 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.59985e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.598e-05 [overlap_param_gather]: 2.18002e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00041828 [opt_after_cconv]: 0.000138, [1] [Cycle 1]: 0.00013231, [7] [c_1]: 2.794e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.28998e-06 [cse]: 5.74e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.34e-05 [tuple_transform]: 7.154e-05, [1] [Cycle 1]: 6.681e-05, [4] [d_1]: 3.972e-05 [none_parameter_eliminate]: 1.97001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.54001e-06 [partial_unused_args_eliminate]: 2.01998e-06 [add_recomputation]: 4.992e-05 [cse_after_recomputation]: 2.151e-05, [1] [Cycle 1]: 1.716e-05, [1] [cse]: 1.196e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.26002e-06 [bias_add_comm_swap]: 2.63998e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.32001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.40025e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.55001e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 2.02001e-06 [offloading_packed_experts]: 4.09997e-06 [overlap_recompute_and_grad_model_parallel]: 4.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.84998e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.662e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.20002e-06 [split_layernorm_comm]: 1.86998e-06 [handle_group_info]: 1.39e-06 [symbol_engine_optimizer]: 6.799e-05, [1] [Cycle 1]: 6.382e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.18001e-06 [elim_not_effective]: 1.106e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.536e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00045176 [validate]: 3.063e-05 [backend_pass]: 8.69972e-07 [task_emit]: 0.0903392 [execute]: 9.38002e-06 Sums bootstrap : 0.000528s : 0.52% type_inference : 0.006029s : 5.98% event_method : 0.000014s : 0.01% auto_monad : 0.000058s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000057s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.04% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000579s : 0.57% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000430s : 0.43% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000456s : 0.45% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000418s : 0.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000057s : 0.06% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.05% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000452s : 0.45% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.090339s : 89.63% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.84% : 0.000025s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.57% : 0.000111s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.87% : 0.000005s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.56% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005984 2 90.81% : 0.005434s : 1: type_inference.infer 9.19% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.06% : 0.000027s : 3: replace.inline 28.94% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.76% : 0.000109s : 3: match.inline 8.24% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.96% : 0.000002s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 1.01% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.18% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.36% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.83% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.05% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.80% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.56% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.25% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 45.71% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.29% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.114264 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.05% : 0.003479s : 1: add_attr 3.03% : 0.003468s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000063s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.49% : 0.000560s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.83% : 0.000943s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.87% : 0.002132s : 1: opt_a 0.12% : 0.000141s : 1: opt_after_cconv 0.40% : 0.000461s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.53% : 0.004031s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.20% : 0.000224s : 1: renormalize.infer 0.17% : 0.000200s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000061s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 79.08% : 0.090361s : 1: task_emit 0.07% : 0.000074s : 1: 
tuple_transform 5.29% : 0.006043s : 1: type_inference 0.05% : 0.000055s : 1: validate TotalTime = 0.0957272, [24] [bootstrap]: 0.00048533 [type_inference]: 0.00447247 [event_method]: 1.128e-05 [auto_monad]: 5.1e-05 [graph_reusing]: 5.37999e-06 [inline]: 1.60999e-06 [add_attr]: 0.0029983, [1] [add_attr_with_inline]: 0.0029899, [1] [Cycle 1]: 4.544e-05, [2] [tag_attr]: 1.296e-05 [meta_addattr_fg_expand]: 3.21999e-06 [parallel-infer-symbol]: 3.12002e-06 [pre_auto_parallel]: 2.285e-05 [insert-virtual-dataset]: 3.01001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.18002e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00374186, [53] [py_interpret_to_execute]: 1.655e-05 [rewriter_before_opt_a]: 3.934e-05 [opt_a]: 0.00192418, [2] [Cycle 1]: 0.00132106, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 2.443e-05 [loop_unroll]: 1.376e-05 [a_1]: 0.00029333 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.78999e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.5e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.745e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.01998e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 8.64998e-06 [auto_parallel]: 5.87001e-06 [parallel]: 1.816e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.97e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 9.07001e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.31001e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 3.55998e-06 [cell_reuse_recompute_pass]: 1.19003e-06 [offload_activation]: 9.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.34998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 1.99e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 
2.27001e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00040583 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 2.671e-05 [a_3]: 4.099e-05 [Cycle 2]: 0.00059366, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 7.23999e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00012502 [with_stream_mark]: 1.047e-05 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.825e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.30001e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.27e-06 [flash_sp]: 3.74002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71998e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.37998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.85001e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.54999e-06 [cse]: 1.366e-05 [a_3]: 3.194e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.193e-05 [convert_after_rewriter]: 7.28e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00045937 [opt_b]: 0.00018291, [1] [Cycle 1]: 0.00017671, 
[7] [b_1]: 0.0001087 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 4.39992e-07 [cse]: 1.649e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.0004114 [opt_after_cconv]: 9.415e-05, [1] [Cycle 1]: 8.856e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.06998e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.597e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.267e-05 [tuple_transform]: 6.954e-05, [1] [Cycle 1]: 6.52e-05, [4] [d_1]: 3.928e-05 [none_parameter_eliminate]: 1.41998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.338e-05 [cse_after_recomputation]: 2.045e-05, [1] [Cycle 1]: 1.618e-05, [1] [cse]: 1.11e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.67999e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 4.22e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.855e-05, [1] [Cycle 1]: 6.45e-05, [6] [build]: 2.89001e-06 [elim_shapecalc]: 8.12998e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.72998e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00044525 [validate]: 3.167e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0832058 [execute]: 9.24998e-06 Sums bootstrap : 0.000485s : 0.53% type_inference : 0.004472s : 4.87% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000418s : 0.46% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000406s : 0.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.08% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000459s : 0.50% optimize.opt_b.b_1 : 0.000109s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000411s : 0.45% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000445s : 0.49% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.083206s : 90.68% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.02% : 0.000022s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.40% : 0.000005s : 4: substitution.graph_param_transform 65.69% : 0.000080s : 2: substitution.inline 2.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.73% : 0.000005s : 4: substitution.remove_not_recompute_node 2.97% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004429 2 91.86% : 0.004069s : 1: type_inference.infer 8.14% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.98% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.95% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.97% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.96% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.94% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000264 6 42.34% : 0.000112s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.66% : 0.000152s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.103800 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.89% : 0.003003s : 1: add_attr 2.88% : 0.002993s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.05% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.50% : 0.000523s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.40% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.74% : 0.000772s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.86% : 0.001927s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.44% : 0.000454s : 1: opt_after_jit_grad 0.18% : 0.000186s : 1: opt_b 3.61% : 0.003746s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000234s : 1: renormalize.infer 0.16% : 0.000165s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000071s : 1: symbol_engine_optimizer 80.18% : 0.083228s : 1: task_emit 0.07% : 0.000072s : 1: 
tuple_transform 4.32% : 0.004488s : 1: type_inference 0.05% : 0.000054s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x5-ge],max_mem:52.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x6-pynative],max_mem:52.0M TotalTime = 0.0212675, [24] [bootstrap]: 0.00056989 [type_inference]: 0.00614275 [event_method]: 1.453e-05 [auto_monad]: 5.499e-05 [graph_reusing]: 5.96e-06 [inline]: 1.62001e-06 [add_attr]: 0.00348217, [1] [add_attr_with_inline]: 0.00347104, [1] [Cycle 1]: 4.355e-05, [2] [tag_attr]: 1.481e-05 [meta_addattr_fg_expand]: 3.98999e-06 [parallel-infer-symbol]: 2.59001e-06 [pre_auto_parallel]: 2.798e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00397639, [53] [py_interpret_to_execute]: 2.095e-05 [rewriter_before_opt_a]: 5.742e-05 [opt_a]: 0.00212977, [2] [Cycle 1]: 0.00151994, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 3.132e-05 [loop_unroll]: 2.12e-05 [a_1]: 0.00045193 [with_stream_mark]: 1.341e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 3.10002e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.578e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 1.81e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 6.16e-06 [parallel]: 2.341e-05 [flash_sp]: 7.49002e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.18998e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.36001e-06 [virtual_dataset]: 5.99e-06 
[get_grad_eliminate_]: 5.59998e-06 [virtual_output]: 5.91998e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.42999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 2.08998e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.39001e-06 [after_resolve]: 1.036e-05 [a_after_grad]: 8.60999e-06 [renormalize]: 0.00042387 [add_forward_monad_depend]: 4.40999e-06 [auto_monad_grad]: 1.99e-06 [auto_monad_eliminator]: 1.352e-05 [cse]: 2.648e-05 [a_3]: 4.081e-05 [Cycle 2]: 0.0006005, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.06999e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012612 [with_stream_mark]: 9.82999e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.33002e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.791e-05 [accelerated_algorithm]: 5.68002e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.04e-06 [parallel]: 4.02998e-06 [flash_sp]: 3.16999e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 8.18999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 4.97999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.81998e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.16002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 7.8e-06 
[renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.49001e-06 [cse]: 1.338e-05 [a_3]: 3.357e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.087e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00047056 [opt_b]: 0.00017905, [1] [Cycle 1]: 0.00017305, [7] [b_1]: 0.00010632 [b_2]: 7.00002e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.19997e-07 [cse]: 1.605e-05 [optimize_parallel_all_gather_comm]: 1.585e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.151e-05 [loop_unroll]: 0.0004112 [opt_after_cconv]: 9.508e-05, [1] [Cycle 1]: 8.917e-05, [7] [c_1]: 2.779e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 1.61e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.163e-05 [tuple_transform]: 6.918e-05, [1] [Cycle 1]: 6.489e-05, [4] [d_1]: 3.887e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.497e-05 [cse_after_recomputation]: 2.052e-05, [1] [Cycle 1]: 1.618e-05, [1] [cse]: 1.111e-05 [environ_conv]: 4.75001e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 1.99e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 9.39996e-07 
[add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 3.66001e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 8.50006e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.827e-05, [1] [Cycle 1]: 6.415e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.197e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 8.52998e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.37999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.577e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00044612 [validate]: 3.168e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00627975 [execute]: 6.76e-06 Sums bootstrap : 0.000570s : 3.39% type_inference : 0.006143s : 36.51% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000057s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000578s : 3.44% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.10% optimize.opt_a.renormalize : 0.000424s : 2.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000074s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000471s : 2.80% optimize.opt_b.b_1 : 0.000106s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000411s : 2.44% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000446s : 2.65% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006280s : 37.32% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 15.46% : 0.000026s : 5: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.21% : 0.000005s : 4: substitution.graph_param_transform 66.47% : 0.000110s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.53% : 0.000004s : 4: substitution.remove_not_recompute_node 2.16% : 0.000004s : 4: substitution.replace_old_param 6.34% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006099 2 90.98% : 0.005549s : 1: type_inference.infer 9.02% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.76% : 0.000026s : 3: replace.inline 30.24% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.95% : 0.000108s : 3: match.inline 8.05% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000002s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.92% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 1.05% : 0.000002s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.92% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.32% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000358 8 48.64% : 0.000174s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.36% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030247 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.53% : 0.003487s : 1: add_attr 11.49% : 0.003475s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.99% : 0.000601s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.59% : 0.000480s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.12% : 0.000945s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.05% : 0.002133s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.51% : 0.000455s : 1: opt_after_jit_grad 0.60% : 0.000182s : 1: opt_b 13.16% : 0.003980s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000006s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.70% : 0.000213s : 1: renormalize.infer 0.67% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.79% : 0.006290s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.35% : 0.006156s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0181556, [24] [bootstrap]: 0.00040781 [type_inference]: 0.0042753 [event_method]: 1.118e-05 [auto_monad]: 5.076e-05 [graph_reusing]: 5.57999e-06 [inline]: 1.92999e-06 [add_attr]: 0.00295752, [1] [add_attr_with_inline]: 0.00294897, [1] [Cycle 1]: 4.436e-05, [2] [tag_attr]: 1.125e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 2.89001e-06 [pre_auto_parallel]: 2.158e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00370941, [53] [py_interpret_to_execute]: 1.596e-05 [rewriter_before_opt_a]: 3.897e-05 [opt_a]: 0.00190579, [2] [Cycle 1]: 0.00125748, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 2.454e-05 [loop_unroll]: 1.411e-05 [a_1]: 0.00029142 [with_stream_mark]: 1.314e-05 [recompute_prepare]: 7.66001e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.03998e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.92999e-06 [a_2]: 7.557e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.37001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.5e-06 [auto_parallel]: 5.91e-06 [parallel]: 1.78e-05 [flash_sp]: 7.35e-06 [merge_comm]: 3.62002e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.55999e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 5.56e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 5.45001e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.118e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 8.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.93003e-06 [receive_attached]: 2.40002e-06 
[after_resolve]: 1.07e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00034624 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.33e-05 [cse]: 2.855e-05 [a_3]: 4.025e-05 [Cycle 2]: 0.00063914, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 7.16999e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012558 [with_stream_mark]: 1.111e-05 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.48998e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 0.00010907 [accelerated_algorithm]: 5.91e-06 [shard]: 1.34e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.46998e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.67001e-06 [parallel]: 4.2e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.08002e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.37001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.26002e-06 [after_resolve]: 9.71e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.36998e-06 [cse]: 1.255e-05 [a_3]: 3.217e-05 [py_interpret_to_execute_after_opt_a]: 7.83001e-06 [slice_cell_reuse_recomputed_activation]: 1.71002e-06 [rewriter_after_opt_a]: 3.065e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00045038 [opt_b]: 0.00018105, [1] [Cycle 1]: 0.0001748, [7] [b_1]: 
0.00010783 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 4.89992e-07 [cse]: 1.567e-05 [optimize_parallel_all_gather_comm]: 1.525e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.225e-05 [loop_unroll]: 0.00041559 [opt_after_cconv]: 9.406e-05, [1] [Cycle 1]: 8.853e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.53e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.522e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.339e-05 [tuple_transform]: 6.791e-05, [1] [Cycle 1]: 6.355e-05, [4] [d_1]: 3.854e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 4.326e-05 [cse_after_recomputation]: 1.952e-05, [1] [Cycle 1]: 1.515e-05, [1] [cse]: 1.012e-05 [environ_conv]: 5.20999e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.70997e-06 [label_micro_interleaved_index]: 3.88999e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.78e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.36002e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.04998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.83002e-06 [control_data_broadcast_order]: 1.158e-05 [grouped_pairwise_exchange_alltoall]: 1.67001e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.654e-05 [begin_end_overlap_inline]: 5.8001e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.756e-05, [1] [Cycle 1]: 6.367e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.29002e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 9.17999e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.44003e-06 [auto_monad_reorder]: 1.643e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00044547 [validate]: 3.036e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.00600397 [execute]: 7.61001e-06 Sums bootstrap : 0.000408s : 2.86% type_inference : 0.004275s : 30.02% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000417s : 2.93% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000185s : 1.30% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000346s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.16% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000416s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000016s : 0.12%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000445s : 3.13%
validate : 0.000030s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006004s : 42.15%
execute : 0.000008s : 0.05%

Time group info:
------[substitution.] 0.000121 26
18.73% : 0.000023s : 4: substitution.arithmetic_simplify
1.42% : 0.000002s : 2: substitution.elim_not_effective
1.22% : 0.000001s : 2: substitution.fold_const_symbol
4.25% : 0.000005s : 4: substitution.graph_param_transform
64.42% : 0.000078s : 2: substitution.inline
2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch
4.23% : 0.000005s : 4: substitution.remove_not_recompute_node
3.35% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004234 2
91.93% : 0.003892s : 1: type_inference.infer
8.07% : 0.000342s : 1: type_inference.specialize
------[replace.] 0.000018 2
100.00% : 0.000018s : 2: replace.inline
------[match.] 0.000076 2
100.00% : 0.000076s : 2: match.inline
------[predicate.]
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.76% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.89% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.32% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.19% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 42.17% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.83% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026139 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.33% : 0.002962s : 1: add_attr 11.30% : 0.002953s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.70% : 0.000443s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.76% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000811s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.30% : 0.001909s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000455s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.21% : 0.003713s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.74% : 0.000193s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.01% : 0.006015s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.41% : 0.004289s : 1: type_inference 0.22% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x6-kbk],max_mem:52.0M . TotalTime = 0.795887, [24] [bootstrap]: 0.00052922 [type_inference]: 0.00607762 [event_method]: 1.404e-05 [auto_monad]: 5.818e-05 [graph_reusing]: 5.70001e-06 [inline]: 1.55999e-06 [add_attr]: 0.00351472, [1] [add_attr_with_inline]: 0.00350346, [1] [Cycle 1]: 4.913e-05, [2] [tag_attr]: 1.622e-05 [meta_addattr_fg_expand]: 4.2e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.89e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 9.79984e-07 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00409072, [53] [py_interpret_to_execute]: 2.099e-05 [rewriter_before_opt_a]: 5.939e-05 [opt_a]: 0.00222958, [2] [Cycle 1]: 0.00162415, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 3.219e-05 [loop_unroll]: 2.131e-05 [a_1]: 0.00045787 [with_stream_mark]: 1.405e-05 [recompute_prepare]: 7.97e-06 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 3.05998e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.52999e-06 [a_2]: 7.662e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 8.42998e-06 [auto_parallel]: 6.29999e-06 [parallel]: 2.319e-05 [flash_sp]: 8.07003e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.17001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.04001e-06 [get_grad_eliminate_]: 5.82999e-06 [virtual_output]: 5.71e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.75002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 
[merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.38998e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.53e-06 [after_resolve]: 1.072e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.00048318 [add_forward_monad_depend]: 4.80001e-06 [auto_monad_grad]: 2.44999e-06 [auto_monad_eliminator]: 1.35e-05 [cse]: 2.71e-05 [a_3]: 7.264e-05 [Cycle 2]: 0.00059545, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 7.17002e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012638 [with_stream_mark]: 9.92001e-06 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.11998e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.915e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.32999e-06 [meta_shard_fg_expand]: 1.27999e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.21001e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 2.81e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.27001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.09e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.26998e-06 [a_after_grad]: 8.28001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.77002e-06 [cse]: 1.372e-05 [a_3]: 3.131e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 
1.74998e-06 [rewriter_after_opt_a]: 3.165e-05 [convert_after_rewriter]: 7.11999e-06 [order_py_execute_after_rewriter]: 4.96002e-06 [mutable_eliminate]: 0.0004711 [opt_b]: 0.00018229, [1] [Cycle 1]: 0.00017626, [7] [b_1]: 0.00010891 [b_2]: 7.12002e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.18002e-06 [renormalize]: 3.00002e-07 [cse]: 1.605e-05 [optimize_parallel_all_gather_comm]: 1.555e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.154e-05 [loop_unroll]: 0.000411 [opt_after_cconv]: 9.493e-05, [1] [Cycle 1]: 8.927e-05, [7] [c_1]: 2.823e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.542e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.241e-05 [tuple_transform]: 6.923e-05, [1] [Cycle 1]: 6.479e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.39998e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.716e-05 [cse_after_recomputation]: 1.984e-05, [1] [Cycle 1]: 1.556e-05, [1] [cse]: 1.056e-05 [environ_conv]: 5.07999e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.72001e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 
[control_data_broadcast_order]: 1.172e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.67998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69e-06 [overlap_recompute_comm]: 2.13002e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.744e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.987e-05, [1] [Cycle 1]: 6.59e-05, [6] [build]: 3.08998e-06 [elim_shapecalc]: 8.57e-06 [elim_not_effective]: 1.191e-05 [opt_reshape]: 6.23998e-06 [fold_const_symbol]: 9.25001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 1.584e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.36001e-06 [opt_after_jit_grad]: 0.00044943 [validate]: 3.256e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.780822 [execute]: 9.76998e-06 Sums bootstrap : 0.000529s : 0.07% type_inference : 0.006078s : 0.77% event_method : 0.000014s : 0.00% auto_monad : 0.000058s : 0.01% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000584s : 0.07% optimize.opt_a.with_stream_mark : 
0.000024s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000483s : 0.06% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000041s : 0.01% optimize.opt_a.a_3 : 0.000104s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000471s : 0.06% optimize.opt_b.b_1 : 0.000109s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000411s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.06% validate : 0.000033s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.780822s : 98.66% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000168 30 14.83% : 0.000025s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000002s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 66.97% : 0.000112s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.24% : 0.000004s : 4: substitution.replace_old_param 6.53% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006032 2 90.90% : 0.005483s : 1: type_inference.infer 9.10% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.63% : 0.000028s : 3: replace.inline 28.37% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.77% : 0.000110s : 3: match.inline 8.23% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.94% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.14% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.37% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.91% : 0.000001s : 8: predicate.replace_old_param 0.26% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.76% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.19% : 0.000004s : 24: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 45.59% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.41% : 0.000190s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.805108 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.44% : 0.003519s : 1: add_attr 0.44% : 0.003507s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000064s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.07% : 0.000563s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000480s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.000984s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.28% : 0.002233s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.06% : 0.000459s : 1: opt_after_jit_grad 0.02% : 0.000186s : 1: opt_b 0.51% : 0.004095s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.03% : 0.000262s : 1: renormalize.infer 0.03% : 0.000214s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 96.99% : 0.780845s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.76% : 0.006093s : 1: type_inference 0.01% : 0.000059s : 1: validate TotalTime = 0.0796738, [24] [bootstrap]: 0.00041258 [type_inference]: 0.00435933 [event_method]: 1.084e-05 [auto_monad]: 5.055e-05 [graph_reusing]: 4.75001e-06 [inline]: 1.71e-06 [add_attr]: 0.0029875, [1] [add_attr_with_inline]: 0.00297947, [1] [Cycle 1]: 4.334e-05, [2] [tag_attr]: 1.224e-05 [meta_addattr_fg_expand]: 3.13998e-06 [parallel-infer-symbol]: 2.89999e-06 [pre_auto_parallel]: 2.18e-05 [insert-virtual-dataset]: 2.66999e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0037157, [53] [py_interpret_to_execute]: 1.472e-05 [rewriter_before_opt_a]: 3.913e-05 [opt_a]: 0.00188487, [2] [Cycle 1]: 0.00128114, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 2.506e-05 [loop_unroll]: 1.361e-05 [a_1]: 0.00029202 [with_stream_mark]: 1.39e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.693e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 2.59001e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.36002e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.889e-05 [flash_sp]: 7.56001e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 9.06998e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.51001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.29e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.95e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.61e-06 [after_resolve]: 
1.038e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.00036315 [add_forward_monad_depend]: 4.97e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.266e-05 [cse]: 2.931e-05 [a_3]: 4.034e-05 [Cycle 2]: 0.00059464, [45] [expand_dump_flag]: 7.90023e-07 [switch_simplify]: 6.92002e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012504 [with_stream_mark]: 1.032e-05 [recompute_prepare]: 5.81e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.852e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.16997e-06 [shard_inline]: 5.65001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.98001e-06 [flash_sp]: 3.03e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 5.17999e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.99999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.12998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11999e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.09003e-06 [after_resolve]: 9.11002e-06 [a_after_grad]: 8.2e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.276e-05 [a_3]: 3.246e-05 [py_interpret_to_execute_after_opt_a]: 7.92e-06 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 3.135e-05 [convert_after_rewriter]: 7.15998e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00044729 [opt_b]: 0.00018113, [1] [Cycle 1]: 0.0001748, [7] [b_1]: 0.00010771 [b_2]: 7.03e-06 
[updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.28998e-06 [renormalize]: 3.29979e-07 [cse]: 1.595e-05 [optimize_parallel_all_gather_comm]: 1.559e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 4.953e-05 [loop_unroll]: 0.00041076 [opt_after_cconv]: 9.488e-05, [1] [Cycle 1]: 8.933e-05, [7] [c_1]: 2.788e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.13002e-06 [cse]: 1.621e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 6.938e-05, [1] [Cycle 1]: 6.525e-05, [4] [d_1]: 3.963e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.423e-05 [cse_after_recomputation]: 2.039e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.45e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.55997e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.40002e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.29984e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.24e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.867e-05, [1] [Cycle 1]: 6.458e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 8.49002e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 6.12999e-06 [fold_const_symbol]: 8.60999e-06 [renormalize]: 2.40019e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.522e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00044433 [validate]: 3.176e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.067385 [execute]: 9.94999e-06 Sums bootstrap : 0.000413s : 0.54% type_inference : 0.004359s : 5.76% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000417s : 0.55% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000363s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.59% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000050s : 0.07% optimize.loop_unroll : 0.000411s : 0.54% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000444s : 0.59% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.067385s : 88.99% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.39% : 0.000022s : 4: substitution.arithmetic_simplify 1.71% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 64.88% : 0.000078s : 2: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.91% : 0.000005s : 4: substitution.remove_not_recompute_node 3.28% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004319 2 91.56% : 0.003954s : 1: type_inference.infer 8.44% : 0.000364s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.94% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.83% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.37% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.77% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.84% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.71% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000295 6 48.76% : 0.000144s : 2: func_graph_cloner_run.FuncGraphClonerGraph 51.24% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087664 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.41% : 0.002992s : 1: add_attr 3.40% : 0.002983s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.50% : 0.000442s : 1: bootstrap 0.06% : 0.000053s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000419s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.88% : 0.000772s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.15% : 0.001888s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.52% : 0.000454s : 1: opt_after_jit_grad 0.21% : 0.000184s : 1: opt_b 4.24% : 0.003719s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000203s : 1: renormalize.infer 0.18% : 0.000153s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 76.89% : 0.067407s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 4.99% : 0.004374s : 1: type_inference 0.06% : 0.000055s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x6-ge],max_mem:52.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x7-pynative],max_mem:52.0M TotalTime = 0.0213403, [24] [bootstrap]: 0.00052139 [type_inference]: 0.0061319 [event_method]: 1.49e-05 [auto_monad]: 5.64e-05 [graph_reusing]: 5.69999e-06 [inline]: 1.79e-06 [add_attr]: 0.00343333, [1] [add_attr_with_inline]: 0.00342276, [1] [Cycle 1]: 4.784e-05, [2] [tag_attr]: 1.547e-05 [meta_addattr_fg_expand]: 4.27e-06 [parallel-infer-symbol]: 3.57002e-06 [pre_auto_parallel]: 2.928e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.00409095, [53] [py_interpret_to_execute]: 2.184e-05 [rewriter_before_opt_a]: 5.845e-05 [opt_a]: 0.00222783, [2] [Cycle 1]: 0.00161393, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 3.229e-05 [loop_unroll]: 2.113e-05 [a_1]: 0.00045996 [with_stream_mark]: 1.4e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 4.13999e-06 [updatestate_assign_eliminate]: 3.37002e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 7.681e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 2.68e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.51e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 5.50001e-06 [parallel]: 2.233e-05 [flash_sp]: 7.17002e-06 [merge_comm]: 3.95998e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 5.91998e-06 
[get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.061e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 9.30001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.23002e-06 [after_resolve]: 1.084e-05 [a_after_grad]: 8.84998e-06 [renormalize]: 0.00050053 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 2.01998e-06 [auto_monad_eliminator]: 1.394e-05 [cse]: 2.801e-05 [a_3]: 4.306e-05 [Cycle 2]: 0.00060474, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 7.09001e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012745 [with_stream_mark]: 9.47001e-06 [recompute_prepare]: 5.51998e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.14e-06 [updatestate_loads_eliminate]: 2.37001e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 6.859e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 1.25999e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.32999e-06 [merge_send_recv]: 4.44002e-06 [auto_parallel]: 5.63997e-06 [parallel]: 4.77998e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 3.22002e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.43998e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.58998e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 6.08998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.009e-05 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 8.60999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.97e-06 [a_after_grad]: 8.1e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.299e-05 [a_3]: 3.28e-05 [py_interpret_to_execute_after_opt_a]: 8.12e-06 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 3.148e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 4.78001e-06 [mutable_eliminate]: 0.00046452 [opt_b]: 0.00018192, [1] [Cycle 1]: 0.00017601, [7] [b_1]: 0.00010769 [b_2]: 7.30998e-06 [updatestate_depend_eliminate]: 5.82001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.09985e-07 [cse]: 1.632e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.304e-05 [loop_unroll]: 0.00041668 [opt_after_cconv]: 9.619e-05, [1] [Cycle 1]: 9.052e-05, [7] [c_1]: 2.794e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.681e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.248e-05 [tuple_transform]: 6.944e-05, [1] [Cycle 1]: 6.515e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.35002e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.771e-05 [cse_after_recomputation]: 2.057e-05, [1] [Cycle 1]: 1.63e-05, [1] [cse]: 1.13e-05 [environ_conv]: 4.93001e-06 [swap_dp_allreduce_reducescatter]: 5.01002e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.98003e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.24003e-06 [ForceFp32Comm]: 6.89994e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 
1.02e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.162e-05 [grouped_pairwise_exchange_alltoall]: 1.46002e-06 [offloading_packed_experts]: 4.13999e-06 [overlap_recompute_and_grad_model_parallel]: 4.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.54e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.737e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.841e-05, [1] [Cycle 1]: 6.418e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.29998e-06 [elim_not_effective]: 1.173e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.598e-05 [get_jit_bprop_graph]: 1.13001e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00045374 [validate]: 3.29e-05 [backend_pass]: 8.90024e-07 [task_emit]: 0.00631483 [execute]: 7.63999e-06 Sums bootstrap : 0.000521s : 3.08% type_inference : 0.006132s : 36.23% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000587s : 3.47% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000501s : 2.96% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000076s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000465s : 2.74% optimize.opt_b.b_1 : 0.000108s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000417s : 2.46% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000454s : 2.68% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006315s : 37.31% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000169 30 14.78% : 0.000025s : 5: substitution.arithmetic_simplify 1.26% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.30% : 0.000006s : 4: substitution.graph_param_transform 66.64% : 0.000112s : 3: substitution.inline 1.91% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.23% : 0.000004s : 4: substitution.replace_old_param 6.56% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006087 2 90.56% : 0.005512s : 1: type_inference.infer 9.44% : 0.000575s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.14% : 0.000027s : 3: replace.inline 30.86% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.71% : 0.000110s : 3: match.inline 8.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.81% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.75% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.83% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.67% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.60% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000363 8 46.63% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.37% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030476 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.28% : 0.003437s : 1: add_attr 11.24% : 0.003426s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.84% : 0.000560s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.55% : 0.000473s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.15% : 0.000959s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.32% : 0.002231s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.52% : 0.000463s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.44% : 0.004094s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.81% : 0.000246s : 1: renormalize.infer 0.81% : 0.000247s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.76% : 0.006326s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.17% : 0.006147s : 1: type_inference 0.22% : 0.000066s : 1: validate TotalTime = 0.0181565, [24] [bootstrap]: 0.00040501 [type_inference]: 0.00433211 [event_method]: 1.152e-05 [auto_monad]: 5.243e-05 [graph_reusing]: 5.24e-06 [inline]: 2.22999e-06 [add_attr]: 0.00296831, [1] [add_attr_with_inline]: 0.00295968, [1] [Cycle 1]: 4.253e-05, [2] [tag_attr]: 1.211e-05 [meta_addattr_fg_expand]: 3.13e-06 [parallel-infer-symbol]: 3.48e-06 [pre_auto_parallel]: 2.297e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00370763, [53] [py_interpret_to_execute]: 1.475e-05 [rewriter_before_opt_a]: 3.938e-05 [opt_a]: 0.00191216, [2] [Cycle 1]: 0.00128049, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 2.478e-05 [loop_unroll]: 1.402e-05 [a_1]: 0.00029385 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 7.94002e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.39001e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 7.682e-05 [accelerated_algorithm]: 6.12001e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.67999e-06 [merge_send_recv]: 7.81001e-06 [auto_parallel]: 5.97999e-06 [parallel]: 1.724e-05 [flash_sp]: 7.55e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.61001e-06 [matmul_add_comm_reduction]: 8.73001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.84e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.143e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.55001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 2.54001e-06 [receive_attached]: 2.47001e-06 
[after_resolve]: 1.041e-05 [a_after_grad]: 8.91002e-06 [renormalize]: 0.00036178 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.324e-05 [cse]: 2.793e-05 [a_3]: 4.054e-05 [Cycle 2]: 0.0006223, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.86999e-06 [loop_unroll]: 5.60001e-06 [a_1]: 0.0001257 [with_stream_mark]: 1.136e-05 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.93998e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 7.50006e-07 [a_2]: 6.965e-05 [accelerated_algorithm]: 5.77001e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.29998e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 2.475e-05 [auto_parallel]: 5.76e-06 [parallel]: 4.57e-06 [flash_sp]: 3.35e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.005e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 7.75e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.79998e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.021e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.267e-05 [a_3]: 3.218e-05 [py_interpret_to_execute_after_opt_a]: 7.86001e-06 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 3.088e-05 [convert_after_rewriter]: 6.67002e-06 [order_py_execute_after_rewriter]: 5.34998e-06 [mutable_eliminate]: 0.00044013 [opt_b]: 0.00018425, [1] [Cycle 1]: 0.00017777, [7] 
[b_1]: 0.00010948 [b_2]: 7.34002e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 3.00002e-07 [cse]: 1.571e-05 [optimize_parallel_all_gather_comm]: 1.528e-05 [overlap_param_gather]: 1.71998e-06 [cconv]: 2.303e-05 [loop_unroll]: 0.00040987 [opt_after_cconv]: 9.43e-05, [1] [Cycle 1]: 8.869e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.51e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.40002e-06 [cse]: 1.541e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.282e-05 [tuple_transform]: 6.994e-05, [1] [Cycle 1]: 6.527e-05, [4] [d_1]: 3.913e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 4.237e-05 [cse_after_recomputation]: 1.909e-05, [1] [Cycle 1]: 1.495e-05, [1] [cse]: 9.72001e-06 [environ_conv]: 3.97e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 1.04998e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.809e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.828e-05, [1] [Cycle 1]: 6.419e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.177e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.79998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00044353 [validate]: 3.105e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0059361 [execute]: 7.21999e-06 Sums bootstrap : 0.000405s : 2.85% type_inference : 0.004332s : 30.45% event_method : 0.000012s : 0.08% auto_monad : 0.000052s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000420s : 2.95% optimize.opt_a.with_stream_mark : 0.000025s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000033s : 0.23% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000362s : 2.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000440s : 3.09% optimize.opt_b.b_1 : 0.000109s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000410s : 2.88% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.13% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000444s : 3.12% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005936s : 41.73% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 17.90% : 0.000022s : 4: substitution.arithmetic_simplify 1.72% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.39% : 0.000005s : 4: substitution.graph_param_transform 65.28% : 0.000079s : 2: substitution.inline 2.52% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.95% : 0.000005s : 4: substitution.remove_not_recompute_node 3.18% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004290 2 91.90% : 0.003943s : 1: type_inference.infer 8.10% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000139 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.67% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.62% : 0.000004s : 17: predicate.arithmetic_simplify 1.12% : 0.000002s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.87% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.35% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.76% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.96% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.49% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.51% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026129 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.38% : 0.002973s : 1: add_attr 11.34% : 0.002963s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.68% : 0.000440s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000017s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000449s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000777s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.33% : 0.001915s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000453s : 1: opt_after_jit_grad 0.72% : 0.000188s : 1: opt_b 14.20% : 0.003712s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.78% : 0.000204s : 1: renormalize.infer 0.58% : 0.000151s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.76% : 0.005948s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.64% : 0.004347s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x7-kbk],max_mem:52.0M TotalTime = 0.821389, [24] [bootstrap]: 0.00051521 [type_inference]: 0.00610583 [event_method]: 1.436e-05 [auto_monad]: 5.511e-05 [graph_reusing]: 5.45001e-06 [inline]: 1.82999e-06 [add_attr]: 0.00343529, [1] [add_attr_with_inline]: 0.00342255, [1] [Cycle 1]: 5.283e-05, [2] [tag_attr]: 1.721e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 3.64002e-06 [pre_auto_parallel]: 2.97e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 1.00999e-06 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00415563, [53] [py_interpret_to_execute]: 2.288e-05 [rewriter_before_opt_a]: 6.388e-05 [opt_a]: 0.00227913, [2] [Cycle 1]: 0.00167514, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.226e-05 [loop_unroll]: 2.109e-05 [a_1]: 0.00049727 [with_stream_mark]: 1.494e-05 [recompute_prepare]: 8.74e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.14001e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.71002e-06 [a_2]: 9.207e-05 [accelerated_algorithm]: 6.86999e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 6.41e-06 [merge_send_recv]: 7.93999e-06 [auto_parallel]: 6.68e-06 [parallel]: 2.251e-05 [flash_sp]: 7.78999e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.40998e-06 [matmul_add_comm_reduction]: 9.07999e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 
[merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 2.38998e-06 [receive_attached]: 2.37001e-06 [after_resolve]: 1.1e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00050737 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 2.07001e-06 [auto_monad_eliminator]: 1.349e-05 [cse]: 2.66e-05 [a_3]: 4.076e-05 [Cycle 2]: 0.00059406, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012585 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 6.09999e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 7.60017e-07 [a_2]: 6.774e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.39998e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.52e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.30001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.05002e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.67999e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.48997e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.90998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.38002e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 9.60019e-07 [auto_monad_grad]: 9.99979e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.542e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 7.62002e-06 
[slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.324e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.41998e-06 [mutable_eliminate]: 0.00046812 [opt_b]: 0.0001814, [1] [Cycle 1]: 0.00017538, [7] [b_1]: 0.00010813 [b_2]: 7.02002e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 2.80008e-07 [cse]: 1.594e-05 [optimize_parallel_all_gather_comm]: 1.623e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.304e-05 [loop_unroll]: 0.00041152 [opt_after_cconv]: 9.492e-05, [1] [Cycle 1]: 8.895e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.54e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.234e-05 [tuple_transform]: 7.023e-05, [1] [Cycle 1]: 6.573e-05, [4] [d_1]: 3.972e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.30002e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 5.069e-05 [cse_after_recomputation]: 1.978e-05, [1] [Cycle 1]: 1.537e-05, [1] [cse]: 1.033e-05 [environ_conv]: 4.13999e-06 [swap_dp_allreduce_reducescatter]: 4.97999e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.22999e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.66e-06 [control_data_broadcast_order]: 1.16e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.53999e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.50002e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.826e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 7.109e-05, [1] [Cycle 1]: 6.698e-05, [6] [build]: 2.79001e-06 [elim_shapecalc]: 9.09998e-06 [elim_not_effective]: 1.199e-05 [opt_reshape]: 6.28002e-06 [fold_const_symbol]: 9.15001e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.78002e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.609e-05 [get_jit_bprop_graph]: 1.67999e-06 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00044509 [validate]: 3.293e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.806322 [execute]: 9.02999e-06 Sums bootstrap : 0.000515s : 0.06% type_inference : 0.006106s : 0.75% event_method : 0.000014s : 0.00% auto_monad : 0.000055s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000030s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000064s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000623s : 0.08% 
optimize.opt_a.with_stream_mark : 0.000025s : 0.00% optimize.opt_a.recompute_prepare : 0.000015s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000160s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000507s : 0.06% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000042s : 0.01% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000468s : 0.06% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000412s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000445s : 0.05% validate : 0.000033s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.806322s : 98.70% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000178 30 15.16% : 0.000027s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.03% : 0.000005s : 4: substitution.graph_param_transform 66.87% : 0.000119s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.58% : 0.000005s : 4: substitution.replace_old_param 6.44% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006060 2 90.96% : 0.005512s : 1: type_inference.infer 9.04% : 0.000548s : 1: type_inference.specialize ------[replace.] 0.000043 5 70.75% : 0.000030s : 3: replace.inline 29.25% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000127 5 91.85% : 0.000117s : 3: match.inline 8.15% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 1131 0.98% : 0.000002s : 11: predicate.accumulaten_eliminater 0.92% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 1.04% : 0.000002s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.94% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.17% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.35% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.94% : 0.000002s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000002s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.48% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.59% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.70% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.69% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.48% : 0.000002s : 16: predicate.switch_defer_inline 2.10% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.27% : 0.000009s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 1.00% : 0.000002s : 11: predicate.transpose_eliminate 1.53% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 44.72% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.28% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.830643 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.41% : 0.003440s : 1: add_attr 0.41% : 0.003427s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000060s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000554s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000007s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000477s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.001009s : 78: opt.transform.opt_a 0.00% : 0.000026s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.27% : 0.002282s : 1: opt_a 0.01% : 0.000099s : 1: opt_after_cconv 0.05% : 0.000455s : 1: opt_after_jit_grad 0.02% : 0.000185s : 1: opt_b 0.50% : 0.004159s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.03% : 0.000276s : 1: renormalize.infer 0.03% : 0.000224s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000038s : 1: rewriter_after_opt_a 0.01% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000074s : 1: symbol_engine_optimizer 97.07% : 0.806345s : 1: task_emit 0.01% : 0.000073s : 1: 
tuple_transform 0.74% : 0.006121s : 1: type_inference 0.01% : 0.000061s : 1: validate TotalTime = 0.0794616, [24] [bootstrap]: 0.0004094 [type_inference]: 0.0043318 [event_method]: 1.053e-05 [auto_monad]: 4.942e-05 [graph_reusing]: 4.72e-06 [inline]: 1.77001e-06 [add_attr]: 0.00298314, [1] [add_attr_with_inline]: 0.00297464, [1] [Cycle 1]: 4.01e-05, [2] [tag_attr]: 1.212e-05 [meta_addattr_fg_expand]: 3.01001e-06 [parallel-infer-symbol]: 3.29001e-06 [pre_auto_parallel]: 2.197e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00378198, [53] [py_interpret_to_execute]: 1.52e-05 [rewriter_before_opt_a]: 3.957e-05 [opt_a]: 0.00192947, [2] [Cycle 1]: 0.00132305, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 2.424e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00029595 [with_stream_mark]: 1.446e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.62002e-06 [updatestate_assign_eliminate]: 3.35003e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.663e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 8e-06 [auto_parallel]: 6.20002e-06 [parallel]: 1.836e-05 [flash_sp]: 7.77e-06 [merge_comm]: 3.53999e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 7.50006e-07 [virtual_shard_identity]: 7.35998e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.62002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.009e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.52001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.33998e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 1.073e-05 [a_after_grad]: 8.96002e-06 [renormalize]: 0.00040315 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 1.91e-06 [auto_monad_eliminator]: 1.318e-05 [cse]: 2.698e-05 [a_3]: 4.058e-05 [Cycle 2]: 0.00059721, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.2e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012626 [with_stream_mark]: 9.51003e-06 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.50002e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 6.755e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.66003e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.52001e-06 [parallel]: 4.42e-06 [flash_sp]: 3.85e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.00999e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.46002e-06 [offload_activation]: 5.76998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39998e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.33001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 1.00999e-06 [receive_attached]: 1.06002e-06 [after_resolve]: 9.32999e-06 [a_after_grad]: 8.34998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.16e-06 [cse]: 1.279e-05 [a_3]: 3.181e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.321e-05 [convert_after_rewriter]: 7.30998e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.0004823 [opt_b]: 0.0001826, [1] [Cycle 1]: 0.0001764, [7] [b_1]: 
0.00010881 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.16998e-06 [renormalize]: 3.80009e-07 [cse]: 1.656e-05 [optimize_parallel_all_gather_comm]: 1.663e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.21e-05 [loop_unroll]: 0.0004133 [opt_after_cconv]: 9.604e-05, [1] [Cycle 1]: 9.033e-05, [7] [c_1]: 2.788e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.682e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.258e-05 [tuple_transform]: 6.94e-05, [1] [Cycle 1]: 6.523e-05, [4] [d_1]: 3.942e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.42001e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.517e-05 [cse_after_recomputation]: 2.044e-05, [1] [Cycle 1]: 1.621e-05, [1] [cse]: 1.124e-05 [environ_conv]: 4.97e-06 [swap_dp_allreduce_reducescatter]: 5.59998e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.68003e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 1.35999e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91003e-06 [control_data_broadcast_order]: 1.168e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.52001e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.809e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.856e-05, [1] [Cycle 1]: 6.422e-05, [6] [build]: 2.48998e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.138e-05 [opt_reshape]: 6.36e-06 [fold_const_symbol]: 8.82e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.644e-05 [get_jit_bprop_graph]: 1.16002e-06 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00044512 [validate]: 3.242e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.0671272 [execute]: 9.41e-06 Sums bootstrap : 0.000409s : 0.54% type_inference : 0.004332s : 5.74% event_method : 0.000011s : 0.01% auto_monad : 0.000049s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000422s : 0.56% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000403s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000482s : 0.64% optimize.opt_b.b_1 : 0.000109s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000413s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000445s : 0.59% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.067127s : 88.92% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.09% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000005s : 4: substitution.graph_param_transform 66.01% : 0.000081s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.58% : 0.000004s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004290 2 91.44% : 0.003923s : 1: type_inference.infer 8.56% : 0.000367s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000140 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.79% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.59% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.00% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.71% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.08% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.77% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.37% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.26% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.19% : 0.000002s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.20% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.19% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000264 6 41.03% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.97% : 0.000156s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087560 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.41% : 0.002987s : 1: add_attr 3.40% : 0.002978s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000055s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.51% : 0.000446s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000492s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.88% : 0.000775s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.21% : 0.001932s : 1: opt_a 0.11% : 0.000100s : 1: opt_after_cconv 0.52% : 0.000455s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.32% : 0.003786s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.26% : 0.000228s : 1: renormalize.infer 0.19% : 0.000169s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 76.69% : 0.067150s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 4.97% : 0.004348s : 1: type_inference 0.07% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x7-ge],max_mem:52.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x8-pynative],max_mem:52.0M TotalTime = 0.0215178, [24] [bootstrap]: 0.00055774 [type_inference]: 0.00614946 [event_method]: 1.452e-05 [auto_monad]: 5.868e-05 [graph_reusing]: 5.57999e-06 [inline]: 2.09999e-06 [add_attr]: 0.00347168, [1] [add_attr_with_inline]: 0.00346015, [1] [Cycle 1]: 5.079e-05, [2] [tag_attr]: 1.594e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 3.68999e-06 [pre_auto_parallel]: 3.106e-05 [insert-virtual-dataset]: 2.54001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.83002e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00415585, [53] [py_interpret_to_execute]: 2.052e-05 [rewriter_before_opt_a]: 5.91e-05 [opt_a]: 0.00228653, [2] [Cycle 1]: 0.00167017, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 3.222e-05 [loop_unroll]: 2.176e-05 [a_1]: 0.0004718 [with_stream_mark]: 1.351e-05 [recompute_prepare]: 7.98999e-06 [updatestate_depend_eliminate]: 3.55998e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.714e-05 [accelerated_algorithm]: 6.80998e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 7.63001e-06 [auto_parallel]: 6.02999e-06 [parallel]: 2.363e-05 [flash_sp]: 7.76001e-06 [merge_comm]: 3.59002e-06 [allreduce_fusion]: 3.17002e-06 [matmul_add_comm_reduction]: 9.07001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 
5.79e-06 [get_grad_eliminate_]: 5.41998e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 9.54999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40998e-06 [meta_fg_expand]: 2.34999e-06 [flash_sp_send_recv_attached]: 2.96999e-06 [receive_attached]: 2.56e-06 [after_resolve]: 1.018e-05 [a_after_grad]: 8.73001e-06 [renormalize]: 0.00051269 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.403e-05 [cse]: 2.807e-05 [a_3]: 7.208e-05 [Cycle 2]: 0.00060696, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.00012881 [with_stream_mark]: 1.066e-05 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.84e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.39998e-06 [meta_shard_fg_expand]: 1.21997e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.44998e-06 [parallel]: 4.47e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 5.99999e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.69e-06 [offload_activation]: 6.27001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.28999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.30998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.27e-06 [after_resolve]: 9.06002e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 1.36002e-06 [auto_monad_eliminator]: 7.01001e-06 [cse]: 1.373e-05 [a_3]: 3.181e-05 [py_interpret_to_execute_after_opt_a]: 8.12e-06 [slice_cell_reuse_recomputed_activation]: 1.67001e-06 [rewriter_after_opt_a]: 3.219e-05 [convert_after_rewriter]: 7.11001e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00045977 [opt_b]: 0.00018504, [1] [Cycle 1]: 0.00017881, [7] [b_1]: 0.00010955 [b_2]: 8.02e-06 [updatestate_depend_eliminate]: 5.46e-06 [updatestate_assign_eliminate]: 2.35997e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 3.39991e-07 [cse]: 1.693e-05 [optimize_parallel_all_gather_comm]: 1.523e-05 [overlap_param_gather]: 1.86003e-06 [cconv]: 2.399e-05 [loop_unroll]: 0.0004155 [opt_after_cconv]: 9.575e-05, [1] [Cycle 1]: 9.016e-05, [7] [c_1]: 2.793e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.05002e-06 [cse]: 1.707e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.246e-05 [tuple_transform]: 6.895e-05, [1] [Cycle 1]: 6.446e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.81e-05 [cse_after_recomputation]: 2.116e-05, [1] [Cycle 1]: 1.665e-05, [1] [cse]: 1.159e-05 [environ_conv]: 4.27e-06 [swap_dp_allreduce_reducescatter]: 5.46e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.49998e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.53e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.52999e-06 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.37999e-06 [comm_op_add_attrs]: 9.80013e-07 
[add_comm_op_reuse_tag]: 9.90025e-07 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.82001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.268e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 3.88999e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.751e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 6.908e-05, [1] [Cycle 1]: 6.473e-05, [6] [build]: 2.54999e-06 [elim_shapecalc]: 8.54998e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.647e-05 [get_jit_bprop_graph]: 1.32e-06 [rewriter_after_jit_bprop_graph]: 3.81001e-06 [opt_after_jit_grad]: 0.00044559 [validate]: 3.31e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00633034 [execute]: 6.88998e-06 Sums bootstrap : 0.000558s : 3.27% type_inference : 0.006149s : 36.07% event_method : 0.000015s : 0.09% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000031s : 0.18% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000601s : 3.52% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000146s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000513s : 3.01% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000042s : 0.25% optimize.opt_a.a_3 : 0.000104s : 0.61% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000460s : 2.70% optimize.opt_b.b_1 : 0.000110s : 0.64% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000415s : 2.44% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000013s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000446s : 2.61% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006330s : 37.13% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000175 30 14.92% : 0.000026s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.04% : 0.000005s : 4: substitution.graph_param_transform 66.96% : 0.000117s : 3: substitution.inline 2.01% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.44% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.47% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006103 2 90.65% : 0.005532s : 1: type_inference.infer 9.35% : 0.000570s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.97% : 0.000027s : 3: replace.inline 30.03% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000125 5 91.81% : 0.000115s : 3: match.inline 8.19% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.15% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000002s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.68% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.52% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000364 8 46.59% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.41% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030807 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.28% : 0.003476s : 1: add_attr 11.24% : 0.003464s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.94% : 0.000598s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000016s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000469s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.24% : 0.000999s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.43% : 0.002290s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.48% : 0.000456s : 1: opt_after_jit_grad 0.61% : 0.000188s : 1: opt_b 13.50% : 0.004160s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000035s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.87% : 0.000269s : 1: renormalize.infer 0.77% : 0.000237s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 20.59% : 0.006343s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.01% : 0.006166s : 1: type_inference 0.22% : 0.000069s : 1: validate TotalTime = 0.0182694, [24] [bootstrap]: 0.00042995 [type_inference]: 0.00438051 [event_method]: 1.086e-05 [auto_monad]: 5.216e-05 [graph_reusing]: 4.65001e-06 [inline]: 2.22999e-06 [add_attr]: 0.00298711, [1] [add_attr_with_inline]: 0.00297914, [1] [Cycle 1]: 4.19e-05, [2] [tag_attr]: 1.185e-05 [meta_addattr_fg_expand]: 3.14999e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 2.303e-05 [insert-virtual-dataset]: 2.86e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.76998e-06 [optimize]: 0.00371738, [53] [py_interpret_to_execute]: 1.614e-05 [rewriter_before_opt_a]: 3.837e-05 [opt_a]: 0.00190412, [2] [Cycle 1]: 0.00129872, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 2.467e-05 [loop_unroll]: 1.416e-05 [a_1]: 0.00029542 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.32002e-06 [updatestate_depend_eliminate]: 3.35e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.783e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 6.14001e-06 [parallel]: 1.655e-05 [flash_sp]: 7.41999e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.29001e-06 [virtual_dataset]: 6.21e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.03002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.093e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 9.65002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.53e-06 [receive_attached]: 2.68e-06 
[after_resolve]: 1.082e-05 [a_after_grad]: 8.66997e-06 [renormalize]: 0.00037862 [add_forward_monad_depend]: 4.15e-06 [auto_monad_grad]: 2.16998e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.74e-05 [a_3]: 4.075e-05 [Cycle 2]: 0.00059624, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.80998e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.00012562 [with_stream_mark]: 9.41998e-06 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.26e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.888e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 4.47003e-06 [auto_parallel]: 5.57001e-06 [parallel]: 4.16001e-06 [flash_sp]: 3.04001e-06 [merge_comm]: 2.87002e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.24998e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 5.98998e-06 [virtual_dataset]: 5.36998e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.16002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.55001e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 7.95e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.02001e-06 [cse]: 1.221e-05 [a_3]: 3.144e-05 [py_interpret_to_execute_after_opt_a]: 7.78999e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.194e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.15001e-06 [mutable_eliminate]: 0.00044872 [opt_b]: 0.00018392, [1] [Cycle 1]: 0.00017786, [7] 
[b_1]: 0.00010939 [b_2]: 7.51001e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 4.50003e-07 [cse]: 1.635e-05 [optimize_parallel_all_gather_comm]: 1.571e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.182e-05 [loop_unroll]: 0.00041165 [opt_after_cconv]: 9.586e-05, [1] [Cycle 1]: 9.004e-05, [7] [c_1]: 2.812e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.48e-06 [cse]: 1.653e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.206e-05 [tuple_transform]: 7.033e-05, [1] [Cycle 1]: 6.601e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.72001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.57002e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.446e-05 [cse_after_recomputation]: 1.958e-05, [1] [Cycle 1]: 1.526e-05, [1] [cse]: 1.01e-05 [environ_conv]: 5.56e-06 [swap_dp_allreduce_reducescatter]: 5.47001e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.61999e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.44e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 1.01997e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.51002e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 4.02e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 4.50001e-06 [overlap_grad_flash_sp]: 1.736e-05 [begin_end_overlap_inline]: 8.00006e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.914e-05, [1] [Cycle 1]: 6.508e-05, [6] [build]: 2.47001e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 8.81002e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.517e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00048266 [validate]: 3.062e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.00590779 [execute]: 6.54001e-06 Sums bootstrap : 0.000430s : 3.00% type_inference : 0.004381s : 30.60% event_method : 0.000011s : 0.08% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000421s : 2.94% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000379s : 2.65% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000449s : 3.13% optimize.opt_b.b_1 : 0.000109s : 0.76% optimize.opt_b.b_2 : 0.000008s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000412s : 2.88% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000006s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000483s : 3.37% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005908s : 41.27% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 17.95% : 0.000022s : 4: substitution.arithmetic_simplify 1.59% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.43% : 0.000005s : 4: substitution.graph_param_transform 65.42% : 0.000079s : 2: substitution.inline 2.38% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.94% : 0.000005s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004337 2 91.88% : 0.003985s : 1: type_inference.infer 8.12% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.77% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.03% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.37% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 1.02% : 0.000001s : 8: predicate.same_eliminate 0.68% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.35% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 42.77% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.23% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026293 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.38% : 0.002991s : 1: add_attr 11.34% : 0.002983s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.77% : 0.000465s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000458s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000777s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.25% : 0.001907s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.87% : 0.000492s : 1: opt_after_jit_grad 0.71% : 0.000187s : 1: opt_b 14.15% : 0.003721s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.82% : 0.000215s : 1: renormalize.infer 0.60% : 0.000157s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.51% : 0.005919s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.72% : 0.004396s : 1: type_inference 0.22% : 0.000058s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x8-kbk],max_mem:52.0M TotalTime = 0.126801, [24] [bootstrap]: 0.00051434 [type_inference]: 0.0060464 [event_method]: 1.389e-05 [auto_monad]: 5.663e-05 [graph_reusing]: 5.51e-06 [inline]: 1.77999e-06 [add_attr]: 0.00341384, [1] [add_attr_with_inline]: 0.00340341, [1] [Cycle 1]: 4.828e-05, [2] [tag_attr]: 1.589e-05 [meta_addattr_fg_expand]: 4.43999e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 2.762e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.75001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00403532, [53] [py_interpret_to_execute]: 2.117e-05 [rewriter_before_opt_a]: 5.853e-05 [opt_a]: 0.0021667, [2] [Cycle 1]: 0.00155803, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.244e-05 [loop_unroll]: 2.134e-05 [a_1]: 0.00046113 [with_stream_mark]: 1.383e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.97e-06 [updatestate_assign_eliminate]: 3.76999e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.95001e-06 [a_2]: 7.614e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.03997e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.92003e-06 [auto_parallel]: 6.17001e-06 [parallel]: 2.483e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 3.95998e-06 [allreduce_fusion]: 3.45998e-06 [matmul_add_comm_reduction]: 8.49002e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 7.16999e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.41998e-06 [virtual_output]: 5.59e-06 [merge_forward]: 4.28001e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.009e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.091e-05 
[merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.43998e-06 [flash_sp_send_recv_attached]: 2.56998e-06 [receive_attached]: 2.19001e-06 [after_resolve]: 1.044e-05 [a_after_grad]: 9.20999e-06 [renormalize]: 0.00044255 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 2.41e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.778e-05 [a_3]: 4.134e-05 [Cycle 2]: 0.00059882, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.86999e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012702 [with_stream_mark]: 9.84001e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.60002e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 6.741e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.01997e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.62001e-06 [parallel]: 4.53001e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.70997e-06 [matmul_add_comm_reduction]: 4.95001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.07999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59999e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.93002e-06 [a_after_grad]: 7.73001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 1.06002e-06 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.388e-05 [a_3]: 3.288e-05 [py_interpret_to_execute_after_opt_a]: 7.63999e-06 [slice_cell_reuse_recomputed_activation]: 
1.96003e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.27999e-06 [mutable_eliminate]: 0.00045453 [opt_b]: 0.00020378, [1] [Cycle 1]: 0.00019732, [7] [b_1]: 0.00010978 [b_2]: 6.90002e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.19997e-07 [cse]: 3.615e-05 [optimize_parallel_all_gather_comm]: 1.588e-05 [overlap_param_gather]: 1.69998e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00041423 [opt_after_cconv]: 9.49e-05, [1] [Cycle 1]: 8.903e-05, [7] [c_1]: 2.776e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.18002e-06 [cse]: 1.578e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.206e-05 [tuple_transform]: 6.979e-05, [1] [Cycle 1]: 6.548e-05, [4] [d_1]: 3.975e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.44001e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.853e-05 [cse_after_recomputation]: 2.126e-05, [1] [Cycle 1]: 1.684e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.42e-06 [swap_dp_allreduce_reducescatter]: 5.29998e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.14997e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.13998e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.50997e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 
1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.75001e-06 [offloading_packed_experts]: 3.41999e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.801e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.955e-05, [1] [Cycle 1]: 6.539e-05, [6] [build]: 2.57001e-06 [elim_shapecalc]: 8.23001e-06 [elim_not_effective]: 1.202e-05 [opt_reshape]: 6.33e-06 [fold_const_symbol]: 9.09998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.536e-05 [get_jit_bprop_graph]: 1.30001e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00044712 [validate]: 3.158e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.111946 [execute]: 9.15999e-06 Sums bootstrap : 0.000514s : 0.42% type_inference : 0.006046s : 4.94% event_method : 0.000014s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000588s : 0.48% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000443s : 0.36% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.03% optimize.opt_a.a_3 : 0.000074s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000455s : 0.37% optimize.opt_b.b_1 : 0.000110s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000036s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000414s : 0.34% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.37% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.111946s : 91.46% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000167 30 14.46% : 0.000024s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.20% : 0.000005s : 4: substitution.graph_param_transform 67.09% : 0.000112s : 3: substitution.inline 1.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006000 2 90.95% : 0.005457s : 1: type_inference.infer 9.05% : 0.000543s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.86% : 0.000027s : 3: replace.inline 30.14% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.71% : 0.000110s : 3: match.inline 8.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.78% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.35% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.21% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000010s : 51: predicate.inline 0.96% : 0.000002s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.32% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 47.06% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.94% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.135798 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.52% : 0.003418s : 1: add_attr 2.51% : 0.003407s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000062s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000552s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000463s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.70% : 0.000956s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000091s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.60% : 0.002170s : 1: opt_a 0.07% : 0.000098s : 1: opt_after_cconv 0.34% : 0.000456s : 1: opt_after_jit_grad 0.15% : 0.000207s : 1: opt_b 2.97% : 0.004039s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000234s : 1: renormalize.infer 0.15% : 0.000201s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000072s : 1: symbol_engine_optimizer 82.45% : 0.111967s : 1: task_emit 0.05% : 0.000073s : 1: 
tuple_transform 4.46% : 0.006061s : 1: type_inference 0.04% : 0.000058s : 1: validate TotalTime = 0.115111, [24] [bootstrap]: 0.00050955 [type_inference]: 0.00446521 [event_method]: 1.073e-05 [auto_monad]: 5.079e-05 [graph_reusing]: 4.62998e-06 [inline]: 2.09e-06 [add_attr]: 0.00300003, [1] [add_attr_with_inline]: 0.00299171, [1] [Cycle 1]: 4.711e-05, [2] [tag_attr]: 1.239e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 3.31999e-06 [pre_auto_parallel]: 2.15e-05 [insert-virtual-dataset]: 2.98e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.11998e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00374801, [53] [py_interpret_to_execute]: 1.507e-05 [rewriter_before_opt_a]: 3.932e-05 [opt_a]: 0.00193504, [2] [Cycle 1]: 0.00132552, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.398e-05 [loop_unroll]: 1.367e-05 [a_1]: 0.00031559 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.83999e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 3.35003e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 2.10002e-06 [a_2]: 7.761e-05 [accelerated_algorithm]: 6.91001e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 5.82001e-06 [parallel]: 2.201e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.41998e-06 [merge_forward]: 4.06001e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 8.95001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.58002e-06 [before_grad]: 9.31e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27997e-06 [meta_fg_expand]: 2.09999e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.43e-06 
[after_resolve]: 1.085e-05 [a_after_grad]: 8.48999e-06 [renormalize]: 0.00038231 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 2.27001e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.716e-05 [a_3]: 4.244e-05 [Cycle 2]: 0.00059959, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 7.18e-06 [loop_unroll]: 5.80002e-06 [a_1]: 0.0001278 [with_stream_mark]: 9.38002e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 6.77e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.30999e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.30001e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.12999e-06 [parallel]: 4.52998e-06 [flash_sp]: 3.80998e-06 [merge_comm]: 2.95998e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.10013e-07 [virtual_shard_identity]: 6.15002e-06 [virtual_dataset]: 5.19e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 2.99999e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.34998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 8.95999e-06 [a_after_grad]: 8.37e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.48e-06 [cse]: 1.24e-05 [a_3]: 3.292e-05 [py_interpret_to_execute_after_opt_a]: 8e-06 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 3.171e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.28002e-06 [mutable_eliminate]: 0.00045106 [opt_b]: 0.00018081, [1] [Cycle 1]: 0.00017462, [7] [b_1]: 
0.00010645 [b_2]: 7.02002e-06 [updatestate_depend_eliminate]: 5.36002e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.90021e-07 [cse]: 1.669e-05 [optimize_parallel_all_gather_comm]: 1.55e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.359e-05 [loop_unroll]: 0.00041411 [opt_after_cconv]: 9.476e-05, [1] [Cycle 1]: 8.917e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.642e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.23e-05 [tuple_transform]: 6.954e-05, [1] [Cycle 1]: 6.493e-05, [4] [d_1]: 3.942e-05 [none_parameter_eliminate]: 1.43002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.439e-05 [cse_after_recomputation]: 2.062e-05, [1] [Cycle 1]: 1.643e-05, [1] [cse]: 1.091e-05 [environ_conv]: 4.71002e-06 [swap_dp_allreduce_reducescatter]: 5.44e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.33002e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.04998e-06 [remove_cast_before_assign_add]: 7.30011e-07 [full_micro_interleaved_order_control]: 2.46e-06 [reorder_send_recv_between_fp_bp]: 2.38998e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.14998e-06 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.67e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.678e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.24001e-06 [handle_group_info]: 1.26002e-06 [symbol_engine_optimizer]: 6.732e-05, [1] [Cycle 1]: 6.295e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 8.36002e-06 [elim_not_effective]: 1.122e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 8.37e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.75001e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.63e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00047508 [validate]: 3.211e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.102533 [execute]: 9.29998e-06 Sums bootstrap : 0.000510s : 0.46% type_inference : 0.004465s : 4.02% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000443s : 0.40% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000382s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000075s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.41% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000414s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000475s : 0.43% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.102533s : 92.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.63% : 0.000023s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.49% : 0.000005s : 4: substitution.graph_param_transform 65.38% : 0.000080s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.52% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004423 2 91.92% : 0.004066s : 1: type_inference.infer 8.08% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.81% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.17% : 0.000002s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.31% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.95% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.38% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.98% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.83% : 
0.000001s : 9: predicate.tile_eliminate 0.75% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.78% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000263 6 42.40% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.60% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.123194 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.44% : 0.003004s : 1: add_attr 2.43% : 0.002995s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000547s : 1: bootstrap 0.02% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.65% : 0.000798s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.57% : 0.001938s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.39% : 0.000485s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.05% : 0.003752s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000211s : 1: renormalize.infer 0.13% : 0.000164s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 83.25% : 0.102556s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.64% : 0.004481s : 1: type_inference 0.04% : 0.000055s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x8-ge],max_mem:52.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x9-pynative],max_mem:52.0M TotalTime = 0.0221729, [24] [bootstrap]: 0.00056991 [type_inference]: 0.00644083 [event_method]: 1.399e-05 [auto_monad]: 5.8e-05 [graph_reusing]: 5.33002e-06 [inline]: 2.09999e-06 [add_attr]: 0.00364606, [1] [add_attr_with_inline]: 0.00363348, [1] [Cycle 1]: 4.649e-05, [2] [tag_attr]: 1.472e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.994e-05 [insert-virtual-dataset]: 2.87002e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00414793, [53] [py_interpret_to_execute]: 2.187e-05 [rewriter_before_opt_a]: 5.965e-05 [opt_a]: 0.00226942, [2] [Cycle 1]: 0.00165331, [45] [expand_dump_flag]: 2.41e-06 [switch_simplify]: 3.337e-05 [loop_unroll]: 2.119e-05 [a_1]: 0.00046828 [with_stream_mark]: 1.523e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 3.94002e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.16001e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.734e-05 [accelerated_algorithm]: 6.87002e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 8.08001e-06 [auto_parallel]: 6.05002e-06 [parallel]: 2.441e-05 [flash_sp]: 7.55e-06 [merge_comm]: 3.52002e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.27001e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.2e-06 [virtual_dataset]: 5.82999e-06 
[get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.60001e-06 [before_grad]: 9.22999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.20002e-06 [after_resolve]: 1.05e-05 [a_after_grad]: 8.68001e-06 [renormalize]: 0.00051444 [add_forward_monad_depend]: 4.64002e-06 [auto_monad_grad]: 2.09e-06 [auto_monad_eliminator]: 1.405e-05 [cse]: 2.909e-05 [a_3]: 4.143e-05 [Cycle 2]: 0.0006064, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012759 [with_stream_mark]: 9.72001e-06 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 2.90998e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.994e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.32999e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 6.16e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 6.71e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.36998e-06 [merge_forward]: 3.08e-06 [cell_reuse_recompute_pass]: 1.54998e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 9.49978e-07 [after_resolve]: 9.04e-06 [a_after_grad]: 8.46002e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.34998e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.73e-06 [cse]: 1.338e-05 [a_3]: 3.279e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.229e-05 [convert_after_rewriter]: 6.90998e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.00046967 [opt_b]: 0.00018543, [1] [Cycle 1]: 0.00017934, [7] [b_1]: 0.00011029 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 4.39992e-07 [cse]: 1.72e-05 [optimize_parallel_all_gather_comm]: 1.524e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.192e-05 [loop_unroll]: 0.00041258 [opt_after_cconv]: 9.573e-05, [1] [Cycle 1]: 8.985e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.43e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.647e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.271e-05 [tuple_transform]: 7.13e-05, [1] [Cycle 1]: 6.692e-05, [4] [d_1]: 4.091e-05 [none_parameter_eliminate]: 1.94e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.50999e-06 [add_recomputation]: 4.869e-05 [cse_after_recomputation]: 2.085e-05, [1] [Cycle 1]: 1.637e-05, [1] [cse]: 1.11e-05 [environ_conv]: 5.26002e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.78001e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.25999e-06 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.83003e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 
1.00001e-06 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.156e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.40999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.33998e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 1.79e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.989e-05, [1] [Cycle 1]: 6.574e-05, [6] [build]: 2.57001e-06 [elim_shapecalc]: 8.85999e-06 [elim_not_effective]: 1.165e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.555e-05 [get_jit_bprop_graph]: 1.35001e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00049811 [validate]: 3.283e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.0064667 [execute]: 7.3e-06 Sums bootstrap : 0.000570s : 3.25% type_inference : 0.006441s : 36.77% event_method : 0.000014s : 0.08% auto_monad : 0.000058s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000030s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000596s : 3.40% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000515s : 2.94% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000042s : 0.24% optimize.opt_a.a_3 : 0.000074s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000470s : 2.68% optimize.opt_b.b_1 : 0.000110s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000413s : 2.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000498s : 2.84% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006467s : 36.91% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000175 30 14.92% : 0.000026s : 5: substitution.arithmetic_simplify 0.99% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000006s : 4: substitution.graph_param_transform 67.54% : 0.000118s : 3: substitution.inline 1.51% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.10% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006395 2 91.01% : 0.005820s : 1: type_inference.infer 8.99% : 0.000575s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.73% : 0.000028s : 3: replace.inline 29.27% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000126 5 92.33% : 0.000116s : 3: match.inline 7.67% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 1.04% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 1.01% : 0.000002s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.96% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000010s : 51: predicate.inline 0.75% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.71% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.47% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.63% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.08% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000384 8 46.39% : 0.000178s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.61% : 0.000206s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031604 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.55% : 0.003650s : 1: add_attr 11.51% : 0.003637s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000613s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.33% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000479s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000970s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.19% : 0.002272s : 1: opt_a 0.31% : 0.000099s : 1: opt_after_cconv 1.61% : 0.000508s : 1: opt_after_jit_grad 0.60% : 0.000189s : 1: opt_b 13.14% : 0.004152s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.90% : 0.000284s : 1: renormalize.infer 0.71% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000036s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000073s : 1: symbol_engine_optimizer 20.50% : 0.006478s : 1: task_emit 0.23% : 0.000074s : 1: 
tuple_transform 20.42% : 0.006455s : 1: type_inference 0.22% : 0.000069s : 1: validate TotalTime = 0.0182731, [24] [bootstrap]: 0.00041835 [type_inference]: 0.0043528 [event_method]: 1.084e-05 [auto_monad]: 5.027e-05 [graph_reusing]: 5.48997e-06 [inline]: 1.91998e-06 [add_attr]: 0.00296384, [1] [add_attr_with_inline]: 0.00295537, [1] [Cycle 1]: 4.654e-05, [2] [tag_attr]: 1.183e-05 [meta_addattr_fg_expand]: 3.5e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.217e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00379839, [53] [py_interpret_to_execute]: 1.586e-05 [rewriter_before_opt_a]: 3.933e-05 [opt_a]: 0.00195661, [2] [Cycle 1]: 0.00133763, [45] [expand_dump_flag]: 2.90002e-06 [switch_simplify]: 2.396e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00029407 [with_stream_mark]: 1.361e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.61999e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.87999e-06 [a_2]: 7.756e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.56998e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 7.46001e-06 [auto_parallel]: 6.07001e-06 [parallel]: 1.748e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 8.55999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.94999e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 4.09002e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.013e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.25002e-06 [receive_attached]: 
2.53003e-06 [after_resolve]: 1.125e-05 [a_after_grad]: 8.45001e-06 [renormalize]: 0.00039136 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.99e-06 [auto_monad_eliminator]: 1.46e-05 [cse]: 2.705e-05 [a_3]: 4.207e-05 [Cycle 2]: 0.00060882, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 7.12997e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.0001283 [with_stream_mark]: 1.228e-05 [recompute_prepare]: 6.16e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 6.855e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.27999e-06 [shard_inline]: 5.83997e-06 [merge_send_recv]: 4.68999e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.82e-06 [flash_sp]: 3.35003e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 6.29001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.53998e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.09998e-06 [virtual_output]: 5.13002e-06 [merge_forward]: 2.99001e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 6.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.91998e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 1.20999e-06 [receive_attached]: 1.15001e-06 [after_resolve]: 9.92999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.261e-05 [a_3]: 3.215e-05 [py_interpret_to_execute_after_opt_a]: 8.27e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.219e-05 [convert_after_rewriter]: 6.76999e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00047256 [opt_b]: 0.00018384, [1] [Cycle 1]: 
0.00017778, [7] [b_1]: 0.00010926 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 2.40019e-07 [cse]: 1.658e-05 [optimize_parallel_all_gather_comm]: 1.614e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.413e-05 [loop_unroll]: 0.00041427 [opt_after_cconv]: 9.529e-05, [1] [Cycle 1]: 8.95e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.586e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.22e-05 [tuple_transform]: 6.922e-05, [1] [Cycle 1]: 6.439e-05, [4] [d_1]: 3.903e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.622e-05 [cse_after_recomputation]: 2.084e-05, [1] [Cycle 1]: 1.642e-05, [1] [cse]: 1.143e-05 [environ_conv]: 4.59998e-06 [swap_dp_allreduce_reducescatter]: 5.36998e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.70997e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.68003e-06 [assign_add_opt]: 1.21997e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.135e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.95e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.75e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.55001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.809e-05, [1] [Cycle 1]: 6.405e-05, [6] [build]: 2.68e-06 [elim_shapecalc]: 8.13001e-06 [elim_not_effective]: 1.138e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.93002e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.97001e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.56e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00044274 [validate]: 3.258e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00590781 [execute]: 7.55e-06 Sums bootstrap : 0.000418s : 2.93% type_inference : 0.004353s : 30.45% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000422s : 2.96% optimize.opt_a.with_stream_mark : 0.000026s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.15% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000391s : 2.74% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.15% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000473s : 3.31% optimize.opt_b.b_1 : 0.000109s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.17% optimize.loop_unroll : 0.000414s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.32% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000443s : 3.10% validate : 0.000033s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.005908s : 41.33% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000123 26 19.72% : 0.000024s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000005s : 4: substitution.graph_param_transform 63.22% : 0.000078s : 2: substitution.inline 2.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000005s : 4: substitution.remove_not_recompute_node 4.15% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004311 2 91.30% : 0.003936s : 1: type_inference.infer 8.70% : 0.000375s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000140 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.68% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.61% : 0.000004s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.99% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.65% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.00% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.75% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.98% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.29% : 0.000009s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.52% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.14% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.79% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.29% : 0.000002s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.72% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 40.90% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.10% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026367 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.002968s : 1: add_attr 11.22% : 0.002959s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000050s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.82% : 0.000479s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.83% : 0.000482s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.96% : 0.000781s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.43% : 0.001959s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.72% : 0.000452s : 1: opt_after_jit_grad 0.71% : 0.000187s : 1: opt_b 14.42% : 0.003802s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.05% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.83% : 0.000218s : 1: renormalize.infer 0.63% : 0.000166s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.45% : 0.005919s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.56% : 0.004367s : 1: type_inference 0.23% : 0.000060s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x9-kbk],max_mem:52.0M TotalTime = 0.119658, [24] [bootstrap]: 0.00051187 [type_inference]: 0.00608573 [event_method]: 1.348e-05 [auto_monad]: 5.892e-05 [graph_reusing]: 5.54e-06 [inline]: 1.94999e-06 [add_attr]: 0.00338928, [1] [add_attr_with_inline]: 0.00337751, [1] [Cycle 1]: 4.765e-05, [2] [tag_attr]: 1.576e-05 [meta_addattr_fg_expand]: 4.36002e-06 [parallel-infer-symbol]: 2.72001e-06 [pre_auto_parallel]: 2.813e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00400518, [53] [py_interpret_to_execute]: 1.991e-05 [rewriter_before_opt_a]: 5.801e-05 [opt_a]: 0.00214652, [2] [Cycle 1]: 0.00154575, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 3.246e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00045765 [with_stream_mark]: 1.356e-05 [recompute_prepare]: 8.36002e-06 [updatestate_depend_eliminate]: 3.93999e-06 [updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.634e-05 [accelerated_algorithm]: 6.40002e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.32e-06 [auto_parallel]: 5.84999e-06 [parallel]: 2.442e-05 [flash_sp]: 6.83998e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 8.50999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.39999e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.54e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 
[merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.82001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.78998e-06 [receive_attached]: 2.74999e-06 [after_resolve]: 1.043e-05 [a_after_grad]: 8.70999e-06 [renormalize]: 0.000439 [add_forward_monad_depend]: 5.04e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.363e-05 [cse]: 2.619e-05 [a_3]: 3.968e-05 [Cycle 2]: 0.00059113, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012595 [with_stream_mark]: 9.49e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.685e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 4.95999e-06 [parallel]: 4.24997e-06 [flash_sp]: 2.81999e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.81998e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.06002e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.21998e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.74e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.42001e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.20002e-06 [cse]: 1.612e-05 [a_3]: 3.292e-05 [py_interpret_to_execute_after_opt_a]: 7.8e-06 
[slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.246e-05 [convert_after_rewriter]: 7.01001e-06 [order_py_execute_after_rewriter]: 4.90999e-06 [mutable_eliminate]: 0.00044449 [opt_b]: 0.0001819, [1] [Cycle 1]: 0.00017562, [7] [b_1]: 0.00010769 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 3.09985e-07 [cse]: 1.595e-05 [optimize_parallel_all_gather_comm]: 1.626e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.269e-05 [loop_unroll]: 0.00043498 [opt_after_cconv]: 9.351e-05, [1] [Cycle 1]: 8.766e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.549e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.301e-05 [tuple_transform]: 6.962e-05, [1] [Cycle 1]: 6.517e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.67001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.807e-05 [cse_after_recomputation]: 2.046e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.071e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.82002e-06 [label_micro_interleaved_index]: 3.78001e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.38002e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.47001e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.753e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.84e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.908e-05, [1] [Cycle 1]: 6.502e-05, [6] [build]: 2.81e-06 [elim_shapecalc]: 8.50999e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.98002e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.549e-05 [get_jit_bprop_graph]: 1.20001e-06 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.0004426 [validate]: 3.13e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.104824 [execute]: 9.07001e-06 Sums bootstrap : 0.000512s : 0.44% type_inference : 0.006086s : 5.28% event_method : 0.000013s : 0.01% auto_monad : 0.000059s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 
0.000584s : 0.51% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000439s : 0.38% 
optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000444s : 0.39% optimize.opt_b.b_1 : 0.000108s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000435s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse 
: 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000443s : 0.38% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.104824s : 90.92% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000167 30 14.56% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 67.04% : 0.000112s : 3: substitution.inline 1.61% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 2.56% : 0.000004s : 4: substitution.replace_old_param 6.55% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006040 2 90.82% : 0.005485s : 1: type_inference.infer 9.18% : 0.000555s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.12% : 0.000028s : 3: replace.inline 30.88% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.74% : 0.000110s : 3: match.inline 8.26% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 1.08% : 0.000002s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.71% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.99% : 0.000002s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.24% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.31% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 45.65% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.35% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.128586 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.64% : 0.003394s : 1: add_attr 2.63% : 0.003381s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000064s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000548s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.74% : 0.000949s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.67% : 0.002150s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.35% : 0.000452s : 1: opt_after_jit_grad 0.14% : 0.000185s : 1: opt_b 3.12% : 0.004009s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.18% : 0.000229s : 1: renormalize.infer 0.16% : 0.000203s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 81.54% : 0.104847s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.74% : 0.006100s : 1: type_inference 0.04% : 0.000057s : 1: validate TotalTime = 0.108527, [24] [bootstrap]: 0.00043908 [type_inference]: 0.00445416 [event_method]: 1.099e-05 [auto_monad]: 5.317e-05 [graph_reusing]: 5.29998e-06 [inline]: 2.01e-06 [add_attr]: 0.00293611, [1] [add_attr_with_inline]: 0.00292834, [1] [Cycle 1]: 4.271e-05, [2] [tag_attr]: 1.207e-05 [meta_addattr_fg_expand]: 3.39001e-06 [parallel-infer-symbol]: 3.81999e-06 [pre_auto_parallel]: 2.256e-05 [insert-virtual-dataset]: 2.28998e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00378373, [53] [py_interpret_to_execute]: 1.564e-05 [rewriter_before_opt_a]: 4.054e-05 [opt_a]: 0.00197858, [2] [Cycle 1]: 0.001369, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 2.396e-05 [loop_unroll]: 1.323e-05 [a_1]: 0.00029289 [with_stream_mark]: 1.335e-05 [recompute_prepare]: 7.19001e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.55e-06 [updatestate_loads_eliminate]: 2.85002e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.656e-05 [accelerated_algorithm]: 6.79001e-06 [shard]: 2.34999e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 6.10002e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 5.59e-06 [parallel]: 1.808e-05 [flash_sp]: 6.79999e-06 [merge_comm]: 3.65998e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 6.99001e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.79e-06 [merge_forward]: 3.53999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 9.22999e-06 [set_forward_comm_id_for_comm_node_pass]: 4.2e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.73998e-06 
[after_resolve]: 1.196e-05 [a_after_grad]: 9.34e-06 [renormalize]: 0.00042871 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 2.01003e-06 [auto_monad_eliminator]: 1.281e-05 [cse]: 2.887e-05 [a_3]: 3.997e-05 [Cycle 2]: 0.00060042, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.96999e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00012753 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.23002e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.77e-05 [accelerated_algorithm]: 5.34e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.73001e-06 [auto_parallel]: 5.25001e-06 [parallel]: 4.21001e-06 [flash_sp]: 3.08e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.02999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.59001e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.01002e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42999e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.3e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.19e-06 [after_resolve]: 9.96e-06 [a_after_grad]: 8.61002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.285e-05 [a_3]: 3.203e-05 [py_interpret_to_execute_after_opt_a]: 7.64002e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.07e-05 [convert_after_rewriter]: 7.28e-06 [order_py_execute_after_rewriter]: 5.05001e-06 [mutable_eliminate]: 0.00044556 [opt_b]: 0.00018248, [1] [Cycle 1]: 0.00017622, [7] [b_1]: 0.00010821 
[b_2]: 7.58999e-06 [updatestate_depend_eliminate]: 5.20999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 4.00003e-07 [cse]: 1.581e-05 [optimize_parallel_all_gather_comm]: 1.642e-05 [overlap_param_gather]: 1.67999e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00041074 [opt_after_cconv]: 9.669e-05, [1] [Cycle 1]: 9.099e-05, [7] [c_1]: 2.786e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 5.28002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.31998e-06 [cse]: 1.674e-05 [renormalize]: 7.00005e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 6.956e-05, [1] [Cycle 1]: 6.54e-05, [4] [d_1]: 3.957e-05 [none_parameter_eliminate]: 1.44998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.39001e-06 [partial_unused_args_eliminate]: 1.94e-06 [add_recomputation]: 4.312e-05 [cse_after_recomputation]: 2.029e-05, [1] [Cycle 1]: 1.604e-05, [1] [cse]: 1.086e-05 [environ_conv]: 4.32e-06 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.67001e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.06997e-06 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 1.63002e-06 [control_data_broadcast_order]: 1.168e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.71e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.736e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.61002e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.92e-05, [1] [Cycle 1]: 6.495e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.54e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.498e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00044844 [validate]: 3.291e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0960844 [execute]: 9.31998e-06 Sums bootstrap : 0.000439s : 0.42% type_inference : 0.004454s : 4.26% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000023s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000041s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000420s : 0.40% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000022s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000429s : 0.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.43% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000411s : 0.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.43% validate : 0.000033s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096084s : 91.87% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.19% : 0.000023s : 4: substitution.arithmetic_simplify 1.62% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.54% : 0.000006s : 4: substitution.graph_param_transform 65.06% : 0.000081s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.63% : 0.000005s : 4: substitution.remove_not_recompute_node 3.76% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004410 2 91.04% : 0.004015s : 1: type_inference.infer 8.96% : 0.000395s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000139 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.51% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.08% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.73% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 1.14% : 0.000002s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 1.01% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.68% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.08% : 0.000002s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000270 6 40.45% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.55% : 0.000161s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.116606 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.52% : 0.002940s : 1: add_attr 2.51% : 0.002932s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000474s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000419s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.66% : 0.000774s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.70% : 0.001982s : 1: opt_a 0.09% : 0.000100s : 1: opt_after_cconv 0.39% : 0.000458s : 1: opt_after_jit_grad 0.16% : 0.000186s : 1: opt_b 3.25% : 0.003787s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.22% : 0.000258s : 1: renormalize.infer 0.14% : 0.000165s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000045s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 82.42% : 0.096107s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 3.83% : 0.004470s : 1: type_inference 0.05% : 0.000055s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y5-dtype_x9-ge],max_mem:54.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x0-pynative],max_mem:54.0M TotalTime = 0.0216194, [24] [bootstrap]: 0.00054939 [type_inference]: 0.00636079 [event_method]: 1.491e-05 [auto_monad]: 5.621e-05 [graph_reusing]: 5.66e-06 [inline]: 1.80001e-06 [add_attr]: 0.00351274, [1] [add_attr_with_inline]: 0.0035006, [1] [Cycle 1]: 4.565e-05, [2] [tag_attr]: 1.573e-05 [meta_addattr_fg_expand]: 3.7e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.794e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.74998e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00400972, [53] [py_interpret_to_execute]: 2.095e-05 [rewriter_before_opt_a]: 5.944e-05 [opt_a]: 0.00215482, [2] [Cycle 1]: 0.00154515, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 3.116e-05 [loop_unroll]: 2.085e-05 [a_1]: 0.00045289 [with_stream_mark]: 1.373e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.56002e-06 [a_2]: 7.541e-05 [accelerated_algorithm]: 6.63998e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 6.23e-06 [parallel]: 2.534e-05 [flash_sp]: 7.53e-06 [merge_comm]: 4.03001e-06 [allreduce_fusion]: 3.19001e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 5.96998e-06 
[get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.81998e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 8.95001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.125e-05 [merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.63998e-06 [receive_attached]: 2.79001e-06 [after_resolve]: 1.022e-05 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00044156 [add_forward_monad_depend]: 5.04e-06 [auto_monad_grad]: 1.48002e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.782e-05 [a_3]: 4.115e-05 [Cycle 2]: 0.00060021, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 6.80998e-06 [loop_unroll]: 5.48997e-06 [a_1]: 0.00012456 [with_stream_mark]: 9.79999e-06 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.27001e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 6.742e-05 [accelerated_algorithm]: 5.40001e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.67998e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.43999e-06 [flash_sp]: 2.94999e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.87002e-06 [matmul_add_comm_reduction]: 6.95998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 5.14998e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 6.33002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.92001e-06 [a_after_grad]: 8.49002e-06 [renormalize]: 
1.10012e-07 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 6.47001e-06 [cse]: 1.368e-05 [a_3]: 3.23e-05 [py_interpret_to_execute_after_opt_a]: 7.90998e-06 [slice_cell_reuse_recomputed_activation]: 1.88002e-06 [rewriter_after_opt_a]: 2.958e-05 [convert_after_rewriter]: 6.67002e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00044365 [opt_b]: 0.00018218, [1] [Cycle 1]: 0.00017611, [7] [b_1]: 0.0001079 [b_2]: 7.22002e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 6.80011e-07 [cse]: 1.604e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.156e-05 [loop_unroll]: 0.00040983 [opt_after_cconv]: 9.495e-05, [1] [Cycle 1]: 8.926e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.13002e-06 [cse]: 1.649e-05 [renormalize]: 5.90022e-07 [remove_dup_value]: 1.354e-05 [tuple_transform]: 6.773e-05, [1] [Cycle 1]: 6.353e-05, [4] [d_1]: 3.794e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.46e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.761e-05 [cse_after_recomputation]: 2.01e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.26002e-06 [bias_add_comm_swap]: 2.60002e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.539e-05 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.78e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.29998e-06 
[add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.43002e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86003e-06 [control_data_broadcast_order]: 1.184e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.44001e-06 [overlap_recompute_and_grad_model_parallel]: 4.26001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.87998e-06 [overlap_grad_flash_sp]: 1.709e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 7.109e-05, [1] [Cycle 1]: 6.649e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.59002e-06 [elim_not_effective]: 1.228e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 9.27999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 0.00010162 [opt_after_jit_grad]: 0.00045304 [validate]: 3.172e-05 [backend_pass]: 1.37e-06 [task_emit]: 0.00624695 [execute]: 7.1e-06 Sums bootstrap : 0.000549s : 3.21% type_inference : 0.006361s : 37.14% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 
0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000577s : 3.37% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000030s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000442s : 2.58% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.59% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000410s : 2.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000025s : 0.15% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000102s : 0.59% opt_after_jit_grad : 0.000453s : 2.65% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006247s : 36.48% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.41% : 0.000024s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.10% : 0.000005s : 4: substitution.graph_param_transform 67.09% : 0.000110s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.85% : 0.000005s : 4: substitution.remove_not_recompute_node 2.54% : 0.000004s : 4: substitution.replace_old_param 6.42% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006313 2 90.22% : 0.005695s : 1: type_inference.infer 9.78% : 0.000618s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.08% : 0.000028s : 3: replace.inline 28.92% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.92% : 0.000108s : 3: match.inline 8.08% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.37% : 0.000004s : 19: predicate.arithmetic_simplify 0.94% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.41% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.83% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.97% : 0.000002s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000368 8 48.64% : 0.000179s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.36% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030679 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.46% : 0.003517s : 1: add_attr 11.42% : 0.003504s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000588s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.09% : 0.000029s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.48% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000943s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.03% : 0.002158s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.51% : 0.000463s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.08% : 0.004014s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.73% : 0.000224s : 1: renormalize.infer 0.69% : 0.000211s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.35% : 0.000107s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000074s : 1: symbol_engine_optimizer 20.39% : 0.006257s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.78% : 0.006376s : 1: type_inference 0.21% : 0.000065s : 1: validate TotalTime = 0.0180773, [24] [bootstrap]: 0.00042425 [type_inference]: 0.00433066 [event_method]: 9.97001e-06 [auto_monad]: 4.883e-05 [graph_reusing]: 5.22e-06 [inline]: 1.77999e-06 [add_attr]: 0.00295684, [1] [add_attr_with_inline]: 0.00294868, [1] [Cycle 1]: 4.256e-05, [2] [tag_attr]: 1.185e-05 [meta_addattr_fg_expand]: 3.06001e-06 [parallel-infer-symbol]: 2.61999e-06 [pre_auto_parallel]: 2.156e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00368902, [53] [py_interpret_to_execute]: 1.511e-05 [rewriter_before_opt_a]: 3.838e-05 [opt_a]: 0.00184854, [2] [Cycle 1]: 0.0012455, [45] [expand_dump_flag]: 2.51998e-06 [switch_simplify]: 2.362e-05 [loop_unroll]: 1.339e-05 [a_1]: 0.00028821 [with_stream_mark]: 1.359e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.90002e-06 [parameter_eliminate]: 1.50999e-06 [a_2]: 7.637e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 7.61999e-06 [auto_parallel]: 6.50002e-06 [parallel]: 1.689e-05 [flash_sp]: 7.5e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 9.30013e-07 [virtual_shard_identity]: 7.08998e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 1.98002e-06 [before_grad]: 9.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 
2.24999e-06 [after_resolve]: 1.02e-05 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00034348 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.689e-05 [a_3]: 4.067e-05 [Cycle 2]: 0.0005938, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7.23e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012524 [with_stream_mark]: 1.053e-05 [recompute_prepare]: 5.68997e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 7.59988e-07 [a_2]: 6.802e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.25001e-06 [parallel]: 3.89997e-06 [flash_sp]: 2.93e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.80002e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.30002e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.58003e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 8.03001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 1.07e-06 [receive_attached]: 1.04e-06 [after_resolve]: 8.70999e-06 [a_after_grad]: 8.17e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.29e-05 [a_3]: 3.157e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.156e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00048778 [opt_b]: 0.00018174, [1] [Cycle 1]: 0.00017546, [7] 
[b_1]: 0.00010834 [b_2]: 6.97002e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 3.19997e-07 [cse]: 1.593e-05 [optimize_parallel_all_gather_comm]: 1.695e-05 [overlap_param_gather]: 2.26998e-06 [cconv]: 2.224e-05 [loop_unroll]: 0.00041219 [opt_after_cconv]: 9.415e-05, [1] [Cycle 1]: 8.836e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.13998e-06 [cse]: 1.576e-05 [renormalize]: 1.79978e-07 [remove_dup_value]: 1.245e-05 [tuple_transform]: 6.793e-05, [1] [Cycle 1]: 6.349e-05, [4] [d_1]: 3.829e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.262e-05 [cse_after_recomputation]: 2.043e-05, [1] [Cycle 1]: 1.614e-05, [1] [cse]: 1.12e-05 [environ_conv]: 4.20999e-06 [swap_dp_allreduce_reducescatter]: 4.99998e-06 [bias_add_comm_swap]: 2.43002e-06 [label_micro_interleaved_index]: 4.79e-06 [label_fine_grained_interleaved_index]: 2.83003e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.38998e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.37999e-06 [full_micro_interleaved_order_control]: 2.08998e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.04998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.78002e-06 [control_data_broadcast_order]: 1.137e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.91999e-06 [overlap_recompute_and_grad_model_parallel]: 4.40999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.71e-06 [overlap_recompute_comm]: 2.11003e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.664e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.838e-05, [1] [Cycle 1]: 6.414e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 7.9e-06 [elim_not_effective]: 1.207e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.50001e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.632e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00044759 [validate]: 3.048e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00587743 [execute]: 6.52001e-06 Sums bootstrap : 0.000424s : 2.99% type_inference : 0.004331s : 30.57% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000413s : 2.92% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000344s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000488s : 3.44% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000412s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.09% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.12% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000448s : 3.16% validate : 0.000030s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005877s : 41.49% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.64% : 0.000022s : 4: substitution.arithmetic_simplify 1.55% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 65.71% : 0.000078s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.33% : 0.000004s : 4: substitution.remove_not_recompute_node 3.03% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004289 2 91.99% : 0.003945s : 1: type_inference.infer 8.01% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.00% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.51% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.32% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.79% : 0.000001s : 8: predicate.remove_not_recompute_node 1.26% : 0.000002s : 17: predicate.replace_applicator 0.90% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.37% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.25% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 42.34% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.66% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025988 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.39% : 0.002961s : 1: add_attr 11.36% : 0.002952s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000459s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.62% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.91% : 0.000496s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000762s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.12% : 0.001851s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.76% : 0.000458s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.21% : 0.003693s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000189s : 1: renormalize.infer 0.57% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.66% : 0.005888s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.72% : 0.004344s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0194115, [24] [bootstrap]: 0.00042356 [type_inference]: 0.00543962 [event_method]: 1.426e-05 [auto_monad]: 5.284e-05 [graph_reusing]: 5.54e-06 [inline]: 1.88002e-06 [add_attr]: 0.0029289, [1] [add_attr_with_inline]: 0.0029213, [1] [Cycle 1]: 4.73e-05, [2] [tag_attr]: 1.567e-05 [meta_addattr_fg_expand]: 4.13001e-06 [parallel-infer-symbol]: 2.57001e-06 [pre_auto_parallel]: 2.466e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.09999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00394603, [53] [py_interpret_to_execute]: 1.972e-05 [rewriter_before_opt_a]: 5.754e-05 [opt_a]: 0.00208756, [2] [Cycle 1]: 0.00148747, [45] [expand_dump_flag]: 2.83998e-06 [switch_simplify]: 3.24e-05 [loop_unroll]: 2.119e-05 [a_1]: 0.00044198 [with_stream_mark]: 1.347e-05 [recompute_prepare]: 7.66001e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.04001e-06 [updatestate_loads_eliminate]: 3.2e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.449e-05 [accelerated_algorithm]: 6.23998e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 6.36998e-06 [merge_send_recv]: 7.77998e-06 [auto_parallel]: 6.23002e-06 [parallel]: 1.84e-05 [flash_sp]: 6.97002e-06 [merge_comm]: 3.79002e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 8.29002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.32002e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.99999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28998e-06 [meta_fg_expand]: 2.16998e-06 [flash_sp_send_recv_attached]: 2.17999e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00040658 [add_forward_monad_depend]: 5.12999e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.836e-05 [a_3]: 4.073e-05 [Cycle 2]: 0.00059085, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012376 [with_stream_mark]: 9.42001e-06 [recompute_prepare]: 5.47001e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.768e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.27e-06 [meta_shard_fg_expand]: 1.25001e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.03002e-06 [parallel]: 3.98001e-06 [flash_sp]: 2.80002e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.65997e-06 [matmul_add_comm_reduction]: 4.70999e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.64998e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.81001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.09998e-06 [a_after_grad]: 7.94002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.363e-05 [a_3]: 3.253e-05 [py_interpret_to_execute_after_opt_a]: 7.45998e-06 [slice_cell_reuse_recomputed_activation]: 1.75001e-06 [rewriter_after_opt_a]: 3.092e-05 [convert_after_rewriter]: 6.61e-06 [order_py_execute_after_rewriter]: 4.68001e-06 [mutable_eliminate]: 0.00044474 [opt_b]: 0.00018285, [1] [Cycle 1]: 0.00017649, [7] 
[b_1]: 0.00010804 [b_2]: 7.22997e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 5.00004e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.627e-05 [overlap_param_gather]: 1.65001e-06 [cconv]: 5.345e-05 [loop_unroll]: 0.00041381 [opt_after_cconv]: 9.44e-05, [1] [Cycle 1]: 8.865e-05, [7] [c_1]: 2.732e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.568e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.309e-05 [tuple_transform]: 6.938e-05, [1] [Cycle 1]: 6.528e-05, [4] [d_1]: 3.931e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 2.16e-06 [add_recomputation]: 4.412e-05 [cse_after_recomputation]: 1.989e-05, [1] [Cycle 1]: 1.552e-05, [1] [cse]: 1.036e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 4.95999e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.09998e-06 [full_micro_interleaved_order_control]: 2.30002e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 1.33002e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91003e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.68999e-06 [overlap_recompute_and_grad_model_parallel]: 4.12e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 3.76001e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 7.49977e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.861e-05, [1] [Cycle 1]: 6.438e-05, [6] [build]: 2.41998e-06 [elim_shapecalc]: 8.31002e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 9.18002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.75001e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.608e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00044613 [validate]: 3.007e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00586555 [execute]: 6.21e-06 Sums bootstrap : 0.000424s : 2.73% type_inference : 0.005440s : 35.03% event_method : 0.000014s : 0.09% auto_monad : 0.000053s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000566s : 3.64% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000407s : 2.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.86% optimize.opt_b.b_1 : 0.000108s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000053s : 0.34% optimize.loop_unroll : 0.000414s : 2.67% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 2.87% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005866s : 37.78% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000162 30 14.91% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000002s : 2: substitution.fold_const_symbol 3.35% : 0.000005s : 4: substitution.graph_param_transform 66.18% : 0.000107s : 3: substitution.inline 1.92% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.74% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.51% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005401 2 90.16% : 0.004869s : 1: type_inference.infer 9.84% : 0.000532s : 1: type_inference.specialize ------[replace.] 0.000037 5 70.04% : 0.000026s : 3: replace.inline 29.96% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.77% : 0.000105s : 3: match.inline 8.23% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000009s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.53% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.30% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000002s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.16% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.92% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000334 8 46.83% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.17% : 0.000177s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027783 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.56% : 0.002933s : 1: add_attr 10.53% : 0.002925s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000058s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.65% : 0.000458s : 1: bootstrap 0.21% : 0.000057s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.52% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.35% : 0.000931s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.52% : 0.002091s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.64% : 0.000456s : 1: opt_after_jit_grad 0.67% : 0.000186s : 1: opt_b 14.22% : 0.003950s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.75% : 0.000208s : 1: renormalize.infer 0.69% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 21.15% : 0.005875s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.63% : 0.005453s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0370524, [24] [bootstrap]: 0.00045781 [type_inference]: 0.0112286 [event_method]: 4.617e-05 [auto_monad]: 0.0001205 [graph_reusing]: 8.03001e-06 [inline]: 1.97001e-06 [add_attr]: 0.00300833, [1] [add_attr_with_inline]: 0.00300001, [1] [Cycle 1]: 7.153e-05, [2] [tag_attr]: 3.427e-05 [meta_addattr_fg_expand]: 9.25999e-06 [parallel-infer-symbol]: 2.93e-06 [pre_auto_parallel]: 4.958e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.0133063, [53] [py_interpret_to_execute]: 3.805e-05 [rewriter_before_opt_a]: 0.00014472 [opt_a]: 0.0110391, [3] [Cycle 1]: 0.00704561, [45] [expand_dump_flag]: 4.14002e-06 [switch_simplify]: 7.295e-05 [loop_unroll]: 6.096e-05 [a_1]: 0.00144754 [with_stream_mark]: 2.288e-05 [recompute_prepare]: 2.17e-05 [updatestate_depend_eliminate]: 9.24e-06 [updatestate_assign_eliminate]: 7.77e-06 [updatestate_loads_eliminate]: 7.64002e-06 [parameter_eliminate]: 2.46e-06 [a_2]: 0.00024253 [accelerated_algorithm]: 3.003e-05 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 3.51999e-06 [shard_inline]: 1.639e-05 [merge_send_recv]: 1.572e-05 [auto_parallel]: 1.078e-05 [parallel]: 1.857e-05 [flash_sp]: 1.211e-05 [merge_comm]: 9.44e-06 [allreduce_fusion]: 8.83001e-06 [matmul_add_comm_reduction]: 2.623e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.768e-05 [virtual_dataset]: 1.557e-05 [get_grad_eliminate_]: 1.543e-05 [virtual_output]: 1.538e-05 [merge_forward]: 9.43002e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.742e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.887e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 2.711e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46998e-06 [meta_fg_expand]: 0.00141236 [flash_sp_send_recv_attached]: 3.86999e-06 [receive_attached]: 2.32001e-06 
[after_resolve]: 6.001e-05 [a_after_grad]: 8.11e-05 [renormalize]: 0.00240137 [add_forward_monad_depend]: 9.15999e-06 [auto_monad_grad]: 5.32999e-06 [auto_monad_eliminator]: 5.562e-05 [cse]: 0.00018823 [a_3]: 0.00033616 [Cycle 2]: 0.00301816, [45] [expand_dump_flag]: 1.77001e-06 [switch_simplify]: 4.668e-05 [loop_unroll]: 4.369e-05 [a_1]: 0.00152619 [with_stream_mark]: 1.29e-05 [recompute_prepare]: 1.142e-05 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 4.13999e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00012712 [accelerated_algorithm]: 1.25e-05 [shard]: 1.21997e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 7.43999e-06 [auto_parallel]: 7.3e-06 [parallel]: 5.64998e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.75001e-06 [matmul_add_comm_reduction]: 7.93999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.042e-05 [virtual_dataset]: 8.85001e-06 [get_grad_eliminate_]: 8.74e-06 [virtual_output]: 8.38999e-06 [merge_forward]: 4.58001e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 9.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.68e-05 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 1.373e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25999e-06 [meta_fg_expand]: 7.029e-05 [flash_sp_send_recv_attached]: 1.04998e-06 [receive_attached]: 1.32999e-06 [after_resolve]: 1.659e-05 [a_after_grad]: 1.42e-05 [renormalize]: 0.00060253 [add_forward_monad_depend]: 3.93001e-06 [auto_monad_grad]: 1.32e-06 [auto_monad_eliminator]: 1.481e-05 [cse]: 4.621e-05 [a_3]: 6.577e-05 [Cycle 3]: 0.00096148, [45] [expand_dump_flag]: 1.06997e-06 [switch_simplify]: 1.089e-05 [loop_unroll]: 9.09e-06 [a_1]: 0.00025111 [with_stream_mark]: 1.062e-05 [recompute_prepare]: 9.19e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 3.76001e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012306 [accelerated_algorithm]: 1.142e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.02e-06 [merge_send_recv]: 6.91001e-06 [auto_parallel]: 7e-06 [parallel]: 4.82e-06 [flash_sp]: 1.23002e-06 [merge_comm]: 4.90999e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 6.202e-05 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.061e-05 [virtual_dataset]: 8.85999e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.33999e-06 [merge_forward]: 4.33999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.629e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25001e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.475e-05 [a_after_grad]: 1.465e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.48002e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 1.06e-05 [cse]: 2.64e-05 [a_3]: 6.065e-05 [py_interpret_to_execute_after_opt_a]: 1.071e-05 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 4.579e-05 [convert_after_rewriter]: 8.90001e-06 [order_py_execute_after_rewriter]: 6.66e-06 [mutable_eliminate]: 0.0004718 [opt_b]: 0.00028789, [1] [Cycle 1]: 0.00028151, [7] [b_1]: 0.00018871 [b_2]: 1.067e-05 [updatestate_depend_eliminate]: 7.65e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 4.01001e-06 [renormalize]: 5.20027e-07 [cse]: 3.098e-05 [optimize_parallel_all_gather_comm]: 1.998e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 1.944e-05 [loop_unroll]: 0.0004239 [opt_after_cconv]: 0.00013462, [1] [Cycle 1]: 0.00012869, [7] [c_1]: 4.789e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.28001e-06 
[updatestate_loads_eliminate]: 3.9e-06 [cse]: 2.937e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.846e-05 [tuple_transform]: 0.00010096, [1] [Cycle 1]: 9.609e-05, [4] [d_1]: 6.659e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.69e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 5.672e-05 [cse_after_recomputation]: 3.112e-05, [1] [Cycle 1]: 2.633e-05, [1] [cse]: 2.067e-05 [environ_conv]: 8.29002e-06 [swap_dp_allreduce_reducescatter]: 7.8e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.45999e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 9.99979e-07 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.697e-05 [grouped_pairwise_exchange_alltoall]: 1.41998e-06 [offloading_packed_experts]: 5.29e-06 [overlap_recompute_and_grad_model_parallel]: 5.22999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.57001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.00999e-06 [overlap_grad_flash_sp]: 2.455e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.791e-05, [1] [Cycle 1]: 9.365e-05, [6] [build]: 9.29e-06 [elim_shapecalc]: 1.342e-05 [elim_not_effective]: 1.814e-05 [opt_reshape]: 9.86e-06 [fold_const_symbol]: 1.518e-05 [renormalize]: 1.80007e-07 
[detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 2.562e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.0004656 [validate]: 4.4e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00805838 [execute]: 6.68e-06 Sums bootstrap : 0.000458s : 1.40% type_inference : 0.011229s : 34.24% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.12% optimize.rewriter_before_opt_a : 0.000145s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003225s : 9.83% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000019s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000096s : 0.29% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001486s : 4.53% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.003004s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000261s : 0.80% optimize.opt_a.a_3 : 0.000463s : 1.41% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000472s : 1.44% optimize.opt_b.b_1 : 0.000189s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000424s : 1.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000466s : 1.42% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008058s : 24.58% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000760 222 5.87% : 0.000045s : 12: substitution.arithmetic_simplify 1.81% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.67% : 0.000423s : 17: substitution.inline 2.04% : 0.000016s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.07% : 0.000023s : 10: substitution.replace_applicator 1.48% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.69% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011154 2 86.96% : 0.009700s : 1: type_inference.infer 13.04% : 0.001454s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.66% : 0.000128s : 17: replace.inline 42.34% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.41% : 0.000414s : 17: match.inline 7.59% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.76% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.69% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000009s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.97% : 
0.000037s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.50% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.66% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001583 34 58.09% : 0.000920s : 13: func_graph_cloner_run.FuncGraphClonerGraph 41.91% : 0.000664s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061587 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.89% : 0.003013s : 1: add_attr 4.88% : 0.003004s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000128s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.80% : 0.000492s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.94% : 0.004891s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.93% : 0.011042s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.77% : 0.000475s : 1: opt_after_jit_grad 0.47% : 0.000291s : 1: opt_b 21.61% : 0.013310s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.65% : 0.001630s : 2: renormalize.infer 2.21% : 0.001361s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.10% : 0.008069s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.26% : 0.011244s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0183592, [24] [bootstrap]: 0.00042471 [type_inference]: 0.00424376 [event_method]: 1.055e-05 [auto_monad]: 5.015e-05 [graph_reusing]: 4.97e-06 [inline]: 1.59998e-06 [add_attr]: 0.00298452, [1] [add_attr_with_inline]: 0.00297653, [1] [Cycle 1]: 4.24e-05, [2] [tag_attr]: 1.164e-05 [meta_addattr_fg_expand]: 2.86e-06 [parallel-infer-symbol]: 2.56998e-06 [pre_auto_parallel]: 2.058e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 8.39995e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00368535, [53] [py_interpret_to_execute]: 1.508e-05 [rewriter_before_opt_a]: 3.763e-05 [opt_a]: 0.00189329, [2] [Cycle 1]: 0.00124452, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.429e-05 [loop_unroll]: 1.368e-05 [a_1]: 0.00029196 [with_stream_mark]: 1.318e-05 [recompute_prepare]: 7.15998e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.75997e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.691e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 7.28999e-06 [auto_parallel]: 5.79e-06 [parallel]: 1.654e-05 [flash_sp]: 6.96001e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.16999e-06 [virtual_dataset]: 5.59e-06 [get_grad_eliminate_]: 5.91998e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.20001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.71e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 2.54001e-06 
[after_resolve]: 1.098e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.00033907 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.285e-05 [cse]: 2.682e-05 [a_3]: 3.974e-05 [Cycle 2]: 0.00063941, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012461 [with_stream_mark]: 9.49999e-06 [recompute_prepare]: 5.74999e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.754e-05 [accelerated_algorithm]: 5.78002e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.08999e-06 [auto_parallel]: 5.05001e-06 [parallel]: 4.32e-06 [flash_sp]: 3.28998e-06 [merge_comm]: 2.97002e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 4.77998e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.16998e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 5.01997e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 8.3e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.90998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.15999e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 8.90024e-07 [auto_monad_eliminator]: 5.266e-05 [cse]: 1.381e-05 [a_3]: 3.271e-05 [py_interpret_to_execute_after_opt_a]: 7.51001e-06 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 3.062e-05 [convert_after_rewriter]: 6.93998e-06 [order_py_execute_after_rewriter]: 5.20999e-06 [mutable_eliminate]: 0.00044667 [opt_b]: 0.0001807, [1] [Cycle 1]: 0.00017457, [7] 
[b_1]: 0.0001072 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.09985e-07 [cse]: 1.579e-05 [optimize_parallel_all_gather_comm]: 1.508e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00040942 [opt_after_cconv]: 9.611e-05, [1] [Cycle 1]: 9.048e-05, [7] [c_1]: 2.803e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.661e-05 [renormalize]: 5.60016e-07 [remove_dup_value]: 1.241e-05 [tuple_transform]: 6.775e-05, [1] [Cycle 1]: 6.329e-05, [4] [d_1]: 3.836e-05 [none_parameter_eliminate]: 1.35999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 2.14999e-06 [add_recomputation]: 4.347e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.56e-05, [1] [cse]: 1.053e-05 [environ_conv]: 4.48001e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.17e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.12e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 3.68e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.716e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 1.58002e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 6.837e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.21998e-06 [fold_const_symbol]: 8.82e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.61998e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.21999e-06 [opt_after_jit_grad]: 0.00044449 [validate]: 3.032e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00622836 [execute]: 6.73e-06 Sums bootstrap : 0.000425s : 2.94% type_inference : 0.004244s : 29.42% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.89% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000011s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000018s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000066s : 0.45% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000447s : 3.10% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000409s : 2.84% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.12% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 3.08% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006228s : 43.17% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 17.99% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.43% : 0.000005s : 4: substitution.graph_param_transform 65.74% : 0.000079s : 2: substitution.inline 2.23% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.74% : 0.000004s : 4: substitution.remove_not_recompute_node 3.33% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004203 2 92.02% : 0.003868s : 1: type_inference.infer 7.98% : 0.000336s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000134 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.11% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.95% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.16% : 0.000002s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.43% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.97% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000230 6 42.30% : 0.000097s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.70% : 0.000133s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026298 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.36% : 0.002989s : 1: add_attr 11.33% : 0.002980s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.75% : 0.000459s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000770s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.21% : 0.001896s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.73% : 0.000454s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.03% : 0.003689s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000024s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000187s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.72% : 0.006238s : 1: task_emit 0.27% : 0.000070s : 1: 
tuple_transform 16.19% : 0.004257s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0358474, [24] [bootstrap]: 0.00052746 [type_inference]: 0.0102496 [event_method]: 4.678e-05 [auto_monad]: 0.00011178 [graph_reusing]: 8.85999e-06 [inline]: 2.09e-06 [add_attr]: 0.00304702, [1] [add_attr_with_inline]: 0.0030377, [1] [Cycle 1]: 6.695e-05, [2] [tag_attr]: 3.173e-05 [meta_addattr_fg_expand]: 8.36002e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 4.53e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0129826, [53] [py_interpret_to_execute]: 3.473e-05 [rewriter_before_opt_a]: 0.00012777 [opt_a]: 0.0107205, [3] [Cycle 1]: 0.0068169, [45] [expand_dump_flag]: 3.56001e-06 [switch_simplify]: 6.588e-05 [loop_unroll]: 5.513e-05 [a_1]: 0.00134341 [with_stream_mark]: 2.511e-05 [recompute_prepare]: 2.177e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 7.98001e-06 [updatestate_loads_eliminate]: 7.3e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.00024512 [accelerated_algorithm]: 3.068e-05 [shard]: 1.84998e-06 [meta_shard_fg_expand]: 3.21001e-06 [shard_inline]: 1.616e-05 [merge_send_recv]: 1.58e-05 [auto_parallel]: 1.071e-05 [parallel]: 1.725e-05 [flash_sp]: 1.093e-05 [merge_comm]: 9.51e-06 [allreduce_fusion]: 9.04e-06 [matmul_add_comm_reduction]: 2.595e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.77e-05 [virtual_dataset]: 1.552e-05 [get_grad_eliminate_]: 1.508e-05 [virtual_output]: 1.499e-05 [merge_forward]: 9.20999e-06 [cell_reuse_recompute_pass]: 1.06997e-06 [offload_activation]: 1.767e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.854e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.717e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46e-06 [meta_fg_expand]: 0.00139186 [flash_sp_send_recv_attached]: 3.70998e-06 [receive_attached]: 2.86e-06 [after_resolve]: 
5.823e-05 [a_after_grad]: 8.244e-05 [renormalize]: 0.00234512 [add_forward_monad_depend]: 8.74e-06 [auto_monad_grad]: 5.66e-06 [auto_monad_eliminator]: 5.546e-05 [cse]: 0.00015798 [a_3]: 0.00033571 [Cycle 2]: 0.00298837, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.74e-05 [loop_unroll]: 8.711e-05 [a_1]: 0.00153207 [with_stream_mark]: 1.213e-05 [recompute_prepare]: 1.096e-05 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 9.90025e-07 [a_2]: 0.00012657 [accelerated_algorithm]: 1.222e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.20999e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.24001e-06 [parallel]: 4.50001e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 5.17999e-06 [allreduce_fusion]: 4.48999e-06 [matmul_add_comm_reduction]: 7.56999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.036e-05 [virtual_dataset]: 9.17001e-06 [get_grad_eliminate_]: 9.22001e-06 [virtual_output]: 8.59e-06 [merge_forward]: 4.52998e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.646e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.48002e-06 [meta_fg_expand]: 3.415e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 1.561e-05 [a_after_grad]: 1.456e-05 [renormalize]: 0.00056964 [add_forward_monad_depend]: 4.09002e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 1.474e-05 [cse]: 4.486e-05 [a_3]: 6.625e-05 [Cycle 3]: 0.00090124, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.065e-05 [loop_unroll]: 9.19e-06 [a_1]: 0.00024947 [with_stream_mark]: 9.71e-06 [recompute_prepare]: 9.34998e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.83001e-06 [updatestate_loads_eliminate]: 3.70998e-06 
[parameter_eliminate]: 8.99978e-07 [a_2]: 0.00012337 [accelerated_algorithm]: 1.193e-05 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.99998e-06 [merge_send_recv]: 7.28e-06 [auto_parallel]: 7.29001e-06 [parallel]: 4.49998e-06 [flash_sp]: 1.07998e-06 [merge_comm]: 4.90999e-06 [allreduce_fusion]: 4.85999e-06 [matmul_add_comm_reduction]: 7.56999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.018e-05 [virtual_dataset]: 8.92e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.42998e-06 [merge_forward]: 4.42998e-06 [cell_reuse_recompute_pass]: 1.19003e-06 [offload_activation]: 8.57998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.606e-05 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 1.386e-05 [set_forward_comm_id_for_comm_node_pass]: 5.16002e-06 [meta_fg_expand]: 2.94001e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.351e-05 [a_after_grad]: 1.428e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 1.131e-05 [cse]: 2.663e-05 [a_3]: 5.925e-05 [py_interpret_to_execute_after_opt_a]: 9.94001e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 5.191e-05 [convert_after_rewriter]: 9.06002e-06 [order_py_execute_after_rewriter]: 6.80998e-06 [mutable_eliminate]: 0.00047652 [opt_b]: 0.00028824, [1] [Cycle 1]: 0.0002823, [7] [b_1]: 0.00019027 [b_2]: 1.103e-05 [updatestate_depend_eliminate]: 7.48e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 3.95998e-06 [renormalize]: 5.79981e-07 [cse]: 3.05e-05 [optimize_parallel_all_gather_comm]: 2.402e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.02e-05 [loop_unroll]: 0.00042192 [opt_after_cconv]: 0.00013481, [1] [Cycle 1]: 0.00012903, [7] [c_1]: 4.803e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 7.35998e-06 [updatestate_assign_eliminate]: 4.22e-06 
[updatestate_loads_eliminate]: 3.89002e-06 [cse]: 2.916e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 2.838e-05 [tuple_transform]: 0.00010156, [1] [Cycle 1]: 9.664e-05, [4] [d_1]: 6.728e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.77999e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 5.595e-05 [cse_after_recomputation]: 3.128e-05, [1] [Cycle 1]: 2.677e-05, [1] [cse]: 2.157e-05 [environ_conv]: 8.78001e-06 [swap_dp_allreduce_reducescatter]: 7.97e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.11003e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.25002e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.722e-05 [grouped_pairwise_exchange_alltoall]: 1.70001e-06 [offloading_packed_experts]: 4.95001e-06 [overlap_recompute_and_grad_model_parallel]: 5.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44998e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 5.40999e-06 [overlap_grad_flash_sp]: 2.378e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.57001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.732e-05, [1] [Cycle 1]: 9.318e-05, [6] [build]: 8.77e-06 [elim_shapecalc]: 1.408e-05 [elim_not_effective]: 1.835e-05 [opt_reshape]: 1.015e-05 [fold_const_symbol]: 1.442e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.78997e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 2.455e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00046465 [validate]: 4.356e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00806606 [execute]: 6.80998e-06 Sums bootstrap : 0.000527s : 1.67% type_inference : 0.010250s : 32.48% event_method : 0.000047s : 0.15% auto_monad : 0.000112s : 0.35% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000128s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000151s : 0.48% optimize.opt_a.a_1 : 0.003125s : 9.90% optimize.opt_a.with_stream_mark : 0.000047s : 0.15% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001429s : 4.53% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000111s : 0.35% optimize.opt_a.renormalize : 0.002915s : 9.24% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.26% optimize.opt_a.cse : 0.000229s : 0.73% optimize.opt_a.a_3 : 0.000461s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000052s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000477s : 1.51% optimize.opt_b.b_1 : 0.000190s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000024s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000422s : 1.34% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000465s : 1.47% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008066s : 25.56% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000729 218 6.07% : 0.000044s : 11: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.85% : 0.000400s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.81% : 0.000013s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.88% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.47% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010177 2 87.54% : 0.008909s : 1: type_inference.infer 12.46% : 0.001268s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.95% : 0.000119s : 16: replace.inline 41.05% : 0.000083s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000422 30 92.84% : 0.000391s : 16: match.inline 7.16% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.08% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.24% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.94% : 0.000014s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.98% : 0.000015s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.80% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.10% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.89% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.42% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.30% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001456 32 57.59% : 0.000839s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.41% : 0.000618s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059937 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.09% : 0.003051s : 1: add_attr 5.07% : 0.003041s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000119s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.94% : 0.000561s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.81% : 0.000486s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.04% : 0.004819s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.89% : 0.010724s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.79% : 0.000474s : 1: opt_after_jit_grad 0.49% : 0.000292s : 1: opt_b 21.67% : 0.012987s : 1: optimize 0.05% : 0.000028s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.60% : 0.001561s : 2: renormalize.infer 2.24% : 0.001341s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000056s : 1: rewriter_after_opt_a 0.22% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.47% : 0.008076s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.12% : 0.010264s : 1: type_inference 0.12% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x0-kbk],max_mem:54.0M . TotalTime = 0.0790023, [24] [bootstrap]: 0.0005553 [type_inference]: 0.00606211 [event_method]: 1.377e-05 [auto_monad]: 5.404e-05 [graph_reusing]: 5.31002e-06 [inline]: 1.89999e-06 [add_attr]: 0.00350158, [1] [add_attr_with_inline]: 0.00349024, [1] [Cycle 1]: 4.435e-05, [2] [tag_attr]: 1.462e-05 [meta_addattr_fg_expand]: 4.03999e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.799e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.72001e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.003986, [53] [py_interpret_to_execute]: 1.935e-05 [rewriter_before_opt_a]: 5.744e-05 [opt_a]: 0.00210031, [2] [Cycle 1]: 0.00150741, [45] [expand_dump_flag]: 3.07002e-06 [switch_simplify]: 3.117e-05 [loop_unroll]: 2.076e-05 [a_1]: 0.00045474 [with_stream_mark]: 1.342e-05 [recompute_prepare]: 7.70998e-06 [updatestate_depend_eliminate]: 3.88999e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.551e-05 [accelerated_algorithm]: 6.35002e-06 [shard]: 2.51998e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 8.18999e-06 [auto_parallel]: 5.81e-06 [parallel]: 2.356e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.75998e-06 [allreduce_fusion]: 3.55998e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.41001e-06 [virtual_dataset]: 5.88998e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 6.16e-06 [merge_forward]: 4.03999e-06 [cell_reuse_recompute_pass]: 9.49978e-07 [offload_activation]: 9.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.073e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.57001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.16e-06 [receive_attached]: 2.14e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 8.57998e-06 [renormalize]: 0.00040971 [add_forward_monad_depend]: 4.35999e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.31e-05 [cse]: 2.614e-05 [a_3]: 4.055e-05 [Cycle 2]: 0.00058351, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.54999e-06 [loop_unroll]: 5.27001e-06 [a_1]: 0.0001255 [with_stream_mark]: 9.19e-06 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 3.00998e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.752e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.60999e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 5.22e-06 [parallel]: 4.23999e-06 [flash_sp]: 3.13e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.63003e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.10002e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.34e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 9.05001e-06 [a_after_grad]: 7.80998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.06998e-06 [cse]: 1.265e-05 [a_3]: 3.183e-05 [py_interpret_to_execute_after_opt_a]: 7.38999e-06 
[slice_cell_reuse_recomputed_activation]: 2.27999e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 8.87e-06 [mutable_eliminate]: 0.0004962 [opt_b]: 0.0001823, [1] [Cycle 1]: 0.00017645, [7] [b_1]: 0.00010879 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 4.89992e-07 [cse]: 1.624e-05 [optimize_parallel_all_gather_comm]: 1.476e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.256e-05 [loop_unroll]: 0.00041347 [opt_after_cconv]: 9.519e-05, [1] [Cycle 1]: 8.964e-05, [7] [c_1]: 2.78e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.617e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.218e-05 [tuple_transform]: 6.799e-05, [1] [Cycle 1]: 6.358e-05, [4] [d_1]: 3.856e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 5.064e-05 [cse_after_recomputation]: 2.132e-05, [1] [Cycle 1]: 1.682e-05, [1] [cse]: 1.152e-05 [environ_conv]: 4.38001e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 3.04001e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.92001e-06 [control_data_broadcast_order]: 1.181e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.17e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.651e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.83997e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.769e-05, [1] [Cycle 1]: 6.341e-05, [6] [build]: 2.20002e-06 [elim_shapecalc]: 8.18999e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 9.27001e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.571e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.0004503 [validate]: 3.076e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0640637 [execute]: 8.15999e-06 Sums bootstrap : 0.000555s : 0.75% type_inference : 0.006062s : 8.13% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000580s : 0.78% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000410s : 0.55% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000009s : 0.01% optimize.mutable_eliminate : 0.000496s : 0.67% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000413s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv 
: 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.60% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064064s : 85.95% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.85% : 0.000024s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.24% : 0.000005s : 4: substitution.graph_param_transform 66.86% : 0.000110s : 3: substitution.inline 1.61% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.67% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006019 2 91.03% : 0.005479s : 1: type_inference.infer 8.97% : 0.000540s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.83% : 0.000028s : 3: replace.inline 29.17% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.61% : 0.000108s : 3: match.inline 8.39% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.94% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.09% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.92% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.00% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.27% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.61% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.29% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.89% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.17% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.59% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.43% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000327 8 46.97% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.03% : 0.000173s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087994 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.98% : 0.003506s : 1: add_attr 3.97% : 0.003494s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000593s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000505s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.07% : 0.000944s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.39% : 0.002103s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.52% : 0.000460s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.53% : 0.003990s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000012s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000212s : 1: renormalize.infer 0.22% : 0.000192s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 72.83% : 0.064083s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 6.90% : 0.006076s : 1: type_inference 0.06% : 0.000056s : 1: validate TotalTime = 0.0703495, [24] [bootstrap]: 0.00047373 [type_inference]: 0.00442461 [event_method]: 1.063e-05 [auto_monad]: 4.994e-05 [graph_reusing]: 5.45001e-06 [inline]: 1.87001e-06 [add_attr]: 0.00296618, [1] [add_attr_with_inline]: 0.00295798, [1] [Cycle 1]: 4.521e-05, [2] [tag_attr]: 1.186e-05 [meta_addattr_fg_expand]: 2.91e-06 [parallel-infer-symbol]: 2.67001e-06 [pre_auto_parallel]: 2.067e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.07001e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00367183, [53] [py_interpret_to_execute]: 1.64e-05 [rewriter_before_opt_a]: 3.943e-05 [opt_a]: 0.00187114, [2] [Cycle 1]: 0.0012512, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 2.429e-05 [loop_unroll]: 1.353e-05 [a_1]: 0.00029069 [with_stream_mark]: 1.296e-05 [recompute_prepare]: 7.25e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 3.01999e-06 [parameter_eliminate]: 1.47001e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.83e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 5.52001e-06 [parallel]: 1.764e-05 [flash_sp]: 7.85998e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 8.13999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.79999e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.59998e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.02999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.22001e-06 [receive_attached]: 2.17001e-06 [after_resolve]: 
1.077e-05 [a_after_grad]: 8.78001e-06 [renormalize]: 0.00034237 [add_forward_monad_depend]: 4.83001e-06 [auto_monad_grad]: 1.62001e-06 [auto_monad_eliminator]: 1.299e-05 [cse]: 2.707e-05 [a_3]: 4.013e-05 [Cycle 2]: 0.00061096, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012097 [with_stream_mark]: 1.05e-05 [recompute_prepare]: 5.57999e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.16998e-06 [updatestate_loads_eliminate]: 2.41e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.818e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.03002e-06 [parallel]: 4.2e-06 [flash_sp]: 3.25e-06 [merge_comm]: 2.86999e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.08002e-06 [get_grad_eliminate_]: 4.89003e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.34001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.24e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.65998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 7.59988e-07 [auto_monad_eliminator]: 6.16e-06 [cse]: 1.295e-05 [a_3]: 5.805e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 [slice_cell_reuse_recomputed_activation]: 1.79998e-06 [rewriter_after_opt_a]: 3.051e-05 [convert_after_rewriter]: 6.73003e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00044907 [opt_b]: 0.00018015, [1] [Cycle 1]: 0.000174, [7] [b_1]: 0.0001065 [b_2]: 
6.88e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 5.3001e-07 [cse]: 1.647e-05 [optimize_parallel_all_gather_comm]: 1.557e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.338e-05 [loop_unroll]: 0.00041279 [opt_after_cconv]: 9.28e-05, [1] [Cycle 1]: 8.747e-05, [7] [c_1]: 2.707e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.09999e-06 [cse]: 1.58e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.201e-05 [tuple_transform]: 6.952e-05, [1] [Cycle 1]: 6.505e-05, [4] [d_1]: 3.93e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.3e-05 [cse_after_recomputation]: 1.976e-05, [1] [Cycle 1]: 1.537e-05, [1] [cse]: 1.05e-05 [environ_conv]: 4.50999e-06 [swap_dp_allreduce_reducescatter]: 4.99998e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.46998e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.57001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.14e-05 [grouped_pairwise_exchange_alltoall]: 1.70001e-06 [offloading_packed_experts]: 3.65998e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.656e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.97999e-06 [split_layernorm_comm]: 1.71002e-06 [handle_group_info]: 1.39998e-06 [symbol_engine_optimizer]: 6.818e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.08002e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.555e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00044402 [validate]: 3.113e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0580112 [execute]: 8.08001e-06 Sums bootstrap : 0.000474s : 0.71% type_inference : 0.004425s : 6.66% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000412s : 0.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000342s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000098s : 0.15% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.68% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000413s : 0.62% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058011s : 87.33% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000122 26 17.85% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.27% : 0.000005s : 4: substitution.graph_param_transform 66.07% : 0.000080s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.38% : 0.000004s : 4: substitution.remove_not_recompute_node 3.76% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004385 2 91.89% : 0.004029s : 1: type_inference.infer 8.11% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.51% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.97% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.21% : 0.000002s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.90% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.44% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.69% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.92% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.99% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000262 6 44.02% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.98% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078273 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.79% : 0.002970s : 1: add_attr 3.78% : 0.002962s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000508s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.01% : 0.000787s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001874s : 1: opt_a 0.12% : 0.000096s : 1: opt_after_cconv 0.58% : 0.000453s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.70% : 0.003676s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.03% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.19% : 0.000147s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.14% : 0.058029s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.67% : 0.004438s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.0716891, [24] [bootstrap]: 0.00047222 [type_inference]: 0.00555241 [event_method]: 1.393e-05 [auto_monad]: 5.481e-05 [graph_reusing]: 5.41998e-06 [inline]: 2.09e-06 [add_attr]: 0.00292377, [1] [add_attr_with_inline]: 0.00291525, [1] [Cycle 1]: 4.592e-05, [2] [tag_attr]: 1.585e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 2.69001e-06 [pre_auto_parallel]: 2.435e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00395887, [53] [py_interpret_to_execute]: 2.044e-05 [rewriter_before_opt_a]: 5.653e-05 [opt_a]: 0.00212535, [2] [Cycle 1]: 0.00151927, [45] [expand_dump_flag]: 2.66999e-06 [switch_simplify]: 3.133e-05 [loop_unroll]: 2.043e-05 [a_1]: 0.00044711 [with_stream_mark]: 3.069e-05 [recompute_prepare]: 8.63001e-06 [updatestate_depend_eliminate]: 4.15999e-06 [updatestate_assign_eliminate]: 3.31999e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.557e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.46999e-06 [auto_parallel]: 6.23e-06 [parallel]: 1.63e-05 [flash_sp]: 6.53e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 8.33999e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 6.89001e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.94002e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.047e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 9.36998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32997e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 
2.24999e-06 [after_resolve]: 9.97999e-06 [a_after_grad]: 8.77999e-06 [renormalize]: 0.00041709 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 2.09e-06 [auto_monad_eliminator]: 1.378e-05 [cse]: 2.878e-05 [a_3]: 4.046e-05 [Cycle 2]: 0.00059678, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012496 [with_stream_mark]: 9.62001e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.22001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.804e-05 [accelerated_algorithm]: 5.66998e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.62e-06 [auto_parallel]: 5.07999e-06 [parallel]: 3.78001e-06 [flash_sp]: 3.45e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 5.39998e-06 [get_grad_eliminate_]: 5.21998e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.68e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.90002e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 8.12998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.2e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.36e-06 [cse]: 1.343e-05 [a_3]: 3.241e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 3.098e-05 [convert_after_rewriter]: 7.4e-06 [order_py_execute_after_rewriter]: 5.19998e-06 [mutable_eliminate]: 0.00045176 [opt_b]: 0.00018038, [1] [Cycle 1]: 0.0001744, [7] 
[b_1]: 0.0001081 [b_2]: 6.94001e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.2998e-07 [cse]: 1.591e-05 [optimize_parallel_all_gather_comm]: 1.551e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.318e-05 [loop_unroll]: 0.00041693 [opt_after_cconv]: 9.385e-05, [1] [Cycle 1]: 8.8e-05, [7] [c_1]: 2.727e-05 [parameter_eliminate]: 2.18002e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.575e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.278e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.402e-05, [4] [d_1]: 3.851e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 5.95002e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 4.291e-05 [cse_after_recomputation]: 2.04e-05, [1] [Cycle 1]: 1.561e-05, [1] [cse]: 1.061e-05 [environ_conv]: 5.46002e-06 [swap_dp_allreduce_reducescatter]: 5.44998e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.31998e-06 [merge_cast_opt]: 1.16002e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.01998e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 1.07e-06 [remove_cast_before_assign_add]: 1.29998e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 4.02e-06 [overlap_recompute_and_grad_model_parallel]: 4.71002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.41002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 3.7e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 6.69999e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 6.838e-05, [1] [Cycle 1]: 6.422e-05, [6] [build]: 2.71e-06 [elim_shapecalc]: 8.68001e-06 [elim_not_effective]: 1.113e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.86002e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.41002e-06 [auto_monad_reorder]: 1.496e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.46999e-06 [opt_after_jit_grad]: 0.00044548 [validate]: 3.135e-05 [backend_pass]: 7.80012e-07 [task_emit]: 0.0579591 [execute]: 9.03002e-06 Sums bootstrap : 0.000472s : 0.70% type_inference : 0.005552s : 8.19% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000572s : 0.84% optimize.opt_a.with_stream_mark : 0.000040s : 0.06% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000020s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000417s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.67% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000417s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000445s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057959s : 85.49% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.88% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 66.60% : 0.000109s : 3: substitution.inline 1.90% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.52% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005512 2 89.50% : 0.004933s : 1: type_inference.infer 10.50% : 0.000579s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.80% : 0.000026s : 3: replace.inline 31.20% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.74% : 0.000107s : 3: match.inline 8.26% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.72% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000009s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.20% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.53% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.82% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 47.59% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.41% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080076 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.66% : 0.002928s : 1: add_attr 3.65% : 0.002919s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000508s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.17% : 0.000936s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.66% : 0.002128s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.57% : 0.000455s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.95% : 0.003963s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000208s : 1: renormalize.infer 0.25% : 0.000203s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 72.41% : 0.057980s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.95% : 0.005566s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.105924, [24] [bootstrap]: 0.00050043 [type_inference]: 0.0114232 [event_method]: 4.862e-05 [auto_monad]: 0.00011774 [graph_reusing]: 7.87e-06 [inline]: 1.91e-06 [add_attr]: 0.00304283, [1] [add_attr_with_inline]: 0.0030345, [1] [Cycle 1]: 7.064e-05, [2] [tag_attr]: 3.488e-05 [meta_addattr_fg_expand]: 8.97e-06 [parallel-infer-symbol]: 3.03998e-06 [pre_auto_parallel]: 4.918e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.51002e-06 [optimize]: 0.013288, [53] [py_interpret_to_execute]: 3.709e-05 [rewriter_before_opt_a]: 0.00014551 [opt_a]: 0.01103, [3] [Cycle 1]: 0.0071081, [45] [expand_dump_flag]: 3.84002e-06 [switch_simplify]: 7.364e-05 [loop_unroll]: 6.305e-05 [a_1]: 0.0014865 [with_stream_mark]: 2.229e-05 [recompute_prepare]: 2.147e-05 [updatestate_depend_eliminate]: 9.02e-06 [updatestate_assign_eliminate]: 7.87e-06 [updatestate_loads_eliminate]: 7.3e-06 [parameter_eliminate]: 2.32999e-06 [a_2]: 0.00024288 [accelerated_algorithm]: 3.042e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 3.06999e-06 [shard_inline]: 1.613e-05 [merge_send_recv]: 1.552e-05 [auto_parallel]: 1.079e-05 [parallel]: 1.756e-05 [flash_sp]: 1.13e-05 [merge_comm]: 9.36e-06 [allreduce_fusion]: 8.77999e-06 [matmul_add_comm_reduction]: 2.607e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.754e-05 [virtual_dataset]: 1.568e-05 [get_grad_eliminate_]: 1.516e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.20999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 1.771e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.8e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 2.739e-05 [set_forward_comm_id_for_comm_node_pass]: 9.46003e-06 [meta_fg_expand]: 0.00140461 [flash_sp_send_recv_attached]: 3.48999e-06 [receive_attached]: 2.39999e-06 [after_resolve]: 5.93e-05 
[a_after_grad]: 8.09e-05 [renormalize]: 0.00246095 [add_forward_monad_depend]: 9.17999e-06 [auto_monad_grad]: 5.04998e-06 [auto_monad_eliminator]: 5.602e-05 [cse]: 0.00016752 [a_3]: 0.0003329 [Cycle 2]: 0.00301043, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.675e-05 [loop_unroll]: 4.506e-05 [a_1]: 0.00151336 [with_stream_mark]: 1.212e-05 [recompute_prepare]: 1.081e-05 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 4.29002e-06 [updatestate_loads_eliminate]: 3.56999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012559 [accelerated_algorithm]: 1.226e-05 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.70001e-06 [shard_inline]: 9.17999e-06 [merge_send_recv]: 6.70998e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.77998e-06 [flash_sp]: 3.3e-06 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 4.64002e-06 [matmul_add_comm_reduction]: 7.55e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.97001e-06 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.74998e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 4.13999e-06 [cell_reuse_recompute_pass]: 8.99978e-07 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.742e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.496e-05 [set_forward_comm_id_for_comm_node_pass]: 5.55001e-06 [meta_fg_expand]: 6.745e-05 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.17999e-06 [after_resolve]: 1.627e-05 [a_after_grad]: 1.456e-05 [renormalize]: 0.00061853 [add_forward_monad_depend]: 4.13001e-06 [auto_monad_grad]: 1.23002e-06 [auto_monad_eliminator]: 1.486e-05 [cse]: 4.594e-05 [a_3]: 6.466e-05 [Cycle 3]: 0.00089735, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 1.037e-05 [loop_unroll]: 8.90001e-06 [a_1]: 0.00024878 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 9.15001e-06 [updatestate_depend_eliminate]: 4.76002e-06 [updatestate_assign_eliminate]: 3.85e-06 [updatestate_loads_eliminate]: 3.79002e-06 
[parameter_eliminate]: 8.70001e-07 [a_2]: 0.00012297 [accelerated_algorithm]: 1.134e-05 [shard]: 9.90025e-07 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 8.92e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 7.39002e-06 [parallel]: 4.34002e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.76002e-06 [matmul_add_comm_reduction]: 7.42002e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.74999e-06 [virtual_dataset]: 8.58001e-06 [get_grad_eliminate_]: 8.27998e-06 [virtual_output]: 8.22e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 8.33999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.576e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.404e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20001e-06 [meta_fg_expand]: 2.96999e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.45e-05 [a_after_grad]: 1.487e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 1.17e-05 [cse]: 2.597e-05 [a_3]: 5.902e-05 [py_interpret_to_execute_after_opt_a]: 1.008e-05 [slice_cell_reuse_recomputed_activation]: 2.24999e-06 [rewriter_after_opt_a]: 4.553e-05 [convert_after_rewriter]: 8.72e-06 [order_py_execute_after_rewriter]: 6.52001e-06 [mutable_eliminate]: 0.00046447 [opt_b]: 0.00028824, [1] [Cycle 1]: 0.00028197, [7] [b_1]: 0.00018849 [b_2]: 1.089e-05 [updatestate_depend_eliminate]: 7.25998e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 3.85998e-06 [renormalize]: 5.40022e-07 [cse]: 3.2e-05 [optimize_parallel_all_gather_comm]: 1.989e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.035e-05 [loop_unroll]: 0.00042233 [opt_after_cconv]: 0.00013665, [1] [Cycle 1]: 0.00013066, [7] [c_1]: 4.83e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 7.33999e-06 [updatestate_assign_eliminate]: 4.17e-06 
[updatestate_loads_eliminate]: 3.81001e-06 [cse]: 3.045e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.851e-05 [tuple_transform]: 0.00010118, [1] [Cycle 1]: 9.629e-05, [4] [d_1]: 6.667e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.62001e-06 [partial_unused_args_eliminate]: 2.31e-06 [add_recomputation]: 5.66e-05 [cse_after_recomputation]: 3.227e-05, [1] [Cycle 1]: 2.749e-05, [1] [cse]: 2.208e-05 [environ_conv]: 8.94998e-06 [swap_dp_allreduce_reducescatter]: 7.64002e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.66998e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.7e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.59998e-06 [overlap_recompute_and_grad_model_parallel]: 5.89999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.83001e-06 [overlap_grad_flash_sp]: 2.35e-05 [begin_end_overlap_inline]: 7.89994e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.54998e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 9.874e-05, [1] [Cycle 1]: 9.452e-05, [6] [build]: 9.82001e-06 [elim_shapecalc]: 1.347e-05 [elim_not_effective]: 1.815e-05 [opt_reshape]: 1.014e-05 [fold_const_symbol]: 1.468e-05 [renormalize]: 
1.80007e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.457e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00049271 [validate]: 4.486e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0766482 [execute]: 8.27e-06 Sums bootstrap : 0.000500s : 0.49% type_inference : 0.011423s : 11.24% event_method : 0.000049s : 0.05% auto_monad : 0.000118s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000146s : 0.14% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.13% optimize.opt_a.loop_unroll : 0.000117s : 0.12% optimize.opt_a.a_1 : 0.003249s : 3.20% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000491s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.03% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000019s : 0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001475s : 1.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.09% optimize.opt_a.a_after_grad : 0.000110s : 0.11% optimize.opt_a.renormalize : 0.003080s : 3.03% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000239s : 0.24% optimize.opt_a.a_3 : 0.000457s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000464s : 0.46% optimize.opt_b.b_1 : 0.000188s : 0.19% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000422s : 0.42% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000493s : 0.48% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.076648s : 75.42% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000758 222 5.86% : 0.000044s : 12: substitution.arithmetic_simplify 1.83% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.38% : 0.000420s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000015s : 3: substitution.less_batch_normalization 1.68% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.91% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.80% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011347 2 86.82% : 0.009851s : 1: type_inference.infer 13.18% : 0.001496s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.00% : 0.000126s : 17: replace.inline 43.00% : 0.000095s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000446 33 92.26% : 0.000411s : 17: match.inline 7.74% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 249: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.12% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.94% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.72% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001589 34 56.64% : 0.000900s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.36% : 0.000689s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.130569 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.33% : 0.003047s : 1: add_attr 2.33% : 0.003038s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000125s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000535s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000056s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000474s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.76% : 0.004908s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.45% : 0.011033s : 1: opt_a 0.11% : 0.000140s : 1: opt_after_cconv 0.38% : 0.000502s : 1: opt_after_jit_grad 0.22% : 0.000292s : 1: opt_b 10.18% : 0.013292s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.25% : 0.001627s : 2: renormalize.infer 1.10% : 0.001439s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000049s : 1: rewriter_after_opt_a 0.12% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 58.72% : 0.076665s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 8.76% : 0.011438s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.072187, [24] [bootstrap]: 0.00047119 [type_inference]: 0.00440324 [event_method]: 1.092e-05 [auto_monad]: 5.358e-05 [graph_reusing]: 4.77e-06 [inline]: 1.56998e-06 [add_attr]: 0.00307886, [1] [add_attr_with_inline]: 0.00307106, [1] [Cycle 1]: 4.507e-05, [2] [tag_attr]: 1.166e-05 [meta_addattr_fg_expand]: 3.01001e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.063e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.73002e-06 [optimize]: 0.00367536, [53] [py_interpret_to_execute]: 1.477e-05 [rewriter_before_opt_a]: 3.819e-05 [opt_a]: 0.00185169, [2] [Cycle 1]: 0.00124805, [45] [expand_dump_flag]: 3.16999e-06 [switch_simplify]: 2.313e-05 [loop_unroll]: 1.395e-05 [a_1]: 0.00029095 [with_stream_mark]: 1.361e-05 [recompute_prepare]: 7.93999e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.549e-05 [accelerated_algorithm]: 6.19001e-06 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.77e-06 [auto_parallel]: 5.70001e-06 [parallel]: 1.768e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 3.59002e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.36e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 6.12999e-06 [merge_forward]: 3.91001e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.51998e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.021e-05 [a_after_grad]: 8.61002e-06 [renormalize]: 0.00034168 [add_forward_monad_depend]: 4.55001e-06 [auto_monad_grad]: 1.57001e-06 [auto_monad_eliminator]: 1.281e-05 [cse]: 2.655e-05 [a_3]: 4.034e-05 [Cycle 2]: 0.00059415, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.46998e-06 [a_1]: 0.00012511 [with_stream_mark]: 1.083e-05 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.48998e-06 [parameter_eliminate]: 7.30011e-07 [a_2]: 6.758e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.54998e-06 [meta_shard_fg_expand]: 1.04998e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.02e-06 [flash_sp]: 2.94001e-06 [merge_comm]: 3.19001e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 4.87e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.05002e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.20999e-06 [merge_forward]: 2.66999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.73999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.73002e-06 [a_after_grad]: 8.57998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.259e-05 [a_3]: 3.152e-05 [py_interpret_to_execute_after_opt_a]: 7.40998e-06 [slice_cell_reuse_recomputed_activation]: 1.66998e-06 [rewriter_after_opt_a]: 3.316e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 4.82e-06 [mutable_eliminate]: 0.00044761 [opt_b]: 0.00017837, [1] [Cycle 1]: 
0.0001722, [7] [b_1]: 0.00010594 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.49974e-07 [cse]: 1.629e-05 [optimize_parallel_all_gather_comm]: 1.627e-05 [overlap_param_gather]: 2.23998e-06 [cconv]: 2.209e-05 [loop_unroll]: 0.0004107 [opt_after_cconv]: 9.437e-05, [1] [Cycle 1]: 8.846e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.564e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 6.887e-05, [1] [Cycle 1]: 6.447e-05, [4] [d_1]: 3.924e-05 [none_parameter_eliminate]: 1.75001e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.54e-06 [add_recomputation]: 6.712e-05 [cse_after_recomputation]: 2.126e-05, [1] [Cycle 1]: 1.659e-05, [1] [cse]: 1.15e-05 [environ_conv]: 4.97e-06 [swap_dp_allreduce_reducescatter]: 5.59998e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.17999e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 1.27e-06 [interleave_split_concat_branches]: 1.33002e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.73002e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.08001e-06 [symbol_engine_optimizer]: 6.941e-05, [1] [Cycle 1]: 6.523e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 9.03002e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 8.75999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 9.29984e-07 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044955 [validate]: 3.384e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.059732 [execute]: 8.92999e-06 Sums bootstrap : 0.000471s : 0.69% type_inference : 0.004403s : 6.46% event_method : 0.000011s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.61% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000342s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.66% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000411s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000067s : 0.10% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000450s : 0.66% validate : 0.000034s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059732s : 87.66% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.34% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.47% : 0.000005s : 4: substitution.graph_param_transform 65.13% : 0.000078s : 2: substitution.inline 2.52% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.91% : 0.000005s : 4: substitution.remove_not_recompute_node 3.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004364 2 91.85% : 0.004008s : 1: type_inference.infer 8.15% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.97% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000008s : 44: predicate.inline 1.09% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000000s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.68% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 43.64% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.36% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080200 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.84% : 0.003083s : 1: add_attr 3.83% : 0.003074s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000072s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000505s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.96% : 0.000766s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.31% : 0.001855s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.59% : 0.003679s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000184s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 74.51% : 0.059754s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.51% : 0.004417s : 1: type_inference 0.07% : 0.000056s : 1: validate TotalTime = 0.114833, [24] [bootstrap]: 0.00049562 [type_inference]: 0.0102683 [event_method]: 4.247e-05 [auto_monad]: 0.00011795 [graph_reusing]: 8.42e-06 [inline]: 1.76998e-06 [add_attr]: 0.00297712, [1] [add_attr_with_inline]: 0.00296877, [1] [Cycle 1]: 6.727e-05, [2] [tag_attr]: 3.092e-05 [meta_addattr_fg_expand]: 8.48999e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 4.619e-05 [insert-virtual-dataset]: 2.58003e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.0131899, [53] [py_interpret_to_execute]: 3.588e-05 [rewriter_before_opt_a]: 0.00012685 [opt_a]: 0.0108838, [3] [Cycle 1]: 0.00696597, [45] [expand_dump_flag]: 3.79002e-06 [switch_simplify]: 6.594e-05 [loop_unroll]: 5.433e-05 [a_1]: 0.00135227 [with_stream_mark]: 2.298e-05 [recompute_prepare]: 2.138e-05 [updatestate_depend_eliminate]: 9.18002e-06 [updatestate_assign_eliminate]: 8.07e-06 [updatestate_loads_eliminate]: 7.42998e-06 [parameter_eliminate]: 2.48998e-06 [a_2]: 0.0002467 [accelerated_algorithm]: 3.094e-05 [shard]: 2.09e-06 [meta_shard_fg_expand]: 3.25e-06 [shard_inline]: 1.583e-05 [merge_send_recv]: 1.606e-05 [auto_parallel]: 1.08e-05 [parallel]: 1.779e-05 [flash_sp]: 1.221e-05 [merge_comm]: 9.86e-06 [allreduce_fusion]: 8.99e-06 [matmul_add_comm_reduction]: 2.627e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.796e-05 [virtual_dataset]: 1.561e-05 [get_grad_eliminate_]: 1.507e-05 [virtual_output]: 1.503e-05 [merge_forward]: 9.37001e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.783e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.845e-05 [merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 2.775e-05 [set_forward_comm_id_for_comm_node_pass]: 9.44e-06 [meta_fg_expand]: 0.00140166 [flash_sp_send_recv_attached]: 3.73999e-06 [receive_attached]: 2.62001e-06 [after_resolve]: 5.876e-05 
[a_after_grad]: 8.139e-05 [renormalize]: 0.00246003 [add_forward_monad_depend]: 9.09e-06 [auto_monad_grad]: 5.87999e-06 [auto_monad_eliminator]: 5.728e-05 [cse]: 0.0001677 [a_3]: 0.00033322 [Cycle 2]: 0.00300434, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.709e-05 [loop_unroll]: 4.377e-05 [a_1]: 0.00157115 [with_stream_mark]: 1.201e-05 [recompute_prepare]: 1.091e-05 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012544 [accelerated_algorithm]: 1.186e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.99999e-06 [shard_inline]: 8.95001e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 7.38e-06 [parallel]: 5.50001e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 6.06998e-06 [allreduce_fusion]: 4.93001e-06 [matmul_add_comm_reduction]: 8.66997e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.002e-05 [virtual_dataset]: 8.85999e-06 [get_grad_eliminate_]: 8.98002e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.63e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.415e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 3.472e-05 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.513e-05 [a_after_grad]: 1.442e-05 [renormalize]: 0.00058719 [add_forward_monad_depend]: 3.97e-06 [auto_monad_grad]: 1.29003e-06 [auto_monad_eliminator]: 1.512e-05 [cse]: 4.667e-05 [a_3]: 6.497e-05 [Cycle 3]: 0.00089938, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 1.025e-05 [loop_unroll]: 8.97e-06 [a_1]: 0.00024879 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 9.48002e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.97998e-06 
[parameter_eliminate]: 8.2e-07 [a_2]: 0.00012205 [accelerated_algorithm]: 1.168e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 7.36001e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.82e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.92e-06 [allreduce_fusion]: 4.97e-06 [matmul_add_comm_reduction]: 7.72998e-06 [allreduce_slice_to_reducescatter]: 4.7998e-07 [virtual_shard_identity]: 9.96e-06 [virtual_dataset]: 8.74998e-06 [get_grad_eliminate_]: 8.40001e-06 [virtual_output]: 8.15e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 8.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.574e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.387e-05 [set_forward_comm_id_for_comm_node_pass]: 5.13002e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 1.12e-06 [receive_attached]: 1.00999e-06 [after_resolve]: 1.431e-05 [a_after_grad]: 1.448e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.072e-05 [cse]: 2.666e-05 [a_3]: 5.946e-05 [py_interpret_to_execute_after_opt_a]: 1.094e-05 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 4.68e-05 [convert_after_rewriter]: 9.02999e-06 [order_py_execute_after_rewriter]: 6.96001e-06 [mutable_eliminate]: 0.00047152 [opt_b]: 0.00028818, [1] [Cycle 1]: 0.00028174, [7] [b_1]: 0.00018899 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.41001e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 3.88999e-06 [renormalize]: 4.69998e-07 [cse]: 3.179e-05 [optimize_parallel_all_gather_comm]: 2.045e-05 [overlap_param_gather]: 1.94999e-06 [cconv]: 2.03e-05 [loop_unroll]: 0.00042632 [opt_after_cconv]: 0.00018196, [1] [Cycle 1]: 0.00017593, [7] [c_1]: 9.363e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 7.23e-06 [updatestate_assign_eliminate]: 4.17e-06 [updatestate_loads_eliminate]: 
3.81999e-06 [cse]: 2.989e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.912e-05 [tuple_transform]: 0.00010241, [1] [Cycle 1]: 9.758e-05, [4] [d_1]: 6.712e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.055e-05 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 5.772e-05 [cse_after_recomputation]: 3.2e-05, [1] [Cycle 1]: 2.727e-05, [1] [cse]: 2.206e-05 [environ_conv]: 9.10001e-06 [swap_dp_allreduce_reducescatter]: 7.98001e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 3.09001e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.12e-06 [overlap_opt_shard_in_pipeline]: 1.70001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.743e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 4.86002e-06 [overlap_recompute_and_grad_model_parallel]: 5.69999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.60001e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.541e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 1.39998e-06 [symbol_engine_optimizer]: 9.655e-05, [1] [Cycle 1]: 9.249e-05, [6] [build]: 9.67001e-06 [elim_shapecalc]: 1.341e-05 [elim_not_effective]: 1.737e-05 [opt_reshape]: 9.50001e-06 [fold_const_symbol]: 1.518e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.489e-05 [get_jit_bprop_graph]: 1.29998e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00046836 [validate]: 4.651e-05 [backend_pass]: 1.15001e-06 [task_emit]: 0.0868987 [execute]: 9.22001e-06 Sums bootstrap : 0.000496s : 0.45% type_inference : 0.010268s : 9.29% event_method : 0.000042s : 0.04% auto_monad : 0.000118s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.11% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000123s : 0.11% optimize.opt_a.loop_unroll : 0.000107s : 0.10% optimize.opt_a.a_1 : 0.003172s : 2.87% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.45% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000017s : 0.02% optimize.opt_a.merge_comm : 0.000021s 
: 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001439s : 1.30% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003047s : 2.76% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000241s : 0.22% optimize.opt_a.a_3 : 0.000458s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000472s : 0.43% optimize.opt_b.b_1 : 0.000189s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000426s : 0.39% optimize.opt_after_cconv.c_1 : 0.000094s : 0.08% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000017s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000468s : 0.42% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.086899s : 78.58% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000779 218 5.54% : 0.000043s : 11: substitution.arithmetic_simplify 1.72% : 0.000013s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.46% : 0.000004s : 5: substitution.float_depend_g_call 0.64% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000003s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 52.56% : 0.000409s : 16: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000015s : 3: substitution.less_batch_normalization 1.63% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.72% : 0.000013s : 20: substitution.remove_not_recompute_node 3.03% : 0.000024s : 10: substitution.replace_applicator 1.35% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.58% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 6.78% : 0.000053s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.81% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010199 2 86.81% : 0.008854s : 1: type_inference.infer 13.19% : 0.001345s : 1: type_inference.specialize ------[replace.] 0.000202 30 59.84% : 0.000121s : 16: replace.inline 40.16% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000430 30 93.18% : 0.000401s : 16: match.inline 6.82% : 0.000029s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.20% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.80% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.52% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 244: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.51% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001514 32 57.45% : 0.000870s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.55% : 0.000644s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.139227 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.14% : 0.002981s : 1: add_attr 2.14% : 0.002973s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000126s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.38% : 0.000529s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000049s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.35% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.46% : 0.004814s : 117: opt.transform.opt_a 0.07% : 0.000092s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000052s : 4: opt.transform.symbol_engine_opt 7.82% : 0.010887s : 1: opt_a 0.13% : 0.000185s : 1: opt_after_cconv 0.34% : 0.000478s : 1: opt_after_jit_grad 0.21% : 0.000292s : 1: opt_b 9.48% : 0.013194s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000029s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.16% : 0.001620s : 2: renormalize.infer 1.02% : 0.001414s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.09% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000099s : 1: symbol_engine_optimizer 62.43% : 0.086921s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.39% : 0.010283s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x0-ge],max_mem:54.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x1-pynative],max_mem:54.0M TotalTime = 0.0221258, [24] [bootstrap]: 0.00056069 [type_inference]: 0.00624175 [event_method]: 1.517e-05 [auto_monad]: 5.537e-05 [graph_reusing]: 5.46e-06 [inline]: 2.07001e-06 [add_attr]: 0.00352855, [1] [add_attr_with_inline]: 0.0035163, [1] [Cycle 1]: 4.539e-05, [2] [tag_attr]: 1.591e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.847e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.59e-06 [optimize]: 0.00415163, [53] [py_interpret_to_execute]: 2.119e-05 [rewriter_before_opt_a]: 6.235e-05 [opt_a]: 0.00219954, [2] [Cycle 1]: 0.00158891, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 3.299e-05 [loop_unroll]: 2.148e-05 [a_1]: 0.00046427 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.43999e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.35003e-06 [updatestate_loads_eliminate]: 2.73998e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.718e-05 [accelerated_algorithm]: 6.56999e-06 [shard]: 2.67001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 7.89002e-06 [auto_parallel]: 5.49998e-06 [parallel]: 2.662e-05 [flash_sp]: 7.82e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.20001e-06 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 6.28002e-06 
[get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 8.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.13e-05 [merge_recompute_call_nodes]: 1.45001e-06 [before_grad]: 9.42999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 2.93e-06 [after_resolve]: 1.112e-05 [a_after_grad]: 9.49e-06 [renormalize]: 0.00046026 [add_forward_monad_depend]: 4.79e-06 [auto_monad_grad]: 2.32999e-06 [auto_monad_eliminator]: 1.394e-05 [cse]: 2.816e-05 [a_3]: 4.198e-05 [Cycle 2]: 0.0006013, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012811 [with_stream_mark]: 9.07999e-06 [recompute_prepare]: 5.83997e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.797e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.19003e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.61002e-06 [auto_parallel]: 5.71e-06 [parallel]: 5.32001e-06 [flash_sp]: 3.00998e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.84999e-06 [matmul_add_comm_reduction]: 4.85999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56003e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.27998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 7.59988e-07 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.706e-05 [a_3]: 3.236e-05 [py_interpret_to_execute_after_opt_a]: 8.17e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 3.122e-05 [convert_after_rewriter]: 6.82002e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00048341 [opt_b]: 0.00018281, [1] [Cycle 1]: 0.00017673, [7] [b_1]: 0.00010753 [b_2]: 6.93998e-06 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 4.89992e-07 [cse]: 1.811e-05 [optimize_parallel_all_gather_comm]: 1.582e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.367e-05 [loop_unroll]: 0.00043128 [opt_after_cconv]: 9.692e-05, [1] [Cycle 1]: 9.066e-05, [7] [c_1]: 2.813e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.666e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.443e-05 [tuple_transform]: 0.00010412, [1] [Cycle 1]: 9.966e-05, [4] [d_1]: 7.273e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.33998e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 5.251e-05 [cse_after_recomputation]: 2.182e-05, [1] [Cycle 1]: 1.732e-05, [1] [cse]: 1.206e-05 [environ_conv]: 4.99e-06 [swap_dp_allreduce_reducescatter]: 5.67001e-06 [bias_add_comm_swap]: 2.36998e-06 [label_micro_interleaved_index]: 4.22998e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.55001e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.19999e-06 [assign_add_opt]: 1.13001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.81e-06 [reorder_send_recv_between_fp_bp]: 2.95998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 
1.35999e-06 [interleave_split_concat_branches]: 1.34e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96003e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.91002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.89999e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.765e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.83002e-06 [handle_group_info]: 1.04003e-06 [symbol_engine_optimizer]: 6.985e-05, [1] [Cycle 1]: 6.566e-05, [6] [build]: 2.79001e-06 [elim_shapecalc]: 8.32998e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 9.34e-06 [renormalize]: 2.59985e-07 [detach_backward]: 1.47999e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 1.636e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 0.00013154 [opt_after_jit_grad]: 0.0004752 [validate]: 3.571e-05 [backend_pass]: 9.99979e-07 [task_emit]: 0.00664391 [execute]: 6.93e-06 Sums bootstrap : 0.000561s : 3.18% type_inference : 0.006242s : 35.45% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.31% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000062s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000592s : 3.36% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000032s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000460s : 2.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000045s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000483s : 2.75% optimize.opt_b.b_1 : 0.000108s : 0.61% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.13% optimize.loop_unroll : 0.000431s : 2.45% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000073s : 0.41% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000132s : 0.75% opt_after_jit_grad : 0.000475s : 2.70% validate : 0.000036s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006644s : 37.73% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000204 30 12.33% : 0.000025s : 5: substitution.arithmetic_simplify 0.88% : 0.000002s : 2: substitution.elim_not_effective 0.65% : 0.000001s : 2: substitution.fold_const_symbol 18.57% : 0.000038s : 4: substitution.graph_param_transform 55.91% : 0.000114s : 3: substitution.inline 1.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.36% : 0.000005s : 4: substitution.remove_not_recompute_node 2.11% : 0.000004s : 4: substitution.replace_old_param 5.82% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006195 2 90.43% : 0.005602s : 1: type_inference.infer 9.57% : 0.000593s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.77% : 0.000027s : 3: replace.inline 30.23% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 91.18% : 0.000112s : 3: match.inline 8.82% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 1.03% : 0.000002s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 1.10% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.81% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.84% : 0.000001s : 11: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 1.04% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000389 8 45.29% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.71% : 0.000213s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031415 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.25% : 0.003534s : 1: add_attr 11.21% : 0.003520s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000061s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000599s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000022s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000440s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.57% : 0.000493s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000963s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.24% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.01% : 0.002202s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.54% : 0.000485s : 1: opt_after_jit_grad 0.59% : 0.000186s : 1: opt_b 13.23% : 0.004156s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 0.76% : 0.000239s : 1: renormalize.infer 0.68% : 0.000214s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000138s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000067s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 21.18% : 0.006654s : 1: task_emit 0.34% : 0.000107s : 1: 
tuple_transform 19.91% : 0.006256s : 1: type_inference 0.22% : 0.000070s : 1: validate TotalTime = 0.0193084, [24] [bootstrap]: 0.00049513 [type_inference]: 0.00461087 [event_method]: 1.052e-05 [auto_monad]: 5.098e-05 [graph_reusing]: 5.23002e-06 [inline]: 1.84998e-06 [add_attr]: 0.00320762, [1] [add_attr_with_inline]: 0.00319923, [1] [Cycle 1]: 5.066e-05, [2] [tag_attr]: 1.219e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.499e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 2.17999e-06 [optimize]: 0.00400727, [53] [py_interpret_to_execute]: 1.564e-05 [rewriter_before_opt_a]: 4.113e-05 [opt_a]: 0.0020097, [2] [Cycle 1]: 0.00139156, [45] [expand_dump_flag]: 3.12002e-06 [switch_simplify]: 2.432e-05 [loop_unroll]: 1.367e-05 [a_1]: 0.00030386 [with_stream_mark]: 1.526e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 3.69002e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 7.822e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.20002e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 6.09999e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 5.98002e-06 [parallel]: 1.925e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.25998e-06 [matmul_add_comm_reduction]: 9.24998e-06 [allreduce_slice_to_reducescatter]: 7.90023e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.80002e-06 [merge_forward]: 4e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 9.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.126e-05 [merge_recompute_call_nodes]: 1.81e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 
2.62001e-06 [after_resolve]: 1.058e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00045239 [add_forward_monad_depend]: 4.77998e-06 [auto_monad_grad]: 2.06e-06 [auto_monad_eliminator]: 1.392e-05 [cse]: 2.805e-05 [a_3]: 4.078e-05 [Cycle 2]: 0.00060847, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.53002e-06 [a_1]: 0.00012848 [with_stream_mark]: 1.166e-05 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 3.06001e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.909e-05 [accelerated_algorithm]: 5.82999e-06 [shard]: 1.34e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.84e-06 [auto_parallel]: 5.35999e-06 [parallel]: 4.87e-06 [flash_sp]: 3.21999e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.68998e-06 [matmul_add_comm_reduction]: 5.40999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.48e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.80002e-06 [cell_reuse_recompute_pass]: 1.60999e-06 [offload_activation]: 6.20002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.85002e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 1.91e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.23002e-06 [after_resolve]: 9.49e-06 [a_after_grad]: 8.37e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.29001e-06 [cse]: 1.411e-05 [a_3]: 3.434e-05 [py_interpret_to_execute_after_opt_a]: 8.55999e-06 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 3.269e-05 [convert_after_rewriter]: 7.31001e-06 [order_py_execute_after_rewriter]: 4.90999e-06 [mutable_eliminate]: 0.00054123 [opt_b]: 0.0001859, [1] [Cycle 1]: 0.00017961, [7] [b_1]: 
0.00011019 [b_2]: 6.96999e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 7.59988e-07 [cse]: 1.676e-05 [optimize_parallel_all_gather_comm]: 1.57e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.356e-05 [loop_unroll]: 0.00044873 [opt_after_cconv]: 0.00013707, [1] [Cycle 1]: 0.00013077, [7] [c_1]: 6.767e-05 [parameter_eliminate]: 2.71e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.608e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 7.05e-05, [1] [Cycle 1]: 6.627e-05, [4] [d_1]: 3.988e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.61e-06 [partial_unused_args_eliminate]: 1.56998e-06 [add_recomputation]: 4.397e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.79e-06 [swap_dp_allreduce_reducescatter]: 4.98001e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.55999e-06 [label_fine_grained_interleaved_index]: 2.43998e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 3.06999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.21997e-06 [add_comm_op_reuse_tag]: 1.22999e-06 [interleave_split_concat_branches]: 1.37999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94999e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.81002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.73e-06 [overlap_grad_ring_attention]: 3.82998e-06 [overlap_grad_flash_sp]: 1.8e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.962e-05, [1] [Cycle 1]: 6.54e-05, [6] [build]: 3.23e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.76e-05 [get_jit_bprop_graph]: 1.34e-06 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.00048912 [validate]: 3.562e-05 [backend_pass]: 1.14003e-06 [task_emit]: 0.00611359 [execute]: 7.66999e-06 Sums bootstrap : 0.000495s : 3.28% type_inference : 0.004611s : 30.50% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.17% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.10% optimize.rewriter_before_opt_a : 0.000041s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000432s : 2.86% optimize.opt_a.with_stream_mark : 0.000027s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s 
: 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000452s : 2.99% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000042s : 0.28% optimize.opt_a.a_3 : 0.000075s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000541s : 3.58% optimize.opt_b.b_1 : 0.000110s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.01% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000449s : 2.97% optimize.opt_after_cconv.c_1 : 0.000068s : 0.45% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.01% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000018s : 0.12% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000489s : 3.24% validate : 0.000036s : 0.24% backend_pass : 0.000001s : 0.01% task_emit : 0.006114s : 40.44% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000130 26 18.44% : 0.000024s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 0.93% : 0.000001s : 2: substitution.fold_const_symbol 4.33% : 0.000006s : 4: substitution.graph_param_transform 65.81% : 0.000086s : 2: substitution.inline 2.12% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.42% : 0.000004s : 4: substitution.remove_not_recompute_node 3.62% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004565 2 92.20% : 0.004209s : 1: type_inference.infer 7.80% : 0.000356s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000084 2 100.00% : 0.000084s : 2: match.inline ------[predicate.] 
0.000183 984 0.68% : 0.000001s : 9: predicate.accumulaten_eliminater 0.89% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.52% : 0.000001s : 8: predicate.addn_check_dump 0.61% : 0.000001s : 9: predicate.addn_zero_filter 0.55% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.01% : 0.000004s : 17: predicate.arithmetic_simplify 0.67% : 0.000001s : 9: predicate.cast_eliminate 0.61% : 0.000001s : 8: predicate.check_bprop_eliminate 0.49% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.60% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.68% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.89% : 0.000002s : 13: predicate.environ_get_add_eliminate 0.82% : 0.000001s : 13: predicate.environ_get_depend_swap 1.48% : 0.000003s : 21: predicate.environ_get_eliminate 0.83% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.71% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.53% : 0.000003s : 11: predicate.float_depend_g_call 0.50% : 0.000001s : 8: predicate.float_environ_get_switch 0.77% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000001s : 4: predicate.graph_param_transform 0.62% : 0.000001s : 8: predicate.incorporate_call 0.50% : 0.000001s : 8: predicate.incorporate_call_switch 25.66% : 0.000047s : 44: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.20% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.63% : 0.000003s : 26: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.27% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.38% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.51% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.58% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.56% : 0.000001s : 9: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.34% : 0.000001s : 4: predicate.parallel_virtual_node 0.95% : 0.000002s : 11: predicate.partial_defer_inline 0.94% : 0.000002s : 13: predicate.partial_eliminate 0.60% : 0.000001s : 9: predicate.print_const_string_wrapper 0.56% : 0.000001s : 8: predicate.reduce_all_const_elim 0.80% : 0.000001s : 9: predicate.reduce_eliminate 1.66% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.59% : 0.000001s : 8: predicate.remove_not_recompute_node 1.09% : 0.000002s : 17: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000001s : 4: predicate.reset_defer_inline 0.60% : 0.000001s : 9: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.45% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000002s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.82% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.70% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.82% : 0.000001s : 11: predicate.switch_defer_inline 1.34% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.50% : 0.000006s : 41: predicate.switch_simplify 0.63% : 
0.000001s : 9: predicate.tile_eliminate 0.59% : 0.000001s : 9: predicate.transpose_eliminate 1.23% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.05% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.62% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.06% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.91% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.14% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.65% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.47% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.64% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.29% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 40.44% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.56% : 0.000154s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027966 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.49% : 0.003213s : 1: add_attr 11.45% : 0.003203s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000056s : 1: auto_monad 0.08% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.90% : 0.000532s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000006s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000458s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.97% : 0.000551s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.83% : 0.000791s : 78: opt.transform.opt_a 0.24% : 0.000066s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.20% : 0.002013s : 1: opt_a 0.50% : 0.000140s : 1: opt_after_cconv 1.79% : 0.000500s : 1: opt_after_jit_grad 0.68% : 0.000189s : 1: opt_b 14.34% : 0.004011s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.93% : 0.000261s : 1: renormalize.infer 0.66% : 0.000185s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000037s : 1: rewriter_after_opt_a 0.16% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000072s : 1: symbol_engine_optimizer 21.90% : 0.006125s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 16.55% : 0.004627s : 1: type_inference 0.24% : 0.000067s : 1: validate TotalTime = 0.0188824, [24] [bootstrap]: 0.00041815 [type_inference]: 0.00508223 [event_method]: 1.316e-05 [auto_monad]: 3.321e-05 [graph_reusing]: 3.55998e-06 [inline]: 2.26e-06 [add_attr]: 0.00281734, [1] [add_attr_with_inline]: 0.00281005, [1] [Cycle 1]: 3.408e-05, [2] [tag_attr]: 1.118e-05 [meta_addattr_fg_expand]: 2.94001e-06 [parallel-infer-symbol]: 1.74e-06 [pre_auto_parallel]: 2.076e-05 [insert-virtual-dataset]: 1.00999e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 1.05001e-06 [pipeline_split]: 8.89995e-07 [optimize]: 0.00389363, [53] [py_interpret_to_execute]: 1.619e-05 [rewriter_before_opt_a]: 4.929e-05 [opt_a]: 0.0020436, [2] [Cycle 1]: 0.00145289, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 2.959e-05 [loop_unroll]: 2.055e-05 [a_1]: 0.00042702 [with_stream_mark]: 1.232e-05 [recompute_prepare]: 7.52998e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.06999e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.539e-05 [accelerated_algorithm]: 6.25002e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.51998e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.65e-06 [auto_parallel]: 5.76998e-06 [parallel]: 1.612e-05 [flash_sp]: 6.81001e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 8.52e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.84999e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.044e-05 [merge_recompute_call_nodes]: 1.64998e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.59001e-06 [receive_attached]: 1.95001e-06 
[after_resolve]: 1.026e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00040651 [add_forward_monad_depend]: 3.8e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.29e-05 [cse]: 2.515e-05 [a_3]: 3.885e-05 [Cycle 2]: 0.00058199, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 6.31e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012266 [with_stream_mark]: 8.86002e-06 [recompute_prepare]: 5.66998e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.43002e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.713e-05 [accelerated_algorithm]: 5.29998e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.34e-06 [merge_send_recv]: 4.32003e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.05998e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.58998e-06 [matmul_add_comm_reduction]: 4.88001e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 5.97001e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 5.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 8.87e-06 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.38002e-06 [a_after_grad]: 7.95e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 7.30011e-07 [auto_monad_eliminator]: 6.04001e-06 [cse]: 1.67e-05 [a_3]: 3.168e-05 [py_interpret_to_execute_after_opt_a]: 7.24001e-06 [slice_cell_reuse_recomputed_activation]: 2.22999e-06 [rewriter_after_opt_a]: 2.922e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.06002e-06 [mutable_eliminate]: 0.00044701 [opt_b]: 0.00018234, [1] [Cycle 1]: 0.0001761, [7] [b_1]: 
0.00010979 [b_2]: 7.07002e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.7998e-07 [cse]: 1.559e-05 [optimize_parallel_all_gather_comm]: 1.5e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.336e-05 [loop_unroll]: 0.00041766 [opt_after_cconv]: 9.42e-05, [1] [Cycle 1]: 8.853e-05, [7] [c_1]: 2.747e-05 [parameter_eliminate]: 2.02999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.28998e-06 [cse]: 1.606e-05 [renormalize]: 3.20026e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 6.892e-05, [1] [Cycle 1]: 6.464e-05, [4] [d_1]: 3.89e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.318e-05 [cse_after_recomputation]: 1.953e-05, [1] [Cycle 1]: 1.527e-05, [1] [cse]: 1.028e-05 [environ_conv]: 4.33001e-06 [swap_dp_allreduce_reducescatter]: 5.10001e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.86999e-06 [merge_cast_opt]: 1.34998e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.46002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 3.73999e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 1.81e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 1.10999e-06 [symbol_engine_optimizer]: 6.925e-05, [1] [Cycle 1]: 6.484e-05, [6] [build]: 2.39999e-06 [elim_shapecalc]: 8.77999e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 9.00999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.56002e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.53999e-06 [opt_after_jit_grad]: 0.00045564 [validate]: 3e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00589917 [execute]: 7.05e-06 Sums bootstrap : 0.000418s : 2.77% type_inference : 0.005082s : 33.66% event_method : 0.000013s : 0.09% auto_monad : 0.000033s : 0.22% graph_reusing : 0.000004s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.07% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000002s : 0.01% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000001s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000001s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000049s : 0.33% optimize.opt_a.expand_dump_flag : 0.000002s : 0.01% optimize.opt_a.switch_simplify : 0.000036s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000550s : 3.64% optimize.opt_a.with_stream_mark : 0.000021s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000020s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000019s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000407s : 2.69% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000042s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.96% optimize.opt_b.b_1 : 0.000110s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000418s : 2.77% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000456s : 3.02% validate : 0.000030s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005899s : 39.07% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000147 30 15.46% : 0.000023s : 5: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 3.92% : 0.000006s : 4: substitution.graph_param_transform 63.74% : 0.000093s : 3: substitution.inline 1.83% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.77% : 0.000004s : 4: substitution.remove_not_recompute_node 2.65% : 0.000004s : 4: substitution.replace_old_param 7.35% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005048 2 90.19% : 0.004553s : 1: type_inference.infer 9.81% : 0.000495s : 1: type_inference.specialize ------[replace.] 0.000036 5 69.28% : 0.000025s : 3: replace.inline 30.72% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000101 5 90.35% : 0.000091s : 3: match.inline 9.65% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.41% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.91% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 0.98% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 1.06% : 0.000002s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.43% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.61% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.86% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.27% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000310 8 42.16% : 0.000131s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.84% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027035 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.44% : 0.002821s : 1: add_attr 10.41% : 0.002813s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.14% : 0.000038s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.62% : 0.000439s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000004s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000007s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000004s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.58% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.68% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000011s : 1: opt.transform.mutable_eliminate 3.36% : 0.000908s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.57% : 0.002047s : 1: opt_a 0.36% : 0.000098s : 1: opt_after_cconv 1.72% : 0.000465s : 1: opt_after_jit_grad 0.69% : 0.000186s : 1: opt_b 14.42% : 0.003898s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000005s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.78% : 0.000212s : 1: renormalize.infer 0.70% : 0.000188s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000033s : 1: rewriter_after_opt_a 0.20% : 0.000053s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 21.86% : 0.005910s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 18.85% : 0.005095s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0377549, [24] [bootstrap]: 0.00051477 [type_inference]: 0.0114182 [event_method]: 4.914e-05 [auto_monad]: 0.00012061 [graph_reusing]: 8.84e-06 [inline]: 2.60002e-06 [add_attr]: 0.00301781, [1] [add_attr_with_inline]: 0.00300926, [1] [Cycle 1]: 7.123e-05, [2] [tag_attr]: 3.433e-05 [meta_addattr_fg_expand]: 9.51e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 5.037e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.0135595, [53] [py_interpret_to_execute]: 3.9e-05 [rewriter_before_opt_a]: 0.00014766 [opt_a]: 0.0112163, [3] [Cycle 1]: 0.00721005, [45] [expand_dump_flag]: 3.94002e-06 [switch_simplify]: 7.43e-05 [loop_unroll]: 6.104e-05 [a_1]: 0.00149345 [with_stream_mark]: 2.349e-05 [recompute_prepare]: 2.181e-05 [updatestate_depend_eliminate]: 8.96002e-06 [updatestate_assign_eliminate]: 7.71999e-06 [updatestate_loads_eliminate]: 7.2e-06 [parameter_eliminate]: 2.49999e-06 [a_2]: 0.0002448 [accelerated_algorithm]: 3.052e-05 [shard]: 1.94e-06 [meta_shard_fg_expand]: 3.81999e-06 [shard_inline]: 1.616e-05 [merge_send_recv]: 1.522e-05 [auto_parallel]: 1.128e-05 [parallel]: 1.841e-05 [flash_sp]: 1.097e-05 [merge_comm]: 9.48002e-06 [allreduce_fusion]: 8.74e-06 [matmul_add_comm_reduction]: 2.622e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.849e-05 [virtual_dataset]: 1.618e-05 [get_grad_eliminate_]: 1.545e-05 [virtual_output]: 1.517e-05 [merge_forward]: 9.17001e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 1.763e-05 [cell_reuse_handle_not_recompute_node_pass]: 3.042e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 2.798e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67001e-06 [meta_fg_expand]: 0.00142516 [flash_sp_send_recv_attached]: 3.78001e-06 [receive_attached]: 2.21e-06 [after_resolve]: 
5.977e-05 [a_after_grad]: 8.095e-05 [renormalize]: 0.00252187 [add_forward_monad_depend]: 9.37999e-06 [auto_monad_grad]: 5.96e-06 [auto_monad_eliminator]: 5.602e-05 [cse]: 0.00016688 [a_3]: 0.00033485 [Cycle 2]: 0.00308796, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 4.675e-05 [loop_unroll]: 4.322e-05 [a_1]: 0.00151978 [with_stream_mark]: 1.215e-05 [recompute_prepare]: 1.124e-05 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 3.65e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012589 [accelerated_algorithm]: 1.226e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.96998e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.66999e-06 [parallel]: 6.06998e-06 [flash_sp]: 3.18e-06 [merge_comm]: 4.95999e-06 [allreduce_fusion]: 4.68001e-06 [matmul_add_comm_reduction]: 7.9e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.034e-05 [virtual_dataset]: 8.89e-06 [get_grad_eliminate_]: 8.85001e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.75001e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 9.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.64e-05 [merge_recompute_call_nodes]: 8.39995e-07 [before_grad]: 1.395e-05 [set_forward_comm_id_for_comm_node_pass]: 5.02e-06 [meta_fg_expand]: 8.152e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.05999e-06 [after_resolve]: 1.644e-05 [a_after_grad]: 1.457e-05 [renormalize]: 0.00067141 [add_forward_monad_depend]: 4.15e-06 [auto_monad_grad]: 1.32999e-06 [auto_monad_eliminator]: 1.484e-05 [cse]: 4.799e-05 [a_3]: 6.507e-05 [Cycle 3]: 0.00090334, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.089e-05 [loop_unroll]: 8.77e-06 [a_1]: 0.00025022 [with_stream_mark]: 9.84999e-06 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.73001e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 3.79002e-06 
[parameter_eliminate]: 9.5999e-07 [a_2]: 0.00012242 [accelerated_algorithm]: 1.166e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 2.05002e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 7.16001e-06 [parallel]: 4.95001e-06 [flash_sp]: 1.27e-06 [merge_comm]: 4.99998e-06 [allreduce_fusion]: 5.07999e-06 [matmul_add_comm_reduction]: 7.75e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.79e-06 [get_grad_eliminate_]: 8.40999e-06 [virtual_output]: 8.26002e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.80001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.618e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.362e-05 [set_forward_comm_id_for_comm_node_pass]: 4.99e-06 [meta_fg_expand]: 2.90998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.446e-05 [a_after_grad]: 1.465e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 1.04003e-06 [auto_monad_eliminator]: 1.053e-05 [cse]: 2.667e-05 [a_3]: 6.057e-05 [py_interpret_to_execute_after_opt_a]: 1.166e-05 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 4.707e-05 [convert_after_rewriter]: 8.92e-06 [order_py_execute_after_rewriter]: 6.83e-06 [mutable_eliminate]: 0.0005081 [opt_b]: 0.00028957, [1] [Cycle 1]: 0.00028302, [7] [b_1]: 0.00018956 [b_2]: 1.102e-05 [updatestate_depend_eliminate]: 7.4e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 5.39992e-07 [cse]: 3.218e-05 [optimize_parallel_all_gather_comm]: 2.116e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.104e-05 [loop_unroll]: 0.00043929 [opt_after_cconv]: 0.00013623, [1] [Cycle 1]: 0.00013026, [7] [c_1]: 4.831e-05 [parameter_eliminate]: 2.18002e-06 [updatestate_depend_eliminate]: 7.38e-06 [updatestate_assign_eliminate]: 4.08001e-06 
[updatestate_loads_eliminate]: 3.81999e-06 [cse]: 3.028e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 2.938e-05 [tuple_transform]: 0.00010173, [1] [Cycle 1]: 9.724e-05, [4] [d_1]: 6.744e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.71e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 6.195e-05 [cse_after_recomputation]: 3.383e-05, [1] [Cycle 1]: 2.859e-05, [1] [cse]: 2.283e-05 [environ_conv]: 8.92999e-06 [swap_dp_allreduce_reducescatter]: 7.9e-06 [bias_add_comm_swap]: 2.73998e-06 [label_micro_interleaved_index]: 3.91001e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.723e-05 [grouped_pairwise_exchange_alltoall]: 1.66998e-06 [offloading_packed_experts]: 4.77e-06 [overlap_recompute_and_grad_model_parallel]: 5.45001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 5.32999e-06 [overlap_grad_flash_sp]: 2.575e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.83997e-06 [handle_group_info]: 1.16997e-06 [symbol_engine_optimizer]: 9.859e-05, [1] [Cycle 1]: 9.417e-05, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.334e-05 [elim_not_effective]: 1.799e-05 [opt_reshape]: 1e-05 [fold_const_symbol]: 1.504e-05 [renormalize]: 1.70025e-07 
[detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 2.495e-05 [get_jit_bprop_graph]: 1.29998e-06 [rewriter_after_jit_bprop_graph]: 3.80998e-06 [opt_after_jit_grad]: 0.00051698 [validate]: 4.545e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00818792 [execute]: 7.61001e-06 Sums bootstrap : 0.000515s : 1.54% type_inference : 0.011418s : 34.11% event_method : 0.000049s : 0.15% auto_monad : 0.000121s : 0.36% graph_reusing : 0.000009s : 0.03% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000148s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000113s : 0.34% optimize.opt_a.a_1 : 0.003263s : 9.75% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000019s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001510s : 4.51% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003193s : 9.54% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000242s : 0.72% optimize.opt_a.a_3 : 0.000460s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000508s : 1.52% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000439s : 1.31% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000062s : 0.19% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000026s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000517s : 1.54% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008188s : 24.46% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000804 222 5.51% : 0.000044s : 12: substitution.arithmetic_simplify 1.75% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.22% : 0.000002s : 2: substitution.incorporate_call_switch 52.69% : 0.000424s : 17: substitution.inline 1.94% : 0.000016s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000015s : 3: substitution.less_batch_normalization 1.54% : 0.000012s : 11: substitution.minmaximum_grad 0.71% : 0.000006s : 5: substitution.partial_eliminate 1.79% : 0.000014s : 20: substitution.remove_not_recompute_node 2.92% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.41% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.24% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 13.43% : 0.000108s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011342 2 87.06% : 0.009874s : 1: type_inference.infer 12.94% : 0.001467s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.25% : 0.000125s : 17: replace.inline 42.75% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000490 33 84.53% : 0.000415s : 17: match.inline 15.47% : 0.000076s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 100: predicate.arithmetic_simplify 1.17% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.78% : 0.000021s : 168: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000037s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001599 34 55.85% : 0.000893s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.15% : 0.000706s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062782 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.81% : 0.003022s : 1: add_attr 4.80% : 0.003013s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.87% : 0.000549s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000037s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000056s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000449s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000518s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.85% : 0.004930s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.87% : 0.011219s : 1: opt_a 0.22% : 0.000140s : 1: opt_after_cconv 0.84% : 0.000527s : 1: opt_after_jit_grad 0.47% : 0.000293s : 1: opt_b 21.60% : 0.013564s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.77% : 0.001737s : 2: renormalize.infer 2.30% : 0.001443s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000152s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.06% : 0.008198s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.21% : 0.011434s : 1: type_inference 0.13% : 0.000079s : 1: validate TotalTime = 0.0186029, [24] [bootstrap]: 0.00047696 [type_inference]: 0.00429801 [event_method]: 1.082e-05 [auto_monad]: 4.978e-05 [graph_reusing]: 5.46998e-06 [inline]: 1.95001e-06 [add_attr]: 0.00300441, [1] [add_attr_with_inline]: 0.00299649, [1] [Cycle 1]: 4.534e-05, [2] [tag_attr]: 1.235e-05 [meta_addattr_fg_expand]: 2.91e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.227e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.99977e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.51002e-06 [optimize]: 0.00373056, [53] [py_interpret_to_execute]: 1.488e-05 [rewriter_before_opt_a]: 3.904e-05 [opt_a]: 0.00191711, [2] [Cycle 1]: 0.00131738, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 2.369e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.0002905 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.48999e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.22002e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.61002e-06 [a_2]: 7.69e-05 [accelerated_algorithm]: 6.25002e-06 [shard]: 2.11998e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 8.28001e-06 [auto_parallel]: 6.17999e-06 [parallel]: 1.737e-05 [flash_sp]: 7.95e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.85001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 5.79999e-06 [get_grad_eliminate_]: 5.81003e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 8.72e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 
2.35002e-06 [after_resolve]: 1.076e-05 [a_after_grad]: 9.12999e-06 [renormalize]: 0.0004041 [add_forward_monad_depend]: 4.97999e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.291e-05 [cse]: 2.799e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.00059008, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.70001e-06 [a_1]: 0.00012445 [with_stream_mark]: 8.96998e-06 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.709e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.28002e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.38001e-06 [flash_sp]: 2.83e-06 [merge_comm]: 2.98998e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.15001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 5.87001e-06 [virtual_dataset]: 5.07e-06 [get_grad_eliminate_]: 4.94998e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.66e-06 [merge_recompute_call_nodes]: 8.40024e-07 [before_grad]: 7.93001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.50001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.34998e-06 [a_after_grad]: 8.04002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.294e-05 [a_3]: 3.311e-05 [py_interpret_to_execute_after_opt_a]: 6.93998e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.03e-05 [convert_after_rewriter]: 6.62002e-06 [order_py_execute_after_rewriter]: 4.87e-06 [mutable_eliminate]: 0.00045859 [opt_b]: 0.00018146, [1] [Cycle 1]: 0.00017556, 
[7] [b_1]: 0.0001077 [b_2]: 6.94001e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 5.10016e-07 [cse]: 1.632e-05 [optimize_parallel_all_gather_comm]: 1.539e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.167e-05 [loop_unroll]: 0.00041688 [opt_after_cconv]: 9.474e-05, [1] [Cycle 1]: 8.902e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.58003e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.646e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.317e-05 [tuple_transform]: 6.848e-05, [1] [Cycle 1]: 6.427e-05, [4] [d_1]: 3.884e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 2.35002e-06 [add_recomputation]: 4.609e-05 [cse_after_recomputation]: 1.987e-05, [1] [Cycle 1]: 1.544e-05, [1] [cse]: 1.054e-05 [environ_conv]: 4.38999e-06 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.19999e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 8.10018e-07 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 9.40025e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.35e-06 [overlap_recompute_and_grad_model_parallel]: 4.47998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.744e-05, [1] [Cycle 1]: 6.296e-05, [6] [build]: 2.46998e-06 [elim_shapecalc]: 8.07e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.71002e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.00044935 [validate]: 3.24e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.00629276 [execute]: 6.56999e-06 Sums bootstrap : 0.000477s : 3.26% type_inference : 0.004298s : 29.34% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000404s : 2.76% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000459s : 3.13% optimize.opt_b.b_1 : 0.000108s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000417s : 2.85% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000046s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000016s : 0.11%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000449s : 3.07%
validate : 0.000032s : 0.22%
backend_pass : 0.000001s : 0.01%
task_emit : 0.006293s : 42.95%
execute : 0.000007s : 0.04%
Time group info:
------[substitution.] 0.000120 26
18.31% : 0.000022s : 4: substitution.arithmetic_simplify
1.48% : 0.000002s : 2: substitution.elim_not_effective
1.06% : 0.000001s : 2: substitution.fold_const_symbol
4.57% : 0.000005s : 4: substitution.graph_param_transform
65.15% : 0.000078s : 2: substitution.inline
2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.61% : 0.000004s : 4: substitution.remove_not_recompute_node
3.47% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004255 2
91.94% : 0.003912s : 1: type_inference.infer
8.06% : 0.000343s : 1: type_inference.specialize
------[replace.] 0.000019 2
100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000076 2
100.00% : 0.000076s : 2: match.inline
------[predicate.]
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 1.06% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.69% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 9: predicate.minmaximum_grad 1.47% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.86% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 41.01% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.99% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026660 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.29% : 0.003009s : 1: add_attr 11.25% : 0.003000s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000503s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000425s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000467s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.87% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.20% : 0.001920s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.72% : 0.000458s : 1: opt_after_jit_grad 0.69% : 0.000185s : 1: opt_b 14.01% : 0.003735s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.89% : 0.000238s : 1: renormalize.infer 0.60% : 0.000160s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.64% : 0.006304s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.17% : 0.004311s : 1: type_inference 0.22% : 0.000060s : 1: validate TotalTime = 0.0381849, [24] [bootstrap]: 0.00052775 [type_inference]: 0.0105938 [event_method]: 4.625e-05 [auto_monad]: 0.00012028 [graph_reusing]: 8.05e-06 [inline]: 2.54999e-06 [add_attr]: 0.00339906, [1] [add_attr_with_inline]: 0.00338945, [1] [Cycle 1]: 8.129e-05, [2] [tag_attr]: 3.479e-05 [meta_addattr_fg_expand]: 8.62e-06 [parallel-infer-symbol]: 3.34001e-06 [pre_auto_parallel]: 5.249e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.0142742, [53] [py_interpret_to_execute]: 3.834e-05 [rewriter_before_opt_a]: 0.00013637 [opt_a]: 0.0118266, [3] [Cycle 1]: 0.00781222, [45] [expand_dump_flag]: 4.94998e-06 [switch_simplify]: 6.74e-05 [loop_unroll]: 5.493e-05 [a_1]: 0.00140232 [with_stream_mark]: 2.781e-05 [recompute_prepare]: 2.251e-05 [updatestate_depend_eliminate]: 9.62001e-06 [updatestate_assign_eliminate]: 7.63001e-06 [updatestate_loads_eliminate]: 7.64002e-06 [parameter_eliminate]: 3.15002e-06 [a_2]: 0.00025314 [accelerated_algorithm]: 3.337e-05 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 5.29e-06 [shard_inline]: 1.623e-05 [merge_send_recv]: 1.824e-05 [auto_parallel]: 1.266e-05 [parallel]: 2.068e-05 [flash_sp]: 1.198e-05 [merge_comm]: 9.81e-06 [allreduce_fusion]: 8.97e-06 [matmul_add_comm_reduction]: 3.221e-05 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 1.833e-05 [virtual_dataset]: 1.567e-05 [get_grad_eliminate_]: 1.529e-05 [virtual_output]: 1.575e-05 [merge_forward]: 9.86e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.914e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.897e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.778e-05 [set_forward_comm_id_for_comm_node_pass]: 9.99001e-06 [meta_fg_expand]: 0.0017571 [flash_sp_send_recv_attached]: 4.42e-06 [receive_attached]: 3.07002e-06 [after_resolve]: 
6.63e-05 [a_after_grad]: 8.408e-05 [renormalize]: 0.00282109 [add_forward_monad_depend]: 9.20001e-06 [auto_monad_grad]: 6.19001e-06 [auto_monad_eliminator]: 5.703e-05 [cse]: 0.00017025 [a_3]: 0.00034035 [Cycle 2]: 0.00309298, [45] [expand_dump_flag]: 2.01e-06 [switch_simplify]: 4.685e-05 [loop_unroll]: 4.371e-05 [a_1]: 0.0015522 [with_stream_mark]: 1.42e-05 [recompute_prepare]: 1.144e-05 [updatestate_depend_eliminate]: 5.76998e-06 [updatestate_assign_eliminate]: 4.87e-06 [updatestate_loads_eliminate]: 4.12e-06 [parameter_eliminate]: 1.25001e-06 [a_2]: 0.00012748 [accelerated_algorithm]: 1.262e-05 [shard]: 1.01997e-06 [meta_shard_fg_expand]: 2.36e-06 [shard_inline]: 9.02e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 8.03001e-06 [parallel]: 5.94e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 5.59998e-06 [allreduce_fusion]: 5.00001e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 4.2998e-07 [virtual_shard_identity]: 1.048e-05 [virtual_dataset]: 8.76997e-06 [get_grad_eliminate_]: 9.07999e-06 [virtual_output]: 8.25999e-06 [merge_forward]: 4.50001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 1.06e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.566e-05 [merge_recompute_call_nodes]: 8.30012e-07 [before_grad]: 1.404e-05 [set_forward_comm_id_for_comm_node_pass]: 5.65001e-06 [meta_fg_expand]: 4.587e-05 [flash_sp_send_recv_attached]: 1.10001e-06 [receive_attached]: 1.28002e-06 [after_resolve]: 1.483e-05 [a_after_grad]: 1.487e-05 [renormalize]: 0.00064256 [add_forward_monad_depend]: 4.98001e-06 [auto_monad_grad]: 1.50001e-06 [auto_monad_eliminator]: 1.527e-05 [cse]: 5.277e-05 [a_3]: 6.708e-05 [Cycle 3]: 0.00090556, [45] [expand_dump_flag]: 1.09998e-06 [switch_simplify]: 1.043e-05 [loop_unroll]: 9.00999e-06 [a_1]: 0.00025396 [with_stream_mark]: 1.083e-05 [recompute_prepare]: 9.51e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 4.12998e-06 
[parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012272 [accelerated_algorithm]: 1.165e-05 [shard]: 1.40999e-06 [meta_shard_fg_expand]: 1.93997e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 7.16999e-06 [auto_parallel]: 7.31999e-06 [parallel]: 4.81002e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 4.79002e-06 [allreduce_fusion]: 4.97e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.99001e-06 [virtual_dataset]: 8.53001e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.12003e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.615e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.4e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 3.02002e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.364e-05 [a_after_grad]: 1.42e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 1.08e-05 [cse]: 2.623e-05 [a_3]: 5.717e-05 [py_interpret_to_execute_after_opt_a]: 1.336e-05 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 5.093e-05 [convert_after_rewriter]: 9.09998e-06 [order_py_execute_after_rewriter]: 6.69999e-06 [mutable_eliminate]: 0.00058402 [opt_b]: 0.00029282, [1] [Cycle 1]: 0.00028619, [7] [b_1]: 0.00019131 [b_2]: 1.12e-05 [updatestate_depend_eliminate]: 7.48e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.94002e-06 [renormalize]: 3.19997e-07 [cse]: 3.197e-05 [optimize_parallel_all_gather_comm]: 2.138e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.255e-05 [loop_unroll]: 0.00045858 [opt_after_cconv]: 0.00013812, [1] [Cycle 1]: 0.00013185, [7] [c_1]: 4.881e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 6.94001e-06 [updatestate_assign_eliminate]: 4.52e-06 
[updatestate_loads_eliminate]: 3.96001e-06 [cse]: 3.095e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 3.158e-05 [tuple_transform]: 0.00010276, [1] [Cycle 1]: 9.803e-05, [4] [d_1]: 6.716e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 1.018e-05 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.843e-05 [cse_after_recomputation]: 3.335e-05, [1] [Cycle 1]: 2.836e-05, [1] [cse]: 2.266e-05 [environ_conv]: 8.80999e-06 [swap_dp_allreduce_reducescatter]: 8.18001e-06 [bias_add_comm_swap]: 2.68998e-06 [label_micro_interleaved_index]: 4.67e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.00002e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 9.90025e-07 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.40001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.752e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.92999e-06 [overlap_recompute_and_grad_model_parallel]: 5.39e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 5.12e-06 [overlap_grad_flash_sp]: 2.595e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 0.00010063, [1] [Cycle 1]: 9.627e-05, [6] [build]: 9.58002e-06 [elim_shapecalc]: 1.313e-05 [elim_not_effective]: 1.829e-05 [opt_reshape]: 1.087e-05 [fold_const_symbol]: 1.519e-05 
[renormalize]: 2.60014e-07 [detach_backward]: 2.13998e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.482e-05 [get_jit_bprop_graph]: 1.69e-06 [rewriter_after_jit_bprop_graph]: 3.73001e-06 [opt_after_jit_grad]: 0.00050502 [validate]: 7.774e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.0083027 [execute]: 7.11999e-06 Sums bootstrap : 0.000528s : 1.58% type_inference : 0.010594s : 31.66% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000052s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000136s : 0.41% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.32% optimize.opt_a.a_1 : 0.003208s : 9.59% optimize.opt_a.with_stream_mark : 0.000053s : 0.16% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000503s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.17% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000010s : 0.03% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000034s : 0.10% optimize.opt_a.auto_parallel : 0.000028s : 0.08% optimize.opt_a.parallel : 0.000031s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000049s : 0.15% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000039s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001806s : 5.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000095s : 0.28% optimize.opt_a.a_after_grad : 0.000113s : 0.34% optimize.opt_a.renormalize : 0.003464s : 10.35% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.25% optimize.opt_a.cse : 0.000249s : 0.74% optimize.opt_a.a_3 : 0.000465s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000584s : 1.75% optimize.opt_b.b_1 : 0.000191s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000459s : 1.37% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000032s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.17% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000026s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000505s : 1.51% validate : 0.000078s : 0.23% backend_pass : 0.000001s : 0.00% task_emit : 0.008303s : 24.81% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000812 218 6.52% : 0.000053s : 11: substitution.arithmetic_simplify 2.10% : 0.000017s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.83% : 0.000453s : 16: substitution.inline 2.20% : 0.000018s : 2: substitution.inline_without_move 1.29% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000016s : 3: substitution.less_batch_normalization 1.72% : 0.000014s : 11: substitution.minmaximum_grad 0.80% : 0.000006s : 5: substitution.partial_eliminate 1.63% : 0.000013s : 20: substitution.remove_not_recompute_node 3.06% : 0.000025s : 10: substitution.replace_applicator 1.35% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.56% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.71% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.00% : 0.000065s : 28: substitution.tuple_list_get_item_eliminator 2.33% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010514 2 86.66% : 0.009111s : 1: type_inference.infer 13.34% : 0.001403s : 1: type_inference.specialize ------[replace.] 0.000212 30 59.91% : 0.000127s : 16: replace.inline 40.09% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000476 30 93.32% : 0.000444s : 16: match.inline 6.68% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5663 1.13% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 99: predicate.arithmetic_simplify 1.20% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.17% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.49% : 0.000004s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.55% : 0.000004s : 32: predicate.float_environ_get_switch 0.72% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.48% : 0.000041s : 244: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.15% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 67: predicate.minmaximum_grad 0.38% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.15% : 0.000009s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 97: predicate.switch_defer_inline 2.91% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.78% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.75% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.67% : 0.000013s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.61% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001571 32 57.78% : 0.000908s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.22% : 0.000663s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064524 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.28% : 0.003404s : 1: add_attr 5.26% : 0.003394s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000127s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.86% : 0.000555s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000053s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000468s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.92% : 0.000594s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.56% : 0.004879s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.33% : 0.011830s : 1: opt_a 0.22% : 0.000142s : 1: opt_after_cconv 0.80% : 0.000516s : 1: opt_after_jit_grad 0.46% : 0.000296s : 1: opt_b 22.13% : 0.014278s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000058s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.03% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000036s : 1: remove_dup_value 2.91% : 0.001880s : 2: renormalize.infer 2.43% : 0.001570s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000055s : 1: rewriter_after_opt_a 0.22% : 0.000141s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000103s : 1: symbol_engine_optimizer 12.89% : 0.008314s : 1: task_emit 0.16% : 0.000106s : 1: 
tuple_transform 16.45% : 0.010614s : 1: type_inference 0.18% : 0.000117s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x1-kbk],max_mem:54.0M TotalTime = 0.0919353, [24] [bootstrap]: 0.0005705 [type_inference]: 0.00633397 [event_method]: 1.492e-05 [auto_monad]: 5.867e-05 [graph_reusing]: 5.50001e-06 [inline]: 2.29999e-06 [add_attr]: 0.0038938, [1] [add_attr_with_inline]: 0.00388085, [1] [Cycle 1]: 5.858e-05, [2] [tag_attr]: 1.849e-05 [meta_addattr_fg_expand]: 4.18001e-06 [parallel-infer-symbol]: 3.02002e-06 [pre_auto_parallel]: 3.174e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00471661, [53] [py_interpret_to_execute]: 2.387e-05 [rewriter_before_opt_a]: 6.506e-05 [opt_a]: 0.00241423, [2] [Cycle 1]: 0.00178551, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 3.333e-05 [loop_unroll]: 2.088e-05 [a_1]: 0.00048474 [with_stream_mark]: 1.624e-05 [recompute_prepare]: 8.65001e-06 [updatestate_depend_eliminate]: 4.05e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.62e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 2.02999e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.09002e-06 [auto_parallel]: 6.63e-06 [parallel]: 2.717e-05 [flash_sp]: 7.93001e-06 [merge_comm]: 3.78001e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 1.011e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.72002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.018e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.11e-05 
[merge_recompute_call_nodes]: 1.54e-06 [before_grad]: 9.70002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.86e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.95002e-06 [after_resolve]: 9.84001e-06 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00062779 [add_forward_monad_depend]: 5.37001e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.486e-05 [cse]: 2.821e-05 [a_3]: 4.333e-05 [Cycle 2]: 0.00061817, [45] [expand_dump_flag]: 1.22e-06 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00013017 [with_stream_mark]: 1.169e-05 [recompute_prepare]: 5.92001e-06 [updatestate_depend_eliminate]: 3.01999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.90998e-06 [parameter_eliminate]: 1.09e-06 [a_2]: 6.848e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.39e-06 [meta_shard_fg_expand]: 1.38002e-06 [shard_inline]: 5.86998e-06 [merge_send_recv]: 5.07e-06 [auto_parallel]: 6.38e-06 [parallel]: 5.60001e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.24001e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 6.01998e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.71e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 2.84001e-06 [cell_reuse_recompute_pass]: 2.01e-06 [offload_activation]: 7.73001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.039e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 8.47e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.27999e-06 [after_resolve]: 9.92999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.26002e-06 [auto_monad_grad]: 8.90024e-07 [auto_monad_eliminator]: 7.29001e-06 [cse]: 1.528e-05 [a_3]: 3.217e-05 [py_interpret_to_execute_after_opt_a]: 1.027e-05 [slice_cell_reuse_recomputed_activation]: 
2.21998e-06 [rewriter_after_opt_a]: 3.669e-05 [convert_after_rewriter]: 6.66e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00066455 [opt_b]: 0.00021622, [1] [Cycle 1]: 0.00020867, [7] [b_1]: 0.00012747 [b_2]: 7.84002e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.84999e-06 [renormalize]: 7.49977e-07 [cse]: 2.328e-05 [optimize_parallel_all_gather_comm]: 1.963e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 3.05e-05 [loop_unroll]: 0.000534 [opt_after_cconv]: 0.00010811, [1] [Cycle 1]: 0.00010124, [7] [c_1]: 2.978e-05 [parameter_eliminate]: 3.71999e-06 [updatestate_depend_eliminate]: 6.41e-06 [updatestate_assign_eliminate]: 2.72001e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 2.171e-05 [renormalize]: 5.59987e-07 [remove_dup_value]: 1.383e-05 [tuple_transform]: 7.499e-05, [1] [Cycle 1]: 6.98e-05, [4] [d_1]: 4.282e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.51e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 5.89e-05 [cse_after_recomputation]: 2.088e-05, [1] [Cycle 1]: 1.628e-05, [1] [cse]: 1.081e-05 [environ_conv]: 6.09999e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 3.19001e-06 [label_micro_interleaved_index]: 5.66e-06 [label_fine_grained_interleaved_index]: 2.88998e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.46002e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.213e-05 
[grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.70999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 1.99999e-06 [overlap_grad_ring_attention]: 3.92998e-06 [overlap_grad_flash_sp]: 1.981e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 7.359e-05, [1] [Cycle 1]: 6.922e-05, [6] [build]: 3.71001e-06 [elim_shapecalc]: 9.52999e-06 [elim_not_effective]: 1.255e-05 [opt_reshape]: 6.64001e-06 [fold_const_symbol]: 9.09998e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.36998e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.619e-05 [get_jit_bprop_graph]: 1.96e-06 [rewriter_after_jit_bprop_graph]: 4.30999e-06 [opt_after_jit_grad]: 0.00058147 [validate]: 3.921e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.0754111 [execute]: 9.17001e-06 Sums bootstrap : 0.000571s : 0.66% type_inference : 0.006334s : 7.28% event_method : 0.000015s : 0.02% auto_monad : 0.000059s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000032s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.03% optimize.rewriter_before_opt_a : 0.000065s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000615s : 0.71% optimize.opt_a.with_stream_mark : 0.000028s : 0.03% 
optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.17% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.01% optimize.opt_a.parallel : 0.000033s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000018s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000628s : 0.72% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.03% optimize.opt_a.cse : 0.000043s : 0.05% optimize.opt_a.a_3 : 0.000076s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000665s : 0.76% optimize.opt_b.b_1 : 0.000127s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.04% optimize.loop_unroll : 0.000534s : 0.61% optimize.opt_after_cconv.c_1 : 0.000030s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000022s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000043s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000006s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000006s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000020s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000004s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000581s : 0.67% validate : 0.000039s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.075411s : 86.66% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000192 30 13.96% : 0.000027s : 5: substitution.arithmetic_simplify 0.96% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000006s : 4: substitution.graph_param_transform 67.92% : 0.000130s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.41% : 0.000005s : 4: substitution.remove_not_recompute_node 2.52% : 0.000005s : 4: substitution.replace_old_param 6.62% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006283 2 90.72% : 0.005700s : 1: type_inference.infer 9.28% : 0.000583s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.90% : 0.000029s : 3: replace.inline 29.10% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000140 5 91.75% : 0.000128s : 3: match.inline 8.25% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000168 1131 0.90% : 0.000002s : 11: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000004s : 19: predicate.arithmetic_simplify 1.00% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.53% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.21% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.76% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.75% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.89% : 0.000003s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.72% : 0.000003s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.36% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.30% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.49% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000002s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.33% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.75% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.78% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.80% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.23% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000384 8 43.65% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.35% : 0.000216s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.102334 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.81% : 0.003899s : 1: add_attr 3.80% : 0.003885s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000064s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.58% : 0.000598s : 1: bootstrap 0.03% : 0.000034s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000021s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.53% : 0.000546s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.66% : 0.000677s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000017s : 1: opt.transform.mutable_eliminate 0.96% : 0.000987s : 78: opt.transform.opt_a 0.03% : 0.000029s : 1: opt.transform.opt_after_cconv 0.03% : 0.000026s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000106s : 28: opt.transform.opt_b 0.05% : 0.000047s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000035s : 4: opt.transform.symbol_engine_opt 2.36% : 0.002418s : 1: opt_a 0.11% : 0.000112s : 1: opt_after_cconv 0.58% : 0.000593s : 1: opt_after_jit_grad 0.21% : 0.000220s : 1: opt_b 4.61% : 0.004721s : 1: optimize 0.02% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000023s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000037s : 1: pre_auto_parallel 0.03% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000018s : 1: remove_dup_value 0.33% : 0.000340s : 1: renormalize.infer 0.27% : 0.000279s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000041s : 1: rewriter_after_opt_a 0.07% : 0.000069s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000076s : 1: symbol_engine_optimizer 73.71% : 0.075433s : 1: task_emit 0.08% : 0.000078s : 1: 
tuple_transform 6.21% : 0.006350s : 1: type_inference 0.07% : 0.000077s : 1: validate TotalTime = 0.0763586, [24] [bootstrap]: 0.0004892 [type_inference]: 0.00440245 [event_method]: 1.062e-05 [auto_monad]: 5.099e-05 [graph_reusing]: 5.26998e-06 [inline]: 1.92999e-06 [add_attr]: 0.00298226, [1] [add_attr_with_inline]: 0.00297457, [1] [Cycle 1]: 4.601e-05, [2] [tag_attr]: 1.215e-05 [meta_addattr_fg_expand]: 3.29001e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.107e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00372587, [53] [py_interpret_to_execute]: 1.461e-05 [rewriter_before_opt_a]: 3.962e-05 [opt_a]: 0.00188989, [2] [Cycle 1]: 0.00129019, [45] [expand_dump_flag]: 2.53003e-06 [switch_simplify]: 2.346e-05 [loop_unroll]: 1.376e-05 [a_1]: 0.00029234 [with_stream_mark]: 1.399e-05 [recompute_prepare]: 7.20998e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 2.93998e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.759e-05 [accelerated_algorithm]: 6.14001e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.13001e-06 [auto_parallel]: 5.60001e-06 [parallel]: 1.662e-05 [flash_sp]: 7.61999e-06 [merge_comm]: 3.52002e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.36998e-06 [allreduce_slice_to_reducescatter]: 7.80012e-07 [virtual_shard_identity]: 6.81999e-06 [virtual_dataset]: 6.14999e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.07999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.05002e-06 [flash_sp_send_recv_attached]: 2.86999e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.011e-05 [a_after_grad]: 8.53001e-06 [renormalize]: 0.00038174 [add_forward_monad_depend]: 4.05e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.391e-05 [cse]: 2.552e-05 [a_3]: 4.006e-05 [Cycle 2]: 0.00059041, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012363 [with_stream_mark]: 9.20999e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.25002e-06 [updatestate_loads_eliminate]: 2.65997e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.834e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.63001e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.15e-06 [flash_sp]: 3.08e-06 [merge_comm]: 3.10998e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.32999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 4.85001e-06 [merge_forward]: 2.36e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.82001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46003e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.52001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.24998e-06 [after_resolve]: 8.95999e-06 [a_after_grad]: 8.01001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.271e-05 [a_3]: 3.134e-05 [py_interpret_to_execute_after_opt_a]: 7.09001e-06 [slice_cell_reuse_recomputed_activation]: 2.08002e-06 [rewriter_after_opt_a]: 3.041e-05 [convert_after_rewriter]: 6.94999e-06 [order_py_execute_after_rewriter]: 5.69e-06 [mutable_eliminate]: 0.00045652 [opt_b]: 0.00017787, [1] [Cycle 1]: 
0.00017193, [7] [b_1]: 0.00010635 [b_2]: 6.72002e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 5.59987e-07 [cse]: 1.517e-05 [optimize_parallel_all_gather_comm]: 1.581e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.269e-05 [loop_unroll]: 0.00041256 [opt_after_cconv]: 9.471e-05, [1] [Cycle 1]: 8.901e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.04998e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.607e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.188e-05 [tuple_transform]: 6.771e-05, [1] [Cycle 1]: 6.35e-05, [4] [d_1]: 3.841e-05 [none_parameter_eliminate]: 1.39e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.178e-05 [cse_after_recomputation]: 2.031e-05, [1] [Cycle 1]: 1.584e-05, [1] [cse]: 1.058e-05 [environ_conv]: 3.729e-05 [swap_dp_allreduce_reducescatter]: 5.72999e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.77002e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 1.97999e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.61002e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 9.90025e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.11997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67001e-06 [control_data_broadcast_order]: 1.189e-05 [grouped_pairwise_exchange_alltoall]: 1.44998e-06 [offloading_packed_experts]: 3.77998e-06 [overlap_recompute_and_grad_model_parallel]: 4.44002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.10001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.622e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 1.87999e-06 [split_layernorm_comm]: 1.61998e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 6.872e-05, [1] [Cycle 1]: 6.462e-05, [6] [build]: 2.44001e-06 [elim_shapecalc]: 8.49998e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 9.05999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.519e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00044699 [validate]: 3.029e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0639469 [execute]: 9.05999e-06 Sums bootstrap : 0.000489s : 0.68% type_inference : 0.004402s : 6.08% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.57% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000382s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000038s : 0.05% 
optimize.opt_a.a_3 : 0.000071s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.63% optimize.opt_b.b_1 : 0.000106s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000015s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000413s : 0.57% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000037s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% 
optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000447s : 0.62% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063947s : 88.31% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.39% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.18% : 0.000001s : 2: substitution.fold_const_symbol 4.17% : 0.000005s : 4: substitution.graph_param_transform 65.75% : 0.000080s : 2: substitution.inline 2.19% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000004s : 4: substitution.remove_not_recompute_node 3.12% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004361 2 91.65% : 0.003997s : 1: type_inference.infer 8.35% : 0.000364s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 1.05% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.62% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.97% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000262 6 43.08% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.92% : 0.000149s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084365 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.54% : 0.002987s : 1: add_attr 3.53% : 0.002978s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.61% : 0.000518s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.05% : 0.000041s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.91% : 0.000765s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.24% : 0.001893s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.54% : 0.000457s : 1: opt_after_jit_grad 0.21% : 0.000181s : 1: opt_b 4.42% : 0.003730s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.26% : 0.000216s : 1: renormalize.infer 0.19% : 0.000159s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 75.82% : 0.063968s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 5.24% : 0.004417s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.0801753, [24] [bootstrap]: 0.00048268 [type_inference]: 0.00558679 [event_method]: 1.442e-05 [auto_monad]: 5.402e-05 [graph_reusing]: 5.39998e-06 [inline]: 2.48002e-06 [add_attr]: 0.00302752, [1] [add_attr_with_inline]: 0.00301901, [1] [Cycle 1]: 4.571e-05, [2] [tag_attr]: 1.648e-05 [meta_addattr_fg_expand]: 3.68999e-06 [parallel-infer-symbol]: 2.73003e-06 [pre_auto_parallel]: 2.504e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00401075, [53] [py_interpret_to_execute]: 2.075e-05 [rewriter_before_opt_a]: 5.991e-05 [opt_a]: 0.0021282, [2] [Cycle 1]: 0.00152283, [45] [expand_dump_flag]: 3.11001e-06 [switch_simplify]: 3.184e-05 [loop_unroll]: 2.084e-05 [a_1]: 0.00044432 [with_stream_mark]: 1.28e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.87998e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.546e-05 [accelerated_algorithm]: 6.44999e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 7.58999e-06 [auto_parallel]: 6.71e-06 [parallel]: 1.836e-05 [flash_sp]: 7.08998e-06 [merge_comm]: 3.55998e-06 [allreduce_fusion]: 3.11999e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 6.89999e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 3.77002e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.56002e-06 [before_grad]: 9.32999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.19999e-06 [flash_sp_send_recv_attached]: 2.37001e-06 
[receive_attached]: 2.21e-06 [after_resolve]: 1.095e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00044087 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.65001e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.58e-05 [a_3]: 4.049e-05 [Cycle 2]: 0.00059612, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.71e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012493 [with_stream_mark]: 1.01e-05 [recompute_prepare]: 5.90002e-06 [updatestate_depend_eliminate]: 2.88998e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.35002e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.724e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.33002e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.26998e-06 [parallel]: 3.97e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.84999e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 7e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 7.34002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.25999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.48e-06 [set_forward_comm_id_for_comm_node_pass]: 3.17002e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.91002e-06 [a_after_grad]: 8e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.42999e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.60002e-06 [cse]: 1.377e-05 [a_3]: 3.391e-05 [py_interpret_to_execute_after_opt_a]: 7.86001e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.132e-05 [convert_after_rewriter]: 6.47001e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00045381 [opt_b]: 0.00018104, [1] [Cycle 1]: 
0.00017523, [7] [b_1]: 0.00010678 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 4.2998e-07 [cse]: 1.734e-05 [optimize_parallel_all_gather_comm]: 1.56e-05 [overlap_param_gather]: 2.32999e-06 [cconv]: 2.277e-05 [loop_unroll]: 0.00041752 [opt_after_cconv]: 9.656e-05, [1] [Cycle 1]: 9.063e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.68e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.282e-05 [tuple_transform]: 7.002e-05, [1] [Cycle 1]: 6.501e-05, [4] [d_1]: 3.928e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.08002e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.264e-05 [cse_after_recomputation]: 2.037e-05, [1] [Cycle 1]: 1.596e-05, [1] [cse]: 1.076e-05 [environ_conv]: 3.843e-05 [swap_dp_allreduce_reducescatter]: 5.51002e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.52998e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.18001e-06 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.45999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.184e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.06001e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 2.40002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.07e-05, [1] [Cycle 1]: 6.647e-05, [6] [build]: 2.64999e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.246e-05 [opt_reshape]: 6.61e-06 [fold_const_symbol]: 9.22999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 2.13998e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.593e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00045295 [validate]: 3.183e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.066231 [execute]: 8.42e-06 Sums bootstrap : 0.000483s : 0.63% type_inference : 0.005587s : 7.33% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000569s : 0.75% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000441s : 0.58% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.60% optimize.opt_b.b_1 : 0.000107s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000418s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000038s : 0.05% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.59% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066231s : 86.94% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.27% : 0.000023s : 5: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000005s : 4: substitution.graph_param_transform 66.44% : 0.000109s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000004s : 4: substitution.remove_not_recompute_node 2.81% : 0.000005s : 4: substitution.replace_old_param 6.61% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005545 2 90.08% : 0.004995s : 1: type_inference.infer 9.92% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.21% : 0.000026s : 3: replace.inline 30.79% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.59% : 0.000107s : 3: match.inline 8.41% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.44% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 1.10% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.12% : 0.000010s : 51: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.46% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.50% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.46% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.18% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.73% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000349 8 46.68% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.32% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088743 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.42% : 0.003032s : 1: add_attr 3.41% : 0.003023s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.58% : 0.000518s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.05% : 0.000042s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 1.05% : 0.000935s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000088s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.40% : 0.002131s : 1: opt_a 0.11% : 0.000100s : 1: opt_after_cconv 0.52% : 0.000462s : 1: opt_after_jit_grad 0.21% : 0.000185s : 1: opt_b 4.52% : 0.004015s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.25% : 0.000223s : 1: renormalize.infer 0.24% : 0.000212s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000073s : 1: symbol_engine_optimizer 74.66% : 0.066253s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 6.31% : 0.005601s : 1: type_inference 0.06% : 0.000054s : 1: validate TotalTime = 0.120587, [24] [bootstrap]: 0.00046224 [type_inference]: 0.0112775 [event_method]: 4.816e-05 [auto_monad]: 0.00011841 [graph_reusing]: 8.18001e-06 [inline]: 1.88997e-06 [add_attr]: 0.00297404, [1] [add_attr_with_inline]: 0.00296566, [1] [Cycle 1]: 6.904e-05, [2] [tag_attr]: 3.397e-05 [meta_addattr_fg_expand]: 9.32999e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 4.938e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0133791, [53] [py_interpret_to_execute]: 3.859e-05 [rewriter_before_opt_a]: 0.00014579 [opt_a]: 0.0110854, [3] [Cycle 1]: 0.00711271, [45] [expand_dump_flag]: 3.5e-06 [switch_simplify]: 7.406e-05 [loop_unroll]: 6.176e-05 [a_1]: 0.00145569 [with_stream_mark]: 2.28e-05 [recompute_prepare]: 2.189e-05 [updatestate_depend_eliminate]: 9.09e-06 [updatestate_assign_eliminate]: 7.7e-06 [updatestate_loads_eliminate]: 7.4e-06 [parameter_eliminate]: 2.71e-06 [a_2]: 0.00024609 [accelerated_algorithm]: 3.05e-05 [shard]: 1.75001e-06 [meta_shard_fg_expand]: 3.38e-06 [shard_inline]: 1.587e-05 [merge_send_recv]: 1.489e-05 [auto_parallel]: 1.088e-05 [parallel]: 1.784e-05 [flash_sp]: 1.223e-05 [merge_comm]: 9.89001e-06 [allreduce_fusion]: 8.91002e-06 [matmul_add_comm_reduction]: 2.616e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 1.83e-05 [virtual_dataset]: 1.593e-05 [get_grad_eliminate_]: 1.564e-05 [virtual_output]: 1.51e-05 [merge_forward]: 9.76e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.792e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.893e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.746e-05 [set_forward_comm_id_for_comm_node_pass]: 9.51e-06 [meta_fg_expand]: 0.00140238 [flash_sp_send_recv_attached]: 3.64002e-06 [receive_attached]: 2.68003e-06 [after_resolve]: 5.916e-05 
[a_after_grad]: 8.198e-05 [renormalize]: 0.0024884 [add_forward_monad_depend]: 9.39e-06 [auto_monad_grad]: 5.25999e-06 [auto_monad_eliminator]: 5.633e-05 [cse]: 0.00016638 [a_3]: 0.0003346 [Cycle 2]: 0.00304192, [45] [expand_dump_flag]: 1.45001e-06 [switch_simplify]: 4.777e-05 [loop_unroll]: 4.377e-05 [a_1]: 0.00153345 [with_stream_mark]: 1.272e-05 [recompute_prepare]: 1.165e-05 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 4.43999e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012737 [accelerated_algorithm]: 1.228e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.87001e-06 [shard_inline]: 9.30001e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.56001e-06 [parallel]: 5.19e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 5.13002e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.80998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.013e-05 [virtual_dataset]: 9.56e-06 [get_grad_eliminate_]: 9.42999e-06 [virtual_output]: 8.78001e-06 [merge_forward]: 4.82e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.91002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.655e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 1.438e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 7.168e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.657e-05 [a_after_grad]: 1.485e-05 [renormalize]: 0.00061615 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.25999e-06 [auto_monad_eliminator]: 1.474e-05 [cse]: 4.67e-05 [a_3]: 6.673e-05 [Cycle 3]: 0.00091585, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 1.098e-05 [loop_unroll]: 8.98002e-06 [a_1]: 0.00025707 [with_stream_mark]: 9.56998e-06 [recompute_prepare]: 9.47001e-06 [updatestate_depend_eliminate]: 4.86002e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.83001e-06 
[parameter_eliminate]: 8.79983e-07 [a_2]: 0.000124 [accelerated_algorithm]: 1.14e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 1.037e-05 [merge_send_recv]: 7.87998e-06 [auto_parallel]: 7.75998e-06 [parallel]: 4.89e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 5.04e-06 [allreduce_fusion]: 4.81002e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.039e-05 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.61002e-06 [virtual_output]: 8.21002e-06 [merge_forward]: 4.06001e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 8.55001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.556e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.487e-05 [set_forward_comm_id_for_comm_node_pass]: 5.66e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.48e-05 [a_after_grad]: 1.426e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.33002e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 1.039e-05 [cse]: 2.674e-05 [a_3]: 5.965e-05 [py_interpret_to_execute_after_opt_a]: 1.146e-05 [slice_cell_reuse_recomputed_activation]: 2.27999e-06 [rewriter_after_opt_a]: 4.865e-05 [convert_after_rewriter]: 9.46e-06 [order_py_execute_after_rewriter]: 6.88998e-06 [mutable_eliminate]: 0.00046726 [opt_b]: 0.00029143, [1] [Cycle 1]: 0.00028518, [7] [b_1]: 0.00018925 [b_2]: 1.076e-05 [updatestate_depend_eliminate]: 7.08e-06 [updatestate_assign_eliminate]: 4.07998e-06 [updatestate_loads_eliminate]: 4.17e-06 [renormalize]: 4.59986e-07 [cse]: 3.475e-05 [optimize_parallel_all_gather_comm]: 2.168e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.065e-05 [loop_unroll]: 0.00042479 [opt_after_cconv]: 0.00014223, [1] [Cycle 1]: 0.00013615, [7] [c_1]: 4.841e-05 [parameter_eliminate]: 2.71999e-06 [updatestate_depend_eliminate]: 6.99001e-06 [updatestate_assign_eliminate]: 4.25999e-06 
[updatestate_loads_eliminate]: 4.17e-06 [cse]: 3.438e-05 [renormalize]: 5.89993e-07 [remove_dup_value]: 2.928e-05 [tuple_transform]: 0.00010357, [1] [Cycle 1]: 9.888e-05, [4] [d_1]: 6.819e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.89001e-06 [partial_unused_args_eliminate]: 2.06998e-06 [add_recomputation]: 5.83e-05 [cse_after_recomputation]: 3.215e-05, [1] [Cycle 1]: 2.762e-05, [1] [cse]: 2.247e-05 [environ_conv]: 8.89998e-06 [swap_dp_allreduce_reducescatter]: 7.96001e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.50001e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.29003e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 7.59988e-07 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 1.15001e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.697e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.90001e-06 [overlap_recompute_and_grad_model_parallel]: 5.59e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.58997e-06 [overlap_grad_flash_sp]: 2.367e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.71998e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 0.00010084, [1] [Cycle 1]: 9.613e-05, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.396e-05 [elim_not_effective]: 1.825e-05 [opt_reshape]: 1.001e-05 [fold_const_symbol]: 1.548e-05 
[renormalize]: 2.09984e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 2.514e-05 [get_jit_bprop_graph]: 1.20999e-06 [rewriter_after_jit_bprop_graph]: 3.70998e-06 [opt_after_jit_grad]: 0.00050316 [validate]: 4.637e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0914629 [execute]: 9.02999e-06 Sums bootstrap : 0.000462s : 0.40% type_inference : 0.011278s : 9.69% event_method : 0.000048s : 0.04% auto_monad : 0.000118s : 0.10% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000146s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000133s : 0.11% optimize.opt_a.loop_unroll : 0.000115s : 0.10% optimize.opt_a.a_1 : 0.003246s : 2.79% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000497s : 0.43% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000036s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000017s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001477s : 1.27% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.003105s : 2.67% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.07% optimize.opt_a.cse : 0.000240s : 0.21% optimize.opt_a.a_3 : 0.000461s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000049s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.40% optimize.opt_b.b_1 : 0.000189s : 0.16% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000035s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000425s : 0.37% optimize.opt_after_cconv.c_1 : 0.000048s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000034s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000503s : 0.43% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.091463s : 78.60% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000756 222 6.02% : 0.000046s : 12: substitution.arithmetic_simplify 1.76% : 0.000013s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.35% : 0.000003s : 5: substitution.fold_const_symbol 1.01% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 55.46% : 0.000420s : 17: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.17% : 0.000024s : 10: substitution.replace_applicator 1.43% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.63% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011205 2 86.52% : 0.009694s : 1: type_inference.infer 13.48% : 0.001511s : 1: type_inference.specialize ------[replace.] 0.000217 33 56.88% : 0.000123s : 17: replace.inline 43.12% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.38% : 0.000411s : 17: match.inline 7.62% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000760 5764 1.12% : 0.000009s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.14% : 0.000009s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000016s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.61% : 0.000005s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000009s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.72% : 0.000021s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000015s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.23% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001602 34 56.31% : 0.000902s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.69% : 0.000700s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.145295 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.05% : 0.002978s : 1: add_attr 2.04% : 0.002969s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000125s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.33% : 0.000484s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000055s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.39% : 0.004923s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 7.63% : 0.011089s : 1: opt_a 0.10% : 0.000146s : 1: opt_after_cconv 0.35% : 0.000512s : 1: opt_after_jit_grad 0.20% : 0.000295s : 1: opt_b 9.21% : 0.013383s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.13% : 0.001644s : 2: renormalize.infer 1.00% : 0.001448s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000053s : 1: rewriter_after_opt_a 0.10% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000104s : 1: symbol_engine_optimizer 62.97% : 0.091486s : 1: task_emit 0.07% : 0.000107s : 1: 
tuple_transform 7.77% : 0.011292s : 1: type_inference 0.05% : 0.000072s : 1: validate TotalTime = 0.077446, [24] [bootstrap]: 0.00049236 [type_inference]: 0.00427692 [event_method]: 1.092e-05 [auto_monad]: 5.018e-05 [graph_reusing]: 5.20001e-06 [inline]: 1.84998e-06 [add_attr]: 0.0029685, [1] [add_attr_with_inline]: 0.00296023, [1] [Cycle 1]: 4.386e-05, [2] [tag_attr]: 1.115e-05 [meta_addattr_fg_expand]: 3.45998e-06 [parallel-infer-symbol]: 2.83998e-06 [pre_auto_parallel]: 2.127e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.00369504, [53] [py_interpret_to_execute]: 1.529e-05 [rewriter_before_opt_a]: 3.867e-05 [opt_a]: 0.00185715, [2] [Cycle 1]: 0.00125442, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 2.38e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.00029347 [with_stream_mark]: 1.302e-05 [recompute_prepare]: 7.46999e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 2.94001e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.716e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.66e-06 [parallel]: 1.794e-05 [flash_sp]: 7.71001e-06 [merge_comm]: 3.42002e-06 [allreduce_fusion]: 3.42997e-06 [matmul_add_comm_reduction]: 9.56e-06 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.30001e-06 [virtual_output]: 5.45001e-06 [merge_forward]: 3.48999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 9.67999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.34003e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 2.06998e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.085e-05 [a_after_grad]: 9.09e-06 [renormalize]: 0.00034222 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 2.12001e-06 [auto_monad_eliminator]: 1.292e-05 [cse]: 2.647e-05 [a_3]: 3.952e-05 [Cycle 2]: 0.0005935, [45] [expand_dump_flag]: 9.40025e-07 [switch_simplify]: 6.69999e-06 [loop_unroll]: 5.39998e-06 [a_1]: 0.0001269 [with_stream_mark]: 9.44e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.48002e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.741e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.52999e-06 [parallel]: 3.81999e-06 [flash_sp]: 2.83998e-06 [merge_comm]: 3.10002e-06 [allreduce_fusion]: 3.00002e-06 [matmul_add_comm_reduction]: 4.77e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.76003e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.22998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.27001e-06 [cse]: 1.283e-05 [a_3]: 3.173e-05 [py_interpret_to_execute_after_opt_a]: 7.31001e-06 [slice_cell_reuse_recomputed_activation]: 2.00002e-06 [rewriter_after_opt_a]: 2.975e-05 [convert_after_rewriter]: 6.89999e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00048764 [opt_b]: 0.00018178, [1] [Cycle 1]: 0.00017537, [7] [b_1]: 
0.00010679 [b_2]: 7.38e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 2.9002e-07 [cse]: 1.669e-05 [optimize_parallel_all_gather_comm]: 1.473e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.000412 [opt_after_cconv]: 9.465e-05, [1] [Cycle 1]: 8.905e-05, [7] [c_1]: 2.731e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.648e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.859e-05, [1] [Cycle 1]: 6.408e-05, [4] [d_1]: 3.857e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.64998e-06 [add_recomputation]: 4.464e-05 [cse_after_recomputation]: 1.944e-05, [1] [Cycle 1]: 1.549e-05, [1] [cse]: 1.042e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.11003e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.24998e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.133e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.57999e-06 [overlap_recompute_comm]: 2.13002e-06 [overlap_grad_ring_attention]: 4.18999e-06 [overlap_grad_flash_sp]: 1.743e-05 [begin_end_overlap_inline]: 4.70027e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.738e-05, [1] [Cycle 1]: 6.347e-05, [6] [build]: 2.09e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.137e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.57998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.56998e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.618e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.78999e-06 [opt_after_jit_grad]: 0.00044752 [validate]: 3.098e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0651908 [execute]: 9.10001e-06 Sums bootstrap : 0.000492s : 0.67% type_inference : 0.004277s : 5.82% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.57% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000342s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000071s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000488s : 0.66% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.56% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000448s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.065191s : 88.69% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.38% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.16% : 0.000005s : 4: substitution.graph_param_transform 65.83% : 0.000078s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004238 2 91.80% : 0.003891s : 1: type_inference.infer 8.20% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.81% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.40% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.79% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.87% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.50% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.69% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.68% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.92% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 43.00% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.00% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.085374 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.48% : 0.002973s : 1: add_attr 3.47% : 0.002964s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000055s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000528s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000497s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.90% : 0.000771s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.18% : 0.001860s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.54% : 0.000457s : 1: opt_after_jit_grad 0.22% : 0.000185s : 1: opt_b 4.33% : 0.003699s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.22% : 0.000187s : 1: renormalize.infer 0.17% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.38% : 0.065213s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 5.03% : 0.004291s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.124567, [24] [bootstrap]: 0.00051142 [type_inference]: 0.0105375 [event_method]: 4.583e-05 [auto_monad]: 0.00011914 [graph_reusing]: 8.29002e-06 [inline]: 2.56e-06 [add_attr]: 0.00309911, [1] [add_attr_with_inline]: 0.00309063, [1] [Cycle 1]: 7.013e-05, [2] [tag_attr]: 3.343e-05 [meta_addattr_fg_expand]: 9.01998e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 4.759e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0137605, [53] [py_interpret_to_execute]: 4.189e-05 [rewriter_before_opt_a]: 0.00016809 [opt_a]: 0.0114105, [3] [Cycle 1]: 0.00737225, [45] [expand_dump_flag]: 4.80999e-06 [switch_simplify]: 6.763e-05 [loop_unroll]: 5.765e-05 [a_1]: 0.00137176 [with_stream_mark]: 2.429e-05 [recompute_prepare]: 2.256e-05 [updatestate_depend_eliminate]: 9.49999e-06 [updatestate_assign_eliminate]: 8.1e-06 [updatestate_loads_eliminate]: 7.96001e-06 [parameter_eliminate]: 2.58e-06 [a_2]: 0.0002498 [accelerated_algorithm]: 3.141e-05 [shard]: 1.99e-06 [meta_shard_fg_expand]: 3.63999e-06 [shard_inline]: 1.607e-05 [merge_send_recv]: 1.623e-05 [auto_parallel]: 1.146e-05 [parallel]: 1.863e-05 [flash_sp]: 1.206e-05 [merge_comm]: 1.04e-05 [allreduce_fusion]: 8.91997e-06 [matmul_add_comm_reduction]: 2.688e-05 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 1.834e-05 [virtual_dataset]: 1.591e-05 [get_grad_eliminate_]: 1.667e-05 [virtual_output]: 1.54e-05 [merge_forward]: 9.96e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 1.8e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.904e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.762e-05 [set_forward_comm_id_for_comm_node_pass]: 9.66e-06 [meta_fg_expand]: 0.00147165 [flash_sp_send_recv_attached]: 3.79002e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 
0.00011431 [a_after_grad]: 8.499e-05 [renormalize]: 0.00266531 [add_forward_monad_depend]: 9.84001e-06 [auto_monad_grad]: 5.60001e-06 [auto_monad_eliminator]: 5.843e-05 [cse]: 0.00018021 [a_3]: 0.00034493 [Cycle 2]: 0.00310172, [45] [expand_dump_flag]: 1.68002e-06 [switch_simplify]: 4.763e-05 [loop_unroll]: 4.609e-05 [a_1]: 0.00160125 [with_stream_mark]: 1.204e-05 [recompute_prepare]: 1.112e-05 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 3.71001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012913 [accelerated_algorithm]: 1.223e-05 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.96e-06 [shard_inline]: 9.36e-06 [merge_send_recv]: 7.01999e-06 [auto_parallel]: 7.68001e-06 [parallel]: 4.92999e-06 [flash_sp]: 3.19001e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 4.73001e-06 [matmul_add_comm_reduction]: 8e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.155e-05 [virtual_dataset]: 9.35001e-06 [get_grad_eliminate_]: 9.68997e-06 [virtual_output]: 8.82999e-06 [merge_forward]: 4.62e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.678e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.454e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17999e-06 [meta_fg_expand]: 3.964e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 1.523e-05 [a_after_grad]: 1.499e-05 [renormalize]: 0.00062784 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.25999e-06 [auto_monad_eliminator]: 1.573e-05 [cse]: 4.989e-05 [a_3]: 6.795e-05 [Cycle 3]: 0.00092233, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.073e-05 [loop_unroll]: 9.21998e-06 [a_1]: 0.00025264 [with_stream_mark]: 1.027e-05 [recompute_prepare]: 9.69e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 4.02002e-06 [updatestate_loads_eliminate]: 4.50001e-06 
[parameter_eliminate]: 8.79983e-07 [a_2]: 0.0001259 [accelerated_algorithm]: 1.187e-05 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.24e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.6e-06 [parallel]: 4.54998e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.22e-06 [allreduce_fusion]: 5.12e-06 [matmul_add_comm_reduction]: 7.7e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.081e-05 [virtual_dataset]: 9.11002e-06 [get_grad_eliminate_]: 8.57998e-06 [virtual_output]: 8.53001e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.598e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.35003e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.494e-05 [a_after_grad]: 1.499e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.90024e-07 [auto_monad_eliminator]: 1.149e-05 [cse]: 2.823e-05 [a_3]: 6.028e-05 [py_interpret_to_execute_after_opt_a]: 1.074e-05 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 4.984e-05 [convert_after_rewriter]: 9.54999e-06 [order_py_execute_after_rewriter]: 7.01999e-06 [mutable_eliminate]: 0.00048161 [opt_b]: 0.00029533, [1] [Cycle 1]: 0.00028899, [7] [b_1]: 0.00019294 [b_2]: 1.103e-05 [updatestate_depend_eliminate]: 7.67998e-06 [updatestate_assign_eliminate]: 4.21001e-06 [updatestate_loads_eliminate]: 4.13999e-06 [renormalize]: 2.60014e-07 [cse]: 3.385e-05 [optimize_parallel_all_gather_comm]: 2.135e-05 [overlap_param_gather]: 2.02999e-06 [cconv]: 1.982e-05 [loop_unroll]: 0.00043301 [opt_after_cconv]: 0.00014044, [1] [Cycle 1]: 0.00013434, [7] [c_1]: 5.001e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 7.58001e-06 [updatestate_assign_eliminate]: 4.44002e-06 
[updatestate_loads_eliminate]: 4.07e-06 [cse]: 3.181e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 2.953e-05 [tuple_transform]: 0.00010348, [1] [Cycle 1]: 9.878e-05, [4] [d_1]: 6.798e-05 [none_parameter_eliminate]: 2.01998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.85002e-06 [partial_unused_args_eliminate]: 1.87001e-06 [add_recomputation]: 5.902e-05 [cse_after_recomputation]: 3.304e-05, [1] [Cycle 1]: 2.82e-05, [1] [cse]: 2.274e-05 [environ_conv]: 9.01998e-06 [swap_dp_allreduce_reducescatter]: 8.13999e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.41002e-06 [label_fine_grained_interleaved_index]: 3.06001e-06 [merge_cast_opt]: 1.50001e-06 [slice_recompute_activation]: 2.01998e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 9.99979e-07 [remove_cast_before_assign_add]: 1.04003e-06 [full_micro_interleaved_order_control]: 1.99999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.30001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.744e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 5.12e-06 [overlap_recompute_and_grad_model_parallel]: 5.51998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 5.65001e-06 [overlap_grad_flash_sp]: 2.389e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 0.00010064, [1] [Cycle 1]: 9.647e-05, [6] [build]: 1.021e-05 [elim_shapecalc]: 1.383e-05 [elim_not_effective]: 1.864e-05 [opt_reshape]: 1.032e-05 [fold_const_symbol]: 1.551e-05 
[renormalize]: 2.3999e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 2.659e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.71001e-06 [opt_after_jit_grad]: 0.0004761 [validate]: 4.795e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0956315 [execute]: 9.69e-06 Sums bootstrap : 0.000511s : 0.43% type_inference : 0.010538s : 8.77% event_method : 0.000046s : 0.04% auto_monad : 0.000119s : 0.10% graph_reusing : 0.000008s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000042s : 0.03% optimize.rewriter_before_opt_a : 0.000168s : 0.14% optimize.opt_a.expand_dump_flag : 0.000008s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.10% optimize.opt_a.loop_unroll : 0.000113s : 0.09% optimize.opt_a.a_1 : 0.003226s : 2.68% optimize.opt_a.with_stream_mark : 0.000047s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000505s : 0.42% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000027s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001515s : 1.26% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000144s : 0.12% optimize.opt_a.a_after_grad : 0.000115s : 0.10% optimize.opt_a.renormalize : 0.003293s : 2.74% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.07% optimize.opt_a.cse : 0.000258s : 0.21% optimize.opt_a.a_3 : 0.000473s : 0.39% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000050s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000482s : 0.40% optimize.opt_b.b_1 : 0.000193s : 0.16% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000034s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000433s : 0.36% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000027s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000476s : 0.40% validate : 0.000048s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.095632s : 79.58% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000763 218 5.76% : 0.000044s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.89% : 0.000426s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000015s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 20: substitution.remove_not_recompute_node 3.17% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.82% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.25% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010466 2 85.46% : 0.008944s : 1: type_inference.infer 14.54% : 0.001522s : 1: type_inference.specialize ------[replace.] 0.000243 30 51.26% : 0.000124s : 16: replace.inline 48.74% : 0.000118s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000449 30 93.04% : 0.000418s : 16: match.inline 6.96% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.13% : 0.000016s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.65% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.58% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.56% : 0.000042s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.70% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.62% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.22% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.61% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.67% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.35% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.33% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.33% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 97: predicate.switch_defer_inline 2.87% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.86% : 0.000037s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.54% : 0.000012s : 83: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.21% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001740 32 55.31% : 0.000962s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.69% : 0.000778s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.150023 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.07% : 0.003103s : 1: add_attr 2.06% : 0.003095s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000127s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.36% : 0.000546s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000053s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.29% : 0.000442s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.33% : 0.000491s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.32% : 0.004975s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000178s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000055s : 4: opt.transform.symbol_engine_opt 7.61% : 0.011413s : 1: opt_a 0.10% : 0.000144s : 1: opt_after_cconv 0.32% : 0.000486s : 1: opt_after_jit_grad 0.20% : 0.000299s : 1: opt_b 9.17% : 0.013764s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000046s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.13% : 0.001697s : 2: renormalize.infer 1.05% : 0.001582s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000054s : 1: rewriter_after_opt_a 0.12% : 0.000173s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000103s : 1: symbol_engine_optimizer 63.76% : 0.095656s : 1: task_emit 0.07% : 0.000107s : 1: 
tuple_transform 7.03% : 0.010553s : 1: type_inference 0.05% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x1-ge],max_mem:54.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x2-pynative],max_mem:54.0M TotalTime = 0.021805, [24] [bootstrap]: 0.0005494 [type_inference]: 0.00622116 [event_method]: 1.534e-05 [auto_monad]: 5.855e-05 [graph_reusing]: 5.05001e-06 [inline]: 1.73997e-06 [add_attr]: 0.00345806, [1] [add_attr_with_inline]: 0.00344774, [1] [Cycle 1]: 4.505e-05, [2] [tag_attr]: 1.613e-05 [meta_addattr_fg_expand]: 4.48999e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.81e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00403536, [53] [py_interpret_to_execute]: 2.03e-05 [rewriter_before_opt_a]: 6.105e-05 [opt_a]: 0.00218488, [2] [Cycle 1]: 0.00157268, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 3.301e-05 [loop_unroll]: 2.19e-05 [a_1]: 0.00046361 [with_stream_mark]: 1.312e-05 [recompute_prepare]: 7.78999e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.32999e-06 [a_2]: 7.774e-05 [accelerated_algorithm]: 6.60002e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 8.00999e-06 [auto_parallel]: 6.51e-06 [parallel]: 2.178e-05 [flash_sp]: 7.38e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 8.95999e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.55e-06 [virtual_dataset]: 6.23e-06 
[get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.46998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.136e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.37999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.58998e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.24001e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 8.92999e-06 [renormalize]: 0.00043427 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.401e-05 [cse]: 2.621e-05 [a_3]: 4.189e-05 [Cycle 2]: 0.00060278, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.74999e-06 [a_1]: 0.00012856 [with_stream_mark]: 9.71998e-06 [recompute_prepare]: 5.90002e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.75002e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.877e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.16002e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 4.72e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.03001e-06 [flash_sp]: 2.95002e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.35999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.17999e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.21002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00002e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.00007e-07 [after_resolve]: 9.19998e-06 [a_after_grad]: 8.3e-06 [renormalize]: 
8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.20002e-06 [cse]: 1.718e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 2.919e-05 [convert_after_rewriter]: 7.02002e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00044967 [opt_b]: 0.00018561, [1] [Cycle 1]: 0.00017961, [7] [b_1]: 0.0001115 [b_2]: 7.21999e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.44999e-06 [renormalize]: 3.80009e-07 [cse]: 1.629e-05 [optimize_parallel_all_gather_comm]: 1.547e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.238e-05 [loop_unroll]: 0.00041434 [opt_after_cconv]: 9.625e-05, [1] [Cycle 1]: 9.048e-05, [7] [c_1]: 2.883e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.628e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.269e-05 [tuple_transform]: 7.096e-05, [1] [Cycle 1]: 6.649e-05, [4] [d_1]: 4.078e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 4.72e-05 [cse_after_recomputation]: 2.212e-05, [1] [Cycle 1]: 1.739e-05, [1] [cse]: 1.206e-05 [environ_conv]: 4.47e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.61999e-06 [label_micro_interleaved_index]: 4.39002e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.48002e-06 [comm_op_add_attrs]: 9.50007e-07 
[add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 3.94002e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.90001e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.994e-05, [1] [Cycle 1]: 6.587e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.38001e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 6.43998e-06 [fold_const_symbol]: 9.39e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.60001e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 0.00011918 [opt_after_jit_grad]: 0.00046109 [validate]: 5.636e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00655061 [execute]: 6.56e-06 Sums bootstrap : 0.000549s : 3.17% type_inference : 0.006221s : 35.86% event_method : 0.000015s : 0.09% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000061s : 
0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000028s : 0.16% optimize.opt_a.a_1 : 0.000592s : 3.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 
0.02% optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000434s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000075s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.59% optimize.opt_b.b_1 : 0.000112s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000414s : 2.39% optimize.opt_after_cconv.c_1 : 0.000029s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000119s : 0.69% opt_after_jit_grad : 0.000461s : 2.66% validate : 0.000056s : 0.32% backend_pass : 0.000001s : 0.01% task_emit : 0.006551s : 37.76% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000170 30 14.36% : 0.000024s : 5: substitution.arithmetic_simplify 0.97% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.60% : 0.000006s : 4: substitution.graph_param_transform 67.48% : 0.000115s : 3: substitution.inline 1.61% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.72% : 0.000005s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.21% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006171 2 90.03% : 0.005556s : 1: type_inference.infer 9.97% : 0.000615s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.80% : 0.000028s : 3: replace.inline 30.20% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.25% : 0.000113s : 3: match.inline 7.75% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.06% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.69% : 0.000003s : 16: predicate.partial_defer_inline 1.51% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000398 8 46.24% : 0.000184s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.76% : 0.000214s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030857 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.22% : 0.003462s : 1: add_attr 11.18% : 0.003451s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000586s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000967s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.15% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.09% : 0.002188s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.52% : 0.000471s : 1: opt_after_jit_grad 0.61% : 0.000189s : 1: opt_b 13.09% : 0.004040s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.72% : 0.000221s : 1: renormalize.infer 0.67% : 0.000206s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000125s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000073s : 1: symbol_engine_optimizer 21.26% : 0.006561s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 20.20% : 0.006235s : 1: type_inference 0.29% : 0.000090s : 1: validate TotalTime = 0.021478, [24] [bootstrap]: 0.00048986 [type_inference]: 0.00489381 [event_method]: 1.211e-05 [auto_monad]: 5.153e-05 [graph_reusing]: 5.94e-06 [inline]: 1.96e-06 [add_attr]: 0.00333841, [1] [add_attr_with_inline]: 0.00332854, [1] [Cycle 1]: 5.906e-05, [2] [tag_attr]: 1.511e-05 [meta_addattr_fg_expand]: 3.49001e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 2.763e-05 [insert-virtual-dataset]: 2.89001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.11998e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00444615, [53] [py_interpret_to_execute]: 1.978e-05 [rewriter_before_opt_a]: 4.502e-05 [opt_a]: 0.00234688, [2] [Cycle 1]: 0.00166654, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 2.657e-05 [loop_unroll]: 1.434e-05 [a_1]: 0.00032536 [with_stream_mark]: 1.721e-05 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 4.38001e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.919e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.84999e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.66999e-06 [merge_send_recv]: 8.80999e-06 [auto_parallel]: 7.03e-06 [parallel]: 1.777e-05 [flash_sp]: 9.66998e-06 [merge_comm]: 4.26001e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 9.69999e-06 [allreduce_slice_to_reducescatter]: 1.03001e-06 [virtual_shard_identity]: 9.74e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 5.61003e-06 [virtual_output]: 6.12001e-06 [merge_forward]: 4.75999e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 1.07e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.407e-05 [merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 1.014e-05 [set_forward_comm_id_for_comm_node_pass]: 4.60001e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 2.66999e-06 
[after_resolve]: 1.243e-05 [a_after_grad]: 9.31e-06 [renormalize]: 0.00063467 [add_forward_monad_depend]: 6.33998e-06 [auto_monad_grad]: 2.66999e-06 [auto_monad_eliminator]: 1.609e-05 [cse]: 2.97e-05 [a_3]: 4.631e-05 [Cycle 2]: 0.00066889, [45] [expand_dump_flag]: 1.76998e-06 [switch_simplify]: 8.21002e-06 [loop_unroll]: 5.75001e-06 [a_1]: 0.00013568 [with_stream_mark]: 1.453e-05 [recompute_prepare]: 6.31998e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 1.07998e-06 [a_2]: 7.15e-05 [accelerated_algorithm]: 5.76e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 6.07999e-06 [auto_parallel]: 6.11e-06 [parallel]: 5.99999e-06 [flash_sp]: 3.55e-06 [merge_comm]: 3.53999e-06 [allreduce_fusion]: 2.98998e-06 [matmul_add_comm_reduction]: 7.35998e-06 [allreduce_slice_to_reducescatter]: 8.89995e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.63002e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.63999e-06 [cell_reuse_recompute_pass]: 1.91003e-06 [offload_activation]: 8.3e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 1.04e-06 [before_grad]: 8.79e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 2.28998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.054e-05 [a_after_grad]: 8.32e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 2.86999e-06 [auto_monad_grad]: 1.44998e-06 [auto_monad_eliminator]: 1.014e-05 [cse]: 1.618e-05 [a_3]: 3.465e-05 [py_interpret_to_execute_after_opt_a]: 1.258e-05 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.819e-05 [convert_after_rewriter]: 8.22e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.0006109 [opt_b]: 0.00020007, [1] [Cycle 1]: 0.00019262, [7] [b_1]: 0.00011369 
[b_2]: 7.49002e-06 [updatestate_depend_eliminate]: 8.03001e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.66e-06 [renormalize]: 3.00002e-07 [cse]: 2.273e-05 [optimize_parallel_all_gather_comm]: 1.9e-05 [overlap_param_gather]: 1.78997e-06 [cconv]: 2.773e-05 [loop_unroll]: 0.00044872 [opt_after_cconv]: 9.983e-05, [1] [Cycle 1]: 9.397e-05, [7] [c_1]: 2.901e-05 [parameter_eliminate]: 3.24001e-06 [updatestate_depend_eliminate]: 5.89999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.709e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.392e-05 [tuple_transform]: 7.206e-05, [1] [Cycle 1]: 6.787e-05, [4] [d_1]: 4.139e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.48003e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.753e-05 [cse_after_recomputation]: 2.059e-05, [1] [Cycle 1]: 1.618e-05, [1] [cse]: 1.094e-05 [environ_conv]: 5.20001e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.76e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.60997e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.208e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.43999e-06 [overlap_grad_flash_sp]: 1.987e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 7.405e-05, [1] [Cycle 1]: 6.977e-05, [6] [build]: 3.09999e-06 [elim_shapecalc]: 9.29e-06 [elim_not_effective]: 1.279e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 9.41e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.01e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.748e-05 [get_jit_bprop_graph]: 1.54e-06 [rewriter_after_jit_bprop_graph]: 4.20999e-06 [opt_after_jit_grad]: 0.0004799 [validate]: 3.811e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00737917 [execute]: 8.82999e-06 Sums bootstrap : 0.000490s : 2.88% type_inference : 0.004894s : 28.74% event_method : 0.000012s : 0.07% auto_monad : 0.000052s : 0.30% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000045s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000035s : 0.20% optimize.opt_a.loop_unroll : 0.000020s : 0.12% optimize.opt_a.a_1 : 0.000461s : 2.71% optimize.opt_a.with_stream_mark : 0.000032s : 0.19% optimize.opt_a.recompute_prepare : 0.000016s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.89% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000005s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000015s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000024s : 0.14% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000008s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000017s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000017s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000008s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000019s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000025s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000019s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000008s : 0.05% optimize.opt_a.meta_fg_expand : 0.000005s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000023s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000635s : 3.73% optimize.opt_a.add_forward_monad_depend : 0.000009s : 0.05% optimize.opt_a.auto_monad_grad : 0.000004s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000026s : 0.15% optimize.opt_a.cse : 0.000046s : 0.27% optimize.opt_a.a_3 : 0.000081s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.07% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000038s : 0.22% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000611s : 3.59% optimize.opt_b.b_1 : 0.000114s : 0.67% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.05% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000023s : 0.13% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000028s : 0.16% optimize.loop_unroll : 0.000449s : 2.64% optimize.opt_after_cconv.c_1 : 0.000029s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000020s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000013s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000480s : 2.82% validate : 0.000038s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.007379s : 43.34% execute : 0.000009s : 0.05% Time group info: ------[substitution.] 0.000149 26 17.78% : 0.000026s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.73% : 0.000006s : 4: substitution.graph_param_transform 66.86% : 0.000099s : 2: substitution.inline 2.42% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.59% : 0.000005s : 4: substitution.remove_not_recompute_node 3.44% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004846 2 92.46% : 0.004481s : 1: type_inference.infer 7.54% : 0.000366s : 1: type_inference.specialize ------[replace.] 0.000022 2 100.00% : 0.000022s : 2: replace.inline ------[match.] 0.000098 2 100.00% : 0.000098s : 2: match.inline ------[predicate.] 
0.000152 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.65% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.56% : 0.000004s : 17: predicate.arithmetic_simplify 0.73% : 0.000001s : 9: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.75% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.78% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.72% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.45% : 0.000001s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.39% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.95% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.96% : 0.000001s : 13: predicate.environ_get_depend_swap 2.06% : 0.000003s : 21: predicate.environ_get_eliminate 0.97% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.86% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.93% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.28% : 0.000010s : 44: predicate.inline 1.18% : 0.000002s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.17% : 0.000002s : 8: predicate.less_batch_normalization 1.44% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.98% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.66% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.65% : 0.000001s : 9: predicate.minmaximum_grad 2.71% : 0.000004s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.14% : 0.000002s : 13: predicate.partial_eliminate 0.71% : 0.000001s : 9: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.05% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.46% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.10% : 0.000002s : 8: predicate.same_eliminate 0.72% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.34% : 0.000002s : 8: predicate.shard_identity_eliminate 1.07% : 0.000002s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.53% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.93% : 0.000001s : 11: predicate.switch_defer_inline 1.65% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000007s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.90% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.80% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 1.03% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.92% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000287 6 38.31% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 61.69% : 0.000177s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030908 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.82% : 0.003344s : 1: add_attr 10.78% : 0.003333s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.18% : 0.000057s : 1: auto_monad 0.07% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.69% : 0.000524s : 1: bootstrap 0.10% : 0.000031s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000012s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.48% : 0.000458s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.02% : 0.000626s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.07% : 0.000021s : 1: opt.transform.mutable_eliminate 2.73% : 0.000844s : 78: opt.transform.opt_a 0.09% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000095s : 28: opt.transform.opt_b 0.15% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000034s : 4: opt.transform.symbol_engine_opt 7.60% : 0.002350s : 1: opt_a 0.33% : 0.000104s : 1: opt_after_cconv 1.70% : 0.000525s : 1: opt_after_jit_grad 0.66% : 0.000204s : 1: opt_b 14.40% : 0.004451s : 1: optimize 0.07% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000023s : 1: overlap_grad_flash_sp 0.01% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.05% : 0.000016s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000018s : 1: remove_dup_value 1.26% : 0.000390s : 1: renormalize.infer 0.77% : 0.000236s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000045s : 1: rewriter_after_opt_a 0.16% : 0.000051s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000077s : 1: symbol_engine_optimizer 23.94% : 0.007399s : 1: task_emit 0.24% : 0.000075s : 1: 
tuple_transform 15.90% : 0.004915s : 1: type_inference 0.23% : 0.000072s : 1: validate TotalTime = 0.0198929, [24] [bootstrap]: 0.00048415 [type_inference]: 0.00559673 [event_method]: 1.375e-05 [auto_monad]: 5.569e-05 [graph_reusing]: 5.79999e-06 [inline]: 2.02001e-06 [add_attr]: 0.00305296, [1] [add_attr_with_inline]: 0.00304498, [1] [Cycle 1]: 4.7e-05, [2] [tag_attr]: 1.568e-05 [meta_addattr_fg_expand]: 4.14002e-06 [parallel-infer-symbol]: 2.59001e-06 [pre_auto_parallel]: 2.586e-05 [insert-virtual-dataset]: 2.77002e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00402388, [53] [py_interpret_to_execute]: 2.014e-05 [rewriter_before_opt_a]: 5.869e-05 [opt_a]: 0.00214262, [2] [Cycle 1]: 0.00152577, [45] [expand_dump_flag]: 3.08e-06 [switch_simplify]: 3.213e-05 [loop_unroll]: 2.149e-05 [a_1]: 0.00046444 [with_stream_mark]: 1.329e-05 [recompute_prepare]: 8.14002e-06 [updatestate_depend_eliminate]: 3.80998e-06 [updatestate_assign_eliminate]: 3.76999e-06 [updatestate_loads_eliminate]: 2.65002e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.715e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.92999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.52002e-06 [auto_parallel]: 6.23998e-06 [parallel]: 1.807e-05 [flash_sp]: 7.43e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 8.62e-06 [allreduce_slice_to_reducescatter]: 5.90022e-07 [virtual_shard_identity]: 7.78001e-06 [virtual_dataset]: 6.02001e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.57999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.56e-06 [flash_sp_send_recv_attached]: 2.35997e-06 [receive_attached]: 2.39001e-06 
[after_resolve]: 1.017e-05 [a_after_grad]: 9.07999e-06 [renormalize]: 0.00041723 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.664e-05 [a_3]: 4.217e-05 [Cycle 2]: 0.00060749, [45] [expand_dump_flag]: 1.06002e-06 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.79e-06 [a_1]: 0.00013068 [with_stream_mark]: 9.57999e-06 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.964e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.48002e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.32001e-06 [parallel]: 3.95e-06 [flash_sp]: 2.94999e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.24999e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.09998e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.38002e-06 [cell_reuse_recompute_pass]: 1.34003e-06 [offload_activation]: 5.61998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.55999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.86e-06 [a_after_grad]: 8.37e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.384e-05 [a_3]: 3.315e-05 [py_interpret_to_execute_after_opt_a]: 7.36001e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.041e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.28002e-06 [mutable_eliminate]: 0.00045102 [opt_b]: 0.00018604, [1] [Cycle 1]: 0.0001801, [7] 
[b_1]: 0.00011129 [b_2]: 7.46999e-06 [updatestate_depend_eliminate]: 5.46998e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 2.3999e-07 [cse]: 1.631e-05 [optimize_parallel_all_gather_comm]: 1.562e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.235e-05 [loop_unroll]: 0.00041825 [opt_after_cconv]: 9.791e-05, [1] [Cycle 1]: 9.22e-05, [7] [c_1]: 2.848e-05 [parameter_eliminate]: 2.33002e-06 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.673e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 9.786e-05, [1] [Cycle 1]: 9.342e-05, [4] [d_1]: 6.583e-05 [none_parameter_eliminate]: 1.82999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.355e-05 [cse_after_recomputation]: 2.19e-05, [1] [Cycle 1]: 1.758e-05, [1] [cse]: 1.22e-05 [environ_conv]: 5.24e-06 [swap_dp_allreduce_reducescatter]: 5.57001e-06 [bias_add_comm_swap]: 2.98e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.51998e-06 [merge_cast_opt]: 1.24998e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.21997e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.25999e-06 [full_micro_interleaved_order_control]: 2.31998e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.09998e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.181e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.67001e-06 [overlap_recompute_comm]: 2.45002e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.953e-05, [1] [Cycle 1]: 6.525e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.185e-05 [opt_reshape]: 6.51e-06 [fold_const_symbol]: 8.84998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.78997e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.0004574 [validate]: 3.185e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00590475 [execute]: 7.12002e-06 Sums bootstrap : 0.000484s : 3.05% type_inference : 0.005597s : 35.27% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000595s : 3.75% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000417s : 2.63% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000040s : 0.26% optimize.opt_a.a_3 : 0.000075s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.84% optimize.opt_b.b_1 : 0.000111s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000418s : 2.64% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000066s : 0.41% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.27% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000457s : 2.88% validate : 0.000032s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005905s : 37.21% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000168 30 14.61% : 0.000025s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.45% : 0.000006s : 4: substitution.graph_param_transform 66.15% : 0.000111s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.42% : 0.000004s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 7.52% : 0.000013s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005556 2 90.05% : 0.005003s : 1: type_inference.infer 9.95% : 0.000553s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.48% : 0.000028s : 3: replace.inline 30.52% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 90.43% : 0.000109s : 3: match.inline 9.57% : 0.000012s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.66% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.18% : 0.000010s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.05% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.74% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.31% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000002s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.78% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.24% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 45.67% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.33% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028545 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.71% : 0.003057s : 1: add_attr 10.68% : 0.003049s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.82% : 0.000520s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.61% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 3.39% : 0.000969s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000093s : 28: opt.transform.opt_b 0.25% : 0.000070s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.52% : 0.002146s : 1: opt_a 0.36% : 0.000102s : 1: opt_after_cconv 1.64% : 0.000467s : 1: opt_after_jit_grad 0.66% : 0.000189s : 1: opt_b 14.11% : 0.004028s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000212s : 1: renormalize.infer 0.69% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000072s : 1: symbol_engine_optimizer 20.72% : 0.005915s : 1: task_emit 0.35% : 0.000101s : 1: 
tuple_transform 19.65% : 0.005610s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0379629, [24] [bootstrap]: 0.00051021 [type_inference]: 0.0115042 [event_method]: 4.879e-05 [auto_monad]: 0.00012307 [graph_reusing]: 8.37998e-06 [inline]: 1.83002e-06 [add_attr]: 0.00303103, [1] [add_attr_with_inline]: 0.00302264, [1] [Cycle 1]: 7.163e-05, [2] [tag_attr]: 3.522e-05 [meta_addattr_fg_expand]: 9.93998e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 5.037e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.0134823, [53] [py_interpret_to_execute]: 3.843e-05 [rewriter_before_opt_a]: 0.00014857 [opt_a]: 0.0111811, [3] [Cycle 1]: 0.0071493, [45] [expand_dump_flag]: 3.89002e-06 [switch_simplify]: 7.62e-05 [loop_unroll]: 6.342e-05 [a_1]: 0.00147934 [with_stream_mark]: 2.322e-05 [recompute_prepare]: 2.209e-05 [updatestate_depend_eliminate]: 9.58002e-06 [updatestate_assign_eliminate]: 8.15e-06 [updatestate_loads_eliminate]: 7.55e-06 [parameter_eliminate]: 2.41e-06 [a_2]: 0.00026392 [accelerated_algorithm]: 3.208e-05 [shard]: 1.81998e-06 [meta_shard_fg_expand]: 4.20999e-06 [shard_inline]: 1.671e-05 [merge_send_recv]: 1.615e-05 [auto_parallel]: 1.149e-05 [parallel]: 1.738e-05 [flash_sp]: 1.098e-05 [merge_comm]: 9.55001e-06 [allreduce_fusion]: 8.73001e-06 [matmul_add_comm_reduction]: 2.674e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.856e-05 [virtual_dataset]: 1.657e-05 [get_grad_eliminate_]: 1.552e-05 [virtual_output]: 1.559e-05 [merge_forward]: 9.49999e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 1.751e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.883e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 2.77e-05 [set_forward_comm_id_for_comm_node_pass]: 9.52999e-06 [meta_fg_expand]: 0.00140635 [flash_sp_send_recv_attached]: 3.91999e-06 [receive_attached]: 2.66e-06 
[after_resolve]: 6.057e-05 [a_after_grad]: 8.329e-05 [renormalize]: 0.00245659 [add_forward_monad_depend]: 8.67e-06 [auto_monad_grad]: 5.30001e-06 [auto_monad_eliminator]: 5.651e-05 [cse]: 0.00016796 [a_3]: 0.00034029 [Cycle 2]: 0.00308617, [45] [expand_dump_flag]: 1.64998e-06 [switch_simplify]: 4.777e-05 [loop_unroll]: 4.466e-05 [a_1]: 0.00155857 [with_stream_mark]: 1.241e-05 [recompute_prepare]: 1.114e-05 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 3.78001e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012778 [accelerated_algorithm]: 1.214e-05 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.66e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 7.58999e-06 [parallel]: 4.80999e-06 [flash_sp]: 2.96001e-06 [merge_comm]: 5.24e-06 [allreduce_fusion]: 4.73001e-06 [matmul_add_comm_reduction]: 7.9e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.043e-05 [virtual_dataset]: 9.00999e-06 [get_grad_eliminate_]: 9.14e-06 [virtual_output]: 8.50001e-06 [merge_forward]: 4.63999e-06 [cell_reuse_recompute_pass]: 8.39995e-07 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.658e-05 [merge_recompute_call_nodes]: 8.79983e-07 [before_grad]: 1.494e-05 [set_forward_comm_id_for_comm_node_pass]: 6.06e-06 [meta_fg_expand]: 7.133e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.656e-05 [a_after_grad]: 1.457e-05 [renormalize]: 0.00062826 [add_forward_monad_depend]: 3.83001e-06 [auto_monad_grad]: 1.27999e-06 [auto_monad_eliminator]: 1.531e-05 [cse]: 4.928e-05 [a_3]: 6.681e-05 [Cycle 3]: 0.00093145, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 1.107e-05 [loop_unroll]: 9.33002e-06 [a_1]: 0.00025601 [with_stream_mark]: 1.028e-05 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 4.18999e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012626 [accelerated_algorithm]: 1.167e-05 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 2.07001e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 7.05e-06 [auto_parallel]: 7.31001e-06 [parallel]: 5.22e-06 [flash_sp]: 1.14998e-06 [merge_comm]: 5.46e-06 [allreduce_fusion]: 5.13002e-06 [matmul_add_comm_reduction]: 7.95e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.063e-05 [virtual_dataset]: 9.31e-06 [get_grad_eliminate_]: 9.86e-06 [virtual_output]: 9.13002e-06 [merge_forward]: 5.47999e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 9.59999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.714e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.434e-05 [set_forward_comm_id_for_comm_node_pass]: 5.68002e-06 [meta_fg_expand]: 3.48999e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 1.38e-05 [a_after_grad]: 1.458e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.22999e-06 [auto_monad_grad]: 1.07e-06 [auto_monad_eliminator]: 1.096e-05 [cse]: 2.704e-05 [a_3]: 6.168e-05 [py_interpret_to_execute_after_opt_a]: 1.074e-05 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 4.819e-05 [convert_after_rewriter]: 9.67999e-06 [order_py_execute_after_rewriter]: 7.5e-06 [mutable_eliminate]: 0.00046307 [opt_b]: 0.00029566, [1] [Cycle 1]: 0.00028898, [7] [b_1]: 0.00019354 [b_2]: 1.119e-05 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 4.38001e-06 [renormalize]: 3.9002e-07 [cse]: 3.28e-05 [optimize_parallel_all_gather_comm]: 2.127e-05 [overlap_param_gather]: 1.67999e-06 [cconv]: 1.986e-05 [loop_unroll]: 0.00042978 [opt_after_cconv]: 0.0001391, [1] [Cycle 1]: 0.00013314, [7] [c_1]: 4.969e-05 [parameter_eliminate]: 2.58e-06 [updatestate_depend_eliminate]: 7.41999e-06 [updatestate_assign_eliminate]: 4.48001e-06 [updatestate_loads_eliminate]: 
4.10998e-06 [cse]: 3.009e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 2.986e-05 [tuple_transform]: 0.00010459, [1] [Cycle 1]: 9.99e-05, [4] [d_1]: 6.878e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 1.071e-05 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 5.686e-05 [cse_after_recomputation]: 3.282e-05, [1] [Cycle 1]: 2.819e-05, [1] [cse]: 2.263e-05 [environ_conv]: 9.05999e-06 [swap_dp_allreduce_reducescatter]: 8.38999e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.68998e-06 [micro_interleaved_order_control]: 2.01003e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.84e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 5.29e-06 [overlap_recompute_and_grad_model_parallel]: 6.11e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.51998e-06 [overlap_grad_ring_attention]: 5.30001e-06 [overlap_grad_flash_sp]: 2.484e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.57001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 9.869e-05, [1] [Cycle 1]: 9.45e-05, [6] [build]: 9.34e-06 [elim_shapecalc]: 1.356e-05 [elim_not_effective]: 1.846e-05 [opt_reshape]: 1.024e-05 [fold_const_symbol]: 1.497e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.74998e-06 
[pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.491e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.0004785 [validate]: 4.548e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00842222 [execute]: 6.76999e-06 Sums bootstrap : 0.000510s : 1.52% type_inference : 0.011504s : 34.17% event_method : 0.000049s : 0.14% auto_monad : 0.000123s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000149s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000135s : 0.40% optimize.opt_a.loop_unroll : 0.000117s : 0.35% optimize.opt_a.a_1 : 0.003294s : 9.78% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000518s : 1.54% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.02% optimize.opt_a.shard_inline : 0.000036s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.04% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000040s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000035s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000020s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001481s : 4.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.33% optimize.opt_a.renormalize : 0.003085s : 9.16% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.25% optimize.opt_a.cse : 0.000244s : 0.73% optimize.opt_a.a_3 : 0.000469s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000463s : 1.38% optimize.opt_b.b_1 : 0.000194s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000430s : 1.28% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000069s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000011s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000479s : 1.42% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008422s : 25.02% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000789 222 5.73% : 0.000045s : 12: substitution.arithmetic_simplify 1.69% : 0.000013s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.05% : 0.000434s : 17: substitution.inline 2.04% : 0.000016s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.66% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000026s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.28% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.54% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 10.15% : 0.000080s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011430 2 86.73% : 0.009913s : 1: type_inference.infer 13.27% : 0.001517s : 1: type_inference.specialize ------[replace.] 0.000229 33 57.18% : 0.000131s : 17: replace.inline 42.82% : 0.000098s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000459 33 92.66% : 0.000425s : 17: match.inline 7.34% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000765 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.02% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 100: predicate.arithmetic_simplify 1.12% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000008s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.28% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000043s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.61% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.06% : 0.000016s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.38% : 0.000011s : 68: predicate.reduce_eliminate 2.61% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.90% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.97% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.53% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000017s : 116: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.57% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.22% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001592 34 56.36% : 0.000897s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.64% : 0.000695s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062908 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.83% : 0.003035s : 1: add_attr 4.81% : 0.003027s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000131s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.87% : 0.000546s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.09% : 0.000056s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.96% : 0.005009s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000179s : 28: opt.transform.opt_b 0.12% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.78% : 0.011184s : 1: opt_a 0.23% : 0.000142s : 1: opt_after_cconv 0.78% : 0.000488s : 1: opt_after_jit_grad 0.48% : 0.000299s : 1: opt_b 21.44% : 0.013486s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000055s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.61% : 0.001641s : 2: renormalize.infer 2.28% : 0.001431s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000153s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.40% : 0.008432s : 1: task_emit 0.17% : 0.000108s : 1: 
tuple_transform 18.31% : 0.011519s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.0188384, [24] [bootstrap]: 0.00048195 [type_inference]: 0.00430899 [event_method]: 1.068e-05 [auto_monad]: 5.311e-05 [graph_reusing]: 5.84e-06 [inline]: 1.92001e-06 [add_attr]: 0.00303274, [1] [add_attr_with_inline]: 0.00302459, [1] [Cycle 1]: 4.732e-05, [2] [tag_attr]: 1.217e-05 [meta_addattr_fg_expand]: 3.93001e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 2.143e-05 [insert-virtual-dataset]: 2.59999e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 2.17001e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00375483, [53] [py_interpret_to_execute]: 1.499e-05 [rewriter_before_opt_a]: 3.878e-05 [opt_a]: 0.00189903, [2] [Cycle 1]: 0.0012906, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 2.479e-05 [loop_unroll]: 1.422e-05 [a_1]: 0.00030115 [with_stream_mark]: 1.311e-05 [recompute_prepare]: 7.58999e-06 [updatestate_depend_eliminate]: 3.79002e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.885e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 6.44999e-06 [merge_send_recv]: 7.33999e-06 [auto_parallel]: 6.06998e-06 [parallel]: 1.746e-05 [flash_sp]: 7.13e-06 [merge_comm]: 3.79002e-06 [allreduce_fusion]: 3.11001e-06 [matmul_add_comm_reduction]: 9.20001e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.36001e-06 [virtual_dataset]: 6.14001e-06 [get_grad_eliminate_]: 5.67999e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.114e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.17999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.69999e-06 [receive_attached]: 
2.27001e-06 [after_resolve]: 1.107e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00035841 [add_forward_monad_depend]: 4.46002e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.657e-05 [a_3]: 4.171e-05 [Cycle 2]: 0.00059924, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.000128 [with_stream_mark]: 9.64e-06 [recompute_prepare]: 5.68002e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 7.027e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.32e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.65001e-06 [parallel]: 3.96001e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.40001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.38e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.005e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 3.20998e-06 [meta_fg_expand]: 1.94e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 8.25999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04003e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.41998e-06 [cse]: 1.263e-05 [a_3]: 3.235e-05 [py_interpret_to_execute_after_opt_a]: 7.84002e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.099e-05 [convert_after_rewriter]: 6.61e-06 [order_py_execute_after_rewriter]: 5.15001e-06 [mutable_eliminate]: 0.00045221 [opt_b]: 0.00018728, [1] [Cycle 1]: 0.00018136, [7] [b_1]: 
0.00011134 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.28002e-06 [renormalize]: 4.30009e-07 [cse]: 1.647e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00044107 [opt_after_cconv]: 9.75e-05, [1] [Cycle 1]: 9.148e-05, [7] [c_1]: 2.935e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.594e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.272e-05 [tuple_transform]: 7.069e-05, [1] [Cycle 1]: 6.624e-05, [4] [d_1]: 4.028e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.257e-05 [cse_after_recomputation]: 2.074e-05, [1] [Cycle 1]: 1.648e-05, [1] [cse]: 1.111e-05 [environ_conv]: 5.38002e-06 [swap_dp_allreduce_reducescatter]: 5.47999e-06 [bias_add_comm_swap]: 2.20002e-06 [label_micro_interleaved_index]: 3.93999e-06 [label_fine_grained_interleaved_index]: 2.93998e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.57001e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.40999e-06 [overlap_opt_shard_in_pipeline]: 1.35001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.75e-06 [overlap_recompute_and_grad_model_parallel]: 5.56e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.49998e-06 [overlap_recompute_comm]: 2.73e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.685e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.961e-05, [1] [Cycle 1]: 6.548e-05, [6] [build]: 2.67001e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.163e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.94999e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.682e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.52997e-06 [opt_after_jit_grad]: 0.00044757 [validate]: 3.29e-05 [backend_pass]: 1.15999e-06 [task_emit]: 0.00645421 [execute]: 7.24001e-06 Sums bootstrap : 0.000482s : 3.25% type_inference : 0.004309s : 29.03% event_method : 0.000011s : 0.07% auto_monad : 0.000053s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000429s : 2.89% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000149s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000358s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000039s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 3.05% optimize.opt_b.b_1 : 0.000111s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000441s : 2.97% optimize.opt_after_cconv.c_1 : 0.000029s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.04% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000448s : 3.02% validate : 0.000033s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006454s : 43.49% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.33% : 0.000022s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.30% : 0.000005s : 4: substitution.graph_param_transform 65.80% : 0.000080s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.46% : 0.000004s : 4: substitution.remove_not_recompute_node 3.29% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004267 2 91.68% : 0.003912s : 1: type_inference.infer 8.32% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000142 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.58% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.36% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.98% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.92% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.92% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.61% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 40.61% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.39% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026938 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.27% : 0.003037s : 1: add_attr 11.24% : 0.003028s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.08% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000510s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.67% : 0.000450s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000790s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000093s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.06% : 0.001902s : 1: opt_a 0.38% : 0.000101s : 1: opt_after_cconv 1.70% : 0.000457s : 1: opt_after_jit_grad 0.71% : 0.000191s : 1: opt_b 13.95% : 0.003759s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000006s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000194s : 1: renormalize.infer 0.58% : 0.000157s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 24.00% : 0.006464s : 1: task_emit 0.27% : 0.000074s : 1: 
tuple_transform 16.05% : 0.004322s : 1: type_inference 0.22% : 0.000059s : 1: validate TotalTime = 0.0382794, [24] [bootstrap]: 0.0005375 [type_inference]: 0.0107536 [event_method]: 4.363e-05 [auto_monad]: 0.00011791 [graph_reusing]: 8.1e-06 [inline]: 1.77999e-06 [add_attr]: 0.00314823, [1] [add_attr_with_inline]: 0.00313968, [1] [Cycle 1]: 7.426e-05, [2] [tag_attr]: 3.291e-05 [meta_addattr_fg_expand]: 8.94e-06 [parallel-infer-symbol]: 3.36999e-06 [pre_auto_parallel]: 4.957e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.0140632, [53] [py_interpret_to_execute]: 3.891e-05 [rewriter_before_opt_a]: 0.00013187 [opt_a]: 0.0116716, [3] [Cycle 1]: 0.0073615, [45] [expand_dump_flag]: 4.28001e-06 [switch_simplify]: 6.772e-05 [loop_unroll]: 5.539e-05 [a_1]: 0.00137163 [with_stream_mark]: 2.775e-05 [recompute_prepare]: 2.381e-05 [updatestate_depend_eliminate]: 9.63002e-06 [updatestate_assign_eliminate]: 7.62002e-06 [updatestate_loads_eliminate]: 7.85e-06 [parameter_eliminate]: 2.69001e-06 [a_2]: 0.00025092 [accelerated_algorithm]: 3.397e-05 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 3.65998e-06 [shard_inline]: 1.654e-05 [merge_send_recv]: 1.67e-05 [auto_parallel]: 1.219e-05 [parallel]: 1.98e-05 [flash_sp]: 1.203e-05 [merge_comm]: 9.94999e-06 [allreduce_fusion]: 9.39e-06 [matmul_add_comm_reduction]: 3e-05 [allreduce_slice_to_reducescatter]: 9.39996e-07 [virtual_shard_identity]: 1.84e-05 [virtual_dataset]: 1.652e-05 [get_grad_eliminate_]: 1.605e-05 [virtual_output]: 1.559e-05 [merge_forward]: 9.85002e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 1.893e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.899e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 2.796e-05 [set_forward_comm_id_for_comm_node_pass]: 9.64e-06 [meta_fg_expand]: 0.00154937 [flash_sp_send_recv_attached]: 3.69002e-06 [receive_attached]: 2.36e-06 [after_resolve]: 
6.106e-05 [a_after_grad]: 8.278e-05 [renormalize]: 0.00261417 [add_forward_monad_depend]: 9.66e-06 [auto_monad_grad]: 6.10002e-06 [auto_monad_eliminator]: 5.755e-05 [cse]: 0.00016911 [a_3]: 0.00035555 [Cycle 2]: 0.00329396, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 4.759e-05 [loop_unroll]: 4.428e-05 [a_1]: 0.00159525 [with_stream_mark]: 2.146e-05 [recompute_prepare]: 1.494e-05 [updatestate_depend_eliminate]: 5.96e-06 [updatestate_assign_eliminate]: 5.22999e-06 [updatestate_loads_eliminate]: 3.89002e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 0.00013239 [accelerated_algorithm]: 1.458e-05 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 2.74999e-06 [shard_inline]: 9.72999e-06 [merge_send_recv]: 9.08002e-06 [auto_parallel]: 1.002e-05 [parallel]: 8.01001e-06 [flash_sp]: 3.91999e-06 [merge_comm]: 5.84e-06 [allreduce_fusion]: 4.89003e-06 [matmul_add_comm_reduction]: 9.79999e-06 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 1.212e-05 [virtual_dataset]: 9.33002e-06 [get_grad_eliminate_]: 8.93002e-06 [virtual_output]: 8.68001e-06 [merge_forward]: 6.16e-06 [cell_reuse_recompute_pass]: 1.41998e-06 [offload_activation]: 1.244e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.816e-05 [merge_recompute_call_nodes]: 1.07e-06 [before_grad]: 1.475e-05 [set_forward_comm_id_for_comm_node_pass]: 5.47999e-06 [meta_fg_expand]: 4.835e-05 [flash_sp_send_recv_attached]: 1.47999e-06 [receive_attached]: 1.86e-06 [after_resolve]: 1.675e-05 [a_after_grad]: 1.5e-05 [renormalize]: 0.00074431 [add_forward_monad_depend]: 4.86002e-06 [auto_monad_grad]: 1.95001e-06 [auto_monad_eliminator]: 1.713e-05 [cse]: 5.692e-05 [a_3]: 6.759e-05 [Cycle 3]: 0.00100012, [45] [expand_dump_flag]: 1.22999e-06 [switch_simplify]: 1.096e-05 [loop_unroll]: 9.21998e-06 [a_1]: 0.00026416 [with_stream_mark]: 1.184e-05 [recompute_prepare]: 9.72001e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 4.02002e-06 [updatestate_loads_eliminate]: 7.251e-05 
[parameter_eliminate]: 1.14e-06 [a_2]: 0.00012686 [accelerated_algorithm]: 1.226e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 2.23998e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 7.26999e-06 [auto_parallel]: 7.68001e-06 [parallel]: 4.68001e-06 [flash_sp]: 1.23002e-06 [merge_comm]: 4.97e-06 [allreduce_fusion]: 4.93001e-06 [matmul_add_comm_reduction]: 7.87e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.041e-05 [virtual_dataset]: 8.90001e-06 [get_grad_eliminate_]: 8.42998e-06 [virtual_output]: 8.47998e-06 [merge_forward]: 4.32e-06 [cell_reuse_recompute_pass]: 1.50001e-06 [offload_activation]: 8.77e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.642e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.438e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49e-06 [meta_fg_expand]: 3.36001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.41e-05 [a_after_grad]: 1.472e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.112e-05 [cse]: 2.654e-05 [a_3]: 5.798e-05 [py_interpret_to_execute_after_opt_a]: 1.494e-05 [slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 5.36e-05 [convert_after_rewriter]: 9.81e-06 [order_py_execute_after_rewriter]: 6.93e-06 [mutable_eliminate]: 0.00054891 [opt_b]: 0.00029342, [1] [Cycle 1]: 0.00028642, [7] [b_1]: 0.00019179 [b_2]: 1.106e-05 [updatestate_depend_eliminate]: 7.28999e-06 [updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.59985e-07 [cse]: 3.308e-05 [optimize_parallel_all_gather_comm]: 2.11e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.297e-05 [loop_unroll]: 0.00042999 [opt_after_cconv]: 0.00013796, [1] [Cycle 1]: 0.00013207, [7] [c_1]: 4.955e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 7.19001e-06 [updatestate_assign_eliminate]: 4.33001e-06 
[updatestate_loads_eliminate]: 4.04002e-06 [cse]: 3.033e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 3.325e-05 [tuple_transform]: 0.00010433, [1] [Cycle 1]: 9.96e-05, [4] [d_1]: 6.858e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 1.039e-05 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 6.03e-05 [cse_after_recomputation]: 3.299e-05, [1] [Cycle 1]: 2.786e-05, [1] [cse]: 2.254e-05 [environ_conv]: 8.55999e-06 [swap_dp_allreduce_reducescatter]: 7.68999e-06 [bias_add_comm_swap]: 2.99999e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.24999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.54e-06 [interleave_parallel_branches]: 1.14e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.808e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 5.22e-06 [overlap_recompute_and_grad_model_parallel]: 5.96e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.64001e-06 [overlap_grad_ring_attention]: 5.09e-06 [overlap_grad_flash_sp]: 2.708e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 1.84998e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 0.00010011, [1] [Cycle 1]: 9.565e-05, [6] [build]: 9.82001e-06 [elim_shapecalc]: 1.359e-05 [elim_not_effective]: 1.837e-05 [opt_reshape]: 1.043e-05 [fold_const_symbol]: 1.492e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 2.16998e-06 [pipeline_parallel_scheduler]: 1.77001e-06 [auto_monad_reorder]: 2.508e-05 [get_jit_bprop_graph]: 2.26e-06 [rewriter_after_jit_bprop_graph]: 4e-06 [opt_after_jit_grad]: 0.00047796 [validate]: 4.673e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.008746 [execute]: 8.87e-06 Sums bootstrap : 0.000537s : 1.59% type_inference : 0.010754s : 31.81% event_method : 0.000044s : 0.13% auto_monad : 0.000118s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000050s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000132s : 0.39% optimize.opt_a.expand_dump_flag : 0.000008s : 0.02% optimize.opt_a.switch_simplify : 0.000126s : 0.37% optimize.opt_a.loop_unroll : 0.000109s : 0.32% optimize.opt_a.a_1 : 0.003231s : 9.56% optimize.opt_a.with_stream_mark : 0.000061s : 0.18% optimize.opt_a.recompute_prepare : 0.000048s : 0.14% optimize.opt_a.updatestate_depend_eliminate : 0.000021s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000084s : 0.25% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000510s : 1.51% optimize.opt_a.accelerated_algorithm : 0.000061s : 0.18% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.03% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000033s : 0.10% optimize.opt_a.auto_parallel : 0.000030s : 0.09% optimize.opt_a.parallel : 0.000032s : 0.10% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm : 0.000021s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000048s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000041s : 0.12% optimize.opt_a.virtual_dataset : 0.000035s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000020s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000040s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001601s : 4.74% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000092s : 0.27% optimize.opt_a.a_after_grad : 0.000112s : 0.33% optimize.opt_a.renormalize : 0.003359s : 9.94% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000086s : 0.25% optimize.opt_a.cse : 0.000253s : 0.75% optimize.opt_a.a_3 : 0.000481s : 1.42% optimize.py_interpret_to_execute_after_opt_a : 0.000015s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000054s : 0.16% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000549s : 1.62% optimize.opt_b.b_1 : 0.000192s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.07% optimize.loop_unroll : 0.000430s : 1.27% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000033s : 0.10% optimize.tuple_transform.d_1 : 0.000069s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000060s : 0.18% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000027s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000478s : 1.41% validate : 0.000047s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008746s : 25.87% execute : 0.000009s : 0.03% Time group info: ------[substitution.] 
0.000800 218 6.31% : 0.000051s : 11: substitution.arithmetic_simplify 1.88% : 0.000015s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.27% : 0.000442s : 16: substitution.inline 2.09% : 0.000017s : 2: substitution.inline_without_move 1.32% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.20% : 0.000018s : 3: substitution.less_batch_normalization 1.78% : 0.000014s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.71% : 0.000014s : 20: substitution.remove_not_recompute_node 3.26% : 0.000026s : 10: substitution.replace_applicator 1.44% : 0.000012s : 15: substitution.replace_old_param 0.32% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.52% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.71% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000020s : 11: substitution.tuple_list_get_item_depend_reorder 8.27% : 0.000066s : 28: substitution.tuple_list_get_item_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010677 2 87.05% : 0.009294s : 1: type_inference.infer 12.95% : 0.001383s : 1: type_inference.specialize ------[replace.] 0.000211 30 59.01% : 0.000125s : 16: replace.inline 40.99% : 0.000087s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000466 30 93.09% : 0.000433s : 16: match.inline 6.91% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000754 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.03% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.15% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.65% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.54% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000042s : 244: predicate.inline 1.34% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.73% : 0.000006s : 32: predicate.less_batch_normalization 1.71% 
: 0.000013s : 97: predicate.list_to_tuple_eliminator_ 2.60% : 0.000020s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.49% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 97: predicate.partial_defer_inline 1.69% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 67: predicate.reduce_eliminate 2.60% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.05% : 0.000008s : 67: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.43% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.70% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 97: predicate.switch_defer_inline 2.88% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000037s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.05% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.52% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.60% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.18% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.62% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.59% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001584 32 57.75% : 0.000915s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.25% : 0.000669s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.064120 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.92% : 0.003153s : 1: add_attr 4.90% : 0.003144s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000065s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000125s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000565s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000006s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.87% : 0.000558s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.71% : 0.004946s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000177s : 28: opt.transform.opt_b 0.12% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.21% : 0.011675s : 1: opt_a 0.22% : 0.000141s : 1: opt_after_cconv 0.76% : 0.000488s : 1: opt_after_jit_grad 0.46% : 0.000297s : 1: opt_b 21.94% : 0.014068s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000030s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.03% : 0.000019s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000038s : 1: remove_dup_value 2.88% : 0.001848s : 2: renormalize.infer 2.33% : 0.001495s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000058s : 1: rewriter_after_opt_a 0.21% : 0.000137s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000103s : 1: symbol_engine_optimizer 13.67% : 0.008763s : 1: task_emit 0.17% : 0.000107s : 1: 
tuple_transform 16.80% : 0.010774s : 1: type_inference 0.13% : 0.000087s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x2-kbk],max_mem:54.0M TotalTime = 0.121553, [24] [bootstrap]: 0.00061338 [type_inference]: 0.00631466 [event_method]: 1.434e-05 [auto_monad]: 5.376e-05 [graph_reusing]: 5.46998e-06 [inline]: 1.83997e-06 [add_attr]: 0.00346624, [1] [add_attr_with_inline]: 0.0034555, [1] [Cycle 1]: 4.409e-05, [2] [tag_attr]: 1.532e-05 [meta_addattr_fg_expand]: 4.21001e-06 [parallel-infer-symbol]: 3.00002e-06 [pre_auto_parallel]: 2.836e-05 [insert-virtual-dataset]: 2.38002e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00407851, [53] [py_interpret_to_execute]: 2.116e-05 [rewriter_before_opt_a]: 5.933e-05 [opt_a]: 0.00216832, [2] [Cycle 1]: 0.00155657, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 3.362e-05 [loop_unroll]: 2.213e-05 [a_1]: 0.00046633 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 3.46999e-06 [updatestate_loads_eliminate]: 2.67001e-06 [parameter_eliminate]: 2.33998e-06 [a_2]: 7.848e-05 [accelerated_algorithm]: 6.44999e-06 [shard]: 2.55002e-06 [meta_shard_fg_expand]: 1.97001e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.64e-06 [auto_parallel]: 6.14001e-06 [parallel]: 2.33e-05 [flash_sp]: 7.19001e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 9.61e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.26999e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 6.19001e-06 [virtual_output]: 5.97999e-06 [merge_forward]: 4.13999e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.115e-05 
[merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 2.44001e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.16e-06 [after_resolve]: 1.038e-05 [a_after_grad]: 8.82999e-06 [renormalize]: 0.00043233 [add_forward_monad_depend]: 4.46002e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 2.585e-05 [a_3]: 4.279e-05 [Cycle 2]: 0.00060222, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 7.05002e-06 [loop_unroll]: 5.72001e-06 [a_1]: 0.00012891 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 3.04999e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 8.99978e-07 [a_2]: 6.927e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.59998e-06 [flash_sp]: 2.91999e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.59998e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.15001e-06 [merge_forward]: 2.58998e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.53001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.87999e-06 [a_after_grad]: 8.42e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.36e-06 [cse]: 1.212e-05 [a_3]: 3.368e-05 [py_interpret_to_execute_after_opt_a]: 7.52998e-06 
[slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.246e-05 [convert_after_rewriter]: 6.74001e-06 [order_py_execute_after_rewriter]: 5.12999e-06 [mutable_eliminate]: 0.000458 [opt_b]: 0.00018745, [1] [Cycle 1]: 0.00018135, [7] [b_1]: 0.00011273 [b_2]: 7.5e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.7998e-07 [cse]: 1.614e-05 [optimize_parallel_all_gather_comm]: 1.556e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.288e-05 [loop_unroll]: 0.00042078 [opt_after_cconv]: 9.734e-05, [1] [Cycle 1]: 9.132e-05, [7] [c_1]: 2.875e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.54e-05 [renormalize]: 1.80007e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 6.973e-05, [1] [Cycle 1]: 6.538e-05, [4] [d_1]: 3.925e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.49999e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 8.31e-05 [cse_after_recomputation]: 2.181e-05, [1] [Cycle 1]: 1.718e-05, [1] [cse]: 1.178e-05 [environ_conv]: 4.25e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 3.28e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.66999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.5999e-07 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.16997e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 
[control_data_broadcast_order]: 1.171e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.37998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44998e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 1.712e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 7.034e-05, [1] [Cycle 1]: 6.633e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.82e-06 [elim_not_effective]: 1.219e-05 [opt_reshape]: 6.58e-06 [fold_const_symbol]: 9.20999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.604e-05 [get_jit_bprop_graph]: 1.18001e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00045896 [validate]: 3.167e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.106224 [execute]: 1.028e-05 Sums bootstrap : 0.000613s : 0.52% type_inference : 0.006315s : 5.39% event_method : 0.000014s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.03% optimize.opt_a.loop_unroll : 0.000028s : 0.02% optimize.opt_a.a_1 : 0.000595s : 0.51% optimize.opt_a.with_stream_mark : 
0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000148s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000432s : 0.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000038s : 0.03% optimize.opt_a.a_3 : 0.000076s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000458s : 0.39% optimize.opt_b.b_1 : 0.000113s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.02% optimize.loop_unroll : 0.000421s : 0.36% optimize.opt_after_cconv.c_1 : 0.000029s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000083s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.01% optimize.environ_conv : 0.000004s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000459s : 0.39% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.106224s : 90.72% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000169 30 14.10% : 0.000024s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.33% : 0.000006s : 4: substitution.graph_param_transform 67.61% : 0.000114s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.39% : 0.000004s : 4: substitution.remove_not_recompute_node 2.50% : 0.000004s : 4: substitution.replace_old_param 6.61% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006265 2 90.30% : 0.005657s : 1: type_inference.infer 9.70% : 0.000608s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.82% : 0.000029s : 3: replace.inline 29.18% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 91.72% : 0.000112s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000165 1131 0.98% : 0.000002s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.26% : 0.000004s : 32: predicate.load_eliminater 0.89% : 0.000001s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.95% : 0.000002s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.45% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.15% : 0.000008s : 54: predicate.switch_simplify 0.92% : 
0.000002s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 46.92% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.08% : 0.000188s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.130660 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.66% : 0.003471s : 1: add_attr 2.65% : 0.003459s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000088s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.50% : 0.000651s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000025s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.75% : 0.000975s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000094s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000034s : 4: opt.transform.symbol_engine_opt 1.66% : 0.002171s : 1: opt_a 0.08% : 0.000101s : 1: opt_after_cconv 0.36% : 0.000469s : 1: opt_after_jit_grad 0.15% : 0.000191s : 1: opt_b 3.12% : 0.004083s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.17% : 0.000223s : 1: renormalize.infer 0.15% : 0.000202s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000073s : 1: symbol_engine_optimizer 81.31% : 0.106244s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.84% : 0.006329s : 1: type_inference 0.04% : 0.000057s : 1: validate TotalTime = 0.113511, [24] [bootstrap]: 0.00048215 [type_inference]: 0.00446072 [event_method]: 1.065e-05 [auto_monad]: 5.224e-05 [graph_reusing]: 4.82e-06 [inline]: 2.23998e-06 [add_attr]: 0.00305897, [1] [add_attr_with_inline]: 0.00304997, [1] [Cycle 1]: 4.766e-05, [2] [tag_attr]: 1.312e-05 [meta_addattr_fg_expand]: 3.47997e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 2.393e-05 [insert-virtual-dataset]: 2.66999e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00388765, [53] [py_interpret_to_execute]: 1.608e-05 [rewriter_before_opt_a]: 4.129e-05 [opt_a]: 0.00199392, [2] [Cycle 1]: 0.00134217, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 2.376e-05 [loop_unroll]: 1.403e-05 [a_1]: 0.00029866 [with_stream_mark]: 1.455e-05 [recompute_prepare]: 7.51999e-06 [updatestate_depend_eliminate]: 3.85998e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 2.40002e-06 [a_2]: 7.608e-05 [accelerated_algorithm]: 6.23998e-06 [shard]: 3.33e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 8.33999e-06 [auto_parallel]: 5.81998e-06 [parallel]: 1.782e-05 [flash_sp]: 7.68001e-06 [merge_comm]: 3.62998e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.50001e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 5.89999e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 4.06001e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.098e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.62999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47002e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 2.52001e-06 
[receive_attached]: 2.27001e-06 [after_resolve]: 1.024e-05 [a_after_grad]: 8.67e-06 [renormalize]: 0.00041333 [add_forward_monad_depend]: 5.10001e-06 [auto_monad_grad]: 2.14999e-06 [auto_monad_eliminator]: 1.366e-05 [cse]: 2.86e-05 [a_3]: 4.116e-05 [Cycle 2]: 0.00064141, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012593 [with_stream_mark]: 1.154e-05 [recompute_prepare]: 6.17999e-06 [updatestate_depend_eliminate]: 3.06999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.75997e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.844e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.32999e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 6.01e-06 [parallel]: 4.74e-06 [flash_sp]: 3.45998e-06 [merge_comm]: 3.15998e-06 [allreduce_fusion]: 2.68003e-06 [matmul_add_comm_reduction]: 5.64e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.39998e-06 [get_grad_eliminate_]: 5.09998e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 3.00002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.28998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33998e-06 [meta_fg_expand]: 1.96e-06 [flash_sp_send_recv_attached]: 1.27e-06 [receive_attached]: 1.05999e-06 [after_resolve]: 9.72999e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 7e-06 [cse]: 1.425e-05 [a_3]: 3.294e-05 [py_interpret_to_execute_after_opt_a]: 8.70999e-06 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 3.253e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00049152 [opt_b]: 0.00018619, [1] [Cycle 1]: 
0.00017963, [7] [b_1]: 0.00010997 [b_2]: 7.19001e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 6.69999e-07 [cse]: 1.701e-05 [optimize_parallel_all_gather_comm]: 1.595e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.42e-05 [loop_unroll]: 0.000428 [opt_after_cconv]: 9.911e-05, [1] [Cycle 1]: 9.336e-05, [7] [c_1]: 2.969e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.57999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.587e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 7.012e-05, [1] [Cycle 1]: 6.584e-05, [4] [d_1]: 3.952e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.33e-05 [cse_after_recomputation]: 2.104e-05, [1] [Cycle 1]: 1.632e-05, [1] [cse]: 1.104e-05 [environ_conv]: 4.64002e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.53998e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.64999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.807e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.80001e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 1.22e-06 [symbol_engine_optimizer]: 7.092e-05, [1] [Cycle 1]: 6.637e-05, [6] [build]: 2.71e-06 [elim_shapecalc]: 9.12001e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 6.46e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 1.626e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.0004805 [validate]: 3.731e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.100745 [execute]: 9.57001e-06 Sums bootstrap : 0.000482s : 0.44% type_inference : 0.004461s : 4.08% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000041s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000425s : 0.39% optimize.opt_a.with_stream_mark : 0.000026s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000413s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.02% optimize.opt_a.cse : 0.000043s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000492s : 0.45% optimize.opt_b.b_1 : 0.000110s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.02% optimize.loop_unroll : 0.000428s : 0.39% optimize.opt_after_cconv.c_1 : 0.000030s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000481s : 0.44% validate : 0.000037s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.100745s : 92.07% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000127 26 18.65% : 0.000024s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 4.28% : 0.000005s : 4: substitution.graph_param_transform 66.21% : 0.000084s : 2: substitution.inline 2.08% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.34% : 0.000004s : 4: substitution.remove_not_recompute_node 3.13% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004417 2 91.51% : 0.004042s : 1: type_inference.infer 8.49% : 0.000375s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000083 2 100.00% : 0.000083s : 2: match.inline ------[predicate.] 
0.000140 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.68% : 0.000004s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.56% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.67% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.76% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.93% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.89% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.79% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.50% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 1.02% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.52% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000276 6 42.29% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.71% : 0.000159s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.121803 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.52% : 0.003064s : 1: add_attr 2.51% : 0.003054s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000518s : 1: bootstrap 0.02% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000437s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.41% : 0.000501s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.64% : 0.000776s : 78: opt.transform.opt_a 0.02% : 0.000028s : 1: opt.transform.opt_after_cconv 0.02% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000092s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 1.64% : 0.001997s : 1: opt_a 0.08% : 0.000103s : 1: opt_after_cconv 0.40% : 0.000491s : 1: opt_after_jit_grad 0.16% : 0.000190s : 1: opt_b 3.20% : 0.003892s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.19% : 0.000232s : 1: renormalize.infer 0.14% : 0.000174s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000046s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000074s : 1: symbol_engine_optimizer 82.73% : 0.100768s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 3.67% : 0.004475s : 1: type_inference 0.05% : 0.000063s : 1: validate TotalTime = 0.110306, [24] [bootstrap]: 0.00045869 [type_inference]: 0.0055703 [event_method]: 1.413e-05 [auto_monad]: 5.495e-05 [graph_reusing]: 5.59e-06 [inline]: 1.59998e-06 [add_attr]: 0.00295662, [1] [add_attr_with_inline]: 0.00294828, [1] [Cycle 1]: 4.569e-05, [2] [tag_attr]: 1.547e-05 [meta_addattr_fg_expand]: 4.32e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.532e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.03002e-06 [pipeline_split]: 1.71998e-06 [optimize]: 0.00401581, [53] [py_interpret_to_execute]: 2.15e-05 [rewriter_before_opt_a]: 5.933e-05 [opt_a]: 0.00212884, [2] [Cycle 1]: 0.00151664, [45] [expand_dump_flag]: 2.81999e-06 [switch_simplify]: 3.186e-05 [loop_unroll]: 2.083e-05 [a_1]: 0.00045067 [with_stream_mark]: 1.266e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.631e-05 [accelerated_algorithm]: 6.57002e-06 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 5.76998e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 6.58e-06 [parallel]: 1.638e-05 [flash_sp]: 7.20003e-06 [merge_comm]: 3.96001e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 8.3e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 7.34002e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.74e-06 [merge_forward]: 4.19002e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.36998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.133e-05 [merge_recompute_call_nodes]: 1.29e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.75e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.35002e-06 [receive_attached]: 2.14999e-06 
[after_resolve]: 1.028e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00042411 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.312e-05 [cse]: 2.759e-05 [a_3]: 4.223e-05 [Cycle 2]: 0.00060216, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012549 [with_stream_mark]: 1.002e-05 [recompute_prepare]: 6.07999e-06 [updatestate_depend_eliminate]: 3.07002e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.26e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.838e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.19998e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.02998e-06 [flash_sp]: 3.8e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 5.09998e-06 [allreduce_slice_to_reducescatter]: 2.10013e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.35999e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.04003e-06 [merge_forward]: 2.41e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.43002e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.6e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 8.05e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.369e-05 [a_3]: 3.195e-05 [py_interpret_to_execute_after_opt_a]: 7.71999e-06 [slice_cell_reuse_recomputed_activation]: 2.22001e-06 [rewriter_after_opt_a]: 3.318e-05 [convert_after_rewriter]: 7.07997e-06 [order_py_execute_after_rewriter]: 5.15999e-06 [mutable_eliminate]: 0.00045295 [opt_b]: 0.00018089, [1] [Cycle 1]: 0.00017474, [7] [b_1]: 
0.0001069 [b_2]: 6.95998e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.09986e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.608e-05 [overlap_param_gather]: 1.72001e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00046153 [opt_after_cconv]: 9.653e-05, [1] [Cycle 1]: 9.098e-05, [7] [c_1]: 2.838e-05 [parameter_eliminate]: 2.61e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.68e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.873e-05, [1] [Cycle 1]: 6.421e-05, [4] [d_1]: 3.817e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.31e-05 [cse_after_recomputation]: 2.024e-05, [1] [Cycle 1]: 1.58e-05, [1] [cse]: 1.055e-05 [environ_conv]: 5.00001e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.27003e-06 [label_fine_grained_interleaved_index]: 2.55002e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.50001e-06 [ForceFp32Comm]: 1.04003e-06 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.136e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.62998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.48002e-06 [overlap_recompute_comm]: 1.89999e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.743e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 6.721e-05, [1] [Cycle 1]: 6.331e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 8.19002e-06 [elim_not_effective]: 1.116e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.82999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.51998e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.83001e-06 [opt_after_jit_grad]: 0.00045982 [validate]: 3.184e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0964586 [execute]: 9.49e-06 Sums bootstrap : 0.000459s : 0.43% type_inference : 0.005570s : 5.24% event_method : 0.000014s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000576s : 0.54% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000020s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000424s : 0.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000041s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000453s : 0.43% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000462s : 0.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000460s : 0.43% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096459s : 90.68% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.97% : 0.000025s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 67.05% : 0.000111s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000004s : 4: substitution.remove_not_recompute_node 2.34% : 0.000004s : 4: substitution.replace_old_param 6.49% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005529 2 89.89% : 0.004970s : 1: type_inference.infer 10.11% : 0.000559s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.84% : 0.000027s : 3: replace.inline 29.16% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.82% : 0.000109s : 3: match.inline 8.18% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.38% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.89% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.99% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000004s : 16: predicate.float_depend_g_call 0.54% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.25% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.53% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 11: predicate.reduce_eliminate 2.25% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.93% : 
0.000002s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.74% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000361 8 46.81% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.19% : 0.000192s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118795 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.49% : 0.002961s : 1: add_attr 2.49% : 0.002952s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000494s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.40% : 0.000470s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.79% : 0.000943s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.79% : 0.002132s : 1: opt_a 0.08% : 0.000100s : 1: opt_after_cconv 0.40% : 0.000469s : 1: opt_after_jit_grad 0.16% : 0.000184s : 1: opt_b 3.38% : 0.004020s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.18% : 0.000210s : 1: renormalize.infer 0.17% : 0.000207s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 81.22% : 0.096481s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.70% : 0.005584s : 1: type_inference 0.04% : 0.000053s : 1: validate TotalTime = 0.14364, [24] [bootstrap]: 0.00051665 [type_inference]: 0.0114834 [event_method]: 4.955e-05 [auto_monad]: 0.00012039 [graph_reusing]: 7.9e-06 [inline]: 2.24001e-06 [add_attr]: 0.00305669, [1] [add_attr_with_inline]: 0.00304817, [1] [Cycle 1]: 0.00010743, [2] [tag_attr]: 7.122e-05 [meta_addattr_fg_expand]: 9.34e-06 [parallel-infer-symbol]: 3.86001e-06 [pre_auto_parallel]: 5.004e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.0135469, [53] [py_interpret_to_execute]: 3.874e-05 [rewriter_before_opt_a]: 0.00015008 [opt_a]: 0.0112083, [3] [Cycle 1]: 0.00720344, [45] [expand_dump_flag]: 3.48999e-06 [switch_simplify]: 7.396e-05 [loop_unroll]: 6.288e-05 [a_1]: 0.00146809 [with_stream_mark]: 2.358e-05 [recompute_prepare]: 2.179e-05 [updatestate_depend_eliminate]: 9.40001e-06 [updatestate_assign_eliminate]: 7.95e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 2.42001e-06 [a_2]: 0.00024516 [accelerated_algorithm]: 3.154e-05 [shard]: 2.02001e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.663e-05 [merge_send_recv]: 1.594e-05 [auto_parallel]: 1.081e-05 [parallel]: 1.865e-05 [flash_sp]: 1.179e-05 [merge_comm]: 9.59999e-06 [allreduce_fusion]: 8.79003e-06 [matmul_add_comm_reduction]: 2.596e-05 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 1.77e-05 [virtual_dataset]: 1.589e-05 [get_grad_eliminate_]: 1.527e-05 [virtual_output]: 1.531e-05 [merge_forward]: 9.02999e-06 [cell_reuse_recompute_pass]: 1.64e-06 [offload_activation]: 1.772e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.878e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 2.701e-05 [set_forward_comm_id_for_comm_node_pass]: 9.51998e-06 [meta_fg_expand]: 0.00143609 [flash_sp_send_recv_attached]: 3.75e-06 [receive_attached]: 2.58998e-06 [after_resolve]: 
5.971e-05 [a_after_grad]: 8.208e-05 [renormalize]: 0.00252189 [add_forward_monad_depend]: 8.69e-06 [auto_monad_grad]: 5.89e-06 [auto_monad_eliminator]: 5.632e-05 [cse]: 0.00017113 [a_3]: 0.00033618 [Cycle 2]: 0.00307483, [45] [expand_dump_flag]: 1.69e-06 [switch_simplify]: 4.794e-05 [loop_unroll]: 4.444e-05 [a_1]: 0.00156046 [with_stream_mark]: 1.332e-05 [recompute_prepare]: 1.133e-05 [updatestate_depend_eliminate]: 5.46e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012635 [accelerated_algorithm]: 1.234e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 2.10002e-06 [shard_inline]: 9.33002e-06 [merge_send_recv]: 6.63e-06 [auto_parallel]: 7.55998e-06 [parallel]: 4.82e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 4.71997e-06 [matmul_add_comm_reduction]: 7.80998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 8.87999e-06 [get_grad_eliminate_]: 9.20999e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 8.40024e-07 [offload_activation]: 9.18002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.652e-05 [merge_recompute_call_nodes]: 1.00001e-06 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.74999e-06 [meta_fg_expand]: 7.219e-05 [flash_sp_send_recv_attached]: 1.19e-06 [receive_attached]: 1.15001e-06 [after_resolve]: 1.692e-05 [a_after_grad]: 1.5e-05 [renormalize]: 0.00061915 [add_forward_monad_depend]: 4.18999e-06 [auto_monad_grad]: 1.21002e-06 [auto_monad_eliminator]: 1.561e-05 [cse]: 4.816e-05 [a_3]: 6.558e-05 [Cycle 3]: 0.00091568, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 1.101e-05 [loop_unroll]: 9.25999e-06 [a_1]: 0.00025308 [with_stream_mark]: 1.057e-05 [recompute_prepare]: 9.51998e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 3.86999e-06 
[parameter_eliminate]: 9.10019e-07 [a_2]: 0.00012479 [accelerated_algorithm]: 1.182e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 9.08002e-06 [merge_send_recv]: 7.08e-06 [auto_parallel]: 7.32002e-06 [parallel]: 4.43001e-06 [flash_sp]: 1.14998e-06 [merge_comm]: 4.92e-06 [allreduce_fusion]: 5.22e-06 [matmul_add_comm_reduction]: 7.56999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.016e-05 [virtual_dataset]: 8.75001e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.31002e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.35001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.728e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.475e-05 [set_forward_comm_id_for_comm_node_pass]: 5.82001e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.329e-05 [a_after_grad]: 1.447e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 1.089e-05 [cse]: 2.751e-05 [a_3]: 6.024e-05 [py_interpret_to_execute_after_opt_a]: 1.169e-05 [slice_cell_reuse_recomputed_activation]: 2.35002e-06 [rewriter_after_opt_a]: 4.814e-05 [convert_after_rewriter]: 9.07001e-06 [order_py_execute_after_rewriter]: 7.13e-06 [mutable_eliminate]: 0.00050528 [opt_b]: 0.00029353, [1] [Cycle 1]: 0.00028671, [7] [b_1]: 0.00019101 [b_2]: 1.148e-05 [updatestate_depend_eliminate]: 7.19001e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 4.18999e-06 [renormalize]: 8.39995e-07 [cse]: 3.309e-05 [optimize_parallel_all_gather_comm]: 2.047e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 1.953e-05 [loop_unroll]: 0.00042512 [opt_after_cconv]: 0.00013961, [1] [Cycle 1]: 0.00013372, [7] [c_1]: 4.934e-05 [parameter_eliminate]: 2.56998e-06 [updatestate_depend_eliminate]: 7.36001e-06 [updatestate_assign_eliminate]: 4.21001e-06 
[updatestate_loads_eliminate]: 3.90998e-06 [cse]: 3.165e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 2.973e-05 [tuple_transform]: 0.00010209, [1] [Cycle 1]: 9.74e-05, [4] [d_1]: 6.681e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 9.89001e-06 [partial_unused_args_eliminate]: 1.55999e-06 [add_recomputation]: 5.802e-05 [cse_after_recomputation]: 3.309e-05, [1] [Cycle 1]: 2.841e-05, [1] [cse]: 2.295e-05 [environ_conv]: 8.64e-06 [swap_dp_allreduce_reducescatter]: 8.07998e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 4.60001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.91999e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.14003e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.69e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 5.26002e-06 [overlap_recompute_and_grad_model_parallel]: 5.44e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34003e-06 [overlap_recompute_comm]: 2.18998e-06 [overlap_grad_ring_attention]: 5.03002e-06 [overlap_grad_flash_sp]: 2.635e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 0.00010185, [1] [Cycle 1]: 9.74e-05, [6] [build]: 9.94999e-06 [elim_shapecalc]: 1.38e-05 [elim_not_effective]: 1.852e-05 [opt_reshape]: 1.074e-05 [fold_const_symbol]: 1.471e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 2.581e-05 [get_jit_bprop_graph]: 1.22e-06 [rewriter_after_jit_bprop_graph]: 3.71001e-06 [opt_after_jit_grad]: 0.00048469 [validate]: 4.87e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.113993 [execute]: 9.69999e-06 Sums bootstrap : 0.000517s : 0.37% type_inference : 0.011483s : 8.24% event_method : 0.000050s : 0.04% auto_monad : 0.000120s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000071s : 0.05% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000050s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.03% optimize.rewriter_before_opt_a : 0.000150s : 0.11% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000133s : 0.10% optimize.opt_a.loop_unroll : 0.000117s : 0.08% optimize.opt_a.a_1 : 0.003282s : 2.36% optimize.opt_a.with_stream_mark : 0.000047s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.36% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 
0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000017s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001511s : 1.08% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.06% optimize.opt_a.a_after_grad : 0.000112s : 0.08% optimize.opt_a.renormalize : 0.003141s : 2.25% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000247s : 0.18% optimize.opt_a.a_3 : 0.000462s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000505s : 0.36% optimize.opt_b.b_1 : 0.000191s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000425s : 0.31% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000026s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000485s : 0.35% validate : 0.000049s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.113993s : 81.81% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000791 222 5.68% : 0.000045s : 12: substitution.arithmetic_simplify 1.78% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.96% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.99% : 0.000451s : 17: substitution.inline 2.02% : 0.000016s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000016s : 3: substitution.less_batch_normalization 1.71% : 0.000014s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.08% : 0.000024s : 10: substitution.replace_applicator 1.27% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.52% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.71% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.38% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011408 2 86.35% : 0.009850s : 1: type_inference.infer 13.65% : 0.001558s : 1: type_inference.specialize ------[replace.] 0.000223 33 58.32% : 0.000130s : 17: replace.inline 41.68% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000476 33 92.71% : 0.000442s : 17: match.inline 7.29% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000753 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.15% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000018s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.68% : 0.000043s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.60% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000015s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.07% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.67% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001612 34 56.68% : 0.000914s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.32% : 0.000698s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168668 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.81% : 0.003061s : 1: add_attr 1.81% : 0.003052s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000128s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.33% : 0.000553s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000058s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.26% : 0.000434s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.30% : 0.000514s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.94% : 0.004957s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000176s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.65% : 0.011211s : 1: opt_a 0.08% : 0.000143s : 1: opt_after_cconv 0.29% : 0.000495s : 1: opt_after_jit_grad 0.18% : 0.000297s : 1: opt_b 8.03% : 0.013551s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000030s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.00% : 0.001687s : 2: renormalize.infer 0.85% : 0.001441s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000155s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000105s : 1: symbol_engine_optimizer 67.60% : 0.114014s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.82% : 0.011500s : 1: type_inference 0.04% : 0.000076s : 1: validate TotalTime = 0.104414, [24] [bootstrap]: 0.00047636 [type_inference]: 0.00440345 [event_method]: 1.082e-05 [auto_monad]: 4.979e-05 [graph_reusing]: 4.97999e-06 [inline]: 2.29001e-06 [add_attr]: 0.00296185, [1] [add_attr_with_inline]: 0.00295375, [1] [Cycle 1]: 4.558e-05, [2] [tag_attr]: 1.168e-05 [meta_addattr_fg_expand]: 3.48e-06 [parallel-infer-symbol]: 3.13e-06 [pre_auto_parallel]: 2.016e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.00371965, [53] [py_interpret_to_execute]: 1.504e-05 [rewriter_before_opt_a]: 3.907e-05 [opt_a]: 0.00190076, [2] [Cycle 1]: 0.00129516, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 2.456e-05 [loop_unroll]: 1.35e-05 [a_1]: 0.00029356 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 7.47002e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.35998e-06 [updatestate_loads_eliminate]: 3.35e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.747e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.18002e-06 [merge_send_recv]: 8.06001e-06 [auto_parallel]: 5.85002e-06 [parallel]: 1.721e-05 [flash_sp]: 7.26999e-06 [merge_comm]: 4.18999e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.04998e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 5.69999e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.87001e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 9.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.132e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.66999e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.102e-05 [a_after_grad]: 9.20001e-06 [renormalize]: 0.00035137 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.337e-05 [cse]: 2.607e-05 [a_3]: 3.996e-05 [Cycle 2]: 0.00059627, [45] [expand_dump_flag]: 8.10018e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012565 [with_stream_mark]: 9.28002e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.851e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.26997e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.19998e-06 [parallel]: 4.45e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 3.06999e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.07001e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 2.76999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.05001e-06 [a_after_grad]: 7.9e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.294e-05 [a_3]: 3.354e-05 [py_interpret_to_execute_after_opt_a]: 7.15003e-06 [slice_cell_reuse_recomputed_activation]: 1.95001e-06 [rewriter_after_opt_a]: 3.135e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.0004528 [opt_b]: 0.00018818, [1] [Cycle 1]: 0.00018195, [7] 
[b_1]: 0.00011308 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 4.19997e-07 [cse]: 1.659e-05 [optimize_parallel_all_gather_comm]: 1.498e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.212e-05 [loop_unroll]: 0.00041622 [opt_after_cconv]: 9.505e-05, [1] [Cycle 1]: 8.947e-05, [7] [c_1]: 2.768e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.664e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 6.875e-05, [1] [Cycle 1]: 6.449e-05, [4] [d_1]: 3.827e-05 [none_parameter_eliminate]: 1.76998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.25002e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.409e-05 [cse_after_recomputation]: 2.08e-05, [1] [Cycle 1]: 1.641e-05, [1] [cse]: 1.073e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 4.70001e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.05998e-06 [label_fine_grained_interleaved_index]: 2.43e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.21998e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.09989e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50001e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.46002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.709e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.927e-05, [1] [Cycle 1]: 6.513e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.37998e-06 [elim_not_effective]: 1.195e-05 [opt_reshape]: 6.21998e-06 [fold_const_symbol]: 8.95999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.582e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.78001e-06 [opt_after_jit_grad]: 0.00045117 [validate]: 3.236e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0920277 [execute]: 9.82001e-06 Sums bootstrap : 0.000476s : 0.47% type_inference : 0.004403s : 4.38% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000020s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000419s : 0.42% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000351s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.45% optimize.opt_b.b_1 : 0.000113s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000416s : 0.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000451s : 0.45% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.092028s : 91.62% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000121 26 17.60% : 0.000021s : 4: substitution.arithmetic_simplify 1.75% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.25% : 0.000005s : 4: substitution.graph_param_transform 65.66% : 0.000080s : 2: substitution.inline 2.15% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.83% : 0.000005s : 4: substitution.remove_not_recompute_node 3.67% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004363 2 90.83% : 0.003963s : 1: type_inference.infer 9.17% : 0.000400s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.95% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.22% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.96% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.87% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.19% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.94% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000292 6 35.20% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 64.80% : 0.000189s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.112371 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.64% : 0.002967s : 1: add_attr 2.63% : 0.002957s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.45% : 0.000511s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000772s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.69% : 0.001904s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.41% : 0.000461s : 1: opt_after_jit_grad 0.17% : 0.000192s : 1: opt_b 3.31% : 0.003723s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000190s : 1: renormalize.infer 0.14% : 0.000155s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 81.92% : 0.092051s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.93% : 0.004417s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.139383, [24] [bootstrap]: 0.00050564 [type_inference]: 0.0102431 [event_method]: 4.268e-05 [auto_monad]: 0.00011635 [graph_reusing]: 7.84002e-06 [inline]: 1.92001e-06 [add_attr]: 0.00301935, [1] [add_attr_with_inline]: 0.00301055, [1] [Cycle 1]: 6.836e-05, [2] [tag_attr]: 3.2e-05 [meta_addattr_fg_expand]: 8.35001e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 4.641e-05 [insert-virtual-dataset]: 2.78e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.0131487, [53] [py_interpret_to_execute]: 3.601e-05 [rewriter_before_opt_a]: 0.00012855 [opt_a]: 0.0108962, [3] [Cycle 1]: 0.00698853, [45] [expand_dump_flag]: 3.76999e-06 [switch_simplify]: 6.61e-05 [loop_unroll]: 5.491e-05 [a_1]: 0.00133381 [with_stream_mark]: 2.392e-05 [recompute_prepare]: 2.097e-05 [updatestate_depend_eliminate]: 9.22999e-06 [updatestate_assign_eliminate]: 7.63001e-06 [updatestate_loads_eliminate]: 7.24001e-06 [parameter_eliminate]: 2.56998e-06 [a_2]: 0.00024389 [accelerated_algorithm]: 3.093e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.27997e-06 [shard_inline]: 1.606e-05 [merge_send_recv]: 1.635e-05 [auto_parallel]: 1.091e-05 [parallel]: 1.824e-05 [flash_sp]: 1.122e-05 [merge_comm]: 9.76998e-06 [allreduce_fusion]: 9.22999e-06 [matmul_add_comm_reduction]: 2.529e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.879e-05 [virtual_dataset]: 1.596e-05 [get_grad_eliminate_]: 1.534e-05 [virtual_output]: 1.521e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.00999e-06 [offload_activation]: 1.737e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.845e-05 [merge_recompute_call_nodes]: 1.75001e-06 [before_grad]: 2.784e-05 [set_forward_comm_id_for_comm_node_pass]: 9.77001e-06 [meta_fg_expand]: 0.00143108 [flash_sp_send_recv_attached]: 3.74002e-06 [receive_attached]: 2.16998e-06 
[after_resolve]: 5.866e-05 [a_after_grad]: 8.094e-05 [renormalize]: 0.00245974 [add_forward_monad_depend]: 9.29998e-06 [auto_monad_grad]: 5.54998e-06 [auto_monad_eliminator]: 5.675e-05 [cse]: 0.00017365 [a_3]: 0.00034125 [Cycle 2]: 0.00296974, [45] [expand_dump_flag]: 1.59e-06 [switch_simplify]: 4.736e-05 [loop_unroll]: 4.365e-05 [a_1]: 0.00153426 [with_stream_mark]: 1.18e-05 [recompute_prepare]: 1.079e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 4.05998e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012641 [accelerated_algorithm]: 1.216e-05 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.59e-06 [merge_send_recv]: 6.74001e-06 [auto_parallel]: 7.31001e-06 [parallel]: 4.83001e-06 [flash_sp]: 3.38e-06 [merge_comm]: 5.69e-06 [allreduce_fusion]: 5.44e-06 [matmul_add_comm_reduction]: 7.53e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.034e-05 [virtual_dataset]: 9.10001e-06 [get_grad_eliminate_]: 8.80001e-06 [virtual_output]: 8.64998e-06 [merge_forward]: 4.74e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 8.85001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.6e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.436e-05 [set_forward_comm_id_for_comm_node_pass]: 5.42999e-06 [meta_fg_expand]: 3.442e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.529e-05 [a_after_grad]: 1.444e-05 [renormalize]: 0.00058961 [add_forward_monad_depend]: 3.71999e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.456e-05 [cse]: 4.678e-05 [a_3]: 6.62e-05 [Cycle 3]: 0.00092375, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.086e-05 [loop_unroll]: 9.02e-06 [a_1]: 0.0002501 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 9.40001e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 4.16001e-06 [updatestate_loads_eliminate]: 3.9e-06 
[parameter_eliminate]: 1.15001e-06 [a_2]: 0.00012465 [accelerated_algorithm]: 1.177e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 9.02e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.38e-06 [parallel]: 4.74002e-06 [flash_sp]: 1.07e-06 [merge_comm]: 5.00999e-06 [allreduce_fusion]: 5.17e-06 [matmul_add_comm_reduction]: 7.77e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.99001e-06 [virtual_dataset]: 8.85999e-06 [get_grad_eliminate_]: 8.65001e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.35999e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.57998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.576e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.376e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.02002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.321e-05 [a_after_grad]: 1.445e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 1.13e-05 [cse]: 2.726e-05 [a_3]: 5.931e-05 [py_interpret_to_execute_after_opt_a]: 1.046e-05 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 4.763e-05 [convert_after_rewriter]: 8.98002e-06 [order_py_execute_after_rewriter]: 7.28999e-06 [mutable_eliminate]: 0.00046193 [opt_b]: 0.00028697, [1] [Cycle 1]: 0.00028107, [7] [b_1]: 0.00018874 [b_2]: 1.085e-05 [updatestate_depend_eliminate]: 7.3e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 3.69997e-07 [cse]: 3.181e-05 [optimize_parallel_all_gather_comm]: 2.016e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.046e-05 [loop_unroll]: 0.00042703 [opt_after_cconv]: 0.00013702, [1] [Cycle 1]: 0.00013112, [7] [c_1]: 4.852e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 7.23e-06 [updatestate_assign_eliminate]: 4.32998e-06 
[updatestate_loads_eliminate]: 3.81001e-06 [cse]: 3.088e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 2.886e-05 [tuple_transform]: 0.0001021, [1] [Cycle 1]: 9.739e-05, [4] [d_1]: 6.691e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.87999e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.784e-05 [cse_after_recomputation]: 3.249e-05, [1] [Cycle 1]: 2.785e-05, [1] [cse]: 2.233e-05 [environ_conv]: 9.08002e-06 [swap_dp_allreduce_reducescatter]: 8.02998e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.29997e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.34999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.661e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.80001e-06 [overlap_recompute_and_grad_model_parallel]: 6.12001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36998e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 5.15001e-06 [overlap_grad_flash_sp]: 2.487e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.29999e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.795e-05, [1] [Cycle 1]: 9.373e-05, [6] [build]: 9.08002e-06 [elim_shapecalc]: 1.36e-05 [elim_not_effective]: 1.81e-05 [opt_reshape]: 1.02e-05 [fold_const_symbol]: 1.469e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.73002e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.447e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.65998e-06 [opt_after_jit_grad]: 0.00046663 [validate]: 4.622e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.111468 [execute]: 8.45001e-06 Sums bootstrap : 0.000506s : 0.37% type_inference : 0.010243s : 7.58% event_method : 0.000043s : 0.03% auto_monad : 0.000116s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000129s : 0.10% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000124s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.08% optimize.opt_a.a_1 : 0.003118s : 2.31% optimize.opt_a.with_stream_mark : 0.000046s : 0.03% optimize.opt_a.recompute_prepare : 0.000041s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.37% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001469s : 1.09% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.06% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003049s : 2.26% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000248s : 0.18% optimize.opt_a.a_3 : 0.000467s : 0.35% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000462s : 0.34% optimize.opt_b.b_1 : 0.000189s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000427s : 0.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.35% validate : 0.000046s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.111468s : 82.52% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000732 218 5.91% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.96% : 0.000403s : 16: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.42% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.06% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.83% : 0.000013s : 20: substitution.remove_not_recompute_node 3.23% : 0.000024s : 10: substitution.replace_applicator 1.35% : 0.000010s : 15: substitution.replace_old_param 0.34% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.74% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.49% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010174 2 86.99% : 0.008850s : 1: type_inference.infer 13.01% : 0.001324s : 1: type_inference.specialize ------[replace.] 0.000203 30 58.20% : 0.000118s : 16: replace.inline 41.80% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 92.71% : 0.000394s : 16: match.inline 7.29% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.21% : 0.000009s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.73% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.67% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.61% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001539 32 57.14% : 0.000879s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.86% : 0.000660s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.163695 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.85% : 0.003024s : 1: add_attr 1.84% : 0.003015s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.33% : 0.000540s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000050s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.27% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.29% : 0.000471s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.92% : 0.004776s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000175s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.66% : 0.010899s : 1: opt_a 0.09% : 0.000141s : 1: opt_after_cconv 0.29% : 0.000476s : 1: opt_after_jit_grad 0.18% : 0.000291s : 1: opt_b 8.03% : 0.013152s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.96% : 0.001579s : 2: renormalize.infer 0.89% : 0.001456s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000052s : 1: rewriter_after_opt_a 0.08% : 0.000133s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 68.11% : 0.111490s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.27% : 0.010258s : 1: type_inference 0.04% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x2-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x3-pynative],max_mem:56.0M TotalTime = 0.0215611, [24] [bootstrap]: 0.00054861 [type_inference]: 0.00615182 [event_method]: 1.443e-05 [auto_monad]: 5.638e-05 [graph_reusing]: 5.27001e-06 [inline]: 1.77001e-06 [add_attr]: 0.00340861, [1] [add_attr_with_inline]: 0.0033984, [1] [Cycle 1]: 4.57e-05, [2] [tag_attr]: 1.624e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 2.711e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.91003e-06 [optimize]: 0.004016, [53] [py_interpret_to_execute]: 1.955e-05 [rewriter_before_opt_a]: 5.938e-05 [opt_a]: 0.0021737, [2] [Cycle 1]: 0.00157351, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 3.172e-05 [loop_unroll]: 2.124e-05 [a_1]: 0.00050961 [with_stream_mark]: 1.321e-05 [recompute_prepare]: 7.87998e-06 [updatestate_depend_eliminate]: 4.13999e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 3.03998e-06 [parameter_eliminate]: 1.80001e-06 [a_2]: 7.463e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.17999e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 6.12999e-06 [parallel]: 2.18e-05 [flash_sp]: 6.98e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 9.22001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.84001e-06 [virtual_dataset]: 5.87999e-06 
[get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.68997e-06 [merge_forward]: 3.43e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.04003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.49999e-06 [after_resolve]: 1.095e-05 [a_after_grad]: 9.08002e-06 [renormalize]: 0.0004201 [add_forward_monad_depend]: 4.59998e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.389e-05 [cse]: 2.549e-05 [a_3]: 4.104e-05 [Cycle 2]: 0.00059087, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.0001258 [with_stream_mark]: 9.39998e-06 [recompute_prepare]: 5.67999e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.756e-05 [accelerated_algorithm]: 5.37001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.13001e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.12e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.40002e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.16998e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.49977e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 8.00999e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.325e-05 [a_3]: 3.177e-05 [py_interpret_to_execute_after_opt_a]: 7.82e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 2.826e-05 [convert_after_rewriter]: 7.70998e-06 [order_py_execute_after_rewriter]: 5.49e-06 [mutable_eliminate]: 0.00045268 [opt_b]: 0.00018215, [1] [Cycle 1]: 0.00017628, [7] [b_1]: 0.000108 [b_2]: 7.06999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.87002e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 5.20027e-07 [cse]: 1.647e-05 [optimize_parallel_all_gather_comm]: 1.639e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.22e-05 [loop_unroll]: 0.00041702 [opt_after_cconv]: 9.447e-05, [1] [Cycle 1]: 8.873e-05, [7] [c_1]: 2.787e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.615e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.286e-05 [tuple_transform]: 6.825e-05, [1] [Cycle 1]: 6.414e-05, [4] [d_1]: 3.905e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 5.97999e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.62e-05, [1] [cse]: 1.084e-05 [environ_conv]: 4.74998e-06 [swap_dp_allreduce_reducescatter]: 5.41998e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.39003e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.36998e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.35002e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 
8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.01002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.167e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.18999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.66e-06 [overlap_recompute_comm]: 1.89999e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 1.28002e-06 [symbol_engine_optimizer]: 6.792e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.67e-06 [elim_not_effective]: 1.124e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.84e-06 [renormalize]: 2.19996e-07 [detach_backward]: 2.02999e-06 [pipeline_parallel_scheduler]: 1.71002e-06 [auto_monad_reorder]: 1.525e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 0.00012897 [opt_after_jit_grad]: 0.00047072 [validate]: 3.108e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00646536 [execute]: 6.80002e-06 Sums bootstrap : 0.000549s : 3.19% type_inference : 0.006152s : 35.78% event_method : 0.000014s : 0.08% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000059s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000635s : 3.70% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000420s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.63% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000417s : 2.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000129s : 0.75% opt_after_jit_grad : 0.000471s : 2.74% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.006465s : 37.61% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000223 30 37.39% : 0.000083s : 5: substitution.arithmetic_simplify 0.83% : 0.000002s : 2: substitution.elim_not_effective 0.55% : 0.000001s : 2: substitution.fold_const_symbol 2.44% : 0.000005s : 4: substitution.graph_param_transform 49.07% : 0.000109s : 3: substitution.inline 1.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 1.81% : 0.000004s : 4: substitution.remove_not_recompute_node 1.94% : 0.000004s : 4: substitution.replace_old_param 4.72% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006104 2 90.37% : 0.005516s : 1: type_inference.infer 9.63% : 0.000588s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.85% : 0.000026s : 3: replace.inline 31.15% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.90% : 0.000107s : 3: match.inline 8.10% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.84% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.51% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000365 8 46.27% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.73% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030560 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.17% : 0.003413s : 1: add_attr 11.13% : 0.003402s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000586s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.27% : 0.000999s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.12% : 0.002177s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.57% : 0.000481s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.15% : 0.004020s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000215s : 1: renormalize.infer 0.65% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000135s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.19% : 0.006475s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.18% : 0.006165s : 1: type_inference 0.19% : 0.000057s : 1: validate TotalTime = 0.0181598, [24] [bootstrap]: 0.0004801 [type_inference]: 0.00435468 [event_method]: 1.067e-05 [auto_monad]: 5.039e-05 [graph_reusing]: 4.85001e-06 [inline]: 1.93002e-06 [add_attr]: 0.0029865, [1] [add_attr_with_inline]: 0.00297858, [1] [Cycle 1]: 4.617e-05, [2] [tag_attr]: 1.188e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 2.67001e-06 [pre_auto_parallel]: 2.107e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00365498, [53] [py_interpret_to_execute]: 1.58e-05 [rewriter_before_opt_a]: 3.914e-05 [opt_a]: 0.00185772, [2] [Cycle 1]: 0.00126871, [45] [expand_dump_flag]: 2.43e-06 [switch_simplify]: 2.339e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00029287 [with_stream_mark]: 1.291e-05 [recompute_prepare]: 7.61001e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.68e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.51998e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 7.58001e-06 [auto_parallel]: 6.19001e-06 [parallel]: 1.856e-05 [flash_sp]: 6.86999e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.55001e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 6.00002e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 8.57998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33998e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.56e-06 [receive_attached]: 2.42001e-06 
[after_resolve]: 1.04e-05 [a_after_grad]: 8.52e-06 [renormalize]: 0.00035569 [add_forward_monad_depend]: 4.56002e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.73e-05 [a_3]: 3.977e-05 [Cycle 2]: 0.00057961, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.61999e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00011815 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.68998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.663e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 5.82999e-06 [virtual_dataset]: 5.11997e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.78999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.00005e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.16998e-06 [a_after_grad]: 7.99002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.30013e-07 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 5.96e-06 [cse]: 1.257e-05 [a_3]: 3.046e-05 [py_interpret_to_execute_after_opt_a]: 7.23e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 3.023e-05 [convert_after_rewriter]: 6.39999e-06 [order_py_execute_after_rewriter]: 4.90999e-06 [mutable_eliminate]: 0.0004479 [opt_b]: 0.00017924, [1] [Cycle 1]: 0.0001737, [7] [b_1]: 0.00010693 
[b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.35002e-06 [renormalize]: 3.69997e-07 [cse]: 1.564e-05 [optimize_parallel_all_gather_comm]: 1.547e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 2.264e-05 [loop_unroll]: 0.00041596 [opt_after_cconv]: 9.413e-05, [1] [Cycle 1]: 8.836e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.69999e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.521e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.259e-05 [tuple_transform]: 6.852e-05, [1] [Cycle 1]: 6.42e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.283e-05 [cse_after_recomputation]: 1.98e-05, [1] [Cycle 1]: 1.546e-05, [1] [cse]: 1.043e-05 [environ_conv]: 5.19e-06 [swap_dp_allreduce_reducescatter]: 4.93001e-06 [bias_add_comm_swap]: 2.25002e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.14e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.49998e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.34998e-06 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50001e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.90001e-06 [offloading_packed_experts]: 4.02e-06 [overlap_recompute_and_grad_model_parallel]: 4.42998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 1.84998e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.656e-05 [begin_end_overlap_inline]: 7.09988e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.697e-05, [1] [Cycle 1]: 6.277e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 7.92998e-06 [elim_not_effective]: 1.096e-05 [opt_reshape]: 5.94999e-06 [fold_const_symbol]: 8.58001e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.88002e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.493e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.30998e-06 [opt_after_jit_grad]: 0.00044693 [validate]: 3.091e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00588334 [execute]: 7.35e-06 Sums bootstrap : 0.000480s : 3.38% type_inference : 0.004355s : 30.63% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000411s : 2.89% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000356s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000070s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000006s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.15% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000416s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.14% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005883s : 41.38% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 17.92% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.28% : 0.000005s : 4: substitution.graph_param_transform 65.83% : 0.000080s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.78% : 0.000005s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004315 2 92.09% : 0.003974s : 1: type_inference.infer 7.91% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.71% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.88% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.26% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.64% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 42.87% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.13% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026073 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.47% : 0.002991s : 1: add_attr 11.44% : 0.002982s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.98% : 0.000515s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000009s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.91% : 0.000758s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000030s : 4: opt.transform.symbol_engine_opt 7.14% : 0.001861s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.75% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.03% : 0.003659s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.02% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000190s : 1: renormalize.infer 0.61% : 0.000159s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.60% : 0.005894s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.76% : 0.004369s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0197043, [24] [bootstrap]: 0.00053488 [type_inference]: 0.00554508 [event_method]: 1.399e-05 [auto_monad]: 5.392e-05 [graph_reusing]: 5.36998e-06 [inline]: 1.86998e-06 [add_attr]: 0.00303851, [1] [add_attr_with_inline]: 0.00303045, [1] [Cycle 1]: 4.498e-05, [2] [tag_attr]: 1.556e-05 [meta_addattr_fg_expand]: 4.10998e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.473e-05 [insert-virtual-dataset]: 2.71999e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.75001e-06 [optimize]: 0.00394026, [53] [py_interpret_to_execute]: 1.981e-05 [rewriter_before_opt_a]: 5.692e-05 [opt_a]: 0.00208469, [2] [Cycle 1]: 0.00148291, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.196e-05 [loop_unroll]: 2.047e-05 [a_1]: 0.0004479 [with_stream_mark]: 1.38e-05 [recompute_prepare]: 7.36001e-06 [updatestate_depend_eliminate]: 3.7e-06 [updatestate_assign_eliminate]: 3.05002e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.615e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 7.78999e-06 [auto_parallel]: 5.56e-06 [parallel]: 1.639e-05 [flash_sp]: 6.58003e-06 [merge_comm]: 3.48999e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 5.90002e-06 [merge_forward]: 3.75998e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 8.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 2.43e-06 
[after_resolve]: 1.074e-05 [a_after_grad]: 8.93002e-06 [renormalize]: 0.00040342 [add_forward_monad_depend]: 4.40999e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.643e-05 [a_3]: 4e-05 [Cycle 2]: 0.00059232, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012652 [with_stream_mark]: 9.10001e-06 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.66999e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.41998e-06 [parameter_eliminate]: 9.40025e-07 [a_2]: 6.773e-05 [accelerated_algorithm]: 5.53002e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.11002e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.18e-06 [merge_comm]: 2.76e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 5.90002e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 4.85999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.34999e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 2.92002e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.24001e-06 [cse]: 1.615e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 7.18998e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.067e-05 [convert_after_rewriter]: 7.03e-06 [order_py_execute_after_rewriter]: 5.10001e-06 [mutable_eliminate]: 0.0004513 [opt_b]: 0.00018364, [1] [Cycle 1]: 0.00017759, [7] 
[b_1]: 0.00010915 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.16998e-06 [renormalize]: 4.70027e-07 [cse]: 1.678e-05 [optimize_parallel_all_gather_comm]: 1.583e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.324e-05 [loop_unroll]: 0.00043909 [opt_after_cconv]: 9.508e-05, [1] [Cycle 1]: 8.898e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.45002e-06 [cse]: 1.547e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.178e-05 [tuple_transform]: 6.943e-05, [1] [Cycle 1]: 6.488e-05, [4] [d_1]: 3.934e-05 [none_parameter_eliminate]: 1.40001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.21998e-06 [partial_unused_args_eliminate]: 2.07001e-06 [add_recomputation]: 4.202e-05 [cse_after_recomputation]: 2.022e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.082e-05 [environ_conv]: 4.90001e-06 [swap_dp_allreduce_reducescatter]: 5.49e-06 [bias_add_comm_swap]: 2.78998e-06 [label_micro_interleaved_index]: 3.86999e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.27e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.00001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.92001e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.27e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.675e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.718e-05, [1] [Cycle 1]: 6.309e-05, [6] [build]: 1.99e-06 [elim_shapecalc]: 8.33001e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.48002e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 1.483e-05 [get_jit_bprop_graph]: 9.30013e-07 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00044931 [validate]: 3.068e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00583616 [execute]: 6.93998e-06 Sums bootstrap : 0.000535s : 3.40% type_inference : 0.005545s : 35.28% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000057s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000574s : 3.65% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000404s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.87% optimize.opt_b.b_1 : 0.000109s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000439s : 2.79% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.86% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005836s : 37.13% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 15.42% : 0.000025s : 5: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.43% : 0.000006s : 4: substitution.graph_param_transform 65.43% : 0.000107s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000004s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.78% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005504 2 90.09% : 0.004959s : 1: type_inference.infer 9.91% : 0.000546s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.28% : 0.000027s : 3: replace.inline 29.72% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.29% : 0.000105s : 3: match.inline 8.71% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 1.03% : 0.000002s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.15% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.41% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.45% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.55% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028185 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.80% : 0.003043s : 1: add_attr 10.76% : 0.003034s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.02% : 0.000568s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.59% : 0.000448s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.33% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.41% : 0.002088s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.63% : 0.000459s : 1: opt_after_jit_grad 0.66% : 0.000187s : 1: opt_b 13.99% : 0.003944s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.73% : 0.000206s : 1: renormalize.infer 0.68% : 0.000190s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 20.74% : 0.005846s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.72% : 0.005559s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0373999, [24] [bootstrap]: 0.0005096 [type_inference]: 0.0113513 [event_method]: 4.784e-05 [auto_monad]: 0.00012033 [graph_reusing]: 7.85e-06 [inline]: 2.07001e-06 [add_attr]: 0.0029941, [1] [add_attr_with_inline]: 0.00298495, [1] [Cycle 1]: 6.996e-05, [2] [tag_attr]: 3.468e-05 [meta_addattr_fg_expand]: 9.36002e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 4.853e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.98002e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.013393, [53] [py_interpret_to_execute]: 3.758e-05 [rewriter_before_opt_a]: 0.00014439 [opt_a]: 0.0110997, [3] [Cycle 1]: 0.00715407, [45] [expand_dump_flag]: 4.19002e-06 [switch_simplify]: 7.296e-05 [loop_unroll]: 6.091e-05 [a_1]: 0.00150023 [with_stream_mark]: 2.337e-05 [recompute_prepare]: 2.157e-05 [updatestate_depend_eliminate]: 9.02e-06 [updatestate_assign_eliminate]: 8.2e-06 [updatestate_loads_eliminate]: 7.26001e-06 [parameter_eliminate]: 2.76e-06 [a_2]: 0.00024713 [accelerated_algorithm]: 3.218e-05 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 3.26001e-06 [shard_inline]: 1.601e-05 [merge_send_recv]: 1.61e-05 [auto_parallel]: 1.066e-05 [parallel]: 1.887e-05 [flash_sp]: 1.142e-05 [merge_comm]: 9.69e-06 [allreduce_fusion]: 8.82e-06 [matmul_add_comm_reduction]: 2.62e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.863e-05 [virtual_dataset]: 1.614e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.629e-05 [merge_forward]: 1.003e-05 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 1.804e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.847e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 2.683e-05 [set_forward_comm_id_for_comm_node_pass]: 9.75002e-06 [meta_fg_expand]: 0.00147627 [flash_sp_send_recv_attached]: 3.93999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 
6.22e-05 [a_after_grad]: 8.277e-05 [renormalize]: 0.00240074 [add_forward_monad_depend]: 9.15001e-06 [auto_monad_grad]: 4.87e-06 [auto_monad_eliminator]: 5.52e-05 [cse]: 0.00016504 [a_3]: 0.00033362 [Cycle 2]: 0.00303362, [45] [expand_dump_flag]: 1.40001e-06 [switch_simplify]: 4.727e-05 [loop_unroll]: 4.363e-05 [a_1]: 0.00157207 [with_stream_mark]: 1.202e-05 [recompute_prepare]: 1.115e-05 [updatestate_depend_eliminate]: 5.01997e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 3.66001e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012581 [accelerated_algorithm]: 1.191e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.3e-06 [parallel]: 4.67e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 5.15999e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.51001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.013e-05 [virtual_dataset]: 8.65001e-06 [get_grad_eliminate_]: 9.15001e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 9.79984e-07 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.575e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.364e-05 [set_forward_comm_id_for_comm_node_pass]: 5.09e-06 [meta_fg_expand]: 7.018e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.612e-05 [a_after_grad]: 1.436e-05 [renormalize]: 0.00058512 [add_forward_monad_depend]: 4e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.446e-05 [cse]: 4.53e-05 [a_3]: 6.513e-05 [Cycle 3]: 0.00089796, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 1.043e-05 [loop_unroll]: 8.70999e-06 [a_1]: 0.00024818 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 9.22001e-06 [updatestate_depend_eliminate]: 4.77998e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 3.9e-06 [parameter_eliminate]: 
9.80013e-07 [a_2]: 0.00012275 [accelerated_algorithm]: 1.15e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 8.96002e-06 [merge_send_recv]: 6.88998e-06 [auto_parallel]: 7.15e-06 [parallel]: 4.67e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.98001e-06 [allreduce_fusion]: 4.96002e-06 [matmul_add_comm_reduction]: 7.88001e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 9.97001e-06 [virtual_dataset]: 8.66997e-06 [get_grad_eliminate_]: 8.64998e-06 [virtual_output]: 8.33999e-06 [merge_forward]: 4.33999e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 8.67998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.566e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.05999e-06 [meta_fg_expand]: 2.89001e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.349e-05 [a_after_grad]: 1.393e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 1.133e-05 [cse]: 2.607e-05 [a_3]: 5.977e-05 [py_interpret_to_execute_after_opt_a]: 9.67999e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 4.744e-05 [convert_after_rewriter]: 9.58002e-06 [order_py_execute_after_rewriter]: 7.05002e-06 [mutable_eliminate]: 0.00045802 [opt_b]: 0.00028899, [1] [Cycle 1]: 0.00028246, [7] [b_1]: 0.00019006 [b_2]: 1.096e-05 [updatestate_depend_eliminate]: 7.14001e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 3.83001e-06 [renormalize]: 3.9002e-07 [cse]: 3.092e-05 [optimize_parallel_all_gather_comm]: 2.044e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 1.973e-05 [loop_unroll]: 0.00045706 [opt_after_cconv]: 0.00013747, [1] [Cycle 1]: 0.00013144, [7] [c_1]: 4.99e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 
4.03999e-06 [cse]: 2.933e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 2.833e-05 [tuple_transform]: 0.00010178, [1] [Cycle 1]: 9.717e-05, [4] [d_1]: 6.648e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.004e-05 [partial_unused_args_eliminate]: 2.01998e-06 [add_recomputation]: 5.646e-05 [cse_after_recomputation]: 3.146e-05, [1] [Cycle 1]: 2.683e-05, [1] [cse]: 2.135e-05 [environ_conv]: 9.01998e-06 [swap_dp_allreduce_reducescatter]: 7.71999e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.26997e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 9.30013e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.686e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 4.88001e-06 [overlap_recompute_and_grad_model_parallel]: 5.72001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 5.17999e-06 [overlap_grad_flash_sp]: 2.36e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.90001e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 0.0001001, [1] [Cycle 1]: 9.578e-05, [6] [build]: 9.71998e-06 [elim_shapecalc]: 1.35e-05 [elim_not_effective]: 1.828e-05 [opt_reshape]: 1.036e-05 [fold_const_symbol]: 1.528e-05 [renormalize]: 2.50002e-07 [detach_backward]: 
1.92001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.469e-05 [get_jit_bprop_graph]: 1.09998e-06 [rewriter_after_jit_bprop_graph]: 3.68999e-06 [opt_after_jit_grad]: 0.00047071 [validate]: 4.354e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00815201 [execute]: 6.82002e-06 Sums bootstrap : 0.000510s : 1.54% type_inference : 0.011351s : 34.25% event_method : 0.000048s : 0.14% auto_monad : 0.000120s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.39% optimize.opt_a.loop_unroll : 0.000113s : 0.34% optimize.opt_a.a_1 : 0.003320s : 10.02% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000054s : 0.16% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001549s : 4.67% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000092s : 0.28% optimize.opt_a.a_after_grad : 0.000111s : 0.34% optimize.opt_a.renormalize : 0.002986s : 9.01% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000236s : 0.71% optimize.opt_a.a_3 : 0.000459s : 1.38% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.38% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000457s : 1.38% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000471s : 1.42% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008152s : 24.60% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000769 222 5.83% : 0.000045s : 12: substitution.arithmetic_simplify 1.68% : 0.000013s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.31% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 56.35% : 0.000433s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.26% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.66% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.70% : 0.000013s : 20: substitution.remove_not_recompute_node 3.08% : 0.000024s : 10: substitution.replace_applicator 1.47% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.54% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.41% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011247 2 87.10% : 0.009796s : 1: type_inference.infer 12.90% : 0.001451s : 1: type_inference.specialize ------[replace.] 0.000220 33 58.00% : 0.000128s : 17: replace.inline 42.00% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000458 33 92.60% : 0.000424s : 17: match.inline 7.40% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000782 5764 1.05% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.02% : 0.000008s : 68: predicate.addn_zero_filter 1.01% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.96% : 0.000015s : 100: predicate.arithmetic_simplify 1.11% : 0.000009s : 68: predicate.cast_eliminate 1.10% : 0.000009s : 68: predicate.check_bprop_eliminate 0.50% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.09% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000014s : 108: predicate.environ_get_eliminate 1.16% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 101: predicate.exchange_switch_depend_value 5.63% : 0.000044s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.62% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.38% : 0.000042s : 249: predicate.inline 1.25% : 0.000010s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.59% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.05% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.07% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.94% : 0.000015s : 101: predicate.partial_defer_inline 1.67% : 0.000013s : 92: predicate.partial_eliminate 1.02% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000010s : 68: predicate.reduce_eliminate 2.59% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.29% : 0.000002s : 32: predicate.remove_not_recompute_node 1.79% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.04% : 0.000008s : 68: predicate.reshape_eliminate 1.09% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.20% : 0.000009s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.59% : 0.000005s : 32: predicate.specialize_transform 1.20% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000014s : 101: predicate.switch_defer_inline 2.87% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.82% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.38% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.94% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.55% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.20% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000005s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000005s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001561 34 57.56% : 0.000898s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.44% : 0.000662s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062091 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.83% : 0.002998s : 1: add_attr 4.81% : 0.002989s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000128s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000545s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.75% : 0.000466s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.03% : 0.004987s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.88% : 0.011103s : 1: opt_a 0.23% : 0.000141s : 1: opt_after_cconv 0.77% : 0.000480s : 1: opt_after_jit_grad 0.47% : 0.000293s : 1: opt_b 21.58% : 0.013397s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.60% : 0.001617s : 2: renormalize.infer 2.18% : 0.001356s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.15% : 0.008162s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.31% : 0.011367s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0185616, [24] [bootstrap]: 0.00047703 [type_inference]: 0.00432483 [event_method]: 1.069e-05 [auto_monad]: 5.065e-05 [graph_reusing]: 5.02e-06 [inline]: 2.48e-06 [add_attr]: 0.00293752, [1] [add_attr_with_inline]: 0.00292964, [1] [Cycle 1]: 4.657e-05, [2] [tag_attr]: 1.169e-05 [meta_addattr_fg_expand]: 3.13e-06 [parallel-infer-symbol]: 2.91999e-06 [pre_auto_parallel]: 2.104e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.41e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00367667, [53] [py_interpret_to_execute]: 1.538e-05 [rewriter_before_opt_a]: 3.829e-05 [opt_a]: 0.00187513, [2] [Cycle 1]: 0.0012719, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 2.373e-05 [loop_unroll]: 1.354e-05 [a_1]: 0.00028981 [with_stream_mark]: 1.362e-05 [recompute_prepare]: 7e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 3.11001e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.58e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 2.14e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.45998e-06 [auto_parallel]: 5.89e-06 [parallel]: 1.749e-05 [flash_sp]: 6.96999e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 5.09986e-07 [virtual_shard_identity]: 6.86001e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.79002e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 9.08002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.078e-05 [merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.06998e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.017e-05 
[a_after_grad]: 8.62998e-06 [renormalize]: 0.00037057 [add_forward_monad_depend]: 4.31002e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.326e-05 [cse]: 2.553e-05 [a_3]: 3.988e-05 [Cycle 2]: 0.00059403, [45] [expand_dump_flag]: 7.2e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00012514 [with_stream_mark]: 9.37999e-06 [recompute_prepare]: 5.97999e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.823e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.78997e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.03001e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.63003e-06 [cell_reuse_recompute_pass]: 1.24003e-06 [offload_activation]: 5.93002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.10999e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.79984e-07 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 6.00002e-06 [cse]: 1.322e-05 [a_3]: 3.158e-05 [py_interpret_to_execute_after_opt_a]: 7.9e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 3.036e-05 [convert_after_rewriter]: 6.56e-06 [order_py_execute_after_rewriter]: 4.82998e-06 [mutable_eliminate]: 0.00045299 [opt_b]: 0.00017992, [1] [Cycle 1]: 0.00017392, [7] [b_1]: 0.00010822 [b_2]: 6.81001e-06 
[updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 3.39991e-07 [cse]: 1.553e-05 [optimize_parallel_all_gather_comm]: 1.474e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.196e-05 [loop_unroll]: 0.00041614 [opt_after_cconv]: 9.36e-05, [1] [Cycle 1]: 8.805e-05, [7] [c_1]: 2.758e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.538e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.179e-05 [tuple_transform]: 6.844e-05, [1] [Cycle 1]: 6.425e-05, [4] [d_1]: 3.841e-05 [none_parameter_eliminate]: 1.58002e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 2.11998e-06 [add_recomputation]: 4.305e-05 [cse_after_recomputation]: 2.034e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.084e-05 [environ_conv]: 4.67998e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.43002e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.37e-06 [interleave_parallel_branches]: 1.24e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.51998e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.35998e-06 [overlap_recompute_and_grad_model_parallel]: 4.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.46998e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 3.85998e-06 [overlap_grad_flash_sp]: 1.657e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.686e-05, [1] [Cycle 1]: 6.281e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.04002e-06 [elim_not_effective]: 1.118e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.49998e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.46002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.25e-06 [opt_after_jit_grad]: 0.00044684 [validate]: 3.123e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.00634237 [execute]: 6.98e-06 Sums bootstrap : 0.000477s : 3.25% type_inference : 0.004325s : 29.48% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000371s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.49% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 3.09% optimize.opt_b.b_1 : 0.000108s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.84% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.05% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006342s : 43.24% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.83% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.52% : 0.000005s : 4: substitution.graph_param_transform 65.12% : 0.000078s : 2: substitution.inline 2.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.56% : 0.000004s : 4: substitution.remove_not_recompute_node 3.24% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004254 2 91.92% : 0.003911s : 1: type_inference.infer 8.08% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.71% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.83% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.24% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.45% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 1.20% : 0.000002s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.86% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.36% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.17% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 43.53% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.47% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026470 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.12% : 0.002942s : 1: add_attr 11.08% : 0.002933s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000513s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.89% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000020s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001878s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.73% : 0.000457s : 1: opt_after_jit_grad 0.69% : 0.000184s : 1: opt_b 13.90% : 0.003680s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.82% : 0.000218s : 1: renormalize.infer 0.55% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 24.00% : 0.006353s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.39% : 0.004338s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0358652, [24] [bootstrap]: 0.00051007 [type_inference]: 0.0102102 [event_method]: 4.204e-05 [auto_monad]: 0.00011513 [graph_reusing]: 8.45999e-06 [inline]: 2.09e-06 [add_attr]: 0.0030108, [1] [add_attr_with_inline]: 0.0030022, [1] [Cycle 1]: 6.762e-05, [2] [tag_attr]: 3.155e-05 [meta_addattr_fg_expand]: 8.72e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 4.565e-05 [insert-virtual-dataset]: 2.38002e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.0130879, [53] [py_interpret_to_execute]: 3.6e-05 [rewriter_before_opt_a]: 0.00012885 [opt_a]: 0.0108344, [3] [Cycle 1]: 0.00691439, [45] [expand_dump_flag]: 4.07e-06 [switch_simplify]: 6.795e-05 [loop_unroll]: 5.533e-05 [a_1]: 0.00133682 [with_stream_mark]: 2.231e-05 [recompute_prepare]: 2.146e-05 [updatestate_depend_eliminate]: 9.30001e-06 [updatestate_assign_eliminate]: 7.97e-06 [updatestate_loads_eliminate]: 7.78001e-06 [parameter_eliminate]: 2.63e-06 [a_2]: 0.00024372 [accelerated_algorithm]: 3.032e-05 [shard]: 2.46998e-06 [meta_shard_fg_expand]: 3.26001e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.603e-05 [auto_parallel]: 1.091e-05 [parallel]: 1.853e-05 [flash_sp]: 1.074e-05 [merge_comm]: 9.91998e-06 [allreduce_fusion]: 9.07001e-06 [matmul_add_comm_reduction]: 2.597e-05 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 1.853e-05 [virtual_dataset]: 1.594e-05 [get_grad_eliminate_]: 1.574e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.76998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.717e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.896e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.797e-05 [set_forward_comm_id_for_comm_node_pass]: 9.88998e-06 [meta_fg_expand]: 0.0014132 [flash_sp_send_recv_attached]: 3.55998e-06 [receive_attached]: 2.58e-06 [after_resolve]: 
5.937e-05 [a_after_grad]: 8.16e-05 [renormalize]: 0.00237552 [add_forward_monad_depend]: 8.90999e-06 [auto_monad_grad]: 6.04001e-06 [auto_monad_eliminator]: 5.468e-05 [cse]: 0.00016452 [a_3]: 0.00037505 [Cycle 2]: 0.00298905, [45] [expand_dump_flag]: 1.51998e-06 [switch_simplify]: 4.794e-05 [loop_unroll]: 4.417e-05 [a_1]: 0.00156639 [with_stream_mark]: 1.179e-05 [recompute_prepare]: 1.106e-05 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.06002e-06 [a_2]: 0.00012593 [accelerated_algorithm]: 1.198e-05 [shard]: 1.29e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.93e-06 [auto_parallel]: 7.4e-06 [parallel]: 5.02999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 5.05001e-06 [allreduce_fusion]: 4.46002e-06 [matmul_add_comm_reduction]: 7.51999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 8.82999e-06 [get_grad_eliminate_]: 8.97e-06 [virtual_output]: 8.27998e-06 [merge_forward]: 4.21001e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.42999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.657e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.419e-05 [set_forward_comm_id_for_comm_node_pass]: 5.13002e-06 [meta_fg_expand]: 3.496e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.54e-05 [a_after_grad]: 1.464e-05 [renormalize]: 0.00058318 [add_forward_monad_depend]: 3.94002e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.44e-05 [cse]: 4.465e-05 [a_3]: 6.556e-05 [Cycle 3]: 0.00091658, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.081e-05 [loop_unroll]: 8.95001e-06 [a_1]: 0.00025063 [with_stream_mark]: 1.008e-05 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 3.83001e-06 [updatestate_loads_eliminate]: 3.85e-06 
[parameter_eliminate]: 9.20001e-07 [a_2]: 0.00012365 [accelerated_algorithm]: 1.154e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.74001e-06 [auto_parallel]: 7.36999e-06 [parallel]: 4.72e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.93001e-06 [matmul_add_comm_reduction]: 7.46999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 8.99998e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.45999e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 8.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.563e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.384e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12e-06 [meta_fg_expand]: 3.04001e-06 [flash_sp_send_recv_attached]: 8.40024e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.374e-05 [a_after_grad]: 1.503e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 1.096e-05 [cse]: 2.549e-05 [a_3]: 5.73e-05 [py_interpret_to_execute_after_opt_a]: 1.038e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 4.982e-05 [convert_after_rewriter]: 9.19998e-06 [order_py_execute_after_rewriter]: 7.11001e-06 [mutable_eliminate]: 0.00046305 [opt_b]: 0.00028871, [1] [Cycle 1]: 0.00028257, [7] [b_1]: 0.00019052 [b_2]: 1.119e-05 [updatestate_depend_eliminate]: 6.96001e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 4.05998e-06 [renormalize]: 5.00004e-07 [cse]: 3.094e-05 [optimize_parallel_all_gather_comm]: 2e-05 [overlap_param_gather]: 1.78997e-06 [cconv]: 1.977e-05 [loop_unroll]: 0.00042581 [opt_after_cconv]: 0.00013675, [1] [Cycle 1]: 0.00013067, [7] [c_1]: 4.854e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.4e-06 [updatestate_assign_eliminate]: 4.27e-06 
[updatestate_loads_eliminate]: 4.1e-06 [cse]: 2.949e-05 [renormalize]: 5.10016e-07 [remove_dup_value]: 2.795e-05 [tuple_transform]: 0.00010216, [1] [Cycle 1]: 9.732e-05, [4] [d_1]: 6.755e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 9.74999e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 5.785e-05 [cse_after_recomputation]: 3.177e-05, [1] [Cycle 1]: 2.708e-05, [1] [cse]: 2.171e-05 [environ_conv]: 8.90001e-06 [swap_dp_allreduce_reducescatter]: 7.61001e-06 [bias_add_comm_swap]: 2.50002e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.28002e-06 [micro_interleaved_order_control]: 2.91e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 8.10018e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.96001e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 1.20001e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52001e-06 [control_data_broadcast_order]: 1.695e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 5.38002e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 5.31998e-06 [overlap_grad_flash_sp]: 2.4e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.83002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.849e-05, [1] [Cycle 1]: 9.429e-05, [6] [build]: 9.56e-06 [elim_shapecalc]: 1.338e-05 [elim_not_effective]: 1.81e-05 [opt_reshape]: 1.068e-05 [fold_const_symbol]: 1.453e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.489e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00046962 [validate]: 4.428e-05 [backend_pass]: 1.06997e-06 [task_emit]: 0.00805974 [execute]: 7.82e-06 Sums bootstrap : 0.000510s : 1.61% type_inference : 0.010210s : 32.32% event_method : 0.000042s : 0.13% auto_monad : 0.000115s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000129s : 0.41% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000127s : 0.40% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003154s : 9.98% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000005s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.11% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001451s : 4.59% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.28% optimize.opt_a.a_after_grad : 0.000111s : 0.35% optimize.opt_a.renormalize : 0.002959s : 9.37% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000235s : 0.74% optimize.opt_a.a_3 : 0.000498s : 1.58% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000050s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000463s : 1.47% optimize.opt_b.b_1 : 0.000191s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000426s : 1.35% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.49% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008060s : 25.51% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000734 218 5.97% : 0.000044s : 11: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.55% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.89% : 0.000403s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000014s : 3: substitution.less_batch_normalization 1.82% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.37% : 0.000025s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.72% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.90% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.40% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010143 2 87.29% : 0.008854s : 1: type_inference.infer 12.71% : 0.001289s : 1: type_inference.specialize ------[replace.] 0.000203 30 59.38% : 0.000120s : 16: replace.inline 40.62% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 92.89% : 0.000395s : 16: match.inline 7.11% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000782 5663 1.05% : 0.000008s : 67: predicate.accumulaten_eliminater 0.25% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.93% : 0.000015s : 99: predicate.arithmetic_simplify 1.08% : 0.000008s : 67: predicate.cast_eliminate 1.09% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.12% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.14% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.72% : 0.000013s : 107: predicate.environ_get_eliminate 1.14% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.63% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.10% : 0.000016s : 97: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.64% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.52% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.27% : 0.000041s : 244: predicate.inline 1.17% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.61% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000013s : 97: predicate.list_to_tuple_eliminator_ 2.58% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.05% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.07% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.07% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.09% : 0.000009s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.93% : 0.000015s : 97: predicate.partial_defer_inline 1.58% : 0.000012s : 89: predicate.partial_eliminate 1.03% : 0.000008s : 67: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.21% : 0.000009s : 67: predicate.reduce_eliminate 2.56% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.80% : 0.000014s : 149: predicate.replace_applicator 0.59% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.03% : 0.000008s : 67: predicate.reshape_eliminate 1.08% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.22% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 97: predicate.switch_defer_inline 7.59% : 0.000059s : 165: predicate.switch_layer_defer_inline 4.69% : 0.000037s : 265: 
predicate.switch_simplify 1.05% : 0.000008s : 67: predicate.tile_eliminate 1.05% : 0.000008s : 67: predicate.transpose_eliminate 1.39% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.46% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.26% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.57% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.36% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.89% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.53% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.50% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.18% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001467 32 57.30% : 0.000840s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.70% : 0.000626s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060099 237 0.01% : 0.000004s : 1: ForceFp32Comm 5.02% : 0.003015s : 1: add_attr 5.00% : 0.003006s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000122s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.91% : 0.000545s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000049s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.79% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.06% : 0.004846s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000176s : 28: opt.transform.opt_b 0.13% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.03% : 0.010837s : 1: opt_a 0.23% : 0.000140s : 1: opt_after_cconv 0.80% : 0.000479s : 1: opt_after_jit_grad 0.49% : 0.000292s : 1: opt_b 21.78% : 0.013092s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.62% : 0.001577s : 2: renormalize.infer 2.28% : 0.001370s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000054s : 1: rewriter_after_opt_a 0.22% : 0.000133s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.43% : 0.008070s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 17.01% : 0.010225s : 1: type_inference 0.13% : 0.000076s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x3-kbk],max_mem:56.0M . TotalTime = 0.0843186, [24] [bootstrap]: 0.00055831 [type_inference]: 0.00615017 [event_method]: 1.408e-05 [auto_monad]: 5.614e-05 [graph_reusing]: 5.69e-06 [inline]: 1.77001e-06 [add_attr]: 0.00349591, [1] [add_attr_with_inline]: 0.00348511, [1] [Cycle 1]: 4.579e-05, [2] [tag_attr]: 1.489e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.78e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 8.09989e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00418497, [53] [py_interpret_to_execute]: 2.164e-05 [rewriter_before_opt_a]: 5.889e-05 [opt_a]: 0.00230299, [2] [Cycle 1]: 0.00163498, [45] [expand_dump_flag]: 2.60002e-06 [switch_simplify]: 3.277e-05 [loop_unroll]: 2.105e-05 [a_1]: 0.00051513 [with_stream_mark]: 1.508e-05 [recompute_prepare]: 8.37998e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 8.245e-05 [accelerated_algorithm]: 6.64001e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 2.01e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 5.99e-06 [parallel]: 2.356e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.90998e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 7.29982e-07 [virtual_shard_identity]: 7.69002e-06 [virtual_dataset]: 6.84999e-06 [get_grad_eliminate_]: 6.34001e-06 [virtual_output]: 6.29999e-06 [merge_forward]: 4.08001e-06 [cell_reuse_recompute_pass]: 1.45001e-06 [offload_activation]: 9.78002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.164e-05 
[merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 1.088e-05 [set_forward_comm_id_for_comm_node_pass]: 3.97998e-06 [meta_fg_expand]: 2.46998e-06 [flash_sp_send_recv_attached]: 2.38002e-06 [receive_attached]: 2.91e-06 [after_resolve]: 1.225e-05 [a_after_grad]: 1.026e-05 [renormalize]: 0.00043066 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.86998e-06 [auto_monad_eliminator]: 1.354e-05 [cse]: 2.76e-05 [a_3]: 4.605e-05 [Cycle 2]: 0.00065842, [45] [expand_dump_flag]: 1.04998e-06 [switch_simplify]: 7.6e-06 [loop_unroll]: 6.09001e-06 [a_1]: 0.00014304 [with_stream_mark]: 1.049e-05 [recompute_prepare]: 6.26998e-06 [updatestate_depend_eliminate]: 3.17002e-06 [updatestate_assign_eliminate]: 2.63003e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 1.14998e-06 [a_2]: 7.701e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 6.29001e-06 [merge_send_recv]: 4.85001e-06 [auto_parallel]: 5.91e-06 [parallel]: 5.27001e-06 [flash_sp]: 3.97e-06 [merge_comm]: 3.39001e-06 [allreduce_fusion]: 3.00002e-06 [matmul_add_comm_reduction]: 5.44e-06 [allreduce_slice_to_reducescatter]: 2.3999e-07 [virtual_shard_identity]: 7.28999e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.74999e-06 [virtual_output]: 5.54e-06 [merge_forward]: 2.76e-06 [cell_reuse_recompute_pass]: 1.66998e-06 [offload_activation]: 6.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 7.89994e-07 [before_grad]: 8.90001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.84e-06 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.19e-06 [after_resolve]: 1.016e-05 [a_after_grad]: 9.09998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.323e-05 [a_3]: 3.631e-05 [py_interpret_to_execute_after_opt_a]: 8.89e-06 [slice_cell_reuse_recomputed_activation]: 
2.24999e-06 [rewriter_after_opt_a]: 3.418e-05 [convert_after_rewriter]: 8.03001e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00046769 [opt_b]: 0.0001906, [1] [Cycle 1]: 0.00018378, [7] [b_1]: 0.00010664 [b_2]: 1.048e-05 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 2.61999e-06 [renormalize]: 7.50006e-07 [cse]: 1.813e-05 [optimize_parallel_all_gather_comm]: 1.544e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.172e-05 [loop_unroll]: 0.00041615 [opt_after_cconv]: 9.413e-05, [1] [Cycle 1]: 8.858e-05, [7] [c_1]: 2.776e-05 [parameter_eliminate]: 2.13002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.608e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.265e-05 [tuple_transform]: 6.865e-05, [1] [Cycle 1]: 6.444e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 2.12001e-06 [add_recomputation]: 5.028e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.557e-05, [1] [cse]: 1.046e-05 [environ_conv]: 5.27999e-06 [swap_dp_allreduce_reducescatter]: 5.61e-06 [bias_add_comm_swap]: 2.58998e-06 [label_micro_interleaved_index]: 3.99002e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.44998e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.23998e-06 [assign_add_opt]: 1.55999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.34e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96998e-06 [control_data_broadcast_order]: 
1.219e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.94998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.65001e-06 [overlap_recompute_comm]: 1.84e-06 [overlap_grad_ring_attention]: 3.78999e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.845e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.41002e-06 [elim_not_effective]: 1.192e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00047051 [validate]: 3.075e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.0690722 [execute]: 8.42e-06 Sums bootstrap : 0.000558s : 0.70% type_inference : 0.006150s : 7.71% event_method : 0.000014s : 0.02% auto_monad : 0.000056s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000658s : 0.82% optimize.opt_a.with_stream_mark : 0.000026s : 0.03% 
optimize.opt_a.recompute_prepare : 0.000015s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000159s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.02% optimize.opt_a.virtual_dataset : 0.000013s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000022s : 0.03% optimize.opt_a.a_after_grad : 0.000019s : 0.02% optimize.opt_a.renormalize : 0.000431s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.05% optimize.opt_a.a_3 : 0.000082s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000468s : 0.59% optimize.opt_b.b_1 : 0.000107s : 0.13% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000416s : 0.52% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000471s : 0.59% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.069072s : 86.54% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000166 30 15.63% : 0.000026s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 64.27% : 0.000107s : 3: substitution.inline 2.15% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.77% : 0.000005s : 4: substitution.remove_not_recompute_node 2.87% : 0.000005s : 4: substitution.replace_old_param 7.17% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006102 2 91.05% : 0.005555s : 1: type_inference.infer 8.95% : 0.000546s : 1: type_inference.specialize ------[replace.] 0.000040 5 67.99% : 0.000027s : 3: replace.inline 32.01% : 0.000013s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 90.72% : 0.000105s : 3: match.inline 9.28% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000168 1131 0.90% : 0.000002s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.88% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.35% : 0.000004s : 19: predicate.arithmetic_simplify 0.82% : 0.000001s : 11: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 2.00% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.60% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000010s : 51: predicate.inline 0.92% : 0.000002s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.01% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.07% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.71% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000002s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.68% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.34% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.10% : 0.000004s : 24: predicate.switch_layer_defer_inline 5.17% : 0.000009s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 1.05% : 0.000002s : 11: predicate.transpose_eliminate 1.43% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 47.42% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.58% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.093645 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.74% : 0.003500s : 1: add_attr 3.73% : 0.003489s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000061s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000597s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.45% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000477s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.13% : 0.001062s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000093s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.46% : 0.002306s : 1: opt_a 0.10% : 0.000097s : 1: opt_after_cconv 0.51% : 0.000480s : 1: opt_after_jit_grad 0.21% : 0.000194s : 1: opt_b 4.47% : 0.004189s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000218s : 1: renormalize.infer 0.22% : 0.000205s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000039s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 73.78% : 0.069089s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 6.58% : 0.006164s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.0700701, [24] [bootstrap]: 0.00047459 [type_inference]: 0.00439572 [event_method]: 1.137e-05 [auto_monad]: 5.025e-05 [graph_reusing]: 5.31002e-06 [inline]: 2.03997e-06 [add_attr]: 0.00292907, [1] [add_attr_with_inline]: 0.00292119, [1] [Cycle 1]: 4.308e-05, [2] [tag_attr]: 1.206e-05 [meta_addattr_fg_expand]: 3.10998e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 2.073e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.00380838, [53] [py_interpret_to_execute]: 1.517e-05 [rewriter_before_opt_a]: 3.774e-05 [opt_a]: 0.00194555, [2] [Cycle 1]: 0.00131164, [45] [expand_dump_flag]: 3.14001e-06 [switch_simplify]: 2.441e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.0002899 [with_stream_mark]: 1.335e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.34001e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 7.66e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.88998e-06 [merge_send_recv]: 7.54002e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.662e-05 [flash_sp]: 8.03999e-06 [merge_comm]: 3.38999e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.61002e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.086e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.56998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 2.98e-06 [receive_attached]: 3.5e-05 
[after_resolve]: 1.104e-05 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00035934 [add_forward_monad_depend]: 5.35001e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.77e-05 [a_3]: 4.54e-05 [Cycle 2]: 0.00062412, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 7.71001e-06 [loop_unroll]: 6.07999e-06 [a_1]: 0.00014139 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 5.91e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 7.052e-05 [accelerated_algorithm]: 5.43002e-06 [shard]: 1.21002e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.08001e-06 [flash_sp]: 3.2e-06 [merge_comm]: 3.28e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 2.3999e-07 [virtual_shard_identity]: 5.77001e-06 [virtual_dataset]: 5.09e-06 [get_grad_eliminate_]: 4.95001e-06 [virtual_output]: 4.80001e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.24999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.022e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.55001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 8.96998e-06 [a_after_grad]: 8.46002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 5.97001e-06 [cse]: 1.308e-05 [a_3]: 3.595e-05 [py_interpret_to_execute_after_opt_a]: 8.2e-06 [slice_cell_reuse_recomputed_activation]: 2.32999e-06 [rewriter_after_opt_a]: 3.345e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00046578 [opt_b]: 0.00018302, [1] [Cycle 1]: 0.00017679, [7] [b_1]: 
0.00010779 [b_2]: 7.02002e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 3.70026e-07 [cse]: 1.74e-05 [optimize_parallel_all_gather_comm]: 1.619e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.299e-05 [loop_unroll]: 0.0004386 [opt_after_cconv]: 9.79e-05, [1] [Cycle 1]: 9.232e-05, [7] [c_1]: 2.826e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.659e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 7.064e-05, [1] [Cycle 1]: 6.643e-05, [4] [d_1]: 4.04e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.49972e-07 [switch_simplify]: 6.31998e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.367e-05 [cse_after_recomputation]: 2.184e-05, [1] [Cycle 1]: 1.733e-05, [1] [cse]: 1.188e-05 [environ_conv]: 5.02999e-06 [swap_dp_allreduce_reducescatter]: 4.95999e-06 [bias_add_comm_swap]: 2.57001e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.44999e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.00001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.118e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 4.03001e-06 [overlap_recompute_and_grad_model_parallel]: 5.27001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.611e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.922e-05, [1] [Cycle 1]: 6.515e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.12003e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.78001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.506e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.0004429 [validate]: 3.098e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.0576626 [execute]: 8.28999e-06 Sums bootstrap : 0.000475s : 0.72% type_inference : 0.004396s : 6.64% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000431s : 0.65% optimize.opt_a.with_stream_mark : 0.000023s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000036s : 0.05% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000359s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000081s : 0.12% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000466s : 0.70% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000439s : 0.66% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000443s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057663s : 87.15% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.72% : 0.000022s : 4: substitution.arithmetic_simplify 1.61% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000005s : 4: substitution.graph_param_transform 65.05% : 0.000078s : 2: substitution.inline 2.23% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.74% : 0.000004s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004355 2 91.48% : 0.003984s : 1: type_inference.infer 8.52% : 0.000371s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000142 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.77% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.91% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.16% : 0.000002s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.08% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000009s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 1.04% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 9: predicate.reshape_eliminate 0.88% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 1.25% : 0.000002s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.22% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.90% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.98% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 43.76% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.24% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078120 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.76% : 0.002933s : 1: add_attr 3.74% : 0.002925s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000508s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.57% : 0.000448s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.61% : 0.000475s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.02% : 0.000793s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.49% : 0.001949s : 1: opt_a 0.13% : 0.000101s : 1: opt_after_cconv 0.58% : 0.000452s : 1: opt_after_jit_grad 0.24% : 0.000187s : 1: opt_b 4.88% : 0.003812s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000192s : 1: renormalize.infer 0.21% : 0.000161s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 73.83% : 0.057679s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.64% : 0.004410s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.0730895, [24] [bootstrap]: 0.00046671 [type_inference]: 0.00571327 [event_method]: 1.436e-05 [auto_monad]: 5.896e-05 [graph_reusing]: 5.29e-06 [inline]: 2.41e-06 [add_attr]: 0.00298466, [1] [add_attr_with_inline]: 0.00297636, [1] [Cycle 1]: 4.687e-05, [2] [tag_attr]: 1.514e-05 [meta_addattr_fg_expand]: 4.45e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.496e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.00396657, [53] [py_interpret_to_execute]: 2.23e-05 [rewriter_before_opt_a]: 6.13e-05 [opt_a]: 0.00213269, [2] [Cycle 1]: 0.00152646, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 3.395e-05 [loop_unroll]: 2.113e-05 [a_1]: 0.00047443 [with_stream_mark]: 1.372e-05 [recompute_prepare]: 8e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.516e-05 [accelerated_algorithm]: 6.49999e-06 [shard]: 2.04999e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 7.65998e-06 [auto_parallel]: 6.17999e-06 [parallel]: 1.706e-05 [flash_sp]: 7e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.24e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 6.21e-06 [get_grad_eliminate_]: 5.89999e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 8.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.84998e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.48e-06 [after_resolve]: 
1.041e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 0.0004068 [add_forward_monad_depend]: 4.79e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.81e-05 [a_3]: 4.093e-05 [Cycle 2]: 0.00059704, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.99001e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012473 [with_stream_mark]: 9.98002e-06 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.25002e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.818e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.25001e-06 [parallel]: 3.98999e-06 [flash_sp]: 3.25e-06 [merge_comm]: 2.97002e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.12001e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.53998e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.29e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.65998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 6.79982e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 8.59e-06 [a_after_grad]: 8.25e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.07999e-06 [cse]: 1.396e-05 [a_3]: 3.315e-05 [py_interpret_to_execute_after_opt_a]: 7.48999e-06 [slice_cell_reuse_recomputed_activation]: 1.89999e-06 [rewriter_after_opt_a]: 3.103e-05 [convert_after_rewriter]: 7.21001e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00044722 [opt_b]: 0.00018057, [1] [Cycle 1]: 0.00017448, [7] [b_1]: 0.00010728 [b_2]: 
6.93e-06 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 4.19997e-07 [cse]: 1.606e-05 [optimize_parallel_all_gather_comm]: 1.652e-05 [overlap_param_gather]: 1.65001e-06 [cconv]: 2.124e-05 [loop_unroll]: 0.00041481 [opt_after_cconv]: 9.37e-05, [1] [Cycle 1]: 8.791e-05, [7] [c_1]: 2.638e-05 [parameter_eliminate]: 2.23002e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.593e-05 [renormalize]: 3.99974e-07 [remove_dup_value]: 1.269e-05 [tuple_transform]: 6.992e-05, [1] [Cycle 1]: 6.548e-05, [4] [d_1]: 3.942e-05 [none_parameter_eliminate]: 1.86e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 1.58997e-06 [add_recomputation]: 4.307e-05 [cse_after_recomputation]: 1.996e-05, [1] [Cycle 1]: 1.573e-05, [1] [cse]: 1.066e-05 [environ_conv]: 5.24e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.33002e-06 [label_micro_interleaved_index]: 3.95998e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 6.99976e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.18e-05 [grouped_pairwise_exchange_alltoall]: 1.76998e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.55001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.68002e-06 [overlap_recompute_comm]: 2.22001e-06 [overlap_grad_ring_attention]: 3.78001e-06 [overlap_grad_flash_sp]: 1.72e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.876e-05, [1] [Cycle 1]: 6.472e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.48999e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 9.18002e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.524e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.0004681 [validate]: 3.036e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.0591109 [execute]: 8.43999e-06 Sums bootstrap : 0.000467s : 0.68% type_inference : 0.005713s : 8.26% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.09% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000061s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000041s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000599s : 0.87% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000407s : 0.59% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.65% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000415s : 0.60% optimize.opt_after_cconv.c_1 : 0.000026s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000468s : 0.68% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.059111s : 85.50% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.23% : 0.000024s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000002s : 2: substitution.fold_const_symbol 3.20% : 0.000005s : 4: substitution.graph_param_transform 67.34% : 0.000113s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.57% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005670 2 89.45% : 0.005072s : 1: type_inference.infer 10.55% : 0.000598s : 1: type_inference.specialize ------[replace.] 0.000058 5 79.74% : 0.000046s : 3: replace.inline 20.26% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.76% : 0.000111s : 3: match.inline 8.24% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.41% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.35% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.82% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.76% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.23% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 47.68% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.32% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081571 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.66% : 0.002989s : 1: add_attr 3.65% : 0.002980s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000502s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000021s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.19% : 0.000967s : 78: opt.transform.opt_a 0.03% : 0.000025s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.62% : 0.002136s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.59% : 0.000478s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.87% : 0.003970s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000027s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000201s : 1: renormalize.infer 0.24% : 0.000199s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000065s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 72.49% : 0.059128s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 7.02% : 0.005727s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.109206, [24] [bootstrap]: 0.00051086 [type_inference]: 0.0114185 [event_method]: 4.831e-05 [auto_monad]: 0.00011749 [graph_reusing]: 8.23001e-06 [inline]: 1.97001e-06 [add_attr]: 0.00298082, [1] [add_attr_with_inline]: 0.00297257, [1] [Cycle 1]: 6.994e-05, [2] [tag_attr]: 3.381e-05 [meta_addattr_fg_expand]: 9.34e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 6.031e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 9.40025e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.63997e-06 [optimize]: 0.0133991, [53] [py_interpret_to_execute]: 3.949e-05 [rewriter_before_opt_a]: 0.00014574 [opt_a]: 0.011079, [3] [Cycle 1]: 0.00710274, [45] [expand_dump_flag]: 3.44001e-06 [switch_simplify]: 7.498e-05 [loop_unroll]: 6.272e-05 [a_1]: 0.00145685 [with_stream_mark]: 2.292e-05 [recompute_prepare]: 2.141e-05 [updatestate_depend_eliminate]: 9.31e-06 [updatestate_assign_eliminate]: 7.71999e-06 [updatestate_loads_eliminate]: 7.75e-06 [parameter_eliminate]: 2.71e-06 [a_2]: 0.00024502 [accelerated_algorithm]: 3.084e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.35e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.58e-05 [auto_parallel]: 1.098e-05 [parallel]: 1.7e-05 [flash_sp]: 1.107e-05 [merge_comm]: 9.56998e-06 [allreduce_fusion]: 8.79e-06 [matmul_add_comm_reduction]: 2.678e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.786e-05 [virtual_dataset]: 1.576e-05 [get_grad_eliminate_]: 1.524e-05 [virtual_output]: 1.528e-05 [merge_forward]: 9.20999e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 1.689e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.855e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.784e-05 [set_forward_comm_id_for_comm_node_pass]: 9.74999e-06 [meta_fg_expand]: 0.00142312 [flash_sp_send_recv_attached]: 3.85e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 5.942e-05 
[a_after_grad]: 8.266e-05 [renormalize]: 0.00245415 [add_forward_monad_depend]: 9.42001e-06 [auto_monad_grad]: 5.22999e-06 [auto_monad_eliminator]: 5.638e-05 [cse]: 0.00016816 [a_3]: 0.00033683 [Cycle 2]: 0.0030549, [45] [expand_dump_flag]: 1.59998e-06 [switch_simplify]: 4.669e-05 [loop_unroll]: 4.395e-05 [a_1]: 0.00156833 [with_stream_mark]: 1.218e-05 [recompute_prepare]: 1.132e-05 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 4.45e-06 [updatestate_loads_eliminate]: 3.60998e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012857 [accelerated_algorithm]: 1.193e-05 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 9.34e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 7.38e-06 [parallel]: 4.99e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.28002e-06 [allreduce_fusion]: 5.07e-06 [matmul_add_comm_reduction]: 7.85998e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.074e-05 [virtual_dataset]: 9.10001e-06 [get_grad_eliminate_]: 9.07999e-06 [virtual_output]: 8.64e-06 [merge_forward]: 4.88001e-06 [cell_reuse_recompute_pass]: 8.80013e-07 [offload_activation]: 9.09998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.685e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.435e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 6.921e-05 [flash_sp_send_recv_attached]: 9.79984e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.614e-05 [a_after_grad]: 1.459e-05 [renormalize]: 0.0005945 [add_forward_monad_depend]: 3.93001e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.472e-05 [cse]: 4.765e-05 [a_3]: 6.624e-05 [Cycle 3]: 0.00090724, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.075e-05 [loop_unroll]: 8.90999e-06 [a_1]: 0.00025203 [with_stream_mark]: 9.71e-06 [recompute_prepare]: 9.66e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 3.92998e-06 [updatestate_loads_eliminate]: 3.79002e-06 [parameter_eliminate]: 9.39996e-07 
[a_2]: 0.0001241 [accelerated_algorithm]: 1.158e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 9.22999e-06 [merge_send_recv]: 7.05e-06 [auto_parallel]: 7.38999e-06 [parallel]: 4.48001e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.74998e-06 [matmul_add_comm_reduction]: 7.65e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.64e-06 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 8.60001e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.06001e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.44998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.723e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.475e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.349e-05 [a_after_grad]: 1.407e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.24998e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 1.075e-05 [cse]: 2.703e-05 [a_3]: 5.931e-05 [py_interpret_to_execute_after_opt_a]: 1.021e-05 [slice_cell_reuse_recomputed_activation]: 2.43e-06 [rewriter_after_opt_a]: 4.676e-05 [convert_after_rewriter]: 9.05001e-06 [order_py_execute_after_rewriter]: 6.71999e-06 [mutable_eliminate]: 0.00046251 [opt_b]: 0.00032944, [1] [Cycle 1]: 0.00032344, [7] [b_1]: 0.00022981 [b_2]: 1.102e-05 [updatestate_depend_eliminate]: 7.15998e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 4.07998e-06 [renormalize]: 4.10015e-07 [cse]: 3.143e-05 [optimize_parallel_all_gather_comm]: 2.01e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 1.99e-05 [loop_unroll]: 0.00043042 [opt_after_cconv]: 0.00013829, [1] [Cycle 1]: 0.00013234, [7] [c_1]: 4.949e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 7.24001e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 4.08001e-06 [cse]: 
2.993e-05 [renormalize]: 6.80011e-07 [remove_dup_value]: 2.987e-05 [tuple_transform]: 0.00010322, [1] [Cycle 1]: 9.852e-05, [4] [d_1]: 6.774e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 1.033e-05 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 5.639e-05 [cse_after_recomputation]: 3.272e-05, [1] [Cycle 1]: 2.827e-05, [1] [cse]: 2.252e-05 [environ_conv]: 8.90999e-06 [swap_dp_allreduce_reducescatter]: 8.36002e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.40002e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.75e-05 [grouped_pairwise_exchange_alltoall]: 1.79998e-06 [offloading_packed_experts]: 4.97e-06 [overlap_recompute_and_grad_model_parallel]: 6.19001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 5.09998e-06 [overlap_grad_flash_sp]: 2.413e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.844e-05, [1] [Cycle 1]: 9.443e-05, [6] [build]: 1.001e-05 [elim_shapecalc]: 1.314e-05 [elim_not_effective]: 1.828e-05 [opt_reshape]: 1.042e-05 [fold_const_symbol]: 1.509e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.69e-06 
[pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 2.473e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.81001e-06 [opt_after_jit_grad]: 0.00047097 [validate]: 4.446e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0798713 [execute]: 9.11002e-06 Sums bootstrap : 0.000511s : 0.49% type_inference : 0.011418s : 10.88% event_method : 0.000048s : 0.05% auto_monad : 0.000117s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000060s : 0.06% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.04% optimize.rewriter_before_opt_a : 0.000146s : 0.14% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.13% optimize.opt_a.loop_unroll : 0.000116s : 0.11% optimize.opt_a.a_1 : 0.003277s : 3.12% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000498s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001495s : 1.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.11% optimize.opt_a.renormalize : 0.003049s : 2.90% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000243s : 0.23% optimize.opt_a.a_3 : 0.000462s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.44% optimize.opt_b.b_1 : 0.000230s : 0.22% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000430s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000471s : 0.45% validate : 0.000044s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079871s : 76.10% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000773 222
    5.89% : 0.000046s : 12: substitution.arithmetic_simplify
    1.73% : 0.000013s : 2: substitution.cast_eliminate
    0.39% : 0.000003s : 5: substitution.elim_not_effective
    0.48% : 0.000004s : 5: substitution.float_depend_g_call
    0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch
    0.33% : 0.000003s : 5: substitution.fold_const_symbol
    1.02% : 0.000008s : 8: substitution.graph_param_transform
    0.34% : 0.000003s : 2: substitution.incorporate_call
    0.30% : 0.000002s : 2: substitution.incorporate_call_switch
    56.07% : 0.000434s : 17: substitution.inline
    2.06% : 0.000016s : 2: substitution.inline_without_move
    1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch
    1.95% : 0.000015s : 3: substitution.less_batch_normalization
    1.77% : 0.000014s : 11: substitution.minmaximum_grad
    0.67% : 0.000005s : 5: substitution.partial_eliminate
    1.85% : 0.000014s : 20: substitution.remove_not_recompute_node
    3.08% : 0.000024s : 10: substitution.replace_applicator
    1.35% : 0.000010s : 15: substitution.replace_old_param
    0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute
    3.54% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive
    1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator
    2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder
    8.57% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator
    2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.011344 2
    86.29% : 0.009789s : 1: type_inference.infer
    13.71% : 0.001556s : 1: type_inference.specialize
------[replace.] 0.000225 33
    57.66% : 0.000130s : 17: replace.inline
    42.34% : 0.000095s : 16: replace.tuple_list_get_item_eliminator
------[match.] 0.000459 33
    92.51% : 0.000424s : 17: match.inline
    7.49% : 0.000034s : 16: match.tuple_list_get_item_eliminator
------[predicate.] 
0.000752 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.49% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000043s : 249: predicate.inline 1.26% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.32% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.10% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001589 34 56.42% : 0.000896s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.58% : 0.000692s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.133953 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.23% : 0.002985s : 1: add_attr 2.22% : 0.002977s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000126s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000545s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000055s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000439s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.70% : 0.004953s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000214s : 28: opt.transform.opt_b 0.06% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.27% : 0.011082s : 1: opt_a 0.11% : 0.000142s : 1: opt_after_cconv 0.36% : 0.000481s : 1: opt_after_jit_grad 0.25% : 0.000333s : 1: opt_b 10.01% : 0.013403s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.05% : 0.000066s : 1: pre_auto_parallel 0.03% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000034s : 1: remove_dup_value 1.21% : 0.001625s : 2: renormalize.infer 1.05% : 0.001410s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.11% : 0.000150s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 59.64% : 0.079893s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 8.54% : 0.011434s : 1: type_inference 0.05% : 0.000068s : 1: validate TotalTime = 0.0722203, [24] [bootstrap]: 0.0004643 [type_inference]: 0.00426842 [event_method]: 1.112e-05 [auto_monad]: 5.054e-05 [graph_reusing]: 5.47999e-06 [inline]: 1.89e-06 [add_attr]: 0.00295996, [1] [add_attr_with_inline]: 0.00295171, [1] [Cycle 1]: 4.368e-05, [2] [tag_attr]: 1.198e-05 [meta_addattr_fg_expand]: 3.13998e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 2.169e-05 [insert-virtual-dataset]: 2.52001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00368437, [53] [py_interpret_to_execute]: 1.503e-05 [rewriter_before_opt_a]: 3.854e-05 [opt_a]: 0.00189086, [2] [Cycle 1]: 0.00124608, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 2.432e-05 [loop_unroll]: 1.378e-05 [a_1]: 0.00029137 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.51999e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.53999e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.671e-05 [accelerated_algorithm]: 6.17999e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.42999e-06 [shard_inline]: 5.99e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 6.21e-06 [parallel]: 1.837e-05 [flash_sp]: 7.36001e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.13e-06 [matmul_add_comm_reduction]: 8.38001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.92002e-06 [virtual_dataset]: 5.67001e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.47001e-06 [merge_forward]: 3.67002e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 8.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.14998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.16998e-06 [flash_sp_send_recv_attached]: 2.39999e-06 [receive_attached]: 
2.32999e-06 [after_resolve]: 1.069e-05 [a_after_grad]: 9.19998e-06 [renormalize]: 0.0003388 [add_forward_monad_depend]: 4.33001e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.245e-05 [cse]: 2.594e-05 [a_3]: 3.94e-05 [Cycle 2]: 0.00063532, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.92002e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012449 [with_stream_mark]: 9.56998e-06 [recompute_prepare]: 5.58002e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.35002e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.918e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.19e-06 [meta_shard_fg_expand]: 1.05999e-06 [shard_inline]: 5.41998e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.25002e-06 [merge_comm]: 3.16999e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.41998e-06 [allreduce_slice_to_reducescatter]: 4.20026e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.15999e-06 [virtual_output]: 5.39998e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 5.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.91e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.05002e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 9.13002e-06 [a_after_grad]: 8.16002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.53003e-06 [cse]: 1.306e-05 [a_3]: 3.242e-05 [py_interpret_to_execute_after_opt_a]: 7.37002e-06 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 3.017e-05 [convert_after_rewriter]: 7.04001e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00044486 [opt_b]: 0.00018261, [1] [Cycle 1]: 
0.00017627, [7] [b_1]: 0.00010802 [b_2]: 7.26999e-06 [updatestate_depend_eliminate]: 5.41998e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 4.59986e-07 [cse]: 1.597e-05 [optimize_parallel_all_gather_comm]: 1.519e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.124e-05 [loop_unroll]: 0.00041001 [opt_after_cconv]: 9.507e-05, [1] [Cycle 1]: 8.93e-05, [7] [c_1]: 2.724e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.608e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 6.879e-05, [1] [Cycle 1]: 6.454e-05, [4] [d_1]: 3.875e-05 [none_parameter_eliminate]: 1.71002e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 4.257e-05 [cse_after_recomputation]: 2.003e-05, [1] [Cycle 1]: 1.569e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.60999e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.39999e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 9.60019e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 1.99e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 1.01002e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.11997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.16e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.15999e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.84e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.119e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.527e-05 [get_jit_bprop_graph]: 9.69972e-07 [rewriter_after_jit_bprop_graph]: 3.35998e-06 [opt_after_jit_grad]: 0.00044982 [validate]: 2.983e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.060034 [execute]: 8.08999e-06 Sums bootstrap : 0.000464s : 0.68% type_inference : 0.004268s : 6.25% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.61% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000339s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000410s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.66% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.060034s : 87.95% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.81% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.17% : 0.000005s : 4: substitution.graph_param_transform 65.51% : 0.000079s : 2: substitution.inline 2.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.46% : 0.000004s : 4: substitution.remove_not_recompute_node 3.39% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004229 2 91.82% : 0.003883s : 1: type_inference.infer 8.18% : 0.000346s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.71% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.86% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.66% : 0.000001s : 4: predicate.parallel_virtual_node 1.44% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.54% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.22% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.08% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.61% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 41.74% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.26% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080129 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.70% : 0.002964s : 1: add_attr 3.69% : 0.002955s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000501s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000418s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.96% : 0.000769s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.36% : 0.001894s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.60% : 0.003688s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000182s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.94% : 0.060050s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.34% : 0.004282s : 1: type_inference 0.06% : 0.000051s : 1: validate TotalTime = 0.106637, [24] [bootstrap]: 0.00056006 [type_inference]: 0.0101837 [event_method]: 4.289e-05 [auto_monad]: 0.00011419 [graph_reusing]: 8.52e-06 [inline]: 2.11e-06 [add_attr]: 0.00299652, [1] [add_attr_with_inline]: 0.00298851, [1] [Cycle 1]: 6.529e-05, [2] [tag_attr]: 3.033e-05 [meta_addattr_fg_expand]: 8.78001e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 4.676e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0130403, [53] [py_interpret_to_execute]: 3.604e-05 [rewriter_before_opt_a]: 0.00012586 [opt_a]: 0.0108022, [3] [Cycle 1]: 0.00691103, [45] [expand_dump_flag]: 3.39001e-06 [switch_simplify]: 6.577e-05 [loop_unroll]: 5.535e-05 [a_1]: 0.00132595 [with_stream_mark]: 2.259e-05 [recompute_prepare]: 2.192e-05 [updatestate_depend_eliminate]: 9.36e-06 [updatestate_assign_eliminate]: 7.45998e-06 [updatestate_loads_eliminate]: 7.26001e-06 [parameter_eliminate]: 2.58003e-06 [a_2]: 0.00026572 [accelerated_algorithm]: 3.06e-05 [shard]: 1.92001e-06 [meta_shard_fg_expand]: 3.23998e-06 [shard_inline]: 1.649e-05 [merge_send_recv]: 1.608e-05 [auto_parallel]: 1.099e-05 [parallel]: 1.822e-05 [flash_sp]: 1.191e-05 [merge_comm]: 9.68997e-06 [allreduce_fusion]: 8.79998e-06 [matmul_add_comm_reduction]: 2.59e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.821e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.528e-05 [virtual_output]: 1.531e-05 [merge_forward]: 9.26002e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 1.748e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.837e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 2.774e-05 [set_forward_comm_id_for_comm_node_pass]: 9.53997e-06 [meta_fg_expand]: 0.00137584 [flash_sp_send_recv_attached]: 4.07e-06 [receive_attached]: 2.84999e-06 
[after_resolve]: 5.866e-05 [a_after_grad]: 8.022e-05 [renormalize]: 0.00244119 [add_forward_monad_depend]: 9.10999e-06 [auto_monad_grad]: 5.58002e-06 [auto_monad_eliminator]: 5.684e-05 [cse]: 0.00016772 [a_3]: 0.00033275 [Cycle 2]: 0.00298049, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 4.659e-05 [loop_unroll]: 4.361e-05 [a_1]: 0.00153054 [with_stream_mark]: 1.189e-05 [recompute_prepare]: 1.081e-05 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 4.38001e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 9.90025e-07 [a_2]: 0.00012598 [accelerated_algorithm]: 1.205e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 6.68003e-06 [auto_parallel]: 7.4e-06 [parallel]: 5.20001e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 6.02001e-06 [allreduce_fusion]: 4.92e-06 [matmul_add_comm_reduction]: 7.77998e-06 [allreduce_slice_to_reducescatter]: 4.59986e-07 [virtual_shard_identity]: 1.031e-05 [virtual_dataset]: 8.81997e-06 [get_grad_eliminate_]: 8.75999e-06 [virtual_output]: 8.40999e-06 [merge_forward]: 4.55001e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.10999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.636e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.388e-05 [set_forward_comm_id_for_comm_node_pass]: 5.68002e-06 [meta_fg_expand]: 3.453e-05 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 1.579e-05 [a_after_grad]: 1.407e-05 [renormalize]: 0.00058018 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.5e-05 [cse]: 4.531e-05 [a_3]: 9.152e-05 [Cycle 3]: 0.00089668, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.068e-05 [loop_unroll]: 8.99e-06 [a_1]: 0.00024973 [with_stream_mark]: 9.86998e-06 [recompute_prepare]: 8.92999e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 
3.91001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012246 [accelerated_algorithm]: 1.141e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 7.05e-06 [auto_parallel]: 6.98e-06 [parallel]: 5.00999e-06 [flash_sp]: 1.01997e-06 [merge_comm]: 5.01002e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.63999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.74e-06 [virtual_dataset]: 8.85001e-06 [get_grad_eliminate_]: 8.43001e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.14002e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 8.55001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.596e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.341e-05 [set_forward_comm_id_for_comm_node_pass]: 5.13002e-06 [meta_fg_expand]: 3.06999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.22e-06 [after_resolve]: 1.313e-05 [a_after_grad]: 1.432e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 1.111e-05 [cse]: 2.801e-05 [a_3]: 5.857e-05 [py_interpret_to_execute_after_opt_a]: 1.03e-05 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 4.739e-05 [convert_after_rewriter]: 9.05999e-06 [order_py_execute_after_rewriter]: 6.78e-06 [mutable_eliminate]: 0.00045701 [opt_b]: 0.00028874, [1] [Cycle 1]: 0.00028282, [7] [b_1]: 0.00018862 [b_2]: 1.118e-05 [updatestate_depend_eliminate]: 7.1e-06 [updatestate_assign_eliminate]: 4.11001e-06 [updatestate_loads_eliminate]: 3.86999e-06 [renormalize]: 5.19998e-07 [cse]: 3.264e-05 [optimize_parallel_all_gather_comm]: 2.046e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 1.921e-05 [loop_unroll]: 0.0004249 [opt_after_cconv]: 0.00013663, [1] [Cycle 1]: 0.00013079, [7] [c_1]: 4.805e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 7.56001e-06 [updatestate_assign_eliminate]: 4.26001e-06 
[updatestate_loads_eliminate]: 3.97e-06 [cse]: 3.042e-05 [renormalize]: 3.49974e-07 [remove_dup_value]: 2.832e-05 [tuple_transform]: 0.00010158, [1] [Cycle 1]: 9.685e-05, [4] [d_1]: 6.641e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.001e-05 [partial_unused_args_eliminate]: 1.61002e-06 [add_recomputation]: 5.763e-05 [cse_after_recomputation]: 3.301e-05, [1] [Cycle 1]: 2.823e-05, [1] [cse]: 2.297e-05 [environ_conv]: 8.95999e-06 [swap_dp_allreduce_reducescatter]: 7.91001e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.33001e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.32999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.63998e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.40025e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.721e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4.97e-06 [overlap_recompute_and_grad_model_parallel]: 5.44e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 5.16998e-06 [overlap_grad_flash_sp]: 2.325e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 9.873e-05, [1] [Cycle 1]: 9.456e-05, [6] [build]: 9.25999e-06 [elim_shapecalc]: 1.324e-05 [elim_not_effective]: 1.875e-05 [opt_reshape]: 9.94999e-06 [fold_const_symbol]: 1.511e-05 [renormalize]: 
2.00002e-07 [detach_backward]: 1.51002e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.536e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.56001e-06 [opt_after_jit_grad]: 0.00046897 [validate]: 4.573e-05 [backend_pass]: 1.27e-06 [task_emit]: 0.0788657 [execute]: 8.62e-06 Sums bootstrap : 0.000560s : 0.55% type_inference : 0.010184s : 9.95% event_method : 0.000043s : 0.04% auto_monad : 0.000114s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.04% optimize.rewriter_before_opt_a : 0.000126s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000123s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.11% optimize.opt_a.a_1 : 0.003106s : 3.03% optimize.opt_a.with_stream_mark : 0.000044s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000514s : 0.50% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001413s : 1.38% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.11% optimize.opt_a.renormalize : 0.003021s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000241s : 0.24% optimize.opt_a.a_3 : 0.000483s : 0.47% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.45% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000425s : 0.41% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000469s : 0.46% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.078866s : 77.03% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000726 218 5.87% : 0.000043s : 11: substitution.arithmetic_simplify 1.98% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000007s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.68% : 0.000397s : 16: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.42% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.99% : 0.000014s : 3: substitution.less_batch_normalization 1.80% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.90% : 0.000014s : 20: substitution.remove_not_recompute_node 3.16% : 0.000023s : 10: substitution.replace_applicator 1.43% : 0.000010s : 15: substitution.replace_old_param 0.39% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.76% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.48% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.49% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.52% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010116 2 87.04% : 0.008806s : 1: type_inference.infer 12.96% : 0.001311s : 1: type_inference.specialize ------[replace.] 0.000200 30 58.94% : 0.000118s : 16: replace.inline 41.06% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000418 30 92.81% : 0.000388s : 16: match.inline 7.19% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 99: predicate.arithmetic_simplify 1.17% : 0.000009s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.24% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000041s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.19% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.95% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.77% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001531 32 56.36% : 0.000863s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.64% : 0.000668s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.130796 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.29% : 0.003001s : 1: add_attr 2.29% : 0.002992s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000122s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000594s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000049s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.65% : 0.004775s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.26% : 0.010805s : 1: opt_a 0.11% : 0.000140s : 1: opt_after_cconv 0.37% : 0.000478s : 1: opt_after_jit_grad 0.22% : 0.000292s : 1: opt_b 9.97% : 0.013044s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.20% : 0.001567s : 2: renormalize.infer 1.10% : 0.001441s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.10% : 0.000130s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 60.31% : 0.078884s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.80% : 0.010199s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x3-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x4-pynative],max_mem:56.0M TotalTime = 0.021362, [24] [bootstrap]: 0.00059153 [type_inference]: 0.00615291 [event_method]: 1.438e-05 [auto_monad]: 5.488e-05 [graph_reusing]: 5.49e-06 [inline]: 1.64e-06 [add_attr]: 0.00343399, [1] [add_attr_with_inline]: 0.00342278, [1] [Cycle 1]: 4.371e-05, [2] [tag_attr]: 1.481e-05 [meta_addattr_fg_expand]: 4.37e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.831e-05 [insert-virtual-dataset]: 2.40997e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00397624, [53] [py_interpret_to_execute]: 2.015e-05 [rewriter_before_opt_a]: 5.861e-05 [opt_a]: 0.00213439, [2] [Cycle 1]: 0.00150751, [45] [expand_dump_flag]: 2.59999e-06 [switch_simplify]: 3.165e-05 [loop_unroll]: 2.102e-05 [a_1]: 0.00045388 [with_stream_mark]: 1.281e-05 [recompute_prepare]: 8.07e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.42e-05 [accelerated_algorithm]: 6.14001e-06 [shard]: 2.06e-06 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 5.81998e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 5.60001e-06 [parallel]: 2.314e-05 [flash_sp]: 7.26001e-06 [merge_comm]: 3.43999e-06 [allreduce_fusion]: 3.42002e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.80002e-06 
[get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.99e-06 [merge_forward]: 3.59002e-06 [cell_reuse_recompute_pass]: 9.90025e-07 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 8.84e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.24999e-06 [flash_sp_send_recv_attached]: 2.55002e-06 [receive_attached]: 2.79001e-06 [after_resolve]: 1.037e-05 [a_after_grad]: 8.89998e-06 [renormalize]: 0.00041778 [add_forward_monad_depend]: 4.30999e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.302e-05 [cse]: 2.714e-05 [a_3]: 4.004e-05 [Cycle 2]: 0.00061746, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.68998e-06 [loop_unroll]: 5.25999e-06 [a_1]: 0.0001248 [with_stream_mark]: 9.54e-06 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.604e-05 [accelerated_algorithm]: 5.41002e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 4.42998e-06 [auto_parallel]: 5.35999e-06 [parallel]: 4.08001e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.57001e-06 [matmul_add_comm_reduction]: 4.93001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 4.96002e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.71999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 5.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.09988e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 8.82e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.24999e-06 [cse]: 1.679e-05 [a_3]: 3.16e-05 [py_interpret_to_execute_after_opt_a]: 7.71001e-06 [slice_cell_reuse_recomputed_activation]: 1.98002e-06 [rewriter_after_opt_a]: 3.034e-05 [convert_after_rewriter]: 7.27002e-06 [order_py_execute_after_rewriter]: 5.52999e-06 [mutable_eliminate]: 0.00045311 [opt_b]: 0.00017994, [1] [Cycle 1]: 0.00017389, [7] [b_1]: 0.00010617 [b_2]: 7.12002e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 5.29981e-07 [cse]: 1.644e-05 [optimize_parallel_all_gather_comm]: 1.608e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.241e-05 [loop_unroll]: 0.00041456 [opt_after_cconv]: 9.465e-05, [1] [Cycle 1]: 8.906e-05, [7] [c_1]: 2.732e-05 [parameter_eliminate]: 2.06003e-06 [updatestate_depend_eliminate]: 5.26998e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.603e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.279e-05 [tuple_transform]: 6.867e-05, [1] [Cycle 1]: 6.448e-05, [4] [d_1]: 3.904e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.903e-05 [cse_after_recomputation]: 2.122e-05, [1] [Cycle 1]: 1.688e-05, [1] [cse]: 1.171e-05 [environ_conv]: 4.92e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.05002e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 
9.30013e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 2.00002e-06 [control_data_broadcast_order]: 1.129e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.18001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 1.94e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.682e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.976e-05, [1] [Cycle 1]: 6.575e-05, [6] [build]: 2.17001e-06 [elim_shapecalc]: 8.59002e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 6.46999e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 1.639e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.0004518 [validate]: 3.193e-05 [backend_pass]: 1.19003e-06 [task_emit]: 0.00637664 [execute]: 6.94999e-06 Sums bootstrap : 0.000592s : 3.49% type_inference : 0.006153s : 36.32% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000579s : 3.42% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000140s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000418s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000044s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.67% optimize.opt_b.b_1 : 0.000106s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000415s : 2.45% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 2.67% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006377s : 37.64% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 15.19% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.80% : 0.000110s : 3: substitution.inline 1.55% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 6.48% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006109 2 90.48% : 0.005527s : 1: type_inference.infer 9.52% : 0.000582s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.09% : 0.000027s : 3: replace.inline 29.91% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.83% : 0.000108s : 3: match.inline 8.17% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.93% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.18% : 0.000003s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.65% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.48% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000371 8 46.03% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.97% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030282 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.35% : 0.003438s : 1: add_attr 11.32% : 0.003427s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.08% : 0.000629s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.06% : 0.002137s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.52% : 0.000461s : 1: opt_after_jit_grad 0.61% : 0.000183s : 1: opt_b 13.14% : 0.003980s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000213s : 1: renormalize.infer 0.66% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 21.09% : 0.006387s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.36% : 0.006167s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0180539, [24] [bootstrap]: 0.00047308 [type_inference]: 0.00431546 [event_method]: 1.068e-05 [auto_monad]: 5.026e-05 [graph_reusing]: 5.57999e-06 [inline]: 1.98002e-06 [add_attr]: 0.00297344, [1] [add_attr_with_inline]: 0.0029648, [1] [Cycle 1]: 4.415e-05, [2] [tag_attr]: 1.18e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.189e-05 [insert-virtual-dataset]: 2.86999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00366649, [53] [py_interpret_to_execute]: 1.576e-05 [rewriter_before_opt_a]: 3.873e-05 [opt_a]: 0.00183603, [2] [Cycle 1]: 0.00124244, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 2.468e-05 [loop_unroll]: 1.369e-05 [a_1]: 0.00029323 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 7.16001e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 2.91e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.62e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 7.62998e-06 [auto_parallel]: 5.71e-06 [parallel]: 1.724e-05 [flash_sp]: 6.99001e-06 [merge_comm]: 4.03001e-06 [allreduce_fusion]: 3.49001e-06 [matmul_add_comm_reduction]: 8.66002e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 6.73e-06 [virtual_dataset]: 6.04999e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.76e-06 [merge_forward]: 3.92002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 8.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.081e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 2.19001e-06 
[after_resolve]: 1.063e-05 [a_after_grad]: 8.40001e-06 [renormalize]: 0.00033423 [add_forward_monad_depend]: 4.15999e-06 [auto_monad_grad]: 1.93002e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.673e-05 [a_3]: 4.027e-05 [Cycle 2]: 0.00058451, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.65998e-06 [loop_unroll]: 5.46998e-06 [a_1]: 0.00012251 [with_stream_mark]: 1.097e-05 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.11e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.728e-05 [accelerated_algorithm]: 5.66003e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 2.88003e-06 [allreduce_fusion]: 2.61999e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 4.87e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.81002e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.26e-05 [a_3]: 3.142e-05 [py_interpret_to_execute_after_opt_a]: 7.30998e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.115e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.00045017 [opt_b]: 0.00018226, [1] [Cycle 1]: 0.00017617, 
[7] [b_1]: 0.0001082 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 5.58002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.59985e-07 [cse]: 1.588e-05 [optimize_parallel_all_gather_comm]: 3.677e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.249e-05 [loop_unroll]: 0.00041763 [opt_after_cconv]: 9.408e-05, [1] [Cycle 1]: 8.846e-05, [7] [c_1]: 2.712e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.593e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.193e-05 [tuple_transform]: 6.817e-05, [1] [Cycle 1]: 6.402e-05, [4] [d_1]: 3.806e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.21998e-06 [partial_unused_args_eliminate]: 1.54998e-06 [add_recomputation]: 4.385e-05 [cse_after_recomputation]: 1.984e-05, [1] [Cycle 1]: 1.553e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.4e-06 [swap_dp_allreduce_reducescatter]: 4.94e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.25002e-06 [micro_interleaved_order_control]: 2.54999e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.00999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61002e-06 [control_data_broadcast_order]: 1.147e-05 [grouped_pairwise_exchange_alltoall]: 2.03002e-06 [offloading_packed_experts]: 4.21001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.752e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.795e-05, [1] [Cycle 1]: 6.394e-05, [6] [build]: 2.14999e-06 [elim_shapecalc]: 8.23999e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.515e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00044863 [validate]: 3.103e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.00582124 [execute]: 7.05998e-06 Sums bootstrap : 0.000473s : 3.35% type_inference : 0.004315s : 30.54% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000416s : 2.94% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000016s : 0.12% optimize.opt_a.renormalize : 0.000334s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.19% optimize.opt_b.b_1 : 0.000108s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000037s : 0.26% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000418s : 2.96% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.18% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005821s : 41.20% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.27% : 0.000022s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 1.20% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000005s : 4: substitution.graph_param_transform 65.85% : 0.000079s : 2: substitution.inline 2.46% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.56% : 0.000004s : 4: substitution.remove_not_recompute_node 3.04% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004274 2 91.97% : 0.003931s : 1: type_inference.infer 8.03% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.35% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.81% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.44% : 0.000003s : 26: predicate.load_eliminater 1.43% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.72% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.67% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.99% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 1.05% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.85% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.77% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 42.88% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.12% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025952 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.47% : 0.002978s : 1: add_attr 11.44% : 0.002969s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.97% : 0.000510s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001839s : 1: opt_a 0.38% : 0.000097s : 1: opt_after_cconv 1.76% : 0.000458s : 1: opt_after_jit_grad 0.72% : 0.000186s : 1: opt_b 14.14% : 0.003670s : 1: optimize 0.16% : 0.000041s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.70% : 0.000183s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.47% : 0.005831s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.68% : 0.004329s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x4-kbk],max_mem:56.0M TotalTime = 0.0797749, [24] [bootstrap]: 0.00056595 [type_inference]: 0.00611884 [event_method]: 1.376e-05 [auto_monad]: 5.903e-05 [graph_reusing]: 5.10001e-06 [inline]: 2.06003e-06 [add_attr]: 0.00343172, [1] [add_attr_with_inline]: 0.00342081, [1] [Cycle 1]: 4.453e-05, [2] [tag_attr]: 1.507e-05 [meta_addattr_fg_expand]: 4.23999e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.832e-05 [insert-virtual-dataset]: 2.56e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00394804, [53] [py_interpret_to_execute]: 2.065e-05 [rewriter_before_opt_a]: 5.768e-05 [opt_a]: 0.00209608, [2] [Cycle 1]: 0.0015034, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 3.271e-05 [loop_unroll]: 2.086e-05 [a_1]: 0.0004485 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.47002e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 3.63999e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.97001e-06 [a_2]: 7.617e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.92998e-06 [auto_parallel]: 6.11e-06 [parallel]: 2.389e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.30998e-06 [matmul_add_comm_reduction]: 8.71002e-06 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 7.23e-06 [virtual_dataset]: 6.17001e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.34e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.064e-05 
[merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 2.61e-06 [receive_attached]: 2.17001e-06 [after_resolve]: 9.99999e-06 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00040884 [add_forward_monad_depend]: 4.94998e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.656e-05 [a_3]: 4.052e-05 [Cycle 2]: 0.00058326, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.58e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.00012411 [with_stream_mark]: 9.17999e-06 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.722e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.11997e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.16001e-06 [auto_parallel]: 4.90001e-06 [parallel]: 4.30999e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.00998e-06 [allreduce_fusion]: 2.73998e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 4.90001e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.43002e-06 [offload_activation]: 5.61e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.65002e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.52002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.18002e-06 [a_after_grad]: 7.97e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.32001e-06 [cse]: 1.548e-05 [a_3]: 3.18e-05 [py_interpret_to_execute_after_opt_a]: 8.08001e-06 
[slice_cell_reuse_recomputed_activation]: 1.63002e-06 [rewriter_after_opt_a]: 3.096e-05 [convert_after_rewriter]: 6.44001e-06 [order_py_execute_after_rewriter]: 4.63999e-06 [mutable_eliminate]: 0.00047228 [opt_b]: 0.00018256, [1] [Cycle 1]: 0.00017658, [7] [b_1]: 0.00010774 [b_2]: 7.69002e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.60014e-07 [cse]: 1.646e-05 [optimize_parallel_all_gather_comm]: 1.566e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.184e-05 [loop_unroll]: 0.00041525 [opt_after_cconv]: 9.408e-05, [1] [Cycle 1]: 8.837e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.43998e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.582e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.305e-05 [tuple_transform]: 6.764e-05, [1] [Cycle 1]: 6.35e-05, [4] [d_1]: 3.816e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.18002e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.664e-05 [cse_after_recomputation]: 2.039e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.40999e-06 [swap_dp_allreduce_reducescatter]: 5.36998e-06 [bias_add_comm_swap]: 2.21998e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 
[control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.4e-06 [overlap_recompute_and_grad_model_parallel]: 4.18001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 3.52997e-06 [overlap_grad_flash_sp]: 1.596e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.87001e-06 [split_layernorm_comm]: 2.07001e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.777e-05, [1] [Cycle 1]: 6.354e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.28999e-06 [elim_not_effective]: 1.097e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00045309 [validate]: 3.085e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0648674 [execute]: 8.43999e-06 Sums bootstrap : 0.000566s : 0.75% type_inference : 0.006119s : 8.12% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000573s : 0.76% optimize.opt_a.with_stream_mark : 
0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000409s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000472s : 0.63% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000453s : 0.60% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064867s : 86.06% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000161 30 14.45% : 0.000023s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.25% : 0.000005s : 4: substitution.graph_param_transform 67.12% : 0.000108s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.72% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 6.45% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006075 2 90.87% : 0.005521s : 1: type_inference.infer 9.13% : 0.000555s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.66% : 0.000028s : 3: replace.inline 28.34% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.92% : 0.000106s : 3: match.inline 8.08% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 0.97% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.60% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000002s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.36% : 0.000009s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000348 8 46.87% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.13% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088651 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.88% : 0.003436s : 1: add_attr 3.86% : 0.003424s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000064s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000603s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000006s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000482s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.05% : 0.000935s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.37% : 0.002099s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.52% : 0.000463s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.46% : 0.003952s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000006s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000210s : 1: renormalize.infer 0.22% : 0.000192s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 73.19% : 0.064884s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 6.92% : 0.006132s : 1: type_inference 0.07% : 0.000058s : 1: validate TotalTime = 0.0727307, [24] [bootstrap]: 0.00047658 [type_inference]: 0.00461169 [event_method]: 1.064e-05 [auto_monad]: 5.085e-05 [graph_reusing]: 5.38002e-06 [inline]: 2.19999e-06 [add_attr]: 0.00306495, [1] [add_attr_with_inline]: 0.00305695, [1] [Cycle 1]: 4.436e-05, [2] [tag_attr]: 1.167e-05 [meta_addattr_fg_expand]: 2.78e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.183e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00366204, [53] [py_interpret_to_execute]: 1.534e-05 [rewriter_before_opt_a]: 3.946e-05 [opt_a]: 0.00186693, [2] [Cycle 1]: 0.00126337, [45] [expand_dump_flag]: 2.31e-06 [switch_simplify]: 2.497e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00030076 [with_stream_mark]: 1.387e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.20002e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.536e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 5.86e-06 [parallel]: 1.676e-05 [flash_sp]: 7.51001e-06 [merge_comm]: 3.50998e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.44e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 6.70002e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.82002e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.44998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.19999e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.19999e-06 
[after_resolve]: 1.059e-05 [a_after_grad]: 8.61002e-06 [renormalize]: 0.00034675 [add_forward_monad_depend]: 4.75999e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.291e-05 [cse]: 2.672e-05 [a_3]: 3.962e-05 [Cycle 2]: 0.00059437, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012509 [with_stream_mark]: 1.079e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.833e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.42e-06 [meta_shard_fg_expand]: 1.02e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.27998e-06 [flash_sp]: 2.93e-06 [merge_comm]: 3.19001e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.01e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 5.24003e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.54998e-06 [offload_activation]: 5.76003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.47e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 9.90025e-07 [after_resolve]: 8.80999e-06 [a_after_grad]: 8.08001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.14003e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 5.94999e-06 [cse]: 1.234e-05 [a_3]: 3.164e-05 [py_interpret_to_execute_after_opt_a]: 7.41001e-06 [slice_cell_reuse_recomputed_activation]: 1.90001e-06 [rewriter_after_opt_a]: 3.034e-05 [convert_after_rewriter]: 7.11001e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00044742 [opt_b]: 0.0001806, [1] [Cycle 1]: 0.00017442, [7] [b_1]: 
0.00010756 [b_2]: 7.05e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 3.30008e-07 [cse]: 1.628e-05 [optimize_parallel_all_gather_comm]: 1.541e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00041098 [opt_after_cconv]: 9.372e-05, [1] [Cycle 1]: 8.792e-05, [7] [c_1]: 2.746e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.564e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 6.887e-05, [1] [Cycle 1]: 6.456e-05, [4] [d_1]: 3.897e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.53002e-06 [add_recomputation]: 4.406e-05 [cse_after_recomputation]: 2.016e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.064e-05 [environ_conv]: 5.11002e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.50999e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.42001e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 9.09989e-07 [remove_cast_before_assign_add]: 8.59989e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.134e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.37002e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.658e-05 [begin_end_overlap_inline]: 8.30012e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.54998e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.76e-05, [1] [Cycle 1]: 6.341e-05, [6] [build]: 2.09e-06 [elim_shapecalc]: 8.15e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 5.94e-06 [fold_const_symbol]: 8.71002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.68997e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.519e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044581 [validate]: 3.131e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0600939 [execute]: 8.27e-06 Sums bootstrap : 0.000477s : 0.69% type_inference : 0.004612s : 6.71% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000426s : 0.62% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000347s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000411s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000446s : 0.65% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060094s : 87.48% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000116 26 18.75% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.86% : 0.000006s : 4: substitution.graph_param_transform 64.23% : 0.000074s : 2: substitution.inline 2.51% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.92% : 0.000005s : 4: substitution.remove_not_recompute_node 3.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004506 2 91.39% : 0.004118s : 1: type_inference.infer 8.61% : 0.000388s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000073 2 100.00% : 0.000073s : 2: match.inline ------[predicate.] 
0.000151 984 0.75% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.65% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 17: predicate.arithmetic_simplify 0.71% : 0.000001s : 9: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.73% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.00% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.99% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.95% : 0.000001s : 13: predicate.environ_get_depend_swap 1.75% : 0.000003s : 21: predicate.environ_get_eliminate 0.96% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.85% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.73% : 0.000003s : 11: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 5.41% : 0.000008s : 44: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.88% : 0.000003s : 26: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.57% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.56% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.34% : 0.000002s : 11: predicate.partial_defer_inline 1.13% : 0.000002s : 13: predicate.partial_eliminate 0.70% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 9: predicate.reduce_eliminate 1.96% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 8: predicate.remove_not_recompute_node 1.20% : 0.000002s : 17: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.70% : 0.000001s : 9: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.92% : 0.000001s : 11: predicate.switch_defer_inline 1.60% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.30% : 0.000007s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 10.39% : 0.000016s : 9: predicate.transpose_eliminate 1.39% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.23% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.28% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.87% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.74% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000268 6 41.01% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.99% : 0.000158s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080735 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.80% : 0.003069s : 1: add_attr 3.79% : 0.003060s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000513s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.96% : 0.000776s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.32% : 0.001870s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000470s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.54% : 0.003666s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000186s : 1: renormalize.infer 0.19% : 0.000154s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.45% : 0.060109s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.73% : 0.004626s : 1: type_inference 0.07% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x4-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x5-pynative],max_mem:56.0M TotalTime = 0.022723, [24] [bootstrap]: 0.00067406 [type_inference]: 0.00675629 [event_method]: 1.436e-05 [auto_monad]: 5.735e-05 [graph_reusing]: 5.57999e-06 [inline]: 1.89e-06 [add_attr]: 0.00388295, [1] [add_attr_with_inline]: 0.00387197, [1] [Cycle 1]: 4.571e-05, [2] [tag_attr]: 1.541e-05 [meta_addattr_fg_expand]: 4.89e-06 [parallel-infer-symbol]: 3.32997e-06 [pre_auto_parallel]: 2.815e-05 [insert-virtual-dataset]: 2.70002e-06 [parallel-infer-symbol-second]: 9.30013e-07 [dataset_repeat_opt]: 2.28002e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0040633, [53] [py_interpret_to_execute]: 2.144e-05 [rewriter_before_opt_a]: 5.712e-05 [opt_a]: 0.00220949, [2] [Cycle 1]: 0.00159718, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 3.187e-05 [loop_unroll]: 2.052e-05 [a_1]: 0.00045166 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.84002e-06 [updatestate_assign_eliminate]: 3.04999e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.491e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.83002e-06 [merge_send_recv]: 7.49002e-06 [auto_parallel]: 5.89999e-06 [parallel]: 9.448e-05 [flash_sp]: 7.88001e-06 [merge_comm]: 4.09002e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 8.25999e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.58999e-06 [virtual_dataset]: 
6.31e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 9.45001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 2.21998e-06 [after_resolve]: 1.003e-05 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00043207 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.329e-05 [cse]: 2.784e-05 [a_3]: 4.206e-05 [Cycle 2]: 0.00060277, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.0001313 [with_stream_mark]: 9.41e-06 [recompute_prepare]: 5.56998e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.25002e-06 [updatestate_loads_eliminate]: 2.34001e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.722e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.34998e-06 [parallel]: 4.07003e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 2.89999e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.07001e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86999e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.82e-06 [a_after_grad]: 8e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.339e-05 [a_3]: 3.37e-05 [py_interpret_to_execute_after_opt_a]: 7.70998e-06 [slice_cell_reuse_recomputed_activation]: 2.07001e-06 [rewriter_after_opt_a]: 3.106e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00045615 [opt_b]: 0.00018011, [1] [Cycle 1]: 0.00017401, [7] [b_1]: 0.0001063 [b_2]: 7.50998e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 4.60015e-07 [cse]: 1.613e-05 [optimize_parallel_all_gather_comm]: 1.503e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.234e-05 [loop_unroll]: 0.0004183 [opt_after_cconv]: 9.568e-05, [1] [Cycle 1]: 8.988e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.04e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.621e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.349e-05 [tuple_transform]: 6.824e-05, [1] [Cycle 1]: 6.397e-05, [4] [d_1]: 3.832e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 5.145e-05 [cse_after_recomputation]: 2.082e-05, [1] [Cycle 1]: 1.64e-05, [1] [cse]: 1.139e-05 [environ_conv]: 4.42e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.93e-06 [label_micro_interleaved_index]: 4.02e-06 [label_fine_grained_interleaved_index]: 2.82002e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.44001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 
9.40025e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.183e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.51001e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.49001e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.748e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.757e-05, [1] [Cycle 1]: 6.332e-05, [6] [build]: 2.15002e-06 [elim_shapecalc]: 8.62e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00045309 [validate]: 3.15e-05 [backend_pass]: 9.19972e-07 [task_emit]: 0.0065009 [execute]: 6.93998e-06 Sums bootstrap : 0.000674s : 3.77% type_inference : 0.006756s : 37.83% event_method : 0.000014s : 0.08% auto_monad : 0.000057s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000057s : 0.32% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.14% optimize.opt_a.a_1 : 0.000583s : 3.26% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000142s : 0.80% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000099s : 0.55% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.07% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000432s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000041s : 0.23% optimize.opt_a.a_3 : 0.000076s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000456s : 2.55% optimize.opt_b.b_1 : 0.000106s : 0.60% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000418s : 2.34% optimize.opt_after_cconv.c_1 : 0.000028s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000004s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000453s : 2.54% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006501s : 36.40% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.98% : 0.000025s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 66.87% : 0.000110s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.25% : 0.000004s : 4: substitution.replace_old_param 6.62% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006709 2 91.41% : 0.006133s : 1: type_inference.infer 8.59% : 0.000576s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.30% : 0.000027s : 3: replace.inline 29.70% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.68% : 0.000108s : 3: match.inline 8.32% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.95% : 0.000002s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.85% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.90% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.52% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.57% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000002s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.91% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.96% : 0.000002s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 45.37% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.63% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032198 196 0.01% : 0.000004s : 1: ForceFp32Comm 12.07% : 0.003887s : 1: add_attr 12.04% : 0.003876s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000062s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.21% : 0.000710s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.33% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.45% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000947s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000089s : 28: opt.transform.opt_b 0.13% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.87% : 0.002212s : 1: opt_a 0.31% : 0.000099s : 1: opt_after_cconv 1.44% : 0.000462s : 1: opt_after_jit_grad 0.57% : 0.000184s : 1: opt_b 12.63% : 0.004067s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.10% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.67% : 0.000215s : 1: renormalize.infer 0.65% : 0.000210s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.19% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000070s : 1: symbol_engine_optimizer 20.22% : 0.006511s : 1: task_emit 0.22% : 0.000071s : 1: 
tuple_transform 21.03% : 0.006770s : 1: type_inference 0.23% : 0.000073s : 1: validate TotalTime = 0.0182918, [24] [bootstrap]: 0.00040966 [type_inference]: 0.00441173 [event_method]: 1.023e-05 [auto_monad]: 4.864e-05 [graph_reusing]: 4.57e-06 [inline]: 1.94e-06 [add_attr]: 0.0030183, [1] [add_attr_with_inline]: 0.00301028, [1] [Cycle 1]: 4.331e-05, [2] [tag_attr]: 1.17e-05 [meta_addattr_fg_expand]: 3.61999e-06 [parallel-infer-symbol]: 3.26999e-06 [pre_auto_parallel]: 2.393e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.00367796, [53] [py_interpret_to_execute]: 1.556e-05 [rewriter_before_opt_a]: 4.013e-05 [opt_a]: 0.0018656, [2] [Cycle 1]: 0.00126428, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.51e-05 [loop_unroll]: 1.398e-05 [a_1]: 0.00029406 [with_stream_mark]: 1.388e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.38999e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.638e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 2.41998e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 7.38e-06 [auto_parallel]: 6.39001e-06 [parallel]: 1.841e-05 [flash_sp]: 7.21999e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 9.77999e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.054e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 9.29e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.18002e-06 [receive_attached]: 2.84999e-06 
[after_resolve]: 1.06e-05 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00034909 [add_forward_monad_depend]: 4.89003e-06 [auto_monad_grad]: 1.71002e-06 [auto_monad_eliminator]: 1.244e-05 [cse]: 2.795e-05 [a_3]: 3.973e-05 [Cycle 2]: 0.00059197, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.00012533 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 5.49e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.709e-05 [accelerated_algorithm]: 5.79999e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.27999e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.40999e-06 [parallel]: 3.88999e-06 [flash_sp]: 2.80002e-06 [merge_comm]: 3.37002e-06 [allreduce_fusion]: 2.63003e-06 [matmul_add_comm_reduction]: 4.90001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.38e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.93002e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.12998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.39996e-07 [after_resolve]: 8.95999e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.233e-05 [a_3]: 3.255e-05 [py_interpret_to_execute_after_opt_a]: 7.86001e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.086e-05 [convert_after_rewriter]: 6.46999e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00044617 [opt_b]: 0.00018073, [1] [Cycle 1]: 0.00017483, [7] 
[b_1]: 0.00010788 [b_2]: 7.47002e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.39992e-07 [cse]: 1.564e-05 [optimize_parallel_all_gather_comm]: 1.604e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.18e-05 [loop_unroll]: 0.00042287 [opt_after_cconv]: 9.375e-05, [1] [Cycle 1]: 8.79e-05, [7] [c_1]: 2.73e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.599e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 1.305e-05 [tuple_transform]: 6.829e-05, [1] [Cycle 1]: 6.391e-05, [4] [d_1]: 3.872e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.01998e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.372e-05 [cse_after_recomputation]: 1.989e-05, [1] [Cycle 1]: 1.546e-05, [1] [cse]: 1.043e-05 [environ_conv]: 4.65001e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.52999e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.77998e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 3.77998e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.888e-05, [1] [Cycle 1]: 6.463e-05, [6] [build]: 2.24999e-06 [elim_shapecalc]: 8.57e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.77e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00045047 [validate]: 3.063e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00596728 [execute]: 7.01001e-06 Sums bootstrap : 0.000410s : 2.86% type_inference : 0.004412s : 30.82% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.17% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000419s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000349s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000006s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000446s : 3.12% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000423s : 2.95% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000450s : 3.15% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005967s : 41.68% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.22% : 0.000022s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.04% : 0.000005s : 4: substitution.graph_param_transform 65.92% : 0.000079s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000004s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004368 2 92.12% : 0.004024s : 1: type_inference.infer 7.88% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.12% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.86% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.61% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.35% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.04% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 43.12% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.88% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026266 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.51% : 0.003023s : 1: add_attr 11.47% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.20% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.70% : 0.000447s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000431s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000770s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.11% : 0.001869s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.75% : 0.000460s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.02% : 0.003682s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000028s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000195s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.76% : 0.005977s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.85% : 0.004425s : 1: type_inference 0.21% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x5-kbk],max_mem:56.0M TotalTime = 0.0941346, [24] [bootstrap]: 0.00053505 [type_inference]: 0.00661218 [event_method]: 1.415e-05 [auto_monad]: 6.105e-05 [graph_reusing]: 5.42999e-06 [inline]: 2.63e-06 [add_attr]: 0.00384657, [1] [add_attr_with_inline]: 0.00383309, [1] [Cycle 1]: 5.753e-05, [2] [tag_attr]: 1.868e-05 [meta_addattr_fg_expand]: 4.02998e-06 [parallel-infer-symbol]: 3.55e-06 [pre_auto_parallel]: 3.499e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.01e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00438949, [53] [py_interpret_to_execute]: 2.387e-05 [rewriter_before_opt_a]: 6.392e-05 [opt_a]: 0.00233534, [2] [Cycle 1]: 0.00172226, [45] [expand_dump_flag]: 2.89999e-06 [switch_simplify]: 3.382e-05 [loop_unroll]: 2.18e-05 [a_1]: 0.00047553 [with_stream_mark]: 1.491e-05 [recompute_prepare]: 7.78001e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.25002e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.708e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 6.26e-06 [merge_send_recv]: 7.26999e-06 [auto_parallel]: 5.92001e-06 [parallel]: 2.393e-05 [flash_sp]: 7.98999e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.95999e-06 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.36002e-06 [offload_activation]: 9.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 
[merge_recompute_call_nodes]: 1.79998e-06 [before_grad]: 9.82001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 2.52001e-06 [receive_attached]: 2.36e-06 [after_resolve]: 1.236e-05 [a_after_grad]: 9.14998e-06 [renormalize]: 0.00058031 [add_forward_monad_depend]: 5.17e-06 [auto_monad_grad]: 2.11e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.951e-05 [a_3]: 4.139e-05 [Cycle 2]: 0.00060321, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.27999e-06 [a_1]: 0.00012733 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 7.99977e-07 [a_2]: 6.812e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.63999e-06 [auto_parallel]: 5.71998e-06 [parallel]: 4.94e-06 [flash_sp]: 3.63e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.75002e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 6.40002e-06 [virtual_dataset]: 5.40999e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.79999e-06 [cell_reuse_recompute_pass]: 1.59e-06 [offload_activation]: 6.57002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.13999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 2.02001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 7.98001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.21998e-06 [cse]: 1.462e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 9.14e-06 [slice_cell_reuse_recomputed_activation]: 
1.94e-06 [rewriter_after_opt_a]: 3.255e-05 [convert_after_rewriter]: 6.58e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00054047 [opt_b]: 0.00018537, [1] [Cycle 1]: 0.0001784, [7] [b_1]: 0.00010885 [b_2]: 7.13998e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 3.39991e-07 [cse]: 1.748e-05 [optimize_parallel_all_gather_comm]: 1.639e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.253e-05 [loop_unroll]: 0.00050966 [opt_after_cconv]: 9.835e-05, [1] [Cycle 1]: 9.241e-05, [7] [c_1]: 2.815e-05 [parameter_eliminate]: 2.82002e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [cse]: 1.73e-05 [renormalize]: 1.59984e-07 [remove_dup_value]: 1.296e-05 [tuple_transform]: 7.035e-05, [1] [Cycle 1]: 6.582e-05, [4] [d_1]: 3.969e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.33002e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 4.913e-05 [cse_after_recomputation]: 2.065e-05, [1] [Cycle 1]: 1.641e-05, [1] [cse]: 1.115e-05 [environ_conv]: 5.04003e-06 [swap_dp_allreduce_reducescatter]: 5.36002e-06 [bias_add_comm_swap]: 2.89001e-06 [label_micro_interleaved_index]: 4.88001e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.55002e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.53998e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.19003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 
1.182e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.37e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.784e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.02e-06 [symbol_engine_optimizer]: 6.764e-05, [1] [Cycle 1]: 6.341e-05, [6] [build]: 2.56e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.23e-06 [fold_const_symbol]: 8.60999e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.57001e-06 [pipeline_parallel_scheduler]: 1.32999e-06 [auto_monad_reorder]: 1.658e-05 [get_jit_bprop_graph]: 1.48002e-06 [rewriter_after_jit_bprop_graph]: 3.83001e-06 [opt_after_jit_grad]: 0.00048644 [validate]: 3.597e-05 [backend_pass]: 1.20001e-06 [task_emit]: 0.0778358 [execute]: 8.80999e-06 Sums bootstrap : 0.000535s : 0.60% type_inference : 0.006612s : 7.41% event_method : 0.000014s : 0.02% auto_monad : 0.000061s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000019s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000035s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.03% optimize.rewriter_before_opt_a : 0.000064s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000041s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000603s : 0.68% optimize.opt_a.with_stream_mark : 0.000025s : 0.03% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000580s : 0.65% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000044s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000540s : 0.61% optimize.opt_b.b_1 : 0.000109s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000510s : 0.57% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000486s : 0.54% validate : 0.000036s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.077836s : 87.18% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000185 30 14.78% : 0.000027s : 5: substitution.arithmetic_simplify 0.97% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.20% : 0.000006s : 4: substitution.graph_param_transform 68.18% : 0.000126s : 3: substitution.inline 1.55% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.23% : 0.000004s : 4: substitution.remove_not_recompute_node 2.44% : 0.000005s : 4: substitution.replace_old_param 5.94% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006559 2 90.41% : 0.005930s : 1: type_inference.infer 9.59% : 0.000629s : 1: type_inference.specialize ------[replace.] 0.000041 5 71.90% : 0.000030s : 3: replace.inline 28.10% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000134 5 92.61% : 0.000124s : 3: match.inline 7.39% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.54% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.94% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.60% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.49% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.69% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.46% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.23% : 0.000009s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.94% : 0.000002s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.52% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.97% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000392 8 43.18% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.82% : 0.000223s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.104070 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.70% : 0.003851s : 1: add_attr 3.69% : 0.003837s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000067s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.55% : 0.000573s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.50% : 0.000519s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000550s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.94% : 0.000975s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.25% : 0.002338s : 1: opt_a 0.10% : 0.000102s : 1: opt_after_cconv 0.48% : 0.000497s : 1: opt_after_jit_grad 0.18% : 0.000189s : 1: opt_b 4.22% : 0.004394s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000040s : 1: pre_auto_parallel 0.03% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.30% : 0.000314s : 1: renormalize.infer 0.25% : 0.000259s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.07% : 0.000068s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000070s : 1: symbol_engine_optimizer 74.81% : 0.077857s : 1: task_emit 0.07% : 0.000073s : 1: 
tuple_transform 6.37% : 0.006630s : 1: type_inference 0.07% : 0.000068s : 1: validate TotalTime = 0.0832854, [24] [bootstrap]: 0.00043991 [type_inference]: 0.00444411 [event_method]: 1.088e-05 [auto_monad]: 5.214e-05 [graph_reusing]: 4.90001e-06 [inline]: 1.89999e-06 [add_attr]: 0.00304328, [1] [add_attr_with_inline]: 0.00303503, [1] [Cycle 1]: 4.671e-05, [2] [tag_attr]: 1.305e-05 [meta_addattr_fg_expand]: 2.89999e-06 [parallel-infer-symbol]: 3.23e-06 [pre_auto_parallel]: 2.395e-05 [insert-virtual-dataset]: 2.38998e-06 [parallel-infer-symbol-second]: 6.29982e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00393154, [53] [py_interpret_to_execute]: 1.711e-05 [rewriter_before_opt_a]: 4.004e-05 [opt_a]: 0.00200207, [2] [Cycle 1]: 0.00139124, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 2.456e-05 [loop_unroll]: 1.399e-05 [a_1]: 0.00030156 [with_stream_mark]: 1.463e-05 [recompute_prepare]: 7.63001e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 2.32999e-06 [a_2]: 7.776e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 2.59999e-06 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 5.54998e-06 [parallel]: 1.872e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 9.57001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 5.99999e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.142e-05 [merge_recompute_call_nodes]: 1.54998e-06 [before_grad]: 9.88998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.42001e-06 [flash_sp_send_recv_attached]: 2.61e-06 [receive_attached]: 
2.44999e-06 [after_resolve]: 1.067e-05 [a_after_grad]: 8.64e-06 [renormalize]: 0.00045521 [add_forward_monad_depend]: 4.85001e-06 [auto_monad_grad]: 2.21e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.918e-05 [a_3]: 4.113e-05 [Cycle 2]: 0.00060109, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 7.00002e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012616 [with_stream_mark]: 1.183e-05 [recompute_prepare]: 5.81998e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.736e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.30999e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.66e-06 [parallel]: 4.24997e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 2.91999e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.33e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.56998e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.15002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.60999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 1.03001e-06 [receive_attached]: 1.25001e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.22e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.34e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.43998e-06 [cse]: 1.326e-05 [a_3]: 3.22e-05 [py_interpret_to_execute_after_opt_a]: 8.70001e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.323e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.0005404 [opt_b]: 0.0001854, [1] [Cycle 1]: 0.00017899, 
[7] [b_1]: 0.00011027 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.60997e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 2.50002e-07 [cse]: 1.669e-05 [optimize_parallel_all_gather_comm]: 1.686e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.485e-05 [loop_unroll]: 0.00042069 [opt_after_cconv]: 9.574e-05, [1] [Cycle 1]: 9.001e-05, [7] [c_1]: 2.835e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.66e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.346e-05 [tuple_transform]: 6.973e-05, [1] [Cycle 1]: 6.516e-05, [4] [d_1]: 3.977e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 2.26e-06 [add_recomputation]: 4.587e-05 [cse_after_recomputation]: 2.056e-05, [1] [Cycle 1]: 1.635e-05, [1] [cse]: 1.119e-05 [environ_conv]: 5.06002e-06 [swap_dp_allreduce_reducescatter]: 5.67999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 3.96001e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.66999e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.133e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.25e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.931e-05, [1] [Cycle 1]: 6.522e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 8.62e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 6.58e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.59998e-06 [auto_monad_reorder]: 1.595e-05 [get_jit_bprop_graph]: 1.35999e-06 [rewriter_after_jit_bprop_graph]: 3.89002e-06 [opt_after_jit_grad]: 0.00045111 [validate]: 3.448e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.0705837 [execute]: 9.50001e-06 Sums bootstrap : 0.000440s : 0.56% type_inference : 0.004444s : 5.61% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000428s : 0.54% optimize.opt_a.with_stream_mark : 0.000026s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000455s : 0.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.09% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000540s : 0.68% optimize.opt_b.b_1 : 0.000110s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.03% optimize.loop_unroll : 0.000421s : 0.53% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000451s : 0.57% validate : 0.000034s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.070584s : 89.06% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000129 26 17.77% : 0.000023s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.10% : 0.000005s : 4: substitution.graph_param_transform 66.51% : 0.000086s : 2: substitution.inline 2.81% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.32% : 0.000004s : 4: substitution.remove_not_recompute_node 3.11% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004402 2 91.81% : 0.004041s : 1: type_inference.infer 8.19% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000084 2 100.00% : 0.000084s : 2: match.inline ------[predicate.] 
0.000140 984 1.09% : 0.000002s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.41% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.06% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000009s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.87% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.90% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.94% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.50% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.51% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.70% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.54% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.81% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.90% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000287 6 41.91% : 0.000120s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.09% : 0.000166s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.091655 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.33% : 0.003048s : 1: add_attr 3.32% : 0.003039s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.52% : 0.000477s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.47% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.60% : 0.000550s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000014s : 1: opt.transform.mutable_eliminate 0.85% : 0.000782s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.19% : 0.002005s : 1: opt_a 0.11% : 0.000099s : 1: opt_after_cconv 0.50% : 0.000461s : 1: opt_after_jit_grad 0.21% : 0.000189s : 1: opt_b 4.29% : 0.003935s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000003s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000021s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.28% : 0.000259s : 1: renormalize.infer 0.21% : 0.000190s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 77.04% : 0.070607s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 4.87% : 0.004459s : 1: type_inference 0.07% : 0.000062s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x5-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x6-pynative],max_mem:56.0M TotalTime = 0.0218722, [24] [bootstrap]: 0.00054323 [type_inference]: 0.00638083 [event_method]: 1.517e-05 [auto_monad]: 5.912e-05 [graph_reusing]: 5.27001e-06 [inline]: 1.74998e-06 [add_attr]: 0.00356649, [1] [add_attr_with_inline]: 0.00355436, [1] [Cycle 1]: 4.927e-05, [2] [tag_attr]: 1.612e-05 [meta_addattr_fg_expand]: 4.05998e-06 [parallel-infer-symbol]: 3.51001e-06 [pre_auto_parallel]: 3.139e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 1.09998e-06 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.00418881, [53] [py_interpret_to_execute]: 2.304e-05 [rewriter_before_opt_a]: 5.963e-05 [opt_a]: 0.0022523, [2] [Cycle 1]: 0.00164192, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 3.248e-05 [loop_unroll]: 2.14e-05 [a_1]: 0.00046235 [with_stream_mark]: 1.435e-05 [recompute_prepare]: 7.52998e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 2.76999e-06 [parameter_eliminate]: 1.78002e-06 [a_2]: 7.569e-05 [accelerated_algorithm]: 6.61999e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 6.20002e-06 [merge_send_recv]: 8.47998e-06 [auto_parallel]: 6.18998e-06 [parallel]: 2.424e-05 [flash_sp]: 6.94001e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.68e-06 [matmul_add_comm_reduction]: 8.52e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 
5.83002e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.93999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.32999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62002e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.99001e-06 [receive_attached]: 2.32001e-06 [after_resolve]: 1.07e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00052428 [add_forward_monad_depend]: 5.57001e-06 [auto_monad_grad]: 2.15002e-06 [auto_monad_eliminator]: 1.365e-05 [cse]: 2.903e-05 [a_3]: 4.082e-05 [Cycle 2]: 0.00060043, [45] [expand_dump_flag]: 1.22e-06 [switch_simplify]: 7e-06 [loop_unroll]: 5.52001e-06 [a_1]: 0.0001261 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.18002e-06 [updatestate_loads_eliminate]: 2.29999e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.705e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.31002e-06 [auto_parallel]: 5.05999e-06 [parallel]: 4.27e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 2.47001e-06 [matmul_add_comm_reduction]: 4.88001e-06 [allreduce_slice_to_reducescatter]: 4.19997e-07 [virtual_shard_identity]: 6.59999e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.62999e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56998e-06 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.09002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95998e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.81e-06 [a_after_grad]: 8.64e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.88e-06 [cse]: 1.303e-05 [a_3]: 3.197e-05 [py_interpret_to_execute_after_opt_a]: 9.22999e-06 [slice_cell_reuse_recomputed_activation]: 2.05002e-06 [rewriter_after_opt_a]: 3.411e-05 [convert_after_rewriter]: 7.09001e-06 [order_py_execute_after_rewriter]: 5.15001e-06 [mutable_eliminate]: 0.0005213 [opt_b]: 0.0001812, [1] [Cycle 1]: 0.00017516, [7] [b_1]: 0.0001076 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 3.60014e-07 [cse]: 1.712e-05 [optimize_parallel_all_gather_comm]: 1.609e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.304e-05 [loop_unroll]: 0.00042041 [opt_after_cconv]: 9.707e-05, [1] [Cycle 1]: 9.102e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.58e-06 [updatestate_depend_eliminate]: 5.67999e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.634e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.279e-05 [tuple_transform]: 6.85e-05, [1] [Cycle 1]: 6.438e-05, [4] [d_1]: 3.891e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.96e-06 [partial_unused_args_eliminate]: 1.87001e-06 [add_recomputation]: 4.919e-05 [cse_after_recomputation]: 2.088e-05, [1] [Cycle 1]: 1.635e-05, [1] [cse]: 1.108e-05 [environ_conv]: 5.25999e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.48002e-06 [label_micro_interleaved_index]: 4.15999e-06 [label_fine_grained_interleaved_index]: 2.60002e-06 [merge_cast_opt]: 1.39998e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.60002e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.20001e-06 [full_micro_interleaved_order_control]: 2.53998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.29e-06 
[add_comm_op_reuse_tag]: 1.24e-06 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.98002e-06 [control_data_broadcast_order]: 1.117e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.80001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.26998e-06 [overlap_grad_ring_attention]: 4.23999e-06 [overlap_grad_flash_sp]: 1.761e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.43e-06 [split_layernorm_comm]: 2.00002e-06 [handle_group_info]: 1.04998e-06 [symbol_engine_optimizer]: 6.85e-05, [1] [Cycle 1]: 6.445e-05, [6] [build]: 2.55997e-06 [elim_shapecalc]: 8.29002e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.15002e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00045665 [validate]: 3.359e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0063435 [execute]: 7.48999e-06 Sums bootstrap : 0.000543s : 3.14% type_inference : 0.006381s : 36.82% event_method : 0.000015s : 0.09% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000031s : 0.18% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000023s : 0.13% optimize.rewriter_before_opt_a : 0.000060s 
: 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000588s : 3.40% optimize.opt_a.with_stream_mark : 0.000025s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s 
: 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000524s : 3.03% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000042s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000521s : 3.01% optimize.opt_b.b_1 : 0.000108s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.13% optimize.loop_unroll : 0.000420s : 2.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 
0.000006s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000457s : 2.64% validate : 0.000034s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006343s : 36.61% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000175 30 14.83% : 0.000026s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.68% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000006s : 4: substitution.graph_param_transform 66.97% : 0.000118s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000005s : 4: substitution.remove_not_recompute_node 2.50% : 0.000004s : 4: substitution.replace_old_param 6.43% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006334 2 90.91% : 0.005758s : 1: type_inference.infer 9.09% : 0.000576s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.77% : 0.000027s : 3: replace.inline 29.23% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000126 5 91.85% : 0.000115s : 3: match.inline 8.15% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.37% : 0.000001s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.12% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.50% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.61% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.81% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000391 8 46.76% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.24% : 0.000208s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031257 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.43% : 0.003571s : 1: add_attr 11.38% : 0.003558s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.84% : 0.000575s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.70% : 0.000530s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.06% : 0.000956s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.22% : 0.002255s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.49% : 0.000466s : 1: opt_after_jit_grad 0.59% : 0.000185s : 1: opt_b 13.41% : 0.004193s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000036s : 1: pre_auto_parallel 0.09% : 0.000027s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.88% : 0.000275s : 1: renormalize.infer 0.77% : 0.000242s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000038s : 1: rewriter_after_opt_a 0.20% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.33% : 0.006354s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.47% : 0.006397s : 1: type_inference 0.21% : 0.000067s : 1: validate TotalTime = 0.0187502, [24] [bootstrap]: 0.00041535 [type_inference]: 0.00446775 [event_method]: 1.143e-05 [auto_monad]: 5.369e-05 [graph_reusing]: 5.21002e-06 [inline]: 2.06e-06 [add_attr]: 0.00310968, [1] [add_attr_with_inline]: 0.00310124, [1] [Cycle 1]: 4.519e-05, [2] [tag_attr]: 1.279e-05 [meta_addattr_fg_expand]: 3.23998e-06 [parallel-infer-symbol]: 3.26001e-06 [pre_auto_parallel]: 2.249e-05 [insert-virtual-dataset]: 2.91999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.10002e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00385956, [53] [py_interpret_to_execute]: 1.564e-05 [rewriter_before_opt_a]: 3.928e-05 [opt_a]: 0.00197279, [2] [Cycle 1]: 0.00135899, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 2.479e-05 [loop_unroll]: 1.351e-05 [a_1]: 0.00030197 [with_stream_mark]: 1.457e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 3.13998e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.834e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.53e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.02998e-06 [auto_parallel]: 6.17001e-06 [parallel]: 1.814e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.2e-06 [virtual_dataset]: 5.78002e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.109e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.66998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 
2.39999e-06 [after_resolve]: 1.017e-05 [a_after_grad]: 8.78001e-06 [renormalize]: 0.00042826 [add_forward_monad_depend]: 5.28002e-06 [auto_monad_grad]: 1.87999e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.792e-05 [a_3]: 4.177e-05 [Cycle 2]: 0.00060366, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 7.39002e-06 [loop_unroll]: 5.53997e-06 [a_1]: 0.00012829 [with_stream_mark]: 1.16e-05 [recompute_prepare]: 5.70001e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 6.817e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.34e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.28002e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.44998e-06 [parallel]: 4.74998e-06 [flash_sp]: 3.53e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.54e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.16002e-06 [virtual_output]: 5.91e-06 [merge_forward]: 2.69001e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.26e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82001e-06 [merge_recompute_call_nodes]: 8.29983e-07 [before_grad]: 8.48001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.22e-06 [after_resolve]: 8.92999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.56999e-06 [cse]: 1.273e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 8.20999e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 6.76e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.49998e-06 [mutable_eliminate]: 0.00046926 [opt_b]: 0.00018543, [1] [Cycle 1]: 
0.00017924, [7] [b_1]: 0.00011066 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.50004e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.696e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.282e-05 [loop_unroll]: 0.00041736 [opt_after_cconv]: 9.667e-05, [1] [Cycle 1]: 9.089e-05, [7] [c_1]: 2.825e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 4.91002e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.33998e-06 [cse]: 1.674e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.197e-05 [tuple_transform]: 7.109e-05, [1] [Cycle 1]: 6.677e-05, [4] [d_1]: 4.057e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.464e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.574e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.88001e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.51998e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.21e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.21e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.98001e-06 [overlap_recompute_and_grad_model_parallel]: 4.80999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.38998e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.794e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 1.87001e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.26997e-06 [symbol_engine_optimizer]: 7.015e-05, [1] [Cycle 1]: 6.593e-05, [6] [build]: 2.66e-06 [elim_shapecalc]: 8.80001e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.40002e-06 [fold_const_symbol]: 9.36e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.12001e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.611e-05 [get_jit_bprop_graph]: 1.27e-06 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00045191 [validate]: 3.344e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00606417 [execute]: 7.21001e-06 Sums bootstrap : 0.000415s : 2.83% type_inference : 0.004468s : 30.48% event_method : 0.000011s : 0.08% auto_monad : 0.000054s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000430s : 2.94% optimize.opt_a.with_stream_mark : 0.000026s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000428s : 2.92% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000068s : 0.46% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000469s : 3.20% optimize.opt_b.b_1 : 0.000111s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000417s : 2.85% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000041s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000452s : 3.08% validate : 0.000033s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006064s : 41.37% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000125 26 18.82% : 0.000024s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.39% : 0.000006s : 4: substitution.graph_param_transform 65.22% : 0.000082s : 2: substitution.inline 2.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.08% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004421 2 91.58% : 0.004049s : 1: type_inference.infer 8.42% : 0.000372s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000139 984 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.35% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000009s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.18% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.83% : 0.000007s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.69% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.02% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.90% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 6 40.28% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.72% : 0.000156s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027093 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.50% : 0.003114s : 1: add_attr 11.46% : 0.003105s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000059s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.67% : 0.000452s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000479s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000787s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.29% : 0.001976s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.71% : 0.000462s : 1: opt_after_jit_grad 0.70% : 0.000189s : 1: opt_b 14.26% : 0.003864s : 1: optimize 0.08% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.92% : 0.000249s : 1: renormalize.infer 0.64% : 0.000172s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.27% : 0.000072s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000073s : 1: symbol_engine_optimizer 22.42% : 0.006075s : 1: task_emit 0.27% : 0.000074s : 1: 
tuple_transform 16.56% : 0.004486s : 1: type_inference 0.23% : 0.000063s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x6-kbk],max_mem:56.0M TotalTime = 0.071547, [24] [bootstrap]: 0.00053457 [type_inference]: 0.00598925 [event_method]: 1.398e-05 [auto_monad]: 6.005e-05 [graph_reusing]: 5.04e-06 [inline]: 2.02999e-06 [add_attr]: 0.00355732, [1] [add_attr_with_inline]: 0.00354513, [1] [Cycle 1]: 4.959e-05, [2] [tag_attr]: 1.723e-05 [meta_addattr_fg_expand]: 3.92998e-06 [parallel-infer-symbol]: 3.48999e-06 [pre_auto_parallel]: 2.933e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00430477, [53] [py_interpret_to_execute]: 2.111e-05 [rewriter_before_opt_a]: 6.035e-05 [opt_a]: 0.00237704, [2] [Cycle 1]: 0.00169373, [45] [expand_dump_flag]: 2.95998e-06 [switch_simplify]: 3.316e-05 [loop_unroll]: 2.1e-05 [a_1]: 0.0004718 [with_stream_mark]: 1.424e-05 [recompute_prepare]: 8.28001e-06 [updatestate_depend_eliminate]: 4.28001e-06 [updatestate_assign_eliminate]: 3.86001e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.805e-05 [accelerated_algorithm]: 6.70998e-06 [shard]: 2.31998e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 8.32998e-06 [auto_parallel]: 7.05e-06 [parallel]: 2.51e-05 [flash_sp]: 7.71999e-06 [merge_comm]: 3.57002e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 9.01998e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 6.28998e-06 [get_grad_eliminate_]: 5.50001e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 9.66e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.156e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.70002e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 1.08e-05 [a_after_grad]: 9.00001e-06 [renormalize]: 0.00055144 [add_forward_monad_depend]: 5.30999e-06 [auto_monad_grad]: 2.15002e-06 [auto_monad_eliminator]: 1.502e-05 [cse]: 2.855e-05 [a_3]: 4.249e-05 [Cycle 2]: 0.00067296, [45] [expand_dump_flag]: 1.13001e-06 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.57001e-06 [a_1]: 0.0001323 [with_stream_mark]: 1.124e-05 [recompute_prepare]: 5.72999e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 7.90023e-07 [a_2]: 6.839e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.40001e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 5.31002e-06 [auto_parallel]: 5.75001e-06 [parallel]: 4.71002e-06 [flash_sp]: 3.8e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 6.19999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.43003e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.86999e-06 [cell_reuse_recompute_pass]: 1.85001e-06 [offload_activation]: 7.17002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 9.00007e-07 [before_grad]: 7.85998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 1.87001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.30001e-06 [after_resolve]: 9.20001e-06 [a_after_grad]: 8.64998e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.50002e-06 [cse]: 1.769e-05 [a_3]: 3.222e-05 [py_interpret_to_execute_after_opt_a]: 8.65999e-06 
[slice_cell_reuse_recomputed_activation]: 1.78002e-06 [rewriter_after_opt_a]: 3.378e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00050244 [opt_b]: 0.00018573, [1] [Cycle 1]: 0.00017931, [7] [b_1]: 0.0001095 [b_2]: 7.21001e-06 [updatestate_depend_eliminate]: 6.22001e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.39992e-07 [cse]: 1.687e-05 [optimize_parallel_all_gather_comm]: 1.623e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.525e-05 [loop_unroll]: 0.00042329 [opt_after_cconv]: 9.722e-05, [1] [Cycle 1]: 9.115e-05, [7] [c_1]: 2.912e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.62001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.648e-05 [renormalize]: 4.90021e-07 [remove_dup_value]: 1.236e-05 [tuple_transform]: 6.974e-05, [1] [Cycle 1]: 6.549e-05, [4] [d_1]: 3.96e-05 [none_parameter_eliminate]: 1.80001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 1.89999e-06 [add_recomputation]: 4.914e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.593e-05, [1] [cse]: 1.072e-05 [environ_conv]: 4.91002e-06 [swap_dp_allreduce_reducescatter]: 5.43002e-06 [bias_add_comm_swap]: 2.73003e-06 [label_micro_interleaved_index]: 4.51002e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.56998e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.54e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.75997e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.27999e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.76998e-06 [control_data_broadcast_order]: 1.137e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.58999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.57999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.72999e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.778e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.978e-05, [1] [Cycle 1]: 6.544e-05, [6] [build]: 2.49001e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.92999e-06 [pipeline_parallel_scheduler]: 1.46998e-06 [auto_monad_reorder]: 1.557e-05 [get_jit_bprop_graph]: 1.39e-06 [rewriter_after_jit_bprop_graph]: 3.88999e-06 [opt_after_jit_grad]: 0.00046134 [validate]: 3.436e-05 [backend_pass]: 1.22e-06 [task_emit]: 0.056278 [execute]: 9.54e-06 Sums bootstrap : 0.000535s : 0.80% type_inference : 0.005989s : 8.95% event_method : 0.000014s : 0.02% auto_monad : 0.000060s : 0.09% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% 
optimize.opt_a.a_1 : 0.000604s : 0.90% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000030s : 0.04% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000552s : 0.82% 
optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.03% optimize.opt_a.cse : 0.000046s : 0.07% optimize.opt_a.a_3 : 0.000075s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000502s : 0.75% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000025s : 0.04% optimize.loop_unroll : 0.000423s : 0.63% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.07% optimize.cse_after_recomputation.cse 
: 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000461s : 0.69% validate : 0.000034s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.056278s : 84.09% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000178 30 14.47% : 0.000026s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.06% : 0.000005s : 4: substitution.graph_param_transform 68.12% : 0.000121s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000005s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 5.87% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005942 2 90.63% : 0.005385s : 1: type_inference.infer 9.37% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.39% : 0.000028s : 3: replace.inline 29.61% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000128 5 92.74% : 0.000119s : 3: match.inline 7.26% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000168 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.71% : 0.000005s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.00% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.22% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.82% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.64% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000001s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000002s : 8: predicate.less_batch_normalization 1.60% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 11: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.38% : 0.000002s : 17: predicate.partial_eliminate 0.79% : 0.000001s : 11: predicate.print_const_string_wrapper 0.59% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.28% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000002s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.09% : 0.000002s : 8: predicate.shard_identity_eliminate 1.03% : 0.000002s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.31% : 0.000002s : 16: predicate.switch_defer_inline 1.93% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.92% : 0.000002s : 11: predicate.transpose_eliminate 1.38% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.47% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.04% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000366 8 44.24% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.76% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081089 196 0.00% : 0.000003s : 1: ForceFp32Comm 4.39% : 0.003561s : 1: add_attr 4.38% : 0.003549s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.71% : 0.000577s : 1: bootstrap 0.04% : 0.000029s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.63% : 0.000511s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.21% : 0.000979s : 78: opt.transform.opt_a 0.03% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.94% : 0.002380s : 1: opt_a 0.12% : 0.000101s : 1: opt_after_cconv 0.58% : 0.000471s : 1: opt_after_jit_grad 0.23% : 0.000189s : 1: opt_b 5.31% : 0.004309s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000034s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.37% : 0.000298s : 1: renormalize.infer 0.30% : 0.000246s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000038s : 1: rewriter_after_opt_a 0.08% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 69.43% : 0.056300s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 7.41% : 0.006005s : 1: type_inference 0.08% : 0.000068s : 1: validate TotalTime = 0.06243, [24] [bootstrap]: 0.0004281 [type_inference]: 0.00486843 [event_method]: 1.285e-05 [auto_monad]: 5.53e-05 [graph_reusing]: 4.99003e-06 [inline]: 2.48e-06 [add_attr]: 0.0032982, [1] [add_attr_with_inline]: 0.00328816, [1] [Cycle 1]: 6.004e-05, [2] [tag_attr]: 1.418e-05 [meta_addattr_fg_expand]: 3.67002e-06 [parallel-infer-symbol]: 3.56001e-06 [pre_auto_parallel]: 2.988e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.86998e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.00454989, [53] [py_interpret_to_execute]: 1.894e-05 [rewriter_before_opt_a]: 4.689e-05 [opt_a]: 0.00234156, [2] [Cycle 1]: 0.00168545, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 2.634e-05 [loop_unroll]: 1.406e-05 [a_1]: 0.00037248 [with_stream_mark]: 1.964e-05 [recompute_prepare]: 8.31002e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.40003e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.94999e-06 [a_2]: 7.885e-05 [accelerated_algorithm]: 6.99001e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 8.07998e-06 [auto_parallel]: 6.88e-06 [parallel]: 2.155e-05 [flash_sp]: 8.75001e-06 [merge_comm]: 3.85e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.44e-06 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 7.50003e-06 [virtual_dataset]: 6.02001e-06 [get_grad_eliminate_]: 6.01e-06 [virtual_output]: 6.28e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 1.046e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.172e-05 [merge_recompute_call_nodes]: 1.50001e-06 [before_grad]: 1.016e-05 [set_forward_comm_id_for_comm_node_pass]: 3.57002e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.74999e-06 [receive_attached]: 2.67001e-06 
[after_resolve]: 1.063e-05 [a_after_grad]: 9.00001e-06 [renormalize]: 0.00063159 [add_forward_monad_depend]: 6.41e-06 [auto_monad_grad]: 2.72001e-06 [auto_monad_eliminator]: 1.563e-05 [cse]: 3.122e-05 [a_3]: 4.482e-05 [Cycle 2]: 0.00064473, [45] [expand_dump_flag]: 1.28002e-06 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00013094 [with_stream_mark]: 1.406e-05 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.83e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 1.22e-06 [a_2]: 6.787e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.57001e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.12002e-06 [parallel]: 5.74999e-06 [flash_sp]: 3.50998e-06 [merge_comm]: 4.05998e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 6.88998e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.94999e-06 [cell_reuse_recompute_pass]: 1.71002e-06 [offload_activation]: 7.68001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.137e-05 [merge_recompute_call_nodes]: 1.06002e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 1.80001e-06 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.25999e-06 [after_resolve]: 1.073e-05 [a_after_grad]: 8.67998e-06 [renormalize]: 1.69995e-07 [add_forward_monad_depend]: 1.58002e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 9.36e-06 [cse]: 1.653e-05 [a_3]: 3.228e-05 [py_interpret_to_execute_after_opt_a]: 1.285e-05 [slice_cell_reuse_recomputed_activation]: 2.01003e-06 [rewriter_after_opt_a]: 3.725e-05 [convert_after_rewriter]: 7.45998e-06 [order_py_execute_after_rewriter]: 5.40999e-06 [mutable_eliminate]: 0.00062846 [opt_b]: 0.0001986, [1] [Cycle 1]: 0.00019157, [7] 
[b_1]: 0.00011038 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 8.47e-06 [updatestate_assign_eliminate]: 2.66e-06 [updatestate_loads_eliminate]: 2.50002e-06 [renormalize]: 7.39994e-07 [cse]: 2.318e-05 [optimize_parallel_all_gather_comm]: 1.863e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 3.005e-05 [loop_unroll]: 0.00046987 [opt_after_cconv]: 0.0001039, [1] [Cycle 1]: 9.739e-05, [7] [c_1]: 2.893e-05 [parameter_eliminate]: 3.86999e-06 [updatestate_depend_eliminate]: 6.09999e-06 [updatestate_assign_eliminate]: 2.63998e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.894e-05 [renormalize]: 5.3001e-07 [remove_dup_value]: 1.446e-05 [tuple_transform]: 7.47e-05, [1] [Cycle 1]: 7.032e-05, [4] [d_1]: 4.42e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.40019e-07 [switch_simplify]: 6.58e-06 [partial_unused_args_eliminate]: 2.07999e-06 [add_recomputation]: 5.029e-05 [cse_after_recomputation]: 2.223e-05, [1] [Cycle 1]: 1.773e-05, [1] [cse]: 1.201e-05 [environ_conv]: 5.69e-06 [swap_dp_allreduce_reducescatter]: 5.35001e-06 [bias_add_comm_swap]: 2.69001e-06 [label_micro_interleaved_index]: 4.55999e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.47001e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.71e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.63002e-06 [control_data_broadcast_order]: 1.285e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 4.18999e-06 [overlap_recompute_and_grad_model_parallel]: 4.97999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.33999e-06 [overlap_grad_flash_sp]: 2.232e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.41998e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 7.345e-05, [1] [Cycle 1]: 6.919e-05, [6] [build]: 3.23e-06 [elim_shapecalc]: 9.54e-06 [elim_not_effective]: 1.234e-05 [opt_reshape]: 6.44999e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 2.32001e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.698e-05 [get_jit_bprop_graph]: 2.14e-06 [rewriter_after_jit_bprop_graph]: 4.43001e-06 [opt_after_jit_grad]: 0.00050857 [validate]: 4.05e-05 [backend_pass]: 8.00006e-07 [task_emit]: 0.0483339 [execute]: 9.51e-06 Sums bootstrap : 0.000428s : 0.74% type_inference : 0.004868s : 8.39% event_method : 0.000013s : 0.02% auto_monad : 0.000055s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000030s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000047s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000033s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000503s : 0.87% optimize.opt_a.with_stream_mark : 0.000034s : 0.06% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.25% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000004s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000015s : 0.03% optimize.opt_a.auto_parallel : 0.000014s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.05% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000008s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000018s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000023s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.04% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000632s : 1.09% optimize.opt_a.add_forward_monad_depend : 0.000008s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000025s : 0.04% optimize.opt_a.cse : 0.000048s : 0.08% optimize.opt_a.a_3 : 0.000077s : 0.13% 
optimize.py_interpret_to_execute_after_opt_a : 0.000013s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000037s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000628s : 1.08% optimize.opt_b.b_1 : 0.000110s : 0.19% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000023s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000030s : 0.05% optimize.loop_unroll : 0.000470s : 0.81% optimize.opt_after_cconv.c_1 : 0.000029s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000019s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000014s : 0.02% optimize.tuple_transform.d_1 : 0.000044s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.09% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000006s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000013s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000022s : 0.04% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000010s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000017s : 0.03% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000509s : 0.88% validate : 0.000040s : 0.07% backend_pass : 0.000001s : 0.00% task_emit : 0.048334s : 83.33% execute : 0.000010s : 0.02% Time group info: ------[substitution.] 0.000148 26 17.73% : 0.000026s : 4: substitution.arithmetic_simplify 1.25% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 4.12% : 0.000006s : 4: substitution.graph_param_transform 66.62% : 0.000099s : 2: substitution.inline 2.73% : 0.000004s : 4: substitution.j_node_and_user_rematch 3.31% : 0.000005s : 4: substitution.remove_not_recompute_node 3.41% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004814 2 90.72% : 0.004367s : 1: type_inference.infer 9.28% : 0.000447s : 1: type_inference.specialize ------[replace.] 0.000074 2 100.00% : 0.000074s : 2: replace.inline ------[match.] 0.000097 2 100.00% : 0.000097s : 2: match.inline ------[predicate.] 
0.000145 984 0.98% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.66% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.22% : 0.000003s : 17: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.46% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000003s : 11: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.43% : 0.000001s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000009s : 44: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000002s : 8: predicate.less_batch_normalization 1.46% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.98% : 0.000003s : 26: predicate.load_eliminater 1.45% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.63% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.63% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.66% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 2.39% : 0.000003s : 4: predicate.mutable_eliminate 0.61% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.30% : 0.000002s : 11: predicate.partial_defer_inline 1.13% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 1.97% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.94% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.65% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.61% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.95% : 0.000001s : 11: predicate.switch_defer_inline 1.68% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.31% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.39% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.61% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.97% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.97% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.03% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 1.14% : 0.000002s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000329 6 39.26% : 0.000129s : 2: func_graph_cloner_run.FuncGraphClonerGraph 60.74% : 0.000200s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.071944 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.59% : 0.003304s : 1: add_attr 4.58% : 0.003292s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000061s : 1: auto_monad 0.03% : 0.000021s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000469s : 1: bootstrap 0.05% : 0.000034s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000016s : 1: control_data_broadcast_order 0.02% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000006s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000005s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.67% : 0.000481s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.89% : 0.000641s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000015s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000020s : 1: opt.transform.mutable_eliminate 1.21% : 0.000869s : 78: opt.transform.opt_a 0.04% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000024s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000093s : 28: opt.transform.opt_b 0.07% : 0.000048s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000034s : 4: opt.transform.symbol_engine_opt 3.26% : 0.002345s : 1: opt_a 0.15% : 0.000107s : 1: opt_after_cconv 0.73% : 0.000522s : 1: opt_after_jit_grad 0.28% : 0.000202s : 1: opt_b 6.33% : 0.004555s : 1: optimize 0.03% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.05% : 0.000034s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.02% : 0.000017s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000018s : 1: remove_dup_value 0.51% : 0.000369s : 1: renormalize.infer 0.35% : 0.000254s : 1: renormalize.specialize 0.08% : 0.000059s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000008s : 1: rewriter_after_jit_bprop_graph 0.06% : 0.000042s : 1: rewriter_after_opt_a 0.07% : 0.000052s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000076s : 1: symbol_engine_optimizer 67.21% : 0.048356s : 1: task_emit 0.11% : 0.000078s : 1: 
tuple_transform 6.80% : 0.004893s : 1: type_inference 0.10% : 0.000073s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x6-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x7-pynative],max_mem:56.0M TotalTime = 0.0225579, [24] [bootstrap]: 0.00053222 [type_inference]: 0.00629552 [event_method]: 1.486e-05 [auto_monad]: 5.688e-05 [graph_reusing]: 5.62001e-06 [inline]: 2.03002e-06 [add_attr]: 0.00368944, [1] [add_attr_with_inline]: 0.00367817, [1] [Cycle 1]: 5.234e-05, [2] [tag_attr]: 1.746e-05 [meta_addattr_fg_expand]: 4.07e-06 [parallel-infer-symbol]: 3.97e-06 [pre_auto_parallel]: 3.164e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 8.50006e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00436575, [53] [py_interpret_to_execute]: 2.44e-05 [rewriter_before_opt_a]: 6.123e-05 [opt_a]: 0.00242341, [2] [Cycle 1]: 0.00179733, [45] [expand_dump_flag]: 2.77002e-06 [switch_simplify]: 3.333e-05 [loop_unroll]: 2.087e-05 [a_1]: 0.00047809 [with_stream_mark]: 1.531e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 4.02e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 2.89001e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.797e-05 [accelerated_algorithm]: 6.73998e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.66002e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.08999e-06 [auto_parallel]: 6.26998e-06 [parallel]: 2.409e-05 [flash_sp]: 7.93001e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.67998e-06 [matmul_add_comm_reduction]: 9.54e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.93999e-06 [virtual_dataset]: 
5.97001e-06 [get_grad_eliminate_]: 6.517e-05 [virtual_output]: 6.79999e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 1.091e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.211e-05 [merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.67002e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 2.74999e-06 [receive_attached]: 2.58e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 1.161e-05 [renormalize]: 0.00058096 [add_forward_monad_depend]: 4.90999e-06 [auto_monad_grad]: 2.17999e-06 [auto_monad_eliminator]: 1.406e-05 [cse]: 3.078e-05 [a_3]: 4.246e-05 [Cycle 2]: 0.00061576, [45] [expand_dump_flag]: 1.27999e-06 [switch_simplify]: 6.88998e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00013125 [with_stream_mark]: 1.124e-05 [recompute_prepare]: 6.21e-06 [updatestate_depend_eliminate]: 3.18e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.9e-05 [accelerated_algorithm]: 5.77999e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.35999e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.86997e-06 [auto_parallel]: 6.08998e-06 [parallel]: 4.70001e-06 [flash_sp]: 2.89999e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 6.71999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.38e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.65001e-06 [offload_activation]: 6.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.29998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.22997e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.18001e-06 [after_resolve]: 9.32999e-06 [a_after_grad]: 8.35001e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.42999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 7.11999e-06 [cse]: 1.326e-05 [a_3]: 3.254e-05 [py_interpret_to_execute_after_opt_a]: 9.35001e-06 [slice_cell_reuse_recomputed_activation]: 2.18998e-06 [rewriter_after_opt_a]: 3.414e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00049739 [opt_b]: 0.00018765, [1] [Cycle 1]: 0.00018111, [7] [b_1]: 0.00010958 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 6.19001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 8.2e-07 [cse]: 1.751e-05 [optimize_parallel_all_gather_comm]: 1.621e-05 [overlap_param_gather]: 1.76998e-06 [cconv]: 2.416e-05 [loop_unroll]: 0.00042754 [opt_after_cconv]: 9.871e-05, [1] [Cycle 1]: 9.266e-05, [7] [c_1]: 2.866e-05 [parameter_eliminate]: 2.43002e-06 [updatestate_depend_eliminate]: 5.53997e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.754e-05 [renormalize]: 5.39992e-07 [remove_dup_value]: 1.294e-05 [tuple_transform]: 7.133e-05, [1] [Cycle 1]: 6.707e-05, [4] [d_1]: 4.094e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 5.221e-05 [cse_after_recomputation]: 2.066e-05, [1] [Cycle 1]: 1.624e-05, [1] [cse]: 1.099e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.18002e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.64998e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.38002e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.14e-06 
[add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.21e-05 [grouped_pairwise_exchange_alltoall]: 1.85001e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 1.793e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 7.134e-05, [1] [Cycle 1]: 6.702e-05, [6] [build]: 3.13998e-06 [elim_shapecalc]: 9.29e-06 [elim_not_effective]: 1.197e-05 [opt_reshape]: 6.43e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 1.99972e-07 [detach_backward]: 2.19001e-06 [pipeline_parallel_scheduler]: 1.48002e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 1.20001e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00046862 [validate]: 3.551e-05 [backend_pass]: 1.09e-06 [task_emit]: 0.00679203 [execute]: 7.31001e-06 Sums bootstrap : 0.000532s : 2.98% type_inference : 0.006296s : 35.25% event_method : 0.000015s : 0.08% auto_monad : 0.000057s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000004s : 0.02% pre_auto_parallel : 0.000032s : 0.18% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000061s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000609s : 3.41% optimize.opt_a.with_stream_mark : 0.000027s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000147s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000070s : 0.39% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000020s : 0.11% optimize.opt_a.renormalize : 0.000581s : 3.25% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000075s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000034s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000497s : 2.79% optimize.opt_b.b_1 : 0.000110s : 0.61% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000018s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.14% optimize.loop_unroll : 0.000428s : 2.39% optimize.opt_after_cconv.c_1 : 0.000029s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000018s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000469s : 2.62% validate : 0.000036s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006792s : 38.03% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000184 30 14.78% : 0.000027s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.31% : 0.000006s : 4: substitution.graph_param_transform 67.75% : 0.000125s : 3: substitution.inline 1.55% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.76% : 0.000005s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 5.79% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006248 2 90.53% : 0.005656s : 1: type_inference.infer 9.47% : 0.000592s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.85% : 0.000028s : 3: replace.inline 30.15% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000133 5 92.71% : 0.000123s : 3: match.inline 7.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 1.04% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.99% : 0.000002s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 0.81% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.57% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.89% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 5.75% : 0.000009s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.75% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.80% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.63% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.63% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.80% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 1.06% : 0.000002s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.95% : 0.000002s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000392 8 46.12% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.88% : 0.000211s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.032337 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.003694s : 1: add_attr 11.39% : 0.003682s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.76% : 0.000570s : 1: bootstrap 0.09% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000006s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.35% : 0.000437s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.57% : 0.000507s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.06% : 0.000989s : 78: opt.transform.opt_a 0.08% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.50% : 0.002427s : 1: opt_a 0.32% : 0.000102s : 1: opt_after_cconv 1.48% : 0.000479s : 1: opt_after_jit_grad 0.59% : 0.000191s : 1: opt_b 13.51% : 0.004370s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000008s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000037s : 1: pre_auto_parallel 0.09% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.96% : 0.000310s : 1: renormalize.infer 0.81% : 0.000264s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000038s : 1: rewriter_after_opt_a 0.20% : 0.000066s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000074s : 1: symbol_engine_optimizer 21.05% : 0.006807s : 1: task_emit 0.23% : 0.000074s : 1: 
tuple_transform 19.52% : 0.006313s : 1: type_inference 0.22% : 0.000071s : 1: validate TotalTime = 0.0187627, [24] [bootstrap]: 0.00040925 [type_inference]: 0.0044223 [event_method]: 1.089e-05 [auto_monad]: 5.13e-05 [graph_reusing]: 4.82e-06 [inline]: 2.14999e-06 [add_attr]: 0.00311326, [1] [add_attr_with_inline]: 0.00310446, [1] [Cycle 1]: 4.592e-05, [2] [tag_attr]: 1.324e-05 [meta_addattr_fg_expand]: 3.08e-06 [parallel-infer-symbol]: 3.25e-06 [pre_auto_parallel]: 2.499e-05 [insert-virtual-dataset]: 2.42001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00386751, [53] [py_interpret_to_execute]: 1.585e-05 [rewriter_before_opt_a]: 4.019e-05 [opt_a]: 0.00201682, [2] [Cycle 1]: 0.00140265, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 2.436e-05 [loop_unroll]: 1.446e-05 [a_1]: 0.00030458 [with_stream_mark]: 1.585e-05 [recompute_prepare]: 8.01001e-06 [updatestate_depend_eliminate]: 3.79002e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 3.45e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.755e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 2.34999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 7.5e-06 [auto_parallel]: 5.99999e-06 [parallel]: 1.796e-05 [flash_sp]: 7.4e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 9.24e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 6.42001e-06 [get_grad_eliminate_]: 5.92999e-06 [virtual_output]: 5.94999e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.59999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.137e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.52001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.36e-06 [after_resolve]: 
3.155e-05 [a_after_grad]: 9.61e-06 [renormalize]: 0.00044246 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.86998e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.806e-05 [a_3]: 4.139e-05 [Cycle 2]: 0.00060381, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 6.96999e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.00012743 [with_stream_mark]: 1.02e-05 [recompute_prepare]: 5.61998e-06 [updatestate_depend_eliminate]: 3.03e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 7.79983e-07 [a_2]: 6.905e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.48999e-06 [auto_parallel]: 5.72001e-06 [parallel]: 4.55999e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.19001e-06 [allreduce_fusion]: 2.68998e-06 [matmul_add_comm_reduction]: 5.37999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.92999e-06 [virtual_dataset]: 5.21002e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 6.21998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76998e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.82999e-06 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 8.28999e-06 [renormalize]: 1.60013e-07 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.49999e-06 [cse]: 1.261e-05 [a_3]: 3.221e-05 [py_interpret_to_execute_after_opt_a]: 8.33999e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.243e-05 [convert_after_rewriter]: 6.76e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00047221 [opt_b]: 0.00018492, [1] [Cycle 1]: 0.00017846, [7] [b_1]: 0.00011037 
[b_2]: 7.38e-06 [updatestate_depend_eliminate]: 5.49e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 5.10016e-07 [cse]: 1.69e-05 [optimize_parallel_all_gather_comm]: 1.562e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.275e-05 [loop_unroll]: 0.00041568 [opt_after_cconv]: 9.599e-05, [1] [Cycle 1]: 9.018e-05, [7] [c_1]: 2.853e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.668e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.213e-05 [tuple_transform]: 7.017e-05, [1] [Cycle 1]: 6.576e-05, [4] [d_1]: 3.978e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 6.53003e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.527e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.639e-05, [1] [cse]: 1.128e-05 [environ_conv]: 4.92e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.94001e-06 [label_micro_interleaved_index]: 4.95001e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.53003e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.35999e-06 [overlap_opt_shard_in_pipeline]: 1.24e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.218e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.43998e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.09998e-06 [symbol_engine_optimizer]: 6.969e-05, [1] [Cycle 1]: 6.525e-05, [6] [build]: 2.30002e-06 [elim_shapecalc]: 8.55001e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.617e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.14999e-06 [opt_after_jit_grad]: 0.00044986 [validate]: 3.297e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.00612212 [execute]: 8.55001e-06 Sums bootstrap : 0.000409s : 2.79% type_inference : 0.004422s : 30.15% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.17% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000432s : 2.95% optimize.opt_a.with_stream_mark : 0.000026s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000042s : 0.28% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000443s : 3.02% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000472s : 3.22% optimize.opt_b.b_1 : 0.000110s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000416s : 2.83% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.07% validate : 0.000033s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006122s : 41.74% execute : 0.000009s : 0.06% Time group info: ------[substitution.] 0.000127 26 17.87% : 0.000023s : 4: substitution.arithmetic_simplify 1.41% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.05% : 0.000005s : 4: substitution.graph_param_transform 66.69% : 0.000085s : 2: substitution.inline 2.10% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.50% : 0.000004s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004378 2 91.91% : 0.004024s : 1: type_inference.infer 8.09% : 0.000354s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000083 2 100.00% : 0.000083s : 2: match.inline ------[predicate.] 
0.000141 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.15% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.72% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.39% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.67% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.60% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.02% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.48% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000256 6 40.88% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.12% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027159 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.48% : 0.003118s : 1: add_attr 11.44% : 0.003108s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.64% : 0.000446s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.56% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000482s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.99% : 0.000812s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.44% : 0.002020s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.69% : 0.000460s : 1: opt_after_jit_grad 0.69% : 0.000188s : 1: opt_b 14.25% : 0.003871s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000029s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.94% : 0.000255s : 1: renormalize.infer 0.66% : 0.000180s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000045s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.58% : 0.006133s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.34% : 0.004438s : 1: type_inference 0.23% : 0.000063s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x7-kbk],max_mem:56.0M TotalTime = 0.870674, [24] [bootstrap]: 0.00050006 [type_inference]: 0.00608825 [event_method]: 1.395e-05 [auto_monad]: 5.56e-05 [graph_reusing]: 5.40999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00348324, [1] [add_attr_with_inline]: 0.00347087, [1] [Cycle 1]: 4.616e-05, [2] [tag_attr]: 1.552e-05 [meta_addattr_fg_expand]: 4.02002e-06 [parallel-infer-symbol]: 3.45e-06 [pre_auto_parallel]: 2.895e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00411026, [53] [py_interpret_to_execute]: 2.052e-05 [rewriter_before_opt_a]: 6.016e-05 [opt_a]: 0.00219577, [2] [Cycle 1]: 0.00158974, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 3.26e-05 [loop_unroll]: 2.141e-05 [a_1]: 0.00046821 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 7.968e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 1.92001e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 6.00002e-06 [merge_send_recv]: 8.35001e-06 [auto_parallel]: 5.81e-06 [parallel]: 2.379e-05 [flash_sp]: 7.15998e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.42e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.66001e-06 [virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 8.98002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 
[merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.53e-06 [receive_attached]: 2.65997e-06 [after_resolve]: 1.078e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.0004623 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 1.34e-05 [cse]: 2.889e-05 [a_3]: 4.198e-05 [Cycle 2]: 0.00059627, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012476 [with_stream_mark]: 9.74e-06 [recompute_prepare]: 5.99999e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 7.005e-05 [accelerated_algorithm]: 5.68997e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.32e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.53999e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.21001e-06 [flash_sp]: 3.21999e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 4.89998e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.38e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.47e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.45001e-06 [a_after_grad]: 8.05e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 5.94e-06 [cse]: 1.621e-05 [a_3]: 3.207e-05 [py_interpret_to_execute_after_opt_a]: 9.15001e-06 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 
[rewriter_after_opt_a]: 3.231e-05 [convert_after_rewriter]: 7.17002e-06 [order_py_execute_after_rewriter]: 5.52001e-06 [mutable_eliminate]: 0.00050784 [opt_b]: 0.00018419, [1] [Cycle 1]: 0.0001777, [7] [b_1]: 0.00010966 [b_2]: 7.13998e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 6.79982e-07 [cse]: 1.651e-05 [optimize_parallel_all_gather_comm]: 1.581e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.259e-05 [loop_unroll]: 0.00041239 [opt_after_cconv]: 9.665e-05, [1] [Cycle 1]: 9.062e-05, [7] [c_1]: 2.879e-05 [parameter_eliminate]: 2.81e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.41998e-06 [cse]: 1.642e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.224e-05 [tuple_transform]: 7.163e-05, [1] [Cycle 1]: 6.701e-05, [4] [d_1]: 4.104e-05 [none_parameter_eliminate]: 1.46002e-06 [renormalize]: 1.99972e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.824e-05 [cse_after_recomputation]: 1.988e-05, [1] [Cycle 1]: 1.551e-05, [1] [cse]: 1.039e-05 [environ_conv]: 4.57e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.91e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 9.69972e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.184e-05 
[grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 2.05002e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.783e-05 [begin_end_overlap_inline]: 8.40024e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.007e-05, [1] [Cycle 1]: 6.584e-05, [6] [build]: 2.41998e-06 [elim_shapecalc]: 8.82e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.46e-06 [fold_const_symbol]: 9.00001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.572e-05 [get_jit_bprop_graph]: 1.19998e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00044398 [validate]: 3.413e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.855643 [execute]: 9.79e-06 Sums bootstrap : 0.000500s : 0.06% type_inference : 0.006088s : 0.70% event_method : 0.000014s : 0.00% auto_monad : 0.000056s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000060s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000593s : 0.07% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000150s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000462s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000045s : 0.01% optimize.opt_a.a_3 : 0.000074s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000508s : 0.06% optimize.opt_b.b_1 : 0.000110s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.00% optimize.loop_unroll : 0.000412s : 0.05% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.05% validate : 0.000034s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.855643s : 98.78% execute : 0.000010s : 0.00% Time group info: ------[substitution.] 0.000173 30 15.18% : 0.000026s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000005s : 4: substitution.graph_param_transform 66.70% : 0.000116s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000005s : 4: substitution.remove_not_recompute_node 2.49% : 0.000004s : 4: substitution.replace_old_param 6.41% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006042 2 90.67% : 0.005478s : 1: type_inference.infer 9.33% : 0.000563s : 1: type_inference.specialize ------[replace.] 0.000039 5 68.12% : 0.000026s : 3: replace.inline 31.88% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000124 5 91.91% : 0.000114s : 3: match.inline 8.09% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.01% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.29% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.91% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.83% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.56% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000364 8 45.11% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.89% : 0.000200s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.879849 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.40% : 0.003487s : 1: add_attr 0.39% : 0.003475s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000060s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.06% : 0.000538s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000020s : 1: event_method 0.00% : 0.000018s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000421s : 1: loop_unroll 0.00% : 0.000005s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000517s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.11% : 0.000968s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000091s : 28: opt.transform.opt_b 0.01% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002199s : 1: opt_a 0.01% : 0.000100s : 1: opt_after_cconv 0.05% : 0.000453s : 1: opt_after_jit_grad 0.02% : 0.000188s : 1: opt_b 0.47% : 0.004114s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000009s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.03% : 0.000246s : 1: renormalize.infer 0.02% : 0.000210s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000065s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000073s : 1: symbol_engine_optimizer 97.25% : 0.855664s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 0.69% : 0.006102s : 1: type_inference 0.01% : 0.000064s : 1: validate TotalTime = 0.0773905, [24] [bootstrap]: 0.0004232 [type_inference]: 0.00449318 [event_method]: 1.105e-05 [auto_monad]: 5.135e-05 [graph_reusing]: 5.72001e-06 [inline]: 1.77001e-06 [add_attr]: 0.00301331, [1] [add_attr_with_inline]: 0.00300503, [1] [Cycle 1]: 4.373e-05, [2] [tag_attr]: 1.196e-05 [meta_addattr_fg_expand]: 3.15998e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 2.443e-05 [insert-virtual-dataset]: 2.74001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00388227, [53] [py_interpret_to_execute]: 1.519e-05 [rewriter_before_opt_a]: 4.074e-05 [opt_a]: 0.00202172, [2] [Cycle 1]: 0.00140247, [45] [expand_dump_flag]: 2.48998e-06 [switch_simplify]: 2.471e-05 [loop_unroll]: 1.394e-05 [a_1]: 0.00029949 [with_stream_mark]: 1.396e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 3.28998e-06 [updatestate_assign_eliminate]: 3.45e-06 [updatestate_loads_eliminate]: 3.23e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 0.00013076 [accelerated_algorithm]: 6.75002e-06 [shard]: 2.89999e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 6.01998e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 6.24001e-06 [parallel]: 1.67e-05 [flash_sp]: 7.81001e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.90999e-06 [allreduce_slice_to_reducescatter]: 8.2e-07 [virtual_shard_identity]: 7.36001e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.58002e-06 [virtual_output]: 5.70001e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.41002e-06 [before_grad]: 9.49e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.41998e-06 [flash_sp_send_recv_attached]: 2.23998e-06 [receive_attached]: 
2.20002e-06 [after_resolve]: 1.016e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.0004221 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 1.97999e-06 [auto_monad_eliminator]: 1.415e-05 [cse]: 2.75e-05 [a_3]: 4.076e-05 [Cycle 2]: 0.00060978, [45] [expand_dump_flag]: 7.7e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012779 [with_stream_mark]: 9.74999e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.897e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.35999e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.24e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.77999e-06 [parallel]: 4.58999e-06 [flash_sp]: 3.84002e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.35999e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.041e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.69e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.73002e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.24998e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.89999e-06 [cse]: 1.32e-05 [a_3]: 3.289e-05 [py_interpret_to_execute_after_opt_a]: 8.90999e-06 [slice_cell_reuse_recomputed_activation]: 2.48998e-06 [rewriter_after_opt_a]: 3.317e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00046942 [opt_b]: 0.00018731, [1] [Cycle 1]: 0.00018073, [7] 
[b_1]: 0.00010993 [b_2]: 7.06999e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.34999e-06 [renormalize]: 3.59985e-07 [cse]: 1.78e-05 [optimize_parallel_all_gather_comm]: 1.541e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.257e-05 [loop_unroll]: 0.00041852 [opt_after_cconv]: 9.764e-05, [1] [Cycle 1]: 9.183e-05, [7] [c_1]: 2.864e-05 [parameter_eliminate]: 2.52001e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.44001e-06 [cse]: 1.68e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 1.336e-05 [tuple_transform]: 6.974e-05, [1] [Cycle 1]: 6.564e-05, [4] [d_1]: 4.045e-05 [none_parameter_eliminate]: 1.48002e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 4.453e-05 [cse_after_recomputation]: 2.192e-05, [1] [Cycle 1]: 1.71e-05, [1] [cse]: 1.178e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.17999e-06 [bias_add_comm_swap]: 3.09001e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 1.19998e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.134e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.03001e-06 [overlap_grad_flash_sp]: 1.873e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 6.992e-05, [1] [Cycle 1]: 6.58e-05, [6] [build]: 2.51998e-06 [elim_shapecalc]: 8.84998e-06 [elim_not_effective]: 1.152e-05 [opt_reshape]: 6.49001e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.612e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.55003e-06 [opt_after_jit_grad]: 0.00045018 [validate]: 3.488e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.0647348 [execute]: 9.14e-06 Sums bootstrap : 0.000423s : 0.58% type_inference : 0.004493s : 6.12% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000427s : 0.58% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000200s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000422s : 0.58% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000469s : 0.64% optimize.opt_b.b_1 : 0.000110s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000419s : 0.57% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.61% validate : 0.000035s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.064735s : 88.22% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000125 26 18.46% : 0.000023s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 4.52% : 0.000006s : 4: substitution.graph_param_transform 65.45% : 0.000082s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.66% : 0.000005s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004449 2 90.27% : 0.004016s : 1: type_inference.infer 9.73% : 0.000433s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000140 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.80% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.80% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.57% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.93% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.48% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.07% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.54% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.69% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.83% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 1.08% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000326 6 36.15% : 0.000118s : 2: func_graph_cloner_run.FuncGraphClonerGraph 63.85% : 0.000208s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.085701 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.52% : 0.003018s : 1: add_attr 3.51% : 0.003009s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.54% : 0.000459s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000479s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.98% : 0.000837s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.36% : 0.002025s : 1: opt_a 0.12% : 0.000101s : 1: opt_after_cconv 0.54% : 0.000460s : 1: opt_after_jit_grad 0.22% : 0.000191s : 1: opt_b 4.54% : 0.003887s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.28% : 0.000242s : 1: renormalize.infer 0.20% : 0.000173s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.05% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000073s : 1: symbol_engine_optimizer 75.56% : 0.064758s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 5.26% : 0.004510s : 1: type_inference 0.07% : 0.000060s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x7-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x8-pynative],max_mem:56.0M TotalTime = 0.0233421, [24] [bootstrap]: 0.00055779 [type_inference]: 0.00691433 [event_method]: 1.467e-05 [auto_monad]: 6.203e-05 [graph_reusing]: 5.69e-06 [inline]: 2.45002e-06 [add_attr]: 0.0038647, [1] [add_attr_with_inline]: 0.00385167, [1] [Cycle 1]: 5.434e-05, [2] [tag_attr]: 1.759e-05 [meta_addattr_fg_expand]: 4.17e-06 [parallel-infer-symbol]: 3.38e-06 [pre_auto_parallel]: 3.209e-05 [insert-virtual-dataset]: 2.76999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.66e-06 [optimize]: 0.00440847, [53] [py_interpret_to_execute]: 2.444e-05 [rewriter_before_opt_a]: 6.229e-05 [opt_a]: 0.00237197, [2] [Cycle 1]: 0.00174369, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 3.305e-05 [loop_unroll]: 2.158e-05 [a_1]: 0.00047847 [with_stream_mark]: 1.541e-05 [recompute_prepare]: 8.32998e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.789e-05 [accelerated_algorithm]: 6.33002e-06 [shard]: 2.48e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 8.32e-06 [auto_parallel]: 6.36998e-06 [parallel]: 2.885e-05 [flash_sp]: 8e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.72e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 7.61999e-06 [virtual_dataset]: 6.42001e-06 
[get_grad_eliminate_]: 5.78002e-06 [virtual_output]: 5.94e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 9.43002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.122e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.51e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.52001e-06 [receive_attached]: 2.59999e-06 [after_resolve]: 1.066e-05 [a_after_grad]: 9.02e-06 [renormalize]: 0.00059327 [add_forward_monad_depend]: 4.85999e-06 [auto_monad_grad]: 2.12999e-06 [auto_monad_eliminator]: 1.401e-05 [cse]: 2.888e-05 [a_3]: 4.285e-05 [Cycle 2]: 0.00061787, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 7.45e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00013013 [with_stream_mark]: 1.099e-05 [recompute_prepare]: 5.86998e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.16998e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.808e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.72e-06 [auto_parallel]: 5.66003e-06 [parallel]: 4.05e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 5.15999e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.69001e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 5.15001e-06 [merge_forward]: 3.01999e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.044e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 8.55999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.66001e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.32999e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 8.77999e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.42001e-06 [cse]: 1.378e-05 [a_3]: 3.285e-05 [py_interpret_to_execute_after_opt_a]: 9.35001e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.331e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00050558 [opt_b]: 0.00024726, [1] [Cycle 1]: 0.00024083, [7] [b_1]: 0.0001674 [b_2]: 7.9e-06 [updatestate_depend_eliminate]: 5.52001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.53003e-06 [renormalize]: 2.50002e-07 [cse]: 1.858e-05 [optimize_parallel_all_gather_comm]: 1.677e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.413e-05 [loop_unroll]: 0.00044115 [opt_after_cconv]: 9.968e-05, [1] [Cycle 1]: 9.353e-05, [7] [c_1]: 2.893e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.87999e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.719e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.335e-05 [tuple_transform]: 7.179e-05, [1] [Cycle 1]: 6.748e-05, [4] [d_1]: 4.094e-05 [none_parameter_eliminate]: 1.99e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.44001e-06 [partial_unused_args_eliminate]: 1.69998e-06 [add_recomputation]: 5.161e-05 [cse_after_recomputation]: 2.178e-05, [1] [Cycle 1]: 1.72e-05, [1] [cse]: 1.202e-05 [environ_conv]: 5.29998e-06 [swap_dp_allreduce_reducescatter]: 5.54998e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.75001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.43e-06 [reorder_send_recv_between_fp_bp]: 2.41e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 
9.20001e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.40001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86003e-06 [control_data_broadcast_order]: 1.149e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 4.08001e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 4.23001e-06 [overlap_grad_flash_sp]: 1.893e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.129e-05, [1] [Cycle 1]: 6.699e-05, [6] [build]: 2.73e-06 [elim_shapecalc]: 8.40999e-06 [elim_not_effective]: 1.248e-05 [opt_reshape]: 6.49001e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.99999e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.72e-05 [get_jit_bprop_graph]: 1.57001e-06 [rewriter_after_jit_bprop_graph]: 3.81001e-06 [opt_after_jit_grad]: 0.00046883 [validate]: 3.588e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00669357 [execute]: 8.35001e-06 Sums bootstrap : 0.000558s : 3.02% type_inference : 0.006914s : 37.48% event_method : 0.000015s : 0.08% auto_monad : 0.000062s : 0.34% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000032s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.13% optimize.rewriter_before_opt_a : 0.000062s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000609s : 3.30% optimize.opt_a.with_stream_mark : 0.000026s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000146s : 0.79% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.06% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000033s : 0.18% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000009s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000018s : 0.10% optimize.opt_a.renormalize : 0.000593s : 3.22% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000043s : 0.23% optimize.opt_a.a_3 : 0.000076s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000033s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000506s : 2.74% optimize.opt_b.b_1 : 0.000167s : 0.91% optimize.opt_b.b_2 : 0.000008s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.13% optimize.loop_unroll : 0.000441s : 2.39% optimize.opt_after_cconv.c_1 : 0.000029s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000041s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000052s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000019s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000017s : 0.09% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000469s : 2.54% validate : 0.000036s : 0.19% backend_pass : 0.000001s : 0.00% task_emit : 0.006694s : 36.29% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000186 30 14.20% : 0.000026s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.38% : 0.000006s : 4: substitution.graph_param_transform 67.77% : 0.000126s : 3: substitution.inline 1.84% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000005s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 5.82% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006863 2 91.05% : 0.006249s : 1: type_inference.infer 8.95% : 0.000614s : 1: type_inference.specialize ------[replace.] 0.000041 5 70.99% : 0.000029s : 3: replace.inline 29.01% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000134 5 92.70% : 0.000124s : 3: match.inline 7.30% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000221 1131 0.70% : 0.000002s : 11: predicate.accumulaten_eliminater 0.74% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.48% : 0.000001s : 8: predicate.addn_check_dump 0.60% : 0.000001s : 11: predicate.addn_zero_filter 0.57% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.70% : 0.000004s : 19: predicate.arithmetic_simplify 0.66% : 0.000001s : 11: predicate.cast_eliminate 0.47% : 0.000001s : 8: predicate.check_bprop_eliminate 0.45% : 0.000001s : 8: predicate.compare_switch_simplify 0.17% : 0.000000s : 4: predicate.const_output_eliminate 0.49% : 0.000001s : 8: predicate.depend_value_elim 0.64% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.72% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.62% : 0.000001s : 11: predicate.dict_set_item_eliminator 25.66% : 0.000057s : 8: predicate.dumpgradient_eliminate 0.21% : 0.000000s : 4: predicate.elim_not_effective 0.27% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.86% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.82% : 0.000002s : 15: predicate.environ_get_depend_swap 1.35% : 0.000003s : 23: predicate.environ_get_eliminate 0.80% : 0.000002s : 15: predicate.environ_get_set_eliminate 0.94% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.60% : 0.000004s : 16: predicate.float_depend_g_call 0.44% : 0.000001s : 8: predicate.float_environ_get_switch 0.67% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.55% : 0.000001s : 8: predicate.get_grad_eliminate 0.17% : 0.000000s : 4: predicate.graph_param_transform 0.49% : 0.000001s : 8: predicate.incorporate_call 0.41% : 0.000001s : 8: predicate.incorporate_call_switch 4.31% : 0.000010s : 51: predicate.inline 0.62% : 0.000001s : 8: predicate.inline_without_move 0.29% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.70% : 0.000002s : 8: predicate.less_batch_normalization 1.31% : 0.000003s 
: 21: predicate.list_to_tuple_eliminator_ 1.77% : 0.000004s : 32: predicate.load_eliminater 0.80% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.63% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.27% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.46% : 0.000001s : 8: predicate.merge_addn 0.51% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.48% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.60% : 0.000001s : 11: predicate.minmaximum_grad 0.96% : 0.000002s : 4: predicate.mutable_eliminate 0.32% : 0.000001s : 4: predicate.opt_reshape 0.30% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000003s : 16: predicate.partial_defer_inline 1.08% : 0.000002s : 17: predicate.partial_eliminate 0.62% : 0.000001s : 11: predicate.print_const_string_wrapper 0.49% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000002s : 11: predicate.reduce_eliminate 1.75% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000001s : 8: predicate.remove_not_recompute_node 1.05% : 0.000002s : 21: predicate.replace_applicator 0.38% : 0.000001s : 8: predicate.replace_old_param 0.24% : 0.000001s : 4: predicate.reset_defer_inline 0.62% : 0.000001s : 11: predicate.reshape_eliminate 0.55% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.32% : 0.000001s : 4: predicate.row_tensor_eliminate 0.72% : 0.000002s : 8: predicate.same_eliminate 0.40% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.64% : 0.000001s : 8: predicate.shard_identity_eliminate 0.62% : 0.000001s : 8: predicate.special_op_eliminate 0.58% : 0.000001s : 8: predicate.specialize_transform 0.76% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.69% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.28% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000002s : 16: predicate.switch_defer_inline 1.49% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.96% : 0.000009s : 54: predicate.switch_simplify 0.61% 
: 0.000001s : 11: predicate.tile_eliminate 0.71% : 0.000002s : 11: predicate.transpose_eliminate 1.18% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.14% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.08% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.49% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.04% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.68% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.27% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.73% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.38% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.59% : 0.000001s : 8: predicate.virtual_output_eliminate 0.24% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.34% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000440 8 46.22% : 0.000203s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.78% : 0.000237s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.033399 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.59% : 0.003869s : 1: add_attr 11.54% : 0.003856s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000056s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000067s : 1: auto_monad 0.06% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.81% : 0.000606s : 1: bootstrap 0.08% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.07% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000021s : 1: event_method 0.04% : 0.000015s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000008s : 1: label_micro_interleaved_index 1.35% : 0.000450s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000516s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000014s : 1: opt.transform.mutable_eliminate 2.95% : 0.000986s : 78: opt.transform.opt_a 0.08% : 0.000028s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.45% : 0.000149s : 28: opt.transform.opt_b 0.14% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.11% : 0.002375s : 1: opt_a 0.31% : 0.000103s : 1: opt_after_cconv 1.43% : 0.000479s : 1: opt_after_jit_grad 0.75% : 0.000251s : 1: opt_b 13.21% : 0.004413s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000022s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000036s : 1: pre_auto_parallel 0.09% : 0.000029s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.95% : 0.000319s : 1: renormalize.infer 0.80% : 0.000268s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000037s : 1: rewriter_after_opt_a 0.20% : 0.000067s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000074s : 1: symbol_engine_optimizer 20.08% : 0.006708s : 1: task_emit 0.22% : 0.000075s : 1: 
tuple_transform 20.76% : 0.006933s : 1: type_inference 0.22% : 0.000072s : 1: validate TotalTime = 0.0196848, [24] [bootstrap]: 0.00042796 [type_inference]: 0.00471145 [event_method]: 1.135e-05 [auto_monad]: 5.535e-05 [graph_reusing]: 5.44e-06 [inline]: 2.12999e-06 [add_attr]: 0.00324863, [1] [add_attr_with_inline]: 0.00323895, [1] [Cycle 1]: 5.11e-05, [2] [tag_attr]: 1.392e-05 [meta_addattr_fg_expand]: 3.31001e-06 [parallel-infer-symbol]: 3.28e-06 [pre_auto_parallel]: 2.563e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.01998e-06 [pipeline_split]: 1.96e-06 [optimize]: 0.00408885, [53] [py_interpret_to_execute]: 1.798e-05 [rewriter_before_opt_a]: 4.19e-05 [opt_a]: 0.00212923, [2] [Cycle 1]: 0.00149074, [45] [expand_dump_flag]: 2.32001e-06 [switch_simplify]: 2.816e-05 [loop_unroll]: 1.604e-05 [a_1]: 0.00031694 [with_stream_mark]: 1.619e-05 [recompute_prepare]: 9.00999e-06 [updatestate_depend_eliminate]: 4.04997e-06 [updatestate_assign_eliminate]: 3.06999e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.77001e-06 [a_2]: 8.072e-05 [accelerated_algorithm]: 6.68998e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 1.98002e-06 [shard_inline]: 6.42001e-06 [merge_send_recv]: 8.78001e-06 [auto_parallel]: 6.88e-06 [parallel]: 2.011e-05 [flash_sp]: 8.19002e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 9.05001e-06 [allreduce_slice_to_reducescatter]: 9.89996e-07 [virtual_shard_identity]: 7.53e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.54e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 9.98998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.143e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 9.71e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76999e-06 [meta_fg_expand]: 2.52001e-06 [flash_sp_send_recv_attached]: 2.43002e-06 [receive_attached]: 2.76e-06 
[after_resolve]: 1.237e-05 [a_after_grad]: 9.59e-06 [renormalize]: 0.00049398 [add_forward_monad_depend]: 5.02e-06 [auto_monad_grad]: 2.27999e-06 [auto_monad_eliminator]: 1.478e-05 [cse]: 2.845e-05 [a_3]: 4.534e-05 [Cycle 2]: 0.00062707, [45] [expand_dump_flag]: 1.20001e-06 [switch_simplify]: 8.02e-06 [loop_unroll]: 5.82999e-06 [a_1]: 0.00013791 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 6.26e-06 [updatestate_depend_eliminate]: 3.08e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.32999e-06 [a_2]: 6.998e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.80999e-06 [auto_parallel]: 5.92999e-06 [parallel]: 4.77e-06 [flash_sp]: 3.4e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 2.58003e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.58e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.042e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.12e-06 [set_forward_comm_id_for_comm_node_pass]: 2.79999e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.01e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.355e-05 [a_3]: 3.314e-05 [py_interpret_to_execute_after_opt_a]: 9.15999e-06 [slice_cell_reuse_recomputed_activation]: 2.24999e-06 [rewriter_after_opt_a]: 3.468e-05 [convert_after_rewriter]: 7.08998e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00050046 [opt_b]: 0.00021565, [1] [Cycle 1]: 0.00020888, [7] [b_1]: 0.00013587 [b_2]: 
7.4e-06 [updatestate_depend_eliminate]: 6.38e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.38998e-06 [renormalize]: 3.60014e-07 [cse]: 1.801e-05 [optimize_parallel_all_gather_comm]: 1.744e-05 [overlap_param_gather]: 2.22999e-06 [cconv]: 2.443e-05 [loop_unroll]: 0.00043573 [opt_after_cconv]: 9.913e-05, [1] [Cycle 1]: 9.316e-05, [7] [c_1]: 2.891e-05 [parameter_eliminate]: 2.76e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.13998e-06 [cse]: 1.733e-05 [renormalize]: 4.10015e-07 [remove_dup_value]: 1.362e-05 [tuple_transform]: 7.176e-05, [1] [Cycle 1]: 6.738e-05, [4] [d_1]: 4.126e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.44999e-06 [partial_unused_args_eliminate]: 2.32001e-06 [add_recomputation]: 4.63e-05 [cse_after_recomputation]: 2.093e-05, [1] [Cycle 1]: 1.654e-05, [1] [cse]: 1.129e-05 [environ_conv]: 5.49e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 3.36001e-06 [label_micro_interleaved_index]: 4.70001e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.14003e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.19998e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 1.26002e-06 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.15999e-06 [overlap_opt_shard_in_pipeline]: 1.50001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.217e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.98001e-06 [overlap_recompute_and_grad_model_parallel]: 4.78001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.36998e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.52e-06 [overlap_grad_flash_sp]: 1.843e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 7.009e-05, [1] [Cycle 1]: 6.594e-05, [6] [build]: 2.84001e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.228e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 1.512e-05 [get_jit_bprop_graph]: 1.81003e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00045977 [validate]: 3.521e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00634296 [execute]: 7.90998e-06 Sums bootstrap : 0.000428s : 2.78% type_inference : 0.004711s : 30.58% event_method : 0.000011s : 0.07% auto_monad : 0.000055s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.17% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000018s : 0.12% optimize.rewriter_before_opt_a : 0.000042s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000036s : 0.23% optimize.opt_a.loop_unroll : 0.000022s : 0.14% optimize.opt_a.a_1 : 0.000455s : 2.95% optimize.opt_a.with_stream_mark : 0.000029s : 0.19% optimize.opt_a.recompute_prepare : 0.000015s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000151s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000014s : 0.09% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000025s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000017s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000022s : 0.15% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000494s : 3.21% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.14% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000078s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000035s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000500s : 3.25% optimize.opt_b.b_1 : 0.000136s : 0.88% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.16% optimize.loop_unroll : 0.000436s : 2.83% optimize.opt_after_cconv.c_1 : 0.000029s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000046s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000018s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000460s : 2.98% validate : 0.000035s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.006343s : 41.17% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000136 26 18.35% : 0.000025s : 4: substitution.arithmetic_simplify 1.61% : 0.000002s : 2: substitution.elim_not_effective 0.94% : 0.000001s : 2: substitution.fold_const_symbol 4.27% : 0.000006s : 4: substitution.graph_param_transform 65.88% : 0.000090s : 2: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.44% : 0.000005s : 4: substitution.remove_not_recompute_node 3.38% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004663 2 92.09% : 0.004294s : 1: type_inference.infer 7.91% : 0.000369s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000088 2 100.00% : 0.000088s : 2: match.inline ------[predicate.] 
0.000145 984 0.97% : 0.000001s : 9: predicate.accumulaten_eliminater 1.21% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.98% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000002s : 13: predicate.environ_get_depend_swap 1.81% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.92% : 0.000001s : 8: predicate.get_grad_eliminate 0.41% : 0.000001s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000009s : 44: predicate.inline 1.06% : 0.000002s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.18% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.04% : 0.000003s : 26: predicate.load_eliminater 1.41% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.83% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.48% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.56% : 0.000001s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.67% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.57% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000007s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.46% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.06% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.66% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.02% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.92% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.98% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000276 6 40.80% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.20% : 0.000164s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028508 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.41% : 0.003253s : 1: add_attr 11.38% : 0.003243s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.64% : 0.000466s : 1: bootstrap 0.10% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000018s : 1: event_method 0.05% : 0.000014s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.56% : 0.000445s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.79% : 0.000511s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.91% : 0.000830s : 78: opt.transform.opt_a 0.10% : 0.000028s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000095s : 28: opt.transform.opt_b 0.16% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.48% : 0.002132s : 1: opt_a 0.36% : 0.000102s : 1: opt_after_cconv 1.65% : 0.000470s : 1: opt_after_jit_grad 0.77% : 0.000219s : 1: opt_b 14.36% : 0.004093s : 1: optimize 0.07% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000008s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000022s : 1: py_interpret_to_execute 0.04% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 1.00% : 0.000286s : 1: renormalize.infer 0.70% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000039s : 1: rewriter_after_opt_a 0.16% : 0.000046s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000073s : 1: symbol_engine_optimizer 22.30% : 0.006357s : 1: task_emit 0.26% : 0.000075s : 1: 
tuple_transform 16.60% : 0.004732s : 1: type_inference 0.25% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x8-kbk],max_mem:56.0M . TotalTime = 0.900132, [24] [bootstrap]: 0.00055712 [type_inference]: 0.00657893 [event_method]: 1.441e-05 [auto_monad]: 5.72e-05 [graph_reusing]: 5.19e-06 [inline]: 2.10002e-06 [add_attr]: 0.00383712, [1] [add_attr_with_inline]: 0.00382248, [1] [Cycle 1]: 5.423e-05, [2] [tag_attr]: 1.71e-05 [meta_addattr_fg_expand]: 4.11001e-06 [parallel-infer-symbol]: 4.24997e-06 [pre_auto_parallel]: 3.372e-05 [insert-virtual-dataset]: 2.64001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.92999e-06 [optimize]: 0.0044275, [53] [py_interpret_to_execute]: 2.28e-05 [rewriter_before_opt_a]: 6.269e-05 [opt_a]: 0.00244474, [2] [Cycle 1]: 0.00177453, [45] [expand_dump_flag]: 3.13998e-06 [switch_simplify]: 3.184e-05 [loop_unroll]: 2.188e-05 [a_1]: 0.00048281 [with_stream_mark]: 1.642e-05 [recompute_prepare]: 8.58001e-06 [updatestate_depend_eliminate]: 3.97e-06 [updatestate_assign_eliminate]: 3.63e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 2.18998e-06 [a_2]: 7.732e-05 [accelerated_algorithm]: 6.42001e-06 [shard]: 2.48998e-06 [meta_shard_fg_expand]: 1.94999e-06 [shard_inline]: 6.14999e-06 [merge_send_recv]: 8.28001e-06 [auto_parallel]: 6.39999e-06 [parallel]: 2.68e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 9.21998e-06 [allreduce_slice_to_reducescatter]: 5.70028e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 6.22001e-06 [get_grad_eliminate_]: 5.64998e-06 [virtual_output]: 6.44001e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.164e-05 
[merge_recompute_call_nodes]: 1.87001e-06 [before_grad]: 9.84001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50998e-06 [meta_fg_expand]: 2.44999e-06 [flash_sp_send_recv_attached]: 2.67001e-06 [receive_attached]: 2.92002e-06 [after_resolve]: 1.088e-05 [a_after_grad]: 9.34e-06 [renormalize]: 0.0006121 [add_forward_monad_depend]: 5.53002e-06 [auto_monad_grad]: 2.34999e-06 [auto_monad_eliminator]: 1.51e-05 [cse]: 2.89e-05 [a_3]: 4.304e-05 [Cycle 2]: 0.00065792, [45] [expand_dump_flag]: 1.66e-06 [switch_simplify]: 7.26999e-06 [loop_unroll]: 5.64998e-06 [a_1]: 0.00013162 [with_stream_mark]: 1.161e-05 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 3.09999e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 6.782e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.26002e-06 [meta_shard_fg_expand]: 1.37e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.48001e-06 [auto_parallel]: 5.51e-06 [parallel]: 5.17999e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 2.89999e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.49e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.87002e-06 [cell_reuse_recompute_pass]: 1.60001e-06 [offload_activation]: 6.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.67999e-06 [merge_recompute_call_nodes]: 9.70002e-07 [before_grad]: 8.09002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.84e-06 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.32e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 8.53001e-06 [renormalize]: 1.20024e-07 [add_forward_monad_depend]: 1.39e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 6.81001e-06 [cse]: 1.276e-05 [a_3]: 7.653e-05 [py_interpret_to_execute_after_opt_a]: 9.76998e-06 
[slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.528e-05 [convert_after_rewriter]: 6.80998e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00053007 [opt_b]: 0.00018571, [1] [Cycle 1]: 0.00017853, [7] [b_1]: 0.00011012 [b_2]: 7.28e-06 [updatestate_depend_eliminate]: 5.72001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.2998e-07 [cse]: 1.627e-05 [optimize_parallel_all_gather_comm]: 1.611e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.395e-05 [loop_unroll]: 0.00043562 [opt_after_cconv]: 9.611e-05, [1] [Cycle 1]: 9.005e-05, [7] [c_1]: 2.851e-05 [parameter_eliminate]: 2.54001e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.38002e-06 [cse]: 1.602e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.319e-05 [tuple_transform]: 7.193e-05, [1] [Cycle 1]: 6.779e-05, [4] [d_1]: 4.132e-05 [none_parameter_eliminate]: 1.97001e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.53e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 4.94e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.547e-05, [1] [cse]: 1.042e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 1.34998e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.21997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 
[control_data_broadcast_order]: 1.211e-05 [grouped_pairwise_exchange_alltoall]: 2.46e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.41e-06 [overlap_grad_ring_attention]: 3.86001e-06 [overlap_grad_flash_sp]: 1.891e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.23002e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 7.122e-05, [1] [Cycle 1]: 6.639e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 9.01998e-06 [elim_not_effective]: 1.207e-05 [opt_reshape]: 6.53e-06 [fold_const_symbol]: 8.93002e-06 [renormalize]: 2.20025e-07 [detach_backward]: 2.07999e-06 [pipeline_parallel_scheduler]: 1.75001e-06 [auto_monad_reorder]: 1.579e-05 [get_jit_bprop_graph]: 1.57001e-06 [rewriter_after_jit_bprop_graph]: 3.59002e-06 [opt_after_jit_grad]: 0.00047158 [validate]: 3.464e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.88383 [execute]: 9.40001e-06 Sums bootstrap : 0.000557s : 0.06% type_inference : 0.006579s : 0.73% event_method : 0.000014s : 0.00% auto_monad : 0.000057s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000004s : 0.00% pre_auto_parallel : 0.000034s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000023s : 0.00% optimize.rewriter_before_opt_a : 0.000063s : 0.01% optimize.opt_a.expand_dump_flag : 0.000005s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000028s : 0.00% optimize.opt_a.a_1 : 0.000614s : 0.07% optimize.opt_a.with_stream_mark 
: 0.000028s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000032s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000012s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000612s : 0.07% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000022s : 0.00% optimize.opt_a.cse : 0.000042s : 0.00% optimize.opt_a.a_3 : 0.000120s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000530s : 0.06% optimize.opt_b.b_1 : 0.000110s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000436s : 0.05% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000041s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000019s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000007s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000472s : 0.05% validate : 0.000035s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.883830s : 98.72% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000187 30 15.09% : 0.000028s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.03% : 0.000006s : 4: substitution.graph_param_transform 67.50% : 0.000126s : 3: substitution.inline 1.51% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000005s : 4: substitution.remove_not_recompute_node 2.35% : 0.000004s : 4: substitution.replace_old_param 6.11% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006529 2 91.10% : 0.005948s : 1: type_inference.infer 8.90% : 0.000581s : 1: type_inference.specialize ------[replace.] 0.000042 5 70.72% : 0.000029s : 3: replace.inline 29.28% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000134 5 92.27% : 0.000124s : 3: match.inline 7.73% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000166 1131 0.90% : 0.000002s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.97% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.13% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.20% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.36% : 0.000001s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.61% : 0.000009s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.92% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.27% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000002s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.71% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.35% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000392 8 43.38% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.62% : 0.000222s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.910146 196 0.00% : 0.000004s : 1: ForceFp32Comm 0.42% : 0.003842s : 1: add_attr 0.42% : 0.003827s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000063s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000598s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000007s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000446s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000540s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000014s : 1: opt.transform.mutable_eliminate 0.11% : 0.000990s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000092s : 28: opt.transform.opt_b 0.01% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.27% : 0.002448s : 1: opt_a 0.01% : 0.000099s : 1: opt_after_cconv 0.05% : 0.000482s : 1: opt_after_jit_grad 0.02% : 0.000189s : 1: opt_b 0.49% : 0.004432s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000038s : 1: pre_auto_parallel 0.00% : 0.000027s : 1: py_interpret_to_execute 0.00% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000017s : 1: remove_dup_value 0.04% : 0.000336s : 1: renormalize.infer 0.03% : 0.000268s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000039s : 1: rewriter_after_opt_a 0.01% : 0.000067s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000074s : 1: symbol_engine_optimizer 97.11% : 0.883853s : 1: task_emit 0.01% : 0.000075s : 1: 
tuple_transform 0.72% : 0.006596s : 1: type_inference 0.01% : 0.000067s : 1: validate TotalTime = 0.0791663, [24] [bootstrap]: 0.00042561 [type_inference]: 0.00449774 [event_method]: 1.1e-05 [auto_monad]: 5.186e-05 [graph_reusing]: 5.15999e-06 [inline]: 2.04999e-06 [add_attr]: 0.00309078, [1] [add_attr_with_inline]: 0.003082, [1] [Cycle 1]: 4.565e-05, [2] [tag_attr]: 1.278e-05 [meta_addattr_fg_expand]: 3.01999e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 2.31e-05 [insert-virtual-dataset]: 2.23998e-06 [parallel-infer-symbol-second]: 9.00007e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00394259, [53] [py_interpret_to_execute]: 1.563e-05 [rewriter_before_opt_a]: 3.914e-05 [opt_a]: 0.0019938, [2] [Cycle 1]: 0.00137622, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 2.418e-05 [loop_unroll]: 1.397e-05 [a_1]: 0.00030056 [with_stream_mark]: 1.455e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 3.54002e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.667e-05 [accelerated_algorithm]: 6.46999e-06 [shard]: 2.36998e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 7.21001e-06 [auto_parallel]: 5.79999e-06 [parallel]: 1.798e-05 [flash_sp]: 7.53e-06 [merge_comm]: 3.72998e-06 [allreduce_fusion]: 3.30998e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.48e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 4.19997e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.106e-05 [merge_recompute_call_nodes]: 1.93002e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 3.16001e-06 [receive_attached]: 
3.13998e-06 [after_resolve]: 1.024e-05 [a_after_grad]: 9.09e-06 [renormalize]: 0.0004477 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 1.83002e-06 [auto_monad_eliminator]: 1.327e-05 [cse]: 2.808e-05 [a_3]: 4.246e-05 [Cycle 2]: 0.00060768, [45] [expand_dump_flag]: 1.47999e-06 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012786 [with_stream_mark]: 1.195e-05 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.869e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.56002e-06 [auto_parallel]: 5.31002e-06 [parallel]: 3.95e-06 [flash_sp]: 3.23e-06 [merge_comm]: 3.35003e-06 [allreduce_fusion]: 2.84001e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.39001e-06 [virtual_dataset]: 5.98998e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.71999e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.44001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.067e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.34998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.15999e-06 [after_resolve]: 9.22001e-06 [a_after_grad]: 8.49998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.28002e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.46999e-06 [cse]: 1.33e-05 [a_3]: 3.294e-05 [py_interpret_to_execute_after_opt_a]: 8.38999e-06 [slice_cell_reuse_recomputed_activation]: 1.79998e-06 [rewriter_after_opt_a]: 3.187e-05 [convert_after_rewriter]: 7.4e-06 [order_py_execute_after_rewriter]: 4.95999e-06 [mutable_eliminate]: 0.00048526 [opt_b]: 0.0002481, [1] [Cycle 1]: 
0.00024175, [7] [b_1]: 0.00017063 [b_2]: 7.25998e-06 [updatestate_depend_eliminate]: 5.89e-06 [updatestate_assign_eliminate]: 2.50997e-06 [updatestate_loads_eliminate]: 2.56998e-06 [renormalize]: 4.00003e-07 [cse]: 1.811e-05 [optimize_parallel_all_gather_comm]: 1.691e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.423e-05 [loop_unroll]: 0.00043272 [opt_after_cconv]: 9.726e-05, [1] [Cycle 1]: 9.158e-05, [7] [c_1]: 2.85e-05 [parameter_eliminate]: 2.40002e-06 [updatestate_depend_eliminate]: 5.56e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.7e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 1.256e-05 [tuple_transform]: 6.984e-05, [1] [Cycle 1]: 6.552e-05, [4] [d_1]: 3.996e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 4.614e-05 [cse_after_recomputation]: 2.111e-05, [1] [Cycle 1]: 1.655e-05, [1] [cse]: 1.135e-05 [environ_conv]: 5.38002e-06 [swap_dp_allreduce_reducescatter]: 5.11997e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 3.98001e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.52001e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.36998e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.51998e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.85001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.792e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 2.09999e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 6.944e-05, [1] [Cycle 1]: 6.527e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.65001e-06 [elim_not_effective]: 1.189e-05 [opt_reshape]: 6.37001e-06 [fold_const_symbol]: 8.76002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.603e-05 [get_jit_bprop_graph]: 1.67999e-06 [rewriter_after_jit_bprop_graph]: 3.25002e-06 [opt_after_jit_grad]: 0.000457 [validate]: 3.568e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0663551 [execute]: 9.85002e-06 Sums bootstrap : 0.000426s : 0.57% type_inference : 0.004498s : 5.99% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000428s : 0.57% optimize.opt_a.with_stream_mark : 0.000026s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate 
: 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000448s : 0.60% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000075s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000485s : 0.65% optimize.opt_b.b_1 : 0.000171s : 0.23% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000018s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000433s : 0.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000457s : 0.61% validate : 0.000036s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.066355s : 88.38% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000126 26 17.81% : 0.000022s : 4: substitution.arithmetic_simplify 1.55% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.72% : 0.000006s : 4: substitution.graph_param_transform 66.48% : 0.000084s : 2: substitution.inline 2.00% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 2.95% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004452 2 91.33% : 0.004066s : 1: type_inference.infer 8.67% : 0.000386s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000083 2 100.00% : 0.000083s : 2: match.inline ------[predicate.] 
0.000141 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.56% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.28% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.43% : 0.000001s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.65% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.26% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.37% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.10% : 0.000002s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.49% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.66% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.72% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.16% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.93% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000283 6 40.25% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.75% : 0.000169s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087650 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.53% : 0.003095s : 1: add_attr 3.52% : 0.003086s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.53% : 0.000462s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000442s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.56% : 0.000495s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 0.90% : 0.000788s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000152s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.28% : 0.001997s : 1: opt_a 0.11% : 0.000101s : 1: opt_after_cconv 0.53% : 0.000467s : 1: opt_after_jit_grad 0.29% : 0.000252s : 1: opt_b 4.50% : 0.003947s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.30% : 0.000260s : 1: renormalize.infer 0.21% : 0.000180s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 75.73% : 0.066378s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 5.15% : 0.004516s : 1: type_inference 0.07% : 0.000063s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x8-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x9-pynative],max_mem:56.0M TotalTime = 0.0212554, [24] [bootstrap]: 0.0005958 [type_inference]: 0.00629478 [event_method]: 1.458e-05 [auto_monad]: 5.992e-05 [graph_reusing]: 5.59e-06 [inline]: 1.76003e-06 [add_attr]: 0.00336936, [1] [add_attr_with_inline]: 0.00335845, [1] [Cycle 1]: 4.403e-05, [2] [tag_attr]: 1.558e-05 [meta_addattr_fg_expand]: 3.97998e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.748e-05 [insert-virtual-dataset]: 2.63e-06 [parallel-infer-symbol-second]: 8.49977e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.92999e-06 [optimize]: 0.00401812, [53] [py_interpret_to_execute]: 2.018e-05 [rewriter_before_opt_a]: 5.781e-05 [opt_a]: 0.00213481, [2] [Cycle 1]: 0.00152401, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 3.188e-05 [loop_unroll]: 2.103e-05 [a_1]: 0.00045422 [with_stream_mark]: 1.37e-05 [recompute_prepare]: 7.83999e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.03998e-06 [updatestate_loads_eliminate]: 3.09001e-06 [parameter_eliminate]: 2.22999e-06 [a_2]: 7.593e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 6.04999e-06 [merge_send_recv]: 7.70998e-06 [auto_parallel]: 5.64e-06 [parallel]: 2.162e-05 [flash_sp]: 8.32e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.73001e-06 [allreduce_slice_to_reducescatter]: 1.07e-06 [virtual_shard_identity]: 7.61999e-06 [virtual_dataset]: 5.99e-06 
[get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.97e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.12999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50998e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.57001e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.013e-05 [a_after_grad]: 8.56002e-06 [renormalize]: 0.00042327 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.81003e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.779e-05 [a_3]: 4.053e-05 [Cycle 2]: 0.00060108, [45] [expand_dump_flag]: 8.30012e-07 [switch_simplify]: 7.35998e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.000126 [with_stream_mark]: 9.67001e-06 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.855e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.2e-06 [flash_sp]: 4.99e-06 [merge_comm]: 3.30998e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 5.18002e-06 [merge_forward]: 2.72001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.048e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.45999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.66999e-06 [cse]: 1.306e-05 [a_3]: 3.24e-05 [py_interpret_to_execute_after_opt_a]: 7.82998e-06 [slice_cell_reuse_recomputed_activation]: 2.14999e-06 [rewriter_after_opt_a]: 3.088e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 5.56002e-06 [mutable_eliminate]: 0.00045025 [opt_b]: 0.00018541, [1] [Cycle 1]: 0.00017931, [7] [b_1]: 0.00010913 [b_2]: 7.29001e-06 [updatestate_depend_eliminate]: 5.71003e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 3.60014e-07 [cse]: 1.7e-05 [optimize_parallel_all_gather_comm]: 1.545e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.17e-05 [loop_unroll]: 0.00045027 [opt_after_cconv]: 9.695e-05, [1] [Cycle 1]: 9.128e-05, [7] [c_1]: 2.846e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.69e-05 [renormalize]: 4.90021e-07 [remove_dup_value]: 1.381e-05 [tuple_transform]: 6.935e-05, [1] [Cycle 1]: 6.474e-05, [4] [d_1]: 3.895e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.39001e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.733e-05 [cse_after_recomputation]: 2.065e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.108e-05 [environ_conv]: 4.79e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.80002e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.73998e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 
9.60019e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 5.00999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.17998e-06 [overlap_grad_flash_sp]: 1.695e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 6.924e-05, [1] [Cycle 1]: 6.505e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 9.19998e-06 [elim_not_effective]: 1.136e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.70001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.603e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00045238 [validate]: 3.171e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.00613849 [execute]: 6.74001e-06 Sums bootstrap : 0.000596s : 3.52% type_inference : 0.006295s : 37.22% event_method : 0.000015s : 0.09% auto_monad : 0.000060s : 0.35% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000580s : 3.43% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000423s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.66% optimize.opt_b.b_1 : 0.000109s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000450s : 2.66% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 2.68% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006138s : 36.30% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 15.15% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 66.67% : 0.000110s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.96% : 0.000005s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.29% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006252 2 90.86% : 0.005680s : 1: type_inference.infer 9.14% : 0.000571s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.94% : 0.000027s : 3: replace.inline 29.06% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 92.05% : 0.000108s : 3: match.inline 7.95% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_depend_swap 2.01% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.34% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.28% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.80% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.80% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000393 8 51.96% : 0.000204s : 3: func_graph_cloner_run.FuncGraphClonerGraph 48.04% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030169 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.18% : 0.003374s : 1: add_attr 11.14% : 0.003362s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.11% : 0.000635s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.52% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000006s : 1: micro_interleaved_order_control 1.52% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000948s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.002138s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.53% : 0.000462s : 1: opt_after_jit_grad 0.63% : 0.000189s : 1: opt_b 13.33% : 0.004022s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000211s : 1: renormalize.infer 0.68% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 20.38% : 0.006149s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.91% : 0.006308s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0182198, [24] [bootstrap]: 0.00047322 [type_inference]: 0.00444872 [event_method]: 1.036e-05 [auto_monad]: 4.893e-05 [graph_reusing]: 5.81998e-06 [inline]: 1.84998e-06 [add_attr]: 0.00296474, [1] [add_attr_with_inline]: 0.00295715, [1] [Cycle 1]: 4.625e-05, [2] [tag_attr]: 1.219e-05 [meta_addattr_fg_expand]: 3.07002e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.232e-05 [insert-virtual-dataset]: 2.68998e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00370159, [53] [py_interpret_to_execute]: 1.516e-05 [rewriter_before_opt_a]: 3.938e-05 [opt_a]: 0.0018468, [2] [Cycle 1]: 0.00124808, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 2.437e-05 [loop_unroll]: 1.388e-05 [a_1]: 0.00029221 [with_stream_mark]: 1.403e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 4.11001e-06 [updatestate_assign_eliminate]: 3.01999e-06 [updatestate_loads_eliminate]: 3.27997e-06 [parameter_eliminate]: 1.63002e-06 [a_2]: 7.734e-05 [accelerated_algorithm]: 6.43998e-06 [shard]: 2.37999e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.88002e-06 [merge_send_recv]: 7.31001e-06 [auto_parallel]: 5.82999e-06 [parallel]: 1.653e-05 [flash_sp]: 7.18998e-06 [merge_comm]: 3.45998e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 8.66002e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.80002e-06 [virtual_output]: 5.79999e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.32999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.101e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.32999e-06 
[receive_attached]: 2.32001e-06 [after_resolve]: 1.054e-05 [a_after_grad]: 8.42e-06 [renormalize]: 0.00033989 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.281e-05 [cse]: 2.692e-05 [a_3]: 3.982e-05 [Cycle 2]: 0.00058893, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.59998e-06 [a_1]: 0.00012065 [with_stream_mark]: 1.101e-05 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.82e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.53002e-06 [merge_send_recv]: 4.18001e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.25999e-06 [flash_sp]: 2.84001e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 4.90001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.68003e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.69999e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.13999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00998e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.90025e-07 [after_resolve]: 9.92001e-06 [a_after_grad]: 8.55999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.01e-06 [cse]: 1.182e-05 [a_3]: 3.22e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 3.008e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.11002e-06 [mutable_eliminate]: 0.00049823 [opt_b]: 0.00018062, [1] 
[Cycle 1]: 0.00017484, [7] [b_1]: 0.00010838 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 4.89992e-07 [cse]: 1.577e-05 [optimize_parallel_all_gather_comm]: 1.62e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.2e-05 [loop_unroll]: 0.00041753 [opt_after_cconv]: 9.41e-05, [1] [Cycle 1]: 8.851e-05, [7] [c_1]: 2.746e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.582e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.9e-05, [1] [Cycle 1]: 6.473e-05, [4] [d_1]: 3.914e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.425e-05 [cse_after_recomputation]: 1.962e-05, [1] [Cycle 1]: 1.527e-05, [1] [cse]: 1.017e-05 [environ_conv]: 4.22e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.09003e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.58998e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.30001e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61998e-06 [control_data_broadcast_order]: 1.186e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.63999e-06 [overlap_recompute_and_grad_model_parallel]: 4.28999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 3.81001e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.751e-05, [1] [Cycle 1]: 6.355e-05, [6] [build]: 2.14999e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.107e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.58e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.45003e-06 [opt_after_jit_grad]: 0.00044683 [validate]: 3.067e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00582878 [execute]: 7.04001e-06 Sums bootstrap : 0.000473s : 3.31% type_inference : 0.004449s : 31.11% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000413s : 2.89% optimize.opt_a.with_stream_mark : 0.000025s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000011s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000498s : 3.48% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000418s : 2.92% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.12% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005829s : 40.76% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.44% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.63% : 0.000006s : 4: substitution.graph_param_transform 65.41% : 0.000078s : 2: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.69% : 0.000004s : 4: substitution.remove_not_recompute_node 3.06% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004410 2 92.08% : 0.004061s : 1: type_inference.infer 7.92% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.32% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.57% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.64% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.97% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.67% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000244 6 42.72% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.28% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026152 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.35% : 0.002969s : 1: add_attr 11.32% : 0.002961s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.95% : 0.000511s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.94% : 0.000508s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.07% : 0.001850s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000456s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.17% : 0.003705s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000189s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.33% : 0.005839s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 17.06% : 0.004462s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x9-kbk],max_mem:56.0M TotalTime = 0.836348, [24] [bootstrap]: 0.00053878 [type_inference]: 0.00600051 [event_method]: 1.374e-05 [auto_monad]: 5.432e-05 [graph_reusing]: 5.24e-06 [inline]: 1.67999e-06 [add_attr]: 0.00337391, [1] [add_attr_with_inline]: 0.0033631, [1] [Cycle 1]: 4.444e-05, [2] [tag_attr]: 1.444e-05 [meta_addattr_fg_expand]: 4.39002e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.847e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00398979, [53] [py_interpret_to_execute]: 1.967e-05 [rewriter_before_opt_a]: 5.825e-05 [opt_a]: 0.00214755, [2] [Cycle 1]: 0.00149569, [45] [expand_dump_flag]: 2.88998e-06 [switch_simplify]: 3.17e-05 [loop_unroll]: 2.122e-05 [a_1]: 0.00044992 [with_stream_mark]: 1.357e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.50998e-06 [updatestate_assign_eliminate]: 3.05002e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.642e-05 [accelerated_algorithm]: 6.42001e-06 [shard]: 2.51998e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.04002e-06 [auto_parallel]: 5.95002e-06 [parallel]: 2.329e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.61999e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.23e-06 [virtual_dataset]: 5.98998e-06 [get_grad_eliminate_]: 5.46998e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.061e-05 
[merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 8.75999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 2.34999e-06 [flash_sp_send_recv_attached]: 2.17999e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.052e-05 [a_after_grad]: 8.74e-06 [renormalize]: 0.00040394 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.60001e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.571e-05 [a_3]: 3.996e-05 [Cycle 2]: 0.00064239, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7e-06 [loop_unroll]: 5.69999e-06 [a_1]: 0.00016968 [with_stream_mark]: 1.031e-05 [recompute_prepare]: 6.19001e-06 [updatestate_depend_eliminate]: 2.88998e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.878e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.49002e-06 [auto_parallel]: 5.25001e-06 [parallel]: 4.32e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.22e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.49998e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.34e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.61999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94001e-06 [meta_fg_expand]: 1.61002e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 7.81001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.72002e-06 [cse]: 1.399e-05 [a_3]: 3.288e-05 [py_interpret_to_execute_after_opt_a]: 7.31999e-06 
[slice_cell_reuse_recomputed_activation]: 2.17001e-06 [rewriter_after_opt_a]: 3.152e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.38002e-06 [mutable_eliminate]: 0.0004536 [opt_b]: 0.00018286, [1] [Cycle 1]: 0.00017625, [7] [b_1]: 0.00010827 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 7.10017e-07 [cse]: 1.527e-05 [optimize_parallel_all_gather_comm]: 1.532e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00041751 [opt_after_cconv]: 9.44e-05, [1] [Cycle 1]: 8.869e-05, [7] [c_1]: 2.813e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.602e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.265e-05 [tuple_transform]: 6.866e-05, [1] [Cycle 1]: 6.454e-05, [4] [d_1]: 3.931e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.56998e-06 [add_recomputation]: 4.8e-05 [cse_after_recomputation]: 1.932e-05, [1] [Cycle 1]: 1.532e-05, [1] [cse]: 1.04e-05 [environ_conv]: 4.51002e-06 [swap_dp_allreduce_reducescatter]: 5.14998e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 3.98001e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.61999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.16e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.59002e-06 [overlap_recompute_and_grad_model_parallel]: 4.32998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.27001e-06 [overlap_grad_ring_attention]: 3.95998e-06 [overlap_grad_flash_sp]: 1.64e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.746e-05, [1] [Cycle 1]: 6.319e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 7.89002e-06 [elim_not_effective]: 1.189e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.76997e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.41998e-06 [auto_monad_reorder]: 1.503e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00045287 [validate]: 3.185e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.821605 [execute]: 9.14998e-06 Sums bootstrap : 0.000539s : 0.06% type_inference : 0.006001s : 0.72% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 
: 0.000620s : 0.07% optimize.opt_a.with_stream_mark : 0.000024s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000016s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000404s : 0.05% 
optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000040s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000454s : 0.05% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000015s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000418s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.01% optimize.cse_after_recomputation.cse 
: 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.05% validate : 0.000032s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.821605s : 98.75% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000164 30 15.12% : 0.000025s : 5: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.34% : 0.000109s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.67% : 0.000004s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 6.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005956 2 90.92% : 0.005415s : 1: type_inference.infer 9.08% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.83% : 0.000027s : 3: replace.inline 29.17% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.54% : 0.000106s : 3: match.inline 8.46% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000200 1131 0.71% : 0.000001s : 11: predicate.accumulaten_eliminater 0.75% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000001s : 8: predicate.addn_check_dump 0.65% : 0.000001s : 11: predicate.addn_zero_filter 0.62% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.80% : 0.000004s : 19: predicate.arithmetic_simplify 0.64% : 0.000001s : 11: predicate.cast_eliminate 0.54% : 0.000001s : 8: predicate.check_bprop_eliminate 0.49% : 0.000001s : 8: predicate.compare_switch_simplify 0.19% : 0.000000s : 4: predicate.const_output_eliminate 0.49% : 0.000001s : 8: predicate.depend_value_elim 0.70% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.73% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.68% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.89% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.30% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.94% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.86% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.87% : 0.000002s : 15: predicate.environ_get_depend_swap 1.47% : 0.000003s : 23: predicate.environ_get_eliminate 0.90% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.00% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.72% : 0.000003s : 16: predicate.float_depend_g_call 0.45% : 0.000001s : 8: predicate.float_environ_get_switch 0.69% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.18% : 0.000000s : 4: predicate.fold_const_symbol 0.57% : 0.000001s : 8: predicate.get_grad_eliminate 0.21% : 0.000000s : 4: predicate.graph_param_transform 0.56% : 0.000001s : 8: predicate.incorporate_call 0.46% : 0.000001s : 8: predicate.incorporate_call_switch 4.74% : 0.000009s : 51: predicate.inline 0.70% : 0.000001s : 8: predicate.inline_without_move 0.31% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.69% : 0.000001s : 8: predicate.less_batch_normalization 1.39% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 1.91% : 0.000004s : 32: predicate.load_eliminater 0.78% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.36% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.49% : 0.000001s : 8: predicate.merge_addn 0.50% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.50% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.63% : 0.000001s : 11: predicate.minmaximum_grad 0.91% : 0.000002s : 4: predicate.mutable_eliminate 0.28% : 0.000001s : 4: predicate.opt_reshape 0.30% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 16: predicate.partial_defer_inline 1.13% : 0.000002s : 17: predicate.partial_eliminate 0.66% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 22.12% : 0.000044s : 11: predicate.reduce_eliminate 1.91% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.39% : 0.000001s : 8: predicate.remove_not_recompute_node 1.10% : 0.000002s : 21: predicate.replace_applicator 0.46% : 0.000001s : 8: predicate.replace_old_param 0.26% : 0.000001s : 4: predicate.reset_defer_inline 0.63% : 0.000001s : 11: predicate.reshape_eliminate 0.53% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.30% : 0.000001s : 4: predicate.row_tensor_eliminate 0.67% : 0.000001s : 8: predicate.same_eliminate 0.42% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.75% : 0.000001s : 8: predicate.shard_identity_eliminate 0.61% : 0.000001s : 8: predicate.special_op_eliminate 0.63% : 0.000001s : 8: predicate.specialize_transform 0.77% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.64% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.31% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000002s : 16: predicate.switch_defer_inline 1.57% : 0.000003s : 24: predicate.switch_layer_defer_inline 3.93% : 0.000008s : 54: predicate.switch_simplify 0.67% 
: 0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000002s : 11: predicate.transpose_eliminate 1.19% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.29% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.08% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.57% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.14% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.73% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.28% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.80% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.45% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.32% : 0.000001s : 4: predicate.value_based_eliminate 0.59% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.61% : 0.000001s : 8: predicate.virtual_output_eliminate 0.28% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.36% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 47.81% : 0.000164s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.19% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.845252 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.40% : 0.003378s : 1: add_attr 0.40% : 0.003367s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000576s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000462s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.000987s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.25% : 0.002151s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.05% : 0.000462s : 1: opt_after_jit_grad 0.02% : 0.000186s : 1: opt_b 0.47% : 0.003993s : 1: optimize 0.00% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000023s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000209s : 1: renormalize.infer 0.02% : 0.000189s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000070s : 1: symbol_engine_optimizer 97.20% : 0.821626s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.71% : 0.006013s : 1: type_inference 0.01% : 0.000057s : 1: validate TotalTime = 0.0730236, [24] [bootstrap]: 0.00042069 [type_inference]: 0.00445625 [event_method]: 1.137e-05 [auto_monad]: 5.191e-05 [graph_reusing]: 4.63999e-06 [inline]: 1.65001e-06 [add_attr]: 0.00303032, [1] [add_attr_with_inline]: 0.00302178, [1] [Cycle 1]: 4.185e-05, [2] [tag_attr]: 1.203e-05 [meta_addattr_fg_expand]: 2.99999e-06 [parallel-infer-symbol]: 2.61999e-06 [pre_auto_parallel]: 2.261e-05 [insert-virtual-dataset]: 2.20002e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00384679, [53] [py_interpret_to_execute]: 1.502e-05 [rewriter_before_opt_a]: 4.099e-05 [opt_a]: 0.00194695, [2] [Cycle 1]: 0.00133703, [45] [expand_dump_flag]: 3.13998e-06 [switch_simplify]: 2.432e-05 [loop_unroll]: 1.375e-05 [a_1]: 0.00029672 [with_stream_mark]: 1.391e-05 [recompute_prepare]: 7.46999e-06 [updatestate_depend_eliminate]: 3.82998e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 1.96998e-06 [a_2]: 7.736e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 6.09001e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 5.97999e-06 [parallel]: 1.735e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 9.02999e-06 [allreduce_slice_to_reducescatter]: 9.40025e-07 [virtual_shard_identity]: 7.45998e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.62999e-06 [virtual_output]: 6.48e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 9.33997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.093e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.13002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 2.39001e-06 
[receive_attached]: 2.53e-06 [after_resolve]: 1.028e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00041252 [add_forward_monad_depend]: 4.58999e-06 [auto_monad_grad]: 1.88997e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.836e-05 [a_3]: 4.165e-05 [Cycle 2]: 0.00060025, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 7.11001e-06 [loop_unroll]: 5.62999e-06 [a_1]: 0.00012781 [with_stream_mark]: 1.11e-05 [recompute_prepare]: 6.10002e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 7.50006e-07 [a_2]: 6.84e-05 [accelerated_algorithm]: 5.72999e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.46998e-06 [merge_send_recv]: 4.27003e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.51001e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 6.26e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.99999e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.48001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 7.92e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.34999e-06 [cse]: 1.257e-05 [a_3]: 3.226e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 [slice_cell_reuse_recomputed_activation]: 1.98002e-06 [rewriter_after_opt_a]: 3.144e-05 [convert_after_rewriter]: 6.68998e-06 [order_py_execute_after_rewriter]: 5.35001e-06 [mutable_eliminate]: 0.00046493 [opt_b]: 0.000186, 
[1] [Cycle 1]: 0.00017988, [7] [b_1]: 0.0001104 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 3.69997e-07 [cse]: 1.684e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.389e-05 [loop_unroll]: 0.00042308 [opt_after_cconv]: 9.619e-05, [1] [Cycle 1]: 9.038e-05, [7] [c_1]: 2.87e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.47001e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.646e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.249e-05 [tuple_transform]: 6.939e-05, [1] [Cycle 1]: 6.509e-05, [4] [d_1]: 3.926e-05 [none_parameter_eliminate]: 1.46998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.47001e-06 [partial_unused_args_eliminate]: 2.07999e-06 [add_recomputation]: 9.376e-05 [cse_after_recomputation]: 2.075e-05, [1] [Cycle 1]: 1.653e-05, [1] [cse]: 1.137e-05 [environ_conv]: 4.64002e-06 [swap_dp_allreduce_reducescatter]: 4.87e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.57e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.81999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.28998e-06 [reorder_send_recv_between_fp_bp]: 2.53998e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.06002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.89002e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 7.023e-05, [1] [Cycle 1]: 6.584e-05, [6] [build]: 2.53e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 6.48e-06 [fold_const_symbol]: 9.24e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.89e-06 [pipeline_parallel_scheduler]: 1.41002e-06 [auto_monad_reorder]: 1.64e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.52997e-06 [opt_after_jit_grad]: 0.00045656 [validate]: 3.287e-05 [backend_pass]: 1.14998e-06 [task_emit]: 0.0604308 [execute]: 8.37e-06 Sums bootstrap : 0.000421s : 0.61% type_inference : 0.004456s : 6.46% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000425s : 0.62% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 
0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000413s : 0.60% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000465s : 0.67% optimize.opt_b.b_1 : 0.000110s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000423s : 0.61% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000094s : 0.14% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000457s : 0.66% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060431s : 87.57% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000125 26 17.73% : 0.000022s : 4: substitution.arithmetic_simplify 1.54% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.24% : 0.000005s : 4: substitution.graph_param_transform 66.18% : 0.000083s : 2: substitution.inline 2.47% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.69% : 0.000005s : 4: substitution.remove_not_recompute_node 3.08% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004413 2 91.79% : 0.004051s : 1: type_inference.infer 8.21% : 0.000362s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000081 2 100.00% : 0.000081s : 2: match.inline ------[predicate.] 
0.000141 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.76% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.35% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.79% : 0.000003s : 21: predicate.environ_get_eliminate 1.15% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.71% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000009s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.08% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.07% : 0.000002s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.59% : 0.000001s : 4: predicate.opt_reshape 0.66% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 1.05% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.80% : 0.000003s : 17: predicate.tuple_to_list_eliminator_ 1.99% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.60% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000282 6 43.97% : 0.000124s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.03% : 0.000158s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081251 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.73% : 0.003035s : 1: add_attr 3.72% : 0.003025s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.12% : 0.000098s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.56% : 0.000456s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.58% : 0.000474s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.96% : 0.000781s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000093s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.40% : 0.001950s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.57% : 0.000467s : 1: opt_after_jit_grad 0.23% : 0.000190s : 1: opt_b 4.74% : 0.003851s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.29% : 0.000232s : 1: renormalize.infer 0.21% : 0.000174s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 74.40% : 0.060453s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.50% : 0.004471s : 1: type_inference 0.07% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y6-dtype_x9-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x0-pynative],max_mem:56.0M TotalTime = 0.0212236, [24] [bootstrap]: 0.00049922 [type_inference]: 0.0061225 [event_method]: 1.494e-05 [auto_monad]: 5.38e-05 [graph_reusing]: 5.52001e-06 [inline]: 1.81e-06 [add_attr]: 0.00341971, [1] [add_attr_with_inline]: 0.00340907, [1] [Cycle 1]: 4.476e-05, [2] [tag_attr]: 1.562e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 2.797e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.76998e-06 [optimize]: 0.00405069, [53] [py_interpret_to_execute]: 2.148e-05 [rewriter_before_opt_a]: 5.913e-05 [opt_a]: 0.00219906, [2] [Cycle 1]: 0.00158674, [45] [expand_dump_flag]: 2.57001e-06 [switch_simplify]: 3.198e-05 [loop_unroll]: 2.053e-05 [a_1]: 0.00045469 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.02002e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 9.021e-05 [accelerated_algorithm]: 6.68998e-06 [shard]: 2.69999e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 6.09999e-06 [merge_send_recv]: 8.13001e-06 [auto_parallel]: 5.82999e-06 [parallel]: 2.229e-05 [flash_sp]: 6.91001e-06 [merge_comm]: 4.01001e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 9.00999e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.41999e-06 
[virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 8.66002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.70001e-06 [before_grad]: 9.63002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.76e-06 [after_resolve]: 1.02e-05 [a_after_grad]: 8.62e-06 [renormalize]: 0.00047151 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 1.87999e-06 [auto_monad_eliminator]: 1.404e-05 [cse]: 2.722e-05 [a_3]: 4.097e-05 [Cycle 2]: 0.00060243, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 7.28e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012554 [with_stream_mark]: 9.82001e-06 [recompute_prepare]: 5.76998e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.786e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.27999e-06 [shard_inline]: 7.38e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.51002e-06 [flash_sp]: 3.07002e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 4.88001e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.01002e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.58002e-06 [offload_activation]: 6.29001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.036e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.23002e-06 [a_after_grad]: 
8.20999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 7.38e-06 [cse]: 1.309e-05 [a_3]: 3.199e-05 [py_interpret_to_execute_after_opt_a]: 7.44002e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 2.951e-05 [convert_after_rewriter]: 7.22002e-06 [order_py_execute_after_rewriter]: 5.30001e-06 [mutable_eliminate]: 0.00045237 [opt_b]: 0.00018272, [1] [Cycle 1]: 0.0001768, [7] [b_1]: 0.00010867 [b_2]: 7.41001e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 7.39994e-07 [cse]: 1.698e-05 [optimize_parallel_all_gather_comm]: 1.619e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.283e-05 [loop_unroll]: 0.00041648 [opt_after_cconv]: 9.481e-05, [1] [Cycle 1]: 8.908e-05, [7] [c_1]: 2.758e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.651e-05 [renormalize]: 2.09984e-07 [remove_dup_value]: 1.184e-05 [tuple_transform]: 6.838e-05, [1] [Cycle 1]: 6.422e-05, [4] [d_1]: 3.86e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.98002e-06 [add_recomputation]: 4.886e-05 [cse_after_recomputation]: 2.131e-05, [1] [Cycle 1]: 1.689e-05, [1] [cse]: 1.167e-05 [environ_conv]: 4.87e-06 [swap_dp_allreduce_reducescatter]: 5.72999e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4.95999e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.74001e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 
[comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.30001e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.47002e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.48002e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.38001e-06 [overlap_grad_flash_sp]: 1.691e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.74e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.853e-05, [1] [Cycle 1]: 6.437e-05, [6] [build]: 2.65002e-06 [elim_shapecalc]: 8.13999e-06 [elim_not_effective]: 1.168e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.66997e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.539e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 0.00010159 [opt_after_jit_grad]: 0.00045911 [validate]: 3.061e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.00617647 [execute]: 6.74999e-06 Sums bootstrap : 0.000499s : 2.97% type_inference : 0.006123s : 36.42% event_method : 0.000015s : 0.09% auto_monad : 0.000054s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% 
optimize.rewriter_before_opt_a : 0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000580s : 3.45% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000158s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% 
optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000472s : 2.81% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.13% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.69% optimize.opt_b.b_1 : 0.000109s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000416s : 2.48% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% 
optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% 
optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000102s : 0.60% opt_after_jit_grad : 0.000459s : 2.73% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006176s : 36.74% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000170 30 14.59% : 0.000025s : 5: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 2.98% : 0.000005s : 4: substitution.graph_param_transform 67.02% : 0.000114s : 3: substitution.inline 1.98% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.84% : 0.000005s : 4: substitution.remove_not_recompute_node 2.50% : 0.000004s : 4: substitution.replace_old_param 6.01% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006079 2 90.50% : 0.005502s : 1: type_inference.infer 9.50% : 0.000578s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.79% : 0.000027s : 3: replace.inline 29.21% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 92.38% : 0.000112s : 3: match.inline 7.62% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.89% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.95% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.32% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.57% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.40% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.60% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.55% : 0.000001s : 8: predicate.replace_old_param 0.28% : 0.000000s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 1.05% : 0.000002s : 11: predicate.transpose_eliminate 1.61% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 46.93% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.07% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030283 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.31% : 0.003424s : 1: add_attr 11.27% : 0.003413s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000059s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000536s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.40% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.18% : 0.000963s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.27% : 0.002202s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.55% : 0.000469s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.39% : 0.004055s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.81% : 0.000244s : 1: renormalize.infer 0.73% : 0.000221s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.35% : 0.000107s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000006s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.43% : 0.006186s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.27% : 0.006137s : 1: type_inference 0.26% : 0.000080s : 1: validate TotalTime = 0.0181154, [24] [bootstrap]: 0.00045584 [type_inference]: 0.00431656 [event_method]: 1.037e-05 [auto_monad]: 5.091e-05 [graph_reusing]: 5.25001e-06 [inline]: 2.29001e-06 [add_attr]: 0.0029474, [1] [add_attr_with_inline]: 0.00293936, [1] [Cycle 1]: 4.646e-05, [2] [tag_attr]: 1.227e-05 [meta_addattr_fg_expand]: 3.20998e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.193e-05 [insert-virtual-dataset]: 2.25002e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.51998e-06 [optimize]: 0.00373551, [53] [py_interpret_to_execute]: 1.478e-05 [rewriter_before_opt_a]: 3.789e-05 [opt_a]: 0.00193498, [2] [Cycle 1]: 0.00132767, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.429e-05 [loop_unroll]: 1.406e-05 [a_1]: 0.000339 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 8.27998e-06 [updatestate_depend_eliminate]: 4e-06 [updatestate_assign_eliminate]: 2.99999e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 7.768e-05 [accelerated_algorithm]: 6.18998e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 6.44001e-06 [merge_send_recv]: 7.49002e-06 [auto_parallel]: 5.94e-06 [parallel]: 1.658e-05 [flash_sp]: 7.42002e-06 [merge_comm]: 3.98001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.95999e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 6.89999e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.64e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40998e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.54999e-06 
[after_resolve]: 1.083e-05 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00036667 [add_forward_monad_depend]: 4.18999e-06 [auto_monad_grad]: 2.02999e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 2.614e-05 [a_3]: 4.026e-05 [Cycle 2]: 0.00059805, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.56001e-06 [loop_unroll]: 5.87001e-06 [a_1]: 0.00012653 [with_stream_mark]: 1.252e-05 [recompute_prepare]: 5.76998e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.16e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.899e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.59e-06 [parallel]: 4.18999e-06 [flash_sp]: 3.45e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.14998e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.26997e-06 [offload_activation]: 6.40997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.018e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.92999e-06 [a_after_grad]: 8.21002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.5999e-07 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.228e-05 [a_3]: 3.164e-05 [py_interpret_to_execute_after_opt_a]: 7.59002e-06 [slice_cell_reuse_recomputed_activation]: 1.75001e-06 [rewriter_after_opt_a]: 3.094e-05 [convert_after_rewriter]: 6.85002e-06 [order_py_execute_after_rewriter]: 5.31998e-06 [mutable_eliminate]: 0.00044503 [opt_b]: 0.00018174, [1] [Cycle 1]: 0.00017575, [7] [b_1]: 
0.00010835 [b_2]: 6.94001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 3.4002e-07 [cse]: 1.637e-05 [optimize_parallel_all_gather_comm]: 1.583e-05 [overlap_param_gather]: 2.27999e-06 [cconv]: 2.256e-05 [loop_unroll]: 0.00041359 [opt_after_cconv]: 9.466e-05, [1] [Cycle 1]: 8.893e-05, [7] [c_1]: 2.773e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.573e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.154e-05 [tuple_transform]: 6.864e-05, [1] [Cycle 1]: 6.444e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.478e-05 [cse_after_recomputation]: 2.072e-05, [1] [Cycle 1]: 1.619e-05, [1] [cse]: 1.113e-05 [environ_conv]: 4.57e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 3.95998e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.08998e-06 [reorder_send_recv_between_fp_bp]: 2.63e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.30999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.191e-05 [grouped_pairwise_exchange_alltoall]: 1.43002e-06 [offloading_packed_experts]: 3.65998e-06 [overlap_recompute_and_grad_model_parallel]: 4.72e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.46002e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.66e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.39999e-06 [split_layernorm_comm]: 1.71998e-06 [handle_group_info]: 9.69972e-07 [symbol_engine_optimizer]: 6.788e-05, [1] [Cycle 1]: 6.385e-05, [6] [build]: 2.22999e-06 [elim_shapecalc]: 8.44002e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.51e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00043822 [validate]: 3.054e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00586607 [execute]: 7.00998e-06 Sums bootstrap : 0.000456s : 3.21% type_inference : 0.004317s : 30.38% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000466s : 3.28% optimize.opt_a.with_stream_mark : 0.000026s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000367s : 2.58% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000445s : 3.13% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000414s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.32% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000438s : 3.08% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005866s : 41.28% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.51% : 0.000022s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 65.29% : 0.000079s : 2: substitution.inline 2.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.68% : 0.000004s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004276 2 91.93% : 0.003931s : 1: type_inference.infer 8.07% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000140 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.49% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.73% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.77% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.40% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.91% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.90% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.07% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 1.14% : 0.000002s : 8: predicate.special_op_eliminate 1.25% : 0.000002s : 8: predicate.specialize_transform 1.23% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.17% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.73% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.55% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 41.85% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.15% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026145 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.29% : 0.002952s : 1: add_attr 11.26% : 0.002943s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000491s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000822s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.41% : 0.001938s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000448s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.30% : 0.003739s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.78% : 0.000204s : 1: renormalize.infer 0.60% : 0.000156s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.48% : 0.005876s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.56% : 0.004331s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0259913, [24] [bootstrap]: 0.00042682 [type_inference]: 0.00547071 [event_method]: 1.332e-05 [auto_monad]: 5.307e-05 [graph_reusing]: 6.01998e-06 [inline]: 1.71e-06 [add_attr]: 0.0093315, [1] [add_attr_with_inline]: 0.00932339, [1] [Cycle 1]: 4.796e-05, [2] [tag_attr]: 1.579e-05 [meta_addattr_fg_expand]: 4.55999e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 2.624e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00406917, [53] [py_interpret_to_execute]: 1.959e-05 [rewriter_before_opt_a]: 5.865e-05 [opt_a]: 0.00217132, [2] [Cycle 1]: 0.00156269, [45] [expand_dump_flag]: 2.63003e-06 [switch_simplify]: 3.198e-05 [loop_unroll]: 2.113e-05 [a_1]: 0.00044922 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 2.29001e-06 [a_2]: 7.537e-05 [accelerated_algorithm]: 6.18998e-06 [shard]: 2.56e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 7.48999e-06 [auto_parallel]: 6.14999e-06 [parallel]: 1.764e-05 [flash_sp]: 7.08998e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 3.14001e-06 [matmul_add_comm_reduction]: 9.20001e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 6.94001e-06 [virtual_dataset]: 6.03002e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.91998e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 1.11997e-06 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 1.82999e-06 [before_grad]: 9.81998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 
2.44001e-06 [after_resolve]: 9.98998e-06 [a_after_grad]: 8.67998e-06 [renormalize]: 0.00047607 [add_forward_monad_depend]: 4.75999e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.318e-05 [cse]: 2.819e-05 [a_3]: 4.086e-05 [Cycle 2]: 0.00059896, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 7.15e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.00012679 [with_stream_mark]: 9.62001e-06 [recompute_prepare]: 5.70001e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.781e-05 [accelerated_algorithm]: 5.84e-06 [shard]: 1.19998e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.05001e-06 [parallel]: 4.3e-06 [flash_sp]: 3.4e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.60002e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 5.86e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.26002e-06 [virtual_output]: 5.66003e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84999e-06 [merge_recompute_call_nodes]: 1.20001e-06 [before_grad]: 7.88001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.86999e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.08999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.21e-06 [cse]: 1.309e-05 [a_3]: 3.378e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 7.25e-06 [order_py_execute_after_rewriter]: 5.65001e-06 [mutable_eliminate]: 0.00049955 [opt_b]: 0.00018394, [1] [Cycle 
1]: 0.00017779, [7] [b_1]: 0.00010964 [b_2]: 6.91001e-06 [updatestate_depend_eliminate]: 5.14003e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.59985e-07 [cse]: 1.673e-05 [optimize_parallel_all_gather_comm]: 1.555e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.228e-05 [loop_unroll]: 0.00042028 [opt_after_cconv]: 9.54e-05, [1] [Cycle 1]: 8.99e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.38998e-06 [cse]: 1.673e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.231e-05 [tuple_transform]: 6.97e-05, [1] [Cycle 1]: 6.546e-05, [4] [d_1]: 3.925e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.371e-05 [cse_after_recomputation]: 2.032e-05, [1] [Cycle 1]: 1.586e-05, [1] [cse]: 1.067e-05 [environ_conv]: 4.78001e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.60002e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 3.08998e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.24998e-06 [ForceFp32Comm]: 9.20001e-07 [remove_cast_before_assign_add]: 7.30011e-07 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.185e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 4e-06 [overlap_recompute_and_grad_model_parallel]: 4.25e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 1.79e-06 [overlap_grad_ring_attention]: 3.72998e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 7.02e-05, [1] [Cycle 1]: 6.587e-05, [6] [build]: 2.36998e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 2.60014e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.493e-05 [get_jit_bprop_graph]: 1.19998e-06 [rewriter_after_jit_bprop_graph]: 3.76999e-06 [opt_after_jit_grad]: 0.00045079 [validate]: 3.285e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.0058692 [execute]: 7.08e-06 Sums bootstrap : 0.000427s : 2.72% type_inference : 0.005471s : 34.87% event_method : 0.000013s : 0.08% auto_monad : 0.000053s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000576s : 3.67% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000476s : 3.03% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000041s : 0.26% optimize.opt_a.a_3 : 0.000075s : 0.48% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000500s : 3.18% optimize.opt_b.b_1 : 0.000110s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000420s : 2.68% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000451s : 2.87% validate : 0.000033s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005869s : 37.41% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000169 30 14.74% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.07% : 0.000005s : 4: substitution.graph_param_transform 66.48% : 0.000112s : 3: substitution.inline 2.13% : 0.000004s : 4: substitution.j_node_and_user_rematch 2.77% : 0.000005s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.53% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005429 2 90.17% : 0.004896s : 1: type_inference.infer 9.83% : 0.000534s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.06% : 0.000026s : 3: replace.inline 30.94% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.72% : 0.000110s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.74% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.19% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.36% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.10% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.01% : 0.000002s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.30% : 0.000002s : 21: predicate.replace_applicator 0.55% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.76% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.73% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.40% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.06% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 45.68% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.32% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.040966 196 0.01% : 0.000004s : 1: ForceFp32Comm 22.79% : 0.009336s : 1: add_attr 22.77% : 0.009327s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.12% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.14% : 0.000058s : 1: auto_monad 0.05% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 1.13% : 0.000463s : 1: bootstrap 0.06% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.04% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.06% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.05% : 0.000019s : 1: event_method 0.03% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.05% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 1.24% : 0.000510s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.03% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000013s : 1: opt.transform.mutable_eliminate 2.30% : 0.000942s : 78: opt.transform.opt_a 0.06% : 0.000026s : 1: opt.transform.opt_after_cconv 0.05% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.22% : 0.000091s : 28: opt.transform.opt_b 0.11% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000032s : 4: opt.transform.symbol_engine_opt 5.31% : 0.002174s : 1: opt_a 0.24% : 0.000099s : 1: opt_after_cconv 1.12% : 0.000460s : 1: opt_after_jit_grad 0.46% : 0.000187s : 1: opt_b 9.94% : 0.004073s : 1: optimize 0.05% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.05% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000031s : 1: pre_auto_parallel 0.06% : 0.000024s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.04% : 0.000016s : 1: remove_dup_value 0.62% : 0.000253s : 1: renormalize.infer 0.53% : 0.000216s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000035s : 1: rewriter_after_opt_a 0.15% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.18% : 0.000073s : 1: symbol_engine_optimizer 14.35% : 0.005880s : 1: task_emit 0.18% : 0.000073s : 1: 
tuple_transform 13.39% : 0.005485s : 1: type_inference 0.15% : 0.000060s : 1: validate TotalTime = 0.0378239, [24] [bootstrap]: 0.00046017 [type_inference]: 0.0113265 [event_method]: 4.778e-05 [auto_monad]: 0.00011844 [graph_reusing]: 7.81001e-06 [inline]: 2.30002e-06 [add_attr]: 0.0030574, [1] [add_attr_with_inline]: 0.00304841, [1] [Cycle 1]: 7.211e-05, [2] [tag_attr]: 3.557e-05 [meta_addattr_fg_expand]: 9.31e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 4.908e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.0136461, [53] [py_interpret_to_execute]: 3.794e-05 [rewriter_before_opt_a]: 0.0001453 [opt_a]: 0.011276, [3] [Cycle 1]: 0.00721628, [45] [expand_dump_flag]: 4.16001e-06 [switch_simplify]: 7.458e-05 [loop_unroll]: 6.2e-05 [a_1]: 0.00144468 [with_stream_mark]: 2.306e-05 [recompute_prepare]: 2.145e-05 [updatestate_depend_eliminate]: 9.44e-06 [updatestate_assign_eliminate]: 7.92003e-06 [updatestate_loads_eliminate]: 7.25e-06 [parameter_eliminate]: 2.55002e-06 [a_2]: 0.00025853 [accelerated_algorithm]: 3.154e-05 [shard]: 1.93002e-06 [meta_shard_fg_expand]: 3.26999e-06 [shard_inline]: 1.606e-05 [merge_send_recv]: 1.615e-05 [auto_parallel]: 1.119e-05 [parallel]: 1.82e-05 [flash_sp]: 1.121e-05 [merge_comm]: 9.57001e-06 [allreduce_fusion]: 8.88002e-06 [matmul_add_comm_reduction]: 2.716e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.8e-05 [virtual_dataset]: 1.548e-05 [get_grad_eliminate_]: 1.533e-05 [virtual_output]: 1.485e-05 [merge_forward]: 9.25999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 1.719e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.925e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 2.661e-05 [set_forward_comm_id_for_comm_node_pass]: 9.50001e-06 [meta_fg_expand]: 0.00142753 [flash_sp_send_recv_attached]: 4.38001e-06 [receive_attached]: 2.53e-06 
[after_resolve]: 5.912e-05 [a_after_grad]: 8.1e-05 [renormalize]: 0.00256191 [add_forward_monad_depend]: 9.24e-06 [auto_monad_grad]: 5.31002e-06 [auto_monad_eliminator]: 5.596e-05 [cse]: 0.00016676 [a_3]: 0.00033582 [Cycle 2]: 0.00313115, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.671e-05 [loop_unroll]: 4.394e-05 [a_1]: 0.00153912 [with_stream_mark]: 1.272e-05 [recompute_prepare]: 1.12e-05 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 4.29002e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 1.11002e-06 [a_2]: 0.00012683 [accelerated_algorithm]: 1.224e-05 [shard]: 1.19e-06 [meta_shard_fg_expand]: 2.02999e-06 [shard_inline]: 9.79e-06 [merge_send_recv]: 7.59002e-06 [auto_parallel]: 7.6e-06 [parallel]: 6.19001e-06 [flash_sp]: 3.45003e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.89e-06 [matmul_add_comm_reduction]: 8.18001e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.029e-05 [virtual_dataset]: 9.04998e-06 [get_grad_eliminate_]: 8.90999e-06 [virtual_output]: 8.80999e-06 [merge_forward]: 4.81002e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.623e-05 [merge_recompute_call_nodes]: 9.90025e-07 [before_grad]: 1.45e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 8.068e-05 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 1.622e-05 [a_after_grad]: 1.434e-05 [renormalize]: 0.00068671 [add_forward_monad_depend]: 4.20999e-06 [auto_monad_grad]: 1.40001e-06 [auto_monad_eliminator]: 1.584e-05 [cse]: 4.919e-05 [a_3]: 6.525e-05 [Cycle 3]: 0.00091346, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 1.074e-05 [loop_unroll]: 9.14998e-06 [a_1]: 0.00025464 [with_stream_mark]: 1.077e-05 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 4.82998e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 3.83999e-06 
[parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012365 [accelerated_algorithm]: 1.183e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 7.07002e-06 [auto_parallel]: 7.26001e-06 [parallel]: 5.21002e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 4.87998e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.63999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.69998e-06 [get_grad_eliminate_]: 8.44002e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.44002e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 8.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.68e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.505e-05 [set_forward_comm_id_for_comm_node_pass]: 6.24999e-06 [meta_fg_expand]: 2.98998e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.409e-05 [a_after_grad]: 1.414e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.086e-05 [cse]: 2.719e-05 [a_3]: 5.938e-05 [py_interpret_to_execute_after_opt_a]: 1.15e-05 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 4.805e-05 [convert_after_rewriter]: 9.24998e-06 [order_py_execute_after_rewriter]: 6.88e-06 [mutable_eliminate]: 0.00053068 [opt_b]: 0.00029055, [1] [Cycle 1]: 0.00028402, [7] [b_1]: 0.00019028 [b_2]: 1.097e-05 [updatestate_depend_eliminate]: 7.33e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.98001e-06 [renormalize]: 4.00003e-07 [cse]: 3.213e-05 [optimize_parallel_all_gather_comm]: 2.074e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.173e-05 [loop_unroll]: 0.00045093 [opt_after_cconv]: 0.00013625, [1] [Cycle 1]: 0.00013003, [7] [c_1]: 4.794e-05 [parameter_eliminate]: 2.52001e-06 [updatestate_depend_eliminate]: 7.45998e-06 [updatestate_assign_eliminate]: 4.20999e-06 
[updatestate_loads_eliminate]: 4.12e-06 [cse]: 2.985e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 2.959e-05 [tuple_transform]: 0.00010159, [1] [Cycle 1]: 9.687e-05, [4] [d_1]: 6.727e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.67001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.669e-05 [cse_after_recomputation]: 3.185e-05, [1] [Cycle 1]: 2.723e-05, [1] [cse]: 2.195e-05 [environ_conv]: 9.08002e-06 [swap_dp_allreduce_reducescatter]: 7.55998e-06 [bias_add_comm_swap]: 2.88998e-06 [label_micro_interleaved_index]: 4.47e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.26997e-06 [ForceFp32Comm]: 1.14e-06 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.45002e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.67e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.02e-06 [overlap_recompute_and_grad_model_parallel]: 5.87001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 5.10001e-06 [overlap_grad_flash_sp]: 2.372e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 9.813e-05, [1] [Cycle 1]: 9.361e-05, [6] [build]: 9.66998e-06 [elim_shapecalc]: 1.265e-05 [elim_not_effective]: 1.848e-05 [opt_reshape]: 9.95002e-06 [fold_const_symbol]: 1.511e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 1.55001e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 2.439e-05 [get_jit_bprop_graph]: 1.24e-06 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.00049416 [validate]: 4.477e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00830793 [execute]: 6.93998e-06 Sums bootstrap : 0.000460s : 1.37% type_inference : 0.011326s : 33.81% event_method : 0.000048s : 0.14% auto_monad : 0.000118s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000145s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.39% optimize.opt_a.loop_unroll : 0.000115s : 0.34% optimize.opt_a.a_1 : 0.003238s : 9.67% optimize.opt_a.with_stream_mark : 0.000047s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000509s : 1.52% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000031s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000030s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001511s : 4.51% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003249s : 9.70% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.25% optimize.opt_a.cse : 0.000243s : 0.73% optimize.opt_a.a_3 : 0.000460s : 1.37% optimize.py_interpret_to_execute_after_opt_a : 0.000012s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000531s : 1.58% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.06% optimize.loop_unroll : 0.000451s : 1.35% optimize.opt_after_cconv.c_1 : 0.000048s : 0.14% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000494s : 1.47% validate : 0.000045s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008308s : 24.80% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000774 222 6.10% : 0.000047s : 12: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.53% : 0.000430s : 17: substitution.inline 2.04% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.65% : 0.000005s : 5: substitution.partial_eliminate 1.92% : 0.000015s : 20: substitution.remove_not_recompute_node 3.16% : 0.000024s : 10: substitution.replace_applicator 1.38% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.64% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.58% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011251 2 86.86% : 0.009773s : 1: type_inference.infer 13.14% : 0.001478s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.56% : 0.000127s : 17: replace.inline 42.44% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000455 33 92.55% : 0.000421s : 17: match.inline 7.45% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000754 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.53% : 0.000042s : 249: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.71% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.15% : 0.000009s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.14% : 0.000009s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.98% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.34% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001663 34 53.87% : 0.000896s : 13: func_graph_cloner_run.FuncGraphClonerGraph 46.13% : 0.000767s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.063009 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.86% : 0.003062s : 1: add_attr 4.84% : 0.003052s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000125s : 1: auto_monad 0.04% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.79% : 0.000495s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000460s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.86% : 0.000541s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.79% : 0.004905s : 117: opt.transform.opt_a 0.07% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.90% : 0.011279s : 1: opt_a 0.22% : 0.000140s : 1: opt_after_cconv 0.80% : 0.000504s : 1: opt_after_jit_grad 0.47% : 0.000294s : 1: opt_b 21.66% : 0.013650s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.73% : 0.001720s : 2: renormalize.infer 2.41% : 0.001516s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000101s : 1: symbol_engine_optimizer 13.20% : 0.008319s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.00% : 0.011343s : 1: type_inference 0.12% : 0.000078s : 1: validate TotalTime = 0.0187971, [24] [bootstrap]: 0.0004408 [type_inference]: 0.00435483 [event_method]: 1.051e-05 [auto_monad]: 5.211e-05 [graph_reusing]: 5.04998e-06 [inline]: 2.31e-06 [add_attr]: 0.00306577, [1] [add_attr_with_inline]: 0.0030574, [1] [Cycle 1]: 4.68e-05, [2] [tag_attr]: 1.244e-05 [meta_addattr_fg_expand]: 3.26001e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.231e-05 [insert-virtual-dataset]: 2.35997e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.68002e-06 [optimize]: 0.00381547, [53] [py_interpret_to_execute]: 1.474e-05 [rewriter_before_opt_a]: 3.838e-05 [opt_a]: 0.00192088, [2] [Cycle 1]: 0.00131239, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 2.432e-05 [loop_unroll]: 1.394e-05 [a_1]: 0.00029417 [with_stream_mark]: 1.343e-05 [recompute_prepare]: 7.46999e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.6e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 7.567e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.58002e-06 [shard_inline]: 6.10002e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.72999e-06 [parallel]: 1.681e-05 [flash_sp]: 7.37997e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 8.46002e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.93002e-06 [get_grad_eliminate_]: 5.68002e-06 [virtual_output]: 5.88002e-06 [merge_forward]: 4.01001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 8.85999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.04e-05 [merge_recompute_call_nodes]: 1.77001e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 1.96e-06 
[after_resolve]: 1.062e-05 [a_after_grad]: 8.55001e-06 [renormalize]: 0.00040013 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 1.259e-05 [cse]: 2.787e-05 [a_3]: 3.982e-05 [Cycle 2]: 0.00059886, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.51999e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012506 [with_stream_mark]: 1.061e-05 [recompute_prepare]: 5.96e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.58003e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.73e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.16997e-06 [shard_inline]: 5.12e-06 [merge_send_recv]: 4.18999e-06 [auto_parallel]: 5.32001e-06 [parallel]: 5.32001e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.24998e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 2.64999e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86998e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.89002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.016e-05 [a_after_grad]: 8.69e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.26998e-06 [cse]: 1.32e-05 [a_3]: 3.285e-05 [py_interpret_to_execute_after_opt_a]: 7.75998e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.181e-05 [convert_after_rewriter]: 7.25e-06 [order_py_execute_after_rewriter]: 5.34e-06 [mutable_eliminate]: 0.00052964 [opt_b]: 0.00018131, [1] [Cycle 1]: 0.00017535, [7] 
[b_1]: 0.00010868 [b_2]: 6.73e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 2.89991e-07 [cse]: 1.657e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.315e-05 [loop_unroll]: 0.00042262 [opt_after_cconv]: 9.575e-05, [1] [Cycle 1]: 8.978e-05, [7] [c_1]: 2.815e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.614e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 6.971e-05, [1] [Cycle 1]: 6.49e-05, [4] [d_1]: 3.886e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.245e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.088e-05 [environ_conv]: 4.45999e-06 [swap_dp_allreduce_reducescatter]: 4.85001e-06 [bias_add_comm_swap]: 2.56998e-06 [label_micro_interleaved_index]: 4.02e-06 [label_fine_grained_interleaved_index]: 3.03e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.44001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.128e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 3.82998e-06 [overlap_grad_flash_sp]: 1.667e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.89e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.705e-05, [1] [Cycle 1]: 6.301e-05, [6] [build]: 2.17999e-06 [elim_shapecalc]: 7.98999e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.20002e-06 [fold_const_symbol]: 8.74998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.483e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00045798 [validate]: 3.24e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00629571 [execute]: 7.65e-06 Sums bootstrap : 0.000441s : 2.99% type_inference : 0.004355s : 29.50% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000419s : 2.84% optimize.opt_a.with_stream_mark : 0.000024s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.97% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000400s : 2.71% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000530s : 3.59% optimize.opt_b.b_1 : 0.000109s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000423s : 2.86% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000458s : 3.10% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006296s : 42.65% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.45% : 0.000022s : 4: substitution.arithmetic_simplify 1.39% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.61% : 0.000006s : 4: substitution.graph_param_transform 65.43% : 0.000079s : 2: substitution.inline 2.16% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.75% : 0.000005s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004310 2 92.04% : 0.003967s : 1: type_inference.infer 7.96% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000135 984 0.91% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.76% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.14% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.06% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.95% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.87% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 1.09% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.32% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.91% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 42.61% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.39% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027006 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.37% : 0.003070s : 1: add_attr 11.33% : 0.003061s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.76% : 0.000476s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 2.00% : 0.000539s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.85% : 0.000771s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.12% : 0.001924s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.73% : 0.000468s : 1: opt_after_jit_grad 0.68% : 0.000185s : 1: opt_b 14.14% : 0.003819s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.85% : 0.000231s : 1: renormalize.infer 0.60% : 0.000163s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.35% : 0.006307s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.19% : 0.004372s : 1: type_inference 0.23% : 0.000061s : 1: validate TotalTime = 0.0386488, [24] [bootstrap]: 0.00053662 [type_inference]: 0.0110281 [event_method]: 4.704e-05 [auto_monad]: 0.00011958 [graph_reusing]: 8.12e-06 [inline]: 2.01e-06 [add_attr]: 0.00330302, [1] [add_attr_with_inline]: 0.00329364, [1] [Cycle 1]: 7.526e-05, [2] [tag_attr]: 3.396e-05 [meta_addattr_fg_expand]: 8.65001e-06 [parallel-infer-symbol]: 3.37997e-06 [pre_auto_parallel]: 4.88e-05 [insert-virtual-dataset]: 2.39999e-06 [parallel-infer-symbol-second]: 8.90024e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.0143141, [53] [py_interpret_to_execute]: 3.641e-05 [rewriter_before_opt_a]: 0.00013467 [opt_a]: 0.0118152, [3] [Cycle 1]: 0.00757621, [45] [expand_dump_flag]: 4.43999e-06 [switch_simplify]: 6.711e-05 [loop_unroll]: 5.466e-05 [a_1]: 0.00146899 [with_stream_mark]: 2.631e-05 [recompute_prepare]: 2.19e-05 [updatestate_depend_eliminate]: 9.50001e-06 [updatestate_assign_eliminate]: 8.15999e-06 [updatestate_loads_eliminate]: 7.21999e-06 [parameter_eliminate]: 2.71e-06 [a_2]: 0.00024842 [accelerated_algorithm]: 3.256e-05 [shard]: 1.92001e-06 [meta_shard_fg_expand]: 4.79e-06 [shard_inline]: 1.654e-05 [merge_send_recv]: 1.555e-05 [auto_parallel]: 1.138e-05 [parallel]: 1.977e-05 [flash_sp]: 1.178e-05 [merge_comm]: 9.52001e-06 [allreduce_fusion]: 8.67e-06 [matmul_add_comm_reduction]: 2.756e-05 [allreduce_slice_to_reducescatter]: 1.10999e-06 [virtual_shard_identity]: 1.773e-05 [virtual_dataset]: 1.587e-05 [get_grad_eliminate_]: 1.535e-05 [virtual_output]: 1.545e-05 [merge_forward]: 9.88002e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 1.899e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.885e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 2.783e-05 [set_forward_comm_id_for_comm_node_pass]: 9.76998e-06 [meta_fg_expand]: 0.00155015 [flash_sp_send_recv_attached]: 4.03001e-06 [receive_attached]: 2.26e-06 
[after_resolve]: 6.22e-05 [a_after_grad]: 8.378e-05 [renormalize]: 0.00273497 [add_forward_monad_depend]: 1.01e-05 [auto_monad_grad]: 6.62002e-06 [auto_monad_eliminator]: 5.945e-05 [cse]: 0.00017793 [a_3]: 0.00034409 [Cycle 2]: 0.00329746, [45] [expand_dump_flag]: 2.02001e-06 [switch_simplify]: 4.725e-05 [loop_unroll]: 4.43e-05 [a_1]: 0.0016328 [with_stream_mark]: 1.668e-05 [recompute_prepare]: 1.184e-05 [updatestate_depend_eliminate]: 5.71e-06 [updatestate_assign_eliminate]: 4.67e-06 [updatestate_loads_eliminate]: 3.86999e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 0.00012779 [accelerated_algorithm]: 1.395e-05 [shard]: 1.42999e-06 [meta_shard_fg_expand]: 2.65002e-06 [shard_inline]: 9.57001e-06 [merge_send_recv]: 8.72e-06 [auto_parallel]: 8.50999e-06 [parallel]: 7.75998e-06 [flash_sp]: 3.61001e-06 [merge_comm]: 6.07999e-06 [allreduce_fusion]: 5.04003e-06 [matmul_add_comm_reduction]: 9.76998e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 9.22001e-06 [get_grad_eliminate_]: 8.82999e-06 [virtual_output]: 8.50001e-06 [merge_forward]: 5.15999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.062e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.735e-05 [merge_recompute_call_nodes]: 9.5999e-07 [before_grad]: 1.45e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 5.029e-05 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.44e-06 [after_resolve]: 1.579e-05 [a_after_grad]: 1.522e-05 [renormalize]: 0.00075244 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.38002e-06 [auto_monad_eliminator]: 1.698e-05 [cse]: 5.468e-05 [a_3]: 6.944e-05 [Cycle 3]: 0.00092456, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.108e-05 [loop_unroll]: 9.15999e-06 [a_1]: 0.00026308 [with_stream_mark]: 1.031e-05 [recompute_prepare]: 9.62001e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 3.78999e-06 [updatestate_loads_eliminate]: 
3.78001e-06 [parameter_eliminate]: 9.60019e-07 [a_2]: 0.00012436 [accelerated_algorithm]: 1.181e-05 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.06002e-06 [merge_send_recv]: 7.40998e-06 [auto_parallel]: 7.53999e-06 [parallel]: 5.34e-06 [flash_sp]: 1.22e-06 [merge_comm]: 4.89e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 8.32003e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 1.019e-05 [virtual_dataset]: 8.74003e-06 [get_grad_eliminate_]: 8.48999e-06 [virtual_output]: 8.39002e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.56998e-06 [offload_activation]: 9.16002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.576e-05 [merge_recompute_call_nodes]: 9.69972e-07 [before_grad]: 1.441e-05 [set_forward_comm_id_for_comm_node_pass]: 4.98001e-06 [meta_fg_expand]: 3.06999e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.349e-05 [a_after_grad]: 1.421e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.40999e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.176e-05 [cse]: 2.777e-05 [a_3]: 5.989e-05 [py_interpret_to_execute_after_opt_a]: 1.423e-05 [slice_cell_reuse_recomputed_activation]: 2.17999e-06 [rewriter_after_opt_a]: 4.912e-05 [convert_after_rewriter]: 9.61e-06 [order_py_execute_after_rewriter]: 6.86001e-06 [mutable_eliminate]: 0.00056244 [opt_b]: 0.00029608, [1] [Cycle 1]: 0.0002887, [7] [b_1]: 0.00019428 [b_2]: 1.103e-05 [updatestate_depend_eliminate]: 7.31999e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 3.84002e-06 [renormalize]: 5.3001e-07 [cse]: 3.294e-05 [optimize_parallel_all_gather_comm]: 2.156e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.381e-05 [loop_unroll]: 0.0004724 [opt_after_cconv]: 0.00014003, [1] [Cycle 1]: 0.00013379, [7] [c_1]: 4.982e-05 [parameter_eliminate]: 2.33002e-06 [updatestate_depend_eliminate]: 7.53999e-06 [updatestate_assign_eliminate]: 4.3e-06 
[updatestate_loads_eliminate]: 3.83999e-06 [cse]: 3.098e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 3.135e-05 [tuple_transform]: 0.0001034, [1] [Cycle 1]: 9.872e-05, [4] [d_1]: 6.822e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.81e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 6.157e-05 [cse_after_recomputation]: 3.303e-05, [1] [Cycle 1]: 2.792e-05, [1] [cse]: 2.196e-05 [environ_conv]: 9.27999e-06 [swap_dp_allreduce_reducescatter]: 8.19998e-06 [bias_add_comm_swap]: 2.86999e-06 [label_micro_interleaved_index]: 4.60999e-06 [label_fine_grained_interleaved_index]: 2.49999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.81999e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.73998e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.11997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.743e-05 [grouped_pairwise_exchange_alltoall]: 1.57999e-06 [offloading_packed_experts]: 5.51e-06 [overlap_recompute_and_grad_model_parallel]: 5.44e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 5.66e-06 [overlap_grad_flash_sp]: 2.599e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 0.00010144, [1] [Cycle 1]: 9.665e-05, [6] [build]: 9.87999e-06 [elim_shapecalc]: 1.416e-05 [elim_not_effective]: 1.863e-05 [opt_reshape]: 1.045e-05 [fold_const_symbol]: 1.498e-05 [renormalize]: 
1.59984e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.631e-05 [get_jit_bprop_graph]: 1.83002e-06 [rewriter_after_jit_bprop_graph]: 3.79002e-06 [opt_after_jit_grad]: 0.00051759 [validate]: 4.958e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00838612 [execute]: 7.92e-06 Sums bootstrap : 0.000537s : 1.58% type_inference : 0.011028s : 32.46% event_method : 0.000047s : 0.14% auto_monad : 0.000120s : 0.35% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000135s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000125s : 0.37% optimize.opt_a.loop_unroll : 0.000108s : 0.32% optimize.opt_a.a_1 : 0.003365s : 9.90% optimize.opt_a.with_stream_mark : 0.000053s : 0.16% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000501s : 1.47% optimize.opt_a.accelerated_algorithm : 0.000058s : 0.17% optimize.opt_a.shard : 0.000005s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000009s : 0.03% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000032s : 0.09% optimize.opt_a.auto_parallel : 0.000027s : 0.08% optimize.opt_a.parallel : 0.000033s : 0.10% optimize.opt_a.flash_sp : 0.000017s : 0.05% optimize.opt_a.merge_comm 
: 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000046s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000039s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000004s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001604s : 4.72% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.27% optimize.opt_a.a_after_grad : 0.000113s : 0.33% optimize.opt_a.renormalize : 0.003488s : 10.27% optimize.opt_a.add_forward_monad_depend : 0.000016s : 0.05% optimize.opt_a.auto_monad_grad : 0.000009s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000088s : 0.26% optimize.opt_a.cse : 0.000260s : 0.77% optimize.opt_a.a_3 : 0.000473s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000014s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000562s : 1.66% optimize.opt_b.b_1 : 0.000194s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s 
: 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.07% optimize.loop_unroll : 0.000472s : 1.39% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000031s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000062s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000026s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000002s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000518s : 1.52% validate : 0.000050s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008386s : 24.68% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 
0.000934 218 5.32% : 0.000050s : 11: substitution.arithmetic_simplify 1.68% : 0.000016s : 2: substitution.cast_eliminate 0.31% : 0.000003s : 5: substitution.elim_not_effective 0.46% : 0.000004s : 5: substitution.float_depend_g_call 0.50% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.23% : 0.000002s : 5: substitution.fold_const_symbol 0.84% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.21% : 0.000002s : 2: substitution.incorporate_call_switch 56.57% : 0.000528s : 16: substitution.inline 1.90% : 0.000018s : 2: substitution.inline_without_move 1.17% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.74% : 0.000016s : 3: substitution.less_batch_normalization 1.45% : 0.000014s : 11: substitution.minmaximum_grad 0.63% : 0.000006s : 5: substitution.partial_eliminate 1.54% : 0.000014s : 20: substitution.remove_not_recompute_node 2.78% : 0.000026s : 10: substitution.replace_applicator 1.14% : 0.000011s : 15: substitution.replace_old_param 0.26% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.14% : 0.000029s : 11: substitution.tuple_list_convert_item_index_to_positive 1.53% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 1.99% : 0.000019s : 11: substitution.tuple_list_get_item_depend_reorder 12.25% : 0.000114s : 28: substitution.tuple_list_get_item_eliminator 2.05% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010952 2 87.44% : 0.009577s : 1: type_inference.infer 12.56% : 0.001375s : 1: type_inference.specialize ------[replace.] 0.000211 30 58.13% : 0.000123s : 16: replace.inline 41.87% : 0.000089s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000601 30 86.45% : 0.000519s : 16: match.inline 13.55% : 0.000081s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.33% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.14% : 0.000016s : 99: predicate.arithmetic_simplify 1.19% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.73% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.51% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.70% 
: 0.000013s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.38% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 97: predicate.partial_defer_inline 1.67% : 0.000012s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.56% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 67: predicate.reduce_eliminate 2.62% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.23% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.78% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.83% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.23% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001607 32 56.64% : 0.000910s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.36% : 0.000697s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.065078 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.08% : 0.003308s : 1: add_attr 5.07% : 0.003298s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000126s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000572s : 1: bootstrap 0.04% : 0.000028s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000054s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000482s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.88% : 0.000573s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000018s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000019s : 1: opt.transform.mutable_eliminate 7.75% : 0.005044s : 117: opt.transform.opt_a 0.07% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.27% : 0.000179s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000055s : 4: opt.transform.symbol_engine_opt 18.16% : 0.011819s : 1: opt_a 0.22% : 0.000144s : 1: opt_after_cconv 0.81% : 0.000529s : 1: opt_after_jit_grad 0.46% : 0.000300s : 1: opt_b 22.00% : 0.014319s : 1: optimize 0.04% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000029s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000053s : 1: pre_auto_parallel 0.06% : 0.000041s : 1: py_interpret_to_execute 0.03% : 0.000018s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000036s : 1: remove_dup_value 2.94% : 0.001911s : 2: renormalize.infer 2.40% : 0.001561s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000053s : 1: rewriter_after_opt_a 0.21% : 0.000139s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000104s : 1: symbol_engine_optimizer 12.91% : 0.008398s : 1: task_emit 0.16% : 0.000106s : 1: 
tuple_transform 16.98% : 0.011050s : 1: type_inference 0.14% : 0.000088s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x0-kbk],max_mem:56.0M TotalTime = 0.0820112, [24] [bootstrap]: 0.00057757 [type_inference]: 0.0061309 [event_method]: 1.421e-05 [auto_monad]: 5.545e-05 [graph_reusing]: 5.67001e-06 [inline]: 1.89e-06 [add_attr]: 0.0034372, [1] [add_attr_with_inline]: 0.0034251, [1] [Cycle 1]: 4.568e-05, [2] [tag_attr]: 1.546e-05 [meta_addattr_fg_expand]: 4.12e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 2.862e-05 [insert-virtual-dataset]: 2.55997e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.23002e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00396676, [53] [py_interpret_to_execute]: 2.054e-05 [rewriter_before_opt_a]: 5.959e-05 [opt_a]: 0.00214529, [2] [Cycle 1]: 0.00151976, [45] [expand_dump_flag]: 2.81999e-06 [switch_simplify]: 3.266e-05 [loop_unroll]: 2.144e-05 [a_1]: 0.00045349 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 7.24001e-06 [updatestate_depend_eliminate]: 3.75998e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 7.619e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 8.38001e-06 [auto_parallel]: 6.06998e-06 [parallel]: 2.502e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.53e-06 [matmul_add_comm_reduction]: 9.32001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.38e-06 [virtual_dataset]: 6.33e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 
[merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 9.74e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 2.41e-06 [after_resolve]: 1.051e-05 [a_after_grad]: 8.49998e-06 [renormalize]: 0.000414 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.431e-05 [cse]: 2.78e-05 [a_3]: 4.05e-05 [Cycle 2]: 0.00061575, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.00012657 [with_stream_mark]: 9.85002e-06 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.684e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.14e-06 [parallel]: 3.93001e-06 [flash_sp]: 3.02002e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.83e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.59e-06 [a_after_grad]: 8e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 9.80013e-07 [auto_monad_eliminator]: 9.44998e-06 [cse]: 1.334e-05 [a_3]: 3.338e-05 [py_interpret_to_execute_after_opt_a]: 7.46001e-06 
[slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 3.181e-05 [convert_after_rewriter]: 7.14001e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.0004376 [opt_b]: 0.00018071, [1] [Cycle 1]: 0.00017493, [7] [b_1]: 0.00010757 [b_2]: 7.1e-06 [updatestate_depend_eliminate]: 5.04003e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 8.99978e-07 [cse]: 1.626e-05 [optimize_parallel_all_gather_comm]: 1.597e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.371e-05 [loop_unroll]: 0.00040449 [opt_after_cconv]: 9.413e-05, [1] [Cycle 1]: 8.861e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.36998e-06 [cse]: 1.588e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.294e-05 [tuple_transform]: 6.822e-05, [1] [Cycle 1]: 6.385e-05, [4] [d_1]: 3.854e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.15002e-06 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 5.215e-05 [cse_after_recomputation]: 2.01e-05, [1] [Cycle 1]: 1.571e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.63001e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.59999e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.95002e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.70997e-06 [micro_interleaved_order_control]: 2.37001e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 1.04003e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.71e-06 [control_data_broadcast_order]: 1.206e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.35998e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39998e-06 [overlap_recompute_comm]: 2.53998e-06 [overlap_grad_ring_attention]: 4.21001e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.89e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 6.733e-05, [1] [Cycle 1]: 6.318e-05, [6] [build]: 1.94999e-06 [elim_shapecalc]: 8.37998e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.90001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.596e-05 [get_jit_bprop_graph]: 9.00007e-07 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00043768 [validate]: 3.045e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0670764 [execute]: 9.13002e-06 Sums bootstrap : 0.000578s : 0.74% type_inference : 0.006131s : 7.90% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000060s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000580s : 0.75% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.18% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000029s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000414s : 0.53% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000024s : 0.03% optimize.opt_a.cse : 0.000041s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000438s : 0.56% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000404s : 0.52% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000438s : 0.56% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.067076s : 86.45% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.75% : 0.000024s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.20% : 0.000005s : 4: substitution.graph_param_transform 66.67% : 0.000110s : 3: substitution.inline 1.83% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.64% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006082 2 90.75% : 0.005520s : 1: type_inference.infer 9.25% : 0.000563s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.09% : 0.000028s : 3: replace.inline 29.91% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.59% : 0.000108s : 3: match.inline 8.41% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.90% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.46% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.19% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 1.09% : 0.000002s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000352 8 46.84% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.16% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.090924 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.79% : 0.003442s : 1: add_attr 3.77% : 0.003429s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000056s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000615s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.45% : 0.000412s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.49% : 0.000446s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.04% : 0.000946s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.36% : 0.002148s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.49% : 0.000447s : 1: opt_after_jit_grad 0.20% : 0.000184s : 1: opt_b 4.37% : 0.003971s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000214s : 1: renormalize.infer 0.21% : 0.000193s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 73.79% : 0.067094s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 6.76% : 0.006146s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.0754666, [24] [bootstrap]: 0.00042008 [type_inference]: 0.00436908 [event_method]: 1.029e-05 [auto_monad]: 4.83e-05 [graph_reusing]: 4.98001e-06 [inline]: 1.87001e-06 [add_attr]: 0.00286072, [1] [add_attr_with_inline]: 0.00285322, [1] [Cycle 1]: 4.02e-05, [2] [tag_attr]: 1.2e-05 [meta_addattr_fg_expand]: 3.38e-06 [parallel-infer-symbol]: 2.32999e-06 [pre_auto_parallel]: 2.033e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.92999e-06 [pipeline_split]: 1.66998e-06 [optimize]: 0.00362285, [53] [py_interpret_to_execute]: 1.406e-05 [rewriter_before_opt_a]: 3.842e-05 [opt_a]: 0.00183808, [2] [Cycle 1]: 0.00124001, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 2.309e-05 [loop_unroll]: 1.321e-05 [a_1]: 0.00028324 [with_stream_mark]: 1.272e-05 [recompute_prepare]: 7.02002e-06 [updatestate_depend_eliminate]: 3.39001e-06 [updatestate_assign_eliminate]: 2.86e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.47e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 1.74998e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 6.36e-06 [auto_parallel]: 5.94999e-06 [parallel]: 1.492e-05 [flash_sp]: 6.89999e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.47002e-06 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 7.5e-06 [virtual_dataset]: 6.02001e-06 [get_grad_eliminate_]: 5.56998e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 9.80013e-07 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.16003e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 1.55001e-06 
[after_resolve]: 1.017e-05 [a_after_grad]: 8.85999e-06 [renormalize]: 0.00033443 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.317e-05 [cse]: 2.444e-05 [a_3]: 3.986e-05 [Cycle 2]: 0.00058873, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.48002e-06 [a_1]: 0.00012287 [with_stream_mark]: 1.112e-05 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.753e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.04e-06 [parallel]: 4.28999e-06 [flash_sp]: 2.98e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.37001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.92999e-06 [virtual_dataset]: 5.25999e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46998e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.80998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.97e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 6.03002e-06 [cse]: 1.346e-05 [a_3]: 3.172e-05 [py_interpret_to_execute_after_opt_a]: 7.15e-06 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 3.039e-05 [convert_after_rewriter]: 6.31e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.0004438 [opt_b]: 0.00018041, [1] [Cycle 1]: 0.00017451, [7] [b_1]: 
0.00010734 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 3.50003e-07 [cse]: 1.639e-05 [optimize_parallel_all_gather_comm]: 1.534e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.161e-05 [loop_unroll]: 0.00040833 [opt_after_cconv]: 9.529e-05, [1] [Cycle 1]: 8.975e-05, [7] [c_1]: 2.776e-05 [parameter_eliminate]: 2.42001e-06 [updatestate_depend_eliminate]: 5.62001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.631e-05 [renormalize]: 5.79981e-07 [remove_dup_value]: 1.161e-05 [tuple_transform]: 6.992e-05, [1] [Cycle 1]: 6.577e-05, [4] [d_1]: 4.031e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.50001e-06 [add_recomputation]: 4.159e-05 [cse_after_recomputation]: 1.977e-05, [1] [Cycle 1]: 1.545e-05, [1] [cse]: 1.045e-05 [environ_conv]: 4.49002e-06 [swap_dp_allreduce_reducescatter]: 5.52999e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 3.95998e-06 [label_fine_grained_interleaved_index]: 2.99001e-06 [merge_cast_opt]: 9.09989e-07 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 9.99979e-07 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 1.13001e-06 [full_micro_interleaved_order_control]: 2.43998e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 7.09988e-07 [add_comm_op_reuse_tag]: 8.2e-07 [interleave_split_concat_branches]: 1.20001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.39e-06 [control_data_broadcast_order]: 1.155e-05 [grouped_pairwise_exchange_alltoall]: 1.89999e-06 [offloading_packed_experts]: 4.08999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.37001e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.576e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.48002e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.751e-05, [1] [Cycle 1]: 6.343e-05, [6] [build]: 2.02999e-06 [elim_shapecalc]: 8.15999e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.30002e-06 [fold_const_symbol]: 8.50001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.369e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00044413 [validate]: 2.957e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0634117 [execute]: 7.92e-06 Sums bootstrap : 0.000420s : 0.59% type_inference : 0.004369s : 6.10% event_method : 0.000010s : 0.01% auto_monad : 0.000048s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000002s : 0.00% pre_auto_parallel : 0.000020s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000406s : 0.57% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000011s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000019s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000335s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000038s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000444s : 0.62% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000408s : 0.57% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000014s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.62% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063412s : 88.51% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000112 26 18.54% : 0.000021s : 4: substitution.arithmetic_simplify 1.64% : 0.000002s : 2: substitution.elim_not_effective 1.11% : 0.000001s : 2: substitution.fold_const_symbol 4.55% : 0.000005s : 4: substitution.graph_param_transform 64.59% : 0.000072s : 2: substitution.inline 2.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.75% : 0.000004s : 4: substitution.remove_not_recompute_node 3.40% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004329 2 90.83% : 0.003932s : 1: type_inference.infer 9.17% : 0.000397s : 1: type_inference.specialize ------[replace.] 0.000017 2 100.00% : 0.000017s : 2: replace.inline ------[match.] 0.000071 2 100.00% : 0.000071s : 2: match.inline ------[predicate.] 
0.000136 984 0.90% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000003s : 17: predicate.arithmetic_simplify 0.89% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.58% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.32% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.58% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.61% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000269 6 43.45% : 0.000117s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.55% : 0.000152s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.083196 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.44% : 0.002865s : 1: add_attr 3.43% : 0.002857s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000045s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000053s : 1: auto_monad 0.02% : 0.000017s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.53% : 0.000443s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000009s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000015s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000417s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.54% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.90% : 0.000752s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.21% : 0.001841s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.55% : 0.000454s : 1: opt_after_jit_grad 0.22% : 0.000184s : 1: opt_b 4.36% : 0.003626s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.22% : 0.000179s : 1: renormalize.infer 0.18% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.24% : 0.063428s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.27% : 0.004383s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.0760622, [24] [bootstrap]: 0.00040868 [type_inference]: 0.00680654 [event_method]: 1.476e-05 [auto_monad]: 5.176e-05 [graph_reusing]: 5.58002e-06 [inline]: 1.79e-06 [add_attr]: 0.00296254, [1] [add_attr_with_inline]: 0.0029545, [1] [Cycle 1]: 4.242e-05, [2] [tag_attr]: 1.462e-05 [meta_addattr_fg_expand]: 3.88001e-06 [parallel-infer-symbol]: 2.22999e-06 [pre_auto_parallel]: 2.32e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.69e-06 [pipeline_split]: 1.22e-06 [optimize]: 0.00390887, [53] [py_interpret_to_execute]: 1.905e-05 [rewriter_before_opt_a]: 5.67e-05 [opt_a]: 0.00206544, [2] [Cycle 1]: 0.0014571, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 3.078e-05 [loop_unroll]: 2.063e-05 [a_1]: 0.00043283 [with_stream_mark]: 1.244e-05 [recompute_prepare]: 7.54002e-06 [updatestate_depend_eliminate]: 3.40998e-06 [updatestate_assign_eliminate]: 2.86e-06 [updatestate_loads_eliminate]: 2.28998e-06 [parameter_eliminate]: 2.01e-06 [a_2]: 7.506e-05 [accelerated_algorithm]: 6.19999e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 5.90002e-06 [auto_parallel]: 6.68e-06 [parallel]: 1.488e-05 [flash_sp]: 6.36e-06 [merge_comm]: 3.94002e-06 [allreduce_fusion]: 3.06001e-06 [matmul_add_comm_reduction]: 7.56001e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.63002e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.50001e-06 [offload_activation]: 7.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.016e-05 [merge_recompute_call_nodes]: 9.60019e-07 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.22001e-06 [receive_attached]: 1.92001e-06 
[after_resolve]: 1.019e-05 [a_after_grad]: 8.49002e-06 [renormalize]: 0.00040474 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.284e-05 [cse]: 2.319e-05 [a_3]: 4.096e-05 [Cycle 2]: 0.00059918, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.86e-06 [a_1]: 0.00012564 [with_stream_mark]: 9.29998e-06 [recompute_prepare]: 6.10002e-06 [updatestate_depend_eliminate]: 2.93e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.32999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.806e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.34e-06 [merge_send_recv]: 4.44998e-06 [auto_parallel]: 5.07e-06 [parallel]: 3.9e-06 [flash_sp]: 3.37002e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.61999e-06 [virtual_dataset]: 5.51e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.87002e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.35001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.13999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.96998e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 7.90023e-07 [auto_monad_eliminator]: 6.39001e-06 [cse]: 1.287e-05 [a_3]: 3.246e-05 [py_interpret_to_execute_after_opt_a]: 7.49002e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 2.911e-05 [convert_after_rewriter]: 6.16e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00048026 [opt_b]: 0.00018262, [1] [Cycle 1]: 0.00017623, [7] [b_1]: 
0.00010817 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 4.10015e-07 [cse]: 1.657e-05 [optimize_parallel_all_gather_comm]: 1.499e-05 [overlap_param_gather]: 2.43e-06 [cconv]: 2.176e-05 [loop_unroll]: 0.00041156 [opt_after_cconv]: 9.453e-05, [1] [Cycle 1]: 8.893e-05, [7] [c_1]: 2.715e-05 [parameter_eliminate]: 2.38998e-06 [updatestate_depend_eliminate]: 5.37999e-06 [updatestate_assign_eliminate]: 2.58003e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.628e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.161e-05 [tuple_transform]: 6.773e-05, [1] [Cycle 1]: 6.375e-05, [4] [d_1]: 3.806e-05 [none_parameter_eliminate]: 1.42999e-06 [renormalize]: 1.29978e-07 [switch_simplify]: 6.49999e-06 [partial_unused_args_eliminate]: 1.45001e-06 [add_recomputation]: 3.915e-05 [cse_after_recomputation]: 1.975e-05, [1] [Cycle 1]: 1.542e-05, [1] [cse]: 1.064e-05 [environ_conv]: 4.3e-06 [swap_dp_allreduce_reducescatter]: 5.21002e-06 [bias_add_comm_swap]: 2.42001e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.43998e-06 [merge_cast_opt]: 9.30013e-07 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.06002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 6.19999e-07 [add_comm_op_reuse_tag]: 7.79983e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 8.29983e-07 [overlap_opt_shard_in_pipeline]: 9.5999e-07 [overlap_opt_shard_grad_in_pipeline]: 1.71998e-06 [control_data_broadcast_order]: 1.123e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.94002e-06 [overlap_recompute_and_grad_model_parallel]: 4.40999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.16997e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.571e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.58002e-06 [split_layernorm_comm]: 1.45999e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.929e-05, [1] [Cycle 1]: 6.507e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.43001e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.50999e-06 [pipeline_parallel_scheduler]: 1.82001e-06 [auto_monad_reorder]: 1.44e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00044681 [validate]: 3.12e-05 [backend_pass]: 6.69999e-07 [task_emit]: 0.0611753 [execute]: 7.95e-06 Sums bootstrap : 0.000409s : 0.57% type_inference : 0.006807s : 9.43% event_method : 0.000015s : 0.02% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000002s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000558s : 0.77% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000010s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000019s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000405s : 0.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000036s : 0.05% optimize.opt_a.a_3 : 0.000073s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000029s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000480s : 0.67% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.57% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000001s : 0.00% optimize.add_recomputation : 0.000039s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000014s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.62% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.061175s : 84.79% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000154 30 15.29% : 0.000024s : 5: substitution.arithmetic_simplify 1.31% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.30% : 0.000005s : 4: substitution.graph_param_transform 65.03% : 0.000100s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.86% : 0.000004s : 4: substitution.replace_old_param 7.00% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006768 2 91.29% : 0.006179s : 1: type_inference.infer 8.71% : 0.000589s : 1: type_inference.specialize ------[replace.] 0.000036 5 67.42% : 0.000024s : 3: replace.inline 32.58% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000108 5 90.95% : 0.000098s : 3: match.inline 9.05% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.51% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.20% : 0.000008s : 54: predicate.switch_simplify 0.89% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.13% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.28% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000377 8 39.49% : 0.000149s : 3: func_graph_cloner_run.FuncGraphClonerGraph 60.51% : 0.000228s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.084417 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.51% : 0.002967s : 1: add_attr 3.50% : 0.002958s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.05% : 0.000043s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.51% : 0.000433s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000003s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000009s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000003s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000490s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.09% : 0.000922s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.45% : 0.002068s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.54% : 0.000457s : 1: opt_after_jit_grad 0.22% : 0.000186s : 1: opt_b 4.64% : 0.003913s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.23% : 0.000197s : 1: renormalize.infer 0.24% : 0.000202s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000033s : 1: rewriter_after_opt_a 0.07% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 72.49% : 0.061191s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 8.08% : 0.006820s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.112066, [24] [bootstrap]: 0.00049239 [type_inference]: 0.0116375 [event_method]: 4.972e-05 [auto_monad]: 0.0001218 [graph_reusing]: 8.60001e-06 [inline]: 1.76e-06 [add_attr]: 0.00298973, [1] [add_attr_with_inline]: 0.00298131, [1] [Cycle 1]: 7.015e-05, [2] [tag_attr]: 3.436e-05 [meta_addattr_fg_expand]: 9.52999e-06 [parallel-infer-symbol]: 3.11001e-06 [pre_auto_parallel]: 4.826e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.28998e-06 [pipeline_split]: 1.27999e-06 [optimize]: 0.0133631, [53] [py_interpret_to_execute]: 4.01e-05 [rewriter_before_opt_a]: 0.00014476 [opt_a]: 0.0110913, [3] [Cycle 1]: 0.00714194, [45] [expand_dump_flag]: 3.9e-06 [switch_simplify]: 7.438e-05 [loop_unroll]: 6.166e-05 [a_1]: 0.00144754 [with_stream_mark]: 2.309e-05 [recompute_prepare]: 2.136e-05 [updatestate_depend_eliminate]: 9.02999e-06 [updatestate_assign_eliminate]: 7.50998e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.85002e-06 [a_2]: 0.00027746 [accelerated_algorithm]: 3.211e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.14999e-06 [shard_inline]: 1.635e-05 [merge_send_recv]: 1.609e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.921e-05 [flash_sp]: 1.18e-05 [merge_comm]: 9.63002e-06 [allreduce_fusion]: 9.14e-06 [matmul_add_comm_reduction]: 2.666e-05 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 1.788e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.539e-05 [virtual_output]: 1.492e-05 [merge_forward]: 9.37999e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 1.804e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.873e-05 [merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 2.672e-05 [set_forward_comm_id_for_comm_node_pass]: 9.12999e-06 [meta_fg_expand]: 0.00141394 [flash_sp_send_recv_attached]: 3.8e-06 [receive_attached]: 2.16e-06 
[after_resolve]: 5.94e-05 [a_after_grad]: 7.966e-05 [renormalize]: 0.00247949 [add_forward_monad_depend]: 1.019e-05 [auto_monad_grad]: 5.81e-06 [auto_monad_eliminator]: 5.639e-05 [cse]: 0.000168 [a_3]: 0.0003338 [Cycle 2]: 0.0030358, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.692e-05 [loop_unroll]: 4.357e-05 [a_1]: 0.00152583 [with_stream_mark]: 1.198e-05 [recompute_prepare]: 1.09e-05 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 4.49998e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012581 [accelerated_algorithm]: 1.179e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 9.17999e-06 [merge_send_recv]: 6.62002e-06 [auto_parallel]: 8.06001e-06 [parallel]: 4.70001e-06 [flash_sp]: 3.44001e-06 [merge_comm]: 5.83002e-06 [allreduce_fusion]: 4.84998e-06 [matmul_add_comm_reduction]: 8.23001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.97e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.55999e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 9.09989e-07 [offload_activation]: 8.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.634e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.424e-05 [set_forward_comm_id_for_comm_node_pass]: 5.65001e-06 [meta_fg_expand]: 7.039e-05 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.26002e-06 [after_resolve]: 1.704e-05 [a_after_grad]: 1.494e-05 [renormalize]: 0.00062627 [add_forward_monad_depend]: 3.97e-06 [auto_monad_grad]: 1.13001e-06 [auto_monad_eliminator]: 1.447e-05 [cse]: 4.732e-05 [a_3]: 6.514e-05 [Cycle 3]: 0.00089938, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.08e-05 [loop_unroll]: 9.15001e-06 [a_1]: 0.00024894 [with_stream_mark]: 9.87999e-06 [recompute_prepare]: 9.31e-06 [updatestate_depend_eliminate]: 4.62e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.88001e-06 
[parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012315 [accelerated_algorithm]: 1.148e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 9.12001e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 7.17997e-06 [parallel]: 4.52e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 7.47998e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 9.84999e-06 [virtual_dataset]: 8.75999e-06 [get_grad_eliminate_]: 8.41002e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.22003e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 8.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.715e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.471e-05 [set_forward_comm_id_for_comm_node_pass]: 5.55001e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 1.354e-05 [a_after_grad]: 1.401e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.29998e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.067e-05 [cse]: 2.688e-05 [a_3]: 5.935e-05 [py_interpret_to_execute_after_opt_a]: 1.048e-05 [slice_cell_reuse_recomputed_activation]: 2.41998e-06 [rewriter_after_opt_a]: 4.746e-05 [convert_after_rewriter]: 9.86998e-06 [order_py_execute_after_rewriter]: 7.00002e-06 [mutable_eliminate]: 0.00045895 [opt_b]: 0.00028832, [1] [Cycle 1]: 0.00028234, [7] [b_1]: 0.00018901 [b_2]: 1.064e-05 [updatestate_depend_eliminate]: 7.35e-06 [updatestate_assign_eliminate]: 4.03999e-06 [updatestate_loads_eliminate]: 4.01001e-06 [renormalize]: 5.3001e-07 [cse]: 3.194e-05 [optimize_parallel_all_gather_comm]: 2.173e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.021e-05 [loop_unroll]: 0.00042056 [opt_after_cconv]: 0.00013587, [1] [Cycle 1]: 0.00012991, [7] [c_1]: 4.825e-05 [parameter_eliminate]: 2.40002e-06 [updatestate_depend_eliminate]: 7.13998e-06 [updatestate_assign_eliminate]: 4.27e-06 
[updatestate_loads_eliminate]: 3.90998e-06 [cse]: 2.972e-05 [renormalize]: 5.8001e-07 [remove_dup_value]: 2.917e-05 [tuple_transform]: 0.00010059, [1] [Cycle 1]: 9.617e-05, [4] [d_1]: 6.677e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.67999e-06 [partial_unused_args_eliminate]: 1.83002e-06 [add_recomputation]: 6.024e-05 [cse_after_recomputation]: 3.215e-05, [1] [Cycle 1]: 2.768e-05, [1] [cse]: 2.238e-05 [environ_conv]: 8.92e-06 [swap_dp_allreduce_reducescatter]: 7.92e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.60001e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.60001e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.09003e-06 [add_comm_op_reuse_tag]: 1.34998e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.734e-05 [grouped_pairwise_exchange_alltoall]: 1.87001e-06 [offloading_packed_experts]: 5.29e-06 [overlap_recompute_and_grad_model_parallel]: 5.72001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 5.25001e-06 [overlap_grad_flash_sp]: 2.46e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.69001e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 9.771e-05, [1] [Cycle 1]: 9.359e-05, [6] [build]: 1.014e-05 [elim_shapecalc]: 1.329e-05 [elim_not_effective]: 1.76e-05 [opt_reshape]: 9.84999e-06 [fold_const_symbol]: 1.508e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.61998e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 2.437e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00046504 [validate]: 4.545e-05 [backend_pass]: 1.01997e-06 [task_emit]: 0.0825803 [execute]: 8.63001e-06 Sums bootstrap : 0.000492s : 0.46% type_inference : 0.011637s : 10.79% event_method : 0.000050s : 0.05% auto_monad : 0.000122s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.04% optimize.rewriter_before_opt_a : 0.000145s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.12% optimize.opt_a.loop_unroll : 0.000114s : 0.11% optimize.opt_a.a_1 : 0.003222s : 2.99% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000526s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001487s : 1.38% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.08% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.003106s : 2.88% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000242s : 0.22% optimize.opt_a.a_3 : 0.000458s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000459s : 0.43% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000421s : 0.39% optimize.opt_after_cconv.c_1 : 0.000048s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000060s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000465s : 0.43% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.082580s : 76.59% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000796 222 5.63% : 0.000045s : 12: substitution.arithmetic_simplify 1.75% : 0.000014s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000003s : 5: substitution.fold_const_symbol 0.96% : 0.000008s : 8: substitution.graph_param_transform 4.34% : 0.000035s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 53.39% : 0.000425s : 17: substitution.inline 1.94% : 0.000015s : 2: substitution.inline_without_move 1.28% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000016s : 3: substitution.less_batch_normalization 1.66% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.76% : 0.000014s : 20: substitution.remove_not_recompute_node 3.06% : 0.000024s : 10: substitution.replace_applicator 1.31% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.48% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.25% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.27% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.30% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011559 2 86.27% : 0.009973s : 1: type_inference.infer 13.73% : 0.001587s : 1: type_inference.specialize ------[replace.] 0.000221 33 57.51% : 0.000127s : 17: replace.inline 42.49% : 0.000094s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.41% : 0.000417s : 17: match.inline 7.59% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.24% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.08% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001665 34 56.26% : 0.000937s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.74% : 0.000728s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.136762 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.19% : 0.002994s : 1: add_attr 2.18% : 0.002985s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000064s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000129s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.38% : 0.000525s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000058s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000009s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.59% : 0.004916s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.11% : 0.011094s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.35% : 0.000475s : 1: opt_after_jit_grad 0.21% : 0.000292s : 1: opt_b 9.77% : 0.013367s : 1: optimize 0.02% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000005s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000006s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.19% : 0.001621s : 2: renormalize.infer 1.08% : 0.001471s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.11% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000100s : 1: symbol_engine_optimizer 60.39% : 0.082597s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 8.52% : 0.011654s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.0727685, [24] [bootstrap]: 0.0004136 [type_inference]: 0.00422458 [event_method]: 1.069e-05 [auto_monad]: 5.303e-05 [graph_reusing]: 4.62998e-06 [inline]: 1.78002e-06 [add_attr]: 0.0029068, [1] [add_attr_with_inline]: 0.00289917, [1] [Cycle 1]: 4.006e-05, [2] [tag_attr]: 1.197e-05 [meta_addattr_fg_expand]: 3.27002e-06 [parallel-infer-symbol]: 2.83998e-06 [pre_auto_parallel]: 2.075e-05 [insert-virtual-dataset]: 2.19999e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.25001e-06 [optimize]: 0.00370748, [53] [py_interpret_to_execute]: 1.485e-05 [rewriter_before_opt_a]: 3.801e-05 [opt_a]: 0.00185845, [2] [Cycle 1]: 0.00125589, [45] [expand_dump_flag]: 2.95002e-06 [switch_simplify]: 2.535e-05 [loop_unroll]: 1.377e-05 [a_1]: 0.00028784 [with_stream_mark]: 1.315e-05 [recompute_prepare]: 7.34002e-06 [updatestate_depend_eliminate]: 3.55003e-06 [updatestate_assign_eliminate]: 3.59002e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 2.14999e-06 [a_2]: 7.597e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.28998e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 6.66e-06 [auto_parallel]: 5.60001e-06 [parallel]: 1.718e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.94998e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.47001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.173e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 1.007e-05 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 1.91e-06 [receive_attached]: 
2.04e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 8.69e-06 [renormalize]: 0.00034605 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.412e-05 [cse]: 2.783e-05 [a_3]: 4.07e-05 [Cycle 2]: 0.00059322, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.30001e-06 [a_1]: 0.00012524 [with_stream_mark]: 9.99001e-06 [recompute_prepare]: 5.49998e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.735e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.53999e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 5.19998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.75002e-06 [virtual_dataset]: 5.49e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 5.21002e-06 [merge_forward]: 2.81e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54999e-06 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 2.87002e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 8.82999e-06 [a_after_grad]: 8.17e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.31e-06 [cse]: 1.337e-05 [a_3]: 3.182e-05 [py_interpret_to_execute_after_opt_a]: 7.67002e-06 [slice_cell_reuse_recomputed_activation]: 2.21998e-06 [rewriter_after_opt_a]: 3.081e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.51e-06 [mutable_eliminate]: 0.00044177 [opt_b]: 0.00018069, [1] [Cycle 1]: 0.00017458, [7] [b_1]: 
0.00010751 [b_2]: 6.89999e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.60015e-07 [cse]: 1.655e-05 [optimize_parallel_all_gather_comm]: 1.621e-05 [overlap_param_gather]: 2.13002e-06 [cconv]: 2.375e-05 [loop_unroll]: 0.00045419 [opt_after_cconv]: 9.625e-05, [1] [Cycle 1]: 9.07e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 2.37999e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.638e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.171e-05 [tuple_transform]: 6.885e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.913e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.348e-05 [cse_after_recomputation]: 2.057e-05, [1] [Cycle 1]: 1.615e-05, [1] [cse]: 1.093e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 4.74e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.41998e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 1.91998e-06 [reorder_send_recv_between_fp_bp]: 2.99001e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 1.22999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.166e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.64001e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.714e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.79001e-06 [split_layernorm_comm]: 1.49e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 6.826e-05, [1] [Cycle 1]: 6.423e-05, [6] [build]: 2.58003e-06 [elim_shapecalc]: 8.39998e-06 [elim_not_effective]: 1.142e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.48999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.506e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00044089 [validate]: 3.237e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.0607213 [execute]: 8.38999e-06 Sums bootstrap : 0.000414s : 0.60% type_inference : 0.004225s : 6.13% event_method : 0.000011s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000413s : 0.60% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000011s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000346s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000442s : 0.64% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000454s : 0.66% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000003s : 0.00% optimize.split_layernorm_comm : 0.000001s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000441s : 0.64% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060721s : 88.12% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000118 26 19.45% : 0.000023s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.38% : 0.000005s : 4: substitution.graph_param_transform 64.50% : 0.000076s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.85% : 0.000005s : 4: substitution.remove_not_recompute_node 2.95% : 0.000003s : 4: substitution.replace_old_param ------[type_inference.] 0.004185 2 91.64% : 0.003835s : 1: type_inference.infer 8.36% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000017 2 100.00% : 0.000017s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000138 984 1.11% : 0.000002s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.15% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.17% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000251 6 43.38% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.62% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080651 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.61% : 0.002911s : 1: add_attr 3.60% : 0.002903s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.55% : 0.000441s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.57% : 0.000464s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000452s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.95% : 0.000766s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.31% : 0.001861s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.56% : 0.000451s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.60% : 0.003711s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.23% : 0.000188s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 75.31% : 0.060737s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.26% : 0.004239s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.114241, [24] [bootstrap]: 0.00049778 [type_inference]: 0.010451 [event_method]: 4.402e-05 [auto_monad]: 0.00013174 [graph_reusing]: 7.6e-06 [inline]: 1.86e-06 [add_attr]: 0.00297622, [1] [add_attr_with_inline]: 0.00296797, [1] [Cycle 1]: 6.664e-05, [2] [tag_attr]: 3.157e-05 [meta_addattr_fg_expand]: 8.52e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 4.606e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.013193, [53] [py_interpret_to_execute]: 3.509e-05 [rewriter_before_opt_a]: 0.00012749 [opt_a]: 0.0108787, [3] [Cycle 1]: 0.00699868, [45] [expand_dump_flag]: 3.58e-06 [switch_simplify]: 6.624e-05 [loop_unroll]: 5.436e-05 [a_1]: 0.00138294 [with_stream_mark]: 2.39e-05 [recompute_prepare]: 2.188e-05 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 8.23001e-06 [updatestate_loads_eliminate]: 7.5e-06 [parameter_eliminate]: 2.91e-06 [a_2]: 0.00024188 [accelerated_algorithm]: 3.107e-05 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 3.27002e-06 [shard_inline]: 1.57e-05 [merge_send_recv]: 1.627e-05 [auto_parallel]: 1.057e-05 [parallel]: 1.787e-05 [flash_sp]: 1.102e-05 [merge_comm]: 9.64e-06 [allreduce_fusion]: 8.70999e-06 [matmul_add_comm_reduction]: 2.677e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.733e-05 [virtual_dataset]: 1.556e-05 [get_grad_eliminate_]: 1.485e-05 [virtual_output]: 1.526e-05 [merge_forward]: 9.22999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.81e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.813e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.726e-05 [set_forward_comm_id_for_comm_node_pass]: 9.53997e-06 [meta_fg_expand]: 0.00139436 [flash_sp_send_recv_attached]: 3.97998e-06 [receive_attached]: 2.32999e-06 [after_resolve]: 5.984e-05 
[a_after_grad]: 8.056e-05 [renormalize]: 0.0024724 [add_forward_monad_depend]: 9.46e-06 [auto_monad_grad]: 5.57001e-06 [auto_monad_eliminator]: 5.637e-05 [cse]: 0.00017018 [a_3]: 0.00033182 [Cycle 2]: 0.0029651, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.67e-05 [loop_unroll]: 4.363e-05 [a_1]: 0.00153769 [with_stream_mark]: 1.208e-05 [recompute_prepare]: 1.101e-05 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.73001e-06 [parameter_eliminate]: 1.20999e-06 [a_2]: 0.00012556 [accelerated_algorithm]: 1.187e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 6.66999e-06 [auto_parallel]: 7.35e-06 [parallel]: 4.68999e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 4.92e-06 [allreduce_fusion]: 4.67e-06 [matmul_add_comm_reduction]: 7.48999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 9.71e-06 [virtual_dataset]: 8.82999e-06 [get_grad_eliminate_]: 8.64998e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.619e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.421e-05 [set_forward_comm_id_for_comm_node_pass]: 5.00001e-06 [meta_fg_expand]: 3.511e-05 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.478e-05 [a_after_grad]: 1.412e-05 [renormalize]: 0.00059022 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.455e-05 [cse]: 4.757e-05 [a_3]: 6.549e-05 [Cycle 3]: 0.00090052, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 1.088e-05 [loop_unroll]: 8.96002e-06 [a_1]: 0.000248 [with_stream_mark]: 9.92999e-06 [recompute_prepare]: 9.34e-06 [updatestate_depend_eliminate]: 4.70999e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 4.03999e-06 [parameter_eliminate]: 8.79983e-07 
[a_2]: 0.00012355 [accelerated_algorithm]: 1.164e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 7.03998e-06 [auto_parallel]: 6.94001e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 5.24998e-06 [allreduce_fusion]: 4.99e-06 [matmul_add_comm_reduction]: 7.66999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.029e-05 [virtual_dataset]: 8.48999e-06 [get_grad_eliminate_]: 8.30999e-06 [virtual_output]: 8.24002e-06 [merge_forward]: 4.35999e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 8.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.582e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19998e-06 [meta_fg_expand]: 2.90998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.501e-05 [a_after_grad]: 1.455e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 1.082e-05 [cse]: 2.729e-05 [a_3]: 5.935e-05 [py_interpret_to_execute_after_opt_a]: 1.008e-05 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 4.832e-05 [convert_after_rewriter]: 9.21998e-06 [order_py_execute_after_rewriter]: 6.87002e-06 [mutable_eliminate]: 0.00045722 [opt_b]: 0.00029407, [1] [Cycle 1]: 0.00028838, [7] [b_1]: 0.00019496 [b_2]: 1.063e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 3.95e-06 [renormalize]: 2.60014e-07 [cse]: 3.243e-05 [optimize_parallel_all_gather_comm]: 2.134e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.176e-05 [loop_unroll]: 0.00042167 [opt_after_cconv]: 0.00013616, [1] [Cycle 1]: 0.00013064, [7] [c_1]: 4.863e-05 [parameter_eliminate]: 2.50002e-06 [updatestate_depend_eliminate]: 6.96001e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.91999e-06 [cse]: 
3.069e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 2.856e-05 [tuple_transform]: 0.00010198, [1] [Cycle 1]: 9.733e-05, [4] [d_1]: 6.728e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.86998e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 5.854e-05 [cse_after_recomputation]: 9.402e-05, [1] [Cycle 1]: 8.911e-05, [1] [cse]: 8.27e-05 [environ_conv]: 9.74e-06 [swap_dp_allreduce_reducescatter]: 8.35999e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.53999e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 7.89994e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.11002e-06 [overlap_opt_shard_in_pipeline]: 1.19998e-06 [overlap_opt_shard_grad_in_pipeline]: 2.01e-06 [control_data_broadcast_order]: 1.737e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 5.57999e-06 [overlap_recompute_and_grad_model_parallel]: 5.49e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 5.17999e-06 [overlap_grad_flash_sp]: 2.434e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 2.06998e-06 [handle_group_info]: 1.41998e-06 [symbol_engine_optimizer]: 9.953e-05, [1] [Cycle 1]: 9.521e-05, [6] [build]: 1.027e-05 [elim_shapecalc]: 1.355e-05 [elim_not_effective]: 1.808e-05 [opt_reshape]: 1.015e-05 [fold_const_symbol]: 1.483e-05 [renormalize]: 2.29978e-07 [detach_backward]: 1.66e-06 
[pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 2.547e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.65998e-06 [opt_after_jit_grad]: 0.0004667 [validate]: 4.637e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.0861228 [execute]: 8.65999e-06 Sums bootstrap : 0.000498s : 0.45% type_inference : 0.010451s : 9.50% event_method : 0.000044s : 0.04% auto_monad : 0.000132s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.11% optimize.opt_a.loop_unroll : 0.000107s : 0.10% optimize.opt_a.a_1 : 0.003169s : 2.88% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000491s : 0.45% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001432s : 1.30% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.08% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.003063s : 2.78% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.07% optimize.opt_a.cse : 0.000245s : 0.22% optimize.opt_a.a_3 : 0.000457s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.42% optimize.opt_b.b_1 : 0.000195s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000422s : 0.38% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.05% optimize.cse_after_recomputation.cse : 0.000083s : 0.08% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000006s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000467s : 0.42% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.086123s : 78.28% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000788 218 5.61% : 0.000044s : 11: substitution.arithmetic_simplify 1.78% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.31% : 0.000002s : 2: substitution.incorporate_call_switch 50.90% : 0.000401s : 16: substitution.inline 1.99% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000015s : 3: substitution.less_batch_normalization 1.69% : 0.000013s : 11: substitution.minmaximum_grad 7.48% : 0.000059s : 5: substitution.partial_eliminate 1.69% : 0.000013s : 20: substitution.remove_not_recompute_node 3.05% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.49% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.73% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.28% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.81% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010382 2 86.67% : 0.008999s : 1: type_inference.infer 13.33% : 0.001384s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.50% : 0.000118s : 16: replace.inline 41.50% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.78% : 0.000393s : 16: match.inline 7.22% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.32% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.08% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.66% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.43% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.66% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001572 32 58.60% : 0.000921s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.40% : 0.000651s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.138606 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.15% : 0.002981s : 1: add_attr 2.14% : 0.002972s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000140s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.38% : 0.000527s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.07% : 0.000097s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.47% : 0.004807s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.05% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 7.85% : 0.010882s : 1: opt_a 0.10% : 0.000140s : 1: opt_after_cconv 0.34% : 0.000477s : 1: opt_after_jit_grad 0.21% : 0.000298s : 1: opt_b 9.52% : 0.013197s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000051s : 1: pre_auto_parallel 0.03% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.18% : 0.001637s : 2: renormalize.infer 1.02% : 0.001413s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000102s : 1: symbol_engine_optimizer 62.15% : 0.086138s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.55% : 0.010466s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x0-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x1-pynative],max_mem:56.0M TotalTime = 0.0214974, [24] [bootstrap]: 0.00051861 [type_inference]: 0.00617428 [event_method]: 1.391e-05 [auto_monad]: 5.43e-05 [graph_reusing]: 5.77001e-06 [inline]: 1.86e-06 [add_attr]: 0.00342612, [1] [add_attr_with_inline]: 0.00341472, [1] [Cycle 1]: 4.423e-05, [2] [tag_attr]: 1.544e-05 [meta_addattr_fg_expand]: 4.08001e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.67e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.00397756, [53] [py_interpret_to_execute]: 2.031e-05 [rewriter_before_opt_a]: 5.961e-05 [opt_a]: 0.00212685, [2] [Cycle 1]: 0.00152193, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 3.183e-05 [loop_unroll]: 2.083e-05 [a_1]: 0.00044889 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.83999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.77e-05 [accelerated_algorithm]: 6.49001e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 7.81001e-06 [auto_parallel]: 5.99e-06 [parallel]: 2.133e-05 [flash_sp]: 6.63e-06 [merge_comm]: 3.55998e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.06998e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 5.89999e-06 
[get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 9.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.46e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.46998e-06 [flash_sp_send_recv_attached]: 2.26998e-06 [receive_attached]: 2.11e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.00042553 [add_forward_monad_depend]: 4.71002e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 2.876e-05 [a_3]: 4.035e-05 [Cycle 2]: 0.00059552, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.68002e-06 [a_1]: 0.0001249 [with_stream_mark]: 9.45001e-06 [recompute_prepare]: 5.75001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.40002e-06 [parameter_eliminate]: 8.19971e-07 [a_2]: 6.799e-05 [accelerated_algorithm]: 5.74e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.38001e-06 [flash_sp]: 3.29001e-06 [merge_comm]: 3.35e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.17001e-06 [virtual_dataset]: 5.67001e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 4.99998e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.52999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.12001e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.73e-06 [cse]: 1.425e-05 [a_3]: 3.247e-05 [py_interpret_to_execute_after_opt_a]: 7.98001e-06 [slice_cell_reuse_recomputed_activation]: 1.85001e-06 [rewriter_after_opt_a]: 2.816e-05 [convert_after_rewriter]: 7.01001e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00044605 [opt_b]: 0.00018037, [1] [Cycle 1]: 0.00017419, [7] [b_1]: 0.0001063 [b_2]: 6.85998e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.69998e-07 [cse]: 1.69e-05 [optimize_parallel_all_gather_comm]: 1.544e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 3.404e-05 [loop_unroll]: 0.00041776 [opt_after_cconv]: 9.683e-05, [1] [Cycle 1]: 9.11e-05, [7] [c_1]: 2.778e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.46e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.712e-05 [renormalize]: 4.20026e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 7.015e-05, [1] [Cycle 1]: 6.564e-05, [4] [d_1]: 3.981e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.41998e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.989e-05 [cse_after_recomputation]: 2.206e-05, [1] [Cycle 1]: 1.749e-05, [1] [cse]: 1.196e-05 [environ_conv]: 4.56002e-06 [swap_dp_allreduce_reducescatter]: 4.94e-06 [bias_add_comm_swap]: 3.06001e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.12e-06 [slice_recompute_activation]: 2.48e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.24998e-06 [full_micro_interleaved_order_control]: 2.24001e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 
8.89995e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76003e-06 [control_data_broadcast_order]: 1.11e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.52002e-06 [overlap_recompute_and_grad_model_parallel]: 4.31002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.48002e-06 [overlap_recompute_comm]: 1.89999e-06 [overlap_grad_ring_attention]: 3.7e-06 [overlap_grad_flash_sp]: 1.681e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.846e-05, [1] [Cycle 1]: 6.438e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.43999e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 9.17001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.518e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 0.00013396 [opt_after_jit_grad]: 0.00044979 [validate]: 3.09e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00644231 [execute]: 7.18998e-06 Sums bootstrap : 0.000519s : 3.03% type_inference : 0.006174s : 36.11% event_method : 0.000014s : 0.08% auto_monad : 0.000054s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000574s : 3.36% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000146s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000426s : 2.49% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.25% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 2.61% optimize.opt_b.b_1 : 0.000106s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000034s : 0.20% optimize.loop_unroll : 0.000418s : 2.44% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000134s : 0.78% opt_after_jit_grad : 0.000450s : 2.63% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006442s : 37.67% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.77% : 0.000024s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000002s : 2: substitution.fold_const_symbol 3.45% : 0.000006s : 4: substitution.graph_param_transform 65.98% : 0.000109s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.87% : 0.000005s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.74% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006130 2 90.29% : 0.005535s : 1: type_inference.infer 9.71% : 0.000595s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.63% : 0.000026s : 3: replace.inline 30.37% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.32% : 0.000107s : 3: match.inline 8.68% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.78% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.41% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.04% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.91% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000009s : 51: predicate.inline 0.95% : 0.000002s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.93% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.96% : 0.000002s : 11: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.06% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 47.52% : 0.000183s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.48% : 0.000203s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030421 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.28% : 0.003431s : 1: add_attr 11.24% : 0.003419s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000059s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.83% : 0.000557s : 1: bootstrap 0.13% : 0.000038s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000088s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.00% : 0.002130s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.51% : 0.000460s : 1: opt_after_jit_grad 0.60% : 0.000184s : 1: opt_b 13.09% : 0.003981s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000215s : 1: renormalize.infer 0.67% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.46% : 0.000140s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.21% : 0.006452s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.34% : 0.006188s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.018162, [24] [bootstrap]: 0.0004725 [type_inference]: 0.00438403 [event_method]: 1.107e-05 [auto_monad]: 5.025e-05 [graph_reusing]: 4.89998e-06 [inline]: 1.71e-06 [add_attr]: 0.00298296, [1] [add_attr_with_inline]: 0.00297489, [1] [Cycle 1]: 4.39e-05, [2] [tag_attr]: 1.128e-05 [meta_addattr_fg_expand]: 3.14001e-06 [parallel-infer-symbol]: 3.03998e-06 [pre_auto_parallel]: 2.059e-05 [insert-virtual-dataset]: 2.21998e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00362986, [53] [py_interpret_to_execute]: 1.575e-05 [rewriter_before_opt_a]: 3.87e-05 [opt_a]: 0.00183657, [2] [Cycle 1]: 0.00124002, [45] [expand_dump_flag]: 2.29001e-06 [switch_simplify]: 2.403e-05 [loop_unroll]: 1.347e-05 [a_1]: 0.00029036 [with_stream_mark]: 1.263e-05 [recompute_prepare]: 7.29001e-06 [updatestate_depend_eliminate]: 3.47997e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.61998e-06 [a_2]: 7.642e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.46998e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 5.59e-06 [parallel]: 1.702e-05 [flash_sp]: 6.93998e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.57001e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 7.44002e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.55001e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.20001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23998e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.049e-05 [a_after_grad]: 8.64e-06 [renormalize]: 0.00033663 [add_forward_monad_depend]: 4.08999e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.262e-05 [cse]: 2.721e-05 [a_3]: 4.002e-05 [Cycle 2]: 0.00058733, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.53e-06 [loop_unroll]: 5.34998e-06 [a_1]: 0.00012287 [with_stream_mark]: 8.96002e-06 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.811e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.12003e-06 [flash_sp]: 3.6e-06 [merge_comm]: 2.93003e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.37001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.10001e-06 [get_grad_eliminate_]: 4.91002e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.21002e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 7.2e-07 [auto_monad_eliminator]: 6.38003e-06 [cse]: 1.235e-05 [a_3]: 3.204e-05 [py_interpret_to_execute_after_opt_a]: 7.50998e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.039e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 4.72e-06 [mutable_eliminate]: 0.00044858 [opt_b]: 0.00017818, [1] [Cycle 1]: 0.00017244, [7] [b_1]: 0.00010665 [b_2]: 
6.81999e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.50003e-07 [cse]: 1.562e-05 [optimize_parallel_all_gather_comm]: 1.494e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 2.149e-05 [loop_unroll]: 0.00041625 [opt_after_cconv]: 9.448e-05, [1] [Cycle 1]: 8.862e-05, [7] [c_1]: 2.743e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 5.61003e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.528e-05 [renormalize]: 5.90022e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 6.79e-05, [1] [Cycle 1]: 6.377e-05, [4] [d_1]: 3.841e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.218e-05 [cse_after_recomputation]: 1.965e-05, [1] [Cycle 1]: 1.525e-05, [1] [cse]: 1.006e-05 [environ_conv]: 4.27998e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.28998e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.50002e-06 [merge_cast_opt]: 1.24003e-06 [slice_recompute_activation]: 2.27001e-06 [micro_interleaved_order_control]: 2.37001e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49e-06 [control_data_broadcast_order]: 1.11e-05 [grouped_pairwise_exchange_alltoall]: 1.81998e-06 [offloading_packed_experts]: 3.44001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 4.09002e-06 [overlap_grad_flash_sp]: 1.686e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.42001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 6.732e-05, [1] [Cycle 1]: 6.324e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.1e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 5.95002e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.526e-05 [get_jit_bprop_graph]: 1.31998e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00045051 [validate]: 3.032e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00585414 [execute]: 7.53e-06 Sums bootstrap : 0.000473s : 3.33% type_inference : 0.004384s : 30.89% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000413s : 2.91% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.51% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 3.16% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000416s : 2.93% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000451s : 3.17% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005854s : 41.25% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.51% : 0.000022s : 4: substitution.arithmetic_simplify 1.72% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.58% : 0.000006s : 4: substitution.graph_param_transform 64.92% : 0.000078s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000004s : 4: substitution.remove_not_recompute_node 3.29% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004343 2 92.09% : 0.004000s : 1: type_inference.infer 7.91% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.11% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.37% : 0.000003s : 17: predicate.arithmetic_simplify 0.91% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.44% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.31% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.26% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.45% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 1.16% : 0.000002s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.05% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.58% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000267 6 49.21% : 0.000131s : 2: func_graph_cloner_run.FuncGraphClonerGraph 50.79% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025999 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.49% : 0.002987s : 1: add_attr 11.46% : 0.002978s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000509s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000425s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001840s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.77% : 0.000460s : 1: opt_after_jit_grad 0.70% : 0.000181s : 1: opt_b 13.98% : 0.003634s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000184s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.55% : 0.005863s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.92% : 0.004398s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0195739, [24] [bootstrap]: 0.00046986 [type_inference]: 0.0055444 [event_method]: 1.378e-05 [auto_monad]: 5.35e-05 [graph_reusing]: 5.54998e-06 [inline]: 1.75001e-06 [add_attr]: 0.00296735, [1] [add_attr_with_inline]: 0.00295899, [1] [Cycle 1]: 4.433e-05, [2] [tag_attr]: 1.536e-05 [meta_addattr_fg_expand]: 3.71999e-06 [parallel-infer-symbol]: 2.55002e-06 [pre_auto_parallel]: 2.458e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.06998e-06 [pipeline_split]: 1.64998e-06 [optimize]: 0.00391909, [53] [py_interpret_to_execute]: 1.965e-05 [rewriter_before_opt_a]: 5.678e-05 [opt_a]: 0.00210946, [2] [Cycle 1]: 0.00148036, [45] [expand_dump_flag]: 2.79001e-06 [switch_simplify]: 3.259e-05 [loop_unroll]: 2.1e-05 [a_1]: 0.00044503 [with_stream_mark]: 1.327e-05 [recompute_prepare]: 8.23001e-06 [updatestate_depend_eliminate]: 3.68999e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.514e-05 [accelerated_algorithm]: 6.38003e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 5.96e-06 [parallel]: 1.664e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 8.22e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.74999e-06 [merge_forward]: 3.56999e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 8.89998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.33002e-06 [receive_attached]: 2.48002e-06 
[after_resolve]: 9.97999e-06 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00040211 [add_forward_monad_depend]: 4.77998e-06 [auto_monad_grad]: 1.93997e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.691e-05 [a_3]: 3.998e-05 [Cycle 2]: 0.00061983, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.19e-06 [a_1]: 0.00012382 [with_stream_mark]: 9.71998e-06 [recompute_prepare]: 6.09999e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.716e-05 [accelerated_algorithm]: 5.39998e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.11002e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.01001e-06 [flash_sp]: 3.53999e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.25001e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 4.92999e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.1e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.96002e-06 [a_after_grad]: 8.05999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.19001e-06 [cse]: 1.542e-05 [a_3]: 3.117e-05 [py_interpret_to_execute_after_opt_a]: 8.19002e-06 [slice_cell_reuse_recomputed_activation]: 2.19999e-06 [rewriter_after_opt_a]: 2.994e-05 [convert_after_rewriter]: 6.51999e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00044388 [opt_b]: 0.00018214, [1] [Cycle 
1]: 0.00017609, [7] [b_1]: 0.00010903 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 5.50004e-07 [cse]: 1.507e-05 [optimize_parallel_all_gather_comm]: 1.511e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.064e-05 [loop_unroll]: 0.00041294 [opt_after_cconv]: 9.248e-05, [1] [Cycle 1]: 8.707e-05, [7] [c_1]: 2.714e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.464e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.19e-05 [tuple_transform]: 6.825e-05, [1] [Cycle 1]: 6.371e-05, [4] [d_1]: 3.828e-05 [none_parameter_eliminate]: 1.56002e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.13002e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.218e-05 [cse_after_recomputation]: 1.925e-05, [1] [Cycle 1]: 1.489e-05, [1] [cse]: 9.84001e-06 [environ_conv]: 4.39998e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.42998e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.55002e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.35999e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.04998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.47001e-06 [control_data_broadcast_order]: 1.155e-05 [grouped_pairwise_exchange_alltoall]: 1.60999e-06 [offloading_packed_experts]: 3.49001e-06 [overlap_recompute_and_grad_model_parallel]: 4.25e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.49e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 3.93999e-06 [overlap_grad_flash_sp]: 1.698e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.28998e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.801e-05, [1] [Cycle 1]: 6.387e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.12e-05 [opt_reshape]: 6.15002e-06 [fold_const_symbol]: 8.95999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.468e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.25998e-06 [opt_after_jit_grad]: 0.00044938 [validate]: 3.072e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00586165 [execute]: 7.2e-06 Sums bootstrap : 0.000470s : 3.01% type_inference : 0.005544s : 35.48% event_method : 0.000014s : 0.09% auto_monad : 0.000053s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000057s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000569s : 3.64% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000402s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000444s : 2.84% optimize.opt_b.b_1 : 0.000109s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000015s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.13% optimize.loop_unroll : 0.000413s : 2.64% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.88% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005862s : 37.51% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000163 30 14.98% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.91% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 66.30% : 0.000108s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.29% : 0.000004s : 4: substitution.replace_old_param 6.82% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005504 2 90.13% : 0.004961s : 1: type_inference.infer 9.87% : 0.000543s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.53% : 0.000027s : 3: replace.inline 30.47% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.32% : 0.000106s : 3: match.inline 8.68% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.57% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.99% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.72% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000009s : 51: predicate.inline 0.78% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.93% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.79% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.22% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000335 8 45.58% : 0.000152s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.42% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027954 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.63% : 0.002972s : 1: add_attr 10.60% : 0.002963s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.81% : 0.000505s : 1: bootstrap 0.09% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.62% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.000933s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000092s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.56% : 0.002112s : 1: opt_a 0.34% : 0.000096s : 1: opt_after_cconv 1.64% : 0.000459s : 1: opt_after_jit_grad 0.66% : 0.000186s : 1: opt_b 14.03% : 0.003923s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.74% : 0.000207s : 1: renormalize.infer 0.67% : 0.000189s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.00% : 0.005871s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.88% : 0.005558s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0372261, [24] [bootstrap]: 0.00051126 [type_inference]: 0.0113878 [event_method]: 4.589e-05 [auto_monad]: 0.00011963 [graph_reusing]: 8e-06 [inline]: 2.02001e-06 [add_attr]: 0.00301799, [1] [add_attr_with_inline]: 0.00300979, [1] [Cycle 1]: 7e-05, [2] [tag_attr]: 3.472e-05 [meta_addattr_fg_expand]: 9.25001e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 4.832e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.91998e-06 [optimize]: 0.0132378, [53] [py_interpret_to_execute]: 3.986e-05 [rewriter_before_opt_a]: 0.0001437 [opt_a]: 0.0109813, [3] [Cycle 1]: 0.0070384, [45] [expand_dump_flag]: 4.27e-06 [switch_simplify]: 7.337e-05 [loop_unroll]: 6.161e-05 [a_1]: 0.00144111 [with_stream_mark]: 2.311e-05 [recompute_prepare]: 2.165e-05 [updatestate_depend_eliminate]: 8.81002e-06 [updatestate_assign_eliminate]: 8.22e-06 [updatestate_loads_eliminate]: 7.46001e-06 [parameter_eliminate]: 2.44999e-06 [a_2]: 0.00026045 [accelerated_algorithm]: 3.116e-05 [shard]: 2.06998e-06 [meta_shard_fg_expand]: 3.26001e-06 [shard_inline]: 1.635e-05 [merge_send_recv]: 1.706e-05 [auto_parallel]: 1.068e-05 [parallel]: 1.763e-05 [flash_sp]: 1.078e-05 [merge_comm]: 9.94999e-06 [allreduce_fusion]: 8.77999e-06 [matmul_add_comm_reduction]: 2.693e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.556e-05 [get_grad_eliminate_]: 1.5e-05 [virtual_output]: 1.501e-05 [merge_forward]: 9.04998e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 1.769e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.879e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 2.755e-05 [set_forward_comm_id_for_comm_node_pass]: 9.45001e-06 [meta_fg_expand]: 0.001394 [flash_sp_send_recv_attached]: 3.71999e-06 [receive_attached]: 2.66e-06 [after_resolve]: 
5.94e-05 [a_after_grad]: 8.076e-05 [renormalize]: 0.00243276 [add_forward_monad_depend]: 8.83001e-06 [auto_monad_grad]: 5.09998e-06 [auto_monad_eliminator]: 5.555e-05 [cse]: 0.0001608 [a_3]: 0.00033504 [Cycle 2]: 0.0030292, [45] [expand_dump_flag]: 1.59e-06 [switch_simplify]: 4.691e-05 [loop_unroll]: 4.351e-05 [a_1]: 0.00151931 [with_stream_mark]: 1.15e-05 [recompute_prepare]: 1.105e-05 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 4.28001e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 0.00012633 [accelerated_algorithm]: 1.23e-05 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 9.29998e-06 [merge_send_recv]: 6.63e-06 [auto_parallel]: 7.55998e-06 [parallel]: 4.65999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 5.13002e-06 [allreduce_fusion]: 4.58999e-06 [matmul_add_comm_reduction]: 7.65998e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.81002e-06 [virtual_output]: 8.3e-06 [merge_forward]: 4.20999e-06 [cell_reuse_recompute_pass]: 9.50007e-07 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.761e-05 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27001e-06 [meta_fg_expand]: 7.213e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.20001e-06 [after_resolve]: 1.643e-05 [a_after_grad]: 1.466e-05 [renormalize]: 0.0006325 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.436e-05 [cse]: 4.48e-05 [a_3]: 6.519e-05 [Cycle 3]: 0.00089926, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 1.089e-05 [loop_unroll]: 8.99998e-06 [a_1]: 0.00025009 [with_stream_mark]: 9.46e-06 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.65001e-06 [updatestate_assign_eliminate]: 4.09002e-06 [updatestate_loads_eliminate]: 3.88001e-06 
[parameter_eliminate]: 8.50006e-07 [a_2]: 0.0001233 [accelerated_algorithm]: 1.173e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 9.29e-06 [merge_send_recv]: 6.69001e-06 [auto_parallel]: 7.23e-06 [parallel]: 4.37e-06 [flash_sp]: 1.28002e-06 [merge_comm]: 4.97e-06 [allreduce_fusion]: 4.72e-06 [matmul_add_comm_reduction]: 7.91001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 8.26002e-06 [merge_forward]: 4.03001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 8.27e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.586e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.485e-05 [set_forward_comm_id_for_comm_node_pass]: 5.77001e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.79983e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.504e-05 [a_after_grad]: 1.399e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.14003e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 1.051e-05 [cse]: 2.542e-05 [a_3]: 5.933e-05 [py_interpret_to_execute_after_opt_a]: 1.002e-05 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 4.666e-05 [convert_after_rewriter]: 8.87e-06 [order_py_execute_after_rewriter]: 7.18e-06 [mutable_eliminate]: 0.00046013 [opt_b]: 0.00028521, [1] [Cycle 1]: 0.00027909, [7] [b_1]: 0.00018847 [b_2]: 1.072e-05 [updatestate_depend_eliminate]: 6.97002e-06 [updatestate_assign_eliminate]: 4.17998e-06 [updatestate_loads_eliminate]: 4.03999e-06 [renormalize]: 3.80009e-07 [cse]: 3.031e-05 [optimize_parallel_all_gather_comm]: 2.02e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.005e-05 [loop_unroll]: 0.00042526 [opt_after_cconv]: 0.00013578, [1] [Cycle 1]: 0.0001294, [7] [c_1]: 4.864e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 7.12002e-06 [updatestate_assign_eliminate]: 4.19002e-06 
[updatestate_loads_eliminate]: 3.82998e-06 [cse]: 2.922e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.808e-05 [tuple_transform]: 0.00010107, [1] [Cycle 1]: 9.65e-05, [4] [d_1]: 6.637e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.84999e-06 [partial_unused_args_eliminate]: 1.87001e-06 [add_recomputation]: 5.485e-05 [cse_after_recomputation]: 3.179e-05, [1] [Cycle 1]: 2.69e-05, [1] [cse]: 2.127e-05 [environ_conv]: 9.17999e-06 [swap_dp_allreduce_reducescatter]: 8.02e-06 [bias_add_comm_swap]: 2.65997e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.43998e-06 [micro_interleaved_order_control]: 2.07001e-06 [assign_add_opt]: 1.26002e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 7.39994e-07 [full_micro_interleaved_order_control]: 2.41998e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.21002e-06 [overlap_opt_shard_in_pipeline]: 1.65001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.732e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 4.92999e-06 [overlap_recompute_and_grad_model_parallel]: 5.64e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.66002e-06 [overlap_recompute_comm]: 2.05002e-06 [overlap_grad_ring_attention]: 5.91998e-06 [overlap_grad_flash_sp]: 2.344e-05 [begin_end_overlap_inline]: 8.49977e-07 [split_matmul_comm_elemetwise]: 2.24999e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.9e-05, [1] [Cycle 1]: 9.479e-05, [6] [build]: 1.041e-05 [elim_shapecalc]: 1.379e-05 [elim_not_effective]: 1.807e-05 [opt_reshape]: 9.86e-06 [fold_const_symbol]: 1.469e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.93002e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 2.435e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00046719 [validate]: 4.351e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00808383 [execute]: 6.63e-06 Sums bootstrap : 0.000511s : 1.55% type_inference : 0.011388s : 34.54% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000040s : 0.12% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003211s : 9.74% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000510s : 1.55% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001469s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.28% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003065s : 9.30% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000231s : 0.70% optimize.opt_a.a_3 : 0.000460s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000460s : 1.40% optimize.opt_b.b_1 : 0.000188s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000425s : 1.29% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000055s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000006s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000467s : 1.42% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008084s : 24.52% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000752 222 6.05% : 0.000045s : 12: substitution.arithmetic_simplify 1.77% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.33% : 0.000416s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.46% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000015s : 3: substitution.less_batch_normalization 1.67% : 0.000013s : 11: substitution.minmaximum_grad 0.75% : 0.000006s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.58% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.77% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.32% : 0.000017s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011316 2 86.97% : 0.009842s : 1: type_inference.infer 13.03% : 0.001474s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.76% : 0.000125s : 17: replace.inline 42.24% : 0.000091s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000441 33 92.24% : 0.000407s : 17: match.inline 7.76% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.25% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.35% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000004s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000037s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.28% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001568 34 57.12% : 0.000895s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.88% : 0.000672s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061766 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.89% : 0.003022s : 1: add_attr 4.88% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000059s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.88% : 0.000545s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000434s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.92% : 0.004892s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.78% : 0.010984s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000477s : 1: opt_after_jit_grad 0.47% : 0.000289s : 1: opt_b 21.44% : 0.013241s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000044s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.59% : 0.001597s : 2: renormalize.infer 2.36% : 0.001456s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000102s : 1: symbol_engine_optimizer 13.10% : 0.008094s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.46% : 0.011403s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0183786, [24] [bootstrap]: 0.00046501 [type_inference]: 0.00430524 [event_method]: 1.045e-05 [auto_monad]: 4.987e-05 [graph_reusing]: 5.64e-06 [inline]: 1.59e-06 [add_attr]: 0.0029888, [1] [add_attr_with_inline]: 0.00298073, [1] [Cycle 1]: 4.487e-05, [2] [tag_attr]: 1.212e-05 [meta_addattr_fg_expand]: 3.2e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.14e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 9.89996e-07 [dataset_repeat_opt]: 1.56002e-06 [pipeline_split]: 1.76003e-06 [optimize]: 0.00364161, [53] [py_interpret_to_execute]: 1.493e-05 [rewriter_before_opt_a]: 3.741e-05 [opt_a]: 0.00185126, [2] [Cycle 1]: 0.00124399, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 2.398e-05 [loop_unroll]: 1.346e-05 [a_1]: 0.00028921 [with_stream_mark]: 1.292e-05 [recompute_prepare]: 7.30998e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.20002e-06 [updatestate_loads_eliminate]: 2.98e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.584e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.18998e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 6.18998e-06 [merge_send_recv]: 8.00999e-06 [auto_parallel]: 5.90002e-06 [parallel]: 1.769e-05 [flash_sp]: 7.45e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.2e-06 [matmul_add_comm_reduction]: 8.52e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.26999e-06 [virtual_dataset]: 5.61003e-06 [get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.51001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.129e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 9.95002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 2.60997e-06 [flash_sp_send_recv_attached]: 2.24999e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.077e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00033886 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.288e-05 [cse]: 2.638e-05 [a_3]: 3.967e-05 [Cycle 2]: 0.00059823, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.60002e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00013262 [with_stream_mark]: 9.00999e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.761e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.37001e-06 [parallel]: 3.95998e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.61999e-06 [matmul_add_comm_reduction]: 5.35001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.84e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 4.90999e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.81001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 8.90999e-06 [a_after_grad]: 7.81001e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.09998e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.354e-05 [a_3]: 3.22e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 2.32001e-06 [rewriter_after_opt_a]: 2.989e-05 [convert_after_rewriter]: 6.97002e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00044813 [opt_b]: 0.00017781, [1] [Cycle 1]: 0.00017202, [7] 
[b_1]: 0.00010645 [b_2]: 7.05998e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 3.59985e-07 [cse]: 1.56e-05 [optimize_parallel_all_gather_comm]: 1.6e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.214e-05 [loop_unroll]: 0.00041248 [opt_after_cconv]: 9.329e-05, [1] [Cycle 1]: 8.771e-05, [7] [c_1]: 2.737e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.591e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.211e-05 [tuple_transform]: 6.703e-05, [1] [Cycle 1]: 6.287e-05, [4] [d_1]: 3.778e-05 [none_parameter_eliminate]: 1.46002e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.88997e-06 [add_recomputation]: 4.334e-05 [cse_after_recomputation]: 1.951e-05, [1] [Cycle 1]: 1.504e-05, [1] [cse]: 1.004e-05 [environ_conv]: 4.68001e-06 [swap_dp_allreduce_reducescatter]: 4.97999e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.48001e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.14003e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.63003e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.29e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.112e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.11002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.61e-06 [overlap_grad_ring_attention]: 4.04002e-06 [overlap_grad_flash_sp]: 1.648e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.90001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.806e-05, [1] [Cycle 1]: 6.357e-05, [6] [build]: 2.44001e-06 [elim_shapecalc]: 7.88999e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.535e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044434 [validate]: 3.14e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00618092 [execute]: 7.31999e-06 Sums bootstrap : 0.000465s : 3.22% type_inference : 0.004305s : 29.82% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000422s : 2.92% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.10% optimize.opt_b.b_1 : 0.000106s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000412s : 2.86% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.26% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 3.08% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006181s : 42.81% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.59% : 0.000022s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.46% : 0.000005s : 4: substitution.graph_param_transform 64.75% : 0.000077s : 2: substitution.inline 2.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.32% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004265 2 91.86% : 0.003918s : 1: type_inference.infer 8.14% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000134 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.51% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.83% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.37% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.86% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.12% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.18% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.59% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.87% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 41.75% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.25% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026275 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.39% : 0.002993s : 1: add_attr 11.36% : 0.002984s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.90% : 0.000500s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000422s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000458s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 2.94% : 0.000772s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.06% : 0.001854s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.73% : 0.000454s : 1: opt_after_jit_grad 0.69% : 0.000181s : 1: opt_b 13.87% : 0.003645s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000185s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.56% : 0.006191s : 1: task_emit 0.27% : 0.000070s : 1: 
tuple_transform 16.44% : 0.004320s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0359725, [24] [bootstrap]: 0.00051067 [type_inference]: 0.0102822 [event_method]: 3.926e-05 [auto_monad]: 0.00011497 [graph_reusing]: 7.75e-06 [inline]: 2.07001e-06 [add_attr]: 0.00302285, [1] [add_attr_with_inline]: 0.00301435, [1] [Cycle 1]: 6.73e-05, [2] [tag_attr]: 3.187e-05 [meta_addattr_fg_expand]: 8.16002e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 4.625e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.0129725, [53] [py_interpret_to_execute]: 3.707e-05 [rewriter_before_opt_a]: 0.00012973 [opt_a]: 0.0107309, [3] [Cycle 1]: 0.00687159, [45] [expand_dump_flag]: 3.68999e-06 [switch_simplify]: 6.707e-05 [loop_unroll]: 5.529e-05 [a_1]: 0.00134106 [with_stream_mark]: 2.262e-05 [recompute_prepare]: 2.257e-05 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 7.85e-06 [updatestate_loads_eliminate]: 7.45998e-06 [parameter_eliminate]: 2.78e-06 [a_2]: 0.00024475 [accelerated_algorithm]: 3.1e-05 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 3.28e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.556e-05 [auto_parallel]: 1.12e-05 [parallel]: 1.873e-05 [flash_sp]: 1.191e-05 [merge_comm]: 9.52001e-06 [allreduce_fusion]: 8.89998e-06 [matmul_add_comm_reduction]: 2.628e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 1.753e-05 [virtual_dataset]: 1.539e-05 [get_grad_eliminate_]: 1.492e-05 [virtual_output]: 1.508e-05 [merge_forward]: 9.35001e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 1.727e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.873e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.707e-05 [set_forward_comm_id_for_comm_node_pass]: 9.62999e-06 [meta_fg_expand]: 0.00137992 [flash_sp_send_recv_attached]: 4.11001e-06 [receive_attached]: 3.03998e-06 [after_resolve]: 
5.922e-05 [a_after_grad]: 7.973e-05 [renormalize]: 0.00240546 [add_forward_monad_depend]: 8.99e-06 [auto_monad_grad]: 5.01002e-06 [auto_monad_eliminator]: 5.649e-05 [cse]: 0.00015941 [a_3]: 0.00033406 [Cycle 2]: 0.00294646, [45] [expand_dump_flag]: 1.47001e-06 [switch_simplify]: 4.674e-05 [loop_unroll]: 4.406e-05 [a_1]: 0.00151547 [with_stream_mark]: 1.191e-05 [recompute_prepare]: 1.069e-05 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 0.0001263 [accelerated_algorithm]: 1.169e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 7.3e-06 [parallel]: 4.72e-06 [flash_sp]: 2.93998e-06 [merge_comm]: 5.09998e-06 [allreduce_fusion]: 4.64002e-06 [matmul_add_comm_reduction]: 7.78001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.045e-05 [virtual_dataset]: 8.53001e-06 [get_grad_eliminate_]: 9.44998e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.50999e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 8.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.598e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.373e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.357e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.51e-05 [a_after_grad]: 1.406e-05 [renormalize]: 0.00059251 [add_forward_monad_depend]: 4.00998e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.469e-05 [cse]: 4.56e-05 [a_3]: 6.576e-05 [Cycle 3]: 0.00089853, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 1.062e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.00024829 [with_stream_mark]: 1.026e-05 [recompute_prepare]: 9.43002e-06 [updatestate_depend_eliminate]: 4.87998e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 3.82998e-06 
[parameter_eliminate]: 9.50007e-07 [a_2]: 0.00012243 [accelerated_algorithm]: 1.175e-05 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.65001e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 6.74999e-06 [auto_parallel]: 7.28e-06 [parallel]: 4.67e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.82998e-06 [allreduce_fusion]: 4.87998e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 1.019e-05 [virtual_dataset]: 8.78001e-06 [get_grad_eliminate_]: 8.43001e-06 [virtual_output]: 8.15e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.54e-06 [offload_activation]: 8.60999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.635e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.379e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12e-06 [meta_fg_expand]: 2.92002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.372e-05 [a_after_grad]: 1.457e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.29e-06 [auto_monad_grad]: 1.04003e-06 [auto_monad_eliminator]: 1.125e-05 [cse]: 2.568e-05 [a_3]: 5.912e-05 [py_interpret_to_execute_after_opt_a]: 1.063e-05 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 4.69e-05 [convert_after_rewriter]: 9.31e-06 [order_py_execute_after_rewriter]: 6.64001e-06 [mutable_eliminate]: 0.0004636 [opt_b]: 0.00028518, [1] [Cycle 1]: 0.00027927, [7] [b_1]: 0.00018803 [b_2]: 1.073e-05 [updatestate_depend_eliminate]: 7.05e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 3.93001e-06 [renormalize]: 6.00005e-07 [cse]: 3.03e-05 [optimize_parallel_all_gather_comm]: 2.042e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 1.978e-05 [loop_unroll]: 0.00042128 [opt_after_cconv]: 0.00013519, [1] [Cycle 1]: 0.00012905, [7] [c_1]: 4.821e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 
3.83999e-06 [cse]: 2.946e-05 [renormalize]: 3.7998e-07 [remove_dup_value]: 2.965e-05 [tuple_transform]: 0.00010073, [1] [Cycle 1]: 9.62e-05, [4] [d_1]: 6.622e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 9.59e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.773e-05 [cse_after_recomputation]: 3.152e-05, [1] [Cycle 1]: 2.707e-05, [1] [cse]: 2.14e-05 [environ_conv]: 8.48999e-06 [swap_dp_allreduce_reducescatter]: 7.85e-06 [bias_add_comm_swap]: 2.91999e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.31998e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.51002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.29998e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.83e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.723e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 4.72e-06 [overlap_recompute_and_grad_model_parallel]: 5.81e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 5.25001e-06 [overlap_grad_flash_sp]: 2.443e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.08998e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 9.626e-05, [1] [Cycle 1]: 9.197e-05, [6] [build]: 8.78001e-06 [elim_shapecalc]: 1.32e-05 [elim_not_effective]: 1.757e-05 [opt_reshape]: 1.001e-05 [fold_const_symbol]: 1.476e-05 [renormalize]: 1.90019e-07 [detach_backward]: 1.67001e-06 
[pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 2.474e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00047521 [validate]: 4.341e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.00814582 [execute]: 7.23e-06 Sums bootstrap : 0.000511s : 1.61% type_inference : 0.010282s : 32.49% event_method : 0.000039s : 0.12% auto_monad : 0.000115s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000130s : 0.41% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003105s : 9.81% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000019s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001416s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000108s : 0.34% optimize.opt_a.renormalize : 0.002998s : 9.47% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.26% optimize.opt_a.cse : 0.000231s : 0.73% optimize.opt_a.a_3 : 0.000459s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000464s : 1.47% optimize.opt_b.b_1 : 0.000188s : 0.59% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000421s : 1.33% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000475s : 1.50% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008146s : 25.74% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000727 218 5.95% : 0.000043s : 11: substitution.arithmetic_simplify 2.07% : 0.000015s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.07% : 0.000008s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.43% : 0.000396s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000015s : 3: substitution.less_batch_normalization 1.76% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.92% : 0.000014s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.55% : 0.000011s : 15: substitution.replace_old_param 0.41% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.71% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010214 2 87.39% : 0.008926s : 1: type_inference.infer 12.61% : 0.001288s : 1: type_inference.specialize ------[replace.] 0.000200 30 58.80% : 0.000117s : 16: replace.inline 41.20% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.83% : 0.000387s : 16: match.inline 7.17% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.12% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.08% : 0.000015s : 99: predicate.arithmetic_simplify 1.11% : 0.000008s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.18% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000015s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.64% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.44% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.28% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.17% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001494 32 56.82% : 0.000849s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.18% : 0.000645s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060040 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.04% : 0.003027s : 1: add_attr 5.03% : 0.003018s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.91% : 0.000546s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000045s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000473s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.91% : 0.004750s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.88% : 0.010734s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.81% : 0.000485s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.61% : 0.012976s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.62% : 0.001575s : 2: renormalize.infer 2.35% : 0.001410s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000134s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000099s : 1: symbol_engine_optimizer 13.59% : 0.008156s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.15% : 0.010297s : 1: type_inference 0.21% : 0.000126s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x1-kbk],max_mem:56.0M . TotalTime = 0.0790814, [24] [bootstrap]: 0.00056766 [type_inference]: 0.00611929 [event_method]: 1.426e-05 [auto_monad]: 5.378e-05 [graph_reusing]: 5.01002e-06 [inline]: 1.97999e-06 [add_attr]: 0.00346553, [1] [add_attr_with_inline]: 0.00345518, [1] [Cycle 1]: 4.456e-05, [2] [tag_attr]: 1.493e-05 [meta_addattr_fg_expand]: 3.87998e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.662e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 9.20001e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.47999e-06 [optimize]: 0.0039685, [53] [py_interpret_to_execute]: 1.941e-05 [rewriter_before_opt_a]: 5.93e-05 [opt_a]: 0.00212524, [2] [Cycle 1]: 0.0015308, [45] [expand_dump_flag]: 2.91999e-06 [switch_simplify]: 3.148e-05 [loop_unroll]: 2.058e-05 [a_1]: 0.00045088 [with_stream_mark]: 1.37e-05 [recompute_prepare]: 7.52002e-06 [updatestate_depend_eliminate]: 3.71999e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.7e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 1.97001e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 7.21001e-06 [auto_parallel]: 6.06e-06 [parallel]: 2.344e-05 [flash_sp]: 7.07002e-06 [merge_comm]: 3.88999e-06 [allreduce_fusion]: 3.23998e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.26001e-06 [virtual_dataset]: 6.45997e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.54998e-06 [merge_forward]: 3.77002e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.082e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.52999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.6e-06 [meta_fg_expand]: 2.22001e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.48998e-06 [after_resolve]: 1.069e-05 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00043314 [add_forward_monad_depend]: 4.15999e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.682e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.00058494, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.83998e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012497 [with_stream_mark]: 9.57001e-06 [recompute_prepare]: 5.42001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.841e-05 [accelerated_algorithm]: 5.40001e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 4.1e-06 [auto_parallel]: 4.90001e-06 [parallel]: 4.72e-06 [flash_sp]: 2.97002e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.50002e-06 [cell_reuse_recompute_pass]: 1.84998e-06 [offload_activation]: 5.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62999e-06 [merge_recompute_call_nodes]: 6.40022e-07 [before_grad]: 7.87003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.59999e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.90025e-07 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 5.74e-06 [cse]: 1.22e-05 [a_3]: 3.135e-05 [py_interpret_to_execute_after_opt_a]: 7.58999e-06 
[slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 2.985e-05 [convert_after_rewriter]: 6.54999e-06 [order_py_execute_after_rewriter]: 5.01002e-06 [mutable_eliminate]: 0.00045213 [opt_b]: 0.00018551, [1] [Cycle 1]: 0.00017969, [7] [b_1]: 0.00011101 [b_2]: 7.25998e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 5.50004e-07 [cse]: 1.657e-05 [optimize_parallel_all_gather_comm]: 1.517e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.319e-05 [loop_unroll]: 0.0004162 [opt_after_cconv]: 9.5e-05, [1] [Cycle 1]: 8.902e-05, [7] [c_1]: 2.809e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.61e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.284e-05 [tuple_transform]: 6.962e-05, [1] [Cycle 1]: 6.506e-05, [4] [d_1]: 3.983e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.97999e-06 [partial_unused_args_eliminate]: 1.90001e-06 [add_recomputation]: 5.153e-05 [cse_after_recomputation]: 2.036e-05, [1] [Cycle 1]: 1.611e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.21002e-06 [bias_add_comm_swap]: 2.31998e-06 [label_micro_interleaved_index]: 3.92998e-06 [label_fine_grained_interleaved_index]: 3.14999e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.08002e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.01997e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.86e-06 [control_data_broadcast_order]: 1.141e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.38e-06 [overlap_recompute_and_grad_model_parallel]: 4.70999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.91e-06 [overlap_grad_ring_attention]: 3.74002e-06 [overlap_grad_flash_sp]: 1.622e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.28002e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.26997e-06 [symbol_engine_optimizer]: 6.712e-05, [1] [Cycle 1]: 6.31e-05, [6] [build]: 1.99e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 8.75999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 1.515e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00044761 [validate]: 3.125e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0641389 [execute]: 7.86001e-06 Sums bootstrap : 0.000568s : 0.76% type_inference : 0.006119s : 8.20% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000576s : 0.77% optimize.opt_a.with_stream_mark 
: 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000011s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000433s : 0.58% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.61% optimize.opt_b.b_1 : 0.000111s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000416s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000052s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000448s : 0.60% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064139s : 85.91% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.83% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 66.53% : 0.000108s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.92% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006075 2 90.94% : 0.005525s : 1: type_inference.infer 9.06% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.81% : 0.000027s : 3: replace.inline 30.19% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.21% : 0.000106s : 3: match.inline 8.79% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.17% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 2.01% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.77% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.76% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.32% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 45.63% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.37% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088049 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.94% : 0.003470s : 1: add_attr 3.93% : 0.003459s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.69% : 0.000605s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.07% : 0.000942s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000094s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.42% : 0.002128s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.52% : 0.000457s : 1: opt_after_jit_grad 0.21% : 0.000189s : 1: opt_b 4.51% : 0.003972s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000210s : 1: renormalize.infer 0.25% : 0.000217s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 72.86% : 0.064156s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.97% : 0.006133s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.0701794, [24] [bootstrap]: 0.00051464 [type_inference]: 0.00441094 [event_method]: 1.089e-05 [auto_monad]: 4.924e-05 [graph_reusing]: 5.16998e-06 [inline]: 1.69e-06 [add_attr]: 0.00295579, [1] [add_attr_with_inline]: 0.00294795, [1] [Cycle 1]: 4.669e-05, [2] [tag_attr]: 1.166e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.179e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.61998e-06 [optimize]: 0.00365583, [53] [py_interpret_to_execute]: 1.485e-05 [rewriter_before_opt_a]: 3.776e-05 [opt_a]: 0.00186536, [2] [Cycle 1]: 0.00126653, [45] [expand_dump_flag]: 2.04999e-06 [switch_simplify]: 2.393e-05 [loop_unroll]: 1.35e-05 [a_1]: 0.00029068 [with_stream_mark]: 1.347e-05 [recompute_prepare]: 7.50998e-06 [updatestate_depend_eliminate]: 3.43e-06 [updatestate_assign_eliminate]: 3.21999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.92001e-06 [a_2]: 7.575e-05 [accelerated_algorithm]: 5.96e-06 [shard]: 2.03002e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 7.27002e-06 [auto_parallel]: 5.77999e-06 [parallel]: 1.708e-05 [flash_sp]: 7.16999e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.45999e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 6.90998e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.35e-06 [cell_reuse_recompute_pass]: 1.16997e-06 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43999e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 
2.11998e-06 [after_resolve]: 1.011e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 0.00033904 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 1.79998e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.802e-05 [a_3]: 3.953e-05 [Cycle 2]: 0.0005895, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.41002e-06 [a_1]: 0.00012528 [with_stream_mark]: 9.41003e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 7.60017e-07 [a_2]: 6.761e-05 [accelerated_algorithm]: 5.56002e-06 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.47998e-06 [auto_parallel]: 5.32999e-06 [parallel]: 4.12e-06 [flash_sp]: 3.00002e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.74e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.10998e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 7.83001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.05002e-06 [cse]: 1.208e-05 [a_3]: 3.181e-05 [py_interpret_to_execute_after_opt_a]: 7.43e-06 [slice_cell_reuse_recomputed_activation]: 1.78002e-06 [rewriter_after_opt_a]: 3.076e-05 [convert_after_rewriter]: 7.00998e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00044866 [opt_b]: 0.00017888, [1] [Cycle 1]: 
0.00017304, [7] [b_1]: 0.00010659 [b_2]: 6.84001e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.40021e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.534e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.165e-05 [loop_unroll]: 0.00041158 [opt_after_cconv]: 9.431e-05, [1] [Cycle 1]: 8.882e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.557e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.23e-05 [tuple_transform]: 6.875e-05, [1] [Cycle 1]: 6.47e-05, [4] [d_1]: 3.925e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 4.117e-05 [cse_after_recomputation]: 2.047e-05, [1] [Cycle 1]: 1.601e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.89e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.25002e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.68003e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 1.09e-06 [remove_cast_before_assign_add]: 7.2e-07 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.09003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91003e-06 [control_data_broadcast_order]: 1.112e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.33001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.02001e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.691e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 6.739e-05, [1] [Cycle 1]: 6.323e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00047003 [validate]: 3.142e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0578167 [execute]: 7.7e-06 Sums bootstrap : 0.000515s : 0.78% type_inference : 0.004411s : 6.66% event_method : 0.000011s : 0.02% auto_monad : 0.000049s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.63% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000339s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.68% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000041s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000470s : 0.71% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057817s : 87.28% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.22% : 0.000022s : 4: substitution.arithmetic_simplify 1.36% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.93% : 0.000006s : 4: substitution.graph_param_transform 65.59% : 0.000079s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.14% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004369 2 91.53% : 0.003999s : 1: type_inference.infer 8.47% : 0.000370s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000134 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.09% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.09% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.43% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.17% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.89% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000266 6 42.69% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.31% : 0.000152s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078050 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.79% : 0.002960s : 1: add_attr 3.78% : 0.002951s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000045s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000054s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.70% : 0.000550s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000763s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001868s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.61% : 0.000479s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.69% : 0.003660s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000186s : 1: renormalize.infer 0.19% : 0.000146s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.10% : 0.057832s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.67% : 0.004424s : 1: type_inference 0.07% : 0.000051s : 1: validate TotalTime = 0.0714444, [24] [bootstrap]: 0.000482 [type_inference]: 0.00554763 [event_method]: 1.445e-05 [auto_monad]: 5.256e-05 [graph_reusing]: 5.17999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00297384, [1] [add_attr_with_inline]: 0.00296587, [1] [Cycle 1]: 4.517e-05, [2] [tag_attr]: 1.608e-05 [meta_addattr_fg_expand]: 4.1e-06 [parallel-infer-symbol]: 2.58003e-06 [pre_auto_parallel]: 2.551e-05 [insert-virtual-dataset]: 2.51998e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00394329, [53] [py_interpret_to_execute]: 2.032e-05 [rewriter_before_opt_a]: 5.771e-05 [opt_a]: 0.00209576, [2] [Cycle 1]: 0.00149348, [45] [expand_dump_flag]: 3.08998e-06 [switch_simplify]: 3.152e-05 [loop_unroll]: 2.032e-05 [a_1]: 0.00044867 [with_stream_mark]: 1.321e-05 [recompute_prepare]: 8.17e-06 [updatestate_depend_eliminate]: 3.62998e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 1.73002e-06 [a_2]: 7.601e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 2.17001e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 6.41e-06 [parallel]: 1.847e-05 [flash_sp]: 7.11001e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.47002e-06 [matmul_add_comm_reduction]: 8.43999e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.87001e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.64998e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 8.68001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.66e-06 [receive_attached]: 
2.15002e-06 [after_resolve]: 9.96e-06 [a_after_grad]: 8.63001e-06 [renormalize]: 0.00040897 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.323e-05 [cse]: 2.578e-05 [a_3]: 4.108e-05 [Cycle 2]: 0.00059282, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.0001239 [with_stream_mark]: 9.32001e-06 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.14e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.769e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.23001e-06 [auto_parallel]: 5.18002e-06 [parallel]: 3.78001e-06 [flash_sp]: 3.26999e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.53002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 8.87999e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.44001e-06 [cse]: 1.298e-05 [a_3]: 3.161e-05 [py_interpret_to_execute_after_opt_a]: 7.21001e-06 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 3.228e-05 [convert_after_rewriter]: 7.6e-06 [order_py_execute_after_rewriter]: 5.29e-06 [mutable_eliminate]: 0.00047158 [opt_b]: 0.00017908, [1] [Cycle 1]: 0.00017324, [7] [b_1]: 
0.00010708 [b_2]: 7.23e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.19997e-07 [cse]: 1.557e-05 [optimize_parallel_all_gather_comm]: 1.484e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.162e-05 [loop_unroll]: 0.00041413 [opt_after_cconv]: 9.373e-05, [1] [Cycle 1]: 8.811e-05, [7] [c_1]: 2.811e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.554e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.301e-05 [tuple_transform]: 6.953e-05, [1] [Cycle 1]: 6.506e-05, [4] [d_1]: 3.922e-05 [none_parameter_eliminate]: 1.39e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.21998e-06 [partial_unused_args_eliminate]: 1.89999e-06 [add_recomputation]: 4.243e-05 [cse_after_recomputation]: 2.016e-05, [1] [Cycle 1]: 1.601e-05, [1] [cse]: 1.081e-05 [environ_conv]: 4.48001e-06 [swap_dp_allreduce_reducescatter]: 5.40001e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.50999e-06 [label_fine_grained_interleaved_index]: 2.41e-06 [merge_cast_opt]: 1.14998e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 1.98997e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.53998e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.176e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.95001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.715e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.34999e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 6.793e-05, [1] [Cycle 1]: 6.364e-05, [6] [build]: 2.51998e-06 [elim_shapecalc]: 8.23001e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.71e-06 [auto_monad_reorder]: 1.539e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.30003e-06 [opt_after_jit_grad]: 0.00044849 [validate]: 3.087e-05 [backend_pass]: 9.19972e-07 [task_emit]: 0.0576839 [execute]: 8.37e-06 Sums bootstrap : 0.000482s : 0.71% type_inference : 0.005548s : 8.22% event_method : 0.000014s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000573s : 0.85% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000409s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000472s : 0.70% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000414s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057684s : 85.44% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.77% : 0.000024s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 66.84% : 0.000110s : 3: substitution.inline 1.92% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.70% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005507 2 90.02% : 0.004957s : 1: type_inference.infer 9.98% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.41% : 0.000026s : 3: replace.inline 30.59% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.61% : 0.000107s : 3: match.inline 8.39% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.46% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.84% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.89% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.44% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 47.01% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.99% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079865 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.73% : 0.002978s : 1: add_attr 3.72% : 0.002969s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000516s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.60% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.17% : 0.000937s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.63% : 0.002099s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.57% : 0.000458s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.94% : 0.003947s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000207s : 1: renormalize.infer 0.25% : 0.000196s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.08% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 72.25% : 0.057700s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.96% : 0.005561s : 1: type_inference 0.06% : 0.000051s : 1: validate TotalTime = 0.10771, [24] [bootstrap]: 0.00050668 [type_inference]: 0.0113955 [event_method]: 4.803e-05 [auto_monad]: 0.00011969 [graph_reusing]: 8.28999e-06 [inline]: 1.81998e-06 [add_attr]: 0.00298825, [1] [add_attr_with_inline]: 0.00297949, [1] [Cycle 1]: 7.021e-05, [2] [tag_attr]: 3.408e-05 [meta_addattr_fg_expand]: 9.34e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 4.884e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.95001e-06 [optimize]: 0.0134295, [53] [py_interpret_to_execute]: 3.757e-05 [rewriter_before_opt_a]: 0.00014502 [opt_a]: 0.0110752, [3] [Cycle 1]: 0.00710839, [45] [expand_dump_flag]: 4.85001e-06 [switch_simplify]: 7.555e-05 [loop_unroll]: 6.204e-05 [a_1]: 0.00144342 [with_stream_mark]: 2.284e-05 [recompute_prepare]: 2.154e-05 [updatestate_depend_eliminate]: 9.12999e-06 [updatestate_assign_eliminate]: 7.7e-06 [updatestate_loads_eliminate]: 7.65e-06 [parameter_eliminate]: 2.63998e-06 [a_2]: 0.00024213 [accelerated_algorithm]: 3.043e-05 [shard]: 2.19001e-06 [meta_shard_fg_expand]: 3.31999e-06 [shard_inline]: 1.618e-05 [merge_send_recv]: 1.542e-05 [auto_parallel]: 1.073e-05 [parallel]: 1.782e-05 [flash_sp]: 1.149e-05 [merge_comm]: 9.19998e-06 [allreduce_fusion]: 8.97999e-06 [matmul_add_comm_reduction]: 2.63e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.775e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.52e-05 [virtual_output]: 1.509e-05 [merge_forward]: 9.74e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 1.76e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.856e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 2.723e-05 [set_forward_comm_id_for_comm_node_pass]: 9.57999e-06 [meta_fg_expand]: 0.0013959 [flash_sp_send_recv_attached]: 3.41001e-06 [receive_attached]: 2.47001e-06 [after_resolve]: 
5.943e-05 [a_after_grad]: 0.00013418 [renormalize]: 0.00245149 [add_forward_monad_depend]: 8.94e-06 [auto_monad_grad]: 5.65001e-06 [auto_monad_eliminator]: 5.629e-05 [cse]: 0.00016833 [a_3]: 0.00033689 [Cycle 2]: 0.00305063, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 4.719e-05 [loop_unroll]: 4.452e-05 [a_1]: 0.00157387 [with_stream_mark]: 1.206e-05 [recompute_prepare]: 1.132e-05 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 4.03999e-06 [parameter_eliminate]: 9.99979e-07 [a_2]: 0.00012658 [accelerated_algorithm]: 1.241e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.41e-06 [merge_send_recv]: 7.16999e-06 [auto_parallel]: 7.38e-06 [parallel]: 4.70999e-06 [flash_sp]: 3.23e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.52998e-06 [matmul_add_comm_reduction]: 8.43999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.013e-05 [virtual_dataset]: 8.95999e-06 [get_grad_eliminate_]: 8.73001e-06 [virtual_output]: 8.62998e-06 [merge_forward]: 4.38999e-06 [cell_reuse_recompute_pass]: 9.29984e-07 [offload_activation]: 8.78001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.632e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.15001e-06 [meta_fg_expand]: 7.1e-05 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.25001e-06 [after_resolve]: 1.644e-05 [a_after_grad]: 1.476e-05 [renormalize]: 0.00059014 [add_forward_monad_depend]: 3.99002e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.469e-05 [cse]: 4.674e-05 [a_3]: 6.524e-05 [Cycle 3]: 0.00090079, [45] [expand_dump_flag]: 9.19972e-07 [switch_simplify]: 1.058e-05 [loop_unroll]: 9.07999e-06 [a_1]: 0.00025039 [with_stream_mark]: 1.003e-05 [recompute_prepare]: 9.61998e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.81001e-06 
[parameter_eliminate]: 9.29984e-07 [a_2]: 0.00012342 [accelerated_algorithm]: 1.15e-05 [shard]: 9.09989e-07 [meta_shard_fg_expand]: 1.62999e-06 [shard_inline]: 9.22001e-06 [merge_send_recv]: 7.16001e-06 [auto_parallel]: 7.19001e-06 [parallel]: 4.60001e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 4.75001e-06 [allreduce_fusion]: 4.71002e-06 [matmul_add_comm_reduction]: 7.25e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 8.30999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 7.90023e-07 [before_grad]: 1.41e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 2.94999e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.456e-05 [a_after_grad]: 1.498e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 1.04998e-06 [auto_monad_eliminator]: 1.109e-05 [cse]: 2.593e-05 [a_3]: 5.952e-05 [py_interpret_to_execute_after_opt_a]: 1.089e-05 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 4.721e-05 [convert_after_rewriter]: 9.37999e-06 [order_py_execute_after_rewriter]: 7.31999e-06 [mutable_eliminate]: 0.00046114 [opt_b]: 0.00030365, [1] [Cycle 1]: 0.00029749, [7] [b_1]: 0.00019114 [b_2]: 1.071e-05 [updatestate_depend_eliminate]: 7.04001e-06 [updatestate_assign_eliminate]: 4.12998e-06 [updatestate_loads_eliminate]: 4.19002e-06 [renormalize]: 4.89992e-07 [cse]: 4.532e-05 [optimize_parallel_all_gather_comm]: 2.076e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.048e-05 [loop_unroll]: 0.00043037 [opt_after_cconv]: 0.0001367, [1] [Cycle 1]: 0.00013073, [7] [c_1]: 4.897e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 7.16999e-06 [updatestate_assign_eliminate]: 4.17e-06 
[updatestate_loads_eliminate]: 4.04002e-06 [cse]: 2.998e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 2.867e-05 [tuple_transform]: 0.00010147, [1] [Cycle 1]: 9.689e-05, [4] [d_1]: 6.668e-05 [none_parameter_eliminate]: 2.02001e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 9.79999e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 5.628e-05 [cse_after_recomputation]: 3.2e-05, [1] [Cycle 1]: 2.743e-05, [1] [cse]: 2.205e-05 [environ_conv]: 9.29e-06 [swap_dp_allreduce_reducescatter]: 8.20999e-06 [bias_add_comm_swap]: 2.72001e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.53003e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.01002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81003e-06 [control_data_broadcast_order]: 1.7e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 5.26002e-06 [overlap_recompute_and_grad_model_parallel]: 5.79e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.379e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 9.803e-05, [1] [Cycle 1]: 9.387e-05, [6] [build]: 1.005e-05 [elim_shapecalc]: 1.31e-05 [elim_not_effective]: 1.812e-05 [opt_reshape]: 1.019e-05 [fold_const_symbol]: 1.472e-05 [renormalize]: 
1.80007e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 2.451e-05 [get_jit_bprop_graph]: 1.15999e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00047592 [validate]: 4.419e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.0783819 [execute]: 8.12e-06 Sums bootstrap : 0.000507s : 0.49% type_inference : 0.011395s : 11.02% event_method : 0.000048s : 0.05% auto_monad : 0.000120s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000145s : 0.14% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000133s : 0.13% optimize.opt_a.loop_unroll : 0.000116s : 0.11% optimize.opt_a.a_1 : 0.003268s : 3.16% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000492s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000019s : 0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001470s : 1.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.09% optimize.opt_a.a_after_grad : 0.000164s : 0.16% optimize.opt_a.renormalize : 0.003042s : 2.94% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000241s : 0.23% optimize.opt_a.a_3 : 0.000462s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000461s : 0.45% optimize.opt_b.b_1 : 0.000191s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000045s : 0.04% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000430s : 0.42% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000476s : 0.46% validate : 0.000044s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.078382s : 75.81% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000760 222 5.98% : 0.000045s : 12: substitution.arithmetic_simplify 1.70% : 0.000013s : 2: substitution.cast_eliminate 0.40% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.60% : 0.000423s : 17: substitution.inline 2.14% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.80% : 0.000014s : 20: substitution.remove_not_recompute_node 3.23% : 0.000025s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.52% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.78% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011323 2 86.35% : 0.009777s : 1: type_inference.infer 13.65% : 0.001546s : 1: type_inference.specialize ------[replace.] 0.000217 33 57.38% : 0.000125s : 17: replace.inline 42.62% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.36% : 0.000414s : 17: match.inline 7.64% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000785 5764 5.59% : 0.000044s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.02% : 0.000008s : 68: predicate.addn_zero_filter 1.00% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.94% : 0.000015s : 100: predicate.arithmetic_simplify 1.09% : 0.000009s : 68: predicate.cast_eliminate 1.08% : 0.000008s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.49% : 0.000004s : 32: predicate.depend_value_elim 1.14% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.13% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 76: predicate.environ_get_depend_swap 1.67% : 0.000013s : 108: predicate.environ_get_eliminate 1.14% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.65% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.19% : 0.000017s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.62% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.53% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.38% : 0.000042s : 249: predicate.inline 1.22% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.61% : 0.000005s : 32: predicate.less_batch_normalization 
1.56% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.54% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.33% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.50% : 0.000004s : 32: predicate.merge_addn 1.05% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.05% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.06% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.91% : 0.000015s : 101: predicate.partial_defer_inline 1.67% : 0.000013s : 92: predicate.partial_eliminate 1.02% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000010s : 68: predicate.reduce_eliminate 2.58% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.83% : 0.000014s : 152: predicate.replace_applicator 0.58% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.04% : 0.000008s : 68: predicate.reshape_eliminate 1.08% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.20% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.57% : 0.000004s : 32: predicate.shard_identity_eliminate 0.27% : 0.000002s : 16: predicate.special_op_eliminate 0.58% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.79% : 0.000014s : 101: predicate.switch_defer_inline 2.83% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.77% : 
0.000037s : 277: predicate.switch_simplify 1.03% : 0.000008s : 68: predicate.tile_eliminate 1.02% : 0.000008s : 68: predicate.transpose_eliminate 1.38% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.78% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.92% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.55% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.52% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.11% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.52% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.53% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001592 34 56.87% : 0.000906s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.13% : 0.000687s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.132485 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.26% : 0.002993s : 1: add_attr 2.25% : 0.002983s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000128s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000541s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000055s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000439s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.77% : 0.004990s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000176s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.36% : 0.011078s : 1: opt_a 0.11% : 0.000140s : 1: opt_after_cconv 0.37% : 0.000485s : 1: opt_after_jit_grad 0.23% : 0.000307s : 1: opt_b 10.14% : 0.013433s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.22% : 0.001620s : 2: renormalize.infer 1.06% : 0.001408s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.17% : 0.000220s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 59.18% : 0.078400s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 8.61% : 0.011410s : 1: type_inference 0.05% : 0.000069s : 1: validate TotalTime = 0.0700091, [24] [bootstrap]: 0.00046701 [type_inference]: 0.00428181 [event_method]: 1.105e-05 [auto_monad]: 4.961e-05 [graph_reusing]: 4.80999e-06 [inline]: 2.10002e-06 [add_attr]: 0.00299913, [1] [add_attr_with_inline]: 0.00299132, [1] [Cycle 1]: 4.382e-05, [2] [tag_attr]: 1.171e-05 [meta_addattr_fg_expand]: 3.42002e-06 [parallel-infer-symbol]: 2.59999e-06 [pre_auto_parallel]: 2.137e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.00368019, [53] [py_interpret_to_execute]: 1.443e-05 [rewriter_before_opt_a]: 3.759e-05 [opt_a]: 0.00184133, [2] [Cycle 1]: 0.00124145, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 2.391e-05 [loop_unroll]: 1.357e-05 [a_1]: 0.00028894 [with_stream_mark]: 1.354e-05 [recompute_prepare]: 7.33e-06 [updatestate_depend_eliminate]: 3.69002e-06 [updatestate_assign_eliminate]: 3.32002e-06 [updatestate_loads_eliminate]: 2.73998e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.697e-05 [accelerated_algorithm]: 6.23e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.47001e-06 [parallel]: 1.709e-05 [flash_sp]: 6.96999e-06 [merge_comm]: 3.81001e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 9.44e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.94e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.59002e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.096e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 2.22001e-06 [receive_attached]: 2.26998e-06 
[after_resolve]: 1.084e-05 [a_after_grad]: 8.76997e-06 [renormalize]: 0.00033936 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.91e-06 [auto_monad_eliminator]: 1.3e-05 [cse]: 2.554e-05 [a_3]: 3.888e-05 [Cycle 2]: 0.0005909, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.38002e-06 [a_1]: 0.00012443 [with_stream_mark]: 9.20999e-06 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.64001e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.758e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.41998e-06 [merge_send_recv]: 4.34002e-06 [auto_parallel]: 5.24e-06 [parallel]: 3.75e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 3.12002e-06 [allreduce_fusion]: 2.78998e-06 [matmul_add_comm_reduction]: 4.90001e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.08998e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.27e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.52001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.17e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.79984e-07 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.233e-05 [a_3]: 3.177e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 [rewriter_after_opt_a]: 2.975e-05 [convert_after_rewriter]: 6.60997e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00045097 [opt_b]: 0.00021437, [1] [Cycle 1]: 0.00020853, [7] [b_1]: 
0.0001406 [b_2]: 7.33e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.09999e-06 [renormalize]: 3.59985e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.617e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.238e-05 [loop_unroll]: 0.00041688 [opt_after_cconv]: 9.432e-05, [1] [Cycle 1]: 8.859e-05, [7] [c_1]: 2.765e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.579e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.236e-05 [tuple_transform]: 7.005e-05, [1] [Cycle 1]: 6.561e-05, [4] [d_1]: 3.956e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.213e-05 [cse_after_recomputation]: 2.095e-05, [1] [Cycle 1]: 1.61e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.45e-06 [swap_dp_allreduce_reducescatter]: 5.46002e-06 [bias_add_comm_swap]: 2.29999e-06 [label_micro_interleaved_index]: 4.34997e-06 [label_fine_grained_interleaved_index]: 2.34999e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.37999e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.48998e-06 [reorder_send_recv_between_fp_bp]: 2.94001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.128e-05 [grouped_pairwise_exchange_alltoall]: 1.92999e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.23999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.37001e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.623e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.76998e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.847e-05, [1] [Cycle 1]: 6.439e-05, [6] [build]: 2.35002e-06 [elim_shapecalc]: 8.71002e-06 [elim_not_effective]: 1.131e-05 [opt_reshape]: 6.13002e-06 [fold_const_symbol]: 8.72998e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.5e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00044827 [validate]: 3.139e-05 [backend_pass]: 8.10018e-07 [task_emit]: 0.0577743 [execute]: 8.08999e-06 Sums bootstrap : 0.000467s : 0.71% type_inference : 0.004282s : 6.48% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000413s : 0.63% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000339s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000038s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.68% optimize.opt_b.b_1 : 0.000141s : 0.21% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000417s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000448s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057774s : 87.47% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.53% : 0.000022s : 4: substitution.arithmetic_simplify 1.63% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.60% : 0.000006s : 4: substitution.graph_param_transform 64.83% : 0.000078s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.82% : 0.000005s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004241 2 91.59% : 0.003885s : 1: type_inference.infer 8.41% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.25% : 0.000003s : 17: predicate.arithmetic_simplify 1.02% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.85% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.49% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.95% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.59% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.03% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.47% : 0.000001s : 4: predicate.value_based_eliminate 0.97% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000248 6 41.79% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.21% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077981 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.85% : 0.003004s : 1: add_attr 3.84% : 0.002995s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000503s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.55% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.98% : 0.000764s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000124s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.36% : 0.001844s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000458s : 1: opt_after_jit_grad 0.28% : 0.000218s : 1: opt_b 4.72% : 0.003684s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000184s : 1: renormalize.infer 0.19% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.11% : 0.057791s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.51% : 0.004295s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.107082, [24] [bootstrap]: 0.00053926 [type_inference]: 0.0101394 [event_method]: 4.29e-05 [auto_monad]: 0.00011461 [graph_reusing]: 7.8e-06 [inline]: 2.16e-06 [add_attr]: 0.00301165, [1] [add_attr_with_inline]: 0.00300288, [1] [Cycle 1]: 6.636e-05, [2] [tag_attr]: 3.156e-05 [meta_addattr_fg_expand]: 8.61002e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 4.514e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 9.09989e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.51002e-06 [optimize]: 0.0130044, [53] [py_interpret_to_execute]: 3.508e-05 [rewriter_before_opt_a]: 0.00012612 [opt_a]: 0.0107697, [3] [Cycle 1]: 0.00690088, [45] [expand_dump_flag]: 3.90998e-06 [switch_simplify]: 6.621e-05 [loop_unroll]: 5.523e-05 [a_1]: 0.0013313 [with_stream_mark]: 2.348e-05 [recompute_prepare]: 2.207e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 7.60998e-06 [updatestate_loads_eliminate]: 7.18e-06 [parameter_eliminate]: 2.53998e-06 [a_2]: 0.00024415 [accelerated_algorithm]: 3.036e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.2e-06 [shard_inline]: 2.983e-05 [merge_send_recv]: 1.616e-05 [auto_parallel]: 1.086e-05 [parallel]: 1.82e-05 [flash_sp]: 1.215e-05 [merge_comm]: 9.87999e-06 [allreduce_fusion]: 8.74e-06 [matmul_add_comm_reduction]: 2.586e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.828e-05 [virtual_dataset]: 1.587e-05 [get_grad_eliminate_]: 1.561e-05 [virtual_output]: 1.529e-05 [merge_forward]: 1.017e-05 [cell_reuse_recompute_pass]: 9.5999e-07 [offload_activation]: 1.733e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.827e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 2.747e-05 [set_forward_comm_id_for_comm_node_pass]: 9.45001e-06 [meta_fg_expand]: 0.00137758 [flash_sp_send_recv_attached]: 3.48e-06 [receive_attached]: 2.38998e-06 [after_resolve]: 
5.874e-05 [a_after_grad]: 8.146e-05 [renormalize]: 0.00242755 [add_forward_monad_depend]: 9.51998e-06 [auto_monad_grad]: 5.09e-06 [auto_monad_eliminator]: 5.642e-05 [cse]: 0.00016881 [a_3]: 0.00033518 [Cycle 2]: 0.00294412, [45] [expand_dump_flag]: 1.64e-06 [switch_simplify]: 4.68e-05 [loop_unroll]: 4.416e-05 [a_1]: 0.00152549 [with_stream_mark]: 1.213e-05 [recompute_prepare]: 1.117e-05 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 4.42e-06 [updatestate_loads_eliminate]: 3.73001e-06 [parameter_eliminate]: 1.00999e-06 [a_2]: 0.00012668 [accelerated_algorithm]: 1.226e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.73997e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 6.54001e-06 [auto_parallel]: 7.18998e-06 [parallel]: 4.89e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 5.74e-06 [allreduce_fusion]: 5.42999e-06 [matmul_add_comm_reduction]: 7.76001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.007e-05 [virtual_dataset]: 9.02e-06 [get_grad_eliminate_]: 8.67e-06 [virtual_output]: 8.42e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 9.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.653e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.409e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34e-06 [meta_fg_expand]: 3.594e-05 [flash_sp_send_recv_attached]: 8.79983e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.501e-05 [a_after_grad]: 1.446e-05 [renormalize]: 0.00057649 [add_forward_monad_depend]: 3.92998e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 1.448e-05 [cse]: 4.503e-05 [a_3]: 6.466e-05 [Cycle 3]: 0.00091083, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 1.073e-05 [loop_unroll]: 8.83001e-06 [a_1]: 0.00026336 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 9.37001e-06 [updatestate_depend_eliminate]: 4.87e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.83999e-06 
[parameter_eliminate]: 9.29984e-07 [a_2]: 0.0001228 [accelerated_algorithm]: 1.179e-05 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 6.69001e-06 [auto_parallel]: 7.1e-06 [parallel]: 4.86002e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 4.80001e-06 [allreduce_fusion]: 4.95001e-06 [matmul_add_comm_reduction]: 7.29001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 9.94001e-06 [virtual_dataset]: 8.58001e-06 [get_grad_eliminate_]: 8.38999e-06 [virtual_output]: 8.22e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 8.53001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.589e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 4.94e-06 [meta_fg_expand]: 3.11001e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.295e-05 [a_after_grad]: 1.516e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 1.149e-05 [cse]: 2.596e-05 [a_3]: 5.869e-05 [py_interpret_to_execute_after_opt_a]: 9.74e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 4.618e-05 [convert_after_rewriter]: 9.51e-06 [order_py_execute_after_rewriter]: 6.58e-06 [mutable_eliminate]: 0.00045648 [opt_b]: 0.0002871, [1] [Cycle 1]: 0.00028116, [7] [b_1]: 0.00018902 [b_2]: 1.045e-05 [updatestate_depend_eliminate]: 7.09001e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.91999e-06 [renormalize]: 3.50003e-07 [cse]: 3.137e-05 [optimize_parallel_all_gather_comm]: 2.017e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 1.949e-05 [loop_unroll]: 0.00042271 [opt_after_cconv]: 0.00013675, [1] [Cycle 1]: 0.00013085, [7] [c_1]: 4.857e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 7.09001e-06 [updatestate_assign_eliminate]: 4.35e-06 
[updatestate_loads_eliminate]: 3.87002e-06 [cse]: 3.065e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.92e-05 [tuple_transform]: 0.000102, [1] [Cycle 1]: 9.742e-05, [4] [d_1]: 6.692e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.019e-05 [partial_unused_args_eliminate]: 2.17999e-06 [add_recomputation]: 5.798e-05 [cse_after_recomputation]: 3.236e-05, [1] [Cycle 1]: 2.76e-05, [1] [cse]: 2.222e-05 [environ_conv]: 8.72e-06 [swap_dp_allreduce_reducescatter]: 7.65e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.69001e-06 [merge_cast_opt]: 1.46002e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.09e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.28002e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.67e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 5.03002e-06 [overlap_recompute_and_grad_model_parallel]: 5.68002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 5.25999e-06 [overlap_grad_flash_sp]: 2.376e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 9.799e-05, [1] [Cycle 1]: 9.344e-05, [6] [build]: 9.74999e-06 [elim_shapecalc]: 1.298e-05 [elim_not_effective]: 1.857e-05 [opt_reshape]: 9.79999e-06 [fold_const_symbol]: 1.471e-05 [renormalize]: 
3.09985e-07 [detach_backward]: 1.76003e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 2.434e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.45003e-06 [opt_after_jit_grad]: 0.00046567 [validate]: 4.482e-05 [backend_pass]: 8.40024e-07 [task_emit]: 0.0794037 [execute]: 8.93002e-06 Sums bootstrap : 0.000539s : 0.52% type_inference : 0.010139s : 9.86% event_method : 0.000043s : 0.04% auto_monad : 0.000115s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000126s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.11% optimize.opt_a.a_1 : 0.003120s : 3.03% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000048s : 0.05% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm 
: 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001417s : 1.38% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.11% optimize.opt_a.renormalize : 0.003004s : 2.92% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000240s : 0.23% optimize.opt_a.a_3 : 0.000459s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000456s : 0.44% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000423s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000466s : 0.45% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079404s : 77.23% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000729 218 5.81% : 0.000042s : 11: substitution.arithmetic_simplify 1.80% : 0.000013s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.57% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.33% : 0.000002s : 2: substitution.incorporate_call_switch 54.99% : 0.000401s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.00% : 0.000015s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.23% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000010s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.81% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.38% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010070 2 86.93% : 0.008755s : 1: type_inference.infer 13.07% : 0.001316s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.21% : 0.000118s : 16: replace.inline 40.79% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.74% : 0.000392s : 16: match.inline 7.26% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.24% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.18% : 0.000009s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.82% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.23% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.71% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.61% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.74% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.64% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001511 32 57.38% : 0.000867s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.62% : 0.000644s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131195 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.30% : 0.003016s : 1: add_attr 2.29% : 0.003007s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000122s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.44% : 0.000574s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.64% : 0.004769s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.21% : 0.010773s : 1: opt_a 0.11% : 0.000140s : 1: opt_after_cconv 0.36% : 0.000475s : 1: opt_after_jit_grad 0.22% : 0.000291s : 1: opt_b 9.92% : 0.013008s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.19% : 0.001565s : 2: renormalize.infer 1.09% : 0.001426s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 60.54% : 0.079422s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.74% : 0.010154s : 1: type_inference 0.05% : 0.000068s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x1-ge],max_mem:56.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x2-pynative],max_mem:56.0M TotalTime = 0.0216083, [24] [bootstrap]: 0.00049007 [type_inference]: 0.00605225 [event_method]: 1.465e-05 [auto_monad]: 5.547e-05 [graph_reusing]: 5.39e-06 [inline]: 1.64998e-06 [add_attr]: 0.00346963, [1] [add_attr_with_inline]: 0.00345888, [1] [Cycle 1]: 4.438e-05, [2] [tag_attr]: 1.509e-05 [meta_addattr_fg_expand]: 4.18001e-06 [parallel-infer-symbol]: 2.89001e-06 [pre_auto_parallel]: 2.813e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00398342, [53] [py_interpret_to_execute]: 2.035e-05 [rewriter_before_opt_a]: 5.87e-05 [opt_a]: 0.00214839, [2] [Cycle 1]: 0.00155152, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 3.115e-05 [loop_unroll]: 2.081e-05 [a_1]: 0.00045427 [with_stream_mark]: 1.278e-05 [recompute_prepare]: 7.96001e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 2.326e-05 [updatestate_loads_eliminate]: 3.21999e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 7.546e-05 [accelerated_algorithm]: 6.74999e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.20002e-06 [merge_send_recv]: 8.38001e-06 [auto_parallel]: 6.21998e-06 [parallel]: 2.362e-05 [flash_sp]: 7.28999e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.18998e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 
6.16e-06 [get_grad_eliminate_]: 5.96e-06 [virtual_output]: 6.04999e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.47999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.18998e-06 [flash_sp_send_recv_attached]: 2.44001e-06 [receive_attached]: 2.31e-06 [after_resolve]: 1.001e-05 [a_after_grad]: 9.14998e-06 [renormalize]: 0.00042775 [add_forward_monad_depend]: 4.79e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.331e-05 [cse]: 2.761e-05 [a_3]: 4.014e-05 [Cycle 2]: 0.0005877, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.38002e-06 [a_1]: 0.00012635 [with_stream_mark]: 9.86e-06 [recompute_prepare]: 5.61003e-06 [updatestate_depend_eliminate]: 2.66e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.766e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.42999e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.28002e-06 [parallel]: 4.84e-06 [flash_sp]: 3.3e-06 [merge_comm]: 2.81e-06 [allreduce_fusion]: 2.51e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 5.67001e-06 [virtual_dataset]: 5.02e-06 [get_grad_eliminate_]: 5.33002e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.29001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.97001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.84999e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 8.74998e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 1.00001e-07 
[add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 5.99e-06 [cse]: 1.653e-05 [a_3]: 3.117e-05 [py_interpret_to_execute_after_opt_a]: 7.31999e-06 [slice_cell_reuse_recomputed_activation]: 1.97001e-06 [rewriter_after_opt_a]: 2.972e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.47001e-06 [mutable_eliminate]: 0.00045287 [opt_b]: 0.00018044, [1] [Cycle 1]: 0.00017443, [7] [b_1]: 0.00010654 [b_2]: 7.30998e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.69997e-07 [cse]: 1.675e-05 [optimize_parallel_all_gather_comm]: 1.517e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.127e-05 [loop_unroll]: 0.00041137 [opt_after_cconv]: 9.524e-05, [1] [Cycle 1]: 8.981e-05, [7] [c_1]: 2.711e-05 [parameter_eliminate]: 2.76e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.644e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.217e-05 [tuple_transform]: 6.825e-05, [1] [Cycle 1]: 6.403e-05, [4] [d_1]: 3.891e-05 [none_parameter_eliminate]: 1.46998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 5.85002e-06 [partial_unused_args_eliminate]: 1.71002e-06 [add_recomputation]: 5.071e-05 [cse_after_recomputation]: 2.08e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.117e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.55997e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.42e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 
8.59989e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 3.51001e-06 [overlap_recompute_and_grad_model_parallel]: 4.38001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.05002e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.40002e-06 [split_layernorm_comm]: 1.66998e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.786e-05, [1] [Cycle 1]: 6.378e-05, [6] [build]: 2.55002e-06 [elim_shapecalc]: 8.17998e-06 [elim_not_effective]: 1.109e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.588e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 0.00011534 [opt_after_jit_grad]: 0.00045528 [validate]: 3.088e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.00661393 [execute]: 6.93e-06 Sums bootstrap : 0.000490s : 2.86% type_inference : 0.006052s : 35.35% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000581s : 3.39% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000025s : 0.15% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000428s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000044s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.64% optimize.opt_b.b_1 : 0.000107s : 0.62% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.12% optimize.loop_unroll : 0.000411s : 2.40% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000115s : 0.67% opt_after_jit_grad : 0.000455s : 2.66% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.006614s : 38.63% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 15.06% : 0.000025s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 66.83% : 0.000110s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.14% : 0.000004s : 4: substitution.replace_old_param 6.68% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006007 2 90.37% : 0.005429s : 1: type_inference.infer 9.63% : 0.000578s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.81% : 0.000028s : 3: replace.inline 29.19% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.59% : 0.000108s : 3: match.inline 8.41% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.35% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.34% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 1.01% : 0.000002s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.88% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.54% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.96% : 0.000002s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.95% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 47.40% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.60% : 0.000196s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030582 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.36% : 0.003474s : 1: add_attr 11.32% : 0.003462s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.06% : 0.000020s : 1: auto_monad_reorder 0.19% : 0.000059s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.72% : 0.000525s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.37% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.08% : 0.000943s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.03% : 0.002151s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.52% : 0.000465s : 1: opt_after_jit_grad 0.60% : 0.000184s : 1: opt_b 13.04% : 0.003987s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.72% : 0.000220s : 1: renormalize.infer 0.66% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000122s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000070s : 1: symbol_engine_optimizer 21.66% : 0.006625s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 19.84% : 0.006067s : 1: type_inference 0.19% : 0.000059s : 1: validate TotalTime = 0.0179923, [24] [bootstrap]: 0.00040692 [type_inference]: 0.00433226 [event_method]: 1.058e-05 [auto_monad]: 5.063e-05 [graph_reusing]: 4.72e-06 [inline]: 2.42001e-06 [add_attr]: 0.00296137, [1] [add_attr_with_inline]: 0.00295294, [1] [Cycle 1]: 4.011e-05, [2] [tag_attr]: 1.186e-05 [meta_addattr_fg_expand]: 3.09999e-06 [parallel-infer-symbol]: 2.68003e-06 [pre_auto_parallel]: 2.169e-05 [insert-virtual-dataset]: 2.41e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.96998e-06 [optimize]: 0.00365545, [53] [py_interpret_to_execute]: 1.575e-05 [rewriter_before_opt_a]: 3.835e-05 [opt_a]: 0.00184961, [2] [Cycle 1]: 0.00125268, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 2.525e-05 [loop_unroll]: 1.353e-05 [a_1]: 0.00029316 [with_stream_mark]: 1.365e-05 [recompute_prepare]: 7.68999e-06 [updatestate_depend_eliminate]: 3.97e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 6.15002e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 5.54998e-06 [parallel]: 1.846e-05 [flash_sp]: 6.91999e-06 [merge_comm]: 3.46001e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.40999e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.15e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.37999e-06 [virtual_output]: 5.57999e-06 [merge_forward]: 3.82002e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.49e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.078e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.50001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.35002e-06 [receive_attached]: 
2.36998e-06 [after_resolve]: 1.062e-05 [a_after_grad]: 9.27999e-06 [renormalize]: 0.00034248 [add_forward_monad_depend]: 4.74e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 2.766e-05 [a_3]: 3.986e-05 [Cycle 2]: 0.00058799, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.25999e-06 [a_1]: 0.00012417 [with_stream_mark]: 9.75002e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.93998e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.674e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.41998e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.01002e-06 [parallel]: 4.05e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.07e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01999e-06 [meta_fg_expand]: 1.55001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.93002e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 7.59988e-07 [auto_monad_eliminator]: 6.09001e-06 [cse]: 1.286e-05 [a_3]: 3.19e-05 [py_interpret_to_execute_after_opt_a]: 7.33999e-06 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 3.035e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00044387 [opt_b]: 0.00018112, [1] [Cycle 1]: 
0.00017521, [7] [b_1]: 0.00010736 [b_2]: 6.74001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 3.80009e-07 [cse]: 1.7e-05 [optimize_parallel_all_gather_comm]: 1.532e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.2e-05 [loop_unroll]: 0.00041143 [opt_after_cconv]: 9.406e-05, [1] [Cycle 1]: 8.852e-05, [7] [c_1]: 2.756e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.538e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.222e-05 [tuple_transform]: 6.896e-05, [1] [Cycle 1]: 6.484e-05, [4] [d_1]: 3.918e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.303e-05 [cse_after_recomputation]: 1.966e-05, [1] [Cycle 1]: 1.535e-05, [1] [cse]: 1.023e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 4.86002e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.36e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.58e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.30001e-06 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.137e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.26001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.50999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.00002e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 2.619e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 7.021e-05, [1] [Cycle 1]: 6.611e-05, [6] [build]: 2.95002e-06 [elim_shapecalc]: 8.55999e-06 [elim_not_effective]: 1.228e-05 [opt_reshape]: 6.01998e-06 [fold_const_symbol]: 9.06998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.59998e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.534e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.66001e-06 [opt_after_jit_grad]: 0.00044071 [validate]: 3.081e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00584457 [execute]: 6.56e-06 Sums bootstrap : 0.000407s : 2.89% type_inference : 0.004332s : 30.77% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.23% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.96% optimize.opt_a.with_stream_mark : 0.000023s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000343s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000030s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000444s : 3.15% optimize.opt_b.b_1 : 0.000107s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000411s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000026s : 0.19% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.09% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000441s : 3.13% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005845s : 41.51% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.15% : 0.000022s : 4: substitution.arithmetic_simplify 1.64% : 0.000002s : 2: substitution.elim_not_effective 1.21% : 0.000001s : 2: substitution.fold_const_symbol 4.21% : 0.000005s : 4: substitution.graph_param_transform 65.93% : 0.000081s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.48% : 0.000004s : 4: substitution.remove_not_recompute_node 3.10% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004292 2 92.01% : 0.003949s : 1: type_inference.infer 7.99% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 1.00% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.79% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.13% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.88% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 1.17% : 0.000002s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.67% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.50% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.44% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 42.04% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.96% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025877 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.46% : 0.002966s : 1: add_attr 11.43% : 0.002957s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.71% : 0.000442s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000453s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.97% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.13% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.16% : 0.001853s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.74% : 0.000450s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.14% : 0.003659s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.12% : 0.000030s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000187s : 1: renormalize.infer 0.57% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000073s : 1: symbol_engine_optimizer 22.62% : 0.005854s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.79% : 0.004345s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0194385, [24] [bootstrap]: 0.00045908 [type_inference]: 0.00543992 [event_method]: 1.407e-05 [auto_monad]: 5.53e-05 [graph_reusing]: 5.71e-06 [inline]: 1.88002e-06 [add_attr]: 0.0029621, [1] [add_attr_with_inline]: 0.00295383, [1] [Cycle 1]: 4.593e-05, [2] [tag_attr]: 1.552e-05 [meta_addattr_fg_expand]: 4.48001e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.51e-05 [insert-virtual-dataset]: 2.68998e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.59e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00392314, [53] [py_interpret_to_execute]: 2.023e-05 [rewriter_before_opt_a]: 5.617e-05 [opt_a]: 0.00207494, [2] [Cycle 1]: 0.00148081, [45] [expand_dump_flag]: 3.09999e-06 [switch_simplify]: 3.114e-05 [loop_unroll]: 2.041e-05 [a_1]: 0.00044324 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 4.20999e-06 [updatestate_assign_eliminate]: 3.12002e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.541e-05 [accelerated_algorithm]: 6.36998e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 6.28e-06 [parallel]: 1.673e-05 [flash_sp]: 7.07002e-06 [merge_comm]: 3.77002e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 9.03002e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.11001e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.64002e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 8.70001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.47001e-06 
[after_resolve]: 1.074e-05 [a_after_grad]: 8.77999e-06 [renormalize]: 0.0004069 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 1.94999e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.706e-05 [a_3]: 4.028e-05 [Cycle 2]: 0.00058467, [45] [expand_dump_flag]: 9.49978e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012354 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.723e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.09998e-06 [parallel]: 4.13999e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.59001e-06 [matmul_add_comm_reduction]: 5.32999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.81e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.86997e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 5.82001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.21002e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 2.97002e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 7.66999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.19972e-07 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.535e-05 [a_3]: 3.119e-05 [py_interpret_to_execute_after_opt_a]: 8.20999e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.058e-05 [convert_after_rewriter]: 7.01999e-06 [order_py_execute_after_rewriter]: 4.80999e-06 [mutable_eliminate]: 0.00044767 [opt_b]: 0.00018142, [1] [Cycle 1]: 0.00017498, 
[7] [b_1]: 0.00010875 [b_2]: 6.69999e-06 [updatestate_depend_eliminate]: 4.94998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 4.2998e-07 [cse]: 1.576e-05 [optimize_parallel_all_gather_comm]: 1.667e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.287e-05 [loop_unroll]: 0.00044466 [opt_after_cconv]: 9.257e-05, [1] [Cycle 1]: 8.707e-05, [7] [c_1]: 2.684e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.518e-05 [renormalize]: 5.09986e-07 [remove_dup_value]: 1.291e-05 [tuple_transform]: 6.946e-05, [1] [Cycle 1]: 6.511e-05, [4] [d_1]: 3.903e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 4.257e-05 [cse_after_recomputation]: 1.991e-05, [1] [Cycle 1]: 1.557e-05, [1] [cse]: 1.05e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.04998e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.49999e-06 [reorder_send_recv_between_fp_bp]: 2.34999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.098e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.04997e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.81001e-06 [overlap_grad_flash_sp]: 1.683e-05 [begin_end_overlap_inline]: 5.40022e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.54e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.691e-05, [1] [Cycle 1]: 6.287e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 8.23999e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 9.05001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.485e-05 [get_jit_bprop_graph]: 9.69972e-07 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00044772 [validate]: 3.086e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0058437 [execute]: 6.61e-06 Sums bootstrap : 0.000459s : 2.95% type_inference : 0.005440s : 35.01% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000056s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000567s : 3.65% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000407s : 2.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.88% optimize.opt_b.b_1 : 0.000109s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000445s : 2.86% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000448s : 2.88% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005844s : 37.61% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 15.07% : 0.000025s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.39% : 0.000006s : 4: substitution.graph_param_transform 66.02% : 0.000109s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.93% : 0.000005s : 4: substitution.remove_not_recompute_node 2.73% : 0.000005s : 4: substitution.replace_old_param 6.41% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005400 2 90.02% : 0.004861s : 1: type_inference.infer 9.98% : 0.000539s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.03% : 0.000027s : 3: replace.inline 29.97% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.81% : 0.000107s : 3: match.inline 8.19% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.06% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000009s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.73% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 1.06% : 0.000002s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000337 8 46.54% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.46% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027816 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.66% : 0.002966s : 1: add_attr 10.63% : 0.002957s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000060s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000493s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000453s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.000928s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.47% : 0.002078s : 1: opt_a 0.35% : 0.000096s : 1: opt_after_cconv 1.64% : 0.000457s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 14.12% : 0.003927s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.09% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000207s : 1: renormalize.infer 0.70% : 0.000194s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000060s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000069s : 1: symbol_engine_optimizer 21.04% : 0.005853s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.61% : 0.005453s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.037054, [24] [bootstrap]: 0.0004634 [type_inference]: 0.0111935 [event_method]: 4.807e-05 [auto_monad]: 0.00012012 [graph_reusing]: 8.27e-06 [inline]: 2.04e-06 [add_attr]: 0.00296392, [1] [add_attr_with_inline]: 0.00295496, [1] [Cycle 1]: 7.117e-05, [2] [tag_attr]: 3.467e-05 [meta_addattr_fg_expand]: 9.39e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 4.913e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0131811, [53] [py_interpret_to_execute]: 3.872e-05 [rewriter_before_opt_a]: 0.00014499 [opt_a]: 0.0108768, [3] [Cycle 1]: 0.00698538, [45] [expand_dump_flag]: 3.69002e-06 [switch_simplify]: 7.283e-05 [loop_unroll]: 6.092e-05 [a_1]: 0.00147702 [with_stream_mark]: 2.254e-05 [recompute_prepare]: 2.115e-05 [updatestate_depend_eliminate]: 9.04e-06 [updatestate_assign_eliminate]: 7.61999e-06 [updatestate_loads_eliminate]: 7.36001e-06 [parameter_eliminate]: 2.64001e-06 [a_2]: 0.00024258 [accelerated_algorithm]: 3.019e-05 [shard]: 1.71998e-06 [meta_shard_fg_expand]: 3.33998e-06 [shard_inline]: 1.61e-05 [merge_send_recv]: 1.633e-05 [auto_parallel]: 1.112e-05 [parallel]: 1.896e-05 [flash_sp]: 1.108e-05 [merge_comm]: 9.62001e-06 [allreduce_fusion]: 8.87999e-06 [matmul_add_comm_reduction]: 2.615e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.744e-05 [virtual_dataset]: 1.553e-05 [get_grad_eliminate_]: 1.487e-05 [virtual_output]: 1.495e-05 [merge_forward]: 9.10999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.714e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.847e-05 [merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 2.703e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91998e-06 [meta_fg_expand]: 0.00139057 [flash_sp_send_recv_attached]: 3.91999e-06 [receive_attached]: 2.48002e-06 [after_resolve]: 
5.943e-05 [a_after_grad]: 8.097e-05 [renormalize]: 0.00236919 [add_forward_monad_depend]: 9.04e-06 [auto_monad_grad]: 5.42999e-06 [auto_monad_eliminator]: 5.614e-05 [cse]: 0.000161 [a_3]: 0.000336 [Cycle 2]: 0.00298036, [45] [expand_dump_flag]: 1.45999e-06 [switch_simplify]: 4.699e-05 [loop_unroll]: 4.383e-05 [a_1]: 0.00152654 [with_stream_mark]: 1.195e-05 [recompute_prepare]: 1.099e-05 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012559 [accelerated_algorithm]: 1.189e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 6.89001e-06 [auto_parallel]: 7.47998e-06 [parallel]: 4.85999e-06 [flash_sp]: 3.23e-06 [merge_comm]: 5.10001e-06 [allreduce_fusion]: 4.60001e-06 [matmul_add_comm_reduction]: 7.63999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 1.004e-05 [virtual_dataset]: 8.82e-06 [get_grad_eliminate_]: 9.19e-06 [virtual_output]: 8.52e-06 [merge_forward]: 4.13001e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.84998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.631e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14998e-06 [meta_fg_expand]: 6.853e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.19e-06 [after_resolve]: 1.61e-05 [a_after_grad]: 1.446e-05 [renormalize]: 0.00057748 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.18001e-06 [auto_monad_eliminator]: 1.429e-05 [cse]: 4.524e-05 [a_3]: 6.561e-05 [Cycle 3]: 0.00089694, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.079e-05 [loop_unroll]: 8.84e-06 [a_1]: 0.00024832 [with_stream_mark]: 9.96e-06 [recompute_prepare]: 9.21998e-06 [updatestate_depend_eliminate]: 4.74998e-06 [updatestate_assign_eliminate]: 3.81001e-06 [updatestate_loads_eliminate]: 3.75998e-06 
[parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012342 [accelerated_algorithm]: 1.167e-05 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.91997e-06 [merge_send_recv]: 6.97002e-06 [auto_parallel]: 7.17002e-06 [parallel]: 4.55999e-06 [flash_sp]: 1.15001e-06 [merge_comm]: 5.02999e-06 [allreduce_fusion]: 4.95999e-06 [matmul_add_comm_reduction]: 7.67002e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 9.91e-06 [virtual_dataset]: 8.73001e-06 [get_grad_eliminate_]: 8.3e-06 [virtual_output]: 8.15e-06 [merge_forward]: 4.32998e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 8.45999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.569e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.406e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 2.84001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 1.453e-05 [a_after_grad]: 1.436e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.089e-05 [cse]: 2.539e-05 [a_3]: 5.905e-05 [py_interpret_to_execute_after_opt_a]: 9.71e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 4.621e-05 [convert_after_rewriter]: 9.59e-06 [order_py_execute_after_rewriter]: 6.51e-06 [mutable_eliminate]: 0.00045725 [opt_b]: 0.00028486, [1] [Cycle 1]: 0.00027857, [7] [b_1]: 0.00018741 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.03e-06 [updatestate_assign_eliminate]: 3.97e-06 [updatestate_loads_eliminate]: 4.17e-06 [renormalize]: 5.90022e-07 [cse]: 3.064e-05 [optimize_parallel_all_gather_comm]: 2.031e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 1.991e-05 [loop_unroll]: 0.00042138 [opt_after_cconv]: 0.00013431, [1] [Cycle 1]: 0.00012829, [7] [c_1]: 4.794e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 6.98e-06 [updatestate_assign_eliminate]: 4.15999e-06 
[updatestate_loads_eliminate]: 4.13999e-06 [cse]: 2.908e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.79e-05 [tuple_transform]: 0.00010204, [1] [Cycle 1]: 9.712e-05, [4] [d_1]: 6.74e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 9.94001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 0.0001097 [cse_after_recomputation]: 3.277e-05, [1] [Cycle 1]: 2.809e-05, [1] [cse]: 2.256e-05 [environ_conv]: 9.23002e-06 [swap_dp_allreduce_reducescatter]: 7.77e-06 [bias_add_comm_swap]: 2.84999e-06 [label_micro_interleaved_index]: 4.44998e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.44998e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.21997e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.80997e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.673e-05 [grouped_pairwise_exchange_alltoall]: 1.40001e-06 [offloading_packed_experts]: 5.14998e-06 [overlap_recompute_and_grad_model_parallel]: 5.94999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 5.44e-06 [overlap_grad_flash_sp]: 2.339e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.14e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 9.868e-05, [1] [Cycle 1]: 9.455e-05, [6] [build]: 8.67e-06 [elim_shapecalc]: 1.374e-05 [elim_not_effective]: 1.828e-05 [opt_reshape]: 1.031e-05 [fold_const_symbol]: 1.512e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 2.474e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.26001e-06 [opt_after_jit_grad]: 0.00047162 [validate]: 4.439e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00823189 [execute]: 6.76e-06 Sums bootstrap : 0.000463s : 1.41% type_inference : 0.011193s : 34.10% event_method : 0.000048s : 0.15% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000145s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003252s : 9.91% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001462s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.002947s : 8.98% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000232s : 0.71% optimize.opt_a.a_3 : 0.000461s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000457s : 1.39% optimize.opt_b.b_1 : 0.000187s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000421s : 1.28% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000110s : 0.33% optimize.cse_after_recomputation.cse : 0.000023s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000472s : 1.44% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008232s : 25.08% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000749 222 5.96% : 0.000045s : 12: substitution.arithmetic_simplify 1.79% : 0.000013s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.71% : 0.000417s : 17: substitution.inline 2.06% : 0.000015s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000014s : 3: substitution.less_batch_normalization 1.67% : 0.000012s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.80% : 0.000013s : 20: substitution.remove_not_recompute_node 3.16% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000010s : 15: substitution.replace_old_param 0.36% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.65% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011121 2 87.01% : 0.009677s : 1: type_inference.infer 12.99% : 0.001444s : 1: type_inference.specialize ------[replace.] 0.000260 33 64.45% : 0.000167s : 17: replace.inline 35.55% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 33 92.40% : 0.000409s : 17: match.inline 7.60% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.12% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.12% : 0.000016s : 100: predicate.arithmetic_simplify 1.13% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 2.06% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 152: predicate.replace_applicator 0.58% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.98% : 
0.000037s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.92% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.66% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001540 34 57.31% : 0.000883s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.69% : 0.000658s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061384 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.83% : 0.002968s : 1: add_attr 4.82% : 0.002959s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000115s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.81% : 0.000494s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.00% : 0.004908s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000173s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.72% : 0.010880s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000481s : 1: opt_after_jit_grad 0.47% : 0.000288s : 1: opt_b 21.48% : 0.013185s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.58% : 0.001585s : 2: renormalize.infer 2.20% : 0.001350s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.43% : 0.008241s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.30% : 0.011236s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0184927, [24] [bootstrap]: 0.00046816 [type_inference]: 0.00429849 [event_method]: 1.063e-05 [auto_monad]: 5.013e-05 [graph_reusing]: 5.51e-06 [inline]: 1.74998e-06 [add_attr]: 0.0030054, [1] [add_attr_with_inline]: 0.00299755, [1] [Cycle 1]: 4.532e-05, [2] [tag_attr]: 1.221e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.066e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.04999e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00369059, [53] [py_interpret_to_execute]: 1.545e-05 [rewriter_before_opt_a]: 3.827e-05 [opt_a]: 0.00184152, [2] [Cycle 1]: 0.0012422, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 2.381e-05 [loop_unroll]: 1.337e-05 [a_1]: 0.00029067 [with_stream_mark]: 1.319e-05 [recompute_prepare]: 7.88999e-06 [updatestate_depend_eliminate]: 3.54002e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.94001e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.661e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 2.37001e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.99002e-06 [auto_parallel]: 5.85002e-06 [parallel]: 1.764e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 8.54e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 6.83e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.77999e-06 [merge_forward]: 3.56001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 8.94998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 1.97001e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.37001e-06 
[after_resolve]: 1.13e-05 [a_after_grad]: 8.64003e-06 [renormalize]: 0.00033593 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.275e-05 [cse]: 2.654e-05 [a_3]: 4.011e-05 [Cycle 2]: 0.00058999, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 6.73998e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012513 [with_stream_mark]: 9.19998e-06 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.891e-05 [accelerated_algorithm]: 5.30999e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.21002e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 4.97999e-06 [parallel]: 4.17e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.39e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 5.71e-06 [virtual_dataset]: 5.14998e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 6.06998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.31e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.83999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 8.40999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.11002e-06 [auto_monad_grad]: 9.19972e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.226e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 7.82998e-06 [slice_cell_reuse_recomputed_activation]: 2.31998e-06 [rewriter_after_opt_a]: 3.119e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 4.92999e-06 [mutable_eliminate]: 0.00044845 [opt_b]: 0.00022575, [1] [Cycle 1]: 0.00017355, 
[7] [b_1]: 0.00010719 [b_2]: 6.91999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.60014e-07 [cse]: 1.567e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.225e-05 [loop_unroll]: 0.00041596 [opt_after_cconv]: 9.53e-05, [1] [Cycle 1]: 8.922e-05, [7] [c_1]: 2.824e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 4.69002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.41e-06 [cse]: 1.606e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.217e-05 [tuple_transform]: 6.935e-05, [1] [Cycle 1]: 6.463e-05, [4] [d_1]: 3.901e-05 [none_parameter_eliminate]: 1.46998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.33998e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.215e-05 [cse_after_recomputation]: 1.947e-05, [1] [Cycle 1]: 1.529e-05, [1] [cse]: 1.03e-05 [environ_conv]: 4.58999e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.94001e-06 [label_micro_interleaved_index]: 4.57e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.08002e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 1.15999e-06 [remove_cast_before_assign_add]: 1.39e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.63998e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 1.109e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.49001e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 3.76001e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 2.21e-06 [handle_group_info]: 1.29003e-06 [symbol_engine_optimizer]: 6.705e-05, [1] [Cycle 1]: 6.276e-05, [6] [build]: 2.17999e-06 [elim_shapecalc]: 7.88999e-06 [elim_not_effective]: 1.113e-05 [opt_reshape]: 6.20997e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.50999e-06 [auto_monad_reorder]: 1.517e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.0004515 [validate]: 2.95e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.0062273 [execute]: 6.97997e-06 Sums bootstrap : 0.000468s : 3.23% type_inference : 0.004298s : 29.67% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000416s : 2.87% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000336s : 2.32% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.10% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.02% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 3.12% validate : 0.000029s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006227s : 42.98% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 17.99% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000001s : 2: substitution.fold_const_symbol 4.29% : 0.000005s : 4: substitution.graph_param_transform 65.78% : 0.000080s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.56% : 0.000004s : 4: substitution.remove_not_recompute_node 3.70% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004259 2 91.82% : 0.003911s : 1: type_inference.infer 8.18% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.32% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.89% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.86% : 0.000007s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 40.90% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.10% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026448 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.38% : 0.003010s : 1: add_attr 11.35% : 0.003001s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.90% : 0.000502s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000008s : 1: label_micro_interleaved_index 1.60% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.97% : 0.001844s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.74% : 0.000461s : 1: opt_after_jit_grad 0.87% : 0.000230s : 1: opt_b 13.97% : 0.003694s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.69% : 0.000183s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.58% : 0.006237s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.30% : 0.004312s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0357741, [24] [bootstrap]: 0.0005368 [type_inference]: 0.010164 [event_method]: 4.046e-05 [auto_monad]: 0.00011518 [graph_reusing]: 7.9e-06 [inline]: 2.06e-06 [add_attr]: 0.00297218, [1] [add_attr_with_inline]: 0.00296404, [1] [Cycle 1]: 6.476e-05, [2] [tag_attr]: 3.094e-05 [meta_addattr_fg_expand]: 8.69e-06 [parallel-infer-symbol]: 2.71999e-06 [pre_auto_parallel]: 4.545e-05 [insert-virtual-dataset]: 2.64999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0129443, [53] [py_interpret_to_execute]: 3.722e-05 [rewriter_before_opt_a]: 0.00012482 [opt_a]: 0.010718, [3] [Cycle 1]: 0.00682625, [45] [expand_dump_flag]: 3.58999e-06 [switch_simplify]: 6.629e-05 [loop_unroll]: 5.434e-05 [a_1]: 0.00133185 [with_stream_mark]: 2.243e-05 [recompute_prepare]: 2.182e-05 [updatestate_depend_eliminate]: 8.92e-06 [updatestate_assign_eliminate]: 7.78001e-06 [updatestate_loads_eliminate]: 7.53e-06 [parameter_eliminate]: 2.41998e-06 [a_2]: 0.00024416 [accelerated_algorithm]: 3.094e-05 [shard]: 2.06998e-06 [meta_shard_fg_expand]: 3.26001e-06 [shard_inline]: 1.589e-05 [merge_send_recv]: 1.57e-05 [auto_parallel]: 1.1e-05 [parallel]: 1.846e-05 [flash_sp]: 1.074e-05 [merge_comm]: 9.78002e-06 [allreduce_fusion]: 9.07001e-06 [matmul_add_comm_reduction]: 2.722e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 1.78e-05 [virtual_dataset]: 3.826e-05 [get_grad_eliminate_]: 1.646e-05 [virtual_output]: 1.522e-05 [merge_forward]: 9.56e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 1.814e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.50999e-06 [before_grad]: 2.726e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67001e-06 [meta_fg_expand]: 0.00136116 [flash_sp_send_recv_attached]: 3.67998e-06 [receive_attached]: 2.41998e-06 [after_resolve]: 
5.899e-05 [a_after_grad]: 8.13e-05 [renormalize]: 0.00237331 [add_forward_monad_depend]: 8.92e-06 [auto_monad_grad]: 5.07e-06 [auto_monad_eliminator]: 5.522e-05 [cse]: 0.00016152 [a_3]: 0.00033542 [Cycle 2]: 0.00293554, [45] [expand_dump_flag]: 1.53002e-06 [switch_simplify]: 4.665e-05 [loop_unroll]: 4.43e-05 [a_1]: 0.00153187 [with_stream_mark]: 1.179e-05 [recompute_prepare]: 1.114e-05 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 4.27998e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.04003e-06 [a_2]: 0.00012664 [accelerated_algorithm]: 1.191e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 9.06002e-06 [merge_send_recv]: 6.49001e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.85001e-06 [flash_sp]: 3.08e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.57e-06 [matmul_add_comm_reduction]: 7.49002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.024e-05 [virtual_dataset]: 8.79e-06 [get_grad_eliminate_]: 9.08002e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.12e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 8.99998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.602e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.402e-05 [set_forward_comm_id_for_comm_node_pass]: 5.02999e-06 [meta_fg_expand]: 3.396e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.489e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00057155 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.31002e-06 [auto_monad_eliminator]: 1.459e-05 [cse]: 4.529e-05 [a_3]: 6.478e-05 [Cycle 3]: 0.00094251, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 1.024e-05 [loop_unroll]: 8.92e-06 [a_1]: 0.00025317 [with_stream_mark]: 9.59999e-06 [recompute_prepare]: 1.006e-05 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 4.12998e-06 [updatestate_loads_eliminate]: 3.82998e-06 [parameter_eliminate]: 
1.15001e-06 [a_2]: 0.00012477 [accelerated_algorithm]: 1.157e-05 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 6.87002e-06 [auto_parallel]: 7.26001e-06 [parallel]: 4.57998e-06 [flash_sp]: 1.02998e-06 [merge_comm]: 4.94998e-06 [allreduce_fusion]: 4.94003e-06 [matmul_add_comm_reduction]: 7.55e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.008e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.18999e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 8.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.614e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.13002e-06 [meta_fg_expand]: 2.98e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.335e-05 [a_after_grad]: 1.402e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.33002e-06 [auto_monad_grad]: 1.09e-06 [auto_monad_eliminator]: 1.047e-05 [cse]: 2.568e-05 [a_3]: 5.682e-05 [py_interpret_to_execute_after_opt_a]: 1.033e-05 [slice_cell_reuse_recomputed_activation]: 2.56998e-06 [rewriter_after_opt_a]: 4.751e-05 [convert_after_rewriter]: 9.09e-06 [order_py_execute_after_rewriter]: 6.79001e-06 [mutable_eliminate]: 0.00045381 [opt_b]: 0.00028836, [1] [Cycle 1]: 0.00028233, [7] [b_1]: 0.00019007 [b_2]: 1.077e-05 [updatestate_depend_eliminate]: 7.05e-06 [updatestate_assign_eliminate]: 4.05998e-06 [updatestate_loads_eliminate]: 3.9e-06 [renormalize]: 3.60014e-07 [cse]: 3.092e-05 [optimize_parallel_all_gather_comm]: 2.019e-05 [overlap_param_gather]: 1.99e-06 [cconv]: 1.975e-05 [loop_unroll]: 0.00042036 [opt_after_cconv]: 0.00013555, [1] [Cycle 1]: 0.00012971, [7] [c_1]: 4.855e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 6.97002e-06 [updatestate_assign_eliminate]: 4.38001e-06 [updatestate_loads_eliminate]: 4.19002e-06 [cse]: 
2.965e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 2.864e-05 [tuple_transform]: 0.00010038, [1] [Cycle 1]: 9.585e-05, [4] [d_1]: 6.623e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 9.69999e-06 [partial_unused_args_eliminate]: 1.74998e-06 [add_recomputation]: 5.688e-05 [cse_after_recomputation]: 3.169e-05, [1] [Cycle 1]: 2.697e-05, [1] [cse]: 2.155e-05 [environ_conv]: 8.62e-06 [swap_dp_allreduce_reducescatter]: 7.56001e-06 [bias_add_comm_swap]: 2.20002e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.27999e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.33998e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.86999e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 8.49977e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.74e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.66e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.01003e-06 [overlap_grad_ring_attention]: 5.14e-06 [overlap_grad_flash_sp]: 2.375e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.68002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.741e-05, [1] [Cycle 1]: 9.318e-05, [6] [build]: 9.29e-06 [elim_shapecalc]: 1.281e-05 [elim_not_effective]: 1.832e-05 [opt_reshape]: 1.007e-05 [fold_const_symbol]: 1.482e-05 [renormalize]: 2.10013e-07 [detach_backward]: 1.71e-06 
[pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 2.528e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.66999e-06 [opt_after_jit_grad]: 0.00046546 [validate]: 4.314e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.00818344 [execute]: 6.79999e-06 Sums bootstrap : 0.000537s : 1.70% type_inference : 0.010164s : 32.24% event_method : 0.000040s : 0.13% auto_monad : 0.000115s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.12% optimize.rewriter_before_opt_a : 0.000125s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003117s : 9.89% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.14% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000056s : 0.18% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.11% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001398s : 4.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002945s : 9.34% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000232s : 0.74% optimize.opt_a.a_3 : 0.000457s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000454s : 1.44% optimize.opt_b.b_1 : 0.000190s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000420s : 1.33% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000465s : 1.48% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008183s : 25.96% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000722 218 5.89% : 0.000043s : 11: substitution.arithmetic_simplify 1.97% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.65% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.33% : 0.000002s : 2: substitution.incorporate_call_switch 54.60% : 0.000394s : 16: substitution.inline 2.18% : 0.000016s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.16% : 0.000016s : 3: substitution.less_batch_normalization 1.82% : 0.000013s : 11: substitution.minmaximum_grad 0.79% : 0.000006s : 5: substitution.partial_eliminate 1.81% : 0.000013s : 20: substitution.remove_not_recompute_node 3.15% : 0.000023s : 10: substitution.replace_applicator 1.43% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.70% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.44% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.50% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010097 2 87.43% : 0.008828s : 1: type_inference.infer 12.57% : 0.001269s : 1: type_inference.specialize ------[replace.] 0.000198 30 58.70% : 0.000116s : 16: replace.inline 41.30% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000416 30 92.78% : 0.000386s : 16: match.inline 7.22% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000733 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 67: predicate.addn_zero_filter 1.08% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.15% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.44% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.85% : 0.000014s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.33% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.59% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 164: predicate.load_eliminater 0.36% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.17% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.11% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.66% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001511 32 59.30% : 0.000896s : 12: func_graph_cloner_run.FuncGraphClonerGraph 40.70% : 0.000615s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059751 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.98% : 0.002977s : 1: add_attr 4.97% : 0.002968s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000122s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.95% : 0.000570s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.77% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.01% : 0.004787s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.94% : 0.010721s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.79% : 0.000475s : 1: opt_after_jit_grad 0.49% : 0.000292s : 1: opt_b 21.67% : 0.012948s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000033s : 1: remove_dup_value 2.59% : 0.001548s : 2: renormalize.infer 2.32% : 0.001386s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000129s : 1: rewriter_before_opt_a 0.01% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.71% : 0.008193s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.04% : 0.010179s : 1: type_inference 0.12% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x2-kbk],max_mem:56.0M TotalTime = 0.117658, [24] [bootstrap]: 0.00062079 [type_inference]: 0.0060601 [event_method]: 1.34e-05 [auto_monad]: 5.511e-05 [graph_reusing]: 5.87001e-06 [inline]: 1.55999e-06 [add_attr]: 0.00348864, [1] [add_attr_with_inline]: 0.00347697, [1] [Cycle 1]: 4.498e-05, [2] [tag_attr]: 1.545e-05 [meta_addattr_fg_expand]: 4.22e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.837e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.76998e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00397104, [53] [py_interpret_to_execute]: 1.946e-05 [rewriter_before_opt_a]: 5.622e-05 [opt_a]: 0.00212501, [2] [Cycle 1]: 0.00150316, [45] [expand_dump_flag]: 2.95998e-06 [switch_simplify]: 3.121e-05 [loop_unroll]: 2.146e-05 [a_1]: 0.00044971 [with_stream_mark]: 1.594e-05 [recompute_prepare]: 7.98001e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.29001e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.651e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 5.68997e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 5.57001e-06 [parallel]: 2.255e-05 [flash_sp]: 6.98e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 9.00001e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.63999e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.79999e-06 [merge_forward]: 3.46001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 
[merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.30998e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.84001e-06 [receive_attached]: 2.53998e-06 [after_resolve]: 9.89001e-06 [a_after_grad]: 8.74e-06 [renormalize]: 0.00040926 [add_forward_monad_depend]: 4.48999e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.309e-05 [cse]: 2.626e-05 [a_3]: 4.072e-05 [Cycle 2]: 0.00061262, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.60001e-06 [a_1]: 0.00012494 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.66998e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.727e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.38002e-06 [merge_send_recv]: 4.55999e-06 [auto_parallel]: 5.25001e-06 [parallel]: 4.38999e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.09998e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.66999e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 5.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.42e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.05999e-06 [a_after_grad]: 8.07e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.20001e-07 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.235e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 7.97e-06 
[slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.566e-05 [convert_after_rewriter]: 1.076e-05 [order_py_execute_after_rewriter]: 5.82001e-06 [mutable_eliminate]: 0.0004525 [opt_b]: 0.00018096, [1] [Cycle 1]: 0.00017496, [7] [b_1]: 0.00010734 [b_2]: 7.09001e-06 [updatestate_depend_eliminate]: 5.04998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 3.39991e-07 [cse]: 1.604e-05 [optimize_parallel_all_gather_comm]: 1.967e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.161e-05 [loop_unroll]: 0.0004128 [opt_after_cconv]: 9.308e-05, [1] [Cycle 1]: 8.766e-05, [7] [c_1]: 2.707e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.574e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.276e-05 [tuple_transform]: 6.835e-05, [1] [Cycle 1]: 6.402e-05, [4] [d_1]: 3.868e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.92001e-06 [add_recomputation]: 4.816e-05 [cse_after_recomputation]: 2.016e-05, [1] [Cycle 1]: 1.575e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.52e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.54001e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.59998e-06 [slice_recompute_activation]: 2.73998e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.27001e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.134e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.15e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.691e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.40999e-06 [symbol_engine_optimizer]: 6.8e-05, [1] [Cycle 1]: 6.403e-05, [6] [build]: 2.32001e-06 [elim_shapecalc]: 8.20999e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 8.98002e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.512e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00045446 [validate]: 3.022e-05 [backend_pass]: 1.07e-06 [task_emit]: 0.102675 [execute]: 9.04e-06 Sums bootstrap : 0.000621s : 0.55% type_inference : 0.006060s : 5.35% event_method : 0.000013s : 0.01% auto_monad : 0.000055s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000056s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 
0.000575s : 0.51% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000409s : 0.36% 
optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.03% optimize.opt_a.a_3 : 0.000073s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000036s : 0.03% optimize.convert_after_rewriter : 0.000011s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.40% optimize.opt_b.b_1 : 0.000107s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000413s : 0.36% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse 
: 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000454s : 0.40% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.102675s : 90.72% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 15.45% : 0.000025s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.23% : 0.000005s : 4: substitution.graph_param_transform 66.34% : 0.000108s : 3: substitution.inline 1.80% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.76% : 0.000004s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006014 2 90.83% : 0.005462s : 1: type_inference.infer 9.17% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.31% : 0.000027s : 3: replace.inline 30.69% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.91% : 0.000106s : 3: match.inline 8.09% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 1.10% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.45% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.45% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.91% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.83% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 47.58% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.42% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.126614 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.76% : 0.003493s : 1: add_attr 2.75% : 0.003481s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.52% : 0.000656s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000014s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.74% : 0.000940s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.68% : 0.002128s : 1: opt_a 0.08% : 0.000096s : 1: opt_after_cconv 0.37% : 0.000464s : 1: opt_after_jit_grad 0.15% : 0.000184s : 1: opt_b 3.14% : 0.003975s : 1: optimize 0.02% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000211s : 1: renormalize.infer 0.15% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000040s : 1: rewriter_after_opt_a 0.05% : 0.000060s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 81.11% : 0.102697s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 4.80% : 0.006073s : 1: type_inference 0.04% : 0.000056s : 1: validate TotalTime = 0.112407, [24] [bootstrap]: 0.00048917 [type_inference]: 0.00452588 [event_method]: 1.102e-05 [auto_monad]: 5.256e-05 [graph_reusing]: 5.44998e-06 [inline]: 1.96e-06 [add_attr]: 0.00305528, [1] [add_attr_with_inline]: 0.00304755, [1] [Cycle 1]: 4.541e-05, [2] [tag_attr]: 1.192e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.126e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00368288, [53] [py_interpret_to_execute]: 1.541e-05 [rewriter_before_opt_a]: 3.979e-05 [opt_a]: 0.00187255, [2] [Cycle 1]: 0.00127385, [45] [expand_dump_flag]: 2.75997e-06 [switch_simplify]: 2.433e-05 [loop_unroll]: 1.365e-05 [a_1]: 0.00029244 [with_stream_mark]: 1.328e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.45998e-06 [updatestate_assign_eliminate]: 3.00002e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 1.66998e-06 [a_2]: 7.639e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.60001e-06 [merge_send_recv]: 7.45998e-06 [auto_parallel]: 6.06e-06 [parallel]: 1.805e-05 [flash_sp]: 2.998e-05 [merge_comm]: 4.13999e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 8.64003e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.27999e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 3.78001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.76e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.125e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.54999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45003e-06 [meta_fg_expand]: 2.49001e-06 [flash_sp_send_recv_attached]: 2.93998e-06 [receive_attached]: 
2.58998e-06 [after_resolve]: 1.049e-05 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00034054 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.329e-05 [cse]: 2.628e-05 [a_3]: 3.964e-05 [Cycle 2]: 0.00058967, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.59999e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012519 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.90998e-06 [updatestate_assign_eliminate]: 2.18998e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.763e-05 [accelerated_algorithm]: 5.39998e-06 [shard]: 1.30001e-06 [meta_shard_fg_expand]: 1.04003e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.28001e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.44e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.18998e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.88001e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.62998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 9.99979e-07 [after_resolve]: 8.90999e-06 [a_after_grad]: 7.85998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.259e-05 [a_3]: 3.184e-05 [py_interpret_to_execute_after_opt_a]: 7.3e-06 [slice_cell_reuse_recomputed_activation]: 1.64e-06 [rewriter_after_opt_a]: 3.521e-05 [convert_after_rewriter]: 7.27002e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00044656 [opt_b]: 0.00018169, [1] [Cycle 
1]: 0.00017552, [7] [b_1]: 0.0001072 [b_2]: 7.11999e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.33998e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.30008e-07 [cse]: 1.685e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 2.22001e-06 [cconv]: 2.173e-05 [loop_unroll]: 0.00041187 [opt_after_cconv]: 9.372e-05, [1] [Cycle 1]: 8.789e-05, [7] [c_1]: 2.754e-05 [parameter_eliminate]: 2.01e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.07001e-06 [cse]: 1.609e-05 [renormalize]: 3.70026e-07 [remove_dup_value]: 1.21e-05 [tuple_transform]: 6.979e-05, [1] [Cycle 1]: 6.544e-05, [4] [d_1]: 3.949e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.762e-05 [cse_after_recomputation]: 2.011e-05, [1] [Cycle 1]: 1.562e-05, [1] [cse]: 1.055e-05 [environ_conv]: 4.33999e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.33998e-06 [label_micro_interleaved_index]: 4.05998e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.21997e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.14e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.158e-05 [grouped_pairwise_exchange_alltoall]: 1.44998e-06 [offloading_packed_experts]: 3.88999e-06 [overlap_recompute_and_grad_model_parallel]: 4.85001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.95001e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.765e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.17999e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.823e-05, [1] [Cycle 1]: 6.41e-05, [6] [build]: 2.73998e-06 [elim_shapecalc]: 8.1e-06 [elim_not_effective]: 1.144e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.531e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.83999e-06 [opt_after_jit_grad]: 0.00044678 [validate]: 3.438e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.0998305 [execute]: 1.003e-05 Sums bootstrap : 0.000489s : 0.45% type_inference : 0.004526s : 4.18% event_method : 0.000011s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000418s : 0.39% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000033s : 0.03% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000341s : 0.31% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.41% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000412s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000447s : 0.41% validate : 0.000034s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099830s : 92.11% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.09% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.40% : 0.000005s : 4: substitution.graph_param_transform 66.14% : 0.000080s : 2: substitution.inline 2.35% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.21% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004484 2 91.05% : 0.004083s : 1: type_inference.infer 8.95% : 0.000401s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.04% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.47% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.88% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.19% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.44% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.11% : 0.000002s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.26% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.54% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000256 6 43.88% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.12% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120407 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.54% : 0.003060s : 1: add_attr 2.53% : 0.003051s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000522s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000767s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.56% : 0.001875s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000456s : 1: opt_after_jit_grad 0.15% : 0.000185s : 1: opt_b 3.06% : 0.003687s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000187s : 1: renormalize.infer 0.12% : 0.000147s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000040s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.93% : 0.099852s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 3.77% : 0.004540s : 1: type_inference 0.05% : 0.000055s : 1: validate TotalTime = 0.113566, [24] [bootstrap]: 0.00046801 [type_inference]: 0.00554919 [event_method]: 1.46e-05 [auto_monad]: 5.444e-05 [graph_reusing]: 5.81e-06 [inline]: 1.75001e-06 [add_attr]: 0.00298775, [1] [add_attr_with_inline]: 0.00297937, [1] [Cycle 1]: 4.481e-05, [2] [tag_attr]: 1.548e-05 [meta_addattr_fg_expand]: 4.08001e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.48e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00401749, [53] [py_interpret_to_execute]: 2.04e-05 [rewriter_before_opt_a]: 5.813e-05 [opt_a]: 0.00218802, [2] [Cycle 1]: 0.0015195, [45] [expand_dump_flag]: 3.08e-06 [switch_simplify]: 3.117e-05 [loop_unroll]: 2.084e-05 [a_1]: 0.00044936 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 3.77998e-06 [updatestate_assign_eliminate]: 2.99999e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 2.07001e-06 [a_2]: 8.302e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.87e-06 [auto_parallel]: 6.06e-06 [parallel]: 1.627e-05 [flash_sp]: 7.68999e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.56001e-06 [matmul_add_comm_reduction]: 9.24998e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 5.86998e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.89002e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 1.26e-05 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 2.77002e-06 [receive_attached]: 2.52001e-06 [after_resolve]: 
1.027e-05 [a_after_grad]: 8.45001e-06 [renormalize]: 0.00041972 [add_forward_monad_depend]: 5.02e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.82e-05 [a_3]: 4.144e-05 [Cycle 2]: 0.00059709, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.11999e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012464 [with_stream_mark]: 1.041e-05 [recompute_prepare]: 5.92001e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.42999e-06 [merge_send_recv]: 4.29002e-06 [auto_parallel]: 5.19e-06 [parallel]: 3.83001e-06 [flash_sp]: 3.38999e-06 [merge_comm]: 2.91999e-06 [allreduce_fusion]: 3.20002e-06 [matmul_add_comm_reduction]: 4.88001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.80999e-06 [merge_forward]: 2.46e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 6.13998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52999e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.05998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 8.70001e-06 [a_after_grad]: 7.97998e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 6.59999e-06 [cse]: 1.359e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.93001e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.17e-05 [convert_after_rewriter]: 7.08998e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.0004453 [opt_b]: 0.00018151, [1] [Cycle 1]: 0.00017535, [7] [b_1]: 0.00010783 
[b_2]: 7.08998e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.38998e-06 [renormalize]: 2.80008e-07 [cse]: 1.596e-05 [optimize_parallel_all_gather_comm]: 1.613e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.158e-05 [loop_unroll]: 0.0004115 [opt_after_cconv]: 9.696e-05, [1] [Cycle 1]: 9.075e-05, [7] [c_1]: 2.823e-05 [parameter_eliminate]: 2.51e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.23998e-06 [cse]: 1.641e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.201e-05 [tuple_transform]: 7.038e-05, [1] [Cycle 1]: 6.607e-05, [4] [d_1]: 3.963e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.90001e-06 [add_recomputation]: 4.327e-05 [cse_after_recomputation]: 2.067e-05, [1] [Cycle 1]: 1.642e-05, [1] [cse]: 1.106e-05 [environ_conv]: 4.89e-06 [swap_dp_allreduce_reducescatter]: 5.10001e-06 [bias_add_comm_swap]: 2.97002e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.51998e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 8.80013e-07 [full_micro_interleaved_order_control]: 1.92999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.04003e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.172e-05 [grouped_pairwise_exchange_alltoall]: 1.40999e-06 [offloading_packed_experts]: 3.53999e-06 [overlap_recompute_and_grad_model_parallel]: 4.38001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.13002e-06 [overlap_grad_ring_attention]: 3.76999e-06 [overlap_grad_flash_sp]: 1.732e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.40001e-06 [symbol_engine_optimizer]: 6.839e-05, [1] [Cycle 1]: 6.429e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 8.21002e-06 [elim_not_effective]: 1.112e-05 [opt_reshape]: 6.24001e-06 [fold_const_symbol]: 9.20001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.571e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 4.07998e-06 [opt_after_jit_grad]: 0.00044366 [validate]: 3.147e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.0997172 [execute]: 9.21998e-06 Sums bootstrap : 0.000468s : 0.43% type_inference : 0.005549s : 5.07% event_method : 0.000015s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000574s : 0.52% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000020s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000021s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.01% optimize.opt_a.renormalize : 0.000420s : 0.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000445s : 0.41% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000411s : 0.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000444s : 0.41% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.099717s : 91.03% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.44% : 0.000024s : 5: substitution.arithmetic_simplify 0.98% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 3.51% : 0.000006s : 4: substitution.graph_param_transform 65.56% : 0.000110s : 3: substitution.inline 3.56% : 0.000006s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 4: substitution.replace_old_param 6.26% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005507 2 89.85% : 0.004948s : 1: type_inference.infer 10.15% : 0.000559s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.79% : 0.000027s : 3: replace.inline 30.21% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.93% : 0.000108s : 3: match.inline 8.07% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.30% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.22% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.44% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.59% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.43% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000362 8 46.10% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.90% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.122093 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.45% : 0.002992s : 1: add_attr 2.44% : 0.002983s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000502s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.78% : 0.000948s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.79% : 0.002191s : 1: opt_a 0.08% : 0.000100s : 1: opt_after_cconv 0.37% : 0.000454s : 1: opt_after_jit_grad 0.15% : 0.000185s : 1: opt_b 3.29% : 0.004021s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000029s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.17% : 0.000205s : 1: renormalize.infer 0.17% : 0.000208s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 81.69% : 0.099739s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.56% : 0.005562s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.143003, [24] [bootstrap]: 0.00050616 [type_inference]: 0.011562 [event_method]: 5.069e-05 [auto_monad]: 0.00012249 [graph_reusing]: 7.41999e-06 [inline]: 2.02999e-06 [add_attr]: 0.00304673, [1] [add_attr_with_inline]: 0.00303852, [1] [Cycle 1]: 7.186e-05, [2] [tag_attr]: 3.508e-05 [meta_addattr_fg_expand]: 9.91998e-06 [parallel-infer-symbol]: 2.59999e-06 [pre_auto_parallel]: 5.072e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.88997e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0135439, [53] [py_interpret_to_execute]: 3.75e-05 [rewriter_before_opt_a]: 0.0001472 [opt_a]: 0.0112427, [3] [Cycle 1]: 0.00725292, [45] [expand_dump_flag]: 3.81001e-06 [switch_simplify]: 7.346e-05 [loop_unroll]: 6.263e-05 [a_1]: 0.00152194 [with_stream_mark]: 2.277e-05 [recompute_prepare]: 2.159e-05 [updatestate_depend_eliminate]: 9.39998e-06 [updatestate_assign_eliminate]: 7.61001e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.43e-06 [a_2]: 0.00024327 [accelerated_algorithm]: 3.043e-05 [shard]: 1.82001e-06 [meta_shard_fg_expand]: 3.49001e-06 [shard_inline]: 1.6e-05 [merge_send_recv]: 1.559e-05 [auto_parallel]: 1.035e-05 [parallel]: 1.769e-05 [flash_sp]: 1.116e-05 [merge_comm]: 9.52999e-06 [allreduce_fusion]: 8.98002e-06 [matmul_add_comm_reduction]: 2.732e-05 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 1.752e-05 [virtual_dataset]: 1.606e-05 [get_grad_eliminate_]: 1.542e-05 [virtual_output]: 1.513e-05 [merge_forward]: 9.01998e-06 [cell_reuse_recompute_pass]: 9.79984e-07 [offload_activation]: 1.786e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.86e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.68e-05 [set_forward_comm_id_for_comm_node_pass]: 9.64999e-06 [meta_fg_expand]: 0.00142967 [flash_sp_send_recv_attached]: 3.91001e-06 [receive_attached]: 2.33002e-06 
[after_resolve]: 6.032e-05 [a_after_grad]: 8.068e-05 [renormalize]: 0.00253016 [add_forward_monad_depend]: 8.92e-06 [auto_monad_grad]: 5.31002e-06 [auto_monad_eliminator]: 5.643e-05 [cse]: 0.00017436 [a_3]: 0.00033603 [Cycle 2]: 0.00306467, [45] [expand_dump_flag]: 1.45001e-06 [switch_simplify]: 4.737e-05 [loop_unroll]: 4.496e-05 [a_1]: 0.0015521 [with_stream_mark]: 1.216e-05 [recompute_prepare]: 1.067e-05 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 4.52e-06 [updatestate_loads_eliminate]: 3.73999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012666 [accelerated_algorithm]: 1.213e-05 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.75001e-06 [shard_inline]: 9.25999e-06 [merge_send_recv]: 6.86001e-06 [auto_parallel]: 7.51999e-06 [parallel]: 4.80001e-06 [flash_sp]: 2.97002e-06 [merge_comm]: 5.24e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 7.99002e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 1.06e-05 [virtual_dataset]: 9.37001e-06 [get_grad_eliminate_]: 8.94e-06 [virtual_output]: 8.66997e-06 [merge_forward]: 4.26001e-06 [cell_reuse_recompute_pass]: 7.7e-07 [offload_activation]: 9.49999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.626e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.426e-05 [set_forward_comm_id_for_comm_node_pass]: 5.57001e-06 [meta_fg_expand]: 6.902e-05 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.04e-06 [after_resolve]: 1.612e-05 [a_after_grad]: 1.468e-05 [renormalize]: 0.000626 [add_forward_monad_depend]: 3.91999e-06 [auto_monad_grad]: 1.21002e-06 [auto_monad_eliminator]: 1.52e-05 [cse]: 4.789e-05 [a_3]: 6.495e-05 [Cycle 3]: 0.00091127, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 1.098e-05 [loop_unroll]: 9.15999e-06 [a_1]: 0.0002506 [with_stream_mark]: 1.019e-05 [recompute_prepare]: 9.76998e-06 [updatestate_depend_eliminate]: 4.83001e-06 [updatestate_assign_eliminate]: 4.08001e-06 [updatestate_loads_eliminate]: 
3.89002e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 0.00012358 [accelerated_algorithm]: 1.2e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 8.94998e-06 [merge_send_recv]: 7.13998e-06 [auto_parallel]: 7.19001e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.85001e-06 [matmul_add_comm_reduction]: 7.80998e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.039e-05 [virtual_dataset]: 8.87e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.66002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.603e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 1.52e-05 [set_forward_comm_id_for_comm_node_pass]: 5.98998e-06 [meta_fg_expand]: 3.28998e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.45e-05 [a_after_grad]: 1.42e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.27999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.093e-05 [cse]: 2.747e-05 [a_3]: 5.964e-05 [py_interpret_to_execute_after_opt_a]: 1.069e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 4.813e-05 [convert_after_rewriter]: 9.22001e-06 [order_py_execute_after_rewriter]: 7.06999e-06 [mutable_eliminate]: 0.00046729 [opt_b]: 0.00029256, [1] [Cycle 1]: 0.00028644, [7] [b_1]: 0.00019108 [b_2]: 1.08e-05 [updatestate_depend_eliminate]: 7.58001e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 4.25999e-06 [renormalize]: 5.3001e-07 [cse]: 3.301e-05 [optimize_parallel_all_gather_comm]: 2.131e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 1.939e-05 [loop_unroll]: 0.00043523 [opt_after_cconv]: 0.00014015, [1] [Cycle 1]: 0.00013408, [7] [c_1]: 4.975e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 7.46999e-06 [updatestate_assign_eliminate]: 4.35e-06 
[updatestate_loads_eliminate]: 4.07998e-06 [cse]: 3.104e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.897e-05 [tuple_transform]: 0.00010268, [1] [Cycle 1]: 9.808e-05, [4] [d_1]: 6.756e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 1.004e-05 [partial_unused_args_eliminate]: 1.84998e-06 [add_recomputation]: 5.678e-05 [cse_after_recomputation]: 3.339e-05, [1] [Cycle 1]: 2.863e-05, [1] [cse]: 2.31e-05 [environ_conv]: 9.57999e-06 [swap_dp_allreduce_reducescatter]: 7.95998e-06 [bias_add_comm_swap]: 2.23998e-06 [label_micro_interleaved_index]: 4.77e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.57999e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.47001e-06 [ForceFp32Comm]: 8.60018e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.22999e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.758e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 4.88001e-06 [overlap_recompute_and_grad_model_parallel]: 5.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 4.84998e-06 [overlap_grad_flash_sp]: 2.448e-05 [begin_end_overlap_inline]: 8.2e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 1.31002e-06 [symbol_engine_optimizer]: 9.944e-05, [1] [Cycle 1]: 9.51e-05, [6] [build]: 9.23002e-06 [elim_shapecalc]: 1.335e-05 [elim_not_effective]: 1.867e-05 [opt_reshape]: 1.048e-05 [fold_const_symbol]: 1.504e-05 
[renormalize]: 2.10013e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 2.555e-05 [get_jit_bprop_graph]: 1.14e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00051843 [validate]: 4.522e-05 [backend_pass]: 9.90025e-07 [task_emit]: 0.113276 [execute]: 9.42999e-06 Sums bootstrap : 0.000506s : 0.36% type_inference : 0.011562s : 8.34% event_method : 0.000051s : 0.04% auto_monad : 0.000122s : 0.09% graph_reusing : 0.000007s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000051s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000147s : 0.11% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000132s : 0.10% optimize.opt_a.loop_unroll : 0.000117s : 0.08% optimize.opt_a.a_1 : 0.003325s : 2.40% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.36% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001502s : 1.08% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.07% optimize.opt_a.a_after_grad : 0.000110s : 0.08% optimize.opt_a.renormalize : 0.003156s : 2.28% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000250s : 0.18% optimize.opt_a.a_3 : 0.000461s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.34% optimize.opt_b.b_1 : 0.000191s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000033s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.01% optimize.loop_unroll : 0.000435s : 0.31% optimize.opt_after_cconv.c_1 : 0.000050s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000068s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000518s : 0.37% validate : 0.000045s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.113276s : 81.68% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000830 222 13.36% : 0.000111s : 12: substitution.arithmetic_simplify 1.71% : 0.000014s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.44% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.26% : 0.000002s : 5: substitution.fold_const_symbol 0.89% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 51.71% : 0.000429s : 17: substitution.inline 1.82% : 0.000015s : 2: substitution.inline_without_move 1.26% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.77% : 0.000015s : 3: substitution.less_batch_normalization 1.56% : 0.000013s : 11: substitution.minmaximum_grad 0.63% : 0.000005s : 5: substitution.partial_eliminate 1.57% : 0.000013s : 20: substitution.remove_not_recompute_node 2.81% : 0.000023s : 10: substitution.replace_applicator 1.23% : 0.000010s : 15: substitution.replace_old_param 0.27% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.29% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.63% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.18% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 7.94% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.25% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011487 2 86.79% : 0.009969s : 1: type_inference.infer 13.21% : 0.001518s : 1: type_inference.specialize ------[replace.] 0.000224 33 57.42% : 0.000129s : 17: replace.inline 42.58% : 0.000095s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000454 33 92.60% : 0.000420s : 17: match.inline 7.40% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5764 1.06% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.71% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000018s : 101: predicate.float_depend_g_call 0.49% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.56% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.29% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000009s : 68: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.78% : 0.000014s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.11% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.51% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.65% : 0.000013s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.19% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001640 34 57.23% : 0.000939s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.77% : 0.000702s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168070 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.82% : 0.003051s : 1: add_attr 1.81% : 0.003042s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000130s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.32% : 0.000538s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000057s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.26% : 0.000444s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000476s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.97% : 0.004994s : 117: opt.transform.opt_a 0.03% : 0.000049s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000176s : 28: opt.transform.opt_b 0.04% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.69% : 0.011246s : 1: opt_a 0.09% : 0.000144s : 1: opt_after_cconv 0.31% : 0.000529s : 1: opt_after_jit_grad 0.18% : 0.000296s : 1: opt_b 8.06% : 0.013548s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000055s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.00% : 0.001686s : 2: renormalize.infer 0.87% : 0.001457s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000053s : 1: rewriter_after_opt_a 0.09% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000102s : 1: symbol_engine_optimizer 67.41% : 0.113299s : 1: task_emit 0.06% : 0.000106s : 1: 
tuple_transform 6.89% : 0.011577s : 1: type_inference 0.04% : 0.000069s : 1: validate TotalTime = 0.104502, [24] [bootstrap]: 0.00046731 [type_inference]: 0.00432643 [event_method]: 1.051e-05 [auto_monad]: 5.1e-05 [graph_reusing]: 5.05001e-06 [inline]: 1.76e-06 [add_attr]: 0.00303866, [1] [add_attr_with_inline]: 0.00303036, [1] [Cycle 1]: 4.635e-05, [2] [tag_attr]: 1.179e-05 [meta_addattr_fg_expand]: 4.06001e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 2.096e-05 [insert-virtual-dataset]: 2.56998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.74998e-06 [optimize]: 0.00375432, [53] [py_interpret_to_execute]: 1.583e-05 [rewriter_before_opt_a]: 4.005e-05 [opt_a]: 0.00188529, [2] [Cycle 1]: 0.00127775, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 2.43e-05 [loop_unroll]: 1.394e-05 [a_1]: 0.00029558 [with_stream_mark]: 1.311e-05 [recompute_prepare]: 7.95e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.50998e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.846e-05 [accelerated_algorithm]: 6.18002e-06 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 1.46998e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 7.32997e-06 [auto_parallel]: 6.16998e-06 [parallel]: 1.747e-05 [flash_sp]: 7.49002e-06 [merge_comm]: 3.80998e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.55001e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.38999e-06 [virtual_dataset]: 5.84999e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.47001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.24999e-06 [receive_attached]: 2.42001e-06 
[after_resolve]: 1.147e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.00035326 [add_forward_monad_depend]: 4.89998e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.255e-05 [cse]: 2.943e-05 [a_3]: 4.111e-05 [Cycle 2]: 0.00059821, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00012686 [with_stream_mark]: 1.012e-05 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.48998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.873e-05 [accelerated_algorithm]: 5.71e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.40001e-06 [parallel]: 4.3e-06 [flash_sp]: 2.97002e-06 [merge_comm]: 3.16001e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 4.92999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.52001e-06 [offload_activation]: 6.28e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.05999e-06 [a_after_grad]: 8.28001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.63e-06 [cse]: 1.295e-05 [a_3]: 3.226e-05 [py_interpret_to_execute_after_opt_a]: 7.9e-06 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 3.025e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00045199 [opt_b]: 0.00018256, [1] [Cycle 1]: 0.00017658, [7] [b_1]: 
0.00010844 [b_2]: 7.09001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 2.3999e-07 [cse]: 1.668e-05 [optimize_parallel_all_gather_comm]: 1.556e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.197e-05 [loop_unroll]: 0.00042003 [opt_after_cconv]: 9.543e-05, [1] [Cycle 1]: 8.955e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.01003e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.706e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.907e-05, [1] [Cycle 1]: 6.475e-05, [4] [d_1]: 3.889e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.18998e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.422e-05 [cse_after_recomputation]: 2.025e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.077e-05 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 5.01002e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.88998e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.58998e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.79984e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.29998e-06 [overlap_opt_shard_in_pipeline]: 1.32999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.226e-05 [grouped_pairwise_exchange_alltoall]: 2.06e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 4.32e-06 [overlap_grad_flash_sp]: 1.646e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 6.898e-05, [1] [Cycle 1]: 6.444e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.66002e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.537e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.53e-06 [opt_after_jit_grad]: 0.00045365 [validate]: 3.098e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.0920916 [execute]: 8.95001e-06 Sums bootstrap : 0.000467s : 0.47% type_inference : 0.004326s : 4.31% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000020s : 0.02% optimize.opt_a.a_1 : 0.000422s : 0.42% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.15% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000021s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000353s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.45% optimize.opt_b.b_1 : 0.000108s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000420s : 0.42% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000454s : 0.45% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.092092s : 91.69% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.32% : 0.000022s : 4: substitution.arithmetic_simplify 1.40% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.14% : 0.000005s : 4: substitution.graph_param_transform 65.62% : 0.000079s : 2: substitution.inline 2.37% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.52% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004285 2 91.67% : 0.003928s : 1: type_inference.infer 8.33% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.39% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.58% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 5.77% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.91% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.58% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 41.23% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.77% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.112581 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.70% : 0.003043s : 1: add_attr 2.69% : 0.003034s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000503s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.69% : 0.000779s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.68% : 0.001888s : 1: opt_a 0.09% : 0.000099s : 1: opt_after_cconv 0.41% : 0.000463s : 1: opt_after_jit_grad 0.21% : 0.000233s : 1: opt_b 3.34% : 0.003758s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000192s : 1: renormalize.infer 0.14% : 0.000154s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000044s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 81.82% : 0.092114s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.85% : 0.004340s : 1: type_inference 0.05% : 0.000052s : 1: validate TotalTime = 0.143408, [24] [bootstrap]: 0.0005128 [type_inference]: 0.0103217 [event_method]: 4.376e-05 [auto_monad]: 0.00011596 [graph_reusing]: 8.05e-06 [inline]: 2.04e-06 [add_attr]: 0.00299369, [1] [add_attr_with_inline]: 0.00298529, [1] [Cycle 1]: 6.449e-05, [2] [tag_attr]: 3.043e-05 [meta_addattr_fg_expand]: 8.57e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 4.534e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.63002e-06 [optimize]: 0.0131996, [53] [py_interpret_to_execute]: 3.543e-05 [rewriter_before_opt_a]: 0.00012597 [opt_a]: 0.0108904, [3] [Cycle 1]: 0.00696522, [45] [expand_dump_flag]: 3.53e-06 [switch_simplify]: 6.604e-05 [loop_unroll]: 5.478e-05 [a_1]: 0.00137086 [with_stream_mark]: 2.272e-05 [recompute_prepare]: 2.163e-05 [updatestate_depend_eliminate]: 9.34e-06 [updatestate_assign_eliminate]: 7.79002e-06 [updatestate_loads_eliminate]: 7.11999e-06 [parameter_eliminate]: 2.88e-06 [a_2]: 0.00024455 [accelerated_algorithm]: 3.08e-05 [shard]: 2.01998e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.594e-05 [merge_send_recv]: 1.53e-05 [auto_parallel]: 1.053e-05 [parallel]: 1.757e-05 [flash_sp]: 1.07e-05 [merge_comm]: 9.94001e-06 [allreduce_fusion]: 8.85999e-06 [matmul_add_comm_reduction]: 2.691e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.766e-05 [virtual_dataset]: 1.568e-05 [get_grad_eliminate_]: 1.603e-05 [virtual_output]: 1.553e-05 [merge_forward]: 9.52999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.742e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.891e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.727e-05 [set_forward_comm_id_for_comm_node_pass]: 1.023e-05 [meta_fg_expand]: 0.00139036 [flash_sp_send_recv_attached]: 3.79002e-06 [receive_attached]: 2.34999e-06 [after_resolve]: 
5.834e-05 [a_after_grad]: 7.987e-05 [renormalize]: 0.00245393 [add_forward_monad_depend]: 9.16998e-06 [auto_monad_grad]: 5.15001e-06 [auto_monad_eliminator]: 5.62e-05 [cse]: 0.00017211 [a_3]: 0.00033498 [Cycle 2]: 0.00300915, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.704e-05 [loop_unroll]: 4.395e-05 [a_1]: 0.00158059 [with_stream_mark]: 1.238e-05 [recompute_prepare]: 1.128e-05 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 4.34002e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 9.79984e-07 [a_2]: 0.00012536 [accelerated_algorithm]: 1.189e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 6.53e-06 [auto_parallel]: 7.14001e-06 [parallel]: 4.70001e-06 [flash_sp]: 3.08e-06 [merge_comm]: 5.12e-06 [allreduce_fusion]: 4.58999e-06 [matmul_add_comm_reduction]: 7.81001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.01e-05 [virtual_dataset]: 9.10999e-06 [get_grad_eliminate_]: 8.91997e-06 [virtual_output]: 8.60001e-06 [merge_forward]: 4.34997e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 8.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.64e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.414e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20001e-06 [meta_fg_expand]: 3.467e-05 [flash_sp_send_recv_attached]: 8.59989e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.538e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00058491 [add_forward_monad_depend]: 3.75e-06 [auto_monad_grad]: 1.11002e-06 [auto_monad_eliminator]: 1.498e-05 [cse]: 4.657e-05 [a_3]: 6.497e-05 [Cycle 3]: 0.00090214, [45] [expand_dump_flag]: 1.04998e-06 [switch_simplify]: 1.051e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00025116 [with_stream_mark]: 9.68997e-06 [recompute_prepare]: 9.51e-06 [updatestate_depend_eliminate]: 4.68999e-06 [updatestate_assign_eliminate]: 3.82998e-06 [updatestate_loads_eliminate]: 3.86001e-06 
[parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012334 [accelerated_algorithm]: 1.168e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.76998e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 7.16999e-06 [auto_parallel]: 7.21001e-06 [parallel]: 4.53001e-06 [flash_sp]: 1.14e-06 [merge_comm]: 4.94e-06 [allreduce_fusion]: 5.02e-06 [matmul_add_comm_reduction]: 7.76001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.004e-05 [virtual_dataset]: 8.87e-06 [get_grad_eliminate_]: 8.80999e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.33999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 8.77e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.607e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.363e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24998e-06 [meta_fg_expand]: 2.96001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.34e-05 [a_after_grad]: 1.385e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.21997e-06 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 1.156e-05 [cse]: 2.69e-05 [a_3]: 5.883e-05 [py_interpret_to_execute_after_opt_a]: 1.03e-05 [slice_cell_reuse_recomputed_activation]: 1.80001e-06 [rewriter_after_opt_a]: 4.701e-05 [convert_after_rewriter]: 9.07001e-06 [order_py_execute_after_rewriter]: 6.62002e-06 [mutable_eliminate]: 0.00046207 [opt_b]: 0.00028969, [1] [Cycle 1]: 0.0002836, [7] [b_1]: 0.00018986 [b_2]: 1.153e-05 [updatestate_depend_eliminate]: 7.31999e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 4.03999e-06 [renormalize]: 3.19997e-07 [cse]: 3.118e-05 [optimize_parallel_all_gather_comm]: 2.107e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.947e-05 [loop_unroll]: 0.00042464 [opt_after_cconv]: 0.00020305, [1] [Cycle 1]: 0.00019716, [7] [c_1]: 4.89e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 7.15e-06 [updatestate_assign_eliminate]: 4.28999e-06 
[updatestate_loads_eliminate]: 3.86999e-06 [cse]: 9.483e-05 [renormalize]: 2.59985e-07 [remove_dup_value]: 2.878e-05 [tuple_transform]: 0.00010209, [1] [Cycle 1]: 9.728e-05, [4] [d_1]: 6.699e-05 [none_parameter_eliminate]: 2.02001e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 1.007e-05 [partial_unused_args_eliminate]: 1.98997e-06 [add_recomputation]: 5.826e-05 [cse_after_recomputation]: 3.199e-05, [1] [Cycle 1]: 2.731e-05, [1] [cse]: 2.197e-05 [environ_conv]: 9.19998e-06 [swap_dp_allreduce_reducescatter]: 7.88999e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4.14002e-06 [label_fine_grained_interleaved_index]: 2.30002e-06 [merge_cast_opt]: 1.20001e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.49001e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.09003e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.712e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 5.07e-06 [overlap_recompute_and_grad_model_parallel]: 5.84e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.03002e-06 [overlap_grad_ring_attention]: 5.11997e-06 [overlap_grad_flash_sp]: 2.416e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.03002e-06 [split_layernorm_comm]: 2.01998e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 9.718e-05, [1] [Cycle 1]: 9.291e-05, [6] [build]: 9.14e-06 [elim_shapecalc]: 1.331e-05 [elim_not_effective]: 1.86e-05 [opt_reshape]: 9.43002e-06 [fold_const_symbol]: 1.539e-05 
[renormalize]: 3.30008e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 2.486e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00046945 [validate]: 4.712e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.115383 [execute]: 9.35001e-06 Sums bootstrap : 0.000513s : 0.37% type_inference : 0.010322s : 7.42% event_method : 0.000044s : 0.03% auto_monad : 0.000116s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000126s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000124s : 0.09% optimize.opt_a.loop_unroll : 0.000108s : 0.08% optimize.opt_a.a_1 : 0.003203s : 2.30% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.02% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001428s : 1.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000087s : 0.06% optimize.opt_a.a_after_grad : 0.000108s : 0.08% optimize.opt_a.renormalize : 0.003039s : 2.18% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.06% optimize.opt_a.cse : 0.000246s : 0.18% optimize.opt_a.a_3 : 0.000459s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000462s : 0.33% optimize.opt_b.b_1 : 0.000190s : 0.14% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.01% optimize.loop_unroll : 0.000425s : 0.31% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000095s : 0.07% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000009s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000469s : 0.34% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.115383s : 82.92% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000789 218 5.52% : 0.000044s : 11: substitution.arithmetic_simplify 1.69% : 0.000013s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.92% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 51.09% : 0.000403s : 16: substitution.inline 1.99% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.89% : 0.000015s : 3: substitution.less_batch_normalization 1.65% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000014s : 20: substitution.remove_not_recompute_node 3.08% : 0.000024s : 10: substitution.replace_applicator 1.28% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.48% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.29% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 14.66% : 0.000116s : 28: substitution.tuple_list_get_item_eliminator 2.29% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010253 2 86.64% : 0.008883s : 1: type_inference.infer 13.36% : 0.001370s : 1: type_inference.specialize ------[replace.] 0.000203 30 58.77% : 0.000119s : 16: replace.inline 41.23% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000479 30 82.42% : 0.000395s : 16: match.inline 17.58% : 0.000084s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.08% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.80% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.67% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.53% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.21% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.38% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 97: predicate.partial_defer_inline 1.69% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.24% : 0.000009s : 67: predicate.reduce_eliminate 2.70% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.65% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.20% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000004s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.59% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001534 32 57.81% : 0.000887s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.19% : 0.000647s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.167810 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.79% : 0.002998s : 1: add_attr 1.78% : 0.002989s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000123s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.32% : 0.000545s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000050s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.26% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000471s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.89% : 0.004847s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000053s : 4: opt.transform.symbol_engine_opt 6.49% : 0.010893s : 1: opt_a 0.12% : 0.000207s : 1: opt_after_cconv 0.29% : 0.000479s : 1: opt_after_jit_grad 0.17% : 0.000293s : 1: opt_b 7.87% : 0.013203s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.96% : 0.001614s : 2: renormalize.infer 0.84% : 0.001412s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.08% : 0.000130s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000100s : 1: symbol_engine_optimizer 68.77% : 0.115404s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.16% : 0.010336s : 1: type_inference 0.04% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x2-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x3-pynative],max_mem:60.0M TotalTime = 0.0215381, [24] [bootstrap]: 0.00054025 [type_inference]: 0.00623504 [event_method]: 1.461e-05 [auto_monad]: 5.567e-05 [graph_reusing]: 5.45001e-06 [inline]: 1.64e-06 [add_attr]: 0.00342544, [1] [add_attr_with_inline]: 0.0034144, [1] [Cycle 1]: 4.273e-05, [2] [tag_attr]: 1.462e-05 [meta_addattr_fg_expand]: 4.02e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.705e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00398016, [53] [py_interpret_to_execute]: 2.016e-05 [rewriter_before_opt_a]: 5.75e-05 [opt_a]: 0.00211091, [2] [Cycle 1]: 0.00151213, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 3.177e-05 [loop_unroll]: 2.122e-05 [a_1]: 0.00045189 [with_stream_mark]: 1.285e-05 [recompute_prepare]: 7.74002e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.624e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.89999e-06 [merge_send_recv]: 7.87e-06 [auto_parallel]: 5.99e-06 [parallel]: 2.309e-05 [flash_sp]: 7.63999e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 8.50006e-07 [virtual_shard_identity]: 7.26999e-06 [virtual_dataset]: 5.86e-06 
[get_grad_eliminate_]: 5.39998e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.076e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 2.54999e-06 [after_resolve]: 1.119e-05 [a_after_grad]: 9.20001e-06 [renormalize]: 0.00041603 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.388e-05 [cse]: 2.703e-05 [a_3]: 4.094e-05 [Cycle 2]: 0.00058976, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.91001e-06 [loop_unroll]: 5.64998e-06 [a_1]: 0.00012811 [with_stream_mark]: 9.34998e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.799e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.65001e-06 [auto_parallel]: 5.36998e-06 [parallel]: 4.43999e-06 [flash_sp]: 3.08e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.18002e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.09e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.45999e-06 [offload_activation]: 5.69999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.009e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 7.95e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.03998e-06 [cse]: 1.189e-05 [a_3]: 3.164e-05 [py_interpret_to_execute_after_opt_a]: 7.75998e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 2.828e-05 [convert_after_rewriter]: 6.83998e-06 [order_py_execute_after_rewriter]: 5.09e-06 [mutable_eliminate]: 0.0004581 [opt_b]: 0.00018194, [1] [Cycle 1]: 0.00017588, [7] [b_1]: 0.00010795 [b_2]: 7.14001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.49974e-07 [cse]: 1.661e-05 [optimize_parallel_all_gather_comm]: 1.501e-05 [overlap_param_gather]: 2.18002e-06 [cconv]: 2.201e-05 [loop_unroll]: 0.00041553 [opt_after_cconv]: 9.355e-05, [1] [Cycle 1]: 8.792e-05, [7] [c_1]: 2.746e-05 [parameter_eliminate]: 2.18002e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.598e-05 [renormalize]: 4.50003e-07 [remove_dup_value]: 1.322e-05 [tuple_transform]: 6.826e-05, [1] [Cycle 1]: 6.4e-05, [4] [d_1]: 3.854e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 5.261e-05 [cse_after_recomputation]: 2.312e-05, [1] [Cycle 1]: 1.797e-05, [1] [cse]: 1.199e-05 [environ_conv]: 5.12e-06 [swap_dp_allreduce_reducescatter]: 5.82001e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.42e-06 [label_fine_grained_interleaved_index]: 2.45002e-06 [merge_cast_opt]: 1.16002e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.69999e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.34e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.22e-06 
[add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.171e-05 [grouped_pairwise_exchange_alltoall]: 1.97999e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.26001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52999e-06 [overlap_recompute_comm]: 2.04e-06 [overlap_grad_ring_attention]: 4.23001e-06 [overlap_grad_flash_sp]: 1.665e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.952e-05, [1] [Cycle 1]: 6.541e-05, [6] [build]: 2.44999e-06 [elim_shapecalc]: 8.63001e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.28e-06 [fold_const_symbol]: 9.29e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.511e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 0.00012988 [opt_after_jit_grad]: 0.00045871 [validate]: 3.107e-05 [backend_pass]: 1.22e-06 [task_emit]: 0.00639486 [execute]: 7.16001e-06 Sums bootstrap : 0.000540s : 3.15% type_inference : 0.006235s : 36.39% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000580s : 3.38% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000416s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000458s : 2.67% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000416s : 2.43% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000053s : 0.31% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000130s : 0.76% opt_after_jit_grad : 0.000459s : 2.68% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006395s : 37.32% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.53% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.07% : 0.000005s : 4: substitution.graph_param_transform 66.54% : 0.000110s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.84% : 0.000005s : 4: substitution.remove_not_recompute_node 2.78% : 0.000005s : 4: substitution.replace_old_param 6.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006185 2 90.53% : 0.005599s : 1: type_inference.infer 9.47% : 0.000586s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.03% : 0.000027s : 3: replace.inline 29.97% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.50% : 0.000108s : 3: match.inline 8.50% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.83% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.31% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.13% : 0.000010s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.06% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.91% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.41% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.24% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.91% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.25% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.65% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000368 8 46.49% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.51% : 0.000197s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030438 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.27% : 0.003429s : 1: add_attr 11.23% : 0.003418s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000057s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.89% : 0.000576s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000026s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000467s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.11% : 0.000947s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.94% : 0.002114s : 1: opt_a 0.32% : 0.000097s : 1: opt_after_cconv 1.54% : 0.000468s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.09% : 0.003984s : 1: optimize 0.06% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000017s : 1: remove_dup_value 0.70% : 0.000212s : 1: renormalize.infer 0.65% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000136s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000032s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 21.04% : 0.006404s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.53% : 0.006249s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0180858, [24] [bootstrap]: 0.00046934 [type_inference]: 0.00435412 [event_method]: 1.077e-05 [auto_monad]: 4.974e-05 [graph_reusing]: 5.39998e-06 [inline]: 1.86e-06 [add_attr]: 0.00300107, [1] [add_attr_with_inline]: 0.00299333, [1] [Cycle 1]: 4.446e-05, [2] [tag_attr]: 1.178e-05 [meta_addattr_fg_expand]: 3.04001e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 2.182e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.60001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00366695, [53] [py_interpret_to_execute]: 1.498e-05 [rewriter_before_opt_a]: 3.795e-05 [opt_a]: 0.00184161, [2] [Cycle 1]: 0.00124392, [45] [expand_dump_flag]: 2.73e-06 [switch_simplify]: 2.454e-05 [loop_unroll]: 1.387e-05 [a_1]: 0.00029118 [with_stream_mark]: 1.311e-05 [recompute_prepare]: 7.64002e-06 [updatestate_depend_eliminate]: 3.41999e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.561e-05 [accelerated_algorithm]: 6.17001e-06 [shard]: 2.39001e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 7.73001e-06 [auto_parallel]: 5.60001e-06 [parallel]: 1.734e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.57002e-06 [matmul_add_comm_reduction]: 9.07001e-06 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 6.88998e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.30001e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 1.001e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.39998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.31e-06 [flash_sp_send_recv_attached]: 2.88e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 1.051e-05 [a_after_grad]: 8.69998e-06 [renormalize]: 0.0003378 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.315e-05 [cse]: 2.633e-05 [a_3]: 3.949e-05 [Cycle 2]: 0.00058837, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012397 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.25002e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.726e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.28002e-06 [parallel]: 4.09002e-06 [flash_sp]: 3.28e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.97999e-06 [merge_forward]: 2.36e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 5.91998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.95998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.90998e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 7.95e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.90025e-07 [auto_monad_grad]: 9.49978e-07 [auto_monad_eliminator]: 6.56999e-06 [cse]: 1.269e-05 [a_3]: 3.108e-05 [py_interpret_to_execute_after_opt_a]: 7.52998e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.219e-05 [convert_after_rewriter]: 7.26999e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.0004499 [opt_b]: 0.00018101, [1] [Cycle 1]: 0.00017513, [7] [b_1]: 
0.00010702 [b_2]: 7.27002e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.42001e-06 [renormalize]: 3.00002e-07 [cse]: 1.621e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.144e-05 [loop_unroll]: 0.00043511 [opt_after_cconv]: 9.375e-05, [1] [Cycle 1]: 8.806e-05, [7] [c_1]: 2.739e-05 [parameter_eliminate]: 2.28998e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.23002e-06 [cse]: 1.597e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.344e-05 [tuple_transform]: 6.942e-05, [1] [Cycle 1]: 6.505e-05, [4] [d_1]: 3.933e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 4.233e-05 [cse_after_recomputation]: 1.977e-05, [1] [Cycle 1]: 1.539e-05, [1] [cse]: 1.044e-05 [environ_conv]: 4.47e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.59999e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.29001e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.29998e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.151e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.68999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.23999e-06 [overlap_grad_flash_sp]: 1.648e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.944e-05, [1] [Cycle 1]: 6.537e-05, [6] [build]: 2.54001e-06 [elim_shapecalc]: 8.64998e-06 [elim_not_effective]: 1.173e-05 [opt_reshape]: 6.21998e-06 [fold_const_symbol]: 8.87e-06 [renormalize]: 2.29978e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.507e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044755 [validate]: 3.114e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00579675 [execute]: 7.03998e-06 Sums bootstrap : 0.000469s : 3.32% type_inference : 0.004354s : 30.81% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.94% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000338s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.23% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.18% optimize.opt_b.b_1 : 0.000107s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000435s : 3.08% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.10% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 3.17% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005797s : 41.02% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.54% : 0.000022s : 4: substitution.arithmetic_simplify 1.58% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.66% : 0.000006s : 4: substitution.graph_param_transform 65.26% : 0.000079s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.32% : 0.000004s : 4: substitution.remove_not_recompute_node 3.31% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004314 2 91.98% : 0.003968s : 1: type_inference.infer 8.02% : 0.000346s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.18% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.71% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.29% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.93% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.89% : 0.000008s : 44: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.10% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.87% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.84% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.37% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.96% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.59% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.93% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 42.25% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.75% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026011 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.55% : 0.003005s : 1: add_attr 11.52% : 0.002997s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000502s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000005s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.71% : 0.000444s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000761s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001845s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.76% : 0.000457s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.11% : 0.003671s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000072s : 1: symbol_engine_optimizer 22.32% : 0.005807s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.79% : 0.004369s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.019587, [24] [bootstrap]: 0.0004756 [type_inference]: 0.00551498 [event_method]: 1.431e-05 [auto_monad]: 5.468e-05 [graph_reusing]: 6.06e-06 [inline]: 2.04e-06 [add_attr]: 0.00300357, [1] [add_attr_with_inline]: 0.00299605, [1] [Cycle 1]: 4.506e-05, [2] [tag_attr]: 1.522e-05 [meta_addattr_fg_expand]: 4.41002e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.574e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.00390511, [53] [py_interpret_to_execute]: 1.981e-05 [rewriter_before_opt_a]: 5.757e-05 [opt_a]: 0.00207987, [2] [Cycle 1]: 0.00147832, [45] [expand_dump_flag]: 3.09999e-06 [switch_simplify]: 3.181e-05 [loop_unroll]: 2.09e-05 [a_1]: 0.00044305 [with_stream_mark]: 1.29e-05 [recompute_prepare]: 7.81001e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.09001e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.6e-05 [accelerated_algorithm]: 6.37001e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 7.91001e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.791e-05 [flash_sp]: 7.48e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 8.27e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.16001e-06 [virtual_dataset]: 6.01e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.87998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.02e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.07999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 
2.22001e-06 [after_resolve]: 1.017e-05 [a_after_grad]: 9.13002e-06 [renormalize]: 0.00040012 [add_forward_monad_depend]: 4.62e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.3e-05 [cse]: 2.712e-05 [a_3]: 4.033e-05 [Cycle 2]: 0.00059206, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.65002e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012543 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.23002e-06 [updatestate_loads_eliminate]: 2.41998e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.826e-05 [accelerated_algorithm]: 5.38002e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.14003e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.45e-06 [auto_parallel]: 5.49e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.58999e-06 [merge_comm]: 3.17002e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.15999e-06 [a_after_grad]: 8.3e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.60019e-07 [auto_monad_grad]: 7.59988e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.62e-05 [a_3]: 3.13e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 3.036e-05 [convert_after_rewriter]: 6.65002e-06 [order_py_execute_after_rewriter]: 4.92e-06 [mutable_eliminate]: 0.00045729 [opt_b]: 0.00018173, [1] [Cycle 1]: 
0.00017579, [7] [b_1]: 0.0001081 [b_2]: 7.30998e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.70002e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 3.69997e-07 [cse]: 1.589e-05 [optimize_parallel_all_gather_comm]: 1.599e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00041142 [opt_after_cconv]: 9.393e-05, [1] [Cycle 1]: 8.823e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.613e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.756e-05, [1] [Cycle 1]: 6.345e-05, [4] [d_1]: 3.844e-05 [none_parameter_eliminate]: 1.42999e-06 [renormalize]: 1.40019e-07 [switch_simplify]: 5.94999e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.254e-05 [cse_after_recomputation]: 1.917e-05, [1] [Cycle 1]: 1.495e-05, [1] [cse]: 9.86e-06 [environ_conv]: 4.52998e-06 [swap_dp_allreduce_reducescatter]: 5.09e-06 [bias_add_comm_swap]: 2.74999e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.16002e-06 [slice_recompute_activation]: 2.17999e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.57001e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.16997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.27001e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.64e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 2.08998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.739e-05, [1] [Cycle 1]: 6.346e-05, [6] [build]: 2.12001e-06 [elim_shapecalc]: 8.11002e-06 [elim_not_effective]: 1.12e-05 [opt_reshape]: 6.28002e-06 [fold_const_symbol]: 9.07001e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.442e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00044942 [validate]: 3.055e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.00587394 [execute]: 7.78999e-06 Sums bootstrap : 0.000476s : 3.04% type_inference : 0.005515s : 35.27% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000568s : 3.64% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000400s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000457s : 2.92% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000411s : 2.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000014s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.87% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005874s : 37.56% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000161 30 14.83% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000005s : 4: substitution.graph_param_transform 66.60% : 0.000107s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.73% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005474 2 89.96% : 0.004924s : 1: type_inference.infer 10.04% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.29% : 0.000026s : 3: replace.inline 30.71% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.45% : 0.000105s : 3: match.inline 8.55% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.56% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.53% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.55% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.88% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.95% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.82% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.42% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 46.27% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.73% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027986 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.75% : 0.003008s : 1: add_attr 10.72% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.06% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.82% : 0.000509s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.67% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.33% : 0.000933s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.44% : 0.002083s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.64% : 0.000459s : 1: opt_after_jit_grad 0.66% : 0.000185s : 1: opt_b 13.97% : 0.003909s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.09% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000203s : 1: renormalize.infer 0.68% : 0.000190s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.02% : 0.005883s : 1: task_emit 0.25% : 0.000070s : 1: 
tuple_transform 19.76% : 0.005529s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0370638, [24] [bootstrap]: 0.00051411 [type_inference]: 0.01129 [event_method]: 4.626e-05 [auto_monad]: 0.00011979 [graph_reusing]: 8.86002e-06 [inline]: 1.97999e-06 [add_attr]: 0.00301818, [1] [add_attr_with_inline]: 0.00300979, [1] [Cycle 1]: 6.915e-05, [2] [tag_attr]: 3.433e-05 [meta_addattr_fg_expand]: 9.45001e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 4.801e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.0131852, [53] [py_interpret_to_execute]: 3.842e-05 [rewriter_before_opt_a]: 0.00014594 [opt_a]: 0.0108891, [3] [Cycle 1]: 0.00696641, [45] [expand_dump_flag]: 3.78001e-06 [switch_simplify]: 7.326e-05 [loop_unroll]: 6.155e-05 [a_1]: 0.00143655 [with_stream_mark]: 2.316e-05 [recompute_prepare]: 2.145e-05 [updatestate_depend_eliminate]: 9.09998e-06 [updatestate_assign_eliminate]: 7.75e-06 [updatestate_loads_eliminate]: 7.63001e-06 [parameter_eliminate]: 2.56e-06 [a_2]: 0.0002429 [accelerated_algorithm]: 3.022e-05 [shard]: 2.31e-06 [meta_shard_fg_expand]: 3.45e-06 [shard_inline]: 1.606e-05 [merge_send_recv]: 1.562e-05 [auto_parallel]: 1.14e-05 [parallel]: 1.724e-05 [flash_sp]: 1.111e-05 [merge_comm]: 9.99999e-06 [allreduce_fusion]: 9.49e-06 [matmul_add_comm_reduction]: 2.593e-05 [allreduce_slice_to_reducescatter]: 7.89994e-07 [virtual_shard_identity]: 1.766e-05 [virtual_dataset]: 1.548e-05 [get_grad_eliminate_]: 1.54e-05 [virtual_output]: 1.515e-05 [merge_forward]: 9.54e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.773e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.837e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.726e-05 [set_forward_comm_id_for_comm_node_pass]: 9.89001e-06 [meta_fg_expand]: 0.00141471 [flash_sp_send_recv_attached]: 3.58999e-06 [receive_attached]: 2.60002e-06 [after_resolve]: 5.977e-05 
[a_after_grad]: 8.094e-05 [renormalize]: 0.00236379 [add_forward_monad_depend]: 8.80999e-06 [auto_monad_grad]: 5.14003e-06 [auto_monad_eliminator]: 5.547e-05 [cse]: 0.00016347 [a_3]: 0.0003331 [Cycle 2]: 0.00300745, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.681e-05 [loop_unroll]: 4.386e-05 [a_1]: 0.00154647 [with_stream_mark]: 1.126e-05 [recompute_prepare]: 1.071e-05 [updatestate_depend_eliminate]: 4.84003e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.00012633 [accelerated_algorithm]: 1.198e-05 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.41003e-06 [merge_send_recv]: 6.77002e-06 [auto_parallel]: 7.26999e-06 [parallel]: 4.55999e-06 [flash_sp]: 3.65e-06 [merge_comm]: 5.09998e-06 [allreduce_fusion]: 4.67e-06 [matmul_add_comm_reduction]: 7.58999e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 8.82e-06 [get_grad_eliminate_]: 9.23002e-06 [virtual_output]: 8.45999e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 9.29984e-07 [offload_activation]: 9.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.618e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 5.91e-06 [meta_fg_expand]: 7.017e-05 [flash_sp_send_recv_attached]: 1.02998e-06 [receive_attached]: 1.13001e-06 [after_resolve]: 1.676e-05 [a_after_grad]: 1.48e-05 [renormalize]: 0.00058074 [add_forward_monad_depend]: 3.88999e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 4.633e-05 [a_3]: 6.553e-05 [Cycle 3]: 0.00090155, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.046e-05 [loop_unroll]: 9.05999e-06 [a_1]: 0.00024955 [with_stream_mark]: 1.01e-05 [recompute_prepare]: 9.44e-06 [updatestate_depend_eliminate]: 4.75999e-06 [updatestate_assign_eliminate]: 3.90998e-06 [updatestate_loads_eliminate]: 3.80998e-06 
[parameter_eliminate]: 1.02e-06 [a_2]: 0.00012397 [accelerated_algorithm]: 1.176e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.76003e-06 [shard_inline]: 8.77999e-06 [merge_send_recv]: 7.08998e-06 [auto_parallel]: 7.1e-06 [parallel]: 4.53001e-06 [flash_sp]: 1.09998e-06 [merge_comm]: 4.85999e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 9.79999e-06 [virtual_dataset]: 8.59e-06 [get_grad_eliminate_]: 8.64e-06 [virtual_output]: 8.25999e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 8.52e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.547e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.518e-05 [set_forward_comm_id_for_comm_node_pass]: 5.79e-06 [meta_fg_expand]: 2.93e-06 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 9.70002e-07 [after_resolve]: 1.489e-05 [a_after_grad]: 1.428e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.061e-05 [cse]: 2.576e-05 [a_3]: 5.884e-05 [py_interpret_to_execute_after_opt_a]: 1.025e-05 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 4.661e-05 [convert_after_rewriter]: 9.09e-06 [order_py_execute_after_rewriter]: 6.99001e-06 [mutable_eliminate]: 0.00045789 [opt_b]: 0.00029097, [1] [Cycle 1]: 0.00028462, [7] [b_1]: 0.00019085 [b_2]: 1.097e-05 [updatestate_depend_eliminate]: 7.41999e-06 [updatestate_assign_eliminate]: 4.18001e-06 [updatestate_loads_eliminate]: 4.01001e-06 [renormalize]: 2.19996e-07 [cse]: 3.15e-05 [optimize_parallel_all_gather_comm]: 1.996e-05 [overlap_param_gather]: 1.98002e-06 [cconv]: 2.031e-05 [loop_unroll]: 0.00042752 [opt_after_cconv]: 0.0001351, [1] [Cycle 1]: 0.00012925, [7] [c_1]: 4.821e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 7.18e-06 [updatestate_assign_eliminate]: 4.37e-06 
[updatestate_loads_eliminate]: 3.86001e-06 [cse]: 2.938e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 2.753e-05 [tuple_transform]: 0.00010343, [1] [Cycle 1]: 9.858e-05, [4] [d_1]: 6.83e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.75002e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 5.839e-05 [cse_after_recomputation]: 3.181e-05, [1] [Cycle 1]: 2.704e-05, [1] [cse]: 2.188e-05 [environ_conv]: 8.19998e-06 [swap_dp_allreduce_reducescatter]: 7.88999e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 3.34001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.52001e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.693e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 4.92e-06 [overlap_recompute_and_grad_model_parallel]: 5.74999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.61998e-06 [overlap_recompute_comm]: 1.86003e-06 [overlap_grad_ring_attention]: 5.14998e-06 [overlap_grad_flash_sp]: 2.339e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.684e-05, [1] [Cycle 1]: 9.269e-05, [6] [build]: 9.42999e-06 [elim_shapecalc]: 1.334e-05 [elim_not_effective]: 1.769e-05 [opt_reshape]: 9.79999e-06 [fold_const_symbol]: 1.48e-05 
[renormalize]: 2.40019e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 2.525e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.00047071 [validate]: 4.454e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.00806272 [execute]: 6.79001e-06 Sums bootstrap : 0.000514s : 1.57% type_inference : 0.011290s : 34.45% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000009s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.12% optimize.rewriter_before_opt_a : 0.000146s : 0.45% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003233s : 9.86% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.51% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000022s : 0.07% optimize.opt_a.meta_fg_expand : 0.001488s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000091s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.002945s : 8.99% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000236s : 0.72% optimize.opt_a.a_3 : 0.000457s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.40% optimize.opt_b.b_1 : 0.000191s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000428s : 1.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000471s : 1.44% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008063s : 24.60% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000783 222 5.77% : 0.000045s : 12: substitution.arithmetic_simplify 1.77% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.46% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.32% : 0.000425s : 17: substitution.inline 1.98% : 0.000016s : 2: substitution.inline_without_move 1.36% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.85% : 0.000014s : 3: substitution.less_batch_normalization 1.68% : 0.000013s : 11: substitution.minmaximum_grad 0.66% : 0.000005s : 5: substitution.partial_eliminate 1.66% : 0.000013s : 20: substitution.remove_not_recompute_node 3.11% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.50% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.26% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 11.04% : 0.000086s : 30: substitution.tuple_list_get_item_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011216 2 86.86% : 0.009742s : 1: type_inference.infer 13.14% : 0.001474s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.61% : 0.000126s : 17: replace.inline 42.39% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000471 33 88.34% : 0.000417s : 17: match.inline 11.66% : 0.000055s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.09% : 0.000016s : 100: predicate.arithmetic_simplify 1.11% : 0.000008s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.78% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.21% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001561 34 56.45% : 0.000881s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.55% : 0.000680s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061434 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.92% : 0.003022s : 1: add_attr 4.91% : 0.003014s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000126s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.89% : 0.000548s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000437s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.82% : 0.000501s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.96% : 0.004893s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.73% : 0.010892s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.78% : 0.000480s : 1: opt_after_jit_grad 0.48% : 0.000295s : 1: opt_b 21.47% : 0.013189s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000052s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.57% : 0.001578s : 2: renormalize.infer 2.20% : 0.001354s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.14% : 0.008073s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.40% : 0.011305s : 1: type_inference 0.12% : 0.000075s : 1: validate TotalTime = 0.0184514, [24] [bootstrap]: 0.00049924 [type_inference]: 0.00428862 [event_method]: 1.027e-05 [auto_monad]: 4.967e-05 [graph_reusing]: 4.47e-06 [inline]: 1.96e-06 [add_attr]: 0.00294587, [1] [add_attr_with_inline]: 0.0029378, [1] [Cycle 1]: 4.439e-05, [2] [tag_attr]: 1.134e-05 [meta_addattr_fg_expand]: 2.88e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 2.235e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00367184, [53] [py_interpret_to_execute]: 1.641e-05 [rewriter_before_opt_a]: 3.73e-05 [opt_a]: 0.00186845, [2] [Cycle 1]: 0.00127089, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.376e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.00031521 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.63999e-06 [updatestate_depend_eliminate]: 3.45998e-06 [updatestate_assign_eliminate]: 2.91999e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.532e-05 [accelerated_algorithm]: 6.80002e-06 [shard]: 2.59001e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 8.2e-06 [auto_parallel]: 5.68002e-06 [parallel]: 1.656e-05 [flash_sp]: 7.5e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 7.59988e-07 [virtual_shard_identity]: 7.03998e-06 [virtual_dataset]: 5.82999e-06 [get_grad_eliminate_]: 5.68002e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 3.72002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 9.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.57997e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.26e-06 
[after_resolve]: 1.083e-05 [a_after_grad]: 8.92999e-06 [renormalize]: 0.00033964 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.68002e-06 [auto_monad_eliminator]: 1.299e-05 [cse]: 2.658e-05 [a_3]: 4.007e-05 [Cycle 2]: 0.0005879, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.64001e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012404 [with_stream_mark]: 9.37999e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.779e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.05999e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.67e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.58998e-06 [matmul_add_comm_reduction]: 5.19998e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.89999e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49999e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.17998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91e-06 [meta_fg_expand]: 1.58997e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 7.95998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 7.49977e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.232e-05 [a_3]: 3.28e-05 [py_interpret_to_execute_after_opt_a]: 7.56001e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.039e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00044878 [opt_b]: 0.00017997, [1] [Cycle 1]: 0.00017406, [7] [b_1]: 
0.00010633 [b_2]: 7.33e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 3.50003e-07 [cse]: 1.56e-05 [optimize_parallel_all_gather_comm]: 1.586e-05 [overlap_param_gather]: 2.24001e-06 [cconv]: 2.188e-05 [loop_unroll]: 0.00041557 [opt_after_cconv]: 9.459e-05, [1] [Cycle 1]: 8.916e-05, [7] [c_1]: 2.77e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.627e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.219e-05 [tuple_transform]: 7.063e-05, [1] [Cycle 1]: 6.636e-05, [4] [d_1]: 3.995e-05 [none_parameter_eliminate]: 1.91998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.388e-05 [cse_after_recomputation]: 1.948e-05, [1] [Cycle 1]: 1.508e-05, [1] [cse]: 9.97999e-06 [environ_conv]: 4.99e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 4.04002e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.12e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.206e-05 [grouped_pairwise_exchange_alltoall]: 1.86e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.74002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 1.81998e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.678e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.33002e-06 [symbol_engine_optimizer]: 6.782e-05, [1] [Cycle 1]: 6.372e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.15999e-06 [elim_not_effective]: 1.123e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.65001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.516e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044816 [validate]: 3.048e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00625377 [execute]: 6.68e-06 Sums bootstrap : 0.000499s : 3.43% type_inference : 0.004289s : 29.46% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000004s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000439s : 3.02% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000340s : 2.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 3.08% optimize.opt_b.b_1 : 0.000106s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.85% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 3.08% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006254s : 42.96% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 17.88% : 0.000021s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.97% : 0.000006s : 4: substitution.graph_param_transform 65.67% : 0.000078s : 2: substitution.inline 2.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.43% : 0.000004s : 4: substitution.remove_not_recompute_node 3.07% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004247 2 91.81% : 0.003899s : 1: type_inference.infer 8.19% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.15% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.19% : 0.000002s : 8: predicate.less_batch_normalization 1.71% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.86% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.98% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.10% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.37% : 0.000000s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 41.13% : 0.000097s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.87% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026358 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.19% : 0.002950s : 1: add_attr 11.16% : 0.002941s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.02% : 0.000531s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.00% : 0.000790s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001871s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 13.95% : 0.003676s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000003s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.76% : 0.006263s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.32% : 0.004301s : 1: type_inference 0.21% : 0.000055s : 1: validate TotalTime = 0.0355511, [24] [bootstrap]: 0.00051065 [type_inference]: 0.010212 [event_method]: 3.931e-05 [auto_monad]: 0.00011229 [graph_reusing]: 7.43e-06 [inline]: 1.72001e-06 [add_attr]: 0.00296395, [1] [add_attr_with_inline]: 0.00295593, [1] [Cycle 1]: 6.581e-05, [2] [tag_attr]: 3.172e-05 [meta_addattr_fg_expand]: 8.33999e-06 [parallel-infer-symbol]: 3.18e-06 [pre_auto_parallel]: 4.554e-05 [insert-virtual-dataset]: 2.30997e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.06e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.0128888, [53] [py_interpret_to_execute]: 3.406e-05 [rewriter_before_opt_a]: 0.00012523 [opt_a]: 0.0106022, [3] [Cycle 1]: 0.0067513, [45] [expand_dump_flag]: 3.83001e-06 [switch_simplify]: 6.511e-05 [loop_unroll]: 5.388e-05 [a_1]: 0.00131673 [with_stream_mark]: 2.223e-05 [recompute_prepare]: 2.104e-05 [updatestate_depend_eliminate]: 9.22999e-06 [updatestate_assign_eliminate]: 7.66999e-06 [updatestate_loads_eliminate]: 8.21002e-06 [parameter_eliminate]: 2.51e-06 [a_2]: 0.0002444 [accelerated_algorithm]: 3.025e-05 [shard]: 1.76e-06 [meta_shard_fg_expand]: 3.49001e-06 [shard_inline]: 1.635e-05 [merge_send_recv]: 1.615e-05 [auto_parallel]: 1.06e-05 [parallel]: 1.833e-05 [flash_sp]: 1.13e-05 [merge_comm]: 9.68002e-06 [allreduce_fusion]: 8.90001e-06 [matmul_add_comm_reduction]: 2.707e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.785e-05 [virtual_dataset]: 1.56e-05 [get_grad_eliminate_]: 1.499e-05 [virtual_output]: 1.49e-05 [merge_forward]: 9.48002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.728e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 2.75e-05 [set_forward_comm_id_for_comm_node_pass]: 9.92001e-06 [meta_fg_expand]: 0.00136409 [flash_sp_send_recv_attached]: 3.38e-06 [receive_attached]: 2.67001e-06 [after_resolve]: 
5.904e-05 [a_after_grad]: 9.734e-05 [renormalize]: 0.00231805 [add_forward_monad_depend]: 8.57e-06 [auto_monad_grad]: 4.99e-06 [auto_monad_eliminator]: 5.518e-05 [cse]: 0.00016176 [a_3]: 0.00033328 [Cycle 2]: 0.00294824, [45] [expand_dump_flag]: 1.49998e-06 [switch_simplify]: 4.667e-05 [loop_unroll]: 4.358e-05 [a_1]: 0.00154346 [with_stream_mark]: 1.156e-05 [recompute_prepare]: 1.092e-05 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.01002e-06 [a_2]: 0.00012501 [accelerated_algorithm]: 1.195e-05 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.04998e-06 [merge_send_recv]: 6.78e-06 [auto_parallel]: 7.16001e-06 [parallel]: 5.04998e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.04e-06 [allreduce_fusion]: 4.53999e-06 [matmul_add_comm_reduction]: 7.90998e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.68001e-06 [get_grad_eliminate_]: 8.62998e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.97999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.20999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.631e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.392e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14e-06 [meta_fg_expand]: 3.425e-05 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.561e-05 [a_after_grad]: 1.387e-05 [renormalize]: 0.00056802 [add_forward_monad_depend]: 3.88001e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.424e-05 [cse]: 4.565e-05 [a_3]: 6.457e-05 [Cycle 3]: 0.00088844, [45] [expand_dump_flag]: 1.17e-06 [switch_simplify]: 1.051e-05 [loop_unroll]: 8.74e-06 [a_1]: 0.00024737 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 9.42001e-06 [updatestate_depend_eliminate]: 4.74e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 3.9e-06 
[parameter_eliminate]: 9.49978e-07 [a_2]: 0.00012238 [accelerated_algorithm]: 1.152e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.67001e-06 [shard_inline]: 8.92999e-06 [merge_send_recv]: 7.13998e-06 [auto_parallel]: 7.08e-06 [parallel]: 4.32e-06 [flash_sp]: 1.04003e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 5.00999e-06 [matmul_add_comm_reduction]: 7.92998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.014e-05 [virtual_dataset]: 8.59998e-06 [get_grad_eliminate_]: 8.45999e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.565e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.374e-05 [set_forward_comm_id_for_comm_node_pass]: 5.37999e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.312e-05 [a_after_grad]: 1.409e-05 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.09998e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 1.039e-05 [cse]: 2.537e-05 [a_3]: 5.673e-05 [py_interpret_to_execute_after_opt_a]: 1.023e-05 [slice_cell_reuse_recomputed_activation]: 2.26e-06 [rewriter_after_opt_a]: 4.893e-05 [convert_after_rewriter]: 9.30001e-06 [order_py_execute_after_rewriter]: 6.64999e-06 [mutable_eliminate]: 0.00045841 [opt_b]: 0.00028511, [1] [Cycle 1]: 0.00027899, [7] [b_1]: 0.00018767 [b_2]: 1.036e-05 [updatestate_depend_eliminate]: 7.22002e-06 [updatestate_assign_eliminate]: 4.06001e-06 [updatestate_loads_eliminate]: 4.03001e-06 [renormalize]: 4.00003e-07 [cse]: 3.079e-05 [optimize_parallel_all_gather_comm]: 2.012e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.011e-05 [loop_unroll]: 0.00047689 [opt_after_cconv]: 0.00013516, [1] [Cycle 1]: 0.00012916, [7] [c_1]: 4.827e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 7.14001e-06 [updatestate_assign_eliminate]: 4.22e-06 
[updatestate_loads_eliminate]: 4.07998e-06 [cse]: 2.955e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 2.85e-05 [tuple_transform]: 0.00010181, [1] [Cycle 1]: 9.693e-05, [4] [d_1]: 6.643e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.89999e-06 [partial_unused_args_eliminate]: 1.85001e-06 [add_recomputation]: 5.599e-05 [cse_after_recomputation]: 3.283e-05, [1] [Cycle 1]: 2.789e-05, [1] [cse]: 2.177e-05 [environ_conv]: 9.04e-06 [swap_dp_allreduce_reducescatter]: 7.62998e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.44998e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.23998e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.50006e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.04998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.68002e-06 [control_data_broadcast_order]: 1.69e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 4.90999e-06 [overlap_recompute_and_grad_model_parallel]: 5.71e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 5.02e-06 [overlap_grad_flash_sp]: 2.451e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 9.825e-05, [1] [Cycle 1]: 9.402e-05, [6] [build]: 9.26002e-06 [elim_shapecalc]: 1.359e-05 [elim_not_effective]: 1.85e-05 [opt_reshape]: 1.025e-05 [fold_const_symbol]: 1.444e-05 [renormalize]: 
2.80008e-07 [detach_backward]: 1.75001e-06 [pipeline_parallel_scheduler]: 1.67999e-06 [auto_monad_reorder]: 2.475e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.0004697 [validate]: 4.376e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.00797559 [execute]: 6.64001e-06 Sums bootstrap : 0.000511s : 1.63% type_inference : 0.010212s : 32.61% event_method : 0.000039s : 0.13% auto_monad : 0.000112s : 0.36% graph_reusing : 0.000007s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000125s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000122s : 0.39% optimize.opt_a.loop_unroll : 0.000106s : 0.34% optimize.opt_a.a_1 : 0.003108s : 9.92% optimize.opt_a.with_stream_mark : 0.000043s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.10% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.07% optimize.opt_a.meta_fg_expand : 0.001401s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000125s : 0.40% optimize.opt_a.renormalize : 0.002886s : 9.22% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000233s : 0.74% optimize.opt_a.a_3 : 0.000455s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.46% optimize.opt_b.b_1 : 0.000188s : 0.60% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000477s : 1.52% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000470s : 1.50% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.007976s : 25.47% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000729 218 5.99% : 0.000044s : 11: substitution.arithmetic_simplify 2.00% : 0.000015s : 2: substitution.cast_eliminate 0.42% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.08% : 0.000008s : 8: substitution.graph_param_transform 0.39% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.07% : 0.000394s : 16: substitution.inline 2.25% : 0.000016s : 2: substitution.inline_without_move 1.44% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.04% : 0.000015s : 3: substitution.less_batch_normalization 1.84% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.34% : 0.000024s : 10: substitution.replace_applicator 1.46% : 0.000011s : 15: substitution.replace_old_param 0.34% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.79% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.52% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010146 2 87.16% : 0.008843s : 1: type_inference.infer 12.84% : 0.001303s : 1: type_inference.specialize ------[replace.] 0.000224 30 52.68% : 0.000118s : 16: replace.inline 47.32% : 0.000106s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.61% : 0.000386s : 16: match.inline 7.39% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000732 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.54% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000041s : 244: predicate.inline 1.30% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.15% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 97: predicate.partial_defer_inline 1.68% : 0.000012s : 89: predicate.partial_eliminate 1.09% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.85% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.88% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000014s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.28% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001443 32 57.86% : 0.000835s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.14% : 0.000608s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059374 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.00% : 0.002968s : 1: add_attr 4.98% : 0.002960s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000119s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.91% : 0.000542s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000045s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.82% : 0.000486s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.01% : 0.004757s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.13% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.86% : 0.010605s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.81% : 0.000479s : 1: opt_after_jit_grad 0.49% : 0.000289s : 1: opt_b 21.71% : 0.012893s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.06% : 0.000038s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.59% : 0.001535s : 2: renormalize.infer 2.25% : 0.001339s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000053s : 1: rewriter_after_opt_a 0.22% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.45% : 0.007986s : 1: task_emit 0.18% : 0.000105s : 1: 
tuple_transform 17.23% : 0.010227s : 1: type_inference 0.13% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x3-kbk],max_mem:60.0M TotalTime = 0.0799702, [24] [bootstrap]: 0.00057243 [type_inference]: 0.00614286 [event_method]: 1.332e-05 [auto_monad]: 5.851e-05 [graph_reusing]: 5.19e-06 [inline]: 1.69e-06 [add_attr]: 0.00346535, [1] [add_attr_with_inline]: 0.00345394, [1] [Cycle 1]: 4.402e-05, [2] [tag_attr]: 1.516e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 2.768e-05 [insert-virtual-dataset]: 2.50997e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.80001e-06 [optimize]: 0.00393743, [53] [py_interpret_to_execute]: 1.978e-05 [rewriter_before_opt_a]: 5.865e-05 [opt_a]: 0.0020883, [2] [Cycle 1]: 0.00149871, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 3.183e-05 [loop_unroll]: 2.122e-05 [a_1]: 0.00044911 [with_stream_mark]: 1.335e-05 [recompute_prepare]: 7.53e-06 [updatestate_depend_eliminate]: 3.88001e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.13e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.61e-05 [accelerated_algorithm]: 6.36998e-06 [shard]: 2.25002e-06 [meta_shard_fg_expand]: 1.56002e-06 [shard_inline]: 5.64e-06 [merge_send_recv]: 7.91001e-06 [auto_parallel]: 5.59e-06 [parallel]: 2.201e-05 [flash_sp]: 7.25e-06 [merge_comm]: 3.67002e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.3e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.87003e-06 [virtual_dataset]: 5.94999e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 5.72999e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 
[merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.56003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.66999e-06 [after_resolve]: 1.069e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00040643 [add_forward_monad_depend]: 4.78001e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.328e-05 [cse]: 2.698e-05 [a_3]: 4.033e-05 [Cycle 2]: 0.00058049, [45] [expand_dump_flag]: 7.39994e-07 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012009 [with_stream_mark]: 9.49999e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.712e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.29998e-06 [parallel]: 4e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.54998e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.71998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71998e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.98002e-06 [a_after_grad]: 8.08999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.19999e-06 [cse]: 1.184e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 
1.96e-06 [rewriter_after_opt_a]: 2.912e-05 [convert_after_rewriter]: 6.62002e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00044734 [opt_b]: 0.00018691, [1] [Cycle 1]: 0.00018053, [7] [b_1]: 0.00011285 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.16998e-06 [renormalize]: 3.60014e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.575e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.223e-05 [loop_unroll]: 0.00041747 [opt_after_cconv]: 9.301e-05, [1] [Cycle 1]: 8.725e-05, [7] [c_1]: 2.762e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 4.97999e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.57e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.17e-05 [tuple_transform]: 6.924e-05, [1] [Cycle 1]: 6.502e-05, [4] [d_1]: 3.909e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.32001e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.779e-05 [cse_after_recomputation]: 2.026e-05, [1] [Cycle 1]: 1.573e-05, [1] [cse]: 1.037e-05 [environ_conv]: 4.59998e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.08002e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 8.89995e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.53003e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86998e-06 [control_data_broadcast_order]: 
1.131e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.32002e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.02001e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.13001e-06 [symbol_engine_optimizer]: 6.976e-05, [1] [Cycle 1]: 6.574e-05, [6] [build]: 2.76999e-06 [elim_shapecalc]: 9.27999e-06 [elim_not_effective]: 1.187e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.487e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.62998e-06 [opt_after_jit_grad]: 0.00045127 [validate]: 2.937e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0650191 [execute]: 8.56002e-06 Sums bootstrap : 0.000572s : 0.76% type_inference : 0.006143s : 8.13% event_method : 0.000013s : 0.02% auto_monad : 0.000059s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000569s : 0.75% optimize.opt_a.with_stream_mark : 0.000023s : 
0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000407s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad 
: 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000029s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.59% optimize.opt_b.b_1 : 0.000113s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000417s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000451s : 0.60% validate : 0.000029s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.065019s : 86.08% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 15.03% : 0.000025s : 5: substitution.arithmetic_simplify 1.26% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.37% : 0.000006s : 4: substitution.graph_param_transform 66.33% : 0.000109s : 3: substitution.inline 1.59% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.53% : 0.000004s : 4: substitution.remove_not_recompute_node 2.58% : 0.000004s : 4: substitution.replace_old_param 6.57% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006099 2 90.72% : 0.005533s : 1: type_inference.infer 9.28% : 0.000566s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.77% : 0.000026s : 3: replace.inline 31.23% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.67% : 0.000107s : 3: match.inline 8.33% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.79% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.29% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.06% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.65% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.70% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000347 8 46.21% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.79% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088861 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.90% : 0.003469s : 1: add_attr 3.89% : 0.003457s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000064s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000607s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.05% : 0.000936s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000095s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.35% : 0.002091s : 1: opt_a 0.11% : 0.000096s : 1: opt_after_cconv 0.52% : 0.000461s : 1: opt_after_jit_grad 0.21% : 0.000190s : 1: opt_b 4.44% : 0.003941s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000210s : 1: renormalize.infer 0.21% : 0.000189s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000033s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000006s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 73.19% : 0.065036s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.93% : 0.006157s : 1: type_inference 0.06% : 0.000056s : 1: validate TotalTime = 0.0703979, [24] [bootstrap]: 0.00047344 [type_inference]: 0.00440625 [event_method]: 1.071e-05 [auto_monad]: 5.048e-05 [graph_reusing]: 5.07e-06 [inline]: 1.81e-06 [add_attr]: 0.00300579, [1] [add_attr_with_inline]: 0.00299762, [1] [Cycle 1]: 4.546e-05, [2] [tag_attr]: 1.097e-05 [meta_addattr_fg_expand]: 3.31999e-06 [parallel-infer-symbol]: 2.52001e-06 [pre_auto_parallel]: 2.101e-05 [insert-virtual-dataset]: 2.53998e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00368286, [53] [py_interpret_to_execute]: 1.522e-05 [rewriter_before_opt_a]: 3.954e-05 [opt_a]: 0.00188314, [2] [Cycle 1]: 0.00124154, [45] [expand_dump_flag]: 2.32999e-06 [switch_simplify]: 2.432e-05 [loop_unroll]: 1.324e-05 [a_1]: 0.00028903 [with_stream_mark]: 1.33e-05 [recompute_prepare]: 7.52002e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 2.89999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.65e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.71e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 8.25e-06 [auto_parallel]: 5.51998e-06 [parallel]: 1.798e-05 [flash_sp]: 7.61001e-06 [merge_comm]: 4.15999e-06 [allreduce_fusion]: 3.34001e-06 [matmul_add_comm_reduction]: 8.52e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.79001e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.068e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.04e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.44999e-06 [after_resolve]: 
1.004e-05 [a_after_grad]: 8.84998e-06 [renormalize]: 0.00033976 [add_forward_monad_depend]: 4.50001e-06 [auto_monad_grad]: 1.79998e-06 [auto_monad_eliminator]: 1.32e-05 [cse]: 2.81e-05 [a_3]: 4.021e-05 [Cycle 2]: 0.00063274, [45] [expand_dump_flag]: 7.00005e-07 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012488 [with_stream_mark]: 9.62999e-06 [recompute_prepare]: 5.63002e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.64999e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 0.00010739 [accelerated_algorithm]: 5.89e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 4.42003e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.08999e-06 [flash_sp]: 2.84999e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.98001e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.96001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.30998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.00999e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.25999e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.45002e-06 [cse]: 1.285e-05 [a_3]: 3.216e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 1.76003e-06 [rewriter_after_opt_a]: 3.065e-05 [convert_after_rewriter]: 6.67002e-06 [order_py_execute_after_rewriter]: 5.56e-06 [mutable_eliminate]: 0.00044672 [opt_b]: 0.00018035, [1] [Cycle 1]: 0.0001741, [7] [b_1]: 0.00010711 [b_2]: 
6.99001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 3.30008e-07 [cse]: 1.593e-05 [optimize_parallel_all_gather_comm]: 1.536e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.313e-05 [loop_unroll]: 0.0004129 [opt_after_cconv]: 9.414e-05, [1] [Cycle 1]: 8.832e-05, [7] [c_1]: 2.721e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.577e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.281e-05 [tuple_transform]: 6.872e-05, [1] [Cycle 1]: 6.439e-05, [4] [d_1]: 3.883e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.25002e-06 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 4.308e-05 [cse_after_recomputation]: 2.058e-05, [1] [Cycle 1]: 1.655e-05, [1] [cse]: 1.124e-05 [environ_conv]: 4.88001e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.56998e-06 [label_micro_interleaved_index]: 3.98001e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.21997e-06 [slice_recompute_activation]: 2.73e-06 [micro_interleaved_order_control]: 2.26e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.22e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99999e-06 [control_data_broadcast_order]: 1.154e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.63998e-06 [overlap_grad_ring_attention]: 4.33001e-06 [overlap_grad_flash_sp]: 1.717e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 1.89999e-06 [split_layernorm_comm]: 1.57001e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.786e-05, [1] [Cycle 1]: 6.373e-05, [6] [build]: 2.11003e-06 [elim_shapecalc]: 8.07998e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.96998e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 1.547e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.00044766 [validate]: 3.086e-05 [backend_pass]: 8.00006e-07 [task_emit]: 0.0580261 [execute]: 7.55998e-06 Sums bootstrap : 0.000473s : 0.71% type_inference : 0.004406s : 6.63% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000414s : 0.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000184s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000340s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.67% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000413s : 0.62% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000448s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058026s : 87.34% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000117 26 18.20% : 0.000021s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.33% : 0.000005s : 4: substitution.graph_param_transform 65.99% : 0.000077s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.08% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004366 2 91.79% : 0.004008s : 1: type_inference.infer 8.21% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.40% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 2.06% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.72% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.31% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.14% : 0.000002s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.81% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.64% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.89% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000250 6 42.77% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.23% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078383 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.84% : 0.003010s : 1: add_attr 3.83% : 0.003001s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000508s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.02% : 0.000802s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.41% : 0.001886s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.58% : 0.000457s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.70% : 0.003687s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000186s : 1: renormalize.infer 0.19% : 0.000147s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.05% : 0.058042s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.64% : 0.004419s : 1: type_inference 0.07% : 0.000051s : 1: validate TotalTime = 0.0727781, [24] [bootstrap]: 0.00046885 [type_inference]: 0.0055288 [event_method]: 1.382e-05 [auto_monad]: 5.419e-05 [graph_reusing]: 5.66e-06 [inline]: 2.01e-06 [add_attr]: 0.00290801, [1] [add_attr_with_inline]: 0.00289978, [1] [Cycle 1]: 4.517e-05, [2] [tag_attr]: 1.542e-05 [meta_addattr_fg_expand]: 4.05998e-06 [parallel-infer-symbol]: 2.94001e-06 [pre_auto_parallel]: 2.464e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 8.29983e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.00395892, [53] [py_interpret_to_execute]: 2.172e-05 [rewriter_before_opt_a]: 5.848e-05 [opt_a]: 0.00212576, [2] [Cycle 1]: 0.0015198, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.187e-05 [loop_unroll]: 2.076e-05 [a_1]: 0.00044432 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.85998e-06 [updatestate_depend_eliminate]: 3.83001e-06 [updatestate_assign_eliminate]: 3.33e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.668e-05 [accelerated_algorithm]: 6.39001e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 5.72001e-06 [parallel]: 1.798e-05 [flash_sp]: 6.77002e-06 [merge_comm]: 3.60003e-06 [allreduce_fusion]: 3.22002e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.11001e-06 [virtual_dataset]: 6.16e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 8.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 1.004e-05 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.93e-06 [receive_attached]: 3.03998e-06 
[after_resolve]: 1.024e-05 [a_after_grad]: 8.43999e-06 [renormalize]: 0.0004123 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.357e-05 [cse]: 2.631e-05 [a_3]: 4.005e-05 [Cycle 2]: 0.0005966, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 7e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012432 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.771e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.36002e-06 [merge_send_recv]: 4.18001e-06 [auto_parallel]: 5.12999e-06 [parallel]: 3.84002e-06 [flash_sp]: 2.89999e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.08002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.37001e-06 [virtual_dataset]: 5.52999e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.36998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.06997e-06 [after_resolve]: 8.77e-06 [a_after_grad]: 8.13999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.16e-06 [cse]: 1.394e-05 [a_3]: 3.335e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 3.098e-05 [convert_after_rewriter]: 6.60997e-06 [order_py_execute_after_rewriter]: 4.85999e-06 [mutable_eliminate]: 0.0004488 [opt_b]: 0.00018042, [1] [Cycle 1]: 0.00017463, [7] [b_1]: 
0.00010766 [b_2]: 6.88e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 3.10014e-07 [cse]: 1.596e-05 [optimize_parallel_all_gather_comm]: 1.571e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.272e-05 [loop_unroll]: 0.00041592 [opt_after_cconv]: 9.539e-05, [1] [Cycle 1]: 8.957e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.615e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.264e-05 [tuple_transform]: 6.943e-05, [1] [Cycle 1]: 6.498e-05, [4] [d_1]: 3.926e-05 [none_parameter_eliminate]: 1.51002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.28998e-06 [partial_unused_args_eliminate]: 1.96998e-06 [add_recomputation]: 4.425e-05 [cse_after_recomputation]: 2.06e-05, [1] [Cycle 1]: 1.61e-05, [1] [cse]: 1.099e-05 [environ_conv]: 4.42e-06 [swap_dp_allreduce_reducescatter]: 4.97999e-06 [bias_add_comm_swap]: 2.34999e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.05001e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.88e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 1.29e-06 [interleave_split_concat_branches]: 1.37e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.177e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.66999e-06 [overlap_recompute_and_grad_model_parallel]: 4.42998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.698e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.819e-05, [1] [Cycle 1]: 6.388e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.14e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.98002e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.32e-06 [auto_monad_reorder]: 1.566e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00049274 [validate]: 3.157e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0590473 [execute]: 7.58999e-06 Sums bootstrap : 0.000469s : 0.68% type_inference : 0.005529s : 8.03% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000022s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000569s : 0.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000412s : 0.60% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000416s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000493s : 0.72% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.059047s : 85.73% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.67% : 0.000024s : 5: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 66.37% : 0.000108s : 3: substitution.inline 1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 6.83% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005489 2 89.84% : 0.004931s : 1: type_inference.infer 10.16% : 0.000558s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.22% : 0.000026s : 3: replace.inline 30.78% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.43% : 0.000107s : 3: match.inline 8.57% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.74% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.97% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.17% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.62% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 45.25% : 0.000160s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.75% : 0.000193s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081149 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.59% : 0.002912s : 1: add_attr 3.58% : 0.002904s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000503s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.15% : 0.000935s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.62% : 0.002129s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.62% : 0.000503s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.88% : 0.003963s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000026s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000204s : 1: renormalize.infer 0.25% : 0.000202s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 72.79% : 0.059065s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.83% : 0.005542s : 1: type_inference 0.07% : 0.000053s : 1: validate TotalTime = 0.109139, [24] [bootstrap]: 0.00050564 [type_inference]: 0.0114371 [event_method]: 4.9e-05 [auto_monad]: 0.00011944 [graph_reusing]: 8.42e-06 [inline]: 2.39999e-06 [add_attr]: 0.00303733, [1] [add_attr_with_inline]: 0.00302879, [1] [Cycle 1]: 7.085e-05, [2] [tag_attr]: 3.429e-05 [meta_addattr_fg_expand]: 9.42999e-06 [parallel-infer-symbol]: 3.06001e-06 [pre_auto_parallel]: 4.906e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0133506, [53] [py_interpret_to_execute]: 3.853e-05 [rewriter_before_opt_a]: 0.00014457 [opt_a]: 0.0110806, [3] [Cycle 1]: 0.00713414, [45] [expand_dump_flag]: 4.2e-06 [switch_simplify]: 7.471e-05 [loop_unroll]: 6.153e-05 [a_1]: 0.00148047 [with_stream_mark]: 2.285e-05 [recompute_prepare]: 2.165e-05 [updatestate_depend_eliminate]: 8.97e-06 [updatestate_assign_eliminate]: 7.52002e-06 [updatestate_loads_eliminate]: 7.11001e-06 [parameter_eliminate]: 2.51998e-06 [a_2]: 0.00024511 [accelerated_algorithm]: 3.131e-05 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 3.29001e-06 [shard_inline]: 1.596e-05 [merge_send_recv]: 1.559e-05 [auto_parallel]: 1.092e-05 [parallel]: 1.853e-05 [flash_sp]: 1.12e-05 [merge_comm]: 1.009e-05 [allreduce_fusion]: 9.09e-06 [matmul_add_comm_reduction]: 2.579e-05 [allreduce_slice_to_reducescatter]: 9.50007e-07 [virtual_shard_identity]: 1.786e-05 [virtual_dataset]: 1.573e-05 [get_grad_eliminate_]: 1.526e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.36002e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 1.749e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.811e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 2.732e-05 [set_forward_comm_id_for_comm_node_pass]: 9.74999e-06 [meta_fg_expand]: 0.0014034 [flash_sp_send_recv_attached]: 3.53999e-06 [receive_attached]: 3.04999e-06 
[after_resolve]: 5.999e-05 [a_after_grad]: 8.152e-05 [renormalize]: 0.00248198 [add_forward_monad_depend]: 9.29e-06 [auto_monad_grad]: 5.05999e-06 [auto_monad_eliminator]: 5.694e-05 [cse]: 0.00016923 [a_3]: 0.00033612 [Cycle 2]: 0.00302854, [45] [expand_dump_flag]: 1.49e-06 [switch_simplify]: 4.708e-05 [loop_unroll]: 4.404e-05 [a_1]: 0.00151986 [with_stream_mark]: 1.235e-05 [recompute_prepare]: 1.085e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 9.5999e-07 [a_2]: 0.0001268 [accelerated_algorithm]: 1.175e-05 [shard]: 1.12999e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 6.88e-06 [auto_parallel]: 7.03e-06 [parallel]: 4.97999e-06 [flash_sp]: 2.91999e-06 [merge_comm]: 4.87998e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.97e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.85002e-06 [virtual_dataset]: 9.37999e-06 [get_grad_eliminate_]: 9.24e-06 [virtual_output]: 8.55001e-06 [merge_forward]: 4.51002e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 8.70999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.65e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22999e-06 [meta_fg_expand]: 6.806e-05 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.629e-05 [a_after_grad]: 4.566e-05 [renormalize]: 0.00059742 [add_forward_monad_depend]: 4.19002e-06 [auto_monad_grad]: 1.24e-06 [auto_monad_eliminator]: 1.481e-05 [cse]: 4.646e-05 [a_3]: 6.643e-05 [Cycle 3]: 0.00090374, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.072e-05 [loop_unroll]: 9.12999e-06 [a_1]: 0.00025039 [with_stream_mark]: 9.78002e-06 [recompute_prepare]: 9.19e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 3.97998e-06 [updatestate_loads_eliminate]: 3.8e-06 
[parameter_eliminate]: 9.5999e-07 [a_2]: 0.00012439 [accelerated_algorithm]: 1.185e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 8.97e-06 [merge_send_recv]: 6.87002e-06 [auto_parallel]: 7.16999e-06 [parallel]: 4.57e-06 [flash_sp]: 9.89996e-07 [merge_comm]: 4.96002e-06 [allreduce_fusion]: 4.94e-06 [matmul_add_comm_reduction]: 7.41999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.84999e-06 [virtual_dataset]: 8.44998e-06 [get_grad_eliminate_]: 8.55999e-06 [virtual_output]: 8.22e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.3e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.603e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.406e-05 [set_forward_comm_id_for_comm_node_pass]: 5.21002e-06 [meta_fg_expand]: 2.79001e-06 [flash_sp_send_recv_attached]: 7.90023e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.524e-05 [a_after_grad]: 1.524e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.36998e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.076e-05 [cse]: 2.644e-05 [a_3]: 5.988e-05 [py_interpret_to_execute_after_opt_a]: 1.003e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 4.757e-05 [convert_after_rewriter]: 9.24e-06 [order_py_execute_after_rewriter]: 6.83e-06 [mutable_eliminate]: 0.00046091 [opt_b]: 0.00028892, [1] [Cycle 1]: 0.00028228, [7] [b_1]: 0.00018992 [b_2]: 1.086e-05 [updatestate_depend_eliminate]: 7.29001e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.92002e-06 [renormalize]: 2.80008e-07 [cse]: 3.131e-05 [optimize_parallel_all_gather_comm]: 2.006e-05 [overlap_param_gather]: 1.76003e-06 [cconv]: 1.943e-05 [loop_unroll]: 0.0004267 [opt_after_cconv]: 0.00013761, [1] [Cycle 1]: 0.00013137, [7] [c_1]: 4.928e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.1e-06 
[updatestate_loads_eliminate]: 4.12e-06 [cse]: 3.072e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 2.915e-05 [tuple_transform]: 0.00010159, [1] [Cycle 1]: 9.687e-05, [4] [d_1]: 6.739e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.738e-05 [cse_after_recomputation]: 3.222e-05, [1] [Cycle 1]: 2.734e-05, [1] [cse]: 2.2e-05 [environ_conv]: 8.72998e-06 [swap_dp_allreduce_reducescatter]: 7.87003e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.82e-06 [label_fine_grained_interleaved_index]: 2.96001e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.68e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.45999e-06 [ForceFp32Comm]: 1.12e-06 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.676e-05 [grouped_pairwise_exchange_alltoall]: 1.41002e-06 [offloading_packed_experts]: 4.87e-06 [overlap_recompute_and_grad_model_parallel]: 6.02001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.27001e-06 [overlap_grad_ring_attention]: 5.09998e-06 [overlap_grad_flash_sp]: 2.401e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 9.919e-05, [1] [Cycle 1]: 9.513e-05, [6] [build]: 9.78998e-06 [elim_shapecalc]: 1.362e-05 [elim_not_effective]: 1.838e-05 [opt_reshape]: 1.03e-05 [fold_const_symbol]: 1.451e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 2.598e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.64002e-06 [opt_after_jit_grad]: 0.00050155 [validate]: 4.508e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0797718 [execute]: 8.24002e-06 Sums bootstrap : 0.000506s : 0.48% type_inference : 0.011437s : 10.91% event_method : 0.000049s : 0.05% auto_monad : 0.000119s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.04% optimize.rewriter_before_opt_a : 0.000145s : 0.14% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000133s : 0.13% optimize.opt_a.loop_unroll : 0.000115s : 0.11% optimize.opt_a.a_1 : 0.003251s : 3.10% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000496s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000034s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001474s : 1.41% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000092s : 0.09% optimize.opt_a.a_after_grad : 0.000142s : 0.14% optimize.opt_a.renormalize : 0.003079s : 2.94% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000242s : 0.23% optimize.opt_a.a_3 : 0.000462s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000461s : 0.44% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000427s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000502s : 0.48% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079772s : 76.08% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000791 222 5.59% : 0.000044s : 12: substitution.arithmetic_simplify 1.71% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.45% : 0.000004s : 5: substitution.float_depend_g_call 0.59% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 57.90% : 0.000458s : 17: substitution.inline 1.98% : 0.000016s : 2: substitution.inline_without_move 1.26% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000015s : 3: substitution.less_batch_normalization 1.60% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.70% : 0.000013s : 20: substitution.remove_not_recompute_node 2.89% : 0.000023s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.28% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.49% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.71% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.22% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.23% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.22% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011364 2 86.59% : 0.009840s : 1: type_inference.infer 13.41% : 0.001524s : 1: type_inference.specialize ------[replace.] 0.000217 33 57.51% : 0.000125s : 17: replace.inline 42.49% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000482 33 93.09% : 0.000449s : 17: match.inline 6.91% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.00% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000042s : 249: predicate.inline 1.32% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.13% : 0.000009s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.04% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.85% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.32% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001603 34 56.20% : 0.000901s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.80% : 0.000702s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.133889 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.27% : 0.003042s : 1: add_attr 2.27% : 0.003033s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000127s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.40% : 0.000539s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000056s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.33% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.70% : 0.004955s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.28% : 0.011083s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.38% : 0.000511s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 9.97% : 0.013355s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000043s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.21% : 0.001615s : 2: renormalize.infer 1.08% : 0.001452s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.11% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000102s : 1: symbol_engine_optimizer 59.59% : 0.079789s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 8.55% : 0.011452s : 1: type_inference 0.05% : 0.000069s : 1: validate TotalTime = 0.0746551, [24] [bootstrap]: 0.00047396 [type_inference]: 0.00428679 [event_method]: 1.076e-05 [auto_monad]: 5.18e-05 [graph_reusing]: 5.48002e-06 [inline]: 1.46998e-06 [add_attr]: 0.00299022, [1] [add_attr_with_inline]: 0.00298177, [1] [Cycle 1]: 4.457e-05, [2] [tag_attr]: 1.138e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 3.14001e-06 [pre_auto_parallel]: 2.137e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 8.59989e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00374569, [53] [py_interpret_to_execute]: 1.495e-05 [rewriter_before_opt_a]: 3.812e-05 [opt_a]: 0.00186936, [2] [Cycle 1]: 0.00126482, [45] [expand_dump_flag]: 2.70002e-06 [switch_simplify]: 2.541e-05 [loop_unroll]: 1.354e-05 [a_1]: 0.00029534 [with_stream_mark]: 1.397e-05 [recompute_prepare]: 7.83999e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.812e-05 [accelerated_algorithm]: 6.30002e-06 [shard]: 2.63e-06 [meta_shard_fg_expand]: 1.46998e-06 [shard_inline]: 5.61003e-06 [merge_send_recv]: 7.5e-06 [auto_parallel]: 5.82999e-06 [parallel]: 1.767e-05 [flash_sp]: 7.03e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 9.24998e-06 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.29001e-06 [virtual_dataset]: 6.48998e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.91e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 8.79998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.12001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.20002e-06 
[after_resolve]: 1.041e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.0003455 [add_forward_monad_depend]: 4.18001e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.286e-05 [cse]: 2.719e-05 [a_3]: 4.077e-05 [Cycle 2]: 0.00059518, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.78998e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012579 [with_stream_mark]: 1.103e-05 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.833e-05 [accelerated_algorithm]: 5.81998e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.23999e-06 [flash_sp]: 3.05002e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 3.08998e-06 [matmul_add_comm_reduction]: 5.14e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.19998e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.10002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.75e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00998e-06 [meta_fg_expand]: 1.52001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.67e-06 [a_after_grad]: 8.57e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.15002e-06 [cse]: 1.263e-05 [a_3]: 3.347e-05 [py_interpret_to_execute_after_opt_a]: 7.38e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.088e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00044476 [opt_b]: 0.00018171, [1] [Cycle 1]: 0.00017581, [7] [b_1]: 
0.0001087 [b_2]: 6.82002e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 2.89991e-07 [cse]: 1.67e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 2.14999e-06 [cconv]: 2.407e-05 [loop_unroll]: 0.0004139 [opt_after_cconv]: 0.00016389, [1] [Cycle 1]: 0.00015847, [7] [c_1]: 2.999e-05 [parameter_eliminate]: 2.59999e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.673e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 7.041e-05, [1] [Cycle 1]: 6.579e-05, [4] [d_1]: 3.98e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.464e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.557e-05, [1] [cse]: 1.051e-05 [environ_conv]: 4.77998e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.08999e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.08002e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.43002e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.35999e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.22999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.51998e-06 [control_data_broadcast_order]: 1.206e-05 [grouped_pairwise_exchange_alltoall]: 2.46998e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.63999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 4.63001e-06 [overlap_grad_flash_sp]: 1.698e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.18002e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.739e-05, [1] [Cycle 1]: 6.335e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 8.02e-06 [elim_not_effective]: 1.096e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.84998e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00044993 [validate]: 3.148e-05 [backend_pass]: 7.89994e-07 [task_emit]: 0.062345 [execute]: 8.70999e-06 Sums bootstrap : 0.000474s : 0.67% type_inference : 0.004287s : 6.07% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000001s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000421s : 0.60% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000346s : 0.49% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.63% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000414s : 0.59% optimize.opt_after_cconv.c_1 : 0.000030s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.64% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.062345s : 88.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.30% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.52% : 0.000005s : 4: substitution.graph_param_transform 65.78% : 0.000079s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000004s : 4: substitution.remove_not_recompute_node 3.10% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004247 2 91.76% : 0.003897s : 1: type_inference.infer 8.24% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.58% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.93% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.08% : 0.000001s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.74% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.71% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.24% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 43.10% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.90% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.082674 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.62% : 0.002995s : 1: add_attr 3.61% : 0.002985s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000509s : 1: bootstrap 0.03% : 0.000028s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000004s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.94% : 0.000777s : 78: opt.transform.opt_a 0.03% : 0.000029s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.26% : 0.001872s : 1: opt_a 0.20% : 0.000167s : 1: opt_after_cconv 0.56% : 0.000459s : 1: opt_after_jit_grad 0.22% : 0.000185s : 1: opt_b 4.54% : 0.003750s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.23% : 0.000187s : 1: renormalize.infer 0.18% : 0.000152s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 75.43% : 0.062362s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.20% : 0.004301s : 1: type_inference 0.06% : 0.000053s : 1: validate . TotalTime = 0.117219, [24] [bootstrap]: 0.00052512 [type_inference]: 0.0104468 [event_method]: 4.375e-05 [auto_monad]: 0.00011642 [graph_reusing]: 7.90998e-06 [inline]: 2.08998e-06 [add_attr]: 0.00303622, [1] [add_attr_with_inline]: 0.00302708, [1] [Cycle 1]: 6.821e-05, [2] [tag_attr]: 3.185e-05 [meta_addattr_fg_expand]: 8.76002e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 4.725e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 8.70001e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0133867, [53] [py_interpret_to_execute]: 3.518e-05 [rewriter_before_opt_a]: 0.00012729 [opt_a]: 0.0110915, [3] [Cycle 1]: 0.00709873, [45] [expand_dump_flag]: 4.11001e-06 [switch_simplify]: 6.766e-05 [loop_unroll]: 5.521e-05 [a_1]: 0.00136284 [with_stream_mark]: 2.391e-05 [recompute_prepare]: 2.138e-05 [updatestate_depend_eliminate]: 9.05999e-06 [updatestate_assign_eliminate]: 7.61999e-06 [updatestate_loads_eliminate]: 7.2e-06 [parameter_eliminate]: 2.53e-06 [a_2]: 0.00024801 [accelerated_algorithm]: 3.203e-05 [shard]: 1.91e-06 [meta_shard_fg_expand]: 3.8e-06 [shard_inline]: 1.619e-05 [merge_send_recv]: 1.575e-05 [auto_parallel]: 1.118e-05 [parallel]: 1.913e-05 [flash_sp]: 1.118e-05 [merge_comm]: 9.96998e-06 [allreduce_fusion]: 9.36998e-06 [matmul_add_comm_reduction]: 2.623e-05 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 1.823e-05 [virtual_dataset]: 1.58e-05 [get_grad_eliminate_]: 1.519e-05 [virtual_output]: 1.534e-05 [merge_forward]: 9.49999e-06 [cell_reuse_recompute_pass]: 1.04003e-06 [offload_activation]: 1.752e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.895e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.751e-05 [set_forward_comm_id_for_comm_node_pass]: 9.74e-06 [meta_fg_expand]: 0.00140791 [flash_sp_send_recv_attached]: 3.51999e-06 [receive_attached]: 2.48e-06 [after_resolve]: 
6.005e-05 [a_after_grad]: 8.538e-05 [renormalize]: 0.0025428 [add_forward_monad_depend]: 9.44e-06 [auto_monad_grad]: 6.02001e-06 [auto_monad_eliminator]: 5.812e-05 [cse]: 0.00017517 [a_3]: 0.00034023 [Cycle 2]: 0.00307011, [45] [expand_dump_flag]: 1.44998e-06 [switch_simplify]: 4.797e-05 [loop_unroll]: 4.443e-05 [a_1]: 0.00154664 [with_stream_mark]: 1.231e-05 [recompute_prepare]: 1.14e-05 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 4.52e-06 [updatestate_loads_eliminate]: 3.61999e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.0001282 [accelerated_algorithm]: 1.213e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.17001e-06 [merge_send_recv]: 6.81999e-06 [auto_parallel]: 7.27002e-06 [parallel]: 5.49e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 5.22999e-06 [allreduce_fusion]: 4.95001e-06 [matmul_add_comm_reduction]: 7.85e-06 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 1.001e-05 [virtual_dataset]: 9.04e-06 [get_grad_eliminate_]: 6.54e-05 [virtual_output]: 9.09e-06 [merge_forward]: 5.20001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.672e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.453e-05 [set_forward_comm_id_for_comm_node_pass]: 5.54e-06 [meta_fg_expand]: 3.708e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 9.89996e-07 [after_resolve]: 1.489e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00060903 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.10001e-06 [auto_monad_eliminator]: 1.449e-05 [cse]: 4.624e-05 [a_3]: 6.599e-05 [Cycle 3]: 0.00090825, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.075e-05 [loop_unroll]: 9.34e-06 [a_1]: 0.00025251 [with_stream_mark]: 1.013e-05 [recompute_prepare]: 9.59999e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.84002e-06 [parameter_eliminate]: 
8.89995e-07 [a_2]: 0.00012442 [accelerated_algorithm]: 1.162e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 8.90001e-06 [merge_send_recv]: 7.24001e-06 [auto_parallel]: 7.36999e-06 [parallel]: 4.68999e-06 [flash_sp]: 1.09003e-06 [merge_comm]: 5.08002e-06 [allreduce_fusion]: 5.00999e-06 [matmul_add_comm_reduction]: 7.50998e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.001e-05 [virtual_dataset]: 8.57998e-06 [get_grad_eliminate_]: 8.48999e-06 [virtual_output]: 8.18999e-06 [merge_forward]: 4.27998e-06 [cell_reuse_recompute_pass]: 1.57001e-06 [offload_activation]: 8.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.587e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.411e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 2.98998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.469e-05 [a_after_grad]: 1.483e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.072e-05 [cse]: 2.597e-05 [a_3]: 6.031e-05 [py_interpret_to_execute_after_opt_a]: 1.034e-05 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 4.781e-05 [convert_after_rewriter]: 9.16998e-06 [order_py_execute_after_rewriter]: 7.08e-06 [mutable_eliminate]: 0.00046891 [opt_b]: 0.00029148, [1] [Cycle 1]: 0.0002848, [7] [b_1]: 0.00019095 [b_2]: 1.093e-05 [updatestate_depend_eliminate]: 7.21001e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 4.22998e-06 [renormalize]: 3.50003e-07 [cse]: 3.222e-05 [optimize_parallel_all_gather_comm]: 2.13e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 1.98e-05 [loop_unroll]: 0.00042794 [opt_after_cconv]: 0.00013638, [1] [Cycle 1]: 0.0001304, [7] [c_1]: 4.883e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 7.36999e-06 [updatestate_assign_eliminate]: 4.30999e-06 [updatestate_loads_eliminate]: 3.86001e-06 
[cse]: 3.002e-05 [renormalize]: 3.49974e-07 [remove_dup_value]: 2.992e-05 [tuple_transform]: 0.00010308, [1] [Cycle 1]: 9.832e-05, [4] [d_1]: 6.788e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.99001e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 5.79e-05 [cse_after_recomputation]: 3.221e-05, [1] [Cycle 1]: 2.755e-05, [1] [cse]: 2.209e-05 [environ_conv]: 9.30001e-06 [swap_dp_allreduce_reducescatter]: 8.14002e-06 [bias_add_comm_swap]: 2.58003e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.41e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 1.07e-06 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.727e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 5.18002e-06 [overlap_recompute_and_grad_model_parallel]: 5.30999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.95999e-06 [overlap_grad_flash_sp]: 2.521e-05 [begin_end_overlap_inline]: 6.69999e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 0.00012888, [1] [Cycle 1]: 0.00012445, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.308e-05 [elim_not_effective]: 4.631e-05 [opt_reshape]: 1.061e-05 [fold_const_symbol]: 1.526e-05 [renormalize]: 2.50002e-07 [detach_backward]: 1.99e-06 
[pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.496e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 4.01001e-06 [opt_after_jit_grad]: 0.00048018 [validate]: 4.845e-05 [backend_pass]: 9.69972e-07 [task_emit]: 0.0888037 [execute]: 9.15999e-06 Sums bootstrap : 0.000525s : 0.47% type_inference : 0.010447s : 9.25% event_method : 0.000044s : 0.04% auto_monad : 0.000116s : 0.10% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.11% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.11% optimize.opt_a.loop_unroll : 0.000109s : 0.10% optimize.opt_a.a_1 : 0.003162s : 2.80% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000501s : 0.44% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000089s : 0.08% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001448s : 1.28% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.08% optimize.opt_a.a_after_grad : 0.000114s : 0.10% optimize.opt_a.renormalize : 0.003152s : 2.79% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.07% optimize.opt_a.cse : 0.000247s : 0.22% optimize.opt_a.a_3 : 0.000467s : 0.41% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000469s : 0.42% optimize.opt_b.b_1 : 0.000191s : 0.17% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000428s : 0.38% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000046s : 0.04% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000480s : 0.43% validate : 0.000048s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.088804s : 78.65% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000740 218 5.78% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.60% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.17% : 0.000408s : 16: substitution.inline 2.13% : 0.000016s : 2: substitution.inline_without_move 1.40% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.15% : 0.000016s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.27% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.42% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.49% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010376 2 86.99% : 0.009026s : 1: type_inference.infer 13.01% : 0.001350s : 1: type_inference.specialize ------[replace.] 0.000201 30 58.99% : 0.000119s : 16: replace.inline 41.01% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000430 30 92.94% : 0.000400s : 16: match.inline 7.06% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000746 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.06% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.73% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.60% : 0.000005s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.59% : 0.000042s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.65% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.96% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.60% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001541 32 57.07% : 0.000880s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.93% : 0.000662s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.142039 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.14% : 0.003041s : 1: add_attr 2.13% : 0.003031s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000124s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000558s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000051s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.31% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000477s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.44% : 0.004891s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000177s : 28: opt.transform.opt_b 0.05% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.06% : 0.000082s : 4: opt.transform.symbol_engine_opt 7.81% : 0.011094s : 1: opt_a 0.10% : 0.000140s : 1: opt_after_cconv 0.34% : 0.000490s : 1: opt_after_jit_grad 0.21% : 0.000295s : 1: opt_b 9.43% : 0.013391s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.18% : 0.001680s : 2: renormalize.infer 1.03% : 0.001458s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.09% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000132s : 1: symbol_engine_optimizer 62.54% : 0.088825s : 1: task_emit 0.07% : 0.000106s : 1: 
tuple_transform 7.37% : 0.010463s : 1: type_inference 0.05% : 0.000075s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x3-ge],max_mem:60.0M
. [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x4-pynative],max_mem:60.0M
TotalTime = 0.0217423, [24] [bootstrap]: 0.00056752 [type_inference]: 0.00627334 [event_method]: 1.462e-05 [auto_monad]: 5.433e-05 [graph_reusing]: 5.59e-06 [inline]: 1.65001e-06 [add_attr]: 0.00356276, [1] [add_attr_with_inline]: 0.00355177, [1] [Cycle 1]: 4.515e-05, [2] [tag_attr]: 1.561e-05 [meta_addattr_fg_expand]: 4.33001e-06 [parallel-infer-symbol]: 3.16001e-06 [pre_auto_parallel]: 2.836e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00399585, [53] [py_interpret_to_execute]: 1.95e-05 [rewriter_before_opt_a]: 5.834e-05 [opt_a]: 0.00215984, [2] [Cycle 1]: 0.00154321, [45] [expand_dump_flag]: 3.02002e-06 [switch_simplify]: 3.124e-05 [loop_unroll]: 2.071e-05 [a_1]: 0.00045602 [with_stream_mark]: 1.42e-05 [recompute_prepare]: 7.85e-06 [updatestate_depend_eliminate]: 4.05e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.555e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 7.76001e-06 [auto_parallel]: 6.59999e-06 [parallel]: 2.514e-05 [flash_sp]: 7.38e-06 [merge_comm]: 3.66999e-06 [allreduce_fusion]: 3.28998e-06 [matmul_add_comm_reduction]: 8.45999e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 
6.13002e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.54998e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 9.15001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.74001e-06 [receive_attached]: 2.46e-06 [after_resolve]: 1.052e-05 [a_after_grad]: 9.06002e-06 [renormalize]: 0.00044086 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.83002e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.73e-05 [a_3]: 4.022e-05 [Cycle 2]: 0.00060738, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012573 [with_stream_mark]: 9.79e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.695e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.61998e-06 [merge_send_recv]: 4.63999e-06 [auto_parallel]: 5.20001e-06 [parallel]: 3.95998e-06 [flash_sp]: 3.47997e-06 [merge_comm]: 2.86999e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.36002e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 5.59e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.73998e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.88e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 7.61999e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.27001e-06 [cse]: 1.642e-05 [a_3]: 4.885e-05 [py_interpret_to_execute_after_opt_a]: 7.57002e-06 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 3.107e-05 [convert_after_rewriter]: 6.73998e-06 [order_py_execute_after_rewriter]: 5.02999e-06 [mutable_eliminate]: 0.00045336 [opt_b]: 0.00018245, [1] [Cycle 1]: 0.00017637, [7] [b_1]: 0.00010938 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 4.10015e-07 [cse]: 1.567e-05 [optimize_parallel_all_gather_comm]: 1.589e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.226e-05 [loop_unroll]: 0.00041045 [opt_after_cconv]: 9.533e-05, [1] [Cycle 1]: 8.944e-05, [7] [c_1]: 2.75e-05 [parameter_eliminate]: 2.51e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.627e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.229e-05 [tuple_transform]: 6.894e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.904e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.058e-05 [cse_after_recomputation]: 2.052e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.091e-05 [environ_conv]: 4.59002e-06 [swap_dp_allreduce_reducescatter]: 5.04003e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.27998e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.25002e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.39003e-06 [ForceFp32Comm]: 1.10001e-06 [remove_cast_before_assign_add]: 7.59988e-07 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 9.89996e-07 
[add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.091e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.47e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40999e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.677e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.16998e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.04e-06 [symbol_engine_optimizer]: 6.625e-05, [1] [Cycle 1]: 6.222e-05, [6] [build]: 2.35002e-06 [elim_shapecalc]: 8.26002e-06 [elim_not_effective]: 1.096e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.0004462 [validate]: 3.085e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00652202 [execute]: 7.71001e-06 Sums bootstrap : 0.000568s : 3.30% type_inference : 0.006273s : 36.42% event_method : 0.000015s : 0.08% auto_monad : 0.000054s : 0.32% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000582s : 3.38% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000441s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000044s : 0.25% optimize.opt_a.a_3 : 0.000089s : 0.52% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.63% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000410s : 2.38% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 2.59% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006522s : 37.87% execute : 0.000008s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.30% : 0.000024s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.43% : 0.000006s : 4: substitution.graph_param_transform 67.22% : 0.000112s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.69% : 0.000004s : 4: substitution.remove_not_recompute_node 2.56% : 0.000004s : 4: substitution.replace_old_param 6.32% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006227 2 90.55% : 0.005638s : 1: type_inference.infer 9.45% : 0.000589s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.48% : 0.000027s : 3: replace.inline 29.52% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 92.09% : 0.000110s : 3: match.inline 7.91% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.94% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.89% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.06% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.89% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.61% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.49% : 0.000004s : 32: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.63% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.62% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.61% : 0.000003s : 19: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.26% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000389 8 47.60% : 0.000185s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.40% : 0.000204s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030852 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.56% : 0.003567s : 1: add_attr 11.52% : 0.003556s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.95% : 0.000602s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.11% : 0.000960s : 78: opt.transform.opt_a 0.08% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.01% : 0.002163s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.48% : 0.000455s : 1: opt_after_jit_grad 0.60% : 0.000186s : 1: opt_b 12.96% : 0.004000s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.73% : 0.000227s : 1: renormalize.infer 0.67% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000069s : 1: symbol_engine_optimizer 21.17% : 0.006532s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.38% : 0.006288s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0181745, [24] [bootstrap]: 0.00042484 [type_inference]: 0.00436581 [event_method]: 1.055e-05 [auto_monad]: 5.095e-05 [graph_reusing]: 5.14998e-06 [inline]: 1.73002e-06 [add_attr]: 0.00296397, [1] [add_attr_with_inline]: 0.00295635, [1] [Cycle 1]: 4.114e-05, [2] [tag_attr]: 1.204e-05 [meta_addattr_fg_expand]: 3.25e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 2.218e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 2.04999e-06 [optimize]: 0.00369254, [53] [py_interpret_to_execute]: 1.5e-05 [rewriter_before_opt_a]: 3.805e-05 [opt_a]: 0.00189721, [2] [Cycle 1]: 0.00129872, [45] [expand_dump_flag]: 2.45002e-06 [switch_simplify]: 2.419e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.0002909 [with_stream_mark]: 1.258e-05 [recompute_prepare]: 7.1e-06 [updatestate_depend_eliminate]: 3.38999e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.81998e-06 [a_2]: 7.689e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 3.28e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 8.46002e-06 [auto_parallel]: 6.39999e-06 [parallel]: 1.785e-05 [flash_sp]: 7.97e-06 [merge_comm]: 4.32e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 9.52999e-06 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 6.00002e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.63999e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 1.015e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 [merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 9.05001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.88e-06 [receive_attached]: 
2.68e-06 [after_resolve]: 1.037e-05 [a_after_grad]: 8.73001e-06 [renormalize]: 0.00035126 [add_forward_monad_depend]: 4.66002e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 3.033e-05 [a_3]: 3.949e-05 [Cycle 2]: 0.00058885, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.72002e-06 [loop_unroll]: 5.47001e-06 [a_1]: 0.00012568 [with_stream_mark]: 1.23e-05 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.28002e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.792e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.21002e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.3e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 2.95998e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 5.09003e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 5.86e-06 [virtual_dataset]: 5.15999e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.36e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 6.28998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.08002e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.23001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89001e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.10001e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.244e-05 [a_3]: 3.077e-05 [py_interpret_to_execute_after_opt_a]: 7.47002e-06 [slice_cell_reuse_recomputed_activation]: 2.49999e-06 [rewriter_after_opt_a]: 3.208e-05 [convert_after_rewriter]: 7.53e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00044308 [opt_b]: 0.00017978, [1] [Cycle 1]: 
0.00017355, [7] [b_1]: 0.00010673 [b_2]: 6.95998e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 4.59986e-07 [cse]: 1.563e-05 [optimize_parallel_all_gather_comm]: 1.596e-05 [overlap_param_gather]: 2.16998e-06 [cconv]: 2.246e-05 [loop_unroll]: 0.00041046 [opt_after_cconv]: 9.476e-05, [1] [Cycle 1]: 8.927e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.647e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.42e-05, [4] [d_1]: 3.902e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.396e-05 [cse_after_recomputation]: 1.991e-05, [1] [Cycle 1]: 1.57e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.50001e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.81999e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.44e-06 [full_micro_interleaved_order_control]: 2.16998e-06 [reorder_send_recv_between_fp_bp]: 2.88998e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 9.40025e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.15e-05 [grouped_pairwise_exchange_alltoall]: 1.56002e-06 [offloading_packed_experts]: 3.99002e-06 [overlap_recompute_and_grad_model_parallel]: 5.05999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 4.14002e-06 [overlap_grad_flash_sp]: 1.742e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 1.87999e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.688e-05, [1] [Cycle 1]: 6.301e-05, [6] [build]: 2.52001e-06 [elim_shapecalc]: 8.22e-06 [elim_not_effective]: 1.113e-05 [opt_reshape]: 5.69e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.512e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.2e-06 [opt_after_jit_grad]: 0.00044728 [validate]: 3.109e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00592651 [execute]: 6.91001e-06 Sums bootstrap : 0.000425s : 2.98% type_inference : 0.004366s : 30.67% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.93% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000351s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000043s : 0.30% optimize.opt_a.a_3 : 0.000070s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000032s : 0.23% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000443s : 3.11% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000410s : 2.88% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.04% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.14% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005927s : 41.64% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.62% : 0.000022s : 4: substitution.arithmetic_simplify 1.52% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.47% : 0.000005s : 4: substitution.graph_param_transform 65.11% : 0.000077s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.32% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004325 2 91.52% : 0.003958s : 1: type_inference.infer 8.48% : 0.000367s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.16% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.50% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.18% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.82% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.92% : 0.000001s : 9: predicate.print_const_string_wrapper 0.83% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.10% : 0.000002s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.99% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.83% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.66% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 42.61% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.39% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026103 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.37% : 0.002968s : 1: add_attr 11.34% : 0.002960s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.75% : 0.000457s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000453s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.28% : 0.001900s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.16% : 0.003696s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000194s : 1: renormalize.infer 0.57% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000069s : 1: symbol_engine_optimizer 22.74% : 0.005936s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.78% : 0.004380s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x4-kbk],max_mem:60.0M TotalTime = 0.0937188, [24] [bootstrap]: 0.00058235 [type_inference]: 0.00614687 [event_method]: 1.395e-05 [auto_monad]: 5.918e-05 [graph_reusing]: 6.02999e-06 [inline]: 1.86e-06 [add_attr]: 0.00355201, [1] [add_attr_with_inline]: 0.00354071, [1] [Cycle 1]: 4.594e-05, [2] [tag_attr]: 1.554e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.917e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.72001e-06 [pipeline_split]: 1.78002e-06 [optimize]: 0.00400973, [53] [py_interpret_to_execute]: 2.03e-05 [rewriter_before_opt_a]: 5.94e-05 [opt_a]: 0.00216454, [2] [Cycle 1]: 0.00152898, [45] [expand_dump_flag]: 3.18e-06 [switch_simplify]: 3.142e-05 [loop_unroll]: 2.012e-05 [a_1]: 0.00045033 [with_stream_mark]: 1.315e-05 [recompute_prepare]: 7.46001e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.25e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.444e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.17003e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.534e-05 [flash_sp]: 7.3e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.64003e-06 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 6.13998e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.79e-06 [merge_forward]: 3.98999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.55001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.09e-05 
[merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 8.99998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.96999e-06 [receive_attached]: 2.35002e-06 [after_resolve]: 1.015e-05 [a_after_grad]: 9.03002e-06 [renormalize]: 0.00043383 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 1.99999e-06 [auto_monad_eliminator]: 1.405e-05 [cse]: 2.734e-05 [a_3]: 4.047e-05 [Cycle 2]: 0.00062603, [45] [expand_dump_flag]: 7.50006e-07 [switch_simplify]: 7.45e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012568 [with_stream_mark]: 9.83002e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.61e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.689e-05 [accelerated_algorithm]: 5.62999e-06 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.44002e-06 [auto_parallel]: 5.46e-06 [parallel]: 4.38999e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.88998e-06 [matmul_add_comm_reduction]: 5.34e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.41e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 5.96998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44998e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.98999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 1.71002e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.04e-06 [a_after_grad]: 8.11002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.665e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 8.54e-06 
[slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.202e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00045304 [opt_b]: 0.00018126, [1] [Cycle 1]: 0.00017489, [7] [b_1]: 0.00010657 [b_2]: 7.37002e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.58003e-06 [renormalize]: 4.30009e-07 [cse]: 1.633e-05 [optimize_parallel_all_gather_comm]: 1.56e-05 [overlap_param_gather]: 2.29001e-06 [cconv]: 2.311e-05 [loop_unroll]: 0.00041579 [opt_after_cconv]: 9.46e-05, [1] [Cycle 1]: 8.903e-05, [7] [c_1]: 2.829e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.52999e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.553e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.268e-05 [tuple_transform]: 6.86e-05, [1] [Cycle 1]: 6.438e-05, [4] [d_1]: 3.904e-05 [none_parameter_eliminate]: 1.76003e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.09001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.874e-05 [cse_after_recomputation]: 1.971e-05, [1] [Cycle 1]: 1.554e-05, [1] [cse]: 1.044e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.05001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.47998e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.86999e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.56002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.07998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.55001e-06 [control_data_broadcast_order]: 1.136e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.32002e-06 [overlap_recompute_and_grad_model_parallel]: 4.23999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.53e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.763e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.67999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.831e-05, [1] [Cycle 1]: 6.416e-05, [6] [build]: 2.24999e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.246e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 8.90999e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.565e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.0004527 [validate]: 3.102e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.0785845 [execute]: 9.03002e-06 Sums bootstrap : 0.000582s : 0.65% type_inference : 0.006147s : 6.89% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000576s : 0.65% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000030s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000434s : 0.49% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000044s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.51% optimize.opt_b.b_1 : 0.000107s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000416s : 0.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.05% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000453s : 0.51% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.078585s : 88.13% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.99% : 0.000025s : 5: substitution.arithmetic_simplify 1.27% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.37% : 0.000006s : 4: substitution.graph_param_transform 67.07% : 0.000112s : 3: substitution.inline 1.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.46% : 0.000004s : 4: substitution.remove_not_recompute_node 2.27% : 0.000004s : 4: substitution.replace_old_param 6.26% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006101 2 90.81% : 0.005541s : 1: type_inference.infer 9.19% : 0.000560s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.87% : 0.000027s : 3: replace.inline 29.13% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 92.14% : 0.000110s : 3: match.inline 7.86% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 1.02% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.62% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.31% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.96% : 0.000009s : 51: predicate.inline 0.92% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.92% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.48% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.64% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.48% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.19% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000351 8 46.00% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.00% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.102800 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.46% : 0.003556s : 1: add_attr 3.45% : 0.003544s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.60% : 0.000613s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.41% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.91% : 0.000937s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.11% : 0.002167s : 1: opt_a 0.10% : 0.000098s : 1: opt_after_cconv 0.45% : 0.000463s : 1: opt_after_jit_grad 0.18% : 0.000185s : 1: opt_b 3.90% : 0.004013s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000034s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.22% : 0.000224s : 1: renormalize.infer 0.20% : 0.000203s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.06% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000006s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000071s : 1: symbol_engine_optimizer 76.47% : 0.078607s : 1: task_emit 0.07% : 0.000071s : 1: 
tuple_transform 5.99% : 0.006160s : 1: type_inference 0.05% : 0.000056s : 1: validate TotalTime = 0.0747338, [24] [bootstrap]: 0.00048175 [type_inference]: 0.00441555 [event_method]: 1.097e-05 [auto_monad]: 5.037e-05 [graph_reusing]: 5.39e-06 [inline]: 2.20002e-06 [add_attr]: 0.00298252, [1] [add_attr_with_inline]: 0.00297434, [1] [Cycle 1]: 4.322e-05, [2] [tag_attr]: 1.164e-05 [meta_addattr_fg_expand]: 3.01001e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.257e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.27999e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00367121, [53] [py_interpret_to_execute]: 1.449e-05 [rewriter_before_opt_a]: 3.903e-05 [opt_a]: 0.00187161, [2] [Cycle 1]: 0.00125725, [45] [expand_dump_flag]: 2.81999e-06 [switch_simplify]: 2.407e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00029074 [with_stream_mark]: 1.342e-05 [recompute_prepare]: 7.21999e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.551e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 8.23001e-06 [auto_parallel]: 5.67001e-06 [parallel]: 1.66e-05 [flash_sp]: 7.37002e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.10999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.01999e-06 [virtual_dataset]: 5.69999e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.89999e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.37001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.049e-05 [merge_recompute_call_nodes]: 1.52999e-06 [before_grad]: 8.82e-06 [set_forward_comm_id_for_comm_node_pass]: 3.25998e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 2.34999e-06 
[receive_attached]: 2.34999e-06 [after_resolve]: 1.027e-05 [a_after_grad]: 8.84e-06 [renormalize]: 0.00035336 [add_forward_monad_depend]: 5.23002e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.657e-05 [a_3]: 4.027e-05 [Cycle 2]: 0.00060514, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.76e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00013352 [with_stream_mark]: 9.92999e-06 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.66e-06 [updatestate_assign_eliminate]: 2.18998e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.731e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 1.74e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 4.99e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.11002e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.22e-06 [merge_forward]: 2.53003e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.48001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.07998e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 8.08999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.53e-06 [cse]: 1.251e-05 [a_3]: 3.163e-05 [py_interpret_to_execute_after_opt_a]: 7.71001e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.059e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00044822 [opt_b]: 0.00017854, [1] [Cycle 1]: 
0.0001728, [7] [b_1]: 0.00010628 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 3.29979e-07 [cse]: 1.648e-05 [optimize_parallel_all_gather_comm]: 1.547e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.233e-05 [loop_unroll]: 0.00041538 [opt_after_cconv]: 9.438e-05, [1] [Cycle 1]: 8.86e-05, [7] [c_1]: 2.751e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.553e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 6.884e-05, [1] [Cycle 1]: 6.437e-05, [4] [d_1]: 3.841e-05 [none_parameter_eliminate]: 1.84e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.18002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.336e-05 [cse_after_recomputation]: 2.042e-05, [1] [Cycle 1]: 1.582e-05, [1] [cse]: 1.07e-05 [environ_conv]: 5.03002e-06 [swap_dp_allreduce_reducescatter]: 5.23002e-06 [bias_add_comm_swap]: 2.36e-06 [label_micro_interleaved_index]: 4.46002e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.12001e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 8.49977e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.53998e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.01997e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.75e-06 [overlap_recompute_and_grad_model_parallel]: 4.34002e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.50002e-06 [overlap_grad_ring_attention]: 4.12998e-06 [overlap_grad_flash_sp]: 1.692e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 6.719e-05, [1] [Cycle 1]: 6.312e-05, [6] [build]: 2.14999e-06 [elim_shapecalc]: 8.16002e-06 [elim_not_effective]: 1.081e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 1.11002e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00044578 [validate]: 3.204e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0623675 [execute]: 9.44998e-06 Sums bootstrap : 0.000482s : 0.68% type_inference : 0.004416s : 6.24% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000424s : 0.60% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000353s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.63% optimize.opt_b.b_1 : 0.000106s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.59% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.63% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.062368s : 88.11% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.13% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.35% : 0.000002s : 2: substitution.fold_const_symbol 4.34% : 0.000005s : 4: substitution.graph_param_transform 65.14% : 0.000077s : 2: substitution.inline 2.54% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.83% : 0.000005s : 4: substitution.remove_not_recompute_node 3.22% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004375 2 91.83% : 0.004018s : 1: type_inference.infer 8.17% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.88% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.99% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.11% : 0.000002s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 1.09% : 0.000002s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.15% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.09% : 0.000002s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000257 6 43.40% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.60% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.082666 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.61% : 0.002987s : 1: add_attr 3.60% : 0.002978s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000514s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.93% : 0.000772s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.27% : 0.001875s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.55% : 0.000455s : 1: opt_after_jit_grad 0.22% : 0.000182s : 1: opt_b 4.45% : 0.003675s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000196s : 1: renormalize.infer 0.18% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 75.47% : 0.062389s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.36% : 0.004429s : 1: type_inference 0.07% : 0.000054s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x4-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x5-pynative],max_mem:60.0M TotalTime = 0.0211881, [24] [bootstrap]: 0.00055934 [type_inference]: 0.00611757 [event_method]: 1.468e-05 [auto_monad]: 6.011e-05 [graph_reusing]: 5.51998e-06 [inline]: 1.66998e-06 [add_attr]: 0.00340716, [1] [add_attr_with_inline]: 0.00339631, [1] [Cycle 1]: 4.527e-05, [2] [tag_attr]: 1.577e-05 [meta_addattr_fg_expand]: 4.27e-06 [parallel-infer-symbol]: 3.21001e-06 [pre_auto_parallel]: 2.916e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00403564, [53] [py_interpret_to_execute]: 1.97e-05 [rewriter_before_opt_a]: 5.839e-05 [opt_a]: 0.00219048, [2] [Cycle 1]: 0.00157931, [45] [expand_dump_flag]: 2.66999e-06 [switch_simplify]: 3.181e-05 [loop_unroll]: 2.079e-05 [a_1]: 0.00045495 [with_stream_mark]: 1.329e-05 [recompute_prepare]: 8.1e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.66998e-06 [a_2]: 7.644e-05 [accelerated_algorithm]: 7.1e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 7.84002e-06 [auto_parallel]: 5.58002e-06 [parallel]: 2.342e-05 [flash_sp]: 7.87e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.30999e-06 [allreduce_slice_to_reducescatter]: 7.79983e-07 [virtual_shard_identity]: 7.63001e-06 [virtual_dataset]: 5.99e-06 
[get_grad_eliminate_]: 5.66e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.057e-05 [merge_recompute_call_nodes]: 1.30001e-06 [before_grad]: 9.33002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16999e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.61999e-06 [receive_attached]: 2.59999e-06 [after_resolve]: 9.74999e-06 [a_after_grad]: 8.79e-06 [renormalize]: 0.00047312 [add_forward_monad_depend]: 4.21001e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.411e-05 [cse]: 2.704e-05 [a_3]: 4.136e-05 [Cycle 2]: 0.0006018, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 7.13998e-06 [loop_unroll]: 5.81e-06 [a_1]: 0.00012659 [with_stream_mark]: 1.056e-05 [recompute_prepare]: 5.92001e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.832e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.28002e-06 [shard_inline]: 5.32001e-06 [merge_send_recv]: 4.16001e-06 [auto_parallel]: 5.42999e-06 [parallel]: 3.9e-06 [flash_sp]: 4.93001e-06 [merge_comm]: 3.14999e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 4.98001e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.73997e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.04002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.14e-06 [after_resolve]: 9.52001e-06 [a_after_grad]: 8.47e-06 [renormalize]: 1.10012e-07 
[add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 6.62002e-06 [cse]: 1.319e-05 [a_3]: 3.232e-05 [py_interpret_to_execute_after_opt_a]: 7.54002e-06 [slice_cell_reuse_recomputed_activation]: 1.85001e-06 [rewriter_after_opt_a]: 3.111e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 5.50001e-06 [mutable_eliminate]: 0.00045692 [opt_b]: 0.00018143, [1] [Cycle 1]: 0.00017545, [7] [b_1]: 0.00010799 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.56998e-06 [renormalize]: 3.50003e-07 [cse]: 1.575e-05 [optimize_parallel_all_gather_comm]: 1.549e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.179e-05 [loop_unroll]: 0.00041451 [opt_after_cconv]: 9.585e-05, [1] [Cycle 1]: 9.023e-05, [7] [c_1]: 2.814e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.64001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.66e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.873e-05, [1] [Cycle 1]: 6.451e-05, [4] [d_1]: 3.904e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.82e-05 [cse_after_recomputation]: 2.029e-05, [1] [Cycle 1]: 1.61e-05, [1] [cse]: 1.131e-05 [environ_conv]: 4.89e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.49999e-06 [label_micro_interleaved_index]: 4.62e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.40999e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 9.29984e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.00999e-06 
[add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92001e-06 [control_data_broadcast_order]: 1.2e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 4.07e-06 [overlap_recompute_and_grad_model_parallel]: 4.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67001e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 4.17e-06 [overlap_grad_flash_sp]: 1.631e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.841e-05, [1] [Cycle 1]: 6.439e-05, [6] [build]: 2.48998e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.85001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.80001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.54e-05 [get_jit_bprop_graph]: 9.30013e-07 [rewriter_after_jit_bprop_graph]: 3.49001e-06 [opt_after_jit_grad]: 0.00044879 [validate]: 3.229e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00623728 [execute]: 7.15e-06 Sums bootstrap : 0.000559s : 3.33% type_inference : 0.006118s : 36.38% event_method : 0.000015s : 0.09% auto_monad : 0.000060s : 0.36% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000582s : 3.46% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000473s : 2.81% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000074s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000457s : 2.72% optimize.opt_b.b_1 : 0.000108s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000415s : 2.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.67% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006237s : 37.09% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 15.64% : 0.000026s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.04% : 0.000109s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.77% : 0.000005s : 4: substitution.remove_not_recompute_node 2.26% : 0.000004s : 4: substitution.replace_old_param 6.60% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006073 2 90.65% : 0.005505s : 1: type_inference.infer 9.35% : 0.000568s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.77% : 0.000027s : 3: replace.inline 29.23% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.56% : 0.000107s : 3: match.inline 8.44% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.01% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.20% : 0.000002s : 15: predicate.environ_get_depend_swap 1.73% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.22% : 0.000004s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.34% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.65% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.50% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000365 8 46.51% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.49% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030208 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.29% : 0.003411s : 1: add_attr 11.26% : 0.003400s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000066s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000591s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000466s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.15% : 0.000951s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.26% : 0.002193s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.52% : 0.000459s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.37% : 0.004039s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.81% : 0.000244s : 1: renormalize.infer 0.74% : 0.000223s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.68% : 0.006248s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.30% : 0.006132s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0182589, [24] [bootstrap]: 0.0004698 [type_inference]: 0.00434454 [event_method]: 1.022e-05 [auto_monad]: 5.071e-05 [graph_reusing]: 5.09e-06 [inline]: 1.89e-06 [add_attr]: 0.00299929, [1] [add_attr_with_inline]: 0.00299017, [1] [Cycle 1]: 4.222e-05, [2] [tag_attr]: 1.228e-05 [meta_addattr_fg_expand]: 3.09001e-06 [parallel-infer-symbol]: 3.08e-06 [pre_auto_parallel]: 2.23e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 1.83997e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00370494, [53] [py_interpret_to_execute]: 1.642e-05 [rewriter_before_opt_a]: 3.961e-05 [opt_a]: 0.00189732, [2] [Cycle 1]: 0.00126404, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.505e-05 [loop_unroll]: 1.39e-05 [a_1]: 0.00029298 [with_stream_mark]: 1.342e-05 [recompute_prepare]: 7.20998e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.711e-05 [accelerated_algorithm]: 6.17001e-06 [shard]: 1.99999e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 7.31001e-06 [auto_parallel]: 5.78997e-06 [parallel]: 1.671e-05 [flash_sp]: 6.94999e-06 [merge_comm]: 3.9e-06 [allreduce_fusion]: 3.25998e-06 [matmul_add_comm_reduction]: 9.57999e-06 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 7.26999e-06 [virtual_dataset]: 5.72999e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 5.81e-06 [merge_forward]: 4.07998e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.94001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.56998e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00035259 [add_forward_monad_depend]: 4.15999e-06 [auto_monad_grad]: 1.87001e-06 [auto_monad_eliminator]: 1.336e-05 [cse]: 2.687e-05 [a_3]: 4.003e-05 [Cycle 2]: 0.00062388, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.78997e-06 [a_1]: 0.0001255 [with_stream_mark]: 1.068e-05 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.775e-05 [accelerated_algorithm]: 5.51998e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.27998e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.21999e-06 [allreduce_fusion]: 2.76999e-06 [matmul_add_comm_reduction]: 5.14998e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 5.32999e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.91998e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.15e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.022e-05 [a_after_grad]: 8.3e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.30999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.09001e-06 [cse]: 1.287e-05 [a_3]: 5.747e-05 [py_interpret_to_execute_after_opt_a]: 8.14002e-06 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 3.075e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 5.13002e-06 [mutable_eliminate]: 0.00044828 [opt_b]: 0.00018019, [1] [Cycle 1]: 
0.00017417, [7] [b_1]: 0.0001071 [b_2]: 6.96999e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 6.19999e-07 [cse]: 1.636e-05 [optimize_parallel_all_gather_comm]: 1.677e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.271e-05 [loop_unroll]: 0.0004121 [opt_after_cconv]: 9.508e-05, [1] [Cycle 1]: 8.922e-05, [7] [c_1]: 2.746e-05 [parameter_eliminate]: 2.42001e-06 [updatestate_depend_eliminate]: 5.54e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.653e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.225e-05 [tuple_transform]: 6.848e-05, [1] [Cycle 1]: 6.402e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.43002e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.508e-05 [cse_after_recomputation]: 1.98e-05, [1] [Cycle 1]: 1.548e-05, [1] [cse]: 1.038e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.34998e-06 [bias_add_comm_swap]: 2.79001e-06 [label_micro_interleaved_index]: 4.60001e-06 [label_fine_grained_interleaved_index]: 2.88e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.36998e-06 [reorder_send_recv_between_fp_bp]: 2.93e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.18001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.06002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.83999e-06 [overlap_recompute_and_grad_model_parallel]: 4.27e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.93999e-06 [overlap_grad_flash_sp]: 1.667e-05 [begin_end_overlap_inline]: 5.59987e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 6.76e-05, [1] [Cycle 1]: 6.341e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 8.18001e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 8.59e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.482e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.0004496 [validate]: 3.242e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00593078 [execute]: 7.03998e-06 Sums bootstrap : 0.000470s : 3.29% type_inference : 0.004345s : 30.38% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000418s : 2.93% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000353s : 2.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000098s : 0.68% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.06% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.14% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.12% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000412s : 2.88% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000017s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.32% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000450s : 3.14% validate : 0.000032s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.005931s : 41.48% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.15% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 0.99% : 0.000001s : 2: substitution.fold_const_symbol 4.20% : 0.000005s : 4: substitution.graph_param_transform 65.76% : 0.000080s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.73% : 0.000005s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004304 2 92.04% : 0.003961s : 1: type_inference.infer 7.96% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.22% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.38% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.63% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.12% : 0.000003s : 26: predicate.load_eliminater 1.35% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.84% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.51% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.61% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 1.11% : 0.000002s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.10% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.90% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026269 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.43% : 0.003003s : 1: add_attr 11.40% : 0.002994s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.19% : 0.000049s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000018s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.93% : 0.000506s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.03% : 0.000797s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.23% : 0.001900s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000459s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.12% : 0.003709s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000193s : 1: renormalize.infer 0.58% : 0.000153s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.61% : 0.005940s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.59% : 0.004359s : 1: type_inference 0.22% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x5-kbk],max_mem:60.0M TotalTime = 0.0927408, [24] [bootstrap]: 0.00056418 [type_inference]: 0.00605191 [event_method]: 1.373e-05 [auto_monad]: 5.485e-05 [graph_reusing]: 5.30999e-06 [inline]: 1.98002e-06 [add_attr]: 0.00351713, [1] [add_attr_with_inline]: 0.00350592, [1] [Cycle 1]: 4.531e-05, [2] [tag_attr]: 1.537e-05 [meta_addattr_fg_expand]: 4.28001e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.888e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 1.13001e-06 [dataset_repeat_opt]: 2.31998e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00401264, [53] [py_interpret_to_execute]: 1.98e-05 [rewriter_before_opt_a]: 5.766e-05 [opt_a]: 0.00217362, [2] [Cycle 1]: 0.00157454, [45] [expand_dump_flag]: 3.27002e-06 [switch_simplify]: 3.298e-05 [loop_unroll]: 2.02e-05 [a_1]: 0.00045533 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.63999e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.66998e-06 [a_2]: 7.594e-05 [accelerated_algorithm]: 6.31998e-06 [shard]: 2.30002e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 5.95002e-06 [parallel]: 2.602e-05 [flash_sp]: 7.32002e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.36001e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.09003e-06 [offload_activation]: 9.22999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 
[merge_recompute_call_nodes]: 1.55999e-06 [before_grad]: 9.84001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.79002e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.43002e-06 [after_resolve]: 1.065e-05 [a_after_grad]: 8.54e-06 [renormalize]: 0.00042918 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.35e-05 [cse]: 2.619e-05 [a_3]: 4.002e-05 [Cycle 2]: 0.00058976, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 6.63998e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012497 [with_stream_mark]: 9.84999e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.14e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.781e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.32998e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.50999e-06 [flash_sp]: 3.12002e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.80997e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 3.7998e-07 [virtual_shard_identity]: 5.92001e-06 [virtual_dataset]: 5.39998e-06 [get_grad_eliminate_]: 4.99998e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.53002e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.79e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.95001e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 5.72999e-06 [cse]: 1.58e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 7.48999e-06 
[slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.027e-05 [convert_after_rewriter]: 7.01001e-06 [order_py_execute_after_rewriter]: 5.46002e-06 [mutable_eliminate]: 0.00045333 [opt_b]: 0.00018291, [1] [Cycle 1]: 0.00017654, [7] [b_1]: 0.00010917 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.43002e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 4.00003e-07 [cse]: 1.623e-05 [optimize_parallel_all_gather_comm]: 1.615e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.196e-05 [loop_unroll]: 0.00041308 [opt_after_cconv]: 9.463e-05, [1] [Cycle 1]: 8.866e-05, [7] [c_1]: 2.742e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.589e-05 [renormalize]: 1.70025e-07 [remove_dup_value]: 1.21e-05 [tuple_transform]: 6.827e-05, [1] [Cycle 1]: 6.383e-05, [4] [d_1]: 3.834e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 4.626e-05 [cse_after_recomputation]: 2.025e-05, [1] [Cycle 1]: 1.572e-05, [1] [cse]: 1.071e-05 [environ_conv]: 5.21998e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 2.72001e-06 [label_micro_interleaved_index]: 4.99e-06 [label_fine_grained_interleaved_index]: 3.08998e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.64999e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 1.07998e-06 [remove_cast_before_assign_add]: 8.59989e-07 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.12999e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.08001e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 [control_data_broadcast_order]: 1.195e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.8e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.54999e-06 [overlap_grad_ring_attention]: 3.65e-06 [overlap_grad_flash_sp]: 1.8e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.58002e-06 [handle_group_info]: 1.42999e-06 [symbol_engine_optimizer]: 6.71e-05, [1] [Cycle 1]: 6.314e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.33001e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 5.93002e-06 [fold_const_symbol]: 9.14998e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.552e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.0004531 [validate]: 3.086e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.0777445 [execute]: 9.54e-06 Sums bootstrap : 0.000564s : 0.64% type_inference : 0.006052s : 6.86% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.06% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.07% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000580s : 
0.66% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.16% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000031s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000429s : 0.49% optimize.opt_a.add_forward_monad_depend 
: 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000042s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.08% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.51% optimize.opt_b.b_1 : 0.000109s : 0.12% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000413s : 0.47% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.05% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% 
optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 
0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000453s : 0.51% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.077744s : 88.14% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.73% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.86% : 0.000001s : 2: substitution.fold_const_symbol 2.94% : 0.000005s : 4: substitution.graph_param_transform 66.18% : 0.000110s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000004s : 4: substitution.remove_not_recompute_node 2.64% : 0.000004s : 4: substitution.replace_old_param 7.05% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006006 2 90.88% : 0.005458s : 1: type_inference.infer 9.12% : 0.000548s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.30% : 0.000028s : 3: replace.inline 29.70% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 90.97% : 0.000108s : 3: match.inline 9.03% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.94% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.34% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.22% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.08% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.51% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.28% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.05% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.86% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 46.12% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.88% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.101793 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.46% : 0.003522s : 1: add_attr 3.45% : 0.003510s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.59% : 0.000603s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.41% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.45% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.93% : 0.000944s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.09% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.14% : 0.002176s : 1: opt_a 0.10% : 0.000098s : 1: opt_after_cconv 0.45% : 0.000462s : 1: opt_after_jit_grad 0.18% : 0.000186s : 1: opt_b 3.95% : 0.004017s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000033s : 1: pre_auto_parallel 0.02% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.22% : 0.000223s : 1: renormalize.infer 0.20% : 0.000200s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000070s : 1: symbol_engine_optimizer 76.40% : 0.077767s : 1: task_emit 0.07% : 0.000071s : 1: 
tuple_transform 5.96% : 0.006066s : 1: type_inference 0.06% : 0.000058s : 1: validate TotalTime = 0.0787638, [24] [bootstrap]: 0.00046949 [type_inference]: 0.00438781 [event_method]: 1.131e-05 [auto_monad]: 4.933e-05 [graph_reusing]: 5.38002e-06 [inline]: 1.60001e-06 [add_attr]: 0.00295407, [1] [add_attr_with_inline]: 0.0029465, [1] [Cycle 1]: 4.124e-05, [2] [tag_attr]: 1.201e-05 [meta_addattr_fg_expand]: 3.3e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 2.216e-05 [insert-virtual-dataset]: 2.43e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 2.06998e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.00371674, [53] [py_interpret_to_execute]: 1.624e-05 [rewriter_before_opt_a]: 3.834e-05 [opt_a]: 0.00187533, [2] [Cycle 1]: 0.00127061, [45] [expand_dump_flag]: 2.92002e-06 [switch_simplify]: 2.39e-05 [loop_unroll]: 1.34e-05 [a_1]: 0.00029239 [with_stream_mark]: 1.342e-05 [recompute_prepare]: 7.31001e-06 [updatestate_depend_eliminate]: 3.66999e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.814e-05 [accelerated_algorithm]: 6.08002e-06 [shard]: 2.02001e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.52e-06 [auto_parallel]: 5.94999e-06 [parallel]: 1.797e-05 [flash_sp]: 7.00998e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.30003e-06 [matmul_add_comm_reduction]: 9.52999e-06 [allreduce_slice_to_reducescatter]: 7.40023e-07 [virtual_shard_identity]: 6.73998e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.72999e-06 [before_grad]: 9.62001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 2.17999e-06 [receive_attached]: 2.16e-06 
[after_resolve]: 1.066e-05 [a_after_grad]: 8.53001e-06 [renormalize]: 0.00035497 [add_forward_monad_depend]: 4.22998e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.307e-05 [cse]: 2.821e-05 [a_3]: 4.109e-05 [Cycle 2]: 0.00059555, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.76e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012526 [with_stream_mark]: 1.089e-05 [recompute_prepare]: 5.54998e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.37001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.764e-05 [accelerated_algorithm]: 5.70001e-06 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.55001e-06 [merge_send_recv]: 4.68999e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.60001e-06 [flash_sp]: 3.34001e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.70997e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.25002e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.30001e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.65002e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.23999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11999e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 7.49977e-07 [auto_monad_eliminator]: 6.05002e-06 [cse]: 1.307e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.063e-05 [convert_after_rewriter]: 7.37997e-06 [order_py_execute_after_rewriter]: 5.06997e-06 [mutable_eliminate]: 0.00048577 [opt_b]: 0.00018435, [1] [Cycle 1]: 
0.00017797, [7] [b_1]: 0.00010935 [b_2]: 6.93e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 2.80008e-07 [cse]: 1.726e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.154e-05 [loop_unroll]: 0.00041198 [opt_after_cconv]: 9.379e-05, [1] [Cycle 1]: 8.817e-05, [7] [c_1]: 2.735e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.28002e-06 [cse]: 1.588e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.287e-05 [tuple_transform]: 6.975e-05, [1] [Cycle 1]: 6.538e-05, [4] [d_1]: 3.946e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.33e-05 [cse_after_recomputation]: 2.074e-05, [1] [Cycle 1]: 1.617e-05, [1] [cse]: 1.102e-05 [environ_conv]: 4.23999e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.00999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.46999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 1.94999e-06 [overlap_grad_ring_attention]: 3.93999e-06 [overlap_grad_flash_sp]: 1.752e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.89999e-06 [split_layernorm_comm]: 1.72001e-06 [handle_group_info]: 1.30001e-06 [symbol_engine_optimizer]: 6.773e-05, [1] [Cycle 1]: 6.367e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.55999e-06 [elim_not_effective]: 1.123e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 8.33001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.572e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00044585 [validate]: 3.279e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0664267 [execute]: 8.28999e-06 Sums bootstrap : 0.000469s : 0.63% type_inference : 0.004388s : 5.86% event_method : 0.000011s : 0.02% auto_monad : 0.000049s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.56% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000355s : 0.47% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000486s : 0.65% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.55% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.60% validate : 0.000033s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066427s : 88.75% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.31% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 0.95% : 0.000001s : 2: substitution.fold_const_symbol 4.35% : 0.000005s : 4: substitution.graph_param_transform 64.72% : 0.000078s : 2: substitution.inline 2.54% : 0.000003s : 4: substitution.j_node_and_user_rematch 4.09% : 0.000005s : 4: substitution.remove_not_recompute_node 3.59% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004348 2 91.60% : 0.003982s : 1: type_inference.infer 8.40% : 0.000365s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 2.11% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.03% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.22% : 0.000009s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.01% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.65% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.93% : 0.000001s : 9: predicate.transpose_eliminate 1.62% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.48% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000266 6 40.66% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.34% : 0.000158s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.086716 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.41% : 0.002958s : 1: add_attr 3.40% : 0.002950s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000054s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.58% : 0.000501s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000495s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.89% : 0.000769s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.17% : 0.001878s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.53% : 0.000455s : 1: opt_after_jit_grad 0.22% : 0.000188s : 1: opt_b 4.29% : 0.003721s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.22% : 0.000191s : 1: renormalize.infer 0.18% : 0.000157s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 76.63% : 0.066447s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 5.08% : 0.004401s : 1: type_inference 0.06% : 0.000054s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x5-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x6-pynative],max_mem:60.0M TotalTime = 0.0210301, [24] [bootstrap]: 0.00055813 [type_inference]: 0.00606881 [event_method]: 1.482e-05 [auto_monad]: 5.577e-05 [graph_reusing]: 6.01998e-06 [inline]: 1.60001e-06 [add_attr]: 0.00337614, [1] [add_attr_with_inline]: 0.00336551, [1] [Cycle 1]: 4.364e-05, [2] [tag_attr]: 1.478e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.803e-05 [insert-virtual-dataset]: 2.80997e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.70001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00398743, [53] [py_interpret_to_execute]: 2.007e-05 [rewriter_before_opt_a]: 5.959e-05 [opt_a]: 0.00214824, [2] [Cycle 1]: 0.00154283, [45] [expand_dump_flag]: 2.99999e-06 [switch_simplify]: 3.286e-05 [loop_unroll]: 2.082e-05 [a_1]: 0.00045425 [with_stream_mark]: 1.277e-05 [recompute_prepare]: 7.85e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 1.418e-05 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 2.12001e-06 [a_2]: 7.624e-05 [accelerated_algorithm]: 6.40002e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.46999e-06 [auto_parallel]: 6.43e-06 [parallel]: 2.362e-05 [flash_sp]: 7.09001e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 8.55999e-06 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 7e-06 [virtual_dataset]: 5.92999e-06 
[get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.55001e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 9.09e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.34999e-06 [receive_attached]: 2.29999e-06 [after_resolve]: 1.04e-05 [a_after_grad]: 8.65999e-06 [renormalize]: 0.0004282 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.402e-05 [cse]: 2.793e-05 [a_3]: 4.104e-05 [Cycle 2]: 0.00059601, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.72002e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012457 [with_stream_mark]: 1.034e-05 [recompute_prepare]: 5.69999e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 6.804e-05 [accelerated_algorithm]: 5.61998e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.09002e-06 [auto_parallel]: 6.88e-06 [parallel]: 4.25999e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 2.56998e-06 [matmul_add_comm_reduction]: 4.83001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 5.93002e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 5.35001e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 7.13e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.4e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.80999e-06 [a_after_grad]: 8.31002e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.34999e-06 [cse]: 1.279e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.51001e-06 [slice_cell_reuse_recomputed_activation]: 1.69998e-06 [rewriter_after_opt_a]: 2.967e-05 [convert_after_rewriter]: 7.00998e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00045042 [opt_b]: 0.00018214, [1] [Cycle 1]: 0.00017626, [7] [b_1]: 0.0001084 [b_2]: 6.74999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 5.3001e-07 [cse]: 1.663e-05 [optimize_parallel_all_gather_comm]: 1.594e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.223e-05 [loop_unroll]: 0.00041538 [opt_after_cconv]: 9.632e-05, [1] [Cycle 1]: 9.048e-05, [7] [c_1]: 2.786e-05 [parameter_eliminate]: 2.28998e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.676e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.188e-05 [tuple_transform]: 6.8e-05, [1] [Cycle 1]: 6.353e-05, [4] [d_1]: 3.833e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 5.87001e-06 [partial_unused_args_eliminate]: 1.70001e-06 [add_recomputation]: 4.844e-05 [cse_after_recomputation]: 2.036e-05, [1] [Cycle 1]: 1.603e-05, [1] [cse]: 1.081e-05 [environ_conv]: 4.61002e-06 [swap_dp_allreduce_reducescatter]: 5.45001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.01001e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.40002e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 1.03001e-06 
[add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.118e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.56002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.684e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.825e-05, [1] [Cycle 1]: 6.399e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 8.17e-06 [elim_not_effective]: 1.155e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.78002e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.62002e-06 [opt_after_jit_grad]: 0.00045093 [validate]: 3.082e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00621228 [execute]: 8e-06 Sums bootstrap : 0.000558s : 3.34% type_inference : 0.006069s : 36.37% event_method : 0.000015s : 0.09% auto_monad : 0.000056s : 0.33% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.36% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000579s : 3.47% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.10% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000013s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000017s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000428s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.70% optimize.opt_b.b_1 : 0.000108s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000415s : 2.49% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000451s : 2.70% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006212s : 37.23% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000165 30 14.91% : 0.000025s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.25% : 0.000005s : 4: substitution.graph_param_transform 66.78% : 0.000110s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.72% : 0.000005s : 4: substitution.remove_not_recompute_node 2.27% : 0.000004s : 4: substitution.replace_old_param 6.46% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006025 2 90.71% : 0.005465s : 1: type_inference.infer 9.29% : 0.000560s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.84% : 0.000027s : 3: replace.inline 30.16% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.85% : 0.000108s : 3: match.inline 8.15% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.14% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.95% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.96% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.17% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.68% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.77% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.50% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.85% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000352 8 47.02% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.98% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029918 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.30% : 0.003380s : 1: add_attr 11.26% : 0.003369s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000061s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.98% : 0.000593s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.16% : 0.000944s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.19% : 0.002151s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.54% : 0.000461s : 1: opt_after_jit_grad 0.62% : 0.000185s : 1: opt_b 13.34% : 0.003991s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.72% : 0.000216s : 1: renormalize.infer 0.69% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.80% : 0.006222s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.33% : 0.006082s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0182038, [24] [bootstrap]: 0.00044173 [type_inference]: 0.00433486 [event_method]: 1.054e-05 [auto_monad]: 5.153e-05 [graph_reusing]: 5.61998e-06 [inline]: 1.86e-06 [add_attr]: 0.0029973, [1] [add_attr_with_inline]: 0.0029893, [1] [Cycle 1]: 4.133e-05, [2] [tag_attr]: 1.287e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.64999e-06 [pre_auto_parallel]: 2.204e-05 [insert-virtual-dataset]: 2.53003e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00369431, [53] [py_interpret_to_execute]: 1.566e-05 [rewriter_before_opt_a]: 3.984e-05 [opt_a]: 0.00186744, [2] [Cycle 1]: 0.0012639, [45] [expand_dump_flag]: 3.26001e-06 [switch_simplify]: 2.475e-05 [loop_unroll]: 1.399e-05 [a_1]: 0.00029468 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 7.19001e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 3.18e-06 [parameter_eliminate]: 1.49e-06 [a_2]: 7.625e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 2.73e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.00002e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 6.85002e-06 [parallel]: 1.88e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.01999e-06 [virtual_dataset]: 6.17001e-06 [get_grad_eliminate_]: 5.53002e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.95998e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 9.16002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.04e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.56998e-06 [flash_sp_send_recv_attached]: 2.46998e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 1.086e-05 [a_after_grad]: 9.25999e-06 [renormalize]: 0.00034483 [add_forward_monad_depend]: 4.21001e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.642e-05 [a_3]: 4.111e-05 [Cycle 2]: 0.00059408, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.59e-06 [a_1]: 0.00012532 [with_stream_mark]: 9.84001e-06 [recompute_prepare]: 5.70001e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.76e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.05999e-06 [shard_inline]: 5.51002e-06 [merge_send_recv]: 4.33001e-06 [auto_parallel]: 5.19998e-06 [parallel]: 4.29002e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 4.97e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.31002e-06 [get_grad_eliminate_]: 5.12999e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.46998e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.004e-05 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.14998e-06 [after_resolve]: 9.97999e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.23e-06 [cse]: 1.13e-05 [a_3]: 3.222e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 3.073e-05 [convert_after_rewriter]: 7.16999e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.0004477 [opt_b]: 0.00018255, [1] [Cycle 1]: 0.0001765, 
[7] [b_1]: 0.0001094 [b_2]: 7.08998e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [renormalize]: 5.69999e-07 [cse]: 1.557e-05 [optimize_parallel_all_gather_comm]: 1.589e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.273e-05 [loop_unroll]: 0.00043198 [opt_after_cconv]: 9.546e-05, [1] [Cycle 1]: 8.999e-05, [7] [c_1]: 2.81e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.576e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.266e-05 [tuple_transform]: 6.942e-05, [1] [Cycle 1]: 6.537e-05, [4] [d_1]: 3.985e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 4.434e-05 [cse_after_recomputation]: 1.958e-05, [1] [Cycle 1]: 1.525e-05, [1] [cse]: 1.031e-05 [environ_conv]: 4.78001e-06 [swap_dp_allreduce_reducescatter]: 5.21998e-06 [bias_add_comm_swap]: 2.35002e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.24998e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.58002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.45002e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.00001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.744e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.35002e-06 [split_layernorm_comm]: 1.86998e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.822e-05, [1] [Cycle 1]: 6.421e-05, [6] [build]: 2.38998e-06 [elim_shapecalc]: 8.22e-06 [elim_not_effective]: 1.179e-05 [opt_reshape]: 6.33002e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.559e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.7e-06 [opt_after_jit_grad]: 0.00044978 [validate]: 3.109e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.00593148 [execute]: 6.41e-06 Sums bootstrap : 0.000442s : 3.10% type_inference : 0.004335s : 30.41% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000420s : 2.95% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.15% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000345s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.14% optimize.opt_b.b_1 : 0.000109s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000432s : 3.03% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000450s : 3.16% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005931s : 41.62% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000121 26 17.61% : 0.000021s : 4: substitution.arithmetic_simplify 1.37% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 4.34% : 0.000005s : 4: substitution.graph_param_transform 66.27% : 0.000080s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000004s : 4: substitution.remove_not_recompute_node 3.46% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004295 2 91.96% : 0.003949s : 1: type_inference.infer 8.04% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 1.03% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.98% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.18% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.42% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.78% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.44% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.27% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000006s : 41: predicate.switch_simplify 0.69% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.46% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 1.13% : 0.000002s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.68% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 42.99% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.01% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026178 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.47% : 0.003001s : 1: add_attr 11.43% : 0.002993s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.82% : 0.000475s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.68% : 0.000441s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.96% : 0.000775s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.14% : 0.001870s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.75% : 0.000459s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 14.13% : 0.003698s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000192s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.70% : 0.005941s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.61% : 0.004348s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x6-kbk],max_mem:60.0M TotalTime = 0.080456, [24] [bootstrap]: 0.00054885 [type_inference]: 0.00595781 [event_method]: 1.414e-05 [auto_monad]: 5.857e-05 [graph_reusing]: 5.62001e-06 [inline]: 2.02999e-06 [add_attr]: 0.00340981, [1] [add_attr_with_inline]: 0.00339934, [1] [Cycle 1]: 4.449e-05, [2] [tag_attr]: 1.527e-05 [meta_addattr_fg_expand]: 3.96001e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.845e-05 [insert-virtual-dataset]: 2.95002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.31e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00396338, [53] [py_interpret_to_execute]: 2.006e-05 [rewriter_before_opt_a]: 5.861e-05 [opt_a]: 0.00210824, [2] [Cycle 1]: 0.00150578, [45] [expand_dump_flag]: 2.46e-06 [switch_simplify]: 3.13e-05 [loop_unroll]: 2.032e-05 [a_1]: 0.00045171 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 7.68001e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.60001e-06 [a_2]: 7.521e-05 [accelerated_algorithm]: 6.14999e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 5.81998e-06 [merge_send_recv]: 8.45001e-06 [auto_parallel]: 6.31e-06 [parallel]: 2.361e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.78999e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 
[merge_recompute_call_nodes]: 1.45999e-06 [before_grad]: 9.56e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.52001e-06 [flash_sp_send_recv_attached]: 2.90002e-06 [receive_attached]: 2.55002e-06 [after_resolve]: 1.023e-05 [a_after_grad]: 9.24e-06 [renormalize]: 0.00041291 [add_forward_monad_depend]: 4.42e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 2.636e-05 [a_3]: 4.062e-05 [Cycle 2]: 0.00059302, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.74001e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012567 [with_stream_mark]: 9.02999e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.805e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.50999e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.38999e-06 [flash_sp]: 3.04001e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 2.91e-06 [matmul_add_comm_reduction]: 5.10999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.77002e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.25002e-06 [cse]: 1.576e-05 [a_3]: 3.183e-05 [py_interpret_to_execute_after_opt_a]: 7.66999e-06 
[slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 3.233e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00046735 [opt_b]: 0.00018198, [1] [Cycle 1]: 0.00017588, [7] [b_1]: 0.00010902 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.80009e-07 [cse]: 1.577e-05 [optimize_parallel_all_gather_comm]: 1.59e-05 [overlap_param_gather]: 2.14e-06 [cconv]: 2.206e-05 [loop_unroll]: 0.00041534 [opt_after_cconv]: 9.318e-05, [1] [Cycle 1]: 8.727e-05, [7] [c_1]: 2.728e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.537e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.855e-05, [1] [Cycle 1]: 6.429e-05, [4] [d_1]: 3.9e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.02001e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.584e-05 [cse_after_recomputation]: 1.986e-05, [1] [Cycle 1]: 1.54e-05, [1] [cse]: 1.042e-05 [environ_conv]: 4.65001e-06 [swap_dp_allreduce_reducescatter]: 5.60001e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.59998e-06 [slice_recompute_activation]: 2.47001e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.59001e-06 [reorder_send_recv_between_fp_bp]: 2.53003e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 
[control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 2.04e-06 [offloading_packed_experts]: 3.78001e-06 [overlap_recompute_and_grad_model_parallel]: 4.85999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.54998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.24999e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.737e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.12e-06 [symbol_engine_optimizer]: 6.894e-05, [1] [Cycle 1]: 6.473e-05, [6] [build]: 2.02999e-06 [elim_shapecalc]: 8.57e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.63002e-06 [auto_monad_reorder]: 1.525e-05 [get_jit_bprop_graph]: 9.90025e-07 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00045087 [validate]: 3.13e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.0657354 [execute]: 8.53001e-06 Sums bootstrap : 0.000549s : 0.72% type_inference : 0.005958s : 7.83% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000577s : 0.76% optimize.opt_a.with_stream_mark : 
0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000413s : 0.54% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.61% optimize.opt_b.b_1 : 0.000109s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.55% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 
0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.59% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.065735s : 86.40% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.29% : 0.000023s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.83% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000005s : 4: substitution.graph_param_transform 66.81% : 0.000109s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.88% : 0.000005s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.67% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005915 2 90.90% : 0.005377s : 1: type_inference.infer 9.10% : 0.000538s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.45% : 0.000028s : 3: replace.inline 28.55% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.58% : 0.000107s : 3: match.inline 8.42% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.97% : 0.000002s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.81% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.90% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 47.37% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.63% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.089338 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.82% : 0.003414s : 1: add_attr 3.81% : 0.003403s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000064s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000583s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.47% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000477s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.06% : 0.000944s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.36% : 0.002111s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.52% : 0.000460s : 1: opt_after_jit_grad 0.21% : 0.000185s : 1: opt_b 4.44% : 0.003967s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000033s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000214s : 1: renormalize.infer 0.22% : 0.000193s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 73.60% : 0.065755s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 6.68% : 0.005972s : 1: type_inference 0.06% : 0.000056s : 1: validate TotalTime = 0.0759368, [24] [bootstrap]: 0.00047979 [type_inference]: 0.00443361 [event_method]: 1.13e-05 [auto_monad]: 5.099e-05 [graph_reusing]: 5.20001e-06 [inline]: 2.08002e-06 [add_attr]: 0.00294093, [1] [add_attr_with_inline]: 0.00293307, [1] [Cycle 1]: 4.034e-05, [2] [tag_attr]: 1.189e-05 [meta_addattr_fg_expand]: 2.94999e-06 [parallel-infer-symbol]: 2.93e-06 [pre_auto_parallel]: 2.14e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00366857, [53] [py_interpret_to_execute]: 1.485e-05 [rewriter_before_opt_a]: 3.876e-05 [opt_a]: 0.00186805, [2] [Cycle 1]: 0.00126459, [45] [expand_dump_flag]: 2.43e-06 [switch_simplify]: 2.431e-05 [loop_unroll]: 1.412e-05 [a_1]: 0.00030532 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 6.98e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.79e-06 [a_2]: 7.628e-05 [accelerated_algorithm]: 6.64001e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 7.76001e-06 [auto_parallel]: 5.93002e-06 [parallel]: 1.68e-05 [flash_sp]: 7.03998e-06 [merge_comm]: 3.41001e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 9.17999e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 5.91e-06 [get_grad_eliminate_]: 5.66003e-06 [virtual_output]: 5.37999e-06 [merge_forward]: 3.89002e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 2.14e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.18002e-06 
[after_resolve]: 1.012e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00034859 [add_forward_monad_depend]: 4.15e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.3e-05 [cse]: 2.637e-05 [a_3]: 4.027e-05 [Cycle 2]: 0.00059417, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.56e-06 [a_1]: 0.00012525 [with_stream_mark]: 1.075e-05 [recompute_prepare]: 5.87001e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.28002e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.735e-05 [accelerated_algorithm]: 5.86e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.32e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.12e-06 [flash_sp]: 2.96001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.85998e-06 [matmul_add_comm_reduction]: 4.95001e-06 [allreduce_slice_to_reducescatter]: 2.29978e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.05001e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 6.23e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.33001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.86998e-06 [a_after_grad]: 8.14997e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.223e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.35998e-06 [slice_cell_reuse_recomputed_activation]: 1.81003e-06 [rewriter_after_opt_a]: 3.142e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.22e-06 [mutable_eliminate]: 0.00044875 [opt_b]: 0.00018193, [1] [Cycle 1]: 0.0001758, [7] [b_1]: 
0.00010758 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.29979e-07 [cse]: 1.706e-05 [optimize_parallel_all_gather_comm]: 1.577e-05 [overlap_param_gather]: 2.17999e-06 [cconv]: 2.229e-05 [loop_unroll]: 0.00041277 [opt_after_cconv]: 9.38e-05, [1] [Cycle 1]: 8.806e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 1.94e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.589e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.146e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.918e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.435e-05 [cse_after_recomputation]: 2.061e-05, [1] [Cycle 1]: 1.619e-05, [1] [cse]: 1.085e-05 [environ_conv]: 4.87998e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.44999e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.54001e-06 [reorder_send_recv_between_fp_bp]: 2.94999e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49998e-06 [control_data_broadcast_order]: 1.243e-05 [grouped_pairwise_exchange_alltoall]: 1.92001e-06 [offloading_packed_experts]: 3.62002e-06 [overlap_recompute_and_grad_model_parallel]: 4.51002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.642e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.12001e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 6.703e-05, [1] [Cycle 1]: 6.288e-05, [6] [build]: 2.47001e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.101e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.43001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.63002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.523e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00049384 [validate]: 3.293e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.0635482 [execute]: 9.02e-06 Sums bootstrap : 0.000480s : 0.67% type_inference : 0.004434s : 6.16% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.05% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.04% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000431s : 0.60% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.20% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000349s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.62% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000413s : 0.57% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000494s : 0.69% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.063548s : 88.22% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.71% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 65.34% : 0.000080s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.53% : 0.000004s : 4: substitution.remove_not_recompute_node 3.08% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004393 2 91.84% : 0.004034s : 1: type_inference.infer 8.16% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000139 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000004s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.00% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 1.01% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.78% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.45% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000261 6 44.03% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.97% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.083830 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.51% : 0.002945s : 1: add_attr 3.50% : 0.002936s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000513s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.93% : 0.000782s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.23% : 0.001871s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.60% : 0.000504s : 1: opt_after_jit_grad 0.22% : 0.000186s : 1: opt_b 4.38% : 0.003672s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.23% : 0.000192s : 1: renormalize.infer 0.18% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 75.83% : 0.063571s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.31% : 0.004447s : 1: type_inference 0.07% : 0.000055s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x6-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x7-pynative],max_mem:60.0M TotalTime = 0.0212165, [24] [bootstrap]: 0.00055883 [type_inference]: 0.00617891 [event_method]: 1.402e-05 [auto_monad]: 5.919e-05 [graph_reusing]: 5.23002e-06 [inline]: 1.72999e-06 [add_attr]: 0.00339877, [1] [add_attr_with_inline]: 0.00338764, [1] [Cycle 1]: 4.636e-05, [2] [tag_attr]: 1.55e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.839e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.81e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00402851, [53] [py_interpret_to_execute]: 2.098e-05 [rewriter_before_opt_a]: 5.919e-05 [opt_a]: 0.00213818, [2] [Cycle 1]: 0.0015339, [45] [expand_dump_flag]: 3.14001e-06 [switch_simplify]: 3.083e-05 [loop_unroll]: 2.118e-05 [a_1]: 0.00045578 [with_stream_mark]: 1.413e-05 [recompute_prepare]: 7.96001e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.604e-05 [accelerated_algorithm]: 6.63e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 8.50999e-06 [auto_parallel]: 6.01998e-06 [parallel]: 2.337e-05 [flash_sp]: 7.51999e-06 [merge_comm]: 3.84002e-06 [allreduce_fusion]: 3.71001e-06 [matmul_add_comm_reduction]: 7.83001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.51001e-06 [virtual_dataset]: 5.86e-06 
[get_grad_eliminate_]: 5.84e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.081e-05 [merge_recompute_call_nodes]: 1.84e-06 [before_grad]: 9.38002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37997e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.24999e-06 [after_resolve]: 1.043e-05 [a_after_grad]: 8.85999e-06 [renormalize]: 0.00043203 [add_forward_monad_depend]: 4.17e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.352e-05 [cse]: 2.736e-05 [a_3]: 4.057e-05 [Cycle 2]: 0.00059505, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 7.07002e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012613 [with_stream_mark]: 1.016e-05 [recompute_prepare]: 5.84e-06 [updatestate_depend_eliminate]: 2.93e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.744e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.40001e-06 [merge_send_recv]: 4.15e-06 [auto_parallel]: 5.07e-06 [parallel]: 3.88001e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.94e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.63003e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.20002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62999e-06 [merge_recompute_call_nodes]: 6.00005e-07 [before_grad]: 8.27998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.10998e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.72e-06 [a_after_grad]: 8.06001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 
1.11002e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.61e-06 [cse]: 1.318e-05 [a_3]: 3.146e-05 [py_interpret_to_execute_after_opt_a]: 7.9e-06 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 3.208e-05 [convert_after_rewriter]: 6.92002e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00045267 [opt_b]: 0.00018287, [1] [Cycle 1]: 0.0001766, [7] [b_1]: 0.00010852 [b_2]: 6.91999e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.09985e-07 [cse]: 1.716e-05 [optimize_parallel_all_gather_comm]: 1.564e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.00041921 [opt_after_cconv]: 9.482e-05, [1] [Cycle 1]: 8.928e-05, [7] [c_1]: 2.734e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.637e-05 [renormalize]: 8.99978e-07 [remove_dup_value]: 1.2e-05 [tuple_transform]: 6.979e-05, [1] [Cycle 1]: 6.503e-05, [4] [d_1]: 3.898e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.78002e-06 [add_recomputation]: 4.673e-05 [cse_after_recomputation]: 2.102e-05, [1] [Cycle 1]: 1.647e-05, [1] [cse]: 1.114e-05 [environ_conv]: 5.35001e-06 [swap_dp_allreduce_reducescatter]: 5.61998e-06 [bias_add_comm_swap]: 3.06999e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.14999e-06 [micro_interleaved_order_control]: 2.63998e-06 [assign_add_opt]: 1.23002e-06 [ForceFp32Comm]: 1.34e-06 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.41e-06 [comm_op_add_attrs]: 1.10001e-06 [add_comm_op_reuse_tag]: 9.00007e-07 
[interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.191e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 3.75e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.06e-06 [overlap_grad_ring_attention]: 4.57e-06 [overlap_grad_flash_sp]: 1.673e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.07e-06 [symbol_engine_optimizer]: 0.00010533, [1] [Cycle 1]: 0.00010118, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.194e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 4.402e-05 [renormalize]: 1.59984e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.59998e-06 [auto_monad_reorder]: 1.626e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00045131 [validate]: 3.263e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00620471 [execute]: 6.71e-06 Sums bootstrap : 0.000559s : 3.32% type_inference : 0.006179s : 36.70% event_method : 0.000014s : 0.08% auto_monad : 0.000059s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 
0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000582s : 3.46% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 
0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000432s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.69% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000419s : 2.49% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000001s : 0.01% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000044s : 0.26% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000451s : 2.68% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006205s : 36.86% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 15.22% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 66.35% : 0.000110s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.53% : 0.000004s : 4: substitution.replace_old_param 6.38% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006136 2 90.04% : 0.005525s : 1: type_inference.infer 9.96% : 0.000611s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.37% : 0.000027s : 3: replace.inline 30.63% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.88% : 0.000108s : 3: match.inline 8.12% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.93% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000009s : 51: predicate.inline 0.78% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000002s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.32% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.59% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.61% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.41% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000393 8 42.94% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.06% : 0.000224s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030211 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.003403s : 1: add_attr 11.23% : 0.003391s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000065s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.97% : 0.000596s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.06% : 0.000017s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000948s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.22% : 0.000067s : 4: opt.transform.symbol_engine_opt 7.09% : 0.002141s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.53% : 0.000461s : 1: opt_after_jit_grad 0.62% : 0.000186s : 1: opt_b 13.35% : 0.004032s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.72% : 0.000218s : 1: renormalize.infer 0.68% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.36% : 0.000108s : 1: symbol_engine_optimizer 20.57% : 0.006215s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.50% : 0.006193s : 1: type_inference 0.21% : 0.000064s : 1: validate TotalTime = 0.018344, [24] [bootstrap]: 0.00048221 [type_inference]: 0.00435513 [event_method]: 1.104e-05 [auto_monad]: 5.31e-05 [graph_reusing]: 5.35999e-06 [inline]: 1.94999e-06 [add_attr]: 0.00302032, [1] [add_attr_with_inline]: 0.00301142, [1] [Cycle 1]: 4.521e-05, [2] [tag_attr]: 1.234e-05 [meta_addattr_fg_expand]: 3.33e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.251e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.99999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00374147, [53] [py_interpret_to_execute]: 1.601e-05 [rewriter_before_opt_a]: 3.929e-05 [opt_a]: 0.0018676, [2] [Cycle 1]: 0.00125772, [45] [expand_dump_flag]: 2.57001e-06 [switch_simplify]: 2.405e-05 [loop_unroll]: 1.385e-05 [a_1]: 0.00029289 [with_stream_mark]: 1.475e-05 [recompute_prepare]: 7.30998e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.25002e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.716e-05 [accelerated_algorithm]: 6.33998e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 7.80998e-06 [auto_parallel]: 5.86e-06 [parallel]: 1.857e-05 [flash_sp]: 8.42e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 9.36998e-06 [allreduce_slice_to_reducescatter]: 9.00007e-07 [virtual_shard_identity]: 7.38999e-06 [virtual_dataset]: 5.80002e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.40999e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.19999e-06 
[after_resolve]: 1.009e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00034524 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.239e-05 [cse]: 2.697e-05 [a_3]: 4.04e-05 [Cycle 2]: 0.00060002, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 7.07002e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00012585 [with_stream_mark]: 1.109e-05 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.38998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.966e-05 [accelerated_algorithm]: 5.83002e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.77001e-06 [parallel]: 3.81999e-06 [flash_sp]: 3.52997e-06 [merge_comm]: 3.20998e-06 [allreduce_fusion]: 2.55997e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.21998e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.46998e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.99999e-06 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 8.50001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 6.89994e-07 [receive_attached]: 1.10999e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.23e-05 [a_3]: 3.227e-05 [py_interpret_to_execute_after_opt_a]: 7.37002e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.071e-05 [convert_after_rewriter]: 7.11001e-06 [order_py_execute_after_rewriter]: 5.10001e-06 [mutable_eliminate]: 0.00050486 [opt_b]: 0.00018353, [1] [Cycle 1]: 
0.00017708, [7] [b_1]: 0.00010915 [b_2]: 7.2e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 2.80008e-07 [cse]: 1.621e-05 [optimize_parallel_all_gather_comm]: 1.574e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.287e-05 [loop_unroll]: 0.00041963 [opt_after_cconv]: 9.486e-05, [1] [Cycle 1]: 8.913e-05, [7] [c_1]: 2.835e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.29998e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.594e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.251e-05 [tuple_transform]: 6.875e-05, [1] [Cycle 1]: 6.469e-05, [4] [d_1]: 3.929e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.88002e-06 [add_recomputation]: 4.305e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.564e-05, [1] [cse]: 1.07e-05 [environ_conv]: 4.35999e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.60001e-06 [slice_recompute_activation]: 2.58998e-06 [micro_interleaved_order_control]: 2.63998e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.37e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.44e-06 [overlap_opt_shard_in_pipeline]: 1.46998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.58002e-06 [offloading_packed_experts]: 3.44001e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52001e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.724e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.738e-05, [1] [Cycle 1]: 6.326e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 7.9e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.77999e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.59e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.52e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044584 [validate]: 3.159e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00593815 [execute]: 6.74999e-06 Sums bootstrap : 0.000482s : 3.36% type_inference : 0.004355s : 30.32% event_method : 0.000011s : 0.08% auto_monad : 0.000053s : 0.37% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000419s : 2.92% optimize.opt_a.with_stream_mark : 0.000026s : 0.18% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000012s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000345s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000505s : 3.51% optimize.opt_b.b_1 : 0.000109s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000420s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.10% validate : 0.000032s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005938s : 41.34% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.64% : 0.000022s : 4: substitution.arithmetic_simplify 1.67% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.31% : 0.000005s : 4: substitution.graph_param_transform 65.00% : 0.000078s : 2: substitution.inline 2.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.76% : 0.000005s : 4: substitution.remove_not_recompute_node 3.16% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004314 2 92.06% : 0.003971s : 1: type_inference.infer 7.94% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000140 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.15% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.54% : 0.000004s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000009s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.56% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 42.69% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.31% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026384 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.46% : 0.003025s : 1: add_attr 11.43% : 0.003015s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000516s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000017s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000428s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.95% : 0.000515s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000014s : 1: opt.transform.mutable_eliminate 2.93% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001870s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.72% : 0.000455s : 1: opt_after_jit_grad 0.71% : 0.000187s : 1: opt_b 14.20% : 0.003745s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000193s : 1: renormalize.infer 0.55% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.54% : 0.005948s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.56% : 0.004369s : 1: type_inference 0.22% : 0.000059s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x7-kbk],max_mem:60.0M TotalTime = 0.0633516, [24] [bootstrap]: 0.00059775 [type_inference]: 0.00598166 [event_method]: 1.342e-05 [auto_monad]: 5.403e-05 [graph_reusing]: 6.11e-06 [inline]: 1.79e-06 [add_attr]: 0.00340944, [1] [add_attr_with_inline]: 0.00339897, [1] [Cycle 1]: 4.474e-05, [2] [tag_attr]: 1.523e-05 [meta_addattr_fg_expand]: 4.63001e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.803e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.91003e-06 [pipeline_split]: 2.05002e-06 [optimize]: 0.00398344, [53] [py_interpret_to_execute]: 1.974e-05 [rewriter_before_opt_a]: 5.933e-05 [opt_a]: 0.00214824, [2] [Cycle 1]: 0.00152215, [45] [expand_dump_flag]: 3.09001e-06 [switch_simplify]: 3.201e-05 [loop_unroll]: 2.4e-05 [a_1]: 0.00046092 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.9e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.94999e-06 [parameter_eliminate]: 1.56002e-06 [a_2]: 7.606e-05 [accelerated_algorithm]: 6.47001e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 7.56999e-06 [auto_parallel]: 6.02001e-06 [parallel]: 2.271e-05 [flash_sp]: 8.12998e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.70999e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 7.57002e-06 [virtual_dataset]: 6.43003e-06 [get_grad_eliminate_]: 5.59e-06 [virtual_output]: 5.84999e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.56003e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 
[merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.83002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.67002e-06 [meta_fg_expand]: 2.09999e-06 [flash_sp_send_recv_attached]: 2.28998e-06 [receive_attached]: 2.11998e-06 [after_resolve]: 1.026e-05 [a_after_grad]: 8.74998e-06 [renormalize]: 0.00041392 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.96e-06 [auto_monad_eliminator]: 1.307e-05 [cse]: 2.487e-05 [a_3]: 4.113e-05 [Cycle 2]: 0.00061636, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.72999e-06 [a_1]: 0.00012664 [with_stream_mark]: 9.91998e-06 [recompute_prepare]: 5.87001e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 8.548e-05 [accelerated_algorithm]: 5.97001e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.63997e-06 [parallel]: 4.04997e-06 [flash_sp]: 3.06999e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 3.01001e-06 [matmul_add_comm_reduction]: 4.97e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.12999e-06 [virtual_dataset]: 5.36998e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.38002e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.27e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.33999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 6.29001e-06 [cse]: 1.634e-05 [a_3]: 3.231e-05 [py_interpret_to_execute_after_opt_a]: 7.78001e-06 
[slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.178e-05 [convert_after_rewriter]: 7.21999e-06 [order_py_execute_after_rewriter]: 5.62999e-06 [mutable_eliminate]: 0.00044749 [opt_b]: 0.00018158, [1] [Cycle 1]: 0.0001756, [7] [b_1]: 0.00010819 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 3.80009e-07 [cse]: 1.559e-05 [optimize_parallel_all_gather_comm]: 1.508e-05 [overlap_param_gather]: 2.19001e-06 [cconv]: 2.182e-05 [loop_unroll]: 0.00041307 [opt_after_cconv]: 9.408e-05, [1] [Cycle 1]: 8.819e-05, [7] [c_1]: 2.742e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.564e-05 [renormalize]: 1.8999e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 6.956e-05, [1] [Cycle 1]: 6.513e-05, [4] [d_1]: 3.893e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.78e-05 [cse_after_recomputation]: 2.026e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.048e-05 [environ_conv]: 4.33999e-06 [swap_dp_allreduce_reducescatter]: 5.34e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.05999e-06 [add_comm_op_reuse_tag]: 1.34998e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.88002e-06 [control_data_broadcast_order]: 1.158e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.40998e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.28002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.63e-06 [overlap_grad_flash_sp]: 1.677e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 1.98997e-06 [split_layernorm_comm]: 2.21e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.959e-05, [1] [Cycle 1]: 6.549e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.204e-05 [opt_reshape]: 6.31998e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 2.3999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.506e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00044824 [validate]: 3.229e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.0485427 [execute]: 8.62998e-06 Sums bootstrap : 0.000598s : 1.01% type_inference : 0.005982s : 10.14% event_method : 0.000013s : 0.02% auto_monad : 0.000054s : 0.09% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000028s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.10% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.07% optimize.opt_a.loop_unroll : 0.000030s : 0.05% optimize.opt_a.a_1 : 0.000588s : 1.00% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000162s : 0.27% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.05% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000414s : 0.70% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.07% optimize.opt_a.a_3 : 0.000073s : 0.12% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000447s : 0.76% optimize.opt_b.b_1 : 0.000108s : 0.18% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000413s : 0.70% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.08% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv 
: 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000448s : 0.76% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.048543s : 82.32% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.63% : 0.000024s : 5: substitution.arithmetic_simplify 1.33% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.01% : 0.000005s : 4: substitution.graph_param_transform 66.61% : 0.000110s : 3: substitution.inline 2.00% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.75% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005939 2 90.84% : 0.005395s : 1: type_inference.infer 9.16% : 0.000544s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.98% : 0.000028s : 3: replace.inline 29.02% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.50% : 0.000108s : 3: match.inline 8.50% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000163 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.44% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000002s : 11: predicate.cast_eliminate 0.90% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.55% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.32% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.66% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.29% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 1.04% : 0.000002s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.12% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.19% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.58% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 47.14% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.86% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.072288 196 0.00% : 0.000003s : 1: ForceFp32Comm 4.72% : 0.003414s : 1: add_attr 4.71% : 0.003403s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.08% : 0.000059s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000635s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.03% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.58% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.63% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.35% : 0.000977s : 78: opt.transform.opt_a 0.04% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.98% : 0.002151s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.63% : 0.000458s : 1: opt_after_jit_grad 0.26% : 0.000185s : 1: opt_b 5.52% : 0.003987s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.30% : 0.000215s : 1: renormalize.infer 0.27% : 0.000193s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.09% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.10% : 0.000072s : 1: symbol_engine_optimizer 67.18% : 0.048562s : 1: task_emit 0.10% : 0.000072s : 1: 
tuple_transform 8.29% : 0.005996s : 1: type_inference 0.08% : 0.000057s : 1: validate TotalTime = 0.0548335, [24] [bootstrap]: 0.00046564 [type_inference]: 0.00436232 [event_method]: 1.122e-05 [auto_monad]: 5.122e-05 [graph_reusing]: 5.22e-06 [inline]: 1.44e-06 [add_attr]: 0.00293523, [1] [add_attr_with_inline]: 0.00292676, [1] [Cycle 1]: 4.285e-05, [2] [tag_attr]: 1.136e-05 [meta_addattr_fg_expand]: 3.00998e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.14e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.47999e-06 [optimize]: 0.00365589, [53] [py_interpret_to_execute]: 1.471e-05 [rewriter_before_opt_a]: 3.979e-05 [opt_a]: 0.00183919, [2] [Cycle 1]: 0.00124163, [45] [expand_dump_flag]: 2.88e-06 [switch_simplify]: 2.439e-05 [loop_unroll]: 1.424e-05 [a_1]: 0.00028795 [with_stream_mark]: 1.398e-05 [recompute_prepare]: 7.50998e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 3.12002e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.58002e-06 [a_2]: 7.495e-05 [accelerated_algorithm]: 5.98998e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 7.35998e-06 [auto_parallel]: 5.62999e-06 [parallel]: 1.791e-05 [flash_sp]: 7.51999e-06 [merge_comm]: 3.54002e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 6.69001e-06 [virtual_dataset]: 5.37999e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.35999e-06 [merge_forward]: 3.77002e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.067e-05 [merge_recompute_call_nodes]: 1.74998e-06 [before_grad]: 8.87999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.02001e-06 [flash_sp_send_recv_attached]: 2.22001e-06 
[receive_attached]: 2.40002e-06 [after_resolve]: 1e-05 [a_after_grad]: 8.57e-06 [renormalize]: 0.00034656 [add_forward_monad_depend]: 4.2e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.271e-05 [cse]: 2.611e-05 [a_3]: 3.997e-05 [Cycle 2]: 0.00058859, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.78e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012501 [with_stream_mark]: 9.02999e-06 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.57001e-06 [parameter_eliminate]: 7.79983e-07 [a_2]: 6.749e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.49e-06 [meta_shard_fg_expand]: 1.11002e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.12999e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.11999e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 4.95001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 6.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.68002e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.01001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.79001e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.62999e-06 [a_after_grad]: 8.18001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 8.79983e-07 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.176e-05 [a_3]: 3.121e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 3.052e-05 [convert_after_rewriter]: 7.33e-06 [order_py_execute_after_rewriter]: 5.40001e-06 [mutable_eliminate]: 0.00044272 [opt_b]: 0.00017925, [1] 
[Cycle 1]: 0.00017367, [7] [b_1]: 0.00010677 [b_2]: 7.64002e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.80009e-07 [cse]: 1.585e-05 [optimize_parallel_all_gather_comm]: 1.542e-05 [overlap_param_gather]: 2.00002e-06 [cconv]: 2.234e-05 [loop_unroll]: 0.00043575 [opt_after_cconv]: 9.42e-05, [1] [Cycle 1]: 8.873e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.25002e-06 [cse]: 1.592e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.198e-05 [tuple_transform]: 6.926e-05, [1] [Cycle 1]: 6.491e-05, [4] [d_1]: 3.944e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.13002e-06 [partial_unused_args_eliminate]: 2.06e-06 [add_recomputation]: 4.367e-05 [cse_after_recomputation]: 1.923e-05, [1] [Cycle 1]: 1.493e-05, [1] [cse]: 9.87001e-06 [environ_conv]: 5.14998e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.30002e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.24001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.04998e-06 [full_micro_interleaved_order_control]: 2.45002e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.62001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.50999e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.59998e-06 [offloading_packed_experts]: 3.95998e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.708e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 2.07999e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 6.78e-05, [1] [Cycle 1]: 6.302e-05, [6] [build]: 2.05002e-06 [elim_shapecalc]: 8.22e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 5.87001e-06 [fold_const_symbol]: 8.95001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.508e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.26001e-06 [opt_after_jit_grad]: 0.00043985 [validate]: 3.084e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.0426252 [execute]: 8.33999e-06 Sums bootstrap : 0.000466s : 0.91% type_inference : 0.004362s : 8.56% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000001s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000021s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000040s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.06% optimize.opt_a.loop_unroll : 0.000020s : 0.04% optimize.opt_a.a_1 : 0.000413s : 0.81% optimize.opt_a.with_stream_mark : 0.000023s : 0.05% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000347s : 0.68% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.04% optimize.opt_a.cse : 0.000038s : 0.07% optimize.opt_a.a_3 : 0.000071s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000443s : 0.87% optimize.opt_b.b_1 : 0.000107s : 0.21% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000436s : 0.86% optimize.opt_after_cconv.c_1 : 0.000027s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.09% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000002s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000002s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000440s : 0.86% validate : 0.000031s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042625s : 83.65% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 0.000118 26 17.83% : 0.000021s : 4: substitution.arithmetic_simplify 1.54% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.65% : 0.000005s : 4: substitution.graph_param_transform 65.78% : 0.000078s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000004s : 4: substitution.remove_not_recompute_node 3.20% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004322 2 91.74% : 0.003966s : 1: type_inference.infer 8.26% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.98% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.96% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.01% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.51% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.75% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.06% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.63% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.59% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.14% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.26% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000251 6 42.84% : 0.000108s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.16% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062685 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.69% : 0.002939s : 1: add_attr 4.67% : 0.002930s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000056s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.78% : 0.000492s : 1: bootstrap 0.04% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000444s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.72% : 0.000451s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.21% : 0.000760s : 78: opt.transform.opt_a 0.04% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.14% : 0.000090s : 28: opt.transform.opt_b 0.07% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.94% : 0.001842s : 1: opt_a 0.16% : 0.000098s : 1: opt_after_cconv 0.71% : 0.000448s : 1: opt_after_jit_grad 0.29% : 0.000183s : 1: opt_b 5.84% : 0.003659s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000025s : 1: pre_auto_parallel 0.03% : 0.000018s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.30% : 0.000187s : 1: renormalize.infer 0.24% : 0.000152s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000070s : 1: symbol_engine_optimizer 68.03% : 0.042643s : 1: task_emit 0.12% : 0.000072s : 1: 
tuple_transform 6.98% : 0.004375s : 1: type_inference 0.08% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x7-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x8-pynative],max_mem:60.0M TotalTime = 0.021447, [24] [bootstrap]: 0.00057315 [type_inference]: 0.00630137 [event_method]: 1.393e-05 [auto_monad]: 5.489e-05 [graph_reusing]: 5.42999e-06 [inline]: 1.60999e-06 [add_attr]: 0.00340471, [1] [add_attr_with_inline]: 0.00339322, [1] [Cycle 1]: 4.521e-05, [2] [tag_attr]: 1.59e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.777e-05 [insert-virtual-dataset]: 2.25002e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 1.92001e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00402888, [53] [py_interpret_to_execute]: 1.973e-05 [rewriter_before_opt_a]: 6.031e-05 [opt_a]: 0.00219876, [2] [Cycle 1]: 0.00158861, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 3.178e-05 [loop_unroll]: 7.899e-05 [a_1]: 0.00045724 [with_stream_mark]: 1.331e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.52002e-06 [updatestate_assign_eliminate]: 3.34001e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.641e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 1.86e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 8.23999e-06 [auto_parallel]: 6.11998e-06 [parallel]: 2.547e-05 [flash_sp]: 7.78001e-06 [merge_comm]: 3.72002e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 8.64e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.37997e-06 [virtual_dataset]: 
6.16998e-06 [get_grad_eliminate_]: 5.68002e-06 [virtual_output]: 6.14999e-06 [merge_forward]: 3.55998e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 8.96002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.31002e-06 [before_grad]: 9.18002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.31998e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.79999e-06 [after_resolve]: 9.70002e-06 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00042345 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.321e-05 [cse]: 2.695e-05 [a_3]: 4.159e-05 [Cycle 2]: 0.00060087, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00012613 [with_stream_mark]: 9.67001e-06 [recompute_prepare]: 5.81e-06 [updatestate_depend_eliminate]: 2.75002e-06 [updatestate_assign_eliminate]: 2.12001e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 1.27999e-06 [a_2]: 6.906e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.25999e-06 [parallel]: 3.95e-06 [flash_sp]: 3.17002e-06 [merge_comm]: 3.07002e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 7e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.26e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 5.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.58002e-06 [merge_recompute_call_nodes]: 6.40022e-07 [before_grad]: 7.85998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.68997e-06 [a_after_grad]: 8.56002e-06 
[renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 6.45002e-06 [cse]: 1.255e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 7.97003e-06 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 3.059e-05 [convert_after_rewriter]: 6.91999e-06 [order_py_execute_after_rewriter]: 4.99e-06 [mutable_eliminate]: 0.00044493 [opt_b]: 0.00018126, [1] [Cycle 1]: 0.00017529, [7] [b_1]: 0.00010875 [b_2]: 7.30003e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.30008e-07 [cse]: 1.566e-05 [optimize_parallel_all_gather_comm]: 1.528e-05 [overlap_param_gather]: 2.16998e-06 [cconv]: 2.185e-05 [loop_unroll]: 0.00041062 [opt_after_cconv]: 9.529e-05, [1] [Cycle 1]: 8.968e-05, [7] [c_1]: 2.817e-05 [parameter_eliminate]: 2.24999e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.631e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.162e-05 [tuple_transform]: 6.843e-05, [1] [Cycle 1]: 6.411e-05, [4] [d_1]: 3.869e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 4.755e-05 [cse_after_recomputation]: 1.944e-05, [1] [Cycle 1]: 1.522e-05, [1] [cse]: 1.01e-05 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 5.19998e-06 [bias_add_comm_swap]: 2.72001e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.32e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.37001e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 8.99978e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.06e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 
9.89996e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.90001e-06 [control_data_broadcast_order]: 1.164e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.41002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.21998e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.70001e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 6.99e-05, [1] [Cycle 1]: 6.559e-05, [6] [build]: 2.59001e-06 [elim_shapecalc]: 8.59998e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 6.36e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.71998e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.0004897 [validate]: 3.275e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00627207 [execute]: 7.35e-06 Sums bootstrap : 0.000573s : 3.36% type_inference : 0.006301s : 36.91% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000085s : 0.50% optimize.opt_a.a_1 : 0.000583s : 3.42% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000016s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000424s : 2.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000040s : 0.23% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.61% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000411s : 2.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000490s : 2.87% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006272s : 36.74% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.51% : 0.000024s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 66.88% : 0.000111s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.77% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006256 2 89.88% : 0.005623s : 1: type_inference.infer 10.12% : 0.000633s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.58% : 0.000027s : 3: replace.inline 30.42% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.49% : 0.000109s : 3: match.inline 8.51% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.98% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.64% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.39% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.03% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.28% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.48% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 1.04% : 0.000002s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000360 8 46.98% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.02% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030411 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.21% : 0.003409s : 1: add_attr 11.17% : 0.003397s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.01% : 0.000610s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.14% : 0.000954s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.24% : 0.002202s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.64% : 0.000500s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.26% : 0.004033s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.71% : 0.000215s : 1: renormalize.infer 0.66% : 0.000202s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000065s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000073s : 1: symbol_engine_optimizer 20.66% : 0.006282s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.76% : 0.006315s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.018077, [24] [bootstrap]: 0.00047206 [type_inference]: 0.00435308 [event_method]: 1.064e-05 [auto_monad]: 4.949e-05 [graph_reusing]: 5.00999e-06 [inline]: 1.67999e-06 [add_attr]: 0.00297705, [1] [add_attr_with_inline]: 0.00296919, [1] [Cycle 1]: 4.543e-05, [2] [tag_attr]: 1.141e-05 [meta_addattr_fg_expand]: 2.99999e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 2.118e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 6.79982e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.94e-06 [optimize]: 0.00366311, [53] [py_interpret_to_execute]: 1.575e-05 [rewriter_before_opt_a]: 3.859e-05 [opt_a]: 0.00184908, [2] [Cycle 1]: 0.00124761, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.435e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.00029183 [with_stream_mark]: 1.343e-05 [recompute_prepare]: 7.32002e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 3.01999e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 7.647e-05 [accelerated_algorithm]: 6.35002e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 7.56001e-06 [auto_parallel]: 5.83002e-06 [parallel]: 1.709e-05 [flash_sp]: 7.29001e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.51001e-06 [matmul_add_comm_reduction]: 8.91997e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.22002e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.99999e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.15999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.84999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.18998e-06 [flash_sp_send_recv_attached]: 2.76e-06 
[receive_attached]: 2.44999e-06 [after_resolve]: 1.058e-05 [a_after_grad]: 8.60001e-06 [renormalize]: 0.00033864 [add_forward_monad_depend]: 4.32003e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.617e-05 [a_3]: 4.04e-05 [Cycle 2]: 0.00059241, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 7.03e-06 [loop_unroll]: 5.46998e-06 [a_1]: 0.00012576 [with_stream_mark]: 1.052e-05 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.782e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.04e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.30999e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.23001e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.08002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.01e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.64002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 1.59998e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.77999e-06 [a_after_grad]: 8.30999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 5.84e-06 [cse]: 1.178e-05 [a_3]: 3.226e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 1.67001e-06 [rewriter_after_opt_a]: 3.169e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00043826 [opt_b]: 0.00018189, [1] [Cycle 
1]: 0.00017569, [7] [b_1]: 0.00010886 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 3.00002e-07 [cse]: 1.524e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.304e-05 [loop_unroll]: 0.00041144 [opt_after_cconv]: 9.469e-05, [1] [Cycle 1]: 8.875e-05, [7] [c_1]: 2.751e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.601e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 8.808e-05, [1] [Cycle 1]: 8.381e-05, [4] [d_1]: 5.68e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.76998e-06 [add_recomputation]: 4.309e-05 [cse_after_recomputation]: 2.02e-05, [1] [Cycle 1]: 1.591e-05, [1] [cse]: 1.073e-05 [environ_conv]: 4.90001e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 2.78e-06 [label_micro_interleaved_index]: 4.22998e-06 [label_fine_grained_interleaved_index]: 2.86e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.78003e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.90025e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.126e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.45998e-06 [overlap_recompute_and_grad_model_parallel]: 4.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36998e-06 [overlap_recompute_comm]: 2.46998e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.694e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.703e-05, [1] [Cycle 1]: 6.281e-05, [6] [build]: 2.04e-06 [elim_shapecalc]: 8.03999e-06 [elim_not_effective]: 1.124e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.33002e-06 [auto_monad_reorder]: 1.539e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00044501 [validate]: 3.091e-05 [backend_pass]: 1.04003e-06 [task_emit]: 0.00581622 [execute]: 6.60997e-06 Sums bootstrap : 0.000472s : 3.34% type_inference : 0.004353s : 30.78% event_method : 0.000011s : 0.08% auto_monad : 0.000049s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000418s : 2.95% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000438s : 3.10% optimize.opt_b.b_1 : 0.000109s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000411s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000057s : 0.40% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000445s : 3.15% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005816s : 41.12% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.28% : 0.000022s : 4: substitution.arithmetic_simplify 1.64% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.36% : 0.000005s : 4: substitution.graph_param_transform 65.42% : 0.000077s : 2: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000004s : 4: substitution.remove_not_recompute_node 3.34% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004313 2 92.15% : 0.003974s : 1: type_inference.infer 7.85% : 0.000339s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.00% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.15% : 0.000002s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.89% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.76% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.70% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 1.05% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.69% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.64% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.93% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 43.06% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.94% : 0.000133s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026002 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.47% : 0.002981s : 1: add_attr 11.43% : 0.002973s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.95% : 0.000506s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000448s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.96% : 0.000770s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.24% : 0.000061s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.12% : 0.001852s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000454s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.10% : 0.003667s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.08% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000185s : 1: renormalize.infer 0.57% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.40% : 0.005826s : 1: task_emit 0.35% : 0.000091s : 1: 
tuple_transform 16.79% : 0.004366s : 1: type_inference 0.22% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x8-kbk],max_mem:60.0M TotalTime = 0.889202, [24] [bootstrap]: 0.00054098 [type_inference]: 0.00602166 [event_method]: 1.45e-05 [auto_monad]: 5.773e-05 [graph_reusing]: 5.17e-06 [inline]: 1.64e-06 [add_attr]: 0.00342841, [1] [add_attr_with_inline]: 0.0034177, [1] [Cycle 1]: 4.659e-05, [2] [tag_attr]: 1.617e-05 [meta_addattr_fg_expand]: 3.94002e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.702e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.00401051, [53] [py_interpret_to_execute]: 1.997e-05 [rewriter_before_opt_a]: 5.905e-05 [opt_a]: 0.00216409, [2] [Cycle 1]: 0.00156053, [45] [expand_dump_flag]: 3.27002e-06 [switch_simplify]: 3.139e-05 [loop_unroll]: 2.126e-05 [a_1]: 0.00048543 [with_stream_mark]: 1.428e-05 [recompute_prepare]: 8.13001e-06 [updatestate_depend_eliminate]: 3.49001e-06 [updatestate_assign_eliminate]: 3.06999e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.604e-05 [accelerated_algorithm]: 6.51999e-06 [shard]: 2.01998e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.94e-06 [merge_send_recv]: 8e-06 [auto_parallel]: 6.11e-06 [parallel]: 2.324e-05 [flash_sp]: 7.71999e-06 [merge_comm]: 3.62002e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 9.13002e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.17002e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.55001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 
[merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.32001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.43002e-06 [receive_attached]: 2.71999e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 8.87e-06 [renormalize]: 0.00043011 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.428e-05 [cse]: 2.737e-05 [a_3]: 3.96e-05 [Cycle 2]: 0.00059418, [45] [expand_dump_flag]: 7.80012e-07 [switch_simplify]: 6.79999e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012493 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.12999e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.816e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.20999e-06 [auto_parallel]: 5.36002e-06 [parallel]: 4.33999e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.11002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.20002e-06 [virtual_dataset]: 5.15001e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 5.03002e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.02999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.117e-05 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.40023e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.004e-05 [a_after_grad]: 8.12e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.40024e-07 [auto_monad_eliminator]: 6.53003e-06 [cse]: 1.144e-05 [a_3]: 3.236e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 [slice_cell_reuse_recomputed_activation]: 
1.79e-06 [rewriter_after_opt_a]: 3.127e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.10999e-06 [mutable_eliminate]: 0.00045138 [opt_b]: 0.00018284, [1] [Cycle 1]: 0.0001769, [7] [b_1]: 0.00010864 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 4.80009e-07 [cse]: 1.619e-05 [optimize_parallel_all_gather_comm]: 1.64e-05 [overlap_param_gather]: 2.42001e-06 [cconv]: 2.406e-05 [loop_unroll]: 0.00041558 [opt_after_cconv]: 9.414e-05, [1] [Cycle 1]: 8.853e-05, [7] [c_1]: 2.807e-05 [parameter_eliminate]: 2.18998e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.567e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.252e-05 [tuple_transform]: 6.844e-05, [1] [Cycle 1]: 6.404e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.662e-05 [cse_after_recomputation]: 2.024e-05, [1] [Cycle 1]: 1.548e-05, [1] [cse]: 1.047e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.00001e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.79e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.30999e-06 [full_micro_interleaved_order_control]: 2.18002e-06 [reorder_send_recv_between_fp_bp]: 2.61999e-06 [comm_op_add_attrs]: 1.20999e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 
1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.61001e-06 [overlap_recompute_and_grad_model_parallel]: 4.61002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.45002e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.709e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 2.08998e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 6.947e-05, [1] [Cycle 1]: 6.537e-05, [6] [build]: 2.65002e-06 [elim_shapecalc]: 9.09e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.79003e-06 [renormalize]: 2.3999e-07 [detach_backward]: 2.02001e-06 [pipeline_parallel_scheduler]: 1.64e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.13001e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.0005064 [validate]: 3.177e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.874245 [execute]: 9.14e-06 Sums bootstrap : 0.000541s : 0.06% type_inference : 0.006022s : 0.68% event_method : 0.000014s : 0.00% auto_monad : 0.000058s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000059s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000610s : 0.07% optimize.opt_a.with_stream_mark : 0.000024s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000028s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000020s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000430s : 0.05% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.05% optimize.opt_b.b_1 : 0.000109s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.00% optimize.loop_unroll : 0.000416s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.01% optimize.cse_after_recomputation.cse : 0.000010s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000506s : 0.06% validate : 0.000032s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.874245s : 98.81% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000169 30 14.27% : 0.000024s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 68.04% : 0.000115s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.13% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005978 2 90.88% : 0.005432s : 1: type_inference.infer 9.12% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.59% : 0.000027s : 3: replace.inline 30.41% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.36% : 0.000113s : 3: match.inline 7.64% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.06% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.66% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.58% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.34% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.77% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.62% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.19% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.67% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 46.83% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.17% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.898145 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.38% : 0.003433s : 1: add_attr 0.38% : 0.003421s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000063s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000578s : 1: bootstrap 0.00% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000021s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000008s : 1: label_micro_interleaved_index 0.05% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000461s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000012s : 1: opt.transform.mutable_eliminate 0.11% : 0.000980s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000091s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.24% : 0.002167s : 1: opt_a 0.01% : 0.000097s : 1: opt_after_cconv 0.06% : 0.000517s : 1: opt_after_jit_grad 0.02% : 0.000186s : 1: opt_b 0.45% : 0.004014s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000006s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000031s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000223s : 1: renormalize.infer 0.02% : 0.000200s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.34% : 0.874266s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.67% : 0.006035s : 1: type_inference 0.01% : 0.000057s : 1: validate TotalTime = 0.0754176, [24] [bootstrap]: 0.00048958 [type_inference]: 0.00437125 [event_method]: 1.11e-05 [auto_monad]: 5.016e-05 [graph_reusing]: 5.17e-06 [inline]: 1.92999e-06 [add_attr]: 0.00301426, [1] [add_attr_with_inline]: 0.00300573, [1] [Cycle 1]: 8.046e-05, [2] [tag_attr]: 4.36e-05 [meta_addattr_fg_expand]: 3.5e-06 [parallel-infer-symbol]: 2.72001e-06 [pre_auto_parallel]: 2.171e-05 [insert-virtual-dataset]: 2.46e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.0036796, [53] [py_interpret_to_execute]: 1.475e-05 [rewriter_before_opt_a]: 4.099e-05 [opt_a]: 0.00187122, [2] [Cycle 1]: 0.00126309, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 2.508e-05 [loop_unroll]: 1.388e-05 [a_1]: 0.00029459 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.83999e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 2.96999e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.741e-05 [accelerated_algorithm]: 6.28e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.37999e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 8.14002e-06 [auto_parallel]: 5.86998e-06 [parallel]: 1.854e-05 [flash_sp]: 7.44002e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.48999e-06 [matmul_add_comm_reduction]: 8.90001e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 7.54002e-06 [virtual_dataset]: 5.81998e-06 [get_grad_eliminate_]: 5.59998e-06 [virtual_output]: 5.53997e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.77999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32002e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.32001e-06 [receive_attached]: 
2.46998e-06 [after_resolve]: 1.043e-05 [a_after_grad]: 8.89998e-06 [renormalize]: 0.00034886 [add_forward_monad_depend]: 4.55001e-06 [auto_monad_grad]: 2.14e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.702e-05 [a_3]: 4.092e-05 [Cycle 2]: 0.00059822, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.98998e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012594 [with_stream_mark]: 9.67001e-06 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.85998e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.70002e-06 [parameter_eliminate]: 8.49977e-07 [a_2]: 6.968e-05 [accelerated_algorithm]: 5.71998e-06 [shard]: 9.90025e-07 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.49e-06 [merge_send_recv]: 4.44998e-06 [auto_parallel]: 5.00999e-06 [parallel]: 3.77998e-06 [flash_sp]: 3.48e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.19e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.63998e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.14003e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.51998e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 5.77999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.3e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.25999e-06 [a_after_grad]: 8.32e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.21e-06 [cse]: 1.159e-05 [a_3]: 3.277e-05 [py_interpret_to_execute_after_opt_a]: 7.51001e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 2.944e-05 [convert_after_rewriter]: 7.35998e-06 [order_py_execute_after_rewriter]: 5.35001e-06 [mutable_eliminate]: 0.00044852 [opt_b]: 0.00018126, [1] [Cycle 1]: 0.00017524, [7] 
[b_1]: 0.00010877 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 4.60015e-07 [cse]: 1.656e-05 [optimize_parallel_all_gather_comm]: 1.646e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 2.246e-05 [loop_unroll]: 0.00041713 [opt_after_cconv]: 9.428e-05, [1] [Cycle 1]: 8.876e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.16998e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.582e-05 [renormalize]: 9.30013e-07 [remove_dup_value]: 1.238e-05 [tuple_transform]: 6.983e-05, [1] [Cycle 1]: 6.55e-05, [4] [d_1]: 3.968e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 2.69996e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.5e-05 [cse_after_recomputation]: 1.946e-05, [1] [Cycle 1]: 1.525e-05, [1] [cse]: 1.027e-05 [environ_conv]: 4.77e-06 [swap_dp_allreduce_reducescatter]: 5.47999e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.34999e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.20999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.45999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 1.98002e-06 [overlap_grad_ring_attention]: 3.93999e-06 [overlap_grad_flash_sp]: 1.63e-05 [begin_end_overlap_inline]: 7.99977e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.809e-05, [1] [Cycle 1]: 6.392e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.20999e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.561e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00047682 [validate]: 3.186e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0630173 [execute]: 9.27999e-06 Sums bootstrap : 0.000490s : 0.68% type_inference : 0.004371s : 6.12% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000044s : 0.06% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000421s : 0.59% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000349s : 0.49% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000029s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.63% optimize.opt_b.b_1 : 0.000109s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000417s : 0.58% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000045s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000477s : 0.67% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063017s : 88.17% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000123 26 18.27% : 0.000022s : 4: substitution.arithmetic_simplify 1.76% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.45% : 0.000005s : 4: substitution.graph_param_transform 65.42% : 0.000080s : 2: substitution.inline 2.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.40% : 0.000004s : 4: substitution.remove_not_recompute_node 3.23% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004331 2 91.84% : 0.003977s : 1: type_inference.infer 8.16% : 0.000353s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000138 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.69% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.48% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.26% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.91% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000008s : 44: predicate.inline 1.09% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.37% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.70% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.46% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 1.05% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.95% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 41.87% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.13% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.083396 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.62% : 0.003019s : 1: add_attr 3.61% : 0.003009s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000525s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.93% : 0.000779s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.25% : 0.001874s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000487s : 1: opt_after_jit_grad 0.22% : 0.000184s : 1: opt_b 4.42% : 0.003683s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000192s : 1: renormalize.infer 0.18% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000033s : 1: rewriter_after_opt_a 0.05% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 75.59% : 0.063036s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.26% : 0.004385s : 1: type_inference 0.06% : 0.000054s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x8-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x9-pynative],max_mem:60.0M TotalTime = 0.02131, [24] [bootstrap]: 0.00056272 [type_inference]: 0.00612528 [event_method]: 1.456e-05 [auto_monad]: 5.505e-05 [graph_reusing]: 5.76e-06 [inline]: 1.60999e-06 [add_attr]: 0.00342291, [1] [add_attr_with_inline]: 0.00341176, [1] [Cycle 1]: 4.39e-05, [2] [tag_attr]: 1.526e-05 [meta_addattr_fg_expand]: 4.28999e-06 [parallel-infer-symbol]: 2.71999e-06 [pre_auto_parallel]: 2.783e-05 [insert-virtual-dataset]: 2.51e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00405339, [53] [py_interpret_to_execute]: 2.362e-05 [rewriter_before_opt_a]: 5.959e-05 [opt_a]: 0.00219551, [2] [Cycle 1]: 0.00157609, [45] [expand_dump_flag]: 2.61e-06 [switch_simplify]: 3.296e-05 [loop_unroll]: 2.183e-05 [a_1]: 0.00047333 [with_stream_mark]: 1.492e-05 [recompute_prepare]: 8.31002e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 3.66999e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.791e-05 [accelerated_algorithm]: 6.78e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 8.30999e-06 [auto_parallel]: 5.71e-06 [parallel]: 2.341e-05 [flash_sp]: 7.25e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.35998e-06 [matmul_add_comm_reduction]: 8.76997e-06 [allreduce_slice_to_reducescatter]: 8.30012e-07 [virtual_shard_identity]: 7.52998e-06 [virtual_dataset]: 6.01003e-06 
[get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.79e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.42001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.132e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 9.43002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.047e-05 [a_after_grad]: 1.127e-05 [renormalize]: 0.0004375 [add_forward_monad_depend]: 4.26001e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.355e-05 [cse]: 2.998e-05 [a_3]: 4.167e-05 [Cycle 2]: 0.0006099, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 7.22002e-06 [loop_unroll]: 5.61003e-06 [a_1]: 0.0001289 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.96998e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.95e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.64002e-06 [auto_parallel]: 5.42001e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.3e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.02999e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.97999e-06 [virtual_output]: 5.48002e-06 [merge_forward]: 2.91e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 5.90002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 6.40022e-07 [before_grad]: 8.60001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23998e-06 [meta_fg_expand]: 1.71998e-06 [flash_sp_send_recv_attached]: 1.04e-06 [receive_attached]: 1.02998e-06 [after_resolve]: 8.78001e-06 [a_after_grad]: 8.39002e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.322e-05 [a_3]: 3.317e-05 [py_interpret_to_execute_after_opt_a]: 7.36001e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.068e-05 [convert_after_rewriter]: 6.59999e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00044887 [opt_b]: 0.00018557, [1] [Cycle 1]: 0.00017937, [7] [b_1]: 0.00011091 [b_2]: 7.49002e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 4.00003e-07 [cse]: 1.702e-05 [optimize_parallel_all_gather_comm]: 1.915e-05 [overlap_param_gather]: 1.95001e-06 [cconv]: 2.299e-05 [loop_unroll]: 0.0004183 [opt_after_cconv]: 9.709e-05, [1] [Cycle 1]: 9.146e-05, [7] [c_1]: 2.824e-05 [parameter_eliminate]: 2.46e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.675e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.274e-05 [tuple_transform]: 6.969e-05, [1] [Cycle 1]: 6.531e-05, [4] [d_1]: 3.963e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.44001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.749e-05 [cse_after_recomputation]: 2.15e-05, [1] [Cycle 1]: 1.71e-05, [1] [cse]: 1.188e-05 [environ_conv]: 4.75999e-06 [swap_dp_allreduce_reducescatter]: 5.67999e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.64001e-06 [merge_cast_opt]: 1.40001e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.22001e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 1.08001e-06 
[add_comm_op_reuse_tag]: 1.05999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.22e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 3.77002e-06 [overlap_recompute_and_grad_model_parallel]: 4.11001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34003e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.91001e-06 [overlap_grad_flash_sp]: 1.689e-05 [begin_end_overlap_inline]: 7.30011e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.55001e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.928e-05, [1] [Cycle 1]: 6.513e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 6.35002e-06 [fold_const_symbol]: 9.01998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.02999e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00045087 [validate]: 3.186e-05 [backend_pass]: 1.22999e-06 [task_emit]: 0.00631154 [execute]: 6.92002e-06 Sums bootstrap : 0.000563s : 3.33% type_inference : 0.006125s : 36.23% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000024s : 0.14% optimize.rewriter_before_opt_a : 0.000060s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.24% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000602s : 3.56% optimize.opt_a.with_stream_mark : 0.000025s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000013s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000012s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000020s : 0.12% optimize.opt_a.renormalize : 0.000438s : 2.59% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000043s : 0.26% optimize.opt_a.a_3 : 0.000075s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.66% optimize.opt_b.b_1 : 0.000111s : 0.66% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000418s : 2.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000451s : 2.67% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006312s : 37.33% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000170 30 14.23% : 0.000024s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 67.96% : 0.000116s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.70% : 0.000005s : 4: substitution.remove_not_recompute_node 2.18% : 0.000004s : 4: substitution.replace_old_param 6.29% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006081 2 90.59% : 0.005509s : 1: type_inference.infer 9.41% : 0.000572s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.46% : 0.000028s : 3: replace.inline 30.54% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000123 5 92.21% : 0.000114s : 3: match.inline 7.79% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000002s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.99% : 0.000002s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 11: predicate.reduce_eliminate 2.43% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.56% : 0.000001s : 8: predicate.remove_not_recompute_node 1.47% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.35% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.58% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000370 8 47.66% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.34% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030362 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.29% : 0.003427s : 1: add_attr 11.25% : 0.003415s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.99% : 0.000604s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.23% : 0.000981s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.31% : 0.000093s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.24% : 0.002198s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.52% : 0.000460s : 1: opt_after_jit_grad 0.62% : 0.000189s : 1: opt_b 13.36% : 0.004057s : 1: optimize 0.08% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000028s : 1: py_interpret_to_execute 0.03% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000212s : 1: renormalize.infer 0.72% : 0.000219s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 20.82% : 0.006322s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.22% : 0.006139s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0183141, [24] [bootstrap]: 0.00047048 [type_inference]: 0.00435148 [event_method]: 1.056e-05 [auto_monad]: 4.977e-05 [graph_reusing]: 5.67001e-06 [inline]: 1.59998e-06 [add_attr]: 0.00300361, [1] [add_attr_with_inline]: 0.00299534, [1] [Cycle 1]: 4.544e-05, [2] [tag_attr]: 1.176e-05 [meta_addattr_fg_expand]: 3.23e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.253e-05 [insert-virtual-dataset]: 2.83e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.64e-06 [optimize]: 0.00371931, [53] [py_interpret_to_execute]: 1.502e-05 [rewriter_before_opt_a]: 3.939e-05 [opt_a]: 0.0018653, [2] [Cycle 1]: 0.00125725, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 2.459e-05 [loop_unroll]: 1.418e-05 [a_1]: 0.00029584 [with_stream_mark]: 1.324e-05 [recompute_prepare]: 7.43999e-06 [updatestate_depend_eliminate]: 3.55003e-06 [updatestate_assign_eliminate]: 3.20002e-06 [updatestate_loads_eliminate]: 3.5e-06 [parameter_eliminate]: 1.99999e-06 [a_2]: 7.635e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 2.24999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.11e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 5.87001e-06 [parallel]: 1.721e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.57002e-06 [allreduce_fusion]: 3.83001e-06 [matmul_add_comm_reduction]: 8.87e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.66998e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.90998e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 9.01002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.51002e-06 [before_grad]: 9.38002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.48998e-06 [flash_sp_send_recv_attached]: 2.50002e-06 [receive_attached]: 2.51e-06 
[after_resolve]: 1.105e-05 [a_after_grad]: 9.38997e-06 [renormalize]: 0.00033888 [add_forward_monad_depend]: 4.23999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.319e-05 [cse]: 2.717e-05 [a_3]: 4.026e-05 [Cycle 2]: 0.00059854, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.66e-06 [a_1]: 0.00012653 [with_stream_mark]: 1.091e-05 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.92002e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.763e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.29e-06 [merge_send_recv]: 4.62e-06 [auto_parallel]: 5.43002e-06 [parallel]: 4.12998e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 2.78003e-06 [matmul_add_comm_reduction]: 4.97999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.44999e-06 [virtual_dataset]: 5.78002e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.21002e-06 [merge_forward]: 2.71999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.91e-06 [a_after_grad]: 8.77e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 9.5999e-07 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 6.14001e-06 [cse]: 1.244e-05 [a_3]: 3.322e-05 [py_interpret_to_execute_after_opt_a]: 7.73001e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 2.948e-05 [convert_after_rewriter]: 6.98998e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00044952 [opt_b]: 0.00018042, [1] [Cycle 1]: 0.0001746, [7] 
[b_1]: 0.00010795 [b_2]: 7.26001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.59985e-07 [cse]: 1.583e-05 [optimize_parallel_all_gather_comm]: 1.516e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.247e-05 [loop_unroll]: 0.00046059 [opt_after_cconv]: 9.534e-05, [1] [Cycle 1]: 8.958e-05, [7] [c_1]: 2.803e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.634e-05 [renormalize]: 5.00004e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.992e-05, [1] [Cycle 1]: 6.569e-05, [4] [d_1]: 3.976e-05 [none_parameter_eliminate]: 1.78002e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.448e-05 [cse_after_recomputation]: 1.978e-05, [1] [Cycle 1]: 1.552e-05, [1] [cse]: 1.031e-05 [environ_conv]: 4.60001e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 9.30013e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.185e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.685e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.908e-05, [1] [Cycle 1]: 6.495e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.40999e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.39999e-06 [fold_const_symbol]: 9.24e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 1.619e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00044991 [validate]: 3.285e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.00595917 [execute]: 6.83e-06 Sums bootstrap : 0.000470s : 3.28% type_inference : 0.004351s : 30.32% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000422s : 2.94% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000007s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.15% optimize.opt_a.a_after_grad : 0.000018s : 0.13% optimize.opt_a.renormalize : 0.000339s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.13% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000461s : 3.21% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.13% validate : 0.000033s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.005959s : 41.52% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 19.17% : 0.000023s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.30% : 0.000005s : 4: substitution.graph_param_transform 64.99% : 0.000079s : 2: substitution.inline 2.09% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.49% : 0.000004s : 4: substitution.remove_not_recompute_node 3.50% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004312 2 91.96% : 0.003965s : 1: type_inference.infer 8.04% : 0.000347s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.95% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.66% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.39% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.46% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.15% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.05% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.75% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.43% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.57% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026317 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.43% : 0.003008s : 1: add_attr 11.40% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.03% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000506s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.78% : 0.000469s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000779s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001868s : 1: opt_a 0.37% : 0.000099s : 1: opt_after_cconv 1.75% : 0.000460s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.15% : 0.003723s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.55% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000033s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.68% : 0.005969s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.59% : 0.004365s : 1: type_inference 0.23% : 0.000060s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x9-kbk],max_mem:60.0M . TotalTime = 0.813504, [24] [bootstrap]: 0.00056444 [type_inference]: 0.00608266 [event_method]: 1.37e-05 [auto_monad]: 5.447e-05 [graph_reusing]: 5.47999e-06 [inline]: 1.77001e-06 [add_attr]: 0.00340878, [1] [add_attr_with_inline]: 0.00339755, [1] [Cycle 1]: 4.509e-05, [2] [tag_attr]: 1.531e-05 [meta_addattr_fg_expand]: 4.3e-06 [parallel-infer-symbol]: 2.59001e-06 [pre_auto_parallel]: 2.753e-05 [insert-virtual-dataset]: 2.25002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.72001e-06 [pipeline_split]: 1.51998e-06 [optimize]: 0.00401029, [53] [py_interpret_to_execute]: 2.05e-05 [rewriter_before_opt_a]: 5.751e-05 [opt_a]: 0.00212741, [2] [Cycle 1]: 0.00151972, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 3.195e-05 [loop_unroll]: 2.241e-05 [a_1]: 0.00045997 [with_stream_mark]: 1.394e-05 [recompute_prepare]: 7.94002e-06 [updatestate_depend_eliminate]: 3.88999e-06 [updatestate_assign_eliminate]: 3.45003e-06 [updatestate_loads_eliminate]: 2.83e-06 [parameter_eliminate]: 2.23002e-06 [a_2]: 7.633e-05 [accelerated_algorithm]: 6.55002e-06 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 7.56001e-06 [auto_parallel]: 6.39001e-06 [parallel]: 2.196e-05 [flash_sp]: 7.66999e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 8.35999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.65998e-06 [virtual_dataset]: 5.84e-06 [get_grad_eliminate_]: 5.97001e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.059e-05 [merge_recompute_call_nodes]: 1.47999e-06 [before_grad]: 9.61e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.61e-06 [flash_sp_send_recv_attached]: 3.08e-06 [receive_attached]: 2.43002e-06 [after_resolve]: 1.039e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00041052 [add_forward_monad_depend]: 4.70001e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.38e-05 [cse]: 2.706e-05 [a_3]: 4.146e-05 [Cycle 2]: 0.00059797, [45] [expand_dump_flag]: 1.09e-06 [switch_simplify]: 7.14001e-06 [loop_unroll]: 5.60001e-06 [a_1]: 0.00012826 [with_stream_mark]: 9.19998e-06 [recompute_prepare]: 5.91998e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.16003e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.917e-05 [accelerated_algorithm]: 5.63002e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.27e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.36002e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.77e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.74e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.14e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.05998e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.11002e-06 [after_resolve]: 8.92999e-06 [a_after_grad]: 8.79998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.26e-06 [cse]: 1.219e-05 [a_3]: 3.234e-05 [py_interpret_to_execute_after_opt_a]: 7.76001e-06 
[slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.237e-05 [convert_after_rewriter]: 7.75998e-06 [order_py_execute_after_rewriter]: 5.04003e-06 [mutable_eliminate]: 0.00048051 [opt_b]: 0.0001837, [1] [Cycle 1]: 0.00017769, [7] [b_1]: 0.00010998 [b_2]: 7.46001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.19997e-07 [cse]: 1.579e-05 [optimize_parallel_all_gather_comm]: 1.667e-05 [overlap_param_gather]: 2.06998e-06 [cconv]: 2.187e-05 [loop_unroll]: 0.00042051 [opt_after_cconv]: 9.461e-05, [1] [Cycle 1]: 8.895e-05, [7] [c_1]: 2.864e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.565e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 1.247e-05 [tuple_transform]: 7.007e-05, [1] [Cycle 1]: 6.582e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.69998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.779e-05 [cse_after_recomputation]: 2.045e-05, [1] [Cycle 1]: 1.594e-05, [1] [cse]: 1.063e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 4.90001e-06 [bias_add_comm_swap]: 2.58003e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.52001e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.25002e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.42002e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42999e-06 [overlap_recompute_comm]: 2.13002e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.739e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.905e-05, [1] [Cycle 1]: 6.487e-05, [6] [build]: 2.17001e-06 [elim_shapecalc]: 8.87e-06 [elim_not_effective]: 1.179e-05 [opt_reshape]: 6.07001e-06 [fold_const_symbol]: 9.07001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.90001e-06 [pipeline_parallel_scheduler]: 1.57999e-06 [auto_monad_reorder]: 1.615e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.62002e-06 [opt_after_jit_grad]: 0.00044984 [validate]: 3.089e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.798564 [execute]: 8.94998e-06 Sums bootstrap : 0.000564s : 0.07% type_inference : 0.006083s : 0.75% event_method : 0.000014s : 0.00% auto_monad : 0.000054s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000028s : 0.00% 
optimize.opt_a.a_1 : 0.000588s : 0.07% optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000014s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000006s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000018s : 0.00% optimize.opt_a.renormalize : 0.000411s : 0.05% 
optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000074s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000008s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000481s : 0.06% optimize.opt_b.b_1 : 0.000110s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000421s : 0.05% optimize.opt_after_cconv.c_1 : 0.000029s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000040s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.01% optimize.cse_after_recomputation.cse 
: 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.06% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.798564s : 98.70% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000166 30 14.61% : 0.000024s : 5: substitution.arithmetic_simplify 1.20% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.07% : 0.000005s : 4: substitution.graph_param_transform 67.09% : 0.000111s : 3: substitution.inline 1.93% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.40% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006039 2 90.86% : 0.005487s : 1: type_inference.infer 9.14% : 0.000552s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.40% : 0.000027s : 3: replace.inline 30.60% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.97% : 0.000109s : 3: match.inline 8.03% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.75% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.44% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.37% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.52% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.76% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.73% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.34% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 47.67% : 0.000168s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.33% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.822446 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.41% : 0.003413s : 1: add_attr 0.41% : 0.003401s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000059s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000603s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000011s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000429s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.06% : 0.000489s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.12% : 0.000959s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000092s : 28: opt.transform.opt_b 0.01% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.26% : 0.002130s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.06% : 0.000459s : 1: opt_after_jit_grad 0.02% : 0.000187s : 1: opt_b 0.49% : 0.004014s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000032s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.03% : 0.000212s : 1: renormalize.infer 0.02% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.10% : 0.798587s : 1: task_emit 0.01% : 0.000073s : 1: 
tuple_transform 0.74% : 0.006127s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.0725186, [24] [bootstrap]: 0.0004846 [type_inference]: 0.00442756 [event_method]: 1.063e-05 [auto_monad]: 5.374e-05 [graph_reusing]: 5.59e-06 [inline]: 1.59998e-06 [add_attr]: 0.00305157, [1] [add_attr_with_inline]: 0.00304318, [1] [Cycle 1]: 4.515e-05, [2] [tag_attr]: 1.19e-05 [meta_addattr_fg_expand]: 2.99001e-06 [parallel-infer-symbol]: 2.79999e-06 [pre_auto_parallel]: 2.189e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00371514, [53] [py_interpret_to_execute]: 1.483e-05 [rewriter_before_opt_a]: 3.987e-05 [opt_a]: 0.00185527, [2] [Cycle 1]: 0.00125336, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.451e-05 [loop_unroll]: 1.375e-05 [a_1]: 0.00029426 [with_stream_mark]: 1.336e-05 [recompute_prepare]: 7.18e-06 [updatestate_depend_eliminate]: 3.49001e-06 [updatestate_assign_eliminate]: 3.09999e-06 [updatestate_loads_eliminate]: 3.38999e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.54e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 2.44999e-06 [meta_shard_fg_expand]: 1.86998e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 8.08001e-06 [auto_parallel]: 6.07001e-06 [parallel]: 1.838e-05 [flash_sp]: 7.83001e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.72e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.90002e-06 [get_grad_eliminate_]: 5.41998e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.73001e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.86e-06 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.56e-06 
[after_resolve]: 1.022e-05 [a_after_grad]: 8.88002e-06 [renormalize]: 0.00034183 [add_forward_monad_depend]: 4.46002e-06 [auto_monad_grad]: 1.70001e-06 [auto_monad_eliminator]: 1.295e-05 [cse]: 2.719e-05 [a_3]: 4.062e-05 [Cycle 2]: 0.00059212, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.41002e-06 [a_1]: 0.00012481 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 5.49e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.799e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 5.51002e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.38002e-06 [parallel]: 4.33999e-06 [flash_sp]: 2.91999e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.24001e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.89e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 5.68002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96e-06 [merge_recompute_call_nodes]: 7.79983e-07 [before_grad]: 8.33001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.15999e-06 [a_after_grad]: 8.00999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.16e-06 [cse]: 1.229e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 7.88999e-06 [slice_cell_reuse_recomputed_activation]: 1.74998e-06 [rewriter_after_opt_a]: 3.521e-05 [convert_after_rewriter]: 7.04001e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00045113 [opt_b]: 0.00021531, [1] [Cycle 1]: 0.00020912, 
[7] [b_1]: 0.00013925 [b_2]: 7.55e-06 [updatestate_depend_eliminate]: 5.44998e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 6.50005e-07 [cse]: 1.647e-05 [optimize_parallel_all_gather_comm]: 1.58e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.248e-05 [loop_unroll]: 0.0004153 [opt_after_cconv]: 9.54e-05, [1] [Cycle 1]: 8.973e-05, [7] [c_1]: 2.766e-05 [parameter_eliminate]: 2.07001e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.69e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.932e-05, [1] [Cycle 1]: 6.516e-05, [4] [d_1]: 3.936e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.846e-05 [cse_after_recomputation]: 2.156e-05, [1] [Cycle 1]: 1.699e-05, [1] [cse]: 1.142e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.94999e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.51e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 9.90025e-07 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.146e-05 [grouped_pairwise_exchange_alltoall]: 1.46002e-06 [offloading_packed_experts]: 3.91999e-06 [overlap_recompute_and_grad_model_parallel]: 4.87e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.64998e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.716e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 6.948e-05, [1] [Cycle 1]: 6.534e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 8.62998e-06 [elim_not_effective]: 1.227e-05 [opt_reshape]: 5.92001e-06 [fold_const_symbol]: 9.07999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.549e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00045 [validate]: 3.399e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.0600225 [execute]: 7.71999e-06 Sums bootstrap : 0.000485s : 0.71% type_inference : 0.004428s : 6.46% event_method : 0.000011s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.61% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000342s : 0.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.66% optimize.opt_b.b_1 : 0.000139s : 0.20% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000450s : 0.66% validate : 0.000034s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.060023s : 87.62% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.77% : 0.000021s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.39% : 0.000005s : 4: substitution.graph_param_transform 65.97% : 0.000079s : 2: substitution.inline 2.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.78% : 0.000005s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004389 2 91.62% : 0.004021s : 1: type_inference.infer 8.38% : 0.000368s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.44% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.30% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.92% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.52% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 9: predicate.minmaximum_grad 1.28% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 0.89% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.71% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.44% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.66% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000263 6 43.58% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.42% : 0.000148s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080585 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.79% : 0.003056s : 1: add_attr 3.78% : 0.003047s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.64% : 0.000519s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.96% : 0.000770s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000121s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.31% : 0.001858s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.57% : 0.000459s : 1: opt_after_jit_grad 0.27% : 0.000219s : 1: opt_b 4.62% : 0.003719s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000189s : 1: renormalize.infer 0.18% : 0.000146s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000039s : 1: rewriter_after_opt_a 0.05% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 74.50% : 0.060039s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.51% : 0.004442s : 1: type_inference 0.07% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y7-dtype_x9-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x0-pynative],max_mem:60.0M TotalTime = 0.0214622, [24] [bootstrap]: 0.00055186 [type_inference]: 0.00624796 [event_method]: 1.409e-05 [auto_monad]: 5.491e-05 [graph_reusing]: 5.10001e-06 [inline]: 1.46002e-06 [add_attr]: 0.00343251, [1] [add_attr_with_inline]: 0.00342228, [1] [Cycle 1]: 4.538e-05, [2] [tag_attr]: 1.633e-05 [meta_addattr_fg_expand]: 3.75998e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.689e-05 [insert-virtual-dataset]: 2.49001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.56002e-06 [optimize]: 0.00404528, [53] [py_interpret_to_execute]: 2.179e-05 [rewriter_before_opt_a]: 5.789e-05 [opt_a]: 0.00213743, [2] [Cycle 1]: 0.00152957, [45] [expand_dump_flag]: 3.08e-06 [switch_simplify]: 3.318e-05 [loop_unroll]: 2.139e-05 [a_1]: 0.00045734 [with_stream_mark]: 1.358e-05 [recompute_prepare]: 7.56001e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.06001e-06 [updatestate_loads_eliminate]: 3.28e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.65e-05 [accelerated_algorithm]: 6.49001e-06 [shard]: 2.32999e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 6.08002e-06 [merge_send_recv]: 7.85e-06 [auto_parallel]: 6.04001e-06 [parallel]: 2.409e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.87e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.6e-06 [virtual_dataset]: 5.97001e-06 
[get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.078e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 1.007e-05 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 2.61e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.26e-06 [after_resolve]: 1.054e-05 [a_after_grad]: 8.98002e-06 [renormalize]: 0.00042213 [add_forward_monad_depend]: 5.05001e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.323e-05 [cse]: 2.68e-05 [a_3]: 4.151e-05 [Cycle 2]: 0.00059829, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.82002e-06 [loop_unroll]: 5.73002e-06 [a_1]: 0.00012502 [with_stream_mark]: 9.46e-06 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.38998e-06 [parameter_eliminate]: 7.60017e-07 [a_2]: 6.817e-05 [accelerated_algorithm]: 5.77999e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.23001e-06 [auto_parallel]: 5.59e-06 [parallel]: 3.98001e-06 [flash_sp]: 3.23998e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 3.03e-06 [matmul_add_comm_reduction]: 5.04003e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.36998e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.89001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.71998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.83002e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23998e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 9.10001e-06 [a_after_grad]: 8e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.265e-05 [a_3]: 3.221e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 2.31e-06 [rewriter_after_opt_a]: 2.952e-05 [convert_after_rewriter]: 6.77002e-06 [order_py_execute_after_rewriter]: 4.92e-06 [mutable_eliminate]: 0.00050184 [opt_b]: 0.00018264, [1] [Cycle 1]: 0.00017665, [7] [b_1]: 0.00010877 [b_2]: 7.06001e-06 [updatestate_depend_eliminate]: 5.52001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.2998e-07 [cse]: 1.613e-05 [optimize_parallel_all_gather_comm]: 1.545e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 2.126e-05 [loop_unroll]: 0.00041888 [opt_after_cconv]: 9.483e-05, [1] [Cycle 1]: 8.911e-05, [7] [c_1]: 2.744e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.625e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 1.318e-05 [tuple_transform]: 6.966e-05, [1] [Cycle 1]: 6.54e-05, [4] [d_1]: 3.948e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.48e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 5.03e-05 [cse_after_recomputation]: 2.09e-05, [1] [Cycle 1]: 1.653e-05, [1] [cse]: 1.134e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.73003e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 1.31998e-06 [add_comm_op_reuse_tag]: 1.11002e-06 
[interleave_split_concat_branches]: 1.23002e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.32e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.234e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.86002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.52001e-06 [overlap_grad_ring_attention]: 3.91999e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.08998e-06 [split_layernorm_comm]: 1.93002e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 7.113e-05, [1] [Cycle 1]: 6.699e-05, [6] [build]: 2.83e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.207e-05 [opt_reshape]: 6.43998e-06 [fold_const_symbol]: 9.17001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.94e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.571e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 0.00011747 [opt_after_jit_grad]: 0.00045858 [validate]: 3.21e-05 [backend_pass]: 1.20001e-06 [task_emit]: 0.00623718 [execute]: 6.93998e-06 Sums bootstrap : 0.000552s : 3.24% type_inference : 0.006248s : 36.63% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000001s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000022s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000040s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000582s : 3.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.48% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000502s : 2.94% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.12% optimize.loop_unroll : 0.000419s : 2.46% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000117s : 0.69% opt_after_jit_grad : 0.000459s : 2.69% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006237s : 36.56% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.81% : 0.000024s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000005s : 4: substitution.graph_param_transform 66.41% : 0.000109s : 3: substitution.inline 1.90% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.76% : 0.000005s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006207 2 90.64% : 0.005626s : 1: type_inference.infer 9.36% : 0.000581s : 1: type_inference.specialize ------[replace.] 0.000038 5 71.40% : 0.000027s : 3: replace.inline 28.60% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.47% : 0.000107s : 3: match.inline 8.53% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.31% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.63% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.89% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.51% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000002s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.93% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000385 8 49.28% : 0.000190s : 3: func_graph_cloner_run.FuncGraphClonerGraph 50.72% : 0.000195s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030470 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.28% : 0.003437s : 1: add_attr 11.24% : 0.003426s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000586s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.68% : 0.000511s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.13% : 0.000954s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 7.02% : 0.002140s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.54% : 0.000469s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.29% : 0.004049s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.69% : 0.000211s : 1: renormalize.infer 0.67% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000123s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000074s : 1: symbol_engine_optimizer 20.50% : 0.006247s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.55% : 0.006261s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0180696, [24] [bootstrap]: 0.00041931 [type_inference]: 0.00432236 [event_method]: 1.056e-05 [auto_monad]: 5.009e-05 [graph_reusing]: 5.86e-06 [inline]: 2.24999e-06 [add_attr]: 0.00297617, [1] [add_attr_with_inline]: 0.00296811, [1] [Cycle 1]: 4.108e-05, [2] [tag_attr]: 1.223e-05 [meta_addattr_fg_expand]: 3.58e-06 [parallel-infer-symbol]: 3.23e-06 [pre_auto_parallel]: 2.269e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00368442, [53] [py_interpret_to_execute]: 1.573e-05 [rewriter_before_opt_a]: 3.74e-05 [opt_a]: 0.00188012, [2] [Cycle 1]: 0.0012661, [45] [expand_dump_flag]: 2.49001e-06 [switch_simplify]: 2.5e-05 [loop_unroll]: 1.403e-05 [a_1]: 0.00028991 [with_stream_mark]: 1.361e-05 [recompute_prepare]: 7.85998e-06 [updatestate_depend_eliminate]: 3.78001e-06 [updatestate_assign_eliminate]: 3.04999e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.748e-05 [accelerated_algorithm]: 6.47001e-06 [shard]: 2.61e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 8.07998e-06 [auto_parallel]: 5.94999e-06 [parallel]: 1.821e-05 [flash_sp]: 6.91001e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.62002e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.90998e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.96002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.148e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.59e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.032e-05 [a_after_grad]: 8.88002e-06 [renormalize]: 0.00033892 [add_forward_monad_depend]: 4.73001e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.262e-05 [cse]: 4.109e-05 [a_3]: 4.176e-05 [Cycle 2]: 0.00060497, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 7.08e-06 [loop_unroll]: 5.64998e-06 [a_1]: 0.0001265 [with_stream_mark]: 1.183e-05 [recompute_prepare]: 5.89999e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.58998e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.781e-05 [accelerated_algorithm]: 5.79e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.26002e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.64998e-06 [flash_sp]: 3.43e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 2.75997e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.74001e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.28002e-06 [virtual_output]: 5.25999e-06 [merge_forward]: 2.96001e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.93002e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.15998e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.58e-06 [cse]: 1.276e-05 [a_3]: 3.335e-05 [py_interpret_to_execute_after_opt_a]: 7.40998e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.142e-05 [convert_after_rewriter]: 6.71999e-06 [order_py_execute_after_rewriter]: 5.04998e-06 [mutable_eliminate]: 0.0004466 [opt_b]: 0.00018324, [1] [Cycle 1]: 0.00017704, [7] [b_1]: 
0.00010957 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.80009e-07 [cse]: 1.591e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.13e-05 [loop_unroll]: 0.00041246 [opt_after_cconv]: 9.553e-05, [1] [Cycle 1]: 8.993e-05, [7] [c_1]: 2.783e-05 [parameter_eliminate]: 2.64001e-06 [updatestate_depend_eliminate]: 5.06997e-06 [updatestate_assign_eliminate]: 2.69999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.614e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.236e-05 [tuple_transform]: 6.976e-05, [1] [Cycle 1]: 6.545e-05, [4] [d_1]: 3.932e-05 [none_parameter_eliminate]: 1.94e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 2.14e-06 [add_recomputation]: 4.373e-05 [cse_after_recomputation]: 2.052e-05, [1] [Cycle 1]: 1.616e-05, [1] [cse]: 1.109e-05 [environ_conv]: 5.12999e-06 [swap_dp_allreduce_reducescatter]: 5.61e-06 [bias_add_comm_swap]: 2.47001e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.46002e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.21998e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.35999e-06 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72999e-06 [control_data_broadcast_order]: 1.2e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 3.87998e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.16001e-06 [overlap_grad_flash_sp]: 1.603e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.919e-05, [1] [Cycle 1]: 6.476e-05, [6] [build]: 2.11e-06 [elim_shapecalc]: 8.16002e-06 [elim_not_effective]: 1.181e-05 [opt_reshape]: 6.44001e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.76998e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.519e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.000444 [validate]: 2.981e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00587872 [execute]: 7.77002e-06 Sums bootstrap : 0.000419s : 2.96% type_inference : 0.004322s : 30.55% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000023s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.23% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000416s : 2.94% optimize.opt_a.with_stream_mark : 0.000025s : 0.18% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000054s : 0.38% optimize.opt_a.a_3 : 0.000075s : 0.53% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000447s : 3.16% optimize.opt_b.b_1 : 0.000110s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000412s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.02% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.05% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000444s : 3.14% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005879s : 41.55% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.18% : 0.000022s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.31% : 0.000005s : 4: substitution.graph_param_transform 65.16% : 0.000078s : 2: substitution.inline 2.53% : 0.000003s : 4: substitution.j_node_and_user_rematch 4.07% : 0.000005s : 4: substitution.remove_not_recompute_node 3.12% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004282 2 91.94% : 0.003937s : 1: type_inference.infer 8.06% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.60% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.95% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.91% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.84% : 0.000003s : 17: 
predicate.list_to_tuple_eliminator_ 2.11% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.71% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.12% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.81% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 1.01% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.36% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.58% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 41.86% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.14% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026007 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.46% : 0.002980s : 1: add_attr 11.43% : 0.002971s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.72% : 0.000446s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000010s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000456s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.98% : 0.000776s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.36% : 0.000093s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.24% : 0.001883s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.74% : 0.000453s : 1: opt_after_jit_grad 0.72% : 0.000186s : 1: opt_b 14.18% : 0.003688s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000187s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000072s : 1: symbol_engine_optimizer 22.64% : 0.005888s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.67% : 0.004336s : 1: type_inference 0.21% : 0.000055s : 1: validate TotalTime = 0.0194321, [24] [bootstrap]: 0.00041515 [type_inference]: 0.00541381 [event_method]: 1.377e-05 [auto_monad]: 5.292e-05 [graph_reusing]: 5.43997e-06 [inline]: 1.84998e-06 [add_attr]: 0.00296505, [1] [add_attr_with_inline]: 0.00295678, [1] [Cycle 1]: 4.597e-05, [2] [tag_attr]: 1.61e-05 [meta_addattr_fg_expand]: 4.26001e-06 [parallel-infer-symbol]: 2.57001e-06 [pre_auto_parallel]: 2.5e-05 [insert-virtual-dataset]: 2.58003e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.96003e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00398171, [53] [py_interpret_to_execute]: 1.999e-05 [rewriter_before_opt_a]: 5.753e-05 [opt_a]: 0.00215584, [2] [Cycle 1]: 0.00154881, [45] [expand_dump_flag]: 2.99999e-06 [switch_simplify]: 3.267e-05 [loop_unroll]: 2.144e-05 [a_1]: 0.00045162 [with_stream_mark]: 1.234e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.07002e-06 [updatestate_loads_eliminate]: 3.12002e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.768e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.98002e-06 [merge_send_recv]: 7.66001e-06 [auto_parallel]: 5.83997e-06 [parallel]: 1.685e-05 [flash_sp]: 6.73e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.37002e-06 [matmul_add_comm_reduction]: 9.15001e-06 [allreduce_slice_to_reducescatter]: 7.60017e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 6.17999e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.96e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 9.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.11e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56999e-06 [meta_fg_expand]: 2.19999e-06 [flash_sp_send_recv_attached]: 2.30002e-06 
[receive_attached]: 2.29999e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 8.99998e-06 [renormalize]: 0.00045895 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.288e-05 [cse]: 2.671e-05 [a_3]: 4.124e-05 [Cycle 2]: 0.00059785, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 7.07002e-06 [loop_unroll]: 5.69e-06 [a_1]: 0.00012971 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.94001e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.829e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.44998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.22e-06 [flash_sp]: 3.04001e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.14e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 6.23998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.92001e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.71001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.29998e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.558e-05 [a_3]: 3.239e-05 [py_interpret_to_execute_after_opt_a]: 7.54002e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.108e-05 [convert_after_rewriter]: 7.28e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.00044564 [opt_b]: 0.00018274, [1] 
[Cycle 1]: 0.00017638, [7] [b_1]: 0.00010922 [b_2]: 7.34002e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.80009e-07 [cse]: 1.59e-05 [optimize_parallel_all_gather_comm]: 1.541e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.18e-05 [loop_unroll]: 0.00041228 [opt_after_cconv]: 9.39e-05, [1] [Cycle 1]: 8.813e-05, [7] [c_1]: 2.755e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.49998e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.602e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 6.876e-05, [1] [Cycle 1]: 6.443e-05, [4] [d_1]: 3.892e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.321e-05 [cse_after_recomputation]: 1.96e-05, [1] [Cycle 1]: 1.517e-05, [1] [cse]: 9.88002e-06 [environ_conv]: 4.65999e-06 [swap_dp_allreduce_reducescatter]: 5.33002e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.47999e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.78e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.34e-06 [overlap_opt_shard_in_pipeline]: 1.40999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61002e-06 [control_data_broadcast_order]: 1.205e-05 [grouped_pairwise_exchange_alltoall]: 1.64e-06 [offloading_packed_experts]: 3.76999e-06 [overlap_recompute_and_grad_model_parallel]: 4.22003e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.29e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.634e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 6.732e-05, [1] [Cycle 1]: 6.308e-05, [6] [build]: 2.26998e-06 [elim_shapecalc]: 8.24998e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.01998e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.511e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00044543 [validate]: 2.995e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.00585917 [execute]: 6.74999e-06 Sums bootstrap : 0.000415s : 2.67% type_inference : 0.005414s : 34.87% event_method : 0.000014s : 0.09% auto_monad : 0.000053s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000040s : 0.26% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000581s : 3.74% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.94% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000459s : 2.96% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000074s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000446s : 2.87% optimize.opt_b.b_1 : 0.000109s : 0.70% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.66% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 2.87% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005859s : 37.73% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.30% : 0.000023s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.98% : 0.000109s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.86% : 0.000005s : 4: substitution.remove_not_recompute_node 2.57% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005374 2 90.05% : 0.004839s : 1: type_inference.infer 9.95% : 0.000535s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.51% : 0.000027s : 3: replace.inline 30.49% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.73% : 0.000107s : 3: match.inline 8.27% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.73% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.30% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.18% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.41% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.82% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 1.01% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.80% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.59% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.35% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000334 8 46.49% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.51% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027949 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.62% : 0.002969s : 1: add_attr 10.59% : 0.002960s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.58% : 0.000442s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.41% : 0.000954s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.72% : 0.002159s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.63% : 0.000455s : 1: opt_after_jit_grad 0.67% : 0.000186s : 1: opt_b 14.26% : 0.003985s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.93% : 0.000259s : 1: renormalize.infer 0.69% : 0.000194s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.00% : 0.005869s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 19.42% : 0.005427s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0370758, [24] [bootstrap]: 0.00045116 [type_inference]: 0.0112687 [event_method]: 4.556e-05 [auto_monad]: 0.00012007 [graph_reusing]: 8.35001e-06 [inline]: 2.31e-06 [add_attr]: 0.00300035, [1] [add_attr_with_inline]: 0.0029915, [1] [Cycle 1]: 6.88e-05, [2] [tag_attr]: 3.418e-05 [meta_addattr_fg_expand]: 9.59e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 4.884e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 1.88002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.013261, [53] [py_interpret_to_execute]: 3.773e-05 [rewriter_before_opt_a]: 0.0001441 [opt_a]: 0.0109993, [3] [Cycle 1]: 0.00702599, [45] [expand_dump_flag]: 4.08001e-06 [switch_simplify]: 7.523e-05 [loop_unroll]: 6.187e-05 [a_1]: 0.00144948 [with_stream_mark]: 2.172e-05 [recompute_prepare]: 2.099e-05 [updatestate_depend_eliminate]: 9.07001e-06 [updatestate_assign_eliminate]: 7.55e-06 [updatestate_loads_eliminate]: 7.28e-06 [parameter_eliminate]: 2.38002e-06 [a_2]: 0.00026896 [accelerated_algorithm]: 3.25e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.36001e-06 [shard_inline]: 1.619e-05 [merge_send_recv]: 1.631e-05 [auto_parallel]: 1.106e-05 [parallel]: 1.902e-05 [flash_sp]: 1.141e-05 [merge_comm]: 9.66e-06 [allreduce_fusion]: 8.92e-06 [matmul_add_comm_reduction]: 2.581e-05 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 1.804e-05 [virtual_dataset]: 1.611e-05 [get_grad_eliminate_]: 1.53e-05 [virtual_output]: 1.526e-05 [merge_forward]: 8.97e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.694e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.852e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 2.763e-05 [set_forward_comm_id_for_comm_node_pass]: 9.78998e-06 [meta_fg_expand]: 0.00139107 [flash_sp_send_recv_attached]: 3.53e-06 [receive_attached]: 2.56e-06 [after_resolve]: 5.914e-05 
[a_after_grad]: 8.098e-05 [renormalize]: 0.00240219 [add_forward_monad_depend]: 9.10999e-06 [auto_monad_grad]: 6.36e-06 [auto_monad_eliminator]: 5.422e-05 [cse]: 0.00016334 [a_3]: 0.00033439 [Cycle 2]: 0.00305202, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 4.725e-05 [loop_unroll]: 4.428e-05 [a_1]: 0.00153887 [with_stream_mark]: 1.172e-05 [recompute_prepare]: 1.03e-05 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 3.65998e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 0.00012683 [accelerated_algorithm]: 1.201e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 2.04e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 6.66999e-06 [auto_parallel]: 7.45e-06 [parallel]: 5.24e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.17999e-06 [allreduce_fusion]: 4.64002e-06 [matmul_add_comm_reduction]: 7.81001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.82001e-06 [virtual_dataset]: 8.84003e-06 [get_grad_eliminate_]: 8.80999e-06 [virtual_output]: 8.67e-06 [merge_forward]: 4.38001e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.658e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.447e-05 [set_forward_comm_id_for_comm_node_pass]: 5.40999e-06 [meta_fg_expand]: 6.954e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.637e-05 [a_after_grad]: 1.454e-05 [renormalize]: 0.00063392 [add_forward_monad_depend]: 4.35999e-06 [auto_monad_grad]: 1.12999e-06 [auto_monad_eliminator]: 1.435e-05 [cse]: 4.54e-05 [a_3]: 6.605e-05 [Cycle 3]: 0.0009072, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 1.095e-05 [loop_unroll]: 9.02e-06 [a_1]: 0.00025061 [with_stream_mark]: 9.92001e-06 [recompute_prepare]: 9.24e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 4e-06 [parameter_eliminate]: 
8.89995e-07 [a_2]: 0.0001244 [accelerated_algorithm]: 1.171e-05 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 9.24998e-06 [merge_send_recv]: 7.01001e-06 [auto_parallel]: 7.51999e-06 [parallel]: 4.93001e-06 [flash_sp]: 1.14e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.91002e-06 [matmul_add_comm_reduction]: 7.36001e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.65999e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.34998e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.692e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.473e-05 [set_forward_comm_id_for_comm_node_pass]: 5.93002e-06 [meta_fg_expand]: 3.03e-06 [flash_sp_send_recv_attached]: 7.99977e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.388e-05 [a_after_grad]: 1.396e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 1.049e-05 [cse]: 2.588e-05 [a_3]: 5.974e-05 [py_interpret_to_execute_after_opt_a]: 1.024e-05 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 4.645e-05 [convert_after_rewriter]: 8.69998e-06 [order_py_execute_after_rewriter]: 7.00002e-06 [mutable_eliminate]: 0.00045851 [opt_b]: 0.00028929, [1] [Cycle 1]: 0.00028316, [7] [b_1]: 0.00018975 [b_2]: 1.123e-05 [updatestate_depend_eliminate]: 7.21001e-06 [updatestate_assign_eliminate]: 4.28999e-06 [updatestate_loads_eliminate]: 4.08001e-06 [renormalize]: 3.00002e-07 [cse]: 3.196e-05 [optimize_parallel_all_gather_comm]: 2.012e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.965e-05 [loop_unroll]: 0.00042247 [opt_after_cconv]: 0.00013554, [1] [Cycle 1]: 0.00012958, [7] [c_1]: 4.865e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 7.26999e-06 [updatestate_assign_eliminate]: 4.07e-06 [updatestate_loads_eliminate]: 
4.03999e-06 [cse]: 2.897e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 2.865e-05 [tuple_transform]: 0.00010314, [1] [Cycle 1]: 9.861e-05, [4] [d_1]: 6.773e-05 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.034e-05 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 6.155e-05 [cse_after_recomputation]: 3.207e-05, [1] [Cycle 1]: 2.74e-05, [1] [cse]: 2.168e-05 [environ_conv]: 8.02e-06 [swap_dp_allreduce_reducescatter]: 7.84002e-06 [bias_add_comm_swap]: 2.28002e-06 [label_micro_interleaved_index]: 4.54998e-06 [label_fine_grained_interleaved_index]: 2.57001e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.79001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.87002e-06 [comm_op_add_attrs]: 1.27e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.739e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 5.20999e-06 [overlap_recompute_and_grad_model_parallel]: 5.54e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50999e-06 [overlap_recompute_comm]: 1.91e-06 [overlap_grad_ring_attention]: 5.42999e-06 [overlap_grad_flash_sp]: 2.448e-05 [begin_end_overlap_inline]: 7.89994e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.99979e-07 [symbol_engine_optimizer]: 9.761e-05, [1] [Cycle 1]: 9.338e-05, [6] [build]: 9.05999e-06 [elim_shapecalc]: 1.379e-05 [elim_not_effective]: 1.807e-05 [opt_reshape]: 9.94001e-06 [fold_const_symbol]: 1.482e-05 [renormalize]: 1.90019e-07 
[detach_backward]: 1.57001e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.426e-05 [get_jit_bprop_graph]: 1.11002e-06 [rewriter_after_jit_bprop_graph]: 3.75e-06 [opt_after_jit_grad]: 0.00046781 [validate]: 4.504e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00811018 [execute]: 6.65998e-06 Sums bootstrap : 0.000451s : 1.37% type_inference : 0.011269s : 34.32% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000133s : 0.41% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003239s : 9.86% optimize.opt_a.with_stream_mark : 0.000043s : 0.13% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000520s : 1.58% optimize.opt_a.accelerated_algorithm : 0.000056s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000057s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001464s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.003036s : 9.25% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.03% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.24% optimize.opt_a.cse : 0.000235s : 0.71% optimize.opt_a.a_3 : 0.000460s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.40% optimize.opt_b.b_1 : 0.000190s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000422s : 1.29% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000062s : 0.19% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.02% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000468s : 1.42% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008110s : 24.70% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000759 222 5.90% : 0.000045s : 12: substitution.arithmetic_simplify 1.92% : 0.000015s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 55.31% : 0.000420s : 17: substitution.inline 2.06% : 0.000016s : 2: substitution.inline_without_move 1.42% : 0.000011s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000016s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.83% : 0.000014s : 20: substitution.remove_not_recompute_node 3.05% : 0.000023s : 10: substitution.replace_applicator 1.48% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.65% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.31% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.73% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011196 2 87.07% : 0.009749s : 1: type_inference.infer 12.93% : 0.001447s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.60% : 0.000126s : 17: replace.inline 42.40% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000446 33 92.23% : 0.000411s : 17: match.inline 7.77% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.43% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.33% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.52% : 0.000041s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000009s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.99% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.30% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.22% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.07% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.00% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001559 34 56.27% : 0.000877s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.73% : 0.000682s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061637 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.87% : 0.003005s : 1: add_attr 4.86% : 0.002995s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000066s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.77% : 0.000477s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.00% : 0.004932s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.85% : 0.011002s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000477s : 1: opt_after_jit_grad 0.48% : 0.000293s : 1: opt_b 21.52% : 0.013265s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.58% : 0.001593s : 2: renormalize.infer 2.32% : 0.001430s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.18% : 0.008121s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.31% : 0.011283s : 1: type_inference 0.12% : 0.000076s : 1: validate TotalTime = 0.0187582, [24] [bootstrap]: 0.000414 [type_inference]: 0.00423078 [event_method]: 1.032e-05 [auto_monad]: 5.206e-05 [graph_reusing]: 4.79998e-06 [inline]: 2.02001e-06 [add_attr]: 0.00346073, [1] [add_attr_with_inline]: 0.00345241, [1] [Cycle 1]: 3.987e-05, [2] [tag_attr]: 1.17e-05 [meta_addattr_fg_expand]: 3.26001e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.165e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 6.09987e-07 [dataset_repeat_opt]: 1.79e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.00370718, [53] [py_interpret_to_execute]: 1.416e-05 [rewriter_before_opt_a]: 3.938e-05 [opt_a]: 0.0018565, [2] [Cycle 1]: 0.00125342, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.406e-05 [loop_unroll]: 1.381e-05 [a_1]: 0.00029294 [with_stream_mark]: 1.299e-05 [recompute_prepare]: 7.21001e-06 [updatestate_depend_eliminate]: 3.49001e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 3.31001e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.561e-05 [accelerated_algorithm]: 6.20002e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 5.66003e-06 [merge_send_recv]: 7.46001e-06 [auto_parallel]: 5.59998e-06 [parallel]: 1.791e-05 [flash_sp]: 7.5e-06 [merge_comm]: 3.36999e-06 [allreduce_fusion]: 3.18998e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 8.09989e-07 [virtual_shard_identity]: 7.15998e-06 [virtual_dataset]: 5.76998e-06 [get_grad_eliminate_]: 5.79e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.60999e-06 [before_grad]: 9.71e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.21998e-06 
[after_resolve]: 1.029e-05 [a_after_grad]: 9.04998e-06 [renormalize]: 0.00034667 [add_forward_monad_depend]: 4.26001e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.322e-05 [cse]: 2.636e-05 [a_3]: 4.114e-05 [Cycle 2]: 0.00059402, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012477 [with_stream_mark]: 1.111e-05 [recompute_prepare]: 6.09001e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.28002e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.866e-05 [accelerated_algorithm]: 5.88002e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.37998e-06 [flash_sp]: 3.18e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 4.74e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.19999e-06 [virtual_dataset]: 5.21002e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.27001e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.82001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.94001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.80998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.05999e-06 [a_after_grad]: 8.04002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 7.60017e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.238e-05 [a_3]: 3.376e-05 [py_interpret_to_execute_after_opt_a]: 7.43999e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 2.943e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.00044873 [opt_b]: 0.00018038, [1] [Cycle 1]: 0.00017445, [7] [b_1]: 
0.00010848 [b_2]: 7.46999e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 4.99975e-07 [cse]: 1.555e-05 [optimize_parallel_all_gather_comm]: 1.571e-05 [overlap_param_gather]: 1.92999e-06 [cconv]: 2.216e-05 [loop_unroll]: 0.00040887 [opt_after_cconv]: 0.00014881, [1] [Cycle 1]: 0.00014312, [7] [c_1]: 8.172e-05 [parameter_eliminate]: 2.39999e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.537e-05 [renormalize]: 5.19998e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.925e-05, [1] [Cycle 1]: 6.447e-05, [4] [d_1]: 3.887e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.33998e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 4.4e-05 [cse_after_recomputation]: 1.969e-05, [1] [Cycle 1]: 1.542e-05, [1] [cse]: 1.039e-05 [environ_conv]: 4.68001e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.19001e-06 [label_micro_interleaved_index]: 4.25e-06 [label_fine_grained_interleaved_index]: 2.78e-06 [merge_cast_opt]: 1.36002e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.01002e-06 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.46998e-06 [comm_op_add_attrs]: 1.34e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.45001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.151e-05 [grouped_pairwise_exchange_alltoall]: 1.74e-06 [offloading_packed_experts]: 3.51999e-06 [overlap_recompute_and_grad_model_parallel]: 4.46002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.21997e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 3.92998e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 5.39992e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.861e-05, [1] [Cycle 1]: 6.455e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 6.27001e-06 [fold_const_symbol]: 8.92999e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.66998e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.545e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00044603 [validate]: 3.084e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00615373 [execute]: 7.05998e-06 Sums bootstrap : 0.000414s : 2.88% type_inference : 0.004231s : 29.47% event_method : 0.000010s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000418s : 2.91% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000347s : 2.42% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000075s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 3.13% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000409s : 2.85% optimize.opt_after_cconv.c_1 : 0.000082s : 0.57% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000446s : 3.11% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006154s : 42.87% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.33% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.17% : 0.000005s : 4: substitution.graph_param_transform 65.91% : 0.000080s : 2: substitution.inline 2.14% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.93% : 0.000005s : 4: substitution.remove_not_recompute_node 3.08% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004192 2 91.93% : 0.003853s : 1: type_inference.infer 8.07% : 0.000338s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.76% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.36% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 1.18% : 0.000002s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.36% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.68% : 0.000001s : 4: predicate.row_tensor_eliminate 1.03% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 1.08% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 43.33% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.67% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027257 196 0.01% : 0.000003s : 1: ForceFp32Comm 12.71% : 0.003465s : 1: add_attr 12.68% : 0.003456s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.62% : 0.000442s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.53% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.68% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.83% : 0.000771s : 78: opt.transform.opt_a 0.29% : 0.000080s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.33% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.82% : 0.001859s : 1: opt_a 0.56% : 0.000152s : 1: opt_after_cconv 1.67% : 0.000455s : 1: opt_after_jit_grad 0.67% : 0.000184s : 1: opt_b 13.61% : 0.003711s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000190s : 1: renormalize.infer 0.55% : 0.000150s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000033s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000071s : 1: symbol_engine_optimizer 22.61% : 0.006163s : 1: task_emit 0.26% : 0.000072s : 1: 
tuple_transform 15.57% : 0.004244s : 1: type_inference 0.21% : 0.000057s : 1: validate TotalTime = 0.0355704, [24] [bootstrap]: 0.00044917 [type_inference]: 0.0101646 [event_method]: 3.995e-05 [auto_monad]: 0.0001143 [graph_reusing]: 7.79002e-06 [inline]: 2.61e-06 [add_attr]: 0.00296905, [1] [add_attr_with_inline]: 0.00296108, [1] [Cycle 1]: 6.58e-05, [2] [tag_attr]: 3.139e-05 [meta_addattr_fg_expand]: 8.27e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 4.564e-05 [insert-virtual-dataset]: 2.64999e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.0129473, [53] [py_interpret_to_execute]: 3.485e-05 [rewriter_before_opt_a]: 0.00012652 [opt_a]: 0.0106797, [3] [Cycle 1]: 0.00680806, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 6.68e-05 [loop_unroll]: 5.515e-05 [a_1]: 0.00133807 [with_stream_mark]: 2.304e-05 [recompute_prepare]: 2.122e-05 [updatestate_depend_eliminate]: 9.37001e-06 [updatestate_assign_eliminate]: 7.98001e-06 [updatestate_loads_eliminate]: 8.00999e-06 [parameter_eliminate]: 2.48e-06 [a_2]: 0.00024671 [accelerated_algorithm]: 3.044e-05 [shard]: 1.77999e-06 [meta_shard_fg_expand]: 3.76001e-06 [shard_inline]: 1.627e-05 [merge_send_recv]: 1.538e-05 [auto_parallel]: 1.106e-05 [parallel]: 1.857e-05 [flash_sp]: 1.151e-05 [merge_comm]: 9.68997e-06 [allreduce_fusion]: 9.12001e-06 [matmul_add_comm_reduction]: 2.542e-05 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 1.789e-05 [virtual_dataset]: 1.578e-05 [get_grad_eliminate_]: 1.529e-05 [virtual_output]: 1.504e-05 [merge_forward]: 9.44e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 1.773e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.865e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 2.714e-05 [set_forward_comm_id_for_comm_node_pass]: 9.88002e-06 [meta_fg_expand]: 0.00139951 [flash_sp_send_recv_attached]: 3.61001e-06 [receive_attached]: 2.26e-06 [after_resolve]: 
5.926e-05 [a_after_grad]: 8.017e-05 [renormalize]: 0.00233225 [add_forward_monad_depend]: 9.21002e-06 [auto_monad_grad]: 5.17999e-06 [auto_monad_eliminator]: 5.523e-05 [cse]: 0.00015878 [a_3]: 0.00033379 [Cycle 2]: 0.00295605, [45] [expand_dump_flag]: 1.72999e-06 [switch_simplify]: 4.675e-05 [loop_unroll]: 4.431e-05 [a_1]: 0.00154754 [with_stream_mark]: 1.18e-05 [recompute_prepare]: 1.076e-05 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 9.49978e-07 [a_2]: 0.00012651 [accelerated_algorithm]: 1.215e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.86e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 6.75002e-06 [auto_parallel]: 7.53e-06 [parallel]: 4.85001e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 4.64002e-06 [matmul_add_comm_reduction]: 7.46001e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.011e-05 [virtual_dataset]: 8.95001e-06 [get_grad_eliminate_]: 8.74003e-06 [virtual_output]: 8.79e-06 [merge_forward]: 4.94e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.677e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.403e-05 [set_forward_comm_id_for_comm_node_pass]: 5.44e-06 [meta_fg_expand]: 3.424e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 1.448e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00056933 [add_forward_monad_depend]: 4.08999e-06 [auto_monad_grad]: 1.24998e-06 [auto_monad_eliminator]: 1.478e-05 [cse]: 4.496e-05 [a_3]: 6.527e-05 [Cycle 3]: 0.00090155, [45] [expand_dump_flag]: 1.10999e-06 [switch_simplify]: 1.055e-05 [loop_unroll]: 8.84003e-06 [a_1]: 0.00025141 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 9.57999e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 3.88999e-06 [updatestate_loads_eliminate]: 3.83001e-06 
[parameter_eliminate]: 8.70001e-07 [a_2]: 0.00012285 [accelerated_algorithm]: 1.16e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.05999e-06 [merge_send_recv]: 6.91001e-06 [auto_parallel]: 7.11001e-06 [parallel]: 4.60001e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.73999e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 9.54e-06 [virtual_output]: 8.69003e-06 [merge_forward]: 4.87998e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 8.64e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.642e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.412e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35999e-06 [meta_fg_expand]: 2.86999e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.322e-05 [a_after_grad]: 1.386e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.039e-05 [cse]: 2.539e-05 [a_3]: 5.94e-05 [py_interpret_to_execute_after_opt_a]: 1.053e-05 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 4.671e-05 [convert_after_rewriter]: 9.25999e-06 [order_py_execute_after_rewriter]: 6.67002e-06 [mutable_eliminate]: 0.00045749 [opt_b]: 0.00032399, [1] [Cycle 1]: 0.00031785, [7] [b_1]: 0.00022445 [b_2]: 1.108e-05 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 4.19002e-06 [renormalize]: 5.60016e-07 [cse]: 3.028e-05 [optimize_parallel_all_gather_comm]: 2.036e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.999e-05 [loop_unroll]: 0.00042496 [opt_after_cconv]: 0.00013521, [1] [Cycle 1]: 0.00012919, [7] [c_1]: 4.89e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 7.3e-06 [updatestate_assign_eliminate]: 4.25999e-06 
[updatestate_loads_eliminate]: 3.92998e-06 [cse]: 2.899e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 2.827e-05 [tuple_transform]: 0.00010115, [1] [Cycle 1]: 9.669e-05, [4] [d_1]: 6.609e-05 [none_parameter_eliminate]: 2.08998e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.99999e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.712e-05 [cse_after_recomputation]: 3.082e-05, [1] [Cycle 1]: 2.61e-05, [1] [cse]: 2.071e-05 [environ_conv]: 9.02999e-06 [swap_dp_allreduce_reducescatter]: 7.78001e-06 [bias_add_comm_swap]: 2.73e-06 [label_micro_interleaved_index]: 4.43999e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.51998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 1.709e-05 [grouped_pairwise_exchange_alltoall]: 1.97001e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.32999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 4.99e-06 [overlap_grad_flash_sp]: 2.395e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.05999e-06 [symbol_engine_optimizer]: 9.775e-05, [1] [Cycle 1]: 9.354e-05, [6] [build]: 9.49e-06 [elim_shapecalc]: 1.323e-05 [elim_not_effective]: 1.85e-05 [opt_reshape]: 9.68997e-06 [fold_const_symbol]: 1.506e-05 [renormalize]: 
1.90019e-07 [detach_backward]: 2.05002e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 2.471e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.00046608 [validate]: 4.339e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.00807402 [execute]: 6.68e-06 Sums bootstrap : 0.000449s : 1.43% type_inference : 0.010165s : 32.41% event_method : 0.000040s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000003s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.40% optimize.opt_a.loop_unroll : 0.000108s : 0.35% optimize.opt_a.a_1 : 0.003137s : 10.00% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.58% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000034s : 0.11% optimize.opt_a.virtual_output : 0.000033s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.07% optimize.opt_a.meta_fg_expand : 0.001437s : 4.58% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000108s : 0.35% optimize.opt_a.renormalize : 0.002902s : 9.25% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.26% optimize.opt_a.cse : 0.000229s : 0.73% optimize.opt_a.a_3 : 0.000458s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000457s : 1.46% optimize.opt_b.b_1 : 0.000224s : 0.72% optimize.opt_b.b_2 : 0.000011s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000425s : 1.35% optimize.opt_after_cconv.c_1 : 0.000049s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000057s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000466s : 1.49% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008074s : 25.74% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000736 218 5.82% : 0.000043s : 11: substitution.arithmetic_simplify 1.80% : 0.000013s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000007s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.36% : 0.000003s : 2: substitution.incorporate_call_switch 55.51% : 0.000409s : 16: substitution.inline 2.12% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000014s : 3: substitution.less_batch_normalization 1.72% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.86% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.41% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010097 2 87.09% : 0.008794s : 1: type_inference.infer 12.91% : 0.001304s : 1: type_inference.specialize ------[replace.] 0.000204 30 59.04% : 0.000121s : 16: replace.inline 40.96% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000430 30 92.96% : 0.000400s : 16: match.inline 7.04% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.41% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.79% : 0.000013s : 107: predicate.environ_get_eliminate 1.23% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 164: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.96% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.84% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.90% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.22% : 0.000002s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001496 32 56.61% : 0.000847s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.39% : 0.000649s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059539 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.99% : 0.002973s : 1: add_attr 4.98% : 0.002965s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.80% : 0.000476s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.04% : 0.004786s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000210s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.94% : 0.010683s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.80% : 0.000475s : 1: opt_after_jit_grad 0.55% : 0.000328s : 1: opt_b 21.75% : 0.012951s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.60% : 0.001549s : 2: renormalize.infer 2.25% : 0.001340s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.58% : 0.008084s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.10% : 0.010179s : 1: type_inference 0.13% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x0-kbk],max_mem:60.0M TotalTime = 0.079682, [24] [bootstrap]: 0.00050246 [type_inference]: 0.00608808 [event_method]: 1.44e-05 [auto_monad]: 5.523e-05 [graph_reusing]: 5.03002e-06 [inline]: 1.71e-06 [add_attr]: 0.00343459, [1] [add_attr_with_inline]: 0.00342289, [1] [Cycle 1]: 4.466e-05, [2] [tag_attr]: 1.526e-05 [meta_addattr_fg_expand]: 4.01001e-06 [parallel-infer-symbol]: 3.04999e-06 [pre_auto_parallel]: 2.892e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.00399723, [53] [py_interpret_to_execute]: 1.986e-05 [rewriter_before_opt_a]: 5.761e-05 [opt_a]: 0.00214838, [2] [Cycle 1]: 0.00155553, [45] [expand_dump_flag]: 2.92002e-06 [switch_simplify]: 3.196e-05 [loop_unroll]: 2.163e-05 [a_1]: 0.00049093 [with_stream_mark]: 1.382e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 3.59002e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.573e-05 [accelerated_algorithm]: 6.19999e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 1.66002e-06 [shard_inline]: 6.36e-06 [merge_send_recv]: 8.02998e-06 [auto_parallel]: 5.81e-06 [parallel]: 2.279e-05 [flash_sp]: 7.15003e-06 [merge_comm]: 3.68999e-06 [allreduce_fusion]: 3.36999e-06 [matmul_add_comm_reduction]: 8.35999e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 7.33999e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.51998e-06 [virtual_output]: 5.61e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.043e-05 
[merge_recompute_call_nodes]: 1.81998e-06 [before_grad]: 9.84999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.78998e-06 [flash_sp_send_recv_attached]: 3.01001e-06 [receive_attached]: 2.48e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00041543 [add_forward_monad_depend]: 4.18001e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.785e-05 [a_3]: 4e-05 [Cycle 2]: 0.00058361, [45] [expand_dump_flag]: 9.39996e-07 [switch_simplify]: 6.90002e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012479 [with_stream_mark]: 9.57001e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 6.748e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.29e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 5.29998e-06 [parallel]: 4.02998e-06 [flash_sp]: 2.84999e-06 [merge_comm]: 2.94001e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 5.19998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.67001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.78999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 8.74e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.99978e-07 [auto_monad_eliminator]: 6.16998e-06 [cse]: 1.227e-05 [a_3]: 3.129e-05 [py_interpret_to_execute_after_opt_a]: 7.97998e-06 
[slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 3.013e-05 [convert_after_rewriter]: 1.001e-05 [order_py_execute_after_rewriter]: 5.32999e-06 [mutable_eliminate]: 0.00045657 [opt_b]: 0.00018081, [1] [Cycle 1]: 0.00017472, [7] [b_1]: 0.00010744 [b_2]: 7.11001e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 4.80009e-07 [cse]: 1.618e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 2.04e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00041374 [opt_after_cconv]: 9.452e-05, [1] [Cycle 1]: 8.879e-05, [7] [c_1]: 2.834e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.561e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.965e-05, [1] [Cycle 1]: 6.494e-05, [4] [d_1]: 3.86e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.59001e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.043e-05 [cse_after_recomputation]: 2.094e-05, [1] [Cycle 1]: 1.663e-05, [1] [cse]: 1.151e-05 [environ_conv]: 4.24002e-06 [swap_dp_allreduce_reducescatter]: 4.97999e-06 [bias_add_comm_swap]: 2.83e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.94001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.04003e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.30001e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.84e-06 [control_data_broadcast_order]: 1.126e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.63e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.12999e-06 [split_layernorm_comm]: 1.82999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.871e-05, [1] [Cycle 1]: 6.461e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.10013e-07 [detach_backward]: 2.15002e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.569e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00048162 [validate]: 3.016e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.0647969 [execute]: 8.39002e-06 Sums bootstrap : 0.000502s : 0.67% type_inference : 0.006088s : 8.09% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000616s : 0.82% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000005s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000416s : 0.55% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.05% optimize.opt_a.a_3 : 0.000071s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.61% optimize.opt_b.b_1 : 0.000107s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000414s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.07% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv 
: 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000482s : 0.64% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064797s : 86.07% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000167 30 15.22% : 0.000025s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.95% : 0.000112s : 3: substitution.inline 1.88% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.43% : 0.000004s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 4: substitution.replace_old_param 6.29% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005993 2 90.97% : 0.005452s : 1: type_inference.infer 9.03% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000075 5 84.66% : 0.000063s : 3: replace.inline 15.34% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 92.07% : 0.000110s : 3: match.inline 7.93% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.33% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.57% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.38% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.07% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.22% : 0.000008s : 54: predicate.switch_simplify 1.03% : 
0.000002s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.09% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.71% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 46.97% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.03% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088660 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.88% : 0.003439s : 1: add_attr 3.87% : 0.003427s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.60% : 0.000533s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.11% : 0.000980s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.43% : 0.002151s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.55% : 0.000491s : 1: opt_after_jit_grad 0.21% : 0.000184s : 1: opt_b 4.51% : 0.004001s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000034s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000215s : 1: renormalize.infer 0.22% : 0.000195s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 73.11% : 0.064816s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.88% : 0.006102s : 1: type_inference 0.06% : 0.000056s : 1: validate TotalTime = 0.0706023, [24] [bootstrap]: 0.00043152 [type_inference]: 0.00442437 [event_method]: 1.087e-05 [auto_monad]: 5.099e-05 [graph_reusing]: 4.80999e-06 [inline]: 1.86998e-06 [add_attr]: 0.00296803, [1] [add_attr_with_inline]: 0.00295965, [1] [Cycle 1]: 4.105e-05, [2] [tag_attr]: 1.164e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.084e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 6.10016e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00368946, [53] [py_interpret_to_execute]: 1.509e-05 [rewriter_before_opt_a]: 3.95e-05 [opt_a]: 0.00185398, [2] [Cycle 1]: 0.00125272, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 2.46e-05 [loop_unroll]: 1.381e-05 [a_1]: 0.00029066 [with_stream_mark]: 1.293e-05 [recompute_prepare]: 7.21001e-06 [updatestate_depend_eliminate]: 3.67002e-06 [updatestate_assign_eliminate]: 3.22002e-06 [updatestate_loads_eliminate]: 3.11999e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.597e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 5.50001e-06 [parallel]: 1.815e-05 [flash_sp]: 6.98e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 9.09e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 6.98998e-06 [virtual_dataset]: 5.72001e-06 [get_grad_eliminate_]: 5.36002e-06 [virtual_output]: 5.40999e-06 [merge_forward]: 3.87002e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.044e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 
2.41998e-06 [after_resolve]: 1.062e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 0.00034578 [add_forward_monad_depend]: 4.27e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.344e-05 [cse]: 2.839e-05 [a_3]: 3.953e-05 [Cycle 2]: 0.0005918, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.85998e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.0001257 [with_stream_mark]: 1.067e-05 [recompute_prepare]: 5.84999e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.25002e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 7.90023e-07 [a_2]: 6.665e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.85002e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.50001e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.08002e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 4.94998e-06 [virtual_output]: 4.85999e-06 [merge_forward]: 2.53003e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.08001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.66e-06 [a_after_grad]: 8.43001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.04001e-06 [cse]: 1.238e-05 [a_3]: 3.176e-05 [py_interpret_to_execute_after_opt_a]: 7.31999e-06 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 3.049e-05 [convert_after_rewriter]: 6.84001e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00044989 [opt_b]: 0.00017833, [1] [Cycle 1]: 
0.00017233, [7] [b_1]: 0.00010574 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.45002e-06 [renormalize]: 5.00004e-07 [cse]: 1.546e-05 [optimize_parallel_all_gather_comm]: 1.557e-05 [overlap_param_gather]: 2.34999e-06 [cconv]: 2.177e-05 [loop_unroll]: 0.0004158 [opt_after_cconv]: 9.487e-05, [1] [Cycle 1]: 8.931e-05, [7] [c_1]: 2.836e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.13002e-06 [cse]: 1.61e-05 [renormalize]: 5.90022e-07 [remove_dup_value]: 1.214e-05 [tuple_transform]: 7.003e-05, [1] [Cycle 1]: 6.591e-05, [4] [d_1]: 4.01e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.38e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 6.707e-05 [cse_after_recomputation]: 2.192e-05, [1] [Cycle 1]: 1.743e-05, [1] [cse]: 1.184e-05 [environ_conv]: 4.65001e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 3.21999e-06 [label_micro_interleaved_index]: 4.35999e-06 [label_fine_grained_interleaved_index]: 2.83e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.29999e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.86e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.00001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.184e-05 [grouped_pairwise_exchange_alltoall]: 1.92001e-06 [offloading_packed_experts]: 4.18999e-06 [overlap_recompute_and_grad_model_parallel]: 4.62998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 7.012e-05, [1] [Cycle 1]: 6.584e-05, [6] [build]: 2.51e-06 [elim_shapecalc]: 8.54002e-06 [elim_not_effective]: 1.196e-05 [opt_reshape]: 6.26e-06 [fold_const_symbol]: 9.12999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.72999e-06 [auto_monad_reorder]: 1.574e-05 [get_jit_bprop_graph]: 9.40025e-07 [rewriter_after_jit_bprop_graph]: 3.68e-06 [opt_after_jit_grad]: 0.00045383 [validate]: 3.057e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0582785 [execute]: 8.47e-06 Sums bootstrap : 0.000432s : 0.65% type_inference : 0.004424s : 6.64% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000346s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.67% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000015s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000416s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000067s : 0.10% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000454s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058279s : 87.40% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.55% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.34% : 0.000002s : 2: substitution.fold_const_symbol 4.67% : 0.000006s : 4: substitution.graph_param_transform 65.00% : 0.000078s : 2: substitution.inline 2.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.37% : 0.000004s : 4: substitution.remove_not_recompute_node 3.41% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004384 2 91.82% : 0.004025s : 1: type_inference.infer 8.18% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.79% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.36% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.89% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.26% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.13% : 0.000002s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 43.13% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.87% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078529 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.79% : 0.002972s : 1: add_attr 3.77% : 0.002963s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000071s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.58% : 0.000459s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000766s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.36% : 0.001857s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000464s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.70% : 0.003693s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000188s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000073s : 1: symbol_engine_optimizer 74.24% : 0.058298s : 1: task_emit 0.09% : 0.000073s : 1: 
tuple_transform 5.65% : 0.004438s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.0732867, [24] [bootstrap]: 0.00042961 [type_inference]: 0.00545335 [event_method]: 1.424e-05 [auto_monad]: 5.326e-05 [graph_reusing]: 5.46e-06 [inline]: 1.78002e-06 [add_attr]: 0.00420074, [1] [add_attr_with_inline]: 0.00419186, [1] [Cycle 1]: 0.00134465, [2] [tag_attr]: 1.79e-05 [meta_addattr_fg_expand]: 4.99e-06 [parallel-infer-symbol]: 3.13998e-06 [pre_auto_parallel]: 2.562e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 2.29001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.00403595, [53] [py_interpret_to_execute]: 2.35e-05 [rewriter_before_opt_a]: 6.083e-05 [opt_a]: 0.00214518, [2] [Cycle 1]: 0.00153919, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.162e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00045111 [with_stream_mark]: 1.302e-05 [recompute_prepare]: 7.48e-06 [updatestate_depend_eliminate]: 3.39001e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 2.04e-06 [a_2]: 7.598e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 8.29998e-06 [auto_parallel]: 6.01e-06 [parallel]: 1.802e-05 [flash_sp]: 7.41999e-06 [merge_comm]: 3.43999e-06 [allreduce_fusion]: 3.30003e-06 [matmul_add_comm_reduction]: 8.82e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.89e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 3.51999e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.30001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.66998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.53998e-06 [receive_attached]: 2.49999e-06 
[after_resolve]: 1.048e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00045026 [add_forward_monad_depend]: 4.25999e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.384e-05 [cse]: 2.749e-05 [a_3]: 4.124e-05 [Cycle 2]: 0.00059602, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00012465 [with_stream_mark]: 9.53002e-06 [recompute_prepare]: 5.98002e-06 [updatestate_depend_eliminate]: 2.78998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.24999e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.737e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.35001e-06 [merge_send_recv]: 4.22e-06 [auto_parallel]: 5.59e-06 [parallel]: 4.27e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 2.87002e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 5.16002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.84e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.85999e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.58003e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 7.36001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.45001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.97001e-06 [a_after_grad]: 8.55001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 8.99978e-07 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.46e-06 [cse]: 1.302e-05 [a_3]: 3.186e-05 [py_interpret_to_execute_after_opt_a]: 7.34002e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.021e-05 [convert_after_rewriter]: 7.10998e-06 [order_py_execute_after_rewriter]: 5.44e-06 [mutable_eliminate]: 0.00050609 [opt_b]: 0.00018136, [1] [Cycle 1]: 0.00017516, [7] [b_1]: 
0.0001079 [b_2]: 7.60998e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 4.60015e-07 [cse]: 1.602e-05 [optimize_parallel_all_gather_comm]: 1.522e-05 [overlap_param_gather]: 1.85001e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.00041525 [opt_after_cconv]: 9.46e-05, [1] [Cycle 1]: 8.88e-05, [7] [c_1]: 2.737e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.22001e-06 [cse]: 1.634e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.84e-05, [1] [Cycle 1]: 6.428e-05, [4] [d_1]: 3.91e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 5.94999e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.286e-05 [cse_after_recomputation]: 2.049e-05, [1] [Cycle 1]: 1.616e-05, [1] [cse]: 1.1e-05 [environ_conv]: 4.43001e-06 [swap_dp_allreduce_reducescatter]: 4.85999e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.03999e-06 [label_fine_grained_interleaved_index]: 3.08998e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.45002e-06 [assign_add_opt]: 1.14e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.11997e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.69999e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.131e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.33999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.46e-06 [overlap_grad_ring_attention]: 3.93001e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.32999e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.882e-05, [1] [Cycle 1]: 6.457e-05, [6] [build]: 2.49999e-06 [elim_shapecalc]: 8.63001e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 9.18002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.514e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.50998e-06 [opt_after_jit_grad]: 0.00044951 [validate]: 3.15e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0583527 [execute]: 8.17998e-06 Sums bootstrap : 0.000430s : 0.63% type_inference : 0.005453s : 8.00% event_method : 0.000014s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000018s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.03% optimize.rewriter_before_opt_a : 0.000061s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000576s : 0.85% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000017s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000450s : 0.66% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000506s : 0.74% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000415s : 0.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000450s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058353s : 85.64% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000167 30 15.00% : 0.000025s : 5: substitution.arithmetic_simplify 1.03% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.77% : 0.000111s : 3: substitution.inline 1.87% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.57% : 0.000004s : 4: substitution.replace_old_param 6.28% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005414 2 89.84% : 0.004864s : 1: type_inference.infer 10.16% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.13% : 0.000028s : 3: replace.inline 29.87% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 92.08% : 0.000109s : 3: match.inline 7.92% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.31% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.00% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.24% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.13% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.97% : 0.000002s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.64% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.87% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.34% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.56% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.75% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.66% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.08% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.88% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.32% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.43% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000381 8 50.90% : 0.000194s : 3: func_graph_cloner_run.FuncGraphClonerGraph 49.10% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.083070 196 0.00% : 0.000003s : 1: ForceFp32Comm 5.06% : 0.004205s : 1: add_attr 5.05% : 0.004196s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.55% : 0.000457s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.51% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.62% : 0.000516s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.13% : 0.000940s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.59% : 0.002148s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.55% : 0.000459s : 1: opt_after_jit_grad 0.22% : 0.000185s : 1: opt_b 4.86% : 0.004040s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000027s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.28% : 0.000231s : 1: renormalize.infer 0.26% : 0.000213s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.08% : 0.000065s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 70.27% : 0.058369s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.58% : 0.005467s : 1: type_inference 0.06% : 0.000054s : 1: validate TotalTime = 0.112033, [24] [bootstrap]: 0.00046135 [type_inference]: 0.0113335 [event_method]: 4.729e-05 [auto_monad]: 0.00011873 [graph_reusing]: 7.76001e-06 [inline]: 2.02999e-06 [add_attr]: 0.00296992, [1] [add_attr_with_inline]: 0.0029613, [1] [Cycle 1]: 6.97e-05, [2] [tag_attr]: 3.491e-05 [meta_addattr_fg_expand]: 9.46e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 4.897e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 1.04998e-06 [dataset_repeat_opt]: 1.97999e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0133122, [53] [py_interpret_to_execute]: 3.845e-05 [rewriter_before_opt_a]: 0.00014435 [opt_a]: 0.0110121, [3] [Cycle 1]: 0.00707172, [45] [expand_dump_flag]: 3.73001e-06 [switch_simplify]: 7.283e-05 [loop_unroll]: 6.101e-05 [a_1]: 0.00144119 [with_stream_mark]: 2.335e-05 [recompute_prepare]: 2.15e-05 [updatestate_depend_eliminate]: 9.01998e-06 [updatestate_assign_eliminate]: 7.8e-06 [updatestate_loads_eliminate]: 7.10998e-06 [parameter_eliminate]: 2.59001e-06 [a_2]: 0.00024363 [accelerated_algorithm]: 3.061e-05 [shard]: 2.04e-06 [meta_shard_fg_expand]: 3.31999e-06 [shard_inline]: 1.64e-05 [merge_send_recv]: 1.572e-05 [auto_parallel]: 1.062e-05 [parallel]: 1.806e-05 [flash_sp]: 1.131e-05 [merge_comm]: 9.56e-06 [allreduce_fusion]: 8.84e-06 [matmul_add_comm_reduction]: 2.629e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.804e-05 [virtual_dataset]: 1.582e-05 [get_grad_eliminate_]: 1.539e-05 [virtual_output]: 1.531e-05 [merge_forward]: 9.86998e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 1.787e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.875e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 2.698e-05 [set_forward_comm_id_for_comm_node_pass]: 9.52001e-06 [meta_fg_expand]: 0.0014287 [flash_sp_send_recv_attached]: 3.93001e-06 [receive_attached]: 2.58e-06 [after_resolve]: 
5.971e-05 [a_after_grad]: 8.055e-05 [renormalize]: 0.00243964 [add_forward_monad_depend]: 9.59e-06 [auto_monad_grad]: 5.18002e-06 [auto_monad_eliminator]: 5.676e-05 [cse]: 0.00017073 [a_3]: 0.00033344 [Cycle 2]: 0.00302941, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.625e-05 [loop_unroll]: 4.359e-05 [a_1]: 0.00155289 [with_stream_mark]: 1.168e-05 [recompute_prepare]: 1.154e-05 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 4.13999e-06 [updatestate_loads_eliminate]: 3.61001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012543 [accelerated_algorithm]: 1.235e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 9.27001e-06 [merge_send_recv]: 6.91999e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.47e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.79e-06 [allreduce_fusion]: 4.90001e-06 [matmul_add_comm_reduction]: 7.60998e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.98002e-06 [get_grad_eliminate_]: 8.62e-06 [virtual_output]: 8.57e-06 [merge_forward]: 4.45e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.676e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 6.952e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.12999e-06 [after_resolve]: 1.625e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00059979 [add_forward_monad_depend]: 3.97e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.475e-05 [cse]: 4.541e-05 [a_3]: 6.508e-05 [Cycle 3]: 0.00089652, [45] [expand_dump_flag]: 1.03001e-06 [switch_simplify]: 1.063e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.0002473 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 9.10999e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.98001e-06 
[parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012249 [accelerated_algorithm]: 1.177e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 8.78001e-06 [merge_send_recv]: 6.87002e-06 [auto_parallel]: 7.18998e-06 [parallel]: 4.26001e-06 [flash_sp]: 1.03001e-06 [merge_comm]: 4.94998e-06 [allreduce_fusion]: 4.81002e-06 [matmul_add_comm_reduction]: 7.92e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.007e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.38999e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.22e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 8.65999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.622e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.4e-05 [set_forward_comm_id_for_comm_node_pass]: 5.84e-06 [meta_fg_expand]: 3.04001e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 1.459e-05 [a_after_grad]: 1.456e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 9.89996e-07 [auto_monad_eliminator]: 1.051e-05 [cse]: 2.507e-05 [a_3]: 5.886e-05 [py_interpret_to_execute_after_opt_a]: 1.007e-05 [slice_cell_reuse_recomputed_activation]: 2.30002e-06 [rewriter_after_opt_a]: 4.795e-05 [convert_after_rewriter]: 9.02e-06 [order_py_execute_after_rewriter]: 7.16999e-06 [mutable_eliminate]: 0.00046287 [opt_b]: 0.00032143, [1] [Cycle 1]: 0.0003149, [7] [b_1]: 0.00022089 [b_2]: 1.101e-05 [updatestate_depend_eliminate]: 7.1e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 3.99002e-06 [renormalize]: 3.00002e-07 [cse]: 3.165e-05 [optimize_parallel_all_gather_comm]: 2.103e-05 [overlap_param_gather]: 1.84998e-06 [cconv]: 1.95e-05 [loop_unroll]: 0.00042871 [opt_after_cconv]: 0.00013528, [1] [Cycle 1]: 0.00012938, [7] [c_1]: 4.883e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 6.83e-06 [updatestate_assign_eliminate]: 4.09002e-06 
[updatestate_loads_eliminate]: 4.06001e-06 [cse]: 2.968e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.926e-05 [tuple_transform]: 0.00010208, [1] [Cycle 1]: 9.729e-05, [4] [d_1]: 6.709e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.012e-05 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 5.622e-05 [cse_after_recomputation]: 3.152e-05, [1] [Cycle 1]: 2.686e-05, [1] [cse]: 2.143e-05 [environ_conv]: 8.79003e-06 [swap_dp_allreduce_reducescatter]: 8.01001e-06 [bias_add_comm_swap]: 2.50002e-06 [label_micro_interleaved_index]: 4.31002e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.45001e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 8.90024e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.39995e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.01997e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.703e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.61003e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.27999e-06 [overlap_grad_ring_attention]: 4.97e-06 [overlap_grad_flash_sp]: 2.387e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 1.71002e-06 [handle_group_info]: 1.14003e-06 [symbol_engine_optimizer]: 9.826e-05, [1] [Cycle 1]: 9.406e-05, [6] [build]: 9.05999e-06 [elim_shapecalc]: 1.297e-05 [elim_not_effective]: 1.868e-05 [opt_reshape]: 1.029e-05 [fold_const_symbol]: 1.463e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.42e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00046536 [validate]: 4.582e-05 [backend_pass]: 1.36998e-06 [task_emit]: 0.0829127 [execute]: 8.1e-06 Sums bootstrap : 0.000461s : 0.43% type_inference : 0.011333s : 10.52% event_method : 0.000047s : 0.04% auto_monad : 0.000119s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000144s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.12% optimize.opt_a.loop_unroll : 0.000114s : 0.11% optimize.opt_a.a_1 : 0.003241s : 3.01% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000492s : 0.46% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000019s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001501s : 1.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.08% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.003039s : 2.82% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000241s : 0.22% optimize.opt_a.a_3 : 0.000457s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.43% optimize.opt_b.b_1 : 0.000221s : 0.20% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000429s : 0.40% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.05% optimize.cse_after_recomputation.cse : 0.000021s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000465s : 0.43% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.082913s : 76.94% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000758 222 6.03% : 0.000046s : 12: substitution.arithmetic_simplify 1.78% : 0.000013s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.56% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000008s : 8: substitution.graph_param_transform 0.33% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 55.40% : 0.000420s : 17: substitution.inline 2.06% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.83% : 0.000014s : 20: substitution.remove_not_recompute_node 3.10% : 0.000023s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.72% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011259 2 86.27% : 0.009713s : 1: type_inference.infer 13.73% : 0.001546s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.32% : 0.000124s : 17: replace.inline 42.68% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000445 33 92.34% : 0.000411s : 17: match.inline 7.66% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.99% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.67% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.26% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.35% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000015s : 101: predicate.partial_defer_inline 1.72% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.43% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.60% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001604 34 55.99% : 0.000898s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.01% : 0.000706s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.136558 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.18% : 0.002974s : 1: add_attr 2.17% : 0.002965s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000126s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.36% : 0.000488s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000034s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000055s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000438s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.59% : 0.004897s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.15% : 0.000206s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.07% : 0.011015s : 1: opt_a 0.10% : 0.000139s : 1: opt_after_cconv 0.35% : 0.000475s : 1: opt_after_jit_grad 0.24% : 0.000325s : 1: opt_b 9.75% : 0.013316s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000005s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.18% : 0.001617s : 2: renormalize.infer 1.03% : 0.001410s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.11% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000101s : 1: symbol_engine_optimizer 60.73% : 0.082932s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 8.31% : 0.011348s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.0716708, [24] [bootstrap]: 0.00048556 [type_inference]: 0.00434975 [event_method]: 1.122e-05 [auto_monad]: 5.188e-05 [graph_reusing]: 5.27999e-06 [inline]: 1.79e-06 [add_attr]: 0.00319713, [1] [add_attr_with_inline]: 0.00318725, [1] [Cycle 1]: 5.052e-05, [2] [tag_attr]: 1.279e-05 [meta_addattr_fg_expand]: 3.20998e-06 [parallel-infer-symbol]: 3.71001e-06 [pre_auto_parallel]: 2.502e-05 [insert-virtual-dataset]: 2.33998e-06 [parallel-infer-symbol-second]: 7.90023e-07 [dataset_repeat_opt]: 1.71002e-06 [pipeline_split]: 1.86998e-06 [optimize]: 0.00393537, [53] [py_interpret_to_execute]: 1.686e-05 [rewriter_before_opt_a]: 3.872e-05 [opt_a]: 0.0020248, [2] [Cycle 1]: 0.00138437, [45] [expand_dump_flag]: 3.01999e-06 [switch_simplify]: 2.522e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.000302 [with_stream_mark]: 1.391e-05 [recompute_prepare]: 7.55e-06 [updatestate_depend_eliminate]: 3.48999e-06 [updatestate_assign_eliminate]: 3.03e-06 [updatestate_loads_eliminate]: 2.98998e-06 [parameter_eliminate]: 1.52999e-06 [a_2]: 7.641e-05 [accelerated_algorithm]: 6.23e-06 [shard]: 2.28998e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.93998e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 6.24999e-06 [parallel]: 1.893e-05 [flash_sp]: 7.81001e-06 [merge_comm]: 3.8e-06 [allreduce_fusion]: 3.42002e-06 [matmul_add_comm_reduction]: 8.70001e-06 [allreduce_slice_to_reducescatter]: 7.7e-07 [virtual_shard_identity]: 6.91999e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.76e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.13e-05 [merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 9.59999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.56001e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.37999e-06 
[after_resolve]: 1.108e-05 [a_after_grad]: 9.09998e-06 [renormalize]: 0.00045085 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 2.14e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.885e-05 [a_3]: 4.174e-05 [Cycle 2]: 0.00063069, [45] [expand_dump_flag]: 1.12999e-06 [switch_simplify]: 7.1e-06 [loop_unroll]: 5.62001e-06 [a_1]: 0.00012635 [with_stream_mark]: 1.024e-05 [recompute_prepare]: 5.83002e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.788e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 1.28002e-06 [meta_shard_fg_expand]: 1.24998e-06 [shard_inline]: 6.09001e-06 [merge_send_recv]: 4.75001e-06 [auto_parallel]: 5.52001e-06 [parallel]: 4.03001e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.36001e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 5.97001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.91998e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 2.264e-05 [merge_forward]: 2.88e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 6.63e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.65999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.75001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.47999e-06 [after_resolve]: 9.71e-06 [a_after_grad]: 8.17998e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.48002e-06 [auto_monad_grad]: 1.36002e-06 [auto_monad_eliminator]: 7.4e-06 [cse]: 1.428e-05 [a_3]: 3.27e-05 [py_interpret_to_execute_after_opt_a]: 8.77999e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 3.483e-05 [convert_after_rewriter]: 6.90002e-06 [order_py_execute_after_rewriter]: 5.49e-06 [mutable_eliminate]: 0.00051129 [opt_b]: 0.00018882, [1] [Cycle 1]: 0.00018193, [7] [b_1]: 
0.00010871 [b_2]: 7.43e-06 [updatestate_depend_eliminate]: 6.94999e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.28998e-06 [renormalize]: 2.19996e-07 [cse]: 1.889e-05 [optimize_parallel_all_gather_comm]: 1.71e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.414e-05 [loop_unroll]: 0.00042908 [opt_after_cconv]: 9.673e-05, [1] [Cycle 1]: 9.1e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.81e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.785e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.93e-05, [1] [Cycle 1]: 6.509e-05, [4] [d_1]: 3.962e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.82999e-06 [add_recomputation]: 4.433e-05 [cse_after_recomputation]: 2.059e-05, [1] [Cycle 1]: 1.626e-05, [1] [cse]: 1.108e-05 [environ_conv]: 4.57e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.18001e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.38998e-06 [micro_interleaved_order_control]: 2.39001e-06 [assign_add_opt]: 1.34998e-06 [ForceFp32Comm]: 9.49978e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 1.93002e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.39e-06 [overlap_opt_shard_in_pipeline]: 1.32999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.159e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 4.02002e-06 [overlap_grad_flash_sp]: 1.855e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 6.893e-05, [1] [Cycle 1]: 6.48e-05, [6] [build]: 2.91e-06 [elim_shapecalc]: 8.53001e-06 [elim_not_effective]: 1.146e-05 [opt_reshape]: 5.87999e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.87001e-06 [pipeline_parallel_scheduler]: 1.31002e-06 [auto_monad_reorder]: 1.49e-05 [get_jit_bprop_graph]: 1.67001e-06 [rewriter_after_jit_bprop_graph]: 3.61001e-06 [opt_after_jit_grad]: 0.00045117 [validate]: 3.515e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.0588576 [execute]: 9.22001e-06 Sums bootstrap : 0.000486s : 0.72% type_inference : 0.004350s : 6.45% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000004s : 0.01% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000017s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000428s : 0.63% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000028s : 0.04% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000022s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000451s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000004s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000043s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000035s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000511s : 0.76% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000019s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.04% optimize.loop_unroll : 0.000429s : 0.64% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000018s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000019s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000002s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000451s : 0.67% validate : 0.000035s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058858s : 87.23% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000129 26 17.15% : 0.000022s : 4: substitution.arithmetic_simplify 1.35% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 4.34% : 0.000006s : 4: substitution.graph_param_transform 66.74% : 0.000086s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000005s : 4: substitution.remove_not_recompute_node 3.51% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004307 2 91.68% : 0.003948s : 1: type_inference.infer 8.32% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000084 2 100.00% : 0.000084s : 2: match.inline ------[predicate.] 
0.000138 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.71% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000004s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.97% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.55% : 0.000009s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.57% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.70% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.65% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.15% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 1.17% : 0.000002s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.49% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.97% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.98% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.97% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000260 6 40.62% : 0.000106s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.38% : 0.000155s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080210 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.99% : 0.003201s : 1: add_attr 3.98% : 0.003192s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000520s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000017s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.55% : 0.000438s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.65% : 0.000521s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000014s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000015s : 1: opt.transform.mutable_eliminate 1.00% : 0.000802s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.53% : 0.002028s : 1: opt_a 0.12% : 0.000100s : 1: opt_after_cconv 0.57% : 0.000461s : 1: opt_after_jit_grad 0.24% : 0.000192s : 1: opt_b 4.91% : 0.003939s : 1: optimize 0.03% : 0.000021s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000022s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000008s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000021s : 1: py_interpret_to_execute 0.02% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.33% : 0.000261s : 1: renormalize.infer 0.23% : 0.000183s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000039s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 73.41% : 0.058879s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.44% : 0.004367s : 1: type_inference 0.08% : 0.000062s : 1: validate TotalTime = 0.10908, [24] [bootstrap]: 0.0005426 [type_inference]: 0.0102504 [event_method]: 4.312e-05 [auto_monad]: 0.00011285 [graph_reusing]: 7.9e-06 [inline]: 1.87999e-06 [add_attr]: 0.00301384, [1] [add_attr_with_inline]: 0.00300524, [1] [Cycle 1]: 6.691e-05, [2] [tag_attr]: 3.151e-05 [meta_addattr_fg_expand]: 8.28001e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 4.527e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.92999e-06 [optimize]: 0.0131657, [53] [py_interpret_to_execute]: 3.589e-05 [rewriter_before_opt_a]: 0.00012784 [opt_a]: 0.0109177, [3] [Cycle 1]: 0.00701443, [45] [expand_dump_flag]: 3.86999e-06 [switch_simplify]: 6.633e-05 [loop_unroll]: 5.476e-05 [a_1]: 0.00133277 [with_stream_mark]: 2.411e-05 [recompute_prepare]: 2.157e-05 [updatestate_depend_eliminate]: 9.29e-06 [updatestate_assign_eliminate]: 8.10999e-06 [updatestate_loads_eliminate]: 7.66999e-06 [parameter_eliminate]: 2.76e-06 [a_2]: 0.00024514 [accelerated_algorithm]: 2.974e-05 [shard]: 1.96e-06 [meta_shard_fg_expand]: 3.28e-06 [shard_inline]: 1.596e-05 [merge_send_recv]: 1.551e-05 [auto_parallel]: 1.089e-05 [parallel]: 1.893e-05 [flash_sp]: 1.1e-05 [merge_comm]: 9.71e-06 [allreduce_fusion]: 5.132e-05 [matmul_add_comm_reduction]: 2.724e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.862e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.503e-05 [virtual_output]: 1.521e-05 [merge_forward]: 9.15001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 1.773e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.939e-05 [merge_recompute_call_nodes]: 1.75001e-06 [before_grad]: 2.725e-05 [set_forward_comm_id_for_comm_node_pass]: 9.69e-06 [meta_fg_expand]: 0.00139163 [flash_sp_send_recv_attached]: 3.85e-06 [receive_attached]: 2.35002e-06 [after_resolve]: 
5.982e-05 [a_after_grad]: 7.98e-05 [renormalize]: 0.00249573 [add_forward_monad_depend]: 8.93002e-06 [auto_monad_grad]: 5.15001e-06 [auto_monad_eliminator]: 5.635e-05 [cse]: 0.00016967 [a_3]: 0.00033353 [Cycle 2]: 0.00296482, [45] [expand_dump_flag]: 1.54e-06 [switch_simplify]: 4.661e-05 [loop_unroll]: 4.375e-05 [a_1]: 0.00152972 [with_stream_mark]: 1.187e-05 [recompute_prepare]: 1.112e-05 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 4.23999e-06 [updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 1.12999e-06 [a_2]: 0.00012635 [accelerated_algorithm]: 1.23e-05 [shard]: 1.09003e-06 [meta_shard_fg_expand]: 1.97999e-06 [shard_inline]: 9.36e-06 [merge_send_recv]: 6.51e-06 [auto_parallel]: 7.18998e-06 [parallel]: 5.94e-06 [flash_sp]: 3.21001e-06 [merge_comm]: 6.07999e-06 [allreduce_fusion]: 4.92e-06 [matmul_add_comm_reduction]: 7.79002e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 1.029e-05 [virtual_dataset]: 8.84998e-06 [get_grad_eliminate_]: 8.59002e-06 [virtual_output]: 8.27998e-06 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 8.60001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.613e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.397e-05 [set_forward_comm_id_for_comm_node_pass]: 5.18002e-06 [meta_fg_expand]: 3.423e-05 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.518e-05 [a_after_grad]: 1.478e-05 [renormalize]: 0.00059386 [add_forward_monad_depend]: 4.18999e-06 [auto_monad_grad]: 1.16002e-06 [auto_monad_eliminator]: 1.479e-05 [cse]: 4.658e-05 [a_3]: 6.52e-05 [Cycle 3]: 0.00092468, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.024e-05 [loop_unroll]: 9.77001e-06 [a_1]: 0.00025252 [with_stream_mark]: 1.014e-05 [recompute_prepare]: 9.46003e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 3.93999e-06 
[parameter_eliminate]: 8.90024e-07 [a_2]: 0.0001255 [accelerated_algorithm]: 1.153e-05 [shard]: 9.09989e-07 [meta_shard_fg_expand]: 1.77999e-06 [shard_inline]: 8.99e-06 [merge_send_recv]: 7.03998e-06 [auto_parallel]: 7.03e-06 [parallel]: 4.41002e-06 [flash_sp]: 1.13001e-06 [merge_comm]: 4.85999e-06 [allreduce_fusion]: 4.95001e-06 [matmul_add_comm_reduction]: 7.61001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 9.96e-06 [virtual_dataset]: 8.49998e-06 [get_grad_eliminate_]: 8.49998e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.32003e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 8.75999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.364e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27001e-06 [meta_fg_expand]: 2.91e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.476e-05 [a_after_grad]: 1.493e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 1.138e-05 [cse]: 2.742e-05 [a_3]: 5.994e-05 [py_interpret_to_execute_after_opt_a]: 1.035e-05 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 4.726e-05 [convert_after_rewriter]: 8.84e-06 [order_py_execute_after_rewriter]: 6.63e-06 [mutable_eliminate]: 0.00046296 [opt_b]: 0.00028964, [1] [Cycle 1]: 0.00028376, [7] [b_1]: 0.00018987 [b_2]: 1.093e-05 [updatestate_depend_eliminate]: 7.34002e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4.08001e-06 [renormalize]: 3.59985e-07 [cse]: 3.184e-05 [optimize_parallel_all_gather_comm]: 2.071e-05 [overlap_param_gather]: 1.93997e-06 [cconv]: 1.951e-05 [loop_unroll]: 0.00042692 [opt_after_cconv]: 0.00013634, [1] [Cycle 1]: 0.00013049, [7] [c_1]: 4.917e-05 [parameter_eliminate]: 2.23002e-06 [updatestate_depend_eliminate]: 7.31001e-06 [updatestate_assign_eliminate]: 4.13001e-06 
[updatestate_loads_eliminate]: 3.89002e-06 [cse]: 3.023e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 2.868e-05 [tuple_transform]: 0.0001016, [1] [Cycle 1]: 9.688e-05, [4] [d_1]: 6.694e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 9.63997e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.727e-05 [cse_after_recomputation]: 3.27e-05, [1] [Cycle 1]: 2.781e-05, [1] [cse]: 2.221e-05 [environ_conv]: 9.64999e-06 [swap_dp_allreduce_reducescatter]: 8.15999e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.36998e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.85001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.50007e-07 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.30001e-06 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.717e-05 [grouped_pairwise_exchange_alltoall]: 1.61002e-06 [offloading_packed_experts]: 4.85001e-06 [overlap_recompute_and_grad_model_parallel]: 5.34998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.56002e-06 [overlap_recompute_comm]: 2.01998e-06 [overlap_grad_ring_attention]: 5.35001e-06 [overlap_grad_flash_sp]: 2.344e-05 [begin_end_overlap_inline]: 4.50003e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 9.788e-05, [1] [Cycle 1]: 9.357e-05, [6] [build]: 1.006e-05 [elim_shapecalc]: 1.348e-05 [elim_not_effective]: 1.795e-05 [opt_reshape]: 9.59e-06 [fold_const_symbol]: 1.461e-05 
[renormalize]: 2.19996e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.469e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 3.57002e-06 [opt_after_jit_grad]: 0.00047356 [validate]: 4.533e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.0811137 [execute]: 8.68001e-06 Sums bootstrap : 0.000543s : 0.52% type_inference : 0.010250s : 9.78% event_method : 0.000043s : 0.04% auto_monad : 0.000113s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000128s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000123s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.10% optimize.opt_a.a_1 : 0.003115s : 2.97% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000497s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% 
optimize.opt_a.merge_comm : 0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000061s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001429s : 1.36% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.09% optimize.opt_a.a_after_grad : 0.000110s : 0.10% optimize.opt_a.renormalize : 0.003090s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000083s : 0.08% optimize.opt_a.cse : 0.000244s : 0.23% optimize.opt_a.a_3 : 0.000459s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.44% optimize.opt_b.b_1 : 0.000190s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000427s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000474s : 0.45% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.081114s : 77.40% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000732 218 5.97% : 0.000044s : 11: substitution.arithmetic_simplify 1.87% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.56% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.84% : 0.000401s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.36% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000014s : 3: substitution.less_batch_normalization 1.89% : 0.000014s : 11: substitution.minmaximum_grad 0.73% : 0.000005s : 5: substitution.partial_eliminate 1.94% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000023s : 10: substitution.replace_applicator 1.48% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.72% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.39% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.49% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010182 2 87.06% : 0.008865s : 1: type_inference.infer 12.94% : 0.001317s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.22% : 0.000119s : 16: replace.inline 40.78% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000423 30 92.98% : 0.000393s : 16: match.inline 7.02% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.16% : 0.000009s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000041s : 244: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 164: predicate.load_eliminater 0.34% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 1.95% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 67: predicate.reduce_eliminate 2.65% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.96% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.68% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.19% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.79% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.49% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.16% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001636 32 59.91% : 0.000980s : 12: func_graph_cloner_run.FuncGraphClonerGraph 40.09% : 0.000656s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.133440 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.26% : 0.003018s : 1: add_attr 2.25% : 0.003009s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000062s : 1: add_recomputation 0.00% : 0.000005s : 1: assign_add_opt 0.09% : 0.000120s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.43% : 0.000576s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.57% : 0.004764s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.18% : 0.010921s : 1: opt_a 0.10% : 0.000140s : 1: opt_after_cconv 0.36% : 0.000483s : 1: opt_after_jit_grad 0.22% : 0.000293s : 1: opt_b 9.87% : 0.013170s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.19% : 0.001594s : 2: renormalize.infer 1.11% : 0.001483s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 60.80% : 0.081132s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 7.69% : 0.010265s : 1: type_inference 0.05% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x0-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x1-pynative],max_mem:60.0M TotalTime = 0.0214228, [24] [bootstrap]: 0.00052255 [type_inference]: 0.0060757 [event_method]: 1.431e-05 [auto_monad]: 5.512e-05 [graph_reusing]: 4.80999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00347144, [1] [add_attr_with_inline]: 0.00346007, [1] [Cycle 1]: 4.531e-05, [2] [tag_attr]: 1.656e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.845e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 1.94999e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00397559, [53] [py_interpret_to_execute]: 2.1e-05 [rewriter_before_opt_a]: 5.912e-05 [opt_a]: 0.00213496, [2] [Cycle 1]: 0.00152039, [45] [expand_dump_flag]: 3.25e-06 [switch_simplify]: 3.196e-05 [loop_unroll]: 2.128e-05 [a_1]: 0.00045119 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 7.85998e-06 [updatestate_depend_eliminate]: 3.40003e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 2.94999e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.511e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 1.81e-06 [meta_shard_fg_expand]: 1.92999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 6.01e-06 [parallel]: 2.284e-05 [flash_sp]: 7.08e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 7.00005e-07 [virtual_shard_identity]: 6.96001e-06 [virtual_dataset]: 5.99999e-06 
[get_grad_eliminate_]: 5.46998e-06 [virtual_output]: 5.65001e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.23002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42002e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.82002e-06 [receive_attached]: 2.76e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00042593 [add_forward_monad_depend]: 4.36002e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.296e-05 [cse]: 2.794e-05 [a_3]: 4.019e-05 [Cycle 2]: 0.00060513, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.68998e-06 [loop_unroll]: 5.31998e-06 [a_1]: 0.00012066 [with_stream_mark]: 9.71e-06 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.74e-05 [accelerated_algorithm]: 5.25001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 5.17999e-06 [parallel]: 5.01002e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 3.29001e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 4.75999e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 6.04001e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.13002e-06 [merge_forward]: 2.68998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39998e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.83999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.79984e-07 [after_resolve]: 9.12001e-06 [a_after_grad]: 8.26002e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.02998e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 6.73e-06 [cse]: 3.067e-05 [a_3]: 3.249e-05 [py_interpret_to_execute_after_opt_a]: 7.93999e-06 [slice_cell_reuse_recomputed_activation]: 2.66999e-06 [rewriter_after_opt_a]: 2.989e-05 [convert_after_rewriter]: 7.03998e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00045007 [opt_b]: 0.00017977, [1] [Cycle 1]: 0.00017382, [7] [b_1]: 0.00010664 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.18998e-06 [renormalize]: 3.00002e-07 [cse]: 1.655e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.298e-05 [loop_unroll]: 0.00041448 [opt_after_cconv]: 9.548e-05, [1] [Cycle 1]: 8.984e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.653e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.28e-05 [tuple_transform]: 6.864e-05, [1] [Cycle 1]: 6.429e-05, [4] [d_1]: 3.887e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.57999e-06 [add_recomputation]: 5.121e-05 [cse_after_recomputation]: 2.057e-05, [1] [Cycle 1]: 1.611e-05, [1] [cse]: 1.099e-05 [environ_conv]: 4.31002e-06 [swap_dp_allreduce_reducescatter]: 5.04998e-06 [bias_add_comm_swap]: 2.62001e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.64001e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 9.00007e-07 [remove_cast_before_assign_add]: 7.7e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.50997e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 
9.29984e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.14003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.51001e-06 [overlap_recompute_and_grad_model_parallel]: 4.55999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.78998e-06 [overlap_grad_ring_attention]: 4.28001e-06 [overlap_grad_flash_sp]: 1.69e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.30002e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 6.807e-05, [1] [Cycle 1]: 6.359e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 8.03999e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 5.80002e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.92999e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.533e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 0.00011878 [opt_after_jit_grad]: 0.0004581 [validate]: 3.175e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00643177 [execute]: 6.76e-06 Sums bootstrap : 0.000523s : 3.07% type_inference : 0.006076s : 35.75% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000017s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000059s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000572s : 3.36% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000426s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000059s : 0.34% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000003s : 0.02% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.65% optimize.opt_b.b_1 : 0.000107s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.14% optimize.loop_unroll : 0.000414s : 2.44% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000119s : 0.70% opt_after_jit_grad : 0.000458s : 2.70% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006432s : 37.84% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.81% : 0.000025s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.87% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000005s : 4: substitution.graph_param_transform 66.35% : 0.000110s : 3: substitution.inline 1.68% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param 7.16% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006028 2 90.26% : 0.005441s : 1: type_inference.infer 9.74% : 0.000587s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.79% : 0.000027s : 3: replace.inline 30.21% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 90.90% : 0.000108s : 3: match.inline 9.10% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.85% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.96% : 0.000002s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.52% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.28% : 0.000010s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.65% : 0.000004s : 32: predicate.load_eliminater 0.96% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.21% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000383 8 46.17% : 0.000177s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.83% : 0.000206s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030382 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.44% : 0.003476s : 1: add_attr 11.40% : 0.003464s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.82% : 0.000553s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.08% : 0.000937s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.04% : 0.002138s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.54% : 0.000468s : 1: opt_after_jit_grad 0.60% : 0.000183s : 1: opt_b 13.10% : 0.003980s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.72% : 0.000219s : 1: renormalize.infer 0.66% : 0.000200s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.41% : 0.000125s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000006s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.20% : 0.006442s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.04% : 0.006089s : 1: type_inference 0.19% : 0.000059s : 1: validate TotalTime = 0.0181164, [24] [bootstrap]: 0.00046801 [type_inference]: 0.00437235 [event_method]: 1.027e-05 [auto_monad]: 4.959e-05 [graph_reusing]: 4.77998e-06 [inline]: 1.79e-06 [add_attr]: 0.00292673, [1] [add_attr_with_inline]: 0.00291909, [1] [Cycle 1]: 4.226e-05, [2] [tag_attr]: 1.198e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 3.08998e-06 [pre_auto_parallel]: 2.134e-05 [insert-virtual-dataset]: 2.94999e-06 [parallel-infer-symbol-second]: 6.50005e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00365065, [53] [py_interpret_to_execute]: 1.553e-05 [rewriter_before_opt_a]: 3.755e-05 [opt_a]: 0.00185827, [2] [Cycle 1]: 0.00125882, [45] [expand_dump_flag]: 2.51998e-06 [switch_simplify]: 2.471e-05 [loop_unroll]: 1.355e-05 [a_1]: 0.00029004 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.61001e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.23998e-06 [updatestate_loads_eliminate]: 3.09999e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.652e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.24998e-06 [shard_inline]: 6.17001e-06 [merge_send_recv]: 7.87998e-06 [auto_parallel]: 6.28e-06 [parallel]: 1.659e-05 [flash_sp]: 7.04001e-06 [merge_comm]: 3.86999e-06 [allreduce_fusion]: 3.23998e-06 [matmul_add_comm_reduction]: 8.97e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 6.88e-06 [virtual_dataset]: 5.69999e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.07001e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 
2.41e-06 [after_resolve]: 1.066e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00033365 [add_forward_monad_depend]: 4.31002e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.775e-05 [a_3]: 4.002e-05 [Cycle 2]: 0.0005902, [45] [expand_dump_flag]: 7.90023e-07 [switch_simplify]: 6.69999e-06 [loop_unroll]: 5.79999e-06 [a_1]: 0.00012431 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 6.746e-05 [accelerated_algorithm]: 5.63002e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.65001e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.09998e-06 [parallel]: 4.13999e-06 [flash_sp]: 2.89001e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 4.91997e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.26998e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 6.19001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.19998e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.56002e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 8.37e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.63e-06 [cse]: 1.312e-05 [a_3]: 3.222e-05 [py_interpret_to_execute_after_opt_a]: 7.35003e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.035e-05 [convert_after_rewriter]: 6.69999e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00044618 [opt_b]: 0.00017932, [1] [Cycle 1]: 0.00017323, [7] 
[b_1]: 0.00010772 [b_2]: 6.94999e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.43e-06 [renormalize]: 3.19997e-07 [cse]: 1.505e-05 [optimize_parallel_all_gather_comm]: 1.522e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00041191 [opt_after_cconv]: 9.385e-05, [1] [Cycle 1]: 8.828e-05, [7] [c_1]: 2.746e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.08002e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.542e-05 [renormalize]: 2.40019e-07 [remove_dup_value]: 1.184e-05 [tuple_transform]: 6.821e-05, [1] [Cycle 1]: 6.39e-05, [4] [d_1]: 3.865e-05 [none_parameter_eliminate]: 1.41998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.38e-05 [cse_after_recomputation]: 1.938e-05, [1] [Cycle 1]: 1.495e-05, [1] [cse]: 9.79999e-06 [environ_conv]: 4.55999e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 1.99999e-06 [assign_add_opt]: 1.21997e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.26002e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.29e-06 [add_comm_op_reuse_tag]: 1.02e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.176e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.95e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.732e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.98002e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 6.736e-05, [1] [Cycle 1]: 6.345e-05, [6] [build]: 2.25002e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 5.98002e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.602e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.16999e-06 [opt_after_jit_grad]: 0.00044178 [validate]: 3.058e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.00590658 [execute]: 6.58e-06 Sums bootstrap : 0.000468s : 3.29% type_inference : 0.004372s : 30.74% event_method : 0.000010s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000414s : 2.91% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000334s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000446s : 3.14% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000412s : 2.90% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000442s : 3.11% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005907s : 41.53% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.41% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.34% : 0.000005s : 4: substitution.graph_param_transform 65.69% : 0.000078s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.45% : 0.000004s : 4: substitution.remove_not_recompute_node 3.40% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004332 2 91.52% : 0.003965s : 1: type_inference.infer 8.48% : 0.000367s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 0.96% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000003s : 17: predicate.arithmetic_simplify 0.73% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.33% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000001s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.08% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.93% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.60% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.40% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025952 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.29% : 0.002931s : 1: add_attr 11.26% : 0.002922s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000502s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.95% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.17% : 0.001861s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.74% : 0.000451s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.08% : 0.003655s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000021s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000184s : 1: renormalize.infer 0.55% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.80% : 0.005916s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.90% : 0.004386s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0194829, [24] [bootstrap]: 0.00044907 [type_inference]: 0.00545066 [event_method]: 1.409e-05 [auto_monad]: 5.357e-05 [graph_reusing]: 5.76e-06 [inline]: 1.84e-06 [add_attr]: 0.00297431, [1] [add_attr_with_inline]: 0.00296635, [1] [Cycle 1]: 4.551e-05, [2] [tag_attr]: 1.533e-05 [meta_addattr_fg_expand]: 4.16001e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.561e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00392316, [53] [py_interpret_to_execute]: 1.966e-05 [rewriter_before_opt_a]: 5.854e-05 [opt_a]: 0.00211499, [2] [Cycle 1]: 0.00151384, [45] [expand_dump_flag]: 2.88998e-06 [switch_simplify]: 3.173e-05 [loop_unroll]: 2.157e-05 [a_1]: 0.0004455 [with_stream_mark]: 1.316e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.11001e-06 [updatestate_loads_eliminate]: 3.08998e-06 [parameter_eliminate]: 1.73997e-06 [a_2]: 7.43e-05 [accelerated_algorithm]: 6.58998e-06 [shard]: 1.90001e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 8.08999e-06 [auto_parallel]: 6.03002e-06 [parallel]: 1.768e-05 [flash_sp]: 6.64999e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.89e-06 [allreduce_slice_to_reducescatter]: 6.39993e-07 [virtual_shard_identity]: 7.74002e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.61e-06 [virtual_output]: 5.47999e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.04e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.49e-06 [set_forward_comm_id_for_comm_node_pass]: 3.61001e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.61e-06 
[after_resolve]: 1.106e-05 [a_after_grad]: 9.31e-06 [renormalize]: 0.00039965 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 4.282e-05 [cse]: 2.851e-05 [a_3]: 4.076e-05 [Cycle 2]: 0.00059171, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012487 [with_stream_mark]: 9.20999e-06 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.55997e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.742e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.22999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.20999e-06 [auto_parallel]: 5.03002e-06 [parallel]: 4.2e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.86e-06 [matmul_add_comm_reduction]: 4.96002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.56999e-06 [virtual_dataset]: 5.87001e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.65002e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.99002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 9.56e-06 [a_after_grad]: 7.89997e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.626e-05 [a_3]: 3.217e-05 [py_interpret_to_execute_after_opt_a]: 7.94997e-06 [slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 3.02e-05 [convert_after_rewriter]: 6.45002e-06 [order_py_execute_after_rewriter]: 5.14998e-06 [mutable_eliminate]: 0.00044659 [opt_b]: 0.00017926, [1] [Cycle 1]: 0.00017336, [7] [b_1]: 
0.00010764 [b_2]: 6.76e-06 [updatestate_depend_eliminate]: 5.32999e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 4.39992e-07 [cse]: 1.59e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.226e-05 [loop_unroll]: 0.00040596 [opt_after_cconv]: 9.333e-05, [1] [Cycle 1]: 8.716e-05, [7] [c_1]: 2.7e-05 [parameter_eliminate]: 2.35002e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.531e-05 [renormalize]: 2.59985e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.832e-05, [1] [Cycle 1]: 6.421e-05, [4] [d_1]: 3.848e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.14999e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.418e-05 [cse_after_recomputation]: 1.918e-05, [1] [Cycle 1]: 1.505e-05, [1] [cse]: 1.002e-05 [environ_conv]: 4.49998e-06 [swap_dp_allreduce_reducescatter]: 4.74e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.38999e-06 [label_fine_grained_interleaved_index]: 2.71999e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.04e-06 [micro_interleaved_order_control]: 2.26998e-06 [assign_add_opt]: 1.32e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.27999e-06 [reorder_send_recv_between_fp_bp]: 2.48998e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.19e-06 [interleave_parallel_branches]: 1.24003e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.103e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.19997e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.19999e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 6.846e-05, [1] [Cycle 1]: 6.444e-05, [6] [build]: 2.14999e-06 [elim_shapecalc]: 7.85e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 6.26e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.00044017 [validate]: 3.068e-05 [backend_pass]: 1.08001e-06 [task_emit]: 0.00588131 [execute]: 6.45002e-06 Sums bootstrap : 0.000449s : 2.89% type_inference : 0.005451s : 35.02% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000026s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.38% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000038s : 0.25% optimize.opt_a.loop_unroll : 0.000027s : 0.17% optimize.opt_a.a_1 : 0.000570s : 3.66% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000400s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000049s : 0.32% optimize.opt_a.cse : 0.000045s : 0.29% optimize.opt_a.a_3 : 0.000073s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.87% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000406s : 2.61% optimize.opt_after_cconv.c_1 : 0.000027s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.28% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000440s : 2.83% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005881s : 37.79% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000162 30 15.09% : 0.000024s : 5: substitution.arithmetic_simplify 1.15% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 66.44% : 0.000107s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.75% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005411 2 90.07% : 0.004873s : 1: type_inference.infer 9.93% : 0.000537s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.31% : 0.000026s : 3: replace.inline 30.69% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.45% : 0.000105s : 3: match.inline 8.55% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 1.00% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.54% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000009s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.92% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.65% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.11% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.76% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.60% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000329 8 46.75% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.25% : 0.000175s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027874 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.69% : 0.002979s : 1: add_attr 10.65% : 0.002970s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.73% : 0.000483s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.49% : 0.000414s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.36% : 0.000937s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.60% : 0.002118s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.61% : 0.000450s : 1: opt_after_jit_grad 0.65% : 0.000183s : 1: opt_b 14.09% : 0.003927s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000204s : 1: renormalize.infer 0.68% : 0.000189s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.13% : 0.005891s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.60% : 0.005464s : 1: type_inference 0.20% : 0.000057s : 1: validate TotalTime = 0.0373278, [24] [bootstrap]: 0.00052009 [type_inference]: 0.0111556 [event_method]: 4.583e-05 [auto_monad]: 0.00011836 [graph_reusing]: 8.10999e-06 [inline]: 1.91998e-06 [add_attr]: 0.00300398, [1] [add_attr_with_inline]: 0.00299563, [1] [Cycle 1]: 6.959e-05, [2] [tag_attr]: 3.343e-05 [meta_addattr_fg_expand]: 9.47001e-06 [parallel-infer-symbol]: 3.25998e-06 [pre_auto_parallel]: 4.921e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.0131655, [53] [py_interpret_to_execute]: 3.767e-05 [rewriter_before_opt_a]: 0.00014412 [opt_a]: 0.0109274, [3] [Cycle 1]: 0.00701775, [45] [expand_dump_flag]: 3.47002e-06 [switch_simplify]: 7.344e-05 [loop_unroll]: 6.163e-05 [a_1]: 0.00147718 [with_stream_mark]: 2.304e-05 [recompute_prepare]: 2.168e-05 [updatestate_depend_eliminate]: 9.53002e-06 [updatestate_assign_eliminate]: 7.45e-06 [updatestate_loads_eliminate]: 7.66001e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024398 [accelerated_algorithm]: 3.006e-05 [shard]: 1.95001e-06 [meta_shard_fg_expand]: 3.38e-06 [shard_inline]: 1.655e-05 [merge_send_recv]: 1.602e-05 [auto_parallel]: 1.073e-05 [parallel]: 1.764e-05 [flash_sp]: 1.095e-05 [merge_comm]: 9.49e-06 [allreduce_fusion]: 9.20999e-06 [matmul_add_comm_reduction]: 2.525e-05 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 1.799e-05 [virtual_dataset]: 1.595e-05 [get_grad_eliminate_]: 1.535e-05 [virtual_output]: 1.536e-05 [merge_forward]: 9.25999e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 1.738e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.846e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 2.719e-05 [set_forward_comm_id_for_comm_node_pass]: 9.93002e-06 [meta_fg_expand]: 0.00137933 [flash_sp_send_recv_attached]: 4.37e-06 [receive_attached]: 2.65002e-06 
[after_resolve]: 5.976e-05 [a_after_grad]: 7.997e-05 [renormalize]: 0.00241007 [add_forward_monad_depend]: 8.84e-06 [auto_monad_grad]: 5.21002e-06 [auto_monad_eliminator]: 5.594e-05 [cse]: 0.00015778 [a_3]: 0.00033681 [Cycle 2]: 0.00300097, [45] [expand_dump_flag]: 1.41998e-06 [switch_simplify]: 4.693e-05 [loop_unroll]: 4.348e-05 [a_1]: 0.00152011 [with_stream_mark]: 1.196e-05 [recompute_prepare]: 1.08e-05 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012557 [accelerated_algorithm]: 1.22e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.15001e-06 [merge_send_recv]: 6.84001e-06 [auto_parallel]: 7.56001e-06 [parallel]: 5.24e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.84e-06 [allreduce_fusion]: 4.72e-06 [matmul_add_comm_reduction]: 7.64002e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 1.017e-05 [virtual_dataset]: 8.75999e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 3.778e-05 [merge_forward]: 4.47e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.25999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.639e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.28002e-06 [meta_fg_expand]: 6.878e-05 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.19003e-06 [after_resolve]: 1.599e-05 [a_after_grad]: 1.452e-05 [renormalize]: 0.00057488 [add_forward_monad_depend]: 3.87002e-06 [auto_monad_grad]: 1.19003e-06 [auto_monad_eliminator]: 1.448e-05 [cse]: 4.605e-05 [a_3]: 6.473e-05 [Cycle 3]: 0.00089481, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 1.062e-05 [loop_unroll]: 9.06002e-06 [a_1]: 0.00024905 [with_stream_mark]: 9.92001e-06 [recompute_prepare]: 9.32001e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.88001e-06 
[parameter_eliminate]: 8.30012e-07 [a_2]: 0.00012279 [accelerated_algorithm]: 1.157e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 8.82999e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 6.89999e-06 [parallel]: 4.67998e-06 [flash_sp]: 9.79984e-07 [merge_comm]: 4.72e-06 [allreduce_fusion]: 4.77e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 4.00003e-07 [virtual_shard_identity]: 9.82001e-06 [virtual_dataset]: 8.52e-06 [get_grad_eliminate_]: 8.35001e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 8.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.577e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.369e-05 [set_forward_comm_id_for_comm_node_pass]: 5.61998e-06 [meta_fg_expand]: 3.05002e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.427e-05 [a_after_grad]: 1.471e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 1.038e-05 [cse]: 2.576e-05 [a_3]: 5.908e-05 [py_interpret_to_execute_after_opt_a]: 9.80002e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 4.612e-05 [convert_after_rewriter]: 8.85001e-06 [order_py_execute_after_rewriter]: 6.58e-06 [mutable_eliminate]: 0.00045183 [opt_b]: 0.00028818, [1] [Cycle 1]: 0.0002821, [7] [b_1]: 0.00018966 [b_2]: 1.062e-05 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 4e-06 [updatestate_loads_eliminate]: 4.23001e-06 [renormalize]: 4.09986e-07 [cse]: 3.195e-05 [optimize_parallel_all_gather_comm]: 1.999e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 1.87e-05 [loop_unroll]: 0.00042026 [opt_after_cconv]: 0.00013568, [1] [Cycle 1]: 0.00012975, [7] [c_1]: 4.909e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 7.08e-06 [updatestate_assign_eliminate]: 4.28999e-06 
[updatestate_loads_eliminate]: 3.97e-06 [cse]: 2.914e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 2.802e-05 [tuple_transform]: 0.00010179, [1] [Cycle 1]: 9.72e-05, [4] [d_1]: 6.69e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.96e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 5.626e-05 [cse_after_recomputation]: 3.25e-05, [1] [Cycle 1]: 2.778e-05, [1] [cse]: 2.236e-05 [environ_conv]: 8.88002e-06 [swap_dp_allreduce_reducescatter]: 7.97e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.20002e-06 [micro_interleaved_order_control]: 2.13998e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.66999e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.25001e-06 [overlap_opt_shard_in_pipeline]: 1.50999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71002e-06 [control_data_broadcast_order]: 1.686e-05 [grouped_pairwise_exchange_alltoall]: 1.67999e-06 [offloading_packed_experts]: 4.81002e-06 [overlap_recompute_and_grad_model_parallel]: 5.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.60002e-06 [overlap_grad_ring_attention]: 4.94e-06 [overlap_grad_flash_sp]: 2.291e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 9.586e-05, [1] [Cycle 1]: 9.165e-05, [6] [build]: 8.92e-06 [elim_shapecalc]: 1.292e-05 [elim_not_effective]: 1.763e-05 [opt_reshape]: 1.012e-05 [fold_const_symbol]: 1.482e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 1.92001e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 2.546e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00046407 [validate]: 4.406e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00844951 [execute]: 6.91001e-06 Sums bootstrap : 0.000520s : 1.57% type_inference : 0.011156s : 33.77% event_method : 0.000046s : 0.14% auto_monad : 0.000118s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000033s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003246s : 9.83% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.12% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000061s : 0.19% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001451s : 4.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.002985s : 9.04% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.24% optimize.opt_a.cse : 0.000230s : 0.69% optimize.opt_a.a_3 : 0.000461s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000452s : 1.37% optimize.opt_b.b_1 : 0.000190s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000420s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000067s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000464s : 1.40% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008450s : 25.58% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000753 222 5.87% : 0.000044s : 12: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.63% : 0.000419s : 17: substitution.inline 1.99% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000014s : 3: substitution.less_batch_normalization 1.65% : 0.000012s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.81% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.61% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.71% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.50% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011083 2 86.96% : 0.009638s : 1: type_inference.infer 13.04% : 0.001445s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.41% : 0.000125s : 17: replace.inline 42.59% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000444 33 92.35% : 0.000410s : 17: match.inline 7.65% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000810 5764 1.01% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.47% : 0.000004s : 32: predicate.addn_check_dump 1.04% : 0.000008s : 68: predicate.addn_zero_filter 0.99% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.86% : 0.000015s : 100: predicate.arithmetic_simplify 1.10% : 0.000009s : 68: predicate.cast_eliminate 1.08% : 0.000009s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.08% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.08% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.03% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.14% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_depend_swap 1.62% : 0.000013s : 108: predicate.environ_get_eliminate 1.12% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.62% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.13% : 0.000017s : 101: predicate.float_depend_g_call 0.48% : 0.000004s : 32: predicate.float_environ_get_switch 0.63% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.51% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.50% : 0.000004s : 32: predicate.incorporate_call 0.47% : 0.000004s : 32: predicate.incorporate_call_switch 5.22% : 0.000042s : 249: predicate.inline 1.18% : 0.000010s : 55: predicate.inline_without_move 0.28% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.57% : 0.000005s : 32: predicate.less_batch_normalization 
1.52% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.49% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.10% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.29% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.03% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.06% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.03% : 0.000008s : 68: predicate.minmaximum_grad 0.33% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.84% : 0.000015s : 101: predicate.partial_defer_inline 1.61% : 0.000013s : 92: predicate.partial_eliminate 0.99% : 0.000008s : 68: predicate.print_const_string_wrapper 0.48% : 0.000004s : 32: predicate.reduce_all_const_elim 4.84% : 0.000039s : 68: predicate.reduce_eliminate 2.50% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000003s : 32: predicate.remove_not_recompute_node 1.82% : 0.000015s : 152: predicate.replace_applicator 0.57% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 0.99% : 0.000008s : 68: predicate.reshape_eliminate 1.04% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 8: predicate.row_tensor_eliminate 1.19% : 0.000010s : 68: predicate.same_eliminate 0.34% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.57% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.58% : 0.000005s : 32: predicate.specialize_transform 1.18% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.07% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.75% : 0.000014s : 101: predicate.switch_defer_inline 2.75% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.67% : 
0.000038s : 277: predicate.switch_simplify 1.01% : 0.000008s : 68: predicate.tile_eliminate 1.00% : 0.000008s : 68: predicate.transpose_eliminate 1.37% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.41% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.24% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.66% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.34% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.87% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.49% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.45% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.03% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 8: predicate.value_based_eliminate 0.51% : 0.000004s : 32: predicate.virtual_dataset_eliminate 4.04% : 0.000033s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001534 34 57.22% : 0.000878s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.78% : 0.000656s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061746 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.87% : 0.003008s : 1: add_attr 4.86% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000125s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.90% : 0.000554s : 1: bootstrap 0.04% : 0.000022s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000052s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.75% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.99% : 0.004936s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.05% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.08% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.70% : 0.010930s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.77% : 0.000473s : 1: opt_after_jit_grad 0.47% : 0.000292s : 1: opt_b 21.33% : 0.013169s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.63% : 0.001624s : 2: renormalize.infer 2.18% : 0.001349s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.08% : 0.000050s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000099s : 1: symbol_engine_optimizer 13.71% : 0.008463s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.09% : 0.011170s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0183292, [24] [bootstrap]: 0.00047248 [type_inference]: 0.00426033 [event_method]: 1.051e-05 [auto_monad]: 4.933e-05 [graph_reusing]: 5.07e-06 [inline]: 1.85001e-06 [add_attr]: 0.00298528, [1] [add_attr_with_inline]: 0.00297649, [1] [Cycle 1]: 4.637e-05, [2] [tag_attr]: 1.182e-05 [meta_addattr_fg_expand]: 3.33998e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.114e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.81998e-06 [pipeline_split]: 1.83002e-06 [optimize]: 0.00366964, [53] [py_interpret_to_execute]: 1.485e-05 [rewriter_before_opt_a]: 3.969e-05 [opt_a]: 0.00183813, [2] [Cycle 1]: 0.00124295, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.458e-05 [loop_unroll]: 1.335e-05 [a_1]: 0.00028777 [with_stream_mark]: 1.367e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.02002e-06 [updatestate_loads_eliminate]: 2.94999e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.669e-05 [accelerated_algorithm]: 6.43e-06 [shard]: 2.27999e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.39001e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 5.66998e-06 [parallel]: 1.737e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.7e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.69999e-06 [get_grad_eliminate_]: 5.21002e-06 [virtual_output]: 5.94e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 [merge_recompute_call_nodes]: 1.31998e-06 [before_grad]: 9.26002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.28998e-06 [receive_attached]: 2.29999e-06 [after_resolve]: 
1.069e-05 [a_after_grad]: 8.57998e-06 [renormalize]: 0.00033787 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.263e-05 [cse]: 2.8e-05 [a_3]: 3.959e-05 [Cycle 2]: 0.00058603, [45] [expand_dump_flag]: 8.10018e-07 [switch_simplify]: 6.58e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012401 [with_stream_mark]: 9.12999e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.743e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.34e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.34002e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.30999e-06 [flash_sp]: 3.07002e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.51998e-06 [matmul_add_comm_reduction]: 5.56e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.00002e-06 [virtual_dataset]: 5.08002e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 4.87e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.82998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.81e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 8.77e-06 [a_after_grad]: 7.9e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.269e-05 [a_3]: 3.132e-05 [py_interpret_to_execute_after_opt_a]: 7.11999e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 3.171e-05 [convert_after_rewriter]: 7.17997e-06 [order_py_execute_after_rewriter]: 4.77e-06 [mutable_eliminate]: 0.0004456 [opt_b]: 0.00017985, [1] [Cycle 1]: 0.00017385, [7] [b_1]: 0.00010577 [b_2]: 7.08e-06 
[updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.20002e-06 [renormalize]: 3.19997e-07 [cse]: 1.66e-05 [optimize_parallel_all_gather_comm]: 1.574e-05 [overlap_param_gather]: 2.05002e-06 [cconv]: 2.233e-05 [loop_unroll]: 0.00044809 [opt_after_cconv]: 9.475e-05, [1] [Cycle 1]: 8.887e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.18002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.53998e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.564e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.349e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.4e-05, [4] [d_1]: 3.889e-05 [none_parameter_eliminate]: 1.40999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.04001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.264e-05 [cse_after_recomputation]: 2.08e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.096e-05 [environ_conv]: 4.33999e-06 [swap_dp_allreduce_reducescatter]: 4.72e-06 [bias_add_comm_swap]: 2.54001e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.01998e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.22e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.148e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 3.40003e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.32e-06 [overlap_recompute_comm]: 2.13998e-06 [overlap_grad_ring_attention]: 3.74002e-06 [overlap_grad_flash_sp]: 1.656e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.759e-05, [1] [Cycle 1]: 6.371e-05, [6] [build]: 2.15002e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.133e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.84e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.74e-06 [auto_monad_reorder]: 1.509e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.0004455 [validate]: 3.057e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00614501 [execute]: 6.60997e-06 Sums bootstrap : 0.000472s : 3.28% type_inference : 0.004260s : 29.60% event_method : 0.000011s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000412s : 2.86% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000338s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.49% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 3.10% optimize.opt_b.b_1 : 0.000106s : 0.73% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000448s : 3.11% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 3.10% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006145s : 42.70% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 17.98% : 0.000021s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.63% : 0.000005s : 4: substitution.graph_param_transform 65.62% : 0.000077s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.93% : 0.000005s : 4: substitution.remove_not_recompute_node 3.02% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004219 2 91.86% : 0.003876s : 1: type_inference.infer 8.14% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.27% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.95% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.70% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.26% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.94% : 0.000001s : 9: predicate.reduce_eliminate 2.27% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.87% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000000s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.79% : 
0.000001s : 9: predicate.tile_eliminate 0.76% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.13% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.07% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 41.93% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.07% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026240 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.39% : 0.002990s : 1: add_attr 11.36% : 0.002980s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000507s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.74% : 0.000457s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.73% : 0.000455s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.90% : 0.000760s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.02% : 0.001841s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.73% : 0.000455s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.00% : 0.003674s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.71% : 0.000186s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.46% : 0.006155s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.29% : 0.004273s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0357199, [24] [bootstrap]: 0.00056309 [type_inference]: 0.0102195 [event_method]: 4.079e-05 [auto_monad]: 0.00011447 [graph_reusing]: 7.75e-06 [inline]: 2.13998e-06 [add_attr]: 0.00300104, [1] [add_attr_with_inline]: 0.00299311, [1] [Cycle 1]: 6.475e-05, [2] [tag_attr]: 3.125e-05 [meta_addattr_fg_expand]: 8.42998e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 4.55e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.0129479, [53] [py_interpret_to_execute]: 3.448e-05 [rewriter_before_opt_a]: 0.00012578 [opt_a]: 0.0107255, [3] [Cycle 1]: 0.00687966, [45] [expand_dump_flag]: 3.37997e-06 [switch_simplify]: 6.553e-05 [loop_unroll]: 5.463e-05 [a_1]: 0.00132806 [with_stream_mark]: 2.248e-05 [recompute_prepare]: 2.161e-05 [updatestate_depend_eliminate]: 9.12001e-06 [updatestate_assign_eliminate]: 7.85998e-06 [updatestate_loads_eliminate]: 7.75e-06 [parameter_eliminate]: 2.37001e-06 [a_2]: 0.00024333 [accelerated_algorithm]: 3.055e-05 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 3.5e-06 [shard_inline]: 1.67e-05 [merge_send_recv]: 1.584e-05 [auto_parallel]: 1.094e-05 [parallel]: 1.943e-05 [flash_sp]: 1.177e-05 [merge_comm]: 9.63002e-06 [allreduce_fusion]: 8.74e-06 [matmul_add_comm_reduction]: 2.603e-05 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 1.814e-05 [virtual_dataset]: 1.575e-05 [get_grad_eliminate_]: 1.519e-05 [virtual_output]: 1.494e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 1.713e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.921e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 2.793e-05 [set_forward_comm_id_for_comm_node_pass]: 9.52001e-06 [meta_fg_expand]: 0.00137611 [flash_sp_send_recv_attached]: 3.88001e-06 [receive_attached]: 2.48e-06 
[after_resolve]: 5.908e-05 [a_after_grad]: 8.062e-05 [renormalize]: 0.00241386 [add_forward_monad_depend]: 8.99e-06 [auto_monad_grad]: 4.91997e-06 [auto_monad_eliminator]: 5.564e-05 [cse]: 0.0001614 [a_3]: 0.00033387 [Cycle 2]: 0.0029266, [45] [expand_dump_flag]: 1.64e-06 [switch_simplify]: 4.694e-05 [loop_unroll]: 4.343e-05 [a_1]: 0.00151876 [with_stream_mark]: 1.179e-05 [recompute_prepare]: 1.119e-05 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 0.00012534 [accelerated_algorithm]: 1.208e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.04998e-06 [merge_send_recv]: 6.73e-06 [auto_parallel]: 7.2e-06 [parallel]: 5.19e-06 [flash_sp]: 3.25e-06 [merge_comm]: 4.94003e-06 [allreduce_fusion]: 4.62e-06 [matmul_add_comm_reduction]: 7.73999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.96e-06 [virtual_dataset]: 8.55001e-06 [get_grad_eliminate_]: 9.54e-06 [virtual_output]: 8.82e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.625e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 1.417e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 3.458e-05 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.562e-05 [a_after_grad]: 1.442e-05 [renormalize]: 0.00056885 [add_forward_monad_depend]: 3.88999e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 1.424e-05 [cse]: 4.562e-05 [a_3]: 6.565e-05 [Cycle 3]: 0.00090548, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 1.067e-05 [loop_unroll]: 9.02999e-06 [a_1]: 0.00026324 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 9.34998e-06 [updatestate_depend_eliminate]: 4.79e-06 [updatestate_assign_eliminate]: 3.92998e-06 [updatestate_loads_eliminate]: 3.84002e-06 
[parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012257 [accelerated_algorithm]: 1.151e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.64998e-06 [shard_inline]: 8.83001e-06 [merge_send_recv]: 6.94999e-06 [auto_parallel]: 6.98e-06 [parallel]: 4.2e-06 [flash_sp]: 1.17e-06 [merge_comm]: 4.82998e-06 [allreduce_fusion]: 4.70999e-06 [matmul_add_comm_reduction]: 7.43e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.62999e-06 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.45999e-06 [virtual_output]: 8.12e-06 [merge_forward]: 4.28999e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 8.45001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.585e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04998e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.337e-05 [a_after_grad]: 1.415e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.016e-05 [cse]: 2.562e-05 [a_3]: 5.937e-05 [py_interpret_to_execute_after_opt_a]: 1.005e-05 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 4.611e-05 [convert_after_rewriter]: 8.87e-06 [order_py_execute_after_rewriter]: 6.69999e-06 [mutable_eliminate]: 0.00045411 [opt_b]: 0.00028559, [1] [Cycle 1]: 0.0002795, [7] [b_1]: 0.00018786 [b_2]: 1.098e-05 [updatestate_depend_eliminate]: 7.5e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 4e-06 [renormalize]: 3.30008e-07 [cse]: 3.045e-05 [optimize_parallel_all_gather_comm]: 1.991e-05 [overlap_param_gather]: 1.86998e-06 [cconv]: 1.881e-05 [loop_unroll]: 0.00042389 [opt_after_cconv]: 0.00013493, [1] [Cycle 1]: 0.00012874, [7] [c_1]: 4.861e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.65e-06 [updatestate_assign_eliminate]: 4.23001e-06 [updatestate_loads_eliminate]: 
3.97e-06 [cse]: 2.847e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 2.792e-05 [tuple_transform]: 0.00010128, [1] [Cycle 1]: 9.67e-05, [4] [d_1]: 6.693e-05 [none_parameter_eliminate]: 1.66998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.64e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 5.847e-05 [cse_after_recomputation]: 3.167e-05, [1] [Cycle 1]: 2.717e-05, [1] [cse]: 2.184e-05 [environ_conv]: 8.43001e-06 [swap_dp_allreduce_reducescatter]: 8.03001e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.40002e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 3.05998e-06 [reorder_send_recv_between_fp_bp]: 2.41998e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 1.10999e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.56002e-06 [control_data_broadcast_order]: 1.75e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.87998e-06 [overlap_recompute_and_grad_model_parallel]: 5.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.54e-06 [overlap_recompute_allgather_and_fa_grad]: 1.46002e-06 [overlap_recompute_comm]: 2.66999e-06 [overlap_grad_ring_attention]: 5.15001e-06 [overlap_grad_flash_sp]: 2.31e-05 [begin_end_overlap_inline]: 4.99975e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 9.725e-05, [1] [Cycle 1]: 9.292e-05, [6] [build]: 9.74e-06 [elim_shapecalc]: 1.298e-05 [elim_not_effective]: 1.787e-05 [opt_reshape]: 1.001e-05 [fold_const_symbol]: 1.462e-05 [renormalize]: 1.80007e-07 [detach_backward]: 1.75001e-06 
[pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.507e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.24001e-06 [opt_after_jit_grad]: 0.00046442 [validate]: 4.5e-05 [backend_pass]: 9.89996e-07 [task_emit]: 0.00801591 [execute]: 7.03998e-06 Sums bootstrap : 0.000563s : 1.79% type_inference : 0.010220s : 32.49% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000034s : 0.11% optimize.rewriter_before_opt_a : 0.000126s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003110s : 9.89% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000491s : 1.56% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000019s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.11% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001414s : 4.49% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000109s : 0.35% optimize.opt_a.renormalize : 0.002983s : 9.48% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000233s : 0.74% optimize.opt_a.a_3 : 0.000459s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000454s : 1.44% optimize.opt_b.b_1 : 0.000188s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000424s : 1.35% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000028s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.19% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000464s : 1.48% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008016s : 25.48% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000727 218 5.83% : 0.000042s : 11: substitution.arithmetic_simplify 2.11% : 0.000015s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.44% : 0.000396s : 16: substitution.inline 2.11% : 0.000015s : 2: substitution.inline_without_move 1.40% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.01% : 0.000015s : 3: substitution.less_batch_normalization 1.80% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.93% : 0.000014s : 20: substitution.remove_not_recompute_node 3.21% : 0.000023s : 10: substitution.replace_applicator 1.44% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.85% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.65% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010151 2 87.49% : 0.008882s : 1: type_inference.infer 12.51% : 0.001269s : 1: type_inference.specialize ------[replace.] 0.000202 30 58.59% : 0.000118s : 16: replace.inline 41.41% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000419 30 92.53% : 0.000387s : 16: match.inline 7.47% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000748 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.20% : 0.000016s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.54% : 0.000041s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.58% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.60% : 0.000019s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.14% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.17% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.90% : 0.000014s : 97: predicate.partial_defer_inline 1.67% : 0.000012s : 89: predicate.partial_eliminate 1.04% : 0.000008s : 67: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000010s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.28% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 97: predicate.switch_defer_inline 2.91% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.81% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 3.22% : 0.000024s : 83: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.56% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.20% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001456 32 58.14% : 0.000847s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.86% : 0.000610s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059732 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.03% : 0.003005s : 1: add_attr 5.02% : 0.002997s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000063s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 1.00% : 0.000598s : 1: bootstrap 0.04% : 0.000022s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000021s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.96% : 0.004753s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.96% : 0.010729s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.79% : 0.000474s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.68% : 0.012952s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.06% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.61% : 0.001558s : 2: renormalize.infer 2.36% : 0.001412s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.22% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.44% : 0.008026s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.13% : 0.010234s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x1-kbk],max_mem:60.0M TotalTime = 0.0784705, [24] [bootstrap]: 0.00054239 [type_inference]: 0.00598775 [event_method]: 1.383e-05 [auto_monad]: 5.469e-05 [graph_reusing]: 5.03002e-06 [inline]: 1.77999e-06 [add_attr]: 0.00345505, [1] [add_attr_with_inline]: 0.00344425, [1] [Cycle 1]: 4.398e-05, [2] [tag_attr]: 1.449e-05 [meta_addattr_fg_expand]: 4.23999e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.749e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.00394571, [53] [py_interpret_to_execute]: 2.01e-05 [rewriter_before_opt_a]: 5.776e-05 [opt_a]: 0.002112, [2] [Cycle 1]: 0.00152101, [45] [expand_dump_flag]: 2.82002e-06 [switch_simplify]: 3.248e-05 [loop_unroll]: 2.127e-05 [a_1]: 0.00045052 [with_stream_mark]: 1.294e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.61998e-06 [a_2]: 7.453e-05 [accelerated_algorithm]: 6.19001e-06 [shard]: 1.84e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.73002e-06 [merge_send_recv]: 7.33e-06 [auto_parallel]: 5.22999e-06 [parallel]: 2.295e-05 [flash_sp]: 7.47998e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.31002e-06 [virtual_output]: 5.52001e-06 [merge_forward]: 4.02002e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.42001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.03e-05 
[merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.41003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.30003e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 2.74001e-06 [receive_attached]: 2.46e-06 [after_resolve]: 1.028e-05 [a_after_grad]: 9.42001e-06 [renormalize]: 0.00040107 [add_forward_monad_depend]: 4.34002e-06 [auto_monad_grad]: 1.69998e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.712e-05 [a_3]: 4.023e-05 [Cycle 2]: 0.00058188, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.71999e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012442 [with_stream_mark]: 1.012e-05 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 7.39994e-07 [a_2]: 6.718e-05 [accelerated_algorithm]: 5.64e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.35999e-06 [merge_send_recv]: 4.39002e-06 [auto_parallel]: 5.04e-06 [parallel]: 4.16001e-06 [flash_sp]: 2.97002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 5.22999e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.34999e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.15999e-06 [merge_recompute_call_nodes]: 6.49976e-07 [before_grad]: 7.88999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.84999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.23002e-06 [a_after_grad]: 7.85998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.11998e-06 [cse]: 1.191e-05 [a_3]: 3.129e-05 [py_interpret_to_execute_after_opt_a]: 7.08998e-06 
[slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.014e-05 [convert_after_rewriter]: 6.46e-06 [order_py_execute_after_rewriter]: 4.80001e-06 [mutable_eliminate]: 0.00045537 [opt_b]: 0.00018256, [1] [Cycle 1]: 0.00017634, [7] [b_1]: 0.00010836 [b_2]: 7.15998e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 5.50004e-07 [cse]: 1.593e-05 [optimize_parallel_all_gather_comm]: 1.557e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.177e-05 [loop_unroll]: 0.00041146 [opt_after_cconv]: 9.467e-05, [1] [Cycle 1]: 8.892e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.71998e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.569e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 1.157e-05 [tuple_transform]: 6.727e-05, [1] [Cycle 1]: 6.315e-05, [4] [d_1]: 3.809e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.23998e-06 [partial_unused_args_eliminate]: 1.79998e-06 [add_recomputation]: 4.82e-05 [cse_after_recomputation]: 1.985e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.072e-05 [environ_conv]: 4.42e-06 [swap_dp_allreduce_reducescatter]: 4.89998e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.01001e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.49998e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.71999e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.39998e-06 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 1.29998e-06 [interleave_split_concat_branches]: 1.38002e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.27e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.96998e-06 [control_data_broadcast_order]: 1.123e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.47998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.47999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.11998e-06 [overlap_grad_ring_attention]: 3.83999e-06 [overlap_grad_flash_sp]: 1.744e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 1.44e-06 [symbol_engine_optimizer]: 6.764e-05, [1] [Cycle 1]: 6.37e-05, [6] [build]: 2.09e-06 [elim_shapecalc]: 8.73001e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 9.15001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.514e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044718 [validate]: 2.995e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0637162 [execute]: 8.18999e-06 Sums bootstrap : 0.000542s : 0.73% type_inference : 0.005988s : 8.09% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000575s : 0.78% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000010s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000019s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000401s : 0.54% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000455s : 0.62% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000411s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01%
optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01%
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00%
detach_backward : 0.000002s : 0.00%
pipeline_parallel_scheduler : 0.000001s : 0.00%
auto_monad_reorder : 0.000015s : 0.02%
get_jit_bprop_graph : 0.000001s : 0.00%
rewriter_after_jit_bprop_graph : 0.000003s : 0.00%
opt_after_jit_grad : 0.000447s : 0.60%
validate : 0.000030s : 0.04%
backend_pass : 0.000001s : 0.00%
task_emit : 0.063716s : 86.07%
execute : 0.000008s : 0.01%

Time group info:
------[substitution.] 0.000163 30
    14.47% : 0.000024s : 5: substitution.arithmetic_simplify
    1.08% : 0.000002s : 2: substitution.elim_not_effective
    0.93% : 0.000002s : 2: substitution.fold_const_symbol
    3.29% : 0.000005s : 4: substitution.graph_param_transform
    66.86% : 0.000109s : 3: substitution.inline
    1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch
    2.48% : 0.000004s : 4: substitution.remove_not_recompute_node
    2.57% : 0.000004s : 4: substitution.replace_old_param
    6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator
------[type_inference.] 0.005943 2
    90.83% : 0.005397s : 1: type_inference.infer
    9.17% : 0.000545s : 1: type_inference.specialize
------[replace.] 0.000040 5
    70.83% : 0.000028s : 3: replace.inline
    29.17% : 0.000012s : 2: replace.tuple_list_get_item_eliminator
------[match.] 0.000117 5
    91.63% : 0.000107s : 3: match.inline
    8.37% : 0.000010s : 2: match.tuple_list_get_item_eliminator
------[predicate.]
0.000159 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.16% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.55% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.41% : 0.000004s : 32: predicate.load_eliminater 1.13% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.90% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.44% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 47.59% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.41% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087359 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.96% : 0.003459s : 1: add_attr 3.95% : 0.003448s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000579s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.07% : 0.000937s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.42% : 0.002115s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.52% : 0.000456s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.52% : 0.003949s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000207s : 1: renormalize.infer 0.21% : 0.000187s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 72.95% : 0.063732s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 6.87% : 0.006001s : 1: type_inference 0.06% : 0.000055s : 1: validate TotalTime = 0.0703455, [24] [bootstrap]: 0.00047837 [type_inference]: 0.00437366 [event_method]: 1.069e-05 [auto_monad]: 4.949e-05 [graph_reusing]: 5.06002e-06 [inline]: 1.64998e-06 [add_attr]: 0.00337625, [1] [add_attr_with_inline]: 0.00336791, [1] [Cycle 1]: 5.025e-05, [2] [tag_attr]: 1.157e-05 [meta_addattr_fg_expand]: 3.44001e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 2.324e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.86003e-06 [optimize]: 0.00375082, [53] [py_interpret_to_execute]: 1.619e-05 [rewriter_before_opt_a]: 4.06e-05 [opt_a]: 0.00192089, [2] [Cycle 1]: 0.00132208, [45] [expand_dump_flag]: 2.99001e-06 [switch_simplify]: 2.343e-05 [loop_unroll]: 1.371e-05 [a_1]: 0.00029527 [with_stream_mark]: 1.356e-05 [recompute_prepare]: 7.44002e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 2.13002e-06 [a_2]: 7.643e-05 [accelerated_algorithm]: 6.66999e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 7.80998e-06 [auto_parallel]: 5.77001e-06 [parallel]: 1.731e-05 [flash_sp]: 8.45999e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 3.64002e-06 [matmul_add_comm_reduction]: 8.50001e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.78002e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 3.52002e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 8.95999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.25001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.59999e-06 [receive_attached]: 
2.39999e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 9.20001e-06 [renormalize]: 0.00040792 [add_forward_monad_depend]: 4.82e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.363e-05 [cse]: 2.754e-05 [a_3]: 4.039e-05 [Cycle 2]: 0.00058935, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.39001e-06 [loop_unroll]: 5.36998e-06 [a_1]: 0.00012598 [with_stream_mark]: 1.075e-05 [recompute_prepare]: 5.62001e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.12999e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.736e-05 [accelerated_algorithm]: 5.27001e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.42001e-06 [merge_send_recv]: 4.15e-06 [auto_parallel]: 4.87998e-06 [parallel]: 4.18001e-06 [flash_sp]: 3.08e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.25999e-06 [allreduce_slice_to_reducescatter]: 2.19996e-07 [virtual_shard_identity]: 6.29999e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 5.88002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.38002e-06 [merge_recompute_call_nodes]: 6.49976e-07 [before_grad]: 7.66999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26999e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 8.94e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 9.79984e-07 [auto_monad_eliminator]: 5.87001e-06 [cse]: 1.251e-05 [a_3]: 3.195e-05 [py_interpret_to_execute_after_opt_a]: 7.19001e-06 [slice_cell_reuse_recomputed_activation]: 1.82001e-06 [rewriter_after_opt_a]: 3.008e-05 [convert_after_rewriter]: 6.43e-06 [order_py_execute_after_rewriter]: 5.57001e-06 [mutable_eliminate]: 0.00046836 [opt_b]: 0.00018006, [1] [Cycle 1]: 
0.00017383, [7] [b_1]: 0.00010749 [b_2]: 7.16999e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.50003e-07 [cse]: 1.587e-05 [optimize_parallel_all_gather_comm]: 1.515e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.323e-05 [loop_unroll]: 0.00041736 [opt_after_cconv]: 9.574e-05, [1] [Cycle 1]: 9.002e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.77002e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.638e-05 [renormalize]: 1.8999e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.846e-05, [1] [Cycle 1]: 6.414e-05, [4] [d_1]: 3.885e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.325e-05 [cse_after_recomputation]: 2.094e-05, [1] [Cycle 1]: 1.634e-05, [1] [cse]: 1.084e-05 [environ_conv]: 4.94e-06 [swap_dp_allreduce_reducescatter]: 5.12e-06 [bias_add_comm_swap]: 3.2e-06 [label_micro_interleaved_index]: 4.32e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.17999e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 7.59988e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.90025e-07 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.121e-05 [grouped_pairwise_exchange_alltoall]: 1.56998e-06 [offloading_packed_experts]: 3.63e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.49998e-06 [overlap_recompute_comm]: 2.03002e-06 [overlap_grad_ring_attention]: 3.99002e-06 [overlap_grad_flash_sp]: 1.722e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.99999e-06 [handle_group_info]: 1.38002e-06 [symbol_engine_optimizer]: 6.841e-05, [1] [Cycle 1]: 6.432e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.35001e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 1.549e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.0004495 [validate]: 3.127e-05 [backend_pass]: 9.10019e-07 [task_emit]: 0.0575568 [execute]: 7.82e-06 Sums bootstrap : 0.000478s : 0.72% type_inference : 0.004374s : 6.63% event_method : 0.000011s : 0.02% auto_monad : 0.000049s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000023s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000041s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000421s : 0.64% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000012s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000408s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000468s : 0.71% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.04% optimize.loop_unroll : 0.000417s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057557s : 87.19% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000122 26 17.51% : 0.000021s : 4: substitution.arithmetic_simplify 1.69% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000005s : 4: substitution.graph_param_transform 66.88% : 0.000081s : 2: substitution.inline 2.20% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.28% : 0.000004s : 4: substitution.remove_not_recompute_node 3.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004334 2 91.73% : 0.003976s : 1: type_inference.infer 8.27% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000080 2 100.00% : 0.000080s : 2: match.inline ------[predicate.] 
0.000140 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 1.19% : 0.000002s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.59% : 0.000004s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.92% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.11% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.09% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.74% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.80% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.65% : 0.000001s : 4: predicate.parallel_virtual_node 1.18% : 0.000002s : 11: predicate.partial_defer_inline 1.19% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.24% : 0.000002s : 9: predicate.reduce_eliminate 2.05% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.13% : 0.000002s : 8: predicate.shard_identity_eliminate 1.03% : 0.000001s : 8: predicate.special_op_eliminate 1.00% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.23% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.95% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.03% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.19% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.55% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000266 6 42.48% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.52% : 0.000153s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078807 196 0.00% : 0.000004s : 1: ForceFp32Comm 4.29% : 0.003381s : 1: add_attr 4.28% : 0.003372s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000511s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.61% : 0.000478s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000770s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.44% : 0.001924s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000459s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.76% : 0.003755s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000028s : 1: pre_auto_parallel 0.03% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.28% : 0.000217s : 1: renormalize.infer 0.23% : 0.000185s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000045s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 73.05% : 0.057572s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.57% : 0.004387s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.0723105, [24] [bootstrap]: 0.00047097 [type_inference]: 0.00549424 [event_method]: 1.439e-05 [auto_monad]: 5.431e-05 [graph_reusing]: 5.51e-06 [inline]: 1.57001e-06 [add_attr]: 0.00298436, [1] [add_attr_with_inline]: 0.00297649, [1] [Cycle 1]: 4.463e-05, [2] [tag_attr]: 1.502e-05 [meta_addattr_fg_expand]: 3.93001e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.535e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00396421, [53] [py_interpret_to_execute]: 1.953e-05 [rewriter_before_opt_a]: 5.925e-05 [opt_a]: 0.00214788, [2] [Cycle 1]: 0.00154275, [45] [expand_dump_flag]: 2.74999e-06 [switch_simplify]: 3.175e-05 [loop_unroll]: 2.058e-05 [a_1]: 0.00044374 [with_stream_mark]: 1.36e-05 [recompute_prepare]: 7.89002e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.65002e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.629e-05 [accelerated_algorithm]: 6.12999e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 7.36001e-06 [auto_parallel]: 5.38002e-06 [parallel]: 1.717e-05 [flash_sp]: 6.94999e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.49001e-06 [matmul_add_comm_reduction]: 8.47998e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.09001e-06 [virtual_dataset]: 5.61998e-06 [get_grad_eliminate_]: 5.46002e-06 [virtual_output]: 5.56998e-06 [merge_forward]: 3.69002e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.39998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.044e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 8.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 
[receive_attached]: 2.34001e-06 [after_resolve]: 9.72001e-06 [a_after_grad]: 8.74998e-06 [renormalize]: 0.0004654 [add_forward_monad_depend]: 4.87e-06 [auto_monad_grad]: 1.71002e-06 [auto_monad_eliminator]: 1.298e-05 [cse]: 2.826e-05 [a_3]: 4.117e-05 [Cycle 2]: 0.00059571, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.73998e-06 [loop_unroll]: 5.27999e-06 [a_1]: 0.00012476 [with_stream_mark]: 9.77999e-06 [recompute_prepare]: 5.60001e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.805e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.58002e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.33001e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.05e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.13002e-06 [virtual_output]: 6.55997e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.46002e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 2.97002e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 1.04998e-06 [receive_attached]: 1.04e-06 [after_resolve]: 8.77999e-06 [a_after_grad]: 7.87e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 7.99977e-07 [auto_monad_eliminator]: 6.34001e-06 [cse]: 1.316e-05 [a_3]: 3.298e-05 [py_interpret_to_execute_after_opt_a]: 7.42002e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 2.963e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 4.78001e-06 [mutable_eliminate]: 0.00044832 [opt_b]: 0.00018013, 
[1] [Cycle 1]: 0.0001739, [7] [b_1]: 0.00010707 [b_2]: 6.85002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.33002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.59985e-07 [cse]: 1.602e-05 [optimize_parallel_all_gather_comm]: 1.506e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.123e-05 [loop_unroll]: 0.00041011 [opt_after_cconv]: 9.435e-05, [1] [Cycle 1]: 8.85e-05, [7] [c_1]: 2.698e-05 [parameter_eliminate]: 2.55002e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.583e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.153e-05 [tuple_transform]: 6.867e-05, [1] [Cycle 1]: 6.43e-05, [4] [d_1]: 3.868e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.248e-05 [cse_after_recomputation]: 2.112e-05, [1] [Cycle 1]: 1.676e-05, [1] [cse]: 1.138e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 4.87998e-06 [bias_add_comm_swap]: 2.16998e-06 [label_micro_interleaved_index]: 3.88001e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.78e-06 [assign_add_opt]: 1.12999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.76e-06 [comm_op_add_attrs]: 1.31998e-06 [add_comm_op_reuse_tag]: 1.10001e-06 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97001e-06 [control_data_broadcast_order]: 1.182e-05 [grouped_pairwise_exchange_alltoall]: 2.01e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.43001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.69998e-06 [overlap_recompute_comm]: 2.16998e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.664e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.772e-05, [1] [Cycle 1]: 6.384e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.18999e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.89e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.74998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00044538 [validate]: 3.087e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.058582 [execute]: 7.81001e-06 Sums bootstrap : 0.000471s : 0.69% type_inference : 0.005494s : 8.04% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000568s : 0.83% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000012s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000016s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000018s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000465s : 0.68% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.66% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000410s : 0.60% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000445s : 0.65% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058582s : 85.69% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000161 30 15.57% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000005s : 4: substitution.graph_param_transform 66.48% : 0.000107s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 2.21% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005453 2 90.01% : 0.004908s : 1: type_inference.infer 9.99% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.77% : 0.000027s : 3: replace.inline 30.23% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.95% : 0.000105s : 3: match.inline 8.05% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 1.05% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.78% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000009s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.45% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.88% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.40% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.77% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 1.03% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 47.05% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.95% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080813 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.70% : 0.002989s : 1: add_attr 3.69% : 0.002980s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.62% : 0.000505s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000419s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.15% : 0.000933s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.66% : 0.002151s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.56% : 0.000455s : 1: opt_after_jit_grad 0.23% : 0.000184s : 1: opt_b 4.91% : 0.003968s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.32% : 0.000260s : 1: renormalize.infer 0.25% : 0.000199s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 72.51% : 0.058599s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.82% : 0.005508s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.111593, [24] [bootstrap]: 0.00054691 [type_inference]: 0.0114784 [event_method]: 4.968e-05 [auto_monad]: 0.0001191 [graph_reusing]: 8.40999e-06 [inline]: 1.74998e-06 [add_attr]: 0.00306713, [1] [add_attr_with_inline]: 0.00305862, [1] [Cycle 1]: 7.167e-05, [2] [tag_attr]: 3.384e-05 [meta_addattr_fg_expand]: 9.32001e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 4.996e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0135875, [53] [py_interpret_to_execute]: 3.811e-05 [rewriter_before_opt_a]: 0.00014494 [opt_a]: 0.0112055, [3] [Cycle 1]: 0.00717584, [45] [expand_dump_flag]: 3.58999e-06 [switch_simplify]: 7.405e-05 [loop_unroll]: 6.159e-05 [a_1]: 0.00146212 [with_stream_mark]: 2.298e-05 [recompute_prepare]: 2.189e-05 [updatestate_depend_eliminate]: 8.92999e-06 [updatestate_assign_eliminate]: 7.82002e-06 [updatestate_loads_eliminate]: 7.23e-06 [parameter_eliminate]: 2.56998e-06 [a_2]: 0.00024418 [accelerated_algorithm]: 3.05e-05 [shard]: 1.74e-06 [meta_shard_fg_expand]: 3.55e-06 [shard_inline]: 1.605e-05 [merge_send_recv]: 1.528e-05 [auto_parallel]: 1.071e-05 [parallel]: 1.789e-05 [flash_sp]: 1.197e-05 [merge_comm]: 9.31e-06 [allreduce_fusion]: 9.04998e-06 [matmul_add_comm_reduction]: 2.589e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.831e-05 [virtual_dataset]: 1.552e-05 [get_grad_eliminate_]: 1.506e-05 [virtual_output]: 1.505e-05 [merge_forward]: 9.51e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.711e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.883e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 2.7e-05 [set_forward_comm_id_for_comm_node_pass]: 9.51e-06 [meta_fg_expand]: 0.00141804 [flash_sp_send_recv_attached]: 3.57002e-06 [receive_attached]: 2.49001e-06 [after_resolve]: 
5.984e-05 [a_after_grad]: 8.205e-05 [renormalize]: 0.00252854 [add_forward_monad_depend]: 9.00999e-06 [auto_monad_grad]: 5.15001e-06 [auto_monad_eliminator]: 5.707e-05 [cse]: 0.00016755 [a_3]: 0.00033636 [Cycle 2]: 0.00311072, [45] [expand_dump_flag]: 1.96e-06 [switch_simplify]: 4.7e-05 [loop_unroll]: 4.395e-05 [a_1]: 0.00157164 [with_stream_mark]: 1.387e-05 [recompute_prepare]: 1.14e-05 [updatestate_depend_eliminate]: 5.76e-06 [updatestate_assign_eliminate]: 4.57e-06 [updatestate_loads_eliminate]: 3.85998e-06 [parameter_eliminate]: 1.14e-06 [a_2]: 0.00012674 [accelerated_algorithm]: 1.255e-05 [shard]: 1.24e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.34998e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.85e-06 [parallel]: 5.30999e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 5.64998e-06 [allreduce_fusion]: 5.47999e-06 [matmul_add_comm_reduction]: 8.52998e-06 [allreduce_slice_to_reducescatter]: 4.80009e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.46002e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.629e-05 [merge_recompute_call_nodes]: 8.09989e-07 [before_grad]: 1.396e-05 [set_forward_comm_id_for_comm_node_pass]: 5.21998e-06 [meta_fg_expand]: 7.135e-05 [flash_sp_send_recv_attached]: 1.09998e-06 [receive_attached]: 1.26002e-06 [after_resolve]: 1.628e-05 [a_after_grad]: 1.419e-05 [renormalize]: 0.00064152 [add_forward_monad_depend]: 4.75999e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.532e-05 [cse]: 4.964e-05 [a_3]: 6.501e-05 [Cycle 3]: 0.00090452, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 1.058e-05 [loop_unroll]: 8.77e-06 [a_1]: 0.00025053 [with_stream_mark]: 9.87001e-06 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 3.90998e-06 [updatestate_loads_eliminate]: 3.66001e-06 
[parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012299 [accelerated_algorithm]: 1.183e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 9.08002e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.23999e-06 [parallel]: 4.94e-06 [flash_sp]: 1.04e-06 [merge_comm]: 5.25001e-06 [allreduce_fusion]: 4.99e-06 [matmul_add_comm_reduction]: 7.54002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 9.86e-06 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.33999e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 1.49e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.567e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.363e-05 [set_forward_comm_id_for_comm_node_pass]: 5.14e-06 [meta_fg_expand]: 2.95002e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.47e-05 [a_after_grad]: 1.482e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 1.133e-05 [cse]: 2.706e-05 [a_3]: 5.896e-05 [py_interpret_to_execute_after_opt_a]: 1.122e-05 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 4.79e-05 [convert_after_rewriter]: 9.40001e-06 [order_py_execute_after_rewriter]: 6.89999e-06 [mutable_eliminate]: 0.00049327 [opt_b]: 0.0002897, [1] [Cycle 1]: 0.00028307, [7] [b_1]: 0.00018882 [b_2]: 1.079e-05 [updatestate_depend_eliminate]: 7.46001e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.78999e-06 [renormalize]: 4.49974e-07 [cse]: 3.242e-05 [optimize_parallel_all_gather_comm]: 2.021e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.009e-05 [loop_unroll]: 0.00043024 [opt_after_cconv]: 0.00013842, [1] [Cycle 1]: 0.00013226, [7] [c_1]: 4.885e-05 [parameter_eliminate]: 2.44001e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.13999e-06 
[updatestate_loads_eliminate]: 4.22998e-06 [cse]: 3.061e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 3.091e-05 [tuple_transform]: 0.00016953, [1] [Cycle 1]: 0.00016478, [4] [d_1]: 0.00013391 [none_parameter_eliminate]: 1.74998e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 2.31e-06 [add_recomputation]: 5.921e-05 [cse_after_recomputation]: 3.444e-05, [1] [Cycle 1]: 2.89e-05, [1] [cse]: 2.349e-05 [environ_conv]: 9.46e-06 [swap_dp_allreduce_reducescatter]: 7.43999e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.40999e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.23002e-06 [micro_interleaved_order_control]: 2.42001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.53998e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.40999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.66e-06 [control_data_broadcast_order]: 1.7e-05 [grouped_pairwise_exchange_alltoall]: 2.01e-06 [offloading_packed_experts]: 5.12e-06 [overlap_recompute_and_grad_model_parallel]: 5.53002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.48002e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.481e-05 [begin_end_overlap_inline]: 4.49974e-07 [split_matmul_comm_elemetwise]: 2.36998e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 9.69972e-07 [symbol_engine_optimizer]: 9.771e-05, [1] [Cycle 1]: 9.332e-05, [6] [build]: 9.34998e-06 [elim_shapecalc]: 1.355e-05 [elim_not_effective]: 1.779e-05 [opt_reshape]: 1.005e-05 [fold_const_symbol]: 1.463e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 2.23002e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 2.49e-05 [get_jit_bprop_graph]: 1.10999e-06 [rewriter_after_jit_bprop_graph]: 3.66001e-06 [opt_after_jit_grad]: 0.0004746 [validate]: 4.674e-05 [backend_pass]: 9.49978e-07 [task_emit]: 0.081902 [execute]: 8.48999e-06 Sums bootstrap : 0.000547s : 0.51% type_inference : 0.011478s : 10.70% event_method : 0.000050s : 0.05% auto_monad : 0.000119s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000145s : 0.14% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.12% optimize.opt_a.loop_unroll : 0.000114s : 0.11% optimize.opt_a.a_1 : 0.003284s : 3.06% optimize.opt_a.with_stream_mark : 0.000047s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.46% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001492s : 1.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000091s : 0.08% optimize.opt_a.a_after_grad : 0.000111s : 0.10% optimize.opt_a.renormalize : 0.003170s : 2.96% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.08% optimize.opt_a.cse : 0.000244s : 0.23% optimize.opt_a.a_3 : 0.000460s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000048s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000493s : 0.46% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000430s : 0.40% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000031s : 0.03% optimize.tuple_transform.d_1 : 0.000134s : 0.12% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000475s : 0.44% validate : 0.000047s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.081902s : 76.36% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000770 222 6.13% : 0.000047s : 12: substitution.arithmetic_simplify 1.91% : 0.000015s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 55.54% : 0.000428s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.26% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.92% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.82% : 0.000014s : 20: substitution.remove_not_recompute_node 3.29% : 0.000025s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.49% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.77% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.47% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.56% : 0.000020s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011403 2 86.79% : 0.009897s : 1: type_inference.infer 13.21% : 0.001506s : 1: type_inference.specialize ------[replace.] 0.000234 33 60.29% : 0.000141s : 17: replace.inline 39.71% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000452 33 92.58% : 0.000419s : 17: match.inline 7.42% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000815 5764 1.00% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.46% : 0.000004s : 32: predicate.addn_check_dump 0.98% : 0.000008s : 68: predicate.addn_zero_filter 0.97% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.90% : 0.000015s : 100: predicate.arithmetic_simplify 1.05% : 0.000009s : 68: predicate.cast_eliminate 1.07% : 0.000009s : 68: predicate.check_bprop_eliminate 0.47% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.08% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.02% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.34% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.14% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.09% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.10% : 0.000009s : 76: predicate.environ_get_depend_swap 1.61% : 0.000013s : 108: predicate.environ_get_eliminate 1.08% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.60% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.11% : 0.000017s : 101: predicate.float_depend_g_call 0.47% : 0.000004s : 32: predicate.float_environ_get_switch 0.62% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.07% : 0.000001s : 8: predicate.fold_const_symbol 0.51% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.15% : 0.000042s : 249: predicate.inline 1.16% : 0.000009s : 55: predicate.inline_without_move 0.29% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.60% : 0.000005s : 32: predicate.less_batch_normalization 
1.52% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.46% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.11% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.25% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.51% : 0.000004s : 32: predicate.merge_addn 1.02% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.04% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 1.87% : 0.000015s : 101: predicate.partial_defer_inline 1.56% : 0.000013s : 92: predicate.partial_eliminate 1.00% : 0.000008s : 68: predicate.print_const_string_wrapper 0.48% : 0.000004s : 32: predicate.reduce_all_const_elim 1.19% : 0.000010s : 68: predicate.reduce_eliminate 2.47% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.30% : 0.000002s : 32: predicate.remove_not_recompute_node 1.78% : 0.000014s : 152: predicate.replace_applicator 0.56% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 0.98% : 0.000008s : 68: predicate.reshape_eliminate 1.04% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.13% : 0.000001s : 8: predicate.row_tensor_eliminate 1.17% : 0.000010s : 68: predicate.same_eliminate 0.33% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.26% : 0.000002s : 16: predicate.special_op_eliminate 0.57% : 0.000005s : 32: predicate.specialize_transform 1.17% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.08% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.74% : 0.000014s : 101: predicate.switch_defer_inline 2.74% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.65% : 
0.000038s : 277: predicate.switch_simplify 0.99% : 0.000008s : 68: predicate.tile_eliminate 0.98% : 0.000008s : 68: predicate.transpose_eliminate 1.32% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.68% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.33% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.80% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 9.55% : 0.000078s : 100: predicate.tuple_to_list_eliminator_ 2.43% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.01% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.13% : 0.000001s : 8: predicate.value_based_eliminate 0.52% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.50% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.15% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001612 34 56.66% : 0.000913s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.34% : 0.000699s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.136758 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.25% : 0.003071s : 1: add_attr 2.24% : 0.003062s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000127s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000581s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000057s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000439s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000502s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.62% : 0.004949s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.10% : 0.000142s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.20% : 0.011209s : 1: opt_a 0.10% : 0.000142s : 1: opt_after_cconv 0.35% : 0.000484s : 1: opt_after_jit_grad 0.21% : 0.000293s : 1: opt_b 9.94% : 0.013591s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000035s : 1: remove_dup_value 1.26% : 0.001717s : 2: renormalize.infer 1.05% : 0.001440s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000052s : 1: rewriter_after_opt_a 0.11% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.07% : 0.000100s : 1: symbol_engine_optimizer 59.90% : 0.081920s : 1: task_emit 0.13% : 0.000172s : 1: 
tuple_transform 8.40% : 0.011493s : 1: type_inference 0.05% : 0.000071s : 1: validate TotalTime = 0.0696873, [24] [bootstrap]: 0.00046675 [type_inference]: 0.00425686 [event_method]: 1.053e-05 [auto_monad]: 5.046e-05 [graph_reusing]: 4.87e-06 [inline]: 1.79998e-06 [add_attr]: 0.00298105, [1] [add_attr_with_inline]: 0.00297307, [1] [Cycle 1]: 4.377e-05, [2] [tag_attr]: 1.155e-05 [meta_addattr_fg_expand]: 2.99999e-06 [parallel-infer-symbol]: 2.87002e-06 [pre_auto_parallel]: 2.189e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.76e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00367293, [53] [py_interpret_to_execute]: 1.509e-05 [rewriter_before_opt_a]: 3.856e-05 [opt_a]: 0.00183775, [2] [Cycle 1]: 0.00124366, [45] [expand_dump_flag]: 2.63e-06 [switch_simplify]: 2.41e-05 [loop_unroll]: 1.355e-05 [a_1]: 0.00028903 [with_stream_mark]: 1.307e-05 [recompute_prepare]: 7.41001e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 2.96001e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 1.58002e-06 [a_2]: 7.616e-05 [accelerated_algorithm]: 6.59001e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.78999e-06 [auto_parallel]: 5.69999e-06 [parallel]: 1.851e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.81001e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 8.92e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.3e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.43999e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 9.42001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.074e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 8.94e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.36e-06 [receive_attached]: 2.01e-06 
[after_resolve]: 1.017e-05 [a_after_grad]: 9.05999e-06 [renormalize]: 0.00034187 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.66998e-06 [auto_monad_eliminator]: 1.263e-05 [cse]: 2.569e-05 [a_3]: 3.953e-05 [Cycle 2]: 0.00058483, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.66999e-06 [loop_unroll]: 5.52999e-06 [a_1]: 0.000124 [with_stream_mark]: 9.11002e-06 [recompute_prepare]: 5.51002e-06 [updatestate_depend_eliminate]: 2.63998e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.55002e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 6.673e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.51e-06 [parallel]: 3.90998e-06 [flash_sp]: 3.38e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.94001e-06 [matmul_add_comm_reduction]: 4.82998e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.19e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.94003e-06 [merge_forward]: 2.34999e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.54e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.54998e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.66e-06 [a_after_grad]: 7.98001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 8.39995e-07 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.183e-05 [a_3]: 3.156e-05 [py_interpret_to_execute_after_opt_a]: 7.40998e-06 [slice_cell_reuse_recomputed_activation]: 1.99999e-06 [rewriter_after_opt_a]: 2.963e-05 [convert_after_rewriter]: 7.02002e-06 [order_py_execute_after_rewriter]: 5.20999e-06 [mutable_eliminate]: 0.00044786 [opt_b]: 0.00017878, [1] [Cycle 1]: 0.00017298, 
[7] [b_1]: 0.00010689 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 3.59985e-07 [cse]: 1.55e-05 [optimize_parallel_all_gather_comm]: 1.5e-05 [overlap_param_gather]: 1.73997e-06 [cconv]: 2.195e-05 [loop_unroll]: 0.0004559 [opt_after_cconv]: 9.441e-05, [1] [Cycle 1]: 8.843e-05, [7] [c_1]: 2.749e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.20002e-06 [cse]: 1.569e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.213e-05 [tuple_transform]: 6.819e-05, [1] [Cycle 1]: 6.377e-05, [4] [d_1]: 3.864e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.21e-06 [partial_unused_args_eliminate]: 2.09999e-06 [add_recomputation]: 4.231e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.625e-05, [1] [cse]: 1.095e-05 [environ_conv]: 4.89e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.67001e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.32999e-06 [slice_recompute_activation]: 2.06998e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.60002e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.10999e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.158e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.59002e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.16997e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 2.17999e-06 [handle_group_info]: 1.00999e-06 [symbol_engine_optimizer]: 6.84e-05, [1] [Cycle 1]: 6.439e-05, [6] [build]: 2.23002e-06 [elim_shapecalc]: 8.50999e-06 [elim_not_effective]: 1.182e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.79998e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.53e-05 [get_jit_bprop_graph]: 9.09989e-07 [rewriter_after_jit_bprop_graph]: 3.11001e-06 [opt_after_jit_grad]: 0.00044568 [validate]: 3.079e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0575084 [execute]: 8.1e-06 Sums bootstrap : 0.000467s : 0.71% type_inference : 0.004257s : 6.47% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000413s : 0.63% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000342s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000038s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.68% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000456s : 0.69% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057508s : 87.46% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.41% : 0.000022s : 4: substitution.arithmetic_simplify 1.79% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.28% : 0.000005s : 4: substitution.graph_param_transform 65.54% : 0.000079s : 2: substitution.inline 2.21% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004215 2 91.82% : 0.003871s : 1: type_inference.infer 8.18% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000134 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.26% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.90% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.37% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.84% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.55% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.82% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.44% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.96% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.61% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.06% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.97% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 42.72% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.28% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077600 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.85% : 0.002985s : 1: add_attr 3.84% : 0.002977s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000500s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.60% : 0.000465s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000760s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.37% : 0.001841s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000455s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.74% : 0.003677s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000187s : 1: renormalize.infer 0.19% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000033s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.13% : 0.057524s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.50% : 0.004270s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.107678, [24] [bootstrap]: 0.00050292 [type_inference]: 0.0102477 [event_method]: 4.332e-05 [auto_monad]: 0.00011525 [graph_reusing]: 7.93001e-06 [inline]: 2.24001e-06 [add_attr]: 0.00299123, [1] [add_attr_with_inline]: 0.002983, [1] [Cycle 1]: 6.678e-05, [2] [tag_attr]: 3.207e-05 [meta_addattr_fg_expand]: 8.22e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 4.586e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.0130915, [53] [py_interpret_to_execute]: 3.546e-05 [rewriter_before_opt_a]: 0.00012693 [opt_a]: 0.0108004, [3] [Cycle 1]: 0.00693266, [45] [expand_dump_flag]: 3.79002e-06 [switch_simplify]: 6.718e-05 [loop_unroll]: 5.485e-05 [a_1]: 0.00132524 [with_stream_mark]: 2.297e-05 [recompute_prepare]: 2.174e-05 [updatestate_depend_eliminate]: 8.87e-06 [updatestate_assign_eliminate]: 7.49002e-06 [updatestate_loads_eliminate]: 7.40998e-06 [parameter_eliminate]: 2.61999e-06 [a_2]: 0.0002447 [accelerated_algorithm]: 3.052e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.10998e-06 [shard_inline]: 1.628e-05 [merge_send_recv]: 1.593e-05 [auto_parallel]: 1.049e-05 [parallel]: 1.779e-05 [flash_sp]: 1.109e-05 [merge_comm]: 9.54999e-06 [allreduce_fusion]: 9.76e-06 [matmul_add_comm_reduction]: 2.602e-05 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 1.79e-05 [virtual_dataset]: 1.576e-05 [get_grad_eliminate_]: 1.517e-05 [virtual_output]: 1.524e-05 [merge_forward]: 9.24e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.712e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.921e-05 [merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 2.734e-05 [set_forward_comm_id_for_comm_node_pass]: 9.27999e-06 [meta_fg_expand]: 0.00140633 [flash_sp_send_recv_attached]: 3.78999e-06 [receive_attached]: 2.78e-06 [after_resolve]: 
5.913e-05 [a_after_grad]: 8.009e-05 [renormalize]: 0.00242663 [add_forward_monad_depend]: 8.79e-06 [auto_monad_grad]: 5.39e-06 [auto_monad_eliminator]: 5.56e-05 [cse]: 0.00017078 [a_3]: 0.00035716 [Cycle 2]: 0.00294835, [45] [expand_dump_flag]: 1.60999e-06 [switch_simplify]: 4.761e-05 [loop_unroll]: 4.5e-05 [a_1]: 0.00151977 [with_stream_mark]: 1.208e-05 [recompute_prepare]: 1.07e-05 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 4.30999e-06 [updatestate_loads_eliminate]: 3.55998e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012634 [accelerated_algorithm]: 1.221e-05 [shard]: 1.07e-06 [meta_shard_fg_expand]: 2.11e-06 [shard_inline]: 9.15001e-06 [merge_send_recv]: 6.73998e-06 [auto_parallel]: 7.35e-06 [parallel]: 4.80001e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 5.82001e-06 [allreduce_fusion]: 4.85999e-06 [matmul_add_comm_reduction]: 7.63999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.008e-05 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.49998e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.48001e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.632e-05 [merge_recompute_call_nodes]: 8.29983e-07 [before_grad]: 1.468e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32001e-06 [meta_fg_expand]: 3.417e-05 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.48e-05 [a_after_grad]: 1.423e-05 [renormalize]: 0.00058255 [add_forward_monad_depend]: 4.13999e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.432e-05 [cse]: 4.667e-05 [a_3]: 6.623e-05 [Cycle 3]: 0.00090548, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 1.077e-05 [loop_unroll]: 8.85999e-06 [a_1]: 0.00025129 [with_stream_mark]: 9.93998e-06 [recompute_prepare]: 9.46e-06 [updatestate_depend_eliminate]: 4.77e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.81999e-06 
[parameter_eliminate]: 9.10019e-07 [a_2]: 0.00012358 [accelerated_algorithm]: 1.18e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 8.93002e-06 [merge_send_recv]: 6.78998e-06 [auto_parallel]: 7.25998e-06 [parallel]: 4.38001e-06 [flash_sp]: 1.02e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 4.91997e-06 [matmul_add_comm_reduction]: 7.68001e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.73001e-06 [get_grad_eliminate_]: 8.53001e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.42998e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 8.43001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.612e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.373e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19998e-06 [meta_fg_expand]: 2.85002e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 1.434e-05 [a_after_grad]: 1.489e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.03001e-06 [auto_monad_eliminator]: 1.127e-05 [cse]: 2.604e-05 [a_3]: 6.014e-05 [py_interpret_to_execute_after_opt_a]: 9.87999e-06 [slice_cell_reuse_recomputed_activation]: 1.99e-06 [rewriter_after_opt_a]: 9.33e-05 [convert_after_rewriter]: 9.05999e-06 [order_py_execute_after_rewriter]: 7.35998e-06 [mutable_eliminate]: 0.00045984 [opt_b]: 0.00028761, [1] [Cycle 1]: 0.0002817, [7] [b_1]: 0.00018944 [b_2]: 1.083e-05 [updatestate_depend_eliminate]: 7.07002e-06 [updatestate_assign_eliminate]: 3.91999e-06 [updatestate_loads_eliminate]: 3.81999e-06 [renormalize]: 4.50003e-07 [cse]: 3.18e-05 [optimize_parallel_all_gather_comm]: 2.041e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 1.979e-05 [loop_unroll]: 0.00042751 [opt_after_cconv]: 0.00013722, [1] [Cycle 1]: 0.00013162, [7] [c_1]: 4.848e-05 [parameter_eliminate]: 2.36998e-06 [updatestate_depend_eliminate]: 7.35998e-06 [updatestate_assign_eliminate]: 4.23999e-06 
[updatestate_loads_eliminate]: 4.29002e-06 [cse]: 3.011e-05 [renormalize]: 7.99977e-07 [remove_dup_value]: 2.816e-05 [tuple_transform]: 0.00010072, [1] [Cycle 1]: 9.595e-05, [4] [d_1]: 6.65e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.71e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 5.848e-05 [cse_after_recomputation]: 3.272e-05, [1] [Cycle 1]: 2.783e-05, [1] [cse]: 2.204e-05 [environ_conv]: 9.44e-06 [swap_dp_allreduce_reducescatter]: 7.77998e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.30002e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.32999e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.705e-05 [grouped_pairwise_exchange_alltoall]: 1.99e-06 [offloading_packed_experts]: 4.94e-06 [overlap_recompute_and_grad_model_parallel]: 5.42001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.72001e-06 [overlap_recompute_comm]: 1.95001e-06 [overlap_grad_ring_attention]: 5.57001e-06 [overlap_grad_flash_sp]: 2.397e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.769e-05, [1] [Cycle 1]: 9.34e-05, [6] [build]: 9.44e-06 [elim_shapecalc]: 1.374e-05 [elim_not_effective]: 1.763e-05 [opt_reshape]: 1.005e-05 [fold_const_symbol]: 1.481e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 2.513e-05 [get_jit_bprop_graph]: 1.13001e-06 [rewriter_after_jit_bprop_graph]: 3.35998e-06 [opt_after_jit_grad]: 0.00046524 [validate]: 4.599e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0798603 [execute]: 8.27e-06 Sums bootstrap : 0.000503s : 0.49% type_inference : 0.010248s : 9.91% event_method : 0.000043s : 0.04% auto_monad : 0.000115s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.12% optimize.opt_a.loop_unroll : 0.000109s : 0.11% optimize.opt_a.a_1 : 0.003096s : 2.99% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000020s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001443s : 1.40% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.11% optimize.opt_a.renormalize : 0.003009s : 2.91% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.08% optimize.opt_a.cse : 0.000243s : 0.24% optimize.opt_a.a_3 : 0.000484s : 0.47% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000093s : 0.09% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.44% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000428s : 0.41% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000465s : 0.45% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079860s : 77.21% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000725 218 5.96% : 0.000043s : 11: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.38% : 0.000003s : 5: substitution.elim_not_effective 0.52% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.35% : 0.000003s : 2: substitution.incorporate_call_switch 54.61% : 0.000396s : 16: substitution.inline 2.11% : 0.000015s : 2: substitution.inline_without_move 1.40% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.77% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.88% : 0.000014s : 20: substitution.remove_not_recompute_node 3.41% : 0.000025s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.41% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.50% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.33% : 0.000060s : 28: substitution.tuple_list_get_item_eliminator 2.42% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010180 2 86.93% : 0.008850s : 1: type_inference.infer 13.07% : 0.001330s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.09% : 0.000118s : 16: replace.inline 40.91% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.90% : 0.000388s : 16: match.inline 7.10% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5663 1.05% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000009s : 67: predicate.cast_eliminate 1.13% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.16% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.35% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.17% : 0.000009s : 75: predicate.environ_get_depend_swap 1.71% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.64% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.18% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.50% : 0.000042s : 244: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 1.56% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.61% : 0.000020s : 164: predicate.load_eliminater 0.28% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.16% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 3.92% : 0.000030s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.91% : 0.000015s : 97: predicate.partial_defer_inline 1.67% : 0.000013s : 89: predicate.partial_eliminate 1.04% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.23% : 0.000009s : 67: predicate.reduce_eliminate 2.58% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 67: predicate.reshape_eliminate 1.12% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.59% : 0.000004s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.78% : 0.000014s : 97: predicate.switch_defer_inline 2.85% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.80% : 0.000036s : 265: 
predicate.switch_simplify 1.04% : 0.000008s : 67: predicate.tile_eliminate 1.04% : 0.000008s : 67: predicate.transpose_eliminate 1.41% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.47% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.73% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.96% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.56% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.59% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.17% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001531 32 57.95% : 0.000887s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.05% : 0.000644s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131864 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.27% : 0.002995s : 1: add_attr 2.26% : 0.002987s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000123s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000537s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000436s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.62% : 0.004770s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000175s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.19% : 0.010803s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.36% : 0.000475s : 1: opt_after_jit_grad 0.22% : 0.000291s : 1: opt_b 9.93% : 0.013095s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.20% : 0.001585s : 2: renormalize.infer 1.07% : 0.001411s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.07% : 0.000097s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000100s : 1: symbol_engine_optimizer 60.58% : 0.079878s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 7.78% : 0.010262s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x1-ge],max_mem:60.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x2-pynative],max_mem:60.0M TotalTime = 0.0215872, [24] [bootstrap]: 0.00053061 [type_inference]: 0.00612343 [event_method]: 1.491e-05 [auto_monad]: 5.272e-05 [graph_reusing]: 6.06e-06 [inline]: 1.56998e-06 [add_attr]: 0.00339894, [1] [add_attr_with_inline]: 0.00338738, [1] [Cycle 1]: 4.417e-05, [2] [tag_attr]: 1.624e-05 [meta_addattr_fg_expand]: 3.99002e-06 [parallel-infer-symbol]: 2.99999e-06 [pre_auto_parallel]: 2.718e-05 [insert-virtual-dataset]: 2.36998e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.55999e-06 [optimize]: 0.00400089, [53] [py_interpret_to_execute]: 1.99e-05 [rewriter_before_opt_a]: 5.642e-05 [opt_a]: 0.00211344, [2] [Cycle 1]: 0.00151062, [45] [expand_dump_flag]: 2.45002e-06 [switch_simplify]: 3.228e-05 [loop_unroll]: 2.127e-05 [a_1]: 0.00044578 [with_stream_mark]: 1.346e-05 [recompute_prepare]: 7.63999e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 3.06001e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.4e-05 [accelerated_algorithm]: 6.57002e-06 [shard]: 1.88002e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.31998e-06 [merge_send_recv]: 8.32e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.267e-05 [flash_sp]: 7.33999e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 8.80999e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.1e-06 [virtual_dataset]: 5.62001e-06 
[get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.45001e-06 [merge_forward]: 3.62998e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 8.89998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.047e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.69e-06 [set_forward_comm_id_for_comm_node_pass]: 3.76001e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.14e-06 [after_resolve]: 1.087e-05 [a_after_grad]: 9.25001e-06 [renormalize]: 0.00041998 [add_forward_monad_depend]: 4.62998e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.345e-05 [cse]: 2.687e-05 [a_3]: 4.118e-05 [Cycle 2]: 0.00059358, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.77002e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.0001255 [with_stream_mark]: 9.59e-06 [recompute_prepare]: 5.69999e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.762e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.39998e-06 [auto_parallel]: 5.27999e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.22002e-06 [merge_comm]: 3.06001e-06 [allreduce_fusion]: 2.76e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.01998e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.14003e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.25999e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94001e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.07998e-06 [renormalize]: 7.99773e-08 
[add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 9.70002e-07 [auto_monad_eliminator]: 6.43e-06 [cse]: 1.425e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 8.1e-06 [slice_cell_reuse_recomputed_activation]: 1.78002e-06 [rewriter_after_opt_a]: 7.705e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.74e-06 [mutable_eliminate]: 0.00044939 [opt_b]: 0.00018279, [1] [Cycle 1]: 0.00017687, [7] [b_1]: 0.00010832 [b_2]: 6.86001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 3.7998e-07 [cse]: 1.697e-05 [optimize_parallel_all_gather_comm]: 1.5e-05 [overlap_param_gather]: 1.88997e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.00041577 [opt_after_cconv]: 9.461e-05, [1] [Cycle 1]: 8.878e-05, [7] [c_1]: 2.753e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.578e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.245e-05 [tuple_transform]: 6.941e-05, [1] [Cycle 1]: 6.521e-05, [4] [d_1]: 3.899e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.71002e-06 [add_recomputation]: 4.943e-05 [cse_after_recomputation]: 2.118e-05, [1] [Cycle 1]: 1.675e-05, [1] [cse]: 1.141e-05 [environ_conv]: 5.04e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.81e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.67001e-06 [micro_interleaved_order_control]: 2.05002e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.10002e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 8.79983e-07 
[interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.06002e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.113e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.46001e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.04999e-06 [overlap_grad_ring_attention]: 3.85e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 6.59988e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.908e-05, [1] [Cycle 1]: 6.516e-05, [6] [build]: 2.78e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.161e-05 [opt_reshape]: 6.19999e-06 [fold_const_symbol]: 8.87e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.62001e-06 [auto_monad_reorder]: 1.545e-05 [get_jit_bprop_graph]: 9.90025e-07 [rewriter_after_jit_bprop_graph]: 0.00012793 [opt_after_jit_grad]: 0.00045913 [validate]: 3.086e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.00654334 [execute]: 7.27997e-06 Sums bootstrap : 0.000531s : 3.09% type_inference : 0.006123s : 35.62% event_method : 0.000015s : 0.09% auto_monad : 0.000053s : 0.31% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000056s : 0.33% optimize.opt_a.expand_dump_flag : 
0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.15% optimize.opt_a.a_1 : 0.000571s : 3.32% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.82% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 
0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000420s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000077s : 0.45% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.61% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000416s : 2.42% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000128s : 0.74% opt_after_jit_grad : 0.000459s : 2.67% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006543s : 38.07% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000162 30 15.03% : 0.000024s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.29% : 0.000005s : 4: substitution.graph_param_transform 66.35% : 0.000108s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.59% : 0.000004s : 4: substitution.replace_old_param 6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006079 2 90.37% : 0.005493s : 1: type_inference.infer 9.63% : 0.000585s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.98% : 0.000026s : 3: replace.inline 30.02% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.53% : 0.000105s : 3: match.inline 8.47% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 1.09% : 0.000002s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.95% : 0.000002s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.21% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.68% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.45% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.35% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.88% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.32% : 0.000001s : 4: predicate.parallel_virtual_node 1.56% : 0.000002s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.46% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.34% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.97% : 0.000002s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.90% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.48% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000373 8 45.68% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.32% : 0.000202s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030496 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.16% : 0.003403s : 1: add_attr 11.12% : 0.003391s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000058s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000599s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.07% : 0.000935s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.94% : 0.002116s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.54% : 0.000469s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.13% : 0.004005s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000213s : 1: renormalize.infer 0.66% : 0.000200s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000134s : 1: rewriter_after_jit_bprop_graph 0.27% : 0.000081s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 21.49% : 0.006553s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.12% : 0.006137s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.018138, [24] [bootstrap]: 0.00048742 [type_inference]: 0.00430918 [event_method]: 1.018e-05 [auto_monad]: 4.873e-05 [graph_reusing]: 5.02999e-06 [inline]: 1.84e-06 [add_attr]: 0.00295239, [1] [add_attr_with_inline]: 0.00294456, [1] [Cycle 1]: 4.589e-05, [2] [tag_attr]: 1.143e-05 [meta_addattr_fg_expand]: 3.73001e-06 [parallel-infer-symbol]: 2.56e-06 [pre_auto_parallel]: 2.192e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00371836, [53] [py_interpret_to_execute]: 1.512e-05 [rewriter_before_opt_a]: 3.784e-05 [opt_a]: 0.00192474, [2] [Cycle 1]: 0.00132469, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 2.441e-05 [loop_unroll]: 1.366e-05 [a_1]: 0.00036162 [with_stream_mark]: 1.466e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.758e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.15002e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.15002e-06 [merge_send_recv]: 8.54e-06 [auto_parallel]: 6.16e-06 [parallel]: 1.869e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 8.96002e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.65998e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.74e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.118e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.13002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.46e-06 [flash_sp_send_recv_attached]: 2.68e-06 [receive_attached]: 2.32001e-06 
[after_resolve]: 1.036e-05 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00034088 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.246e-05 [cse]: 2.628e-05 [a_3]: 3.985e-05 [Cycle 2]: 0.00059072, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.13998e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012514 [with_stream_mark]: 9.54e-06 [recompute_prepare]: 6.02001e-06 [updatestate_depend_eliminate]: 2.68998e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.719e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.01002e-06 [shard_inline]: 5.57999e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.60001e-06 [parallel]: 4.32998e-06 [flash_sp]: 3.08e-06 [merge_comm]: 3.00002e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 5.23002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.05999e-06 [get_grad_eliminate_]: 4.96002e-06 [virtual_output]: 4.80999e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 5.69999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.90998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.95999e-06 [a_after_grad]: 8.35001e-06 [renormalize]: 1.09983e-07 [add_forward_monad_depend]: 1.07e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.48e-06 [cse]: 1.23e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.12002e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.096e-05 [convert_after_rewriter]: 6.67002e-06 [order_py_execute_after_rewriter]: 5.24998e-06 [mutable_eliminate]: 0.00044945 [opt_b]: 0.00017912, [1] [Cycle 1]: 0.00017311, 
[7] [b_1]: 0.00010636 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 3.50003e-07 [cse]: 1.561e-05 [optimize_parallel_all_gather_comm]: 1.607e-05 [overlap_param_gather]: 1.69998e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00041127 [opt_after_cconv]: 9.425e-05, [1] [Cycle 1]: 8.852e-05, [7] [c_1]: 2.805e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.58003e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.563e-05 [renormalize]: 4.00003e-07 [remove_dup_value]: 1.245e-05 [tuple_transform]: 6.962e-05, [1] [Cycle 1]: 6.514e-05, [4] [d_1]: 3.932e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.31e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.221e-05 [cse_after_recomputation]: 1.977e-05, [1] [Cycle 1]: 1.523e-05, [1] [cse]: 1.015e-05 [environ_conv]: 4.90999e-06 [swap_dp_allreduce_reducescatter]: 4.87998e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 3.88999e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.19003e-06 [full_micro_interleaved_order_control]: 2.28998e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77999e-06 [control_data_broadcast_order]: 1.148e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 2.04999e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.668e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 6.694e-05, [1] [Cycle 1]: 6.295e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 7.88999e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 6.01e-06 [fold_const_symbol]: 8.54e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.60999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.477e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.17997e-06 [opt_after_jit_grad]: 0.00049308 [validate]: 3.045e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00582921 [execute]: 6.33002e-06 Sums bootstrap : 0.000487s : 3.42% type_inference : 0.004309s : 30.28% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000487s : 3.42% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000023s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000341s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000449s : 3.16% optimize.opt_b.b_1 : 0.000106s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000411s : 2.89% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000493s : 3.46% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005829s : 40.95% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000186 26 12.48% : 0.000023s : 4: substitution.arithmetic_simplify 0.92% : 0.000002s : 2: substitution.elim_not_effective 0.66% : 0.000001s : 2: substitution.fold_const_symbol 2.82% : 0.000005s : 4: substitution.graph_param_transform 77.18% : 0.000144s : 2: substitution.inline 1.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.42% : 0.000004s : 4: substitution.remove_not_recompute_node 2.08% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004270 2 92.00% : 0.003928s : 1: type_inference.infer 8.00% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000142 2 100.00% : 0.000142s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 1.03% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.11% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.20% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.22% : 0.000003s : 26: predicate.load_eliminater 1.19% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.77% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.82% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.91% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.88% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.97% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.91% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 42.33% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.67% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026147 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.31% : 0.002956s : 1: add_attr 11.27% : 0.002948s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.00% : 0.000522s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000458s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.21% : 0.000839s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.37% : 0.001928s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.92% : 0.000503s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.24% : 0.003722s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000189s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.33% : 0.005839s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.53% : 0.004323s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0193501, [24] [bootstrap]: 0.0004383 [type_inference]: 0.00540677 [event_method]: 1.456e-05 [auto_monad]: 5.447e-05 [graph_reusing]: 5.59e-06 [inline]: 1.92001e-06 [add_attr]: 0.00290113, [1] [add_attr_with_inline]: 0.00289345, [1] [Cycle 1]: 4.387e-05, [2] [tag_attr]: 1.524e-05 [meta_addattr_fg_expand]: 3.76999e-06 [parallel-infer-symbol]: 2.81999e-06 [pre_auto_parallel]: 2.533e-05 [insert-virtual-dataset]: 2.76e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.11e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00390977, [53] [py_interpret_to_execute]: 1.903e-05 [rewriter_before_opt_a]: 5.787e-05 [opt_a]: 0.0021069, [2] [Cycle 1]: 0.00150992, [45] [expand_dump_flag]: 3.09999e-06 [switch_simplify]: 3.193e-05 [loop_unroll]: 2.042e-05 [a_1]: 0.0004418 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 7.48999e-06 [updatestate_depend_eliminate]: 3.47002e-06 [updatestate_assign_eliminate]: 3.43999e-06 [updatestate_loads_eliminate]: 2.91e-06 [parameter_eliminate]: 1.83002e-06 [a_2]: 7.398e-05 [accelerated_algorithm]: 6.15002e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 5.71998e-06 [merge_send_recv]: 7.98999e-06 [auto_parallel]: 6.07999e-06 [parallel]: 1.823e-05 [flash_sp]: 7.29001e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.74e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 6.92002e-06 [virtual_dataset]: 3.686e-05 [get_grad_eliminate_]: 6.07999e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.9e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.32999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.79e-06 [before_grad]: 8.89998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 2.51e-06 
[after_resolve]: 1.018e-05 [a_after_grad]: 8.48001e-06 [renormalize]: 0.00040272 [add_forward_monad_depend]: 4.70999e-06 [auto_monad_grad]: 1.81e-06 [auto_monad_eliminator]: 1.356e-05 [cse]: 2.669e-05 [a_3]: 4.087e-05 [Cycle 2]: 0.00058765, [45] [expand_dump_flag]: 8.2e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00012514 [with_stream_mark]: 9.35001e-06 [recompute_prepare]: 5.64998e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.40002e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.715e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.61998e-06 [merge_send_recv]: 4.42998e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.35999e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 6.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91999e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.20001e-07 [after_resolve]: 8.83001e-06 [a_after_grad]: 7.78001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 7.99977e-07 [auto_monad_eliminator]: 6.09999e-06 [cse]: 1.636e-05 [a_3]: 3.157e-05 [py_interpret_to_execute_after_opt_a]: 7.26001e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.032e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.16998e-06 [mutable_eliminate]: 0.00043999 [opt_b]: 0.00018201, [1] [Cycle 1]: 
0.00017576, [7] [b_1]: 0.00010742 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.76e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 3.09985e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.523e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00040787 [opt_after_cconv]: 9.265e-05, [1] [Cycle 1]: 8.72e-05, [7] [c_1]: 2.722e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.556e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 1.222e-05 [tuple_transform]: 6.79e-05, [1] [Cycle 1]: 6.369e-05, [4] [d_1]: 3.855e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 5.98002e-06 [partial_unused_args_eliminate]: 2.00002e-06 [add_recomputation]: 4.143e-05 [cse_after_recomputation]: 1.973e-05, [1] [Cycle 1]: 1.538e-05, [1] [cse]: 1.021e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.46e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.29999e-06 [merge_cast_opt]: 1.40999e-06 [slice_recompute_activation]: 2.15002e-06 [micro_interleaved_order_control]: 2.15002e-06 [assign_add_opt]: 1.15001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.03001e-06 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.78003e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69998e-06 [control_data_broadcast_order]: 1.181e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.48999e-06 [overlap_recompute_and_grad_model_parallel]: 4.84998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.67999e-06 [overlap_recompute_comm]: 2.35002e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.705e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.783e-05, [1] [Cycle 1]: 6.381e-05, [6] [build]: 2.41e-06 [elim_shapecalc]: 8.25999e-06 [elim_not_effective]: 1.145e-05 [opt_reshape]: 5.90002e-06 [fold_const_symbol]: 9.06002e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.47001e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00043919 [validate]: 2.971e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.00586256 [execute]: 6.81999e-06 Sums bootstrap : 0.000438s : 2.83% type_inference : 0.005407s : 34.94% event_method : 0.000015s : 0.09% auto_monad : 0.000054s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000567s : 3.66% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000141s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000023s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000042s : 0.27% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000403s : 2.60% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.13% optimize.opt_a.cse : 0.000043s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000440s : 2.84% optimize.opt_b.b_1 : 0.000107s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000408s : 2.64% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000041s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000439s : 2.84% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005863s : 37.88% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000160 30 15.16% : 0.000024s : 5: substitution.arithmetic_simplify 1.22% : 0.000002s : 2: substitution.elim_not_effective 0.92% : 0.000001s : 2: substitution.fold_const_symbol 3.27% : 0.000005s : 4: substitution.graph_param_transform 65.94% : 0.000106s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.91% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005367 2 89.98% : 0.004829s : 1: type_inference.infer 10.02% : 0.000538s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.43% : 0.000026s : 3: replace.inline 30.57% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.21% : 0.000104s : 3: match.inline 8.79% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.10% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.79% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.35% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000009s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.81% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.51% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.84% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.57% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.44% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.53% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 47.00% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.00% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027681 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.50% : 0.002905s : 1: add_attr 10.47% : 0.002897s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000045s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.71% : 0.000473s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000417s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.62% : 0.000449s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.46% : 0.000959s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.62% : 0.002110s : 1: opt_a 0.35% : 0.000096s : 1: opt_after_cconv 1.62% : 0.000448s : 1: opt_after_jit_grad 0.67% : 0.000185s : 1: opt_b 14.14% : 0.003913s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000030s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.73% : 0.000203s : 1: renormalize.infer 0.70% : 0.000193s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.21% : 0.005872s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.69% : 0.005451s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0370397, [24] [bootstrap]: 0.00047875 [type_inference]: 0.0112194 [event_method]: 4.656e-05 [auto_monad]: 0.00011884 [graph_reusing]: 8.31002e-06 [inline]: 1.84998e-06 [add_attr]: 0.00293998, [1] [add_attr_with_inline]: 0.00293191, [1] [Cycle 1]: 7.018e-05, [2] [tag_attr]: 3.372e-05 [meta_addattr_fg_expand]: 9.89001e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 4.894e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.51002e-06 [optimize]: 0.0132008, [53] [py_interpret_to_execute]: 3.688e-05 [rewriter_before_opt_a]: 0.00014326 [opt_a]: 0.010927, [3] [Cycle 1]: 0.00701273, [45] [expand_dump_flag]: 3.69002e-06 [switch_simplify]: 7.322e-05 [loop_unroll]: 6.247e-05 [a_1]: 0.0014323 [with_stream_mark]: 2.217e-05 [recompute_prepare]: 2.173e-05 [updatestate_depend_eliminate]: 9.13002e-06 [updatestate_assign_eliminate]: 7.81001e-06 [updatestate_loads_eliminate]: 7.30003e-06 [parameter_eliminate]: 2.44999e-06 [a_2]: 0.00024255 [accelerated_algorithm]: 3.031e-05 [shard]: 2.08998e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.614e-05 [merge_send_recv]: 1.617e-05 [auto_parallel]: 1.126e-05 [parallel]: 1.724e-05 [flash_sp]: 1.154e-05 [merge_comm]: 9.86998e-06 [allreduce_fusion]: 9.07001e-06 [matmul_add_comm_reduction]: 2.66e-05 [allreduce_slice_to_reducescatter]: 8.70001e-07 [virtual_shard_identity]: 1.769e-05 [virtual_dataset]: 1.536e-05 [get_grad_eliminate_]: 1.521e-05 [virtual_output]: 1.514e-05 [merge_forward]: 9.87001e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.758e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.833e-05 [merge_recompute_call_nodes]: 1.89999e-06 [before_grad]: 2.705e-05 [set_forward_comm_id_for_comm_node_pass]: 9.64999e-06 [meta_fg_expand]: 0.00137871 [flash_sp_send_recv_attached]: 3.43e-06 [receive_attached]: 2.69001e-06 
[after_resolve]: 5.893e-05 [a_after_grad]: 8.191e-05 [renormalize]: 0.00243388 [add_forward_monad_depend]: 8.70001e-06 [auto_monad_grad]: 5.22e-06 [auto_monad_eliminator]: 5.586e-05 [cse]: 0.00016409 [a_3]: 0.00033323 [Cycle 2]: 0.00301077, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 4.677e-05 [loop_unroll]: 4.358e-05 [a_1]: 0.00155325 [with_stream_mark]: 1.154e-05 [recompute_prepare]: 1.1e-05 [updatestate_depend_eliminate]: 5.33002e-06 [updatestate_assign_eliminate]: 4.33001e-06 [updatestate_loads_eliminate]: 3.73999e-06 [parameter_eliminate]: 1.07e-06 [a_2]: 0.00012601 [accelerated_algorithm]: 1.201e-05 [shard]: 9.49978e-07 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.07001e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.3e-06 [parallel]: 4.55001e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.21998e-06 [allreduce_fusion]: 4.75999e-06 [matmul_add_comm_reduction]: 7.56001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 1.003e-05 [virtual_dataset]: 8.77999e-06 [get_grad_eliminate_]: 8.99e-06 [virtual_output]: 8.84e-06 [merge_forward]: 5.06997e-06 [cell_reuse_recompute_pass]: 1.07998e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.668e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.72999e-06 [meta_fg_expand]: 6.873e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.19e-06 [after_resolve]: 1.577e-05 [a_after_grad]: 1.428e-05 [renormalize]: 0.00057824 [add_forward_monad_depend]: 4.06001e-06 [auto_monad_grad]: 1.27e-06 [auto_monad_eliminator]: 1.487e-05 [cse]: 4.654e-05 [a_3]: 6.578e-05 [Cycle 3]: 0.00088942, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 1.047e-05 [loop_unroll]: 8.80001e-06 [a_1]: 0.00024908 [with_stream_mark]: 9.74e-06 [recompute_prepare]: 9.19e-06 [updatestate_depend_eliminate]: 4.73001e-06 [updatestate_assign_eliminate]: 3.8e-06 [updatestate_loads_eliminate]: 3.73001e-06 
[parameter_eliminate]: 8.50006e-07 [a_2]: 0.00012129 [accelerated_algorithm]: 1.162e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 2.03002e-06 [shard_inline]: 8.96998e-06 [merge_send_recv]: 6.96999e-06 [auto_parallel]: 7.15998e-06 [parallel]: 4.35999e-06 [flash_sp]: 1.07e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.75001e-06 [matmul_add_comm_reduction]: 7.98001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 9.66998e-06 [virtual_dataset]: 8.48999e-06 [get_grad_eliminate_]: 8.33001e-06 [virtual_output]: 8.2e-06 [merge_forward]: 4.43999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.40001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.565e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.373e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 2.91999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.302e-05 [a_after_grad]: 1.414e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 1.058e-05 [cse]: 2.426e-05 [a_3]: 6.027e-05 [py_interpret_to_execute_after_opt_a]: 1.007e-05 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 4.715e-05 [convert_after_rewriter]: 9.82999e-06 [order_py_execute_after_rewriter]: 6.54999e-06 [mutable_eliminate]: 0.00045517 [opt_b]: 0.00028633, [1] [Cycle 1]: 0.00028045, [7] [b_1]: 0.00018857 [b_2]: 1.083e-05 [updatestate_depend_eliminate]: 7.26001e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 4.15e-06 [renormalize]: 3.50003e-07 [cse]: 3.1e-05 [optimize_parallel_all_gather_comm]: 2.044e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 1.888e-05 [loop_unroll]: 0.0004504 [opt_after_cconv]: 0.00013415, [1] [Cycle 1]: 0.00012818, [7] [c_1]: 4.789e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 6.89999e-06 [updatestate_assign_eliminate]: 4.17998e-06 
[updatestate_loads_eliminate]: 3.76999e-06 [cse]: 2.918e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.914e-05 [tuple_transform]: 0.00010151, [1] [Cycle 1]: 9.709e-05, [4] [d_1]: 6.649e-05 [none_parameter_eliminate]: 1.61998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.97999e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.63e-05 [cse_after_recomputation]: 3.131e-05, [1] [Cycle 1]: 2.66e-05, [1] [cse]: 2.116e-05 [environ_conv]: 8.27998e-06 [swap_dp_allreduce_reducescatter]: 7.95e-06 [bias_add_comm_swap]: 3.04999e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.14998e-06 [slice_recompute_activation]: 2.08998e-06 [micro_interleaved_order_control]: 2.50997e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.39995e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61002e-06 [control_data_broadcast_order]: 1.684e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 4.79e-06 [overlap_recompute_and_grad_model_parallel]: 6.03002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 1.99999e-06 [overlap_grad_ring_attention]: 5.23002e-06 [overlap_grad_flash_sp]: 2.396e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.91003e-06 [handle_group_info]: 1.05999e-06 [symbol_engine_optimizer]: 9.971e-05, [1] [Cycle 1]: 9.576e-05, [6] [build]: 1.12e-05 [elim_shapecalc]: 1.343e-05 [elim_not_effective]: 1.813e-05 [opt_reshape]: 9.77001e-06 [fold_const_symbol]: 1.482e-05 
[renormalize]: 1.80007e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.34e-06 [auto_monad_reorder]: 2.413e-05 [get_jit_bprop_graph]: 1.18001e-06 [rewriter_after_jit_bprop_graph]: 3.51001e-06 [opt_after_jit_grad]: 0.00046552 [validate]: 4.345e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00821571 [execute]: 6.96001e-06 Sums bootstrap : 0.000479s : 1.46% type_inference : 0.011219s : 34.16% event_method : 0.000047s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000143s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000130s : 0.40% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003235s : 9.85% optimize.opt_a.with_stream_mark : 0.000043s : 0.13% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000490s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000026s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.11% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001450s : 4.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.003012s : 9.17% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000235s : 0.72% optimize.opt_a.a_3 : 0.000459s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000455s : 1.39% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000450s : 1.37% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000011s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000466s : 1.42% validate : 0.000043s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008216s : 25.01% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000780 222 5.70% : 0.000044s : 12: substitution.arithmetic_simplify 1.74% : 0.000014s : 2: substitution.cast_eliminate 0.33% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.94% : 0.000007s : 8: substitution.graph_param_transform 0.32% : 0.000002s : 2: substitution.incorporate_call 0.23% : 0.000002s : 2: substitution.incorporate_call_switch 57.40% : 0.000448s : 17: substitution.inline 1.99% : 0.000016s : 2: substitution.inline_without_move 1.26% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.85% : 0.000014s : 3: substitution.less_batch_normalization 1.67% : 0.000013s : 11: substitution.minmaximum_grad 0.65% : 0.000005s : 5: substitution.partial_eliminate 1.71% : 0.000013s : 20: substitution.remove_not_recompute_node 3.02% : 0.000024s : 10: substitution.replace_applicator 1.32% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.44% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.27% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.41% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000019s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011146 2 86.69% : 0.009663s : 1: type_inference.infer 13.31% : 0.001483s : 1: type_inference.specialize ------[replace.] 0.000219 33 57.72% : 0.000126s : 17: replace.inline 42.28% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000473 33 92.85% : 0.000439s : 17: match.inline 7.15% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000752 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.13% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.50% : 0.000004s : 32: predicate.depend_value_elim 1.22% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.77% : 0.000013s : 108: predicate.environ_get_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.60% : 0.000005s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.30% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.97% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.49% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.05% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001553 34 56.90% : 0.000884s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.10% : 0.000669s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061408 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.79% : 0.002944s : 1: add_attr 4.78% : 0.002936s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000126s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.83% : 0.000512s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.75% : 0.000459s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.96% : 0.004889s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.80% : 0.010930s : 1: opt_a 0.22% : 0.000138s : 1: opt_after_cconv 0.77% : 0.000475s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.50% : 0.013205s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000009s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.66% : 0.001634s : 2: renormalize.infer 2.22% : 0.001364s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000148s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000102s : 1: symbol_engine_optimizer 13.40% : 0.008226s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.29% : 0.011235s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0184003, [24] [bootstrap]: 0.00046659 [type_inference]: 0.00421509 [event_method]: 1.078e-05 [auto_monad]: 5.065e-05 [graph_reusing]: 5.03002e-06 [inline]: 1.89e-06 [add_attr]: 0.00295991, [1] [add_attr_with_inline]: 0.00295209, [1] [Cycle 1]: 4.464e-05, [2] [tag_attr]: 1.12e-05 [meta_addattr_fg_expand]: 3.24001e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 2.122e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 2.21e-06 [pipeline_split]: 1.80001e-06 [optimize]: 0.00367821, [53] [py_interpret_to_execute]: 1.463e-05 [rewriter_before_opt_a]: 3.891e-05 [opt_a]: 0.00187867, [2] [Cycle 1]: 0.00127527, [45] [expand_dump_flag]: 2.76999e-06 [switch_simplify]: 2.348e-05 [loop_unroll]: 1.361e-05 [a_1]: 0.00029057 [with_stream_mark]: 1.325e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.789e-05 [accelerated_algorithm]: 6.59001e-06 [shard]: 2.01998e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 7.41999e-06 [auto_parallel]: 5.96e-06 [parallel]: 1.698e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 8.67e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.46e-06 [merge_forward]: 3.58999e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 8.85001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.069e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.73001e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 9.23002e-06 [renormalize]: 0.00033685 [add_forward_monad_depend]: 4.28001e-06 [auto_monad_grad]: 1.91e-06 [auto_monad_eliminator]: 1.29e-05 [cse]: 2.615e-05 [a_3]: 7.255e-05 [Cycle 2]: 0.00059411, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 7.16999e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012432 [with_stream_mark]: 9.22001e-06 [recompute_prepare]: 5.76e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.827e-05 [accelerated_algorithm]: 5.43002e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.64998e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.22e-06 [parallel]: 4.42e-06 [flash_sp]: 3.86999e-06 [merge_comm]: 2.97002e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 5.90002e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.61998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57001e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.65998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.77e-06 [a_after_grad]: 8.28001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.342e-05 [a_3]: 3.309e-05 [py_interpret_to_execute_after_opt_a]: 7.18e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.036e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.000451 [opt_b]: 0.00018245, [1] [Cycle 1]: 0.0001764, [7] [b_1]: 0.00010861 
[b_2]: 7.49002e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 3.39991e-07 [cse]: 1.588e-05 [optimize_parallel_all_gather_comm]: 1.51e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.229e-05 [loop_unroll]: 0.00041616 [opt_after_cconv]: 9.34e-05, [1] [Cycle 1]: 8.779e-05, [7] [c_1]: 2.751e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.609e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.165e-05 [tuple_transform]: 6.812e-05, [1] [Cycle 1]: 6.37e-05, [4] [d_1]: 3.881e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.61002e-06 [add_recomputation]: 4.185e-05 [cse_after_recomputation]: 2.015e-05, [1] [Cycle 1]: 1.593e-05, [1] [cse]: 1.098e-05 [environ_conv]: 4.25e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.29997e-06 [label_fine_grained_interleaved_index]: 2.59001e-06 [merge_cast_opt]: 1.12e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 1.05999e-06 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.86e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.16002e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.117e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.58999e-06 [overlap_recompute_and_grad_model_parallel]: 4.23999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.44999e-06 [overlap_grad_ring_attention]: 4.02998e-06 [overlap_grad_flash_sp]: 1.648e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.697e-05, [1] [Cycle 1]: 6.273e-05, [6] [build]: 2.43998e-06 [elim_shapecalc]: 7.84002e-06 [elim_not_effective]: 1.073e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 9.14e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.544e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.11999e-06 [opt_after_jit_grad]: 0.00044538 [validate]: 3.033e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00628692 [execute]: 6.51e-06 Sums bootstrap : 0.000467s : 3.22% type_inference : 0.004215s : 29.09% event_method : 0.000011s : 0.07% auto_monad : 0.000051s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.86% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000014s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000040s : 0.27% optimize.opt_a.a_3 : 0.000106s : 0.73% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 3.11% optimize.opt_b.b_1 : 0.000109s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 3.07% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006287s : 43.38% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000119 26 18.82% : 0.000022s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.51% : 0.000005s : 4: substitution.graph_param_transform 65.17% : 0.000078s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.49% : 0.000004s : 4: substitution.remove_not_recompute_node 3.19% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004175 2 91.78% : 0.003831s : 1: type_inference.infer 8.22% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.57% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.00% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.84% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.22% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.77% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 1.15% : 0.000002s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 41.15% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.85% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026330 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.002964s : 1: add_attr 11.22% : 0.002956s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000498s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.03% : 0.000799s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.15% : 0.001882s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.73% : 0.000455s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 13.98% : 0.003682s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.70% : 0.000183s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000070s : 1: symbol_engine_optimizer 23.91% : 0.006296s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.06% : 0.004229s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0357858, [24] [bootstrap]: 0.00050785 [type_inference]: 0.0101364 [event_method]: 4.165e-05 [auto_monad]: 0.00011355 [graph_reusing]: 8.30999e-06 [inline]: 1.86003e-06 [add_attr]: 0.00300143, [1] [add_attr_with_inline]: 0.00299296, [1] [Cycle 1]: 6.623e-05, [2] [tag_attr]: 3.147e-05 [meta_addattr_fg_expand]: 8.28001e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 4.587e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.0129494, [53] [py_interpret_to_execute]: 3.575e-05 [rewriter_before_opt_a]: 0.00012516 [opt_a]: 0.0107089, [3] [Cycle 1]: 0.00686378, [45] [expand_dump_flag]: 3.64002e-06 [switch_simplify]: 6.528e-05 [loop_unroll]: 5.474e-05 [a_1]: 0.00134875 [with_stream_mark]: 2.36e-05 [recompute_prepare]: 2.094e-05 [updatestate_depend_eliminate]: 9.42001e-06 [updatestate_assign_eliminate]: 8.23999e-06 [updatestate_loads_eliminate]: 7.96001e-06 [parameter_eliminate]: 2.77002e-06 [a_2]: 0.00024327 [accelerated_algorithm]: 3.005e-05 [shard]: 1.80001e-06 [meta_shard_fg_expand]: 3.32002e-06 [shard_inline]: 1.585e-05 [merge_send_recv]: 1.575e-05 [auto_parallel]: 1.051e-05 [parallel]: 1.764e-05 [flash_sp]: 1.12e-05 [merge_comm]: 9.94001e-06 [allreduce_fusion]: 8.86997e-06 [matmul_add_comm_reduction]: 2.738e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.73e-05 [virtual_dataset]: 1.569e-05 [get_grad_eliminate_]: 1.514e-05 [virtual_output]: 1.496e-05 [merge_forward]: 9.62001e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 1.703e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.844e-05 [merge_recompute_call_nodes]: 1.90001e-06 [before_grad]: 2.749e-05 [set_forward_comm_id_for_comm_node_pass]: 9.22999e-06 [meta_fg_expand]: 0.00137128 [flash_sp_send_recv_attached]: 3.73001e-06 [receive_attached]: 3.28e-06 
[after_resolve]: 5.842e-05 [a_after_grad]: 8.149e-05 [renormalize]: 0.00240799 [add_forward_monad_depend]: 9.02e-06 [auto_monad_grad]: 5.25001e-06 [auto_monad_eliminator]: 5.498e-05 [cse]: 0.00016146 [a_3]: 0.00033432 [Cycle 2]: 0.00293864, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 4.609e-05 [loop_unroll]: 4.38e-05 [a_1]: 0.00151447 [with_stream_mark]: 1.176e-05 [recompute_prepare]: 1.098e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.31002e-06 [updatestate_loads_eliminate]: 3.71999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012694 [accelerated_algorithm]: 1.191e-05 [shard]: 9.99979e-07 [meta_shard_fg_expand]: 1.86998e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 6.62002e-06 [auto_parallel]: 7.36999e-06 [parallel]: 4.82e-06 [flash_sp]: 3.40998e-06 [merge_comm]: 5.12999e-06 [allreduce_fusion]: 4.70001e-06 [matmul_add_comm_reduction]: 7.3e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.006e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.70001e-06 [virtual_output]: 8.37e-06 [merge_forward]: 5.02999e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.648e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.399e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 3.56e-05 [flash_sp_send_recv_attached]: 9.50007e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 1.527e-05 [a_after_grad]: 1.435e-05 [renormalize]: 0.00058547 [add_forward_monad_depend]: 4.07e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.436e-05 [cse]: 4.493e-05 [a_3]: 6.584e-05 [Cycle 3]: 0.00089231, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 1.027e-05 [loop_unroll]: 9.22999e-06 [a_1]: 0.00024962 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 9.40001e-06 [updatestate_depend_eliminate]: 4.75999e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 
3.78999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012369 [accelerated_algorithm]: 1.165e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.61998e-06 [shard_inline]: 8.94e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.01999e-06 [parallel]: 4.43999e-06 [flash_sp]: 1.05001e-06 [merge_comm]: 4.86002e-06 [allreduce_fusion]: 4.68001e-06 [matmul_add_comm_reduction]: 7.48e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 1.01e-05 [virtual_dataset]: 8.65001e-06 [get_grad_eliminate_]: 8.43999e-06 [virtual_output]: 8.15999e-06 [merge_forward]: 4.15e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 8.47e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.401e-05 [set_forward_comm_id_for_comm_node_pass]: 5.00999e-06 [meta_fg_expand]: 2.89001e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 1.338e-05 [a_after_grad]: 1.443e-05 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 1.028e-05 [cse]: 2.444e-05 [a_3]: 5.764e-05 [py_interpret_to_execute_after_opt_a]: 1.014e-05 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 4.802e-05 [convert_after_rewriter]: 9.31e-06 [order_py_execute_after_rewriter]: 7.38999e-06 [mutable_eliminate]: 0.00046527 [opt_b]: 0.00028548, [1] [Cycle 1]: 0.00027905, [7] [b_1]: 0.0001882 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.23e-06 [updatestate_assign_eliminate]: 4.15e-06 [updatestate_loads_eliminate]: 3.88001e-06 [renormalize]: 5.19998e-07 [cse]: 2.987e-05 [optimize_parallel_all_gather_comm]: 2.068e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.059e-05 [loop_unroll]: 0.00042595 [opt_after_cconv]: 0.00013601, [1] [Cycle 1]: 0.00013007, [7] [c_1]: 4.823e-05 [parameter_eliminate]: 2.33998e-06 [updatestate_depend_eliminate]: 7.31001e-06 [updatestate_assign_eliminate]: 4.08001e-06 
[updatestate_loads_eliminate]: 4.05e-06 [cse]: 2.966e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 2.751e-05 [tuple_transform]: 0.0001, [1] [Cycle 1]: 9.526e-05, [4] [d_1]: 6.576e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.67001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.641e-05 [cse_after_recomputation]: 3.167e-05, [1] [Cycle 1]: 2.686e-05, [1] [cse]: 2.13e-05 [environ_conv]: 8.50999e-06 [swap_dp_allreduce_reducescatter]: 7.51001e-06 [bias_add_comm_swap]: 2.93998e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.70002e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.02999e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.53998e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.39995e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.04998e-06 [overlap_opt_shard_in_pipeline]: 1.11997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49e-06 [control_data_broadcast_order]: 1.686e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 5.34998e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 5.00001e-06 [overlap_grad_flash_sp]: 2.414e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.22001e-06 [split_layernorm_comm]: 1.65001e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 9.865e-05, [1] [Cycle 1]: 9.46e-05, [6] [build]: 1.018e-05 [elim_shapecalc]: 1.303e-05 [elim_not_effective]: 1.795e-05 [opt_reshape]: 1.024e-05 [fold_const_symbol]: 1.5e-05 [renormalize]: 
1.50001e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 2.49e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.35003e-06 [opt_after_jit_grad]: 0.00048491 [validate]: 4.432e-05 [backend_pass]: 1.12e-06 [task_emit]: 0.00819578 [execute]: 7.19001e-06 Sums bootstrap : 0.000508s : 1.61% type_inference : 0.010136s : 32.14% event_method : 0.000042s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000125s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000122s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003113s : 9.87% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000494s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000037s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001410s : 4.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002994s : 9.49% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000231s : 0.73% optimize.opt_a.a_3 : 0.000458s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000465s : 1.48% optimize.opt_b.b_1 : 0.000188s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.07% optimize.loop_unroll : 0.000426s : 1.35% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000485s : 1.54% validate : 0.000044s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008196s : 25.98% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000725 218 5.84% : 0.000042s : 11: substitution.arithmetic_simplify 1.93% : 0.000014s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.63% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.34% : 0.000002s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.59% : 0.000396s : 16: substitution.inline 2.23% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.03% : 0.000015s : 3: substitution.less_batch_normalization 1.85% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.91% : 0.000014s : 20: substitution.remove_not_recompute_node 3.31% : 0.000024s : 10: substitution.replace_applicator 1.44% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.81% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.85% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.40% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.50% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010070 2 87.39% : 0.008800s : 1: type_inference.infer 12.61% : 0.001270s : 1: type_inference.specialize ------[replace.] 0.000199 30 58.94% : 0.000118s : 16: replace.inline 41.06% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.77% : 0.000387s : 16: match.inline 7.23% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.32% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.15% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.14% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.70% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.55% : 0.000041s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 164: predicate.load_eliminater 0.37% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.38% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.16% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.37% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.90% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.93% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.83% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.47% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.43% : 0.000010s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001458 32 57.77% : 0.000842s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.23% : 0.000616s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059812 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.02% : 0.003005s : 1: add_attr 5.01% : 0.002997s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000120s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.91% : 0.000542s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000475s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.95% : 0.004753s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.91% : 0.010712s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.83% : 0.000494s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.66% : 0.012953s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.72% : 0.001627s : 2: renormalize.infer 2.27% : 0.001355s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000052s : 1: rewriter_after_opt_a 0.22% : 0.000130s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000101s : 1: symbol_engine_optimizer 13.72% : 0.008206s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 16.97% : 0.010152s : 1: type_inference 0.13% : 0.000077s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x2-kbk],max_mem:60.0M TotalTime = 0.117446, [24] [bootstrap]: 0.00054898 [type_inference]: 0.00599526 [event_method]: 1.348e-05 [auto_monad]: 5.709e-05 [graph_reusing]: 5.29e-06 [inline]: 1.66998e-06 [add_attr]: 0.00345064, [1] [add_attr_with_inline]: 0.0034397, [1] [Cycle 1]: 4.476e-05, [2] [tag_attr]: 1.534e-05 [meta_addattr_fg_expand]: 4.29002e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 2.776e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.94999e-06 [optimize]: 0.00396515, [53] [py_interpret_to_execute]: 1.966e-05 [rewriter_before_opt_a]: 5.896e-05 [opt_a]: 0.00209951, [2] [Cycle 1]: 0.00150534, [45] [expand_dump_flag]: 2.76e-06 [switch_simplify]: 3.227e-05 [loop_unroll]: 2.11e-05 [a_1]: 0.00045278 [with_stream_mark]: 1.303e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.51001e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.658e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 2.19999e-06 [meta_shard_fg_expand]: 1.56002e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 7.48e-06 [auto_parallel]: 6.04999e-06 [parallel]: 2.375e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.63999e-06 [allreduce_fusion]: 3.60998e-06 [matmul_add_comm_reduction]: 8.51002e-06 [allreduce_slice_to_reducescatter]: 5.29981e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 6.26e-06 [get_grad_eliminate_]: 5.51002e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 8.72998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.091e-05 
[merge_recompute_call_nodes]: 1.66e-06 [before_grad]: 9.42999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33998e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 1.99999e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 8.74e-06 [renormalize]: 0.00041134 [add_forward_monad_depend]: 4.18001e-06 [auto_monad_grad]: 1.79998e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.486e-05 [a_3]: 3.995e-05 [Cycle 2]: 0.00058539, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.44998e-06 [a_1]: 0.00012609 [with_stream_mark]: 9.42999e-06 [recompute_prepare]: 5.78002e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.40997e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.73e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.17998e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.12e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.61999e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.16998e-06 [virtual_dataset]: 5.43002e-06 [get_grad_eliminate_]: 5.11002e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.83002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.96e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.82e-06 [set_forward_comm_id_for_comm_node_pass]: 2.83003e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 7.85998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 5.94e-06 [cse]: 1.173e-05 [a_3]: 3.215e-05 [py_interpret_to_execute_after_opt_a]: 7.58999e-06 
[slice_cell_reuse_recomputed_activation]: 2.12999e-06 [rewriter_after_opt_a]: 3.047e-05 [convert_after_rewriter]: 7.01999e-06 [order_py_execute_after_rewriter]: 5.02999e-06 [mutable_eliminate]: 0.00047587 [opt_b]: 0.00018608, [1] [Cycle 1]: 0.00017967, [7] [b_1]: 0.00011138 [b_2]: 7.20998e-06 [updatestate_depend_eliminate]: 5.36002e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 3.29979e-07 [cse]: 1.613e-05 [optimize_parallel_all_gather_comm]: 1.577e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.22e-05 [loop_unroll]: 0.00041504 [opt_after_cconv]: 9.416e-05, [1] [Cycle 1]: 8.819e-05, [7] [c_1]: 2.782e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.15002e-06 [cse]: 1.565e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.228e-05 [tuple_transform]: 6.987e-05, [1] [Cycle 1]: 6.542e-05, [4] [d_1]: 3.989e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.68002e-06 [add_recomputation]: 4.964e-05 [cse_after_recomputation]: 1.963e-05, [1] [Cycle 1]: 1.528e-05, [1] [cse]: 1.01e-05 [environ_conv]: 5.20999e-06 [swap_dp_allreduce_reducescatter]: 5.37001e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.56002e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.71e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.39e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.16997e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.58998e-06 [comm_op_add_attrs]: 1.05001e-06 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.82001e-06 [control_data_broadcast_order]: 1.095e-05 [grouped_pairwise_exchange_alltoall]: 1.59e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.89e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.46998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.84002e-06 [overlap_grad_flash_sp]: 1.688e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.649e-05, [1] [Cycle 1]: 6.254e-05, [6] [build]: 2.05002e-06 [elim_shapecalc]: 8.28999e-06 [elim_not_effective]: 1.108e-05 [opt_reshape]: 5.97001e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.513e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.57997e-06 [opt_after_jit_grad]: 0.00045085 [validate]: 2.952e-05 [backend_pass]: 1.00999e-06 [task_emit]: 0.102649 [execute]: 8.65999e-06 Sums bootstrap : 0.000549s : 0.49% type_inference : 0.005995s : 5.30% event_method : 0.000013s : 0.01% auto_monad : 0.000057s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000579s : 0.51% 
optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000012s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000411s : 0.36% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000037s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000476s : 0.42% optimize.opt_b.b_1 : 0.000111s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000415s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000451s : 0.40% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.102649s : 90.81% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.81% : 0.000024s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.73% : 0.000001s : 2: substitution.fold_const_symbol 3.38% : 0.000006s : 4: substitution.graph_param_transform 66.51% : 0.000109s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.61% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005950 2 90.90% : 0.005409s : 1: type_inference.infer 9.10% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.74% : 0.000026s : 3: replace.inline 30.26% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.61% : 0.000107s : 3: match.inline 8.39% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.91% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.76% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.87% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.62% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.85% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.54% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.71% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.32% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.64% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 47.68% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.32% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.126374 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.73% : 0.003455s : 1: add_attr 2.73% : 0.003444s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000062s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.46% : 0.000586s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000019s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000485s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.75% : 0.000946s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000094s : 28: opt.transform.opt_b 0.03% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.66% : 0.002102s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.36% : 0.000460s : 1: opt_after_jit_grad 0.15% : 0.000189s : 1: opt_b 3.14% : 0.003969s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000212s : 1: renormalize.infer 0.15% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000069s : 1: symbol_engine_optimizer 81.25% : 0.102673s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 4.75% : 0.006009s : 1: type_inference 0.04% : 0.000051s : 1: validate TotalTime = 0.11008, [24] [bootstrap]: 0.00046245 [type_inference]: 0.00436797 [event_method]: 1.063e-05 [auto_monad]: 4.958e-05 [graph_reusing]: 5.21002e-06 [inline]: 2.16e-06 [add_attr]: 0.002967, [1] [add_attr_with_inline]: 0.00295931, [1] [Cycle 1]: 4.409e-05, [2] [tag_attr]: 1.199e-05 [meta_addattr_fg_expand]: 3.03e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.081e-05 [insert-virtual-dataset]: 2.44001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.56998e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00367894, [53] [py_interpret_to_execute]: 1.434e-05 [rewriter_before_opt_a]: 3.853e-05 [opt_a]: 0.0018819, [2] [Cycle 1]: 0.00124188, [45] [expand_dump_flag]: 3.13e-06 [switch_simplify]: 2.482e-05 [loop_unroll]: 1.369e-05 [a_1]: 0.0002875 [with_stream_mark]: 1.299e-05 [recompute_prepare]: 7.3e-06 [updatestate_depend_eliminate]: 3.55998e-06 [updatestate_assign_eliminate]: 3.01999e-06 [updatestate_loads_eliminate]: 3.02002e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.679e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 2.79001e-06 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 6.01998e-06 [merge_send_recv]: 7.58001e-06 [auto_parallel]: 6.16998e-06 [parallel]: 1.895e-05 [flash_sp]: 7.56001e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.46001e-06 [matmul_add_comm_reduction]: 8.67e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.32999e-06 [merge_forward]: 3.85998e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 9.24998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 8.97999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.06e-06 [flash_sp_send_recv_attached]: 2.39001e-06 [receive_attached]: 2.64999e-06 
[after_resolve]: 1.014e-05 [a_after_grad]: 8.39998e-06 [renormalize]: 0.00033837 [add_forward_monad_depend]: 4.13999e-06 [auto_monad_grad]: 1.59e-06 [auto_monad_eliminator]: 1.284e-05 [cse]: 2.697e-05 [a_3]: 3.987e-05 [Cycle 2]: 0.00063055, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.80002e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.0001247 [with_stream_mark]: 1.099e-05 [recompute_prepare]: 5.71998e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 8.90024e-07 [a_2]: 6.7e-05 [accelerated_algorithm]: 5.37999e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.54998e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.22e-06 [flash_sp]: 3.03e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.80997e-06 [matmul_add_comm_reduction]: 5.29998e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 5.72001e-06 [virtual_dataset]: 5.02999e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 4.83001e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.22999e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 7.77002e-06 [set_forward_comm_id_for_comm_node_pass]: 4.446e-05 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.86003e-06 [a_after_grad]: 7.87998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 9.40025e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.298e-05 [a_3]: 3.102e-05 [py_interpret_to_execute_after_opt_a]: 7.24001e-06 [slice_cell_reuse_recomputed_activation]: 1.87999e-06 [rewriter_after_opt_a]: 3.031e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.10001e-06 [mutable_eliminate]: 0.00044717 [opt_b]: 0.00018153, [1] [Cycle 1]: 0.00017535, 
[7] [b_1]: 0.00010801 [b_2]: 7.30998e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 3.09985e-07 [cse]: 1.621e-05 [optimize_parallel_all_gather_comm]: 1.546e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.235e-05 [loop_unroll]: 0.00041585 [opt_after_cconv]: 9.39e-05, [1] [Cycle 1]: 8.837e-05, [7] [c_1]: 2.725e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.19998e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.621e-05 [renormalize]: 2.99973e-07 [remove_dup_value]: 1.275e-05 [tuple_transform]: 6.808e-05, [1] [Cycle 1]: 6.397e-05, [4] [d_1]: 3.893e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.314e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.053e-05 [environ_conv]: 4.24002e-06 [swap_dp_allreduce_reducescatter]: 4.80999e-06 [bias_add_comm_swap]: 2.58e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.56e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.44001e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 9.5999e-07 [remove_cast_before_assign_add]: 8.29983e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64998e-06 [control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.60001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.36e-06 [overlap_grad_ring_attention]: 3.76001e-06 [overlap_grad_flash_sp]: 1.658e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.79998e-06 [handle_group_info]: 1.30999e-06 [symbol_engine_optimizer]: 6.743e-05, [1] [Cycle 1]: 6.314e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.37998e-06 [elim_not_effective]: 1.136e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.42e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.38e-06 [opt_after_jit_grad]: 0.00044335 [validate]: 3.055e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0977958 [execute]: 8.84e-06 Sums bootstrap : 0.000462s : 0.44% type_inference : 0.004368s : 4.11% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000412s : 0.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000048s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000338s : 0.32% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.42% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000416s : 0.39% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000443s : 0.42% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.097796s : 92.13% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000118 26 18.77% : 0.000022s : 4: substitution.arithmetic_simplify 1.61% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.65% : 0.000005s : 4: substitution.graph_param_transform 64.78% : 0.000076s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.31% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004329 2 91.70% : 0.003970s : 1: type_inference.infer 8.30% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000137 984 0.96% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.71% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.33% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.40% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.26% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.46% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.82% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.64% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 1.16% : 0.000002s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.22% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.89% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.65% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.12% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.45% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 42.62% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.38% : 0.000149s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.117980 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.52% : 0.002971s : 1: add_attr 2.51% : 0.002963s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000054s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000498s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000004s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000760s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.60% : 0.001885s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.38% : 0.000453s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.12% : 0.003683s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000185s : 1: renormalize.infer 0.13% : 0.000148s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.91% : 0.097818s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.71% : 0.004382s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.110099, [24] [bootstrap]: 0.00046824 [type_inference]: 0.00549289 [event_method]: 1.447e-05 [auto_monad]: 5.33e-05 [graph_reusing]: 5.78002e-06 [inline]: 1.72001e-06 [add_attr]: 0.00290323, [1] [add_attr_with_inline]: 0.00289529, [1] [Cycle 1]: 4.627e-05, [2] [tag_attr]: 1.529e-05 [meta_addattr_fg_expand]: 4.94e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.564e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.00401042, [53] [py_interpret_to_execute]: 2.005e-05 [rewriter_before_opt_a]: 5.837e-05 [opt_a]: 0.00217363, [2] [Cycle 1]: 0.00156369, [45] [expand_dump_flag]: 2.95002e-06 [switch_simplify]: 3.119e-05 [loop_unroll]: 2.394e-05 [a_1]: 0.00050275 [with_stream_mark]: 1.38e-05 [recompute_prepare]: 8.1e-06 [updatestate_depend_eliminate]: 3.52997e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 2.63998e-06 [parameter_eliminate]: 2.11998e-06 [a_2]: 7.641e-05 [accelerated_algorithm]: 6.24999e-06 [shard]: 2.23002e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.81998e-06 [merge_send_recv]: 7.55998e-06 [auto_parallel]: 6.27001e-06 [parallel]: 1.848e-05 [flash_sp]: 7.39002e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.40998e-06 [matmul_add_comm_reduction]: 8.40999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 6.81999e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.73002e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.52997e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 1.99e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.13002e-06 [flash_sp_send_recv_attached]: 2.21998e-06 [receive_attached]: 
2.11998e-06 [after_resolve]: 1.013e-05 [a_after_grad]: 8.79003e-06 [renormalize]: 0.00041824 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.381e-05 [cse]: 2.64e-05 [a_3]: 4.127e-05 [Cycle 2]: 0.0006003, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 6.86999e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.00012754 [with_stream_mark]: 9.84001e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.70002e-06 [updatestate_assign_eliminate]: 2.16998e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.902e-05 [accelerated_algorithm]: 5.46002e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.37001e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.24e-06 [parallel]: 3.76001e-06 [flash_sp]: 3.24001e-06 [merge_comm]: 3.23e-06 [allreduce_fusion]: 3.03998e-06 [matmul_add_comm_reduction]: 4.90999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 5.98002e-06 [virtual_dataset]: 5.20001e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 4.94003e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32999e-06 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 7.48e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31001e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.01998e-06 [a_after_grad]: 8.17e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 1.05001e-06 [auto_monad_eliminator]: 6.48998e-06 [cse]: 1.361e-05 [a_3]: 3.241e-05 [py_interpret_to_execute_after_opt_a]: 7.68999e-06 [slice_cell_reuse_recomputed_activation]: 1.96e-06 [rewriter_after_opt_a]: 3.08e-05 [convert_after_rewriter]: 6.69001e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00045427 [opt_b]: 0.00018493, [1] [Cycle 1]: 
0.00017885, [7] [b_1]: 0.00011079 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 5.09986e-07 [cse]: 1.673e-05 [optimize_parallel_all_gather_comm]: 1.546e-05 [overlap_param_gather]: 2.25002e-06 [cconv]: 2.193e-05 [loop_unroll]: 0.00041399 [opt_after_cconv]: 9.524e-05, [1] [Cycle 1]: 8.96e-05, [7] [c_1]: 2.795e-05 [parameter_eliminate]: 2.36998e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.58998e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.625e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.327e-05 [tuple_transform]: 6.887e-05, [1] [Cycle 1]: 6.464e-05, [4] [d_1]: 3.869e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.06998e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 4.258e-05 [cse_after_recomputation]: 2.01e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.083e-05 [environ_conv]: 4.62998e-06 [swap_dp_allreduce_reducescatter]: 5.10001e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.29002e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.49998e-06 [slice_recompute_activation]: 2.41e-06 [micro_interleaved_order_control]: 2.63e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.01998e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 3.86001e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.54001e-06 [overlap_grad_ring_attention]: 4.44998e-06 [overlap_grad_flash_sp]: 1.733e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.82001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.67e-05, [1] [Cycle 1]: 6.271e-05, [6] [build]: 2.50002e-06 [elim_shapecalc]: 8.02e-06 [elim_not_effective]: 1.122e-05 [opt_reshape]: 5.96998e-06 [fold_const_symbol]: 8.70001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 1.536e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00044747 [validate]: 3.137e-05 [backend_pass]: 8.00006e-07 [task_emit]: 0.0963959 [execute]: 9.15001e-06 Sums bootstrap : 0.000468s : 0.44% type_inference : 0.005493s : 5.17% event_method : 0.000014s : 0.01% auto_monad : 0.000053s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.04% optimize.opt_a.loop_unroll : 0.000029s : 0.03% optimize.opt_a.a_1 : 0.000630s : 0.59% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000012s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000418s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% 
optimize.opt_a.a_3 : 0.000074s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.43% optimize.opt_b.b_1 : 0.000111s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000414s : 0.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% 
optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000447s : 0.42% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096396s : 90.75% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000164 30 15.23% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.25% : 0.000005s : 4: substitution.graph_param_transform 66.13% : 0.000109s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.72% : 0.000004s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005452 2 89.45% : 0.004877s : 1: type_inference.infer 10.55% : 0.000575s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.36% : 0.000028s : 3: replace.inline 29.64% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.53% : 0.000107s : 3: match.inline 8.47% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.92% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.91% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.72% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.28% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.37% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 0.96% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.54% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.69% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.44% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.06% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.05% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 46.48% : 0.000165s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.52% : 0.000189s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118584 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.45% : 0.002908s : 1: add_attr 2.44% : 0.002899s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.42% : 0.000504s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.39% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.84% : 0.000999s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000091s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.84% : 0.002177s : 1: opt_a 0.08% : 0.000099s : 1: opt_after_cconv 0.39% : 0.000457s : 1: opt_after_jit_grad 0.16% : 0.000188s : 1: opt_b 3.39% : 0.004014s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.18% : 0.000209s : 1: renormalize.infer 0.17% : 0.000203s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000069s : 1: symbol_engine_optimizer 81.31% : 0.096418s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.64% : 0.005506s : 1: type_inference 0.04% : 0.000052s : 1: validate TotalTime = 0.146865, [24] [bootstrap]: 0.00048635 [type_inference]: 0.0113123 [event_method]: 4.736e-05 [auto_monad]: 0.00011878 [graph_reusing]: 8.53001e-06 [inline]: 2.19999e-06 [add_attr]: 0.00301306, [1] [add_attr_with_inline]: 0.00300452, [1] [Cycle 1]: 7.01e-05, [2] [tag_attr]: 3.462e-05 [meta_addattr_fg_expand]: 9.56e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 4.89e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.0132593, [53] [py_interpret_to_execute]: 3.672e-05 [rewriter_before_opt_a]: 0.00014619 [opt_a]: 0.0109914, [3] [Cycle 1]: 0.00708363, [45] [expand_dump_flag]: 3.91999e-06 [switch_simplify]: 7.435e-05 [loop_unroll]: 6.201e-05 [a_1]: 0.00144329 [with_stream_mark]: 2.285e-05 [recompute_prepare]: 2.163e-05 [updatestate_depend_eliminate]: 9.10999e-06 [updatestate_assign_eliminate]: 7.78001e-06 [updatestate_loads_eliminate]: 8.03001e-06 [parameter_eliminate]: 2.54001e-06 [a_2]: 0.00024558 [accelerated_algorithm]: 3.15e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.3e-06 [shard_inline]: 1.643e-05 [merge_send_recv]: 1.594e-05 [auto_parallel]: 1.095e-05 [parallel]: 1.914e-05 [flash_sp]: 1.224e-05 [merge_comm]: 9.76e-06 [allreduce_fusion]: 8.70001e-06 [matmul_add_comm_reduction]: 2.511e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.821e-05 [virtual_dataset]: 1.572e-05 [get_grad_eliminate_]: 1.525e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.57999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.7e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.97e-05 [merge_recompute_call_nodes]: 1.66002e-06 [before_grad]: 2.697e-05 [set_forward_comm_id_for_comm_node_pass]: 9.32999e-06 [meta_fg_expand]: 0.00140923 [flash_sp_send_recv_attached]: 3.5e-06 [receive_attached]: 2.36e-06 [after_resolve]: 
5.929e-05 [a_after_grad]: 8.063e-05 [renormalize]: 0.00242852 [add_forward_monad_depend]: 9.15001e-06 [auto_monad_grad]: 5.09e-06 [auto_monad_eliminator]: 5.565e-05 [cse]: 0.00016531 [a_3]: 0.00037218 [Cycle 2]: 0.00299435, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.748e-05 [loop_unroll]: 4.408e-05 [a_1]: 0.00152507 [with_stream_mark]: 1.23e-05 [recompute_prepare]: 1.075e-05 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 4.45e-06 [updatestate_loads_eliminate]: 3.73001e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.0001256 [accelerated_algorithm]: 1.182e-05 [shard]: 9.10019e-07 [meta_shard_fg_expand]: 1.76e-06 [shard_inline]: 9.30001e-06 [merge_send_recv]: 6.65998e-06 [auto_parallel]: 7.68001e-06 [parallel]: 5.06002e-06 [flash_sp]: 3.04999e-06 [merge_comm]: 5.22e-06 [allreduce_fusion]: 4.63001e-06 [matmul_add_comm_reduction]: 7.55e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.82e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.716e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.436e-05 [set_forward_comm_id_for_comm_node_pass]: 5.45001e-06 [meta_fg_expand]: 6.84e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.09e-06 [after_resolve]: 1.605e-05 [a_after_grad]: 1.44e-05 [renormalize]: 0.00058993 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.439e-05 [cse]: 4.64e-05 [a_3]: 6.573e-05 [Cycle 3]: 0.00089896, [45] [expand_dump_flag]: 1.06002e-06 [switch_simplify]: 1.08e-05 [loop_unroll]: 9.05001e-06 [a_1]: 0.00024884 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 9.52999e-06 [updatestate_depend_eliminate]: 4.82e-06 [updatestate_assign_eliminate]: 3.93001e-06 [updatestate_loads_eliminate]: 3.78001e-06 
[parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012281 [accelerated_algorithm]: 1.159e-05 [shard]: 9.29984e-07 [meta_shard_fg_expand]: 1.69998e-06 [shard_inline]: 9.04e-06 [merge_send_recv]: 7.01999e-06 [auto_parallel]: 7.13e-06 [parallel]: 4.37998e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 5.12999e-06 [allreduce_fusion]: 4.74e-06 [matmul_add_comm_reduction]: 7.5e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 9.97001e-06 [virtual_dataset]: 8.52e-06 [get_grad_eliminate_]: 8.27e-06 [virtual_output]: 8.18001e-06 [merge_forward]: 4.32003e-06 [cell_reuse_recompute_pass]: 1.48002e-06 [offload_activation]: 8.62e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.724e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.443e-05 [set_forward_comm_id_for_comm_node_pass]: 5.73997e-06 [meta_fg_expand]: 3.00002e-06 [flash_sp_send_recv_attached]: 9.60019e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.446e-05 [a_after_grad]: 1.375e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 1.03e-05 [cse]: 2.612e-05 [a_3]: 5.973e-05 [py_interpret_to_execute_after_opt_a]: 2.607e-05 [slice_cell_reuse_recomputed_activation]: 1.77001e-06 [rewriter_after_opt_a]: 4.703e-05 [convert_after_rewriter]: 9.10001e-06 [order_py_execute_after_rewriter]: 6.41998e-06 [mutable_eliminate]: 0.00045559 [opt_b]: 0.00028787, [1] [Cycle 1]: 0.00028155, [7] [b_1]: 0.00018896 [b_2]: 1.064e-05 [updatestate_depend_eliminate]: 7.21999e-06 [updatestate_assign_eliminate]: 4.02002e-06 [updatestate_loads_eliminate]: 3.96001e-06 [renormalize]: 4.60015e-07 [cse]: 3.184e-05 [optimize_parallel_all_gather_comm]: 1.982e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.008e-05 [loop_unroll]: 0.00042304 [opt_after_cconv]: 0.00013634, [1] [Cycle 1]: 0.00013045, [7] [c_1]: 4.883e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 7.20998e-06 [updatestate_assign_eliminate]: 4.29002e-06 
[updatestate_loads_eliminate]: 3.9e-06 [cse]: 2.968e-05 [renormalize]: 3.49974e-07 [remove_dup_value]: 2.805e-05 [tuple_transform]: 0.00010144, [1] [Cycle 1]: 9.677e-05, [4] [d_1]: 6.652e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.70002e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 5.698e-05 [cse_after_recomputation]: 3.237e-05, [1] [Cycle 1]: 2.754e-05, [1] [cse]: 2.211e-05 [environ_conv]: 8.80001e-06 [swap_dp_allreduce_reducescatter]: 8.18999e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.66e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.24999e-06 [micro_interleaved_order_control]: 2.19001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 9.5999e-07 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.69e-06 [control_data_broadcast_order]: 1.72e-05 [grouped_pairwise_exchange_alltoall]: 1.42999e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.91e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.46998e-06 [overlap_grad_ring_attention]: 5.08002e-06 [overlap_grad_flash_sp]: 2.338e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 9.748e-05, [1] [Cycle 1]: 9.327e-05, [6] [build]: 9.78002e-06 [elim_shapecalc]: 1.321e-05 [elim_not_effective]: 1.766e-05 [opt_reshape]: 1.02e-05 [fold_const_symbol]: 1.442e-05 [renormalize]: 
2.29978e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 2.468e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00046522 [validate]: 4.507e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.117791 [execute]: 9.02e-06 Sums bootstrap : 0.000486s : 0.34% type_inference : 0.011312s : 7.93% event_method : 0.000047s : 0.03% auto_monad : 0.000119s : 0.08% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000049s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.03% optimize.rewriter_before_opt_a : 0.000146s : 0.10% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000133s : 0.09% optimize.opt_a.loop_unroll : 0.000115s : 0.08% optimize.opt_a.a_1 : 0.003217s : 2.26% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000494s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000029s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000040s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000033s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000064s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001481s : 1.04% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.08% optimize.opt_a.renormalize : 0.003019s : 2.12% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.06% optimize.opt_a.cse : 0.000238s : 0.17% optimize.opt_a.a_3 : 0.000498s : 0.35% optimize.py_interpret_to_execute_after_opt_a : 0.000026s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.00% optimize.mutable_eliminate : 0.000456s : 0.32% optimize.opt_b.b_1 : 0.000189s : 0.13% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000423s : 0.30% optimize.opt_after_cconv.c_1 : 0.000049s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000465s : 0.33% validate : 0.000045s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.117791s : 82.61% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000759 222 5.89% : 0.000045s : 12: substitution.arithmetic_simplify 1.86% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 55.67% : 0.000422s : 17: substitution.inline 2.04% : 0.000015s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.67% : 0.000005s : 5: substitution.partial_eliminate 2.01% : 0.000015s : 20: substitution.remove_not_recompute_node 3.12% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.53% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.65% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011240 2 86.53% : 0.009726s : 1: type_inference.infer 13.47% : 0.001514s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.42% : 0.000125s : 17: replace.inline 42.58% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.42% : 0.000414s : 17: match.inline 7.58% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.11% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.99% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.75% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.32% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.79% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.93% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.32% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.26% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.01% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.12% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.82% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.31% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001577 34 56.42% : 0.000890s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.58% : 0.000687s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.171361 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.76% : 0.003017s : 1: add_attr 1.76% : 0.003008s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000126s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.30% : 0.000521s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000054s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.25% : 0.000432s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.27% : 0.000465s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.85% : 0.004885s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000052s : 4: opt.transform.symbol_engine_opt 6.42% : 0.010994s : 1: opt_a 0.08% : 0.000140s : 1: opt_after_cconv 0.28% : 0.000475s : 1: opt_after_jit_grad 0.17% : 0.000291s : 1: opt_b 7.74% : 0.013263s : 1: optimize 0.01% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000053s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000030s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 0.93% : 0.001596s : 2: renormalize.infer 0.82% : 0.001410s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.09% : 0.000151s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000100s : 1: symbol_engine_optimizer 68.75% : 0.117812s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.61% : 0.011327s : 1: type_inference 0.04% : 0.000069s : 1: validate TotalTime = 0.105808, [24] [bootstrap]: 0.00046522 [type_inference]: 0.00429459 [event_method]: 1.08e-05 [auto_monad]: 5.066e-05 [graph_reusing]: 4.93001e-06 [inline]: 1.67001e-06 [add_attr]: 0.00301011, [1] [add_attr_with_inline]: 0.00300217, [1] [Cycle 1]: 4.606e-05, [2] [tag_attr]: 1.182e-05 [meta_addattr_fg_expand]: 3.68e-06 [parallel-infer-symbol]: 2.66999e-06 [pre_auto_parallel]: 2.065e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00366134, [53] [py_interpret_to_execute]: 1.53e-05 [rewriter_before_opt_a]: 3.844e-05 [opt_a]: 0.001846, [2] [Cycle 1]: 0.001251, [45] [expand_dump_flag]: 2.63998e-06 [switch_simplify]: 2.491e-05 [loop_unroll]: 1.367e-05 [a_1]: 0.00029102 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.13998e-06 [updatestate_depend_eliminate]: 3.73999e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.657e-05 [accelerated_algorithm]: 6.09001e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 8.08999e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.866e-05 [flash_sp]: 7.48999e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.82e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.14001e-06 [virtual_dataset]: 6.05002e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 6.17001e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 8.95001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.65002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.74999e-06 
[after_resolve]: 1.06e-05 [a_after_grad]: 9.04e-06 [renormalize]: 0.00034121 [add_forward_monad_depend]: 4.26001e-06 [auto_monad_grad]: 1.74e-06 [auto_monad_eliminator]: 1.281e-05 [cse]: 2.687e-05 [a_3]: 3.924e-05 [Cycle 2]: 0.00058575, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.88e-06 [loop_unroll]: 5.20999e-06 [a_1]: 0.00012334 [with_stream_mark]: 9.29998e-06 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.69999e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.734e-05 [accelerated_algorithm]: 5.46002e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.08999e-06 [auto_parallel]: 5.17999e-06 [parallel]: 4.2e-06 [flash_sp]: 2.91e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 2.87002e-06 [matmul_add_comm_reduction]: 5.14998e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.28002e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 4.94003e-06 [merge_forward]: 2.89999e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 5.64998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89999e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.73001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.79998e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.78001e-06 [a_after_grad]: 7.8e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.02999e-06 [cse]: 1.23e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.39002e-06 [slice_cell_reuse_recomputed_activation]: 1.67999e-06 [rewriter_after_opt_a]: 3.048e-05 [convert_after_rewriter]: 6.96999e-06 [order_py_execute_after_rewriter]: 5.30001e-06 [mutable_eliminate]: 0.00045018 [opt_b]: 0.00017936, [1] [Cycle 1]: 0.00017355, [7] 
[b_1]: 0.00010648 [b_2]: 6.90998e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 5.39992e-07 [cse]: 1.573e-05 [optimize_parallel_all_gather_comm]: 1.549e-05 [overlap_param_gather]: 2.24001e-06 [cconv]: 2.188e-05 [loop_unroll]: 0.00041717 [opt_after_cconv]: 0.00010401, [1] [Cycle 1]: 9.819e-05, [7] [c_1]: 2.801e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.40002e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.573e-05 [renormalize]: 2.30008e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 6.993e-05, [1] [Cycle 1]: 6.534e-05, [4] [d_1]: 3.953e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 2.20025e-07 [switch_simplify]: 6.13002e-06 [partial_unused_args_eliminate]: 1.55999e-06 [add_recomputation]: 4.36e-05 [cse_after_recomputation]: 2.005e-05, [1] [Cycle 1]: 1.57e-05, [1] [cse]: 1.067e-05 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 5.20001e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.08998e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.49e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.18001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.234e-05 [grouped_pairwise_exchange_alltoall]: 1.73002e-06 [offloading_packed_experts]: 3.56999e-06 [overlap_recompute_and_grad_model_parallel]: 4.91002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.40999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.28002e-06 [overlap_grad_ring_attention]: 3.78999e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.77999e-06 [handle_group_info]: 1.24e-06 [symbol_engine_optimizer]: 6.737e-05, [1] [Cycle 1]: 6.332e-05, [6] [build]: 2.20002e-06 [elim_shapecalc]: 8.25e-06 [elim_not_effective]: 1.105e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.85999e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.82001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.507e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.72002e-06 [opt_after_jit_grad]: 0.00044945 [validate]: 3.101e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0935586 [execute]: 9.09e-06 Sums bootstrap : 0.000465s : 0.46% type_inference : 0.004295s : 4.22% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000414s : 0.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000341s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.44% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000417s : 0.41% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000040s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000449s : 0.44% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.093559s : 91.88% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.54% : 0.000022s : 4: substitution.arithmetic_simplify 1.46% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.84% : 0.000006s : 4: substitution.graph_param_transform 65.00% : 0.000078s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.85% : 0.000005s : 4: substitution.remove_not_recompute_node 3.02% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004254 2 91.65% : 0.003899s : 1: type_inference.infer 8.35% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.31% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.72% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.71% : 0.000004s : 17: predicate.arithmetic_simplify 0.72% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 6.12% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.89% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.21% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.77% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 1.00% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000244 6 41.45% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.55% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.113740 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.65% : 0.003014s : 1: add_attr 2.64% : 0.003006s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.44% : 0.000501s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000016s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000425s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.67% : 0.000766s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.63% : 0.001849s : 1: opt_a 0.09% : 0.000107s : 1: opt_after_cconv 0.40% : 0.000458s : 1: opt_after_jit_grad 0.16% : 0.000183s : 1: opt_b 3.22% : 0.003665s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.17% : 0.000188s : 1: renormalize.infer 0.13% : 0.000147s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.28% : 0.093582s : 1: task_emit 0.06% : 0.000073s : 1: 
tuple_transform 3.79% : 0.004308s : 1: type_inference 0.05% : 0.000053s : 1: validate TotalTime = 0.147323, [24] [bootstrap]: 0.00057102 [type_inference]: 0.0105432 [event_method]: 4.416e-05 [auto_monad]: 0.00011961 [graph_reusing]: 7.83001e-06 [inline]: 1.75001e-06 [add_attr]: 0.0030324, [1] [add_attr_with_inline]: 0.00302425, [1] [Cycle 1]: 6.681e-05, [2] [tag_attr]: 3.126e-05 [meta_addattr_fg_expand]: 8.92999e-06 [parallel-infer-symbol]: 3.11999e-06 [pre_auto_parallel]: 4.685e-05 [insert-virtual-dataset]: 2.37999e-06 [parallel-infer-symbol-second]: 7.49977e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.0135104, [53] [py_interpret_to_execute]: 3.642e-05 [rewriter_before_opt_a]: 0.00012944 [opt_a]: 0.0112003, [3] [Cycle 1]: 0.0072326, [45] [expand_dump_flag]: 4.1e-06 [switch_simplify]: 6.682e-05 [loop_unroll]: 5.605e-05 [a_1]: 0.00141575 [with_stream_mark]: 2.438e-05 [recompute_prepare]: 2.189e-05 [updatestate_depend_eliminate]: 9.62001e-06 [updatestate_assign_eliminate]: 8.15e-06 [updatestate_loads_eliminate]: 7.41001e-06 [parameter_eliminate]: 2.46e-06 [a_2]: 0.00024489 [accelerated_algorithm]: 3.061e-05 [shard]: 1.81998e-06 [meta_shard_fg_expand]: 3.36999e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.53e-05 [auto_parallel]: 1.106e-05 [parallel]: 1.872e-05 [flash_sp]: 1.148e-05 [merge_comm]: 1.014e-05 [allreduce_fusion]: 9.50001e-06 [matmul_add_comm_reduction]: 2.625e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.782e-05 [virtual_dataset]: 1.582e-05 [get_grad_eliminate_]: 2.167e-05 [virtual_output]: 1.571e-05 [merge_forward]: 1.025e-05 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.888e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.873e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 2.781e-05 [set_forward_comm_id_for_comm_node_pass]: 1.017e-05 [meta_fg_expand]: 0.00147069 [flash_sp_send_recv_attached]: 3.55e-06 [receive_attached]: 2.62001e-06 [after_resolve]: 
6.092e-05 [a_after_grad]: 8.414e-05 [renormalize]: 0.00255576 [add_forward_monad_depend]: 9.66e-06 [auto_monad_grad]: 5.08002e-06 [auto_monad_eliminator]: 5.754e-05 [cse]: 0.00017551 [a_3]: 0.00034231 [Cycle 2]: 0.00304139, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 4.806e-05 [loop_unroll]: 4.426e-05 [a_1]: 0.0015576 [with_stream_mark]: 1.239e-05 [recompute_prepare]: 1.138e-05 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 4.56002e-06 [updatestate_loads_eliminate]: 3.8e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.0001294 [accelerated_algorithm]: 1.212e-05 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.92999e-06 [shard_inline]: 9.29e-06 [merge_send_recv]: 7e-06 [auto_parallel]: 7.53e-06 [parallel]: 4.64002e-06 [flash_sp]: 3.03e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 5.02999e-06 [matmul_add_comm_reduction]: 7.80998e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 9.91e-06 [virtual_dataset]: 9.74e-06 [get_grad_eliminate_]: 9.72999e-06 [virtual_output]: 8.49998e-06 [merge_forward]: 4.52e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.684e-05 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 1.418e-05 [set_forward_comm_id_for_comm_node_pass]: 5.29e-06 [meta_fg_expand]: 3.611e-05 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.564e-05 [a_after_grad]: 1.417e-05 [renormalize]: 0.0006054 [add_forward_monad_depend]: 3.85e-06 [auto_monad_grad]: 1.15001e-06 [auto_monad_eliminator]: 1.465e-05 [cse]: 4.922e-05 [a_3]: 6.681e-05 [Cycle 3]: 0.00091253, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.053e-05 [loop_unroll]: 9.05999e-06 [a_1]: 0.00025093 [with_stream_mark]: 1.01e-05 [recompute_prepare]: 9.54e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 4.08999e-06 [updatestate_loads_eliminate]: 3.85e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 
0.00012526 [accelerated_algorithm]: 1.203e-05 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.87999e-06 [shard_inline]: 9.19e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.57e-06 [flash_sp]: 1.22e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.99003e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.045e-05 [virtual_dataset]: 8.84e-06 [get_grad_eliminate_]: 8.89998e-06 [virtual_output]: 8.48001e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 8.57998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.598e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49e-06 [meta_fg_expand]: 3.13e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.351e-05 [a_after_grad]: 1.453e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 1.161e-05 [cse]: 2.816e-05 [a_3]: 6.105e-05 [py_interpret_to_execute_after_opt_a]: 1.065e-05 [slice_cell_reuse_recomputed_activation]: 1.64998e-06 [rewriter_after_opt_a]: 4.629e-05 [convert_after_rewriter]: 9.21002e-06 [order_py_execute_after_rewriter]: 6.63998e-06 [mutable_eliminate]: 0.00046441 [opt_b]: 0.00029718, [1] [Cycle 1]: 0.00029096, [7] [b_1]: 0.0001953 [b_2]: 1.09e-05 [updatestate_depend_eliminate]: 7.23e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 4.04002e-06 [renormalize]: 3.19997e-07 [cse]: 3.369e-05 [optimize_parallel_all_gather_comm]: 2.097e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 1.957e-05 [loop_unroll]: 0.00042578 [opt_after_cconv]: 0.00013783, [1] [Cycle 1]: 0.00013224, [7] [c_1]: 4.849e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 7.65e-06 [updatestate_assign_eliminate]: 4.42998e-06 [updatestate_loads_eliminate]: 3.98001e-06 [cse]: 3.079e-05 [renormalize]: 
2.69996e-07 [remove_dup_value]: 2.902e-05 [tuple_transform]: 0.00010176, [1] [Cycle 1]: 9.727e-05, [4] [d_1]: 6.727e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 2.80008e-07 [switch_simplify]: 9.74999e-06 [partial_unused_args_eliminate]: 2.01e-06 [add_recomputation]: 5.722e-05 [cse_after_recomputation]: 3.343e-05, [1] [Cycle 1]: 2.873e-05, [1] [cse]: 2.288e-05 [environ_conv]: 9.20999e-06 [swap_dp_allreduce_reducescatter]: 8.07e-06 [bias_add_comm_swap]: 2.56998e-06 [label_micro_interleaved_index]: 4.342e-05 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.34e-06 [slice_recompute_activation]: 2.56998e-06 [micro_interleaved_order_control]: 2.40002e-06 [assign_add_opt]: 1.36002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.733e-05 [grouped_pairwise_exchange_alltoall]: 1.77999e-06 [offloading_packed_experts]: 5.17e-06 [overlap_recompute_and_grad_model_parallel]: 5.81998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.50001e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 5.17e-06 [overlap_grad_flash_sp]: 2.431e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 0.00010149, [1] [Cycle 1]: 9.703e-05, [6] [build]: 1.035e-05 [elim_shapecalc]: 1.405e-05 [elim_not_effective]: 1.834e-05 [opt_reshape]: 1.027e-05 [fold_const_symbol]: 1.57e-05 [renormalize]: 2.00002e-07 [detach_backward]: 1.93002e-06 [pipeline_parallel_scheduler]: 
1.44e-06 [auto_monad_reorder]: 2.589e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00047442 [validate]: 4.701e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.118653 [execute]: 8.87e-06 Sums bootstrap : 0.000571s : 0.40% type_inference : 0.010543s : 7.37% event_method : 0.000044s : 0.03% auto_monad : 0.000120s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000129s : 0.09% optimize.opt_a.expand_dump_flag : 0.000007s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000109s : 0.08% optimize.opt_a.a_1 : 0.003224s : 2.25% optimize.opt_a.with_stream_mark : 0.000047s : 0.03% optimize.opt_a.recompute_prepare : 0.000043s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000500s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000020s : 0.01% 
optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000040s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000037s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001510s : 1.06% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.06% optimize.opt_a.a_after_grad : 0.000113s : 0.08% optimize.opt_a.renormalize : 0.003161s : 2.21% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.06% optimize.opt_a.cse : 0.000253s : 0.18% optimize.opt_a.a_3 : 0.000470s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000464s : 0.32% optimize.opt_b.b_1 : 0.000195s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse 
: 0.000034s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000426s : 0.30% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000043s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% 
optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000474s : 0.33% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.118653s : 82.98% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000746 218 5.77% : 0.000043s : 11: substitution.arithmetic_simplify 1.92% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.54% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.34% : 0.000003s : 5: substitution.fold_const_symbol 1.00% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.43% : 0.000414s : 16: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.74% : 0.000006s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.28% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.81% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.82% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.33% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010476 2 86.03% : 0.009013s : 1: type_inference.infer 13.97% : 0.001463s : 1: type_inference.specialize ------[replace.] 0.000212 30 58.53% : 0.000124s : 16: replace.inline 41.47% : 0.000088s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000436 30 92.96% : 0.000405s : 16: match.inline 7.04% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.18% : 0.000016s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.61% : 0.000005s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.58% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.57% : 0.000042s : 244: predicate.inline 1.32% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000017s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000015s : 97: predicate.partial_defer_inline 1.76% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 67: predicate.reduce_eliminate 2.63% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.22% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.81% : 0.000014s : 97: predicate.switch_defer_inline 2.91% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000022s : 129: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000016s : 115: predicate.tuple_list_set_item_eliminator 1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.24% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001663 32 55.67% : 0.000926s : 12: func_graph_cloner_run.FuncGraphClonerGraph 44.33% : 0.000737s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.172265 237 0.00% : 0.000004s : 1: ForceFp32Comm 1.76% : 0.003037s : 1: add_attr 1.76% : 0.003028s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000127s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.35% : 0.000605s : 1: bootstrap 0.01% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.03% : 0.000051s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000047s : 1: label_micro_interleaved_index 0.25% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000474s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.85% : 0.004909s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000181s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000055s : 4: opt.transform.symbol_engine_opt 6.50% : 0.011203s : 1: opt_a 0.08% : 0.000141s : 1: opt_after_cconv 0.28% : 0.000484s : 1: opt_after_jit_grad 0.17% : 0.000301s : 1: opt_b 7.85% : 0.013514s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000051s : 1: pre_auto_parallel 0.02% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.95% : 0.001636s : 2: renormalize.infer 0.88% : 0.001512s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000050s : 1: rewriter_after_opt_a 0.08% : 0.000134s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000104s : 1: symbol_engine_optimizer 68.89% : 0.118675s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.13% : 0.010558s : 1: type_inference 0.04% : 0.000071s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x2-ge],max_mem:62.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x3-pynative],max_mem:62.0M TotalTime = 0.0218946, [24] [bootstrap]: 0.00055929 [type_inference]: 0.00635306 [event_method]: 1.518e-05 [auto_monad]: 5.477e-05 [graph_reusing]: 5.82999e-06 [inline]: 1.98002e-06 [add_attr]: 0.00340774, [1] [add_attr_with_inline]: 0.00339711, [1] [Cycle 1]: 4.561e-05, [2] [tag_attr]: 1.553e-05 [meta_addattr_fg_expand]: 4.37e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.769e-05 [insert-virtual-dataset]: 2.94001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00404509, [53] [py_interpret_to_execute]: 2.096e-05 [rewriter_before_opt_a]: 5.932e-05 [opt_a]: 0.00214935, [2] [Cycle 1]: 0.00153963, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.182e-05 [loop_unroll]: 2.178e-05 [a_1]: 0.0004585 [with_stream_mark]: 1.377e-05 [recompute_prepare]: 8.01001e-06 [updatestate_depend_eliminate]: 4.15e-06 [updatestate_assign_eliminate]: 3.28998e-06 [updatestate_loads_eliminate]: 2.96999e-06 [parameter_eliminate]: 1.76003e-06 [a_2]: 7.691e-05 [accelerated_algorithm]: 6.56999e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 8.12e-06 [auto_parallel]: 6.31998e-06 [parallel]: 2.215e-05 [flash_sp]: 7.83001e-06 [merge_comm]: 3.81001e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 8.70999e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.75e-06 [virtual_dataset]: 
6.17001e-06 [get_grad_eliminate_]: 5.66998e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.96001e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 9.06002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.124e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 1.001e-05 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.21e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.63e-06 [after_resolve]: 1.078e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.00042709 [add_forward_monad_depend]: 4.65001e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.353e-05 [cse]: 2.812e-05 [a_3]: 4.121e-05 [Cycle 2]: 0.00059982, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.89999e-06 [loop_unroll]: 5.61998e-06 [a_1]: 0.00012731 [with_stream_mark]: 1.01e-05 [recompute_prepare]: 5.59998e-06 [updatestate_depend_eliminate]: 2.85998e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.827e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.31002e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.42999e-06 [parallel]: 4.36002e-06 [flash_sp]: 3.25e-06 [merge_comm]: 2.80997e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.38002e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 5.96998e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.83e-06 [cell_reuse_recompute_pass]: 1.39998e-06 [offload_activation]: 6.01e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.37999e-06 [merge_recompute_call_nodes]: 7.49977e-07 [before_grad]: 8.44998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.15002e-06 [meta_fg_expand]: 1.74998e-06 [flash_sp_send_recv_attached]: 9.80013e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.62999e-06 [a_after_grad]: 8.2e-06 [renormalize]: 
8.9989e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.62002e-06 [cse]: 1.378e-05 [a_3]: 3.2e-05 [py_interpret_to_execute_after_opt_a]: 8.01001e-06 [slice_cell_reuse_recomputed_activation]: 1.79998e-06 [rewriter_after_opt_a]: 2.921e-05 [convert_after_rewriter]: 7.64002e-06 [order_py_execute_after_rewriter]: 4.90999e-06 [mutable_eliminate]: 0.000455 [opt_b]: 0.00018672, [1] [Cycle 1]: 0.00018069, [7] [b_1]: 0.00011146 [b_2]: 6.96001e-06 [updatestate_depend_eliminate]: 5.44e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.09986e-07 [cse]: 1.709e-05 [optimize_parallel_all_gather_comm]: 1.645e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.246e-05 [loop_unroll]: 0.00042363 [opt_after_cconv]: 9.668e-05, [1] [Cycle 1]: 9.11e-05, [7] [c_1]: 2.806e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.688e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 1.354e-05 [tuple_transform]: 7.007e-05, [1] [Cycle 1]: 6.582e-05, [4] [d_1]: 3.966e-05 [none_parameter_eliminate]: 1.53002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.36998e-06 [partial_unused_args_eliminate]: 1.66002e-06 [add_recomputation]: 5.015e-05 [cse_after_recomputation]: 2.115e-05, [1] [Cycle 1]: 1.672e-05, [1] [cse]: 1.162e-05 [environ_conv]: 4.97e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.61e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.12999e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.39996e-07 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.04998e-06 
[add_comm_op_reuse_tag]: 8.40024e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.11997e-06 [overlap_opt_shard_in_pipeline]: 1.76003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.65001e-06 [control_data_broadcast_order]: 1.232e-05 [grouped_pairwise_exchange_alltoall]: 1.66998e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.97999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.32e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.03002e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.689e-05 [begin_end_overlap_inline]: 5.60016e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 1.12999e-06 [symbol_engine_optimizer]: 6.951e-05, [1] [Cycle 1]: 6.545e-05, [6] [build]: 2.34001e-06 [elim_shapecalc]: 8.70001e-06 [elim_not_effective]: 1.166e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 9.02999e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.08001e-06 [rewriter_after_jit_bprop_graph]: 0.00016395 [opt_after_jit_grad]: 0.00046533 [validate]: 3.107e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.00652033 [execute]: 6.82002e-06 Sums bootstrap : 0.000559s : 3.20% type_inference : 0.006353s : 36.33% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.31% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 
0.000059s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000586s : 3.35% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000145s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000427s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000042s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000029s : 0.17% optimize.convert_after_rewriter : 0.000008s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000455s : 2.60% optimize.opt_b.b_1 : 0.000111s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000424s : 2.42% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000014s : 0.08% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000002s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000164s : 0.94% opt_after_jit_grad : 0.000465s : 2.66% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006520s : 37.29% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 14.82% : 0.000025s : 5: substitution.arithmetic_simplify 1.02% : 0.000002s : 2: substitution.elim_not_effective 0.69% : 0.000001s : 2: substitution.fold_const_symbol 2.91% : 0.000005s : 4: substitution.graph_param_transform 67.27% : 0.000112s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.71% : 0.000005s : 4: substitution.remove_not_recompute_node 2.35% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006304 2 89.82% : 0.005663s : 1: type_inference.infer 10.18% : 0.000642s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.65% : 0.000028s : 3: replace.inline 30.35% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.78% : 0.000110s : 3: match.inline 8.22% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.97% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.74% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.31% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.55% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000001s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.80% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.03% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.49% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000417 8 42.19% : 0.000176s : 3: func_graph_cloner_run.FuncGraphClonerGraph 57.81% : 0.000241s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030888 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.05% : 0.003412s : 1: add_attr 11.01% : 0.003401s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000595s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000021s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000007s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.50% : 0.000464s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.10% : 0.000958s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000092s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.97% : 0.002152s : 1: opt_a 0.32% : 0.000100s : 1: opt_after_cconv 1.54% : 0.000475s : 1: opt_after_jit_grad 0.62% : 0.000190s : 1: opt_b 13.11% : 0.004049s : 1: optimize 0.06% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000215s : 1: renormalize.infer 0.66% : 0.000203s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.55% : 0.000170s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000033s : 1: rewriter_after_opt_a 0.29% : 0.000089s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000072s : 1: symbol_engine_optimizer 21.14% : 0.006531s : 1: task_emit 0.24% : 0.000073s : 1: 
tuple_transform 20.61% : 0.006367s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0183917, [24] [bootstrap]: 0.00045685 [type_inference]: 0.0044234 [event_method]: 1.079e-05 [auto_monad]: 5.175e-05 [graph_reusing]: 5.15999e-06 [inline]: 2.02999e-06 [add_attr]: 0.00310643, [1] [add_attr_with_inline]: 0.00309808, [1] [Cycle 1]: 4.588e-05, [2] [tag_attr]: 1.18e-05 [meta_addattr_fg_expand]: 3.46001e-06 [parallel-infer-symbol]: 3.28998e-06 [pre_auto_parallel]: 2.191e-05 [insert-virtual-dataset]: 2.78e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00371511, [53] [py_interpret_to_execute]: 1.497e-05 [rewriter_before_opt_a]: 3.94e-05 [opt_a]: 0.00187446, [2] [Cycle 1]: 0.00126618, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 2.373e-05 [loop_unroll]: 1.395e-05 [a_1]: 0.00029868 [with_stream_mark]: 1.329e-05 [recompute_prepare]: 7.92003e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 3.05998e-06 [parameter_eliminate]: 1.76998e-06 [a_2]: 7.78e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.89e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 5.62999e-06 [parallel]: 1.809e-05 [flash_sp]: 7.6e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.27002e-06 [matmul_add_comm_reduction]: 9.05999e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.61e-06 [merge_forward]: 3.89002e-06 [cell_reuse_recompute_pass]: 1.06002e-06 [offload_activation]: 9.21002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.103e-05 [merge_recompute_call_nodes]: 1.69998e-06 [before_grad]: 9.72001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.29999e-06 [flash_sp_send_recv_attached]: 2.21e-06 [receive_attached]: 2.51e-06 
[after_resolve]: 1.027e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00034363 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.91003e-06 [auto_monad_eliminator]: 1.334e-05 [cse]: 2.518e-05 [a_3]: 4.093e-05 [Cycle 2]: 0.00059894, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 7.26001e-06 [loop_unroll]: 5.56002e-06 [a_1]: 0.00012798 [with_stream_mark]: 9.44e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.799e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.27998e-06 [flash_sp]: 3.48999e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.56e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.32001e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.76e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.90024e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.13002e-06 [a_after_grad]: 8.30999e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.27e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 6.11998e-06 [cse]: 1.249e-05 [a_3]: 3.264e-05 [py_interpret_to_execute_after_opt_a]: 7.30998e-06 [slice_cell_reuse_recomputed_activation]: 2.09e-06 [rewriter_after_opt_a]: 3.12e-05 [convert_after_rewriter]: 7.55003e-06 [order_py_execute_after_rewriter]: 5.54e-06 [mutable_eliminate]: 0.00045633 [opt_b]: 0.00020033, [1] [Cycle 1]: 0.00019423, [7] 
[b_1]: 0.00012591 [b_2]: 7.41001e-06 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 4.19997e-07 [cse]: 1.564e-05 [optimize_parallel_all_gather_comm]: 1.585e-05 [overlap_param_gather]: 1.72001e-06 [cconv]: 2.222e-05 [loop_unroll]: 0.00041909 [opt_after_cconv]: 9.508e-05, [1] [Cycle 1]: 8.942e-05, [7] [c_1]: 2.785e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.61e-06 [updatestate_loads_eliminate]: 2.46998e-06 [cse]: 1.558e-05 [renormalize]: 4.7998e-07 [remove_dup_value]: 1.271e-05 [tuple_transform]: 6.982e-05, [1] [Cycle 1]: 6.55e-05, [4] [d_1]: 4.005e-05 [none_parameter_eliminate]: 1.39e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.41998e-06 [partial_unused_args_eliminate]: 1.71998e-06 [add_recomputation]: 4.368e-05 [cse_after_recomputation]: 1.936e-05, [1] [Cycle 1]: 1.515e-05, [1] [cse]: 1.009e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.38002e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 1.11002e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.38002e-06 [overlap_recompute_comm]: 2.29999e-06 [overlap_grad_ring_attention]: 4.22998e-06 [overlap_grad_flash_sp]: 1.688e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.902e-05, [1] [Cycle 1]: 6.505e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 8.62998e-06 [elim_not_effective]: 1.169e-05 [opt_reshape]: 6.34001e-06 [fold_const_symbol]: 8.97999e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.75001e-06 [pipeline_parallel_scheduler]: 1.38002e-06 [auto_monad_reorder]: 1.522e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.45e-06 [opt_after_jit_grad]: 0.00045427 [validate]: 3.014e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00587874 [execute]: 7.05e-06 Sums bootstrap : 0.000457s : 3.19% type_inference : 0.004423s : 30.88% event_method : 0.000011s : 0.08% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000427s : 2.98% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000344s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000038s : 0.26% optimize.opt_a.a_3 : 0.000074s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000008s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000456s : 3.19% optimize.opt_b.b_1 : 0.000126s : 0.88% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000419s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000454s : 3.17% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005879s : 41.04% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.45% : 0.000022s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.29% : 0.000005s : 4: substitution.graph_param_transform 66.13% : 0.000080s : 2: substitution.inline 2.25% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.41% : 0.000004s : 4: substitution.remove_not_recompute_node 2.95% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004378 2 92.01% : 0.004028s : 1: type_inference.infer 7.99% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000140 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000003s : 11: predicate.float_depend_g_call 0.64% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.59% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.24% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.27% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 1.12% : 0.000002s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.35% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.27% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.68% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.67% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.42% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.78% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.78% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000240 6 42.57% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.43% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026513 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.73% : 0.003111s : 1: add_attr 11.70% : 0.003102s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.86% : 0.000493s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000428s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000465s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000781s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.40% : 0.000107s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001877s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000464s : 1: opt_after_jit_grad 0.77% : 0.000204s : 1: opt_b 14.03% : 0.003719s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000186s : 1: renormalize.infer 0.57% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.21% : 0.005889s : 1: task_emit 0.27% : 0.000073s : 1: 
tuple_transform 16.74% : 0.004437s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0197096, [24] [bootstrap]: 0.00049528 [type_inference]: 0.00553506 [event_method]: 1.403e-05 [auto_monad]: 5.584e-05 [graph_reusing]: 5.57001e-06 [inline]: 1.94999e-06 [add_attr]: 0.00298673, [1] [add_attr_with_inline]: 0.00297782, [1] [Cycle 1]: 4.675e-05, [2] [tag_attr]: 1.647e-05 [meta_addattr_fg_expand]: 4.80999e-06 [parallel-infer-symbol]: 3.04001e-06 [pre_auto_parallel]: 2.528e-05 [insert-virtual-dataset]: 2.49999e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00399562, [53] [py_interpret_to_execute]: 2.067e-05 [rewriter_before_opt_a]: 5.883e-05 [opt_a]: 0.00211523, [2] [Cycle 1]: 0.00150884, [45] [expand_dump_flag]: 2.54999e-06 [switch_simplify]: 3.271e-05 [loop_unroll]: 2.204e-05 [a_1]: 0.0004536 [with_stream_mark]: 1.342e-05 [recompute_prepare]: 7.67998e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.01999e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.742e-05 [accelerated_algorithm]: 6.66999e-06 [shard]: 1.99e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.90002e-06 [merge_send_recv]: 8e-06 [auto_parallel]: 5.99999e-06 [parallel]: 1.659e-05 [flash_sp]: 7.15e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 8.92e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.49002e-06 [virtual_dataset]: 6.31998e-06 [get_grad_eliminate_]: 5.65001e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 3.62998e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 9.31e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.41e-06 [set_forward_comm_id_for_comm_node_pass]: 3.52002e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 1.079e-05 [a_after_grad]: 9.14e-06 [renormalize]: 0.00041184 [add_forward_monad_depend]: 4.59998e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.707e-05 [a_3]: 4.078e-05 [Cycle 2]: 0.00059683, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.55001e-06 [a_1]: 0.00012722 [with_stream_mark]: 9.75002e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.18002e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.881e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.09003e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.84998e-06 [auto_parallel]: 5.47999e-06 [parallel]: 4.32998e-06 [flash_sp]: 3.35998e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.58e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 5.19003e-06 [virtual_output]: 5.12e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72001e-06 [merge_recompute_call_nodes]: 6.39993e-07 [before_grad]: 8.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.41e-06 [a_after_grad]: 8.03999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.624e-05 [a_3]: 3.265e-05 [py_interpret_to_execute_after_opt_a]: 7.65998e-06 [slice_cell_reuse_recomputed_activation]: 1.93002e-06 [rewriter_after_opt_a]: 2.998e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 4.90001e-06 [mutable_eliminate]: 0.00045374 [opt_b]: 0.00021407, [1] [Cycle 1]: 0.00020815, [7] [b_1]: 
0.00010911 [b_2]: 7.12002e-06 [updatestate_depend_eliminate]: 5.45001e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.23998e-06 [renormalize]: 2.59985e-07 [cse]: 4.682e-05 [optimize_parallel_all_gather_comm]: 1.586e-05 [overlap_param_gather]: 1.74e-06 [cconv]: 2.302e-05 [loop_unroll]: 0.00042355 [opt_after_cconv]: 9.603e-05, [1] [Cycle 1]: 9.047e-05, [7] [c_1]: 2.85e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.39999e-06 [cse]: 1.631e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.238e-05 [tuple_transform]: 6.978e-05, [1] [Cycle 1]: 6.522e-05, [4] [d_1]: 3.92e-05 [none_parameter_eliminate]: 1.55999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.26998e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.197e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.92e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.80002e-06 [label_micro_interleaved_index]: 4.21001e-06 [label_fine_grained_interleaved_index]: 2.79999e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.36998e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 8.60018e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.17e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.182e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.44001e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 1.637e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.33002e-06 [symbol_engine_optimizer]: 6.924e-05, [1] [Cycle 1]: 6.523e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.74998e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.17999e-06 [fold_const_symbol]: 9.32999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.52001e-06 [auto_monad_reorder]: 1.507e-05 [get_jit_bprop_graph]: 9.09989e-07 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00045503 [validate]: 2.959e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00587674 [execute]: 7.21001e-06 Sums bootstrap : 0.000495s : 3.14% type_inference : 0.005535s : 35.10% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000028s : 0.17% optimize.opt_a.a_1 : 0.000581s : 3.68% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 0.93% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000412s : 2.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000043s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000454s : 2.88% optimize.opt_b.b_1 : 0.000109s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000047s : 0.30% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.15% optimize.loop_unroll : 0.000424s : 2.69% optimize.opt_after_cconv.c_1 : 0.000029s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000455s : 2.89% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005877s : 37.27% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000164 30 15.02% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.09% : 0.000005s : 4: substitution.graph_param_transform 66.79% : 0.000110s : 3: substitution.inline 1.61% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.52% : 0.000004s : 4: substitution.replace_old_param 6.60% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005495 2 89.95% : 0.004943s : 1: type_inference.infer 10.05% : 0.000552s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.62% : 0.000028s : 3: replace.inline 29.38% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.72% : 0.000108s : 3: match.inline 8.28% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000162 1131 0.95% : 0.000002s : 11: predicate.accumulaten_eliminater 1.03% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.30% : 0.000004s : 32: predicate.load_eliminater 1.05% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.63% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.02% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.70% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.30% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.42% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000342 8 45.94% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.06% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.028218 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.60% : 0.002991s : 1: add_attr 10.57% : 0.002982s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000061s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.88% : 0.000530s : 1: bootstrap 0.09% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.53% : 0.000432s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000463s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.38% : 0.000953s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.51% : 0.002118s : 1: opt_a 0.35% : 0.000099s : 1: opt_after_cconv 1.65% : 0.000465s : 1: opt_after_jit_grad 0.77% : 0.000218s : 1: opt_b 14.17% : 0.004000s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000209s : 1: renormalize.infer 0.70% : 0.000197s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000072s : 1: symbol_engine_optimizer 20.86% : 0.005886s : 1: task_emit 0.26% : 0.000073s : 1: 
tuple_transform 19.66% : 0.005548s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0376128, [24] [bootstrap]: 0.0005138 [type_inference]: 0.0114775 [event_method]: 4.848e-05 [auto_monad]: 0.00012035 [graph_reusing]: 8.40999e-06 [inline]: 1.67999e-06 [add_attr]: 0.00299255, [1] [add_attr_with_inline]: 0.00298393, [1] [Cycle 1]: 7.113e-05, [2] [tag_attr]: 3.409e-05 [meta_addattr_fg_expand]: 9.58002e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 4.948e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.60017e-07 [dataset_repeat_opt]: 2.01998e-06 [pipeline_split]: 1.47001e-06 [optimize]: 0.0134624, [53] [py_interpret_to_execute]: 3.936e-05 [rewriter_before_opt_a]: 0.00014493 [opt_a]: 0.0111021, [3] [Cycle 1]: 0.00711424, [45] [expand_dump_flag]: 4.21001e-06 [switch_simplify]: 7.542e-05 [loop_unroll]: 6.289e-05 [a_1]: 0.00148974 [with_stream_mark]: 2.362e-05 [recompute_prepare]: 2.233e-05 [updatestate_depend_eliminate]: 9.05001e-06 [updatestate_assign_eliminate]: 8e-06 [updatestate_loads_eliminate]: 7.85e-06 [parameter_eliminate]: 2.78998e-06 [a_2]: 0.0002446 [accelerated_algorithm]: 3.081e-05 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 3.43e-06 [shard_inline]: 1.601e-05 [merge_send_recv]: 1.603e-05 [auto_parallel]: 1.097e-05 [parallel]: 1.704e-05 [flash_sp]: 1.188e-05 [merge_comm]: 9.77001e-06 [allreduce_fusion]: 9.11998e-06 [matmul_add_comm_reduction]: 2.661e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.78e-05 [virtual_dataset]: 1.562e-05 [get_grad_eliminate_]: 1.505e-05 [virtual_output]: 1.518e-05 [merge_forward]: 9.57999e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 1.765e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.857e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 2.712e-05 [set_forward_comm_id_for_comm_node_pass]: 9.62001e-06 [meta_fg_expand]: 0.00140359 [flash_sp_send_recv_attached]: 3.33e-06 [receive_attached]: 2.58e-06 [after_resolve]: 
6.019e-05 [a_after_grad]: 8.134e-05 [renormalize]: 0.00244313 [add_forward_monad_depend]: 9.52001e-06 [auto_monad_grad]: 5.52001e-06 [auto_monad_eliminator]: 5.677e-05 [cse]: 0.00016735 [a_3]: 0.0003388 [Cycle 2]: 0.00306465, [45] [expand_dump_flag]: 1.50001e-06 [switch_simplify]: 4.77e-05 [loop_unroll]: 4.454e-05 [a_1]: 0.00157586 [with_stream_mark]: 1.264e-05 [recompute_prepare]: 1.123e-05 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 4.40999e-06 [updatestate_loads_eliminate]: 4e-06 [parameter_eliminate]: 1.10001e-06 [a_2]: 0.00012643 [accelerated_algorithm]: 1.223e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 9.13002e-06 [merge_send_recv]: 7.1e-06 [auto_parallel]: 7.38e-06 [parallel]: 4.81997e-06 [flash_sp]: 3.09001e-06 [merge_comm]: 5.27001e-06 [allreduce_fusion]: 4.77998e-06 [matmul_add_comm_reduction]: 7.46999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.012e-05 [virtual_dataset]: 9.37001e-06 [get_grad_eliminate_]: 9.28002e-06 [virtual_output]: 8.76002e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 9.89996e-07 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.665e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.428e-05 [set_forward_comm_id_for_comm_node_pass]: 5.13002e-06 [meta_fg_expand]: 6.933e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.15001e-06 [after_resolve]: 1.63e-05 [a_after_grad]: 1.468e-05 [renormalize]: 0.00059963 [add_forward_monad_depend]: 3.95e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.455e-05 [cse]: 4.678e-05 [a_3]: 6.632e-05 [Cycle 3]: 0.00090934, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.063e-05 [loop_unroll]: 9.07999e-06 [a_1]: 0.00025247 [with_stream_mark]: 1.023e-05 [recompute_prepare]: 9.46998e-06 [updatestate_depend_eliminate]: 4.94e-06 [updatestate_assign_eliminate]: 4.22e-06 [updatestate_loads_eliminate]: 3.94002e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.00012477 [accelerated_algorithm]: 1.188e-05 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.67999e-06 [shard_inline]: 9.21998e-06 [merge_send_recv]: 7.16999e-06 [auto_parallel]: 7.06001e-06 [parallel]: 4.57e-06 [flash_sp]: 9.79984e-07 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 5.06997e-06 [matmul_add_comm_reduction]: 7.8e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.009e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.61e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.30998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 1.334e-05 [a_after_grad]: 1.414e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.069e-05 [cse]: 2.69e-05 [a_3]: 6.021e-05 [py_interpret_to_execute_after_opt_a]: 1.066e-05 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 4.771e-05 [convert_after_rewriter]: 9.06002e-06 [order_py_execute_after_rewriter]: 6.56999e-06 [mutable_eliminate]: 0.00046552 [opt_b]: 0.00029256, [1] [Cycle 1]: 0.00028624, [7] [b_1]: 0.00019152 [b_2]: 1.133e-05 [updatestate_depend_eliminate]: 7.47002e-06 [updatestate_assign_eliminate]: 4.19002e-06 [updatestate_loads_eliminate]: 4.15e-06 [renormalize]: 4.10015e-07 [cse]: 3.217e-05 [optimize_parallel_all_gather_comm]: 2.172e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.031e-05 [loop_unroll]: 0.00050169 [opt_after_cconv]: 0.00013812, [1] [Cycle 1]: 0.000132, [7] [c_1]: 4.952e-05 [parameter_eliminate]: 2.45002e-06 [updatestate_depend_eliminate]: 7.41999e-06 [updatestate_assign_eliminate]: 4.25999e-06 
[updatestate_loads_eliminate]: 3.88999e-06 [cse]: 3.016e-05 [renormalize]: 2.70025e-07 [remove_dup_value]: 2.863e-05 [tuple_transform]: 0.00010352, [1] [Cycle 1]: 9.881e-05, [4] [d_1]: 6.834e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 1.007e-05 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 5.636e-05 [cse_after_recomputation]: 3.227e-05, [1] [Cycle 1]: 2.754e-05, [1] [cse]: 2.199e-05 [environ_conv]: 8.65999e-06 [swap_dp_allreduce_reducescatter]: 7.5e-06 [bias_add_comm_swap]: 2.48998e-06 [label_micro_interleaved_index]: 4.75999e-06 [label_fine_grained_interleaved_index]: 2.66999e-06 [merge_cast_opt]: 1.31002e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.17001e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.17e-06 [add_comm_op_reuse_tag]: 9.09989e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.688e-05 [grouped_pairwise_exchange_alltoall]: 1.41002e-06 [offloading_packed_experts]: 4.80999e-06 [overlap_recompute_and_grad_model_parallel]: 5.77999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.06998e-06 [overlap_grad_ring_attention]: 5.05999e-06 [overlap_grad_flash_sp]: 2.391e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 2.01e-06 [handle_group_info]: 1.17e-06 [symbol_engine_optimizer]: 9.919e-05, [1] [Cycle 1]: 9.478e-05, [6] [build]: 9.95002e-06 [elim_shapecalc]: 1.392e-05 [elim_not_effective]: 1.784e-05 [opt_reshape]: 9.96998e-06 [fold_const_symbol]: 1.472e-05 
[renormalize]: 2.69996e-07 [detach_backward]: 1.47999e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.512e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00047287 [validate]: 4.554e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00816012 [execute]: 6.54001e-06 Sums bootstrap : 0.000514s : 1.54% type_inference : 0.011478s : 34.41% event_method : 0.000048s : 0.15% auto_monad : 0.000120s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000145s : 0.43% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000134s : 0.40% optimize.opt_a.loop_unroll : 0.000117s : 0.35% optimize.opt_a.a_1 : 0.003318s : 9.95% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000043s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000026s : 0.08% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.11% optimize.opt_a.virtual_dataset : 0.000034s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.18% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001476s : 4.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.33% optimize.opt_a.renormalize : 0.003043s : 9.12% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.25% optimize.opt_a.cse : 0.000241s : 0.72% optimize.opt_a.a_3 : 0.000465s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000466s : 1.40% optimize.opt_b.b_1 : 0.000192s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000022s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000502s : 1.50% optimize.opt_after_cconv.c_1 : 0.000050s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000473s : 1.42% validate : 0.000046s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008160s : 24.46% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000801 222 5.71% : 0.000046s : 12: substitution.arithmetic_simplify 1.82% : 0.000015s : 2: substitution.cast_eliminate 0.32% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.51% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.27% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.31% : 0.000002s : 2: substitution.incorporate_call 0.28% : 0.000002s : 2: substitution.incorporate_call_switch 57.36% : 0.000459s : 17: substitution.inline 1.98% : 0.000016s : 2: substitution.inline_without_move 1.23% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.67% : 0.000013s : 11: substitution.minmaximum_grad 0.65% : 0.000005s : 5: substitution.partial_eliminate 1.66% : 0.000013s : 20: substitution.remove_not_recompute_node 3.10% : 0.000025s : 10: substitution.replace_applicator 1.31% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.50% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.70% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.23% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.42% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.24% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011402 2 86.65% : 0.009880s : 1: type_inference.infer 13.35% : 0.001522s : 1: type_inference.specialize ------[replace.] 0.000225 33 57.42% : 0.000129s : 17: replace.inline 42.58% : 0.000096s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000484 33 92.88% : 0.000450s : 17: match.inline 7.12% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000758 5764 1.07% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.14% : 0.000016s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.17% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.22% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000043s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.66% : 0.000005s : 32: predicate.less_batch_normalization 
1.62% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.64% : 0.000020s : 168: predicate.load_eliminater 0.34% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.07% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.25% : 0.000009s : 68: predicate.reduce_eliminate 2.64% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.33% : 0.000002s : 32: predicate.remove_not_recompute_node 1.93% : 0.000015s : 152: predicate.replace_applicator 0.58% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.60% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.13% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.94% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.55% : 0.000012s : 84: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.21% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.16% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001564 34 57.35% : 0.000897s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.65% : 0.000667s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062435 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.80% : 0.002997s : 1: add_attr 4.78% : 0.002988s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000128s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000550s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000056s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000013s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.82% : 0.000511s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000475s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.00% : 0.004993s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000176s : 28: opt.transform.opt_b 0.12% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.79% : 0.011105s : 1: opt_a 0.23% : 0.000142s : 1: opt_after_cconv 0.77% : 0.000483s : 1: opt_after_jit_grad 0.47% : 0.000296s : 1: opt_b 21.57% : 0.013466s : 1: optimize 0.04% : 0.000026s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.65% : 0.001656s : 2: renormalize.infer 2.20% : 0.001373s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000150s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000102s : 1: symbol_engine_optimizer 13.09% : 0.008171s : 1: task_emit 0.17% : 0.000106s : 1: 
tuple_transform 18.41% : 0.011493s : 1: type_inference 0.12% : 0.000077s : 1: validate TotalTime = 0.018655, [24] [bootstrap]: 0.00047455 [type_inference]: 0.00433124 [event_method]: 1.066e-05 [auto_monad]: 5.029e-05 [graph_reusing]: 4.63999e-06 [inline]: 2.16e-06 [add_attr]: 0.00302771, [1] [add_attr_with_inline]: 0.00301926, [1] [Cycle 1]: 4.625e-05, [2] [tag_attr]: 1.293e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.73998e-06 [pre_auto_parallel]: 2.092e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.70001e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00374838, [53] [py_interpret_to_execute]: 1.59e-05 [rewriter_before_opt_a]: 3.833e-05 [opt_a]: 0.00191847, [2] [Cycle 1]: 0.00127456, [45] [expand_dump_flag]: 3.25002e-06 [switch_simplify]: 2.441e-05 [loop_unroll]: 1.412e-05 [a_1]: 0.00029814 [with_stream_mark]: 1.363e-05 [recompute_prepare]: 7.77e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.08e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.834e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.29001e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.88998e-06 [merge_send_recv]: 7.80998e-06 [auto_parallel]: 6.21998e-06 [parallel]: 1.805e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 9.10999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.43999e-06 [virtual_dataset]: 5.81e-06 [get_grad_eliminate_]: 5.51e-06 [virtual_output]: 5.94999e-06 [merge_forward]: 3.66999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.14998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.14e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.74e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51999e-06 [meta_fg_expand]: 2.41998e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 2.16e-06 
[after_resolve]: 1.131e-05 [a_after_grad]: 8.99e-06 [renormalize]: 0.00034618 [add_forward_monad_depend]: 4.60001e-06 [auto_monad_grad]: 1.63997e-06 [auto_monad_eliminator]: 1.364e-05 [cse]: 2.706e-05 [a_3]: 4.039e-05 [Cycle 2]: 0.00063435, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.91999e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012627 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 5.88002e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.25002e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 1.20999e-06 [a_2]: 6.91e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.09002e-06 [flash_sp]: 3.41001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.88e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.16e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.40999e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 6.36e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.88002e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35003e-06 [meta_fg_expand]: 1.86e-06 [flash_sp_send_recv_attached]: 1.09e-06 [receive_attached]: 1.05999e-06 [after_resolve]: 9.20999e-06 [a_after_grad]: 8.02e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.21002e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.51999e-06 [cse]: 1.361e-05 [a_3]: 3.284e-05 [py_interpret_to_execute_after_opt_a]: 7.48999e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.035e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00045572 [opt_b]: 0.00018322, [1] [Cycle 1]: 0.00017719, [7] [b_1]: 0.00010914 [b_2]: 
7.43e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 3.09985e-07 [cse]: 1.644e-05 [optimize_parallel_all_gather_comm]: 1.654e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00041874 [opt_after_cconv]: 9.644e-05, [1] [Cycle 1]: 9.076e-05, [7] [c_1]: 2.846e-05 [parameter_eliminate]: 2.39001e-06 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.10002e-06 [cse]: 1.641e-05 [renormalize]: 4.7998e-07 [remove_dup_value]: 1.344e-05 [tuple_transform]: 7.107e-05, [1] [Cycle 1]: 6.676e-05, [4] [d_1]: 4.081e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.40002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.302e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.614e-05, [1] [cse]: 1.082e-05 [environ_conv]: 5.32999e-06 [swap_dp_allreduce_reducescatter]: 5.07999e-06 [bias_add_comm_swap]: 2.64999e-06 [label_micro_interleaved_index]: 4.34002e-06 [label_fine_grained_interleaved_index]: 2.37001e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.30002e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 1.26997e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.96e-06 [control_data_broadcast_order]: 1.195e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.77e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.37999e-06 [overlap_recompute_comm]: 2.33002e-06 [overlap_grad_ring_attention]: 4.10998e-06 [overlap_grad_flash_sp]: 1.703e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.44999e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.917e-05, [1] [Cycle 1]: 6.504e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.65999e-06 [elim_not_effective]: 1.156e-05 [opt_reshape]: 6.39999e-06 [fold_const_symbol]: 8.83001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.36002e-06 [auto_monad_reorder]: 1.505e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.44001e-06 [opt_after_jit_grad]: 0.00045427 [validate]: 3.097e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00626563 [execute]: 6.78e-06 Sums bootstrap : 0.000475s : 3.24% type_inference : 0.004331s : 29.61% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000013s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.13% optimize.opt_a.a_1 : 0.000424s : 2.90% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000021s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000346s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000041s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000456s : 3.12% optimize.opt_b.b_1 : 0.000109s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000419s : 2.86% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000041s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000454s : 3.11% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006266s : 42.84% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000124 26 18.08% : 0.000022s : 4: substitution.arithmetic_simplify 1.36% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.56% : 0.000006s : 4: substitution.graph_param_transform 65.26% : 0.000081s : 2: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.64% : 0.000004s : 4: substitution.remove_not_recompute_node 3.77% : 0.000005s : 4: substitution.replace_old_param ------[type_inference.] 0.004289 2 91.66% : 0.003932s : 1: type_inference.infer 8.34% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000139 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.04% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.44% : 0.000003s : 17: predicate.arithmetic_simplify 0.77% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.78% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 1.12% : 0.000002s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.08% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.87% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.68% : 0.000001s : 9: predicate.minmaximum_grad 1.43% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.95% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.72% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.65% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.91% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.04% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000243 6 40.84% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 59.16% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026725 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.35% : 0.003032s : 1: add_attr 11.31% : 0.003023s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.91% : 0.000510s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000428s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000465s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000783s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.09% : 0.000023s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000092s : 28: opt.transform.opt_b 0.17% : 0.000045s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.19% : 0.001922s : 1: opt_a 0.37% : 0.000100s : 1: opt_after_cconv 1.74% : 0.000464s : 1: opt_after_jit_grad 0.70% : 0.000187s : 1: opt_b 14.04% : 0.003752s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.71% : 0.000189s : 1: renormalize.infer 0.57% : 0.000152s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 23.48% : 0.006276s : 1: task_emit 0.28% : 0.000074s : 1: 
tuple_transform 16.26% : 0.004345s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0359227, [24] [bootstrap]: 0.00053905 [type_inference]: 0.0102184 [event_method]: 4.292e-05 [auto_monad]: 0.00011686 [graph_reusing]: 7.82e-06 [inline]: 2.12001e-06 [add_attr]: 0.00301413, [1] [add_attr_with_inline]: 0.00300561, [1] [Cycle 1]: 6.854e-05, [2] [tag_attr]: 3.213e-05 [meta_addattr_fg_expand]: 8.59998e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 4.609e-05 [insert-virtual-dataset]: 2.82002e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.05002e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.0130792, [53] [py_interpret_to_execute]: 3.573e-05 [rewriter_before_opt_a]: 0.00012709 [opt_a]: 0.0108257, [3] [Cycle 1]: 0.00693349, [45] [expand_dump_flag]: 3.68999e-06 [switch_simplify]: 6.6e-05 [loop_unroll]: 5.516e-05 [a_1]: 0.00135055 [with_stream_mark]: 2.358e-05 [recompute_prepare]: 2.146e-05 [updatestate_depend_eliminate]: 8.99e-06 [updatestate_assign_eliminate]: 7.65e-06 [updatestate_loads_eliminate]: 7.15e-06 [parameter_eliminate]: 2.66e-06 [a_2]: 0.00024412 [accelerated_algorithm]: 3.08e-05 [shard]: 1.89e-06 [meta_shard_fg_expand]: 3.35003e-06 [shard_inline]: 1.62e-05 [merge_send_recv]: 1.565e-05 [auto_parallel]: 1.075e-05 [parallel]: 1.761e-05 [flash_sp]: 1.125e-05 [merge_comm]: 9.77001e-06 [allreduce_fusion]: 8.62e-06 [matmul_add_comm_reduction]: 2.642e-05 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.799e-05 [virtual_dataset]: 1.58e-05 [get_grad_eliminate_]: 1.517e-05 [virtual_output]: 1.488e-05 [merge_forward]: 9.09e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 1.893e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.9e-05 [merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 2.716e-05 [set_forward_comm_id_for_comm_node_pass]: 9.91e-06 [meta_fg_expand]: 0.00137686 [flash_sp_send_recv_attached]: 3.72998e-06 [receive_attached]: 2.31998e-06 [after_resolve]: 5.915e-05 
[a_after_grad]: 8.11e-05 [renormalize]: 0.00242524 [add_forward_monad_depend]: 8.69e-06 [auto_monad_grad]: 5.49998e-06 [auto_monad_eliminator]: 5.456e-05 [cse]: 0.00016218 [a_3]: 0.00033546 [Cycle 2]: 0.00295614, [45] [expand_dump_flag]: 1.52001e-06 [switch_simplify]: 4.696e-05 [loop_unroll]: 4.408e-05 [a_1]: 0.00153305 [with_stream_mark]: 1.242e-05 [recompute_prepare]: 1.096e-05 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 4.3e-06 [updatestate_loads_eliminate]: 3.65998e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012672 [accelerated_algorithm]: 1.208e-05 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.51e-06 [merge_send_recv]: 6.41e-06 [auto_parallel]: 7.28999e-06 [parallel]: 4.67998e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 5.02e-06 [allreduce_fusion]: 4.51002e-06 [matmul_add_comm_reduction]: 7.61999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 8.81002e-06 [get_grad_eliminate_]: 9.02999e-06 [virtual_output]: 8.30999e-06 [merge_forward]: 4.45999e-06 [cell_reuse_recompute_pass]: 1.05999e-06 [offload_activation]: 9.41998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.629e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.391e-05 [set_forward_comm_id_for_comm_node_pass]: 5.30001e-06 [meta_fg_expand]: 3.663e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.09998e-06 [after_resolve]: 1.561e-05 [a_after_grad]: 1.454e-05 [renormalize]: 0.00057792 [add_forward_monad_depend]: 3.89002e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.424e-05 [cse]: 4.637e-05 [a_3]: 6.596e-05 [Cycle 3]: 0.00092236, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.071e-05 [loop_unroll]: 8.84998e-06 [a_1]: 0.00027067 [with_stream_mark]: 1.017e-05 [recompute_prepare]: 9.42999e-06 [updatestate_depend_eliminate]: 4.96002e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 3.91001e-06 
[parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012427 [accelerated_algorithm]: 1.178e-05 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.14e-06 [merge_send_recv]: 7.23e-06 [auto_parallel]: 6.88998e-06 [parallel]: 4.37e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 4.82e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.58001e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 8.67e-06 [get_grad_eliminate_]: 8.59e-06 [virtual_output]: 8.28999e-06 [merge_forward]: 4.23999e-06 [cell_reuse_recompute_pass]: 1.28002e-06 [offload_activation]: 8.85001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.596e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.402e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35001e-06 [meta_fg_expand]: 3.08998e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.368e-05 [a_after_grad]: 1.444e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.074e-05 [cse]: 2.576e-05 [a_3]: 5.721e-05 [py_interpret_to_execute_after_opt_a]: 1.054e-05 [slice_cell_reuse_recomputed_activation]: 2.10002e-06 [rewriter_after_opt_a]: 5.069e-05 [convert_after_rewriter]: 9.41e-06 [order_py_execute_after_rewriter]: 7.11001e-06 [mutable_eliminate]: 0.00046019 [opt_b]: 0.00029112, [1] [Cycle 1]: 0.00028441, [7] [b_1]: 0.00018907 [b_2]: 1.116e-05 [updatestate_depend_eliminate]: 7.46001e-06 [updatestate_assign_eliminate]: 4.35999e-06 [updatestate_loads_eliminate]: 4.13999e-06 [renormalize]: 5.3001e-07 [cse]: 3.211e-05 [optimize_parallel_all_gather_comm]: 2.079e-05 [overlap_param_gather]: 2.11998e-06 [cconv]: 2.023e-05 [loop_unroll]: 0.00042544 [opt_after_cconv]: 0.00013581, [1] [Cycle 1]: 0.00013002, [7] [c_1]: 4.836e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 7.38e-06 [updatestate_assign_eliminate]: 4.49998e-06 
[updatestate_loads_eliminate]: 3.98001e-06 [cse]: 2.965e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 2.853e-05 [tuple_transform]: 0.00010377, [1] [Cycle 1]: 9.867e-05, [4] [d_1]: 6.827e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.79e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.62e-05 [cse_after_recomputation]: 3.159e-05, [1] [Cycle 1]: 2.713e-05, [1] [cse]: 2.163e-05 [environ_conv]: 8.37e-06 [swap_dp_allreduce_reducescatter]: 7.95998e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.34001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 1.97001e-06 [reorder_send_recv_between_fp_bp]: 2.48002e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.689e-05 [grouped_pairwise_exchange_alltoall]: 1.46998e-06 [offloading_packed_experts]: 5.37999e-06 [overlap_recompute_and_grad_model_parallel]: 5.43002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.36998e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 4.87998e-06 [overlap_grad_flash_sp]: 2.434e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 0.00010051, [1] [Cycle 1]: 9.619e-05, [6] [build]: 9.59999e-06 [elim_shapecalc]: 1.385e-05 [elim_not_effective]: 1.822e-05 [opt_reshape]: 1.066e-05 [fold_const_symbol]: 1.529e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.85001e-06 [auto_monad_reorder]: 2.598e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00047282 [validate]: 4.642e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00807786 [execute]: 6.90998e-06 Sums bootstrap : 0.000539s : 1.71% type_inference : 0.010218s : 32.32% event_method : 0.000043s : 0.14% auto_monad : 0.000117s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000046s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000108s : 0.34% optimize.opt_a.a_1 : 0.003154s : 9.98% optimize.opt_a.with_stream_mark : 0.000046s : 0.15% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000495s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000037s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.07% optimize.opt_a.meta_fg_expand : 0.001417s : 4.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.003003s : 9.50% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000234s : 0.74% optimize.opt_a.a_3 : 0.000459s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000051s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000460s : 1.46% optimize.opt_b.b_1 : 0.000189s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000425s : 1.35% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000026s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000473s : 1.50% validate : 0.000046s : 0.15% backend_pass : 0.000001s : 0.00% task_emit : 0.008078s : 25.55% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000737 218 5.91% : 0.000044s : 11: substitution.arithmetic_simplify 2.04% : 0.000015s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 1.10% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 54.87% : 0.000405s : 16: substitution.inline 2.14% : 0.000016s : 2: substitution.inline_without_move 1.37% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.08% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000013s : 20: substitution.remove_not_recompute_node 3.23% : 0.000024s : 10: substitution.replace_applicator 1.42% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.76% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.60% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010149 2 87.22% : 0.008852s : 1: type_inference.infer 12.78% : 0.001297s : 1: type_inference.specialize ------[replace.] 0.000204 30 58.99% : 0.000120s : 16: replace.inline 41.01% : 0.000084s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000427 30 92.67% : 0.000396s : 16: match.inline 7.33% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000738 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.33% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.18% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.54% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.42% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.69% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000042s : 244: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.63% : 0.000019s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.23% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.39% : 0.000003s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 67: predicate.reshape_eliminate 1.18% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.66% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.20% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.84% : 0.000014s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.82% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.50% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.57% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.22% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.58% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001475 32 57.92% : 0.000854s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.08% : 0.000621s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.060156 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.02% : 0.003019s : 1: add_attr 5.00% : 0.003009s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000124s : 1: auto_monad 0.05% : 0.000030s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.95% : 0.000573s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000050s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000435s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000470s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.99% : 0.004804s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.13% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000055s : 4: opt.transform.symbol_engine_opt 18.00% : 0.010829s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.80% : 0.000483s : 1: opt_after_jit_grad 0.49% : 0.000295s : 1: opt_b 21.75% : 0.013083s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000051s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.63% : 0.001584s : 2: renormalize.infer 2.34% : 0.001407s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000055s : 1: rewriter_after_opt_a 0.22% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000103s : 1: symbol_engine_optimizer 13.45% : 0.008089s : 1: task_emit 0.18% : 0.000107s : 1: 
tuple_transform 17.01% : 0.010234s : 1: type_inference 0.13% : 0.000078s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x3-kbk],max_mem:62.0M . TotalTime = 0.0801776, [24] [bootstrap]: 0.00056593 [type_inference]: 0.00622438 [event_method]: 1.443e-05 [auto_monad]: 5.417e-05 [graph_reusing]: 5.94999e-06 [inline]: 1.89e-06 [add_attr]: 0.00343207, [1] [add_attr_with_inline]: 0.00342138, [1] [Cycle 1]: 4.522e-05, [2] [tag_attr]: 1.511e-05 [meta_addattr_fg_expand]: 4.18001e-06 [parallel-infer-symbol]: 2.65002e-06 [pre_auto_parallel]: 2.796e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.62001e-06 [optimize]: 0.0040144, [53] [py_interpret_to_execute]: 2.023e-05 [rewriter_before_opt_a]: 5.776e-05 [opt_a]: 0.00211994, [2] [Cycle 1]: 0.00152426, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 3.235e-05 [loop_unroll]: 2.146e-05 [a_1]: 0.00045703 [with_stream_mark]: 1.425e-05 [recompute_prepare]: 7.63999e-06 [updatestate_depend_eliminate]: 4.23999e-06 [updatestate_assign_eliminate]: 3.57997e-06 [updatestate_loads_eliminate]: 2.75997e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.557e-05 [accelerated_algorithm]: 6.41998e-06 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 8.60001e-06 [auto_parallel]: 6.12999e-06 [parallel]: 2.257e-05 [flash_sp]: 7.28e-06 [merge_comm]: 3.91999e-06 [allreduce_fusion]: 3.47997e-06 [matmul_add_comm_reduction]: 8.84e-06 [allreduce_slice_to_reducescatter]: 7.39994e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 6.75002e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 5.54e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.56998e-06 
[cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.15999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.48e-06 [receive_attached]: 2.17999e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 8.80001e-06 [renormalize]: 0.00041829 [add_forward_monad_depend]: 4.87e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.364e-05 [cse]: 2.7e-05 [a_3]: 4.013e-05 [Cycle 2]: 0.00058644, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.63003e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.000126 [with_stream_mark]: 9.37999e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.804e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.18001e-06 [auto_parallel]: 4.97e-06 [parallel]: 4.15999e-06 [flash_sp]: 2.94001e-06 [merge_comm]: 2.96999e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 4.50003e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.32999e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.78998e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.75002e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04999e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.14e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 7.7e-07 [auto_monad_eliminator]: 6.04999e-06 [cse]: 1.229e-05 [a_3]: 3.197e-05 [py_interpret_to_execute_after_opt_a]: 
8.55001e-06 [slice_cell_reuse_recomputed_activation]: 2.01e-06 [rewriter_after_opt_a]: 3.056e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.27001e-06 [mutable_eliminate]: 0.0004549 [opt_b]: 0.00018837, [1] [Cycle 1]: 0.00018194, [7] [b_1]: 0.00010793 [b_2]: 7.36001e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.28002e-06 [renormalize]: 6.30011e-07 [cse]: 1.716e-05 [optimize_parallel_all_gather_comm]: 1.661e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.151e-05 [loop_unroll]: 0.00042245 [opt_after_cconv]: 9.486e-05, [1] [Cycle 1]: 8.926e-05, [7] [c_1]: 2.832e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.51e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.529e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.924e-05, [1] [Cycle 1]: 6.505e-05, [4] [d_1]: 3.916e-05 [none_parameter_eliminate]: 1.32999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.46999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 5.022e-05 [cse_after_recomputation]: 5.253e-05, [1] [Cycle 1]: 4.805e-05, [1] [cse]: 4.258e-05 [environ_conv]: 4.59002e-06 [swap_dp_allreduce_reducescatter]: 5.71998e-06 [bias_add_comm_swap]: 2.18998e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 1.94999e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.50997e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.14998e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 
[overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.67002e-06 [overlap_recompute_and_grad_model_parallel]: 4.77998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.73002e-06 [overlap_recompute_comm]: 2.03997e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.693e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 1.94999e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.863e-05, [1] [Cycle 1]: 6.446e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.95001e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.70999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.51002e-06 [auto_monad_reorder]: 1.548e-05 [get_jit_bprop_graph]: 1.17999e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00045566 [validate]: 3.173e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0651016 [execute]: 8.27998e-06 Sums bootstrap : 0.000566s : 0.75% type_inference : 0.006224s : 8.21% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% 
optimize.opt_a.a_1 : 0.000583s : 0.77% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000418s : 0.55% 
optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000009s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000455s : 0.60% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000422s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.02% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.07% optimize.cse_after_recomputation.cse 
: 0.000043s : 0.06% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% 
optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000456s : 0.60% validate : 0.000032s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.065102s : 85.92% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.82% : 0.000024s : 5: substitution.arithmetic_simplify 1.10% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 2.96% : 0.000005s : 4: substitution.graph_param_transform 67.27% : 0.000111s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.43% : 0.000004s : 4: substitution.replace_old_param 6.37% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006179 2 90.78% : 0.005610s : 1: type_inference.infer 9.22% : 0.000570s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.59% : 0.000027s : 3: replace.inline 29.41% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 92.06% : 0.000109s : 3: match.inline 7.94% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.83% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.81% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.68% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.09% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.94% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.68% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000360 8 44.84% : 0.000161s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.16% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.089148 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.85% : 0.003436s : 1: add_attr 3.84% : 0.003425s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.68% : 0.000604s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.06% : 0.000056s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000431s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000464s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.07% : 0.000950s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.38% : 0.002123s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.52% : 0.000466s : 1: opt_after_jit_grad 0.22% : 0.000192s : 1: opt_b 4.51% : 0.004018s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000214s : 1: renormalize.infer 0.22% : 0.000198s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 73.05% : 0.065119s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 7.00% : 0.006239s : 1: type_inference 0.06% : 0.000053s : 1: validate TotalTime = 0.0706229, [24] [bootstrap]: 0.00047838 [type_inference]: 0.00441438 [event_method]: 1.046e-05 [auto_monad]: 5.166e-05 [graph_reusing]: 5.67999e-06 [inline]: 1.76e-06 [add_attr]: 0.00299138, [1] [add_attr_with_inline]: 0.0029836, [1] [Cycle 1]: 4.541e-05, [2] [tag_attr]: 1.18e-05 [meta_addattr_fg_expand]: 3.28e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.094e-05 [insert-virtual-dataset]: 2.30002e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.83002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00374031, [53] [py_interpret_to_execute]: 1.459e-05 [rewriter_before_opt_a]: 3.858e-05 [opt_a]: 0.00186434, [2] [Cycle 1]: 0.00125934, [45] [expand_dump_flag]: 2.84999e-06 [switch_simplify]: 2.344e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00029296 [with_stream_mark]: 1.41e-05 [recompute_prepare]: 7.2e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 2.94999e-06 [updatestate_loads_eliminate]: 3.25e-06 [parameter_eliminate]: 1.86003e-06 [a_2]: 7.642e-05 [accelerated_algorithm]: 6.39001e-06 [shard]: 2.07999e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.82999e-06 [merge_send_recv]: 7.92e-06 [auto_parallel]: 5.62999e-06 [parallel]: 1.779e-05 [flash_sp]: 7.51999e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 3.20002e-06 [matmul_add_comm_reduction]: 8.55999e-06 [allreduce_slice_to_reducescatter]: 8.80013e-07 [virtual_shard_identity]: 7.13998e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.98001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 8.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 9.26998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 2.34999e-06 
[after_resolve]: 1.052e-05 [a_after_grad]: 8.76002e-06 [renormalize]: 0.00034901 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.89e-06 [auto_monad_eliminator]: 1.296e-05 [cse]: 2.738e-05 [a_3]: 4.058e-05 [Cycle 2]: 0.0005958, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.80002e-06 [loop_unroll]: 5.25001e-06 [a_1]: 0.00012464 [with_stream_mark]: 1.037e-05 [recompute_prepare]: 5.87999e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.64001e-06 [parameter_eliminate]: 9.29984e-07 [a_2]: 6.729e-05 [accelerated_algorithm]: 5.62001e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.57001e-06 [merge_send_recv]: 4.48001e-06 [auto_parallel]: 5.14e-06 [parallel]: 3.98999e-06 [flash_sp]: 3.61001e-06 [merge_comm]: 3.27002e-06 [allreduce_fusion]: 2.92002e-06 [matmul_add_comm_reduction]: 5.24e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 6.38e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.64999e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57999e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.77e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.58002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.27001e-06 [a_after_grad]: 8.18999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.17001e-06 [cse]: 1.301e-05 [a_3]: 3.257e-05 [py_interpret_to_execute_after_opt_a]: 7.24001e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.073e-05 [convert_after_rewriter]: 7.25e-06 [order_py_execute_after_rewriter]: 5.36998e-06 [mutable_eliminate]: 0.00045232 [opt_b]: 0.00018365, [1] [Cycle 1]: 0.0001776, [7] 
[b_1]: 0.00010843 [b_2]: 7.18e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.81e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 2.79979e-07 [cse]: 1.671e-05 [optimize_parallel_all_gather_comm]: 1.577e-05 [overlap_param_gather]: 2.21998e-06 [cconv]: 2.24e-05 [loop_unroll]: 0.00042091 [opt_after_cconv]: 9.488e-05, [1] [Cycle 1]: 8.911e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.32999e-06 [updatestate_depend_eliminate]: 5.06002e-06 [updatestate_assign_eliminate]: 2.63e-06 [updatestate_loads_eliminate]: 2.38002e-06 [cse]: 1.626e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.291e-05 [tuple_transform]: 6.911e-05, [1] [Cycle 1]: 6.479e-05, [4] [d_1]: 3.917e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.209e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.548e-05, [1] [cse]: 1.052e-05 [environ_conv]: 4.87e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.90999e-06 [label_fine_grained_interleaved_index]: 2.89001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.66002e-06 [ForceFp32Comm]: 8.99978e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.03997e-06 [reorder_send_recv_between_fp_bp]: 2.55997e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.78999e-06 [overlap_recompute_and_grad_model_parallel]: 4.1e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.63998e-06 [overlap_grad_ring_attention]: 4.2e-06 [overlap_grad_flash_sp]: 1.712e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.98002e-06 [split_layernorm_comm]: 1.88002e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.831e-05, [1] [Cycle 1]: 6.413e-05, [6] [build]: 2.51998e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.06998e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.44003e-06 [auto_monad_reorder]: 1.56e-05 [get_jit_bprop_graph]: 9.00007e-07 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.000452 [validate]: 3.027e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0581851 [execute]: 8.18001e-06 Sums bootstrap : 0.000478s : 0.72% type_inference : 0.004414s : 6.63% event_method : 0.000010s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.63% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000349s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.68% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000421s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000452s : 0.68% validate : 0.000030s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058185s : 87.35% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.91% : 0.000021s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.48% : 0.000005s : 4: substitution.graph_param_transform 65.96% : 0.000079s : 2: substitution.inline 2.28% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.69% : 0.000004s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004372 2 91.46% : 0.003998s : 1: type_inference.infer 8.54% : 0.000373s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000138 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.19% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.36% : 0.000003s : 17: predicate.arithmetic_simplify 0.94% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 2.04% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.80% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.15% : 0.000002s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.67% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.64% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.17% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.72% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.44% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.16% : 0.000002s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.09% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.91% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.18% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000272 6 42.18% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.82% : 0.000158s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078627 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.81% : 0.002996s : 1: add_attr 3.80% : 0.002987s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000513s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.55% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000767s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.37% : 0.001867s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000462s : 1: opt_after_jit_grad 0.24% : 0.000187s : 1: opt_b 4.76% : 0.003744s : 1: optimize 0.10% : 0.000076s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000006s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000186s : 1: renormalize.infer 0.20% : 0.000157s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.02% : 0.058200s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.63% : 0.004429s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.075838, [24] [bootstrap]: 0.00046705 [type_inference]: 0.0055315 [event_method]: 1.457e-05 [auto_monad]: 5.345e-05 [graph_reusing]: 5.76e-06 [inline]: 1.79998e-06 [add_attr]: 0.0029929, [1] [add_attr_with_inline]: 0.00298495, [1] [Cycle 1]: 4.552e-05, [2] [tag_attr]: 1.465e-05 [meta_addattr_fg_expand]: 4.57e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.594e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.81003e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00748121, [53] [py_interpret_to_execute]: 2.126e-05 [rewriter_before_opt_a]: 5.879e-05 [opt_a]: 0.00540721, [2] [Cycle 1]: 0.00149528, [45] [expand_dump_flag]: 2.64999e-06 [switch_simplify]: 3.086e-05 [loop_unroll]: 2.086e-05 [a_1]: 0.00044744 [with_stream_mark]: 1.341e-05 [recompute_prepare]: 7.44002e-06 [updatestate_depend_eliminate]: 3.51999e-06 [updatestate_assign_eliminate]: 3.43e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.479e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 8.18999e-06 [auto_parallel]: 5.79e-06 [parallel]: 1.744e-05 [flash_sp]: 6.94999e-06 [merge_comm]: 3.51999e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 8.52e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.20998e-06 [virtual_dataset]: 5.61e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 5.49998e-06 [merge_forward]: 3.51001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.037e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.89999e-06 [receive_attached]: 2.18002e-06 
[after_resolve]: 1.026e-05 [a_after_grad]: 8.82999e-06 [renormalize]: 0.00041607 [add_forward_monad_depend]: 4.62998e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.336e-05 [cse]: 2.534e-05 [a_3]: 4.086e-05 [Cycle 2]: 0.00390179, [45] [expand_dump_flag]: 7.99977e-07 [switch_simplify]: 6.81001e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012117 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 9.60001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.86e-06 [parameter_eliminate]: 3.13e-06 [a_2]: 8.819e-05 [accelerated_algorithm]: 7.31999e-06 [shard]: 1.93002e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 5.84e-06 [auto_parallel]: 7.28e-06 [parallel]: 5.05999e-06 [flash_sp]: 4.25e-06 [merge_comm]: 3.18e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 6.91999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.98e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 5.20001e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.94001e-06 [cell_reuse_recompute_pass]: 1.77999e-06 [offload_activation]: 7.4e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.022e-05 [merge_recompute_call_nodes]: 8.00006e-07 [before_grad]: 8.02998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09999e-06 [meta_fg_expand]: 2.02999e-06 [flash_sp_send_recv_attached]: 8.89995e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.44998e-06 [a_after_grad]: 8.83001e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.84e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 9.20001e-06 [cse]: 1.878e-05 [a_3]: 3.263e-05 [py_interpret_to_execute_after_opt_a]: 1.065e-05 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.369e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.28002e-06 [mutable_eliminate]: 0.00063683 [opt_b]: 0.00018701, [1] [Cycle 1]: 0.00018045, [7] [b_1]: 
0.00011111 [b_2]: 7.60998e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.44001e-06 [renormalize]: 4.50003e-07 [cse]: 1.721e-05 [optimize_parallel_all_gather_comm]: 1.629e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.395e-05 [loop_unroll]: 0.00044718 [opt_after_cconv]: 9.607e-05, [1] [Cycle 1]: 9.03e-05, [7] [c_1]: 2.824e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.71e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.62e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.236e-05 [tuple_transform]: 7.037e-05, [1] [Cycle 1]: 6.584e-05, [4] [d_1]: 4.003e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 2.3999e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.37e-05 [cse_after_recomputation]: 2.03e-05, [1] [Cycle 1]: 1.578e-05, [1] [cse]: 1.072e-05 [environ_conv]: 4.92e-06 [swap_dp_allreduce_reducescatter]: 5.20999e-06 [bias_add_comm_swap]: 2.23002e-06 [label_micro_interleaved_index]: 4.28001e-06 [label_fine_grained_interleaved_index]: 2.50997e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.38002e-06 [micro_interleaved_order_control]: 2.44999e-06 [assign_add_opt]: 1.31002e-06 [ForceFp32Comm]: 8.90024e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.05002e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 1.04e-06 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.35998e-06 [overlap_recompute_and_grad_model_parallel]: 4.13999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.41998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.01e-06 [overlap_grad_ring_attention]: 3.80998e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.11002e-06 [symbol_engine_optimizer]: 6.916e-05, [1] [Cycle 1]: 6.506e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.48999e-06 [elim_not_effective]: 1.162e-05 [opt_reshape]: 6.36998e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.5e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00048227 [validate]: 3.468e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.0585031 [execute]: 9.09e-06 Sums bootstrap : 0.000467s : 0.68% type_inference : 0.005531s : 8.06% event_method : 0.000015s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.09% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000569s : 0.83% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000013s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000163s : 0.24% optimize.opt_a.accelerated_algorithm : 0.000014s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000014s : 0.02% optimize.opt_a.auto_parallel : 0.000013s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000016s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000416s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000023s : 0.03% optimize.opt_a.cse : 0.000044s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.02% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000637s : 0.93% optimize.opt_b.b_1 : 0.000111s : 0.16% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000024s : 0.03% optimize.loop_unroll : 0.000447s : 0.65% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000482s : 0.70% validate : 0.000035s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058503s : 85.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000169 30 17.06% : 0.000029s : 5: substitution.arithmetic_simplify 1.04% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.55% : 0.000006s : 4: substitution.graph_param_transform 64.51% : 0.000109s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.56% : 0.000004s : 4: substitution.remove_not_recompute_node 2.59% : 0.000004s : 4: substitution.replace_old_param 6.17% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005489 2 90.06% : 0.004943s : 1: type_inference.infer 9.94% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000040 5 70.32% : 0.000028s : 3: replace.inline 29.68% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 92.00% : 0.000107s : 3: match.inline 8.00% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000164 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.74% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.53% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.86% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_depend_swap 1.88% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.35% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000002s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.34% : 0.000001s : 4: predicate.parallel_virtual_node 1.57% : 0.000003s : 16: predicate.partial_defer_inline 1.39% : 0.000002s : 17: predicate.partial_eliminate 0.81% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000002s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.33% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.46% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.77% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.23% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000346 8 46.88% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.12% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087838 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.41% : 0.002997s : 1: add_attr 3.40% : 0.002988s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.05% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.57% : 0.000501s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000456s : 1: loop_unroll 0.01% : 0.000005s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.74% : 0.000647s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.08% : 0.000952s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 6.16% : 0.005410s : 1: opt_a 0.11% : 0.000100s : 1: opt_after_cconv 0.56% : 0.000493s : 1: opt_after_jit_grad 0.22% : 0.000190s : 1: opt_b 8.52% : 0.007485s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000030s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000205s : 1: renormalize.infer 0.23% : 0.000204s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000038s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 66.62% : 0.058520s : 1: task_emit 0.08% : 0.000073s : 1: 
tuple_transform 6.31% : 0.005545s : 1: type_inference 0.07% : 0.000058s : 1: validate TotalTime = 0.110052, [24] [bootstrap]: 0.0004799 [type_inference]: 0.0112885 [event_method]: 4.873e-05 [auto_monad]: 0.00011942 [graph_reusing]: 8.20999e-06 [inline]: 2.22999e-06 [add_attr]: 0.00297252, [1] [add_attr_with_inline]: 0.00296381, [1] [Cycle 1]: 6.984e-05, [2] [tag_attr]: 3.449e-05 [meta_addattr_fg_expand]: 9.05999e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 4.97e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.0133271, [53] [py_interpret_to_execute]: 3.763e-05 [rewriter_before_opt_a]: 0.0001446 [opt_a]: 0.0110772, [3] [Cycle 1]: 0.00713009, [45] [expand_dump_flag]: 4.18999e-06 [switch_simplify]: 7.299e-05 [loop_unroll]: 6.147e-05 [a_1]: 0.00144763 [with_stream_mark]: 2.301e-05 [recompute_prepare]: 2.149e-05 [updatestate_depend_eliminate]: 8.80999e-06 [updatestate_assign_eliminate]: 7.56001e-06 [updatestate_loads_eliminate]: 7.1e-06 [parameter_eliminate]: 2.55002e-06 [a_2]: 0.00024409 [accelerated_algorithm]: 3.003e-05 [shard]: 2.21e-06 [meta_shard_fg_expand]: 3.30998e-06 [shard_inline]: 1.611e-05 [merge_send_recv]: 1.532e-05 [auto_parallel]: 4.457e-05 [parallel]: 1.848e-05 [flash_sp]: 1.125e-05 [merge_comm]: 9.84001e-06 [allreduce_fusion]: 9.42001e-06 [matmul_add_comm_reduction]: 2.607e-05 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 1.791e-05 [virtual_dataset]: 1.564e-05 [get_grad_eliminate_]: 1.504e-05 [virtual_output]: 1.51e-05 [merge_forward]: 9.15999e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 1.725e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.868e-05 [merge_recompute_call_nodes]: 1.65001e-06 [before_grad]: 2.731e-05 [set_forward_comm_id_for_comm_node_pass]: 9.50001e-06 [meta_fg_expand]: 0.00140301 [flash_sp_send_recv_attached]: 3.71001e-06 [receive_attached]: 2.56998e-06 
[after_resolve]: 5.973e-05 [a_after_grad]: 8.025e-05 [renormalize]: 0.00248673 [add_forward_monad_depend]: 9.81e-06 [auto_monad_grad]: 4.90999e-06 [auto_monad_eliminator]: 5.624e-05 [cse]: 0.00016912 [a_3]: 0.00033488 [Cycle 2]: 0.00302632, [45] [expand_dump_flag]: 1.62999e-06 [switch_simplify]: 4.658e-05 [loop_unroll]: 4.445e-05 [a_1]: 0.00152841 [with_stream_mark]: 1.224e-05 [recompute_prepare]: 1.064e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.57e-06 [updatestate_loads_eliminate]: 3.75e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012646 [accelerated_algorithm]: 1.224e-05 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.73002e-06 [shard_inline]: 9.42999e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.2e-06 [parallel]: 4.57e-06 [flash_sp]: 3.36001e-06 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 4.60001e-06 [matmul_add_comm_reduction]: 7.36999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.65001e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.38001e-06 [merge_forward]: 4.37e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.599e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.37e-05 [set_forward_comm_id_for_comm_node_pass]: 5.19e-06 [meta_fg_expand]: 6.796e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.604e-05 [a_after_grad]: 1.458e-05 [renormalize]: 0.00059198 [add_forward_monad_depend]: 3.71001e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.482e-05 [cse]: 4.697e-05 [a_3]: 9.784e-05 [Cycle 3]: 0.00090633, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 1.078e-05 [loop_unroll]: 9.03002e-06 [a_1]: 0.00025143 [with_stream_mark]: 1.062e-05 [recompute_prepare]: 9.15001e-06 [updatestate_depend_eliminate]: 4.92e-06 [updatestate_assign_eliminate]: 3.92998e-06 [updatestate_loads_eliminate]: 
3.92002e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 0.00012401 [accelerated_algorithm]: 1.196e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 9.07001e-06 [merge_send_recv]: 7.28e-06 [auto_parallel]: 7.08998e-06 [parallel]: 4.4e-06 [flash_sp]: 1.03001e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.98001e-06 [matmul_add_comm_reduction]: 8.15e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 1.039e-05 [virtual_dataset]: 8.70999e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.32e-06 [merge_forward]: 4.32e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 8.69e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.626e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.17999e-06 [meta_fg_expand]: 2.81999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.334e-05 [a_after_grad]: 1.426e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 1.01002e-06 [auto_monad_eliminator]: 1.125e-05 [cse]: 2.728e-05 [a_3]: 6.012e-05 [py_interpret_to_execute_after_opt_a]: 1.023e-05 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 4.699e-05 [convert_after_rewriter]: 8.67e-06 [order_py_execute_after_rewriter]: 6.93e-06 [mutable_eliminate]: 0.00045306 [opt_b]: 0.00029026, [1] [Cycle 1]: 0.00028414, [7] [b_1]: 0.00019125 [b_2]: 1.083e-05 [updatestate_depend_eliminate]: 7.03e-06 [updatestate_assign_eliminate]: 4.12e-06 [updatestate_loads_eliminate]: 3.86001e-06 [renormalize]: 5.60016e-07 [cse]: 3.198e-05 [optimize_parallel_all_gather_comm]: 2.015e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 1.899e-05 [loop_unroll]: 0.00042148 [opt_after_cconv]: 0.00013756, [1] [Cycle 1]: 0.00013167, [7] [c_1]: 4.857e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 7.54002e-06 [updatestate_assign_eliminate]: 4.23001e-06 
[updatestate_loads_eliminate]: 3.82998e-06 [cse]: 3.053e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 2.825e-05 [tuple_transform]: 0.00010159, [1] [Cycle 1]: 9.679e-05, [4] [d_1]: 6.721e-05 [none_parameter_eliminate]: 1.74e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 9.52999e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 5.718e-05 [cse_after_recomputation]: 3.219e-05, [1] [Cycle 1]: 2.772e-05, [1] [cse]: 2.234e-05 [environ_conv]: 8.83001e-06 [swap_dp_allreduce_reducescatter]: 7.58999e-06 [bias_add_comm_swap]: 2.32999e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.56998e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.06998e-06 [reorder_send_recv_between_fp_bp]: 2.65002e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.22999e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61002e-06 [control_data_broadcast_order]: 1.714e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 5.09003e-06 [overlap_recompute_and_grad_model_parallel]: 5.42999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.48998e-06 [overlap_grad_ring_attention]: 5.06002e-06 [overlap_grad_flash_sp]: 2.369e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.91998e-06 [split_layernorm_comm]: 1.93002e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 9.917e-05, [1] [Cycle 1]: 9.5e-05, [6] [build]: 9.57999e-06 [elim_shapecalc]: 1.356e-05 [elim_not_effective]: 1.85e-05 [opt_reshape]: 1.056e-05 [fold_const_symbol]: 1.499e-05 
[renormalize]: 2.00002e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.494e-05 [get_jit_bprop_graph]: 1.12999e-06 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00046384 [validate]: 4.595e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.0809842 [execute]: 7.82e-06 Sums bootstrap : 0.000480s : 0.45% type_inference : 0.011289s : 10.67% event_method : 0.000049s : 0.05% auto_monad : 0.000119s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.05% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000145s : 0.14% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000130s : 0.12% optimize.opt_a.loop_unroll : 0.000115s : 0.11% optimize.opt_a.a_1 : 0.003227s : 3.05% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000059s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001474s : 1.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.08% optimize.opt_a.a_after_grad : 0.000109s : 0.10% optimize.opt_a.renormalize : 0.003079s : 2.91% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000243s : 0.23% optimize.opt_a.a_3 : 0.000493s : 0.47% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.43% optimize.opt_b.b_1 : 0.000191s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000421s : 0.40% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000464s : 0.44% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080984s : 76.52% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000757 222 5.93% : 0.000045s : 12: substitution.arithmetic_simplify 1.80% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.82% : 0.000423s : 17: substitution.inline 2.05% : 0.000016s : 2: substitution.inline_without_move 1.30% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.91% : 0.000014s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.80% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000024s : 10: substitution.replace_applicator 1.39% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.74% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.33% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011213 2 86.68% : 0.009720s : 1: type_inference.infer 13.32% : 0.001493s : 1: type_inference.specialize ------[replace.] 0.000216 33 57.35% : 0.000124s : 17: replace.inline 42.65% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000448 33 92.48% : 0.000414s : 17: match.inline 7.52% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.08% : 0.000016s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.79% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.31% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 249: predicate.inline 1.27% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.24% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000009s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.05% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.05% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.15% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.90% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000037s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.48% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.51% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.01% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.63% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001591 34 56.77% : 0.000903s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.23% : 0.000688s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.134679 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.21% : 0.002977s : 1: add_attr 2.20% : 0.002967s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000126s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.38% : 0.000514s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000035s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000056s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.32% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.34% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.65% : 0.004920s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000177s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.23% : 0.011080s : 1: opt_a 0.10% : 0.000141s : 1: opt_after_cconv 0.35% : 0.000473s : 1: opt_after_jit_grad 0.22% : 0.000294s : 1: opt_b 9.90% : 0.013331s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000054s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.19% : 0.001606s : 2: renormalize.infer 1.08% : 0.001460s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.11% : 0.000149s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000102s : 1: symbol_engine_optimizer 60.14% : 0.081002s : 1: task_emit 0.08% : 0.000105s : 1: 
tuple_transform 8.39% : 0.011304s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.0701892, [24] [bootstrap]: 0.00050611 [type_inference]: 0.00431886 [event_method]: 1.072e-05 [auto_monad]: 5.107e-05 [graph_reusing]: 5.27001e-06 [inline]: 2.04999e-06 [add_attr]: 0.00294306, [1] [add_attr_with_inline]: 0.00293495, [1] [Cycle 1]: 4.546e-05, [2] [tag_attr]: 1.16e-05 [meta_addattr_fg_expand]: 3.45998e-06 [parallel-infer-symbol]: 3.11001e-06 [pre_auto_parallel]: 2.064e-05 [insert-virtual-dataset]: 2.69999e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.51998e-06 [optimize]: 0.00367025, [53] [py_interpret_to_execute]: 1.47e-05 [rewriter_before_opt_a]: 3.863e-05 [opt_a]: 0.00186921, [2] [Cycle 1]: 0.00126434, [45] [expand_dump_flag]: 3.25e-06 [switch_simplify]: 2.424e-05 [loop_unroll]: 1.358e-05 [a_1]: 0.00029023 [with_stream_mark]: 1.33e-05 [recompute_prepare]: 2.4e-05 [updatestate_depend_eliminate]: 3.99002e-06 [updatestate_assign_eliminate]: 3.29001e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.62001e-06 [a_2]: 7.745e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.40001e-06 [shard_inline]: 6.01998e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.733e-05 [flash_sp]: 7.43e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.96001e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.58e-06 [cell_reuse_recompute_pass]: 1.25999e-06 [offload_activation]: 9.10001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 9.49999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.40003e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 
2.37999e-06 [after_resolve]: 1.098e-05 [a_after_grad]: 9.11998e-06 [renormalize]: 0.00033975 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.241e-05 [cse]: 2.645e-05 [a_3]: 4.072e-05 [Cycle 2]: 0.00059533, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.59999e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.00012864 [with_stream_mark]: 9.04e-06 [recompute_prepare]: 5.56998e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.37999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.668e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.13001e-06 [meta_shard_fg_expand]: 1.04e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.20001e-06 [parallel]: 4.22e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.27999e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.19e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.20002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.027e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.65998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.95998e-06 [meta_fg_expand]: 1.57001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.12e-06 [after_resolve]: 1.007e-05 [a_after_grad]: 8.50001e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.288e-05 [a_3]: 3.217e-05 [py_interpret_to_execute_after_opt_a]: 7.43e-06 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 3.021e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 5.00999e-06 [mutable_eliminate]: 0.00045061 [opt_b]: 0.0001798, [1] [Cycle 1]: 0.00017374, [7] [b_1]: 
0.00010598 [b_2]: 6.83998e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.35997e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 3.39991e-07 [cse]: 1.652e-05 [optimize_parallel_all_gather_comm]: 1.511e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.00041308 [opt_after_cconv]: 9.446e-05, [1] [Cycle 1]: 8.889e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.656e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.304e-05 [tuple_transform]: 6.869e-05, [1] [Cycle 1]: 6.418e-05, [4] [d_1]: 3.843e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.50999e-06 [add_recomputation]: 4.323e-05 [cse_after_recomputation]: 2.013e-05, [1] [Cycle 1]: 1.581e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.4e-06 [swap_dp_allreduce_reducescatter]: 5.15001e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.61e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.39998e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.25e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.45999e-06 [overlap_recompute_comm]: 2.17999e-06 [overlap_grad_ring_attention]: 3.88999e-06 [overlap_grad_flash_sp]: 1.693e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.07999e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.32e-06 [symbol_engine_optimizer]: 6.799e-05, [1] [Cycle 1]: 6.369e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 8.2e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 5.82999e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.79e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.539e-05 [get_jit_bprop_graph]: 9.49978e-07 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00044795 [validate]: 3.115e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0579452 [execute]: 7.66999e-06 Sums bootstrap : 0.000506s : 0.76% type_inference : 0.004319s : 6.52% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000419s : 0.63% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000030s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000021s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000340s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000451s : 0.68% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000413s : 0.62% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000448s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057945s : 87.41% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.59% : 0.000022s : 4: substitution.arithmetic_simplify 1.42% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.27% : 0.000005s : 4: substitution.graph_param_transform 65.07% : 0.000078s : 2: substitution.inline 2.22% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.91% : 0.000005s : 4: substitution.remove_not_recompute_node 3.49% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004278 2 90.19% : 0.003858s : 1: type_inference.infer 9.81% : 0.000420s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.14% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.24% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.24% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.18% : 0.000003s : 26: predicate.load_eliminater 1.31% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.89% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 1.08% : 0.000001s : 8: predicate.same_eliminate 0.69% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.20% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.18% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000245 6 43.02% : 0.000105s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.98% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078085 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.77% : 0.002947s : 1: add_attr 3.76% : 0.002939s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.69% : 0.000540s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.01% : 0.000789s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.40% : 0.001872s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000458s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.71% : 0.003674s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.23% : 0.000183s : 1: renormalize.infer 0.19% : 0.000151s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.23% : 0.057962s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.55% : 0.004332s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.108669, [24] [bootstrap]: 0.00056379 [type_inference]: 0.0103546 [event_method]: 4.349e-05 [auto_monad]: 0.00011448 [graph_reusing]: 7.51999e-06 [inline]: 1.96998e-06 [add_attr]: 0.00300367, [1] [add_attr_with_inline]: 0.00299554, [1] [Cycle 1]: 6.576e-05, [2] [tag_attr]: 3.042e-05 [meta_addattr_fg_expand]: 8.43001e-06 [parallel-infer-symbol]: 3.01999e-06 [pre_auto_parallel]: 4.691e-05 [insert-virtual-dataset]: 2.22001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.78002e-06 [pipeline_split]: 1.55001e-06 [optimize]: 0.013029, [53] [py_interpret_to_execute]: 3.722e-05 [rewriter_before_opt_a]: 0.00012629 [opt_a]: 0.010789, [3] [Cycle 1]: 0.00691822, [45] [expand_dump_flag]: 3.4e-06 [switch_simplify]: 6.593e-05 [loop_unroll]: 5.431e-05 [a_1]: 0.00132667 [with_stream_mark]: 2.398e-05 [recompute_prepare]: 2.123e-05 [updatestate_depend_eliminate]: 9.09003e-06 [updatestate_assign_eliminate]: 7.44002e-06 [updatestate_loads_eliminate]: 7.03e-06 [parameter_eliminate]: 2.59999e-06 [a_2]: 0.00024463 [accelerated_algorithm]: 3.128e-05 [shard]: 1.82999e-06 [meta_shard_fg_expand]: 3.21001e-06 [shard_inline]: 1.61e-05 [merge_send_recv]: 1.582e-05 [auto_parallel]: 1.032e-05 [parallel]: 2.287e-05 [flash_sp]: 1.119e-05 [merge_comm]: 1.016e-05 [allreduce_fusion]: 9.19e-06 [matmul_add_comm_reduction]: 2.645e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.812e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.515e-05 [virtual_output]: 1.552e-05 [merge_forward]: 9.25999e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 1.834e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.811e-05 [merge_recompute_call_nodes]: 1.43002e-06 [before_grad]: 2.697e-05 [set_forward_comm_id_for_comm_node_pass]: 9.95002e-06 [meta_fg_expand]: 0.00141792 [flash_sp_send_recv_attached]: 3.47997e-06 [receive_attached]: 2.69999e-06 
[after_resolve]: 5.792e-05 [a_after_grad]: 8.105e-05 [renormalize]: 0.00239474 [add_forward_monad_depend]: 9.04998e-06 [auto_monad_grad]: 5.40999e-06 [auto_monad_eliminator]: 5.623e-05 [cse]: 0.00019289 [a_3]: 0.00033392 [Cycle 2]: 0.00293696, [45] [expand_dump_flag]: 1.59e-06 [switch_simplify]: 4.755e-05 [loop_unroll]: 4.332e-05 [a_1]: 0.0015207 [with_stream_mark]: 1.21e-05 [recompute_prepare]: 1.068e-05 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 3.56999e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.00012544 [accelerated_algorithm]: 1.161e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.09e-06 [merge_send_recv]: 6.71999e-06 [auto_parallel]: 7.04001e-06 [parallel]: 4.58001e-06 [flash_sp]: 3.25e-06 [merge_comm]: 5.09e-06 [allreduce_fusion]: 4.52e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.25999e-06 [merge_forward]: 4.43001e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 1.052e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.764e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 1.394e-05 [set_forward_comm_id_for_comm_node_pass]: 5.45001e-06 [meta_fg_expand]: 3.407e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.41998e-06 [after_resolve]: 1.524e-05 [a_after_grad]: 1.46e-05 [renormalize]: 0.00057895 [add_forward_monad_depend]: 4.08001e-06 [auto_monad_grad]: 1.20999e-06 [auto_monad_eliminator]: 1.443e-05 [cse]: 4.609e-05 [a_3]: 6.486e-05 [Cycle 3]: 0.00091932, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.04e-05 [loop_unroll]: 8.88002e-06 [a_1]: 0.00024855 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 9.36998e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 3.91001e-06 
[parameter_eliminate]: 9.19972e-07 [a_2]: 0.00012339 [accelerated_algorithm]: 1.166e-05 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 8.79998e-06 [merge_send_recv]: 6.54999e-06 [auto_parallel]: 7.05e-06 [parallel]: 4.62e-06 [flash_sp]: 1.15999e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.78001e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 9.99999e-06 [virtual_dataset]: 8.77e-06 [get_grad_eliminate_]: 8.37e-06 [virtual_output]: 8.27998e-06 [merge_forward]: 4.02e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 3.316e-05 [merge_recompute_call_nodes]: 7.60017e-07 [before_grad]: 1.423e-05 [set_forward_comm_id_for_comm_node_pass]: 5.62999e-06 [meta_fg_expand]: 2.99999e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.465e-05 [a_after_grad]: 1.451e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 1.106e-05 [cse]: 2.656e-05 [a_3]: 5.944e-05 [py_interpret_to_execute_after_opt_a]: 1.004e-05 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 4.634e-05 [convert_after_rewriter]: 8.78001e-06 [order_py_execute_after_rewriter]: 6.80998e-06 [mutable_eliminate]: 0.00045709 [opt_b]: 0.00028728, [1] [Cycle 1]: 0.00028104, [7] [b_1]: 0.0001879 [b_2]: 1.106e-05 [updatestate_depend_eliminate]: 7.33e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 3.9e-06 [renormalize]: 4.09986e-07 [cse]: 3.17e-05 [optimize_parallel_all_gather_comm]: 2.094e-05 [overlap_param_gather]: 1.73002e-06 [cconv]: 1.904e-05 [loop_unroll]: 0.00042625 [opt_after_cconv]: 0.00013764, [1] [Cycle 1]: 0.00013159, [7] [c_1]: 4.886e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 7.45e-06 [updatestate_assign_eliminate]: 4.25999e-06 [updatestate_loads_eliminate]: 
3.85998e-06 [cse]: 3.034e-05 [renormalize]: 2.70025e-07 [remove_dup_value]: 2.801e-05 [tuple_transform]: 0.00010078, [1] [Cycle 1]: 9.607e-05, [4] [d_1]: 6.578e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 9.94001e-06 [partial_unused_args_eliminate]: 1.90001e-06 [add_recomputation]: 5.818e-05 [cse_after_recomputation]: 3.287e-05, [1] [Cycle 1]: 2.785e-05, [1] [cse]: 2.244e-05 [environ_conv]: 8.74e-06 [swap_dp_allreduce_reducescatter]: 7.71999e-06 [bias_add_comm_swap]: 2.38998e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.69001e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.29e-06 [full_micro_interleaved_order_control]: 2.09999e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.14e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.42e-06 [overlap_opt_shard_grad_in_pipeline]: 1.99e-06 [control_data_broadcast_order]: 1.661e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 5.05001e-06 [overlap_recompute_and_grad_model_parallel]: 5.76e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.34e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 1.91998e-06 [overlap_grad_ring_attention]: 5.25001e-06 [overlap_grad_flash_sp]: 2.461e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.62001e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 9.815e-05, [1] [Cycle 1]: 9.384e-05, [6] [build]: 1.015e-05 [elim_shapecalc]: 1.344e-05 [elim_not_effective]: 1.793e-05 [opt_reshape]: 1.006e-05 [fold_const_symbol]: 1.499e-05 [renormalize]: 2.10013e-07 [detach_backward]: 
1.72001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 2.464e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00046891 [validate]: 4.503e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0807341 [execute]: 7.85e-06 Sums bootstrap : 0.000564s : 0.54% type_inference : 0.010355s : 9.92% event_method : 0.000043s : 0.04% auto_monad : 0.000114s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000047s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000126s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.12% optimize.opt_a.loop_unroll : 0.000107s : 0.10% optimize.opt_a.a_1 : 0.003096s : 2.97% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000041s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.47% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000024s : 0.02% optimize.opt_a.parallel : 0.000032s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000020s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000038s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000079s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001455s : 1.39% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000088s : 0.08% optimize.opt_a.a_after_grad : 0.000110s : 0.11% optimize.opt_a.renormalize : 0.002974s : 2.85% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000266s : 0.25% optimize.opt_a.a_3 : 0.000458s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.44% optimize.opt_b.b_1 : 0.000188s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000426s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000058s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000025s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000469s : 0.45% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.080734s : 77.32% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000727 218 5.79% : 0.000042s : 11: substitution.arithmetic_simplify 1.96% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.64% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000007s : 8: substitution.graph_param_transform 0.33% : 0.000002s : 2: substitution.incorporate_call 0.32% : 0.000002s : 2: substitution.incorporate_call_switch 54.63% : 0.000397s : 16: substitution.inline 2.12% : 0.000015s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.14% : 0.000016s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.95% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000023s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.77% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.89% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.66% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.46% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010287 2 87.11% : 0.008962s : 1: type_inference.infer 12.89% : 0.001326s : 1: type_inference.specialize ------[replace.] 0.000196 30 58.65% : 0.000115s : 16: replace.inline 41.35% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000420 30 92.50% : 0.000388s : 16: match.inline 7.50% : 0.000032s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000735 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000015s : 99: predicate.arithmetic_simplify 1.13% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.77% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000041s : 244: predicate.inline 1.29% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000012s : 89: predicate.partial_eliminate 1.10% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.84% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.18% : 0.000001s : 8: predicate.row_tensor_eliminate 1.31% : 0.000010s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.27% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.82% : 0.000013s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.87% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.07% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.47% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.32% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.60% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001506 32 57.05% : 0.000859s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.95% : 0.000647s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.132757 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.27% : 0.003008s : 1: add_attr 2.26% : 0.002999s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000122s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000599s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000013s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000435s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.58% : 0.004756s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.13% : 0.010792s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.36% : 0.000478s : 1: opt_after_jit_grad 0.22% : 0.000291s : 1: opt_b 9.82% : 0.013033s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000052s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000032s : 1: remove_dup_value 1.17% : 0.001557s : 2: renormalize.infer 1.06% : 0.001404s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 60.83% : 0.080751s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 7.81% : 0.010368s : 1: type_inference 0.05% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x3-ge],max_mem:62.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x4-pynative],max_mem:62.0M TotalTime = 0.0215427, [24] [bootstrap]: 0.00055836 [type_inference]: 0.00620097 [event_method]: 1.495e-05 [auto_monad]: 5.43e-05 [graph_reusing]: 5.39998e-06 [inline]: 1.60001e-06 [add_attr]: 0.00343519, [1] [add_attr_with_inline]: 0.00342462, [1] [Cycle 1]: 4.333e-05, [2] [tag_attr]: 1.526e-05 [meta_addattr_fg_expand]: 4.35e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.781e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00398599, [53] [py_interpret_to_execute]: 1.943e-05 [rewriter_before_opt_a]: 5.744e-05 [opt_a]: 0.0021441, [2] [Cycle 1]: 0.00154476, [45] [expand_dump_flag]: 2.69001e-06 [switch_simplify]: 3.031e-05 [loop_unroll]: 2.062e-05 [a_1]: 0.00045098 [with_stream_mark]: 1.333e-05 [recompute_prepare]: 7.72998e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.21001e-06 [updatestate_loads_eliminate]: 2.97002e-06 [parameter_eliminate]: 1.66002e-06 [a_2]: 7.556e-05 [accelerated_algorithm]: 6.73e-06 [shard]: 1.93997e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 7.38999e-06 [auto_parallel]: 3.275e-05 [parallel]: 2.158e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 4.04002e-06 [matmul_add_comm_reduction]: 9.47001e-06 [allreduce_slice_to_reducescatter]: 8.59989e-07 [virtual_shard_identity]: 7.95e-06 [virtual_dataset]: 
6.11998e-06 [get_grad_eliminate_]: 5.72999e-06 [virtual_output]: 6.41e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 8.80001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.113e-05 [merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.49999e-06 [flash_sp_send_recv_attached]: 2.84999e-06 [receive_attached]: 2.45002e-06 [after_resolve]: 1.026e-05 [a_after_grad]: 9.14998e-06 [renormalize]: 0.00042113 [add_forward_monad_depend]: 4.51002e-06 [auto_monad_grad]: 2.28002e-06 [auto_monad_eliminator]: 1.338e-05 [cse]: 2.705e-05 [a_3]: 4.047e-05 [Cycle 2]: 0.00059022, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.66e-06 [loop_unroll]: 5.65001e-06 [a_1]: 0.00012645 [with_stream_mark]: 9.54999e-06 [recompute_prepare]: 5.49998e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.45997e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 6.725e-05 [accelerated_algorithm]: 5.41002e-06 [shard]: 1.22e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.59e-06 [parallel]: 5.02e-06 [flash_sp]: 3.16001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.65997e-06 [matmul_add_comm_reduction]: 5.06997e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 5.69999e-06 [virtual_dataset]: 5.13002e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.32001e-06 [merge_forward]: 2.39001e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 6.49976e-07 [before_grad]: 8.08999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 9.29984e-07 [after_resolve]: 9.04e-06 [a_after_grad]: 8.04997e-06 
[renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.99979e-07 [auto_monad_grad]: 7.39994e-07 [auto_monad_eliminator]: 6.34001e-06 [cse]: 1.689e-05 [a_3]: 3.17e-05 [py_interpret_to_execute_after_opt_a]: 7.75e-06 [slice_cell_reuse_recomputed_activation]: 2.03002e-06 [rewriter_after_opt_a]: 2.966e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 4.83001e-06 [mutable_eliminate]: 0.00044981 [opt_b]: 0.00018289, [1] [Cycle 1]: 0.00017667, [7] [b_1]: 0.00010753 [b_2]: 7.13e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 3.19997e-07 [cse]: 1.71e-05 [optimize_parallel_all_gather_comm]: 1.569e-05 [overlap_param_gather]: 1.86e-06 [cconv]: 2.171e-05 [loop_unroll]: 0.00041634 [opt_after_cconv]: 9.487e-05, [1] [Cycle 1]: 8.932e-05, [7] [c_1]: 2.751e-05 [parameter_eliminate]: 2.15002e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.572e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.286e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.443e-05, [4] [d_1]: 3.883e-05 [none_parameter_eliminate]: 1.81e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.077e-05 [cse_after_recomputation]: 2.096e-05, [1] [Cycle 1]: 1.635e-05, [1] [cse]: 1.111e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.31002e-06 [bias_add_comm_swap]: 2.86e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.84001e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.16998e-06 [assign_add_opt]: 1.42e-06 [ForceFp32Comm]: 1.11997e-06 [remove_cast_before_assign_add]: 7.29982e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.67001e-06 [comm_op_add_attrs]: 1.11002e-06 
[add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.10001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92999e-06 [control_data_broadcast_order]: 1.108e-05 [grouped_pairwise_exchange_alltoall]: 1.55999e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.59001e-06 [overlap_grad_ring_attention]: 3.97002e-06 [overlap_grad_flash_sp]: 1.717e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 1.91e-06 [split_layernorm_comm]: 2.22999e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 6.854e-05, [1] [Cycle 1]: 6.421e-05, [6] [build]: 2.63998e-06 [elim_shapecalc]: 8.43001e-06 [elim_not_effective]: 1.132e-05 [opt_reshape]: 6.22001e-06 [fold_const_symbol]: 8.42e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.95001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.529e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.60998e-06 [opt_after_jit_grad]: 0.00045187 [validate]: 3.158e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00653745 [execute]: 7.35e-06 Sums bootstrap : 0.000558s : 3.26% type_inference : 0.006201s : 36.16% event_method : 0.000015s : 0.09% auto_monad : 0.000054s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 
0.000057s : 0.33% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000577s : 3.37% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000038s : 0.22% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000007s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000421s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000044s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.62% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000416s : 2.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000051s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000003s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000008s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000452s : 2.63% validate : 0.000032s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006537s : 38.12% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 15.11% : 0.000025s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.24% : 0.000005s : 4: substitution.graph_param_transform 66.20% : 0.000109s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.83% : 0.000005s : 4: substitution.remove_not_recompute_node 2.46% : 0.000004s : 4: substitution.replace_old_param 6.65% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006156 2 90.60% : 0.005577s : 1: type_inference.infer 9.40% : 0.000579s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.22% : 0.000027s : 3: replace.inline 29.78% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.52% : 0.000107s : 3: match.inline 8.48% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.92% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.19% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000002s : 8: predicate.shard_identity_eliminate 0.84% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.42% : 0.000002s : 16: predicate.switch_defer_inline 2.08% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.14% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.25% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000376 8 46.60% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.40% : 0.000201s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030481 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.29% : 0.003440s : 1: add_attr 11.25% : 0.003428s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000055s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000059s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.95% : 0.000595s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.39% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.09% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.04% : 0.002147s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.51% : 0.000461s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.09% : 0.003990s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000215s : 1: renormalize.infer 0.65% : 0.000199s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.48% : 0.006548s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 20.39% : 0.006215s : 1: type_inference 0.19% : 0.000059s : 1: validate TotalTime = 0.0181105, [24] [bootstrap]: 0.00046026 [type_inference]: 0.00433353 [event_method]: 1.061e-05 [auto_monad]: 5.044e-05 [graph_reusing]: 5.00999e-06 [inline]: 1.79e-06 [add_attr]: 0.00299271, [1] [add_attr_with_inline]: 0.00298461, [1] [Cycle 1]: 4.476e-05, [2] [tag_attr]: 1.175e-05 [meta_addattr_fg_expand]: 3.38e-06 [parallel-infer-symbol]: 2.68e-06 [pre_auto_parallel]: 2.129e-05 [insert-virtual-dataset]: 2.68e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.81e-06 [optimize]: 0.00366264, [53] [py_interpret_to_execute]: 1.557e-05 [rewriter_before_opt_a]: 3.911e-05 [opt_a]: 0.0018407, [2] [Cycle 1]: 0.00124284, [45] [expand_dump_flag]: 2.84999e-06 [switch_simplify]: 2.428e-05 [loop_unroll]: 1.382e-05 [a_1]: 0.00029225 [with_stream_mark]: 1.273e-05 [recompute_prepare]: 7.35e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.08998e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 1.61998e-06 [a_2]: 7.651e-05 [accelerated_algorithm]: 6.54001e-06 [shard]: 2.94999e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.86998e-06 [merge_send_recv]: 8.07e-06 [auto_parallel]: 5.56e-06 [parallel]: 1.771e-05 [flash_sp]: 7.21001e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.53001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.24001e-06 [virtual_dataset]: 5.67001e-06 [get_grad_eliminate_]: 5.69999e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 3.70998e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 9.51998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.063e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 1.005e-05 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 1.058e-05 [a_after_grad]: 8.80999e-06 [renormalize]: 0.00033539 [add_forward_monad_depend]: 4.48999e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.333e-05 [cse]: 2.495e-05 [a_3]: 3.95e-05 [Cycle 2]: 0.00058864, [45] [expand_dump_flag]: 1.05999e-06 [switch_simplify]: 6.85002e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012499 [with_stream_mark]: 9.10999e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.64001e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 6.773e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.25999e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.13999e-06 [auto_parallel]: 5.61998e-06 [parallel]: 4.41002e-06 [flash_sp]: 2.76e-06 [merge_comm]: 2.92002e-06 [allreduce_fusion]: 2.62001e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.19e-06 [get_grad_eliminate_]: 4.94998e-06 [virtual_output]: 5.21002e-06 [merge_forward]: 2.56e-06 [cell_reuse_recompute_pass]: 1.27999e-06 [offload_activation]: 5.95002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.55001e-06 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 7.77998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.01001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.79983e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.09e-06 [a_after_grad]: 8e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 9.80013e-07 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 5.92999e-06 [cse]: 1.236e-05 [a_3]: 3.189e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 [slice_cell_reuse_recomputed_activation]: 2.11e-06 [rewriter_after_opt_a]: 3.073e-05 [convert_after_rewriter]: 6.56e-06 [order_py_execute_after_rewriter]: 4.84e-06 [mutable_eliminate]: 0.00044504 [opt_b]: 0.00020269, [1] [Cycle 1]: 0.00019662, [7] 
[b_1]: 0.0001283 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.30997e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 3.39991e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.569e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 2.26e-05 [loop_unroll]: 0.00041469 [opt_after_cconv]: 9.444e-05, [1] [Cycle 1]: 8.867e-05, [7] [c_1]: 2.743e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.6e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.251e-05 [tuple_transform]: 6.759e-05, [1] [Cycle 1]: 6.337e-05, [4] [d_1]: 3.814e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.03002e-06 [partial_unused_args_eliminate]: 1.56002e-06 [add_recomputation]: 4.362e-05 [cse_after_recomputation]: 1.92e-05, [1] [Cycle 1]: 1.509e-05, [1] [cse]: 1.019e-05 [environ_conv]: 4.68001e-06 [swap_dp_allreduce_reducescatter]: 5.30001e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 4.48999e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 7.10017e-07 [remove_cast_before_assign_add]: 9.90025e-07 [full_micro_interleaved_order_control]: 2.24999e-06 [reorder_send_recv_between_fp_bp]: 2.79999e-06 [comm_op_add_attrs]: 1.25001e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.30001e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.80001e-06 [control_data_broadcast_order]: 1.108e-05 [grouped_pairwise_exchange_alltoall]: 1.84e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.642e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.33002e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.846e-05, [1] [Cycle 1]: 6.438e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.23001e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 6.28002e-06 [fold_const_symbol]: 8.68001e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.589e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.18998e-06 [opt_after_jit_grad]: 0.00044957 [validate]: 3.11e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00585439 [execute]: 6.52001e-06 Sums bootstrap : 0.000460s : 3.25% type_inference : 0.004334s : 30.61% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000417s : 2.95% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000335s : 2.37% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000037s : 0.26% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 3.14% optimize.opt_b.b_1 : 0.000128s : 0.91% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000415s : 2.93% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000450s : 3.18% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005854s : 41.35% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.49% : 0.000023s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.18% : 0.000005s : 4: substitution.graph_param_transform 65.27% : 0.000080s : 2: substitution.inline 2.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.71% : 0.000005s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004294 2 92.03% : 0.003952s : 1: type_inference.infer 7.97% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000157 984 0.71% : 0.000001s : 9: predicate.accumulaten_eliminater 0.95% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.61% : 0.000001s : 9: predicate.addn_zero_filter 0.60% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.25% : 0.000004s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.70% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.75% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.71% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.98% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.94% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.89% : 0.000001s : 13: predicate.environ_get_depend_swap 14.39% : 0.000023s : 21: predicate.environ_get_eliminate 0.89% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.83% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.55% : 0.000002s : 11: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 5.41% : 0.000008s : 44: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.41% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.87% : 0.000003s : 26: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.52% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.50% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.61% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.06% : 0.000002s : 11: predicate.partial_defer_inline 1.10% : 0.000002s : 13: predicate.partial_eliminate 0.66% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 1.85% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.60% : 0.000001s : 8: predicate.remove_not_recompute_node 1.12% : 0.000002s : 17: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.68% : 0.000001s : 9: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.89% : 0.000001s : 11: predicate.switch_defer_inline 1.52% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.72% : 0.000006s : 41: predicate.switch_simplify 0.64% : 
0.000001s : 9: predicate.tile_eliminate 0.98% : 0.000002s : 9: predicate.transpose_eliminate 1.28% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.29% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.17% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.66% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.82% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000005s : 34: predicate.updatestate_useless_node_eliminater 0.34% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.69% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 42.27% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.73% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026047 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.51% : 0.002997s : 1: add_attr 11.47% : 0.002988s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000498s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000454s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.42% : 0.000111s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.08% : 0.001844s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.76% : 0.000460s : 1: opt_after_jit_grad 0.79% : 0.000206s : 1: opt_b 14.08% : 0.003666s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000183s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.51% : 0.005864s : 1: task_emit 0.27% : 0.000070s : 1: 
tuple_transform 16.69% : 0.004347s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x4-kbk],max_mem:62.0M TotalTime = 0.140997, [24] [bootstrap]: 0.00052318 [type_inference]: 0.00598029 [event_method]: 1.416e-05 [auto_monad]: 5.699e-05 [graph_reusing]: 5.37999e-06 [inline]: 1.82001e-06 [add_attr]: 0.00334974, [1] [add_attr_with_inline]: 0.0033384, [1] [Cycle 1]: 4.524e-05, [2] [tag_attr]: 1.493e-05 [meta_addattr_fg_expand]: 3.73999e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.735e-05 [insert-virtual-dataset]: 2.21e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.96e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00393865, [53] [py_interpret_to_execute]: 2.01e-05 [rewriter_before_opt_a]: 5.974e-05 [opt_a]: 0.00210887, [2] [Cycle 1]: 0.00151229, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 3.19e-05 [loop_unroll]: 2.058e-05 [a_1]: 0.00046573 [with_stream_mark]: 1.417e-05 [recompute_prepare]: 7.41001e-06 [updatestate_depend_eliminate]: 3.78999e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 2.63e-06 [parameter_eliminate]: 1.52999e-06 [a_2]: 7.466e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.56998e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.59998e-06 [merge_send_recv]: 7.56999e-06 [auto_parallel]: 5.82001e-06 [parallel]: 2.383e-05 [flash_sp]: 7.60998e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.35e-06 [matmul_add_comm_reduction]: 9.14998e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.57001e-06 [merge_forward]: 3.92002e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.52999e-06 [cell_reuse_handle_not_recompute_node_pass]: 
1.038e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.17001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.81e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.62e-06 [renormalize]: 0.00040701 [add_forward_monad_depend]: 4.74002e-06 [auto_monad_grad]: 1.90001e-06 [auto_monad_eliminator]: 1.283e-05 [cse]: 2.768e-05 [a_3]: 4.049e-05 [Cycle 2]: 0.00058736, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.73e-06 [loop_unroll]: 5.49998e-06 [a_1]: 0.00012496 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 5.57001e-06 [updatestate_depend_eliminate]: 2.76999e-06 [updatestate_assign_eliminate]: 2.16998e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.661e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.54e-06 [merge_send_recv]: 4.23999e-06 [auto_parallel]: 5.08002e-06 [parallel]: 4.48999e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 3.05998e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 4.94998e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.03998e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 4.88001e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.44001e-06 [cell_reuse_recompute_pass]: 1.16997e-06 [offload_activation]: 5.96998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 1.66998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.39996e-07 [after_resolve]: 9.69999e-06 [a_after_grad]: 8.21002e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.48998e-06 [cse]: 1.507e-05 [a_3]: 3.209e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 
[slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.041e-05 [convert_after_rewriter]: 6.96999e-06 [order_py_execute_after_rewriter]: 5.23002e-06 [mutable_eliminate]: 0.00044314 [opt_b]: 0.00018315, [1] [Cycle 1]: 0.00017706, [7] [b_1]: 0.00010867 [b_2]: 7.1e-06 [updatestate_depend_eliminate]: 5.15001e-06 [updatestate_assign_eliminate]: 2.76999e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 5.50004e-07 [cse]: 1.599e-05 [optimize_parallel_all_gather_comm]: 1.503e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.226e-05 [loop_unroll]: 0.00041203 [opt_after_cconv]: 9.48e-05, [1] [Cycle 1]: 8.901e-05, [7] [c_1]: 2.768e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.594e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.223e-05 [tuple_transform]: 6.714e-05, [1] [Cycle 1]: 6.269e-05, [4] [d_1]: 3.774e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.84e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.906e-05 [cse_after_recomputation]: 2.069e-05, [1] [Cycle 1]: 1.615e-05, [1] [cse]: 1.103e-05 [environ_conv]: 4.37e-06 [swap_dp_allreduce_reducescatter]: 5.31002e-06 [bias_add_comm_swap]: 2.51e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.98e-06 [merge_cast_opt]: 1.28002e-06 [slice_recompute_activation]: 2.08002e-06 [micro_interleaved_order_control]: 2.78998e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.39001e-06 [reorder_send_recv_between_fp_bp]: 2.84999e-06 [comm_op_add_attrs]: 1.11002e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.16002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82999e-06 
[control_data_broadcast_order]: 1.169e-05 [grouped_pairwise_exchange_alltoall]: 1.77001e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.76002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 1.94e-06 [overlap_grad_ring_attention]: 3.7e-06 [overlap_grad_flash_sp]: 1.633e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.02999e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.901e-05, [1] [Cycle 1]: 6.491e-05, [6] [build]: 2.08998e-06 [elim_shapecalc]: 8.73001e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.95999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.525e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.00044648 [validate]: 3.017e-05 [backend_pass]: 8.70001e-07 [task_emit]: 0.126325 [execute]: 8.95999e-06 Sums bootstrap : 0.000523s : 0.38% type_inference : 0.005980s : 4.38% event_method : 0.000014s : 0.01% auto_monad : 0.000057s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.01% optimize.rewriter_before_opt_a : 0.000060s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000591s : 0.43% optimize.opt_a.with_stream_mark : 
0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.10% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.01% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.01% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000407s : 0.30% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.01% optimize.opt_a.cse : 0.000043s : 0.03% optimize.opt_a.a_3 : 0.000073s : 0.05% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000443s : 0.32% optimize.opt_b.b_1 : 0.000109s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000412s : 0.30% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000004s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000446s : 0.33% validate : 0.000030s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.126325s : 92.46% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.60% : 0.000024s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 2.95% : 0.000005s : 4: substitution.graph_param_transform 67.33% : 0.000111s : 3: substitution.inline 1.63% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.49% : 0.000004s : 4: substitution.remove_not_recompute_node 2.51% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005937 2 90.65% : 0.005382s : 1: type_inference.infer 9.35% : 0.000555s : 1: type_inference.specialize ------[replace.] 0.000051 5 51.92% : 0.000026s : 3: replace.inline 48.08% : 0.000024s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000119 5 91.73% : 0.000109s : 3: match.inline 8.27% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.95% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.83% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.79% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.28% : 0.000004s : 32: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 1.05% : 0.000002s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.19% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.58% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.89% : 0.000001s : 11: predicate.reshape_eliminate 0.74% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.87% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.94% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.87% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.54% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.57% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000345 8 46.25% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.75% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.149793 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.24% : 0.003354s : 1: add_attr 2.23% : 0.003342s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000062s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.37% : 0.000560s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000053s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.28% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.30% : 0.000452s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.64% : 0.000952s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.01% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000091s : 28: opt.transform.opt_b 0.03% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.41% : 0.002112s : 1: opt_a 0.07% : 0.000098s : 1: opt_after_cconv 0.30% : 0.000456s : 1: opt_after_jit_grad 0.12% : 0.000187s : 1: opt_b 2.63% : 0.003942s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.14% : 0.000210s : 1: renormalize.infer 0.13% : 0.000190s : 1: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000034s : 1: rewriter_after_opt_a 0.04% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000072s : 1: symbol_engine_optimizer 84.35% : 0.126347s : 1: task_emit 0.05% : 0.000070s : 1: 
tuple_transform 4.00% : 0.005994s : 1: type_inference 0.04% : 0.000054s : 1: validate TotalTime = 0.134434, [24] [bootstrap]: 0.00048224 [type_inference]: 0.00443022 [event_method]: 1.066e-05 [auto_monad]: 5.222e-05 [graph_reusing]: 5.34e-06 [inline]: 2.01e-06 [add_attr]: 0.00294651, [1] [add_attr_with_inline]: 0.00293863, [1] [Cycle 1]: 4.504e-05, [2] [tag_attr]: 1.155e-05 [meta_addattr_fg_expand]: 3.09999e-06 [parallel-infer-symbol]: 2.59001e-06 [pre_auto_parallel]: 2.071e-05 [insert-virtual-dataset]: 3.11999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.69998e-06 [optimize]: 0.00369108, [53] [py_interpret_to_execute]: 1.65e-05 [rewriter_before_opt_a]: 3.89e-05 [opt_a]: 0.00189203, [2] [Cycle 1]: 0.00129121, [45] [expand_dump_flag]: 2.72001e-06 [switch_simplify]: 2.417e-05 [loop_unroll]: 1.368e-05 [a_1]: 0.00032583 [with_stream_mark]: 1.395e-05 [recompute_prepare]: 7.79002e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.17002e-06 [updatestate_loads_eliminate]: 3.14999e-06 [parameter_eliminate]: 1.62999e-06 [a_2]: 7.665e-05 [accelerated_algorithm]: 6.11e-06 [shard]: 2.21e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.96e-06 [merge_send_recv]: 7.67002e-06 [auto_parallel]: 6.02001e-06 [parallel]: 1.702e-05 [flash_sp]: 7.48e-06 [merge_comm]: 3.53999e-06 [allreduce_fusion]: 3.7e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 5.20027e-07 [virtual_shard_identity]: 6.92002e-06 [virtual_dataset]: 5.79e-06 [get_grad_eliminate_]: 5.51002e-06 [virtual_output]: 5.37001e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.44998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.10001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.24001e-06 [meta_fg_expand]: 2.14999e-06 [flash_sp_send_recv_attached]: 2.58e-06 [receive_attached]: 2.17999e-06 
[after_resolve]: 1.052e-05 [a_after_grad]: 8.90999e-06 [renormalize]: 0.00034758 [add_forward_monad_depend]: 4.22e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.288e-05 [cse]: 2.743e-05 [a_3]: 4.008e-05 [Cycle 2]: 0.00059161, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.35001e-06 [a_1]: 0.00012413 [with_stream_mark]: 1.14e-05 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.75e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.52e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.05e-06 [flash_sp]: 3.25e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.85002e-06 [matmul_add_comm_reduction]: 5.10001e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.54e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.96002e-06 [merge_forward]: 2.54999e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 5.76998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.35001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.32998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.63002e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.85002e-06 [a_after_grad]: 8.69e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.40025e-07 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.19999e-06 [cse]: 1.262e-05 [a_3]: 3.178e-05 [py_interpret_to_execute_after_opt_a]: 7.31001e-06 [slice_cell_reuse_recomputed_activation]: 2.29999e-06 [rewriter_after_opt_a]: 3.004e-05 [convert_after_rewriter]: 7.13998e-06 [order_py_execute_after_rewriter]: 5.05001e-06 [mutable_eliminate]: 0.00044886 [opt_b]: 0.00018175, [1] [Cycle 1]: 0.00017593, [7] 
[b_1]: 0.00010851 [b_2]: 7.14001e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 2.9002e-07 [cse]: 1.572e-05 [optimize_parallel_all_gather_comm]: 1.578e-05 [overlap_param_gather]: 1.89e-06 [cconv]: 2.214e-05 [loop_unroll]: 0.00041321 [opt_after_cconv]: 9.318e-05, [1] [Cycle 1]: 8.777e-05, [7] [c_1]: 2.713e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.46998e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.634e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.259e-05 [tuple_transform]: 6.75e-05, [1] [Cycle 1]: 6.337e-05, [4] [d_1]: 3.875e-05 [none_parameter_eliminate]: 1.32999e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.28002e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.182e-05 [cse_after_recomputation]: 2.081e-05, [1] [Cycle 1]: 1.656e-05, [1] [cse]: 1.134e-05 [environ_conv]: 4.82998e-06 [swap_dp_allreduce_reducescatter]: 5.05999e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 3.99002e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.32001e-06 [micro_interleaved_order_control]: 2.16e-06 [assign_add_opt]: 1.49e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.51998e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.80013e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.75001e-06 [control_data_broadcast_order]: 1.111e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.62002e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.22e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.41998e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.635e-05 [begin_end_overlap_inline]: 4.59986e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.92001e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.83e-05, [1] [Cycle 1]: 6.438e-05, [6] [build]: 2.63e-06 [elim_shapecalc]: 8.15e-06 [elim_not_effective]: 1.153e-05 [opt_reshape]: 6.29999e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.77999e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.507e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.41999e-06 [opt_after_jit_grad]: 0.00044906 [validate]: 3.133e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.12203 [execute]: 9.27001e-06 Sums bootstrap : 0.000482s : 0.37% type_inference : 0.004430s : 3.40% event_method : 0.000011s : 0.01% auto_monad : 0.000052s : 0.04% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.03% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.02% optimize.opt_a.loop_unroll : 0.000019s : 0.01% optimize.opt_a.a_1 : 0.000450s : 0.34% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.11% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.01% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.01% optimize.opt_a.renormalize : 0.000348s : 0.27% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.01% optimize.opt_a.cse : 0.000040s : 0.03% optimize.opt_a.a_3 : 0.000072s : 0.06% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.02% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.34% optimize.opt_b.b_1 : 0.000109s : 0.08% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000413s : 0.32% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.03% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.34% validate : 0.000031s : 0.02% backend_pass : 0.000001s : 0.00% task_emit : 0.122030s : 93.52% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000118 26 18.31% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.23% : 0.000001s : 2: substitution.fold_const_symbol 4.22% : 0.000005s : 4: substitution.graph_param_transform 65.31% : 0.000077s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.72% : 0.000004s : 4: substitution.remove_not_recompute_node 3.46% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004390 2 91.91% : 0.004034s : 1: type_inference.infer 8.09% : 0.000355s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000173 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.53% : 0.000001s : 8: predicate.addn_check_dump 0.60% : 0.000001s : 9: predicate.addn_zero_filter 0.54% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 1.90% : 0.000003s : 17: predicate.arithmetic_simplify 0.64% : 0.000001s : 9: predicate.cast_eliminate 0.62% : 0.000001s : 8: predicate.check_bprop_eliminate 0.51% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.53% : 0.000001s : 8: predicate.depend_value_elim 0.64% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.70% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.65% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.90% : 0.000002s : 13: predicate.environ_add_const_eliminate 0.83% : 0.000001s : 13: predicate.environ_get_add_eliminate 0.89% : 0.000002s : 13: predicate.environ_get_depend_swap 1.48% : 0.000003s : 21: predicate.environ_get_eliminate 0.84% : 0.000001s : 13: predicate.environ_get_set_eliminate 21.49% : 0.000037s : 11: predicate.exchange_switch_depend_value 1.50% : 0.000003s : 11: predicate.float_depend_g_call 0.51% : 0.000001s : 8: predicate.float_environ_get_switch 0.79% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.64% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.53% : 0.000001s : 8: predicate.incorporate_call_switch 4.67% : 0.000008s : 44: predicate.inline 0.79% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.29% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 1.72% : 0.000003s : 26: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.36% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.59% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.53% : 0.000001s : 8: predicate.merge_addn 0.60% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.56% : 0.000001s : 9: predicate.minmaximum_grad 1.04% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 0.92% : 0.000002s : 11: predicate.partial_defer_inline 0.98% : 0.000002s : 13: predicate.partial_eliminate 0.60% : 0.000001s : 9: predicate.print_const_string_wrapper 0.59% : 0.000001s : 8: predicate.reduce_all_const_elim 0.75% : 0.000001s : 9: predicate.reduce_eliminate 1.69% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 8: predicate.remove_not_recompute_node 1.06% : 0.000002s : 17: predicate.replace_applicator 0.59% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.63% : 0.000001s : 9: predicate.reshape_eliminate 0.63% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.86% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.73% : 0.000001s : 8: predicate.shard_identity_eliminate 0.65% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.83% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.83% : 0.000001s : 11: predicate.switch_defer_inline 1.40% : 0.000002s : 19: predicate.switch_layer_defer_inline 3.53% : 0.000006s : 41: predicate.switch_simplify 0.60% : 
0.000001s : 9: predicate.tile_eliminate 0.63% : 0.000001s : 9: predicate.transpose_eliminate 1.22% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.24% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.05% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.72% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.16% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 1.78% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.21% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 1.72% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.46% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.29% : 0.000001s : 4: predicate.value_based_eliminate 0.66% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.66% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.44% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000263 6 43.32% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.68% : 0.000149s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.142375 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.07% : 0.002951s : 1: add_attr 2.07% : 0.002942s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.03% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.04% : 0.000057s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.39% : 0.000553s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.30% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.32% : 0.000458s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.56% : 0.000801s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.06% : 0.000091s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.33% : 0.001895s : 1: opt_a 0.07% : 0.000096s : 1: opt_after_cconv 0.32% : 0.000459s : 1: opt_after_jit_grad 0.13% : 0.000185s : 1: opt_b 2.60% : 0.003695s : 1: optimize 0.01% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.01% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.01% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.13% : 0.000188s : 1: renormalize.infer 0.11% : 0.000153s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.02% : 0.000034s : 1: rewriter_after_opt_a 0.03% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000071s : 1: symbol_engine_optimizer 85.73% : 0.122051s : 1: task_emit 0.05% : 0.000070s : 1: 
tuple_transform 3.12% : 0.004444s : 1: type_inference 0.04% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x4-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x5-pynative],max_mem:64.0M TotalTime = 0.0210687, [24] [bootstrap]: 0.00057592 [type_inference]: 0.00609334 [event_method]: 1.448e-05 [auto_monad]: 5.387e-05 [graph_reusing]: 5.25001e-06 [inline]: 1.89999e-06 [add_attr]: 0.0033679, [1] [add_attr_with_inline]: 0.00335725, [1] [Cycle 1]: 4.477e-05, [2] [tag_attr]: 1.521e-05 [meta_addattr_fg_expand]: 3.98001e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.802e-05 [insert-virtual-dataset]: 2.46998e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.64e-06 [pipeline_split]: 1.95001e-06 [optimize]: 0.00400038, [53] [py_interpret_to_execute]: 2.027e-05 [rewriter_before_opt_a]: 5.865e-05 [opt_a]: 0.00216349, [2] [Cycle 1]: 0.00152912, [45] [expand_dump_flag]: 3.04999e-06 [switch_simplify]: 3.14e-05 [loop_unroll]: 2.068e-05 [a_1]: 0.00045514 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.66999e-06 [updatestate_depend_eliminate]: 3.42002e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.71e-06 [a_2]: 7.425e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.19001e-06 [merge_send_recv]: 8.04002e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.358e-05 [flash_sp]: 6.93e-06 [merge_comm]: 3.61001e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 6.89994e-07 [virtual_shard_identity]: 7.3e-06 [virtual_dataset]: 5.97999e-06 
[get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 3.81001e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.049e-05 [merge_recompute_call_nodes]: 1.66998e-06 [before_grad]: 9.71998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.09999e-06 [flash_sp_send_recv_attached]: 2.26e-06 [receive_attached]: 2.38002e-06 [after_resolve]: 1.021e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 0.00043006 [add_forward_monad_depend]: 4.37e-06 [auto_monad_grad]: 1.71002e-06 [auto_monad_eliminator]: 1.315e-05 [cse]: 2.685e-05 [a_3]: 4.171e-05 [Cycle 2]: 0.00062492, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012425 [with_stream_mark]: 9.87001e-06 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 8.958e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.92999e-06 [auto_parallel]: 5.47001e-06 [parallel]: 3.76999e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.26999e-06 [allreduce_fusion]: 2.95998e-06 [matmul_add_comm_reduction]: 5.00999e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 7.81001e-06 [virtual_dataset]: 5.30999e-06 [get_grad_eliminate_]: 5.02999e-06 [virtual_output]: 5.25001e-06 [merge_forward]: 2.93e-06 [cell_reuse_recompute_pass]: 1.30001e-06 [offload_activation]: 6.12999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.87001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.71999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.20002e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.72e-06 [a_after_grad]: 8.2e-06 [renormalize]: 
8.9989e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 6.48998e-06 [cse]: 1.406e-05 [a_3]: 3.354e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 1.76998e-06 [rewriter_after_opt_a]: 3.023e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.20001e-06 [mutable_eliminate]: 0.00045111 [opt_b]: 0.00018007, [1] [Cycle 1]: 0.0001741, [7] [b_1]: 0.00010655 [b_2]: 6.96999e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 5.00004e-07 [cse]: 1.674e-05 [optimize_parallel_all_gather_comm]: 1.635e-05 [overlap_param_gather]: 2.12001e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00041883 [opt_after_cconv]: 9.465e-05, [1] [Cycle 1]: 8.898e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.665e-05 [renormalize]: 2.79979e-07 [remove_dup_value]: 1.237e-05 [tuple_transform]: 6.871e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.902e-05 [none_parameter_eliminate]: 1.73002e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.96e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.644e-05 [cse_after_recomputation]: 2.037e-05, [1] [Cycle 1]: 1.612e-05, [1] [cse]: 1.087e-05 [environ_conv]: 4.80001e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.36002e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.11998e-06 [micro_interleaved_order_control]: 2.46998e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.30001e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 
9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.153e-05 [grouped_pairwise_exchange_alltoall]: 1.96e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.92e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.52999e-06 [overlap_recompute_comm]: 1.82999e-06 [overlap_grad_ring_attention]: 3.86999e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.09e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.769e-05, [1] [Cycle 1]: 6.362e-05, [6] [build]: 2.64001e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.115e-05 [opt_reshape]: 6.15002e-06 [fold_const_symbol]: 8.77999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.498e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.28e-06 [opt_after_jit_grad]: 0.00045183 [validate]: 3.251e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00619873 [execute]: 6.63998e-06 Sums bootstrap : 0.000576s : 3.44% type_inference : 0.006093s : 36.42% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 
0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000579s : 3.46% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000164s : 0.98% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000015s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000430s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000075s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.70% optimize.opt_b.b_1 : 0.000107s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000419s : 2.50% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 2.70% validate : 0.000033s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006199s : 37.05% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000168 30 14.87% : 0.000025s : 5: substitution.arithmetic_simplify 1.12% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 67.23% : 0.000113s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.45% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.43% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006049 2 90.89% : 0.005498s : 1: type_inference.infer 9.11% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.27% : 0.000028s : 3: replace.inline 28.73% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.85% : 0.000111s : 3: match.inline 8.15% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.84% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.59% : 0.000001s : 8: predicate.addn_check_dump 0.93% : 0.000001s : 11: predicate.addn_zero_filter 0.81% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.40% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.21% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.91% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.69% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.91% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.13% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.72% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.11% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000352 8 48.19% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.81% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029988 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.24% : 0.003372s : 1: add_attr 11.21% : 0.003361s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000059s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.05% : 0.000614s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.43% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.54% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.23% : 0.000967s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.22% : 0.002167s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.54% : 0.000461s : 1: opt_after_jit_grad 0.61% : 0.000183s : 1: opt_b 13.35% : 0.004004s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000214s : 1: renormalize.infer 0.70% : 0.000210s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000070s : 1: symbol_engine_optimizer 20.70% : 0.006209s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.37% : 0.006107s : 1: type_inference 0.22% : 0.000066s : 1: validate TotalTime = 0.0181714, [24] [bootstrap]: 0.00050644 [type_inference]: 0.00432538 [event_method]: 1.049e-05 [auto_monad]: 5.149e-05 [graph_reusing]: 4.80001e-06 [inline]: 1.52999e-06 [add_attr]: 0.0029396, [1] [add_attr_with_inline]: 0.0029318, [1] [Cycle 1]: 4.484e-05, [2] [tag_attr]: 1.103e-05 [meta_addattr_fg_expand]: 3.02002e-06 [parallel-infer-symbol]: 3.03998e-06 [pre_auto_parallel]: 2.141e-05 [insert-virtual-dataset]: 2.80002e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.00369459, [53] [py_interpret_to_execute]: 1.45e-05 [rewriter_before_opt_a]: 3.906e-05 [opt_a]: 0.00189376, [2] [Cycle 1]: 0.00129474, [45] [expand_dump_flag]: 2.26e-06 [switch_simplify]: 2.343e-05 [loop_unroll]: 1.36e-05 [a_1]: 0.00029131 [with_stream_mark]: 1.339e-05 [recompute_prepare]: 7.33e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.17002e-06 [updatestate_loads_eliminate]: 3.25998e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 0.00012852 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.48e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.75001e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 5.82999e-06 [parallel]: 1.841e-05 [flash_sp]: 7.05e-06 [merge_comm]: 3.6e-06 [allreduce_fusion]: 3.41001e-06 [matmul_add_comm_reduction]: 9.34e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 6.79001e-06 [virtual_dataset]: 5.82001e-06 [get_grad_eliminate_]: 5.86e-06 [virtual_output]: 5.86e-06 [merge_forward]: 3.53999e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.121e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 9.01002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53999e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 2.12999e-06 
[after_resolve]: 1.021e-05 [a_after_grad]: 8.55001e-06 [renormalize]: 0.00033577 [add_forward_monad_depend]: 4.13999e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.358e-05 [cse]: 2.648e-05 [a_3]: 3.997e-05 [Cycle 2]: 0.00058968, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.66e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012428 [with_stream_mark]: 9.07001e-06 [recompute_prepare]: 5.62999e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.747e-05 [accelerated_algorithm]: 5.46e-06 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.22e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 4.94e-06 [parallel]: 3.78001e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 3.21999e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.97999e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.61e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.48001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 1.76e-06 [flash_sp_send_recv_attached]: 8.49977e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.69999e-06 [a_after_grad]: 8.12e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.23002e-06 [cse]: 1.172e-05 [a_3]: 3.184e-05 [py_interpret_to_execute_after_opt_a]: 7.28e-06 [slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.09e-05 [convert_after_rewriter]: 7.41001e-06 [order_py_execute_after_rewriter]: 5.39998e-06 [mutable_eliminate]: 0.00044996 [opt_b]: 0.0001808, [1] [Cycle 1]: 0.00017487, [7] [b_1]: 
0.00010761 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.43998e-06 [renormalize]: 3.89991e-07 [cse]: 1.61e-05 [optimize_parallel_all_gather_comm]: 1.573e-05 [overlap_param_gather]: 1.68002e-06 [cconv]: 2.185e-05 [loop_unroll]: 0.00041569 [opt_after_cconv]: 9.338e-05, [1] [Cycle 1]: 8.79e-05, [7] [c_1]: 2.784e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.11e-06 [cse]: 1.525e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.461e-05, [4] [d_1]: 3.967e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.76998e-06 [add_recomputation]: 4.185e-05 [cse_after_recomputation]: 2.036e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.066e-05 [environ_conv]: 4.45e-06 [swap_dp_allreduce_reducescatter]: 5.34998e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.72001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.70002e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 9.5999e-07 [full_micro_interleaved_order_control]: 2.22999e-06 [reorder_send_recv_between_fp_bp]: 2.77002e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.07998e-06 [interleave_parallel_branches]: 1.27e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.175e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.59998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.15002e-06 [overlap_grad_ring_attention]: 4.18001e-06 [overlap_grad_flash_sp]: 1.649e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.19e-06 [symbol_engine_optimizer]: 6.729e-05, [1] [Cycle 1]: 6.324e-05, [6] [build]: 2.36e-06 [elim_shapecalc]: 8.12e-06 [elim_not_effective]: 1.106e-05 [opt_reshape]: 5.87999e-06 [fold_const_symbol]: 9.04e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.529e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00045051 [validate]: 3.038e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.00590074 [execute]: 7.75998e-06 Sums bootstrap : 0.000506s : 3.55% type_inference : 0.004325s : 30.29% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000416s : 2.91% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate 
: 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000196s : 1.37% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000013s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000336s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.15% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.91% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000040s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000451s : 3.16% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005901s : 41.32% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.73% : 0.000023s : 4: substitution.arithmetic_simplify 1.34% : 0.000002s : 2: substitution.elim_not_effective 1.19% : 0.000001s : 2: substitution.fold_const_symbol 4.64% : 0.000006s : 4: substitution.graph_param_transform 64.81% : 0.000079s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.69% : 0.000004s : 4: substitution.remove_not_recompute_node 3.29% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004286 2 92.07% : 0.003946s : 1: type_inference.infer 7.93% : 0.000340s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.65% : 0.000004s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.30% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_depend_swap 1.82% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 1.10% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.31% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000001s : 9: predicate.reduce_eliminate 2.09% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 1.05% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.62% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.06% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.54% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.46% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026066 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.29% : 0.002944s : 1: add_attr 11.26% : 0.002935s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 2.08% : 0.000542s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000459s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.28% : 0.001897s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.76% : 0.000460s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.19% : 0.003698s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.71% : 0.000184s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.68% : 0.005911s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.65% : 0.004340s : 1: type_inference 0.21% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x5-kbk],max_mem:64.0M TotalTime = 0.121418, [24] [bootstrap]: 0.00054545 [type_inference]: 0.00596721 [event_method]: 5.241e-05 [auto_monad]: 5.816e-05 [graph_reusing]: 5.56e-06 [inline]: 1.73997e-06 [add_attr]: 0.00338992, [1] [add_attr_with_inline]: 0.00337918, [1] [Cycle 1]: 4.502e-05, [2] [tag_attr]: 1.508e-05 [meta_addattr_fg_expand]: 4.04997e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.773e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0040122, [53] [py_interpret_to_execute]: 1.937e-05 [rewriter_before_opt_a]: 5.863e-05 [opt_a]: 0.00216714, [2] [Cycle 1]: 0.00157235, [45] [expand_dump_flag]: 2.88998e-06 [switch_simplify]: 3.149e-05 [loop_unroll]: 2.095e-05 [a_1]: 0.00050877 [with_stream_mark]: 1.675e-05 [recompute_prepare]: 8.27e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.65e-06 [updatestate_loads_eliminate]: 3.01001e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.662e-05 [accelerated_algorithm]: 6.18002e-06 [shard]: 2.56998e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 8.17e-06 [auto_parallel]: 5.92999e-06 [parallel]: 2.26e-05 [flash_sp]: 7.15998e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.39001e-06 [matmul_add_comm_reduction]: 9.19e-06 [allreduce_slice_to_reducescatter]: 5.79981e-07 [virtual_shard_identity]: 7.28999e-06 [virtual_dataset]: 5.87001e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.88999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 8.99998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 
[merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.85002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.09999e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 2.37999e-06 [after_resolve]: 1.067e-05 [a_after_grad]: 8.99003e-06 [renormalize]: 0.00041224 [add_forward_monad_depend]: 4.30999e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.362e-05 [cse]: 2.669e-05 [a_3]: 4.107e-05 [Cycle 2]: 0.00058548, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.25001e-06 [a_1]: 0.00012616 [with_stream_mark]: 9.23002e-06 [recompute_prepare]: 5.44998e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.20002e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 9.10019e-07 [a_2]: 6.751e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.56998e-06 [merge_send_recv]: 4.32998e-06 [auto_parallel]: 4.93001e-06 [parallel]: 3.87002e-06 [flash_sp]: 3.08998e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.60002e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 5.89e-06 [virtual_dataset]: 5.26998e-06 [get_grad_eliminate_]: 4.89998e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 5.90002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.05001e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 2.83e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.95001e-06 [a_after_grad]: 7.87e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.51999e-06 [cse]: 1.534e-05 [a_3]: 3.154e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 
[slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.436e-05 [convert_after_rewriter]: 6.79001e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00045114 [opt_b]: 0.00017788, [1] [Cycle 1]: 0.00017206, [7] [b_1]: 0.00010615 [b_2]: 6.73998e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 4.00003e-07 [cse]: 1.562e-05 [optimize_parallel_all_gather_comm]: 1.862e-05 [overlap_param_gather]: 1.94e-06 [cconv]: 2.193e-05 [loop_unroll]: 0.00041703 [opt_after_cconv]: 9.354e-05, [1] [Cycle 1]: 8.796e-05, [7] [c_1]: 2.711e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.519e-05 [renormalize]: 4.99975e-07 [remove_dup_value]: 1.294e-05 [tuple_transform]: 6.944e-05, [1] [Cycle 1]: 6.524e-05, [4] [d_1]: 3.924e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.816e-05 [cse_after_recomputation]: 2.095e-05, [1] [Cycle 1]: 1.643e-05, [1] [cse]: 1.12e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 5.22999e-06 [bias_add_comm_swap]: 2.15002e-06 [label_micro_interleaved_index]: 4.26001e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.30999e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.84999e-06 [assign_add_opt]: 1.13001e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.65002e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 9.00007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.44e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.76e-06 [control_data_broadcast_order]: 1.19e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.49001e-06 [overlap_recompute_and_grad_model_parallel]: 4.76002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 1.71998e-06 [overlap_recompute_comm]: 2.20002e-06 [overlap_grad_ring_attention]: 3.74002e-06 [overlap_grad_flash_sp]: 1.652e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.78e-05, [1] [Cycle 1]: 6.362e-05, [6] [build]: 2.30002e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 9.21002e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.84998e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.551e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00047866 [validate]: 3.103e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.106594 [execute]: 9.49999e-06 Sums bootstrap : 0.000545s : 0.47% type_inference : 0.005967s : 5.10% event_method : 0.000052s : 0.04% auto_monad : 0.000058s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.02% optimize.rewriter_before_opt_a : 0.000059s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.03% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000635s : 0.54% 
optimize.opt_a.with_stream_mark : 0.000026s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000412s : 0.35% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000042s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.39% optimize.opt_b.b_1 : 0.000106s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000417s : 0.36% optimize.opt_after_cconv.c_1 : 0.000027s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000479s : 0.41% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.106594s : 91.06% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 15.03% : 0.000024s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.32% : 0.000005s : 4: substitution.graph_param_transform 66.56% : 0.000108s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.50% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.61% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005922 2 90.71% : 0.005372s : 1: type_inference.infer 9.29% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000093 5 87.01% : 0.000081s : 3: replace.inline 12.99% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.65% : 0.000106s : 3: match.inline 8.35% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.16% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.33% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.87% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.62% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.93% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.29% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.91% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000002s : 8: predicate.shard_identity_eliminate 0.68% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.07% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000343 8 46.34% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.66% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.130379 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.60% : 0.003394s : 1: add_attr 2.59% : 0.003383s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000063s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000583s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.04% : 0.000058s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.77% : 0.001000s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000088s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.66% : 0.002170s : 1: opt_a 0.07% : 0.000097s : 1: opt_after_cconv 0.37% : 0.000488s : 1: opt_after_jit_grad 0.14% : 0.000181s : 1: opt_b 3.08% : 0.004016s : 1: optimize 0.02% : 0.000022s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000005s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000017s : 1: remove_dup_value 0.16% : 0.000211s : 1: renormalize.infer 0.15% : 0.000195s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000038s : 1: rewriter_after_opt_a 0.05% : 0.000063s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 81.77% : 0.106615s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.59% : 0.005980s : 1: type_inference 0.04% : 0.000055s : 1: validate TotalTime = 0.112833, [24] [bootstrap]: 0.00046521 [type_inference]: 0.00439117 [event_method]: 1.068e-05 [auto_monad]: 5.019e-05 [graph_reusing]: 4.93001e-06 [inline]: 2.09e-06 [add_attr]: 0.00294445, [1] [add_attr_with_inline]: 0.00293649, [1] [Cycle 1]: 4.54e-05, [2] [tag_attr]: 1.15e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.129e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 1.74998e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00367391, [53] [py_interpret_to_execute]: 1.456e-05 [rewriter_before_opt_a]: 3.923e-05 [opt_a]: 0.0018781, [2] [Cycle 1]: 0.00127454, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.387e-05 [loop_unroll]: 1.367e-05 [a_1]: 0.00029066 [with_stream_mark]: 1.411e-05 [recompute_prepare]: 7.16001e-06 [updatestate_depend_eliminate]: 3.48999e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.02002e-06 [parameter_eliminate]: 1.53002e-06 [a_2]: 7.557e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 5.69999e-06 [parallel]: 1.711e-05 [flash_sp]: 7.06001e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.07002e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.53002e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.10999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.033e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 8.95999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.13998e-06 [receive_attached]: 2.29999e-06 
[after_resolve]: 1.016e-05 [a_after_grad]: 8.87e-06 [renormalize]: 0.00037386 [add_forward_monad_depend]: 4.23999e-06 [auto_monad_grad]: 1.66002e-06 [auto_monad_eliminator]: 1.303e-05 [cse]: 2.633e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.00059431, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.51998e-06 [a_1]: 0.00012554 [with_stream_mark]: 1.122e-05 [recompute_prepare]: 5.82001e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.43e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.854e-05 [accelerated_algorithm]: 5.66e-06 [shard]: 9.79984e-07 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 4.93001e-06 [parallel]: 4.32e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 2.88e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 4.95999e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82999e-06 [merge_recompute_call_nodes]: 6.39993e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.09988e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.64999e-06 [a_after_grad]: 8.17e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.06e-06 [cse]: 1.283e-05 [a_3]: 3.239e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 3.049e-05 [convert_after_rewriter]: 6.81001e-06 [order_py_execute_after_rewriter]: 5.35999e-06 [mutable_eliminate]: 0.00044569 [opt_b]: 0.00017974, [1] [Cycle 1]: 0.00017385, [7] [b_1]: 0.00010681 [b_2]: 
7.36001e-06 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 4.69998e-07 [cse]: 1.614e-05 [optimize_parallel_all_gather_comm]: 1.528e-05 [overlap_param_gather]: 2.19999e-06 [cconv]: 2.222e-05 [loop_unroll]: 0.00041099 [opt_after_cconv]: 9.391e-05, [1] [Cycle 1]: 8.816e-05, [7] [c_1]: 2.73e-05 [parameter_eliminate]: 1.97999e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.586e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.178e-05 [tuple_transform]: 6.935e-05, [1] [Cycle 1]: 6.503e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.39e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.24999e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.299e-05 [cse_after_recomputation]: 2.069e-05, [1] [Cycle 1]: 1.629e-05, [1] [cse]: 1.135e-05 [environ_conv]: 4.76002e-06 [swap_dp_allreduce_reducescatter]: 4.90999e-06 [bias_add_comm_swap]: 3.08e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.68003e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.40025e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09003e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.74998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.52999e-06 [overlap_recompute_comm]: 2.32001e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.66002e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.76e-05, [1] [Cycle 1]: 6.351e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.44998e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 1.79978e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.79e-06 [auto_monad_reorder]: 1.556e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00044153 [validate]: 3.031e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.100552 [execute]: 1.02e-05 Sums bootstrap : 0.000465s : 0.43% type_inference : 0.004391s : 4.03% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000416s : 0.38% optimize.opt_a.with_stream_mark : 0.000025s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000374s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.41% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000411s : 0.38% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000442s : 0.41% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.100552s : 92.31% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000119 26 17.88% : 0.000021s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.46% : 0.000005s : 4: substitution.graph_param_transform 65.97% : 0.000078s : 2: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.81% : 0.000005s : 4: substitution.remove_not_recompute_node 3.06% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004351 2 91.80% : 0.003994s : 1: type_inference.infer 8.20% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.58% : 0.000004s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.51% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.90% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.95% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 1.03% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 2.00% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.33% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000259 6 44.11% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.89% : 0.000145s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.120746 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.44% : 0.002949s : 1: add_attr 2.43% : 0.002940s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000501s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000419s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000455s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.63% : 0.000767s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.56% : 0.001881s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.37% : 0.000450s : 1: opt_after_jit_grad 0.15% : 0.000183s : 1: opt_b 3.05% : 0.003678s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.16% : 0.000187s : 1: renormalize.infer 0.15% : 0.000180s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 83.29% : 0.100573s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.65% : 0.004404s : 1: type_inference 0.04% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x5-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x6-pynative],max_mem:64.0M TotalTime = 0.0210751, [24] [bootstrap]: 0.00057657 [type_inference]: 0.00608716 [event_method]: 1.465e-05 [auto_monad]: 5.788e-05 [graph_reusing]: 5.07999e-06 [inline]: 1.59e-06 [add_attr]: 0.00336636, [1] [add_attr_with_inline]: 0.00335537, [1] [Cycle 1]: 4.422e-05, [2] [tag_attr]: 1.495e-05 [meta_addattr_fg_expand]: 4.07e-06 [parallel-infer-symbol]: 2.75002e-06 [pre_auto_parallel]: 2.697e-05 [insert-virtual-dataset]: 2.57001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.67999e-06 [optimize]: 0.0040243, [53] [py_interpret_to_execute]: 2.025e-05 [rewriter_before_opt_a]: 5.825e-05 [opt_a]: 0.00218575, [2] [Cycle 1]: 0.0015225, [45] [expand_dump_flag]: 2.48e-06 [switch_simplify]: 3.141e-05 [loop_unroll]: 2.149e-05 [a_1]: 0.00045439 [with_stream_mark]: 1.293e-05 [recompute_prepare]: 7.68999e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 2.82002e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.638e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 1.89999e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.81001e-06 [auto_parallel]: 5.96e-06 [parallel]: 2.26e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.65998e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 8.27e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.21999e-06 [virtual_dataset]: 5.94e-06 
[get_grad_eliminate_]: 5.58002e-06 [virtual_output]: 5.89e-06 [merge_forward]: 3.60998e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.073e-05 [merge_recompute_call_nodes]: 1.49e-06 [before_grad]: 9.79e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.21998e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.11998e-06 [after_resolve]: 1.082e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00042348 [add_forward_monad_depend]: 4.48999e-06 [auto_monad_grad]: 1.92999e-06 [auto_monad_eliminator]: 1.275e-05 [cse]: 2.758e-05 [a_3]: 4.107e-05 [Cycle 2]: 0.00065375, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.73998e-06 [loop_unroll]: 5.30999e-06 [a_1]: 0.00012596 [with_stream_mark]: 9.32999e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.16e-06 [updatestate_loads_eliminate]: 2.22999e-06 [parameter_eliminate]: 7.29982e-07 [a_2]: 6.789e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.19e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.12e-06 [auto_parallel]: 4.96002e-06 [parallel]: 3.94002e-06 [flash_sp]: 2.87002e-06 [merge_comm]: 3.02002e-06 [allreduce_fusion]: 2.83e-06 [matmul_add_comm_reduction]: 6.19001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.91001e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 5.91998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.95e-06 [set_forward_comm_id_for_comm_node_pass]: 3.07002e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 9.77999e-06 [a_after_grad]: 8.93002e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.96999e-06 [cse]: 1.3e-05 [a_3]: 8.557e-05 [py_interpret_to_execute_after_opt_a]: 8.12e-06 [slice_cell_reuse_recomputed_activation]: 2.03997e-06 [rewriter_after_opt_a]: 3.165e-05 [convert_after_rewriter]: 6.96999e-06 [order_py_execute_after_rewriter]: 5.14e-06 [mutable_eliminate]: 0.00045126 [opt_b]: 0.00018078, [1] [Cycle 1]: 0.0001746, [7] [b_1]: 0.00010737 [b_2]: 7.06999e-06 [updatestate_depend_eliminate]: 5.27001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 5.00004e-07 [cse]: 1.635e-05 [optimize_parallel_all_gather_comm]: 1.58e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 2.211e-05 [loop_unroll]: 0.00041256 [opt_after_cconv]: 9.501e-05, [1] [Cycle 1]: 8.933e-05, [7] [c_1]: 2.793e-05 [parameter_eliminate]: 2.02999e-06 [updatestate_depend_eliminate]: 5.02999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.43e-06 [cse]: 1.661e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.191e-05 [tuple_transform]: 6.881e-05, [1] [Cycle 1]: 6.456e-05, [4] [d_1]: 3.897e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.943e-05 [cse_after_recomputation]: 2.017e-05, [1] [Cycle 1]: 1.591e-05, [1] [cse]: 1.071e-05 [environ_conv]: 4.36002e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 3.98001e-06 [label_fine_grained_interleaved_index]: 2.73e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.16998e-06 [micro_interleaved_order_control]: 2.37999e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 9.50007e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.20002e-06 [reorder_send_recv_between_fp_bp]: 2.69001e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 
9.79984e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.09e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81998e-06 [control_data_broadcast_order]: 1.181e-05 [grouped_pairwise_exchange_alltoall]: 1.46998e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.41002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.35999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.47001e-06 [overlap_grad_ring_attention]: 4.07003e-06 [overlap_grad_flash_sp]: 1.656e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.05002e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 6.886e-05, [1] [Cycle 1]: 6.472e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.44998e-06 [elim_not_effective]: 1.141e-05 [opt_reshape]: 6.31998e-06 [fold_const_symbol]: 9.00001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.77001e-06 [pipeline_parallel_scheduler]: 1.55001e-06 [auto_monad_reorder]: 1.584e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.61999e-06 [opt_after_jit_grad]: 0.00045151 [validate]: 3.109e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.00618691 [execute]: 6.79999e-06 Sums bootstrap : 0.000577s : 3.44% type_inference : 0.006087s : 36.37% event_method : 0.000015s : 0.09% auto_monad : 0.000058s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000580s : 3.47% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000018s : 0.11% optimize.opt_a.renormalize : 0.000424s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000127s : 0.76% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.70% optimize.opt_b.b_1 : 0.000107s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000413s : 2.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000452s : 2.70% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006187s : 36.97% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000165 30 14.70% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.49% : 0.000110s : 3: substitution.inline 1.91% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.81% : 0.000005s : 4: substitution.remove_not_recompute_node 2.46% : 0.000004s : 4: substitution.replace_old_param 6.51% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006044 2 90.57% : 0.005474s : 1: type_inference.infer 9.43% : 0.000570s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.57% : 0.000027s : 3: replace.inline 29.43% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.75% : 0.000108s : 3: match.inline 8.25% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.89% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.56% : 0.000004s : 32: predicate.load_eliminater 1.06% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.75% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000356 8 47.97% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.03% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030045 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.22% : 0.003371s : 1: add_attr 11.18% : 0.003359s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.05% : 0.000615s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.001003s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.28% : 0.002189s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.53% : 0.000461s : 1: opt_after_jit_grad 0.61% : 0.000184s : 1: opt_b 13.41% : 0.004028s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000012s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.71% : 0.000213s : 1: renormalize.infer 0.68% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000036s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 20.62% : 0.006197s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.30% : 0.006101s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0180973, [24] [bootstrap]: 0.00046349 [type_inference]: 0.00430964 [event_method]: 1.036e-05 [auto_monad]: 5.128e-05 [graph_reusing]: 5.64e-06 [inline]: 1.76e-06 [add_attr]: 0.00296831, [1] [add_attr_with_inline]: 0.0029603, [1] [Cycle 1]: 4.709e-05, [2] [tag_attr]: 1.195e-05 [meta_addattr_fg_expand]: 3.38e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.22e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00364963, [53] [py_interpret_to_execute]: 1.475e-05 [rewriter_before_opt_a]: 3.889e-05 [opt_a]: 0.00186199, [2] [Cycle 1]: 0.00125911, [45] [expand_dump_flag]: 2.68998e-06 [switch_simplify]: 2.335e-05 [loop_unroll]: 1.405e-05 [a_1]: 0.00029522 [with_stream_mark]: 1.325e-05 [recompute_prepare]: 7.13e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.01999e-06 [updatestate_loads_eliminate]: 3.04999e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.65e-05 [accelerated_algorithm]: 6.58e-06 [shard]: 2.34999e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 6.12999e-06 [merge_send_recv]: 8e-06 [auto_parallel]: 5.84e-06 [parallel]: 1.718e-05 [flash_sp]: 7.36001e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.21999e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 6.75002e-06 [virtual_dataset]: 5.68002e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 3.59002e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 9.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 1.57001e-06 [before_grad]: 9.11002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.37999e-06 [flash_sp_send_recv_attached]: 2.26998e-06 [receive_attached]: 2.29999e-06 
[after_resolve]: 1.072e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00033365 [add_forward_monad_depend]: 4.43999e-06 [auto_monad_grad]: 1.76998e-06 [auto_monad_eliminator]: 1.353e-05 [cse]: 2.597e-05 [a_3]: 3.981e-05 [Cycle 2]: 0.00059374, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.96001e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.0001265 [with_stream_mark]: 9.53997e-06 [recompute_prepare]: 5.51998e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.34001e-06 [parameter_eliminate]: 7.10017e-07 [a_2]: 6.87e-05 [accelerated_algorithm]: 5.41002e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.01002e-06 [shard_inline]: 5.24998e-06 [merge_send_recv]: 4.12998e-06 [auto_parallel]: 5.13002e-06 [parallel]: 4.07e-06 [flash_sp]: 2.98e-06 [merge_comm]: 3.20002e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.12999e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.66999e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 1e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 3.07002e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.20001e-06 [a_after_grad]: 8.64998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.231e-05 [a_3]: 3.204e-05 [py_interpret_to_execute_after_opt_a]: 6.98e-06 [slice_cell_reuse_recomputed_activation]: 1.75001e-06 [rewriter_after_opt_a]: 2.996e-05 [convert_after_rewriter]: 7.23e-06 [order_py_execute_after_rewriter]: 5.49e-06 [mutable_eliminate]: 0.00044404 [opt_b]: 0.00018233, [1] [Cycle 1]: 0.00017632, [7] [b_1]: 
0.00010842 [b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.35001e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.23998e-06 [renormalize]: 4.19997e-07 [cse]: 1.602e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 1.79e-06 [cconv]: 2.096e-05 [loop_unroll]: 0.00040711 [opt_after_cconv]: 9.451e-05, [1] [Cycle 1]: 8.88e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.08998e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.616e-05 [renormalize]: 4.09986e-07 [remove_dup_value]: 1.242e-05 [tuple_transform]: 6.835e-05, [1] [Cycle 1]: 6.407e-05, [4] [d_1]: 3.901e-05 [none_parameter_eliminate]: 1.33002e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.16e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.411e-05 [cse_after_recomputation]: 1.974e-05, [1] [Cycle 1]: 1.563e-05, [1] [cse]: 1.054e-05 [environ_conv]: 4.27998e-06 [swap_dp_allreduce_reducescatter]: 5.59998e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.33999e-06 [label_fine_grained_interleaved_index]: 2.96999e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.11998e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 8.69972e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.33002e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.18002e-06 [overlap_grad_ring_attention]: 4.40999e-06 [overlap_grad_flash_sp]: 1.655e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.56998e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.775e-05, [1] [Cycle 1]: 6.368e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.19998e-06 [elim_not_effective]: 1.143e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.74998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.71e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.488e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044471 [validate]: 3.062e-05 [backend_pass]: 8.39995e-07 [task_emit]: 0.0059057 [execute]: 6.94001e-06 Sums bootstrap : 0.000463s : 3.27% type_inference : 0.004310s : 30.43% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000422s : 2.98% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000145s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000334s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.51% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000444s : 3.14% optimize.opt_b.b_1 : 0.000108s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000407s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 3.14% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005906s : 41.70% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.16% : 0.000022s : 4: substitution.arithmetic_simplify 1.70% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.22% : 0.000005s : 4: substitution.graph_param_transform 65.40% : 0.000079s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.81% : 0.000005s : 4: substitution.remove_not_recompute_node 3.41% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004270 2 92.07% : 0.003931s : 1: type_inference.infer 7.93% : 0.000339s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.96% : 0.000001s : 9: predicate.accumulaten_eliminater 1.10% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.63% : 0.000001s : 8: predicate.addn_check_dump 0.70% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.45% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 1.06% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000002s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.94% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.34% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.06% : 0.000001s : 8: predicate.less_batch_normalization 1.55% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.86% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.34% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.40% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.45% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 42.81% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.19% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025982 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.44% : 0.002973s : 1: add_attr 11.41% : 0.002964s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000500s : 1: bootstrap 0.09% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000415s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.98% : 0.000774s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.18% : 0.001865s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000454s : 1: opt_after_jit_grad 0.71% : 0.000186s : 1: opt_b 14.06% : 0.003653s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000183s : 1: renormalize.infer 0.56% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.77% : 0.005916s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.64% : 0.004323s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x6-kbk],max_mem:64.0M TotalTime = 0.0808405, [24] [bootstrap]: 0.00058164 [type_inference]: 0.00607386 [event_method]: 1.358e-05 [auto_monad]: 6.037e-05 [graph_reusing]: 6.17999e-06 [inline]: 1.81e-06 [add_attr]: 0.00342963, [1] [add_attr_with_inline]: 0.00341915, [1] [Cycle 1]: 4.493e-05, [2] [tag_attr]: 1.442e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.751e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00397641, [53] [py_interpret_to_execute]: 1.928e-05 [rewriter_before_opt_a]: 5.841e-05 [opt_a]: 0.00212498, [2] [Cycle 1]: 0.00150485, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.305e-05 [loop_unroll]: 2.166e-05 [a_1]: 0.00045208 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 7.68999e-06 [updatestate_depend_eliminate]: 3.63e-06 [updatestate_assign_eliminate]: 3.16999e-06 [updatestate_loads_eliminate]: 2.73998e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.631e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.93002e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.38e-06 [auto_parallel]: 5.87999e-06 [parallel]: 2.279e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.61999e-06 [allreduce_fusion]: 3.45998e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.41999e-06 [virtual_dataset]: 5.97001e-06 [get_grad_eliminate_]: 5.40001e-06 [virtual_output]: 6.16e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 8.89998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.084e-05 
[merge_recompute_call_nodes]: 1.31002e-06 [before_grad]: 9.71998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.49001e-06 [meta_fg_expand]: 2.26998e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.38998e-06 [after_resolve]: 1.054e-05 [a_after_grad]: 8.47e-06 [renormalize]: 0.00041043 [add_forward_monad_depend]: 4.65999e-06 [auto_monad_grad]: 2.02001e-06 [auto_monad_eliminator]: 1.336e-05 [cse]: 2.523e-05 [a_3]: 4.046e-05 [Cycle 2]: 0.00061069, [45] [expand_dump_flag]: 8.49977e-07 [switch_simplify]: 6.75002e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.00012731 [with_stream_mark]: 9.51003e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.59001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 8.192e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 1.04998e-06 [meta_shard_fg_expand]: 1.15999e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.33001e-06 [flash_sp]: 3.14001e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.71e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.31998e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 6.21998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.83998e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94001e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 1.05001e-06 [receive_attached]: 9.50007e-07 [after_resolve]: 8.77999e-06 [a_after_grad]: 8.15e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.24e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 6.49999e-06 [cse]: 1.356e-05 [a_3]: 3.333e-05 [py_interpret_to_execute_after_opt_a]: 7.44002e-06 
[slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.379e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00045271 [opt_b]: 0.00018234, [1] [Cycle 1]: 0.00017629, [7] [b_1]: 0.00010936 [b_2]: 7.21999e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 3.80009e-07 [cse]: 1.597e-05 [optimize_parallel_all_gather_comm]: 1.519e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.154e-05 [loop_unroll]: 0.00041781 [opt_after_cconv]: 9.482e-05, [1] [Cycle 1]: 8.924e-05, [7] [c_1]: 2.796e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.69001e-06 [updatestate_loads_eliminate]: 2.34999e-06 [cse]: 1.589e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.203e-05 [tuple_transform]: 6.949e-05, [1] [Cycle 1]: 6.519e-05, [4] [d_1]: 3.947e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 2.09e-06 [add_recomputation]: 5.098e-05 [cse_after_recomputation]: 2.057e-05, [1] [Cycle 1]: 1.609e-05, [1] [cse]: 1.11e-05 [environ_conv]: 4.45999e-06 [swap_dp_allreduce_reducescatter]: 5.40999e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.73998e-06 [merge_cast_opt]: 1.45001e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.02998e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 1.29998e-06 [interleave_split_concat_branches]: 1.41002e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.83002e-06 [control_data_broadcast_order]: 1.147e-05 [grouped_pairwise_exchange_alltoall]: 1.55001e-06 [offloading_packed_experts]: 3.69002e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.24001e-06 [overlap_grad_ring_attention]: 3.96001e-06 [overlap_grad_flash_sp]: 1.635e-05 [begin_end_overlap_inline]: 8.39995e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.856e-05, [1] [Cycle 1]: 6.442e-05, [6] [build]: 2.29999e-06 [elim_shapecalc]: 8.23001e-06 [elim_not_effective]: 1.187e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.67999e-06 [pipeline_parallel_scheduler]: 1.67001e-06 [auto_monad_reorder]: 1.567e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.43e-06 [opt_after_jit_grad]: 0.00044901 [validate]: 3.367e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0659399 [execute]: 8.66002e-06 Sums bootstrap : 0.000582s : 0.76% type_inference : 0.006074s : 7.95% event_method : 0.000014s : 0.02% auto_monad : 0.000060s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000040s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000579s : 0.76% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000158s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000411s : 0.54% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000034s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.59% optimize.opt_b.b_1 : 0.000109s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000418s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000051s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.59% validate : 0.000034s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.065940s : 86.26% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000164 30 14.58% : 0.000024s : 5: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 0.89% : 0.000001s : 2: substitution.fold_const_symbol 3.18% : 0.000005s : 4: substitution.graph_param_transform 66.54% : 0.000109s : 3: substitution.inline 1.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.96% : 0.000005s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.59% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006030 2 91.12% : 0.005494s : 1: type_inference.infer 8.88% : 0.000535s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.51% : 0.000026s : 3: replace.inline 30.49% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.68% : 0.000107s : 3: match.inline 8.32% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000161 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.55% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.07% : 0.000003s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.08% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.23% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.23% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.80% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 0.99% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.59% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.67% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.91% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.75% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.57% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.42% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.28% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.91% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000332 8 47.11% : 0.000157s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.89% : 0.000176s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.089760 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.83% : 0.003434s : 1: add_attr 3.81% : 0.003423s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000055s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.69% : 0.000619s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000010s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.51% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000014s : 1: opt.transform.mutable_eliminate 1.06% : 0.000948s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.37% : 0.002128s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.51% : 0.000459s : 1: opt_after_jit_grad 0.21% : 0.000186s : 1: opt_b 4.43% : 0.003980s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000213s : 1: renormalize.infer 0.21% : 0.000191s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000038s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 73.48% : 0.065957s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.78% : 0.006088s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.0703432, [24] [bootstrap]: 0.00048086 [type_inference]: 0.00437171 [event_method]: 1.077e-05 [auto_monad]: 5.056e-05 [graph_reusing]: 4.72e-06 [inline]: 1.67001e-06 [add_attr]: 0.00297369, [1] [add_attr_with_inline]: 0.00296588, [1] [Cycle 1]: 8.545e-05, [2] [tag_attr]: 1.177e-05 [meta_addattr_fg_expand]: 3.07002e-06 [parallel-infer-symbol]: 2.62001e-06 [pre_auto_parallel]: 2.142e-05 [insert-virtual-dataset]: 2.89999e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.02999e-06 [pipeline_split]: 1.72999e-06 [optimize]: 0.00365081, [53] [py_interpret_to_execute]: 1.452e-05 [rewriter_before_opt_a]: 3.808e-05 [opt_a]: 0.00184781, [2] [Cycle 1]: 0.00124695, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.428e-05 [loop_unroll]: 1.362e-05 [a_1]: 0.00029194 [with_stream_mark]: 1.395e-05 [recompute_prepare]: 7.52002e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 1.58002e-06 [a_2]: 7.626e-05 [accelerated_algorithm]: 6.09999e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 5.95002e-06 [parallel]: 1.823e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.56999e-06 [allreduce_fusion]: 3.58999e-06 [matmul_add_comm_reduction]: 8.40001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.93998e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 5.96e-06 [virtual_output]: 5.87001e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.34003e-06 [offload_activation]: 8.82999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.129e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.41998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.05002e-06 [flash_sp_send_recv_attached]: 2.13998e-06 [receive_attached]: 
2.19999e-06 [after_resolve]: 1.042e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00034112 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.24e-05 [cse]: 2.654e-05 [a_3]: 4.063e-05 [Cycle 2]: 0.00059173, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.04001e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012483 [with_stream_mark]: 1.105e-05 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.35002e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.714e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.19998e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.56002e-06 [merge_send_recv]: 4.17998e-06 [auto_parallel]: 5.21002e-06 [parallel]: 4.26001e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 2.95998e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.71e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.62999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.58002e-06 [merge_recompute_call_nodes]: 6.79982e-07 [before_grad]: 8.04002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.97999e-06 [a_after_grad]: 7.95998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 5.97001e-06 [cse]: 1.275e-05 [a_3]: 3.243e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.075e-05 [convert_after_rewriter]: 7.48999e-06 [order_py_execute_after_rewriter]: 4.89003e-06 [mutable_eliminate]: 0.00044935 [opt_b]: 0.00018181, [1] [Cycle 1]: 
0.00017579, [7] [b_1]: 0.00010776 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.41e-06 [renormalize]: 2.80008e-07 [cse]: 1.634e-05 [optimize_parallel_all_gather_comm]: 1.54e-05 [overlap_param_gather]: 1.64998e-06 [cconv]: 2.22e-05 [loop_unroll]: 0.00041701 [opt_after_cconv]: 9.417e-05, [1] [Cycle 1]: 8.84e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.68e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.621e-05 [renormalize]: 2.99973e-07 [remove_dup_value]: 1.239e-05 [tuple_transform]: 6.836e-05, [1] [Cycle 1]: 6.435e-05, [4] [d_1]: 3.852e-05 [none_parameter_eliminate]: 1.50999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.33998e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.354e-05 [cse_after_recomputation]: 1.963e-05, [1] [Cycle 1]: 1.556e-05, [1] [cse]: 1.062e-05 [environ_conv]: 5.44e-06 [swap_dp_allreduce_reducescatter]: 5.42001e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.4e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.40002e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.85001e-06 [control_data_broadcast_order]: 1.149e-05 [grouped_pairwise_exchange_alltoall]: 1.69e-06 [offloading_packed_experts]: 3.88999e-06 [overlap_recompute_and_grad_model_parallel]: 4.23999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.29e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.94e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.723e-05 [begin_end_overlap_inline]: 4.70027e-07 [split_matmul_comm_elemetwise]: 2.25002e-06 [split_layernorm_comm]: 1.74998e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.78e-05, [1] [Cycle 1]: 6.379e-05, [6] [build]: 2.07999e-06 [elim_shapecalc]: 8.1e-06 [elim_not_effective]: 1.177e-05 [opt_reshape]: 6.21e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 1.535e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.000472 [validate]: 3.147e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0580364 [execute]: 8.58001e-06 Sums bootstrap : 0.000481s : 0.72% type_inference : 0.004372s : 6.58% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000417s : 0.63% optimize.opt_a.with_stream_mark : 0.000025s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000341s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.68% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000417s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000472s : 0.71% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058036s : 87.38% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.40% : 0.000022s : 4: substitution.arithmetic_simplify 1.71% : 0.000002s : 2: substitution.elim_not_effective 1.07% : 0.000001s : 2: substitution.fold_const_symbol 4.43% : 0.000005s : 4: substitution.graph_param_transform 65.00% : 0.000078s : 2: substitution.inline 2.40% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.82% : 0.000005s : 4: substitution.remove_not_recompute_node 3.15% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004330 2 91.76% : 0.003973s : 1: type_inference.infer 8.24% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.98% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.28% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.10% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 1.13% : 0.000002s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.84% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.73% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.36% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.82% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.07% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.63% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000249 6 42.99% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.01% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078236 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.81% : 0.002978s : 1: add_attr 3.80% : 0.002969s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000516s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000009s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000770s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.37% : 0.001851s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.62% : 0.000481s : 1: opt_after_jit_grad 0.24% : 0.000185s : 1: opt_b 4.67% : 0.003654s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000186s : 1: renormalize.infer 0.19% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.20% : 0.058053s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.60% : 0.004385s : 1: type_inference 0.07% : 0.000052s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x6-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x7-pynative],max_mem:64.0M TotalTime = 0.0212042, [24] [bootstrap]: 0.00056877 [type_inference]: 0.00612108 [event_method]: 1.425e-05 [auto_monad]: 5.803e-05 [graph_reusing]: 5.32001e-06 [inline]: 1.80001e-06 [add_attr]: 0.00334719, [1] [add_attr_with_inline]: 0.00333711, [1] [Cycle 1]: 4.523e-05, [2] [tag_attr]: 1.578e-05 [meta_addattr_fg_expand]: 4.42998e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.787e-05 [insert-virtual-dataset]: 2.41998e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.25002e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.00398414, [53] [py_interpret_to_execute]: 2.109e-05 [rewriter_before_opt_a]: 5.82e-05 [opt_a]: 0.00214484, [2] [Cycle 1]: 0.00153832, [45] [expand_dump_flag]: 2.91e-06 [switch_simplify]: 3.06e-05 [loop_unroll]: 2.109e-05 [a_1]: 0.00047872 [with_stream_mark]: 1.329e-05 [recompute_prepare]: 7.96001e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.10002e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.70001e-06 [a_2]: 7.661e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 1.77999e-06 [meta_shard_fg_expand]: 1.59e-06 [shard_inline]: 5.95002e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 6.63e-06 [parallel]: 2.303e-05 [flash_sp]: 7.65e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 8.13999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.35e-06 [virtual_dataset]: 6.18002e-06 
[get_grad_eliminate_]: 5.54998e-06 [virtual_output]: 5.97999e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.88998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.05001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.51e-06 [receive_attached]: 2.12999e-06 [after_resolve]: 9.78002e-06 [a_after_grad]: 8.55999e-06 [renormalize]: 0.00041508 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.388e-05 [cse]: 2.684e-05 [a_3]: 4.096e-05 [Cycle 2]: 0.00059724, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.69999e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.0001266 [with_stream_mark]: 9.70002e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.723e-05 [accelerated_algorithm]: 5.29e-06 [shard]: 1.14998e-06 [meta_shard_fg_expand]: 1.16002e-06 [shard_inline]: 5.38002e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.13999e-06 [flash_sp]: 5.20999e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.94999e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.23002e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 5.06002e-06 [merge_forward]: 2.67001e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.04999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.012e-05 [merge_recompute_call_nodes]: 6.30011e-07 [before_grad]: 8.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 8.03999e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.173e-05 [a_3]: 3.182e-05 [py_interpret_to_execute_after_opt_a]: 7.38999e-06 [slice_cell_reuse_recomputed_activation]: 2.16e-06 [rewriter_after_opt_a]: 3.036e-05 [convert_after_rewriter]: 6.93998e-06 [order_py_execute_after_rewriter]: 5.66e-06 [mutable_eliminate]: 0.00045156 [opt_b]: 0.00018323, [1] [Cycle 1]: 0.00017695, [7] [b_1]: 0.0001089 [b_2]: 7.45e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 3.09985e-07 [cse]: 1.599e-05 [optimize_parallel_all_gather_comm]: 1.579e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.175e-05 [loop_unroll]: 0.00041613 [opt_after_cconv]: 9.521e-05, [1] [Cycle 1]: 8.953e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.12001e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.637e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.766e-05, [1] [Cycle 1]: 6.36e-05, [4] [d_1]: 3.849e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.04999e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 4.601e-05 [cse_after_recomputation]: 1.986e-05, [1] [Cycle 1]: 1.538e-05, [1] [cse]: 1.032e-05 [environ_conv]: 4.50001e-06 [swap_dp_allreduce_reducescatter]: 5.04998e-06 [bias_add_comm_swap]: 2.54999e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.24999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.02999e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 
8.60018e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.64998e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.57e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.82001e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.736e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.37001e-06 [split_layernorm_comm]: 1.66e-06 [handle_group_info]: 1.17999e-06 [symbol_engine_optimizer]: 6.745e-05, [1] [Cycle 1]: 6.34e-05, [6] [build]: 2.67001e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.11e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.82999e-06 [pipeline_parallel_scheduler]: 1.84e-06 [auto_monad_reorder]: 1.537e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.2e-06 [opt_after_jit_grad]: 0.00046947 [validate]: 3.035e-05 [backend_pass]: 1.35999e-06 [task_emit]: 0.00633085 [execute]: 8.18999e-06 Sums bootstrap : 0.000569s : 3.37% type_inference : 0.006121s : 36.25% event_method : 0.000014s : 0.08% auto_monad : 0.000058s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000605s : 3.58% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000013s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000415s : 2.46% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.67% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000416s : 2.46% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000469s : 2.78% validate : 0.000030s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006331s : 37.49% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000186 30 12.78% : 0.000024s : 5: substitution.arithmetic_simplify 0.94% : 0.000002s : 2: substitution.elim_not_effective 0.63% : 0.000001s : 2: substitution.fold_const_symbol 2.68% : 0.000005s : 4: substitution.graph_param_transform 71.54% : 0.000133s : 3: substitution.inline 1.44% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.41% : 0.000004s : 4: substitution.remove_not_recompute_node 2.00% : 0.000004s : 4: substitution.replace_old_param 5.58% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006079 2 90.35% : 0.005492s : 1: type_inference.infer 9.65% : 0.000586s : 1: type_inference.specialize ------[replace.] 0.000040 5 71.53% : 0.000029s : 3: replace.inline 28.47% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000141 5 93.31% : 0.000131s : 3: match.inline 6.69% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.12% : 0.000003s : 19: predicate.arithmetic_simplify 0.94% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.55% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.02% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.39% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.25% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.16% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.27% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.07% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.81% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.48% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.91% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.57% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.10% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.94% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.64% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000352 8 47.11% : 0.000166s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.89% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030074 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.14% : 0.003351s : 1: add_attr 11.11% : 0.003341s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.02% : 0.000606s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.02% : 0.000007s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.41% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.22% : 0.000969s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002148s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.59% : 0.000479s : 1: opt_after_jit_grad 0.62% : 0.000187s : 1: opt_b 13.26% : 0.003988s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.69% : 0.000207s : 1: renormalize.infer 0.67% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000070s : 1: symbol_engine_optimizer 21.08% : 0.006341s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.40% : 0.006134s : 1: type_inference 0.21% : 0.000062s : 1: validate TotalTime = 0.0180128, [24] [bootstrap]: 0.00046649 [type_inference]: 0.00431356 [event_method]: 1.092e-05 [auto_monad]: 5.081e-05 [graph_reusing]: 5.56e-06 [inline]: 1.89e-06 [add_attr]: 0.00295557, [1] [add_attr_with_inline]: 0.00294771, [1] [Cycle 1]: 4.581e-05, [2] [tag_attr]: 1.202e-05 [meta_addattr_fg_expand]: 3.31999e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.216e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.92001e-06 [optimize]: 0.00367427, [53] [py_interpret_to_execute]: 1.488e-05 [rewriter_before_opt_a]: 3.81e-05 [opt_a]: 0.00186102, [2] [Cycle 1]: 0.00125426, [45] [expand_dump_flag]: 2.96999e-06 [switch_simplify]: 2.37e-05 [loop_unroll]: 1.379e-05 [a_1]: 0.00029261 [with_stream_mark]: 1.35e-05 [recompute_prepare]: 7.58001e-06 [updatestate_depend_eliminate]: 3.72002e-06 [updatestate_assign_eliminate]: 3.13e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.609e-05 [accelerated_algorithm]: 6.54999e-06 [shard]: 2.46e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.38999e-06 [auto_parallel]: 5.48002e-06 [parallel]: 1.796e-05 [flash_sp]: 7.08e-06 [merge_comm]: 3.35e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 8.35999e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 6.09999e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.39003e-06 [before_grad]: 9.84001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.27001e-06 [receive_attached]: 
2.53998e-06 [after_resolve]: 1.09e-05 [a_after_grad]: 9.27999e-06 [renormalize]: 0.00034179 [add_forward_monad_depend]: 4.03999e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.374e-05 [cse]: 2.764e-05 [a_3]: 4.148e-05 [Cycle 2]: 0.00059701, [45] [expand_dump_flag]: 8.40024e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.69999e-06 [a_1]: 0.00012565 [with_stream_mark]: 9.67999e-06 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.96001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.07999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 7.022e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.05999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.35e-06 [auto_parallel]: 5.32001e-06 [parallel]: 4.08001e-06 [flash_sp]: 3.46001e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 4.91002e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.04999e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.53e-06 [cell_reuse_recompute_pass]: 1.57999e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.013e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.21002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99001e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.17999e-06 [a_after_grad]: 8.02e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.22001e-06 [cse]: 1.231e-05 [a_3]: 3.251e-05 [py_interpret_to_execute_after_opt_a]: 7.03e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 2.982e-05 [convert_after_rewriter]: 7.34002e-06 [order_py_execute_after_rewriter]: 5.02999e-06 [mutable_eliminate]: 0.00044832 [opt_b]: 0.00018192, [1] [Cycle 1]: 
0.0001761, [7] [b_1]: 0.00010865 [b_2]: 6.98998e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 4.50003e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.553e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.293e-05 [loop_unroll]: 0.00041423 [opt_after_cconv]: 0.00010702, [1] [Cycle 1]: 0.0001012, [7] [c_1]: 2.807e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.559e-05 [renormalize]: 1.69995e-07 [remove_dup_value]: 1.146e-05 [tuple_transform]: 6.994e-05, [1] [Cycle 1]: 6.527e-05, [4] [d_1]: 3.912e-05 [none_parameter_eliminate]: 1.66998e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.41e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.376e-05 [cse_after_recomputation]: 2.008e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.57998e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.42999e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.148e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 4.27998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.629e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 1.84e-06 [handle_group_info]: 1.25001e-06 [symbol_engine_optimizer]: 6.732e-05, [1] [Cycle 1]: 6.316e-05, [6] [build]: 1.97999e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.87999e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.562e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.29001e-06 [opt_after_jit_grad]: 0.00045276 [validate]: 3.02e-05 [backend_pass]: 1.07998e-06 [task_emit]: 0.00579739 [execute]: 6.87002e-06 Sums bootstrap : 0.000466s : 3.31% type_inference : 0.004314s : 30.61% event_method : 0.000011s : 0.08% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000418s : 2.97% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000006s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000342s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.53% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.18% optimize.opt_b.b_1 : 0.000109s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000414s : 2.94% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000453s : 3.21% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005797s : 41.15% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 18.67% : 0.000023s : 4: substitution.arithmetic_simplify 1.65% : 0.000002s : 2: substitution.elim_not_effective 0.97% : 0.000001s : 2: substitution.fold_const_symbol 4.18% : 0.000005s : 4: substitution.graph_param_transform 64.72% : 0.000079s : 2: substitution.inline 2.60% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.66% : 0.000004s : 4: substitution.remove_not_recompute_node 3.55% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004273 2 91.98% : 0.003930s : 1: type_inference.infer 8.02% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.69% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.63% : 0.000004s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.81% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.09% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 0.99% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.05% : 0.000001s : 8: predicate.less_batch_normalization 1.60% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.20% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.61% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.13% : 0.000002s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.84% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.68% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.20% : 0.000002s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.07% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.55% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.47% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.87% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 43.19% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.81% : 0.000133s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025916 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.42% : 0.002960s : 1: add_attr 11.39% : 0.002951s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000503s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000458s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.98% : 0.000772s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.19% : 0.001864s : 1: opt_a 0.43% : 0.000110s : 1: opt_after_cconv 1.78% : 0.000462s : 1: opt_after_jit_grad 0.72% : 0.000186s : 1: opt_b 14.19% : 0.003678s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000187s : 1: renormalize.infer 0.57% : 0.000148s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000033s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.40% : 0.005807s : 1: task_emit 0.28% : 0.000073s : 1: 
tuple_transform 16.70% : 0.004327s : 1: type_inference 0.21% : 0.000055s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x7-kbk],max_mem:64.0M TotalTime = 0.0785373, [24] [bootstrap]: 0.00055063 [type_inference]: 0.00603269 [event_method]: 1.341e-05 [auto_monad]: 5.949e-05 [graph_reusing]: 5.76998e-06 [inline]: 1.67999e-06 [add_attr]: 0.00338063, [1] [add_attr_with_inline]: 0.00336964, [1] [Cycle 1]: 4.366e-05, [2] [tag_attr]: 1.492e-05 [meta_addattr_fg_expand]: 3.95998e-06 [parallel-infer-symbol]: 2.63e-06 [pre_auto_parallel]: 2.654e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.65001e-06 [optimize]: 0.00401159, [53] [py_interpret_to_execute]: 2.005e-05 [rewriter_before_opt_a]: 5.78e-05 [opt_a]: 0.00210857, [2] [Cycle 1]: 0.00150032, [45] [expand_dump_flag]: 3.18998e-06 [switch_simplify]: 3.218e-05 [loop_unroll]: 2.059e-05 [a_1]: 0.00044956 [with_stream_mark]: 1.331e-05 [recompute_prepare]: 7.6e-06 [updatestate_depend_eliminate]: 4.09002e-06 [updatestate_assign_eliminate]: 3.51999e-06 [updatestate_loads_eliminate]: 3.31999e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.541e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 2.00002e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 8.32998e-06 [auto_parallel]: 5.78002e-06 [parallel]: 2.295e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.67998e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.3e-06 [virtual_dataset]: 5.97999e-06 [get_grad_eliminate_]: 5.46e-06 [virtual_output]: 5.78002e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 8.72e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.06e-05 
[merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.06998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.51e-06 [flash_sp_send_recv_attached]: 2.98e-06 [receive_attached]: 2.43e-06 [after_resolve]: 1.035e-05 [a_after_grad]: 9.00999e-06 [renormalize]: 0.00040647 [add_forward_monad_depend]: 4.69002e-06 [auto_monad_grad]: 2.01e-06 [auto_monad_eliminator]: 1.399e-05 [cse]: 2.613e-05 [a_3]: 4.057e-05 [Cycle 2]: 0.00059892, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.37001e-06 [a_1]: 0.00012592 [with_stream_mark]: 1.001e-05 [recompute_prepare]: 5.68997e-06 [updatestate_depend_eliminate]: 3.01001e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.871e-05 [accelerated_algorithm]: 5.56002e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.4e-06 [auto_parallel]: 5.56e-06 [parallel]: 4.2e-06 [flash_sp]: 3.55998e-06 [merge_comm]: 2.90002e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.02999e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.64999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.62001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.42001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.18999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.92002e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.89001e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 8.80013e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.293e-05 [a_3]: 3.35e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 
[slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.05e-05 [convert_after_rewriter]: 6.96999e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00045212 [opt_b]: 0.00018103, [1] [Cycle 1]: 0.00017495, [7] [b_1]: 0.00010792 [b_2]: 6.97002e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 3.80009e-07 [cse]: 1.57e-05 [optimize_parallel_all_gather_comm]: 1.546e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00047894 [opt_after_cconv]: 9.398e-05, [1] [Cycle 1]: 8.836e-05, [7] [c_1]: 2.801e-05 [parameter_eliminate]: 2.02001e-06 [updatestate_depend_eliminate]: 5.18002e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.567e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.231e-05 [tuple_transform]: 6.821e-05, [1] [Cycle 1]: 6.408e-05, [4] [d_1]: 3.84e-05 [none_parameter_eliminate]: 1.61002e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 4.854e-05 [cse_after_recomputation]: 2.01e-05, [1] [Cycle 1]: 1.573e-05, [1] [cse]: 1.068e-05 [environ_conv]: 4.87e-06 [swap_dp_allreduce_reducescatter]: 5.62001e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.3e-06 [label_fine_grained_interleaved_index]: 2.58e-06 [merge_cast_opt]: 1.15999e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.32001e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.25999e-06 [full_micro_interleaved_order_control]: 2.69001e-06 [reorder_send_recv_between_fp_bp]: 2.89999e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.02998e-06 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.06002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 
[control_data_broadcast_order]: 1.154e-05 [grouped_pairwise_exchange_alltoall]: 1.86998e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.14002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.12001e-06 [overlap_grad_ring_attention]: 3.71001e-06 [overlap_grad_flash_sp]: 1.738e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.37999e-06 [split_layernorm_comm]: 1.61002e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.902e-05, [1] [Cycle 1]: 6.506e-05, [6] [build]: 2.16998e-06 [elim_shapecalc]: 8.48001e-06 [elim_not_effective]: 1.232e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 9.14e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.82001e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00045565 [validate]: 3.034e-05 [backend_pass]: 1.02998e-06 [task_emit]: 0.0637203 [execute]: 7.88999e-06 Sums bootstrap : 0.000551s : 0.74% type_inference : 0.006033s : 8.13% event_method : 0.000013s : 0.02% auto_monad : 0.000059s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000575s : 0.78% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000407s : 0.55% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000074s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.61% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000479s : 0.65% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000456s : 0.61% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063720s : 85.89% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000160 30 15.29% : 0.000025s : 5: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.93% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 65.55% : 0.000105s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.51% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 7.08% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005988 2 91.01% : 0.005450s : 1: type_inference.infer 8.99% : 0.000539s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.89% : 0.000026s : 3: replace.inline 30.11% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 5 90.85% : 0.000103s : 3: match.inline 9.15% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.98% : 0.000002s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.89% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.20% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.17% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.03% : 0.000010s : 51: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.67% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.86% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.46% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.60% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.42% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.89% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.68% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.46% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.21% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 47.50% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.50% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087430 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.87% : 0.003385s : 1: add_attr 3.86% : 0.003374s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000065s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000590s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000018s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.56% : 0.000488s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.08% : 0.000942s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.42% : 0.002112s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.53% : 0.000465s : 1: opt_after_jit_grad 0.21% : 0.000185s : 1: opt_b 4.59% : 0.004015s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000031s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000209s : 1: renormalize.infer 0.22% : 0.000191s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 72.90% : 0.063737s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 6.91% : 0.006046s : 1: type_inference 0.06% : 0.000055s : 1: validate TotalTime = 0.0699409, [24] [bootstrap]: 0.00047926 [type_inference]: 0.00435598 [event_method]: 1.082e-05 [auto_monad]: 5.015e-05 [graph_reusing]: 5.34998e-06 [inline]: 2.03002e-06 [add_attr]: 0.00296431, [1] [add_attr_with_inline]: 0.00295634, [1] [Cycle 1]: 4.513e-05, [2] [tag_attr]: 1.179e-05 [meta_addattr_fg_expand]: 3.78999e-06 [parallel-infer-symbol]: 3.22002e-06 [pre_auto_parallel]: 2.049e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 6.19999e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00369401, [53] [py_interpret_to_execute]: 1.416e-05 [rewriter_before_opt_a]: 3.924e-05 [opt_a]: 0.00189876, [2] [Cycle 1]: 0.00124196, [45] [expand_dump_flag]: 3.03e-06 [switch_simplify]: 2.466e-05 [loop_unroll]: 1.384e-05 [a_1]: 0.00028988 [with_stream_mark]: 1.332e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.29001e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.55001e-06 [a_2]: 7.632e-05 [accelerated_algorithm]: 6.31e-06 [shard]: 2.43e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 5.86e-06 [parallel]: 1.744e-05 [flash_sp]: 7.2e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 8.88002e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.41001e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.81998e-06 [virtual_output]: 5.54998e-06 [merge_forward]: 3.61999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.29998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.118e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.15002e-06 [flash_sp_send_recv_attached]: 2.24001e-06 [receive_attached]: 
2.16e-06 [after_resolve]: 1.016e-05 [a_after_grad]: 9.15001e-06 [renormalize]: 0.00033501 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.64e-06 [auto_monad_eliminator]: 1.27e-05 [cse]: 2.525e-05 [a_3]: 4.022e-05 [Cycle 2]: 0.00064735, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.76e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00017815 [with_stream_mark]: 9.63002e-06 [recompute_prepare]: 5.99999e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 8.30012e-07 [a_2]: 6.86e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.32999e-06 [merge_send_recv]: 4.72e-06 [auto_parallel]: 5.40999e-06 [parallel]: 4.07e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.31001e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.02e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.24999e-06 [virtual_dataset]: 5.41998e-06 [get_grad_eliminate_]: 4.93001e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.63e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.19e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 7.75998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03e-06 [meta_fg_expand]: 1.79998e-06 [flash_sp_send_recv_attached]: 8.40024e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.32999e-06 [a_after_grad]: 8.10999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 6.39999e-06 [cse]: 1.275e-05 [a_3]: 3.167e-05 [py_interpret_to_execute_after_opt_a]: 7.27002e-06 [slice_cell_reuse_recomputed_activation]: 2.04e-06 [rewriter_after_opt_a]: 2.997e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 4.93001e-06 [mutable_eliminate]: 0.00044329 [opt_b]: 0.00018245, [1] [Cycle 1]: 0.00017645, [7] [b_1]: 
0.00010881 [b_2]: 7.51999e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.33998e-06 [renormalize]: 6.50005e-07 [cse]: 1.561e-05 [optimize_parallel_all_gather_comm]: 1.521e-05 [overlap_param_gather]: 1.72001e-06 [cconv]: 2.229e-05 [loop_unroll]: 0.00040873 [opt_after_cconv]: 9.559e-05, [1] [Cycle 1]: 8.977e-05, [7] [c_1]: 2.79e-05 [parameter_eliminate]: 2.52001e-06 [updatestate_depend_eliminate]: 5.31002e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.586e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.212e-05 [tuple_transform]: 6.846e-05, [1] [Cycle 1]: 6.417e-05, [4] [d_1]: 3.876e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.836e-05 [cse_after_recomputation]: 2.063e-05, [1] [Cycle 1]: 1.618e-05, [1] [cse]: 1.114e-05 [environ_conv]: 4.58999e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.03001e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.64001e-06 [assign_add_opt]: 1.17e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.06002e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 1.01997e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.206e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.73999e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.56998e-06 [overlap_grad_ring_attention]: 4.02e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.808e-05, [1] [Cycle 1]: 6.402e-05, [6] [build]: 1.99999e-06 [elim_shapecalc]: 8.43999e-06 [elim_not_effective]: 1.167e-05 [opt_reshape]: 6.14999e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.492e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00044574 [validate]: 3.083e-05 [backend_pass]: 8.49977e-07 [task_emit]: 0.0576435 [execute]: 8.08999e-06 Sums bootstrap : 0.000479s : 0.73% type_inference : 0.004356s : 6.60% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000020s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000468s : 0.71% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000335s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000038s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000443s : 0.67% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000409s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000446s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057644s : 87.31% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.02% : 0.000021s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.23% : 0.000005s : 4: substitution.graph_param_transform 65.88% : 0.000078s : 2: substitution.inline 2.19% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.90% : 0.000005s : 4: substitution.remove_not_recompute_node 3.14% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004315 2 91.89% : 0.003965s : 1: type_inference.infer 8.11% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.12% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.49% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.00% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 1.06% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.38% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.68% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.29% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.51% : 0.000001s : 4: predicate.parallel_virtual_node 1.19% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.07% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.66% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.55% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 1.01% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.80% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.66% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000256 6 44.87% : 0.000115s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.13% : 0.000141s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077914 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.81% : 0.002968s : 1: add_attr 3.80% : 0.002960s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000514s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000417s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000452s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.05% : 0.000821s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.44% : 0.001902s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000455s : 1: opt_after_jit_grad 0.24% : 0.000186s : 1: opt_b 4.75% : 0.003698s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000184s : 1: renormalize.infer 0.19% : 0.000145s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.01% : 0.057660s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.61% : 0.004369s : 1: type_inference 0.07% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x7-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x8-pynative],max_mem:64.0M TotalTime = 0.0207793, [24] [bootstrap]: 0.0005453 [type_inference]: 0.00605497 [event_method]: 1.437e-05 [auto_monad]: 5.777e-05 [graph_reusing]: 5.35999e-06 [inline]: 1.77999e-06 [add_attr]: 0.00332371, [1] [add_attr_with_inline]: 0.00331345, [1] [Cycle 1]: 5.493e-05, [2] [tag_attr]: 2.572e-05 [meta_addattr_fg_expand]: 3.93999e-06 [parallel-infer-symbol]: 2.86e-06 [pre_auto_parallel]: 2.75e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.81003e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00395209, [53] [py_interpret_to_execute]: 2.071e-05 [rewriter_before_opt_a]: 5.804e-05 [opt_a]: 0.00211704, [2] [Cycle 1]: 0.00151118, [45] [expand_dump_flag]: 2.89001e-06 [switch_simplify]: 3.183e-05 [loop_unroll]: 2.049e-05 [a_1]: 0.00045396 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.8e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.16001e-06 [updatestate_loads_eliminate]: 3.03e-06 [parameter_eliminate]: 1.50999e-06 [a_2]: 7.557e-05 [accelerated_algorithm]: 6.36e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 7.93999e-06 [auto_parallel]: 5.97999e-06 [parallel]: 2.248e-05 [flash_sp]: 6.88e-06 [merge_comm]: 3.71999e-06 [allreduce_fusion]: 3.45003e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 6.69999e-06 [virtual_dataset]: 
5.82001e-06 [get_grad_eliminate_]: 5.27001e-06 [virtual_output]: 5.62999e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 [merge_recompute_call_nodes]: 1.44e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.31999e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 3.04001e-06 [receive_attached]: 2.64999e-06 [after_resolve]: 1.013e-05 [a_after_grad]: 8.59e-06 [renormalize]: 0.00041599 [add_forward_monad_depend]: 4.42998e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.339e-05 [cse]: 2.752e-05 [a_3]: 4.117e-05 [Cycle 2]: 0.0005967, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.39002e-06 [loop_unroll]: 5.62999e-06 [a_1]: 0.00012441 [with_stream_mark]: 9.89001e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.99999e-06 [updatestate_assign_eliminate]: 2.28002e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.793e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.41998e-06 [merge_send_recv]: 4.60999e-06 [auto_parallel]: 5.19e-06 [parallel]: 5.81e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 3.08998e-06 [allreduce_fusion]: 2.56998e-06 [matmul_add_comm_reduction]: 5.04998e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.86998e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 4.99003e-06 [virtual_output]: 6.28998e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.032e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.35999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.32002e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 8.89e-06 [a_after_grad]: 8.23999e-06 [renormalize]: 
8.00064e-08 [add_forward_monad_depend]: 1.14003e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.39001e-06 [cse]: 1.311e-05 [a_3]: 3.152e-05 [py_interpret_to_execute_after_opt_a]: 7.55e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.095e-05 [convert_after_rewriter]: 7.46999e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00044457 [opt_b]: 0.00017876, [1] [Cycle 1]: 0.00017298, [7] [b_1]: 0.00010666 [b_2]: 7.15e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.89991e-07 [cse]: 1.608e-05 [optimize_parallel_all_gather_comm]: 1.49e-05 [overlap_param_gather]: 2.32999e-06 [cconv]: 2.235e-05 [loop_unroll]: 0.00041108 [opt_after_cconv]: 9.611e-05, [1] [Cycle 1]: 9.041e-05, [7] [c_1]: 2.829e-05 [parameter_eliminate]: 2.76999e-06 [updatestate_depend_eliminate]: 5.24998e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.592e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.292e-05 [tuple_transform]: 6.834e-05, [1] [Cycle 1]: 6.41e-05, [4] [d_1]: 3.885e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.55999e-06 [add_recomputation]: 4.605e-05 [cse_after_recomputation]: 2.012e-05, [1] [Cycle 1]: 1.576e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.92999e-06 [swap_dp_allreduce_reducescatter]: 5.67999e-06 [bias_add_comm_swap]: 3.01001e-06 [label_micro_interleaved_index]: 3.91001e-06 [label_fine_grained_interleaved_index]: 2.53e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.31e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 1.02e-06 
[add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.224e-05 [grouped_pairwise_exchange_alltoall]: 1.83002e-06 [offloading_packed_experts]: 3.47002e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.39e-06 [overlap_recompute_comm]: 1.97001e-06 [overlap_grad_ring_attention]: 4.02998e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 7.862e-05, [1] [Cycle 1]: 7.458e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 9.00999e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 5.85002e-06 [fold_const_symbol]: 8.73001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 2.08002e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.547e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00044622 [validate]: 3.077e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.00607744 [execute]: 6.41e-06 Sums bootstrap : 0.000545s : 3.31% type_inference : 0.006055s : 36.71% event_method : 0.000014s : 0.09% auto_monad : 0.000058s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000026s : 0.16% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.35% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000578s : 3.51% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000416s : 2.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.25% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000445s : 2.70% optimize.opt_b.b_1 : 0.000107s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000411s : 2.49% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000046s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000446s : 2.71% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006077s : 36.85% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000166 30 14.80% : 0.000025s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 66.65% : 0.000110s : 3: substitution.inline 1.86% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.84% : 0.000005s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.49% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006001 2 90.49% : 0.005431s : 1: type_inference.infer 9.51% : 0.000571s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.80% : 0.000027s : 3: replace.inline 30.20% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.80% : 0.000108s : 3: match.inline 8.20% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.83% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.05% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.06% : 0.000003s : 16: predicate.float_depend_g_call 0.63% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.51% : 0.000004s : 32: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.16% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.18% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.57% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.08% : 0.000008s : 54: predicate.switch_simplify 0.78% : 
0.000001s : 11: predicate.tile_eliminate 0.92% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.56% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.88% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.68% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000354 8 47.19% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.81% : 0.000187s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029570 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.003328s : 1: add_attr 11.22% : 0.003317s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.97% : 0.000583s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000453s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.19% : 0.000944s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.17% : 0.002120s : 1: opt_a 0.34% : 0.000099s : 1: opt_after_cconv 1.54% : 0.000455s : 1: opt_after_jit_grad 0.62% : 0.000182s : 1: opt_b 13.38% : 0.003956s : 1: optimize 0.06% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000208s : 1: renormalize.infer 0.68% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000081s : 1: symbol_engine_optimizer 20.59% : 0.006087s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.52% : 0.006068s : 1: type_inference 0.21% : 0.000063s : 1: validate TotalTime = 0.0179755, [24] [bootstrap]: 0.00047011 [type_inference]: 0.00431404 [event_method]: 1.038e-05 [auto_monad]: 4.903e-05 [graph_reusing]: 5.15999e-06 [inline]: 2.06998e-06 [add_attr]: 0.0029455, [1] [add_attr_with_inline]: 0.00293679, [1] [Cycle 1]: 4.682e-05, [2] [tag_attr]: 1.239e-05 [meta_addattr_fg_expand]: 3.68999e-06 [parallel-infer-symbol]: 3.09001e-06 [pre_auto_parallel]: 2.069e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00365288, [53] [py_interpret_to_execute]: 1.471e-05 [rewriter_before_opt_a]: 3.725e-05 [opt_a]: 0.00184979, [2] [Cycle 1]: 0.00123357, [45] [expand_dump_flag]: 2.89001e-06 [switch_simplify]: 2.421e-05 [loop_unroll]: 1.377e-05 [a_1]: 0.00028663 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.15003e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 2.86999e-06 [updatestate_loads_eliminate]: 2.99999e-06 [parameter_eliminate]: 1.72999e-06 [a_2]: 7.709e-05 [accelerated_algorithm]: 6.25002e-06 [shard]: 2.18002e-06 [meta_shard_fg_expand]: 1.51002e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.98001e-06 [auto_parallel]: 5.53002e-06 [parallel]: 1.723e-05 [flash_sp]: 7.35e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 9.37001e-06 [allreduce_slice_to_reducescatter]: 9.09989e-07 [virtual_shard_identity]: 7.43999e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.52997e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 8.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.051e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.34e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58999e-06 [meta_fg_expand]: 2.23002e-06 [flash_sp_send_recv_attached]: 2.56998e-06 [receive_attached]: 2.17001e-06 
[after_resolve]: 1.015e-05 [a_after_grad]: 8.70001e-06 [renormalize]: 0.00033018 [add_forward_monad_depend]: 4.45e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.621e-05 [a_3]: 4.001e-05 [Cycle 2]: 0.00060669, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.64999e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.00013588 [with_stream_mark]: 1.098e-05 [recompute_prepare]: 5.91998e-06 [updatestate_depend_eliminate]: 2.80002e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.947e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.29e-06 [merge_send_recv]: 4.27998e-06 [auto_parallel]: 5.20999e-06 [parallel]: 4.28001e-06 [flash_sp]: 3.37002e-06 [merge_comm]: 2.91e-06 [allreduce_fusion]: 2.55002e-06 [matmul_add_comm_reduction]: 5.35001e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 5.11002e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.54001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.14999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 8.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.19001e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.04e-06 [after_resolve]: 9.19e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.27001e-06 [cse]: 1.259e-05 [a_3]: 3.116e-05 [py_interpret_to_execute_after_opt_a]: 7.4e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.012e-05 [convert_after_rewriter]: 6.60002e-06 [order_py_execute_after_rewriter]: 5.02e-06 [mutable_eliminate]: 0.00045034 [opt_b]: 0.00018265, [1] [Cycle 1]: 0.00017649, [7] 
[b_1]: 0.00010824 [b_2]: 7.38e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 3.29979e-07 [cse]: 1.644e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.134e-05 [loop_unroll]: 0.00041245 [opt_after_cconv]: 9.39e-05, [1] [Cycle 1]: 8.825e-05, [7] [c_1]: 2.716e-05 [parameter_eliminate]: 2.14999e-06 [updatestate_depend_eliminate]: 5.25999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.41998e-06 [cse]: 1.573e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.225e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.504e-05, [4] [d_1]: 3.947e-05 [none_parameter_eliminate]: 1.60001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.23e-06 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 4.328e-05 [cse_after_recomputation]: 2.037e-05, [1] [Cycle 1]: 1.608e-05, [1] [cse]: 1.098e-05 [environ_conv]: 4.49002e-06 [swap_dp_allreduce_reducescatter]: 4.88001e-06 [bias_add_comm_swap]: 2.82002e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.87002e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.61e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.28002e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 1.03001e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.68002e-06 [offloading_packed_experts]: 3.74002e-06 [overlap_recompute_and_grad_model_parallel]: 4.37e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.87999e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.721e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 1.06997e-06 [symbol_engine_optimizer]: 6.711e-05, [1] [Cycle 1]: 6.282e-05, [6] [build]: 2.16e-06 [elim_shapecalc]: 7.94002e-06 [elim_not_effective]: 1.124e-05 [opt_reshape]: 5.89999e-06 [fold_const_symbol]: 8.89998e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.81e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00044836 [validate]: 3.01e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00579449 [execute]: 6.78e-06 Sums bootstrap : 0.000470s : 3.34% type_inference : 0.004314s : 30.66% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000423s : 3.00% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000147s : 1.04% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000330s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000450s : 3.20% optimize.opt_b.b_1 : 0.000108s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000412s : 2.93% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000448s : 3.19% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005794s : 41.18% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.85% : 0.000023s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.29% : 0.000002s : 2: substitution.fold_const_symbol 4.63% : 0.000006s : 4: substitution.graph_param_transform 64.68% : 0.000077s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.44% : 0.000004s : 4: substitution.remove_not_recompute_node 3.31% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004274 2 92.07% : 0.003935s : 1: type_inference.infer 7.93% : 0.000339s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.15% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.88% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.77% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.07% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.33% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.68% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.81% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.45% : 0.000002s : 11: predicate.partial_defer_inline 1.29% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.78% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.22% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000000s : 4: predicate.reset_defer_inline 0.74% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.01% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.81% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.38% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.84% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.49% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000230 6 41.99% : 0.000097s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.01% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025836 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.42% : 0.002949s : 1: add_attr 11.38% : 0.002940s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000506s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.78% : 0.000460s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.99% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.17% : 0.001853s : 1: opt_a 0.38% : 0.000097s : 1: opt_after_cconv 1.77% : 0.000458s : 1: opt_after_jit_grad 0.72% : 0.000186s : 1: opt_b 14.15% : 0.003657s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000181s : 1: renormalize.infer 0.55% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.47% : 0.005805s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.75% : 0.004327s : 1: type_inference 0.22% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x8-kbk],max_mem:64.0M . TotalTime = 0.894648, [24] [bootstrap]: 0.00055463 [type_inference]: 0.00605119 [event_method]: 1.362e-05 [auto_monad]: 5.488e-05 [graph_reusing]: 5.18002e-06 [inline]: 2.21e-06 [add_attr]: 0.00336566, [1] [add_attr_with_inline]: 0.00335439, [1] [Cycle 1]: 4.473e-05, [2] [tag_attr]: 1.46e-05 [meta_addattr_fg_expand]: 4.40999e-06 [parallel-infer-symbol]: 2.72001e-06 [pre_auto_parallel]: 2.907e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 9.90025e-07 [dataset_repeat_opt]: 2.27001e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00393903, [53] [py_interpret_to_execute]: 1.97e-05 [rewriter_before_opt_a]: 5.664e-05 [opt_a]: 0.00209672, [2] [Cycle 1]: 0.00149549, [45] [expand_dump_flag]: 2.81e-06 [switch_simplify]: 3.178e-05 [loop_unroll]: 2.113e-05 [a_1]: 0.00044939 [with_stream_mark]: 1.285e-05 [recompute_prepare]: 7.78999e-06 [updatestate_depend_eliminate]: 3.8e-06 [updatestate_assign_eliminate]: 3.48e-06 [updatestate_loads_eliminate]: 2.68003e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.458e-05 [accelerated_algorithm]: 6.45002e-06 [shard]: 2.32001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.82e-06 [auto_parallel]: 6.09999e-06 [parallel]: 2.252e-05 [flash_sp]: 6.91001e-06 [merge_comm]: 3.68999e-06 [allreduce_fusion]: 3.36001e-06 [matmul_add_comm_reduction]: 9.10999e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.99e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 6.02001e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 8.80999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 
[merge_recompute_call_nodes]: 1.89e-06 [before_grad]: 9.70002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.65e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.41e-06 [after_resolve]: 1.017e-05 [a_after_grad]: 8.48999e-06 [renormalize]: 0.00040387 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.83997e-06 [auto_monad_eliminator]: 1.388e-05 [cse]: 2.663e-05 [a_3]: 4.071e-05 [Cycle 2]: 0.00059194, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 6.69001e-06 [loop_unroll]: 5.34003e-06 [a_1]: 0.00012467 [with_stream_mark]: 1.051e-05 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.81e-06 [updatestate_assign_eliminate]: 2.24999e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.838e-05 [accelerated_algorithm]: 5.73002e-06 [shard]: 1.10001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.46e-06 [parallel]: 3.97e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.64999e-06 [matmul_add_comm_reduction]: 4.84e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.66999e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 6.68e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.48002e-06 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 8.35001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 7.97e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.17e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.01998e-06 [cse]: 1.27e-05 [a_3]: 3.165e-05 [py_interpret_to_execute_after_opt_a]: 7.26001e-06 [slice_cell_reuse_recomputed_activation]: 
1.82001e-06 [rewriter_after_opt_a]: 3.129e-05 [convert_after_rewriter]: 6.84999e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00044971 [opt_b]: 0.00017867, [1] [Cycle 1]: 0.00017282, [7] [b_1]: 0.0001061 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 5.16002e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.27001e-06 [renormalize]: 3.60014e-07 [cse]: 1.555e-05 [optimize_parallel_all_gather_comm]: 1.63e-05 [overlap_param_gather]: 2.07999e-06 [cconv]: 2.178e-05 [loop_unroll]: 0.00042539 [opt_after_cconv]: 9.515e-05, [1] [Cycle 1]: 8.902e-05, [7] [c_1]: 2.821e-05 [parameter_eliminate]: 2.47001e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.557e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.202e-05 [tuple_transform]: 6.783e-05, [1] [Cycle 1]: 6.359e-05, [4] [d_1]: 3.817e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 1.60999e-06 [add_recomputation]: 4.963e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.58e-05, [1] [cse]: 1.069e-05 [environ_conv]: 4.55001e-06 [swap_dp_allreduce_reducescatter]: 5.02999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.13999e-06 [label_fine_grained_interleaved_index]: 2.83998e-06 [merge_cast_opt]: 1.25999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.15001e-06 [ForceFp32Comm]: 6.89994e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.43998e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.19998e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 
[control_data_broadcast_order]: 1.112e-05 [grouped_pairwise_exchange_alltoall]: 1.79e-06 [offloading_packed_experts]: 3.26001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.30001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.11003e-06 [overlap_grad_ring_attention]: 3.78001e-06 [overlap_grad_flash_sp]: 1.696e-05 [begin_end_overlap_inline]: 6.00005e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.854e-05, [1] [Cycle 1]: 6.441e-05, [6] [build]: 2.45002e-06 [elim_shapecalc]: 8.45999e-06 [elim_not_effective]: 1.154e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 9.09e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.61002e-06 [auto_monad_reorder]: 1.586e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.23998e-06 [opt_after_jit_grad]: 0.00044842 [validate]: 3.086e-05 [backend_pass]: 1.18001e-06 [task_emit]: 0.879894 [execute]: 8.82999e-06 Sums bootstrap : 0.000555s : 0.06% type_inference : 0.006051s : 0.68% event_method : 0.000014s : 0.00% auto_monad : 0.000055s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000057s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.00% optimize.opt_a.loop_unroll : 0.000026s : 0.00% optimize.opt_a.a_1 : 0.000574s : 0.06% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000012s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000007s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000404s : 0.05% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000039s : 0.00% optimize.opt_a.a_3 : 0.000072s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000450s : 0.05% optimize.opt_b.b_1 : 0.000106s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000425s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000050s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv 
: 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.05% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.879894s : 98.83% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000163 30 15.47% : 0.000025s : 5: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.98% : 0.000002s : 2: substitution.fold_const_symbol 3.03% : 0.000005s : 4: substitution.graph_param_transform 66.10% : 0.000107s : 3: substitution.inline 1.93% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 2.48% : 0.000004s : 4: substitution.replace_old_param 6.25% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006008 2 91.09% : 0.005473s : 1: type_inference.infer 8.91% : 0.000535s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.47% : 0.000027s : 3: replace.inline 29.53% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 92.04% : 0.000105s : 3: match.inline 7.96% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.07% : 0.000003s : 19: predicate.arithmetic_simplify 0.94% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.08% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.18% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.87% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.87% : 0.000001s : 8: predicate.shard_identity_eliminate 0.82% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.22% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.38% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.52% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000332 8 46.26% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.74% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.903444 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.37% : 0.003370s : 1: add_attr 0.37% : 0.003358s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000054s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000060s : 1: auto_monad 0.00% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.07% : 0.000596s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000459s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.10% : 0.000939s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000089s : 28: opt.transform.opt_b 0.00% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.23% : 0.002100s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.05% : 0.000458s : 1: opt_after_jit_grad 0.02% : 0.000182s : 1: opt_b 0.44% : 0.003943s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.00% : 0.000034s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000210s : 1: renormalize.infer 0.02% : 0.000187s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000035s : 1: rewriter_after_opt_a 0.01% : 0.000061s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000071s : 1: symbol_engine_optimizer 97.40% : 0.879915s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.67% : 0.006065s : 1: type_inference 0.01% : 0.000056s : 1: validate TotalTime = 0.0543992, [24] [bootstrap]: 0.00048069 [type_inference]: 0.00433705 [event_method]: 1.075e-05 [auto_monad]: 5.201e-05 [graph_reusing]: 5.39998e-06 [inline]: 1.80001e-06 [add_attr]: 0.0029456, [1] [add_attr_with_inline]: 0.00293711, [1] [Cycle 1]: 4.543e-05, [2] [tag_attr]: 1.16e-05 [meta_addattr_fg_expand]: 2.76e-06 [parallel-infer-symbol]: 2.58e-06 [pre_auto_parallel]: 2.073e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.85001e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00370737, [53] [py_interpret_to_execute]: 1.488e-05 [rewriter_before_opt_a]: 3.868e-05 [opt_a]: 0.00183938, [2] [Cycle 1]: 0.00123699, [45] [expand_dump_flag]: 2.58998e-06 [switch_simplify]: 2.373e-05 [loop_unroll]: 1.372e-05 [a_1]: 0.00028884 [with_stream_mark]: 1.295e-05 [recompute_prepare]: 7.46001e-06 [updatestate_depend_eliminate]: 3.62998e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.54e-06 [a_2]: 7.575e-05 [accelerated_algorithm]: 6.39001e-06 [shard]: 2.57001e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.84999e-06 [merge_send_recv]: 7.41001e-06 [auto_parallel]: 5.69e-06 [parallel]: 1.778e-05 [flash_sp]: 7.22002e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.23998e-06 [matmul_add_comm_reduction]: 8.79e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.87001e-06 [get_grad_eliminate_]: 5.40999e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.27001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 
2.03002e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 8.61002e-06 [renormalize]: 0.00033534 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.305e-05 [cse]: 2.769e-05 [a_3]: 3.99e-05 [Cycle 2]: 0.00059323, [45] [expand_dump_flag]: 9.80013e-07 [switch_simplify]: 6.89001e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.000127 [with_stream_mark]: 1.099e-05 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.785e-05 [accelerated_algorithm]: 5.48002e-06 [shard]: 9.49978e-07 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.57e-06 [auto_parallel]: 5.20001e-06 [parallel]: 4.43001e-06 [flash_sp]: 3.01001e-06 [merge_comm]: 3.25e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 4.82e-06 [allreduce_slice_to_reducescatter]: 4.09986e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.49998e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.66e-06 [cell_reuse_recompute_pass]: 1.31002e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.13002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.88001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.13e-06 [meta_fg_expand]: 1.59998e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.74e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.63e-06 [cse]: 1.252e-05 [a_3]: 3.14e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 [slice_cell_reuse_recomputed_activation]: 1.67001e-06 [rewriter_after_opt_a]: 3.052e-05 [convert_after_rewriter]: 6.47001e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.00044456 [opt_b]: 0.0002429, [1] [Cycle 1]: 0.00023695, [7] 
[b_1]: 0.00016846 [b_2]: 7.08998e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.36e-06 [renormalize]: 2.80008e-07 [cse]: 1.665e-05 [optimize_parallel_all_gather_comm]: 1.575e-05 [overlap_param_gather]: 2.11e-06 [cconv]: 2.306e-05 [loop_unroll]: 0.00041747 [opt_after_cconv]: 9.536e-05, [1] [Cycle 1]: 8.965e-05, [7] [c_1]: 2.806e-05 [parameter_eliminate]: 2.29999e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.649e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.301e-05 [tuple_transform]: 6.855e-05, [1] [Cycle 1]: 6.45e-05, [4] [d_1]: 3.903e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.11e-06 [partial_unused_args_eliminate]: 2.29999e-06 [add_recomputation]: 4.421e-05 [cse_after_recomputation]: 1.976e-05, [1] [Cycle 1]: 1.553e-05, [1] [cse]: 1.053e-05 [environ_conv]: 5.15001e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.59001e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.43e-06 [merge_cast_opt]: 1.47999e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.94999e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.25999e-06 [overlap_opt_shard_in_pipeline]: 1.22e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.155e-05 [grouped_pairwise_exchange_alltoall]: 1.51002e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.23999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 3.82998e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.57999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.796e-05, [1] [Cycle 1]: 6.384e-05, [6] [build]: 1.94999e-06 [elim_shapecalc]: 8.50001e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 6.06e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.52e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044739 [validate]: 3.006e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.0421252 [execute]: 7.53999e-06 Sums bootstrap : 0.000481s : 0.95% type_inference : 0.004337s : 8.59% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000021s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000039s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.04% optimize.opt_a.a_1 : 0.000416s : 0.82% optimize.opt_a.with_stream_mark : 0.000024s : 0.05% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.28% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000335s : 0.66% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000040s : 0.08% optimize.opt_a.a_3 : 0.000071s : 0.14% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.02% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.06% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.88% optimize.opt_b.b_1 : 0.000168s : 0.33% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.05% optimize.loop_unroll : 0.000417s : 0.83% optimize.opt_after_cconv.c_1 : 0.000028s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.03% optimize.tuple_transform.d_1 : 0.000039s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000447s : 0.89% validate : 0.000030s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.042125s : 83.42% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.24% : 0.000022s : 4: substitution.arithmetic_simplify 1.51% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.68% : 0.000006s : 4: substitution.graph_param_transform 65.15% : 0.000077s : 2: substitution.inline 2.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.50% : 0.000004s : 4: substitution.remove_not_recompute_node 3.42% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004297 2 91.66% : 0.003939s : 1: type_inference.infer 8.34% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.01% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.72% : 0.000002s : 11: predicate.float_depend_g_call 0.76% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.12% : 0.000002s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.84% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.90% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.59% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.75% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 1.07% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.26% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.89% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.12% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.69% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000260 6 43.53% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.47% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062369 196 0.01% : 0.000003s : 1: ForceFp32Comm 4.73% : 0.002950s : 1: add_attr 4.72% : 0.002941s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000057s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.83% : 0.000516s : 1: bootstrap 0.04% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000016s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000426s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.73% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.23% : 0.000765s : 78: opt.transform.opt_a 0.04% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.24% : 0.000151s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.95% : 0.001842s : 1: opt_a 0.16% : 0.000099s : 1: opt_after_cconv 0.73% : 0.000457s : 1: opt_after_jit_grad 0.40% : 0.000246s : 1: opt_b 5.95% : 0.003711s : 1: optimize 0.03% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000025s : 1: pre_auto_parallel 0.03% : 0.000019s : 1: py_interpret_to_execute 0.02% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000017s : 1: remove_dup_value 0.29% : 0.000182s : 1: renormalize.infer 0.23% : 0.000146s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000071s : 1: symbol_engine_optimizer 67.57% : 0.042141s : 1: task_emit 0.11% : 0.000071s : 1: 
tuple_transform 6.97% : 0.004350s : 1: type_inference 0.08% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x8-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x9-pynative],max_mem:64.0M TotalTime = 0.0209702, [24] [bootstrap]: 0.00054993 [type_inference]: 0.00611027 [event_method]: 1.463e-05 [auto_monad]: 5.505e-05 [graph_reusing]: 5.51e-06 [inline]: 1.88002e-06 [add_attr]: 0.00334294, [1] [add_attr_with_inline]: 0.00333203, [1] [Cycle 1]: 4.336e-05, [2] [tag_attr]: 1.476e-05 [meta_addattr_fg_expand]: 4e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.681e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.09e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00397964, [53] [py_interpret_to_execute]: 2.074e-05 [rewriter_before_opt_a]: 5.85e-05 [opt_a]: 0.00212499, [2] [Cycle 1]: 0.0015154, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 3.192e-05 [loop_unroll]: 2.197e-05 [a_1]: 0.00045656 [with_stream_mark]: 1.326e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.58999e-06 [updatestate_assign_eliminate]: 3.00998e-06 [updatestate_loads_eliminate]: 3.43e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 7.618e-05 [accelerated_algorithm]: 6.51e-06 [shard]: 1.88002e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.94999e-06 [merge_send_recv]: 7.78001e-06 [auto_parallel]: 5.38002e-06 [parallel]: 2.272e-05 [flash_sp]: 7.67002e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.25998e-06 [matmul_add_comm_reduction]: 8.60999e-06 [allreduce_slice_to_reducescatter]: 9.09989e-07 [virtual_shard_identity]: 7.17002e-06 [virtual_dataset]: 5.96e-06 
[get_grad_eliminate_]: 5.37001e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.88001e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 9.77001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.049e-05 [merge_recompute_call_nodes]: 1.33002e-06 [before_grad]: 9.07001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.71e-06 [receive_attached]: 2.29001e-06 [after_resolve]: 1.012e-05 [a_after_grad]: 8.82999e-06 [renormalize]: 0.00041592 [add_forward_monad_depend]: 4.47e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.607e-05 [a_3]: 4.103e-05 [Cycle 2]: 0.00060039, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.75998e-06 [loop_unroll]: 5.39e-06 [a_1]: 0.00012586 [with_stream_mark]: 9.76e-06 [recompute_prepare]: 5.76998e-06 [updatestate_depend_eliminate]: 2.88e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.847e-05 [accelerated_algorithm]: 5.61e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 7.28e-06 [merge_send_recv]: 4.41002e-06 [auto_parallel]: 5.24998e-06 [parallel]: 4.25e-06 [flash_sp]: 2.91e-06 [merge_comm]: 2.93e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 3.20026e-07 [virtual_shard_identity]: 6.53e-06 [virtual_dataset]: 5.44998e-06 [get_grad_eliminate_]: 5.17e-06 [virtual_output]: 5.05001e-06 [merge_forward]: 2.79001e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.003e-05 [merge_recompute_call_nodes]: 6.90023e-07 [before_grad]: 8.28999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 1.61998e-06 [flash_sp_send_recv_attached]: 6.69999e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.18002e-06 [a_after_grad]: 8.32e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 7.60017e-07 [auto_monad_eliminator]: 6.32001e-06 [cse]: 1.261e-05 [a_3]: 3.267e-05 [py_interpret_to_execute_after_opt_a]: 7.43e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 3.051e-05 [convert_after_rewriter]: 7.04001e-06 [order_py_execute_after_rewriter]: 4.89e-06 [mutable_eliminate]: 0.00046607 [opt_b]: 0.00018108, [1] [Cycle 1]: 0.00017506, [7] [b_1]: 0.00010705 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 5.89993e-07 [cse]: 1.642e-05 [optimize_parallel_all_gather_comm]: 1.525e-05 [overlap_param_gather]: 1.88002e-06 [cconv]: 2.136e-05 [loop_unroll]: 0.00041407 [opt_after_cconv]: 9.543e-05, [1] [Cycle 1]: 8.975e-05, [7] [c_1]: 2.808e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.35002e-06 [updatestate_loads_eliminate]: 2.36998e-06 [cse]: 1.639e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.151e-05 [tuple_transform]: 6.873e-05, [1] [Cycle 1]: 6.456e-05, [4] [d_1]: 3.901e-05 [none_parameter_eliminate]: 1.67999e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 4.538e-05 [cse_after_recomputation]: 2.136e-05, [1] [Cycle 1]: 1.684e-05, [1] [cse]: 1.144e-05 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 5.86998e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 4.01001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.50999e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.52001e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.74999e-06 [comm_op_add_attrs]: 9.80013e-07 
[add_comm_op_reuse_tag]: 1.30999e-06 [interleave_split_concat_branches]: 1.37e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.97999e-06 [control_data_broadcast_order]: 1.128e-05 [grouped_pairwise_exchange_alltoall]: 1.62001e-06 [offloading_packed_experts]: 3.79002e-06 [overlap_recompute_and_grad_model_parallel]: 4.31002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 3.68999e-06 [overlap_grad_flash_sp]: 1.711e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.23002e-06 [split_layernorm_comm]: 2.13002e-06 [handle_group_info]: 9.40025e-07 [symbol_engine_optimizer]: 6.936e-05, [1] [Cycle 1]: 6.532e-05, [6] [build]: 2.54999e-06 [elim_shapecalc]: 8.78001e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.19001e-06 [fold_const_symbol]: 9.19e-06 [renormalize]: 2.50002e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 1.629e-05 [get_jit_bprop_graph]: 9.60019e-07 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00044884 [validate]: 3.124e-05 [backend_pass]: 1.04998e-06 [task_emit]: 0.00616196 [execute]: 6.91001e-06 Sums bootstrap : 0.000550s : 3.30% type_inference : 0.006110s : 36.68% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 
0.000059s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000582s : 3.50% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000416s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000074s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000466s : 2.80% optimize.opt_b.b_1 : 0.000107s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.13% optimize.loop_unroll : 0.000414s : 2.49% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000449s : 2.69% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006162s : 36.99% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000167 30 14.65% : 0.000024s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.30% : 0.000006s : 4: substitution.graph_param_transform 66.97% : 0.000112s : 3: substitution.inline 1.59% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.32% : 0.000004s : 4: substitution.replace_old_param 6.73% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006066 2 90.70% : 0.005502s : 1: type_inference.infer 9.30% : 0.000564s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.88% : 0.000027s : 3: replace.inline 30.12% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.54% : 0.000110s : 3: match.inline 8.46% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.07% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.15% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000002s : 8: predicate.less_batch_normalization 1.68% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.44% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.58% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.52% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.63% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 1.02% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.16% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.47% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.83% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000350 8 48.28% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.72% : 0.000181s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029814 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.23% : 0.003347s : 1: add_attr 11.19% : 0.003336s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000050s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.97% : 0.000588s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000423s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.59% : 0.000475s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.19% : 0.000952s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.14% : 0.002128s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.54% : 0.000458s : 1: opt_after_jit_grad 0.62% : 0.000185s : 1: opt_b 13.36% : 0.003983s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.71% : 0.000211s : 1: renormalize.infer 0.67% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 20.70% : 0.006172s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.54% : 0.006124s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0180833, [24] [bootstrap]: 0.00047202 [type_inference]: 0.00435827 [event_method]: 1.01e-05 [auto_monad]: 5.103e-05 [graph_reusing]: 5.87999e-06 [inline]: 1.87001e-06 [add_attr]: 0.00291686, [1] [add_attr_with_inline]: 0.00290813, [1] [Cycle 1]: 4.548e-05, [2] [tag_attr]: 1.198e-05 [meta_addattr_fg_expand]: 3.11999e-06 [parallel-infer-symbol]: 2.61e-06 [pre_auto_parallel]: 2.138e-05 [insert-virtual-dataset]: 2.33998e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.47999e-06 [optimize]: 0.00369225, [53] [py_interpret_to_execute]: 1.503e-05 [rewriter_before_opt_a]: 3.976e-05 [opt_a]: 0.00189823, [2] [Cycle 1]: 0.00129693, [45] [expand_dump_flag]: 2.84999e-06 [switch_simplify]: 2.346e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.00028884 [with_stream_mark]: 1.344e-05 [recompute_prepare]: 7.28e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.14001e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.57e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.49e-06 [shard_inline]: 5.92999e-06 [merge_send_recv]: 8.10999e-06 [auto_parallel]: 5.37999e-06 [parallel]: 1.748e-05 [flash_sp]: 6.81001e-06 [merge_comm]: 3.47002e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 9.23002e-06 [allreduce_slice_to_reducescatter]: 6.00005e-07 [virtual_shard_identity]: 7.23999e-06 [virtual_dataset]: 5.71998e-06 [get_grad_eliminate_]: 5.43002e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.8e-06 [cell_reuse_recompute_pass]: 9.39996e-07 [offload_activation]: 9.54999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 1.041e-05 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.48e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.54999e-06 
[after_resolve]: 1.119e-05 [a_after_grad]: 9.05001e-06 [renormalize]: 0.00033395 [add_forward_monad_depend]: 4.12998e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.272e-05 [cse]: 2.638e-05 [a_3]: 4.022e-05 [Cycle 2]: 0.00059224, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.53e-06 [loop_unroll]: 5.27001e-06 [a_1]: 0.00012534 [with_stream_mark]: 1.116e-05 [recompute_prepare]: 5.54e-06 [updatestate_depend_eliminate]: 2.86e-06 [updatestate_assign_eliminate]: 2.25002e-06 [updatestate_loads_eliminate]: 2.36e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 6.721e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.67001e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.13002e-06 [parallel]: 4.25e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.95002e-06 [matmul_add_comm_reduction]: 4.87e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.34999e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.89998e-06 [merge_forward]: 2.71e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 5.79e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.007e-05 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.53001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.73002e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.19998e-06 [a_after_grad]: 8.27e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.16002e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.222e-05 [a_3]: 3.236e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 1.64e-06 [rewriter_after_opt_a]: 3.118e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 5.38002e-06 [mutable_eliminate]: 0.000446 [opt_b]: 0.00018257, [1] [Cycle 1]: 0.00017647, [7] [b_1]: 0.00010775 
[b_2]: 7.31001e-06 [updatestate_depend_eliminate]: 5.31002e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.26998e-06 [renormalize]: 7.00005e-07 [cse]: 1.621e-05 [optimize_parallel_all_gather_comm]: 1.499e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.192e-05 [loop_unroll]: 0.00041183 [opt_after_cconv]: 9.278e-05, [1] [Cycle 1]: 8.717e-05, [7] [c_1]: 2.714e-05 [parameter_eliminate]: 2.13998e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.26998e-06 [cse]: 1.617e-05 [renormalize]: 3.09985e-07 [remove_dup_value]: 1.276e-05 [tuple_transform]: 6.809e-05, [1] [Cycle 1]: 6.351e-05, [4] [d_1]: 3.836e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 5.96e-06 [partial_unused_args_eliminate]: 1.53002e-06 [add_recomputation]: 4.214e-05 [cse_after_recomputation]: 1.989e-05, [1] [Cycle 1]: 1.551e-05, [1] [cse]: 1.051e-05 [environ_conv]: 4.34002e-06 [swap_dp_allreduce_reducescatter]: 4.77e-06 [bias_add_comm_swap]: 2.68e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.85002e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.29999e-06 [micro_interleaved_order_control]: 2.12999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 8.30012e-07 [remove_cast_before_assign_add]: 1.11002e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.35999e-06 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.67999e-06 [control_data_broadcast_order]: 1.188e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.48001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 4.08999e-06 [overlap_grad_flash_sp]: 1.648e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 2.03002e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.795e-05, [1] [Cycle 1]: 6.401e-05, [6] [build]: 2.16003e-06 [elim_shapecalc]: 8.23001e-06 [elim_not_effective]: 1.125e-05 [opt_reshape]: 6.34001e-06 [fold_const_symbol]: 8.82e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.89999e-06 [pipeline_parallel_scheduler]: 2.01e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.00044509 [validate]: 3.172e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00584554 [execute]: 7.13998e-06 Sums bootstrap : 0.000472s : 3.33% type_inference : 0.004358s : 30.78% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000414s : 2.92% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.09% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000019s : 0.13% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000334s : 2.36% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000073s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000446s : 3.15% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000412s : 2.91% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000002s : 0.01%
auto_monad_reorder : 0.000015s : 0.11%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000003s : 0.02%
opt_after_jit_grad : 0.000445s : 3.14%
validate : 0.000032s : 0.22%
backend_pass : 0.000001s : 0.01%
task_emit : 0.005846s : 41.28%
execute : 0.000007s : 0.05%
Time group info:
------[substitution.] 0.000117 26
18.34% : 0.000021s : 4: substitution.arithmetic_simplify
1.44% : 0.000002s : 2: substitution.elim_not_effective
1.02% : 0.000001s : 2: substitution.fold_const_symbol
4.26% : 0.000005s : 4: substitution.graph_param_transform
64.97% : 0.000076s : 2: substitution.inline
2.56% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.83% : 0.000004s : 4: substitution.remove_not_recompute_node
3.59% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004319 2
91.73% : 0.003962s : 1: type_inference.infer
8.27% : 0.000357s : 1: type_inference.specialize
------[replace.] 0.000019 2
100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000074 2
100.00% : 0.000074s : 2: match.inline
------[predicate.]
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.24% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.87% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.92% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.83% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.51% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.26% : 0.000002s : 8: predicate.less_batch_normalization 1.64% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 26: predicate.load_eliminater 1.17% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.72% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.88% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.72% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.76% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.93% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.45% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.43% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 41.54% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.46% : 0.000138s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025952 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.26% : 0.002921s : 1: add_attr 11.22% : 0.002912s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.96% : 0.000507s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.33% : 0.001901s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.75% : 0.000454s : 1: opt_after_jit_grad 0.72% : 0.000186s : 1: opt_b 14.24% : 0.003696s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000182s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.56% : 0.005856s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform
16.84% : 0.004372s : 1: type_inference
0.22% : 0.000058s : 1: validate
. [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x9-kbk]
tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x9-kbk],max_mem:64.0M
TotalTime = 1.09167, [24]
[bootstrap]: 0.00054103
[type_inference]: 0.00595408
[event_method]: 1.393e-05
[auto_monad]: 5.948e-05
[graph_reusing]: 5.35999e-06
[inline]: 1.84998e-06
[add_attr]: 0.00334707, [1]
[add_attr_with_inline]: 0.00333623, [1]
[Cycle 1]: 4.421e-05, [2]
[tag_attr]: 1.539e-05
[meta_addattr_fg_expand]: 4.23001e-06
[parallel-infer-symbol]: 2.51e-06
[pre_auto_parallel]: 2.668e-05
[insert-virtual-dataset]: 2.73e-06
[parallel-infer-symbol-second]: 8.00006e-07
[dataset_repeat_opt]: 1.84998e-06
[pipeline_split]: 1.57001e-06
[optimize]: 0.0039506, [53]
[py_interpret_to_execute]: 2.046e-05
[rewriter_before_opt_a]: 5.966e-05
[opt_a]: 0.00209635, [2]
[Cycle 1]: 0.00149605, [45]
[expand_dump_flag]: 2.51998e-06
[switch_simplify]: 3.189e-05
[loop_unroll]: 2.106e-05
[a_1]: 0.00045338
[with_stream_mark]: 1.338e-05
[recompute_prepare]: 7.6e-06
[updatestate_depend_eliminate]: 3.64002e-06
[updatestate_assign_eliminate]: 3.13e-06
[updatestate_loads_eliminate]: 3.01999e-06
[parameter_eliminate]: 1.64e-06
[a_2]: 7.656e-05
[accelerated_algorithm]: 6.44001e-06
[shard]: 1.91e-06
[meta_shard_fg_expand]: 1.50001e-06
[shard_inline]: 5.95002e-06
[merge_send_recv]: 7.51999e-06
[auto_parallel]: 5.84e-06
[parallel]: 2.222e-05
[flash_sp]: 7.25e-06
[merge_comm]: 3.64002e-06
[allreduce_fusion]: 3.23e-06
[matmul_add_comm_reduction]: 8.43999e-06
[allreduce_slice_to_reducescatter]: 6.19999e-07
[virtual_shard_identity]: 7.26999e-06
[virtual_dataset]: 6.08998e-06
[get_grad_eliminate_]: 5.59e-06
[virtual_output]: 5.52001e-06
[merge_forward]: 3.53999e-06
[cell_reuse_recompute_pass]: 1.30001e-06
[offload_activation]: 9.11002e-06
[cell_reuse_handle_not_recompute_node_pass]: 1.075e-05
[merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.58997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.76e-06 [receive_attached]: 2.41e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00040131 [add_forward_monad_depend]: 4.3e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.27e-05 [cse]: 2.703e-05 [a_3]: 4.064e-05 [Cycle 2]: 0.00059106, [45] [expand_dump_flag]: 1.10001e-06 [switch_simplify]: 7.01001e-06 [loop_unroll]: 5.51998e-06 [a_1]: 0.00012609 [with_stream_mark]: 9.71998e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.798e-05 [accelerated_algorithm]: 5.44998e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.53002e-06 [parallel]: 4.18999e-06 [flash_sp]: 3.2e-06 [merge_comm]: 2.98998e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 4.93001e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 6.01998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.75998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.07002e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 8.89998e-06 [a_after_grad]: 8.42998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 5.96e-06 [cse]: 1.639e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 7.41999e-06 
[slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.188e-05 [convert_after_rewriter]: 6.88998e-06 [order_py_execute_after_rewriter]: 5.01997e-06 [mutable_eliminate]: 0.00044687 [opt_b]: 0.00018064, [1] [Cycle 1]: 0.00017447, [7] [b_1]: 0.00010777 [b_2]: 7.29001e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.41998e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 4.30009e-07 [cse]: 1.571e-05 [optimize_parallel_all_gather_comm]: 1.609e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.0004336 [opt_after_cconv]: 9.454e-05, [1] [Cycle 1]: 8.89e-05, [7] [c_1]: 2.782e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.558e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.203e-05 [tuple_transform]: 6.776e-05, [1] [Cycle 1]: 6.349e-05, [4] [d_1]: 3.836e-05 [none_parameter_eliminate]: 1.45001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.08002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.855e-05 [cse_after_recomputation]: 2.015e-05, [1] [Cycle 1]: 1.573e-05, [1] [cse]: 1.054e-05 [environ_conv]: 4.65001e-06 [swap_dp_allreduce_reducescatter]: 5.06002e-06 [bias_add_comm_swap]: 2.43e-06 [label_micro_interleaved_index]: 3.91001e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.07001e-06 [micro_interleaved_order_control]: 2.18002e-06 [assign_add_opt]: 1.40001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.23002e-06 [full_micro_interleaved_order_control]: 2.28002e-06 [reorder_send_recv_between_fp_bp]: 2.62001e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.05001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.87001e-06 [control_data_broadcast_order]: 1.125e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.39002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 1.91003e-06 [overlap_grad_ring_attention]: 3.82002e-06 [overlap_grad_flash_sp]: 1.732e-05 [begin_end_overlap_inline]: 6.39993e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 1.09e-06 [symbol_engine_optimizer]: 6.907e-05, [1] [Cycle 1]: 6.469e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.66002e-06 [elim_not_effective]: 1.149e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.99e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.47001e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.55e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.00044759 [validate]: 3.066e-05 [backend_pass]: 1.02e-06 [task_emit]: 1.07704 [execute]: 9.39e-06 Sums bootstrap : 0.000541s : 0.05% type_inference : 0.005954s : 0.55% event_method : 0.000014s : 0.00% auto_monad : 0.000059s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000060s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000027s : 0.00% optimize.opt_a.a_1 : 0.000579s : 0.05% optimize.opt_a.with_stream_mark 
: 0.000023s : 0.00% optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.00% optimize.opt_a.merge_send_recv : 0.000012s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000026s : 0.00% optimize.opt_a.flash_sp : 0.000010s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.00% optimize.opt_a.virtual_dataset : 0.000011s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000010s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.00% optimize.opt_a.a_after_grad : 0.000017s : 0.00% optimize.opt_a.renormalize : 0.000401s : 0.04% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% 
optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.00% optimize.opt_a.cse : 0.000043s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.04% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000434s : 0.04% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.00% optimize.tuple_transform.d_1 : 0.000038s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000049s : 0.00% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 
0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.04% validate : 0.000031s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 1.077039s : 99.05% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000161 30 14.43% : 0.000023s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.11% : 0.000005s : 4: substitution.graph_param_transform 67.19% : 0.000109s : 3: substitution.inline 1.85% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 2.20% : 0.000004s : 4: substitution.replace_old_param 6.79% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005909 2 90.77% : 0.005364s : 1: type_inference.infer 9.23% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.24% : 0.000027s : 3: replace.inline 30.76% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.43% : 0.000106s : 3: match.inline 8.57% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.86% : 0.000001s : 11: predicate.accumulaten_eliminater 0.99% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.87% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.21% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000010s : 51: predicate.inline 0.94% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.25% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.67% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.12% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.38% : 0.000001s : 4: predicate.parallel_virtual_node 1.66% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.87% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 1.96% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.02% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.81% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.40% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000331 8 46.05% : 0.000152s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.95% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
1.100467 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.30% : 0.003352s : 1: add_attr 0.30% : 0.003340s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.00% : 0.000053s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000065s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.05% : 0.000577s : 1: bootstrap 0.00% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000014s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.04% : 0.000442s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.04% : 0.000456s : 1: mutable_eliminate 0.00% : 0.000007s : 1: offloading_packed_experts 0.00% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.09% : 0.000945s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000032s : 4: opt.transform.symbol_engine_opt 0.19% : 0.002099s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.04% : 0.000457s : 1: opt_after_jit_grad 0.02% : 0.000184s : 1: opt_b 0.36% : 0.003954s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000031s : 1: pre_auto_parallel 0.00% : 0.000025s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000015s : 1: remove_dup_value 0.02% : 0.000206s : 1: renormalize.infer 0.02% : 0.000188s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000064s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.87% : 1.077061s : 1: task_emit 0.01% : 0.000071s : 1: 
tuple_transform 0.54% : 0.005968s : 1: type_inference 0.00% : 0.000055s : 1: validate TotalTime = 0.0705732, [24] [bootstrap]: 0.00047847 [type_inference]: 0.00437531 [event_method]: 1.075e-05 [auto_monad]: 4.947e-05 [graph_reusing]: 5.32001e-06 [inline]: 1.85001e-06 [add_attr]: 0.00297722, [1] [add_attr_with_inline]: 0.00296958, [1] [Cycle 1]: 4.452e-05, [2] [tag_attr]: 1.128e-05 [meta_addattr_fg_expand]: 3.25e-06 [parallel-infer-symbol]: 2.91e-06 [pre_auto_parallel]: 2.08e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.76998e-06 [optimize]: 0.00368448, [53] [py_interpret_to_execute]: 1.517e-05 [rewriter_before_opt_a]: 3.906e-05 [opt_a]: 0.00188994, [2] [Cycle 1]: 0.00124539, [45] [expand_dump_flag]: 2.94001e-06 [switch_simplify]: 2.322e-05 [loop_unroll]: 1.356e-05 [a_1]: 0.00029187 [with_stream_mark]: 1.352e-05 [recompute_prepare]: 7.38999e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.38e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.634e-05 [accelerated_algorithm]: 6.07999e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.47999e-06 [shard_inline]: 5.81e-06 [merge_send_recv]: 8.27003e-06 [auto_parallel]: 5.92001e-06 [parallel]: 1.828e-05 [flash_sp]: 7.1e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.8e-06 [matmul_add_comm_reduction]: 8.67998e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 6.79999e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 5.22e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 3.63999e-06 [cell_reuse_recompute_pass]: 1.14003e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.09e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 8.92e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 2.83998e-06 [receive_attached]: 2.34001e-06 
[after_resolve]: 9.99999e-06 [a_after_grad]: 8.52e-06 [renormalize]: 0.00034062 [add_forward_monad_depend]: 4.59998e-06 [auto_monad_grad]: 1.82999e-06 [auto_monad_eliminator]: 1.304e-05 [cse]: 2.691e-05 [a_3]: 3.974e-05 [Cycle 2]: 0.00063508, [45] [expand_dump_flag]: 1.08001e-06 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.32999e-06 [a_1]: 0.00016657 [with_stream_mark]: 9.51998e-06 [recompute_prepare]: 5.94e-06 [updatestate_depend_eliminate]: 2.76e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 6.845e-05 [accelerated_algorithm]: 5.49e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.44e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.18002e-06 [parallel]: 4.03001e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 2.84999e-06 [allreduce_fusion]: 2.72001e-06 [matmul_add_comm_reduction]: 5.15999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 5.25001e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.02e-06 [merge_forward]: 2.53998e-06 [cell_reuse_recompute_pass]: 1.37e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.92999e-06 [merge_recompute_call_nodes]: 6.60017e-07 [before_grad]: 8.22e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.61998e-06 [flash_sp_send_recv_attached]: 9.39996e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.22999e-06 [a_after_grad]: 7.93001e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 9.50007e-07 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 6.09999e-06 [cse]: 1.303e-05 [a_3]: 3.196e-05 [py_interpret_to_execute_after_opt_a]: 7.18e-06 [slice_cell_reuse_recomputed_activation]: 2.07999e-06 [rewriter_after_opt_a]: 3.024e-05 [convert_after_rewriter]: 6.91001e-06 [order_py_execute_after_rewriter]: 5.04e-06 [mutable_eliminate]: 0.00044573 [opt_b]: 0.00018157, [1] [Cycle 1]: 0.00017553, 
[7] [b_1]: 0.00010805 [b_2]: 7.18e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.32999e-06 [renormalize]: 3.7998e-07 [cse]: 1.606e-05 [optimize_parallel_all_gather_comm]: 1.558e-05 [overlap_param_gather]: 1.77999e-06 [cconv]: 2.182e-05 [loop_unroll]: 0.00041273 [opt_after_cconv]: 9.428e-05, [1] [Cycle 1]: 8.865e-05, [7] [c_1]: 2.714e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.59e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.133e-05 [tuple_transform]: 6.938e-05, [1] [Cycle 1]: 6.512e-05, [4] [d_1]: 3.94e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.19995e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.348e-05 [cse_after_recomputation]: 2.074e-05, [1] [Cycle 1]: 1.608e-05, [1] [cse]: 1.108e-05 [environ_conv]: 4.30999e-06 [swap_dp_allreduce_reducescatter]: 5.46e-06 [bias_add_comm_swap]: 2.14999e-06 [label_micro_interleaved_index]: 3.86001e-06 [label_fine_grained_interleaved_index]: 2.76999e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.61e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.39995e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.163e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.93001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.30002e-06 [overlap_grad_ring_attention]: 4.21001e-06 [overlap_grad_flash_sp]: 1.676e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.21998e-06 [split_layernorm_comm]: 2.09e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.672e-05, [1] [Cycle 1]: 6.251e-05, [6] [build]: 2.39001e-06 [elim_shapecalc]: 8.33999e-06 [elim_not_effective]: 1.109e-05 [opt_reshape]: 5.82001e-06 [fold_const_symbol]: 8.59e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.46998e-06 [auto_monad_reorder]: 1.489e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.28998e-06 [opt_after_jit_grad]: 0.0004485 [validate]: 2.99e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0582591 [execute]: 8e-06 Sums bootstrap : 0.000478s : 0.72% type_inference : 0.004375s : 6.57% event_method : 0.000011s : 0.02% auto_monad : 0.000049s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000458s : 0.69% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000341s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.67% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000413s : 0.62% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.67% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058259s : 87.42% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.23% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 0.96% : 0.000001s : 2: substitution.fold_const_symbol 4.29% : 0.000005s : 4: substitution.graph_param_transform 65.80% : 0.000080s : 2: substitution.inline 2.17% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.73% : 0.000005s : 4: substitution.remove_not_recompute_node 3.39% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004335 2 91.75% : 0.003977s : 1: type_inference.infer 8.25% : 0.000358s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.94% : 0.000001s : 9: predicate.accumulaten_eliminater 0.99% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.31% : 0.000003s : 17: predicate.arithmetic_simplify 0.86% : 0.000001s : 9: predicate.cast_eliminate 0.97% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.84% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.92% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.27% : 0.000009s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.81% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.85% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.33% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.30% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.81% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.09% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.55% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.52% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000253 6 43.20% : 0.000109s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.80% : 0.000143s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078540 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.80% : 0.002981s : 1: add_attr 3.79% : 0.002973s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000513s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.03% : 0.000806s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.41% : 0.001893s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.58% : 0.000457s : 1: opt_after_jit_grad 0.24% : 0.000185s : 1: opt_b 4.70% : 0.003688s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000185s : 1: renormalize.infer 0.19% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000069s : 1: symbol_engine_optimizer 74.20% : 0.058274s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.59% : 0.004389s : 1: type_inference 0.06% : 0.000049s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y8-dtype_x9-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x0-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x0-pynative],max_mem:64.0M TotalTime = 0.0223497, [24] [bootstrap]: 0.00175273 [type_inference]: 0.00608713 [event_method]: 1.447e-05 [auto_monad]: 5.467e-05 [graph_reusing]: 5.02e-06 [inline]: 2.19001e-06 [add_attr]: 0.00336895, [1] [add_attr_with_inline]: 0.00335845, [1] [Cycle 1]: 4.313e-05, [2] [tag_attr]: 1.497e-05 [meta_addattr_fg_expand]: 4.27e-06 [parallel-infer-symbol]: 3.01001e-06 [pre_auto_parallel]: 2.914e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.73002e-06 [pipeline_split]: 1.61002e-06 [optimize]: 0.00399009, [53] [py_interpret_to_execute]: 2.084e-05 [rewriter_before_opt_a]: 5.843e-05 [opt_a]: 0.00215519, [2] [Cycle 1]: 0.00155409, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 3.104e-05 [loop_unroll]: 2.046e-05 [a_1]: 0.00045083 [with_stream_mark]: 1.302e-05 [recompute_prepare]: 7.75e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 3.13998e-06 [parameter_eliminate]: 1.74e-06 [a_2]: 0.00011399 [accelerated_algorithm]: 6.70002e-06 [shard]: 2.13998e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 8.02998e-06 [auto_parallel]: 5.93002e-06 [parallel]: 2.312e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.14999e-06 [matmul_add_comm_reduction]: 8.58001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.77e-06 [virtual_dataset]: 5.88998e-06 
[get_grad_eliminate_]: 5.29998e-06 [virtual_output]: 5.87999e-06 [merge_forward]: 3.78999e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 8.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.058e-05 [merge_recompute_call_nodes]: 1.80001e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.22001e-06 [receive_attached]: 2.26e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 8.62998e-06 [renormalize]: 0.00042222 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.387e-05 [cse]: 2.748e-05 [a_3]: 4.073e-05 [Cycle 2]: 0.00059194, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.82002e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.00012493 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.59998e-06 [updatestate_depend_eliminate]: 2.71999e-06 [updatestate_assign_eliminate]: 2.17999e-06 [updatestate_loads_eliminate]: 2.48002e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.741e-05 [accelerated_algorithm]: 5.52001e-06 [shard]: 1.07998e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.37e-06 [auto_parallel]: 5.10001e-06 [parallel]: 4.01001e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 3.23998e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.21002e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.58e-06 [virtual_dataset]: 5.34e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 5.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.82999e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.18999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21999e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.82e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 
1.00001e-07 [add_forward_monad_depend]: 1.00999e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.22001e-06 [cse]: 1.295e-05 [a_3]: 3.173e-05 [py_interpret_to_execute_after_opt_a]: 7.9e-06 [slice_cell_reuse_recomputed_activation]: 2.44001e-06 [rewriter_after_opt_a]: 3.02e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.37001e-06 [mutable_eliminate]: 0.00044963 [opt_b]: 0.00018181, [1] [Cycle 1]: 0.00017554, [7] [b_1]: 0.000107 [b_2]: 7.27997e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.39991e-07 [cse]: 1.686e-05 [optimize_parallel_all_gather_comm]: 1.564e-05 [overlap_param_gather]: 1.71002e-06 [cconv]: 2.186e-05 [loop_unroll]: 0.00041545 [opt_after_cconv]: 9.535e-05, [1] [Cycle 1]: 8.972e-05, [7] [c_1]: 2.808e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.29999e-06 [cse]: 1.609e-05 [renormalize]: 3.29979e-07 [remove_dup_value]: 1.181e-05 [tuple_transform]: 6.891e-05, [1] [Cycle 1]: 6.465e-05, [4] [d_1]: 3.938e-05 [none_parameter_eliminate]: 1.39998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.527e-05 [cse_after_recomputation]: 2.05e-05, [1] [Cycle 1]: 1.619e-05, [1] [cse]: 1.111e-05 [environ_conv]: 4.95001e-06 [swap_dp_allreduce_reducescatter]: 4.80999e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.2e-06 [label_fine_grained_interleaved_index]: 2.79001e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.36998e-06 [micro_interleaved_order_control]: 2.21998e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.17e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.76999e-06 [comm_op_add_attrs]: 1.13001e-06 [add_comm_op_reuse_tag]: 
8.80013e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87999e-06 [control_data_broadcast_order]: 1.146e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.48e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.14999e-06 [overlap_grad_ring_attention]: 3.82002e-06 [overlap_grad_flash_sp]: 1.719e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.03997e-06 [split_layernorm_comm]: 1.73002e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.704e-05, [1] [Cycle 1]: 6.298e-05, [6] [build]: 2.41998e-06 [elim_shapecalc]: 7.93999e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 5.95002e-06 [fold_const_symbol]: 8.67998e-06 [renormalize]: 1.80007e-07 [detach_backward]: 1.97999e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.516e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 0.00011866 [opt_after_jit_grad]: 0.00045676 [validate]: 2.967e-05 [backend_pass]: 1.14e-06 [task_emit]: 0.00619701 [execute]: 6.79001e-06 Sums bootstrap : 0.001753s : 9.73% type_inference : 0.006087s : 33.80% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.30% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000029s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000058s : 0.32% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.21% optimize.opt_a.loop_unroll : 0.000026s : 0.14% optimize.opt_a.a_1 : 0.000576s : 3.20% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.07% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000181s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.06% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000008s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.08% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.11% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.10% optimize.opt_a.a_after_grad : 0.000017s : 0.09% optimize.opt_a.renormalize : 0.000422s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000040s : 0.22% optimize.opt_a.a_3 : 0.000072s : 0.40% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 2.50% optimize.opt_b.b_1 : 0.000107s : 0.59% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000415s : 2.31% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000045s : 0.25% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.06% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.08% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000119s : 0.66% opt_after_jit_grad : 0.000457s : 2.54% validate : 0.000030s : 0.16% backend_pass : 0.000001s : 0.01% task_emit : 0.006197s : 34.41% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 14.51% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.48% : 0.000006s : 4: substitution.graph_param_transform 66.76% : 0.000110s : 3: substitution.inline 1.78% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.40% : 0.000004s : 4: substitution.replace_old_param 6.62% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006045 2 90.78% : 0.005488s : 1: type_inference.infer 9.22% : 0.000557s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.55% : 0.000027s : 3: replace.inline 29.45% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.61% : 0.000108s : 3: match.inline 8.39% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000194 1131 0.72% : 0.000001s : 11: predicate.accumulaten_eliminater 0.74% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.46% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 11: predicate.addn_zero_filter 0.63% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 1.79% : 0.000003s : 19: predicate.arithmetic_simplify 0.70% : 0.000001s : 11: predicate.cast_eliminate 0.57% : 0.000001s : 8: predicate.check_bprop_eliminate 0.46% : 0.000001s : 8: predicate.compare_switch_simplify 0.20% : 0.000000s : 4: predicate.const_output_eliminate 0.52% : 0.000001s : 8: predicate.depend_value_elim 0.74% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.77% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.73% : 0.000001s : 11: predicate.dict_set_item_eliminator 0.99% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.22% : 0.000000s : 4: predicate.elim_not_effective 0.32% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 0.98% : 0.000002s : 15: predicate.environ_add_const_eliminate 0.87% : 0.000002s : 15: predicate.environ_get_add_eliminate 0.90% : 0.000002s : 15: predicate.environ_get_depend_swap 1.47% : 0.000003s : 23: predicate.environ_get_eliminate 0.89% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.04% : 0.000002s : 16: predicate.exchange_switch_depend_value 1.80% : 0.000004s : 16: predicate.float_depend_g_call 18.63% : 0.000036s : 8: predicate.float_environ_get_switch 0.71% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.60% : 0.000001s : 8: predicate.get_grad_eliminate 0.22% : 0.000000s : 4: predicate.graph_param_transform 0.59% : 0.000001s : 8: predicate.incorporate_call 0.44% : 0.000001s : 8: predicate.incorporate_call_switch 5.03% : 0.000010s : 51: predicate.inline 0.73% : 0.000001s : 8: predicate.inline_without_move 0.31% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.70% : 0.000001s : 8: predicate.less_batch_normalization 1.46% : 0.000003s 
: 21: predicate.list_to_tuple_eliminator_ 1.95% : 0.000004s : 32: predicate.load_eliminater 0.95% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.80% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.33% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.49% : 0.000001s : 8: predicate.merge_addn 0.53% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.48% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.63% : 0.000001s : 11: predicate.minmaximum_grad 0.96% : 0.000002s : 4: predicate.mutable_eliminate 0.30% : 0.000001s : 4: predicate.opt_reshape 0.31% : 0.000001s : 4: predicate.parallel_virtual_node 1.30% : 0.000003s : 16: predicate.partial_defer_inline 1.20% : 0.000002s : 17: predicate.partial_eliminate 0.69% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 0.89% : 0.000002s : 11: predicate.reduce_eliminate 1.97% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000001s : 8: predicate.remove_not_recompute_node 1.13% : 0.000002s : 21: predicate.replace_applicator 0.53% : 0.000001s : 8: predicate.replace_old_param 0.26% : 0.000001s : 4: predicate.reset_defer_inline 0.70% : 0.000001s : 11: predicate.reshape_eliminate 0.56% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.30% : 0.000001s : 4: predicate.row_tensor_eliminate 0.65% : 0.000001s : 8: predicate.same_eliminate 0.41% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000002s : 8: predicate.shard_identity_eliminate 0.64% : 0.000001s : 8: predicate.special_op_eliminate 0.67% : 0.000001s : 8: predicate.specialize_transform 0.80% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.66% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.31% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.11% : 0.000002s : 16: predicate.switch_defer_inline 1.63% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.07% : 0.000008s : 54: predicate.switch_simplify 0.70% 
: 0.000001s : 11: predicate.tile_eliminate 0.74% : 0.000001s : 11: predicate.transpose_eliminate 1.29% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.31% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.13% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 2.74% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.16% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 1.81% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.38% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 1.91% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 2.60% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.30% : 0.000001s : 4: predicate.value_based_eliminate 0.63% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.62% : 0.000001s : 8: predicate.virtual_output_eliminate 0.28% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.41% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000358 8 48.76% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.24% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.031260 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.79% : 0.003373s : 1: add_attr 10.75% : 0.003362s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000049s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.19% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000007s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 5.73% : 0.001790s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.36% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.47% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.13% : 0.000977s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 6.90% : 0.002158s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.49% : 0.000466s : 1: opt_after_jit_grad 0.59% : 0.000185s : 1: opt_b 12.78% : 0.003994s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.11% : 0.000034s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.68% : 0.000212s : 1: renormalize.infer 0.65% : 0.000204s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000124s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.20% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000070s : 1: symbol_engine_optimizer 19.86% : 0.006207s : 1: task_emit 0.23% : 0.000072s : 1: 
tuple_transform 19.52% : 0.006100s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0179708, [24] [bootstrap]: 0.00044882 [type_inference]: 0.00429587 [event_method]: 1.034e-05 [auto_monad]: 5.222e-05 [graph_reusing]: 5.85002e-06 [inline]: 2.36e-06 [add_attr]: 0.00293713, [1] [add_attr_with_inline]: 0.00292936, [1] [Cycle 1]: 4.52e-05, [2] [tag_attr]: 1.159e-05 [meta_addattr_fg_expand]: 3.71999e-06 [parallel-infer-symbol]: 2.91999e-06 [pre_auto_parallel]: 2.12e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.67999e-06 [pipeline_split]: 1.53002e-06 [optimize]: 0.00364629, [53] [py_interpret_to_execute]: 1.526e-05 [rewriter_before_opt_a]: 3.882e-05 [opt_a]: 0.00183615, [2] [Cycle 1]: 0.00123846, [45] [expand_dump_flag]: 2.51e-06 [switch_simplify]: 2.384e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00028843 [with_stream_mark]: 1.293e-05 [recompute_prepare]: 7.22002e-06 [updatestate_depend_eliminate]: 3.63999e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.81999e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.662e-05 [accelerated_algorithm]: 6.26e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.74e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 5.67001e-06 [parallel]: 1.674e-05 [flash_sp]: 7.23999e-06 [merge_comm]: 3.73999e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.92e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.46001e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.34998e-06 [virtual_output]: 5.44998e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.085e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.24998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.34001e-06 [flash_sp_send_recv_attached]: 2.64001e-06 [receive_attached]: 2.51e-06 [after_resolve]: 
1.071e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.0003351 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.92001e-06 [auto_monad_eliminator]: 1.31e-05 [cse]: 2.84e-05 [a_3]: 3.925e-05 [Cycle 2]: 0.00058864, [45] [expand_dump_flag]: 8.60018e-07 [switch_simplify]: 6.67002e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00012424 [with_stream_mark]: 1.038e-05 [recompute_prepare]: 5.38002e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.12999e-06 [updatestate_loads_eliminate]: 2.16e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.688e-05 [accelerated_algorithm]: 5.49998e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.65001e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.17e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.31999e-06 [merge_comm]: 2.83998e-06 [allreduce_fusion]: 2.56998e-06 [matmul_add_comm_reduction]: 4.99e-06 [allreduce_slice_to_reducescatter]: 2.30008e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.12999e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 5.00999e-06 [merge_forward]: 2.74001e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 5.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 8.77e-06 [a_after_grad]: 8.03001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.49978e-07 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.235e-05 [a_3]: 3.14e-05 [py_interpret_to_execute_after_opt_a]: 7.33e-06 [slice_cell_reuse_recomputed_activation]: 2.29001e-06 [rewriter_after_opt_a]: 3.039e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.51998e-06 [mutable_eliminate]: 0.0004417 [opt_b]: 0.00018094, [1] [Cycle 1]: 0.00017484, [7] [b_1]: 0.00010675 [b_2]: 
7.13e-06 [updatestate_depend_eliminate]: 4.90999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.43998e-06 [renormalize]: 3.69997e-07 [cse]: 1.63e-05 [optimize_parallel_all_gather_comm]: 1.534e-05 [overlap_param_gather]: 1.87999e-06 [cconv]: 2.24e-05 [loop_unroll]: 0.00042677 [opt_after_cconv]: 9.44e-05, [1] [Cycle 1]: 8.887e-05, [7] [c_1]: 2.758e-05 [parameter_eliminate]: 2.16998e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.74001e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.615e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 1.263e-05 [tuple_transform]: 6.937e-05, [1] [Cycle 1]: 6.505e-05, [4] [d_1]: 3.911e-05 [none_parameter_eliminate]: 1.56998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.02001e-06 [partial_unused_args_eliminate]: 1.66e-06 [add_recomputation]: 4.327e-05 [cse_after_recomputation]: 2.048e-05, [1] [Cycle 1]: 1.589e-05, [1] [cse]: 1.097e-05 [environ_conv]: 4.70001e-06 [swap_dp_allreduce_reducescatter]: 5.18002e-06 [bias_add_comm_swap]: 2.26e-06 [label_micro_interleaved_index]: 4.34002e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.14e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.52001e-06 [full_micro_interleaved_order_control]: 2.19999e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 1.00001e-06 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.11002e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.51002e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.35e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 [overlap_recompute_allgather_and_fa_grad]: 
1.30999e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 4.12998e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.20002e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.813e-05, [1] [Cycle 1]: 6.405e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.43001e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.69998e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.532e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.17002e-06 [opt_after_jit_grad]: 0.00044166 [validate]: 2.967e-05 [backend_pass]: 1.04003e-06 [task_emit]: 0.00584932 [execute]: 6.29001e-06 Sums bootstrap : 0.000449s : 3.19% type_inference : 0.004296s : 30.51% event_method : 0.000010s : 0.07% auto_monad : 0.000052s : 0.37% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000413s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% 
optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000335s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000041s : 0.29% optimize.opt_a.a_3 : 0.000071s : 0.50% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.02% optimize.rewriter_after_opt_a : 0.000030s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000442s : 3.14% optimize.opt_b.b_1 : 0.000107s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000427s : 3.03% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.28% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.31% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% 
optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000002s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 
0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000442s : 3.14% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005849s : 41.55% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000118 26 18.13% : 0.000021s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.87% : 0.000006s : 4: substitution.graph_param_transform 64.83% : 0.000077s : 2: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 4.00% : 0.000005s : 4: substitution.remove_not_recompute_node 3.35% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004256 2 91.91% : 0.003912s : 1: type_inference.infer 8.09% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.40% : 0.000003s : 17: predicate.arithmetic_simplify 0.83% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 1.00% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.02% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.40% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.74% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.94% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.55% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.47% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 43.31% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.69% : 0.000135s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025808 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.40% : 0.002941s : 1: add_attr 11.36% : 0.002933s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000485s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.69% : 0.000436s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000451s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000760s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.13% : 0.001839s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000451s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.14% : 0.003650s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000180s : 1: renormalize.infer 0.58% : 0.000149s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.70% : 0.005859s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.70% : 0.004310s : 1: type_inference 0.21% : 0.000054s : 1: validate TotalTime = 0.0195195, [24] [bootstrap]: 0.00046505 [type_inference]: 0.00545245 [event_method]: 1.415e-05 [auto_monad]: 5.232e-05 [graph_reusing]: 4.95001e-06 [inline]: 2.11e-06 [add_attr]: 0.00292319, [1] [add_attr_with_inline]: 0.00291481, [1] [Cycle 1]: 4.507e-05, [2] [tag_attr]: 1.513e-05 [meta_addattr_fg_expand]: 3.98001e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.439e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 2.16998e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00396092, [53] [py_interpret_to_execute]: 2.033e-05 [rewriter_before_opt_a]: 5.759e-05 [opt_a]: 0.00213749, [2] [Cycle 1]: 0.00154053, [45] [expand_dump_flag]: 2.78998e-06 [switch_simplify]: 3.222e-05 [loop_unroll]: 2.07e-05 [a_1]: 0.00045374 [with_stream_mark]: 1.309e-05 [recompute_prepare]: 7.51999e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 1.77999e-06 [a_2]: 7.526e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.23002e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 6.04001e-06 [merge_send_recv]: 8.13999e-06 [auto_parallel]: 5.74e-06 [parallel]: 1.62e-05 [flash_sp]: 6.83998e-06 [merge_comm]: 3.91001e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 8.69998e-06 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 6.68e-06 [virtual_dataset]: 5.92001e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.59998e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.116e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.33002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.42997e-06 [meta_fg_expand]: 2.07999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 
2.01003e-06 [after_resolve]: 9.91e-06 [a_after_grad]: 8.55001e-06 [renormalize]: 0.0003924 [add_forward_monad_depend]: 4.75001e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.29e-05 [cse]: 2.613e-05 [a_3]: 3.962e-05 [Cycle 2]: 0.00058769, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.53e-06 [loop_unroll]: 5.24e-06 [a_1]: 0.00012552 [with_stream_mark]: 9.65002e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.72e-05 [accelerated_algorithm]: 5.58002e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.62999e-06 [merge_send_recv]: 4.50001e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.57998e-06 [flash_sp]: 3.00002e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.82002e-06 [matmul_add_comm_reduction]: 4.80001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 5.87001e-06 [virtual_dataset]: 5.07999e-06 [get_grad_eliminate_]: 5.30999e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.42001e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.08998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.78002e-06 [flash_sp_send_recv_attached]: 8.80013e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.65999e-06 [a_after_grad]: 7.97998e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 7.79983e-07 [auto_monad_eliminator]: 6.34999e-06 [cse]: 1.284e-05 [a_3]: 3.267e-05 [py_interpret_to_execute_after_opt_a]: 7.95e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.079e-05 [convert_after_rewriter]: 7.05e-06 [order_py_execute_after_rewriter]: 5.09998e-06 [mutable_eliminate]: 0.0004515 [opt_b]: 0.00018115, [1] [Cycle 1]: 
0.00017511, [7] [b_1]: 0.00010786 [b_2]: 6.93998e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.30002e-06 [renormalize]: 3.99974e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.537e-05 [overlap_param_gather]: 1.82999e-06 [cconv]: 2.165e-05 [loop_unroll]: 0.00041187 [opt_after_cconv]: 9.474e-05, [1] [Cycle 1]: 8.905e-05, [7] [c_1]: 2.767e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.19999e-06 [cse]: 1.581e-05 [renormalize]: 4.40021e-07 [remove_dup_value]: 1.189e-05 [tuple_transform]: 6.845e-05, [1] [Cycle 1]: 6.404e-05, [4] [d_1]: 3.868e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 4.193e-05 [cse_after_recomputation]: 1.923e-05, [1] [Cycle 1]: 1.492e-05, [1] [cse]: 9.93002e-06 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.35999e-06 [bias_add_comm_swap]: 2.37001e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.43e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.30013e-07 [full_micro_interleaved_order_control]: 2.21998e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.25999e-06 [add_comm_op_reuse_tag]: 1.05001e-06 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.193e-05 [grouped_pairwise_exchange_alltoall]: 1.69998e-06 [offloading_packed_experts]: 3.43e-06 [overlap_recompute_and_grad_model_parallel]: 4.57998e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.16997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.42e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 3.66999e-06 [overlap_grad_flash_sp]: 1.688e-05 [begin_end_overlap_inline]: 5.20027e-07 [split_matmul_comm_elemetwise]: 2.27999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.05001e-06 [symbol_engine_optimizer]: 6.797e-05, [1] [Cycle 1]: 6.386e-05, [6] [build]: 2.36998e-06 [elim_shapecalc]: 8.30999e-06 [elim_not_effective]: 1.125e-05 [opt_reshape]: 6.08002e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.60999e-06 [auto_monad_reorder]: 1.506e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.33998e-06 [opt_after_jit_grad]: 0.00046991 [validate]: 2.999e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.00589103 [execute]: 6.69999e-06 Sums bootstrap : 0.000465s : 2.98% type_inference : 0.005452s : 34.98% event_method : 0.000014s : 0.09% auto_monad : 0.000052s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.25% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000579s : 3.72% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000393s : 2.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000039s : 0.25% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.90% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.64% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000470s : 3.01% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005891s : 37.80% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000164 30 15.47% : 0.000025s : 5: substitution.arithmetic_simplify 1.17% : 0.000002s : 2: substitution.elim_not_effective 0.79% : 0.000001s : 2: substitution.fold_const_symbol 3.19% : 0.000005s : 4: substitution.graph_param_transform 66.06% : 0.000108s : 3: substitution.inline 1.77% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.94% : 0.000005s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005414 2 90.30% : 0.004888s : 1: type_inference.infer 9.70% : 0.000525s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.22% : 0.000027s : 3: replace.inline 29.78% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.83% : 0.000106s : 3: match.inline 8.17% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.33% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.86% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.71% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.75% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.72% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.75% : 0.000003s : 16: predicate.partial_defer_inline 1.50% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.35% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.53% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.79% : 0.000001s : 8: predicate.shard_identity_eliminate 0.73% : 0.000001s : 8: predicate.special_op_eliminate 0.79% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.03% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.30% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000325 8 46.34% : 0.000151s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.66% : 0.000174s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027897 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.49% : 0.002927s : 1: add_attr 10.46% : 0.002919s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.79% : 0.000500s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.65% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.38% : 0.000942s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.67% : 0.002140s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.72% : 0.000480s : 1: opt_after_jit_grad 0.66% : 0.000184s : 1: opt_b 14.21% : 0.003965s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.09% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.71% : 0.000199s : 1: renormalize.infer 0.67% : 0.000187s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 21.15% : 0.005900s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.59% : 0.005465s : 1: type_inference 0.19% : 0.000054s : 1: validate TotalTime = 0.0369161, [24] [bootstrap]: 0.00050711 [type_inference]: 0.0112651 [event_method]: 4.565e-05 [auto_monad]: 0.00011892 [graph_reusing]: 8.07e-06 [inline]: 1.96e-06 [add_attr]: 0.00295955, [1] [add_attr_with_inline]: 0.00295164, [1] [Cycle 1]: 7.142e-05, [2] [tag_attr]: 3.443e-05 [meta_addattr_fg_expand]: 9.34e-06 [parallel-infer-symbol]: 2.52001e-06 [pre_auto_parallel]: 4.839e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.22999e-06 [pipeline_split]: 1.56998e-06 [optimize]: 0.0131507, [53] [py_interpret_to_execute]: 3.88e-05 [rewriter_before_opt_a]: 0.00014385 [opt_a]: 0.0108847, [3] [Cycle 1]: 0.00698952, [45] [expand_dump_flag]: 3.73999e-06 [switch_simplify]: 0.00012077 [loop_unroll]: 6.188e-05 [a_1]: 0.00143437 [with_stream_mark]: 2.23e-05 [recompute_prepare]: 2.132e-05 [updatestate_depend_eliminate]: 9.09998e-06 [updatestate_assign_eliminate]: 7.75998e-06 [updatestate_loads_eliminate]: 7.28e-06 [parameter_eliminate]: 2.71e-06 [a_2]: 0.0002436 [accelerated_algorithm]: 3.023e-05 [shard]: 1.99e-06 [meta_shard_fg_expand]: 3.38999e-06 [shard_inline]: 1.601e-05 [merge_send_recv]: 1.646e-05 [auto_parallel]: 1.097e-05 [parallel]: 1.782e-05 [flash_sp]: 1.094e-05 [merge_comm]: 9.67999e-06 [allreduce_fusion]: 8.89003e-06 [matmul_add_comm_reduction]: 2.6e-05 [allreduce_slice_to_reducescatter]: 5.70028e-07 [virtual_shard_identity]: 1.811e-05 [virtual_dataset]: 1.572e-05 [get_grad_eliminate_]: 1.494e-05 [virtual_output]: 1.488e-05 [merge_forward]: 9.27001e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.734e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.861e-05 [merge_recompute_call_nodes]: 1.95001e-06 [before_grad]: 2.658e-05 [set_forward_comm_id_for_comm_node_pass]: 9.32999e-06 [meta_fg_expand]: 0.00139029 [flash_sp_send_recv_attached]: 3.73001e-06 [receive_attached]: 2.79001e-06 
[after_resolve]: 5.918e-05 [a_after_grad]: 8.015e-05 [renormalize]: 0.00236853 [add_forward_monad_depend]: 9.47001e-06 [auto_monad_grad]: 5.15999e-06 [auto_monad_eliminator]: 5.487e-05 [cse]: 0.00016148 [a_3]: 0.00033432 [Cycle 2]: 0.00298266, [45] [expand_dump_flag]: 1.55999e-06 [switch_simplify]: 4.658e-05 [loop_unroll]: 4.368e-05 [a_1]: 0.00153606 [with_stream_mark]: 1.166e-05 [recompute_prepare]: 1.095e-05 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 4.28001e-06 [updatestate_loads_eliminate]: 3.69002e-06 [parameter_eliminate]: 1.05001e-06 [a_2]: 0.00012474 [accelerated_algorithm]: 1.164e-05 [shard]: 1.11002e-06 [meta_shard_fg_expand]: 1.78002e-06 [shard_inline]: 9.10001e-06 [merge_send_recv]: 6.57002e-06 [auto_parallel]: 7.23e-06 [parallel]: 4.60999e-06 [flash_sp]: 3.25e-06 [merge_comm]: 5.03002e-06 [allreduce_fusion]: 5.28002e-06 [matmul_add_comm_reduction]: 8.42998e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.028e-05 [virtual_dataset]: 9.00999e-06 [get_grad_eliminate_]: 8.43001e-06 [virtual_output]: 8.3e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 8.82e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.632e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.37e-05 [set_forward_comm_id_for_comm_node_pass]: 5.04e-06 [meta_fg_expand]: 6.799e-05 [flash_sp_send_recv_attached]: 9.10019e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 1.613e-05 [a_after_grad]: 1.421e-05 [renormalize]: 0.00057462 [add_forward_monad_depend]: 4.16001e-06 [auto_monad_grad]: 1.17999e-06 [auto_monad_eliminator]: 1.434e-05 [cse]: 4.457e-05 [a_3]: 6.541e-05 [Cycle 3]: 0.00089787, [45] [expand_dump_flag]: 9.60019e-07 [switch_simplify]: 1.06e-05 [loop_unroll]: 8.87e-06 [a_1]: 0.00025003 [with_stream_mark]: 9.66e-06 [recompute_prepare]: 9.16002e-06 [updatestate_depend_eliminate]: 4.58001e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 
3.91999e-06 [parameter_eliminate]: 9.50007e-07 [a_2]: 0.00012284 [accelerated_algorithm]: 1.147e-05 [shard]: 8.89995e-07 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 9.00999e-06 [merge_send_recv]: 6.84001e-06 [auto_parallel]: 7.1e-06 [parallel]: 4.35e-06 [flash_sp]: 1.02e-06 [merge_comm]: 4.71002e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.7e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 9.96e-06 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.48001e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.30999e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 8.2e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.702e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.439e-05 [set_forward_comm_id_for_comm_node_pass]: 5.51998e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.392e-05 [a_after_grad]: 1.401e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.19e-06 [auto_monad_grad]: 1.07998e-06 [auto_monad_eliminator]: 1.055e-05 [cse]: 2.473e-05 [a_3]: 5.967e-05 [py_interpret_to_execute_after_opt_a]: 9.77001e-06 [slice_cell_reuse_recomputed_activation]: 2.13002e-06 [rewriter_after_opt_a]: 4.609e-05 [convert_after_rewriter]: 8.94998e-06 [order_py_execute_after_rewriter]: 6.61e-06 [mutable_eliminate]: 0.00045899 [opt_b]: 0.00028618, [1] [Cycle 1]: 0.00028, [7] [b_1]: 0.0001886 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.26999e-06 [updatestate_assign_eliminate]: 4.10998e-06 [updatestate_loads_eliminate]: 3.94997e-06 [renormalize]: 3.80009e-07 [cse]: 3.064e-05 [optimize_parallel_all_gather_comm]: 1.938e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.052e-05 [loop_unroll]: 0.00044472 [opt_after_cconv]: 0.00013388, [1] [Cycle 1]: 0.00012807, [7] [c_1]: 4.829e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 7.25e-06 [updatestate_assign_eliminate]: 4.03001e-06 
[updatestate_loads_eliminate]: 4.01001e-06 [cse]: 2.894e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 2.842e-05 [tuple_transform]: 0.00010081, [1] [Cycle 1]: 9.598e-05, [4] [d_1]: 6.622e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 1.009e-05 [partial_unused_args_eliminate]: 1.84998e-06 [add_recomputation]: 5.535e-05 [cse_after_recomputation]: 3.165e-05, [1] [Cycle 1]: 2.706e-05, [1] [cse]: 2.164e-05 [environ_conv]: 8.54e-06 [swap_dp_allreduce_reducescatter]: 7.43e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.25999e-06 [label_fine_grained_interleaved_index]: 2.43998e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.31e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 8.2e-07 [full_micro_interleaved_order_control]: 2.26e-06 [reorder_send_recv_between_fp_bp]: 2.91999e-06 [comm_op_add_attrs]: 1.14998e-06 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52001e-06 [control_data_broadcast_order]: 1.678e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.62999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.33002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 4.82e-06 [overlap_grad_flash_sp]: 2.349e-05 [begin_end_overlap_inline]: 5.3001e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.91003e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 9.766e-05, [1] [Cycle 1]: 9.322e-05, [6] [build]: 9.39e-06 [elim_shapecalc]: 1.355e-05 [elim_not_effective]: 1.797e-05 [opt_reshape]: 9.79e-06 [fold_const_symbol]: 1.475e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.472e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.44001e-06 [opt_after_jit_grad]: 0.00046764 [validate]: 4.409e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.00804794 [execute]: 6.60997e-06 Sums bootstrap : 0.000507s : 1.55% type_inference : 0.011265s : 34.43% event_method : 0.000046s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.11% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000048s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000039s : 0.12% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000178s : 0.54% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003220s : 9.84% optimize.opt_a.with_stream_mark : 0.000044s : 0.13% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000491s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000019s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000019s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000034s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001461s : 4.47% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.02% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000108s : 0.33% optimize.opt_a.renormalize : 0.002943s : 9.00% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.24% optimize.opt_a.cse : 0.000231s : 0.71% optimize.opt_a.a_3 : 0.000459s : 1.40% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.40% optimize.opt_b.b_1 : 0.000189s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000019s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.06% optimize.loop_unroll : 0.000445s : 1.36% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000055s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000007s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000468s : 1.43% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008048s : 24.60% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000751 222 6.15% : 0.000046s : 12: substitution.arithmetic_simplify 1.74% : 0.000013s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.22% : 0.000415s : 17: substitution.inline 2.03% : 0.000015s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.97% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.93% : 0.000015s : 20: substitution.remove_not_recompute_node 3.17% : 0.000024s : 10: substitution.replace_applicator 1.49% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.66% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.34% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.62% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011194 2 86.93% : 0.009731s : 1: type_inference.infer 13.07% : 0.001463s : 1: type_inference.specialize ------[replace.] 0.000218 33 57.59% : 0.000125s : 17: replace.inline 42.41% : 0.000092s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000440 33 92.46% : 0.000407s : 17: match.inline 7.54% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5764 1.11% : 0.000008s : 68: predicate.accumulaten_eliminater 0.32% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.02% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.14% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.76% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 249: predicate.inline 1.25% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.60% : 0.000004s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.14% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000009s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.07% : 
0.000038s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000021s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.96% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.69% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001536 34 57.21% : 0.000879s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.79% : 0.000657s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061220 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.84% : 0.002964s : 1: add_attr 4.83% : 0.002955s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000059s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000126s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.88% : 0.000541s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000053s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.74% : 0.000454s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.04% : 0.004923s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.78% : 0.010888s : 1: opt_a 0.22% : 0.000137s : 1: opt_after_cconv 0.78% : 0.000477s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.49% : 0.013155s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000053s : 1: pre_auto_parallel 0.07% : 0.000043s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.60% : 0.001591s : 2: renormalize.infer 2.19% : 0.001340s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.24% : 0.000148s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000010s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.16% : 0.008058s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.42% : 0.011280s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0183013, [24] [bootstrap]: 0.0004655 [type_inference]: 0.00425762 [event_method]: 1.067e-05 [auto_monad]: 5.016e-05 [graph_reusing]: 5.16002e-06 [inline]: 1.86998e-06 [add_attr]: 0.00296703, [1] [add_attr_with_inline]: 0.00295906, [1] [Cycle 1]: 4.511e-05, [2] [tag_attr]: 1.168e-05 [meta_addattr_fg_expand]: 3.13998e-06 [parallel-infer-symbol]: 2.54999e-06 [pre_auto_parallel]: 2.111e-05 [insert-virtual-dataset]: 2.88e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.82999e-06 [optimize]: 0.00364437, [53] [py_interpret_to_execute]: 1.491e-05 [rewriter_before_opt_a]: 3.663e-05 [opt_a]: 0.00185624, [2] [Cycle 1]: 0.00123462, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 2.399e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00028804 [with_stream_mark]: 1.294e-05 [recompute_prepare]: 7.90998e-06 [updatestate_depend_eliminate]: 3.65e-06 [updatestate_assign_eliminate]: 3.03998e-06 [updatestate_loads_eliminate]: 2.86999e-06 [parameter_eliminate]: 1.69e-06 [a_2]: 7.755e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 7.83999e-06 [auto_parallel]: 5.59e-06 [parallel]: 1.754e-05 [flash_sp]: 7.21001e-06 [merge_comm]: 3.83001e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.80999e-06 [allreduce_slice_to_reducescatter]: 5.3001e-07 [virtual_shard_identity]: 7.13e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.39001e-06 [cell_reuse_recompute_pass]: 1.57999e-06 [offload_activation]: 8.92e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 [merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 1.004e-05 [set_forward_comm_id_for_comm_node_pass]: 3.37002e-06 [meta_fg_expand]: 2.41998e-06 [flash_sp_send_recv_attached]: 2.29001e-06 [receive_attached]: 2.12999e-06 
[after_resolve]: 1.017e-05 [a_after_grad]: 9.06998e-06 [renormalize]: 0.00032981 [add_forward_monad_depend]: 4.63999e-06 [auto_monad_grad]: 1.78002e-06 [auto_monad_eliminator]: 1.342e-05 [cse]: 2.624e-05 [a_3]: 4.044e-05 [Cycle 2]: 0.00061248, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.61999e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012426 [with_stream_mark]: 9.12999e-06 [recompute_prepare]: 5.83002e-06 [updatestate_depend_eliminate]: 2.75002e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.74001e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 6.815e-05 [accelerated_algorithm]: 5.50001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 2.791e-05 [parallel]: 4.05e-06 [flash_sp]: 3.26001e-06 [merge_comm]: 3.45e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 4.81002e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.43e-06 [virtual_dataset]: 5.08002e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.78998e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.87e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 8.65999e-06 [a_after_grad]: 8.61002e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.27e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.24001e-06 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 3.038e-05 [convert_after_rewriter]: 7.16001e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00044803 [opt_b]: 0.00018048, [1] [Cycle 1]: 0.00017449, [7] [b_1]: 
0.00010831 [b_2]: 7.22002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 3.7998e-07 [cse]: 1.539e-05 [optimize_parallel_all_gather_comm]: 1.594e-05 [overlap_param_gather]: 2.06e-06 [cconv]: 2.208e-05 [loop_unroll]: 0.00041256 [opt_after_cconv]: 9.308e-05, [1] [Cycle 1]: 8.764e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.557e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.219e-05 [tuple_transform]: 6.735e-05, [1] [Cycle 1]: 6.318e-05, [4] [d_1]: 3.836e-05 [none_parameter_eliminate]: 1.45999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.87001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 4.213e-05 [cse_after_recomputation]: 1.896e-05, [1] [Cycle 1]: 1.46e-05, [1] [cse]: 9.66e-06 [environ_conv]: 4.39002e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.66999e-06 [label_micro_interleaved_index]: 3.88001e-06 [label_fine_grained_interleaved_index]: 2.62001e-06 [merge_cast_opt]: 1.19003e-06 [slice_recompute_activation]: 2.28998e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.51002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.66e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.53002e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.52999e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.48999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 1.87999e-06 [overlap_grad_ring_attention]: 3.84002e-06 [overlap_grad_flash_sp]: 1.632e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.42001e-06 [split_layernorm_comm]: 1.57001e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.668e-05, [1] [Cycle 1]: 6.27e-05, [6] [build]: 2.17001e-06 [elim_shapecalc]: 8.05999e-06 [elim_not_effective]: 1.123e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 8.58001e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00044648 [validate]: 2.946e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.00617185 [execute]: 6.86001e-06 Sums bootstrap : 0.000466s : 3.24% type_inference : 0.004258s : 29.59% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000037s : 0.25% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000412s : 2.87% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000033s : 0.23% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000018s : 0.12% optimize.opt_a.renormalize : 0.000330s : 2.29% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.11% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000413s : 2.87% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000446s : 3.10% validate : 0.000029s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.006172s : 42.90% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000116 26 18.33% : 0.000021s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.47% : 0.000005s : 4: substitution.graph_param_transform 65.68% : 0.000076s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.66% : 0.000004s : 4: substitution.remove_not_recompute_node 3.04% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004217 2 92.16% : 0.003886s : 1: type_inference.infer 7.84% : 0.000331s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000134 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.99% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.35% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.88% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.69% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000008s : 44: predicate.inline 1.04% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000001s : 8: predicate.less_batch_normalization 1.61% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.14% : 0.000002s : 4: predicate.mutable_eliminate 0.51% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.83% : 0.000001s : 9: predicate.print_const_string_wrapper 1.02% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.11% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.78% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.49% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.09% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.23% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.89% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000228 6 43.53% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.47% : 0.000129s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026165 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.36% : 0.002971s : 1: add_attr 11.32% : 0.002962s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.91% : 0.000500s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.92% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.16% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.11% : 0.001859s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.74% : 0.000456s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 13.94% : 0.003648s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.69% : 0.000181s : 1: renormalize.infer 0.55% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000041s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.26% : 0.000069s : 1: symbol_engine_optimizer 23.62% : 0.006181s : 1: task_emit 0.27% : 0.000070s : 1: 
tuple_transform 16.32% : 0.004271s : 1: type_inference 0.21% : 0.000055s : 1: validate TotalTime = 0.0357293, [24] [bootstrap]: 0.00050495 [type_inference]: 0.0101444 [event_method]: 3.992e-05 [auto_monad]: 0.00011359 [graph_reusing]: 8.36002e-06 [inline]: 1.97999e-06 [add_attr]: 0.00300315, [1] [add_attr_with_inline]: 0.00299498, [1] [Cycle 1]: 6.655e-05, [2] [tag_attr]: 3.155e-05 [meta_addattr_fg_expand]: 8.38999e-06 [parallel-infer-symbol]: 2.61999e-06 [pre_auto_parallel]: 4.452e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 6.90023e-07 [dataset_repeat_opt]: 2.02001e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.0129799, [53] [py_interpret_to_execute]: 3.452e-05 [rewriter_before_opt_a]: 0.00012712 [opt_a]: 0.0107538, [3] [Cycle 1]: 0.00684654, [45] [expand_dump_flag]: 4.08999e-06 [switch_simplify]: 6.647e-05 [loop_unroll]: 5.482e-05 [a_1]: 0.00133062 [with_stream_mark]: 2.222e-05 [recompute_prepare]: 2.128e-05 [updatestate_depend_eliminate]: 8.80001e-06 [updatestate_assign_eliminate]: 7.41999e-06 [updatestate_loads_eliminate]: 7.35998e-06 [parameter_eliminate]: 2.56e-06 [a_2]: 0.00024455 [accelerated_algorithm]: 3.136e-05 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 3.51001e-06 [shard_inline]: 1.612e-05 [merge_send_recv]: 1.551e-05 [auto_parallel]: 1.066e-05 [parallel]: 1.809e-05 [flash_sp]: 1.164e-05 [merge_comm]: 9.44998e-06 [allreduce_fusion]: 8.69998e-06 [matmul_add_comm_reduction]: 2.728e-05 [allreduce_slice_to_reducescatter]: 5.39992e-07 [virtual_shard_identity]: 1.751e-05 [virtual_dataset]: 1.555e-05 [get_grad_eliminate_]: 1.528e-05 [virtual_output]: 1.514e-05 [merge_forward]: 9.27001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 1.769e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.823e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 2.752e-05 [set_forward_comm_id_for_comm_node_pass]: 9.33002e-06 [meta_fg_expand]: 0.00139452 [flash_sp_send_recv_attached]: 3.62998e-06 [receive_attached]: 2.37001e-06 
[after_resolve]: 5.829e-05 [a_after_grad]: 8.051e-05 [renormalize]: 0.00234789 [add_forward_monad_depend]: 9.15999e-06 [auto_monad_grad]: 5.23002e-06 [auto_monad_eliminator]: 5.688e-05 [cse]: 0.00019251 [a_3]: 0.00033678 [Cycle 2]: 0.00293979, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.695e-05 [loop_unroll]: 4.364e-05 [a_1]: 0.00153732 [with_stream_mark]: 1.178e-05 [recompute_prepare]: 1.061e-05 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 4.53999e-06 [updatestate_loads_eliminate]: 3.82998e-06 [parameter_eliminate]: 1.09003e-06 [a_2]: 0.00012762 [accelerated_algorithm]: 1.211e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.79998e-06 [shard_inline]: 9.60001e-06 [merge_send_recv]: 7.03998e-06 [auto_parallel]: 7.41999e-06 [parallel]: 4.52e-06 [flash_sp]: 3.21999e-06 [merge_comm]: 5.22999e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.75998e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.024e-05 [virtual_dataset]: 8.83001e-06 [get_grad_eliminate_]: 8.95001e-06 [virtual_output]: 8.50001e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.10001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.695e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.417e-05 [set_forward_comm_id_for_comm_node_pass]: 5.31002e-06 [meta_fg_expand]: 3.405e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.481e-05 [a_after_grad]: 1.39e-05 [renormalize]: 0.00056406 [add_forward_monad_depend]: 4.1e-06 [auto_monad_grad]: 1.36002e-06 [auto_monad_eliminator]: 1.436e-05 [cse]: 4.521e-05 [a_3]: 6.552e-05 [Cycle 3]: 0.00095333, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 1.066e-05 [loop_unroll]: 8.64e-06 [a_1]: 0.00024993 [with_stream_mark]: 9.70002e-06 [recompute_prepare]: 9.61e-06 [updatestate_depend_eliminate]: 4.62e-06 [updatestate_assign_eliminate]: 3.80998e-06 [updatestate_loads_eliminate]: 
3.8e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 0.00012358 [accelerated_algorithm]: 1.158e-05 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 9.14998e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 6.96001e-06 [parallel]: 4.59002e-06 [flash_sp]: 1.10001e-06 [merge_comm]: 5.29e-06 [allreduce_fusion]: 4.94998e-06 [matmul_add_comm_reduction]: 7.51999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 8.99e-06 [get_grad_eliminate_]: 8.57e-06 [virtual_output]: 8.45001e-06 [merge_forward]: 4.32998e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.725e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.446e-05 [set_forward_comm_id_for_comm_node_pass]: 5.50001e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.09e-06 [after_resolve]: 1.356e-05 [a_after_grad]: 1.474e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.44e-06 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 1.059e-05 [cse]: 2.577e-05 [a_3]: 6.068e-05 [py_interpret_to_execute_after_opt_a]: 9.89001e-06 [slice_cell_reuse_recomputed_activation]: 2.06003e-06 [rewriter_after_opt_a]: 4.616e-05 [convert_after_rewriter]: 8.83001e-06 [order_py_execute_after_rewriter]: 6.93998e-06 [mutable_eliminate]: 0.00045954 [opt_b]: 0.00028664, [1] [Cycle 1]: 0.0002801, [7] [b_1]: 0.00018847 [b_2]: 1.088e-05 [updatestate_depend_eliminate]: 7.23e-06 [updatestate_assign_eliminate]: 4.1e-06 [updatestate_loads_eliminate]: 4.04002e-06 [renormalize]: 3.69997e-07 [cse]: 3.016e-05 [optimize_parallel_all_gather_comm]: 2.055e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 1.938e-05 [loop_unroll]: 0.00042134 [opt_after_cconv]: 0.00013481, [1] [Cycle 1]: 0.00012889, [7] [c_1]: 4.913e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 6.99001e-06 [updatestate_assign_eliminate]: 4.07e-06 
[updatestate_loads_eliminate]: 3.89002e-06 [cse]: 2.911e-05 [renormalize]: 2.60014e-07 [remove_dup_value]: 2.927e-05 [tuple_transform]: 0.00010053, [1] [Cycle 1]: 9.614e-05, [4] [d_1]: 6.65e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 9.64e-06 [partial_unused_args_eliminate]: 1.97999e-06 [add_recomputation]: 5.76e-05 [cse_after_recomputation]: 3.14e-05, [1] [Cycle 1]: 2.681e-05, [1] [cse]: 2.134e-05 [environ_conv]: 8.28999e-06 [swap_dp_allreduce_reducescatter]: 8.02e-06 [bias_add_comm_swap]: 2.21998e-06 [label_micro_interleaved_index]: 4.85999e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.23002e-06 [reorder_send_recv_between_fp_bp]: 2.35002e-06 [comm_op_add_attrs]: 1.24e-06 [add_comm_op_reuse_tag]: 8.59989e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.50999e-06 [control_data_broadcast_order]: 1.698e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 5.05999e-06 [overlap_recompute_and_grad_model_parallel]: 5.66003e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.24998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47001e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 4.94e-06 [overlap_grad_flash_sp]: 2.474e-05 [begin_end_overlap_inline]: 5.19998e-07 [split_matmul_comm_elemetwise]: 2.36998e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 9.606e-05, [1] [Cycle 1]: 9.203e-05, [6] [build]: 8.40001e-06 [elim_shapecalc]: 1.345e-05 [elim_not_effective]: 1.768e-05 [opt_reshape]: 1.023e-05 [fold_const_symbol]: 1.48e-05 [renormalize]: 2.30008e-07 
[detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.63002e-06 [auto_monad_reorder]: 2.428e-05 [get_jit_bprop_graph]: 1.05999e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00047116 [validate]: 4.339e-05 [backend_pass]: 9.29984e-07 [task_emit]: 0.0080921 [execute]: 7e-06 Sums bootstrap : 0.000505s : 1.61% type_inference : 0.010144s : 32.30% event_method : 0.000040s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000007s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.40% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003118s : 9.93% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000496s : 1.58% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.18% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 0.06% 
optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.14% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.11% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001432s : 4.56% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000109s : 0.35% optimize.opt_a.renormalize : 0.002912s : 9.27% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.26% optimize.opt_a.cse : 0.000263s : 0.84% optimize.opt_a.a_3 : 0.000463s : 1.47% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000046s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000460s : 1.46% optimize.opt_b.b_1 : 0.000188s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.07% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000421s : 1.34% optimize.opt_after_cconv.c_1 : 0.000049s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000058s : 0.18% optimize.cse_after_recomputation.cse : 0.000021s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000025s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000008s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000024s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000471s : 1.50% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008092s : 25.76% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000732 218
6.14% : 0.000045s : 11: substitution.arithmetic_simplify
1.91% : 0.000014s : 2: substitution.cast_eliminate
0.36% : 0.000003s : 5: substitution.elim_not_effective
0.54% : 0.000004s : 5: substitution.float_depend_g_call
0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch
0.29% : 0.000002s : 5: substitution.fold_const_symbol
1.06% : 0.000008s : 8: substitution.graph_param_transform
0.35% : 0.000003s : 2: substitution.incorporate_call
0.25% : 0.000002s : 2: substitution.incorporate_call_switch
54.87% : 0.000401s : 16: substitution.inline
2.14% : 0.000016s : 2: substitution.inline_without_move
1.46% : 0.000011s : 20: substitution.j_node_and_user_rematch
2.04% : 0.000015s : 3: substitution.less_batch_normalization
1.79% : 0.000013s : 11: substitution.minmaximum_grad
0.70% : 0.000005s : 5: substitution.partial_eliminate
1.92% : 0.000014s : 20: substitution.remove_not_recompute_node
3.19% : 0.000023s : 10: substitution.replace_applicator
1.46% : 0.000011s : 15: substitution.replace_old_param
0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute
3.66% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive
1.92% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator
2.38% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder
8.17% : 0.000060s : 28: substitution.tuple_list_get_item_eliminator
2.50% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator
------[type_inference.] 0.010078 2
87.43% : 0.008811s : 1: type_inference.infer
12.57% : 0.001267s : 1: type_inference.specialize
------[replace.] 0.000205 30
59.12% : 0.000121s : 16: replace.inline
40.88% : 0.000084s : 14: replace.tuple_list_get_item_eliminator
------[match.] 0.000422 30
93.16% : 0.000393s : 16: match.inline
6.84% : 0.000029s : 14: match.tuple_list_get_item_eliminator
------[predicate.]
0.000736 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.05% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000008s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.74% : 0.000013s : 107: predicate.environ_get_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.72% : 0.000013s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.59% : 0.000004s : 32: predicate.get_grad_eliminate 0.11% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.67% : 0.000042s : 244: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.42% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.00% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000009s : 67: predicate.reduce_eliminate 2.67% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.94% : 0.000014s : 149: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.32% : 0.000010s : 68: predicate.same_eliminate 0.38% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.32% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.17% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000036s : 265: 
predicate.switch_simplify
1.07% : 0.000008s : 67: predicate.tile_eliminate
1.07% : 0.000008s : 67: predicate.transpose_eliminate
1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive
1.56% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator
1.37% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder
2.84% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator
1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator
2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator
1.58% : 0.000012s : 97: predicate.tuple_to_list_eliminator_
2.68% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater
3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater
0.14% : 0.000001s : 8: predicate.value_based_eliminate
0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate
0.56% : 0.000004s : 32: predicate.virtual_output_eliminate
0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate
0.17% : 0.000001s : 8: predicate.zero_like_fill_zero
------[func_graph_cloner_run.] 0.001501 32
58.11% : 0.000872s : 12: func_graph_cloner_run.FuncGraphClonerGraph
41.89% : 0.000629s : 20: func_graph_cloner_run.FuncGraphSpecializer
------[meta_graph.] 0.000000 0
------[manager.] 0.000000 0
------[pynative] 0.000000 0
------[others.]
0.059724 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.04% : 0.003008s : 1: add_attr 5.02% : 0.002999s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000062s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.95% : 0.000568s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000046s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.72% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000469s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 7.99% : 0.004771s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 18.01% : 0.010757s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.81% : 0.000481s : 1: opt_after_jit_grad 0.49% : 0.000290s : 1: opt_b 21.74% : 0.012984s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.05% : 0.000028s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.08% : 0.000049s : 1: pre_auto_parallel 0.06% : 0.000038s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000033s : 1: remove_dup_value 2.55% : 0.001525s : 2: renormalize.infer 2.30% : 0.001374s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000050s : 1: rewriter_after_opt_a 0.22% : 0.000132s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000099s : 1: symbol_engine_optimizer 13.56% : 0.008102s : 1: task_emit 0.17% : 0.000103s : 1: 
tuple_transform 17.01% : 0.010160s : 1: type_inference 0.12% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x0-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x0-kbk],max_mem:64.0M TotalTime = 0.0810648, [24] [bootstrap]: 0.00061711 [type_inference]: 0.00730939 [event_method]: 1.356e-05 [auto_monad]: 5.41e-05 [graph_reusing]: 5.52001e-06 [inline]: 2.04e-06 [add_attr]: 0.00351268, [1] [add_attr_with_inline]: 0.00350139, [1] [Cycle 1]: 4.435e-05, [2] [tag_attr]: 1.443e-05 [meta_addattr_fg_expand]: 4.09002e-06 [parallel-infer-symbol]: 2.76999e-06 [pre_auto_parallel]: 2.726e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.84998e-06 [pipeline_split]: 1.69e-06 [optimize]: 0.00398494, [53] [py_interpret_to_execute]: 1.981e-05 [rewriter_before_opt_a]: 5.818e-05 [opt_a]: 0.00213879, [2] [Cycle 1]: 0.00154847, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.193e-05 [loop_unroll]: 2.05e-05 [a_1]: 0.0004521 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 4.07e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 1.72001e-06 [a_2]: 0.00011976 [accelerated_algorithm]: 6.59001e-06 [shard]: 2.46e-06 [meta_shard_fg_expand]: 1.54998e-06 [shard_inline]: 6.07999e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 6.24001e-06 [parallel]: 2.343e-05 [flash_sp]: 7.46999e-06 [merge_comm]: 3.58999e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 8.56002e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 7.67998e-06 [virtual_dataset]: 6.17001e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.61998e-06 [merge_forward]: 3.83999e-06 [cell_reuse_recompute_pass]: 1.23002e-06 [offload_activation]: 8.69998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.093e-05 
[merge_recompute_call_nodes]: 1.30001e-06 [before_grad]: 9.42999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.79001e-06 [receive_attached]: 2.19001e-06 [after_resolve]: 1.055e-05 [a_after_grad]: 8.69003e-06 [renormalize]: 0.000407 [add_forward_monad_depend]: 4.72998e-06 [auto_monad_grad]: 2.05002e-06 [auto_monad_eliminator]: 1.325e-05 [cse]: 2.692e-05 [a_3]: 3.957e-05 [Cycle 2]: 0.00058117, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 6.58e-06 [loop_unroll]: 5.46002e-06 [a_1]: 0.00012418 [with_stream_mark]: 9.40001e-06 [recompute_prepare]: 5.89e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.56998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.736e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 9.5999e-07 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.20001e-06 [parallel]: 4.23999e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 2.99999e-06 [allreduce_fusion]: 2.59999e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 2.79979e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.24e-06 [virtual_output]: 5.01002e-06 [merge_forward]: 2.43e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.39e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.74002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.90002e-06 [meta_fg_expand]: 1.55999e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 9.90025e-07 [after_resolve]: 8.87e-06 [a_after_grad]: 7.95998e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.09989e-07 [auto_monad_grad]: 9.5999e-07 [auto_monad_eliminator]: 5.79e-06 [cse]: 1.225e-05 [a_3]: 3.141e-05 [py_interpret_to_execute_after_opt_a]: 7.56999e-06 
[slice_cell_reuse_recomputed_activation]: 2.01003e-06 [rewriter_after_opt_a]: 3.333e-05 [convert_after_rewriter]: 7.66999e-06 [order_py_execute_after_rewriter]: 5.25001e-06 [mutable_eliminate]: 0.00045812 [opt_b]: 0.00017992, [1] [Cycle 1]: 0.00017396, [7] [b_1]: 0.00010592 [b_2]: 7.15e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 5.20027e-07 [cse]: 1.638e-05 [optimize_parallel_all_gather_comm]: 1.567e-05 [overlap_param_gather]: 2.25002e-06 [cconv]: 2.215e-05 [loop_unroll]: 0.00041796 [opt_after_cconv]: 9.391e-05, [1] [Cycle 1]: 8.832e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.40999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.12001e-06 [cse]: 1.603e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.814e-05, [1] [Cycle 1]: 6.373e-05, [4] [d_1]: 3.828e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.19999e-06 [partial_unused_args_eliminate]: 1.62999e-06 [add_recomputation]: 4.832e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.585e-05, [1] [cse]: 1.086e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.76999e-06 [label_micro_interleaved_index]: 3.97e-06 [label_fine_grained_interleaved_index]: 2.78003e-06 [merge_cast_opt]: 1.14e-06 [slice_recompute_activation]: 2.49001e-06 [micro_interleaved_order_control]: 2.09999e-06 [assign_add_opt]: 1.56998e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.12e-06 [full_micro_interleaved_order_control]: 2.42001e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.10019e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.71e-06 [control_data_broadcast_order]: 1.152e-05 [grouped_pairwise_exchange_alltoall]: 1.54e-06 [offloading_packed_experts]: 3.62998e-06 [overlap_recompute_and_grad_model_parallel]: 4.03999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.84998e-06 [overlap_grad_ring_attention]: 4.22998e-06 [overlap_grad_flash_sp]: 1.701e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.688e-05, [1] [Cycle 1]: 6.275e-05, [6] [build]: 2.59999e-06 [elim_shapecalc]: 8.02e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 5.99999e-06 [fold_const_symbol]: 8.65999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.97001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.536e-05 [get_jit_bprop_graph]: 9.50007e-07 [rewriter_after_jit_bprop_graph]: 3.25998e-06 [opt_after_jit_grad]: 0.00045121 [validate]: 3.105e-05 [backend_pass]: 8.40024e-07 [task_emit]: 0.06481 [execute]: 8.02e-06 Sums bootstrap : 0.000617s : 0.81% type_inference : 0.007309s : 9.54% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.03% optimize.opt_a.a_1 : 0.000576s : 0.75% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000187s : 0.24% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000407s : 0.53% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000071s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000033s : 0.04% optimize.convert_after_rewriter : 0.000008s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.60% optimize.opt_b.b_1 : 0.000106s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000418s : 0.55% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000451s : 0.59% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064810s : 84.62% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.79% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.12% : 0.000005s : 4: substitution.graph_param_transform 66.60% : 0.000109s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.60% : 0.000004s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.96% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.007266 2 92.27% : 0.006705s : 1: type_inference.infer 7.73% : 0.000562s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.39% : 0.000028s : 3: replace.inline 29.61% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.20% : 0.000107s : 3: match.inline 8.80% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.85% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.72% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.65% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.38% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.19% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.87% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.81% : 0.000001s : 8: predicate.less_batch_normalization 1.78% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.50% : 0.000004s : 32: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.62% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.22% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.45% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.55% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.08% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.14% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 1.02% : 0.000002s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.68% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.33% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000338 8 45.30% : 0.000153s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.70% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.090059 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.91% : 0.003517s : 1: add_attr 3.89% : 0.003505s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.73% : 0.000653s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.47% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.05% : 0.000941s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.38% : 0.002142s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.51% : 0.000461s : 1: opt_after_jit_grad 0.20% : 0.000183s : 1: opt_b 4.43% : 0.003989s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000211s : 1: renormalize.infer 0.21% : 0.000190s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000037s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 71.98% : 0.064827s : 1: task_emit 0.08% : 0.000071s : 1: 
tuple_transform 8.13% : 0.007323s : 1: type_inference 0.06% : 0.000056s : 1: validate TotalTime = 0.0705826, [24] [bootstrap]: 0.0004853 [type_inference]: 0.00441052 [event_method]: 1.058e-05 [auto_monad]: 5.018e-05 [graph_reusing]: 5.49e-06 [inline]: 1.74e-06 [add_attr]: 0.00298937, [1] [add_attr_with_inline]: 0.00298147, [1] [Cycle 1]: 4.55e-05, [2] [tag_attr]: 1.16e-05 [meta_addattr_fg_expand]: 3.04999e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 2.144e-05 [insert-virtual-dataset]: 2.92002e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.59998e-06 [optimize]: 0.00366375, [53] [py_interpret_to_execute]: 1.529e-05 [rewriter_before_opt_a]: 3.855e-05 [opt_a]: 0.00183641, [2] [Cycle 1]: 0.00123959, [45] [expand_dump_flag]: 2.32001e-06 [switch_simplify]: 2.434e-05 [loop_unroll]: 1.391e-05 [a_1]: 0.00028865 [with_stream_mark]: 1.297e-05 [recompute_prepare]: 7.26001e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 2.98998e-06 [updatestate_loads_eliminate]: 2.71e-06 [parameter_eliminate]: 1.82001e-06 [a_2]: 7.64e-05 [accelerated_algorithm]: 6.14001e-06 [shard]: 2.14999e-06 [meta_shard_fg_expand]: 1.42e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 5.54998e-06 [parallel]: 1.813e-05 [flash_sp]: 7.36999e-06 [merge_comm]: 3.4e-06 [allreduce_fusion]: 3.18e-06 [matmul_add_comm_reduction]: 8.37998e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 6.71999e-06 [virtual_dataset]: 5.64e-06 [get_grad_eliminate_]: 5.45001e-06 [virtual_output]: 5.35001e-06 [merge_forward]: 3.47997e-06 [cell_reuse_recompute_pass]: 9.99979e-07 [offload_activation]: 8.87e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.072e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.10001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.34999e-06 [flash_sp_send_recv_attached]: 2.29999e-06 [receive_attached]: 2.14e-06 
[after_resolve]: 1.02e-05 [a_after_grad]: 8.80999e-06 [renormalize]: 0.00034113 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.54e-06 [auto_monad_eliminator]: 1.327e-05 [cse]: 2.725e-05 [a_3]: 4.073e-05 [Cycle 2]: 0.00058773, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.58e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012213 [with_stream_mark]: 1.104e-05 [recompute_prepare]: 5.66998e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.23998e-06 [updatestate_loads_eliminate]: 2.39999e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.776e-05 [accelerated_algorithm]: 5.49998e-06 [shard]: 1.09998e-06 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.61e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.34e-06 [parallel]: 4.42e-06 [flash_sp]: 3.46001e-06 [merge_comm]: 3.04001e-06 [allreduce_fusion]: 2.71999e-06 [matmul_add_comm_reduction]: 4.84e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.13002e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.99998e-06 [merge_forward]: 2.39999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.71e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.63002e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.76001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 9.62999e-06 [a_after_grad]: 8.13001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 9.60019e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.218e-05 [a_3]: 3.181e-05 [py_interpret_to_execute_after_opt_a]: 7.25e-06 [slice_cell_reuse_recomputed_activation]: 1.70001e-06 [rewriter_after_opt_a]: 3.073e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 5.19e-06 [mutable_eliminate]: 0.00045188 [opt_b]: 0.0001801, [1] [Cycle 1]: 0.00017413, [7] [b_1]: 0.00010686 
[b_2]: 7.35e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 3.70026e-07 [cse]: 1.59e-05 [optimize_parallel_all_gather_comm]: 1.541e-05 [overlap_param_gather]: 2.07001e-06 [cconv]: 2.219e-05 [loop_unroll]: 0.00041282 [opt_after_cconv]: 9.347e-05, [1] [Cycle 1]: 8.786e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.555e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.313e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.442e-05, [4] [d_1]: 3.872e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.89e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 4.403e-05 [cse_after_recomputation]: 2.034e-05, [1] [Cycle 1]: 1.592e-05, [1] [cse]: 1.06e-05 [environ_conv]: 4.59998e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.55002e-06 [label_micro_interleaved_index]: 4.22998e-06 [label_fine_grained_interleaved_index]: 2.90002e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.01e-06 [micro_interleaved_order_control]: 2.54001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 1.96e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.106e-05 [grouped_pairwise_exchange_alltoall]: 1.66e-06 [offloading_packed_experts]: 4.18001e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 1.92999e-06 [overlap_grad_ring_attention]: 3.86001e-06 [overlap_grad_flash_sp]: 1.724e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.90001e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 1.27999e-06 [symbol_engine_optimizer]: 6.846e-05, [1] [Cycle 1]: 6.423e-05, [6] [build]: 2.18002e-06 [elim_shapecalc]: 8.48001e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.62e-06 [renormalize]: 1.99972e-07 [detach_backward]: 1.72001e-06 [pipeline_parallel_scheduler]: 1.52999e-06 [auto_monad_reorder]: 1.497e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.30998e-06 [opt_after_jit_grad]: 0.00044623 [validate]: 2.981e-05 [backend_pass]: 1.19998e-06 [task_emit]: 0.0582315 [execute]: 8.37e-06 Sums bootstrap : 0.000485s : 0.73% type_inference : 0.004411s : 6.62% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000411s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000011s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000341s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.68% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000413s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000446s : 0.67% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058232s : 87.41% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000118 26 18.04% : 0.000021s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.28% : 0.000005s : 4: substitution.graph_param_transform 66.00% : 0.000078s : 2: substitution.inline 2.19% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.88% : 0.000005s : 4: substitution.remove_not_recompute_node 3.05% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004370 2 91.74% : 0.004009s : 1: type_inference.infer 8.26% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 1.04% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.43% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.82% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.49% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.74% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.10% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.68% : 0.000001s : 8: predicate.remove_not_recompute_node 1.39% : 0.000002s : 17: predicate.replace_applicator 0.94% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.04% : 0.000001s : 8: predicate.shard_identity_eliminate 1.04% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.10% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.04% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.41% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.24% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000256 6 44.42% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.58% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078493 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.81% : 0.002993s : 1: add_attr 3.80% : 0.002985s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000519s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.03% : 0.000027s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000760s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.34% : 0.001839s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000456s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.67% : 0.003668s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000017s : 1: remove_dup_value 0.24% : 0.000185s : 1: renormalize.infer 0.19% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.21% : 0.058248s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.64% : 0.004424s : 1: type_inference 0.06% : 0.000051s : 1: validate TotalTime = 0.0725506, [24] [bootstrap]: 0.00046283 [type_inference]: 0.00546705 [event_method]: 1.429e-05 [auto_monad]: 5.444e-05 [graph_reusing]: 5.87001e-06 [inline]: 1.92001e-06 [add_attr]: 0.0029934, [1] [add_attr_with_inline]: 0.00298547, [1] [Cycle 1]: 4.536e-05, [2] [tag_attr]: 1.557e-05 [meta_addattr_fg_expand]: 4.43001e-06 [parallel-infer-symbol]: 3.11999e-06 [pre_auto_parallel]: 2.789e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00401061, [53] [py_interpret_to_execute]: 2.064e-05 [rewriter_before_opt_a]: 5.837e-05 [opt_a]: 0.00218158, [2] [Cycle 1]: 0.00152392, [45] [expand_dump_flag]: 2.73998e-06 [switch_simplify]: 3.179e-05 [loop_unroll]: 2.071e-05 [a_1]: 0.00045689 [with_stream_mark]: 1.329e-05 [recompute_prepare]: 7.45e-06 [updatestate_depend_eliminate]: 4.1e-06 [updatestate_assign_eliminate]: 3.37002e-06 [updatestate_loads_eliminate]: 2.85998e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.956e-05 [accelerated_algorithm]: 6.69999e-06 [shard]: 1.85001e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 6.19001e-06 [parallel]: 1.831e-05 [flash_sp]: 7.41001e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.80999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 6.99001e-06 [virtual_dataset]: 5.95002e-06 [get_grad_eliminate_]: 5.52999e-06 [virtual_output]: 5.53002e-06 [merge_forward]: 3.76001e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 1.026e-05 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.31e-06 
[receive_attached]: 2.16998e-06 [after_resolve]: 9.97001e-06 [a_after_grad]: 8.64998e-06 [renormalize]: 0.00042384 [add_forward_monad_depend]: 4.89e-06 [auto_monad_grad]: 1.81998e-06 [auto_monad_eliminator]: 1.349e-05 [cse]: 2.541e-05 [a_3]: 4.086e-05 [Cycle 2]: 0.00064829, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 7.06001e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012536 [with_stream_mark]: 9.47999e-06 [recompute_prepare]: 5.93002e-06 [updatestate_depend_eliminate]: 2.91999e-06 [updatestate_assign_eliminate]: 2.18002e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.734e-05 [accelerated_algorithm]: 5.54998e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.53001e-06 [auto_parallel]: 5.07999e-06 [parallel]: 4.01001e-06 [flash_sp]: 3.18e-06 [merge_comm]: 5.14998e-06 [allreduce_fusion]: 2.89001e-06 [matmul_add_comm_reduction]: 5.08002e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.96999e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 4.97e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.78003e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.86e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.26001e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.99e-06 [a_after_grad]: 8.62998e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.37e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.99001e-06 [cse]: 1.436e-05 [a_3]: 3.311e-05 [py_interpret_to_execute_after_opt_a]: 7.6e-06 [slice_cell_reuse_recomputed_activation]: 1.91e-06 [rewriter_after_opt_a]: 3.063e-05 [convert_after_rewriter]: 6.78e-06 [order_py_execute_after_rewriter]: 4.97e-06 [mutable_eliminate]: 0.00044835 [opt_b]: 0.00018226, [1] 
[Cycle 1]: 0.00017608, [7] [b_1]: 0.00010837 [b_2]: 7.26001e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.41998e-06 [renormalize]: 3.50003e-07 [cse]: 1.617e-05 [optimize_parallel_all_gather_comm]: 1.497e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 2.226e-05 [loop_unroll]: 0.00041218 [opt_after_cconv]: 9.525e-05, [1] [Cycle 1]: 8.965e-05, [7] [c_1]: 2.768e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.14e-06 [cse]: 1.631e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.277e-05 [tuple_transform]: 6.882e-05, [1] [Cycle 1]: 6.427e-05, [4] [d_1]: 3.878e-05 [none_parameter_eliminate]: 1.69998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 1.63002e-06 [add_recomputation]: 4.257e-05 [cse_after_recomputation]: 2.071e-05, [1] [Cycle 1]: 1.624e-05, [1] [cse]: 1.096e-05 [environ_conv]: 5.03002e-06 [swap_dp_allreduce_reducescatter]: 4.98001e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 4.23001e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.17001e-06 [assign_add_opt]: 1.51002e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.33998e-06 [reorder_send_recv_between_fp_bp]: 2.64999e-06 [comm_op_add_attrs]: 1.23002e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.12999e-06 [overlap_opt_shard_in_pipeline]: 1.09998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.53e-06 [overlap_recompute_and_grad_model_parallel]: 4.66002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.56998e-06 [overlap_recompute_comm]: 1.87999e-06 [overlap_grad_ring_attention]: 3.75998e-06 [overlap_grad_flash_sp]: 1.709e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.01e-06 [split_layernorm_comm]: 1.76e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.91e-05, [1] [Cycle 1]: 6.508e-05, [6] [build]: 2.44001e-06 [elim_shapecalc]: 8.57998e-06 [elim_not_effective]: 1.179e-05 [opt_reshape]: 6.26998e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.30008e-07 [detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 1.546e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00045048 [validate]: 3.067e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.0587904 [execute]: 8.21002e-06 Sums bootstrap : 0.000463s : 0.68% type_inference : 0.005467s : 7.98% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000582s : 0.85% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000009s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000424s : 0.62% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000074s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.65% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.60% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000450s : 0.66% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.058790s : 85.77% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000168 30 14.86% : 0.000025s : 5: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 0.71% : 0.000001s : 2: substitution.fold_const_symbol 3.05% : 0.000005s : 4: substitution.graph_param_transform 67.01% : 0.000113s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.55% : 0.000004s : 4: substitution.remove_not_recompute_node 2.45% : 0.000004s : 4: substitution.replace_old_param 6.49% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005427 2 89.96% : 0.004883s : 1: type_inference.infer 10.04% : 0.000545s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.55% : 0.000027s : 3: replace.inline 30.45% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000120 5 91.80% : 0.000110s : 3: match.inline 8.20% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.85% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.85% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.75% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.30% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.09% : 0.000010s : 51: predicate.inline 0.89% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.85% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 0.93% : 0.000001s : 4: predicate.loop_unroll_after_grad 2.20% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 11: predicate.reduce_eliminate 2.33% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.50% : 0.000002s : 21: predicate.replace_applicator 0.52% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.49% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.04% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.11% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.58% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000344 8 45.87% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.13% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.081085 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.70% : 0.002997s : 1: add_attr 3.69% : 0.002989s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000498s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000421s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.56% : 0.000457s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.18% : 0.000953s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.69% : 0.002184s : 1: opt_a 0.12% : 0.000098s : 1: opt_after_cconv 0.57% : 0.000460s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.95% : 0.004014s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000205s : 1: renormalize.infer 0.26% : 0.000212s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 72.53% : 0.058808s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 6.76% : 0.005481s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.107302, [24] [bootstrap]: 0.00050947 [type_inference]: 0.0113691 [event_method]: 4.848e-05 [auto_monad]: 0.0001203 [graph_reusing]: 8.32e-06 [inline]: 2.15002e-06 [add_attr]: 0.00301977, [1] [add_attr_with_inline]: 0.00301149, [1] [Cycle 1]: 7.033e-05, [2] [tag_attr]: 3.46e-05 [meta_addattr_fg_expand]: 9.04e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 5.012e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.19001e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.0133176, [53] [py_interpret_to_execute]: 3.704e-05 [rewriter_before_opt_a]: 0.0001438 [opt_a]: 0.0110616, [3] [Cycle 1]: 0.00708534, [45] [expand_dump_flag]: 3.65e-06 [switch_simplify]: 7.405e-05 [loop_unroll]: 6.1e-05 [a_1]: 0.00145666 [with_stream_mark]: 2.342e-05 [recompute_prepare]: 2.145e-05 [updatestate_depend_eliminate]: 9.05999e-06 [updatestate_assign_eliminate]: 7.71001e-06 [updatestate_loads_eliminate]: 7.18e-06 [parameter_eliminate]: 2.37999e-06 [a_2]: 0.00024257 [accelerated_algorithm]: 3.008e-05 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 3.29001e-06 [shard_inline]: 1.588e-05 [merge_send_recv]: 1.651e-05 [auto_parallel]: 1.07e-05 [parallel]: 1.822e-05 [flash_sp]: 1.123e-05 [merge_comm]: 9.74e-06 [allreduce_fusion]: 8.60999e-06 [matmul_add_comm_reduction]: 2.687e-05 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 1.772e-05 [virtual_dataset]: 1.598e-05 [get_grad_eliminate_]: 1.532e-05 [virtual_output]: 1.506e-05 [merge_forward]: 8.96002e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 1.783e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.873e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 2.746e-05 [set_forward_comm_id_for_comm_node_pass]: 9.60001e-06 [meta_fg_expand]: 0.00139984 [flash_sp_send_recv_attached]: 4.08999e-06 [receive_attached]: 2.43e-06 [after_resolve]: 
6.023e-05 [a_after_grad]: 7.948e-05 [renormalize]: 0.00246507 [add_forward_monad_depend]: 9.22999e-06 [auto_monad_grad]: 5.82001e-06 [auto_monad_eliminator]: 5.682e-05 [cse]: 0.00016932 [a_3]: 0.00033392 [Cycle 2]: 0.00305442, [45] [expand_dump_flag]: 1.52999e-06 [switch_simplify]: 4.648e-05 [loop_unroll]: 4.32e-05 [a_1]: 0.00153037 [with_stream_mark]: 1.232e-05 [recompute_prepare]: 1.073e-05 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 3.78999e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012592 [accelerated_algorithm]: 1.214e-05 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 9.52001e-06 [merge_send_recv]: 7.18e-06 [auto_parallel]: 7.56999e-06 [parallel]: 5.18002e-06 [flash_sp]: 3.55e-06 [merge_comm]: 5.62999e-06 [allreduce_fusion]: 5.03002e-06 [matmul_add_comm_reduction]: 7.83999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.023e-05 [virtual_dataset]: 9.07001e-06 [get_grad_eliminate_]: 8.90001e-06 [virtual_output]: 8.50001e-06 [merge_forward]: 4.57e-06 [cell_reuse_recompute_pass]: 8.70001e-07 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.695e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.441e-05 [set_forward_comm_id_for_comm_node_pass]: 5.49e-06 [meta_fg_expand]: 6.875e-05 [flash_sp_send_recv_attached]: 1.02e-06 [receive_attached]: 1.14e-06 [after_resolve]: 1.651e-05 [a_after_grad]: 1.477e-05 [renormalize]: 0.00063762 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.10999e-06 [auto_monad_eliminator]: 1.472e-05 [cse]: 4.721e-05 [a_3]: 6.728e-05 [Cycle 3]: 0.00090773, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 1.077e-05 [loop_unroll]: 9.07999e-06 [a_1]: 0.00025061 [with_stream_mark]: 9.96998e-06 [recompute_prepare]: 9.40001e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 4.13001e-06 [updatestate_loads_eliminate]: 3.89997e-06 
[parameter_eliminate]: 8.90024e-07 [a_2]: 0.00012329 [accelerated_algorithm]: 1.167e-05 [shard]: 1.04e-06 [meta_shard_fg_expand]: 1.74e-06 [shard_inline]: 8.98002e-06 [merge_send_recv]: 6.98e-06 [auto_parallel]: 7.14001e-06 [parallel]: 4.78001e-06 [flash_sp]: 9.50007e-07 [merge_comm]: 4.89998e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.87e-06 [allreduce_slice_to_reducescatter]: 4.39992e-07 [virtual_shard_identity]: 1.013e-05 [virtual_dataset]: 8.74e-06 [get_grad_eliminate_]: 8.45001e-06 [virtual_output]: 8.23999e-06 [merge_forward]: 4.4e-06 [cell_reuse_recompute_pass]: 1.41998e-06 [offload_activation]: 9.09998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.675e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.461e-05 [set_forward_comm_id_for_comm_node_pass]: 5.27999e-06 [meta_fg_expand]: 3.04001e-06 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.04e-06 [after_resolve]: 1.373e-05 [a_after_grad]: 1.406e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.22e-06 [auto_monad_grad]: 9.10019e-07 [auto_monad_eliminator]: 1.073e-05 [cse]: 2.748e-05 [a_3]: 5.937e-05 [py_interpret_to_execute_after_opt_a]: 1.023e-05 [slice_cell_reuse_recomputed_activation]: 2.28002e-06 [rewriter_after_opt_a]: 4.627e-05 [convert_after_rewriter]: 9.67001e-06 [order_py_execute_after_rewriter]: 6.66e-06 [mutable_eliminate]: 0.00045854 [opt_b]: 0.00028745, [1] [Cycle 1]: 0.00028133, [7] [b_1]: 0.00018768 [b_2]: 1.073e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.14002e-06 [updatestate_loads_eliminate]: 3.93999e-06 [renormalize]: 3.29979e-07 [cse]: 3.281e-05 [optimize_parallel_all_gather_comm]: 2.084e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 1.882e-05 [loop_unroll]: 0.00042183 [opt_after_cconv]: 0.00013757, [1] [Cycle 1]: 0.00013166, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.43998e-06 [updatestate_depend_eliminate]: 6.98e-06 [updatestate_assign_eliminate]: 4.18999e-06 
[updatestate_loads_eliminate]: 4.07998e-06 [cse]: 3.179e-05 [renormalize]: 2.10013e-07 [remove_dup_value]: 2.942e-05 [tuple_transform]: 0.00010136, [1] [Cycle 1]: 9.671e-05, [4] [d_1]: 6.648e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.91e-06 [partial_unused_args_eliminate]: 1.61998e-06 [add_recomputation]: 5.51e-05 [cse_after_recomputation]: 3.423e-05, [1] [Cycle 1]: 2.953e-05, [1] [cse]: 2.386e-05 [environ_conv]: 8.57e-06 [swap_dp_allreduce_reducescatter]: 7.7e-06 [bias_add_comm_swap]: 2.70997e-06 [label_micro_interleaved_index]: 4.12e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.49e-06 [slice_recompute_activation]: 2.38002e-06 [micro_interleaved_order_control]: 2.59001e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89e-06 [control_data_broadcast_order]: 1.727e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.52999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19998e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 5.09e-06 [overlap_grad_flash_sp]: 2.427e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 2.06e-06 [handle_group_info]: 9.49978e-07 [symbol_engine_optimizer]: 9.814e-05, [1] [Cycle 1]: 9.389e-05, [6] [build]: 9.73002e-06 [elim_shapecalc]: 1.361e-05 [elim_not_effective]: 1.779e-05 [opt_reshape]: 1.019e-05 [fold_const_symbol]: 1.475e-05 [renormalize]: 2.00002e-07 
[detach_backward]: 2.09e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 2.512e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.73999e-06 [opt_after_jit_grad]: 0.00048743 [validate]: 4.537e-05 [backend_pass]: 1.28002e-06 [task_emit]: 0.0780624 [execute]: 8.02e-06 Sums bootstrap : 0.000509s : 0.49% type_inference : 0.011369s : 11.04% event_method : 0.000048s : 0.05% auto_monad : 0.000120s : 0.12% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000144s : 0.14% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000131s : 0.13% optimize.opt_a.loop_unroll : 0.000113s : 0.11% optimize.opt_a.a_1 : 0.003238s : 3.14% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000492s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000031s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001472s : 1.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.09% optimize.opt_a.a_after_grad : 0.000108s : 0.11% optimize.opt_a.renormalize : 0.003103s : 3.01% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000244s : 0.24% optimize.opt_a.a_3 : 0.000461s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000459s : 0.45% optimize.opt_b.b_1 : 0.000188s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000422s : 0.41% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.05% optimize.cse_after_recomputation.cse : 0.000024s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000487s : 0.47% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.078062s : 75.77% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000763 222 5.87% : 0.000045s : 12: substitution.arithmetic_simplify 1.83% : 0.000014s : 2: substitution.cast_eliminate 0.34% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.56% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.95% : 0.000007s : 8: substitution.graph_param_transform 0.34% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.91% : 0.000426s : 17: substitution.inline 2.02% : 0.000015s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.93% : 0.000015s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.90% : 0.000015s : 20: substitution.remove_not_recompute_node 3.16% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.29% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.64% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.76% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.71% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011295 2 86.57% : 0.009778s : 1: type_inference.infer 13.43% : 0.001517s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.78% : 0.000127s : 17: replace.inline 42.22% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000451 33 92.46% : 0.000417s : 17: match.inline 7.54% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.31% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.07% : 0.000016s : 100: predicate.arithmetic_simplify 1.19% : 0.000009s : 68: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.34% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.61% : 0.000005s : 32: predicate.less_batch_normalization 
1.64% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.28% : 0.000010s : 68: predicate.reduce_eliminate 2.67% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000005s : 55: predicate.replace_old_param 0.09% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.17% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.06% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.47% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.04% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.64% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.68% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001613 34 55.63% : 0.000897s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.37% : 0.000716s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131964 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.29% : 0.003024s : 1: add_attr 2.28% : 0.003015s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000059s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000127s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.41% : 0.000545s : 1: bootstrap 0.02% : 0.000022s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000056s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000430s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.71% : 0.004899s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.38% : 0.011065s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.38% : 0.000497s : 1: opt_after_jit_grad 0.22% : 0.000291s : 1: opt_b 10.09% : 0.013322s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000034s : 1: remove_dup_value 1.26% : 0.001656s : 2: renormalize.infer 1.09% : 0.001434s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.11% : 0.000148s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 59.17% : 0.078080s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 8.63% : 0.011384s : 1: type_inference 0.05% : 0.000069s : 1: validate TotalTime = 0.0697859, [24] [bootstrap]: 0.00046721 [type_inference]: 0.00427408 [event_method]: 1.084e-05 [auto_monad]: 5.07e-05 [graph_reusing]: 5.05999e-06 [inline]: 1.72001e-06 [add_attr]: 0.002989, [1] [add_attr_with_inline]: 0.00298082, [1] [Cycle 1]: 4.468e-05, [2] [tag_attr]: 1.161e-05 [meta_addattr_fg_expand]: 3.65e-06 [parallel-infer-symbol]: 2.53e-06 [pre_auto_parallel]: 2.211e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.39999e-06 [pipeline_split]: 1.84998e-06 [optimize]: 0.00364956, [53] [py_interpret_to_execute]: 1.507e-05 [rewriter_before_opt_a]: 3.834e-05 [opt_a]: 0.00184617, [2] [Cycle 1]: 0.00124959, [45] [expand_dump_flag]: 2.84001e-06 [switch_simplify]: 2.517e-05 [loop_unroll]: 1.39e-05 [a_1]: 0.00029241 [with_stream_mark]: 1.263e-05 [recompute_prepare]: 7.45998e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.18e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.56002e-06 [a_2]: 7.694e-05 [accelerated_algorithm]: 6.25002e-06 [shard]: 2.26e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 7.68001e-06 [auto_parallel]: 6.17999e-06 [parallel]: 1.805e-05 [flash_sp]: 7.01001e-06 [merge_comm]: 3.58e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 9.27001e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 6.85998e-06 [virtual_dataset]: 5.66998e-06 [get_grad_eliminate_]: 5.66e-06 [virtual_output]: 6.02999e-06 [merge_forward]: 3.77002e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 9.23002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.071e-05 [merge_recompute_call_nodes]: 1.71e-06 [before_grad]: 1.028e-05 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.24999e-06 [receive_attached]: 
2.46998e-06 [after_resolve]: 1.051e-05 [a_after_grad]: 9.02e-06 [renormalize]: 0.00033837 [add_forward_monad_depend]: 4.18001e-06 [auto_monad_grad]: 1.83002e-06 [auto_monad_eliminator]: 1.33e-05 [cse]: 2.654e-05 [a_3]: 3.979e-05 [Cycle 2]: 0.00058719, [45] [expand_dump_flag]: 8.70001e-07 [switch_simplify]: 6.64999e-06 [loop_unroll]: 5.25999e-06 [a_1]: 0.00012342 [with_stream_mark]: 9.32001e-06 [recompute_prepare]: 5.52001e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.772e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.34e-06 [merge_send_recv]: 4.33999e-06 [auto_parallel]: 5.46998e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.33998e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.58e-06 [matmul_add_comm_reduction]: 4.94998e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.14001e-06 [virtual_dataset]: 4.99998e-06 [get_grad_eliminate_]: 4.92e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.49999e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.89999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 7.91001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.66002e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.92999e-06 [a_after_grad]: 7.95e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.293e-05 [a_3]: 3.134e-05 [py_interpret_to_execute_after_opt_a]: 7.36001e-06 [slice_cell_reuse_recomputed_activation]: 1.66998e-06 [rewriter_after_opt_a]: 3.059e-05 [convert_after_rewriter]: 7.03998e-06 [order_py_execute_after_rewriter]: 5.27001e-06 [mutable_eliminate]: 0.00045026 [opt_b]: 0.00018073, [1] [Cycle 1]: 
0.00017462, [7] [b_1]: 0.00010623 [b_2]: 7.03998e-06 [updatestate_depend_eliminate]: 5.52001e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 5.19998e-07 [cse]: 1.606e-05 [optimize_parallel_all_gather_comm]: 1.581e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.126e-05 [loop_unroll]: 0.00041664 [opt_after_cconv]: 9.564e-05, [1] [Cycle 1]: 9.01e-05, [7] [c_1]: 2.807e-05 [parameter_eliminate]: 2.10002e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.692e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 1.243e-05 [tuple_transform]: 6.833e-05, [1] [Cycle 1]: 6.404e-05, [4] [d_1]: 3.857e-05 [none_parameter_eliminate]: 1.66998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 4.286e-05 [cse_after_recomputation]: 2.035e-05, [1] [Cycle 1]: 1.593e-05, [1] [cse]: 1.092e-05 [environ_conv]: 4.38999e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.24001e-06 [label_micro_interleaved_index]: 4.97e-06 [label_fine_grained_interleaved_index]: 2.93e-06 [merge_cast_opt]: 1.30001e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.16003e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 8.00006e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.06003e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.157e-05 [grouped_pairwise_exchange_alltoall]: 1.48002e-06 [offloading_packed_experts]: 3.43999e-06 [overlap_recompute_and_grad_model_parallel]: 4.21001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.634e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.28998e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.886e-05, [1] [Cycle 1]: 6.476e-05, [6] [build]: 2.12001e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.171e-05 [opt_reshape]: 5.90002e-06 [fold_const_symbol]: 8.99e-06 [renormalize]: 1.70025e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.57e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.6e-06 [opt_after_jit_grad]: 0.0004709 [validate]: 3.168e-05 [backend_pass]: 8.80013e-07 [task_emit]: 0.0575716 [execute]: 8.48999e-06 Sums bootstrap : 0.000467s : 0.71% type_inference : 0.004274s : 6.49% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000416s : 0.63% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000338s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.68% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000417s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000471s : 0.72% validate : 0.000032s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057572s : 87.45% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.03% : 0.000022s : 4: substitution.arithmetic_simplify 1.68% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.36% : 0.000005s : 4: substitution.graph_param_transform 66.08% : 0.000080s : 2: substitution.inline 2.15% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.99% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004233 2 91.73% : 0.003883s : 1: type_inference.infer 8.27% : 0.000350s : 1: type_inference.specialize ------[replace.] 0.000020 2 100.00% : 0.000020s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000136 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.13% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.90% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.30% : 0.000003s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 1.07% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.87% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000008s : 44: predicate.inline 0.97% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.20% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.91% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.76% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.76% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.34% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.62% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.38% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.58% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.93% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.97% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.65% : 0.000006s : 41: predicate.switch_simplify 0.72% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.11% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.02% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.59% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.07% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000241 6 42.37% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.63% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.077686 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.85% : 0.002993s : 1: add_attr 3.84% : 0.002984s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000504s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.55% : 0.000425s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.99% : 0.000766s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.06% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.38% : 0.001849s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.62% : 0.000480s : 1: opt_after_jit_grad 0.24% : 0.000184s : 1: opt_b 4.70% : 0.003653s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.23% : 0.000182s : 1: renormalize.infer 0.19% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.13% : 0.057588s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.52% : 0.004289s : 1: type_inference 0.07% : 0.000054s : 1: validate TotalTime = 0.105404, [24] [bootstrap]: 0.00050575 [type_inference]: 0.0102346 [event_method]: 4.341e-05 [auto_monad]: 0.00011276 [graph_reusing]: 8.03999e-06 [inline]: 2.51998e-06 [add_attr]: 0.00300645, [1] [add_attr_with_inline]: 0.00299799, [1] [Cycle 1]: 6.539e-05, [2] [tag_attr]: 3.094e-05 [meta_addattr_fg_expand]: 8.48999e-06 [parallel-infer-symbol]: 2.80002e-06 [pre_auto_parallel]: 4.413e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 6.30011e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.0130664, [53] [py_interpret_to_execute]: 3.543e-05 [rewriter_before_opt_a]: 0.00012613 [opt_a]: 0.0107798, [3] [Cycle 1]: 0.00689105, [45] [expand_dump_flag]: 3.58e-06 [switch_simplify]: 6.562e-05 [loop_unroll]: 5.484e-05 [a_1]: 0.00132787 [with_stream_mark]: 2.322e-05 [recompute_prepare]: 2.177e-05 [updatestate_depend_eliminate]: 9.17001e-06 [updatestate_assign_eliminate]: 7.42002e-06 [updatestate_loads_eliminate]: 7.07002e-06 [parameter_eliminate]: 2.43998e-06 [a_2]: 0.00024397 [accelerated_algorithm]: 3.078e-05 [shard]: 2.14e-06 [meta_shard_fg_expand]: 3.23e-06 [shard_inline]: 1.641e-05 [merge_send_recv]: 1.546e-05 [auto_parallel]: 1.074e-05 [parallel]: 1.794e-05 [flash_sp]: 1.124e-05 [merge_comm]: 9.50001e-06 [allreduce_fusion]: 9.08002e-06 [matmul_add_comm_reduction]: 2.619e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.813e-05 [virtual_dataset]: 1.544e-05 [get_grad_eliminate_]: 1.534e-05 [virtual_output]: 1.548e-05 [merge_forward]: 9.22001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.699e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.853e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 2.672e-05 [set_forward_comm_id_for_comm_node_pass]: 9.61003e-06 [meta_fg_expand]: 0.00142208 [flash_sp_send_recv_attached]: 3.78999e-06 [receive_attached]: 2.89999e-06 
[after_resolve]: 5.913e-05 [a_after_grad]: 8.032e-05 [renormalize]: 0.00239714 [add_forward_monad_depend]: 9.53002e-06 [auto_monad_grad]: 5.28002e-06 [auto_monad_eliminator]: 5.692e-05 [cse]: 0.00016744 [a_3]: 0.00033462 [Cycle 2]: 0.00297196, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.719e-05 [loop_unroll]: 4.424e-05 [a_1]: 0.00155376 [with_stream_mark]: 1.178e-05 [recompute_prepare]: 1.099e-05 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.0001262 [accelerated_algorithm]: 1.215e-05 [shard]: 9.50007e-07 [meta_shard_fg_expand]: 2.23998e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 7.6e-06 [parallel]: 4.47998e-06 [flash_sp]: 3.13e-06 [merge_comm]: 5.42001e-06 [allreduce_fusion]: 4.72e-06 [matmul_add_comm_reduction]: 7.64002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.059e-05 [virtual_dataset]: 8.85001e-06 [get_grad_eliminate_]: 8.77e-06 [virtual_output]: 8.43001e-06 [merge_forward]: 4.25999e-06 [cell_reuse_recompute_pass]: 1.04003e-06 [offload_activation]: 8.96998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.605e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.13002e-06 [meta_fg_expand]: 3.504e-05 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.10001e-06 [after_resolve]: 1.479e-05 [a_after_grad]: 1.46e-05 [renormalize]: 0.00057844 [add_forward_monad_depend]: 4.12e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.463e-05 [cse]: 4.609e-05 [a_3]: 6.5e-05 [Cycle 3]: 0.00090263, [45] [expand_dump_flag]: 1.05001e-06 [switch_simplify]: 1.078e-05 [loop_unroll]: 9.02999e-06 [a_1]: 0.00025047 [with_stream_mark]: 9.77001e-06 [recompute_prepare]: 9.27999e-06 [updatestate_depend_eliminate]: 4.74e-06 [updatestate_assign_eliminate]: 3.78001e-06 [updatestate_loads_eliminate]: 3.84002e-06 
[parameter_eliminate]: 9.80013e-07 [a_2]: 0.0001232 [accelerated_algorithm]: 1.172e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.72001e-06 [shard_inline]: 9.01998e-06 [merge_send_recv]: 6.63e-06 [auto_parallel]: 6.86999e-06 [parallel]: 4.38001e-06 [flash_sp]: 1.07998e-06 [merge_comm]: 4.91002e-06 [allreduce_fusion]: 4.87e-06 [matmul_add_comm_reduction]: 7.63999e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 9.96e-06 [virtual_dataset]: 8.70001e-06 [get_grad_eliminate_]: 8.32e-06 [virtual_output]: 8.35001e-06 [merge_forward]: 4.90001e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.758e-05 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 1.425e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07e-06 [meta_fg_expand]: 2.96001e-06 [flash_sp_send_recv_attached]: 9.09989e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 1.391e-05 [a_after_grad]: 1.399e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.29998e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.046e-05 [cse]: 2.545e-05 [a_3]: 6.003e-05 [py_interpret_to_execute_after_opt_a]: 1.073e-05 [slice_cell_reuse_recomputed_activation]: 1.86998e-06 [rewriter_after_opt_a]: 4.611e-05 [convert_after_rewriter]: 9.31e-06 [order_py_execute_after_rewriter]: 6.70998e-06 [mutable_eliminate]: 0.00051526 [opt_b]: 0.00028551, [1] [Cycle 1]: 0.00027966, [7] [b_1]: 0.00018733 [b_2]: 1.079e-05 [updatestate_depend_eliminate]: 7.36001e-06 [updatestate_assign_eliminate]: 3.98999e-06 [updatestate_loads_eliminate]: 3.86999e-06 [renormalize]: 4.30009e-07 [cse]: 3.121e-05 [optimize_parallel_all_gather_comm]: 1.99e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 1.943e-05 [loop_unroll]: 0.00042467 [opt_after_cconv]: 0.00013546, [1] [Cycle 1]: 0.00012952, [7] [c_1]: 4.828e-05 [parameter_eliminate]: 2.12999e-06 [updatestate_depend_eliminate]: 6.84999e-06 [updatestate_assign_eliminate]: 4.1e-06 
[updatestate_loads_eliminate]: 4.07e-06 [cse]: 2.994e-05 [renormalize]: 4.60015e-07 [remove_dup_value]: 2.84e-05 [tuple_transform]: 0.00010065, [1] [Cycle 1]: 9.624e-05, [4] [d_1]: 6.641e-05 [none_parameter_eliminate]: 1.72999e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.66003e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 5.543e-05 [cse_after_recomputation]: 3.322e-05, [1] [Cycle 1]: 2.856e-05, [1] [cse]: 2.28e-05 [environ_conv]: 9.07999e-06 [swap_dp_allreduce_reducescatter]: 7.92e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.43e-06 [micro_interleaved_order_control]: 2.31998e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.40023e-07 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 1.96e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.80013e-07 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.686e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 5.15999e-06 [overlap_recompute_and_grad_model_parallel]: 5.39e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.11e-06 [overlap_grad_ring_attention]: 5.01002e-06 [overlap_grad_flash_sp]: 2.323e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.26e-06 [split_layernorm_comm]: 1.83002e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 9.744e-05, [1] [Cycle 1]: 9.309e-05, [6] [build]: 9.93002e-06 [elim_shapecalc]: 1.32e-05 [elim_not_effective]: 1.805e-05 [opt_reshape]: 9.51e-06 [fold_const_symbol]: 1.49e-05 [renormalize]: 
2.29978e-07 [detach_backward]: 1.71002e-06 [pipeline_parallel_scheduler]: 1.64998e-06 [auto_monad_reorder]: 2.432e-05 [get_jit_bprop_graph]: 1.00999e-06 [rewriter_after_jit_bprop_graph]: 3.51999e-06 [opt_after_jit_grad]: 0.00047397 [validate]: 4.579e-05 [backend_pass]: 1.37e-06 [task_emit]: 0.0775756 [execute]: 3.354e-05 Sums bootstrap : 0.000506s : 0.50% type_inference : 0.010235s : 10.12% event_method : 0.000043s : 0.04% auto_monad : 0.000113s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000044s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.04% optimize.rewriter_before_opt_a : 0.000126s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000124s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.11% optimize.opt_a.a_1 : 0.003132s : 3.10% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.02% optimize.opt_a.merge_comm : 
0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001460s : 1.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.09% optimize.opt_a.a_after_grad : 0.000109s : 0.11% optimize.opt_a.renormalize : 0.002976s : 2.94% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000239s : 0.24% optimize.opt_a.a_3 : 0.000460s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000515s : 0.51% optimize.opt_b.b_1 : 0.000187s : 0.19% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000425s : 0.42% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.03% optimize.tuple_transform.d_1 : 0.000066s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000055s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000023s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000474s : 0.47% validate : 0.000046s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.077576s : 76.70% execute : 0.000034s : 0.03% Time group info: ------[substitution.] 
0.000724 218 5.89% : 0.000043s : 11: substitution.arithmetic_simplify 1.84% : 0.000013s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.64% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.32% : 0.000002s : 5: substitution.fold_const_symbol 1.03% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 54.73% : 0.000396s : 16: substitution.inline 2.16% : 0.000016s : 2: substitution.inline_without_move 1.34% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.93% : 0.000014s : 20: substitution.remove_not_recompute_node 3.20% : 0.000023s : 10: substitution.replace_applicator 1.47% : 0.000011s : 15: substitution.replace_old_param 0.37% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.80% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.92% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.44% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.43% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.50% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010166 2 87.06% : 0.008850s : 1: type_inference.infer 12.94% : 0.001316s : 1: type_inference.specialize ------[replace.] 0.000199 30 59.10% : 0.000118s : 16: replace.inline 40.90% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000417 30 92.92% : 0.000388s : 16: match.inline 7.08% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.08% : 0.000008s : 67: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.53% : 0.000004s : 32: predicate.addn_check_dump 1.12% : 0.000008s : 67: predicate.addn_zero_filter 1.08% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.09% : 0.000015s : 99: predicate.arithmetic_simplify 1.12% : 0.000008s : 67: predicate.cast_eliminate 1.15% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.20% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.22% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.25% : 0.000017s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.69% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.64% : 0.000042s : 244: predicate.inline 1.31% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.63% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000019s : 164: predicate.load_eliminater 0.35% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.45% : 0.000011s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 67: predicate.minmaximum_grad 0.36% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.13% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.70% : 0.000013s : 89: predicate.partial_eliminate 1.05% : 0.000008s : 67: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000019s : 164: predicate.redundant_stop_gradient_eliminater 0.37% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 149: predicate.replace_applicator 0.64% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 67: predicate.reshape_eliminate 1.13% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.26% : 0.000009s : 68: predicate.same_eliminate 0.40% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.92% : 0.000021s : 165: predicate.switch_layer_defer_inline 4.85% : 0.000036s : 265: 
predicate.switch_simplify 1.09% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.87% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.03% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001508 32 57.11% : 0.000861s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.89% : 0.000647s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.129555 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.32% : 0.003011s : 1: add_attr 2.32% : 0.003002s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000119s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000539s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.03% : 0.000040s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000433s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000525s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.69% : 0.004779s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000173s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000052s : 4: opt.transform.symbol_engine_opt 8.32% : 0.010783s : 1: opt_a 0.11% : 0.000139s : 1: opt_after_cconv 0.37% : 0.000484s : 1: opt_after_jit_grad 0.22% : 0.000289s : 1: opt_b 10.09% : 0.013070s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000026s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000049s : 1: pre_auto_parallel 0.03% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.21% : 0.001564s : 2: renormalize.infer 1.08% : 0.001399s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000100s : 1: symbol_engine_optimizer 59.89% : 0.077593s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 7.91% : 0.010249s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x0-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x0-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x1-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x1-pynative],max_mem:64.0M TotalTime = 0.0214281, [24] [bootstrap]: 0.00055474 [type_inference]: 0.00612346 [event_method]: 1.398e-05 [auto_monad]: 8.144e-05 [graph_reusing]: 5.61003e-06 [inline]: 2.01e-06 [add_attr]: 0.00337473, [1] [add_attr_with_inline]: 0.0033636, [1] [Cycle 1]: 4.349e-05, [2] [tag_attr]: 1.482e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.846e-05 [insert-virtual-dataset]: 2.25002e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.77001e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.00397103, [53] [py_interpret_to_execute]: 1.943e-05 [rewriter_before_opt_a]: 5.797e-05 [opt_a]: 0.0021346, [2] [Cycle 1]: 0.00153122, [45] [expand_dump_flag]: 2.68003e-06 [switch_simplify]: 3.212e-05 [loop_unroll]: 2.1e-05 [a_1]: 0.00047141 [with_stream_mark]: 1.323e-05 [recompute_prepare]: 7.70998e-06 [updatestate_depend_eliminate]: 3.86999e-06 [updatestate_assign_eliminate]: 3.23e-06 [updatestate_loads_eliminate]: 3.15998e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.587e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.12001e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.11998e-06 [merge_send_recv]: 7.64002e-06 [auto_parallel]: 5.92999e-06 [parallel]: 2.295e-05 [flash_sp]: 7.32997e-06 [merge_comm]: 3.52002e-06 [allreduce_fusion]: 3.72998e-06 [matmul_add_comm_reduction]: 8.45001e-06 [allreduce_slice_to_reducescatter]: 6.80011e-07 [virtual_shard_identity]: 7.56999e-06 [virtual_dataset]: 
6.04999e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.68002e-06 [merge_forward]: 4.02002e-06 [cell_reuse_recompute_pass]: 1.15999e-06 [offload_activation]: 8.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.055e-05 [merge_recompute_call_nodes]: 1.92999e-06 [before_grad]: 9.37001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48e-06 [meta_fg_expand]: 2.76e-06 [flash_sp_send_recv_attached]: 2.77002e-06 [receive_attached]: 2.69001e-06 [after_resolve]: 1.024e-05 [a_after_grad]: 8.65999e-06 [renormalize]: 0.00041219 [add_forward_monad_depend]: 4.72e-06 [auto_monad_grad]: 1.67999e-06 [auto_monad_eliminator]: 1.346e-05 [cse]: 2.526e-05 [a_3]: 4.041e-05 [Cycle 2]: 0.00059376, [45] [expand_dump_flag]: 9.29984e-07 [switch_simplify]: 7.16001e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012512 [with_stream_mark]: 9.51e-06 [recompute_prepare]: 5.87001e-06 [updatestate_depend_eliminate]: 2.68e-06 [updatestate_assign_eliminate]: 2.19001e-06 [updatestate_loads_eliminate]: 2.43002e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 6.802e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.49998e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.44e-06 [parallel]: 4e-06 [flash_sp]: 3.58e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 2.50002e-07 [virtual_shard_identity]: 5.99e-06 [virtual_dataset]: 5.46e-06 [get_grad_eliminate_]: 4.99e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 5.74999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.52998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93998e-06 [meta_fg_expand]: 1.72001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.07e-06 [after_resolve]: 9.11002e-06 [a_after_grad]: 7.97e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 6.05002e-06 [cse]: 1.341e-05 [a_3]: 3.331e-05 [py_interpret_to_execute_after_opt_a]: 7.54002e-06 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 2.8e-05 [convert_after_rewriter]: 7.27002e-06 [order_py_execute_after_rewriter]: 5.39e-06 [mutable_eliminate]: 0.0004516 [opt_b]: 0.00018165, [1] [Cycle 1]: 0.00017542, [7] [b_1]: 0.00010757 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.16998e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 4.09986e-07 [cse]: 1.664e-05 [optimize_parallel_all_gather_comm]: 1.499e-05 [overlap_param_gather]: 2.09e-06 [cconv]: 2.199e-05 [loop_unroll]: 0.00041517 [opt_after_cconv]: 9.547e-05, [1] [Cycle 1]: 8.97e-05, [7] [c_1]: 2.816e-05 [parameter_eliminate]: 2.21998e-06 [updatestate_depend_eliminate]: 5.34e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.22999e-06 [cse]: 1.702e-05 [renormalize]: 1.79978e-07 [remove_dup_value]: 1.335e-05 [tuple_transform]: 6.749e-05, [1] [Cycle 1]: 6.306e-05, [4] [d_1]: 3.792e-05 [none_parameter_eliminate]: 1.46002e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 4.98e-05 [cse_after_recomputation]: 2.089e-05, [1] [Cycle 1]: 1.622e-05, [1] [cse]: 1.117e-05 [environ_conv]: 4.74002e-06 [swap_dp_allreduce_reducescatter]: 4.95999e-06 [bias_add_comm_swap]: 2.56e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.65002e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.08002e-06 [micro_interleaved_order_control]: 2.14999e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 9.99979e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.02e-06 [add_comm_op_reuse_tag]: 
9.00007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 1.112e-05 [grouped_pairwise_exchange_alltoall]: 1.49998e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.17e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.49e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.09e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.653e-05 [begin_end_overlap_inline]: 7.2e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.64998e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.863e-05, [1] [Cycle 1]: 6.44e-05, [6] [build]: 2.19001e-06 [elim_shapecalc]: 8.27e-06 [elim_not_effective]: 1.19e-05 [opt_reshape]: 6.25002e-06 [fold_const_symbol]: 9.14998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.44003e-06 [auto_monad_reorder]: 1.492e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 0.00012676 [opt_after_jit_grad]: 0.00051028 [validate]: 3.076e-05 [backend_pass]: 1.04e-06 [task_emit]: 0.00636561 [execute]: 6.89999e-06 Sums bootstrap : 0.000555s : 3.25% type_inference : 0.006123s : 35.85% event_method : 0.000014s : 0.08% auto_monad : 0.000081s : 0.48% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000058s : 0.34% optimize.opt_a.expand_dump_flag : 
0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000597s : 3.49% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000007s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 
0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000412s : 2.41% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.11% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000074s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000452s : 2.64% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000415s : 2.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.22% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% 
optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000050s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000127s : 0.74% opt_after_jit_grad : 0.000510s : 2.99% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006366s : 37.26% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000186 30 24.90% : 0.000046s : 5: substitution.arithmetic_simplify 1.14% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 2.94% : 0.000005s : 4: substitution.graph_param_transform 58.55% : 0.000109s : 3: substitution.inline 1.45% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.38% : 0.000004s : 4: substitution.remove_not_recompute_node 1.93% : 0.000004s : 4: substitution.replace_old_param 5.91% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006074 2 90.26% : 0.005482s : 1: type_inference.infer 9.74% : 0.000592s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.52% : 0.000026s : 3: replace.inline 30.48% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.52% : 0.000107s : 3: match.inline 8.48% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 1.01% : 0.000002s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.55% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.04% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.16% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.85% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.36% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.23% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.83% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.64% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.64% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.83% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.58% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 11: predicate.reshape_eliminate 0.66% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 1.09% : 0.000002s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000002s : 8: predicate.shard_identity_eliminate 0.85% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 0.86% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.92% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.34% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.23% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000380 8 46.07% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.93% : 0.000205s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030303 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.15% : 0.003379s : 1: add_attr 11.11% : 0.003367s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.29% : 0.000087s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.95% : 0.000591s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000019s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000461s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.18% : 0.000964s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.05% : 0.002138s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.72% : 0.000520s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.12% : 0.003975s : 1: optimize 0.06% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.69% : 0.000211s : 1: renormalize.infer 0.64% : 0.000195s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.44% : 0.000132s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000032s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 21.04% : 0.006376s : 1: task_emit 0.23% : 0.000070s : 1: 
tuple_transform 20.25% : 0.006137s : 1: type_inference 0.20% : 0.000060s : 1: validate TotalTime = 0.0179818, [24] [bootstrap]: 0.00043925 [type_inference]: 0.00427568 [event_method]: 1.032e-05 [auto_monad]: 5.057e-05 [graph_reusing]: 4.79e-06 [inline]: 2.30002e-06 [add_attr]: 0.00296741, [1] [add_attr_with_inline]: 0.0029591, [1] [Cycle 1]: 4.491e-05, [2] [tag_attr]: 1.171e-05 [meta_addattr_fg_expand]: 3.12002e-06 [parallel-infer-symbol]: 2.50002e-06 [pre_auto_parallel]: 2.237e-05 [insert-virtual-dataset]: 3.04999e-06 [parallel-infer-symbol-second]: 8.79983e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.71e-06 [optimize]: 0.0036626, [53] [py_interpret_to_execute]: 1.445e-05 [rewriter_before_opt_a]: 3.807e-05 [opt_a]: 0.00183767, [2] [Cycle 1]: 0.00123935, [45] [expand_dump_flag]: 2.74001e-06 [switch_simplify]: 2.38e-05 [loop_unroll]: 1.345e-05 [a_1]: 0.00029244 [with_stream_mark]: 1.328e-05 [recompute_prepare]: 7.33999e-06 [updatestate_depend_eliminate]: 3.56999e-06 [updatestate_assign_eliminate]: 3.40998e-06 [updatestate_loads_eliminate]: 3.18998e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.52e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 1.96e-06 [meta_shard_fg_expand]: 1.51998e-06 [shard_inline]: 5.83002e-06 [merge_send_recv]: 8.19998e-06 [auto_parallel]: 5.52001e-06 [parallel]: 1.749e-05 [flash_sp]: 7.71999e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 9.48002e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 6.86001e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 5.50001e-06 [merge_forward]: 3.65998e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 8.65999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.05001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.59002e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.54001e-06 
[receive_attached]: 2.82002e-06 [after_resolve]: 1.014e-05 [a_after_grad]: 8.57998e-06 [renormalize]: 0.00033602 [add_forward_monad_depend]: 4.52e-06 [auto_monad_grad]: 1.76998e-06 [auto_monad_eliminator]: 1.341e-05 [cse]: 2.59e-05 [a_3]: 4.016e-05 [Cycle 2]: 0.00058895, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.83e-06 [loop_unroll]: 5.46e-06 [a_1]: 0.00012449 [with_stream_mark]: 9.20999e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.69999e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 7.79983e-07 [a_2]: 6.73e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.51e-06 [merge_send_recv]: 4.1e-06 [auto_parallel]: 5.09e-06 [parallel]: 4.35e-06 [flash_sp]: 3.3e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.00001e-06 [allreduce_slice_to_reducescatter]: 2.70025e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.47999e-06 [get_grad_eliminate_]: 4.79e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.57001e-06 [cell_reuse_recompute_pass]: 1.31998e-06 [offload_activation]: 5.87001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 7.93001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.27002e-06 [meta_fg_expand]: 1.62999e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 9.22999e-06 [a_after_grad]: 8.22e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.01997e-06 [auto_monad_grad]: 8.2e-07 [auto_monad_eliminator]: 6.59001e-06 [cse]: 1.247e-05 [a_3]: 3.227e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 [slice_cell_reuse_recomputed_activation]: 1.74998e-06 [rewriter_after_opt_a]: 3.05e-05 [convert_after_rewriter]: 6.79999e-06 [order_py_execute_after_rewriter]: 5.24998e-06 [mutable_eliminate]: 0.00047598 [opt_b]: 0.00018093, [1] [Cycle 1]: 
0.00017495, [7] [b_1]: 0.0001068 [b_2]: 7.49002e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.38002e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 3.80009e-07 [cse]: 1.62e-05 [optimize_parallel_all_gather_comm]: 1.6e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 2.352e-05 [loop_unroll]: 0.00041221 [opt_after_cconv]: 9.418e-05, [1] [Cycle 1]: 8.852e-05, [7] [c_1]: 2.786e-05 [parameter_eliminate]: 2.01998e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.27001e-06 [cse]: 1.553e-05 [renormalize]: 4.2998e-07 [remove_dup_value]: 1.156e-05 [tuple_transform]: 6.843e-05, [1] [Cycle 1]: 6.406e-05, [4] [d_1]: 3.845e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.73002e-06 [add_recomputation]: 4.219e-05 [cse_after_recomputation]: 1.956e-05, [1] [Cycle 1]: 1.525e-05, [1] [cse]: 1.017e-05 [environ_conv]: 4.74e-06 [swap_dp_allreduce_reducescatter]: 5.19e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.55001e-06 [label_fine_grained_interleaved_index]: 2.54999e-06 [merge_cast_opt]: 1.35001e-06 [slice_recompute_activation]: 2.64001e-06 [micro_interleaved_order_control]: 2.17999e-06 [assign_add_opt]: 1.54998e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.67001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.69972e-07 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 9.99979e-07 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.164e-05 [grouped_pairwise_exchange_alltoall]: 1.53002e-06 [offloading_packed_experts]: 4.03999e-06 [overlap_recompute_and_grad_model_parallel]: 4.73001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 2.49999e-06 [overlap_grad_ring_attention]: 4.44002e-06 [overlap_grad_flash_sp]: 1.613e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.94e-06 [split_layernorm_comm]: 1.89e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 6.707e-05, [1] [Cycle 1]: 6.296e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.07e-06 [elim_not_effective]: 1.108e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.54e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.41002e-06 [auto_monad_reorder]: 1.504e-05 [get_jit_bprop_graph]: 9.39996e-07 [rewriter_after_jit_bprop_graph]: 3.58999e-06 [opt_after_jit_grad]: 0.00044417 [validate]: 3.023e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00583926 [execute]: 7.63001e-06 Sums bootstrap : 0.000439s : 3.12% type_inference : 0.004276s : 30.40% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000417s : 2.96% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000336s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000476s : 3.38% optimize.opt_b.b_1 : 0.000107s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000024s : 0.17% optimize.loop_unroll : 0.000412s : 2.93% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000444s : 3.16% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005839s : 41.52% execute : 0.000008s : 0.05% Time group info: ------[substitution.] 0.000120 26 18.66% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.01% : 0.000001s : 2: substitution.fold_const_symbol 4.21% : 0.000005s : 4: substitution.graph_param_transform 65.31% : 0.000078s : 2: substitution.inline 2.33% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.78% : 0.000005s : 4: substitution.remove_not_recompute_node 3.27% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004237 2 91.94% : 0.003896s : 1: type_inference.infer 8.06% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.05% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.38% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.31% : 0.000000s : 4: predicate.const_output_eliminate 0.68% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.84% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 2.00% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.97% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.57% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.21% : 0.000002s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.79% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.05% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.48% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.51% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.67% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.98% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 43.05% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.95% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025870 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.49% : 0.002972s : 1: add_attr 11.45% : 0.002963s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.83% : 0.000474s : 1: bootstrap 0.10% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000013s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.03% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000420s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.87% : 0.000485s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000765s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.11% : 0.001841s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.75% : 0.000454s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.17% : 0.003666s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000027s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.71% : 0.000184s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.61% : 0.005849s : 1: task_emit 0.28% : 0.000071s : 1: 
tuple_transform 16.58% : 0.004288s : 1: type_inference 0.22% : 0.000056s : 1: validate TotalTime = 0.0195475, [24] [bootstrap]: 0.00050716 [type_inference]: 0.00551205 [event_method]: 1.354e-05 [auto_monad]: 5.274e-05 [graph_reusing]: 5.54e-06 [inline]: 1.70001e-06 [add_attr]: 0.00299865, [1] [add_attr_with_inline]: 0.00299047, [1] [Cycle 1]: 4.532e-05, [2] [tag_attr]: 1.494e-05 [meta_addattr_fg_expand]: 4.68001e-06 [parallel-infer-symbol]: 2.88e-06 [pre_auto_parallel]: 2.483e-05 [insert-virtual-dataset]: 2.39001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00390355, [53] [py_interpret_to_execute]: 1.899e-05 [rewriter_before_opt_a]: 5.691e-05 [opt_a]: 0.00206978, [2] [Cycle 1]: 0.00147532, [45] [expand_dump_flag]: 2.94999e-06 [switch_simplify]: 3.064e-05 [loop_unroll]: 2.067e-05 [a_1]: 0.00044382 [with_stream_mark]: 1.308e-05 [recompute_prepare]: 7.54002e-06 [updatestate_depend_eliminate]: 3.64002e-06 [updatestate_assign_eliminate]: 3.27002e-06 [updatestate_loads_eliminate]: 2.83003e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.591e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 1.83002e-06 [meta_shard_fg_expand]: 1.56002e-06 [shard_inline]: 5.69999e-06 [merge_send_recv]: 7.48e-06 [auto_parallel]: 5.84999e-06 [parallel]: 1.585e-05 [flash_sp]: 6.79999e-06 [merge_comm]: 3.48999e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.48999e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 7.13998e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.36002e-06 [virtual_output]: 5.75001e-06 [merge_forward]: 3.71999e-06 [cell_reuse_recompute_pass]: 1.37999e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.42001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.53e-06 [receive_attached]: 
2.19001e-06 [after_resolve]: 1.053e-05 [a_after_grad]: 8.79003e-06 [renormalize]: 0.00040113 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.279e-05 [cse]: 2.645e-05 [a_3]: 4.084e-05 [Cycle 2]: 0.00058493, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012419 [with_stream_mark]: 9.12999e-06 [recompute_prepare]: 5.69999e-06 [updatestate_depend_eliminate]: 2.68998e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.43002e-06 [parameter_eliminate]: 8.89995e-07 [a_2]: 6.66e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.01002e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.22e-06 [flash_sp]: 3.57002e-06 [merge_comm]: 3.03e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.04998e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 5.87999e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 4.84e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.48002e-06 [offload_activation]: 5.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.21002e-06 [merge_recompute_call_nodes]: 6.39993e-07 [before_grad]: 7.77998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.91e-06 [meta_fg_expand]: 1.71e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 8.95999e-06 [a_after_grad]: 7.91001e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.70002e-07 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 5.77999e-06 [cse]: 1.596e-05 [a_3]: 3.161e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.164e-05 [convert_after_rewriter]: 6.92002e-06 [order_py_execute_after_rewriter]: 5.17999e-06 [mutable_eliminate]: 0.00044808 [opt_b]: 0.00018053, [1] [Cycle 1]: 
0.00017466, [7] [b_1]: 0.00010806 [b_2]: 7.31999e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.46e-06 [renormalize]: 3.80009e-07 [cse]: 1.563e-05 [optimize_parallel_all_gather_comm]: 1.557e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.132e-05 [loop_unroll]: 0.00042969 [opt_after_cconv]: 9.42e-05, [1] [Cycle 1]: 8.838e-05, [7] [c_1]: 2.804e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.567e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.204e-05 [tuple_transform]: 6.851e-05, [1] [Cycle 1]: 6.408e-05, [4] [d_1]: 3.885e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.87001e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.211e-05 [cse_after_recomputation]: 2.088e-05, [1] [Cycle 1]: 1.648e-05, [1] [cse]: 1.12e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 4.92e-06 [bias_add_comm_swap]: 2.73998e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.41998e-06 [merge_cast_opt]: 1.19e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.01998e-06 [assign_add_opt]: 1.19998e-06 [ForceFp32Comm]: 7.29982e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.09e-06 [reorder_send_recv_between_fp_bp]: 2.65997e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.07998e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.099e-05 [grouped_pairwise_exchange_alltoall]: 1.54998e-06 [offloading_packed_experts]: 3.55998e-06 [overlap_recompute_and_grad_model_parallel]: 4.28999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.11997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.08001e-06 [overlap_grad_flash_sp]: 1.655e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.816e-05, [1] [Cycle 1]: 6.414e-05, [6] [build]: 2.52001e-06 [elim_shapecalc]: 8.52e-06 [elim_not_effective]: 1.17e-05 [opt_reshape]: 5.93002e-06 [fold_const_symbol]: 8.59e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 1.515e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.2e-06 [opt_after_jit_grad]: 0.00044476 [validate]: 2.936e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00582467 [execute]: 6.42001e-06 Sums bootstrap : 0.000507s : 3.25% type_inference : 0.005512s : 35.32% event_method : 0.000014s : 0.09% auto_monad : 0.000053s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.12% optimize.rewriter_before_opt_a : 0.000057s : 0.36% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000037s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000568s : 3.64% optimize.opt_a.with_stream_mark : 0.000022s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate 
: 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000020s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000401s : 2.57% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000042s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.20% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.87% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.14% optimize.loop_unroll : 0.000430s : 2.75% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000445s : 2.85% validate : 0.000029s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005825s : 37.33% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000160 30 14.71% : 0.000024s : 5: substitution.arithmetic_simplify 1.30% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.25% : 0.000005s : 4: substitution.graph_param_transform 66.36% : 0.000106s : 3: substitution.inline 1.97% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.44% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005470 2 90.08% : 0.004928s : 1: type_inference.infer 9.92% : 0.000543s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.98% : 0.000027s : 3: replace.inline 30.02% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.66% : 0.000104s : 3: match.inline 8.34% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.85% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.27% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.64% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.93% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.99% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.11% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.25% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.92% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.23% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000009s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.88% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.77% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.65% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.67% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.04% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.27% : 0.000002s : 11: predicate.reduce_eliminate 2.36% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.73% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.90% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.86% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.00% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.46% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.37% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000336 8 45.73% : 0.000153s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.27% : 0.000182s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027938 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.75% : 0.003003s : 1: add_attr 10.72% : 0.002994s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.16% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000058s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.94% : 0.000542s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.57% : 0.000439s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.32% : 0.000928s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.42% : 0.002073s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.63% : 0.000455s : 1: opt_after_jit_grad 0.66% : 0.000184s : 1: opt_b 13.99% : 0.003907s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.73% : 0.000203s : 1: renormalize.infer 0.69% : 0.000192s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000036s : 1: rewriter_after_opt_a 0.22% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000071s : 1: symbol_engine_optimizer 20.88% : 0.005834s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.78% : 0.005525s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0370519, [24] [bootstrap]: 0.00057352 [type_inference]: 0.0112633 [event_method]: 4.592e-05 [auto_monad]: 0.00011996 [graph_reusing]: 8.15e-06 [inline]: 1.96e-06 [add_attr]: 0.00299666, [1] [add_attr_with_inline]: 0.00298826, [1] [Cycle 1]: 7.014e-05, [2] [tag_attr]: 3.412e-05 [meta_addattr_fg_expand]: 9.71998e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 4.924e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 2.04e-06 [pipeline_split]: 1.78002e-06 [optimize]: 0.0131932, [53] [py_interpret_to_execute]: 3.788e-05 [rewriter_before_opt_a]: 0.00014326 [opt_a]: 0.0109081, [3] [Cycle 1]: 0.00697434, [45] [expand_dump_flag]: 3.48999e-06 [switch_simplify]: 7.371e-05 [loop_unroll]: 6.08e-05 [a_1]: 0.00144545 [with_stream_mark]: 2.28e-05 [recompute_prepare]: 2.101e-05 [updatestate_depend_eliminate]: 9.08002e-06 [updatestate_assign_eliminate]: 7.43e-06 [updatestate_loads_eliminate]: 6.98998e-06 [parameter_eliminate]: 3.09001e-06 [a_2]: 0.00024297 [accelerated_algorithm]: 2.991e-05 [shard]: 1.77999e-06 [meta_shard_fg_expand]: 3.38e-06 [shard_inline]: 1.623e-05 [merge_send_recv]: 1.523e-05 [auto_parallel]: 1.072e-05 [parallel]: 1.789e-05 [flash_sp]: 1.105e-05 [merge_comm]: 9.70002e-06 [allreduce_fusion]: 8.95001e-06 [matmul_add_comm_reduction]: 2.712e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.768e-05 [virtual_dataset]: 1.532e-05 [get_grad_eliminate_]: 1.502e-05 [virtual_output]: 1.494e-05 [merge_forward]: 9.72001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.754e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.866e-05 [merge_recompute_call_nodes]: 1.91e-06 [before_grad]: 2.726e-05 [set_forward_comm_id_for_comm_node_pass]: 9.87999e-06 [meta_fg_expand]: 0.00138731 [flash_sp_send_recv_attached]: 3.43e-06 [receive_attached]: 2.59999e-06 [after_resolve]: 
5.963e-05 [a_after_grad]: 8.055e-05 [renormalize]: 0.00239891 [add_forward_monad_depend]: 9.11002e-06 [auto_monad_grad]: 5.59e-06 [auto_monad_eliminator]: 5.522e-05 [cse]: 0.00015967 [a_3]: 0.00033121 [Cycle 2]: 0.00301748, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.664e-05 [loop_unroll]: 4.387e-05 [a_1]: 0.00152355 [with_stream_mark]: 1.194e-05 [recompute_prepare]: 1.085e-05 [updatestate_depend_eliminate]: 4.99e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 3.68999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.00012562 [accelerated_algorithm]: 1.216e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.88002e-06 [shard_inline]: 9.15999e-06 [merge_send_recv]: 6.66e-06 [auto_parallel]: 7.2e-06 [parallel]: 5.09e-06 [flash_sp]: 3.27002e-06 [merge_comm]: 5.11002e-06 [allreduce_fusion]: 4.405e-05 [matmul_add_comm_reduction]: 8.37e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.115e-05 [virtual_dataset]: 9.05999e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.40001e-06 [merge_forward]: 4.50001e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.688e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.408e-05 [set_forward_comm_id_for_comm_node_pass]: 5.22e-06 [meta_fg_expand]: 6.927e-05 [flash_sp_send_recv_attached]: 9.29984e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.609e-05 [a_after_grad]: 1.417e-05 [renormalize]: 0.00057455 [add_forward_monad_depend]: 4.08999e-06 [auto_monad_grad]: 1.30999e-06 [auto_monad_eliminator]: 1.481e-05 [cse]: 4.604e-05 [a_3]: 6.524e-05 [Cycle 3]: 0.00090178, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 1.085e-05 [loop_unroll]: 8.87999e-06 [a_1]: 0.00025002 [with_stream_mark]: 9.87001e-06 [recompute_prepare]: 9.09e-06 [updatestate_depend_eliminate]: 4.88001e-06 [updatestate_assign_eliminate]: 3.91001e-06 [updatestate_loads_eliminate]: 3.84002e-06 
[parameter_eliminate]: 9.00007e-07 [a_2]: 0.0001227 [accelerated_algorithm]: 1.152e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.68002e-06 [shard_inline]: 8.96998e-06 [merge_send_recv]: 6.89999e-06 [auto_parallel]: 7.23999e-06 [parallel]: 4.69002e-06 [flash_sp]: 9.80013e-07 [merge_comm]: 5.30999e-06 [allreduce_fusion]: 4.94998e-06 [matmul_add_comm_reduction]: 7.38e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.59e-06 [get_grad_eliminate_]: 8.50001e-06 [virtual_output]: 8.40999e-06 [merge_forward]: 4.82e-06 [cell_reuse_recompute_pass]: 1.30999e-06 [offload_activation]: 9.10001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.674e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.483e-05 [set_forward_comm_id_for_comm_node_pass]: 5.25001e-06 [meta_fg_expand]: 3.01001e-06 [flash_sp_send_recv_attached]: 7.79983e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.336e-05 [a_after_grad]: 1.377e-05 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 1.04e-06 [auto_monad_eliminator]: 1.079e-05 [cse]: 2.626e-05 [a_3]: 5.915e-05 [py_interpret_to_execute_after_opt_a]: 9.69e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 4.758e-05 [convert_after_rewriter]: 9.05001e-06 [order_py_execute_after_rewriter]: 6.99001e-06 [mutable_eliminate]: 0.00045622 [opt_b]: 0.00028804, [1] [Cycle 1]: 0.00028183, [7] [b_1]: 0.00019003 [b_2]: 1.071e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.09002e-06 [updatestate_loads_eliminate]: 4.07998e-06 [renormalize]: 3.30008e-07 [cse]: 3.096e-05 [optimize_parallel_all_gather_comm]: 1.974e-05 [overlap_param_gather]: 1.99999e-06 [cconv]: 1.941e-05 [loop_unroll]: 0.00042394 [opt_after_cconv]: 0.00013494, [1] [Cycle 1]: 0.00012884, [7] [c_1]: 4.81e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.4e-06 
[updatestate_loads_eliminate]: 3.99002e-06 [cse]: 2.969e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.899e-05 [tuple_transform]: 0.00010199, [1] [Cycle 1]: 9.743e-05, [4] [d_1]: 6.754e-05 [none_parameter_eliminate]: 1.65001e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.81998e-06 [partial_unused_args_eliminate]: 1.72999e-06 [add_recomputation]: 5.561e-05 [cse_after_recomputation]: 3.209e-05, [1] [Cycle 1]: 2.722e-05, [1] [cse]: 2.179e-05 [environ_conv]: 8.77e-06 [swap_dp_allreduce_reducescatter]: 7.75998e-06 [bias_add_comm_swap]: 2.76e-06 [label_micro_interleaved_index]: 4.16001e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.16002e-06 [slice_recompute_activation]: 2.04999e-06 [micro_interleaved_order_control]: 1.97001e-06 [assign_add_opt]: 1.38002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.65002e-06 [comm_op_add_attrs]: 9.60019e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.07998e-06 [interleave_parallel_branches]: 9.79984e-07 [overlap_opt_shard_in_pipeline]: 9.99979e-07 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.631e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 4.96997e-06 [overlap_recompute_and_grad_model_parallel]: 6.14999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.11997e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.06998e-06 [overlap_grad_ring_attention]: 5.37999e-06 [overlap_grad_flash_sp]: 2.323e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.86e-06 [handle_group_info]: 1.21002e-06 [symbol_engine_optimizer]: 0.00013282, [1] [Cycle 1]: 0.00012855, [6] [build]: 1.004e-05 [elim_shapecalc]: 1.338e-05 [elim_not_effective]: 1.792e-05 [opt_reshape]: 1.051e-05 [fold_const_symbol]: 1.534e-05 
[renormalize]: 2.30008e-07 [detach_backward]: 1.79998e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 2.425e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.38999e-06 [opt_after_jit_grad]: 0.00046963 [validate]: 4.373e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00803371 [execute]: 7.41001e-06 Sums bootstrap : 0.000574s : 1.75% type_inference : 0.011263s : 34.36% event_method : 0.000046s : 0.14% auto_monad : 0.000120s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000010s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000038s : 0.12% optimize.rewriter_before_opt_a : 0.000143s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000131s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.35% optimize.opt_a.a_1 : 0.003219s : 9.82% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.04% optimize.opt_a.parameter_eliminate : 0.000005s : 0.02% optimize.opt_a.a_2 : 0.000491s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.08% optimize.opt_a.flash_sp : 0.000015s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000058s : 0.18% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000019s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001460s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000005s : 0.01% optimize.opt_a.after_resolve : 0.000089s : 0.27% optimize.opt_a.a_after_grad : 0.000108s : 0.33% optimize.opt_a.renormalize : 0.002974s : 9.07% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000232s : 0.71% optimize.opt_a.a_3 : 0.000456s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000048s : 0.15% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000456s : 1.39% optimize.opt_b.b_1 : 0.000190s : 0.58% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000424s : 1.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000030s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000068s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.07% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000470s : 1.43% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008034s : 24.51% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000750 222 5.99% : 0.000045s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.26% : 0.000002s : 2: substitution.incorporate_call_switch 55.20% : 0.000414s : 17: substitution.inline 2.15% : 0.000016s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000014s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.88% : 0.000014s : 20: substitution.remove_not_recompute_node 3.22% : 0.000024s : 10: substitution.replace_applicator 1.37% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.63% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.82% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.75% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.39% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011190 2 87.05% : 0.009741s : 1: type_inference.infer 12.95% : 0.001449s : 1: type_inference.specialize ------[replace.] 0.000217 33 58.10% : 0.000126s : 17: replace.inline 41.90% : 0.000091s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000438 33 92.38% : 0.000405s : 17: match.inline 7.62% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000748 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.07% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.12% : 0.000008s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.20% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.11% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.29% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.55% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000042s : 249: predicate.inline 1.27% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.44% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.08% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000010s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.17% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000022s : 169: predicate.switch_layer_defer_inline 4.99% : 
0.000037s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.09% : 0.000008s : 68: predicate.transpose_eliminate 1.50% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.91% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.65% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.26% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001542 34 57.02% : 0.000879s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.98% : 0.000663s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061418 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.89% : 0.003001s : 1: add_attr 4.87% : 0.002992s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000127s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.99% : 0.000607s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000019s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000052s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.71% : 0.000433s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000465s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 7.94% : 0.004875s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.05% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 17.77% : 0.010911s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.78% : 0.000479s : 1: opt_after_jit_grad 0.48% : 0.000292s : 1: opt_b 21.49% : 0.013197s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000042s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000033s : 1: remove_dup_value 2.63% : 0.001612s : 2: renormalize.infer 2.20% : 0.001349s : 2: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000052s : 1: rewriter_after_opt_a 0.24% : 0.000148s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.22% : 0.000136s : 1: symbol_engine_optimizer 13.10% : 0.008043s : 1: task_emit 0.17% : 0.000105s : 1: 
tuple_transform 18.36% : 0.011278s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0183255, [24] [bootstrap]: 0.00046673 [type_inference]: 0.00425965 [event_method]: 1.102e-05 [auto_monad]: 5.229e-05 [graph_reusing]: 5.31002e-06 [inline]: 1.91e-06 [add_attr]: 0.0029756, [1] [add_attr_with_inline]: 0.00296741, [1] [Cycle 1]: 4.388e-05, [2] [tag_attr]: 1.165e-05 [meta_addattr_fg_expand]: 3.08998e-06 [parallel-infer-symbol]: 3.03e-06 [pre_auto_parallel]: 2.16e-05 [insert-virtual-dataset]: 2.35002e-06 [parallel-infer-symbol-second]: 8.10018e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 2.15002e-06 [optimize]: 0.00363929, [53] [py_interpret_to_execute]: 1.471e-05 [rewriter_before_opt_a]: 3.728e-05 [opt_a]: 0.00184435, [2] [Cycle 1]: 0.00124755, [45] [expand_dump_flag]: 2.46998e-06 [switch_simplify]: 2.413e-05 [loop_unroll]: 1.348e-05 [a_1]: 0.00029182 [with_stream_mark]: 1.333e-05 [recompute_prepare]: 7.60003e-06 [updatestate_depend_eliminate]: 3.50003e-06 [updatestate_assign_eliminate]: 3.36999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.81e-06 [a_2]: 7.642e-05 [accelerated_algorithm]: 6.56e-06 [shard]: 2.40997e-06 [meta_shard_fg_expand]: 1.48002e-06 [shard_inline]: 5.83997e-06 [merge_send_recv]: 7.39002e-06 [auto_parallel]: 5.74e-06 [parallel]: 1.719e-05 [flash_sp]: 7.55998e-06 [merge_comm]: 3.48e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 9.10001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 7.23999e-06 [virtual_dataset]: 5.74999e-06 [get_grad_eliminate_]: 5.57001e-06 [virtual_output]: 5.51e-06 [merge_forward]: 3.7e-06 [cell_reuse_recompute_pass]: 1.13001e-06 [offload_activation]: 9.14e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.065e-05 [merge_recompute_call_nodes]: 1.47001e-06 [before_grad]: 9.07001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 
2.17999e-06 [after_resolve]: 1.074e-05 [a_after_grad]: 9.07999e-06 [renormalize]: 0.00033754 [add_forward_monad_depend]: 4.32e-06 [auto_monad_grad]: 2.08002e-06 [auto_monad_eliminator]: 1.313e-05 [cse]: 2.866e-05 [a_3]: 3.974e-05 [Cycle 2]: 0.00058751, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.84999e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.00012391 [with_stream_mark]: 9.27999e-06 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 2.77002e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.752e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.47999e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.68002e-06 [parallel]: 4.37998e-06 [flash_sp]: 3.45e-06 [merge_comm]: 3.01001e-06 [allreduce_fusion]: 2.66e-06 [matmul_add_comm_reduction]: 4.94e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.28e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.75001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51998e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.86001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.10998e-06 [meta_fg_expand]: 1.52001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.92999e-06 [a_after_grad]: 8.40999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.17999e-06 [auto_monad_grad]: 9.20001e-07 [auto_monad_eliminator]: 6.04999e-06 [cse]: 1.319e-05 [a_3]: 3.187e-05 [py_interpret_to_execute_after_opt_a]: 7.23999e-06 [slice_cell_reuse_recomputed_activation]: 1.77999e-06 [rewriter_after_opt_a]: 3.143e-05 [convert_after_rewriter]: 6.59001e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00044633 [opt_b]: 0.00018026, [1] [Cycle 
1]: 0.00017403, [7] [b_1]: 0.00010775 [b_2]: 6.74001e-06 [updatestate_depend_eliminate]: 5.21998e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.30002e-06 [renormalize]: 3.59985e-07 [cse]: 1.572e-05 [optimize_parallel_all_gather_comm]: 1.535e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.142e-05 [loop_unroll]: 0.00041444 [opt_after_cconv]: 9.302e-05, [1] [Cycle 1]: 8.751e-05, [7] [c_1]: 2.742e-05 [parameter_eliminate]: 2.09e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.6e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.244e-05 [tuple_transform]: 6.807e-05, [1] [Cycle 1]: 6.387e-05, [4] [d_1]: 3.87e-05 [none_parameter_eliminate]: 1.51002e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.17999e-06 [partial_unused_args_eliminate]: 2.14999e-06 [add_recomputation]: 4.293e-05 [cse_after_recomputation]: 2.022e-05, [1] [Cycle 1]: 1.597e-05, [1] [cse]: 1.078e-05 [environ_conv]: 4.71002e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.58998e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.61999e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.09e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.34e-06 [ForceFp32Comm]: 8.39995e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.28998e-06 [reorder_send_recv_between_fp_bp]: 2.86999e-06 [comm_op_add_attrs]: 1.15999e-06 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.74998e-06 [control_data_broadcast_order]: 1.148e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.9e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.24e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 1.84e-06 [overlap_grad_ring_attention]: 4.07e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.81e-06 [handle_group_info]: 1.34e-06 [symbol_engine_optimizer]: 6.777e-05, [1] [Cycle 1]: 6.368e-05, [6] [build]: 2.38002e-06 [elim_shapecalc]: 8.32e-06 [elim_not_effective]: 1.159e-05 [opt_reshape]: 5.84999e-06 [fold_const_symbol]: 8.81002e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.619e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.39001e-06 [opt_after_jit_grad]: 0.00047921 [validate]: 3.075e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0061494 [execute]: 7.18e-06 Sums bootstrap : 0.000467s : 3.24% type_inference : 0.004260s : 29.58% event_method : 0.000011s : 0.08% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000037s : 0.26% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000416s : 2.89% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000338s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000042s : 0.29% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 3.10% optimize.opt_b.b_1 : 0.000108s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.15% optimize.loop_unroll : 0.000414s : 2.88% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000479s : 3.33% validate : 0.000031s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.006149s : 42.70% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000121 26 17.87% : 0.000022s : 4: substitution.arithmetic_simplify 1.73% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.79% : 0.000006s : 4: substitution.graph_param_transform 65.50% : 0.000079s : 2: substitution.inline 2.31% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.50% : 0.000004s : 4: substitution.remove_not_recompute_node 3.25% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004221 2 91.91% : 0.003879s : 1: type_inference.infer 8.09% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 17: predicate.arithmetic_simplify 0.93% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.75% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.79% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.12% : 0.000002s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.28% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.76% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.67% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.92% : 0.000001s : 9: predicate.reduce_eliminate 2.25% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.70% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 17: predicate.replace_applicator 0.87% : 0.000001s : 8: predicate.replace_old_param 0.43% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.90% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.98% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.43% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.22% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.76% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.71% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000235 6 41.99% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.01% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026202 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.37% : 0.002980s : 1: add_attr 11.34% : 0.002971s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.91% : 0.000501s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.92% : 0.000766s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.05% : 0.001847s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.87% : 0.000489s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 13.90% : 0.003643s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.56% : 0.000147s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.51% : 0.006159s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.31% : 0.004273s : 1: type_inference 0.21% : 0.000056s : 1: validate TotalTime = 0.0356702, [24] [bootstrap]: 0.00051159 [type_inference]: 0.0101781 [event_method]: 4.19e-05 [auto_monad]: 0.00011385 [graph_reusing]: 8.03999e-06 [inline]: 2.42001e-06 [add_attr]: 0.00296187, [1] [add_attr_with_inline]: 0.00295354, [1] [Cycle 1]: 6.641e-05, [2] [tag_attr]: 3.02e-05 [meta_addattr_fg_expand]: 8.80001e-06 [parallel-infer-symbol]: 2.77002e-06 [pre_auto_parallel]: 4.436e-05 [insert-virtual-dataset]: 2.53e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.74e-06 [optimize]: 0.013022, [53] [py_interpret_to_execute]: 3.461e-05 [rewriter_before_opt_a]: 0.00012469 [opt_a]: 0.0107648, [3] [Cycle 1]: 0.00689434, [45] [expand_dump_flag]: 3.57997e-06 [switch_simplify]: 6.569e-05 [loop_unroll]: 5.45e-05 [a_1]: 0.00134704 [with_stream_mark]: 2.276e-05 [recompute_prepare]: 2.159e-05 [updatestate_depend_eliminate]: 9.55001e-06 [updatestate_assign_eliminate]: 7.71001e-06 [updatestate_loads_eliminate]: 7.33e-06 [parameter_eliminate]: 2.43998e-06 [a_2]: 0.00024477 [accelerated_algorithm]: 3.118e-05 [shard]: 2.11998e-06 [meta_shard_fg_expand]: 3.35e-06 [shard_inline]: 1.622e-05 [merge_send_recv]: 1.625e-05 [auto_parallel]: 1.113e-05 [parallel]: 1.775e-05 [flash_sp]: 1.123e-05 [merge_comm]: 9.57999e-06 [allreduce_fusion]: 8.85001e-06 [matmul_add_comm_reduction]: 2.597e-05 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 1.762e-05 [virtual_dataset]: 1.546e-05 [get_grad_eliminate_]: 1.493e-05 [virtual_output]: 1.505e-05 [merge_forward]: 9.05999e-06 [cell_reuse_recompute_pass]: 1.47001e-06 [offload_activation]: 1.843e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.891e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 2.823e-05 [set_forward_comm_id_for_comm_node_pass]: 9.57999e-06 [meta_fg_expand]: 0.00137649 [flash_sp_send_recv_attached]: 3.55e-06 [receive_attached]: 2.38998e-06 
[after_resolve]: 5.922e-05 [a_after_grad]: 7.975e-05 [renormalize]: 0.00242551 [add_forward_monad_depend]: 8.75001e-06 [auto_monad_grad]: 5.27001e-06 [auto_monad_eliminator]: 5.537e-05 [cse]: 0.00016437 [a_3]: 0.00033459 [Cycle 2]: 0.00296469, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.671e-05 [loop_unroll]: 4.396e-05 [a_1]: 0.0015509 [with_stream_mark]: 1.206e-05 [recompute_prepare]: 1.108e-05 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 4.27998e-06 [updatestate_loads_eliminate]: 3.72002e-06 [parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012581 [accelerated_algorithm]: 1.206e-05 [shard]: 1.14e-06 [meta_shard_fg_expand]: 1.84e-06 [shard_inline]: 9.64999e-06 [merge_send_recv]: 6.76999e-06 [auto_parallel]: 7.18e-06 [parallel]: 5.19998e-06 [flash_sp]: 3.3e-06 [merge_comm]: 5.17e-06 [allreduce_fusion]: 4.74e-06 [matmul_add_comm_reduction]: 7.71001e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 8.76997e-06 [get_grad_eliminate_]: 9.03002e-06 [virtual_output]: 8.55999e-06 [merge_forward]: 3.95e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.647e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 1.393e-05 [set_forward_comm_id_for_comm_node_pass]: 5.00999e-06 [meta_fg_expand]: 3.49e-05 [flash_sp_send_recv_attached]: 9.60019e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 1.514e-05 [a_after_grad]: 1.422e-05 [renormalize]: 0.00057531 [add_forward_monad_depend]: 3.91999e-06 [auto_monad_grad]: 1.19998e-06 [auto_monad_eliminator]: 1.477e-05 [cse]: 4.597e-05 [a_3]: 6.497e-05 [Cycle 3]: 0.00089131, [45] [expand_dump_flag]: 9.99979e-07 [switch_simplify]: 1.038e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00024927 [with_stream_mark]: 9.67001e-06 [recompute_prepare]: 9.59e-06 [updatestate_depend_eliminate]: 4.84998e-06 [updatestate_assign_eliminate]: 3.89002e-06 [updatestate_loads_eliminate]: 
3.81999e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 0.00012354 [accelerated_algorithm]: 1.167e-05 [shard]: 9.20001e-07 [meta_shard_fg_expand]: 1.77001e-06 [shard_inline]: 9.20999e-06 [merge_send_recv]: 7.01001e-06 [auto_parallel]: 6.96999e-06 [parallel]: 4.28999e-06 [flash_sp]: 1.16002e-06 [merge_comm]: 4.99e-06 [allreduce_fusion]: 4.90001e-06 [matmul_add_comm_reduction]: 7.91001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 9.93998e-06 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.13999e-06 [merge_forward]: 4.25e-06 [cell_reuse_recompute_pass]: 1.41002e-06 [offload_activation]: 8.66002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.612e-05 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 1.407e-05 [set_forward_comm_id_for_comm_node_pass]: 5.32999e-06 [meta_fg_expand]: 2.99001e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 1.315e-05 [a_after_grad]: 1.419e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 1.057e-05 [cse]: 2.47e-05 [a_3]: 5.629e-05 [py_interpret_to_execute_after_opt_a]: 1.106e-05 [slice_cell_reuse_recomputed_activation]: 1.81998e-06 [rewriter_after_opt_a]: 4.911e-05 [convert_after_rewriter]: 9.00999e-06 [order_py_execute_after_rewriter]: 6.61e-06 [mutable_eliminate]: 0.00046236 [opt_b]: 0.00028688, [1] [Cycle 1]: 0.0002808, [7] [b_1]: 0.00018968 [b_2]: 1.058e-05 [updatestate_depend_eliminate]: 6.93e-06 [updatestate_assign_eliminate]: 3.97998e-06 [updatestate_loads_eliminate]: 4.22e-06 [renormalize]: 4.19997e-07 [cse]: 3.042e-05 [optimize_parallel_all_gather_comm]: 1.998e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 1.977e-05 [loop_unroll]: 0.0004273 [opt_after_cconv]: 0.00013444, [1] [Cycle 1]: 0.00012853, [7] [c_1]: 4.834e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 4.23001e-06 
[updatestate_loads_eliminate]: 3.83001e-06 [cse]: 2.893e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 2.954e-05 [tuple_transform]: 0.00010095, [1] [Cycle 1]: 9.624e-05, [4] [d_1]: 6.615e-05 [none_parameter_eliminate]: 1.54998e-06 [renormalize]: 2.50002e-07 [switch_simplify]: 9.69e-06 [partial_unused_args_eliminate]: 2.28002e-06 [add_recomputation]: 7.118e-05 [cse_after_recomputation]: 3.283e-05, [1] [Cycle 1]: 2.799e-05, [1] [cse]: 2.22e-05 [environ_conv]: 8.23999e-06 [swap_dp_allreduce_reducescatter]: 7.68001e-06 [bias_add_comm_swap]: 2.21e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 2.84999e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.57001e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.57001e-06 [ForceFp32Comm]: 9.80013e-07 [remove_cast_before_assign_add]: 1.00999e-06 [full_micro_interleaved_order_control]: 2.01e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.50007e-07 [add_comm_op_reuse_tag]: 8.79983e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.47999e-06 [control_data_broadcast_order]: 1.629e-05 [grouped_pairwise_exchange_alltoall]: 1.40001e-06 [offloading_packed_experts]: 5.04e-06 [overlap_recompute_and_grad_model_parallel]: 5.97999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 5.37001e-06 [overlap_grad_flash_sp]: 2.324e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.24001e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.20001e-06 [symbol_engine_optimizer]: 9.916e-05, [1] [Cycle 1]: 9.493e-05, [6] [build]: 1.046e-05 [elim_shapecalc]: 1.389e-05 [elim_not_effective]: 1.82e-05 [opt_reshape]: 9.71e-06 [fold_const_symbol]: 1.48e-05 [renormalize]: 
2.19996e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.42999e-06 [auto_monad_reorder]: 2.53e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.56999e-06 [opt_after_jit_grad]: 0.00047557 [validate]: 4.322e-05 [backend_pass]: 1.25999e-06 [task_emit]: 0.00801272 [execute]: 7.46001e-06 Sums bootstrap : 0.000512s : 1.63% type_inference : 0.010178s : 32.35% event_method : 0.000042s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000030s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000044s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000125s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000123s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003147s : 10.00% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000494s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000035s : 0.11% optimize.opt_a.merge_send_recv : 0.000030s : 0.10% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000027s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% 
optimize.opt_a.merge_comm : 0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000002s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000017s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.12% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001414s : 4.50% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.28% optimize.opt_a.a_after_grad : 0.000108s : 0.34% optimize.opt_a.renormalize : 0.003001s : 9.54% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.26% optimize.opt_a.cse : 0.000235s : 0.75% optimize.opt_a.a_3 : 0.000456s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000049s : 0.16% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000462s : 1.47% optimize.opt_b.b_1 : 0.000190s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000427s : 1.36% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000071s : 0.23% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000008s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000016s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000476s : 1.51% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008013s : 25.47% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000752 218 5.86% : 0.000044s : 11: substitution.arithmetic_simplify 4.98% : 0.000037s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.55% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.00% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.30% : 0.000002s : 2: substitution.incorporate_call_switch 52.75% : 0.000397s : 16: substitution.inline 2.10% : 0.000016s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.98% : 0.000015s : 3: substitution.less_batch_normalization 1.74% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.85% : 0.000014s : 20: substitution.remove_not_recompute_node 3.11% : 0.000023s : 10: substitution.replace_applicator 1.42% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.78% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.36% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.30% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010110 2 87.30% : 0.008825s : 1: type_inference.infer 12.70% : 0.001284s : 1: type_inference.specialize ------[replace.] 0.000202 30 59.23% : 0.000120s : 16: replace.inline 40.77% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000419 30 92.78% : 0.000389s : 16: match.inline 7.22% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000734 5663 1.10% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.08% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.07% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.14% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.22% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.78% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.22% : 0.000016s : 97: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.58% : 0.000041s : 244: predicate.inline 1.24% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.64% : 0.000019s : 164: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.17% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.41% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.57% : 0.000004s : 32: predicate.merge_addn 1.13% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.19% : 0.000001s : 8: predicate.parallel_virtual_node 1.97% : 0.000014s : 97: predicate.partial_defer_inline 1.72% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 67: predicate.reduce_eliminate 2.71% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.36% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.29% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.90% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.45% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.33% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.83% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.10% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.65% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.25% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001454 32 58.01% : 0.000843s : 12: func_graph_cloner_run.FuncGraphClonerGraph 41.99% : 0.000610s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059775 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.96% : 0.002966s : 1: add_attr 4.95% : 0.002957s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.13% : 0.000076s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.91% : 0.000546s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000019s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000036s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000011s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.73% : 0.000436s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.79% : 0.000471s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.02% : 0.004791s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000054s : 4: opt.transform.symbol_engine_opt 18.01% : 0.010768s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.81% : 0.000485s : 1: opt_after_jit_grad 0.49% : 0.000290s : 1: opt_b 21.79% : 0.013026s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000049s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.03% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.74% : 0.001636s : 2: renormalize.infer 2.26% : 0.001353s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000053s : 1: rewriter_after_opt_a 0.22% : 0.000129s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000102s : 1: symbol_engine_optimizer 13.42% : 0.008022s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.05% : 0.010193s : 1: type_inference 0.12% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x1-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x1-kbk],max_mem:64.0M . TotalTime = 0.0786068, [24] [bootstrap]: 0.00054624 [type_inference]: 0.00602811 [event_method]: 1.392e-05 [auto_monad]: 5.315e-05 [graph_reusing]: 5.48002e-06 [inline]: 1.64e-06 [add_attr]: 0.00342103, [1] [add_attr_with_inline]: 0.00340993, [1] [Cycle 1]: 4.374e-05, [2] [tag_attr]: 1.49e-05 [meta_addattr_fg_expand]: 3.89002e-06 [parallel-infer-symbol]: 3.07002e-06 [pre_auto_parallel]: 2.639e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 2.11998e-06 [pipeline_split]: 1.79e-06 [optimize]: 0.00396143, [53] [py_interpret_to_execute]: 1.921e-05 [rewriter_before_opt_a]: 5.811e-05 [opt_a]: 0.00209838, [2] [Cycle 1]: 0.00150417, [45] [expand_dump_flag]: 2.98e-06 [switch_simplify]: 3.235e-05 [loop_unroll]: 2.118e-05 [a_1]: 0.00045686 [with_stream_mark]: 1.365e-05 [recompute_prepare]: 7.81001e-06 [updatestate_depend_eliminate]: 3.68e-06 [updatestate_assign_eliminate]: 3.35e-06 [updatestate_loads_eliminate]: 3.07002e-06 [parameter_eliminate]: 1.67001e-06 [a_2]: 7.581e-05 [accelerated_algorithm]: 6.26998e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.62001e-06 [shard_inline]: 5.77999e-06 [merge_send_recv]: 7.88001e-06 [auto_parallel]: 5.90002e-06 [parallel]: 2.251e-05 [flash_sp]: 7.18e-06 [merge_comm]: 3.46999e-06 [allreduce_fusion]: 3.08e-06 [matmul_add_comm_reduction]: 8.77e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.43999e-06 [virtual_dataset]: 6.10002e-06 [get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 5.69e-06 [merge_forward]: 3.66001e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 8.48001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.066e-05 
[merge_recompute_call_nodes]: 1.72001e-06 [before_grad]: 9.62001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46999e-06 [meta_fg_expand]: 2.63998e-06 [flash_sp_send_recv_attached]: 2.81999e-06 [receive_attached]: 2.60997e-06 [after_resolve]: 1.08e-05 [a_after_grad]: 8.84003e-06 [renormalize]: 0.00040419 [add_forward_monad_depend]: 4.67e-06 [auto_monad_grad]: 1.66e-06 [auto_monad_eliminator]: 1.372e-05 [cse]: 2.699e-05 [a_3]: 3.994e-05 [Cycle 2]: 0.00058447, [45] [expand_dump_flag]: 9.79984e-07 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.51e-06 [a_1]: 0.00012483 [with_stream_mark]: 9.99999e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.96999e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.46e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.699e-05 [accelerated_algorithm]: 5.44e-06 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.17e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.25999e-06 [parallel]: 4.11001e-06 [flash_sp]: 3.36999e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.40999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.30002e-06 [virtual_dataset]: 5.26002e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 5.31002e-06 [merge_forward]: 2.68998e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 5.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.49e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.02998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98998e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.07e-06 [after_resolve]: 8.82999e-06 [a_after_grad]: 7.73999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.00001e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 6.01e-06 [cse]: 1.163e-05 [a_3]: 3.161e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 
[slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 3.055e-05 [convert_after_rewriter]: 6.64999e-06 [order_py_execute_after_rewriter]: 4.90001e-06 [mutable_eliminate]: 0.00045689 [opt_b]: 0.00018165, [1] [Cycle 1]: 0.00017491, [7] [b_1]: 0.00010823 [b_2]: 6.64001e-06 [updatestate_depend_eliminate]: 5.02e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 4.09986e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.509e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.271e-05 [loop_unroll]: 0.0004185 [opt_after_cconv]: 9.27e-05, [1] [Cycle 1]: 8.708e-05, [7] [c_1]: 2.694e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 4.75001e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.584e-05 [renormalize]: 4.69998e-07 [remove_dup_value]: 1.274e-05 [tuple_transform]: 6.883e-05, [1] [Cycle 1]: 6.428e-05, [4] [d_1]: 3.876e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.70025e-07 [switch_simplify]: 6.06e-06 [partial_unused_args_eliminate]: 1.76003e-06 [add_recomputation]: 6.8e-05 [cse_after_recomputation]: 2.149e-05, [1] [Cycle 1]: 1.687e-05, [1] [cse]: 1.139e-05 [environ_conv]: 4.54002e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.08001e-06 [label_fine_grained_interleaved_index]: 2.50997e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 3.13e-06 [assign_add_opt]: 1.31002e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.14e-06 [reorder_send_recv_between_fp_bp]: 2.81999e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.07e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 
[control_data_broadcast_order]: 1.149e-05 [grouped_pairwise_exchange_alltoall]: 2.00002e-06 [offloading_packed_experts]: 3.73001e-06 [overlap_recompute_and_grad_model_parallel]: 5.09e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.03002e-06 [overlap_grad_ring_attention]: 3.71001e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.815e-05, [1] [Cycle 1]: 6.396e-05, [6] [build]: 2.09999e-06 [elim_shapecalc]: 9.07001e-06 [elim_not_effective]: 1.176e-05 [opt_reshape]: 6.05002e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.65001e-06 [pipeline_parallel_scheduler]: 1.45001e-06 [auto_monad_reorder]: 1.482e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.33e-06 [opt_after_jit_grad]: 0.00045405 [validate]: 2.973e-05 [backend_pass]: 9.80013e-07 [task_emit]: 0.0638159 [execute]: 9.29998e-06 Sums bootstrap : 0.000546s : 0.74% type_inference : 0.006028s : 8.12% event_method : 0.000014s : 0.02% auto_monad : 0.000053s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000026s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000582s : 0.78% 
optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000014s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000404s : 0.54% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000457s : 0.62% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000418s : 0.56% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000068s : 0.09% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000454s : 0.61% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063816s : 85.98% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000167 30 14.30% : 0.000024s : 5: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.13% : 0.000005s : 4: substitution.graph_param_transform 66.81% : 0.000112s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 7.20% : 0.000012s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005984 2 90.85% : 0.005437s : 1: type_inference.infer 9.15% : 0.000547s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.63% : 0.000027s : 3: replace.inline 30.37% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 90.93% : 0.000110s : 3: match.inline 9.07% : 0.000011s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.28% : 0.000004s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.92% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.00% : 0.000002s : 11: predicate.dict_get_item_eliminator 1.05% : 0.000002s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.23% : 0.000000s : 4: predicate.elim_not_effective 0.36% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.86% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.21% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.92% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.09% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.30% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.64% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.60% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.16% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.65% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000002s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.14% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.32% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.19% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.13% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.74% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.70% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000346 8 47.16% : 0.000163s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.84% : 0.000183s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087491 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.91% : 0.003425s : 1: add_attr 3.90% : 0.003414s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.08% : 0.000072s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000588s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000016s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.53% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.08% : 0.000945s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.40% : 0.002101s : 1: opt_a 0.11% : 0.000096s : 1: opt_after_cconv 0.53% : 0.000464s : 1: opt_after_jit_grad 0.21% : 0.000185s : 1: opt_b 4.53% : 0.003965s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000209s : 1: renormalize.infer 0.22% : 0.000188s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000071s : 1: symbol_engine_optimizer 72.96% : 0.063833s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.91% : 0.006042s : 1: type_inference 0.06% : 0.000054s : 1: validate TotalTime = 0.0702268, [24] [bootstrap]: 0.00047037 [type_inference]: 0.00440676 [event_method]: 1.122e-05 [auto_monad]: 5.107e-05 [graph_reusing]: 4.77e-06 [inline]: 2.22001e-06 [add_attr]: 0.00298363, [1] [add_attr_with_inline]: 0.00297618, [1] [Cycle 1]: 4.448e-05, [2] [tag_attr]: 1.126e-05 [meta_addattr_fg_expand]: 3.21999e-06 [parallel-infer-symbol]: 2.66e-06 [pre_auto_parallel]: 2.187e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00364128, [53] [py_interpret_to_execute]: 1.497e-05 [rewriter_before_opt_a]: 3.85e-05 [opt_a]: 0.00183018, [2] [Cycle 1]: 0.00123719, [45] [expand_dump_flag]: 2.37001e-06 [switch_simplify]: 2.362e-05 [loop_unroll]: 1.341e-05 [a_1]: 0.00028573 [with_stream_mark]: 1.361e-05 [recompute_prepare]: 7.13998e-06 [updatestate_depend_eliminate]: 3.60003e-06 [updatestate_assign_eliminate]: 3.17002e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.619e-05 [accelerated_algorithm]: 6.21998e-06 [shard]: 2.46e-06 [meta_shard_fg_expand]: 1.44e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.43e-06 [auto_parallel]: 5.66998e-06 [parallel]: 1.751e-05 [flash_sp]: 7.22002e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.56999e-06 [matmul_add_comm_reduction]: 8.32e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 6.99001e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.42001e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.07999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 2.76e-06 
[after_resolve]: 1.012e-05 [a_after_grad]: 8.54e-06 [renormalize]: 0.00034048 [add_forward_monad_depend]: 4.18001e-06 [auto_monad_grad]: 1.85001e-06 [auto_monad_eliminator]: 1.303e-05 [cse]: 2.616e-05 [a_3]: 3.989e-05 [Cycle 2]: 0.00058381, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.46e-06 [loop_unroll]: 5.35999e-06 [a_1]: 0.00012382 [with_stream_mark]: 1.032e-05 [recompute_prepare]: 5.44e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.687e-05 [accelerated_algorithm]: 5.31002e-06 [shard]: 1.20999e-06 [meta_shard_fg_expand]: 1.05001e-06 [shard_inline]: 5.46e-06 [merge_send_recv]: 4.46002e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.35999e-06 [flash_sp]: 3.07002e-06 [merge_comm]: 2.83e-06 [allreduce_fusion]: 2.65002e-06 [matmul_add_comm_reduction]: 5.09998e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 5.94e-06 [virtual_dataset]: 5.13002e-06 [get_grad_eliminate_]: 4.87e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.48002e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.09998e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 7.78001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.21001e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.03002e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.13001e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.32001e-06 [cse]: 1.294e-05 [a_3]: 3.145e-05 [py_interpret_to_execute_after_opt_a]: 7.48999e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 3.09e-05 [convert_after_rewriter]: 7.21001e-06 [order_py_execute_after_rewriter]: 4.88001e-06 [mutable_eliminate]: 0.00044592 [opt_b]: 0.00017964, [1] [Cycle 1]: 0.00017366, [7] 
[b_1]: 0.00010671 [b_2]: 6.89001e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.32999e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 3.69997e-07 [cse]: 1.677e-05 [optimize_parallel_all_gather_comm]: 1.552e-05 [overlap_param_gather]: 2.36e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00041201 [opt_after_cconv]: 9.441e-05, [1] [Cycle 1]: 8.875e-05, [7] [c_1]: 2.77e-05 [parameter_eliminate]: 2.06998e-06 [updatestate_depend_eliminate]: 5.25001e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.09999e-06 [cse]: 1.605e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.261e-05 [tuple_transform]: 6.759e-05, [1] [Cycle 1]: 6.347e-05, [4] [d_1]: 3.802e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.30002e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.317e-05 [cse_after_recomputation]: 2.038e-05, [1] [Cycle 1]: 1.607e-05, [1] [cse]: 1.094e-05 [environ_conv]: 4.94998e-06 [swap_dp_allreduce_reducescatter]: 4.83001e-06 [bias_add_comm_swap]: 2.21998e-06 [label_micro_interleaved_index]: 4.50001e-06 [label_fine_grained_interleaved_index]: 2.59999e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.32999e-06 [micro_interleaved_order_control]: 2.21e-06 [assign_add_opt]: 1.30999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.22e-06 [full_micro_interleaved_order_control]: 2.37999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.69972e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60999e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.88001e-06 [overlap_recompute_and_grad_model_parallel]: 4.35999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 2.34001e-06 [overlap_grad_ring_attention]: 3.98001e-06 [overlap_grad_flash_sp]: 1.683e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.39001e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.825e-05, [1] [Cycle 1]: 6.411e-05, [6] [build]: 2.15002e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.119e-05 [opt_reshape]: 5.92999e-06 [fold_const_symbol]: 8.52998e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.529e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00044741 [validate]: 3.025e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0579163 [execute]: 8.76002e-06 Sums bootstrap : 0.000470s : 0.71% type_inference : 0.004407s : 6.65% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000410s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000002s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000341s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000071s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.67% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000412s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000447s : 0.68% validate : 0.000030s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057916s : 87.39% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000117 26 17.93% : 0.000021s : 4: substitution.arithmetic_simplify 1.57% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.69% : 0.000005s : 4: substitution.graph_param_transform 65.63% : 0.000077s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.54% : 0.000004s : 4: substitution.remove_not_recompute_node 3.31% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004367 2 91.73% : 0.004005s : 1: type_inference.infer 8.27% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000136 984 1.09% : 0.000001s : 9: predicate.accumulaten_eliminater 1.17% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 1.03% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.15% : 0.000002s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.15% : 0.000002s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.79% : 0.000002s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.09% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.75% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.89% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.68% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.38% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.20% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.90% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.76% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.73% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.48% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.92% : 0.000001s : 8: predicate.shard_identity_eliminate 0.89% : 0.000001s : 8: predicate.special_op_eliminate 0.89% : 0.000001s : 8: predicate.specialize_transform 1.14% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.73% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 2.99% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.05% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.73% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000257 6 42.89% : 0.000110s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.11% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078102 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.83% : 0.002988s : 1: add_attr 3.81% : 0.002979s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000506s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.03% : 0.000020s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000455s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000754s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.35% : 0.001833s : 1: opt_a 0.13% : 0.000098s : 1: opt_after_cconv 0.59% : 0.000457s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.67% : 0.003645s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000185s : 1: renormalize.infer 0.19% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.18% : 0.057933s : 1: task_emit 0.09% : 0.000070s : 1: 
tuple_transform 5.66% : 0.004420s : 1: type_inference 0.07% : 0.000051s : 1: validate TotalTime = 0.0714461, [24] [bootstrap]: 0.00047758 [type_inference]: 0.00546996 [event_method]: 1.498e-05 [auto_monad]: 5.539e-05 [graph_reusing]: 5.32999e-06 [inline]: 1.84e-06 [add_attr]: 0.00292426, [1] [add_attr_with_inline]: 0.00291636, [1] [Cycle 1]: 4.527e-05, [2] [tag_attr]: 1.485e-05 [meta_addattr_fg_expand]: 4.23999e-06 [parallel-infer-symbol]: 2.89999e-06 [pre_auto_parallel]: 2.455e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.51998e-06 [optimize]: 0.0039469, [53] [py_interpret_to_execute]: 2.058e-05 [rewriter_before_opt_a]: 5.843e-05 [opt_a]: 0.00211613, [2] [Cycle 1]: 0.00151406, [45] [expand_dump_flag]: 2.68e-06 [switch_simplify]: 3.248e-05 [loop_unroll]: 2.097e-05 [a_1]: 0.00044299 [with_stream_mark]: 1.298e-05 [recompute_prepare]: 7.43999e-06 [updatestate_depend_eliminate]: 3.81999e-06 [updatestate_assign_eliminate]: 3.48999e-06 [updatestate_loads_eliminate]: 2.84999e-06 [parameter_eliminate]: 1.55999e-06 [a_2]: 7.586e-05 [accelerated_algorithm]: 6.26998e-06 [shard]: 2.12999e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 8.32998e-06 [auto_parallel]: 6.41998e-06 [parallel]: 1.775e-05 [flash_sp]: 7.14001e-06 [merge_comm]: 3.60998e-06 [allreduce_fusion]: 3.92998e-06 [matmul_add_comm_reduction]: 8.59002e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.17002e-06 [virtual_dataset]: 6.09999e-06 [get_grad_eliminate_]: 5.81e-06 [virtual_output]: 5.72001e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.30001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.056e-05 [merge_recompute_call_nodes]: 1.62001e-06 [before_grad]: 9.20999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.54002e-06 [meta_fg_expand]: 2.27001e-06 [flash_sp_send_recv_attached]: 2.29999e-06 
[receive_attached]: 2.39001e-06 [after_resolve]: 1.059e-05 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00043371 [add_forward_monad_depend]: 4.58001e-06 [auto_monad_grad]: 1.86e-06 [auto_monad_eliminator]: 1.381e-05 [cse]: 2.681e-05 [a_3]: 3.995e-05 [Cycle 2]: 0.00059298, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 7.35e-06 [loop_unroll]: 5.47999e-06 [a_1]: 0.00012422 [with_stream_mark]: 9.69e-06 [recompute_prepare]: 5.50001e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.27999e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.727e-05 [accelerated_algorithm]: 5.57999e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.24e-06 [merge_send_recv]: 4.49998e-06 [auto_parallel]: 5.37001e-06 [parallel]: 3.88001e-06 [flash_sp]: 2.93998e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.86e-06 [matmul_add_comm_reduction]: 5.01002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 5.96e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 4.96002e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.76e-06 [cell_reuse_recompute_pass]: 1.40001e-06 [offload_activation]: 6.04001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71998e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 7.71001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99001e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.55999e-06 [a_after_grad]: 8.05e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.10999e-06 [auto_monad_grad]: 8.39995e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.373e-05 [a_3]: 3.159e-05 [py_interpret_to_execute_after_opt_a]: 7.43e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 3.161e-05 [convert_after_rewriter]: 7e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00045409 [opt_b]: 0.00017995, [1] [Cycle 
1]: 0.00017394, [7] [b_1]: 0.000107 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.26e-06 [updatestate_loads_eliminate]: 2.22001e-06 [renormalize]: 6.89994e-07 [cse]: 1.589e-05 [optimize_parallel_all_gather_comm]: 1.529e-05 [overlap_param_gather]: 1.88997e-06 [cconv]: 2.169e-05 [loop_unroll]: 0.00041612 [opt_after_cconv]: 9.398e-05, [1] [Cycle 1]: 8.822e-05, [7] [c_1]: 2.757e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.598e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.279e-05 [tuple_transform]: 6.854e-05, [1] [Cycle 1]: 6.423e-05, [4] [d_1]: 3.839e-05 [none_parameter_eliminate]: 1.52999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 5.95002e-06 [partial_unused_args_eliminate]: 1.91e-06 [add_recomputation]: 4.239e-05 [cse_after_recomputation]: 2.094e-05, [1] [Cycle 1]: 1.669e-05, [1] [cse]: 1.131e-05 [environ_conv]: 4.61002e-06 [swap_dp_allreduce_reducescatter]: 5.29e-06 [bias_add_comm_swap]: 2.44999e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.49999e-06 [micro_interleaved_order_control]: 2.71e-06 [assign_add_opt]: 1.14e-06 [ForceFp32Comm]: 7.80012e-07 [remove_cast_before_assign_add]: 1.04e-06 [full_micro_interleaved_order_control]: 2.12001e-06 [reorder_send_recv_between_fp_bp]: 2.44999e-06 [comm_op_add_attrs]: 1.31998e-06 [add_comm_op_reuse_tag]: 1.06997e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.11997e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.169e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.64002e-06 [overlap_recompute_and_grad_model_parallel]: 4.68999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.64e-06 [overlap_recompute_comm]: 2.39999e-06 [overlap_grad_ring_attention]: 4.05e-06 [overlap_grad_flash_sp]: 1.648e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.26998e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.724e-05, [1] [Cycle 1]: 6.304e-05, [6] [build]: 2.47001e-06 [elim_shapecalc]: 8.70999e-06 [elim_not_effective]: 1.116e-05 [opt_reshape]: 5.89e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.97001e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.544e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044907 [validate]: 3.104e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.0578063 [execute]: 8.62e-06 Sums bootstrap : 0.000478s : 0.71% type_inference : 0.005470s : 8.10% event_method : 0.000015s : 0.02% auto_monad : 0.000055s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.09% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000040s : 0.06% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000567s : 0.84% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000434s : 0.64% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.67% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000416s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057806s : 85.56% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000160 30 15.21% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.16% : 0.000005s : 4: substitution.graph_param_transform 65.93% : 0.000105s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.80% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.95% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005430 2 90.03% : 0.004889s : 1: type_inference.infer 9.97% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.45% : 0.000026s : 3: replace.inline 31.55% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 5 91.17% : 0.000103s : 3: match.inline 8.83% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.93% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.35% : 0.000004s : 19: predicate.arithmetic_simplify 0.84% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.12% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.96% : 0.000002s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.78% : 0.000001s : 8: predicate.less_batch_normalization 1.74% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.23% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.90% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.21% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.26% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.79% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.17% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 46.83% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.17% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.079839 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.67% : 0.002929s : 1: add_attr 3.66% : 0.002920s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.64% : 0.000514s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.03% : 0.000020s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.53% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.17% : 0.000931s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.65% : 0.002119s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.57% : 0.000458s : 1: opt_after_jit_grad 0.23% : 0.000183s : 1: opt_b 4.95% : 0.003951s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.26% : 0.000205s : 1: renormalize.infer 0.28% : 0.000222s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.08% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 72.42% : 0.057823s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.87% : 0.005484s : 1: type_inference 0.06% : 0.000051s : 1: validate TotalTime = 0.107388, [24] [bootstrap]: 0.00050095 [type_inference]: 0.0113287 [event_method]: 4.831e-05 [auto_monad]: 0.00011805 [graph_reusing]: 7.55e-06 [inline]: 1.62999e-06 [add_attr]: 0.00297408, [1] [add_attr_with_inline]: 0.00296561, [1] [Cycle 1]: 6.919e-05, [2] [tag_attr]: 3.425e-05 [meta_addattr_fg_expand]: 9.16002e-06 [parallel-infer-symbol]: 3.33998e-06 [pre_auto_parallel]: 4.835e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.87999e-06 [optimize]: 0.0133665, [53] [py_interpret_to_execute]: 3.84e-05 [rewriter_before_opt_a]: 0.00016065 [opt_a]: 0.0110117, [3] [Cycle 1]: 0.0070735, [45] [expand_dump_flag]: 3.60998e-06 [switch_simplify]: 7.379e-05 [loop_unroll]: 6.195e-05 [a_1]: 0.00144312 [with_stream_mark]: 2.328e-05 [recompute_prepare]: 2.177e-05 [updatestate_depend_eliminate]: 8.87e-06 [updatestate_assign_eliminate]: 7.82e-06 [updatestate_loads_eliminate]: 7.31001e-06 [parameter_eliminate]: 2.62001e-06 [a_2]: 0.00024217 [accelerated_algorithm]: 2.976e-05 [shard]: 1.81e-06 [meta_shard_fg_expand]: 3.16001e-06 [shard_inline]: 1.584e-05 [merge_send_recv]: 1.552e-05 [auto_parallel]: 1.11e-05 [parallel]: 1.716e-05 [flash_sp]: 1.143e-05 [merge_comm]: 9.96e-06 [allreduce_fusion]: 8.90999e-06 [matmul_add_comm_reduction]: 2.619e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.808e-05 [virtual_dataset]: 1.558e-05 [get_grad_eliminate_]: 1.484e-05 [virtual_output]: 1.573e-05 [merge_forward]: 9.85002e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 1.756e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.863e-05 [merge_recompute_call_nodes]: 1.68997e-06 [before_grad]: 2.812e-05 [set_forward_comm_id_for_comm_node_pass]: 9.42001e-06 [meta_fg_expand]: 0.0013962 [flash_sp_send_recv_attached]: 3.44001e-06 [receive_attached]: 2.37001e-06 
[after_resolve]: 0.00011222 [a_after_grad]: 8.432e-05 [renormalize]: 0.00241811 [add_forward_monad_depend]: 8.95001e-06 [auto_monad_grad]: 5.25999e-06 [auto_monad_eliminator]: 5.592e-05 [cse]: 0.00016761 [a_3]: 0.0003348 [Cycle 2]: 0.00302931, [45] [expand_dump_flag]: 1.51002e-06 [switch_simplify]: 4.705e-05 [loop_unroll]: 4.411e-05 [a_1]: 0.00155612 [with_stream_mark]: 1.167e-05 [recompute_prepare]: 1.099e-05 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 4.43001e-06 [updatestate_loads_eliminate]: 3.6e-06 [parameter_eliminate]: 1.02e-06 [a_2]: 0.00012596 [accelerated_algorithm]: 1.175e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 9.12001e-06 [merge_send_recv]: 7.01001e-06 [auto_parallel]: 7.4e-06 [parallel]: 4.75001e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 5.05999e-06 [allreduce_fusion]: 4.52e-06 [matmul_add_comm_reduction]: 7.53999e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 1.03e-05 [virtual_dataset]: 8.84e-06 [get_grad_eliminate_]: 9.81e-06 [virtual_output]: 8.75999e-06 [merge_forward]: 4.27e-06 [cell_reuse_recompute_pass]: 9.49978e-07 [offload_activation]: 9.09e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.639e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.461e-05 [set_forward_comm_id_for_comm_node_pass]: 5.34998e-06 [meta_fg_expand]: 6.883e-05 [flash_sp_send_recv_attached]: 9.89996e-07 [receive_attached]: 1.17e-06 [after_resolve]: 1.618e-05 [a_after_grad]: 1.449e-05 [renormalize]: 0.00059376 [add_forward_monad_depend]: 3.89997e-06 [auto_monad_grad]: 1.27999e-06 [auto_monad_eliminator]: 1.492e-05 [cse]: 4.648e-05 [a_3]: 6.47e-05 [Cycle 3]: 0.00089425, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.091e-05 [loop_unroll]: 8.77e-06 [a_1]: 0.00024832 [with_stream_mark]: 9.81e-06 [recompute_prepare]: 9.05999e-06 [updatestate_depend_eliminate]: 4.74e-06 [updatestate_assign_eliminate]: 3.7e-06 [updatestate_loads_eliminate]: 3.73999e-06 
[parameter_eliminate]: 9.70002e-07 [a_2]: 0.00012222 [accelerated_algorithm]: 1.137e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.72999e-06 [shard_inline]: 8.80001e-06 [merge_send_recv]: 6.68e-06 [auto_parallel]: 6.94001e-06 [parallel]: 4.32e-06 [flash_sp]: 1.06002e-06 [merge_comm]: 4.87998e-06 [allreduce_fusion]: 4.78001e-06 [matmul_add_comm_reduction]: 7.51999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 9.92999e-06 [virtual_dataset]: 8.89003e-06 [get_grad_eliminate_]: 8.38999e-06 [virtual_output]: 8.11002e-06 [merge_forward]: 4.17e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 8.39002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.597e-05 [merge_recompute_call_nodes]: 7.59988e-07 [before_grad]: 1.378e-05 [set_forward_comm_id_for_comm_node_pass]: 5.18002e-06 [meta_fg_expand]: 2.83e-06 [flash_sp_send_recv_attached]: 9.19972e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 1.508e-05 [a_after_grad]: 1.465e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.31002e-06 [auto_monad_grad]: 1.01997e-06 [auto_monad_eliminator]: 1.071e-05 [cse]: 2.666e-05 [a_3]: 5.854e-05 [py_interpret_to_execute_after_opt_a]: 1.022e-05 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 4.628e-05 [convert_after_rewriter]: 8.97999e-06 [order_py_execute_after_rewriter]: 6.96001e-06 [mutable_eliminate]: 0.00046319 [opt_b]: 0.00028713, [1] [Cycle 1]: 0.00028084, [7] [b_1]: 0.00018871 [b_2]: 1.04e-05 [updatestate_depend_eliminate]: 7.37002e-06 [updatestate_assign_eliminate]: 4.02002e-06 [updatestate_loads_eliminate]: 4.1e-06 [renormalize]: 3.69997e-07 [cse]: 3.172e-05 [optimize_parallel_all_gather_comm]: 8.756e-05 [overlap_param_gather]: 2.23998e-06 [cconv]: 1.916e-05 [loop_unroll]: 0.00043117 [opt_after_cconv]: 0.00013634, [1] [Cycle 1]: 0.00013045, [7] [c_1]: 4.844e-05 [parameter_eliminate]: 2.12001e-06 [updatestate_depend_eliminate]: 7.22002e-06 [updatestate_assign_eliminate]: 4.07e-06 
[updatestate_loads_eliminate]: 3.9e-06 [cse]: 3.08e-05 [renormalize]: 4.39992e-07 [remove_dup_value]: 2.915e-05 [tuple_transform]: 0.00010109, [1] [Cycle 1]: 9.625e-05, [4] [d_1]: 6.703e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 2.60014e-07 [switch_simplify]: 9.52001e-06 [partial_unused_args_eliminate]: 1.71e-06 [add_recomputation]: 5.709e-05 [cse_after_recomputation]: 3.276e-05, [1] [Cycle 1]: 2.794e-05, [1] [cse]: 2.24e-05 [environ_conv]: 8.67e-06 [swap_dp_allreduce_reducescatter]: 7.82998e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4.70999e-06 [label_fine_grained_interleaved_index]: 2.54001e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.69001e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 7.7e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.22001e-06 [reorder_send_recv_between_fp_bp]: 2.59001e-06 [comm_op_add_attrs]: 1.07e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.82001e-06 [control_data_broadcast_order]: 1.671e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.70999e-06 [overlap_recompute_and_grad_model_parallel]: 5.89999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.03997e-06 [overlap_grad_ring_attention]: 5.56e-06 [overlap_grad_flash_sp]: 2.366e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.67001e-06 [handle_group_info]: 1.17999e-06 [symbol_engine_optimizer]: 9.934e-05, [1] [Cycle 1]: 9.516e-05, [6] [build]: 1.016e-05 [elim_shapecalc]: 1.342e-05 [elim_not_effective]: 1.807e-05 [opt_reshape]: 1.007e-05 [fold_const_symbol]: 1.496e-05 [renormalize]: 2.3999e-07 
[detach_backward]: 2.04e-06 [pipeline_parallel_scheduler]: 1.37999e-06 [auto_monad_reorder]: 2.485e-05 [get_jit_bprop_graph]: 1.06002e-06 [rewriter_after_jit_bprop_graph]: 3.70003e-06 [opt_after_jit_grad]: 0.00046975 [validate]: 4.539e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.0782207 [execute]: 7.93001e-06 Sums bootstrap : 0.000501s : 0.49% type_inference : 0.011329s : 10.98% event_method : 0.000048s : 0.05% auto_monad : 0.000118s : 0.11% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000048s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.04% optimize.rewriter_before_opt_a : 0.000161s : 0.16% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000132s : 0.13% optimize.opt_a.loop_unroll : 0.000115s : 0.11% optimize.opt_a.a_1 : 0.003248s : 3.15% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000490s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000053s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000034s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% optimize.opt_a.merge_comm : 0.000020s : 
0.02% optimize.opt_a.allreduce_fusion : 0.000018s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000033s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001468s : 1.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000143s : 0.14% optimize.opt_a.a_after_grad : 0.000113s : 0.11% optimize.opt_a.renormalize : 0.003012s : 2.92% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.08% optimize.opt_a.cse : 0.000241s : 0.23% optimize.opt_a.a_3 : 0.000458s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.04% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000463s : 0.45% optimize.opt_b.b_1 : 0.000189s : 0.18% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000088s : 0.08% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000431s : 0.42% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.06% optimize.cse_after_recomputation.cse : 0.000022s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000006s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000470s : 0.46% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.078221s : 75.82% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000761 222 5.94% : 0.000045s : 12: substitution.arithmetic_simplify 1.85% : 0.000014s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.48% : 0.000004s : 5: substitution.float_depend_g_call 0.52% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000007s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.01% : 0.000426s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.35% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.87% : 0.000014s : 3: substitution.less_batch_normalization 1.71% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000006s : 5: substitution.partial_eliminate 1.75% : 0.000013s : 20: substitution.remove_not_recompute_node 3.11% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000011s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.59% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.56% : 0.000065s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011255 2 86.35% : 0.009719s : 1: type_inference.infer 13.65% : 0.001536s : 1: type_inference.specialize ------[replace.] 0.000218 33 58.11% : 0.000127s : 17: replace.inline 41.89% : 0.000091s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.62% : 0.000417s : 17: match.inline 7.38% : 0.000033s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000751 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 68: predicate.addn_zero_filter 1.05% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 1.98% : 0.000015s : 100: predicate.arithmetic_simplify 1.15% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.37% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.76% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.32% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.60% : 0.000042s : 249: predicate.inline 1.22% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.62% : 0.000005s : 32: predicate.less_batch_normalization 
1.66% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.68% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.43% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.13% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.13% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000015s : 101: predicate.partial_defer_inline 1.77% : 0.000013s : 92: predicate.partial_eliminate 1.10% : 0.000008s : 68: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 68: predicate.reduce_eliminate 2.69% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.20% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.88% : 0.000014s : 101: predicate.switch_defer_inline 2.96% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.00% : 
0.000037s : 277: predicate.switch_simplify 1.09% : 0.000008s : 68: predicate.tile_eliminate 1.08% : 0.000008s : 68: predicate.transpose_eliminate 1.44% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.43% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.68% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.33% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.58% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001592 34 56.76% : 0.000904s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.24% : 0.000688s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.132028 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.26% : 0.002979s : 1: add_attr 2.25% : 0.002970s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000062s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000125s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000535s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000056s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000011s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.33% : 0.000440s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000472s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.76% : 0.004963s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000174s : 28: opt.transform.opt_b 0.06% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.34% : 0.011015s : 1: opt_a 0.11% : 0.000140s : 1: opt_after_cconv 0.36% : 0.000479s : 1: opt_after_jit_grad 0.22% : 0.000291s : 1: opt_b 10.13% : 0.013370s : 1: optimize 0.07% : 0.000092s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000009s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000053s : 1: pre_auto_parallel 0.03% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.21% : 0.001592s : 2: renormalize.infer 1.07% : 0.001408s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.13% : 0.000166s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000102s : 1: symbol_engine_optimizer 59.26% : 0.078237s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 8.59% : 0.011343s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.0746418, [24] [bootstrap]: 0.00045797 [type_inference]: 0.00425416 [event_method]: 1.109e-05 [auto_monad]: 5.103e-05 [graph_reusing]: 5.29998e-06 [inline]: 1.97001e-06 [add_attr]: 0.00295704, [1] [add_attr_with_inline]: 0.00294891, [1] [Cycle 1]: 4.405e-05, [2] [tag_attr]: 1.138e-05 [meta_addattr_fg_expand]: 2.96999e-06 [parallel-infer-symbol]: 2.69999e-06 [pre_auto_parallel]: 2.103e-05 [insert-virtual-dataset]: 2.24999e-06 [parallel-infer-symbol-second]: 1.00999e-06 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.00367538, [53] [py_interpret_to_execute]: 1.608e-05 [rewriter_before_opt_a]: 3.775e-05 [opt_a]: 0.00184699, [2] [Cycle 1]: 0.00124979, [45] [expand_dump_flag]: 2.96001e-06 [switch_simplify]: 2.377e-05 [loop_unroll]: 1.389e-05 [a_1]: 0.0002895 [with_stream_mark]: 1.317e-05 [recompute_prepare]: 7.53999e-06 [updatestate_depend_eliminate]: 3.74002e-06 [updatestate_assign_eliminate]: 3.03998e-06 [updatestate_loads_eliminate]: 3.04001e-06 [parameter_eliminate]: 1.75001e-06 [a_2]: 7.579e-05 [accelerated_algorithm]: 6.49999e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.57001e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 7.85998e-06 [auto_parallel]: 5.74e-06 [parallel]: 1.718e-05 [flash_sp]: 6.99001e-06 [merge_comm]: 3.66001e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 6.29982e-07 [virtual_shard_identity]: 7.43999e-06 [virtual_dataset]: 5.86e-06 [get_grad_eliminate_]: 5.64e-06 [virtual_output]: 5.67001e-06 [merge_forward]: 3.50003e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 8.94e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.097e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 9.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.36998e-06 [flash_sp_send_recv_attached]: 2.90002e-06 
[receive_attached]: 2.15002e-06 [after_resolve]: 1.065e-05 [a_after_grad]: 8.97e-06 [renormalize]: 0.00034143 [add_forward_monad_depend]: 4.35e-06 [auto_monad_grad]: 1.97001e-06 [auto_monad_eliminator]: 1.326e-05 [cse]: 2.688e-05 [a_3]: 4.056e-05 [Cycle 2]: 0.0005882, [45] [expand_dump_flag]: 9.10019e-07 [switch_simplify]: 6.71e-06 [loop_unroll]: 5.27999e-06 [a_1]: 0.00012335 [with_stream_mark]: 9.04998e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.82002e-06 [updatestate_assign_eliminate]: 2.22999e-06 [updatestate_loads_eliminate]: 2.44999e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 7.055e-05 [accelerated_algorithm]: 5.36998e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.34e-06 [merge_send_recv]: 4.25999e-06 [auto_parallel]: 5.15999e-06 [parallel]: 4.14002e-06 [flash_sp]: 3.15998e-06 [merge_comm]: 2.98e-06 [allreduce_fusion]: 2.99001e-06 [matmul_add_comm_reduction]: 4.79e-06 [allreduce_slice_to_reducescatter]: 3.4002e-07 [virtual_shard_identity]: 5.71e-06 [virtual_dataset]: 5.05999e-06 [get_grad_eliminate_]: 4.95999e-06 [virtual_output]: 4.86002e-06 [merge_forward]: 2.31e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 5.81e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41998e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 2.93e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.12999e-06 [after_resolve]: 9.63002e-06 [a_after_grad]: 8.07e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.21998e-06 [cse]: 1.19e-05 [a_3]: 3.141e-05 [py_interpret_to_execute_after_opt_a]: 7.35e-06 [slice_cell_reuse_recomputed_activation]: 1.92999e-06 [rewriter_after_opt_a]: 3.034e-05 [convert_after_rewriter]: 7.25998e-06 [order_py_execute_after_rewriter]: 4.90999e-06 [mutable_eliminate]: 0.00044498 [opt_b]: 0.00017914, [1] [Cycle 
1]: 0.00017354, [7] [b_1]: 0.00010651 [b_2]: 7.06001e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.24999e-06 [renormalize]: 3.7998e-07 [cse]: 1.631e-05 [optimize_parallel_all_gather_comm]: 1.483e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.245e-05 [loop_unroll]: 0.00040752 [opt_after_cconv]: 9.38e-05, [1] [Cycle 1]: 8.821e-05, [7] [c_1]: 2.741e-05 [parameter_eliminate]: 2.25002e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.08002e-06 [cse]: 1.632e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.185e-05 [tuple_transform]: 6.748e-05, [1] [Cycle 1]: 6.331e-05, [4] [d_1]: 3.824e-05 [none_parameter_eliminate]: 1.62001e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.58002e-06 [add_recomputation]: 8.375e-05 [cse_after_recomputation]: 2.224e-05, [1] [Cycle 1]: 1.758e-05, [1] [cse]: 1.219e-05 [environ_conv]: 4.58999e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.63e-06 [label_micro_interleaved_index]: 4.05e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.34001e-06 [micro_interleaved_order_control]: 2.23998e-06 [assign_add_opt]: 1.16002e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.68e-06 [comm_op_add_attrs]: 1.26002e-06 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.20999e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.04e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.75998e-06 [overlap_recompute_and_grad_model_parallel]: 4.28001e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.40002e-06 [overlap_grad_ring_attention]: 4.03999e-06 [overlap_grad_flash_sp]: 1.647e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.98002e-06 [split_layernorm_comm]: 1.96998e-06 [handle_group_info]: 9.20001e-07 [symbol_engine_optimizer]: 6.837e-05, [1] [Cycle 1]: 6.418e-05, [6] [build]: 2.11998e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.186e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.57e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.87999e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.01997e-06 [rewriter_after_jit_bprop_graph]: 3.14999e-06 [opt_after_jit_grad]: 0.00044174 [validate]: 3.129e-05 [backend_pass]: 8.29983e-07 [task_emit]: 0.0624949 [execute]: 8.38001e-06 Sums bootstrap : 0.000458s : 0.65% type_inference : 0.004254s : 6.02% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000030s : 0.04% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000413s : 0.58% optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate 
: 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000146s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000342s : 0.48% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.63% optimize.opt_b.b_1 : 0.000107s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000408s : 0.58% optimize.opt_after_cconv.c_1 : 0.000027s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000084s : 0.12% optimize.cse_after_recomputation.cse : 0.000012s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000442s : 0.62% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.062495s : 88.36% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000119 26 17.48% : 0.000021s : 4: substitution.arithmetic_simplify 1.77% : 0.000002s : 2: substitution.elim_not_effective 1.08% : 0.000001s : 2: substitution.fold_const_symbol 4.33% : 0.000005s : 4: substitution.graph_param_transform 65.92% : 0.000078s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.60% : 0.000004s : 4: substitution.remove_not_recompute_node 3.55% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004215 2 91.67% : 0.003864s : 1: type_inference.infer 8.33% : 0.000351s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.02% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.43% : 0.000003s : 17: predicate.arithmetic_simplify 0.96% : 0.000001s : 9: predicate.cast_eliminate 0.96% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.76% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.80% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.19% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.06% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.81% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.07% : 0.000001s : 8: predicate.less_batch_normalization 1.68% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.85% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.80% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.71% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.63% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.75% : 0.000001s : 9: predicate.print_const_string_wrapper 0.68% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.28% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.97% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.95% : 0.000001s : 8: predicate.special_op_eliminate 1.01% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.06% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.08% : 0.000001s : 11: predicate.switch_defer_inline 1.83% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.47% : 0.000006s : 41: predicate.switch_simplify 0.80% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.28% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 41.71% : 0.000100s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.29% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.082532 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.59% : 0.002961s : 1: add_attr 3.58% : 0.002953s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.11% : 0.000088s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.60% : 0.000493s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000025s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000015s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.50% : 0.000416s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.55% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 0.92% : 0.000761s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.24% : 0.001850s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.55% : 0.000452s : 1: opt_after_jit_grad 0.22% : 0.000183s : 1: opt_b 4.46% : 0.003679s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.22% : 0.000185s : 1: renormalize.infer 0.18% : 0.000150s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 75.74% : 0.062511s : 1: task_emit 0.09% : 0.000070s : 1: 
tuple_transform 5.17% : 0.004268s : 1: type_inference 0.06% : 0.000054s : 1: validate TotalTime = 0.105366, [24] [bootstrap]: 0.00050409 [type_inference]: 0.0102412 [event_method]: 4.411e-05 [auto_monad]: 0.0001136 [graph_reusing]: 8.53001e-06 [inline]: 1.97999e-06 [add_attr]: 0.00299255, [1] [add_attr_with_inline]: 0.00298462, [1] [Cycle 1]: 6.39e-05, [2] [tag_attr]: 3.101e-05 [meta_addattr_fg_expand]: 8.32e-06 [parallel-infer-symbol]: 3.2e-06 [pre_auto_parallel]: 4.585e-05 [insert-virtual-dataset]: 2.37001e-06 [parallel-infer-symbol-second]: 7.79983e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.86e-06 [optimize]: 0.0130607, [53] [py_interpret_to_execute]: 3.663e-05 [rewriter_before_opt_a]: 0.00012723 [opt_a]: 0.0107919, [3] [Cycle 1]: 0.00689775, [45] [expand_dump_flag]: 3.6e-06 [switch_simplify]: 6.698e-05 [loop_unroll]: 5.512e-05 [a_1]: 0.00132759 [with_stream_mark]: 2.357e-05 [recompute_prepare]: 2.134e-05 [updatestate_depend_eliminate]: 9.00001e-06 [updatestate_assign_eliminate]: 8.49998e-06 [updatestate_loads_eliminate]: 7.27997e-06 [parameter_eliminate]: 2.50002e-06 [a_2]: 0.00024316 [accelerated_algorithm]: 3.143e-05 [shard]: 1.77001e-06 [meta_shard_fg_expand]: 3.19001e-06 [shard_inline]: 1.612e-05 [merge_send_recv]: 1.634e-05 [auto_parallel]: 1.108e-05 [parallel]: 1.902e-05 [flash_sp]: 1.082e-05 [merge_comm]: 9.74e-06 [allreduce_fusion]: 9.15999e-06 [matmul_add_comm_reduction]: 2.574e-05 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 1.789e-05 [virtual_dataset]: 1.581e-05 [get_grad_eliminate_]: 1.516e-05 [virtual_output]: 1.523e-05 [merge_forward]: 9.67999e-06 [cell_reuse_recompute_pass]: 1.16002e-06 [offload_activation]: 1.787e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.825e-05 [merge_recompute_call_nodes]: 1.39e-06 [before_grad]: 2.741e-05 [set_forward_comm_id_for_comm_node_pass]: 9.54e-06 [meta_fg_expand]: 0.00142374 [flash_sp_send_recv_attached]: 3.73999e-06 [receive_attached]: 2.39999e-06 [after_resolve]: 
5.972e-05 [a_after_grad]: 8.007e-05 [renormalize]: 0.00239789 [add_forward_monad_depend]: 9.12999e-06 [auto_monad_grad]: 5.57999e-06 [auto_monad_eliminator]: 5.582e-05 [cse]: 0.00016646 [a_3]: 0.00033509 [Cycle 2]: 0.00297745, [45] [expand_dump_flag]: 1.43002e-06 [switch_simplify]: 4.715e-05 [loop_unroll]: 4.401e-05 [a_1]: 0.00156074 [with_stream_mark]: 1.167e-05 [recompute_prepare]: 1.092e-05 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 3.58999e-06 [parameter_eliminate]: 9.89996e-07 [a_2]: 0.0001263 [accelerated_algorithm]: 1.197e-05 [shard]: 9.00007e-07 [meta_shard_fg_expand]: 1.79e-06 [shard_inline]: 9.39e-06 [merge_send_recv]: 6.79001e-06 [auto_parallel]: 7.51999e-06 [parallel]: 4.60001e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 5.20001e-06 [allreduce_fusion]: 4.62998e-06 [matmul_add_comm_reduction]: 7.85e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 1.004e-05 [virtual_dataset]: 8.55999e-06 [get_grad_eliminate_]: 8.73001e-06 [virtual_output]: 8.29998e-06 [merge_forward]: 4.23001e-06 [cell_reuse_recompute_pass]: 1.18001e-06 [offload_activation]: 9.57001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.628e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.421e-05 [set_forward_comm_id_for_comm_node_pass]: 5.12999e-06 [meta_fg_expand]: 3.37e-05 [flash_sp_send_recv_attached]: 8.30012e-07 [receive_attached]: 1.15001e-06 [after_resolve]: 1.47e-05 [a_after_grad]: 1.429e-05 [renormalize]: 0.00057651 [add_forward_monad_depend]: 3.95998e-06 [auto_monad_grad]: 1.30999e-06 [auto_monad_eliminator]: 1.445e-05 [cse]: 4.554e-05 [a_3]: 6.539e-05 [Cycle 3]: 0.00090269, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.066e-05 [loop_unroll]: 8.99998e-06 [a_1]: 0.0002504 [with_stream_mark]: 9.92999e-06 [recompute_prepare]: 9.71e-06 [updatestate_depend_eliminate]: 4.90001e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 3.95e-06 
[parameter_eliminate]: 9.5999e-07 [a_2]: 0.00012372 [accelerated_algorithm]: 1.166e-05 [shard]: 9.30013e-07 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.54e-06 [merge_send_recv]: 6.96001e-06 [auto_parallel]: 6.98e-06 [parallel]: 4.38999e-06 [flash_sp]: 1.12e-06 [merge_comm]: 4.87998e-06 [allreduce_fusion]: 4.90999e-06 [matmul_add_comm_reduction]: 7.69002e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 9.89001e-06 [virtual_dataset]: 8.47998e-06 [get_grad_eliminate_]: 8.54e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.15999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 8.45999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.632e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.413e-05 [set_forward_comm_id_for_comm_node_pass]: 5.52999e-06 [meta_fg_expand]: 2.88e-06 [flash_sp_send_recv_attached]: 8.60018e-07 [receive_attached]: 9.5999e-07 [after_resolve]: 1.508e-05 [a_after_grad]: 1.389e-05 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.33002e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 1.064e-05 [cse]: 2.664e-05 [a_3]: 5.888e-05 [py_interpret_to_execute_after_opt_a]: 1.004e-05 [slice_cell_reuse_recomputed_activation]: 2.06e-06 [rewriter_after_opt_a]: 4.664e-05 [convert_after_rewriter]: 9.83002e-06 [order_py_execute_after_rewriter]: 6.64001e-06 [mutable_eliminate]: 0.00045966 [opt_b]: 0.0003183, [1] [Cycle 1]: 0.00031197, [7] [b_1]: 0.00021756 [b_2]: 1.109e-05 [updatestate_depend_eliminate]: 6.99001e-06 [updatestate_assign_eliminate]: 4.27e-06 [updatestate_loads_eliminate]: 3.98999e-06 [renormalize]: 3.59985e-07 [cse]: 3.207e-05 [optimize_parallel_all_gather_comm]: 2.054e-05 [overlap_param_gather]: 1.92001e-06 [cconv]: 1.979e-05 [loop_unroll]: 0.00042558 [opt_after_cconv]: 0.00013567, [1] [Cycle 1]: 0.00012997, [7] [c_1]: 4.769e-05 [parameter_eliminate]: 2.19999e-06 [updatestate_depend_eliminate]: 6.96001e-06 [updatestate_assign_eliminate]: 4.05e-06 
[updatestate_loads_eliminate]: 3.79002e-06 [cse]: 3.076e-05 [renormalize]: 4.80009e-07 [remove_dup_value]: 2.887e-05 [tuple_transform]: 0.00010134, [1] [Cycle 1]: 9.677e-05, [4] [d_1]: 6.675e-05 [none_parameter_eliminate]: 1.66002e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 1.011e-05 [partial_unused_args_eliminate]: 2.25002e-06 [add_recomputation]: 5.687e-05 [cse_after_recomputation]: 3.297e-05, [1] [Cycle 1]: 2.819e-05, [1] [cse]: 2.257e-05 [environ_conv]: 9.80002e-06 [swap_dp_allreduce_reducescatter]: 7.95998e-06 [bias_add_comm_swap]: 2.27999e-06 [label_micro_interleaved_index]: 3.85e-06 [label_fine_grained_interleaved_index]: 2.46e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.02001e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 8.90024e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.26002e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.692e-05 [grouped_pairwise_exchange_alltoall]: 1.44e-06 [offloading_packed_experts]: 4.82e-06 [overlap_recompute_and_grad_model_parallel]: 5.42001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.38002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.95001e-06 [overlap_grad_flash_sp]: 2.362e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.86e-06 [split_layernorm_comm]: 1.72999e-06 [handle_group_info]: 1.20999e-06 [symbol_engine_optimizer]: 9.788e-05, [1] [Cycle 1]: 9.376e-05, [6] [build]: 1.01e-05 [elim_shapecalc]: 1.326e-05 [elim_not_effective]: 1.847e-05 [opt_reshape]: 9.94999e-06 [fold_const_symbol]: 1.466e-05 
[renormalize]: 2.09984e-07 [detach_backward]: 1.69998e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 2.416e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00047037 [validate]: 4.52e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.0775782 [execute]: 9.07001e-06 Sums bootstrap : 0.000504s : 0.50% type_inference : 0.010241s : 10.13% event_method : 0.000044s : 0.04% auto_monad : 0.000114s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000037s : 0.04% optimize.rewriter_before_opt_a : 0.000127s : 0.13% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000125s : 0.12% optimize.opt_a.loop_unroll : 0.000108s : 0.11% optimize.opt_a.a_1 : 0.003139s : 3.10% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000042s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000493s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000015s : 0.01% 
optimize.opt_a.merge_comm : 0.000020s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000033s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.04% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001460s : 1.44% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.09% optimize.opt_a.a_after_grad : 0.000108s : 0.11% optimize.opt_a.renormalize : 0.002974s : 2.94% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000008s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.08% optimize.opt_a.cse : 0.000239s : 0.24% optimize.opt_a.a_3 : 0.000459s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.05% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000460s : 0.45% optimize.opt_b.b_1 : 0.000218s : 0.22% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000426s : 0.42% optimize.opt_after_cconv.c_1 : 0.000048s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000031s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000067s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000024s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000470s : 0.47% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.077578s : 76.72% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000729 218 5.82% : 0.000042s : 11: substitution.arithmetic_simplify 1.85% : 0.000013s : 2: substitution.cast_eliminate 0.39% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.57% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.36% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 54.80% : 0.000400s : 16: substitution.inline 2.09% : 0.000015s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.09% : 0.000015s : 3: substitution.less_batch_normalization 1.83% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000005s : 5: substitution.partial_eliminate 1.87% : 0.000014s : 20: substitution.remove_not_recompute_node 3.31% : 0.000024s : 10: substitution.replace_applicator 1.51% : 0.000011s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.75% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.84% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.44% : 0.000062s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010174 2 86.75% : 0.008826s : 1: type_inference.infer 13.25% : 0.001348s : 1: type_inference.specialize ------[replace.] 0.000221 30 53.59% : 0.000119s : 16: replace.inline 46.41% : 0.000103s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000421 30 92.84% : 0.000391s : 16: match.inline 7.16% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000767 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 67: predicate.addn_zero_filter 1.02% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.06% : 0.000016s : 99: predicate.arithmetic_simplify 1.10% : 0.000008s : 67: predicate.cast_eliminate 1.11% : 0.000008s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.16% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.17% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.08% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.15% : 0.000009s : 75: predicate.environ_get_depend_swap 1.69% : 0.000013s : 107: predicate.environ_get_eliminate 1.16% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.62% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.15% : 0.000017s : 97: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.53% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.44% : 0.000042s : 244: predicate.inline 1.23% : 0.000009s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.69% : 0.000005s : 32: predicate.less_batch_normalization 1.55% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.55% : 0.000020s : 164: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.09% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.35% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.54% : 0.000004s : 32: predicate.merge_addn 1.09% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.10% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.08% : 0.000008s : 67: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.91% : 0.000015s : 97: predicate.partial_defer_inline 1.65% : 0.000013s : 89: predicate.partial_eliminate 1.01% : 0.000008s : 67: predicate.print_const_string_wrapper 0.51% : 0.000004s : 32: predicate.reduce_all_const_elim 1.27% : 0.000010s : 67: predicate.reduce_eliminate 2.57% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.32% : 0.000002s : 32: predicate.remove_not_recompute_node 1.81% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.05% : 0.000008s : 67: predicate.reshape_eliminate 1.15% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.58% : 0.000004s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.21% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.11% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.76% : 0.000014s : 97: predicate.switch_defer_inline 2.81% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.69% : 0.000036s : 265: 
predicate.switch_simplify 1.03% : 0.000008s : 67: predicate.tile_eliminate 1.04% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.27% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.95% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 5.12% : 0.000039s : 97: predicate.tuple_to_list_eliminator_ 2.53% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.13% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001530 32 56.50% : 0.000864s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.50% : 0.000665s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.129536 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.31% : 0.002997s : 1: add_attr 2.31% : 0.002988s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000121s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.42% : 0.000538s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.04% : 0.000051s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 3.70% : 0.004788s : 117: opt.transform.opt_a 0.04% : 0.000047s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.16% : 0.000203s : 28: opt.transform.opt_b 0.06% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000053s : 4: opt.transform.symbol_engine_opt 8.33% : 0.010795s : 1: opt_a 0.11% : 0.000139s : 1: opt_after_cconv 0.37% : 0.000480s : 1: opt_after_jit_grad 0.25% : 0.000322s : 1: opt_b 10.09% : 0.013064s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000041s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.20% : 0.001560s : 2: renormalize.infer 1.08% : 0.001402s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000101s : 1: symbol_engine_optimizer 59.90% : 0.077596s : 1: task_emit 0.08% : 0.000104s : 1: 
tuple_transform 7.92% : 0.010256s : 1: type_inference 0.05% : 0.000069s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x1-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x1-ge],max_mem:64.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x2-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x2-pynative],max_mem:64.0M TotalTime = 0.0215716, [24] [bootstrap]: 0.00053087 [type_inference]: 0.00612811 [event_method]: 1.433e-05 [auto_monad]: 5.742e-05 [graph_reusing]: 5.74e-06 [inline]: 1.64998e-06 [add_attr]: 0.00339256, [1] [add_attr_with_inline]: 0.0033816, [1] [Cycle 1]: 4.389e-05, [2] [tag_attr]: 1.492e-05 [meta_addattr_fg_expand]: 3.93001e-06 [parallel-infer-symbol]: 2.65002e-06 [pre_auto_parallel]: 2.761e-05 [insert-virtual-dataset]: 2.94001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 2.17999e-06 [pipeline_split]: 1.88002e-06 [optimize]: 0.00397931, [53] [py_interpret_to_execute]: 1.971e-05 [rewriter_before_opt_a]: 5.795e-05 [opt_a]: 0.00215118, [2] [Cycle 1]: 0.00155077, [45] [expand_dump_flag]: 2.79999e-06 [switch_simplify]: 3.105e-05 [loop_unroll]: 2.12e-05 [a_1]: 0.00045046 [with_stream_mark]: 1.305e-05 [recompute_prepare]: 7.86001e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.52002e-06 [updatestate_loads_eliminate]: 2.73e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.637e-05 [accelerated_algorithm]: 5.99e-06 [shard]: 2.54001e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.93998e-06 [merge_send_recv]: 7.63999e-06 [auto_parallel]: 5.74e-06 [parallel]: 2.35e-05 [flash_sp]: 7.43e-06 [merge_comm]: 3.62998e-06 [allreduce_fusion]: 3.31001e-06 [matmul_add_comm_reduction]: 8.47e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.87002e-06 [virtual_dataset]: 5.71e-06 
[get_grad_eliminate_]: 5.32999e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.05001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.086e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.16998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.19999e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 8.65001e-06 [renormalize]: 0.00045906 [add_forward_monad_depend]: 5.42001e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.326e-05 [cse]: 2.738e-05 [a_3]: 4.025e-05 [Cycle 2]: 0.00059099, [45] [expand_dump_flag]: 7.7e-07 [switch_simplify]: 6.68e-06 [loop_unroll]: 5.55001e-06 [a_1]: 0.00012601 [with_stream_mark]: 9.28002e-06 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 6.712e-05 [accelerated_algorithm]: 5.46998e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.21002e-06 [parallel]: 4.17e-06 [flash_sp]: 2.93998e-06 [merge_comm]: 2.95002e-06 [allreduce_fusion]: 2.61e-06 [matmul_add_comm_reduction]: 4.74e-06 [allreduce_slice_to_reducescatter]: 3.9002e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.16002e-06 [get_grad_eliminate_]: 5.00001e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.78e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.63002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41998e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.75e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96001e-06 [meta_fg_expand]: 1.66998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 9.16002e-06 [a_after_grad]: 7.90998e-06 [renormalize]: 8.9989e-08 
[add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.699e-05 [a_3]: 3.288e-05 [py_interpret_to_execute_after_opt_a]: 7.73999e-06 [slice_cell_reuse_recomputed_activation]: 1.68002e-06 [rewriter_after_opt_a]: 2.837e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.17e-06 [mutable_eliminate]: 0.00044559 [opt_b]: 0.0001822, [1] [Cycle 1]: 0.00017637, [7] [b_1]: 0.00010797 [b_2]: 7.17002e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.50002e-06 [renormalize]: 3.50003e-07 [cse]: 1.624e-05 [optimize_parallel_all_gather_comm]: 1.565e-05 [overlap_param_gather]: 2.22999e-06 [cconv]: 2.219e-05 [loop_unroll]: 0.00041128 [opt_after_cconv]: 9.537e-05, [1] [Cycle 1]: 8.954e-05, [7] [c_1]: 2.78e-05 [parameter_eliminate]: 2.11e-06 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 2.53003e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.627e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.254e-05 [tuple_transform]: 6.829e-05, [1] [Cycle 1]: 6.404e-05, [4] [d_1]: 3.886e-05 [none_parameter_eliminate]: 1.32e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.12999e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.946e-05 [cse_after_recomputation]: 1.984e-05, [1] [Cycle 1]: 1.546e-05, [1] [cse]: 1.045e-05 [environ_conv]: 5.12e-06 [swap_dp_allreduce_reducescatter]: 4.95001e-06 [bias_add_comm_swap]: 2.65002e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.94999e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.56998e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.09e-06 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 
9.29984e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59998e-06 [control_data_broadcast_order]: 1.126e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 3.36999e-06 [overlap_recompute_and_grad_model_parallel]: 4.16001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34998e-06 [overlap_recompute_comm]: 2.13998e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.678e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 1.80001e-06 [handle_group_info]: 1.34998e-06 [symbol_engine_optimizer]: 6.92e-05, [1] [Cycle 1]: 6.51e-05, [6] [build]: 2.26e-06 [elim_shapecalc]: 8.65001e-06 [elim_not_effective]: 1.137e-05 [opt_reshape]: 6.44999e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.52001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 0.00012997 [opt_after_jit_grad]: 0.00045177 [validate]: 3.187e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00657755 [execute]: 6.48e-06 Sums bootstrap : 0.000531s : 3.08% type_inference : 0.006128s : 35.60% event_method : 0.000014s : 0.08% auto_monad : 0.000057s : 0.33% graph_reusing : 0.000006s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.11% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.22% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000576s : 3.35% optimize.opt_a.with_stream_mark : 0.000022s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000011s : 0.07% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000459s : 2.67% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000044s : 0.26% optimize.opt_a.a_3 : 0.000073s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 2.59% optimize.opt_b.b_1 : 0.000108s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000411s : 2.39% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000130s : 0.76% opt_after_jit_grad : 0.000452s : 2.62% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006578s : 38.22% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.96% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.42% : 0.000108s : 3: substitution.inline 1.75% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.63% : 0.000004s : 4: substitution.remove_not_recompute_node 2.57% : 0.000004s : 4: substitution.replace_old_param 6.57% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006085 2 90.37% : 0.005499s : 1: type_inference.infer 9.63% : 0.000586s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.84% : 0.000027s : 3: replace.inline 30.16% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.64% : 0.000106s : 3: match.inline 8.36% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.84% : 0.000001s : 11: predicate.accumulaten_eliminater 0.97% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.76% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.22% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.24% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.33% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.87% : 0.000003s : 23: predicate.environ_get_eliminate 1.04% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.54% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.93% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.82% : 0.000001s : 8: predicate.less_batch_normalization 1.80% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.34% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.61% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.09% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.41% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 21: predicate.replace_applicator 0.67% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.68% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.75% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.87% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.18% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.51% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.69% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.26% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.35% : 0.000001s : 4: predicate.value_based_eliminate 0.72% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000366 8 46.01% : 0.000169s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.99% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030495 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.14% : 0.003397s : 1: add_attr 11.10% : 0.003385s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000054s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000063s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.87% : 0.000570s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.07% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.38% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.49% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.08% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.06% : 0.002154s : 1: opt_a 0.32% : 0.000099s : 1: opt_after_cconv 1.51% : 0.000462s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.06% : 0.003983s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.83% : 0.000254s : 1: renormalize.infer 0.65% : 0.000198s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.45% : 0.000136s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000032s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 21.60% : 0.006588s : 1: task_emit 0.23% : 0.000071s : 1: 
tuple_transform 20.14% : 0.006141s : 1: type_inference 0.20% : 0.000062s : 1: validate TotalTime = 0.0182243, [24] [bootstrap]: 0.0004794 [type_inference]: 0.00437768 [event_method]: 1.064e-05 [auto_monad]: 5.015e-05 [graph_reusing]: 4.82998e-06 [inline]: 1.52001e-06 [add_attr]: 0.00302077, [1] [add_attr_with_inline]: 0.00301264, [1] [Cycle 1]: 4.506e-05, [2] [tag_attr]: 1.146e-05 [meta_addattr_fg_expand]: 2.96999e-06 [parallel-infer-symbol]: 3.05998e-06 [pre_auto_parallel]: 2.118e-05 [insert-virtual-dataset]: 2.74999e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00366695, [53] [py_interpret_to_execute]: 1.5e-05 [rewriter_before_opt_a]: 3.846e-05 [opt_a]: 0.00186552, [2] [Cycle 1]: 0.00123613, [45] [expand_dump_flag]: 2.64001e-06 [switch_simplify]: 2.422e-05 [loop_unroll]: 1.355e-05 [a_1]: 0.00028988 [with_stream_mark]: 1.287e-05 [recompute_prepare]: 7.34002e-06 [updatestate_depend_eliminate]: 3.56001e-06 [updatestate_assign_eliminate]: 3.26999e-06 [updatestate_loads_eliminate]: 2.61999e-06 [parameter_eliminate]: 2.11e-06 [a_2]: 7.619e-05 [accelerated_algorithm]: 6.21e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 8.03001e-06 [auto_parallel]: 5.50001e-06 [parallel]: 1.658e-05 [flash_sp]: 7.01001e-06 [merge_comm]: 3.51001e-06 [allreduce_fusion]: 3.20998e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.51999e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.24998e-06 [merge_forward]: 3.56001e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.17001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.061e-05 [merge_recompute_call_nodes]: 1.57999e-06 [before_grad]: 9.00001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.12999e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 
2.35002e-06 [after_resolve]: 1.057e-05 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00033442 [add_forward_monad_depend]: 4.17e-06 [auto_monad_grad]: 1.57001e-06 [auto_monad_eliminator]: 1.313e-05 [cse]: 2.604e-05 [a_3]: 3.945e-05 [Cycle 2]: 0.00061998, [45] [expand_dump_flag]: 8.80013e-07 [switch_simplify]: 6.51e-06 [loop_unroll]: 5.42001e-06 [a_1]: 0.0001251 [with_stream_mark]: 9.09998e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.44001e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 9.493e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.50001e-06 [merge_send_recv]: 4.40999e-06 [auto_parallel]: 5.27001e-06 [parallel]: 4.38999e-06 [flash_sp]: 2.94999e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.32001e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 4.91002e-06 [merge_forward]: 2.60002e-06 [cell_reuse_recompute_pass]: 1.47999e-06 [offload_activation]: 5.90002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.67001e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99001e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.56e-06 [a_after_grad]: 7.97998e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 7.39994e-07 [auto_monad_eliminator]: 6.17999e-06 [cse]: 1.236e-05 [a_3]: 3.303e-05 [py_interpret_to_execute_after_opt_a]: 7.58001e-06 [slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.076e-05 [convert_after_rewriter]: 6.88998e-06 [order_py_execute_after_rewriter]: 4.87998e-06 [mutable_eliminate]: 0.00044964 [opt_b]: 0.00018048, 
[1] [Cycle 1]: 0.0001746, [7] [b_1]: 0.00010691 [b_2]: 7.04001e-06 [updatestate_depend_eliminate]: 5.30999e-06 [updatestate_assign_eliminate]: 2.54999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 5.29981e-07 [cse]: 1.556e-05 [optimize_parallel_all_gather_comm]: 1.554e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.199e-05 [loop_unroll]: 0.00041264 [opt_after_cconv]: 9.541e-05, [1] [Cycle 1]: 8.951e-05, [7] [c_1]: 2.8e-05 [parameter_eliminate]: 2.15002e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.74999e-06 [updatestate_loads_eliminate]: 2.37001e-06 [cse]: 1.575e-05 [renormalize]: 2.09984e-07 [remove_dup_value]: 1.146e-05 [tuple_transform]: 6.903e-05, [1] [Cycle 1]: 6.482e-05, [4] [d_1]: 3.888e-05 [none_parameter_eliminate]: 1.71e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.39999e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.329e-05 [cse_after_recomputation]: 1.898e-05, [1] [Cycle 1]: 1.475e-05, [1] [cse]: 9.69999e-06 [environ_conv]: 4.80999e-06 [swap_dp_allreduce_reducescatter]: 5.72999e-06 [bias_add_comm_swap]: 2.48002e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.63998e-06 [merge_cast_opt]: 1.46002e-06 [slice_recompute_activation]: 2.39999e-06 [micro_interleaved_order_control]: 2.58e-06 [assign_add_opt]: 1.24e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.79984e-07 [full_micro_interleaved_order_control]: 2.37001e-06 [reorder_send_recv_between_fp_bp]: 2.80002e-06 [comm_op_add_attrs]: 1.03001e-06 [add_comm_op_reuse_tag]: 1.06002e-06 [interleave_split_concat_branches]: 1.27e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.138e-05 [grouped_pairwise_exchange_alltoall]: 1.71e-06 [offloading_packed_experts]: 3.46999e-06 [overlap_recompute_and_grad_model_parallel]: 4.31002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.29003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.07999e-06 [overlap_grad_ring_attention]: 4.09002e-06 [overlap_grad_flash_sp]: 1.706e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.08002e-06 [split_layernorm_comm]: 1.89999e-06 [handle_group_info]: 1.19998e-06 [symbol_engine_optimizer]: 6.761e-05, [1] [Cycle 1]: 6.328e-05, [6] [build]: 2.02001e-06 [elim_shapecalc]: 8.12998e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 5.82001e-06 [fold_const_symbol]: 9.02e-06 [renormalize]: 1.59984e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.44e-06 [auto_monad_reorder]: 1.61e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.26999e-06 [opt_after_jit_grad]: 0.00044802 [validate]: 3.082e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.00587877 [execute]: 6.81999e-06 Sums bootstrap : 0.000479s : 3.36% type_inference : 0.004378s : 30.73% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.91% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000171s : 1.20% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000334s : 2.35% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% 
optimize.opt_a.a_3 : 0.000072s : 0.51% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000450s : 3.16% optimize.opt_b.b_1 : 0.000107s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000413s : 2.90% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000011s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% 
optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% 
optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000448s : 3.14% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005879s : 41.26% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.53% : 0.000022s : 4: substitution.arithmetic_simplify 1.43% : 0.000002s : 2: substitution.elim_not_effective 1.26% : 0.000002s : 2: substitution.fold_const_symbol 4.52% : 0.000005s : 4: substitution.graph_param_transform 65.10% : 0.000077s : 2: substitution.inline 2.23% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.56% : 0.000004s : 4: substitution.remove_not_recompute_node 3.37% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004338 2 92.13% : 0.003996s : 1: type_inference.infer 7.87% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000136 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.07% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 0.96% : 0.000001s : 8: predicate.check_bprop_eliminate 0.67% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.84% : 0.000002s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.76% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.08% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.20% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.79% : 0.000002s : 18: predicate.loop_unroll_before_grad 2.06% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.74% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.80% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.09% : 0.000001s : 9: predicate.reduce_eliminate 2.28% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.62% : 0.000001s : 8: predicate.remove_not_recompute_node 1.25% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.75% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.72% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.16% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.80% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000236 6 42.01% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.99% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026174 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.56% : 0.003025s : 1: add_attr 11.52% : 0.003016s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.08% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000513s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000022s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000421s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.75% : 0.000459s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000769s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.14% : 0.001868s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.75% : 0.000458s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.02% : 0.003671s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.70% : 0.000184s : 1: renormalize.infer 0.55% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.50% : 0.005888s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.78% : 0.004391s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0194649, [24] [bootstrap]: 0.00047074 [type_inference]: 0.00547016 [event_method]: 1.388e-05 [auto_monad]: 5.488e-05 [graph_reusing]: 5.65001e-06 [inline]: 1.84998e-06 [add_attr]: 0.00295888, [1] [add_attr_with_inline]: 0.00295062, [1] [Cycle 1]: 4.49e-05, [2] [tag_attr]: 1.48e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.454e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.30011e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.79998e-06 [optimize]: 0.00391039, [53] [py_interpret_to_execute]: 1.982e-05 [rewriter_before_opt_a]: 5.745e-05 [opt_a]: 0.00206948, [2] [Cycle 1]: 0.00147247, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 3.059e-05 [loop_unroll]: 2.053e-05 [a_1]: 0.00044054 [with_stream_mark]: 1.347e-05 [recompute_prepare]: 7.5e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.73001e-06 [updatestate_loads_eliminate]: 2.71999e-06 [parameter_eliminate]: 1.49998e-06 [a_2]: 7.571e-05 [accelerated_algorithm]: 6.52001e-06 [shard]: 2.01e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 6.10002e-06 [merge_send_recv]: 7.51999e-06 [auto_parallel]: 5.77999e-06 [parallel]: 1.647e-05 [flash_sp]: 6.86001e-06 [merge_comm]: 3.77998e-06 [allreduce_fusion]: 3.41999e-06 [matmul_add_comm_reduction]: 8.78001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.98998e-06 [virtual_dataset]: 5.90002e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.58002e-06 [merge_forward]: 3.76999e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.52001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.20001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.92002e-06 
[receive_attached]: 2.81999e-06 [after_resolve]: 1.042e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00039836 [add_forward_monad_depend]: 4.53999e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.31e-05 [cse]: 2.605e-05 [a_3]: 4.032e-05 [Cycle 2]: 0.00058745, [45] [expand_dump_flag]: 9.90025e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012552 [with_stream_mark]: 9.04e-06 [recompute_prepare]: 5.55001e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.78e-05 [accelerated_algorithm]: 5.40999e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.31002e-06 [auto_parallel]: 5.46e-06 [parallel]: 4.19002e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.78e-06 [matmul_add_comm_reduction]: 4.97999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 5.78002e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 5.00001e-06 [merge_forward]: 2.51998e-06 [cell_reuse_recompute_pass]: 1.21002e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.2e-06 [set_forward_comm_id_for_comm_node_pass]: 3.03998e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.05999e-06 [a_after_grad]: 8.13999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 1.00001e-06 [auto_monad_eliminator]: 6.02999e-06 [cse]: 1.515e-05 [a_3]: 3.155e-05 [py_interpret_to_execute_after_opt_a]: 7.36999e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 2.988e-05 [convert_after_rewriter]: 3.873e-05 [order_py_execute_after_rewriter]: 5.66e-06 [mutable_eliminate]: 0.00044558 [opt_b]: 
0.00017995, [1] [Cycle 1]: 0.00017411, [7] [b_1]: 0.00010762 [b_2]: 7.14001e-06 [updatestate_depend_eliminate]: 5.29e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 3.70026e-07 [cse]: 1.559e-05 [optimize_parallel_all_gather_comm]: 1.524e-05 [overlap_param_gather]: 1.72999e-06 [cconv]: 2.128e-05 [loop_unroll]: 0.00041015 [opt_after_cconv]: 9.348e-05, [1] [Cycle 1]: 8.786e-05, [7] [c_1]: 2.735e-05 [parameter_eliminate]: 2.21e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.551e-05 [renormalize]: 3.80009e-07 [remove_dup_value]: 1.253e-05 [tuple_transform]: 6.796e-05, [1] [Cycle 1]: 6.393e-05, [4] [d_1]: 3.814e-05 [none_parameter_eliminate]: 1.87001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.215e-05 [cse_after_recomputation]: 1.993e-05, [1] [Cycle 1]: 1.552e-05, [1] [cse]: 1.032e-05 [environ_conv]: 4.50001e-06 [swap_dp_allreduce_reducescatter]: 5.04e-06 [bias_add_comm_swap]: 2.11e-06 [label_micro_interleaved_index]: 4.07e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.60001e-06 [slice_recompute_activation]: 2.34999e-06 [micro_interleaved_order_control]: 2.55002e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.11998e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.29e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.133e-05 [grouped_pairwise_exchange_alltoall]: 1.46002e-06 [offloading_packed_experts]: 3.3e-06 [overlap_recompute_and_grad_model_parallel]: 4.36002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 3.68999e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.29001e-06 [split_layernorm_comm]: 1.62999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 6.738e-05, [1] [Cycle 1]: 6.314e-05, [6] [build]: 2.17999e-06 [elim_shapecalc]: 8.16002e-06 [elim_not_effective]: 1.103e-05 [opt_reshape]: 6.12001e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.99e-06 [pipeline_parallel_scheduler]: 1.32e-06 [auto_monad_reorder]: 1.575e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.63999e-06 [opt_after_jit_grad]: 0.00044111 [validate]: 2.982e-05 [backend_pass]: 9.20001e-07 [task_emit]: 0.00585223 [execute]: 6.41e-06 Sums bootstrap : 0.000471s : 3.03% type_inference : 0.005470s : 35.15% event_method : 0.000014s : 0.09% auto_monad : 0.000055s : 0.35% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000025s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000057s : 0.37% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000037s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000566s : 3.64% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.92% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.13% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.10% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.11% optimize.opt_a.renormalize : 0.000398s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000041s : 0.26% optimize.opt_a.a_3 : 0.000072s : 0.46% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000039s : 0.25% optimize.order_py_execute_after_rewriter : 0.000006s : 0.04% optimize.mutable_eliminate : 0.000446s : 2.86% optimize.opt_b.b_1 : 0.000108s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.14% optimize.loop_unroll : 0.000410s : 2.64% optimize.opt_after_cconv.c_1 : 0.000027s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000002s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000441s : 2.83% validate : 0.000030s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.005852s : 37.61% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000159 30 14.82% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000005s : 4: substitution.graph_param_transform 66.74% : 0.000106s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.67% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005429 2 90.05% : 0.004889s : 1: type_inference.infer 9.95% : 0.000540s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.27% : 0.000027s : 3: replace.inline 29.73% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.64% : 0.000104s : 3: match.inline 8.36% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.18% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.25% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.80% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.11% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.76% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.61% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.36% : 0.000001s : 4: predicate.opt_reshape 0.37% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.59% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.47% : 0.000001s : 8: predicate.remove_not_recompute_node 1.54% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.99% : 0.000002s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.46% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.83% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.83% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.76% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.01% : 0.000008s : 54: predicate.switch_simplify 0.90% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.49% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000332 8 46.82% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.18% : 0.000176s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027819 196 0.01% : 0.000003s : 1: ForceFp32Comm 10.65% : 0.002963s : 1: add_attr 10.62% : 0.002954s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000060s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.81% : 0.000505s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.15% : 0.000043s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.50% : 0.000418s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.63% : 0.000454s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.34% : 0.000929s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000090s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.45% : 0.002072s : 1: opt_a 0.35% : 0.000097s : 1: opt_after_cconv 1.62% : 0.000450s : 1: opt_after_jit_grad 0.66% : 0.000183s : 1: opt_b 14.07% : 0.003914s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000004s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000029s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.74% : 0.000205s : 1: renormalize.infer 0.67% : 0.000186s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.22% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.25% : 0.000070s : 1: symbol_engine_optimizer 21.07% : 0.005862s : 1: task_emit 0.25% : 0.000071s : 1: 
tuple_transform 19.71% : 0.005483s : 1: type_inference 0.20% : 0.000055s : 1: validate TotalTime = 0.0372371, [24] [bootstrap]: 0.00049812 [type_inference]: 0.0113102 [event_method]: 4.665e-05 [auto_monad]: 0.00011707 [graph_reusing]: 8.08001e-06 [inline]: 1.90001e-06 [add_attr]: 0.00301951, [1] [add_attr_with_inline]: 0.00301102, [1] [Cycle 1]: 7.005e-05, [2] [tag_attr]: 3.45e-05 [meta_addattr_fg_expand]: 9.13002e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 4.899e-05 [insert-virtual-dataset]: 2.96001e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 2.16e-06 [pipeline_split]: 1.72001e-06 [optimize]: 0.0131919, [53] [py_interpret_to_execute]: 3.721e-05 [rewriter_before_opt_a]: 0.00014427 [opt_a]: 0.0109509, [3] [Cycle 1]: 0.00701441, [45] [expand_dump_flag]: 3.63e-06 [switch_simplify]: 7.365e-05 [loop_unroll]: 6.124e-05 [a_1]: 0.00150272 [with_stream_mark]: 2.312e-05 [recompute_prepare]: 2.13e-05 [updatestate_depend_eliminate]: 9.62999e-06 [updatestate_assign_eliminate]: 7.93001e-06 [updatestate_loads_eliminate]: 7.52002e-06 [parameter_eliminate]: 2.65002e-06 [a_2]: 0.00024247 [accelerated_algorithm]: 3.101e-05 [shard]: 1.97999e-06 [meta_shard_fg_expand]: 3.31001e-06 [shard_inline]: 1.588e-05 [merge_send_recv]: 1.626e-05 [auto_parallel]: 1.123e-05 [parallel]: 1.911e-05 [flash_sp]: 1.128e-05 [merge_comm]: 1.031e-05 [allreduce_fusion]: 8.95999e-06 [matmul_add_comm_reduction]: 2.658e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.805e-05 [virtual_dataset]: 1.563e-05 [get_grad_eliminate_]: 1.535e-05 [virtual_output]: 1.521e-05 [merge_forward]: 9.42999e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 1.76e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.894e-05 [merge_recompute_call_nodes]: 1.62999e-06 [before_grad]: 2.753e-05 [set_forward_comm_id_for_comm_node_pass]: 9.67001e-06 [meta_fg_expand]: 0.00138802 [flash_sp_send_recv_attached]: 3.48999e-06 [receive_attached]: 2.35002e-06 
[after_resolve]: 5.892e-05 [a_after_grad]: 8.123e-05 [renormalize]: 0.002367 [add_forward_monad_depend]: 9.45001e-06 [auto_monad_grad]: 5.73997e-06 [auto_monad_eliminator]: 5.582e-05 [cse]: 0.00016185 [a_3]: 0.000333 [Cycle 2]: 0.00302091, [45] [expand_dump_flag]: 1.57999e-06 [switch_simplify]: 4.718e-05 [loop_unroll]: 4.343e-05 [a_1]: 0.00152312 [with_stream_mark]: 1.245e-05 [recompute_prepare]: 1.086e-05 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 4.47e-06 [updatestate_loads_eliminate]: 3.81001e-06 [parameter_eliminate]: 9.90025e-07 [a_2]: 0.00012646 [accelerated_algorithm]: 1.225e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.82001e-06 [shard_inline]: 9.19e-06 [merge_send_recv]: 6.86999e-06 [auto_parallel]: 7.13e-06 [parallel]: 5.24998e-06 [flash_sp]: 3.08998e-06 [merge_comm]: 5.87001e-06 [allreduce_fusion]: 5.62999e-06 [matmul_add_comm_reduction]: 8.03999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.056e-05 [virtual_dataset]: 9.12001e-06 [get_grad_eliminate_]: 9.27001e-06 [virtual_output]: 8.60999e-06 [merge_forward]: 4.35e-06 [cell_reuse_recompute_pass]: 1.07e-06 [offload_activation]: 8.99e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.662e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.412e-05 [set_forward_comm_id_for_comm_node_pass]: 5.50001e-06 [meta_fg_expand]: 6.785e-05 [flash_sp_send_recv_attached]: 1.00999e-06 [receive_attached]: 1.11002e-06 [after_resolve]: 1.587e-05 [a_after_grad]: 1.431e-05 [renormalize]: 0.00061783 [add_forward_monad_depend]: 3.73999e-06 [auto_monad_grad]: 1.17e-06 [auto_monad_eliminator]: 1.452e-05 [cse]: 4.455e-05 [a_3]: 6.568e-05 [Cycle 3]: 0.00090178, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 1.084e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00025013 [with_stream_mark]: 1.051e-05 [recompute_prepare]: 9.39e-06 [updatestate_depend_eliminate]: 4.92999e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 4.1e-06 
[parameter_eliminate]: 9.89996e-07 [a_2]: 0.0001232 [accelerated_algorithm]: 1.141e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 8.97e-06 [merge_send_recv]: 6.81001e-06 [auto_parallel]: 7.05998e-06 [parallel]: 4.72e-06 [flash_sp]: 1.17e-06 [merge_comm]: 4.85999e-06 [allreduce_fusion]: 5.04998e-06 [matmul_add_comm_reduction]: 7.67002e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.007e-05 [virtual_dataset]: 8.54e-06 [get_grad_eliminate_]: 8.3e-06 [virtual_output]: 8.16002e-06 [merge_forward]: 4.31002e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 8.96002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.678e-05 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 1.466e-05 [set_forward_comm_id_for_comm_node_pass]: 5.94999e-06 [meta_fg_expand]: 2.81e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 1.334e-05 [a_after_grad]: 1.374e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.30001e-06 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 1.059e-05 [cse]: 2.561e-05 [a_3]: 5.99e-05 [py_interpret_to_execute_after_opt_a]: 1.004e-05 [slice_cell_reuse_recomputed_activation]: 1.72001e-06 [rewriter_after_opt_a]: 4.677e-05 [convert_after_rewriter]: 9.12999e-06 [order_py_execute_after_rewriter]: 6.77002e-06 [mutable_eliminate]: 0.00045723 [opt_b]: 0.00028652, [1] [Cycle 1]: 0.0002804, [7] [b_1]: 0.00018934 [b_2]: 1.08e-05 [updatestate_depend_eliminate]: 6.88998e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 4.23001e-06 [renormalize]: 4.00003e-07 [cse]: 3.027e-05 [optimize_parallel_all_gather_comm]: 1.979e-05 [overlap_param_gather]: 2.04999e-06 [cconv]: 1.925e-05 [loop_unroll]: 0.00041934 [opt_after_cconv]: 0.00013527, [1] [Cycle 1]: 0.00012939, [7] [c_1]: 4.924e-05 [parameter_eliminate]: 2.13002e-06 [updatestate_depend_eliminate]: 7.02002e-06 [updatestate_assign_eliminate]: 4.39998e-06 
[updatestate_loads_eliminate]: 3.85998e-06 [cse]: 2.873e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 2.799e-05 [tuple_transform]: 0.00010089, [1] [Cycle 1]: 9.613e-05, [4] [d_1]: 6.611e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 9.77001e-06 [partial_unused_args_eliminate]: 1.74e-06 [add_recomputation]: 5.588e-05 [cse_after_recomputation]: 3.157e-05, [1] [Cycle 1]: 2.682e-05, [1] [cse]: 2.136e-05 [environ_conv]: 8.62e-06 [swap_dp_allreduce_reducescatter]: 7.66001e-06 [bias_add_comm_swap]: 2.74001e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.91e-06 [merge_cast_opt]: 1.33002e-06 [slice_recompute_activation]: 2.02001e-06 [micro_interleaved_order_control]: 2.20002e-06 [assign_add_opt]: 1.49998e-06 [ForceFp32Comm]: 8.50006e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.24e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.675e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 4.99e-06 [overlap_recompute_and_grad_model_parallel]: 5.61e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.55999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 5.20999e-06 [overlap_grad_flash_sp]: 2.355e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.15002e-06 [split_layernorm_comm]: 1.63002e-06 [handle_group_info]: 1.27e-06 [symbol_engine_optimizer]: 9.65e-05, [1] [Cycle 1]: 9.239e-05, [6] [build]: 9.08002e-06 [elim_shapecalc]: 1.326e-05 [elim_not_effective]: 1.799e-05 [opt_reshape]: 9.66998e-06 [fold_const_symbol]: 1.477e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.39e-06 [auto_monad_reorder]: 2.501e-05 [get_jit_bprop_graph]: 1.12e-06 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.0005103 [validate]: 4.482e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00818122 [execute]: 7.03e-06 Sums bootstrap : 0.000498s : 1.51% type_inference : 0.011310s : 34.30% event_method : 0.000047s : 0.14% auto_monad : 0.000117s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000037s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000132s : 0.40% optimize.opt_a.loop_unroll : 0.000114s : 0.34% optimize.opt_a.a_1 : 0.003276s : 9.94% optimize.opt_a.with_stream_mark : 0.000046s : 0.14% optimize.opt_a.recompute_prepare : 0.000042s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000005s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.49% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000030s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 0.000021s 
: 0.06% optimize.opt_a.allreduce_fusion : 0.000020s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000056s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.06% optimize.opt_a.meta_fg_expand : 0.001459s : 4.42% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000088s : 0.27% optimize.opt_a.a_after_grad : 0.000109s : 0.33% optimize.opt_a.renormalize : 0.002985s : 9.05% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000232s : 0.70% optimize.opt_a.a_3 : 0.000459s : 1.39% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.14% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000457s : 1.39% optimize.opt_b.b_1 : 0.000189s : 0.57% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000419s : 1.27% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000028s : 0.08% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000021s : 0.06% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000510s : 1.55% validate : 0.000045s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008181s : 24.81% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000753 222 5.89% : 0.000044s : 12: substitution.arithmetic_simplify 1.90% : 0.000014s : 2: substitution.cast_eliminate 0.41% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.05% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.30% : 0.000416s : 17: substitution.inline 2.02% : 0.000015s : 2: substitution.inline_without_move 1.38% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.02% : 0.000015s : 3: substitution.less_batch_normalization 1.69% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.94% : 0.000015s : 20: substitution.remove_not_recompute_node 3.15% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.30% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.57% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.79% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.73% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.43% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011238 2 87.16% : 0.009795s : 1: type_inference.infer 12.84% : 0.001443s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.59% : 0.000127s : 17: replace.inline 42.41% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000441 33 92.29% : 0.000407s : 17: match.inline 7.71% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000749 5764 1.08% : 0.000008s : 68: predicate.accumulaten_eliminater 0.27% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.04% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.10% : 0.000001s : 8: predicate.const_output_eliminate 0.51% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.40% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.75% : 0.000013s : 108: predicate.environ_get_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.30% : 0.000017s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.65% : 0.000042s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.65% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 168: predicate.load_eliminater 0.30% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.27% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.53% : 0.000004s : 32: predicate.merge_addn 1.11% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.03% : 0.000015s : 101: predicate.partial_defer_inline 1.75% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 68: predicate.reduce_eliminate 2.70% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.91% : 0.000014s : 152: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.08% : 0.000008s : 68: predicate.reshape_eliminate 1.14% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.25% : 0.000009s : 68: predicate.same_eliminate 0.35% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.24% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.12% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.89% : 0.000014s : 101: predicate.switch_defer_inline 2.99% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.05% : 
0.000038s : 277: predicate.switch_simplify 1.08% : 0.000008s : 68: predicate.tile_eliminate 1.07% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 84: predicate.tuple_list_get_item_const_eliminator 1.29% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.66% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.17% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.55% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001543 34 57.71% : 0.000891s : 13: func_graph_cloner_run.FuncGraphClonerGraph 42.29% : 0.000652s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061699 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.90% : 0.003024s : 1: add_attr 4.89% : 0.003015s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000124s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.86% : 0.000532s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000054s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.69% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000466s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.00% : 0.004937s : 117: opt.transform.opt_a 0.08% : 0.000048s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000175s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.75% : 0.010954s : 1: opt_a 0.22% : 0.000139s : 1: opt_after_cconv 0.84% : 0.000520s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.39% : 0.013196s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000041s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000032s : 1: remove_dup_value 2.63% : 0.001624s : 2: renormalize.infer 2.19% : 0.001348s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000051s : 1: rewriter_after_opt_a 0.24% : 0.000149s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000099s : 1: symbol_engine_optimizer 13.28% : 0.008191s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.36% : 0.011326s : 1: type_inference 0.12% : 0.000075s : 1: validate TotalTime = 0.0183605, [24] [bootstrap]: 0.00045843 [type_inference]: 0.00425975 [event_method]: 1.06e-05 [auto_monad]: 5.019e-05 [graph_reusing]: 5.47999e-06 [inline]: 1.77001e-06 [add_attr]: 0.00295357, [1] [add_attr_with_inline]: 0.00294535, [1] [Cycle 1]: 4.366e-05, [2] [tag_attr]: 1.16e-05 [meta_addattr_fg_expand]: 3.03e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.008e-05 [insert-virtual-dataset]: 2.26e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 2.19999e-06 [pipeline_split]: 1.91e-06 [optimize]: 0.00363155, [53] [py_interpret_to_execute]: 1.459e-05 [rewriter_before_opt_a]: 3.832e-05 [opt_a]: 0.00183287, [2] [Cycle 1]: 0.00123653, [45] [expand_dump_flag]: 2.83e-06 [switch_simplify]: 2.371e-05 [loop_unroll]: 1.375e-05 [a_1]: 0.00028994 [with_stream_mark]: 1.282e-05 [recompute_prepare]: 8.12e-06 [updatestate_depend_eliminate]: 3.91001e-06 [updatestate_assign_eliminate]: 2.97002e-06 [updatestate_loads_eliminate]: 2.79001e-06 [parameter_eliminate]: 2.16e-06 [a_2]: 7.552e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 2.07001e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 5.72999e-06 [merge_send_recv]: 8.02e-06 [auto_parallel]: 5.30999e-06 [parallel]: 1.705e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.78999e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 8.71002e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.62001e-06 [virtual_output]: 5.82999e-06 [merge_forward]: 3.83001e-06 [cell_reuse_recompute_pass]: 1.09998e-06 [offload_activation]: 9.54e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.12999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.30002e-06 [flash_sp_send_recv_attached]: 2.46e-06 [receive_attached]: 2.11003e-06 
[after_resolve]: 1.028e-05 [a_after_grad]: 8.75001e-06 [renormalize]: 0.00033449 [add_forward_monad_depend]: 4.24002e-06 [auto_monad_grad]: 1.55999e-06 [auto_monad_eliminator]: 1.257e-05 [cse]: 2.664e-05 [a_3]: 3.982e-05 [Cycle 2]: 0.00058715, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.66e-06 [loop_unroll]: 5.41998e-06 [a_1]: 0.00012442 [with_stream_mark]: 9.29e-06 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.07999e-06 [updatestate_loads_eliminate]: 2.36998e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.713e-05 [accelerated_algorithm]: 5.30999e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.22998e-06 [auto_parallel]: 5.12e-06 [parallel]: 4.16001e-06 [flash_sp]: 3.38999e-06 [merge_comm]: 3.2e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 6.58e-06 [allreduce_slice_to_reducescatter]: 2.9002e-07 [virtual_shard_identity]: 6.10002e-06 [virtual_dataset]: 5.52001e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.64999e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.41e-06 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 7.68001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14001e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.00001e-06 [a_after_grad]: 7.73999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.265e-05 [a_3]: 3.175e-05 [py_interpret_to_execute_after_opt_a]: 7.66001e-06 [slice_cell_reuse_recomputed_activation]: 1.84998e-06 [rewriter_after_opt_a]: 3.045e-05 [convert_after_rewriter]: 6.98e-06 [order_py_execute_after_rewriter]: 4.87998e-06 [mutable_eliminate]: 0.00044609 [opt_b]: 0.00018103, [1] [Cycle 1]: 0.0001746, [7] [b_1]: 
0.00010712 [b_2]: 7.41999e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.38998e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.50003e-07 [cse]: 1.584e-05 [optimize_parallel_all_gather_comm]: 1.485e-05 [overlap_param_gather]: 1.87001e-06 [cconv]: 2.219e-05 [loop_unroll]: 0.00041411 [opt_after_cconv]: 9.467e-05, [1] [Cycle 1]: 8.884e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.43002e-06 [updatestate_loads_eliminate]: 2.24999e-06 [cse]: 1.59e-05 [renormalize]: 4.89992e-07 [remove_dup_value]: 1.207e-05 [tuple_transform]: 6.958e-05, [1] [Cycle 1]: 6.524e-05, [4] [d_1]: 3.941e-05 [none_parameter_eliminate]: 1.59e-06 [renormalize]: 2.30008e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.69e-06 [add_recomputation]: 4.427e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.559e-05, [1] [cse]: 1.048e-05 [environ_conv]: 5.39e-06 [swap_dp_allreduce_reducescatter]: 5.25999e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.38001e-06 [label_fine_grained_interleaved_index]: 2.58998e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.22999e-06 [micro_interleaved_order_control]: 2.46e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.28998e-06 [reorder_send_recv_between_fp_bp]: 2.84001e-06 [comm_op_add_attrs]: 1.27999e-06 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.148e-05 [grouped_pairwise_exchange_alltoall]: 1.75001e-06 [offloading_packed_experts]: 3.66001e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 3.8e-06 [overlap_grad_flash_sp]: 1.626e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 6.836e-05, [1] [Cycle 1]: 6.425e-05, [6] [build]: 2.16e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.165e-05 [opt_reshape]: 6.11e-06 [fold_const_symbol]: 8.68001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.72001e-06 [auto_monad_reorder]: 1.567e-05 [get_jit_bprop_graph]: 1.10001e-06 [rewriter_after_jit_bprop_graph]: 3.35998e-06 [opt_after_jit_grad]: 0.00046835 [validate]: 3.111e-05 [backend_pass]: 8.60018e-07 [task_emit]: 0.00623456 [execute]: 6.36998e-06 Sums bootstrap : 0.000458s : 3.17% type_inference : 0.004260s : 29.47% event_method : 0.000011s : 0.07% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000020s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.02% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000414s : 2.87% optimize.opt_a.with_stream_mark : 0.000022s : 0.15% optimize.opt_a.recompute_prepare : 0.000014s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.99% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000010s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.11% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000335s : 2.31% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000446s : 3.09% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000414s : 2.86% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000044s : 0.31% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000468s : 3.24% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.006235s : 43.13% execute : 0.000006s : 0.04% Time group info: ------[substitution.] 0.000119 26 18.17% : 0.000022s : 4: substitution.arithmetic_simplify 1.61% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.60% : 0.000005s : 4: substitution.graph_param_transform 65.31% : 0.000078s : 2: substitution.inline 2.36% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.59% : 0.000004s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004221 2 91.88% : 0.003878s : 1: type_inference.infer 8.12% : 0.000343s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.82% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.53% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.10% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.79% : 0.000002s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.81% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.00% : 0.000001s : 8: predicate.less_batch_normalization 1.53% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.34% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.74% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.56% : 0.000001s : 4: predicate.parallel_virtual_node 1.25% : 0.000002s : 11: predicate.partial_defer_inline 1.25% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.65% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 1.00% : 0.000001s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.42% : 0.000006s : 41: predicate.switch_simplify 0.78% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.63% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.23% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.12% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.11% : 0.000002s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 41.21% : 0.000098s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.79% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026202 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.29% : 0.002958s : 1: add_attr 11.25% : 0.002949s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.88% : 0.000493s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000009s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000455s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.91% : 0.000762s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.01% : 0.001836s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.82% : 0.000478s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 13.87% : 0.003635s : 1: optimize 0.07% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000019s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.09% : 0.000024s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.55% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 23.83% : 0.006245s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.31% : 0.004273s : 1: type_inference 0.22% : 0.000058s : 1: validate TotalTime = 0.0357046, [24] [bootstrap]: 0.00055333 [type_inference]: 0.0101248 [event_method]: 4.037e-05 [auto_monad]: 0.00011588 [graph_reusing]: 7.83999e-06 [inline]: 1.94999e-06 [add_attr]: 0.00298217, [1] [add_attr_with_inline]: 0.00297402, [1] [Cycle 1]: 6.663e-05, [2] [tag_attr]: 3.114e-05 [meta_addattr_fg_expand]: 8.85001e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 4.52e-05 [insert-virtual-dataset]: 2.22999e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.82001e-06 [optimize]: 0.0129156, [53] [py_interpret_to_execute]: 3.518e-05 [rewriter_before_opt_a]: 0.00012503 [opt_a]: 0.0106933, [3] [Cycle 1]: 0.00682599, [45] [expand_dump_flag]: 3.53e-06 [switch_simplify]: 6.642e-05 [loop_unroll]: 5.456e-05 [a_1]: 0.00132904 [with_stream_mark]: 2.258e-05 [recompute_prepare]: 2.159e-05 [updatestate_depend_eliminate]: 9.37999e-06 [updatestate_assign_eliminate]: 7.67998e-06 [updatestate_loads_eliminate]: 7.14001e-06 [parameter_eliminate]: 2.45002e-06 [a_2]: 0.00024347 [accelerated_algorithm]: 3.045e-05 [shard]: 1.86e-06 [meta_shard_fg_expand]: 3.18e-06 [shard_inline]: 1.622e-05 [merge_send_recv]: 1.524e-05 [auto_parallel]: 1.07e-05 [parallel]: 1.861e-05 [flash_sp]: 1.079e-05 [merge_comm]: 9.99001e-06 [allreduce_fusion]: 8.83001e-06 [matmul_add_comm_reduction]: 4.473e-05 [allreduce_slice_to_reducescatter]: 8.00006e-07 [virtual_shard_identity]: 1.797e-05 [virtual_dataset]: 1.573e-05 [get_grad_eliminate_]: 1.516e-05 [virtual_output]: 1.522e-05 [merge_forward]: 9.62999e-06 [cell_reuse_recompute_pass]: 9.99979e-07 [offload_activation]: 1.824e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.854e-05 [merge_recompute_call_nodes]: 1.87001e-06 [before_grad]: 2.692e-05 [set_forward_comm_id_for_comm_node_pass]: 9.69999e-06 [meta_fg_expand]: 0.00136621 [flash_sp_send_recv_attached]: 3.55e-06 [receive_attached]: 2.22001e-06 
[after_resolve]: 5.907e-05 [a_after_grad]: 8.13e-05 [renormalize]: 0.00237558 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 5.30999e-06 [auto_monad_eliminator]: 5.516e-05 [cse]: 0.00016187 [a_3]: 0.00033414 [Cycle 2]: 0.00292291, [45] [expand_dump_flag]: 1.48002e-06 [switch_simplify]: 4.686e-05 [loop_unroll]: 4.343e-05 [a_1]: 0.00152035 [with_stream_mark]: 1.202e-05 [recompute_prepare]: 1.049e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.57e-06 [updatestate_loads_eliminate]: 3.74002e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 0.00012542 [accelerated_algorithm]: 1.203e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.81e-06 [shard_inline]: 9.19e-06 [merge_send_recv]: 6.66e-06 [auto_parallel]: 7.38999e-06 [parallel]: 4.85999e-06 [flash_sp]: 3.08e-06 [merge_comm]: 5.07e-06 [allreduce_fusion]: 4.63001e-06 [matmul_add_comm_reduction]: 7.63001e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.019e-05 [virtual_dataset]: 8.69e-06 [get_grad_eliminate_]: 8.82e-06 [virtual_output]: 8.28001e-06 [merge_forward]: 4.3e-06 [cell_reuse_recompute_pass]: 8.50006e-07 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.627e-05 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 1.376e-05 [set_forward_comm_id_for_comm_node_pass]: 5.06002e-06 [meta_fg_expand]: 3.377e-05 [flash_sp_send_recv_attached]: 9.70002e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 1.501e-05 [a_after_grad]: 1.413e-05 [renormalize]: 0.00056813 [add_forward_monad_depend]: 3.97e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 1.442e-05 [cse]: 4.595e-05 [a_3]: 6.499e-05 [Cycle 3]: 0.00093011, [45] [expand_dump_flag]: 1.12e-06 [switch_simplify]: 1.051e-05 [loop_unroll]: 9.04e-06 [a_1]: 0.0002794 [with_stream_mark]: 1.018e-05 [recompute_prepare]: 9.29998e-06 [updatestate_depend_eliminate]: 4.88001e-06 [updatestate_assign_eliminate]: 3.96001e-06 [updatestate_loads_eliminate]: 3.98001e-06 
[parameter_eliminate]: 8.70001e-07 [a_2]: 0.00012404 [accelerated_algorithm]: 1.172e-05 [shard]: 9.39996e-07 [meta_shard_fg_expand]: 1.63002e-06 [shard_inline]: 9.02999e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 7.1e-06 [parallel]: 4.53001e-06 [flash_sp]: 1.10999e-06 [merge_comm]: 4.85001e-06 [allreduce_fusion]: 4.82e-06 [matmul_add_comm_reduction]: 7.45998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.04e-05 [virtual_dataset]: 8.57e-06 [get_grad_eliminate_]: 8.47e-06 [virtual_output]: 8.27e-06 [merge_forward]: 4.33999e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 8.50001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.669e-05 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 1.399e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 3.04999e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.07998e-06 [after_resolve]: 1.328e-05 [a_after_grad]: 1.471e-05 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 8.59989e-07 [auto_monad_eliminator]: 1.039e-05 [cse]: 2.651e-05 [a_3]: 6.001e-05 [py_interpret_to_execute_after_opt_a]: 1.037e-05 [slice_cell_reuse_recomputed_activation]: 1.76e-06 [rewriter_after_opt_a]: 4.536e-05 [convert_after_rewriter]: 9.52999e-06 [order_py_execute_after_rewriter]: 6.69999e-06 [mutable_eliminate]: 0.0004586 [opt_b]: 0.00028533, [1] [Cycle 1]: 0.00027928, [7] [b_1]: 0.00018769 [b_2]: 1.068e-05 [updatestate_depend_eliminate]: 7.15998e-06 [updatestate_assign_eliminate]: 4.2e-06 [updatestate_loads_eliminate]: 4.09002e-06 [renormalize]: 3.09985e-07 [cse]: 3.092e-05 [optimize_parallel_all_gather_comm]: 1.96e-05 [overlap_param_gather]: 1.71998e-06 [cconv]: 1.89e-05 [loop_unroll]: 0.00041931 [opt_after_cconv]: 0.00013464, [1] [Cycle 1]: 0.00012888, [7] [c_1]: 4.855e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 7.07997e-06 [updatestate_assign_eliminate]: 4.25e-06 
[updatestate_loads_eliminate]: 3.97e-06 [cse]: 2.91e-05 [renormalize]: 2.69996e-07 [remove_dup_value]: 2.978e-05 [tuple_transform]: 0.00010174, [1] [Cycle 1]: 9.716e-05, [4] [d_1]: 6.68e-05 [none_parameter_eliminate]: 1.64e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 9.88002e-06 [partial_unused_args_eliminate]: 1.55999e-06 [add_recomputation]: 5.636e-05 [cse_after_recomputation]: 3.21e-05, [1] [Cycle 1]: 2.74e-05, [1] [cse]: 2.172e-05 [environ_conv]: 9.41e-06 [swap_dp_allreduce_reducescatter]: 7.66001e-06 [bias_add_comm_swap]: 2.71e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.86e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.90023e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.89999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.28002e-06 [add_comm_op_reuse_tag]: 1.31002e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.653e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 4.95999e-06 [overlap_recompute_and_grad_model_parallel]: 5.91998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 5.37999e-06 [overlap_grad_flash_sp]: 2.319e-05 [begin_end_overlap_inline]: 5.10016e-07 [split_matmul_comm_elemetwise]: 2.16e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.589e-05, [1] [Cycle 1]: 9.171e-05, [6] [build]: 8.25999e-06 [elim_shapecalc]: 1.3e-05 [elim_not_effective]: 1.781e-05 [opt_reshape]: 9.71e-06 [fold_const_symbol]: 1.477e-05 [renormalize]: 2.19996e-07 
[detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.32e-06 [auto_monad_reorder]: 2.457e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.46001e-06 [opt_after_jit_grad]: 0.00046684 [validate]: 4.343e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00815676 [execute]: 6.64001e-06 Sums bootstrap : 0.000553s : 1.76% type_inference : 0.010125s : 32.16% event_method : 0.000040s : 0.13% auto_monad : 0.000116s : 0.37% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000125s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.39% optimize.opt_a.loop_unroll : 0.000107s : 0.34% optimize.opt_a.a_1 : 0.003129s : 9.94% optimize.opt_a.with_stream_mark : 0.000045s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000493s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm : 0.000020s : 
0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000060s : 0.19% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000036s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.20% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001403s : 4.46% optimize.opt_a.flash_sp_send_recv_attached : 0.000005s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000110s : 0.35% optimize.opt_a.renormalize : 0.002944s : 9.35% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000080s : 0.25% optimize.opt_a.cse : 0.000234s : 0.74% optimize.opt_a.a_3 : 0.000459s : 1.46% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000045s : 0.14% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000459s : 1.46% optimize.opt_b.b_1 : 0.000188s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.01% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000019s : 0.06% optimize.loop_unroll : 0.000419s : 1.33% optimize.opt_after_cconv.c_1 : 0.000049s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.09% optimize.tuple_transform.d_1 : 0.000067s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000023s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000008s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000467s : 1.48% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008157s : 25.91% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000748 218 9.65% : 0.000072s : 11: substitution.arithmetic_simplify 1.75% : 0.000013s : 2: substitution.cast_eliminate 0.36% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.53% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.04% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 52.44% : 0.000392s : 16: substitution.inline 2.04% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000014s : 3: substitution.less_batch_normalization 1.69% : 0.000013s : 11: substitution.minmaximum_grad 0.71% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000013s : 20: substitution.remove_not_recompute_node 3.12% : 0.000023s : 10: substitution.replace_applicator 1.39% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.67% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.80% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.40% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.13% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010057 2 87.37% : 0.008787s : 1: type_inference.infer 12.63% : 0.001270s : 1: type_inference.specialize ------[replace.] 0.000198 30 58.86% : 0.000116s : 16: replace.inline 41.14% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000414 30 92.65% : 0.000384s : 16: match.inline 7.35% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000736 5663 1.11% : 0.000008s : 67: predicate.accumulaten_eliminater 0.32% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.05% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.15% : 0.000008s : 67: predicate.cast_eliminate 1.20% : 0.000009s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.10% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.24% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.68% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.27% : 0.000017s : 97: predicate.float_depend_g_call 0.53% : 0.000004s : 32: predicate.float_environ_get_switch 0.68% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.57% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.67% : 0.000005s : 32: predicate.less_batch_normalization 1.62% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.67% : 0.000020s : 164: predicate.load_eliminater 0.29% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.20% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.37% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.14% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 67: predicate.minmaximum_grad 0.34% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.96% : 0.000014s : 97: predicate.partial_defer_inline 1.74% : 0.000013s : 89: predicate.partial_eliminate 1.06% : 0.000008s : 67: predicate.print_const_string_wrapper 0.55% : 0.000004s : 32: predicate.reduce_all_const_elim 1.29% : 0.000009s : 67: predicate.reduce_eliminate 2.65% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.86% : 0.000014s : 149: predicate.replace_applicator 0.62% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.13% : 0.000008s : 67: predicate.reshape_eliminate 1.21% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.28% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.62% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.64% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.18% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.85% : 0.000014s : 97: predicate.switch_defer_inline 2.94% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.91% : 0.000036s : 265: 
predicate.switch_simplify 1.07% : 0.000008s : 67: predicate.tile_eliminate 1.06% : 0.000008s : 67: predicate.transpose_eliminate 1.46% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.76% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.45% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.02% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.60% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.63% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.27% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.57% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.61% : 0.000004s : 32: predicate.virtual_output_eliminate 0.15% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001456 32 57.42% : 0.000836s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.58% : 0.000620s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059645 237 0.01% : 0.000003s : 1: ForceFp32Comm 5.01% : 0.002987s : 1: add_attr 4.99% : 0.002978s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000061s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000123s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.98% : 0.000587s : 1: bootstrap 0.04% : 0.000022s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000019s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000013s : 1: environ_conv 0.08% : 0.000047s : 1: event_method 0.02% : 0.000011s : 1: execute 0.01% : 0.000006s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000428s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000016s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000017s : 1: opt.transform.mutable_eliminate 8.00% : 0.004772s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000173s : 28: opt.transform.opt_b 0.13% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000052s : 4: opt.transform.symbol_engine_opt 17.93% : 0.010697s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.80% : 0.000476s : 1: opt_after_jit_grad 0.48% : 0.000289s : 1: opt_b 21.66% : 0.012920s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000026s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000034s : 1: remove_dup_value 2.60% : 0.001550s : 2: renormalize.infer 2.32% : 0.001382s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.08% : 0.000049s : 1: rewriter_after_opt_a 0.22% : 0.000129s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000099s : 1: symbol_engine_optimizer 13.69% : 0.008166s : 1: task_emit 0.18% : 0.000105s : 1: 
tuple_transform 17.00% : 0.010140s : 1: type_inference 0.12% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x2-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x2-kbk],max_mem:64.0M TotalTime = 0.119551, [24] [bootstrap]: 0.00054032 [type_inference]: 0.00598958 [event_method]: 1.32e-05 [auto_monad]: 5.419e-05 [graph_reusing]: 5.54e-06 [inline]: 1.75001e-06 [add_attr]: 0.0034115, [1] [add_attr_with_inline]: 0.00340015, [1] [Cycle 1]: 4.395e-05, [2] [tag_attr]: 1.492e-05 [meta_addattr_fg_expand]: 4.05e-06 [parallel-infer-symbol]: 2.83e-06 [pre_auto_parallel]: 2.77e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 6.89994e-07 [dataset_repeat_opt]: 1.97001e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00394522, [53] [py_interpret_to_execute]: 2.13e-05 [rewriter_before_opt_a]: 5.759e-05 [opt_a]: 0.0020951, [2] [Cycle 1]: 0.00149872, [45] [expand_dump_flag]: 3.03998e-06 [switch_simplify]: 3.092e-05 [loop_unroll]: 2.127e-05 [a_1]: 0.00045012 [with_stream_mark]: 1.269e-05 [recompute_prepare]: 7.67002e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.12002e-06 [updatestate_loads_eliminate]: 2.76e-06 [parameter_eliminate]: 1.53002e-06 [a_2]: 7.56e-05 [accelerated_algorithm]: 6.28002e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.60999e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 7.21999e-06 [auto_parallel]: 5.96e-06 [parallel]: 2.189e-05 [flash_sp]: 7.05998e-06 [merge_comm]: 3.73001e-06 [allreduce_fusion]: 3.38e-06 [matmul_add_comm_reduction]: 8.68001e-06 [allreduce_slice_to_reducescatter]: 6.50005e-07 [virtual_shard_identity]: 7.55998e-06 [virtual_dataset]: 5.87999e-06 [get_grad_eliminate_]: 6.09999e-06 [virtual_output]: 5.90002e-06 [merge_forward]: 3.59002e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.90001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.1e-05 
[merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 9.36e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.33002e-06 [flash_sp_send_recv_attached]: 2.56998e-06 [receive_attached]: 2.65002e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 9.02999e-06 [renormalize]: 0.00040739 [add_forward_monad_depend]: 4.92999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.287e-05 [cse]: 2.585e-05 [a_3]: 4.091e-05 [Cycle 2]: 0.00058697, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.85002e-06 [loop_unroll]: 5.31002e-06 [a_1]: 0.00012455 [with_stream_mark]: 9.74e-06 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.41998e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.77e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.12999e-06 [shard_inline]: 5.40999e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.24e-06 [parallel]: 4.1e-06 [flash_sp]: 3.2e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 4.95001e-06 [allreduce_slice_to_reducescatter]: 3.30008e-07 [virtual_shard_identity]: 6.04001e-06 [virtual_dataset]: 5.30001e-06 [get_grad_eliminate_]: 5.10999e-06 [virtual_output]: 5.10001e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.39e-06 [offload_activation]: 5.91003e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.62999e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 8.00999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.14999e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.02e-06 [a_after_grad]: 7.9e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14e-06 [auto_monad_grad]: 8.10018e-07 [auto_monad_eliminator]: 5.98998e-06 [cse]: 1.21e-05 [a_3]: 3.213e-05 [py_interpret_to_execute_after_opt_a]: 7.26999e-06 [slice_cell_reuse_recomputed_activation]: 
2.01e-06 [rewriter_after_opt_a]: 3.023e-05 [convert_after_rewriter]: 6.59001e-06 [order_py_execute_after_rewriter]: 4.71002e-06 [mutable_eliminate]: 0.00045125 [opt_b]: 0.00018201, [1] [Cycle 1]: 0.00017596, [7] [b_1]: 0.00010911 [b_2]: 7.21001e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 3.50003e-07 [cse]: 1.528e-05 [optimize_parallel_all_gather_comm]: 1.51e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.137e-05 [loop_unroll]: 0.00041378 [opt_after_cconv]: 9.379e-05, [1] [Cycle 1]: 8.777e-05, [7] [c_1]: 2.752e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.59e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.16e-06 [cse]: 1.52e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.3e-05 [tuple_transform]: 6.945e-05, [1] [Cycle 1]: 6.506e-05, [4] [d_1]: 3.9e-05 [none_parameter_eliminate]: 1.76e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.772e-05 [cse_after_recomputation]: 1.997e-05, [1] [Cycle 1]: 1.555e-05, [1] [cse]: 1.029e-05 [environ_conv]: 4.70999e-06 [swap_dp_allreduce_reducescatter]: 4.64998e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.06001e-06 [label_fine_grained_interleaved_index]: 2.58998e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.04e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 9.5999e-07 [remove_cast_before_assign_add]: 9.99979e-07 [full_micro_interleaved_order_control]: 2.54999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.34998e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.152e-05 
[grouped_pairwise_exchange_alltoall]: 1.948e-05 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.95001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.53002e-06 [overlap_recompute_comm]: 2.23002e-06 [overlap_grad_ring_attention]: 3.79002e-06 [overlap_grad_flash_sp]: 1.704e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.64e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.724e-05, [1] [Cycle 1]: 6.321e-05, [6] [build]: 2.27999e-06 [elim_shapecalc]: 8.72e-06 [elim_not_effective]: 1.129e-05 [opt_reshape]: 5.84e-06 [fold_const_symbol]: 8.62e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.45999e-06 [auto_monad_reorder]: 1.54e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.40998e-06 [opt_after_jit_grad]: 0.00044541 [validate]: 3.133e-05 [backend_pass]: 8.69972e-07 [task_emit]: 0.10483 [execute]: 9.39998e-06 Sums bootstrap : 0.000540s : 0.47% type_inference : 0.005990s : 5.20% event_method : 0.000013s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000006s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000028s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000021s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000575s : 0.50% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.12% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000011s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.01% optimize.opt_a.renormalize : 0.000407s : 0.35% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000038s : 0.03% optimize.opt_a.a_3 : 0.000073s : 0.06% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000451s : 0.39% optimize.opt_b.b_1 : 0.000109s : 0.09% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000414s : 0.36% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000006s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000019s : 0.02% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.01% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000445s : 0.39% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.104830s : 91.03% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.67% : 0.000024s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.33% : 0.000005s : 4: substitution.graph_param_transform 66.98% : 0.000109s : 3: substitution.inline 1.65% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.66% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.45% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005944 2 90.91% : 0.005403s : 1: type_inference.infer 9.09% : 0.000541s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.91% : 0.000027s : 3: replace.inline 30.09% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.89% : 0.000107s : 3: match.inline 8.11% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 0.93% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.38% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.10% : 0.000003s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.49% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.35% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.76% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.63% : 0.000003s : 21: predicate.replace_applicator 0.66% : 0.000001s : 8: predicate.replace_old_param 0.31% : 0.000000s : 4: predicate.reset_defer_inline 0.88% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.82% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.76% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 2.20% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.98% : 0.000008s : 54: predicate.switch_simplify 0.88% : 
0.000001s : 11: predicate.tile_eliminate 0.92% : 0.000001s : 11: predicate.transpose_eliminate 1.48% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.76% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.10% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.53% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000341 8 47.49% : 0.000162s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.51% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.128407 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.66% : 0.003416s : 1: add_attr 2.65% : 0.003404s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.01% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000579s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000018s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.02% : 0.000023s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.36% : 0.000460s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 0.73% : 0.000940s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.02% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.63% : 0.002098s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.35% : 0.000455s : 1: opt_after_jit_grad 0.14% : 0.000185s : 1: opt_b 3.08% : 0.003949s : 1: optimize 0.01% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000032s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000208s : 1: renormalize.infer 0.15% : 0.000192s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000007s : 1: swap_dp_allreduce_reducescatter 0.05% : 0.000070s : 1: symbol_engine_optimizer 81.66% : 0.104853s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.68% : 0.006003s : 1: type_inference 0.04% : 0.000056s : 1: validate TotalTime = 0.107344, [24] [bootstrap]: 0.00047845 [type_inference]: 0.00444566 [event_method]: 1.089e-05 [auto_monad]: 5.132e-05 [graph_reusing]: 4.99e-06 [inline]: 1.96998e-06 [add_attr]: 0.00293364, [1] [add_attr_with_inline]: 0.00292556, [1] [Cycle 1]: 4.559e-05, [2] [tag_attr]: 1.188e-05 [meta_addattr_fg_expand]: 3.66999e-06 [parallel-infer-symbol]: 2.78998e-06 [pre_auto_parallel]: 2.15e-05 [insert-virtual-dataset]: 2.23002e-06 [parallel-infer-symbol-second]: 7.29982e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00370351, [53] [py_interpret_to_execute]: 1.423e-05 [rewriter_before_opt_a]: 3.881e-05 [opt_a]: 0.0019091, [2] [Cycle 1]: 0.00131348, [45] [expand_dump_flag]: 2.65002e-06 [switch_simplify]: 2.461e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.0002916 [with_stream_mark]: 1.376e-05 [recompute_prepare]: 7.25003e-06 [updatestate_depend_eliminate]: 3.45003e-06 [updatestate_assign_eliminate]: 3.28e-06 [updatestate_loads_eliminate]: 2.71999e-06 [parameter_eliminate]: 1.60999e-06 [a_2]: 7.596e-05 [accelerated_algorithm]: 6.34999e-06 [shard]: 2.22001e-06 [meta_shard_fg_expand]: 1.41002e-06 [shard_inline]: 5.96998e-06 [merge_send_recv]: 7.74002e-06 [auto_parallel]: 6.05002e-06 [parallel]: 1.84e-05 [flash_sp]: 6.80002e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.57002e-06 [matmul_add_comm_reduction]: 8.72e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.25998e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 5.26998e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.86001e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 8.92999e-06 [cell_reuse_handle_not_recompute_node_pass]: 7.798e-05 [merge_recompute_call_nodes]: 1.39998e-06 [before_grad]: 9.22001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.39001e-06 [flash_sp_send_recv_attached]: 2.63e-06 [receive_attached]: 
2.84999e-06 [after_resolve]: 1.031e-05 [a_after_grad]: 8.94e-06 [renormalize]: 0.00033761 [add_forward_monad_depend]: 4.4e-06 [auto_monad_grad]: 1.79e-06 [auto_monad_eliminator]: 1.238e-05 [cse]: 2.681e-05 [a_3]: 3.967e-05 [Cycle 2]: 0.00058644, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 7.02002e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00012472 [with_stream_mark]: 1.057e-05 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.79001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.757e-05 [accelerated_algorithm]: 5.39e-06 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.58002e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.37999e-06 [parallel]: 4.21001e-06 [flash_sp]: 3.13e-06 [merge_comm]: 3.21001e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 6.10002e-06 [virtual_dataset]: 5.17e-06 [get_grad_eliminate_]: 5.03002e-06 [virtual_output]: 4.93001e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 5.84999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.46e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.75998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.71002e-06 [flash_sp_send_recv_attached]: 6.69999e-07 [receive_attached]: 1.04e-06 [after_resolve]: 8.54e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 9.80013e-07 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 5.77999e-06 [cse]: 1.281e-05 [a_3]: 3.175e-05 [py_interpret_to_execute_after_opt_a]: 7.68001e-06 [slice_cell_reuse_recomputed_activation]: 2.27999e-06 [rewriter_after_opt_a]: 3.091e-05 [convert_after_rewriter]: 6.83e-06 [order_py_execute_after_rewriter]: 4.81002e-06 [mutable_eliminate]: 0.00044649 [opt_b]: 0.00017893, [1] [Cycle 1]: 0.00017328, [7] 
[b_1]: 0.00010654 [b_2]: 7.12002e-06 [updatestate_depend_eliminate]: 5.07e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 4.10015e-07 [cse]: 1.556e-05 [optimize_parallel_all_gather_comm]: 1.549e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.233e-05 [loop_unroll]: 0.00041341 [opt_after_cconv]: 9.39e-05, [1] [Cycle 1]: 8.82e-05, [7] [c_1]: 2.713e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.24e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.14999e-06 [cse]: 1.601e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.195e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.382e-05, [4] [d_1]: 3.853e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 5.99e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.245e-05 [cse_after_recomputation]: 2.021e-05, [1] [Cycle 1]: 1.591e-05, [1] [cse]: 1.083e-05 [environ_conv]: 4.77998e-06 [swap_dp_allreduce_reducescatter]: 4.68001e-06 [bias_add_comm_swap]: 2.35002e-06 [label_micro_interleaved_index]: 4.53999e-06 [label_fine_grained_interleaved_index]: 3.09999e-06 [merge_cast_opt]: 1.29e-06 [slice_recompute_activation]: 2.63e-06 [micro_interleaved_order_control]: 2.28002e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 9.89996e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 1.99999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.13001e-06 [overlap_opt_shard_in_pipeline]: 1.07998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.69998e-06 [offloading_packed_experts]: 3.75998e-06 [overlap_recompute_and_grad_model_parallel]: 4.60999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 3.95998e-06 [overlap_grad_flash_sp]: 1.637e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.31e-06 [split_layernorm_comm]: 2.21998e-06 [handle_group_info]: 1.25999e-06 [symbol_engine_optimizer]: 6.775e-05, [1] [Cycle 1]: 6.328e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.3e-06 [elim_not_effective]: 1.125e-05 [opt_reshape]: 5.99e-06 [fold_const_symbol]: 8.69e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.511e-05 [get_jit_bprop_graph]: 9.30013e-07 [rewriter_after_jit_bprop_graph]: 3.27002e-06 [opt_after_jit_grad]: 0.00044889 [validate]: 3.07e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0949656 [execute]: 9.86998e-06 Sums bootstrap : 0.000478s : 0.46% type_inference : 0.004446s : 4.30% event_method : 0.000011s : 0.01% auto_monad : 0.000051s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000014s : 0.01% optimize.rewriter_before_opt_a : 0.000039s : 0.04% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000032s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000416s : 0.40% optimize.opt_a.with_stream_mark : 0.000024s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate 
: 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000023s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000087s : 0.08% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000338s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000018s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000071s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000446s : 0.43% optimize.opt_b.b_1 : 0.000107s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000413s : 0.40% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.43% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.094966s : 91.80% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 0.000186 26 11.93% : 0.000022s : 4: substitution.arithmetic_simplify 0.93% : 0.000002s : 2: substitution.elim_not_effective 0.67% : 0.000001s : 2: substitution.fold_const_symbol 2.89% : 0.000005s : 4: substitution.graph_param_transform 42.09% : 0.000078s : 2: substitution.inline 1.42% : 0.000003s : 4: substitution.j_node_and_user_rematch 38.13% : 0.000071s : 4: substitution.remove_not_recompute_node 1.94% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004406 2 91.07% : 0.004013s : 1: type_inference.infer 8.93% : 0.000393s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000138 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.71% : 0.000001s : 8: predicate.addn_check_dump 0.71% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.52% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.98% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.42% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.46% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 13: predicate.environ_get_add_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000002s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.85% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.90% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.66% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.30% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.76% : 0.000001s : 8: predicate.reduce_all_const_elim 1.15% : 0.000002s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.69% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.50% : 0.000001s : 4: predicate.row_tensor_eliminate 0.92% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 0.92% : 0.000001s : 8: predicate.special_op_eliminate 0.88% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.01% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.71% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.53% : 0.000006s : 41: predicate.switch_simplify 0.76% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.50% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.52% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.77% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.06% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.86% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.61% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000267 6 43.49% : 0.000116s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.51% : 0.000151s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.115304 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.55% : 0.002938s : 1: add_attr 2.54% : 0.002929s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000046s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000056s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.45% : 0.000514s : 1: bootstrap 0.02% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000017s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.37% : 0.000422s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.40% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.72% : 0.000832s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.66% : 0.001912s : 1: opt_a 0.08% : 0.000097s : 1: opt_after_cconv 0.40% : 0.000458s : 1: opt_after_jit_grad 0.16% : 0.000182s : 1: opt_b 3.22% : 0.003707s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.16% : 0.000184s : 1: renormalize.infer 0.13% : 0.000147s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000043s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.38% : 0.094986s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.87% : 0.004459s : 1: type_inference 0.05% : 0.000052s : 1: validate TotalTime = 0.110461, [24] [bootstrap]: 0.00045735 [type_inference]: 0.00550848 [event_method]: 1.415e-05 [auto_monad]: 6.469e-05 [graph_reusing]: 5.67001e-06 [inline]: 1.65001e-06 [add_attr]: 0.0029205, [1] [add_attr_with_inline]: 0.00291222, [1] [Cycle 1]: 4.487e-05, [2] [tag_attr]: 1.554e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.69001e-06 [pre_auto_parallel]: 2.41e-05 [insert-virtual-dataset]: 2.16998e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.88997e-06 [pipeline_split]: 1.45999e-06 [optimize]: 0.00393925, [53] [py_interpret_to_execute]: 2.031e-05 [rewriter_before_opt_a]: 5.827e-05 [opt_a]: 0.00212163, [2] [Cycle 1]: 0.00151776, [45] [expand_dump_flag]: 2.59001e-06 [switch_simplify]: 3.14e-05 [loop_unroll]: 2.089e-05 [a_1]: 0.00047461 [with_stream_mark]: 1.388e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.73001e-06 [updatestate_assign_eliminate]: 3.49001e-06 [updatestate_loads_eliminate]: 2.69001e-06 [parameter_eliminate]: 2.14999e-06 [a_2]: 7.527e-05 [accelerated_algorithm]: 6.48e-06 [shard]: 2.16e-06 [meta_shard_fg_expand]: 1.91e-06 [shard_inline]: 5.84e-06 [merge_send_recv]: 7.58001e-06 [auto_parallel]: 5.59e-06 [parallel]: 1.686e-05 [flash_sp]: 7.04001e-06 [merge_comm]: 3.33998e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.35001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.79999e-06 [virtual_dataset]: 5.92999e-06 [get_grad_eliminate_]: 5.71e-06 [virtual_output]: 5.74999e-06 [merge_forward]: 3.80998e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 9.05999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.061e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 9.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.48999e-06 [meta_fg_expand]: 2.22999e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.17999e-06 
[after_resolve]: 9.76e-06 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00041055 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.99e-06 [auto_monad_eliminator]: 1.347e-05 [cse]: 2.664e-05 [a_3]: 4.041e-05 [Cycle 2]: 0.00059416, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 6.76999e-06 [loop_unroll]: 5.37999e-06 [a_1]: 0.00012448 [with_stream_mark]: 9.56998e-06 [recompute_prepare]: 5.82999e-06 [updatestate_depend_eliminate]: 2.74999e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 6.746e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.08001e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.90998e-06 [flash_sp]: 3.03e-06 [merge_comm]: 3.11001e-06 [allreduce_fusion]: 3.01999e-06 [matmul_add_comm_reduction]: 5.07999e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.19001e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.16998e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.65002e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 6.16e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72999e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.93999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.01997e-06 [after_resolve]: 9.15001e-06 [a_after_grad]: 8.2e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.15002e-06 [cse]: 1.325e-05 [a_3]: 3.135e-05 [py_interpret_to_execute_after_opt_a]: 7.77e-06 [slice_cell_reuse_recomputed_activation]: 1.72999e-06 [rewriter_after_opt_a]: 3.131e-05 [convert_after_rewriter]: 6.95002e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00044701 [opt_b]: 0.00017893, [1] [Cycle 1]: 0.00017284, [7] 
[b_1]: 0.0001065 [b_2]: 6.80998e-06 [updatestate_depend_eliminate]: 5.10999e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.16e-06 [renormalize]: 4.69998e-07 [cse]: 1.573e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 2.112e-05 [loop_unroll]: 0.00041135 [opt_after_cconv]: 9.432e-05, [1] [Cycle 1]: 8.863e-05, [7] [c_1]: 2.719e-05 [parameter_eliminate]: 2.06e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.53e-06 [updatestate_loads_eliminate]: 2.31998e-06 [cse]: 1.594e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.287e-05 [tuple_transform]: 6.875e-05, [1] [Cycle 1]: 6.438e-05, [4] [d_1]: 3.867e-05 [none_parameter_eliminate]: 1.77001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.01e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.382e-05 [cse_after_recomputation]: 1.953e-05, [1] [Cycle 1]: 1.524e-05, [1] [cse]: 1.028e-05 [environ_conv]: 4.49998e-06 [swap_dp_allreduce_reducescatter]: 4.85999e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.11001e-06 [label_fine_grained_interleaved_index]: 2.48e-06 [merge_cast_opt]: 1.57001e-06 [slice_recompute_activation]: 2.29001e-06 [micro_interleaved_order_control]: 2.22999e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 1.04e-06 [remove_cast_before_assign_add]: 1.10001e-06 [full_micro_interleaved_order_control]: 2.41e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 1.28002e-06 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.41998e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.94e-06 [control_data_broadcast_order]: 1.13e-05 [grouped_pairwise_exchange_alltoall]: 1.77999e-06 [offloading_packed_experts]: 3.58e-06 [overlap_recompute_and_grad_model_parallel]: 4.28999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.44998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.36002e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 3.73001e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.10002e-06 [split_layernorm_comm]: 1.57001e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.62e-05, [1] [Cycle 1]: 6.216e-05, [6] [build]: 2.24001e-06 [elim_shapecalc]: 8.07998e-06 [elim_not_effective]: 1.124e-05 [opt_reshape]: 5.84e-06 [fold_const_symbol]: 8.80001e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.69e-06 [auto_monad_reorder]: 1.576e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.42002e-06 [opt_after_jit_grad]: 0.00047252 [validate]: 3.056e-05 [backend_pass]: 8.2e-07 [task_emit]: 0.0967722 [execute]: 9.47999e-06 Sums bootstrap : 0.000457s : 0.43% type_inference : 0.005508s : 5.17% event_method : 0.000014s : 0.01% auto_monad : 0.000065s : 0.06% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000024s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.04% optimize.opt_a.loop_unroll : 0.000026s : 0.02% optimize.opt_a.a_1 : 0.000599s : 0.56% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000411s : 0.39% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000447s : 0.42% optimize.opt_b.b_1 : 0.000106s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000411s : 0.39% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000473s : 0.44% validate : 0.000031s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.096772s : 90.81% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000161 30 15.20% : 0.000024s : 5: substitution.arithmetic_simplify 1.05% : 0.000002s : 2: substitution.elim_not_effective 0.78% : 0.000001s : 2: substitution.fold_const_symbol 3.31% : 0.000005s : 4: substitution.graph_param_transform 66.31% : 0.000107s : 3: substitution.inline 1.71% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.64% : 0.000004s : 4: substitution.remove_not_recompute_node 2.31% : 0.000004s : 4: substitution.replace_old_param 6.69% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005467 2 89.13% : 0.004873s : 1: type_inference.infer 10.87% : 0.000594s : 1: type_inference.specialize ------[replace.] 0.000068 5 83.19% : 0.000056s : 3: replace.inline 16.81% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000115 5 91.46% : 0.000105s : 3: match.inline 8.54% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 1.01% : 0.000002s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.32% : 0.000004s : 19: predicate.arithmetic_simplify 0.86% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.61% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.90% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.90% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.17% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.04% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.86% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.59% : 0.000001s : 8: predicate.incorporate_call_switch 6.14% : 0.000010s : 51: predicate.inline 0.85% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.24% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.65% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.24% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.58% : 0.000002s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.67% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.51% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.36% : 0.000001s : 4: predicate.reset_defer_inline 0.95% : 0.000001s : 11: predicate.reshape_eliminate 0.88% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.37% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.78% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.83% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.63% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.49% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.48% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.31% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.45% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000390 8 40.49% : 0.000158s : 3: func_graph_cloner_run.FuncGraphClonerGraph 59.51% : 0.000232s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.118872 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.46% : 0.002925s : 1: add_attr 2.45% : 0.002916s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.06% : 0.000070s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.41% : 0.000492s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000020s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.35% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.38% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.81% : 0.000962s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.04% : 0.000051s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000089s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.79% : 0.002124s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.41% : 0.000482s : 1: opt_after_jit_grad 0.15% : 0.000182s : 1: opt_b 3.32% : 0.003943s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.02% : 0.000028s : 1: pre_auto_parallel 0.02% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000203s : 1: renormalize.infer 0.17% : 0.000200s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000069s : 1: symbol_engine_optimizer 81.43% : 0.096795s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.64% : 0.005521s : 1: type_inference 0.04% : 0.000051s : 1: validate TotalTime = 0.143519, [24] [bootstrap]: 0.00050569 [type_inference]: 0.011371 [event_method]: 4.8e-05 [auto_monad]: 0.00012077 [graph_reusing]: 7.68001e-06 [inline]: 1.90001e-06 [add_attr]: 0.00300393, [1] [add_attr_with_inline]: 0.00299528, [1] [Cycle 1]: 7.037e-05, [2] [tag_attr]: 3.453e-05 [meta_addattr_fg_expand]: 9.08002e-06 [parallel-infer-symbol]: 2.86999e-06 [pre_auto_parallel]: 4.983e-05 [insert-virtual-dataset]: 2.84001e-06 [parallel-infer-symbol-second]: 8.30012e-07 [dataset_repeat_opt]: 1.82001e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.013424, [53] [py_interpret_to_execute]: 3.767e-05 [rewriter_before_opt_a]: 0.00014395 [opt_a]: 0.0111119, [3] [Cycle 1]: 0.00712837, [45] [expand_dump_flag]: 3.80998e-06 [switch_simplify]: 7.41e-05 [loop_unroll]: 6.11e-05 [a_1]: 0.00147906 [with_stream_mark]: 2.231e-05 [recompute_prepare]: 2.195e-05 [updatestate_depend_eliminate]: 9.44e-06 [updatestate_assign_eliminate]: 7.63001e-06 [updatestate_loads_eliminate]: 7.36999e-06 [parameter_eliminate]: 2.89999e-06 [a_2]: 0.00024188 [accelerated_algorithm]: 3.074e-05 [shard]: 1.79998e-06 [meta_shard_fg_expand]: 3.46999e-06 [shard_inline]: 1.609e-05 [merge_send_recv]: 1.539e-05 [auto_parallel]: 1.093e-05 [parallel]: 1.878e-05 [flash_sp]: 1.129e-05 [merge_comm]: 9.71998e-06 [allreduce_fusion]: 8.57e-06 [matmul_add_comm_reduction]: 2.615e-05 [allreduce_slice_to_reducescatter]: 7.30011e-07 [virtual_shard_identity]: 1.794e-05 [virtual_dataset]: 1.576e-05 [get_grad_eliminate_]: 1.506e-05 [virtual_output]: 1.492e-05 [merge_forward]: 9.82999e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 1.748e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.899e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 2.699e-05 [set_forward_comm_id_for_comm_node_pass]: 9.57001e-06 [meta_fg_expand]: 0.00140245 [flash_sp_send_recv_attached]: 3.7e-06 [receive_attached]: 2.78998e-06 
[after_resolve]: 5.955e-05 [a_after_grad]: 8.054e-05 [renormalize]: 0.00248393 [add_forward_monad_depend]: 9.16998e-06 [auto_monad_grad]: 5.18002e-06 [auto_monad_eliminator]: 5.57e-05 [cse]: 0.00016866 [a_3]: 0.00033417 [Cycle 2]: 0.00303422, [45] [expand_dump_flag]: 1.58997e-06 [switch_simplify]: 4.639e-05 [loop_unroll]: 4.374e-05 [a_1]: 0.00156151 [with_stream_mark]: 1.186e-05 [recompute_prepare]: 1.087e-05 [updatestate_depend_eliminate]: 5.30001e-06 [updatestate_assign_eliminate]: 4.4e-06 [updatestate_loads_eliminate]: 3.60003e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 0.0001267 [accelerated_algorithm]: 1.221e-05 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.74998e-06 [shard_inline]: 9.15001e-06 [merge_send_recv]: 6.68998e-06 [auto_parallel]: 7.23e-06 [parallel]: 4.50001e-06 [flash_sp]: 3.03998e-06 [merge_comm]: 5.10001e-06 [allreduce_fusion]: 4.62e-06 [matmul_add_comm_reduction]: 7.85e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.058e-05 [virtual_dataset]: 9.20999e-06 [get_grad_eliminate_]: 8.55001e-06 [virtual_output]: 8.52e-06 [merge_forward]: 4.42e-06 [cell_reuse_recompute_pass]: 8.49977e-07 [offload_activation]: 9.27999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.688e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.39e-05 [set_forward_comm_id_for_comm_node_pass]: 5.24e-06 [meta_fg_expand]: 6.738e-05 [flash_sp_send_recv_attached]: 1.00001e-06 [receive_attached]: 1.10001e-06 [after_resolve]: 1.596e-05 [a_after_grad]: 1.411e-05 [renormalize]: 0.00059185 [add_forward_monad_depend]: 4.55999e-06 [auto_monad_grad]: 1.14e-06 [auto_monad_eliminator]: 1.488e-05 [cse]: 4.67e-05 [a_3]: 6.587e-05 [Cycle 3]: 0.00093565, [45] [expand_dump_flag]: 9.89996e-07 [switch_simplify]: 1.071e-05 [loop_unroll]: 9.04e-06 [a_1]: 0.00027829 [with_stream_mark]: 1.004e-05 [recompute_prepare]: 9.42001e-06 [updatestate_depend_eliminate]: 4.80001e-06 [updatestate_assign_eliminate]: 3.86999e-06 [updatestate_loads_eliminate]: 
3.94002e-06 [parameter_eliminate]: 8.60018e-07 [a_2]: 0.00012296 [accelerated_algorithm]: 1.165e-05 [shard]: 1.00001e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 8.89e-06 [merge_send_recv]: 6.76e-06 [auto_parallel]: 6.92002e-06 [parallel]: 4.53999e-06 [flash_sp]: 1.14e-06 [merge_comm]: 4.80001e-06 [allreduce_fusion]: 4.90001e-06 [matmul_add_comm_reduction]: 7.43e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.026e-05 [virtual_dataset]: 8.72e-06 [get_grad_eliminate_]: 8.43001e-06 [virtual_output]: 8.22e-06 [merge_forward]: 4.2e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 8.54998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.708e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.453e-05 [set_forward_comm_id_for_comm_node_pass]: 5.87001e-06 [meta_fg_expand]: 3.12002e-06 [flash_sp_send_recv_attached]: 9.5999e-07 [receive_attached]: 1.07e-06 [after_resolve]: 1.387e-05 [a_after_grad]: 1.419e-05 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 1.02e-06 [auto_monad_eliminator]: 1.082e-05 [cse]: 2.649e-05 [a_3]: 5.979e-05 [py_interpret_to_execute_after_opt_a]: 1.011e-05 [slice_cell_reuse_recomputed_activation]: 1.97999e-06 [rewriter_after_opt_a]: 4.691e-05 [convert_after_rewriter]: 9.56e-06 [order_py_execute_after_rewriter]: 6.94999e-06 [mutable_eliminate]: 0.00046418 [opt_b]: 0.00028818, [1] [Cycle 1]: 0.00028174, [7] [b_1]: 0.00018867 [b_2]: 1.048e-05 [updatestate_depend_eliminate]: 7.27997e-06 [updatestate_assign_eliminate]: 4.15999e-06 [updatestate_loads_eliminate]: 4.06001e-06 [renormalize]: 3.50003e-07 [cse]: 3.16e-05 [optimize_parallel_all_gather_comm]: 2.107e-05 [overlap_param_gather]: 1.81998e-06 [cconv]: 1.984e-05 [loop_unroll]: 0.00047125 [opt_after_cconv]: 0.00013506, [1] [Cycle 1]: 0.00012923, [7] [c_1]: 4.815e-05 [parameter_eliminate]: 2.43002e-06 [updatestate_depend_eliminate]: 7.06001e-06 [updatestate_assign_eliminate]: 4.07e-06 
[updatestate_loads_eliminate]: 3.97e-06 [cse]: 2.985e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 2.885e-05 [tuple_transform]: 0.00010116, [1] [Cycle 1]: 9.663e-05, [4] [d_1]: 6.637e-05 [none_parameter_eliminate]: 1.71998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 1.008e-05 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 5.584e-05 [cse_after_recomputation]: 3.32e-05, [1] [Cycle 1]: 2.861e-05, [1] [cse]: 2.29e-05 [environ_conv]: 9.69999e-06 [swap_dp_allreduce_reducescatter]: 8.01001e-06 [bias_add_comm_swap]: 2.45002e-06 [label_micro_interleaved_index]: 4.45e-06 [label_fine_grained_interleaved_index]: 2.58998e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.67001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.24998e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.20001e-06 [add_comm_op_reuse_tag]: 1.28002e-06 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.70001e-06 [control_data_broadcast_order]: 1.714e-05 [grouped_pairwise_exchange_alltoall]: 1.76e-06 [offloading_packed_experts]: 4.81002e-06 [overlap_recompute_and_grad_model_parallel]: 5.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.38002e-06 [overlap_grad_ring_attention]: 5.00999e-06 [overlap_grad_flash_sp]: 2.402e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.18998e-06 [split_layernorm_comm]: 1.61998e-06 [handle_group_info]: 1.15001e-06 [symbol_engine_optimizer]: 9.809e-05, [1] [Cycle 1]: 9.391e-05, [6] [build]: 9.79e-06 [elim_shapecalc]: 1.313e-05 [elim_not_effective]: 1.875e-05 [opt_reshape]: 1.006e-05 [fold_const_symbol]: 1.491e-05 [renormalize]: 
2.30008e-07 [detach_backward]: 1.44998e-06 [pipeline_parallel_scheduler]: 1.37e-06 [auto_monad_reorder]: 2.456e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.43999e-06 [opt_after_jit_grad]: 0.00047515 [validate]: 4.65e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.114196 [execute]: 9.64e-06 Sums bootstrap : 0.000506s : 0.36% type_inference : 0.011371s : 8.17% event_method : 0.000048s : 0.03% auto_monad : 0.000121s : 0.09% graph_reusing : 0.000008s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000035s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000038s : 0.03% optimize.rewriter_before_opt_a : 0.000144s : 0.10% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000131s : 0.09% optimize.opt_a.loop_unroll : 0.000114s : 0.08% optimize.opt_a.a_1 : 0.003319s : 2.38% optimize.opt_a.with_stream_mark : 0.000044s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000492s : 0.35% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000034s : 0.02% optimize.opt_a.merge_send_recv : 0.000029s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% optimize.opt_a.merge_comm : 
0.000020s : 0.01% optimize.opt_a.allreduce_fusion : 0.000018s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.03% optimize.opt_a.virtual_dataset : 0.000034s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000018s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000063s : 0.05% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000055s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.01% optimize.opt_a.meta_fg_expand : 0.001473s : 1.06% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.06% optimize.opt_a.a_after_grad : 0.000109s : 0.08% optimize.opt_a.renormalize : 0.003076s : 2.21% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.06% optimize.opt_a.cse : 0.000242s : 0.17% optimize.opt_a.a_3 : 0.000460s : 0.33% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000464s : 0.33% optimize.opt_b.b_1 : 0.000189s : 0.14% optimize.opt_b.b_2 : 0.000010s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000471s : 0.34% optimize.opt_after_cconv.c_1 : 0.000048s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000066s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000475s : 0.34% validate : 0.000047s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.114196s : 82.01% execute : 0.000010s : 0.01% Time group info: ------[substitution.] 
0.000765 222 6.06% : 0.000046s : 12: substitution.arithmetic_simplify 1.84% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 0.98% : 0.000008s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.50% : 0.000424s : 17: substitution.inline 1.99% : 0.000015s : 2: substitution.inline_without_move 1.33% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.96% : 0.000015s : 3: substitution.less_batch_normalization 1.70% : 0.000013s : 11: substitution.minmaximum_grad 0.69% : 0.000005s : 5: substitution.partial_eliminate 1.92% : 0.000015s : 20: substitution.remove_not_recompute_node 3.09% : 0.000024s : 10: substitution.replace_applicator 1.40% : 0.000011s : 15: substitution.replace_old_param 0.33% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.67% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.74% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.35% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.68% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.37% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011297 2 86.19% : 0.009737s : 1: type_inference.infer 13.81% : 0.001560s : 1: type_inference.specialize ------[replace.] 0.000251 33 63.07% : 0.000158s : 17: replace.inline 36.93% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000450 33 92.36% : 0.000415s : 17: match.inline 7.64% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000760 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.29% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.09% : 0.000008s : 68: predicate.addn_zero_filter 1.06% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.01% : 0.000015s : 100: predicate.arithmetic_simplify 1.18% : 0.000009s : 68: predicate.cast_eliminate 1.14% : 0.000009s : 68: predicate.check_bprop_eliminate 0.51% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.27% : 0.000010s : 68: predicate.dict_get_item_eliminator 1.13% : 0.000009s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.11% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.21% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 76: predicate.environ_get_depend_swap 1.82% : 0.000014s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.74% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.26% : 0.000017s : 101: predicate.float_depend_g_call 0.50% : 0.000004s : 32: predicate.float_environ_get_switch 0.66% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.54% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.48% : 0.000004s : 32: predicate.incorporate_call_switch 5.61% : 0.000043s : 249: predicate.inline 1.28% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.63% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.69% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.25% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.39% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.10% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.09% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.32% : 0.000002s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 2.01% : 0.000015s : 101: predicate.partial_defer_inline 1.73% : 0.000013s : 92: predicate.partial_eliminate 1.06% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.33% : 0.000010s : 68: predicate.reduce_eliminate 2.66% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 68: predicate.reshape_eliminate 1.11% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.64% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.28% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.10% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.15% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.95% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.10% : 0.000008s : 68: predicate.transpose_eliminate 1.46% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.57% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.86% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.97% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.63% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.67% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.30% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.55% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.16% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001593 34 56.40% : 0.000899s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.60% : 0.000695s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168327 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.79% : 0.003008s : 1: add_attr 1.78% : 0.002999s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.08% : 0.000128s : 1: auto_monad 0.02% : 0.000028s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.32% : 0.000540s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.02% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000055s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.29% : 0.000480s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.28% : 0.000473s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 2.96% : 0.004979s : 117: opt.transform.opt_a 0.03% : 0.000047s : 1: opt.transform.opt_after_cconv 0.02% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000174s : 28: opt.transform.opt_b 0.04% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000054s : 4: opt.transform.symbol_engine_opt 6.60% : 0.011115s : 1: opt_a 0.08% : 0.000139s : 1: opt_after_cconv 0.29% : 0.000484s : 1: opt_after_jit_grad 0.17% : 0.000292s : 1: opt_b 7.98% : 0.013428s : 1: optimize 0.01% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.03% : 0.000054s : 1: pre_auto_parallel 0.02% : 0.000042s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 0.99% : 0.001658s : 2: renormalize.infer 0.83% : 0.001405s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.09% : 0.000148s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000101s : 1: symbol_engine_optimizer 67.85% : 0.114218s : 1: task_emit 0.06% : 0.000104s : 1: 
tuple_transform 6.76% : 0.011385s : 1: type_inference 0.04% : 0.000071s : 1: validate TotalTime = 0.103764, [24] [bootstrap]: 0.00047251 [type_inference]: 0.00432935 [event_method]: 1.097e-05 [auto_monad]: 5.046e-05 [graph_reusing]: 5.06002e-06 [inline]: 1.92001e-06 [add_attr]: 0.00294873, [1] [add_attr_with_inline]: 0.00294087, [1] [Cycle 1]: 4.573e-05, [2] [tag_attr]: 1.181e-05 [meta_addattr_fg_expand]: 3.04001e-06 [parallel-infer-symbol]: 2.94999e-06 [pre_auto_parallel]: 2.099e-05 [insert-virtual-dataset]: 2.44999e-06 [parallel-infer-symbol-second]: 8.60018e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.67001e-06 [optimize]: 0.00365629, [53] [py_interpret_to_execute]: 1.559e-05 [rewriter_before_opt_a]: 3.792e-05 [opt_a]: 0.00184181, [2] [Cycle 1]: 0.00124349, [45] [expand_dump_flag]: 2.44999e-06 [switch_simplify]: 2.331e-05 [loop_unroll]: 1.324e-05 [a_1]: 0.00028934 [with_stream_mark]: 1.384e-05 [recompute_prepare]: 7.65998e-06 [updatestate_depend_eliminate]: 3.53e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.96001e-06 [parameter_eliminate]: 1.89e-06 [a_2]: 7.665e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 2.34999e-06 [meta_shard_fg_expand]: 1.41998e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 7.68999e-06 [auto_parallel]: 5.92999e-06 [parallel]: 1.706e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.98001e-06 [allreduce_fusion]: 3.25e-06 [matmul_add_comm_reduction]: 9.31e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.92002e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 5.71e-06 [merge_forward]: 3.45e-06 [cell_reuse_recompute_pass]: 1.16997e-06 [offload_activation]: 8.93002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.136e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.68002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35e-06 [meta_fg_expand]: 2.25002e-06 [flash_sp_send_recv_attached]: 2.30002e-06 [receive_attached]: 
2.22001e-06 [after_resolve]: 9.99001e-06 [a_after_grad]: 8.87999e-06 [renormalize]: 0.00033825 [add_forward_monad_depend]: 4.03001e-06 [auto_monad_grad]: 1.82001e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.554e-05 [a_3]: 4.015e-05 [Cycle 2]: 0.00058908, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.86999e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00012411 [with_stream_mark]: 9.22999e-06 [recompute_prepare]: 5.47999e-06 [updatestate_depend_eliminate]: 2.83998e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.49999e-06 [parameter_eliminate]: 8.79983e-07 [a_2]: 6.773e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.3e-06 [auto_parallel]: 4.98001e-06 [parallel]: 4.12e-06 [flash_sp]: 3.10998e-06 [merge_comm]: 2.94999e-06 [allreduce_fusion]: 2.70997e-06 [matmul_add_comm_reduction]: 4.97e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.32001e-06 [virtual_output]: 5.07e-06 [merge_forward]: 2.58003e-06 [cell_reuse_recompute_pass]: 1.40999e-06 [offload_activation]: 6.14001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.57001e-06 [merge_recompute_call_nodes]: 7.39994e-07 [before_grad]: 8.02e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.57999e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.76997e-06 [a_after_grad]: 8.07e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.30997e-06 [cse]: 1.23e-05 [a_3]: 3.306e-05 [py_interpret_to_execute_after_opt_a]: 7.23999e-06 [slice_cell_reuse_recomputed_activation]: 2.06998e-06 [rewriter_after_opt_a]: 3.062e-05 [convert_after_rewriter]: 6.79001e-06 [order_py_execute_after_rewriter]: 5.05001e-06 [mutable_eliminate]: 0.00046475 [opt_b]: 0.00017954, [1] [Cycle 
1]: 0.00017383, [7] [b_1]: 0.00010659 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 5.40022e-07 [cse]: 1.588e-05 [optimize_parallel_all_gather_comm]: 1.558e-05 [overlap_param_gather]: 1.89999e-06 [cconv]: 2.143e-05 [loop_unroll]: 0.0004119 [opt_after_cconv]: 9.475e-05, [1] [Cycle 1]: 8.892e-05, [7] [c_1]: 2.733e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.649e-05 [renormalize]: 1.8999e-07 [remove_dup_value]: 1.222e-05 [tuple_transform]: 6.912e-05, [1] [Cycle 1]: 6.452e-05, [4] [d_1]: 3.915e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.42001e-06 [partial_unused_args_eliminate]: 1.55001e-06 [add_recomputation]: 4.406e-05 [cse_after_recomputation]: 2.006e-05, [1] [Cycle 1]: 1.559e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.45e-06 [swap_dp_allreduce_reducescatter]: 5.22e-06 [bias_add_comm_swap]: 3.09999e-06 [label_micro_interleaved_index]: 4.18999e-06 [label_fine_grained_interleaved_index]: 2.64999e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.54999e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.41002e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.55002e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 9.90025e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.91e-06 [control_data_broadcast_order]: 1.192e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.85e-06 [overlap_recompute_and_grad_model_parallel]: 4.58999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.18002e-06 [overlap_grad_ring_attention]: 4.01001e-06 [overlap_grad_flash_sp]: 1.7e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.01998e-06 [split_layernorm_comm]: 2.32999e-06 [handle_group_info]: 9.29984e-07 [symbol_engine_optimizer]: 6.748e-05, [1] [Cycle 1]: 6.335e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 8.22e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 5.96e-06 [fold_const_symbol]: 8.64e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.568e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.36999e-06 [opt_after_jit_grad]: 0.00044403 [validate]: 3.199e-05 [backend_pass]: 1.05001e-06 [task_emit]: 0.0915448 [execute]: 8.52e-06 Sums bootstrap : 0.000473s : 0.47% type_inference : 0.004329s : 4.34% event_method : 0.000011s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000413s : 0.41% optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.14% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000021s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000338s : 0.34% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000038s : 0.04% optimize.opt_a.a_3 : 0.000073s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000465s : 0.47% optimize.opt_b.b_1 : 0.000107s : 0.11% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000412s : 0.41% optimize.opt_after_cconv.c_1 : 0.000027s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv : 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000444s : 0.44% validate : 0.000032s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.091545s : 91.68% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000120 26 17.73% : 0.000021s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.64% : 0.000006s : 4: substitution.graph_param_transform 65.28% : 0.000078s : 2: substitution.inline 2.72% : 0.000003s : 4: substitution.j_node_and_user_rematch 4.05% : 0.000005s : 4: substitution.remove_not_recompute_node 3.05% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004288 2 91.62% : 0.003928s : 1: type_inference.infer 8.38% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000135 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.41% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.30% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.20% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 2.09% : 0.000003s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.78% : 0.000002s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.30% : 0.000000s : 4: predicate.fold_const_symbol 1.06% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000008s : 44: predicate.inline 1.08% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.96% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.86% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.77% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.29% : 0.000002s : 17: predicate.replace_applicator 0.77% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.84% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 1.05% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.51% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.00% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.83% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.34% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.10% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000246 6 41.44% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.56% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.111624 196 0.00% : 0.000003s : 1: ForceFp32Comm 2.65% : 0.002953s : 1: add_attr 2.64% : 0.002944s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.45% : 0.000506s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000015s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.38% : 0.000420s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.42% : 0.000474s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.68% : 0.000763s : 78: opt.transform.opt_a 0.02% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.65% : 0.001845s : 1: opt_a 0.09% : 0.000098s : 1: opt_after_cconv 0.41% : 0.000453s : 1: opt_after_jit_grad 0.16% : 0.000183s : 1: opt_b 3.28% : 0.003660s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000004s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.17% : 0.000185s : 1: renormalize.infer 0.13% : 0.000146s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000070s : 1: symbol_engine_optimizer 82.03% : 0.091567s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 3.89% : 0.004343s : 1: type_inference 0.05% : 0.000054s : 1: validate TotalTime = 0.14262, [24] [bootstrap]: 0.00049139 [type_inference]: 0.0107715 [event_method]: 4.279e-05 [auto_monad]: 0.00011275 [graph_reusing]: 7.87003e-06 [inline]: 2.63e-06 [add_attr]: 0.00301688, [1] [add_attr_with_inline]: 0.00300869, [1] [Cycle 1]: 6.648e-05, [2] [tag_attr]: 3.197e-05 [meta_addattr_fg_expand]: 8.16002e-06 [parallel-infer-symbol]: 2.74001e-06 [pre_auto_parallel]: 4.538e-05 [insert-virtual-dataset]: 2.29999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.87001e-06 [optimize]: 0.0137488, [53] [py_interpret_to_execute]: 3.617e-05 [rewriter_before_opt_a]: 0.00012677 [opt_a]: 0.0114295, [3] [Cycle 1]: 0.00749159, [45] [expand_dump_flag]: 3.41999e-06 [switch_simplify]: 6.658e-05 [loop_unroll]: 5.454e-05 [a_1]: 0.00133691 [with_stream_mark]: 2.25e-05 [recompute_prepare]: 2.139e-05 [updatestate_depend_eliminate]: 9.07999e-06 [updatestate_assign_eliminate]: 7.86001e-06 [updatestate_loads_eliminate]: 7.57002e-06 [parameter_eliminate]: 2.51e-06 [a_2]: 0.00024436 [accelerated_algorithm]: 3.078e-05 [shard]: 2.17999e-06 [meta_shard_fg_expand]: 3.18998e-06 [shard_inline]: 1.615e-05 [merge_send_recv]: 1.543e-05 [auto_parallel]: 1.064e-05 [parallel]: 1.81e-05 [flash_sp]: 1.129e-05 [merge_comm]: 9.41e-06 [allreduce_fusion]: 9.02999e-06 [matmul_add_comm_reduction]: 2.627e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.767e-05 [virtual_dataset]: 1.55e-05 [get_grad_eliminate_]: 1.548e-05 [virtual_output]: 1.489e-05 [merge_forward]: 9.52999e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 1.7e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.837e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 2.74e-05 [set_forward_comm_id_for_comm_node_pass]: 9.45001e-06 [meta_fg_expand]: 0.00143192 [flash_sp_send_recv_attached]: 3.82002e-06 [receive_attached]: 2.93e-06 
[after_resolve]: 5.992e-05 [a_after_grad]: 8.223e-05 [renormalize]: 0.00290107 [add_forward_monad_depend]: 9.59e-06 [auto_monad_grad]: 5.33002e-06 [auto_monad_eliminator]: 5.644e-05 [cse]: 0.0001768 [a_3]: 0.00039803 [Cycle 2]: 0.00299342, [45] [expand_dump_flag]: 1.64e-06 [switch_simplify]: 4.818e-05 [loop_unroll]: 4.387e-05 [a_1]: 0.00154094 [with_stream_mark]: 1.197e-05 [recompute_prepare]: 1.11e-05 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 4.37998e-06 [updatestate_loads_eliminate]: 3.7e-06 [parameter_eliminate]: 1.18001e-06 [a_2]: 0.00012668 [accelerated_algorithm]: 1.228e-05 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.85001e-06 [shard_inline]: 9.53002e-06 [merge_send_recv]: 7.36999e-06 [auto_parallel]: 7.53999e-06 [parallel]: 5.22e-06 [flash_sp]: 3.08e-06 [merge_comm]: 5.00001e-06 [allreduce_fusion]: 4.72998e-06 [matmul_add_comm_reduction]: 9.25999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.035e-05 [virtual_dataset]: 8.75001e-06 [get_grad_eliminate_]: 9.24e-06 [virtual_output]: 8.58001e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 1.006e-05 [cell_reuse_handle_not_recompute_node_pass]: 1.636e-05 [merge_recompute_call_nodes]: 8.30012e-07 [before_grad]: 1.437e-05 [set_forward_comm_id_for_comm_node_pass]: 5.35001e-06 [meta_fg_expand]: 3.852e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.20999e-06 [after_resolve]: 1.521e-05 [a_after_grad]: 1.489e-05 [renormalize]: 0.00059685 [add_forward_monad_depend]: 3.98999e-06 [auto_monad_grad]: 1.22999e-06 [auto_monad_eliminator]: 1.462e-05 [cse]: 4.693e-05 [a_3]: 6.634e-05 [Cycle 3]: 0.00092942, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 1.069e-05 [loop_unroll]: 8.95001e-06 [a_1]: 0.00025293 [with_stream_mark]: 1.02e-05 [recompute_prepare]: 9.47001e-06 [updatestate_depend_eliminate]: 4.68001e-06 [updatestate_assign_eliminate]: 3.81001e-06 [updatestate_loads_eliminate]: 
3.76001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 0.00012378 [accelerated_algorithm]: 1.169e-05 [shard]: 1.15001e-06 [meta_shard_fg_expand]: 1.66002e-06 [shard_inline]: 8.98002e-06 [merge_send_recv]: 6.94001e-06 [auto_parallel]: 7.03998e-06 [parallel]: 4.64998e-06 [flash_sp]: 1.04e-06 [merge_comm]: 4.77998e-06 [allreduce_fusion]: 4.92999e-06 [matmul_add_comm_reduction]: 7.43999e-06 [allreduce_slice_to_reducescatter]: 4.10015e-07 [virtual_shard_identity]: 1.037e-05 [virtual_dataset]: 2.769e-05 [get_grad_eliminate_]: 8.75001e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.55999e-06 [cell_reuse_recompute_pass]: 1.38002e-06 [offload_activation]: 8.67e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.622e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.389e-05 [set_forward_comm_id_for_comm_node_pass]: 6.16e-06 [meta_fg_expand]: 3.06001e-06 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 9.89996e-07 [after_resolve]: 1.519e-05 [a_after_grad]: 1.441e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.28002e-06 [auto_monad_grad]: 9.09989e-07 [auto_monad_eliminator]: 1.118e-05 [cse]: 2.737e-05 [a_3]: 6.007e-05 [py_interpret_to_execute_after_opt_a]: 1.138e-05 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 4.692e-05 [convert_after_rewriter]: 8.72998e-06 [order_py_execute_after_rewriter]: 6.76e-06 [mutable_eliminate]: 0.00050649 [opt_b]: 0.00029144, [1] [Cycle 1]: 0.00028517, [7] [b_1]: 0.00019051 [b_2]: 1.105e-05 [updatestate_depend_eliminate]: 7.16001e-06 [updatestate_assign_eliminate]: 3.93999e-06 [updatestate_loads_eliminate]: 4.07998e-06 [renormalize]: 3.9002e-07 [cse]: 3.232e-05 [optimize_parallel_all_gather_comm]: 2.022e-05 [overlap_param_gather]: 1.83002e-06 [cconv]: 2.036e-05 [loop_unroll]: 0.00044073 [opt_after_cconv]: 0.0001391, [1] [Cycle 1]: 0.00013298, [7] [c_1]: 4.94e-05 [parameter_eliminate]: 2.05002e-06 [updatestate_depend_eliminate]: 7.05e-06 [updatestate_assign_eliminate]: 4.17998e-06 
[updatestate_loads_eliminate]: 3.80998e-06 [cse]: 3.167e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 2.912e-05 [tuple_transform]: 0.00010205, [1] [Cycle 1]: 9.734e-05, [4] [d_1]: 6.715e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 9.92999e-06 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 5.59e-05 [cse_after_recomputation]: 3.403e-05, [1] [Cycle 1]: 2.917e-05, [1] [cse]: 2.32e-05 [environ_conv]: 9.57001e-06 [swap_dp_allreduce_reducescatter]: 8.23001e-06 [bias_add_comm_swap]: 3.14001e-06 [label_micro_interleaved_index]: 4.46002e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 2.08002e-06 [micro_interleaved_order_control]: 2.84999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 1.42e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.46e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.17999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.61998e-06 [control_data_broadcast_order]: 1.719e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 5.32001e-06 [overlap_recompute_and_grad_model_parallel]: 5.42001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.73e-06 [overlap_grad_ring_attention]: 5.25999e-06 [overlap_grad_flash_sp]: 2.432e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.32001e-06 [split_layernorm_comm]: 1.55999e-06 [handle_group_info]: 9.30013e-07 [symbol_engine_optimizer]: 9.99e-05, [1] [Cycle 1]: 9.568e-05, [6] [build]: 9.37001e-06 [elim_shapecalc]: 1.387e-05 [elim_not_effective]: 1.846e-05 [opt_reshape]: 1.03e-05 [fold_const_symbol]: 1.556e-05 
[renormalize]: 2.3999e-07 [detach_backward]: 1.81998e-06 [pipeline_parallel_scheduler]: 1.81e-06 [auto_monad_reorder]: 2.504e-05 [get_jit_bprop_graph]: 1.24998e-06 [rewriter_after_jit_bprop_graph]: 3.77002e-06 [opt_after_jit_grad]: 0.0004882 [validate]: 4.812e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.113571 [execute]: 8.21002e-06 Sums bootstrap : 0.000491s : 0.36% type_inference : 0.010772s : 7.79% event_method : 0.000043s : 0.03% auto_monad : 0.000113s : 0.08% graph_reusing : 0.000008s : 0.01% inline : 0.000003s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000032s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000045s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.09% optimize.opt_a.expand_dump_flag : 0.000006s : 0.00% optimize.opt_a.switch_simplify : 0.000125s : 0.09% optimize.opt_a.loop_unroll : 0.000107s : 0.08% optimize.opt_a.a_1 : 0.003131s : 2.26% optimize.opt_a.with_stream_mark : 0.000045s : 0.03% optimize.opt_a.recompute_prepare : 0.000042s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000495s : 0.36% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.04% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.00% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.02% optimize.opt_a.auto_parallel : 0.000025s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.02% optimize.opt_a.flash_sp : 0.000015s : 0.01% 
optimize.opt_a.merge_comm : 0.000019s : 0.01% optimize.opt_a.allreduce_fusion : 0.000019s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.03% optimize.opt_a.virtual_dataset : 0.000052s : 0.04% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.02% optimize.opt_a.virtual_output : 0.000032s : 0.02% optimize.opt_a.merge_forward : 0.000019s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000056s : 0.04% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001473s : 1.07% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.00% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000090s : 0.07% optimize.opt_a.a_after_grad : 0.000112s : 0.08% optimize.opt_a.renormalize : 0.003498s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000082s : 0.06% optimize.opt_a.cse : 0.000251s : 0.18% optimize.opt_a.a_3 : 0.000524s : 0.38% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.03% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.00% optimize.mutable_eliminate : 0.000506s : 0.37% optimize.opt_b.b_1 : 0.000191s : 0.14% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.01% optimize.loop_unroll : 0.000441s : 0.32% optimize.opt_after_cconv.c_1 : 0.000049s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000032s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.02% optimize.tuple_transform.d_1 : 0.000067s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000056s : 0.04% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000010s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000488s : 0.35% validate : 0.000048s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.113571s : 82.10% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000736 218 6.02% : 0.000044s : 11: substitution.arithmetic_simplify 1.94% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.51% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.33% : 0.000002s : 5: substitution.fold_const_symbol 1.02% : 0.000007s : 8: substitution.graph_param_transform 0.35% : 0.000003s : 2: substitution.incorporate_call 0.27% : 0.000002s : 2: substitution.incorporate_call_switch 54.87% : 0.000404s : 16: substitution.inline 2.11% : 0.000016s : 2: substitution.inline_without_move 1.42% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.00% : 0.000015s : 3: substitution.less_batch_normalization 1.78% : 0.000013s : 11: substitution.minmaximum_grad 0.72% : 0.000005s : 5: substitution.partial_eliminate 1.79% : 0.000013s : 20: substitution.remove_not_recompute_node 3.35% : 0.000025s : 10: substitution.replace_applicator 1.45% : 0.000011s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.79% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.32% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.47% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010690 2 87.49% : 0.009353s : 1: type_inference.infer 12.51% : 0.001338s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.58% : 0.000119s : 16: replace.inline 40.42% : 0.000081s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000425 30 93.01% : 0.000395s : 16: match.inline 6.99% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000796 5663 1.00% : 0.000008s : 67: predicate.accumulaten_eliminater 0.26% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.48% : 0.000004s : 32: predicate.addn_check_dump 0.98% : 0.000008s : 67: predicate.addn_zero_filter 0.98% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 1.87% : 0.000015s : 99: predicate.arithmetic_simplify 1.05% : 0.000008s : 67: predicate.cast_eliminate 1.08% : 0.000009s : 68: predicate.check_bprop_eliminate 0.48% : 0.000004s : 32: predicate.compare_switch_simplify 0.08% : 0.000001s : 8: predicate.const_output_eliminate 0.48% : 0.000004s : 32: predicate.depend_value_elim 1.10% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.13% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.04% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.36% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.12% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.11% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.12% : 0.000009s : 75: predicate.environ_get_depend_swap 1.66% : 0.000013s : 107: predicate.environ_get_eliminate 1.11% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.55% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.06% : 0.000016s : 97: predicate.float_depend_g_call 0.47% : 0.000004s : 32: predicate.float_environ_get_switch 0.65% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.08% : 0.000001s : 8: predicate.fold_const_symbol 0.55% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.51% : 0.000004s : 32: predicate.incorporate_call 0.46% : 0.000004s : 32: predicate.incorporate_call_switch 5.15% : 0.000041s : 244: predicate.inline 1.20% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.60% : 0.000005s : 32: predicate.less_batch_normalization 1.50% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.53% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.03% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.30% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.52% : 0.000004s : 32: predicate.merge_addn 1.06% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.04% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.04% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.17% : 0.000001s : 8: predicate.parallel_virtual_node 1.85% : 0.000015s : 97: predicate.partial_defer_inline 1.59% : 0.000013s : 89: predicate.partial_eliminate 0.99% : 0.000008s : 67: predicate.print_const_string_wrapper 0.48% : 0.000004s : 32: predicate.reduce_all_const_elim 1.18% : 0.000009s : 67: predicate.reduce_eliminate 2.45% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.31% : 0.000002s : 32: predicate.remove_not_recompute_node 1.70% : 0.000014s : 149: predicate.replace_applicator 0.56% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.02% : 0.000008s : 67: predicate.reshape_eliminate 1.10% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 8.52% : 0.000068s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.58% : 0.000005s : 32: predicate.shard_identity_eliminate 0.29% : 0.000002s : 16: predicate.special_op_eliminate 0.60% : 0.000005s : 32: predicate.specialize_transform 1.18% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.07% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.14% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.69% : 0.000013s : 97: predicate.switch_defer_inline 2.71% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.61% : 0.000037s : 265: 
predicate.switch_simplify 0.99% : 0.000008s : 67: predicate.tile_eliminate 0.98% : 0.000008s : 67: predicate.transpose_eliminate 1.35% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.40% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.22% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.60% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.37% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.90% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.50% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.45% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.05% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.18% : 0.000001s : 8: predicate.value_based_eliminate 0.54% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.13% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.17% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001534 32 56.63% : 0.000869s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.37% : 0.000665s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.168074 237 0.00% : 0.000003s : 1: ForceFp32Comm 1.80% : 0.003021s : 1: add_attr 1.79% : 0.003012s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000060s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000120s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.31% : 0.000525s : 1: bootstrap 0.01% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.02% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000013s : 1: environ_conv 0.03% : 0.000050s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000005s : 1: get_jit_bprop_graph 0.01% : 0.000012s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000006s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.27% : 0.000450s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000006s : 1: micro_interleaved_order_control 0.31% : 0.000515s : 1: mutable_eliminate 0.00% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000017s : 1: opt.transform.mutable_eliminate 2.90% : 0.004867s : 117: opt.transform.opt_a 0.03% : 0.000048s : 1: opt.transform.opt_after_cconv 0.02% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000175s : 28: opt.transform.opt_b 0.04% : 0.000075s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000055s : 4: opt.transform.symbol_engine_opt 6.80% : 0.011433s : 1: opt_a 0.08% : 0.000142s : 1: opt_after_cconv 0.30% : 0.000498s : 1: opt_after_jit_grad 0.18% : 0.000295s : 1: opt_b 8.18% : 0.013753s : 1: optimize 0.01% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000006s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.03% : 0.000050s : 1: pre_auto_parallel 0.02% : 0.000040s : 1: py_interpret_to_execute 0.01% : 0.000015s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000033s : 1: remove_dup_value 1.20% : 0.002023s : 2: renormalize.infer 0.87% : 0.001461s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000051s : 1: rewriter_after_opt_a 0.08% : 0.000131s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000103s : 1: symbol_engine_optimizer 67.59% : 0.113593s : 1: task_emit 0.06% : 0.000105s : 1: 
tuple_transform 6.42% : 0.010786s : 1: type_inference 0.04% : 0.000075s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x2-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x2-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x3-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x3-pynative],max_mem:68.0M TotalTime = 0.0213756, [24] [bootstrap]: 0.00053407 [type_inference]: 0.00613829 [event_method]: 1.408e-05 [auto_monad]: 5.466e-05 [graph_reusing]: 5.24998e-06 [inline]: 1.99e-06 [add_attr]: 0.0034002, [1] [add_attr_with_inline]: 0.00338966, [1] [Cycle 1]: 4.393e-05, [2] [tag_attr]: 1.568e-05 [meta_addattr_fg_expand]: 4.25e-06 [parallel-infer-symbol]: 2.84999e-06 [pre_auto_parallel]: 2.653e-05 [insert-virtual-dataset]: 2.91e-06 [parallel-infer-symbol-second]: 1.14e-06 [dataset_repeat_opt]: 1.79998e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00396668, [53] [py_interpret_to_execute]: 1.945e-05 [rewriter_before_opt_a]: 5.744e-05 [opt_a]: 0.0021022, [2] [Cycle 1]: 0.00150383, [45] [expand_dump_flag]: 2.56998e-06 [switch_simplify]: 3.151e-05 [loop_unroll]: 2.054e-05 [a_1]: 0.00045131 [with_stream_mark]: 1.327e-05 [recompute_prepare]: 7.77998e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.61001e-06 [updatestate_loads_eliminate]: 2.54999e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.564e-05 [accelerated_algorithm]: 6.39999e-06 [shard]: 1.87001e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.81998e-06 [merge_send_recv]: 7.96001e-06 [auto_parallel]: 5.54e-06 [parallel]: 2.257e-05 [flash_sp]: 7e-06 [merge_comm]: 3.95e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.18999e-06 [allreduce_slice_to_reducescatter]: 7.2e-07 [virtual_shard_identity]: 7.05e-06 [virtual_dataset]: 5.64e-06 
[get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 9.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.56002e-06 [before_grad]: 9.28002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.20002e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 2.16e-06 [after_resolve]: 1.025e-05 [a_after_grad]: 8.97e-06 [renormalize]: 0.00041341 [add_forward_monad_depend]: 4.57e-06 [auto_monad_grad]: 1.69e-06 [auto_monad_eliminator]: 1.35e-05 [cse]: 2.816e-05 [a_3]: 4.043e-05 [Cycle 2]: 0.0005888, [45] [expand_dump_flag]: 8.59989e-07 [switch_simplify]: 6.94999e-06 [loop_unroll]: 5.61998e-06 [a_1]: 0.0001251 [with_stream_mark]: 9.63002e-06 [recompute_prepare]: 5.56e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.31998e-06 [updatestate_loads_eliminate]: 2.53e-06 [parameter_eliminate]: 8.29983e-07 [a_2]: 6.733e-05 [accelerated_algorithm]: 5.39998e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.15001e-06 [shard_inline]: 5.48002e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.00999e-06 [parallel]: 4.03999e-06 [flash_sp]: 3.33e-06 [merge_comm]: 2.86e-06 [allreduce_fusion]: 2.56e-06 [matmul_add_comm_reduction]: 5.07e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.09999e-06 [virtual_dataset]: 5.22999e-06 [get_grad_eliminate_]: 4.92999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.49001e-06 [cell_reuse_recompute_pass]: 1.42999e-06 [offload_activation]: 5.70001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.78998e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.66001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.88998e-06 [meta_fg_expand]: 1.72999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.95001e-06 [a_after_grad]: 7.75e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 9.89996e-07 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.14999e-06 [cse]: 1.335e-05 [a_3]: 3.151e-05 [py_interpret_to_execute_after_opt_a]: 7.63999e-06 [slice_cell_reuse_recomputed_activation]: 1.83002e-06 [rewriter_after_opt_a]: 2.778e-05 [convert_after_rewriter]: 7.13e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00044758 [opt_b]: 0.00018262, [1] [Cycle 1]: 0.00017647, [7] [b_1]: 0.00010872 [b_2]: 7.03998e-06 [updatestate_depend_eliminate]: 5.43002e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.21998e-06 [renormalize]: 2.30008e-07 [cse]: 1.655e-05 [optimize_parallel_all_gather_comm]: 1.537e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.191e-05 [loop_unroll]: 0.00041415 [opt_after_cconv]: 9.641e-05, [1] [Cycle 1]: 9.046e-05, [7] [c_1]: 2.814e-05 [parameter_eliminate]: 2.26998e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [cse]: 1.645e-05 [renormalize]: 2.80008e-07 [remove_dup_value]: 1.273e-05 [tuple_transform]: 6.921e-05, [1] [Cycle 1]: 6.49e-05, [4] [d_1]: 3.908e-05 [none_parameter_eliminate]: 1.57001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.77999e-06 [add_recomputation]: 7.837e-05 [cse_after_recomputation]: 2.198e-05, [1] [Cycle 1]: 1.746e-05, [1] [cse]: 1.204e-05 [environ_conv]: 4.82e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.44001e-06 [label_micro_interleaved_index]: 4.17e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.25001e-06 [slice_recompute_activation]: 2.27999e-06 [micro_interleaved_order_control]: 2.48e-06 [assign_add_opt]: 1.30001e-06 [ForceFp32Comm]: 9.79984e-07 [remove_cast_before_assign_add]: 1.10999e-06 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.71999e-06 [comm_op_add_attrs]: 1.32e-06 
[add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.165e-05 [grouped_pairwise_exchange_alltoall]: 1.97999e-06 [offloading_packed_experts]: 3.41001e-06 [overlap_recompute_and_grad_model_parallel]: 4.65999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.72001e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 3.75998e-06 [overlap_grad_flash_sp]: 1.664e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.02001e-06 [split_layernorm_comm]: 1.99e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.994e-05, [1] [Cycle 1]: 6.571e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.65999e-06 [elim_not_effective]: 1.233e-05 [opt_reshape]: 6.34999e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.62999e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.616e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 0.00011622 [opt_after_jit_grad]: 0.00045497 [validate]: 3.118e-05 [backend_pass]: 7.89994e-07 [task_emit]: 0.00639538 [execute]: 7.11999e-06 Sums bootstrap : 0.000534s : 3.14% type_inference : 0.006138s : 36.07% event_method : 0.000014s : 0.08% auto_monad : 0.000055s : 0.32% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000019s : 0.11% optimize.rewriter_before_opt_a : 0.000057s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.15% optimize.opt_a.a_1 : 0.000576s : 3.39% optimize.opt_a.with_stream_mark : 0.000023s : 0.13% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.01% optimize.opt_a.a_2 : 0.000143s : 0.84% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.03% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000010s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000414s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000042s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000028s : 0.16% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 2.63% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000414s : 2.43% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000078s : 0.46% optimize.cse_after_recomputation.cse : 0.000012s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000116s : 0.68% opt_after_jit_grad : 0.000455s : 2.67% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.00% task_emit : 0.006395s : 37.58% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000162 30 14.82% : 0.000024s : 5: substitution.arithmetic_simplify 1.21% : 0.000002s : 2: substitution.elim_not_effective 0.76% : 0.000001s : 2: substitution.fold_const_symbol 3.15% : 0.000005s : 4: substitution.graph_param_transform 66.65% : 0.000108s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.84% : 0.000005s : 4: substitution.remove_not_recompute_node 2.39% : 0.000004s : 4: substitution.replace_old_param 6.52% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006096 2 90.35% : 0.005507s : 1: type_inference.infer 9.65% : 0.000589s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.80% : 0.000027s : 3: replace.inline 29.20% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.73% : 0.000106s : 3: match.inline 8.27% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.88% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 19: predicate.arithmetic_simplify 0.90% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.22% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.90% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.68% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.19% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.14% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.69% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.90% : 0.000001s : 11: predicate.minmaximum_grad 1.13% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.56% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 1.02% : 0.000002s : 11: predicate.reshape_eliminate 0.65% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000002s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.82% : 0.000001s : 8: predicate.specialize_transform 0.98% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.79% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.96% : 0.000002s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.43% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.33% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.69% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.50% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000374 8 46.73% : 0.000175s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.27% : 0.000199s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030251 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.25% : 0.003404s : 1: add_attr 11.22% : 0.003393s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.27% : 0.000083s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.89% : 0.000571s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000025s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.40% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.10% : 0.000939s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000033s : 4: opt.transform.symbol_engine_opt 6.96% : 0.002105s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.54% : 0.000465s : 1: opt_after_jit_grad 0.61% : 0.000186s : 1: opt_b 13.12% : 0.003970s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.10% : 0.000031s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.70% : 0.000213s : 1: renormalize.infer 0.64% : 0.000194s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.40% : 0.000122s : 1: rewriter_after_jit_bprop_graph 0.10% : 0.000032s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000073s : 1: symbol_engine_optimizer 21.17% : 0.006406s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.34% : 0.006153s : 1: type_inference 0.19% : 0.000057s : 1: validate TotalTime = 0.0181025, [24] [bootstrap]: 0.00046654 [type_inference]: 0.00434497 [event_method]: 1.063e-05 [auto_monad]: 4.983e-05 [graph_reusing]: 4.48001e-06 [inline]: 1.59998e-06 [add_attr]: 0.00298397, [1] [add_attr_with_inline]: 0.00297594, [1] [Cycle 1]: 4.409e-05, [2] [tag_attr]: 1.13e-05 [meta_addattr_fg_expand]: 3.01999e-06 [parallel-infer-symbol]: 2.98e-06 [pre_auto_parallel]: 2.058e-05 [insert-virtual-dataset]: 2.26998e-06 [parallel-infer-symbol-second]: 6.59988e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.00363502, [53] [py_interpret_to_execute]: 1.528e-05 [rewriter_before_opt_a]: 3.955e-05 [opt_a]: 0.00184319, [2] [Cycle 1]: 0.00124631, [45] [expand_dump_flag]: 2.67001e-06 [switch_simplify]: 2.414e-05 [loop_unroll]: 1.336e-05 [a_1]: 0.00028938 [with_stream_mark]: 1.32e-05 [recompute_prepare]: 7.61999e-06 [updatestate_depend_eliminate]: 3.96001e-06 [updatestate_assign_eliminate]: 3.01001e-06 [updatestate_loads_eliminate]: 2.80002e-06 [parameter_eliminate]: 1.57999e-06 [a_2]: 7.635e-05 [accelerated_algorithm]: 6.23e-06 [shard]: 2.46998e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.80002e-06 [merge_send_recv]: 7.16001e-06 [auto_parallel]: 5.81e-06 [parallel]: 1.721e-05 [flash_sp]: 7.67998e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.29001e-06 [matmul_add_comm_reduction]: 8.80001e-06 [allreduce_slice_to_reducescatter]: 6.69999e-07 [virtual_shard_identity]: 7.25e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.66998e-06 [virtual_output]: 5.86998e-06 [merge_forward]: 3.55e-06 [cell_reuse_recompute_pass]: 1.11997e-06 [offload_activation]: 9.56998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.107e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.26002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.34001e-06 [meta_fg_expand]: 2.16e-06 [flash_sp_send_recv_attached]: 2.53e-06 [receive_attached]: 
2.79001e-06 [after_resolve]: 1.036e-05 [a_after_grad]: 9.00001e-06 [renormalize]: 0.00033836 [add_forward_monad_depend]: 5.17e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.258e-05 [cse]: 2.666e-05 [a_3]: 4.022e-05 [Cycle 2]: 0.00058778, [45] [expand_dump_flag]: 9.30013e-07 [switch_simplify]: 6.63e-06 [loop_unroll]: 5.44e-06 [a_1]: 0.0001254 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.59e-06 [updatestate_depend_eliminate]: 2.63e-06 [updatestate_assign_eliminate]: 2.14999e-06 [updatestate_loads_eliminate]: 2.32001e-06 [parameter_eliminate]: 7.2e-07 [a_2]: 6.763e-05 [accelerated_algorithm]: 5.35001e-06 [shard]: 1.04003e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.46998e-06 [merge_send_recv]: 4.03999e-06 [auto_parallel]: 5.05001e-06 [parallel]: 4.99e-06 [flash_sp]: 3.6e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 2.52001e-06 [matmul_add_comm_reduction]: 5.11002e-06 [allreduce_slice_to_reducescatter]: 4.30009e-07 [virtual_shard_identity]: 6.09001e-06 [virtual_dataset]: 5.02e-06 [get_grad_eliminate_]: 4.89e-06 [virtual_output]: 4.84e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.92001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.73998e-06 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.00005e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.98002e-06 [a_after_grad]: 7.8e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.05001e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 5.99999e-06 [cse]: 1.262e-05 [a_3]: 3.141e-05 [py_interpret_to_execute_after_opt_a]: 7.63001e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.072e-05 [convert_after_rewriter]: 6.29001e-06 [order_py_execute_after_rewriter]: 5.23002e-06 [mutable_eliminate]: 0.00044722 [opt_b]: 0.00017789, [1] [Cycle 1]: 0.00017198, [7] [b_1]: 0.0001062 
[b_2]: 7.00002e-06 [updatestate_depend_eliminate]: 5.31998e-06 [updatestate_assign_eliminate]: 2.34999e-06 [updatestate_loads_eliminate]: 2.19001e-06 [renormalize]: 3.19997e-07 [cse]: 1.544e-05 [optimize_parallel_all_gather_comm]: 1.623e-05 [overlap_param_gather]: 1.75001e-06 [cconv]: 2.181e-05 [loop_unroll]: 0.00040808 [opt_after_cconv]: 9.416e-05, [1] [Cycle 1]: 8.835e-05, [7] [c_1]: 2.787e-05 [parameter_eliminate]: 2.36e-06 [updatestate_depend_eliminate]: 5.42001e-06 [updatestate_assign_eliminate]: 2.62001e-06 [updatestate_loads_eliminate]: 2.16998e-06 [cse]: 1.533e-05 [renormalize]: 2.9002e-07 [remove_dup_value]: 1.226e-05 [tuple_transform]: 6.849e-05, [1] [Cycle 1]: 6.434e-05, [4] [d_1]: 3.815e-05 [none_parameter_eliminate]: 1.69e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.59001e-06 [partial_unused_args_eliminate]: 1.82001e-06 [add_recomputation]: 4.27e-05 [cse_after_recomputation]: 2.018e-05, [1] [Cycle 1]: 1.573e-05, [1] [cse]: 1.055e-05 [environ_conv]: 4.98001e-06 [swap_dp_allreduce_reducescatter]: 5.30999e-06 [bias_add_comm_swap]: 2.48e-06 [label_micro_interleaved_index]: 4.13001e-06 [label_fine_grained_interleaved_index]: 2.76e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.44001e-06 [micro_interleaved_order_control]: 2.49999e-06 [assign_add_opt]: 1.37e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.32999e-06 [full_micro_interleaved_order_control]: 2.12999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.19e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.171e-05 [grouped_pairwise_exchange_alltoall]: 1.89e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.26002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.33002e-06 [overlap_recompute_comm]: 2.43e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.692e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.86003e-06 [handle_group_info]: 1.39e-06 [symbol_engine_optimizer]: 6.794e-05, [1] [Cycle 1]: 6.391e-05, [6] [build]: 2.12999e-06 [elim_shapecalc]: 8.18001e-06 [elim_not_effective]: 1.134e-05 [opt_reshape]: 6.01998e-06 [fold_const_symbol]: 9.17999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.67001e-06 [pipeline_parallel_scheduler]: 1.42e-06 [auto_monad_reorder]: 1.526e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.81999e-06 [opt_after_jit_grad]: 0.00049162 [validate]: 3.094e-05 [backend_pass]: 1.20999e-06 [task_emit]: 0.00583174 [execute]: 6.89999e-06 Sums bootstrap : 0.000467s : 3.29% type_inference : 0.004345s : 30.66% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000004s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000415s : 2.93% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000011s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000338s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000006s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000447s : 3.16% optimize.opt_b.b_1 : 0.000106s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000408s : 2.88% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.05% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.04% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.03% opt_after_jit_grad : 0.000492s : 3.47% validate : 0.000031s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005832s : 41.16% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000118 26 18.28% : 0.000022s : 4: substitution.arithmetic_simplify 1.44% : 0.000002s : 2: substitution.elim_not_effective 1.35% : 0.000002s : 2: substitution.fold_const_symbol 4.22% : 0.000005s : 4: substitution.graph_param_transform 65.63% : 0.000078s : 2: substitution.inline 2.27% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004304 2 92.06% : 0.003962s : 1: type_inference.infer 7.94% : 0.000342s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000076 2 100.00% : 0.000076s : 2: match.inline ------[predicate.] 
0.000135 984 0.86% : 0.000001s : 9: predicate.accumulaten_eliminater 1.06% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.66% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.75% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.82% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.85% : 0.000002s : 21: predicate.environ_get_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.87% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.86% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.24% : 0.000003s : 26: predicate.load_eliminater 1.33% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.73% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.70% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.27% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 1.02% : 0.000001s : 8: predicate.reduce_all_const_elim 0.93% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.93% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.35% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.99% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.55% : 0.000006s : 41: predicate.switch_simplify 0.73% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.54% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.11% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 1.05% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.64% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000242 6 42.51% : 0.000103s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.49% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025984 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.50% : 0.002988s : 1: add_attr 11.47% : 0.002979s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000501s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.02% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000009s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000456s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.94% : 0.000764s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.10% : 0.001846s : 1: opt_a 0.37% : 0.000097s : 1: opt_after_cconv 1.93% : 0.000501s : 1: opt_after_jit_grad 0.70% : 0.000181s : 1: opt_b 14.00% : 0.003639s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000186s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.48% : 0.005841s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.77% : 0.004359s : 1: type_inference 0.22% : 0.000057s : 1: validate TotalTime = 0.0195051, [24] [bootstrap]: 0.00045855 [type_inference]: 0.00553452 [event_method]: 1.355e-05 [auto_monad]: 5.356e-05 [graph_reusing]: 5.66998e-06 [inline]: 1.97999e-06 [add_attr]: 0.00291576, [1] [add_attr_with_inline]: 0.00290728, [1] [Cycle 1]: 4.361e-05, [2] [tag_attr]: 1.441e-05 [meta_addattr_fg_expand]: 4.09002e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.447e-05 [insert-virtual-dataset]: 2.32001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00397093, [53] [py_interpret_to_execute]: 1.974e-05 [rewriter_before_opt_a]: 5.878e-05 [opt_a]: 0.00208221, [2] [Cycle 1]: 0.001483, [45] [expand_dump_flag]: 2.60002e-06 [switch_simplify]: 3.171e-05 [loop_unroll]: 2.052e-05 [a_1]: 0.00044598 [with_stream_mark]: 1.289e-05 [recompute_prepare]: 7.73001e-06 [updatestate_depend_eliminate]: 3.75e-06 [updatestate_assign_eliminate]: 3.56999e-06 [updatestate_loads_eliminate]: 2.87002e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.449e-05 [accelerated_algorithm]: 6.13998e-06 [shard]: 1.91e-06 [meta_shard_fg_expand]: 1.50999e-06 [shard_inline]: 5.71998e-06 [merge_send_recv]: 7.97e-06 [auto_parallel]: 5.73002e-06 [parallel]: 1.736e-05 [flash_sp]: 6.84999e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.73001e-06 [matmul_add_comm_reduction]: 8.33001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.30003e-06 [virtual_dataset]: 5.77999e-06 [get_grad_eliminate_]: 5.69e-06 [virtual_output]: 5.54998e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.77e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.062e-05 [merge_recompute_call_nodes]: 1.42999e-06 [before_grad]: 9.39e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.39999e-06 [flash_sp_send_recv_attached]: 2.22999e-06 [receive_attached]: 
2.59999e-06 [after_resolve]: 1.035e-05 [a_after_grad]: 8.55999e-06 [renormalize]: 0.00040558 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.315e-05 [cse]: 2.791e-05 [a_3]: 4.084e-05 [Cycle 2]: 0.00058991, [45] [expand_dump_flag]: 1.02998e-06 [switch_simplify]: 6.40002e-06 [loop_unroll]: 5.41002e-06 [a_1]: 0.00012534 [with_stream_mark]: 1.012e-05 [recompute_prepare]: 5.47001e-06 [updatestate_depend_eliminate]: 2.69001e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.708e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.09e-06 [meta_shard_fg_expand]: 1.08001e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.28999e-06 [auto_parallel]: 4.98001e-06 [parallel]: 4.1e-06 [flash_sp]: 2.85002e-06 [merge_comm]: 3.05002e-06 [allreduce_fusion]: 2.69999e-06 [matmul_add_comm_reduction]: 5.20001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.28998e-06 [virtual_dataset]: 5.47001e-06 [get_grad_eliminate_]: 5.14e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.52001e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 5.91998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.59999e-06 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.19002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.96999e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.06002e-06 [after_resolve]: 9.32001e-06 [a_after_grad]: 7.92e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.14998e-06 [auto_monad_grad]: 9.39996e-07 [auto_monad_eliminator]: 5.98002e-06 [cse]: 1.642e-05 [a_3]: 3.201e-05 [py_interpret_to_execute_after_opt_a]: 7.17002e-06 [slice_cell_reuse_recomputed_activation]: 1.71e-06 [rewriter_after_opt_a]: 2.958e-05 [convert_after_rewriter]: 6.74999e-06 [order_py_execute_after_rewriter]: 5.37001e-06 [mutable_eliminate]: 0.00044909 [opt_b]: 0.0001803, 
[1] [Cycle 1]: 0.00017419, [7] [b_1]: 0.00010704 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.00999e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 2.69996e-07 [cse]: 1.641e-05 [optimize_parallel_all_gather_comm]: 1.517e-05 [overlap_param_gather]: 1.93002e-06 [cconv]: 2.172e-05 [loop_unroll]: 0.00041191 [opt_after_cconv]: 9.477e-05, [1] [Cycle 1]: 8.896e-05, [7] [c_1]: 2.786e-05 [parameter_eliminate]: 2.31998e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.52001e-06 [updatestate_loads_eliminate]: 2.39001e-06 [cse]: 1.582e-05 [renormalize]: 3.60014e-07 [remove_dup_value]: 1.196e-05 [tuple_transform]: 6.849e-05, [1] [Cycle 1]: 6.428e-05, [4] [d_1]: 3.87e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 5.91e-06 [partial_unused_args_eliminate]: 1.93002e-06 [add_recomputation]: 4.238e-05 [cse_after_recomputation]: 1.983e-05, [1] [Cycle 1]: 1.564e-05, [1] [cse]: 1.054e-05 [environ_conv]: 4.38999e-06 [swap_dp_allreduce_reducescatter]: 4.90999e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.13002e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.29999e-06 [reorder_send_recv_between_fp_bp]: 2.52001e-06 [comm_op_add_attrs]: 1.00999e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.15001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57001e-06 [control_data_broadcast_order]: 1.135e-05 [grouped_pairwise_exchange_alltoall]: 1.60001e-06 [offloading_packed_experts]: 3.28998e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.50999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.34999e-06 [overlap_grad_ring_attention]: 3.66999e-06 [overlap_grad_flash_sp]: 1.671e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.36e-06 [split_layernorm_comm]: 2.20002e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 0.00013708, [1] [Cycle 1]: 0.00013304, [6] [build]: 2.24001e-06 [elim_shapecalc]: 7.97998e-06 [elim_not_effective]: 1.121e-05 [opt_reshape]: 6.29001e-06 [fold_const_symbol]: 9.20999e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.91e-06 [pipeline_parallel_scheduler]: 1.50001e-06 [auto_monad_reorder]: 1.5e-05 [get_jit_bprop_graph]: 9.20001e-07 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00044477 [validate]: 3.124e-05 [backend_pass]: 1.22999e-06 [task_emit]: 0.00581945 [execute]: 6.71999e-06 Sums bootstrap : 0.000459s : 2.94% type_inference : 0.005535s : 35.53% event_method : 0.000014s : 0.09% auto_monad : 0.000054s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000024s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.13% optimize.rewriter_before_opt_a : 0.000059s : 0.38% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.24% optimize.opt_a.loop_unroll : 0.000026s : 0.17% optimize.opt_a.a_1 : 0.000571s : 3.67% optimize.opt_a.with_stream_mark : 0.000023s : 0.15% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate 
: 0.000006s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.91% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000021s : 0.14% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.13% optimize.opt_a.a_after_grad : 0.000016s : 0.11% optimize.opt_a.renormalize : 0.000406s : 2.60% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000044s : 0.28% optimize.opt_a.a_3 : 0.000073s : 0.47% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000449s : 2.88% optimize.opt_b.b_1 : 0.000107s : 0.69% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.14% optimize.loop_unroll : 0.000412s : 2.64% optimize.opt_after_cconv.c_1 : 0.000028s : 0.18% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.25% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.27% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000445s : 2.86% validate : 0.000031s : 0.20% backend_pass : 0.000001s : 0.01% task_emit : 0.005819s : 37.36% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000161 30 15.19% : 0.000024s : 5: substitution.arithmetic_simplify 1.16% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000005s : 4: substitution.graph_param_transform 65.86% : 0.000106s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.55% : 0.000004s : 4: substitution.replace_old_param 6.75% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005494 2 90.22% : 0.004957s : 1: type_inference.infer 9.78% : 0.000537s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.57% : 0.000027s : 3: replace.inline 29.43% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000113 5 91.42% : 0.000104s : 3: match.inline 8.58% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.82% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.15% : 0.000003s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.58% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.99% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.72% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.25% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.41% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.92% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.65% : 0.000001s : 8: predicate.incorporate_call 0.54% : 0.000001s : 8: predicate.incorporate_call_switch 5.98% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.58% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.89% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.35% : 0.000001s : 4: predicate.parallel_virtual_node 1.53% : 0.000002s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.32% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.54% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.70% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 11: predicate.reshape_eliminate 0.91% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.16% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.30% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.11% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.29% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.70% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.30% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.78% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000375 8 52.66% : 0.000198s : 3: func_graph_cloner_run.FuncGraphClonerGraph 47.34% : 0.000178s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.027888 196 0.01% : 0.000004s : 1: ForceFp32Comm 10.47% : 0.002920s : 1: add_attr 10.44% : 0.002911s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000059s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.77% : 0.000493s : 1: bootstrap 0.09% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.07% : 0.000019s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.51% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.64% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.35% : 0.000934s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.32% : 0.000089s : 28: opt.transform.opt_b 0.15% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.48% : 0.002085s : 1: opt_a 0.35% : 0.000098s : 1: opt_after_cconv 1.63% : 0.000454s : 1: opt_after_jit_grad 0.66% : 0.000184s : 1: opt_b 14.25% : 0.003975s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000006s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000028s : 1: pre_auto_parallel 0.08% : 0.000023s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.75% : 0.000208s : 1: renormalize.infer 0.68% : 0.000191s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000034s : 1: rewriter_after_opt_a 0.23% : 0.000063s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.50% : 0.000140s : 1: symbol_engine_optimizer 20.90% : 0.005830s : 1: task_emit 0.26% : 0.000071s : 1: 
tuple_transform 19.89% : 0.005548s : 1: type_inference 0.20% : 0.000056s : 1: validate TotalTime = 0.0370506, [24] [bootstrap]: 0.00050307 [type_inference]: 0.0112844 [event_method]: 4.735e-05 [auto_monad]: 0.00011928 [graph_reusing]: 8.00999e-06 [inline]: 1.97001e-06 [add_attr]: 0.00300577, [1] [add_attr_with_inline]: 0.00299643, [1] [Cycle 1]: 6.947e-05, [2] [tag_attr]: 3.425e-05 [meta_addattr_fg_expand]: 9.00001e-06 [parallel-infer-symbol]: 3.09999e-06 [pre_auto_parallel]: 4.903e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.0131881, [53] [py_interpret_to_execute]: 3.641e-05 [rewriter_before_opt_a]: 0.00014388 [opt_a]: 0.0109232, [3] [Cycle 1]: 0.00703175, [45] [expand_dump_flag]: 3.61001e-06 [switch_simplify]: 7.427e-05 [loop_unroll]: 6.18e-05 [a_1]: 0.00144886 [with_stream_mark]: 2.19e-05 [recompute_prepare]: 2.118e-05 [updatestate_depend_eliminate]: 8.80999e-06 [updatestate_assign_eliminate]: 7.64002e-06 [updatestate_loads_eliminate]: 7.51001e-06 [parameter_eliminate]: 2.64001e-06 [a_2]: 0.00024279 [accelerated_algorithm]: 3.061e-05 [shard]: 1.79e-06 [meta_shard_fg_expand]: 3.31001e-06 [shard_inline]: 1.613e-05 [merge_send_recv]: 1.574e-05 [auto_parallel]: 1.079e-05 [parallel]: 1.926e-05 [flash_sp]: 1.114e-05 [merge_comm]: 9.87999e-06 [allreduce_fusion]: 8.95999e-06 [matmul_add_comm_reduction]: 2.634e-05 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 1.792e-05 [virtual_dataset]: 1.583e-05 [get_grad_eliminate_]: 1.522e-05 [virtual_output]: 1.506e-05 [merge_forward]: 9.17999e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.731e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.855e-05 [merge_recompute_call_nodes]: 1.46002e-06 [before_grad]: 2.733e-05 [set_forward_comm_id_for_comm_node_pass]: 9.55001e-06 [meta_fg_expand]: 0.00141882 [flash_sp_send_recv_attached]: 3.87002e-06 [receive_attached]: 2.22001e-06 
[after_resolve]: 5.881e-05 [a_after_grad]: 8.093e-05 [renormalize]: 0.00237498 [add_forward_monad_depend]: 9.02e-06 [auto_monad_grad]: 5.54998e-06 [auto_monad_eliminator]: 5.544e-05 [cse]: 0.00016395 [a_3]: 0.00036985 [Cycle 2]: 0.0029794, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.757e-05 [loop_unroll]: 4.419e-05 [a_1]: 0.00152621 [with_stream_mark]: 1.203e-05 [recompute_prepare]: 1.047e-05 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 4.35e-06 [updatestate_loads_eliminate]: 3.63e-06 [parameter_eliminate]: 9.39996e-07 [a_2]: 0.0001256 [accelerated_algorithm]: 1.159e-05 [shard]: 1.00999e-06 [meta_shard_fg_expand]: 1.69e-06 [shard_inline]: 9.07001e-06 [merge_send_recv]: 6.60997e-06 [auto_parallel]: 7.29001e-06 [parallel]: 4.68001e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 5.15001e-06 [allreduce_fusion]: 4.56002e-06 [matmul_add_comm_reduction]: 7.33e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 1.019e-05 [virtual_dataset]: 8.82e-06 [get_grad_eliminate_]: 8.99e-06 [virtual_output]: 8.40999e-06 [merge_forward]: 4.40999e-06 [cell_reuse_recompute_pass]: 8.30012e-07 [offload_activation]: 9.01002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.654e-05 [merge_recompute_call_nodes]: 8.70001e-07 [before_grad]: 1.406e-05 [set_forward_comm_id_for_comm_node_pass]: 5.20001e-06 [meta_fg_expand]: 6.89e-05 [flash_sp_send_recv_attached]: 9.90025e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 1.646e-05 [a_after_grad]: 1.455e-05 [renormalize]: 0.00058231 [add_forward_monad_depend]: 4.12003e-06 [auto_monad_grad]: 1.12e-06 [auto_monad_eliminator]: 1.464e-05 [cse]: 4.637e-05 [a_3]: 6.483e-05 [Cycle 3]: 0.00089859, [45] [expand_dump_flag]: 1.18001e-06 [switch_simplify]: 1.068e-05 [loop_unroll]: 9.17001e-06 [a_1]: 0.00025106 [with_stream_mark]: 9.52001e-06 [recompute_prepare]: 9.25001e-06 [updatestate_depend_eliminate]: 4.74002e-06 [updatestate_assign_eliminate]: 3.92998e-06 [updatestate_loads_eliminate]: 
3.8e-06 [parameter_eliminate]: 8.80013e-07 [a_2]: 0.00012237 [accelerated_algorithm]: 1.172e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 8.84e-06 [merge_send_recv]: 6.54999e-06 [auto_parallel]: 7.11999e-06 [parallel]: 4.57e-06 [flash_sp]: 1.09e-06 [merge_comm]: 4.95001e-06 [allreduce_fusion]: 4.80999e-06 [matmul_add_comm_reduction]: 7.46001e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 9.79e-06 [virtual_dataset]: 8.37e-06 [get_grad_eliminate_]: 8.39002e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.37998e-06 [cell_reuse_recompute_pass]: 1.35999e-06 [offload_activation]: 8.43999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.596e-05 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 1.383e-05 [set_forward_comm_id_for_comm_node_pass]: 5.10999e-06 [meta_fg_expand]: 2.98998e-06 [flash_sp_send_recv_attached]: 8.70001e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 1.437e-05 [a_after_grad]: 1.495e-05 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.28002e-06 [auto_monad_grad]: 1.08001e-06 [auto_monad_eliminator]: 1.061e-05 [cse]: 2.612e-05 [a_3]: 5.99e-05 [py_interpret_to_execute_after_opt_a]: 1.016e-05 [slice_cell_reuse_recomputed_activation]: 1.81e-06 [rewriter_after_opt_a]: 6.274e-05 [convert_after_rewriter]: 9.05999e-06 [order_py_execute_after_rewriter]: 6.81999e-06 [mutable_eliminate]: 0.00045814 [opt_b]: 0.00028621, [1] [Cycle 1]: 0.00028016, [7] [b_1]: 0.00018869 [b_2]: 1.032e-05 [updatestate_depend_eliminate]: 7.28e-06 [updatestate_assign_eliminate]: 3.95e-06 [updatestate_loads_eliminate]: 3.91001e-06 [renormalize]: 4.69998e-07 [cse]: 3.028e-05 [optimize_parallel_all_gather_comm]: 2.059e-05 [overlap_param_gather]: 2.18002e-06 [cconv]: 1.984e-05 [loop_unroll]: 0.00042275 [opt_after_cconv]: 0.00013511, [1] [Cycle 1]: 0.0001291, [7] [c_1]: 4.846e-05 [parameter_eliminate]: 2.31e-06 [updatestate_depend_eliminate]: 7.37002e-06 [updatestate_assign_eliminate]: 4.25e-06 
[updatestate_loads_eliminate]: 3.86999e-06 [cse]: 2.889e-05 [renormalize]: 4.30009e-07 [remove_dup_value]: 2.939e-05 [tuple_transform]: 0.00010111, [1] [Cycle 1]: 9.645e-05, [4] [d_1]: 6.637e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 9.69e-06 [partial_unused_args_eliminate]: 1.80001e-06 [add_recomputation]: 5.589e-05 [cse_after_recomputation]: 3.194e-05, [1] [Cycle 1]: 2.724e-05, [1] [cse]: 2.182e-05 [environ_conv]: 8.82e-06 [swap_dp_allreduce_reducescatter]: 7.61001e-06 [bias_add_comm_swap]: 2.58998e-06 [label_micro_interleaved_index]: 4.10998e-06 [label_fine_grained_interleaved_index]: 2.85998e-06 [merge_cast_opt]: 1.59e-06 [slice_recompute_activation]: 2.17001e-06 [micro_interleaved_order_control]: 2.29001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.39994e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 1.96998e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.15001e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.58002e-06 [control_data_broadcast_order]: 1.748e-05 [grouped_pairwise_exchange_alltoall]: 1.45999e-06 [offloading_packed_experts]: 5.19e-06 [overlap_recompute_and_grad_model_parallel]: 5.38002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.18001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 5.00001e-06 [overlap_grad_flash_sp]: 2.376e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.98997e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 9.756e-05, [1] [Cycle 1]: 9.335e-05, [6] [build]: 9.57001e-06 [elim_shapecalc]: 1.32e-05 [elim_not_effective]: 1.798e-05 [opt_reshape]: 1.016e-05 [fold_const_symbol]: 1.459e-05 [renormalize]: 
2.09984e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 2.489e-05 [get_jit_bprop_graph]: 1.09e-06 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00047316 [validate]: 4.402e-05 [backend_pass]: 9.50007e-07 [task_emit]: 0.00807103 [execute]: 6.73998e-06 Sums bootstrap : 0.000503s : 1.53% type_inference : 0.011284s : 34.40% event_method : 0.000047s : 0.14% auto_monad : 0.000119s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000034s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000049s : 0.15% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000036s : 0.11% optimize.rewriter_before_opt_a : 0.000144s : 0.44% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000133s : 0.40% optimize.opt_a.loop_unroll : 0.000115s : 0.35% optimize.opt_a.a_1 : 0.003226s : 9.83% optimize.opt_a.with_stream_mark : 0.000043s : 0.13% optimize.opt_a.recompute_prepare : 0.000041s : 0.12% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000491s : 1.50% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.16% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.10% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000029s : 0.09% optimize.opt_a.flash_sp : 0.000016s : 0.05% optimize.opt_a.merge_comm : 
0.000020s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.10% optimize.opt_a.virtual_output : 0.000032s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.17% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.06% optimize.opt_a.meta_fg_expand : 0.001491s : 4.54% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000090s : 0.27% optimize.opt_a.a_after_grad : 0.000110s : 0.34% optimize.opt_a.renormalize : 0.002957s : 9.01% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.04% optimize.opt_a.auto_monad_grad : 0.000008s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000081s : 0.25% optimize.opt_a.cse : 0.000236s : 0.72% optimize.opt_a.a_3 : 0.000495s : 1.51% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000063s : 0.19% optimize.convert_after_rewriter : 0.000009s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000458s : 1.40% optimize.opt_b.b_1 : 0.000189s : 0.58% optimize.opt_b.b_2 : 0.000010s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000030s : 0.09% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000423s : 1.29% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.20% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.17% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.02% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.07% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.05% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.04% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000473s : 1.44% validate : 0.000044s : 0.13% backend_pass : 0.000001s : 0.00% task_emit : 0.008071s : 24.60% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000751 222 5.91% : 0.000044s : 12: substitution.arithmetic_simplify 1.85% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.47% : 0.000004s : 5: substitution.float_depend_g_call 0.54% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 0.99% : 0.000007s : 8: substitution.graph_param_transform 0.40% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 55.49% : 0.000417s : 17: substitution.inline 2.05% : 0.000015s : 2: substitution.inline_without_move 1.32% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.94% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.76% : 0.000006s : 5: substitution.partial_eliminate 1.76% : 0.000013s : 20: substitution.remove_not_recompute_node 3.10% : 0.000023s : 10: substitution.replace_applicator 1.38% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.69% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.87% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.38% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.77% : 0.000066s : 30: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.011212 2 86.80% : 0.009732s : 1: type_inference.infer 13.20% : 0.001480s : 1: type_inference.specialize ------[replace.] 0.000220 33 57.84% : 0.000127s : 17: replace.inline 42.16% : 0.000093s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000442 33 92.27% : 0.000408s : 17: match.inline 7.73% : 0.000034s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000750 5764 1.09% : 0.000008s : 68: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.51% : 0.000004s : 32: predicate.addn_check_dump 1.06% : 0.000008s : 68: predicate.addn_zero_filter 1.04% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 100: predicate.arithmetic_simplify 1.14% : 0.000009s : 68: predicate.cast_eliminate 1.15% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.17% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.21% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.12% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 76: predicate.environ_get_depend_swap 1.74% : 0.000013s : 108: predicate.environ_get_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.73% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 101: predicate.float_depend_g_call 0.52% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000042s : 249: predicate.inline 1.26% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.60% : 0.000012s : 100: predicate.list_to_tuple_eliminator_ 2.66% : 0.000020s : 168: predicate.load_eliminater 0.33% : 0.000003s : 8: predicate.loop_unroll_after_grad 2.28% : 0.000017s : 136: predicate.loop_unroll_before_grad 1.40% : 0.000011s : 84: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000008s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.12% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.14% : 0.000001s : 8: predicate.opt_reshape 0.18% : 0.000001s : 8: predicate.parallel_virtual_node 2.02% : 0.000015s : 101: predicate.partial_defer_inline 1.74% : 0.000013s : 92: predicate.partial_eliminate 1.09% : 0.000008s : 68: predicate.print_const_string_wrapper 0.53% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.68% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.92% : 0.000014s : 152: predicate.replace_applicator 0.59% : 0.000004s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.06% : 0.000008s : 68: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.14% : 0.000001s : 8: predicate.row_tensor_eliminate 1.30% : 0.000010s : 68: predicate.same_eliminate 0.36% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.61% : 0.000005s : 32: predicate.specialize_transform 1.31% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.17% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.87% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000022s : 169: predicate.switch_layer_defer_inline 5.02% : 
0.000038s : 277: predicate.switch_simplify 1.07% : 0.000008s : 68: predicate.tile_eliminate 1.06% : 0.000008s : 68: predicate.transpose_eliminate 1.45% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000010s : 84: predicate.tuple_list_get_item_depend_reorder 2.88% : 0.000022s : 132: predicate.tuple_list_get_item_eliminator 1.44% : 0.000011s : 84: predicate.tuple_list_get_set_item_eliminator 1.96% : 0.000015s : 116: predicate.tuple_list_set_item_eliminator 1.61% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.64% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 200: predicate.updatestate_useless_node_eliminater 0.14% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001560 34 56.16% : 0.000876s : 13: func_graph_cloner_run.FuncGraphClonerGraph 43.84% : 0.000684s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.061451 237 0.01% : 0.000003s : 1: ForceFp32Comm 4.90% : 0.003011s : 1: add_attr 4.88% : 0.003001s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000126s : 1: auto_monad 0.05% : 0.000029s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.87% : 0.000537s : 1: bootstrap 0.04% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000012s : 1: convert_after_rewriter 0.06% : 0.000035s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.09% : 0.000055s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.70% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.76% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.01% : 0.004920s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.28% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.78% : 0.010926s : 1: opt_a 0.23% : 0.000138s : 1: opt_after_cconv 0.79% : 0.000483s : 1: opt_after_jit_grad 0.47% : 0.000290s : 1: opt_b 21.47% : 0.013192s : 1: optimize 0.04% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000010s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.09% : 0.000054s : 1: pre_auto_parallel 0.07% : 0.000040s : 1: py_interpret_to_execute 0.02% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000034s : 1: remove_dup_value 2.58% : 0.001587s : 2: renormalize.infer 2.21% : 0.001358s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000067s : 1: rewriter_after_opt_a 0.24% : 0.000148s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.02% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.16% : 0.000100s : 1: symbol_engine_optimizer 13.15% : 0.008081s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 18.39% : 0.011299s : 1: type_inference 0.12% : 0.000074s : 1: validate TotalTime = 0.0183049, [24] [bootstrap]: 0.00049229 [type_inference]: 0.00428946 [event_method]: 1.044e-05 [auto_monad]: 4.874e-05 [graph_reusing]: 5.39e-06 [inline]: 1.67999e-06 [add_attr]: 0.00294442, [1] [add_attr_with_inline]: 0.00293666, [1] [Cycle 1]: 4.447e-05, [2] [tag_attr]: 1.212e-05 [meta_addattr_fg_expand]: 2.78e-06 [parallel-infer-symbol]: 2.78998e-06 [pre_auto_parallel]: 2.171e-05 [insert-virtual-dataset]: 2.19999e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.87001e-06 [pipeline_split]: 1.66002e-06 [optimize]: 0.00366854, [53] [py_interpret_to_execute]: 1.466e-05 [rewriter_before_opt_a]: 3.8e-05 [opt_a]: 0.00188294, [2] [Cycle 1]: 0.00128905, [45] [expand_dump_flag]: 2.81999e-06 [switch_simplify]: 2.345e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00033714 [with_stream_mark]: 1.349e-05 [recompute_prepare]: 7.28e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.10998e-06 [updatestate_loads_eliminate]: 3.08998e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.654e-05 [accelerated_algorithm]: 6.35002e-06 [shard]: 2.20002e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.91e-06 [merge_send_recv]: 7.70998e-06 [auto_parallel]: 5.72001e-06 [parallel]: 1.711e-05 [flash_sp]: 7.03e-06 [merge_comm]: 3.75e-06 [allreduce_fusion]: 3.6e-06 [matmul_add_comm_reduction]: 9.39e-06 [allreduce_slice_to_reducescatter]: 9.30013e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 6.06e-06 [get_grad_eliminate_]: 5.19e-06 [virtual_output]: 5.92999e-06 [merge_forward]: 3.93001e-06 [cell_reuse_recompute_pass]: 1.14e-06 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.122e-05 [merge_recompute_call_nodes]: 1.40999e-06 [before_grad]: 9.91e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.11998e-06 [flash_sp_send_recv_attached]: 2.23998e-06 [receive_attached]: 2.10002e-06 
[after_resolve]: 1.08e-05 [a_after_grad]: 8.73001e-06 [renormalize]: 0.00033564 [add_forward_monad_depend]: 4.50999e-06 [auto_monad_grad]: 1.66002e-06 [auto_monad_eliminator]: 1.289e-05 [cse]: 2.664e-05 [a_3]: 3.956e-05 [Cycle 2]: 0.00058489, [45] [expand_dump_flag]: 8.39995e-07 [switch_simplify]: 6.60997e-06 [loop_unroll]: 5.29e-06 [a_1]: 0.00012403 [with_stream_mark]: 1.093e-05 [recompute_prepare]: 5.67001e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.30002e-06 [updatestate_loads_eliminate]: 2.49001e-06 [parameter_eliminate]: 8.10018e-07 [a_2]: 6.736e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.61998e-06 [merge_send_recv]: 4.18999e-06 [auto_parallel]: 5.77001e-06 [parallel]: 4.03001e-06 [flash_sp]: 2.84999e-06 [merge_comm]: 2.90002e-06 [allreduce_fusion]: 2.63998e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 5.94999e-06 [virtual_dataset]: 5.14e-06 [get_grad_eliminate_]: 4.90001e-06 [virtual_output]: 4.84998e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 5.87999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.77998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.54e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 1.09e-06 [after_resolve]: 8.75999e-06 [a_after_grad]: 7.98999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.07001e-06 [cse]: 1.221e-05 [a_3]: 3.148e-05 [py_interpret_to_execute_after_opt_a]: 6.93e-06 [slice_cell_reuse_recomputed_activation]: 2.09999e-06 [rewriter_after_opt_a]: 3.124e-05 [convert_after_rewriter]: 6.67002e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00044196 [opt_b]: 0.00018011, [1] [Cycle 1]: 0.00017412, 
[7] [b_1]: 0.00010739 [b_2]: 7.03e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.39999e-06 [updatestate_loads_eliminate]: 2.27999e-06 [renormalize]: 4.2998e-07 [cse]: 1.529e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 2.15002e-06 [cconv]: 2.266e-05 [loop_unroll]: 0.00041034 [opt_after_cconv]: 9.426e-05, [1] [Cycle 1]: 8.861e-05, [7] [c_1]: 2.775e-05 [parameter_eliminate]: 1.97001e-06 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.524e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.222e-05 [tuple_transform]: 6.838e-05, [1] [Cycle 1]: 6.417e-05, [4] [d_1]: 3.869e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 5.99999e-06 [partial_unused_args_eliminate]: 1.54e-06 [add_recomputation]: 4.165e-05 [cse_after_recomputation]: 1.973e-05, [1] [Cycle 1]: 1.546e-05, [1] [cse]: 1.037e-05 [environ_conv]: 4.76002e-06 [swap_dp_allreduce_reducescatter]: 4.90999e-06 [bias_add_comm_swap]: 2.34001e-06 [label_micro_interleaved_index]: 4.58999e-06 [label_fine_grained_interleaved_index]: 2.67001e-06 [merge_cast_opt]: 1.23002e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.57001e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.80013e-07 [full_micro_interleaved_order_control]: 2.44001e-06 [reorder_send_recv_between_fp_bp]: 2.49999e-06 [comm_op_add_attrs]: 9.49978e-07 [add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.168e-05 [grouped_pairwise_exchange_alltoall]: 1.50001e-06 [offloading_packed_experts]: 3.6e-06 [overlap_recompute_and_grad_model_parallel]: 4.70001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.22999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.43002e-06 [overlap_recompute_comm]: 2.19999e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.64e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 2.00002e-06 [split_layernorm_comm]: 1.87999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.777e-05, [1] [Cycle 1]: 6.362e-05, [6] [build]: 2.02999e-06 [elim_shapecalc]: 8.03999e-06 [elim_not_effective]: 1.16e-05 [opt_reshape]: 6.14001e-06 [fold_const_symbol]: 8.75001e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.70001e-06 [pipeline_parallel_scheduler]: 1.59e-06 [auto_monad_reorder]: 1.501e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.0004401 [validate]: 4.18e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.00611184 [execute]: 7.42998e-06 Sums bootstrap : 0.000492s : 3.42% type_inference : 0.004289s : 29.76% event_method : 0.000010s : 0.07% auto_monad : 0.000049s : 0.34% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.26% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000030s : 0.21% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000461s : 3.20% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.00% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000336s : 2.33% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.13% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000071s : 0.49% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000442s : 3.07% optimize.opt_b.b_1 : 0.000107s : 0.74% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000410s : 2.85% optimize.opt_after_cconv.c_1 : 0.000028s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000440s : 3.05% validate : 0.000042s : 0.29% backend_pass : 0.000001s : 0.01% task_emit : 0.006112s : 42.40% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000165 26 13.14% : 0.000022s : 4: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.11% : 0.000005s : 4: substitution.graph_param_transform 75.22% : 0.000124s : 2: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.61% : 0.000004s : 4: substitution.remove_not_recompute_node 2.47% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004248 2 91.98% : 0.003908s : 1: type_inference.infer 8.02% : 0.000341s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000123 2 100.00% : 0.000123s : 2: match.inline ------[predicate.] 
0.000136 984 0.87% : 0.000001s : 9: predicate.accumulaten_eliminater 0.97% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.89% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.26% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.91% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.99% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.20% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.14% : 0.000002s : 8: predicate.less_batch_normalization 1.54% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.21% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.83% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.52% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 0.90% : 0.000001s : 9: predicate.reduce_eliminate 2.15% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.33% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.44% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.85% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.99% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.96% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.52% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.86% : 0.000001s : 9: predicate.transpose_eliminate 1.56% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.35% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.21% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.84% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.62% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.65% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.35% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026220 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.25% : 0.002948s : 1: add_attr 11.21% : 0.002940s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000045s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000054s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.00% : 0.000525s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.60% : 0.000419s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.72% : 0.000451s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 3.09% : 0.000810s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000090s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.19% : 0.001886s : 1: opt_a 0.37% : 0.000098s : 1: opt_after_cconv 1.71% : 0.000449s : 1: opt_after_jit_grad 0.70% : 0.000184s : 1: opt_b 14.01% : 0.003673s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.70% : 0.000185s : 1: renormalize.infer 0.55% : 0.000144s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000035s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 23.35% : 0.006121s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.41% : 0.004302s : 1: type_inference 0.26% : 0.000067s : 1: validate TotalTime = 0.0355396, [24] [bootstrap]: 0.00050681 [type_inference]: 0.0101732 [event_method]: 4.132e-05 [auto_monad]: 0.00011387 [graph_reusing]: 7.75998e-06 [inline]: 2.02001e-06 [add_attr]: 0.00292466, [1] [add_attr_with_inline]: 0.00291622, [1] [Cycle 1]: 6.647e-05, [2] [tag_attr]: 3.138e-05 [meta_addattr_fg_expand]: 8.3e-06 [parallel-infer-symbol]: 2.89999e-06 [pre_auto_parallel]: 4.532e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 2.20002e-06 [pipeline_split]: 1.82999e-06 [optimize]: 0.0129454, [53] [py_interpret_to_execute]: 3.508e-05 [rewriter_before_opt_a]: 0.00012656 [opt_a]: 0.0106852, [3] [Cycle 1]: 0.00680658, [45] [expand_dump_flag]: 3.55998e-06 [switch_simplify]: 6.694e-05 [loop_unroll]: 5.4e-05 [a_1]: 0.00134443 [with_stream_mark]: 2.248e-05 [recompute_prepare]: 2.124e-05 [updatestate_depend_eliminate]: 8.90001e-06 [updatestate_assign_eliminate]: 7.87998e-06 [updatestate_loads_eliminate]: 7.21001e-06 [parameter_eliminate]: 2.53e-06 [a_2]: 0.00024356 [accelerated_algorithm]: 3.104e-05 [shard]: 1.79e-06 [meta_shard_fg_expand]: 3.21001e-06 [shard_inline]: 1.602e-05 [merge_send_recv]: 1.566e-05 [auto_parallel]: 1.086e-05 [parallel]: 1.812e-05 [flash_sp]: 1.105e-05 [merge_comm]: 9.62999e-06 [allreduce_fusion]: 8.94e-06 [matmul_add_comm_reduction]: 2.652e-05 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 1.751e-05 [virtual_dataset]: 1.545e-05 [get_grad_eliminate_]: 1.495e-05 [virtual_output]: 1.494e-05 [merge_forward]: 9.16998e-06 [cell_reuse_recompute_pass]: 9.79984e-07 [offload_activation]: 1.752e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.811e-05 [merge_recompute_call_nodes]: 1.52001e-06 [before_grad]: 2.722e-05 [set_forward_comm_id_for_comm_node_pass]: 9.65002e-06 [meta_fg_expand]: 0.00135974 [flash_sp_send_recv_attached]: 4.20999e-06 [receive_attached]: 2.31998e-06 
[after_resolve]: 5.885e-05 [a_after_grad]: 8.074e-05 [renormalize]: 0.00236888 [add_forward_monad_depend]: 9.54e-06 [auto_monad_grad]: 5.14e-06 [auto_monad_eliminator]: 5.491e-05 [cse]: 0.00016076 [a_3]: 0.00033471 [Cycle 2]: 0.00297749, [45] [expand_dump_flag]: 1.50999e-06 [switch_simplify]: 4.654e-05 [loop_unroll]: 4.362e-05 [a_1]: 0.00156807 [with_stream_mark]: 1.205e-05 [recompute_prepare]: 1.095e-05 [updatestate_depend_eliminate]: 5.13002e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.64002e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 0.00012606 [accelerated_algorithm]: 1.177e-05 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.34998e-06 [merge_send_recv]: 6.79999e-06 [auto_parallel]: 7.08e-06 [parallel]: 4.84e-06 [flash_sp]: 2.99001e-06 [merge_comm]: 4.88001e-06 [allreduce_fusion]: 4.63999e-06 [matmul_add_comm_reduction]: 7.93999e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 1.019e-05 [virtual_dataset]: 8.64e-06 [get_grad_eliminate_]: 8.87999e-06 [virtual_output]: 8.25e-06 [merge_forward]: 4.18001e-06 [cell_reuse_recompute_pass]: 8.89995e-07 [offload_activation]: 9.13002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.609e-05 [merge_recompute_call_nodes]: 8.50006e-07 [before_grad]: 1.461e-05 [set_forward_comm_id_for_comm_node_pass]: 5.81998e-06 [meta_fg_expand]: 3.39e-05 [flash_sp_send_recv_attached]: 9.30013e-07 [receive_attached]: 1.04003e-06 [after_resolve]: 1.488e-05 [a_after_grad]: 1.458e-05 [renormalize]: 0.00057197 [add_forward_monad_depend]: 3.74002e-06 [auto_monad_grad]: 1.20001e-06 [auto_monad_eliminator]: 1.407e-05 [cse]: 4.589e-05 [a_3]: 6.457e-05 [Cycle 3]: 0.0008874, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 1.076e-05 [loop_unroll]: 8.79e-06 [a_1]: 0.00024829 [with_stream_mark]: 9.55001e-06 [recompute_prepare]: 9.29e-06 [updatestate_depend_eliminate]: 4.85001e-06 [updatestate_assign_eliminate]: 4.03001e-06 [updatestate_loads_eliminate]: 
3.85998e-06 [parameter_eliminate]: 8.70001e-07 [a_2]: 0.00012278 [accelerated_algorithm]: 1.161e-05 [shard]: 8.59989e-07 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 8.95999e-06 [merge_send_recv]: 6.84999e-06 [auto_parallel]: 7.05e-06 [parallel]: 4.62e-06 [flash_sp]: 1.12999e-06 [merge_comm]: 4.80999e-06 [allreduce_fusion]: 4.83001e-06 [matmul_add_comm_reduction]: 7.4e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 9.81e-06 [virtual_dataset]: 8.55001e-06 [get_grad_eliminate_]: 8.38999e-06 [virtual_output]: 8.15999e-06 [merge_forward]: 4.29002e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 8.48999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.604e-05 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 1.358e-05 [set_forward_comm_id_for_comm_node_pass]: 5.07e-06 [meta_fg_expand]: 2.98998e-06 [flash_sp_send_recv_attached]: 7.49977e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 1.291e-05 [a_after_grad]: 1.381e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.18001e-06 [auto_monad_grad]: 9.00007e-07 [auto_monad_eliminator]: 1.046e-05 [cse]: 2.468e-05 [a_3]: 5.661e-05 [py_interpret_to_execute_after_opt_a]: 9.86e-06 [slice_cell_reuse_recomputed_activation]: 2.02999e-06 [rewriter_after_opt_a]: 4.706e-05 [convert_after_rewriter]: 9.53002e-06 [order_py_execute_after_rewriter]: 7.28e-06 [mutable_eliminate]: 0.00045303 [opt_b]: 0.00028438, [1] [Cycle 1]: 0.00027806, [7] [b_1]: 0.00018792 [b_2]: 1.082e-05 [updatestate_depend_eliminate]: 6.81001e-06 [updatestate_assign_eliminate]: 3.97e-06 [updatestate_loads_eliminate]: 3.8e-06 [renormalize]: 3.59985e-07 [cse]: 3.064e-05 [optimize_parallel_all_gather_comm]: 1.958e-05 [overlap_param_gather]: 1.77001e-06 [cconv]: 1.963e-05 [loop_unroll]: 0.00042161 [opt_after_cconv]: 0.00013556, [1] [Cycle 1]: 0.00012969, [7] [c_1]: 4.846e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 7.20003e-06 [updatestate_assign_eliminate]: 4.45e-06 
[updatestate_loads_eliminate]: 3.80998e-06 [cse]: 2.934e-05 [renormalize]: 5.69999e-07 [remove_dup_value]: 2.927e-05 [tuple_transform]: 0.00010093, [1] [Cycle 1]: 9.628e-05, [4] [d_1]: 6.619e-05 [none_parameter_eliminate]: 1.62999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 1.01e-05 [partial_unused_args_eliminate]: 1.77001e-06 [add_recomputation]: 5.559e-05 [cse_after_recomputation]: 3.154e-05, [1] [Cycle 1]: 2.712e-05, [1] [cse]: 2.177e-05 [environ_conv]: 8.68001e-06 [swap_dp_allreduce_reducescatter]: 4.75e-05 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.15e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.27e-06 [slice_recompute_activation]: 2.31e-06 [micro_interleaved_order_control]: 2.69001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.08002e-06 [reorder_send_recv_between_fp_bp]: 2.50002e-06 [comm_op_add_attrs]: 1.10999e-06 [add_comm_op_reuse_tag]: 9.60019e-07 [interleave_split_concat_branches]: 1.08001e-06 [interleave_parallel_branches]: 9.90025e-07 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.92001e-06 [control_data_broadcast_order]: 1.698e-05 [grouped_pairwise_exchange_alltoall]: 1.47999e-06 [offloading_packed_experts]: 4.85999e-06 [overlap_recompute_and_grad_model_parallel]: 5.94e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.10002e-06 [overlap_grad_ring_attention]: 4.95001e-06 [overlap_grad_flash_sp]: 2.36e-05 [begin_end_overlap_inline]: 6.40022e-07 [split_matmul_comm_elemetwise]: 2.11998e-06 [split_layernorm_comm]: 1.59998e-06 [handle_group_info]: 9.70002e-07 [symbol_engine_optimizer]: 9.709e-05, [1] [Cycle 1]: 9.286e-05, [6] [build]: 8.87e-06 [elim_shapecalc]: 1.35e-05 [elim_not_effective]: 1.774e-05 [opt_reshape]: 1.004e-05 [fold_const_symbol]: 1.44e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 1.64e-06 [pipeline_parallel_scheduler]: 1.32999e-06 [auto_monad_reorder]: 2.458e-05 [get_jit_bprop_graph]: 1.04998e-06 [rewriter_after_jit_bprop_graph]: 3.31999e-06 [opt_after_jit_grad]: 0.00046268 [validate]: 4.299e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00801778 [execute]: 6.84999e-06 Sums bootstrap : 0.000507s : 1.62% type_inference : 0.010173s : 32.43% event_method : 0.000041s : 0.13% auto_monad : 0.000114s : 0.36% graph_reusing : 0.000008s : 0.02% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.10% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000008s : 0.03% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000045s : 0.14% insert-virtual-dataset : 0.000003s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000035s : 0.11% optimize.rewriter_before_opt_a : 0.000127s : 0.40% optimize.opt_a.expand_dump_flag : 0.000006s : 0.02% optimize.opt_a.switch_simplify : 0.000124s : 0.40% optimize.opt_a.loop_unroll : 0.000106s : 0.34% optimize.opt_a.a_1 : 0.003161s : 10.08% optimize.opt_a.with_stream_mark : 0.000044s : 0.14% optimize.opt_a.recompute_prepare : 0.000041s : 0.13% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.06% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.05% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.05% optimize.opt_a.parameter_eliminate : 0.000004s : 0.01% optimize.opt_a.a_2 : 0.000492s : 1.57% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.17% optimize.opt_a.shard : 0.000004s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.02% optimize.opt_a.shard_inline : 0.000034s : 0.11% optimize.opt_a.merge_send_recv : 0.000029s : 0.09% optimize.opt_a.auto_parallel : 0.000025s : 0.08% optimize.opt_a.parallel : 0.000028s : 0.09% optimize.opt_a.flash_sp : 0.000015s : 0.05% optimize.opt_a.merge_comm 
: 0.000019s : 0.06% optimize.opt_a.allreduce_fusion : 0.000018s : 0.06% optimize.opt_a.matmul_add_comm_reduction : 0.000042s : 0.13% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.12% optimize.opt_a.virtual_dataset : 0.000033s : 0.10% optimize.opt_a.get_grad_eliminate_ : 0.000032s : 0.10% optimize.opt_a.virtual_output : 0.000031s : 0.10% optimize.opt_a.merge_forward : 0.000018s : 0.06% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000035s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000060s : 0.19% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.01% optimize.opt_a.before_grad : 0.000055s : 0.18% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.07% optimize.opt_a.meta_fg_expand : 0.001397s : 4.45% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000087s : 0.28% optimize.opt_a.a_after_grad : 0.000109s : 0.35% optimize.opt_a.renormalize : 0.002941s : 9.37% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.05% optimize.opt_a.auto_monad_grad : 0.000007s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000079s : 0.25% optimize.opt_a.cse : 0.000231s : 0.74% optimize.opt_a.a_3 : 0.000456s : 1.45% optimize.py_interpret_to_execute_after_opt_a : 0.000010s : 0.03% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000047s : 0.15% optimize.convert_after_rewriter : 0.000010s : 0.03% optimize.order_py_execute_after_rewriter : 0.000007s : 0.02% optimize.mutable_eliminate : 0.000453s : 1.44% optimize.opt_b.b_1 : 0.000188s : 0.60% optimize.opt_b.b_2 : 0.000011s : 0.03% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 
0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000031s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000020s : 0.06% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000020s : 0.06% optimize.loop_unroll : 0.000422s : 1.34% optimize.opt_after_cconv.c_1 : 0.000048s : 0.15% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.02% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.01% optimize.opt_after_cconv.cse : 0.000029s : 0.09% optimize.opt_after_cconv.renormalize : 0.000001s : 0.00% optimize.remove_dup_value : 0.000029s : 0.09% optimize.tuple_transform.d_1 : 0.000066s : 0.21% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.03% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000056s : 0.18% optimize.cse_after_recomputation.cse : 0.000022s : 0.07% optimize.environ_conv : 0.000009s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000047s : 0.15% optimize.bias_add_comm_swap : 0.000003s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.01% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000017s : 0.05% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.02% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000005s : 0.02% optimize.overlap_grad_flash_sp : 0.000024s : 0.08% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.03% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000013s : 0.04% optimize.symbol_engine_optimizer.elim_not_effective : 0.000018s : 0.06% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000014s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.08% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000463s : 1.47% validate : 0.000043s : 0.14% backend_pass : 0.000001s : 0.00% task_emit : 0.008018s : 25.56% execute : 0.000007s : 0.02% Time group info: ------[substitution.] 
0.000741 218 5.79% : 0.000043s : 11: substitution.arithmetic_simplify 1.88% : 0.000014s : 2: substitution.cast_eliminate 0.35% : 0.000003s : 5: substitution.elim_not_effective 0.53% : 0.000004s : 5: substitution.float_depend_g_call 0.58% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.28% : 0.000002s : 5: substitution.fold_const_symbol 1.01% : 0.000007s : 8: substitution.graph_param_transform 0.37% : 0.000003s : 2: substitution.incorporate_call 0.25% : 0.000002s : 2: substitution.incorporate_call_switch 55.66% : 0.000412s : 16: substitution.inline 2.05% : 0.000015s : 2: substitution.inline_without_move 1.39% : 0.000010s : 20: substitution.j_node_and_user_rematch 2.10% : 0.000016s : 3: substitution.less_batch_normalization 1.75% : 0.000013s : 11: substitution.minmaximum_grad 0.73% : 0.000005s : 5: substitution.partial_eliminate 1.77% : 0.000013s : 20: substitution.remove_not_recompute_node 3.20% : 0.000024s : 10: substitution.replace_applicator 1.36% : 0.000010s : 15: substitution.replace_old_param 0.31% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.67% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.83% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.25% : 0.000061s : 28: substitution.tuple_list_get_item_eliminator 2.48% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010104 2 87.25% : 0.008815s : 1: type_inference.infer 12.75% : 0.001289s : 1: type_inference.specialize ------[replace.] 0.000200 30 59.06% : 0.000118s : 16: replace.inline 40.94% : 0.000082s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000434 30 93.01% : 0.000404s : 16: match.inline 6.99% : 0.000030s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000728 5663 1.09% : 0.000008s : 67: predicate.accumulaten_eliminater 0.30% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.08% : 0.000008s : 67: predicate.addn_zero_filter 1.07% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.03% : 0.000015s : 99: predicate.arithmetic_simplify 1.14% : 0.000008s : 67: predicate.cast_eliminate 1.16% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.21% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.19% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.13% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.16% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_depend_swap 1.76% : 0.000013s : 107: predicate.environ_get_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.70% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.24% : 0.000016s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.10% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.51% : 0.000004s : 32: predicate.incorporate_call_switch 5.63% : 0.000041s : 244: predicate.inline 1.28% : 0.000009s : 55: predicate.inline_without_move 0.31% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.63% : 0.000005s : 32: predicate.less_batch_normalization 1.61% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.70% : 0.000020s : 164: predicate.load_eliminater 0.31% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.18% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.39% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.55% : 0.000004s : 32: predicate.merge_addn 1.19% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.15% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.14% : 0.000008s : 67: predicate.minmaximum_grad 0.31% : 0.000002s : 8: predicate.mutable_eliminate 0.17% : 0.000001s : 8: predicate.opt_reshape 0.15% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000014s : 97: predicate.partial_defer_inline 1.71% : 0.000012s : 89: predicate.partial_eliminate 1.08% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.26% : 0.000009s : 67: predicate.reduce_eliminate 2.68% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000002s : 32: predicate.remove_not_recompute_node 1.87% : 0.000014s : 149: predicate.replace_applicator 0.60% : 0.000004s : 55: predicate.replace_old_param 0.11% : 0.000001s : 8: predicate.reset_defer_inline 1.09% : 0.000008s : 67: predicate.reshape_eliminate 1.16% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.20% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000004s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.63% : 0.000005s : 32: predicate.specialize_transform 1.25% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.16% : 0.000008s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000013s : 97: predicate.switch_defer_inline 2.95% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.93% : 0.000036s : 265: 
predicate.switch_simplify 1.08% : 0.000008s : 67: predicate.tile_eliminate 1.09% : 0.000008s : 67: predicate.transpose_eliminate 1.48% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.56% : 0.000011s : 83: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000010s : 83: predicate.tuple_list_get_item_depend_reorder 2.80% : 0.000020s : 129: predicate.tuple_list_get_item_eliminator 1.46% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 1.99% : 0.000014s : 115: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.66% : 0.000019s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000024s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.56% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001449 32 57.49% : 0.000833s : 12: func_graph_cloner_run.FuncGraphClonerGraph 42.51% : 0.000616s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.059476 237 0.01% : 0.000004s : 1: ForceFp32Comm 4.92% : 0.002929s : 1: add_attr 4.91% : 0.002920s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.10% : 0.000060s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000121s : 1: auto_monad 0.05% : 0.000028s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.91% : 0.000541s : 1: bootstrap 0.04% : 0.000023s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.03% : 0.000020s : 1: control_data_broadcast_order 0.02% : 0.000013s : 1: convert_after_rewriter 0.06% : 0.000034s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.02% : 0.000012s : 1: environ_conv 0.08% : 0.000048s : 1: event_method 0.02% : 0.000012s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.02% : 0.000012s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.72% : 0.000430s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.78% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.03% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.03% : 0.000018s : 1: opt.transform.mutable_eliminate 8.07% : 0.004798s : 117: opt.transform.opt_a 0.08% : 0.000047s : 1: opt.transform.opt_after_cconv 0.06% : 0.000033s : 1: opt.transform.opt_after_jit_grad 0.29% : 0.000174s : 28: opt.transform.opt_b 0.12% : 0.000074s : 2: 
opt.transform.opt_trans_graph 0.09% : 0.000053s : 4: opt.transform.symbol_engine_opt 17.97% : 0.010688s : 1: opt_a 0.23% : 0.000139s : 1: opt_after_cconv 0.79% : 0.000472s : 1: opt_after_jit_grad 0.48% : 0.000288s : 1: opt_b 21.77% : 0.012950s : 1: optimize 0.04% : 0.000023s : 1: optimize_parallel_all_gather_comm 0.02% : 0.000011s : 1: order_py_execute_after_rewriter 0.04% : 0.000027s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000004s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.08% : 0.000050s : 1: pre_auto_parallel 0.07% : 0.000039s : 1: py_interpret_to_execute 0.02% : 0.000013s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000033s : 1: remove_dup_value 2.67% : 0.001589s : 2: renormalize.infer 2.25% : 0.001340s : 2: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.09% : 0.000051s : 1: rewriter_after_opt_a 0.22% : 0.000131s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.09% : 0.000051s : 1: swap_dp_allreduce_reducescatter 0.17% : 0.000100s : 1: symbol_engine_optimizer 13.50% : 0.008028s : 1: task_emit 0.17% : 0.000104s : 1: 
tuple_transform 17.13% : 0.010189s : 1: type_inference 0.13% : 0.000074s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x3-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x3-kbk],max_mem:68.0M TotalTime = 0.0784968, [24] [bootstrap]: 0.00054454 [type_inference]: 0.00599439 [event_method]: 1.369e-05 [auto_monad]: 5.445e-05 [graph_reusing]: 5.07e-06 [inline]: 1.79e-06 [add_attr]: 0.00338503, [1] [add_attr_with_inline]: 0.00337451, [1] [Cycle 1]: 4.468e-05, [2] [tag_attr]: 1.532e-05 [meta_addattr_fg_expand]: 4.53999e-06 [parallel-infer-symbol]: 2.51e-06 [pre_auto_parallel]: 2.694e-05 [insert-virtual-dataset]: 2.21998e-06 [parallel-infer-symbol-second]: 9.30013e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00395622, [53] [py_interpret_to_execute]: 1.866e-05 [rewriter_before_opt_a]: 5.785e-05 [opt_a]: 0.00208796, [2] [Cycle 1]: 0.00150003, [45] [expand_dump_flag]: 2.99999e-06 [switch_simplify]: 3.215e-05 [loop_unroll]: 2.105e-05 [a_1]: 0.00044878 [with_stream_mark]: 1.341e-05 [recompute_prepare]: 7.46001e-06 [updatestate_depend_eliminate]: 3.95e-06 [updatestate_assign_eliminate]: 3.67002e-06 [updatestate_loads_eliminate]: 2.81e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.469e-05 [accelerated_algorithm]: 6.22001e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.55001e-06 [shard_inline]: 5.86e-06 [merge_send_recv]: 7.60998e-06 [auto_parallel]: 5.91e-06 [parallel]: 2.323e-05 [flash_sp]: 7.33e-06 [merge_comm]: 3.93001e-06 [allreduce_fusion]: 3.66001e-06 [matmul_add_comm_reduction]: 8.47998e-06 [allreduce_slice_to_reducescatter]: 6.40022e-07 [virtual_shard_identity]: 7.45e-06 [virtual_dataset]: 6.12999e-06 [get_grad_eliminate_]: 5.52001e-06 [virtual_output]: 5.59e-06 [merge_forward]: 4.05998e-06 [cell_reuse_recompute_pass]: 1.08001e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 
[merge_recompute_call_nodes]: 1.55001e-06 [before_grad]: 9.34998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55998e-06 [meta_fg_expand]: 2.42001e-06 [flash_sp_send_recv_attached]: 2.79999e-06 [receive_attached]: 2.75002e-06 [after_resolve]: 1.009e-05 [a_after_grad]: 8.70999e-06 [renormalize]: 0.00040731 [add_forward_monad_depend]: 4.48001e-06 [auto_monad_grad]: 1.75001e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.679e-05 [a_3]: 4.086e-05 [Cycle 2]: 0.0005786, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.51999e-06 [loop_unroll]: 5.27001e-06 [a_1]: 0.00012439 [with_stream_mark]: 9.32999e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.75997e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 7.79983e-07 [a_2]: 6.663e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.09e-06 [shard_inline]: 5.43002e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.32999e-06 [parallel]: 3.99002e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.05999e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 5.82001e-06 [virtual_dataset]: 4.99e-06 [get_grad_eliminate_]: 4.85999e-06 [virtual_output]: 4.82e-06 [merge_forward]: 2.38998e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.49e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32999e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.05e-06 [set_forward_comm_id_for_comm_node_pass]: 3.08998e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.19998e-06 [a_after_grad]: 7.97e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.05999e-06 [auto_monad_grad]: 7.50006e-07 [auto_monad_eliminator]: 5.96e-06 [cse]: 1.146e-05 [a_3]: 3.149e-05 [py_interpret_to_execute_after_opt_a]: 7.2e-06 
[slice_cell_reuse_recomputed_activation]: 1.69998e-06 [rewriter_after_opt_a]: 3.038e-05 [convert_after_rewriter]: 6.38e-06 [order_py_execute_after_rewriter]: 5.16002e-06 [mutable_eliminate]: 0.00045417 [opt_b]: 0.00018328, [1] [Cycle 1]: 0.00017659, [7] [b_1]: 0.00010775 [b_2]: 7.35003e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.66999e-06 [updatestate_loads_eliminate]: 2.34001e-06 [renormalize]: 3.10014e-07 [cse]: 1.677e-05 [optimize_parallel_all_gather_comm]: 1.633e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.271e-05 [loop_unroll]: 0.00041735 [opt_after_cconv]: 9.428e-05, [1] [Cycle 1]: 8.87e-05, [7] [c_1]: 2.792e-05 [parameter_eliminate]: 2.17001e-06 [updatestate_depend_eliminate]: 5.19e-06 [updatestate_assign_eliminate]: 2.49001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.543e-05 [renormalize]: 3.39991e-07 [remove_dup_value]: 1.209e-05 [tuple_transform]: 6.753e-05, [1] [Cycle 1]: 6.329e-05, [4] [d_1]: 3.769e-05 [none_parameter_eliminate]: 1.60999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.09999e-06 [partial_unused_args_eliminate]: 1.95001e-06 [add_recomputation]: 7.634e-05 [cse_after_recomputation]: 2.093e-05, [1] [Cycle 1]: 1.631e-05, [1] [cse]: 1.102e-05 [environ_conv]: 4.72998e-06 [swap_dp_allreduce_reducescatter]: 5.62999e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 3.82998e-06 [label_fine_grained_interleaved_index]: 2.69999e-06 [merge_cast_opt]: 1.44998e-06 [slice_recompute_activation]: 2.40002e-06 [micro_interleaved_order_control]: 2.66e-06 [assign_add_opt]: 1.12999e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.24003e-06 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 9.49978e-07 [interleave_split_concat_branches]: 1.09e-06 [interleave_parallel_branches]: 9.90025e-07 [overlap_opt_shard_in_pipeline]: 1.08001e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.75001e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.16999e-06 [overlap_recompute_and_grad_model_parallel]: 4.08001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.40001e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 4.13999e-06 [overlap_grad_flash_sp]: 1.751e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.99999e-06 [split_layernorm_comm]: 2.02999e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.789e-05, [1] [Cycle 1]: 6.386e-05, [6] [build]: 2.64001e-06 [elim_shapecalc]: 8.47e-06 [elim_not_effective]: 1.15e-05 [opt_reshape]: 6.02999e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.62001e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.49e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.47002e-06 [opt_after_jit_grad]: 0.00044957 [validate]: 3.056e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.0637939 [execute]: 7.93001e-06 Sums bootstrap : 0.000545s : 0.73% type_inference : 0.005994s : 8.08% event_method : 0.000014s : 0.02% auto_monad : 0.000054s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000058s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000573s : 0.77% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000011s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000010s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000407s : 0.55% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000038s : 0.05% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000006s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000454s : 0.61% optimize.opt_b.b_1 : 0.000108s : 0.15% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000417s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000015s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000076s : 0.10% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063794s : 86.03% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000162 30 15.02% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 2.96% : 0.000005s : 4: substitution.graph_param_transform 66.94% : 0.000108s : 3: substitution.inline 1.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.53% : 0.000004s : 4: substitution.remove_not_recompute_node 2.37% : 0.000004s : 4: substitution.replace_old_param 6.56% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005952 2 90.78% : 0.005403s : 1: type_inference.infer 9.22% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.08% : 0.000026s : 3: replace.inline 29.92% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.71% : 0.000106s : 3: match.inline 8.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.89% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.84% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.21% : 0.000003s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.98% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.08% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.17% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.81% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.15% : 0.000003s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.58% : 0.000001s : 8: predicate.incorporate_call_switch 6.00% : 0.000009s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.86% : 0.000001s : 8: predicate.less_batch_normalization 1.76% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.42% : 0.000004s : 32: predicate.load_eliminater 1.10% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.62% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.95% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.16% : 0.000002s : 4: predicate.mutable_eliminate 0.34% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.61% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.29% : 0.000000s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.36% : 0.000001s : 4: predicate.row_tensor_eliminate 0.83% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.05% : 0.000002s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.78% : 0.000001s : 8: predicate.specialize_transform 0.88% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.39% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.07% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 1.10% : 0.000002s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.62% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.20% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 45.38% : 0.000154s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.62% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087331 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.88% : 0.003389s : 1: add_attr 3.87% : 0.003378s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.09% : 0.000081s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000018s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000580s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000463s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.07% : 0.000935s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.39% : 0.002091s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.53% : 0.000459s : 1: opt_after_jit_grad 0.21% : 0.000187s : 1: opt_b 4.53% : 0.003960s : 1: optimize 0.02% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.03% : 0.000022s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000210s : 1: renormalize.infer 0.22% : 0.000191s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000062s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000070s : 1: symbol_engine_optimizer 73.06% : 0.063809s : 1: task_emit 0.08% : 0.000070s : 1: 
tuple_transform 6.88% : 0.006007s : 1: type_inference 0.06% : 0.000056s : 1: validate TotalTime = 0.0704898, [24] [bootstrap]: 0.00046993 [type_inference]: 0.00435887 [event_method]: 1.037e-05 [auto_monad]: 4.958e-05 [graph_reusing]: 4.99e-06 [inline]: 1.52999e-06 [add_attr]: 0.00299475, [1] [add_attr_with_inline]: 0.00298671, [1] [Cycle 1]: 4.458e-05, [2] [tag_attr]: 1.119e-05 [meta_addattr_fg_expand]: 3.21001e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.102e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.89999e-06 [pipeline_split]: 1.52999e-06 [optimize]: 0.0037036, [53] [py_interpret_to_execute]: 1.536e-05 [rewriter_before_opt_a]: 3.888e-05 [opt_a]: 0.00190969, [2] [Cycle 1]: 0.00130425, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.377e-05 [loop_unroll]: 1.37e-05 [a_1]: 0.00028864 [with_stream_mark]: 1.345e-05 [recompute_prepare]: 6.95002e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.02002e-06 [updatestate_loads_eliminate]: 2.71999e-06 [parameter_eliminate]: 1.59998e-06 [a_2]: 7.468e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 2.27001e-06 [meta_shard_fg_expand]: 1.82999e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 7.43999e-06 [auto_parallel]: 5.92999e-06 [parallel]: 1.836e-05 [flash_sp]: 7.28e-06 [merge_comm]: 3.65e-06 [allreduce_fusion]: 3.55e-06 [matmul_add_comm_reduction]: 8.61002e-06 [allreduce_slice_to_reducescatter]: 7.79983e-07 [virtual_shard_identity]: 6.98998e-06 [virtual_dataset]: 5.69999e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.65e-06 [cell_reuse_recompute_pass]: 1.11002e-06 [offload_activation]: 8.90001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.099e-05 [merge_recompute_call_nodes]: 1.35001e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 
2.28998e-06 [after_resolve]: 1.002e-05 [a_after_grad]: 8.63001e-06 [renormalize]: 0.00034997 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 1.60999e-06 [auto_monad_eliminator]: 1.254e-05 [cse]: 2.568e-05 [a_3]: 4.008e-05 [Cycle 2]: 0.00059633, [45] [expand_dump_flag]: 1.16002e-06 [switch_simplify]: 7.51001e-06 [loop_unroll]: 5.53002e-06 [a_1]: 0.00012658 [with_stream_mark]: 1.06e-05 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.72001e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.69999e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.871e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.44998e-06 [merge_send_recv]: 4.35999e-06 [auto_parallel]: 5.43002e-06 [parallel]: 4.51002e-06 [flash_sp]: 2.84999e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.55002e-06 [matmul_add_comm_reduction]: 5.54998e-06 [allreduce_slice_to_reducescatter]: 3.39991e-07 [virtual_shard_identity]: 6.24999e-06 [virtual_dataset]: 5.44e-06 [get_grad_eliminate_]: 5.05999e-06 [virtual_output]: 4.87998e-06 [merge_forward]: 2.61e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 6.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71998e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 8.15e-06 [set_forward_comm_id_for_comm_node_pass]: 2.98998e-06 [meta_fg_expand]: 1.62001e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 9.01002e-06 [a_after_grad]: 7.86001e-06 [renormalize]: 6.00121e-08 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.27001e-06 [cse]: 1.345e-05 [a_3]: 3.301e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.078e-05 [convert_after_rewriter]: 7.30998e-06 [order_py_execute_after_rewriter]: 4.82e-06 [mutable_eliminate]: 0.00044622 [opt_b]: 0.00018179, [1] 
[Cycle 1]: 0.00017574, [7] [b_1]: 0.00010725 [b_2]: 7.09001e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 2.3999e-07 [cse]: 1.616e-05 [optimize_parallel_all_gather_comm]: 1.506e-05 [overlap_param_gather]: 1.96998e-06 [cconv]: 2.08e-05 [loop_unroll]: 0.00041025 [opt_after_cconv]: 9.335e-05, [1] [Cycle 1]: 8.79e-05, [7] [c_1]: 2.794e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.09e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.21998e-06 [cse]: 1.566e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.24e-05 [tuple_transform]: 6.698e-05, [1] [Cycle 1]: 6.281e-05, [4] [d_1]: 3.782e-05 [none_parameter_eliminate]: 1.42e-06 [renormalize]: 1.59984e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 2.22001e-06 [add_recomputation]: 4.436e-05 [cse_after_recomputation]: 1.909e-05, [1] [Cycle 1]: 1.499e-05, [1] [cse]: 1.017e-05 [environ_conv]: 5.16998e-06 [swap_dp_allreduce_reducescatter]: 5.15999e-06 [bias_add_comm_swap]: 2.71999e-06 [label_micro_interleaved_index]: 4.87998e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.16e-06 [micro_interleaved_order_control]: 2.36e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 8.2e-07 [remove_cast_before_assign_add]: 1.32e-06 [full_micro_interleaved_order_control]: 2.32999e-06 [reorder_send_recv_between_fp_bp]: 2.47001e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.71e-06 [control_data_broadcast_order]: 1.169e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.35e-06 [overlap_recompute_and_grad_model_parallel]: 4.53001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.12999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.18002e-06 [overlap_grad_ring_attention]: 4.13001e-06 [overlap_grad_flash_sp]: 1.755e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.09999e-06 [split_layernorm_comm]: 1.97001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.678e-05, [1] [Cycle 1]: 6.272e-05, [6] [build]: 2.27001e-06 [elim_shapecalc]: 8.01001e-06 [elim_not_effective]: 1.148e-05 [opt_reshape]: 5.83002e-06 [fold_const_symbol]: 8.82999e-06 [renormalize]: 1.60013e-07 [detach_backward]: 1.63002e-06 [pipeline_parallel_scheduler]: 1.77999e-06 [auto_monad_reorder]: 1.527e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.76001e-06 [opt_after_jit_grad]: 0.00044511 [validate]: 3.144e-05 [backend_pass]: 8.89995e-07 [task_emit]: 0.0581641 [execute]: 7.23999e-06 Sums bootstrap : 0.000470s : 0.71% type_inference : 0.004359s : 6.56% event_method : 0.000010s : 0.02% auto_monad : 0.000050s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000415s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000023s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000350s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000446s : 0.67% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000410s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.07% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000005s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000445s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058164s : 87.48% execute : 0.000007s : 0.01% Time group info: ------[substitution.] 0.000118 26 18.04% : 0.000021s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.09% : 0.000001s : 2: substitution.fold_const_symbol 4.18% : 0.000005s : 4: substitution.graph_param_transform 66.25% : 0.000078s : 2: substitution.inline 2.29% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.58% : 0.000004s : 4: substitution.remove_not_recompute_node 3.13% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004320 2 91.62% : 0.003958s : 1: type_inference.infer 8.38% : 0.000362s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000136 984 0.89% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.70% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.84% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.74% : 0.000001s : 8: predicate.depend_value_elim 0.80% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.95% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.37% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.96% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.93% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.95% : 0.000001s : 8: predicate.less_batch_normalization 1.57% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.35% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.97% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.73% : 0.000001s : 9: predicate.minmaximum_grad 1.25% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.20% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.49% : 0.000001s : 4: predicate.row_tensor_eliminate 1.00% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.02% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.92% : 0.000001s : 8: predicate.specialize_transform 1.09% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.61% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 1.03% : 0.000001s : 9: predicate.transpose_eliminate 1.44% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.50% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.28% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.75% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.10% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.86% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000269 6 42.29% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.71% : 0.000156s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078459 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.82% : 0.002999s : 1: add_attr 3.81% : 0.002990s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.64% : 0.000504s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000008s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000008s : 1: label_micro_interleaved_index 0.53% : 0.000419s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000765s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000042s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.44% : 0.001913s : 1: opt_a 0.12% : 0.000097s : 1: opt_after_cconv 0.58% : 0.000455s : 1: opt_after_jit_grad 0.24% : 0.000185s : 1: opt_b 4.73% : 0.003708s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000189s : 1: renormalize.infer 0.20% : 0.000154s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000069s : 1: symbol_engine_optimizer 74.15% : 0.058180s : 1: task_emit 0.09% : 0.000070s : 1: 
tuple_transform 5.57% : 0.004373s : 1: type_inference 0.07% : 0.000052s : 1: validate TotalTime = 0.0723591, [24] [bootstrap]: 0.00046186 [type_inference]: 0.00549039 [event_method]: 1.402e-05 [auto_monad]: 5.328e-05 [graph_reusing]: 5.07999e-06 [inline]: 1.74e-06 [add_attr]: 0.00301546, [1] [add_attr_with_inline]: 0.00300774, [1] [Cycle 1]: 4.47e-05, [2] [tag_attr]: 1.489e-05 [meta_addattr_fg_expand]: 4.18999e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.469e-05 [insert-virtual-dataset]: 2.31e-06 [parallel-infer-symbol-second]: 1.02e-06 [dataset_repeat_opt]: 2.12001e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00397951, [53] [py_interpret_to_execute]: 2.364e-05 [rewriter_before_opt_a]: 5.578e-05 [opt_a]: 0.00215165, [2] [Cycle 1]: 0.00152527, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 3.229e-05 [loop_unroll]: 2.14e-05 [a_1]: 0.00045933 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 8.16002e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.41999e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.63002e-06 [a_2]: 7.862e-05 [accelerated_algorithm]: 6.71e-06 [shard]: 1.83997e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 5.98998e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 5.62999e-06 [parallel]: 1.755e-05 [flash_sp]: 7.28999e-06 [merge_comm]: 7.21999e-06 [allreduce_fusion]: 3.33e-06 [matmul_add_comm_reduction]: 8.42998e-06 [allreduce_slice_to_reducescatter]: 5.99975e-07 [virtual_shard_identity]: 7.11999e-06 [virtual_dataset]: 6.13002e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.55001e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.12999e-06 [offload_activation]: 8.72e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.079e-05 [merge_recompute_call_nodes]: 1.74e-06 [before_grad]: 9.50001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.43e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.32999e-06 [receive_attached]: 2.39999e-06 
[after_resolve]: 1.314e-05 [a_after_grad]: 9.10001e-06 [renormalize]: 0.00041374 [add_forward_monad_depend]: 4.71002e-06 [auto_monad_grad]: 1.73002e-06 [auto_monad_eliminator]: 1.396e-05 [cse]: 2.765e-05 [a_3]: 4.075e-05 [Cycle 2]: 0.00061675, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.86001e-06 [loop_unroll]: 5.45001e-06 [a_1]: 0.00012655 [with_stream_mark]: 9.94001e-06 [recompute_prepare]: 5.78997e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.36e-06 [updatestate_loads_eliminate]: 2.26e-06 [parameter_eliminate]: 9.80013e-07 [a_2]: 6.82e-05 [accelerated_algorithm]: 5.42001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.52999e-06 [merge_send_recv]: 4.54998e-06 [auto_parallel]: 5.82001e-06 [parallel]: 3.83999e-06 [flash_sp]: 3.09999e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.86999e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.48e-06 [virtual_dataset]: 5.39998e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.04e-06 [merge_forward]: 2.59001e-06 [cell_reuse_recompute_pass]: 1.26002e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 6.50005e-07 [before_grad]: 8.07003e-06 [set_forward_comm_id_for_comm_node_pass]: 3.16001e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.61e-06 [a_after_grad]: 8.55999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.31002e-06 [auto_monad_grad]: 1.00999e-06 [auto_monad_eliminator]: 6.63e-06 [cse]: 1.383e-05 [a_3]: 3.208e-05 [py_interpret_to_execute_after_opt_a]: 7.73001e-06 [slice_cell_reuse_recomputed_activation]: 1.86e-06 [rewriter_after_opt_a]: 3.075e-05 [convert_after_rewriter]: 6.85998e-06 [order_py_execute_after_rewriter]: 5.41998e-06 [mutable_eliminate]: 0.00045029 [opt_b]: 0.00018232, [1] [Cycle 1]: 0.0001763, 
[7] [b_1]: 0.00010909 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 4.95999e-06 [updatestate_assign_eliminate]: 2.58e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.50003e-07 [cse]: 1.655e-05 [optimize_parallel_all_gather_comm]: 1.593e-05 [overlap_param_gather]: 1.90001e-06 [cconv]: 2.131e-05 [loop_unroll]: 0.00041405 [opt_after_cconv]: 9.589e-05, [1] [Cycle 1]: 8.995e-05, [7] [c_1]: 2.761e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 4.89e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.647e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.18e-05 [tuple_transform]: 6.799e-05, [1] [Cycle 1]: 6.375e-05, [4] [d_1]: 3.858e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 2.09984e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.86998e-06 [add_recomputation]: 4.333e-05 [cse_after_recomputation]: 2.083e-05, [1] [Cycle 1]: 1.616e-05, [1] [cse]: 1.106e-05 [environ_conv]: 4.68999e-06 [swap_dp_allreduce_reducescatter]: 5.14e-06 [bias_add_comm_swap]: 2.24999e-06 [label_micro_interleaved_index]: 3.88001e-06 [label_fine_grained_interleaved_index]: 2.44001e-06 [merge_cast_opt]: 1.43002e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.73e-06 [assign_add_opt]: 1.15999e-06 [ForceFp32Comm]: 7.49977e-07 [remove_cast_before_assign_add]: 1.19e-06 [full_micro_interleaved_order_control]: 2.15002e-06 [reorder_send_recv_between_fp_bp]: 2.91e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.89995e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.33002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.49998e-06 [control_data_broadcast_order]: 1.123e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.41999e-06 [overlap_recompute_and_grad_model_parallel]: 4.3e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.45001e-06 [overlap_recompute_comm]: 1.89e-06 [overlap_grad_ring_attention]: 4.1e-06 [overlap_grad_flash_sp]: 1.775e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 1.98002e-06 [split_layernorm_comm]: 1.93002e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.65e-05, [1] [Cycle 1]: 6.252e-05, [6] [build]: 2.48e-06 [elim_shapecalc]: 8.10999e-06 [elim_not_effective]: 1.09e-05 [opt_reshape]: 6.09001e-06 [fold_const_symbol]: 8.74e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.56998e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.496e-05 [get_jit_bprop_graph]: 9.79984e-07 [rewriter_after_jit_bprop_graph]: 3.42002e-06 [opt_after_jit_grad]: 0.00045115 [validate]: 3.122e-05 [backend_pass]: 8.30012e-07 [task_emit]: 0.0585942 [execute]: 7.83999e-06 Sums bootstrap : 0.000462s : 0.68% type_inference : 0.005490s : 8.03% event_method : 0.000014s : 0.02% auto_monad : 0.000053s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000025s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000024s : 0.03% optimize.rewriter_before_opt_a : 0.000056s : 0.08% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.06% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000586s : 0.86% optimize.opt_a.with_stream_mark : 0.000024s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000147s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000010s : 0.02% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000023s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000414s : 0.61% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.03% optimize.opt_a.cse : 0.000041s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000450s : 0.66% optimize.opt_b.b_1 : 0.000109s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000414s : 0.61% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000451s : 0.66% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058594s : 85.71% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000166 30 14.47% : 0.000024s : 5: substitution.arithmetic_simplify 1.07% : 0.000002s : 2: substitution.elim_not_effective 0.70% : 0.000001s : 2: substitution.fold_const_symbol 3.17% : 0.000005s : 4: substitution.graph_param_transform 65.78% : 0.000109s : 3: substitution.inline 1.79% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.59% : 0.000004s : 4: substitution.remove_not_recompute_node 3.96% : 0.000007s : 4: substitution.replace_old_param 6.48% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005449 2 90.09% : 0.004909s : 1: type_inference.infer 9.91% : 0.000540s : 1: type_inference.specialize ------[replace.] 0.000043 5 74.02% : 0.000031s : 3: replace.inline 25.98% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.76% : 0.000107s : 3: match.inline 8.24% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000156 1131 0.92% : 0.000001s : 11: predicate.accumulaten_eliminater 1.01% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.23% : 0.000003s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.76% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.18% : 0.000003s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.69% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000009s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.66% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.58% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.19% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.65% : 0.000003s : 16: predicate.partial_defer_inline 1.48% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.07% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.52% : 0.000002s : 21: predicate.replace_applicator 0.57% : 0.000001s : 8: predicate.replace_old_param 0.30% : 0.000000s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.87% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.81% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.90% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.97% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 1.03% : 0.000002s : 11: predicate.transpose_eliminate 1.57% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.58% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.14% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.84% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.13% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.71% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000339 8 46.93% : 0.000159s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.07% : 0.000180s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.080883 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.73% : 0.003020s : 1: add_attr 3.72% : 0.003011s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000059s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.61% : 0.000497s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.52% : 0.000422s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.57% : 0.000459s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000011s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.19% : 0.000959s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.66% : 0.002154s : 1: opt_a 0.12% : 0.000099s : 1: opt_after_cconv 0.57% : 0.000460s : 1: opt_after_jit_grad 0.23% : 0.000186s : 1: opt_b 4.92% : 0.003983s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000029s : 1: pre_auto_parallel 0.03% : 0.000028s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.25% : 0.000205s : 1: renormalize.infer 0.25% : 0.000202s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.07% : 0.000060s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000069s : 1: symbol_engine_optimizer 72.46% : 0.058611s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 6.80% : 0.005504s : 1: type_inference 0.06% : 0.000052s : 1: validate TotalTime = 0.109978, [24] [bootstrap]: 0.00050436 [type_inference]: 0.0123512 [event_method]: 5.067e-05 [auto_monad]: 0.00012335 [graph_reusing]: 9.19e-06 [inline]: 1.70001e-06 [add_attr]: 0.0030706, [1] [add_attr_with_inline]: 0.00306238, [1] [Cycle 1]: 7.299e-05, [2] [tag_attr]: 3.565e-05 [meta_addattr_fg_expand]: 1.05e-05 [parallel-infer-symbol]: 3.09001e-06 [pre_auto_parallel]: 5.03e-05 [insert-virtual-dataset]: 2.38998e-06 [parallel-infer-symbol-second]: 8.89995e-07 [dataset_repeat_opt]: 2.22001e-06 [pipeline_split]: 1.84e-06 [optimize]: 0.0137953, [53] [py_interpret_to_execute]: 3.968e-05 [rewriter_before_opt_a]: 0.00014705 [opt_a]: 0.0114696, [3] [Cycle 1]: 0.00740343, [45] [expand_dump_flag]: 3.9e-06 [switch_simplify]: 7.507e-05 [loop_unroll]: 6.298e-05 [a_1]: 0.00152061 [with_stream_mark]: 2.307e-05 [recompute_prepare]: 2.276e-05 [updatestate_depend_eliminate]: 9.81998e-06 [updatestate_assign_eliminate]: 7.93001e-06 [updatestate_loads_eliminate]: 7.66001e-06 [parameter_eliminate]: 3.07002e-06 [a_2]: 0.00024619 [accelerated_algorithm]: 3.067e-05 [shard]: 2.49001e-06 [meta_shard_fg_expand]: 3.65998e-06 [shard_inline]: 1.63e-05 [merge_send_recv]: 1.594e-05 [auto_parallel]: 1.12e-05 [parallel]: 1.841e-05 [flash_sp]: 1.097e-05 [merge_comm]: 9.89001e-06 [allreduce_fusion]: 8.92e-06 [matmul_add_comm_reduction]: 2.665e-05 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 1.763e-05 [virtual_dataset]: 1.609e-05 [get_grad_eliminate_]: 1.539e-05 [virtual_output]: 1.527e-05 [merge_forward]: 9.55001e-06 [cell_reuse_recompute_pass]: 1.17e-06 [offload_activation]: 1.808e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.86e-05 [merge_recompute_call_nodes]: 1.44998e-06 [before_grad]: 2.843e-05 [set_forward_comm_id_for_comm_node_pass]: 9.96e-06 [meta_fg_expand]: 0.00148534 [flash_sp_send_recv_attached]: 4.12998e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 
6.103e-05 [a_after_grad]: 8.346e-05 [renormalize]: 0.00259682 [add_forward_monad_depend]: 9.37999e-06 [auto_monad_grad]: 6.19999e-06 [auto_monad_eliminator]: 5.762e-05 [cse]: 0.00017541 [a_3]: 0.00034333 [Cycle 2]: 0.0031278, [45] [expand_dump_flag]: 1.60999e-06 [switch_simplify]: 4.782e-05 [loop_unroll]: 4.453e-05 [a_1]: 0.00160356 [with_stream_mark]: 1.245e-05 [recompute_prepare]: 1.154e-05 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 4.62e-06 [updatestate_loads_eliminate]: 3.81001e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 0.00012972 [accelerated_algorithm]: 1.243e-05 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.94e-06 [shard_inline]: 9.17999e-06 [merge_send_recv]: 7.16999e-06 [auto_parallel]: 7.41999e-06 [parallel]: 4.75999e-06 [flash_sp]: 3.78001e-06 [merge_comm]: 5.98998e-06 [allreduce_fusion]: 4.99998e-06 [matmul_add_comm_reduction]: 8.15e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 1.02e-05 [virtual_dataset]: 9.17999e-06 [get_grad_eliminate_]: 8.70001e-06 [virtual_output]: 8.70999e-06 [merge_forward]: 4.45999e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.39e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.651e-05 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 1.45e-05 [set_forward_comm_id_for_comm_node_pass]: 5.46e-06 [meta_fg_expand]: 7.238e-05 [flash_sp_send_recv_attached]: 9.20001e-07 [receive_attached]: 1.14e-06 [after_resolve]: 1.67e-05 [a_after_grad]: 1.465e-05 [renormalize]: 0.0006164 [add_forward_monad_depend]: 3.93999e-06 [auto_monad_grad]: 1.29e-06 [auto_monad_eliminator]: 1.57e-05 [cse]: 5.06e-05 [a_3]: 6.699e-05 [Cycle 3]: 0.00092423, [45] [expand_dump_flag]: 1.07998e-06 [switch_simplify]: 1.086e-05 [loop_unroll]: 8.99e-06 [a_1]: 0.0002549 [with_stream_mark]: 1.048e-05 [recompute_prepare]: 1.02e-05 [updatestate_depend_eliminate]: 4.97e-06 [updatestate_assign_eliminate]: 4.02e-06 [updatestate_loads_eliminate]: 4.06001e-06 [parameter_eliminate]: 
9.30013e-07 [a_2]: 0.00012614 [accelerated_algorithm]: 1.174e-05 [shard]: 9.80013e-07 [meta_shard_fg_expand]: 1.94999e-06 [shard_inline]: 9.18002e-06 [merge_send_recv]: 7.00998e-06 [auto_parallel]: 7.35e-06 [parallel]: 4.52e-06 [flash_sp]: 9.5999e-07 [merge_comm]: 5.07e-06 [allreduce_fusion]: 5.22999e-06 [matmul_add_comm_reduction]: 8.08999e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 1.074e-05 [virtual_dataset]: 9.25999e-06 [get_grad_eliminate_]: 8.67998e-06 [virtual_output]: 8.43999e-06 [merge_forward]: 4.41002e-06 [cell_reuse_recompute_pass]: 1.25001e-06 [offload_activation]: 8.96002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.659e-05 [merge_recompute_call_nodes]: 7.7e-07 [before_grad]: 1.449e-05 [set_forward_comm_id_for_comm_node_pass]: 5.39e-06 [meta_fg_expand]: 3.31999e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.538e-05 [a_after_grad]: 1.521e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.35001e-06 [auto_monad_grad]: 1.15999e-06 [auto_monad_eliminator]: 1.103e-05 [cse]: 2.761e-05 [a_3]: 6.117e-05 [py_interpret_to_execute_after_opt_a]: 1.079e-05 [slice_cell_reuse_recomputed_activation]: 1.82999e-06 [rewriter_after_opt_a]: 4.739e-05 [convert_after_rewriter]: 9.59e-06 [order_py_execute_after_rewriter]: 6.79999e-06 [mutable_eliminate]: 0.00046711 [opt_b]: 0.00029458, [1] [Cycle 1]: 0.00028821, [7] [b_1]: 0.00019315 [b_2]: 1.128e-05 [updatestate_depend_eliminate]: 7.13e-06 [updatestate_assign_eliminate]: 4.05e-06 [updatestate_loads_eliminate]: 4.3e-06 [renormalize]: 3.69997e-07 [cse]: 3.305e-05 [optimize_parallel_all_gather_comm]: 2.113e-05 [overlap_param_gather]: 1.82001e-06 [cconv]: 2.01e-05 [loop_unroll]: 0.0004548 [opt_after_cconv]: 0.00013887, [1] [Cycle 1]: 0.00013267, [7] [c_1]: 4.962e-05 [parameter_eliminate]: 2.38998e-06 [updatestate_depend_eliminate]: 7.56001e-06 [updatestate_assign_eliminate]: 4.32e-06 [updatestate_loads_eliminate]: 4.05e-06 [cse]: 
3.033e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 2.96e-05 [tuple_transform]: 0.00010417, [1] [Cycle 1]: 9.933e-05, [4] [d_1]: 6.878e-05 [none_parameter_eliminate]: 1.64998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 1.037e-05 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 5.681e-05 [cse_after_recomputation]: 3.367e-05, [1] [Cycle 1]: 2.892e-05, [1] [cse]: 2.339e-05 [environ_conv]: 9.24e-06 [swap_dp_allreduce_reducescatter]: 8.30999e-06 [bias_add_comm_swap]: 2.32001e-06 [label_micro_interleaved_index]: 4.28999e-06 [label_fine_grained_interleaved_index]: 2.60997e-06 [merge_cast_opt]: 1.15001e-06 [slice_recompute_activation]: 2.53998e-06 [micro_interleaved_order_control]: 2.02001e-06 [assign_add_opt]: 1.22999e-06 [ForceFp32Comm]: 1.05001e-06 [remove_cast_before_assign_add]: 9.5999e-07 [full_micro_interleaved_order_control]: 2.48e-06 [reorder_send_recv_between_fp_bp]: 2.56998e-06 [comm_op_add_attrs]: 1.15001e-06 [add_comm_op_reuse_tag]: 8.40024e-07 [interleave_split_concat_branches]: 1.14998e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.20001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.79e-06 [control_data_broadcast_order]: 1.752e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 5.32001e-06 [overlap_recompute_and_grad_model_parallel]: 5.65001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.17001e-06 [overlap_grad_ring_attention]: 5.15001e-06 [overlap_grad_flash_sp]: 2.372e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.21e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.18001e-06 [symbol_engine_optimizer]: 0.00010132, [1] [Cycle 1]: 9.706e-05, [6] [build]: 9.32999e-06 [elim_shapecalc]: 1.402e-05 [elim_not_effective]: 1.897e-05 [opt_reshape]: 1.112e-05 [fold_const_symbol]: 1.574e-05 [renormalize]: 1.80007e-07 [detach_backward]: 1.67001e-06 
[pipeline_parallel_scheduler]: 1.43002e-06 [auto_monad_reorder]: 2.589e-05 [get_jit_bprop_graph]: 1.07998e-06 [rewriter_after_jit_bprop_graph]: 3.40003e-06 [opt_after_jit_grad]: 0.00048371 [validate]: 4.532e-05 [backend_pass]: 1.05999e-06 [task_emit]: 0.079228 [execute]: 8.14002e-06 Sums bootstrap : 0.000504s : 0.48% type_inference : 0.012351s : 11.69% event_method : 0.000051s : 0.05% auto_monad : 0.000123s : 0.12% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000036s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000011s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000050s : 0.05% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000040s : 0.04% optimize.rewriter_before_opt_a : 0.000147s : 0.14% optimize.opt_a.expand_dump_flag : 0.000007s : 0.01% optimize.opt_a.switch_simplify : 0.000134s : 0.13% optimize.opt_a.loop_unroll : 0.000117s : 0.11% optimize.opt_a.a_1 : 0.003379s : 3.20% optimize.opt_a.with_stream_mark : 0.000046s : 0.04% optimize.opt_a.recompute_prepare : 0.000045s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000020s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000017s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000016s : 0.01% optimize.opt_a.parameter_eliminate : 0.000005s : 0.00% optimize.opt_a.a_2 : 0.000502s : 0.48% optimize.opt_a.accelerated_algorithm : 0.000055s : 0.05% optimize.opt_a.shard : 0.000005s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000008s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000030s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.01% optimize.opt_a.merge_comm : 0.000021s : 0.02% 
optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000043s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000039s : 0.04% optimize.opt_a.virtual_dataset : 0.000035s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000036s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000062s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000057s : 0.05% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000021s : 0.02% optimize.opt_a.meta_fg_expand : 0.001561s : 1.48% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000093s : 0.09% optimize.opt_a.a_after_grad : 0.000113s : 0.11% optimize.opt_a.renormalize : 0.003213s : 3.04% optimize.opt_a.add_forward_monad_depend : 0.000015s : 0.01% optimize.opt_a.auto_monad_grad : 0.000009s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.08% optimize.opt_a.cse : 0.000254s : 0.24% optimize.opt_a.a_3 : 0.000471s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000047s : 0.04% optimize.convert_after_rewriter : 0.000010s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000467s : 0.44% optimize.opt_b.b_1 : 0.000193s : 0.18% optimize.opt_b.b_2 : 0.000011s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% 
optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000033s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000020s : 0.02% optimize.loop_unroll : 0.000455s : 0.43% optimize.opt_after_cconv.c_1 : 0.000050s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000008s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000030s : 0.03% optimize.tuple_transform.d_1 : 0.000069s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000057s : 0.05% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000003s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% 
optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000018s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.00% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000011s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000016s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000026s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000484s : 0.46% validate : 0.000045s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079228s : 75.00% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 
0.000787 222 5.82% : 0.000046s : 12: substitution.arithmetic_simplify 1.72% : 0.000014s : 2: substitution.cast_eliminate 0.37% : 0.000003s : 5: substitution.elim_not_effective 0.50% : 0.000004s : 5: substitution.float_depend_g_call 0.50% : 0.000004s : 3: substitution.float_tuple_getitem_switch 0.30% : 0.000002s : 5: substitution.fold_const_symbol 0.97% : 0.000008s : 8: substitution.graph_param_transform 0.32% : 0.000003s : 2: substitution.incorporate_call 0.24% : 0.000002s : 2: substitution.incorporate_call_switch 56.76% : 0.000447s : 17: substitution.inline 2.08% : 0.000016s : 2: substitution.inline_without_move 1.31% : 0.000010s : 20: substitution.j_node_and_user_rematch 1.90% : 0.000015s : 3: substitution.less_batch_normalization 1.68% : 0.000013s : 11: substitution.minmaximum_grad 0.68% : 0.000005s : 5: substitution.partial_eliminate 1.74% : 0.000014s : 20: substitution.remove_not_recompute_node 3.11% : 0.000024s : 10: substitution.replace_applicator 1.33% : 0.000010s : 15: substitution.replace_old_param 0.32% : 0.000002s : 1: substitution.set_cell_output_no_recompute 3.55% : 0.000028s : 11: substitution.tuple_list_convert_item_index_to_positive 1.75% : 0.000014s : 11: substitution.tuple_list_get_item_const_eliminator 2.22% : 0.000017s : 11: substitution.tuple_list_get_item_depend_reorder 8.51% : 0.000067s : 30: substitution.tuple_list_get_item_eliminator 2.32% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.012277 2 86.61% : 0.010634s : 1: type_inference.infer 13.39% : 0.001644s : 1: type_inference.specialize ------[replace.] 0.000270 33 49.48% : 0.000133s : 17: replace.inline 50.52% : 0.000136s : 16: replace.tuple_list_get_item_eliminator ------[match.] 0.000473 33 92.55% : 0.000437s : 17: match.inline 7.45% : 0.000035s : 16: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000764 5764 1.04% : 0.000008s : 68: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.50% : 0.000004s : 32: predicate.addn_check_dump 1.05% : 0.000008s : 68: predicate.addn_zero_filter 1.03% : 0.000008s : 68: predicate.adjust_all_reduce_mul_add 2.11% : 0.000016s : 100: predicate.arithmetic_simplify 1.13% : 0.000009s : 68: predicate.cast_eliminate 1.12% : 0.000009s : 68: predicate.check_bprop_eliminate 0.52% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.52% : 0.000004s : 32: predicate.depend_value_elim 1.18% : 0.000009s : 68: predicate.dict_get_item_const_eliminator 1.18% : 0.000009s : 68: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 68: predicate.dict_set_item_eliminator 0.38% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.09% : 0.000001s : 8: predicate.elim_not_effective 0.15% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.20% : 0.000009s : 76: predicate.environ_add_const_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_add_eliminate 1.19% : 0.000009s : 76: predicate.environ_get_depend_swap 1.73% : 0.000013s : 108: predicate.environ_get_eliminate 1.17% : 0.000009s : 76: predicate.environ_get_set_eliminate 1.71% : 0.000013s : 101: predicate.exchange_switch_depend_value 2.37% : 0.000018s : 101: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.57% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.54% : 0.000004s : 32: predicate.incorporate_call 0.49% : 0.000004s : 32: predicate.incorporate_call_switch 5.68% : 0.000043s : 249: predicate.inline 1.25% : 0.000010s : 55: predicate.inline_without_move 0.30% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.64% : 0.000005s : 32: predicate.less_batch_normalization 
1.65% : 0.000013s : 100: predicate.list_to_tuple_eliminator_ 2.63% : 0.000020s : 168: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.31% : 0.000018s : 136: predicate.loop_unroll_before_grad 1.36% : 0.000010s : 84: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.12% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.11% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.11% : 0.000008s : 68: predicate.minmaximum_grad 0.34% : 0.000003s : 8: predicate.mutable_eliminate 0.15% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 2.04% : 0.000016s : 101: predicate.partial_defer_inline 1.76% : 0.000013s : 92: predicate.partial_eliminate 1.04% : 0.000008s : 68: predicate.print_const_string_wrapper 0.52% : 0.000004s : 32: predicate.reduce_all_const_elim 1.31% : 0.000010s : 68: predicate.reduce_eliminate 2.62% : 0.000020s : 168: predicate.redundant_stop_gradient_eliminater 0.35% : 0.000003s : 32: predicate.remove_not_recompute_node 1.88% : 0.000014s : 152: predicate.replace_applicator 0.63% : 0.000005s : 55: predicate.replace_old_param 0.10% : 0.000001s : 8: predicate.reset_defer_inline 1.07% : 0.000008s : 68: predicate.reshape_eliminate 1.13% : 0.000009s : 68: predicate.row_tensor_add_zeros_like 0.15% : 0.000001s : 8: predicate.row_tensor_eliminate 1.24% : 0.000010s : 68: predicate.same_eliminate 0.37% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.63% : 0.000005s : 32: predicate.shard_identity_eliminate 0.30% : 0.000002s : 16: predicate.special_op_eliminate 0.62% : 0.000005s : 32: predicate.specialize_transform 1.29% : 0.000010s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.86% : 0.000014s : 101: predicate.switch_defer_inline 2.98% : 0.000023s : 169: predicate.switch_layer_defer_inline 5.03% : 
0.000038s : 277: predicate.switch_simplify 1.06% : 0.000008s : 68: predicate.tile_eliminate 1.05% : 0.000008s : 68: predicate.transpose_eliminate 1.50% : 0.000011s : 84: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 84: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000011s : 84: predicate.tuple_list_get_item_depend_reorder 2.96% : 0.000023s : 132: predicate.tuple_list_get_item_eliminator 1.51% : 0.000012s : 84: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000016s : 116: predicate.tuple_list_set_item_eliminator 1.62% : 0.000012s : 100: predicate.tuple_to_list_eliminator_ 2.61% : 0.000020s : 168: predicate.updatestate_pure_node_eliminater 3.18% : 0.000024s : 200: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.56% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.54% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.18% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001759 34 55.47% : 0.000976s : 13: func_graph_cloner_run.FuncGraphClonerGraph 44.53% : 0.000783s : 21: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.135472 237 0.00% : 0.000004s : 1: ForceFp32Comm 2.27% : 0.003075s : 1: add_attr 2.26% : 0.003066s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000061s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.10% : 0.000131s : 1: auto_monad 0.02% : 0.000030s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000538s : 1: bootstrap 0.02% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000021s : 1: control_data_broadcast_order 0.01% : 0.000013s : 1: convert_after_rewriter 0.03% : 0.000037s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000058s : 1: event_method 0.01% : 0.000014s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000014s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000464s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000476s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.75% : 0.005079s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000035s : 1: opt.transform.opt_after_jit_grad 0.13% : 0.000178s : 28: opt.transform.opt_b 0.06% : 0.000077s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000056s : 4: opt.transform.symbol_engine_opt 8.47% : 0.011473s : 1: opt_a 0.11% : 0.000142s : 1: opt_after_cconv 0.36% : 0.000493s : 1: opt_after_jit_grad 0.22% : 0.000298s : 1: opt_b 10.19% : 0.013799s : 1: optimize 0.02% : 0.000025s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000027s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000055s : 1: pre_auto_parallel 0.03% : 0.000044s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000034s : 1: remove_dup_value 1.25% : 0.001693s : 2: renormalize.infer 1.11% : 0.001507s : 2: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000051s : 1: rewriter_after_opt_a 0.11% : 0.000152s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000104s : 1: symbol_engine_optimizer 58.50% : 0.079245s : 1: task_emit 0.08% : 0.000107s : 1: 
tuple_transform 9.13% : 0.012366s : 1: type_inference 0.05% : 0.000070s : 1: validate TotalTime = 0.0703806, [24] [bootstrap]: 0.00046021 [type_inference]: 0.00433504 [event_method]: 1.094e-05 [auto_monad]: 5.207e-05 [graph_reusing]: 5.20001e-06 [inline]: 1.96998e-06 [add_attr]: 0.00302634, [1] [add_attr_with_inline]: 0.00301837, [1] [Cycle 1]: 4.397e-05, [2] [tag_attr]: 1.135e-05 [meta_addattr_fg_expand]: 3.22002e-06 [parallel-infer-symbol]: 2.49999e-06 [pre_auto_parallel]: 2.266e-05 [insert-virtual-dataset]: 2.24001e-06 [parallel-infer-symbol-second]: 8.2e-07 [dataset_repeat_opt]: 1.90001e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00376737, [53] [py_interpret_to_execute]: 1.641e-05 [rewriter_before_opt_a]: 3.912e-05 [opt_a]: 0.00189759, [2] [Cycle 1]: 0.0012823, [45] [expand_dump_flag]: 2.54001e-06 [switch_simplify]: 2.486e-05 [loop_unroll]: 1.419e-05 [a_1]: 0.00029845 [with_stream_mark]: 1.373e-05 [recompute_prepare]: 8.09997e-06 [updatestate_depend_eliminate]: 3.71001e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 2.83998e-06 [parameter_eliminate]: 1.52001e-06 [a_2]: 8.082e-05 [accelerated_algorithm]: 6.25997e-06 [shard]: 2.14e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.87999e-06 [merge_send_recv]: 8.42998e-06 [auto_parallel]: 5.98002e-06 [parallel]: 1.814e-05 [flash_sp]: 7.24001e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.26001e-06 [matmul_add_comm_reduction]: 9.10999e-06 [allreduce_slice_to_reducescatter]: 5.8001e-07 [virtual_shard_identity]: 6.89999e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.90002e-06 [merge_forward]: 4.05e-06 [cell_reuse_recompute_pass]: 1.04e-06 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.102e-05 [merge_recompute_call_nodes]: 1.63002e-06 [before_grad]: 9.64999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.63e-06 [meta_fg_expand]: 2.27999e-06 [flash_sp_send_recv_attached]: 2.28002e-06 [receive_attached]: 
2.08998e-06 [after_resolve]: 1.09e-05 [a_after_grad]: 9.56e-06 [renormalize]: 0.00035466 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.76e-06 [auto_monad_eliminator]: 1.335e-05 [cse]: 2.607e-05 [a_3]: 4.185e-05 [Cycle 2]: 0.00060612, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 7.23999e-06 [loop_unroll]: 5.71e-06 [a_1]: 0.0001241 [with_stream_mark]: 9.99001e-06 [recompute_prepare]: 5.73002e-06 [updatestate_depend_eliminate]: 2.91e-06 [updatestate_assign_eliminate]: 2.27999e-06 [updatestate_loads_eliminate]: 2.56e-06 [parameter_eliminate]: 7.80012e-07 [a_2]: 7.06e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.19003e-06 [shard_inline]: 5.56e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.21002e-06 [parallel]: 4.09997e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.25998e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.30999e-06 [allreduce_slice_to_reducescatter]: 3.29979e-07 [virtual_shard_identity]: 6.41e-06 [virtual_dataset]: 5.65001e-06 [get_grad_eliminate_]: 5.29e-06 [virtual_output]: 5.35999e-06 [merge_forward]: 2.93e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 6.02001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.84001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 8.05999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06999e-06 [meta_fg_expand]: 1.77001e-06 [flash_sp_send_recv_attached]: 1.05999e-06 [receive_attached]: 1.00999e-06 [after_resolve]: 9.44e-06 [a_after_grad]: 8.57e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.25001e-06 [auto_monad_grad]: 7.60017e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.344e-05 [a_3]: 3.489e-05 [py_interpret_to_execute_after_opt_a]: 7.22002e-06 [slice_cell_reuse_recomputed_activation]: 2.12001e-06 [rewriter_after_opt_a]: 3.203e-05 [convert_after_rewriter]: 7.11999e-06 [order_py_execute_after_rewriter]: 5.52001e-06 [mutable_eliminate]: 0.00045865 [opt_b]: 0.00018819, [1] [Cycle 1]: 
0.00018199, [7] [b_1]: 0.00011261 [b_2]: 7.34002e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 4.19997e-07 [cse]: 1.729e-05 [optimize_parallel_all_gather_comm]: 1.525e-05 [overlap_param_gather]: 1.76e-06 [cconv]: 2.285e-05 [loop_unroll]: 0.00044579 [opt_after_cconv]: 9.704e-05, [1] [Cycle 1]: 9.124e-05, [7] [c_1]: 2.864e-05 [parameter_eliminate]: 2.22001e-06 [updatestate_depend_eliminate]: 5.40001e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.71e-05 [renormalize]: 2.3999e-07 [remove_dup_value]: 1.308e-05 [tuple_transform]: 7.128e-05, [1] [Cycle 1]: 6.698e-05, [4] [d_1]: 4.113e-05 [none_parameter_eliminate]: 1.66e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.63e-06 [partial_unused_args_eliminate]: 1.75001e-06 [add_recomputation]: 4.339e-05 [cse_after_recomputation]: 2.079e-05, [1] [Cycle 1]: 1.64e-05, [1] [cse]: 1.115e-05 [environ_conv]: 4.47998e-06 [swap_dp_allreduce_reducescatter]: 5.74e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.42999e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.11e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.00999e-06 [remove_cast_before_assign_add]: 1.15999e-06 [full_micro_interleaved_order_control]: 2.34001e-06 [reorder_send_recv_between_fp_bp]: 2.75002e-06 [comm_op_add_attrs]: 1.17999e-06 [add_comm_op_reuse_tag]: 1.25001e-06 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.15999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.77001e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.95e-06 [overlap_recompute_and_grad_model_parallel]: 4.75001e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.16002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35001e-06 [overlap_recompute_comm]: 2.51e-06 [overlap_grad_ring_attention]: 3.83001e-06 [overlap_grad_flash_sp]: 1.764e-05 [begin_end_overlap_inline]: 8.09989e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.69972e-07 [symbol_engine_optimizer]: 6.934e-05, [1] [Cycle 1]: 6.534e-05, [6] [build]: 2.43e-06 [elim_shapecalc]: 8.65001e-06 [elim_not_effective]: 1.158e-05 [opt_reshape]: 6.30997e-06 [fold_const_symbol]: 9.12001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.68002e-06 [pipeline_parallel_scheduler]: 1.80001e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.65e-06 [opt_after_jit_grad]: 0.00046898 [validate]: 3.284e-05 [backend_pass]: 9.40025e-07 [task_emit]: 0.0579509 [execute]: 8.02e-06 Sums bootstrap : 0.000460s : 0.69% type_inference : 0.004335s : 6.53% event_method : 0.000011s : 0.02% auto_monad : 0.000052s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000002s : 0.00% pre_auto_parallel : 0.000023s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000016s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000423s : 0.64% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000151s : 0.23% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000355s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000077s : 0.12% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000459s : 0.69% optimize.opt_b.b_1 : 0.000113s : 0.17% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000446s : 0.67% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000041s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000007s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000018s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000469s : 0.71% validate : 0.000033s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057951s : 87.31% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000122 26 18.17% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.03% : 0.000001s : 2: substitution.fold_const_symbol 4.57% : 0.000006s : 4: substitution.graph_param_transform 65.62% : 0.000080s : 2: substitution.inline 2.56% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.55% : 0.000004s : 4: substitution.remove_not_recompute_node 3.03% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004293 2 91.55% : 0.003930s : 1: type_inference.infer 8.45% : 0.000363s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000140 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.46% : 0.000003s : 17: predicate.arithmetic_simplify 0.87% : 0.000001s : 9: predicate.cast_eliminate 1.07% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.77% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.27% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.01% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.83% : 0.000003s : 21: predicate.environ_get_eliminate 1.02% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.92% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.86% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.83% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.84% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.01% : 0.000001s : 8: predicate.less_batch_normalization 1.65% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.09% : 0.000003s : 26: predicate.load_eliminater 1.45% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.72% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.75% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.41% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.50% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.73% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.06% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.67% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.85% : 0.000001s : 8: predicate.replace_old_param 0.39% : 0.000001s : 4: predicate.reset_defer_inline 0.75% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.06% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.44% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.05% : 0.000001s : 11: predicate.switch_defer_inline 1.80% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.70% : 0.000007s : 41: predicate.switch_simplify 0.71% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.65% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.65% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.27% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.55% : 0.000004s : 25: predicate.tuple_list_set_item_eliminator 1.56% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.01% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.89% : 0.000001s : 8: predicate.virtual_output_eliminate 0.38% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000251 6 41.50% : 0.000104s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.50% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078479 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.86% : 0.003030s : 1: add_attr 3.85% : 0.003022s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000048s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000058s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.63% : 0.000497s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000017s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.58% : 0.000454s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.60% : 0.000468s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 1.00% : 0.000786s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000094s : 28: opt.transform.opt_b 0.06% : 0.000046s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.42% : 0.001901s : 1: opt_a 0.13% : 0.000100s : 1: opt_after_cconv 0.61% : 0.000479s : 1: opt_after_jit_grad 0.24% : 0.000192s : 1: opt_b 4.81% : 0.003771s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.03% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000027s : 1: pre_auto_parallel 0.03% : 0.000020s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000191s : 1: renormalize.infer 0.20% : 0.000157s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000072s : 1: symbol_engine_optimizer 73.86% : 0.057968s : 1: task_emit 0.09% : 0.000074s : 1: 
tuple_transform 5.54% : 0.004348s : 1: type_inference 0.07% : 0.000056s : 1: validate TotalTime = 0.107226, [24] [bootstrap]: 0.00049483 [type_inference]: 0.0103298 [event_method]: 4.332e-05 [auto_monad]: 0.00011721 [graph_reusing]: 8.52998e-06 [inline]: 1.87001e-06 [add_attr]: 0.00301401, [1] [add_attr_with_inline]: 0.00300593, [1] [Cycle 1]: 6.618e-05, [2] [tag_attr]: 3.082e-05 [meta_addattr_fg_expand]: 8.97999e-06 [parallel-infer-symbol]: 2.76e-06 [pre_auto_parallel]: 4.594e-05 [insert-virtual-dataset]: 2.28002e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 1.99e-06 [pipeline_split]: 1.57001e-06 [optimize]: 0.013232, [53] [py_interpret_to_execute]: 3.536e-05 [rewriter_before_opt_a]: 0.00012722 [opt_a]: 0.0109252, [3] [Cycle 1]: 0.00698622, [45] [expand_dump_flag]: 3.56999e-06 [switch_simplify]: 6.706e-05 [loop_unroll]: 5.548e-05 [a_1]: 0.00135634 [with_stream_mark]: 2.315e-05 [recompute_prepare]: 2.199e-05 [updatestate_depend_eliminate]: 9.11002e-06 [updatestate_assign_eliminate]: 7.62002e-06 [updatestate_loads_eliminate]: 7.26001e-06 [parameter_eliminate]: 2.49001e-06 [a_2]: 0.0002467 [accelerated_algorithm]: 3.039e-05 [shard]: 2.05002e-06 [meta_shard_fg_expand]: 3.25e-06 [shard_inline]: 1.633e-05 [merge_send_recv]: 1.533e-05 [auto_parallel]: 1.114e-05 [parallel]: 1.796e-05 [flash_sp]: 1.073e-05 [merge_comm]: 9.74e-06 [allreduce_fusion]: 9.14998e-06 [matmul_add_comm_reduction]: 2.568e-05 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 1.792e-05 [virtual_dataset]: 1.58e-05 [get_grad_eliminate_]: 1.53e-05 [virtual_output]: 1.541e-05 [merge_forward]: 9.46e-06 [cell_reuse_recompute_pass]: 1.44e-06 [offload_activation]: 1.733e-05 [cell_reuse_handle_not_recompute_node_pass]: 2.831e-05 [merge_recompute_call_nodes]: 1.32999e-06 [before_grad]: 2.787e-05 [set_forward_comm_id_for_comm_node_pass]: 9.36e-06 [meta_fg_expand]: 0.00143233 [flash_sp_send_recv_attached]: 3.85998e-06 [receive_attached]: 2.59001e-06 [after_resolve]: 
5.962e-05 [a_after_grad]: 8.064e-05 [renormalize]: 0.00244054 [add_forward_monad_depend]: 9.22001e-06 [auto_monad_grad]: 5.11997e-06 [auto_monad_eliminator]: 5.767e-05 [cse]: 0.0001692 [a_3]: 0.0003371 [Cycle 2]: 0.00301075, [45] [expand_dump_flag]: 1.55001e-06 [switch_simplify]: 4.794e-05 [loop_unroll]: 4.416e-05 [a_1]: 0.00156047 [with_stream_mark]: 1.194e-05 [recompute_prepare]: 1.133e-05 [updatestate_depend_eliminate]: 5.22999e-06 [updatestate_assign_eliminate]: 4.33999e-06 [updatestate_loads_eliminate]: 3.68e-06 [parameter_eliminate]: 1.00001e-06 [a_2]: 0.0001276 [accelerated_algorithm]: 1.19e-05 [shard]: 9.60019e-07 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.27999e-06 [merge_send_recv]: 6.70998e-06 [auto_parallel]: 7.15e-06 [parallel]: 5.19e-06 [flash_sp]: 3.71999e-06 [merge_comm]: 5.77001e-06 [allreduce_fusion]: 4.84e-06 [matmul_add_comm_reduction]: 7.98999e-06 [allreduce_slice_to_reducescatter]: 3.59985e-07 [virtual_shard_identity]: 1.015e-05 [virtual_dataset]: 9.06002e-06 [get_grad_eliminate_]: 8.58001e-06 [virtual_output]: 8.54e-06 [merge_forward]: 4.53999e-06 [cell_reuse_recompute_pass]: 9.20001e-07 [offload_activation]: 9.08002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.663e-05 [merge_recompute_call_nodes]: 6.59988e-07 [before_grad]: 1.442e-05 [set_forward_comm_id_for_comm_node_pass]: 5.15999e-06 [meta_fg_expand]: 3.417e-05 [flash_sp_send_recv_attached]: 9.19972e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 1.544e-05 [a_after_grad]: 1.446e-05 [renormalize]: 0.00059862 [add_forward_monad_depend]: 4.01001e-06 [auto_monad_grad]: 1.19e-06 [auto_monad_eliminator]: 1.526e-05 [cse]: 4.73e-05 [a_3]: 6.658e-05 [Cycle 3]: 0.00091438, [45] [expand_dump_flag]: 1.02e-06 [switch_simplify]: 1.085e-05 [loop_unroll]: 9.00001e-06 [a_1]: 0.00025413 [with_stream_mark]: 1.026e-05 [recompute_prepare]: 9.41e-06 [updatestate_depend_eliminate]: 4.87998e-06 [updatestate_assign_eliminate]: 4.00998e-06 [updatestate_loads_eliminate]: 3.83001e-06 
[parameter_eliminate]: 8.99978e-07 [a_2]: 0.00012517 [accelerated_algorithm]: 1.186e-05 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.80001e-06 [shard_inline]: 9.32001e-06 [merge_send_recv]: 6.84999e-06 [auto_parallel]: 7.70998e-06 [parallel]: 4.57e-06 [flash_sp]: 1.11997e-06 [merge_comm]: 5.07999e-06 [allreduce_fusion]: 4.77998e-06 [matmul_add_comm_reduction]: 7.46001e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 1.04e-05 [virtual_dataset]: 8.94e-06 [get_grad_eliminate_]: 8.69e-06 [virtual_output]: 8.34998e-06 [merge_forward]: 4.28999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 8.62998e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.614e-05 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 1.528e-05 [set_forward_comm_id_for_comm_node_pass]: 5.76e-06 [meta_fg_expand]: 3.4e-06 [flash_sp_send_recv_attached]: 8.99978e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 1.444e-05 [a_after_grad]: 1.456e-05 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 9.50007e-07 [auto_monad_eliminator]: 1.113e-05 [cse]: 2.598e-05 [a_3]: 6.05e-05 [py_interpret_to_execute_after_opt_a]: 1.072e-05 [slice_cell_reuse_recomputed_activation]: 2.13998e-06 [rewriter_after_opt_a]: 4.641e-05 [convert_after_rewriter]: 9.12001e-06 [order_py_execute_after_rewriter]: 7.00998e-06 [mutable_eliminate]: 0.00045772 [opt_b]: 0.00034463, [1] [Cycle 1]: 0.00033853, [7] [b_1]: 0.00024166 [b_2]: 1.152e-05 [updatestate_depend_eliminate]: 7.44002e-06 [updatestate_assign_eliminate]: 4.25e-06 [updatestate_loads_eliminate]: 4.07e-06 [renormalize]: 4.19997e-07 [cse]: 3.236e-05 [optimize_parallel_all_gather_comm]: 2.068e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 1.935e-05 [loop_unroll]: 0.00042539 [opt_after_cconv]: 0.00013774, [1] [Cycle 1]: 0.0001316, [7] [c_1]: 4.893e-05 [parameter_eliminate]: 2.32001e-06 [updatestate_depend_eliminate]: 7.31001e-06 [updatestate_assign_eliminate]: 4.23001e-06 
[updatestate_loads_eliminate]: 3.95e-06 [cse]: 3.016e-05 [renormalize]: 2.89991e-07 [remove_dup_value]: 2.913e-05 [tuple_transform]: 0.00010319, [1] [Cycle 1]: 9.843e-05, [4] [d_1]: 6.76e-05 [none_parameter_eliminate]: 1.71998e-06 [renormalize]: 2.19996e-07 [switch_simplify]: 1.003e-05 [partial_unused_args_eliminate]: 1.84e-06 [add_recomputation]: 5.897e-05 [cse_after_recomputation]: 3.325e-05, [1] [Cycle 1]: 2.814e-05, [1] [cse]: 2.262e-05 [environ_conv]: 8.79998e-06 [swap_dp_allreduce_reducescatter]: 7.97e-06 [bias_add_comm_swap]: 2.48002e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 3.21999e-06 [merge_cast_opt]: 1.54e-06 [slice_recompute_activation]: 2.09999e-06 [micro_interleaved_order_control]: 2.41998e-06 [assign_add_opt]: 1.33002e-06 [ForceFp32Comm]: 8.09989e-07 [remove_cast_before_assign_add]: 1.07e-06 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.80997e-06 [comm_op_add_attrs]: 9.99979e-07 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62001e-06 [control_data_broadcast_order]: 1.679e-05 [grouped_pairwise_exchange_alltoall]: 1.91998e-06 [offloading_packed_experts]: 5.12e-06 [overlap_recompute_and_grad_model_parallel]: 5.82999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.37e-06 [overlap_recompute_comm]: 2.37999e-06 [overlap_grad_ring_attention]: 5.31002e-06 [overlap_grad_flash_sp]: 2.442e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 1.85001e-06 [split_layernorm_comm]: 1.56002e-06 [handle_group_info]: 1.12999e-06 [symbol_engine_optimizer]: 0.00010003, [1] [Cycle 1]: 9.569e-05, [6] [build]: 9.84999e-06 [elim_shapecalc]: 1.369e-05 [elim_not_effective]: 1.916e-05 [opt_reshape]: 1.015e-05 [fold_const_symbol]: 1.524e-05 [renormalize]: 
2.10013e-07 [detach_backward]: 2.07001e-06 [pipeline_parallel_scheduler]: 1.34998e-06 [auto_monad_reorder]: 2.463e-05 [get_jit_bprop_graph]: 1.04003e-06 [rewriter_after_jit_bprop_graph]: 3.48999e-06 [opt_after_jit_grad]: 0.00047485 [validate]: 4.557e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.0791558 [execute]: 9.30001e-06 Sums bootstrap : 0.000495s : 0.48% type_inference : 0.010330s : 10.03% event_method : 0.000043s : 0.04% auto_monad : 0.000117s : 0.11% graph_reusing : 0.000009s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000031s : 0.03% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000009s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000046s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000035s : 0.03% optimize.rewriter_before_opt_a : 0.000127s : 0.12% optimize.opt_a.expand_dump_flag : 0.000006s : 0.01% optimize.opt_a.switch_simplify : 0.000126s : 0.12% optimize.opt_a.loop_unroll : 0.000109s : 0.11% optimize.opt_a.a_1 : 0.003171s : 3.08% optimize.opt_a.with_stream_mark : 0.000045s : 0.04% optimize.opt_a.recompute_prepare : 0.000043s : 0.04% optimize.opt_a.updatestate_depend_eliminate : 0.000019s : 0.02% optimize.opt_a.updatestate_assign_eliminate : 0.000016s : 0.02% optimize.opt_a.updatestate_loads_eliminate : 0.000015s : 0.01% optimize.opt_a.parameter_eliminate : 0.000004s : 0.00% optimize.opt_a.a_2 : 0.000499s : 0.49% optimize.opt_a.accelerated_algorithm : 0.000054s : 0.05% optimize.opt_a.shard : 0.000004s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000007s : 0.01% optimize.opt_a.shard_inline : 0.000035s : 0.03% optimize.opt_a.merge_send_recv : 0.000029s : 0.03% optimize.opt_a.auto_parallel : 0.000026s : 0.03% optimize.opt_a.parallel : 0.000028s : 0.03% optimize.opt_a.flash_sp : 0.000016s : 0.02% 
optimize.opt_a.merge_comm : 0.000021s : 0.02% optimize.opt_a.allreduce_fusion : 0.000019s : 0.02% optimize.opt_a.matmul_add_comm_reduction : 0.000041s : 0.04% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000038s : 0.04% optimize.opt_a.virtual_dataset : 0.000034s : 0.03% optimize.opt_a.get_grad_eliminate_ : 0.000033s : 0.03% optimize.opt_a.virtual_output : 0.000032s : 0.03% optimize.opt_a.merge_forward : 0.000018s : 0.02% optimize.opt_a.cell_reuse_recompute_pass : 0.000004s : 0.00% optimize.opt_a.offload_activation : 0.000035s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000061s : 0.06% optimize.opt_a.merge_recompute_call_nodes : 0.000003s : 0.00% optimize.opt_a.before_grad : 0.000058s : 0.06% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000020s : 0.02% optimize.opt_a.meta_fg_expand : 0.001470s : 1.43% optimize.opt_a.flash_sp_send_recv_attached : 0.000006s : 0.01% optimize.opt_a.receive_attached : 0.000005s : 0.00% optimize.opt_a.after_resolve : 0.000089s : 0.09% optimize.opt_a.a_after_grad : 0.000110s : 0.11% optimize.opt_a.renormalize : 0.003039s : 2.95% optimize.opt_a.add_forward_monad_depend : 0.000014s : 0.01% optimize.opt_a.auto_monad_grad : 0.000007s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000084s : 0.08% optimize.opt_a.cse : 0.000242s : 0.24% optimize.opt_a.a_3 : 0.000464s : 0.45% optimize.py_interpret_to_execute_after_opt_a : 0.000011s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000046s : 0.05% optimize.convert_after_rewriter : 0.000009s : 0.01% optimize.order_py_execute_after_rewriter : 0.000007s : 0.01% optimize.mutable_eliminate : 0.000458s : 0.44% optimize.opt_b.b_1 : 0.000242s : 0.23% optimize.opt_b.b_2 : 0.000012s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000004s : 0.00% 
optimize.opt_b.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000032s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000021s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000019s : 0.02% optimize.loop_unroll : 0.000425s : 0.41% optimize.opt_after_cconv.c_1 : 0.000049s : 0.05% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000004s : 0.00% optimize.opt_after_cconv.cse : 0.000030s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000029s : 0.03% optimize.tuple_transform.d_1 : 0.000068s : 0.07% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000010s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000059s : 0.06% optimize.cse_after_recomputation.cse : 0.000023s : 0.02% optimize.environ_conv : 0.000009s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000008s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000002s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% 
optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000017s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000005s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000006s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000005s : 0.01% optimize.overlap_grad_flash_sp : 0.000024s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000010s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000014s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000019s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000010s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000015s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000025s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000475s : 0.46% validate : 0.000046s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.079156s : 76.89% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 
0.000745 218 5.85% : 0.000044s : 11: substitution.arithmetic_simplify 1.83% : 0.000014s : 2: substitution.cast_eliminate 0.41% : 0.000003s : 5: substitution.elim_not_effective 0.49% : 0.000004s : 5: substitution.float_depend_g_call 0.61% : 0.000005s : 3: substitution.float_tuple_getitem_switch 0.29% : 0.000002s : 5: substitution.fold_const_symbol 1.06% : 0.000008s : 8: substitution.graph_param_transform 0.38% : 0.000003s : 2: substitution.incorporate_call 0.29% : 0.000002s : 2: substitution.incorporate_call_switch 55.30% : 0.000412s : 16: substitution.inline 2.18% : 0.000016s : 2: substitution.inline_without_move 1.41% : 0.000011s : 20: substitution.j_node_and_user_rematch 1.95% : 0.000015s : 3: substitution.less_batch_normalization 1.73% : 0.000013s : 11: substitution.minmaximum_grad 0.70% : 0.000005s : 5: substitution.partial_eliminate 1.78% : 0.000013s : 20: substitution.remove_not_recompute_node 3.24% : 0.000024s : 10: substitution.replace_applicator 1.41% : 0.000010s : 15: substitution.replace_old_param 0.35% : 0.000003s : 1: substitution.set_cell_output_no_recompute 3.68% : 0.000027s : 11: substitution.tuple_list_convert_item_index_to_positive 1.81% : 0.000013s : 11: substitution.tuple_list_get_item_const_eliminator 2.45% : 0.000018s : 11: substitution.tuple_list_get_item_depend_reorder 8.40% : 0.000063s : 28: substitution.tuple_list_get_item_eliminator 2.41% : 0.000018s : 11: substitution.tuple_list_get_set_item_eliminator ------[type_inference.] 0.010259 2 86.33% : 0.008857s : 1: type_inference.infer 13.67% : 0.001403s : 1: type_inference.specialize ------[replace.] 0.000206 30 58.42% : 0.000120s : 16: replace.inline 41.58% : 0.000085s : 14: replace.tuple_list_get_item_eliminator ------[match.] 0.000434 30 92.95% : 0.000403s : 16: match.inline 7.05% : 0.000031s : 14: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000747 5663 1.07% : 0.000008s : 67: predicate.accumulaten_eliminater 0.28% : 0.000002s : 8: predicate.ad_related_special_op_eliminate 0.52% : 0.000004s : 32: predicate.addn_check_dump 1.10% : 0.000008s : 67: predicate.addn_zero_filter 1.04% : 0.000008s : 67: predicate.adjust_all_reduce_mul_add 2.18% : 0.000016s : 99: predicate.arithmetic_simplify 1.14% : 0.000009s : 67: predicate.cast_eliminate 1.13% : 0.000008s : 68: predicate.check_bprop_eliminate 0.53% : 0.000004s : 32: predicate.compare_switch_simplify 0.09% : 0.000001s : 8: predicate.const_output_eliminate 0.53% : 0.000004s : 32: predicate.depend_value_elim 1.19% : 0.000009s : 67: predicate.dict_get_item_const_eliminator 1.25% : 0.000009s : 67: predicate.dict_get_item_eliminator 1.09% : 0.000008s : 67: predicate.dict_set_item_eliminator 0.39% : 0.000003s : 16: predicate.dumpgradient_eliminate 0.10% : 0.000001s : 8: predicate.elim_not_effective 0.17% : 0.000001s : 8: predicate.elim_shapecalc_of_broadcastargs 1.21% : 0.000009s : 75: predicate.environ_add_const_eliminate 1.20% : 0.000009s : 75: predicate.environ_get_add_eliminate 1.18% : 0.000009s : 75: predicate.environ_get_depend_swap 1.75% : 0.000013s : 107: predicate.environ_get_eliminate 1.19% : 0.000009s : 75: predicate.environ_get_set_eliminate 1.66% : 0.000012s : 97: predicate.exchange_switch_depend_value 2.28% : 0.000017s : 97: predicate.float_depend_g_call 0.51% : 0.000004s : 32: predicate.float_environ_get_switch 0.67% : 0.000005s : 40: predicate.float_tuple_getitem_switch 0.09% : 0.000001s : 8: predicate.fold_const_symbol 0.56% : 0.000004s : 32: predicate.get_grad_eliminate 0.09% : 0.000001s : 8: predicate.graph_param_transform 0.56% : 0.000004s : 32: predicate.incorporate_call 0.50% : 0.000004s : 32: predicate.incorporate_call_switch 5.62% : 0.000042s : 244: predicate.inline 1.30% : 0.000010s : 55: predicate.inline_without_move 0.32% : 0.000002s : 32: predicate.j_node_and_user_rematch 0.68% : 0.000005s : 32: predicate.less_batch_normalization 1.60% 
: 0.000012s : 97: predicate.list_to_tuple_eliminator_ 2.65% : 0.000020s : 164: predicate.load_eliminater 0.32% : 0.000002s : 8: predicate.loop_unroll_after_grad 2.19% : 0.000016s : 128: predicate.loop_unroll_before_grad 1.40% : 0.000010s : 83: predicate.make_slice_get_slice_eliminator 0.56% : 0.000004s : 32: predicate.merge_addn 1.14% : 0.000009s : 68: predicate.micro_step_allgather_replace 1.12% : 0.000008s : 68: predicate.mini_step_allgather_replace 1.10% : 0.000008s : 67: predicate.minmaximum_grad 0.33% : 0.000002s : 8: predicate.mutable_eliminate 0.16% : 0.000001s : 8: predicate.opt_reshape 0.16% : 0.000001s : 8: predicate.parallel_virtual_node 1.98% : 0.000015s : 97: predicate.partial_defer_inline 1.73% : 0.000013s : 89: predicate.partial_eliminate 1.07% : 0.000008s : 67: predicate.print_const_string_wrapper 0.54% : 0.000004s : 32: predicate.reduce_all_const_elim 1.32% : 0.000010s : 67: predicate.reduce_eliminate 2.64% : 0.000020s : 164: predicate.redundant_stop_gradient_eliminater 0.34% : 0.000003s : 32: predicate.remove_not_recompute_node 1.89% : 0.000014s : 149: predicate.replace_applicator 0.61% : 0.000005s : 55: predicate.replace_old_param 0.12% : 0.000001s : 8: predicate.reset_defer_inline 1.10% : 0.000008s : 67: predicate.reshape_eliminate 1.12% : 0.000008s : 68: predicate.row_tensor_add_zeros_like 0.16% : 0.000001s : 8: predicate.row_tensor_eliminate 1.27% : 0.000009s : 68: predicate.same_eliminate 0.39% : 0.000003s : 32: predicate.set_cell_output_no_recompute 0.61% : 0.000005s : 32: predicate.shard_identity_eliminate 0.31% : 0.000002s : 16: predicate.special_op_eliminate 0.65% : 0.000005s : 32: predicate.specialize_transform 1.23% : 0.000009s : 68: predicate.split_environ_get_set_with_tuple_value 1.14% : 0.000009s : 55: predicate.stack_unstack_eliminate 0.16% : 0.000001s : 8: predicate.switch_call_monad_eliminater 1.83% : 0.000014s : 97: predicate.switch_defer_inline 2.90% : 0.000022s : 165: predicate.switch_layer_defer_inline 4.89% : 0.000037s : 265: 
predicate.switch_simplify 1.06% : 0.000008s : 67: predicate.tile_eliminate 1.08% : 0.000008s : 67: predicate.transpose_eliminate 1.49% : 0.000011s : 83: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000012s : 83: predicate.tuple_list_get_item_const_eliminator 1.41% : 0.000011s : 83: predicate.tuple_list_get_item_depend_reorder 2.84% : 0.000021s : 129: predicate.tuple_list_get_item_eliminator 1.48% : 0.000011s : 83: predicate.tuple_list_get_set_item_eliminator 2.07% : 0.000015s : 115: predicate.tuple_list_set_item_eliminator 1.59% : 0.000012s : 97: predicate.tuple_to_list_eliminator_ 2.62% : 0.000020s : 164: predicate.updatestate_pure_node_eliminater 3.29% : 0.000025s : 196: predicate.updatestate_useless_node_eliminater 0.15% : 0.000001s : 8: predicate.value_based_eliminate 0.59% : 0.000004s : 32: predicate.virtual_dataset_eliminate 0.57% : 0.000004s : 32: predicate.virtual_output_eliminate 0.14% : 0.000001s : 8: predicate.virtual_view_grad_eliminate 0.19% : 0.000001s : 8: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.001528 32 56.62% : 0.000865s : 12: func_graph_cloner_run.FuncGraphClonerGraph 43.38% : 0.000663s : 20: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.131726 237 0.00% : 0.000003s : 1: ForceFp32Comm 2.29% : 0.003018s : 1: add_attr 2.28% : 0.003009s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.05% : 0.000063s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.09% : 0.000124s : 1: auto_monad 0.02% : 0.000029s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.40% : 0.000529s : 1: bootstrap 0.02% : 0.000023s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000020s : 1: control_data_broadcast_order 0.01% : 0.000012s : 1: convert_after_rewriter 0.03% : 0.000036s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000012s : 1: environ_conv 0.04% : 0.000050s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000013s : 1: graph_reusing 0.00% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.33% : 0.000434s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.35% : 0.000467s : 1: mutable_eliminate 0.01% : 0.000008s : 1: offloading_packed_experts 0.01% : 0.000017s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000018s : 1: opt.transform.mutable_eliminate 3.67% : 0.004836s : 117: opt.transform.opt_a 0.04% : 0.000048s : 1: opt.transform.opt_after_cconv 0.03% : 0.000034s : 1: opt.transform.opt_after_jit_grad 0.17% : 0.000226s : 28: opt.transform.opt_b 0.06% : 0.000076s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000054s : 4: opt.transform.symbol_engine_opt 8.30% : 0.010928s : 1: opt_a 0.11% : 0.000141s : 1: opt_after_cconv 0.37% : 0.000485s : 1: opt_after_jit_grad 0.26% : 0.000348s : 1: opt_b 10.05% : 0.013236s : 1: optimize 0.02% : 0.000024s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000010s : 1: order_py_execute_after_rewriter 0.02% : 0.000028s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000008s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000009s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.04% : 0.000050s : 1: pre_auto_parallel 0.03% : 0.000039s : 1: py_interpret_to_execute 0.01% : 0.000014s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000033s : 1: remove_dup_value 1.21% : 0.001599s : 2: renormalize.infer 1.08% : 0.001426s : 2: renormalize.specialize 0.00% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000050s : 1: rewriter_after_opt_a 0.10% : 0.000132s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.00% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000011s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000103s : 1: symbol_engine_optimizer 60.10% : 0.079173s : 1: task_emit 0.08% : 0.000106s : 1: 
tuple_transform 7.85% : 0.010345s : 1: type_inference 0.05% : 0.000070s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x3-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x3-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x4-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x4-pynative],max_mem:68.0M TotalTime = 0.0216345, [24] [bootstrap]: 0.0005962 [type_inference]: 0.00632578 [event_method]: 1.469e-05 [auto_monad]: 5.875e-05 [graph_reusing]: 5.04e-06 [inline]: 1.89e-06 [add_attr]: 0.00338166, [1] [add_attr_with_inline]: 0.0033708, [1] [Cycle 1]: 4.421e-05, [2] [tag_attr]: 1.503e-05 [meta_addattr_fg_expand]: 4.09002e-06 [parallel-infer-symbol]: 2.62001e-06 [pre_auto_parallel]: 2.832e-05 [insert-virtual-dataset]: 2.66e-06 [parallel-infer-symbol-second]: 8.80013e-07 [dataset_repeat_opt]: 2.24001e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00403905, [53] [py_interpret_to_execute]: 2.015e-05 [rewriter_before_opt_a]: 5.955e-05 [opt_a]: 0.00218051, [2] [Cycle 1]: 0.00157531, [45] [expand_dump_flag]: 2.86999e-06 [switch_simplify]: 6.851e-05 [loop_unroll]: 2.177e-05 [a_1]: 0.00045898 [with_stream_mark]: 1.42e-05 [recompute_prepare]: 7.50998e-06 [updatestate_depend_eliminate]: 4.18999e-06 [updatestate_assign_eliminate]: 3.45998e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 2.06e-06 [a_2]: 7.592e-05 [accelerated_algorithm]: 6.47001e-06 [shard]: 1.89e-06 [meta_shard_fg_expand]: 1.64e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 8.22998e-06 [auto_parallel]: 6.41e-06 [parallel]: 2.337e-05 [flash_sp]: 7.56001e-06 [merge_comm]: 3.76999e-06 [allreduce_fusion]: 3.5e-06 [matmul_add_comm_reduction]: 8.75999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 6.01e-06 
[get_grad_eliminate_]: 5.44e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.55998e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.32999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 [merge_recompute_call_nodes]: 1.69e-06 [before_grad]: 9.27999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.33e-06 [meta_fg_expand]: 2.32001e-06 [flash_sp_send_recv_attached]: 2.81e-06 [receive_attached]: 2.26e-06 [after_resolve]: 1.048e-05 [a_after_grad]: 8.90001e-06 [renormalize]: 0.00043142 [add_forward_monad_depend]: 4.43001e-06 [auto_monad_grad]: 1.71e-06 [auto_monad_eliminator]: 1.364e-05 [cse]: 2.723e-05 [a_3]: 4.048e-05 [Cycle 2]: 0.00059582, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.85002e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.0001264 [with_stream_mark]: 1.003e-05 [recompute_prepare]: 5.51e-06 [updatestate_depend_eliminate]: 2.73998e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.46998e-06 [parameter_eliminate]: 8.2e-07 [a_2]: 6.758e-05 [accelerated_algorithm]: 5.54e-06 [shard]: 1.11997e-06 [meta_shard_fg_expand]: 1.24e-06 [shard_inline]: 5.83997e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.44e-06 [parallel]: 4.29997e-06 [flash_sp]: 3.01999e-06 [merge_comm]: 3.03998e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.20999e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 5.91e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 4.97999e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.48e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.27999e-06 [merge_recompute_call_nodes]: 7.50006e-07 [before_grad]: 7.80998e-06 [set_forward_comm_id_for_comm_node_pass]: 2.85998e-06 [meta_fg_expand]: 1.81e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.05001e-06 [after_resolve]: 9.36e-06 [a_after_grad]: 8.20999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 
1.06002e-06 [auto_monad_grad]: 9.30013e-07 [auto_monad_eliminator]: 6.23002e-06 [cse]: 1.391e-05 [a_3]: 3.144e-05 [py_interpret_to_execute_after_opt_a]: 7.75998e-06 [slice_cell_reuse_recomputed_activation]: 1.64e-06 [rewriter_after_opt_a]: 2.991e-05 [convert_after_rewriter]: 7.1e-06 [order_py_execute_after_rewriter]: 5.62001e-06 [mutable_eliminate]: 0.00045309 [opt_b]: 0.00018485, [1] [Cycle 1]: 0.00017885, [7] [b_1]: 0.00010879 [b_2]: 7.3e-06 [updatestate_depend_eliminate]: 5.42999e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.52001e-06 [renormalize]: 3.30008e-07 [cse]: 1.731e-05 [optimize_parallel_all_gather_comm]: 1.661e-05 [overlap_param_gather]: 2.13998e-06 [cconv]: 2.157e-05 [loop_unroll]: 0.0004223 [opt_after_cconv]: 9.713e-05, [1] [Cycle 1]: 9.153e-05, [7] [c_1]: 2.819e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.726e-05 [renormalize]: 4.99975e-07 [remove_dup_value]: 1.282e-05 [tuple_transform]: 6.94e-05, [1] [Cycle 1]: 6.51e-05, [4] [d_1]: 3.95e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.79978e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.831e-05 [cse_after_recomputation]: 2.089e-05, [1] [Cycle 1]: 1.646e-05, [1] [cse]: 1.118e-05 [environ_conv]: 5.20001e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 2.41e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.99001e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.10002e-06 [micro_interleaved_order_control]: 2.22001e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 1.03001e-06 [remove_cast_before_assign_add]: 1.44e-06 [full_micro_interleaved_order_control]: 2.04e-06 [reorder_send_recv_between_fp_bp]: 2.85998e-06 [comm_op_add_attrs]: 9.79984e-07 [add_comm_op_reuse_tag]: 9.80013e-07 
[interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.12e-06 [overlap_opt_shard_grad_in_pipeline]: 1.76e-06 [control_data_broadcast_order]: 1.142e-05 [grouped_pairwise_exchange_alltoall]: 1.42e-06 [offloading_packed_experts]: 3.40003e-06 [overlap_recompute_and_grad_model_parallel]: 4.54002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.25001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.31002e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.678e-05 [begin_end_overlap_inline]: 7.89994e-07 [split_matmul_comm_elemetwise]: 1.86e-06 [split_layernorm_comm]: 1.59e-06 [handle_group_info]: 9.90025e-07 [symbol_engine_optimizer]: 6.798e-05, [1] [Cycle 1]: 6.391e-05, [6] [build]: 2.06e-06 [elim_shapecalc]: 8.57998e-06 [elim_not_effective]: 1.192e-05 [opt_reshape]: 6.01998e-06 [fold_const_symbol]: 8.67e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.84e-06 [pipeline_parallel_scheduler]: 1.83002e-06 [auto_monad_reorder]: 1.672e-05 [get_jit_bprop_graph]: 1.09003e-06 [rewriter_after_jit_bprop_graph]: 3.57002e-06 [opt_after_jit_grad]: 0.00047203 [validate]: 3.208e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00642945 [execute]: 6.56e-06 Sums bootstrap : 0.000596s : 3.45% type_inference : 0.006326s : 36.62% event_method : 0.000015s : 0.09% auto_monad : 0.000059s : 0.34% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 0.000060s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s 
: 0.02% optimize.opt_a.switch_simplify : 0.000075s : 0.44% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000585s : 3.39% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.83% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000028s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.06% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.03% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 
0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000432s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000041s : 0.24% optimize.opt_a.a_3 : 0.000072s : 0.42% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.17% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000453s : 2.62% optimize.opt_b.b_1 : 0.000109s : 0.63% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000003s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000017s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.12% optimize.loop_unroll : 0.000422s : 2.44% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 
0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000003s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% 
optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000017s : 0.10% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000472s : 2.73% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006429s : 37.22% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000168 30 14.10% : 0.000024s : 5: substitution.arithmetic_simplify 1.28% : 0.000002s : 2: substitution.elim_not_effective 0.75% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 67.87% : 0.000114s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.52% : 0.000004s : 4: substitution.remove_not_recompute_node 2.33% : 0.000004s : 4: substitution.replace_old_param 6.37% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006278 2 89.98% : 0.005649s : 1: type_inference.infer 10.02% : 0.000629s : 1: type_inference.specialize ------[replace.] 0.000039 5 71.15% : 0.000028s : 3: replace.inline 28.85% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000122 5 92.10% : 0.000112s : 3: match.inline 7.90% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.79% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.80% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.29% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.70% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.03% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.34% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.39% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.34% : 0.000004s : 16: predicate.float_depend_g_call 0.61% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.69% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000009s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.38% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.75% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.37% : 0.000004s : 32: predicate.load_eliminater 1.16% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.27% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.75% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.18% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.46% : 0.000001s : 4: predicate.parallel_virtual_node 1.60% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.82% : 0.000001s : 11: predicate.print_const_string_wrapper 0.57% : 0.000001s : 8: predicate.reduce_all_const_elim 1.06% : 0.000002s : 11: predicate.reduce_eliminate 2.44% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.48% : 0.000001s : 8: predicate.remove_not_recompute_node 1.32% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.76% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.78% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.93% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.34% : 0.000002s : 16: predicate.switch_defer_inline 1.92% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.81% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.60% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.62% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.45% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.39% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.16% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.37% : 0.000001s : 4: predicate.value_based_eliminate 0.89% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.77% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000414 8 43.57% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 56.43% : 0.000234s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030627 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.06% : 0.003386s : 1: add_attr 11.02% : 0.003374s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000064s : 1: auto_monad 0.07% : 0.000020s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.07% : 0.000635s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.41% : 0.000431s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.51% : 0.000462s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.22% : 0.000987s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.14% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.13% : 0.002183s : 1: opt_a 0.33% : 0.000100s : 1: opt_after_cconv 1.57% : 0.000482s : 1: opt_after_jit_grad 0.62% : 0.000188s : 1: opt_b 13.20% : 0.004043s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000009s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000217s : 1: renormalize.infer 0.68% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000064s : 1: rewriter_before_opt_a 0.01% : 0.000004s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 21.03% : 0.006441s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.70% : 0.006339s : 1: type_inference 0.21% : 0.000064s : 1: validate TotalTime = 0.0182497, [24] [bootstrap]: 0.00046804 [type_inference]: 0.0043571 [event_method]: 1.08e-05 [auto_monad]: 5.039e-05 [graph_reusing]: 4.71002e-06 [inline]: 2.02999e-06 [add_attr]: 0.00299912, [1] [add_attr_with_inline]: 0.00299113, [1] [Cycle 1]: 4.585e-05, [2] [tag_attr]: 1.181e-05 [meta_addattr_fg_expand]: 3.71001e-06 [parallel-infer-symbol]: 2.96999e-06 [pre_auto_parallel]: 2.148e-05 [insert-virtual-dataset]: 2.48e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.87999e-06 [pipeline_split]: 1.48002e-06 [optimize]: 0.00370864, [53] [py_interpret_to_execute]: 1.543e-05 [rewriter_before_opt_a]: 3.9e-05 [opt_a]: 0.00189814, [2] [Cycle 1]: 0.00125575, [45] [expand_dump_flag]: 2.34001e-06 [switch_simplify]: 2.454e-05 [loop_unroll]: 1.383e-05 [a_1]: 0.00029007 [with_stream_mark]: 1.366e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.51001e-06 [updatestate_assign_eliminate]: 3.11999e-06 [updatestate_loads_eliminate]: 3.33998e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.621e-05 [accelerated_algorithm]: 6.02999e-06 [shard]: 2.44001e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 5.87001e-06 [merge_send_recv]: 7.70998e-06 [auto_parallel]: 5.67001e-06 [parallel]: 1.818e-05 [flash_sp]: 7.50998e-06 [merge_comm]: 3.53999e-06 [allreduce_fusion]: 3.01999e-06 [matmul_add_comm_reduction]: 8.95999e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.06999e-06 [virtual_dataset]: 5.62999e-06 [get_grad_eliminate_]: 5.49998e-06 [virtual_output]: 5.43002e-06 [merge_forward]: 3.54002e-06 [cell_reuse_recompute_pass]: 1.14998e-06 [offload_activation]: 9.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.105e-05 [merge_recompute_call_nodes]: 1.36998e-06 [before_grad]: 9.20999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.29001e-06 [meta_fg_expand]: 2.26e-06 [flash_sp_send_recv_attached]: 3.11999e-06 
[receive_attached]: 2.66e-06 [after_resolve]: 1.006e-05 [a_after_grad]: 8.64e-06 [renormalize]: 0.00034765 [add_forward_monad_depend]: 4.68001e-06 [auto_monad_grad]: 1.67001e-06 [auto_monad_eliminator]: 1.308e-05 [cse]: 2.619e-05 [a_3]: 4.014e-05 [Cycle 2]: 0.00063333, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.62002e-06 [loop_unroll]: 5.43002e-06 [a_1]: 0.00016325 [with_stream_mark]: 9.97999e-06 [recompute_prepare]: 5.74e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.28998e-06 [updatestate_loads_eliminate]: 2.59999e-06 [parameter_eliminate]: 7.50006e-07 [a_2]: 6.823e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.18001e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.43001e-06 [auto_parallel]: 5.30999e-06 [parallel]: 4.28999e-06 [flash_sp]: 3.42002e-06 [merge_comm]: 2.89001e-06 [allreduce_fusion]: 2.64001e-06 [matmul_add_comm_reduction]: 5.42001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 5.98002e-06 [virtual_dataset]: 5.06002e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 4.94e-06 [merge_forward]: 2.51998e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.22001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.51e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.93001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 7.50006e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 9.09998e-06 [a_after_grad]: 8.27e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.80002e-06 [cse]: 1.34e-05 [a_3]: 3.173e-05 [py_interpret_to_execute_after_opt_a]: 7.52998e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.21e-05 [convert_after_rewriter]: 7.01001e-06 [order_py_execute_after_rewriter]: 5.12e-06 [mutable_eliminate]: 0.00044777 [opt_b]: 
0.00018357, [1] [Cycle 1]: 0.00017759, [7] [b_1]: 0.0001095 [b_2]: 7.36999e-06 [updatestate_depend_eliminate]: 5.57999e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.10014e-07 [cse]: 1.603e-05 [optimize_parallel_all_gather_comm]: 1.591e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.195e-05 [loop_unroll]: 0.00041687 [opt_after_cconv]: 9.544e-05, [1] [Cycle 1]: 8.991e-05, [7] [c_1]: 2.825e-05 [parameter_eliminate]: 2.37001e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.30002e-06 [cse]: 1.642e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.204e-05 [tuple_transform]: 6.885e-05, [1] [Cycle 1]: 6.448e-05, [4] [d_1]: 3.882e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.34001e-06 [partial_unused_args_eliminate]: 1.59e-06 [add_recomputation]: 4.207e-05 [cse_after_recomputation]: 2.039e-05, [1] [Cycle 1]: 1.614e-05, [1] [cse]: 1.081e-05 [environ_conv]: 4.43999e-06 [swap_dp_allreduce_reducescatter]: 5.37999e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 3.95e-06 [label_fine_grained_interleaved_index]: 2.71e-06 [merge_cast_opt]: 1.26002e-06 [slice_recompute_activation]: 2.45002e-06 [micro_interleaved_order_control]: 2.39999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.08001e-06 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.67001e-06 [comm_op_add_attrs]: 9.20001e-07 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 9.99979e-07 [overlap_opt_shard_in_pipeline]: 1.37e-06 [overlap_opt_shard_grad_in_pipeline]: 1.59e-06 [control_data_broadcast_order]: 1.202e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.65998e-06 [overlap_recompute_and_grad_model_parallel]: 4.29002e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.70001e-06 [overlap_recompute_comm]: 2.26e-06 [overlap_grad_ring_attention]: 3.81999e-06 [overlap_grad_flash_sp]: 1.637e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.98997e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.892e-05, [1] [Cycle 1]: 6.485e-05, [6] [build]: 2.08002e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.171e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 9.10999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.83997e-06 [pipeline_parallel_scheduler]: 1.58002e-06 [auto_monad_reorder]: 1.541e-05 [get_jit_bprop_graph]: 9.70002e-07 [rewriter_after_jit_bprop_graph]: 3.50003e-06 [opt_after_jit_grad]: 0.00044816 [validate]: 2.964e-05 [backend_pass]: 8.59989e-07 [task_emit]: 0.00591267 [execute]: 6.88e-06 Sums bootstrap : 0.000468s : 3.28% type_inference : 0.004357s : 30.49% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.13% optimize.opt_a.a_1 : 0.000453s : 3.17% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 
0.000005s : 0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.01% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000004s : 0.03% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000006s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.13% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000348s : 2.43% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000040s : 0.28% optimize.opt_a.a_3 : 0.000072s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000448s : 3.13% optimize.opt_b.b_1 : 0.000110s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000417s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.11% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000448s : 3.14% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005913s : 41.38% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000119 26 18.44% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.72% : 0.000006s : 4: substitution.graph_param_transform 65.46% : 0.000078s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.58% : 0.000004s : 4: substitution.remove_not_recompute_node 3.06% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004316 2 91.93% : 0.003967s : 1: type_inference.infer 8.07% : 0.000348s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 1.08% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.73% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.34% : 0.000003s : 17: predicate.arithmetic_simplify 0.81% : 0.000001s : 9: predicate.cast_eliminate 0.81% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.12% : 0.000002s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.31% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.55% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.08% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.70% : 0.000001s : 8: predicate.float_environ_get_switch 1.02% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.29% : 0.000000s : 4: predicate.fold_const_symbol 0.88% : 0.000001s : 8: predicate.get_grad_eliminate 0.31% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.64% : 0.000001s : 8: predicate.incorporate_call_switch 5.94% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.93% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.27% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.71% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.88% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.39% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.48% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.80% : 0.000001s : 9: predicate.print_const_string_wrapper 0.74% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.16% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.74% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.79% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.78% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 1.01% : 0.000001s : 8: predicate.same_eliminate 0.64% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.68% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.79% : 0.000001s : 9: predicate.transpose_eliminate 1.59% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.54% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.35% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.14% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.15% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.92% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000239 6 41.50% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 58.50% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026268 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.43% : 0.003003s : 1: add_attr 11.40% : 0.002995s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000504s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000426s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.74% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 3.06% : 0.000803s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000092s : 28: opt.transform.opt_b 0.16% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.24% : 0.001901s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.74% : 0.000458s : 1: opt_after_jit_grad 0.71% : 0.000187s : 1: opt_b 14.13% : 0.003712s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.02% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000020s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000190s : 1: renormalize.infer 0.58% : 0.000151s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.14% : 0.000036s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000072s : 1: symbol_engine_optimizer 22.55% : 0.005923s : 1: task_emit 0.27% : 0.000072s : 1: 
tuple_transform 16.64% : 0.004372s : 1: type_inference 0.21% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x4-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x4-kbk],max_mem:68.0M . TotalTime = 0.0791535, [24] [bootstrap]: 0.00058964 [type_inference]: 0.00620842 [event_method]: 1.438e-05 [auto_monad]: 5.85e-05 [graph_reusing]: 5.39e-06 [inline]: 1.76e-06 [add_attr]: 0.00343614, [1] [add_attr_with_inline]: 0.00342459, [1] [Cycle 1]: 4.282e-05, [2] [tag_attr]: 1.475e-05 [meta_addattr_fg_expand]: 3.97998e-06 [parallel-infer-symbol]: 2.71e-06 [pre_auto_parallel]: 2.727e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.74998e-06 [pipeline_split]: 1.54998e-06 [optimize]: 0.00398618, [53] [py_interpret_to_execute]: 1.926e-05 [rewriter_before_opt_a]: 5.91e-05 [opt_a]: 0.00213429, [2] [Cycle 1]: 0.00152916, [45] [expand_dump_flag]: 2.91999e-06 [switch_simplify]: 3.143e-05 [loop_unroll]: 2.097e-05 [a_1]: 0.00045537 [with_stream_mark]: 1.317e-05 [recompute_prepare]: 8.23001e-06 [updatestate_depend_eliminate]: 3.85e-06 [updatestate_assign_eliminate]: 3.15002e-06 [updatestate_loads_eliminate]: 2.77002e-06 [parameter_eliminate]: 2.08002e-06 [a_2]: 7.566e-05 [accelerated_algorithm]: 6.38e-06 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.82001e-06 [merge_send_recv]: 8.22e-06 [auto_parallel]: 5.91e-06 [parallel]: 2.275e-05 [flash_sp]: 6.71999e-06 [merge_comm]: 3.68e-06 [allreduce_fusion]: 3.56001e-06 [matmul_add_comm_reduction]: 8.29002e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.34002e-06 [virtual_dataset]: 5.96e-06 [get_grad_eliminate_]: 5.48002e-06 [virtual_output]: 5.64e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 8.89e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.059e-05 
[merge_recompute_call_nodes]: 1.38002e-06 [before_grad]: 9.12999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.46001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.53e-06 [receive_attached]: 2.48998e-06 [after_resolve]: 1.004e-05 [a_after_grad]: 8.74998e-06 [renormalize]: 0.00043073 [add_forward_monad_depend]: 4.99e-06 [auto_monad_grad]: 1.84e-06 [auto_monad_eliminator]: 1.368e-05 [cse]: 2.714e-05 [a_3]: 4.108e-05 [Cycle 2]: 0.00059579, [45] [expand_dump_flag]: 8.99978e-07 [switch_simplify]: 6.81999e-06 [loop_unroll]: 5.68997e-06 [a_1]: 0.00012595 [with_stream_mark]: 9.57001e-06 [recompute_prepare]: 5.86e-06 [updatestate_depend_eliminate]: 2.87002e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.78998e-06 [parameter_eliminate]: 7.60017e-07 [a_2]: 6.867e-05 [accelerated_algorithm]: 5.82001e-06 [shard]: 1.16002e-06 [meta_shard_fg_expand]: 1.21997e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 4.47e-06 [auto_parallel]: 5.14e-06 [parallel]: 4.43999e-06 [flash_sp]: 3.11001e-06 [merge_comm]: 2.96001e-06 [allreduce_fusion]: 2.63e-06 [matmul_add_comm_reduction]: 5.12e-06 [allreduce_slice_to_reducescatter]: 3.60014e-07 [virtual_shard_identity]: 6.11e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 4.94e-06 [virtual_output]: 4.92e-06 [merge_forward]: 2.67001e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 5.97999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.80998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 1.64e-06 [flash_sp_send_recv_attached]: 8.29983e-07 [receive_attached]: 1.00001e-06 [after_resolve]: 9.43002e-06 [a_after_grad]: 7.96001e-06 [renormalize]: 1.10012e-07 [add_forward_monad_depend]: 1.07998e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.69001e-06 [cse]: 1.765e-05 [a_3]: 3.205e-05 [py_interpret_to_execute_after_opt_a]: 7.65e-06 
[slice_cell_reuse_recomputed_activation]: 1.76998e-06 [rewriter_after_opt_a]: 3.048e-05 [convert_after_rewriter]: 6.67002e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00045157 [opt_b]: 0.00018725, [1] [Cycle 1]: 0.00018102, [7] [b_1]: 0.00011082 [b_2]: 7.55e-06 [updatestate_depend_eliminate]: 5.64e-06 [updatestate_assign_eliminate]: 2.59999e-06 [updatestate_loads_eliminate]: 2.12999e-06 [renormalize]: 5.00004e-07 [cse]: 1.678e-05 [optimize_parallel_all_gather_comm]: 1.586e-05 [overlap_param_gather]: 2.09999e-06 [cconv]: 2.05e-05 [loop_unroll]: 0.00041823 [opt_after_cconv]: 9.667e-05, [1] [Cycle 1]: 9.095e-05, [7] [c_1]: 2.843e-05 [parameter_eliminate]: 2.26e-06 [updatestate_depend_eliminate]: 5.39998e-06 [updatestate_assign_eliminate]: 2.57001e-06 [updatestate_loads_eliminate]: 2.29001e-06 [cse]: 1.657e-05 [renormalize]: 3.89991e-07 [remove_dup_value]: 1.232e-05 [tuple_transform]: 6.901e-05, [1] [Cycle 1]: 6.455e-05, [4] [d_1]: 3.841e-05 [none_parameter_eliminate]: 1.59998e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.17001e-06 [partial_unused_args_eliminate]: 1.87999e-06 [add_recomputation]: 4.798e-05 [cse_after_recomputation]: 2.069e-05, [1] [Cycle 1]: 1.595e-05, [1] [cse]: 1.081e-05 [environ_conv]: 4.64998e-06 [swap_dp_allreduce_reducescatter]: 4.99e-06 [bias_add_comm_swap]: 2.53003e-06 [label_micro_interleaved_index]: 4.1e-06 [label_fine_grained_interleaved_index]: 2.85998e-06 [merge_cast_opt]: 1.37999e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.35002e-06 [assign_add_opt]: 1.25999e-06 [ForceFp32Comm]: 8.70001e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 1.97999e-06 [reorder_send_recv_between_fp_bp]: 2.53e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.29984e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.19e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.57001e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.47001e-06 [offloading_packed_experts]: 3.76001e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29998e-06 [overlap_recompute_comm]: 1.96e-06 [overlap_grad_ring_attention]: 4.11001e-06 [overlap_grad_flash_sp]: 1.644e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 1.84998e-06 [split_layernorm_comm]: 1.69e-06 [handle_group_info]: 1.37e-06 [symbol_engine_optimizer]: 6.994e-05, [1] [Cycle 1]: 6.577e-05, [6] [build]: 2.14e-06 [elim_shapecalc]: 8.90001e-06 [elim_not_effective]: 1.114e-05 [opt_reshape]: 6.37001e-06 [fold_const_symbol]: 9.81e-06 [renormalize]: 2.20025e-07 [detach_backward]: 1.45001e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.637e-05 [get_jit_bprop_graph]: 1.07e-06 [rewriter_after_jit_bprop_graph]: 3.71999e-06 [opt_after_jit_grad]: 0.00045511 [validate]: 3.085e-05 [backend_pass]: 8.90024e-07 [task_emit]: 0.0640941 [execute]: 7.9e-06 Sums bootstrap : 0.000590s : 0.79% type_inference : 0.006208s : 8.31% event_method : 0.000014s : 0.02% auto_monad : 0.000059s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000581s : 0.78% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000013s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000027s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000431s : 0.58% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000045s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.60% optimize.opt_b.b_1 : 0.000111s : 0.15% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000418s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000038s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000048s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000455s : 0.61% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.064094s : 85.75% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.63% : 0.000024s : 5: substitution.arithmetic_simplify 1.01% : 0.000002s : 2: substitution.elim_not_effective 0.80% : 0.000001s : 2: substitution.fold_const_symbol 3.14% : 0.000005s : 4: substitution.graph_param_transform 67.02% : 0.000110s : 3: substitution.inline 1.64% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.51% : 0.000004s : 4: substitution.replace_old_param 6.72% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006166 2 90.72% : 0.005594s : 1: type_inference.infer 9.28% : 0.000572s : 1: type_inference.specialize ------[replace.] 0.000038 5 67.93% : 0.000026s : 3: replace.inline 32.07% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.54% : 0.000108s : 3: match.inline 8.46% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.89% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.21% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.11% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.91% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.26% : 0.000000s : 4: predicate.elim_not_effective 0.37% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.11% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.93% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.63% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.16% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.47% : 0.000004s : 32: predicate.load_eliminater 1.12% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.18% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.66% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.86% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.39% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.59% : 0.000003s : 16: predicate.partial_defer_inline 1.43% : 0.000002s : 17: predicate.partial_eliminate 0.91% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.46% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.48% : 0.000002s : 21: predicate.replace_applicator 0.62% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.63% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.89% : 0.000001s : 8: predicate.shard_identity_eliminate 0.72% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.40% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.95% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.89% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.31% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.41% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.30% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.18% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.32% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000380 8 44.03% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.97% : 0.000213s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.088107 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.90% : 0.003440s : 1: add_attr 3.89% : 0.003428s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000064s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.71% : 0.000626s : 1: bootstrap 0.03% : 0.000024s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000013s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.48% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.52% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.08% : 0.000948s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000092s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.43% : 0.002137s : 1: opt_a 0.11% : 0.000100s : 1: opt_after_cconv 0.53% : 0.000465s : 1: opt_after_jit_grad 0.22% : 0.000191s : 1: opt_b 4.53% : 0.003990s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000213s : 1: renormalize.infer 0.24% : 0.000211s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000073s : 1: symbol_engine_optimizer 72.76% : 0.064110s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 7.06% : 0.006222s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.0705961, [24] [bootstrap]: 0.0004724 [type_inference]: 0.00439846 [event_method]: 1.099e-05 [auto_monad]: 5.129e-05 [graph_reusing]: 5.17999e-06 [inline]: 1.71e-06 [add_attr]: 0.00298873, [1] [add_attr_with_inline]: 0.00298072, [1] [Cycle 1]: 4.628e-05, [2] [tag_attr]: 1.138e-05 [meta_addattr_fg_expand]: 3.36999e-06 [parallel-infer-symbol]: 2.89001e-06 [pre_auto_parallel]: 2.058e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.2e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00372198, [53] [py_interpret_to_execute]: 1.506e-05 [rewriter_before_opt_a]: 3.999e-05 [opt_a]: 0.00187688, [2] [Cycle 1]: 0.00126838, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.436e-05 [loop_unroll]: 1.396e-05 [a_1]: 0.00029376 [with_stream_mark]: 1.314e-05 [recompute_prepare]: 7.13e-06 [updatestate_depend_eliminate]: 3.55e-06 [updatestate_assign_eliminate]: 3.18998e-06 [updatestate_loads_eliminate]: 3.10002e-06 [parameter_eliminate]: 1.84e-06 [a_2]: 7.598e-05 [accelerated_algorithm]: 6.73e-06 [shard]: 2.24001e-06 [meta_shard_fg_expand]: 1.50001e-06 [shard_inline]: 5.91998e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 5.49e-06 [parallel]: 1.78e-05 [flash_sp]: 7.55998e-06 [merge_comm]: 3.32002e-06 [allreduce_fusion]: 3.48e-06 [matmul_add_comm_reduction]: 8.48001e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 6.74999e-06 [virtual_dataset]: 5.67999e-06 [get_grad_eliminate_]: 5.75001e-06 [virtual_output]: 5.52999e-06 [merge_forward]: 3.46999e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 9.00999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.07e-05 [merge_recompute_call_nodes]: 1.30001e-06 [before_grad]: 9.31e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41001e-06 [meta_fg_expand]: 2.24001e-06 [flash_sp_send_recv_attached]: 2.34001e-06 [receive_attached]: 2.76e-06 
[after_resolve]: 1.056e-05 [a_after_grad]: 9.07999e-06 [renormalize]: 0.00035623 [add_forward_monad_depend]: 4.34002e-06 [auto_monad_grad]: 1.68997e-06 [auto_monad_eliminator]: 1.398e-05 [cse]: 2.715e-05 [a_3]: 4.059e-05 [Cycle 2]: 0.00059935, [45] [expand_dump_flag]: 8.50006e-07 [switch_simplify]: 6.94001e-06 [loop_unroll]: 5.41002e-06 [a_1]: 0.00012634 [with_stream_mark]: 1.078e-05 [recompute_prepare]: 5.72001e-06 [updatestate_depend_eliminate]: 2.89001e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.54001e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.786e-05 [accelerated_algorithm]: 5.74e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.20001e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.24002e-06 [auto_parallel]: 5.39e-06 [parallel]: 4.35999e-06 [flash_sp]: 3.02002e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 2.80002e-06 [matmul_add_comm_reduction]: 4.97e-06 [allreduce_slice_to_reducescatter]: 2.80008e-07 [virtual_shard_identity]: 6.35002e-06 [virtual_dataset]: 5.20999e-06 [get_grad_eliminate_]: 5.08002e-06 [virtual_output]: 5.02999e-06 [merge_forward]: 2.59999e-06 [cell_reuse_recompute_pass]: 1.29998e-06 [offload_activation]: 6.09001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.71e-06 [merge_recompute_call_nodes]: 7.30011e-07 [before_grad]: 8.17e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00002e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 9.00007e-07 [receive_attached]: 1.13001e-06 [after_resolve]: 9.75002e-06 [a_after_grad]: 8.43999e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.15999e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.11e-06 [cse]: 1.287e-05 [a_3]: 3.229e-05 [py_interpret_to_execute_after_opt_a]: 7.45998e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.064e-05 [convert_after_rewriter]: 7.31999e-06 [order_py_execute_after_rewriter]: 4.94e-06 [mutable_eliminate]: 0.00044914 [opt_b]: 0.00018367, [1] [Cycle 1]: 
0.00017767, [7] [b_1]: 0.00010837 [b_2]: 7.34002e-06 [updatestate_depend_eliminate]: 5.71003e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.17999e-06 [renormalize]: 4.69998e-07 [cse]: 1.644e-05 [optimize_parallel_all_gather_comm]: 1.524e-05 [overlap_param_gather]: 1.96e-06 [cconv]: 2.213e-05 [loop_unroll]: 0.00042 [opt_after_cconv]: 0.000122, [1] [Cycle 1]: 9.147e-05, [7] [c_1]: 2.935e-05 [parameter_eliminate]: 2.14e-06 [updatestate_depend_eliminate]: 5.20001e-06 [updatestate_assign_eliminate]: 2.59001e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.622e-05 [renormalize]: 1.80007e-07 [remove_dup_value]: 1.296e-05 [tuple_transform]: 6.89e-05, [1] [Cycle 1]: 6.481e-05, [4] [d_1]: 3.924e-05 [none_parameter_eliminate]: 1.47999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.19001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.59e-05 [cse_after_recomputation]: 2.072e-05, [1] [Cycle 1]: 1.654e-05, [1] [cse]: 1.125e-05 [environ_conv]: 4.23001e-06 [swap_dp_allreduce_reducescatter]: 5.35001e-06 [bias_add_comm_swap]: 2.61999e-06 [label_micro_interleaved_index]: 4.14997e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.29998e-06 [slice_recompute_activation]: 2.25002e-06 [micro_interleaved_order_control]: 3.13e-06 [assign_add_opt]: 1.20001e-06 [ForceFp32Comm]: 7.79983e-07 [remove_cast_before_assign_add]: 1.02e-06 [full_micro_interleaved_order_control]: 2.39999e-06 [reorder_send_recv_between_fp_bp]: 2.58e-06 [comm_op_add_attrs]: 1.30001e-06 [add_comm_op_reuse_tag]: 9.89996e-07 [interleave_split_concat_branches]: 1.22e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 1.81e-06 [offloading_packed_experts]: 3.5e-06 [overlap_recompute_and_grad_model_parallel]: 4.45e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.10999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.64001e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.647e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 2.11e-06 [split_layernorm_comm]: 1.84998e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 6.802e-05, [1] [Cycle 1]: 6.403e-05, [6] [build]: 2.31998e-06 [elim_shapecalc]: 8.45001e-06 [elim_not_effective]: 1.126e-05 [opt_reshape]: 6.04001e-06 [fold_const_symbol]: 8.96998e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.86e-06 [pipeline_parallel_scheduler]: 1.35001e-06 [auto_monad_reorder]: 1.597e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.26001e-06 [opt_after_jit_grad]: 0.00045048 [validate]: 3.122e-05 [backend_pass]: 8.99978e-07 [task_emit]: 0.0582041 [execute]: 8.14002e-06 Sums bootstrap : 0.000472s : 0.71% type_inference : 0.004398s : 6.60% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000011s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000040s : 0.06% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000420s : 0.63% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000011s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.03% optimize.opt_a.renormalize : 0.000356s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.67% optimize.opt_b.b_1 : 0.000108s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000420s : 0.63% optimize.opt_after_cconv.c_1 : 0.000029s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.07% optimize.cse_after_recomputation.cse : 0.000011s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000450s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058204s : 87.37% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000120 26 18.59% : 0.000022s : 4: substitution.arithmetic_simplify 1.48% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.63% : 0.000006s : 4: substitution.graph_param_transform 65.21% : 0.000078s : 2: substitution.inline 2.30% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.57% : 0.000004s : 4: substitution.remove_not_recompute_node 3.17% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004359 2 91.76% : 0.004000s : 1: type_inference.infer 8.24% : 0.000359s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000139 984 0.83% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.33% : 0.000003s : 17: predicate.arithmetic_simplify 0.80% : 0.000001s : 9: predicate.cast_eliminate 0.83% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.67% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.49% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.15% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.98% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.93% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.03% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.31% : 0.000000s : 4: predicate.fold_const_symbol 1.00% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.82% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 1.05% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.27% : 0.000002s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.13% : 0.000003s : 26: predicate.load_eliminater 1.32% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.70% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.70% : 0.000001s : 9: predicate.minmaximum_grad 1.43% : 0.000002s : 4: predicate.mutable_eliminate 0.44% : 0.000001s : 4: predicate.opt_reshape 0.60% : 0.000001s : 4: predicate.parallel_virtual_node 1.24% : 0.000002s : 11: predicate.partial_defer_inline 1.21% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 0.91% : 0.000001s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.83% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.40% : 0.000001s : 4: predicate.reset_defer_inline 0.80% : 0.000001s : 9: predicate.reshape_eliminate 0.75% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.88% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.91% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.91% : 0.000001s : 8: predicate.specialize_transform 1.12% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.91% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.39% : 0.000006s : 41: predicate.switch_simplify 0.70% : 
0.000001s : 9: predicate.tile_eliminate 0.75% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.60% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.36% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.29% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.55% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.14% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.40% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000262 6 42.96% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.04% : 0.000150s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078595 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.81% : 0.002993s : 1: add_attr 3.80% : 0.002984s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000057s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.65% : 0.000509s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000024s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.55% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000006s : 1: micro_interleaved_order_control 0.58% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.98% : 0.000773s : 78: opt.transform.opt_a 0.04% : 0.000028s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000092s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001880s : 1: opt_a 0.16% : 0.000126s : 1: opt_after_cconv 0.59% : 0.000460s : 1: opt_after_jit_grad 0.24% : 0.000187s : 1: opt_b 4.74% : 0.003726s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.25% : 0.000193s : 1: renormalize.infer 0.20% : 0.000157s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000044s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.08% : 0.058219s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.61% : 0.004411s : 1: type_inference 0.07% : 0.000052s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x4-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x4-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x5-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x5-pynative],max_mem:68.0M TotalTime = 0.0212243, [24] [bootstrap]: 0.00058398 [type_inference]: 0.00622652 [event_method]: 1.433e-05 [auto_monad]: 6.151e-05 [graph_reusing]: 5.46998e-06 [inline]: 1.99999e-06 [add_attr]: 0.00337489, [1] [add_attr_with_inline]: 0.00336446, [1] [Cycle 1]: 4.403e-05, [2] [tag_attr]: 1.517e-05 [meta_addattr_fg_expand]: 4.58001e-06 [parallel-infer-symbol]: 2.73e-06 [pre_auto_parallel]: 2.793e-05 [insert-virtual-dataset]: 2.36e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 2.00002e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00402714, [53] [py_interpret_to_execute]: 2.125e-05 [rewriter_before_opt_a]: 5.764e-05 [opt_a]: 0.0021775, [2] [Cycle 1]: 0.0015697, [45] [expand_dump_flag]: 3.11999e-06 [switch_simplify]: 3.146e-05 [loop_unroll]: 2.127e-05 [a_1]: 0.00045636 [with_stream_mark]: 1.376e-05 [recompute_prepare]: 8.02e-06 [updatestate_depend_eliminate]: 3.5e-06 [updatestate_assign_eliminate]: 3.41001e-06 [updatestate_loads_eliminate]: 2.91999e-06 [parameter_eliminate]: 2.19001e-06 [a_2]: 7.546e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 1.84e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 6.05002e-06 [merge_send_recv]: 7.66999e-06 [auto_parallel]: 5.69e-06 [parallel]: 2.294e-05 [flash_sp]: 6.79999e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.3e-06 [matmul_add_comm_reduction]: 8.82e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.18e-06 [virtual_dataset]: 6.23e-06 
[get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.66e-06 [merge_forward]: 3.81999e-06 [cell_reuse_recompute_pass]: 1.20001e-06 [offload_activation]: 9.29e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.09e-05 [merge_recompute_call_nodes]: 1.37e-06 [before_grad]: 9.37999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.39001e-06 [meta_fg_expand]: 2.36e-06 [flash_sp_send_recv_attached]: 2.54999e-06 [receive_attached]: 3.06999e-06 [after_resolve]: 1.065e-05 [a_after_grad]: 8.57998e-06 [renormalize]: 0.00046611 [add_forward_monad_depend]: 4.95999e-06 [auto_monad_grad]: 1.92001e-06 [auto_monad_eliminator]: 1.265e-05 [cse]: 2.617e-05 [a_3]: 4.15e-05 [Cycle 2]: 0.00059827, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 7.08e-06 [loop_unroll]: 5.46998e-06 [a_1]: 0.00012631 [with_stream_mark]: 9.14e-06 [recompute_prepare]: 5.76998e-06 [updatestate_depend_eliminate]: 2.68998e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.42001e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.771e-05 [accelerated_algorithm]: 5.64998e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.14e-06 [shard_inline]: 5.36002e-06 [merge_send_recv]: 4.87e-06 [auto_parallel]: 5.20999e-06 [parallel]: 3.6e-06 [flash_sp]: 3.75998e-06 [merge_comm]: 3.13e-06 [allreduce_fusion]: 4.80001e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.06e-06 [virtual_dataset]: 5.24e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 4.95999e-06 [merge_forward]: 2.48002e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.09999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.56998e-06 [merge_recompute_call_nodes]: 6.89994e-07 [before_grad]: 7.8e-06 [set_forward_comm_id_for_comm_node_pass]: 3.00002e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.11997e-06 [after_resolve]: 1.004e-05 [a_after_grad]: 8.45001e-06 [renormalize]: 8.00064e-08 
[add_forward_monad_depend]: 1.19998e-06 [auto_monad_grad]: 1.05999e-06 [auto_monad_eliminator]: 6.62002e-06 [cse]: 1.256e-05 [a_3]: 3.19e-05 [py_interpret_to_execute_after_opt_a]: 7.66001e-06 [slice_cell_reuse_recomputed_activation]: 1.98002e-06 [rewriter_after_opt_a]: 3.12e-05 [convert_after_rewriter]: 7.06001e-06 [order_py_execute_after_rewriter]: 5.07999e-06 [mutable_eliminate]: 0.00045097 [opt_b]: 0.00018171, [1] [Cycle 1]: 0.00017593, [7] [b_1]: 0.00010863 [b_2]: 7.38e-06 [updatestate_depend_eliminate]: 5.03002e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.38998e-06 [renormalize]: 3.69997e-07 [cse]: 1.605e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.72001e-06 [cconv]: 2.18e-05 [loop_unroll]: 0.00042056 [opt_after_cconv]: 9.502e-05, [1] [Cycle 1]: 8.943e-05, [7] [c_1]: 2.839e-05 [parameter_eliminate]: 2.19001e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.34001e-06 [cse]: 1.557e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.185e-05 [tuple_transform]: 7.086e-05, [1] [Cycle 1]: 6.668e-05, [4] [d_1]: 3.98e-05 [none_parameter_eliminate]: 1.89e-06 [renormalize]: 2.40019e-07 [switch_simplify]: 6.27001e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.865e-05 [cse_after_recomputation]: 2.019e-05, [1] [Cycle 1]: 1.583e-05, [1] [cse]: 1.05e-05 [environ_conv]: 5.12e-06 [swap_dp_allreduce_reducescatter]: 5.34998e-06 [bias_add_comm_swap]: 2.27001e-06 [label_micro_interleaved_index]: 4.30999e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.14998e-06 [slice_recompute_activation]: 2.26e-06 [micro_interleaved_order_control]: 2.74999e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.70002e-07 [full_micro_interleaved_order_control]: 2.16e-06 [reorder_send_recv_between_fp_bp]: 2.54001e-06 [comm_op_add_attrs]: 9.39996e-07 [add_comm_op_reuse_tag]: 
8.89995e-07 [interleave_split_concat_branches]: 1.10999e-06 [interleave_parallel_branches]: 1.37e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55001e-06 [control_data_broadcast_order]: 1.144e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.58999e-06 [overlap_recompute_and_grad_model_parallel]: 5.07e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12e-06 [overlap_recompute_allgather_and_fa_grad]: 1.47999e-06 [overlap_recompute_comm]: 2.48e-06 [overlap_grad_ring_attention]: 4e-06 [overlap_grad_flash_sp]: 1.623e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.27001e-06 [split_layernorm_comm]: 2.16998e-06 [handle_group_info]: 9.60019e-07 [symbol_engine_optimizer]: 6.933e-05, [1] [Cycle 1]: 6.529e-05, [6] [build]: 2.46998e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 6.45002e-06 [fold_const_symbol]: 9.00999e-06 [renormalize]: 2.00002e-07 [detach_backward]: 1.74e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.498e-05 [get_jit_bprop_graph]: 1.02e-06 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00045432 [validate]: 3.054e-05 [backend_pass]: 1.06002e-06 [task_emit]: 0.00617536 [execute]: 7.16999e-06 Sums bootstrap : 0.000584s : 3.46% type_inference : 0.006227s : 36.89% event_method : 0.000014s : 0.08% auto_monad : 0.000062s : 0.36% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000005s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000058s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000583s : 3.45% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000143s : 0.85% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.06% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000011s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000008s : 0.05% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000021s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000466s : 2.76% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000039s : 0.23% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.67% optimize.opt_b.b_1 : 0.000109s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000421s : 2.49% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.09% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000040s : 0.24% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000010s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000454s : 2.69% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006175s : 36.58% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000166 30 15.46% : 0.000026s : 5: substitution.arithmetic_simplify 1.24% : 0.000002s : 2: substitution.elim_not_effective 0.72% : 0.000001s : 2: substitution.fold_const_symbol 3.53% : 0.000006s : 4: substitution.graph_param_transform 65.74% : 0.000109s : 3: substitution.inline 1.58% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000005s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.58% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006183 2 90.94% : 0.005622s : 1: type_inference.infer 9.06% : 0.000560s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.76% : 0.000027s : 3: replace.inline 30.24% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.58% : 0.000107s : 3: match.inline 8.42% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.86% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.91% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.63% : 0.000001s : 8: predicate.depend_value_elim 0.88% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.78% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.09% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.42% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.33% : 0.000004s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.74% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.71% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000010s : 51: predicate.inline 0.81% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.03% : 0.000002s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.00% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.13% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.73% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.61% : 0.000001s : 8: predicate.merge_addn 0.64% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.00% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.55% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.86% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.40% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.71% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.77% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.90% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.94% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.78% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.36% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.11% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.06% : 0.000008s : 54: predicate.switch_simplify 0.83% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.56% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.26% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.55% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.67% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.27% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.61% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.36% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.63% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000372 8 48.70% : 0.000181s : 3: func_graph_cloner_run.FuncGraphClonerGraph 51.30% : 0.000191s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030201 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.19% : 0.003379s : 1: add_attr 11.15% : 0.003368s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.22% : 0.000067s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.06% : 0.000622s : 1: bootstrap 0.08% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.52% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.15% : 0.000951s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000091s : 28: opt.transform.opt_b 0.15% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.22% : 0.002180s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.54% : 0.000464s : 1: opt_after_jit_grad 0.61% : 0.000185s : 1: opt_b 13.35% : 0.004031s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.06% : 0.000019s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000033s : 1: pre_auto_parallel 0.08% : 0.000026s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.84% : 0.000253s : 1: renormalize.infer 0.68% : 0.000207s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000072s : 1: symbol_engine_optimizer 20.48% : 0.006185s : 1: task_emit 0.24% : 0.000074s : 1: 
tuple_transform 20.66% : 0.006240s : 1: type_inference 0.20% : 0.000060s : 1: validate TotalTime = 0.0181325, [24] [bootstrap]: 0.00046758 [type_inference]: 0.00434253 [event_method]: 1.019e-05 [auto_monad]: 5.096e-05 [graph_reusing]: 5.29e-06 [inline]: 2.05002e-06 [add_attr]: 0.00296651, [1] [add_attr_with_inline]: 0.00295831, [1] [Cycle 1]: 4.599e-05, [2] [tag_attr]: 1.207e-05 [meta_addattr_fg_expand]: 3.40998e-06 [parallel-infer-symbol]: 2.99001e-06 [pre_auto_parallel]: 2.109e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.09988e-07 [dataset_repeat_opt]: 1.96998e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00369882, [53] [py_interpret_to_execute]: 1.576e-05 [rewriter_before_opt_a]: 3.915e-05 [opt_a]: 0.00186816, [2] [Cycle 1]: 0.00126055, [45] [expand_dump_flag]: 2.53e-06 [switch_simplify]: 2.368e-05 [loop_unroll]: 1.372e-05 [a_1]: 0.00029349 [with_stream_mark]: 1.313e-05 [recompute_prepare]: 7.71001e-06 [updatestate_depend_eliminate]: 3.62002e-06 [updatestate_assign_eliminate]: 3.76999e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.66998e-06 [a_2]: 7.695e-05 [accelerated_algorithm]: 6.19999e-06 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 1.52999e-06 [shard_inline]: 5.71e-06 [merge_send_recv]: 7.8e-06 [auto_parallel]: 5.62999e-06 [parallel]: 1.752e-05 [flash_sp]: 7.41999e-06 [merge_comm]: 3.64002e-06 [allreduce_fusion]: 3.24001e-06 [matmul_add_comm_reduction]: 9.14e-06 [allreduce_slice_to_reducescatter]: 6.59988e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 5.71e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.53997e-06 [merge_forward]: 3.86999e-06 [cell_reuse_recompute_pass]: 1.12e-06 [offload_activation]: 9.03002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.64e-06 [before_grad]: 9.50001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.29001e-06 [flash_sp_send_recv_attached]: 2.20002e-06 [receive_attached]: 2.36e-06 
[after_resolve]: 1.056e-05 [a_after_grad]: 8.83001e-06 [renormalize]: 0.00034592 [add_forward_monad_depend]: 4.40999e-06 [auto_monad_grad]: 1.84998e-06 [auto_monad_eliminator]: 1.299e-05 [cse]: 2.87e-05 [a_3]: 4.107e-05 [Cycle 2]: 0.00059844, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.97002e-06 [loop_unroll]: 5.87999e-06 [a_1]: 0.00012501 [with_stream_mark]: 9.57999e-06 [recompute_prepare]: 5.79e-06 [updatestate_depend_eliminate]: 2.84001e-06 [updatestate_assign_eliminate]: 2.56e-06 [updatestate_loads_eliminate]: 2.14999e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.868e-05 [accelerated_algorithm]: 5.47999e-06 [shard]: 1.20001e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.59e-06 [merge_send_recv]: 4.30999e-06 [auto_parallel]: 5.19e-06 [parallel]: 4.08001e-06 [flash_sp]: 3.06999e-06 [merge_comm]: 3.09999e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 5.22999e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.47001e-06 [virtual_dataset]: 5.42001e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.97e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.17999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.80002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.03001e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89999e-06 [meta_fg_expand]: 1.64998e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 1.04998e-06 [after_resolve]: 9.75002e-06 [a_after_grad]: 8.45999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 1.12e-06 [auto_monad_grad]: 9.29984e-07 [auto_monad_eliminator]: 6.51e-06 [cse]: 1.298e-05 [a_3]: 3.212e-05 [py_interpret_to_execute_after_opt_a]: 7.16999e-06 [slice_cell_reuse_recomputed_activation]: 1.78002e-06 [rewriter_after_opt_a]: 3.006e-05 [convert_after_rewriter]: 7.26001e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00045232 [opt_b]: 0.00018207, [1] [Cycle 1]: 0.00017637, 
[7] [b_1]: 0.00010917 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.36998e-06 [renormalize]: 2.10013e-07 [cse]: 1.638e-05 [optimize_parallel_all_gather_comm]: 1.608e-05 [overlap_param_gather]: 1.84e-06 [cconv]: 2.239e-05 [loop_unroll]: 0.00041532 [opt_after_cconv]: 9.535e-05, [1] [Cycle 1]: 8.959e-05, [7] [c_1]: 2.82e-05 [parameter_eliminate]: 2.27001e-06 [updatestate_depend_eliminate]: 5.32001e-06 [updatestate_assign_eliminate]: 2.55002e-06 [updatestate_loads_eliminate]: 2.23002e-06 [cse]: 1.6e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.207e-05 [tuple_transform]: 6.91e-05, [1] [Cycle 1]: 6.488e-05, [4] [d_1]: 3.89e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.28e-06 [partial_unused_args_eliminate]: 1.65001e-06 [add_recomputation]: 4.322e-05 [cse_after_recomputation]: 1.961e-05, [1] [Cycle 1]: 1.532e-05, [1] [cse]: 1.017e-05 [environ_conv]: 4.62e-06 [swap_dp_allreduce_reducescatter]: 4.97e-06 [bias_add_comm_swap]: 2.31e-06 [label_micro_interleaved_index]: 4.27e-06 [label_fine_grained_interleaved_index]: 2.81e-06 [merge_cast_opt]: 1.22999e-06 [slice_recompute_activation]: 2.142e-05 [micro_interleaved_order_control]: 2.62001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 7.50006e-07 [remove_cast_before_assign_add]: 9.50007e-07 [full_micro_interleaved_order_control]: 2.06003e-06 [reorder_send_recv_between_fp_bp]: 2.72001e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.31002e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.178e-05 [grouped_pairwise_exchange_alltoall]: 1.50999e-06 [offloading_packed_experts]: 3.7e-06 [overlap_recompute_and_grad_model_parallel]: 4.54998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.23002e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.71998e-06 [overlap_recompute_comm]: 2.14e-06 [overlap_grad_ring_attention]: 3.95e-06 [overlap_grad_flash_sp]: 1.689e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.17001e-06 [split_layernorm_comm]: 1.92999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.853e-05, [1] [Cycle 1]: 6.43e-05, [6] [build]: 2.21e-06 [elim_shapecalc]: 8.60999e-06 [elim_not_effective]: 1.18e-05 [opt_reshape]: 5.97999e-06 [fold_const_symbol]: 8.51002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.86003e-06 [pipeline_parallel_scheduler]: 1.54998e-06 [auto_monad_reorder]: 1.631e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.55e-06 [opt_after_jit_grad]: 0.00046664 [validate]: 3.035e-05 [backend_pass]: 9.09989e-07 [task_emit]: 0.00583651 [execute]: 6.44999e-06 Sums bootstrap : 0.000468s : 3.29% type_inference : 0.004343s : 30.56% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000016s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.28% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000418s : 2.95% optimize.opt_a.with_stream_mark : 0.000023s : 0.16% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000146s : 1.03% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.09% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.02% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000346s : 2.44% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000042s : 0.29% optimize.opt_a.a_3 : 0.000073s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000452s : 3.18% optimize.opt_b.b_1 : 0.000109s : 0.77% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.12% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000415s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.08% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000021s : 0.15% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000467s : 3.28% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005837s : 41.08% execute : 0.000006s : 0.05% Time group info: ------[substitution.] 0.000122 26 18.04% : 0.000022s : 4: substitution.arithmetic_simplify 1.60% : 0.000002s : 2: substitution.elim_not_effective 1.00% : 0.000001s : 2: substitution.fold_const_symbol 4.15% : 0.000005s : 4: substitution.graph_param_transform 65.53% : 0.000080s : 2: substitution.inline 2.41% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.91% : 0.000005s : 4: substitution.remove_not_recompute_node 3.36% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004303 2 91.99% : 0.003959s : 1: type_inference.infer 8.01% : 0.000345s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000079 2 100.00% : 0.000079s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.77% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000003s : 17: predicate.arithmetic_simplify 0.74% : 0.000001s : 9: predicate.cast_eliminate 0.77% : 0.000001s : 8: predicate.check_bprop_eliminate 0.65% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.84% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.28% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.52% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.38% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.91% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.27% : 0.000000s : 4: predicate.fold_const_symbol 0.94% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.81% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 6.02% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000001s : 8: predicate.less_batch_normalization 1.58% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.85% : 0.000003s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.23% : 0.000002s : 4: predicate.mutable_eliminate 0.42% : 0.000001s : 4: predicate.opt_reshape 0.73% : 0.000001s : 4: predicate.parallel_virtual_node 1.23% : 0.000002s : 11: predicate.partial_defer_inline 1.26% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.70% : 0.000001s : 8: predicate.reduce_all_const_elim 1.11% : 0.000002s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.71% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.42% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.63% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.93% : 0.000001s : 8: predicate.shard_identity_eliminate 1.16% : 0.000002s : 8: predicate.special_op_eliminate 0.98% : 0.000001s : 8: predicate.specialize_transform 1.08% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.79% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.26% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.58% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.39% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.46% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.20% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.75% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.04% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.04% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.83% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.75% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.25% : 0.000137s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026077 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.39% : 0.002971s : 1: add_attr 11.36% : 0.002962s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000056s : 1: auto_monad 0.08% : 0.000021s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.93% : 0.000503s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000011s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000006s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.02% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.63% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.77% : 0.000462s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.96% : 0.000773s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.18% : 0.001871s : 1: opt_a 0.38% : 0.000099s : 1: opt_after_cconv 1.83% : 0.000476s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.20% : 0.003703s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000188s : 1: renormalize.infer 0.58% : 0.000151s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.10% : 0.000025s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000071s : 1: symbol_engine_optimizer 22.42% : 0.005846s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.70% : 0.004355s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x5-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x5-kbk],max_mem:68.0M TotalTime = 0.116367, [24] [bootstrap]: 0.00055346 [type_inference]: 0.00616963 [event_method]: 1.403e-05 [auto_monad]: 5.42e-05 [graph_reusing]: 5.64e-06 [inline]: 1.81e-06 [add_attr]: 0.00334789, [1] [add_attr_with_inline]: 0.00333695, [1] [Cycle 1]: 4.395e-05, [2] [tag_attr]: 1.448e-05 [meta_addattr_fg_expand]: 4.07998e-06 [parallel-infer-symbol]: 2.71999e-06 [pre_auto_parallel]: 2.679e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 7.89994e-07 [dataset_repeat_opt]: 1.80001e-06 [pipeline_split]: 1.77001e-06 [optimize]: 0.0039701, [53] [py_interpret_to_execute]: 2.043e-05 [rewriter_before_opt_a]: 5.847e-05 [opt_a]: 0.00212293, [2] [Cycle 1]: 0.0015159, [45] [expand_dump_flag]: 3.08e-06 [switch_simplify]: 3.212e-05 [loop_unroll]: 2.068e-05 [a_1]: 0.00045405 [with_stream_mark]: 1.322e-05 [recompute_prepare]: 8.38001e-06 [updatestate_depend_eliminate]: 3.66001e-06 [updatestate_assign_eliminate]: 3.36999e-06 [updatestate_loads_eliminate]: 3.08998e-06 [parameter_eliminate]: 1.74998e-06 [a_2]: 7.63e-05 [accelerated_algorithm]: 6.56999e-06 [shard]: 2.04e-06 [meta_shard_fg_expand]: 1.53002e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 8.18001e-06 [auto_parallel]: 5.69999e-06 [parallel]: 2.171e-05 [flash_sp]: 7.08e-06 [merge_comm]: 3.9e-06 [allreduce_fusion]: 3.33998e-06 [matmul_add_comm_reduction]: 9.04e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 8.01001e-06 [virtual_dataset]: 6.07001e-06 [get_grad_eliminate_]: 5.47001e-06 [virtual_output]: 6.19001e-06 [merge_forward]: 3.85e-06 [cell_reuse_recompute_pass]: 1.22e-06 [offload_activation]: 9.00001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.108e-05 
[merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.40001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.50998e-06 [meta_fg_expand]: 2.09e-06 [flash_sp_send_recv_attached]: 2.78e-06 [receive_attached]: 2.81e-06 [after_resolve]: 1.14e-05 [a_after_grad]: 9.35001e-06 [renormalize]: 0.00041342 [add_forward_monad_depend]: 4.25e-06 [auto_monad_grad]: 1.77999e-06 [auto_monad_eliminator]: 1.379e-05 [cse]: 2.608e-05 [a_3]: 4.139e-05 [Cycle 2]: 0.00059782, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 6.74999e-06 [loop_unroll]: 5.92999e-06 [a_1]: 0.00012727 [with_stream_mark]: 1.011e-05 [recompute_prepare]: 5.91998e-06 [updatestate_depend_eliminate]: 2.85002e-06 [updatestate_assign_eliminate]: 2.32001e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 9.20001e-07 [a_2]: 6.881e-05 [accelerated_algorithm]: 5.69999e-06 [shard]: 1.06002e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.72001e-06 [merge_send_recv]: 4.27e-06 [auto_parallel]: 5.15001e-06 [parallel]: 4.24002e-06 [flash_sp]: 2.98e-06 [merge_comm]: 2.97002e-06 [allreduce_fusion]: 2.65997e-06 [matmul_add_comm_reduction]: 5.04e-06 [allreduce_slice_to_reducescatter]: 2.69996e-07 [virtual_shard_identity]: 5.76003e-06 [virtual_dataset]: 5.40001e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.13002e-06 [merge_forward]: 2.44999e-06 [cell_reuse_recompute_pass]: 1.35001e-06 [offload_activation]: 5.72999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.008e-05 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94001e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.01002e-06 [after_resolve]: 8.90001e-06 [a_after_grad]: 8.17e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.20999e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.27001e-06 [cse]: 1.374e-05 [a_3]: 3.233e-05 [py_interpret_to_execute_after_opt_a]: 7.64002e-06 
[slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.218e-05 [convert_after_rewriter]: 7.3e-06 [order_py_execute_after_rewriter]: 5.24e-06 [mutable_eliminate]: 0.00044882 [opt_b]: 0.00018264, [1] [Cycle 1]: 0.00017669, [7] [b_1]: 0.00010915 [b_2]: 7.60998e-06 [updatestate_depend_eliminate]: 4.95001e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.31998e-06 [renormalize]: 4.59986e-07 [cse]: 1.544e-05 [optimize_parallel_all_gather_comm]: 1.602e-05 [overlap_param_gather]: 1.74998e-06 [cconv]: 2.121e-05 [loop_unroll]: 0.00041576 [opt_after_cconv]: 9.507e-05, [1] [Cycle 1]: 8.942e-05, [7] [c_1]: 2.783e-05 [parameter_eliminate]: 2.07999e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.64e-05 [renormalize]: 3.10014e-07 [remove_dup_value]: 1.207e-05 [tuple_transform]: 6.891e-05, [1] [Cycle 1]: 6.483e-05, [4] [d_1]: 3.911e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.60001e-06 [add_recomputation]: 4.722e-05 [cse_after_recomputation]: 2.126e-05, [1] [Cycle 1]: 1.683e-05, [1] [cse]: 1.14e-05 [environ_conv]: 4.45999e-06 [swap_dp_allreduce_reducescatter]: 5.72999e-06 [bias_add_comm_swap]: 2.79999e-06 [label_micro_interleaved_index]: 4.20999e-06 [label_fine_grained_interleaved_index]: 2.74999e-06 [merge_cast_opt]: 1.37e-06 [slice_recompute_activation]: 2.35002e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 1.02998e-06 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.32001e-06 [reorder_send_recv_between_fp_bp]: 2.78e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 1.04e-06 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.10999e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.71e-06 [control_data_broadcast_order]: 1.156e-05 [grouped_pairwise_exchange_alltoall]: 1.76003e-06 [offloading_packed_experts]: 3.71999e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.14003e-06 [overlap_recompute_allgather_and_fa_grad]: 1.66998e-06 [overlap_recompute_comm]: 2.29001e-06 [overlap_grad_ring_attention]: 3.91001e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.14999e-06 [split_layernorm_comm]: 1.97999e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.943e-05, [1] [Cycle 1]: 6.537e-05, [6] [build]: 2.42001e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.183e-05 [opt_reshape]: 6.07999e-06 [fold_const_symbol]: 9.10001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.24999e-06 [pipeline_parallel_scheduler]: 1.53002e-06 [auto_monad_reorder]: 1.611e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.35998e-06 [opt_after_jit_grad]: 0.00045533 [validate]: 3.034e-05 [backend_pass]: 1.09998e-06 [task_emit]: 0.101407 [execute]: 9.39e-06 Sums bootstrap : 0.000553s : 0.49% type_inference : 0.006170s : 5.51% event_method : 0.000014s : 0.01% auto_monad : 0.000054s : 0.05% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.02% optimize.rewriter_before_opt_a : 0.000058s : 0.05% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.03% optimize.opt_a.loop_unroll : 0.000027s : 0.02% optimize.opt_a.a_1 : 0.000581s : 0.52% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.02% optimize.opt_a.recompute_prepare : 0.000014s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.02% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000414s : 0.37% optimize.opt_a.add_forward_monad_depend : 
0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.02% optimize.opt_a.cse : 0.000040s : 0.04% optimize.opt_a.a_3 : 0.000074s : 0.07% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000449s : 0.40% optimize.opt_b.b_1 : 0.000109s : 0.10% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000015s : 0.01% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.02% optimize.loop_unroll : 0.000416s : 0.37% optimize.opt_after_cconv.c_1 : 0.000028s : 0.02% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.01% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.01% optimize.tuple_transform.d_1 : 0.000039s : 0.03% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv 
: 0.000004s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000006s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000002s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000455s : 0.41% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.101407s : 90.57% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000167 30 14.97% : 0.000025s : 5: substitution.arithmetic_simplify 1.11% : 0.000002s : 2: substitution.elim_not_effective 0.82% : 0.000001s : 2: substitution.fold_const_symbol 3.06% : 0.000005s : 4: substitution.graph_param_transform 66.29% : 0.000110s : 3: substitution.inline 1.62% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.76% : 0.000005s : 4: substitution.remove_not_recompute_node 2.74% : 0.000005s : 4: substitution.replace_old_param 6.62% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006127 2 90.45% : 0.005542s : 1: type_inference.infer 9.55% : 0.000585s : 1: type_inference.specialize ------[replace.] 0.000039 5 68.92% : 0.000027s : 3: replace.inline 31.08% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.59% : 0.000108s : 3: match.inline 8.41% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.90% : 0.000001s : 11: predicate.accumulaten_eliminater 0.87% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.88% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.96% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.19% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.86% : 0.000003s : 23: predicate.environ_get_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.26% : 0.000004s : 16: predicate.float_depend_g_call 0.56% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.95% : 0.000009s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.44% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.98% : 0.000002s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.31% : 0.000004s : 32: predicate.load_eliminater 0.98% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.63% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 0.98% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.47% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.49% : 0.000002s : 17: predicate.partial_eliminate 0.92% : 0.000001s : 11: predicate.print_const_string_wrapper 0.63% : 0.000001s : 8: predicate.reduce_all_const_elim 1.14% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.43% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.85% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.85% : 0.000001s : 8: predicate.same_eliminate 0.54% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.79% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.97% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.97% : 0.000002s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.99% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.92% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.84% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.32% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.37% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.73% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.16% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.56% : 0.000001s : 4: predicate.value_based_eliminate 0.73% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000386 8 44.15% : 0.000170s : 3: func_graph_cloner_run.FuncGraphClonerGraph 55.85% : 0.000216s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.125197 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.68% : 0.003352s : 1: add_attr 2.67% : 0.003341s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000051s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000059s : 1: auto_monad 0.02% : 0.000020s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000006s : 1: bias_add_comm_swap 0.47% : 0.000591s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.02% : 0.000024s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000006s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.01% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.34% : 0.000424s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.37% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 0.76% : 0.000953s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.07% : 0.000091s : 28: opt.transform.opt_b 0.03% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000032s : 4: opt.transform.symbol_engine_opt 1.70% : 0.002126s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.37% : 0.000465s : 1: opt_after_jit_grad 0.15% : 0.000186s : 1: opt_b 3.21% : 0.004021s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000031s : 1: pipeline_split 0.02% : 0.000031s : 1: pre_auto_parallel 0.02% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000003s : 1: remove_cast_before_assign_add 0.01% : 0.000015s : 1: remove_dup_value 0.17% : 0.000213s : 1: renormalize.infer 0.15% : 0.000193s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000036s : 1: rewriter_after_opt_a 0.05% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000009s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000072s : 1: symbol_engine_optimizer 81.02% : 0.101429s : 1: task_emit 0.06% : 0.000072s : 1: 
tuple_transform 4.94% : 0.006183s : 1: type_inference 0.04% : 0.000056s : 1: validate TotalTime = 0.109477, [24] [bootstrap]: 0.00047219 [type_inference]: 0.00437789 [event_method]: 1.046e-05 [auto_monad]: 4.987e-05 [graph_reusing]: 4.92e-06 [inline]: 1.67001e-06 [add_attr]: 0.00297636, [1] [add_attr_with_inline]: 0.00296847, [1] [Cycle 1]: 4.484e-05, [2] [tag_attr]: 1.218e-05 [meta_addattr_fg_expand]: 3.17002e-06 [parallel-infer-symbol]: 2.81e-06 [pre_auto_parallel]: 2.117e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 8.00006e-07 [dataset_repeat_opt]: 2.21003e-06 [pipeline_split]: 1.60001e-06 [optimize]: 0.00367822, [53] [py_interpret_to_execute]: 1.496e-05 [rewriter_before_opt_a]: 3.793e-05 [opt_a]: 0.00184287, [2] [Cycle 1]: 0.00124452, [45] [expand_dump_flag]: 2.55002e-06 [switch_simplify]: 2.324e-05 [loop_unroll]: 1.403e-05 [a_1]: 0.0002911 [with_stream_mark]: 1.275e-05 [recompute_prepare]: 7.15e-06 [updatestate_depend_eliminate]: 3.58999e-06 [updatestate_assign_eliminate]: 3.36001e-06 [updatestate_loads_eliminate]: 2.99001e-06 [parameter_eliminate]: 1.69998e-06 [a_2]: 7.48e-05 [accelerated_algorithm]: 6.18998e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.66e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 7.57002e-06 [auto_parallel]: 5.69999e-06 [parallel]: 1.804e-05 [flash_sp]: 7.03e-06 [merge_comm]: 3.56001e-06 [allreduce_fusion]: 3.38999e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 5.60016e-07 [virtual_shard_identity]: 7.04001e-06 [virtual_dataset]: 5.66e-06 [get_grad_eliminate_]: 5.25999e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.53e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.087e-05 [merge_recompute_call_nodes]: 1.34e-06 [before_grad]: 9.29e-06 [set_forward_comm_id_for_comm_node_pass]: 3.44001e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.23998e-06 [receive_attached]: 2.32999e-06 
[after_resolve]: 1.036e-05 [a_after_grad]: 8.34002e-06 [renormalize]: 0.00034403 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.294e-05 [cse]: 2.568e-05 [a_3]: 3.998e-05 [Cycle 2]: 0.00058932, [45] [expand_dump_flag]: 8.79983e-07 [switch_simplify]: 6.92002e-06 [loop_unroll]: 5.34e-06 [a_1]: 0.00012479 [with_stream_mark]: 9.20001e-06 [recompute_prepare]: 5.40001e-06 [updatestate_depend_eliminate]: 2.80997e-06 [updatestate_assign_eliminate]: 2.27001e-06 [updatestate_loads_eliminate]: 2.26e-06 [parameter_eliminate]: 7.89994e-07 [a_2]: 6.628e-05 [accelerated_algorithm]: 5.42999e-06 [shard]: 1.02998e-06 [meta_shard_fg_expand]: 1.22e-06 [shard_inline]: 5.29e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 5.35999e-06 [parallel]: 4.27e-06 [flash_sp]: 3.35998e-06 [merge_comm]: 2.88998e-06 [allreduce_fusion]: 2.65997e-06 [matmul_add_comm_reduction]: 5.05001e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 6.07999e-06 [virtual_dataset]: 5.12e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 5.05999e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.36002e-06 [offload_activation]: 5.78002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.018e-05 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.48999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.28e-06 [meta_fg_expand]: 1.67999e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.05001e-06 [a_after_grad]: 7.97e-06 [renormalize]: 6.99947e-08 [add_forward_monad_depend]: 1.03001e-06 [auto_monad_grad]: 7.80012e-07 [auto_monad_eliminator]: 6.33e-06 [cse]: 1.222e-05 [a_3]: 3.218e-05 [py_interpret_to_execute_after_opt_a]: 7.41001e-06 [slice_cell_reuse_recomputed_activation]: 2.04999e-06 [rewriter_after_opt_a]: 3.121e-05 [convert_after_rewriter]: 6.94001e-06 [order_py_execute_after_rewriter]: 5.22999e-06 [mutable_eliminate]: 0.00047626 [opt_b]: 0.00018137, [1] [Cycle 1]: 
0.00017544, [7] [b_1]: 0.00010843 [b_2]: 6.71e-06 [updatestate_depend_eliminate]: 5.05999e-06 [updatestate_assign_eliminate]: 2.48998e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 3.59985e-07 [cse]: 1.639e-05 [optimize_parallel_all_gather_comm]: 1.573e-05 [overlap_param_gather]: 1.91998e-06 [cconv]: 2.156e-05 [loop_unroll]: 0.00041812 [opt_after_cconv]: 9.494e-05, [1] [Cycle 1]: 8.92e-05, [7] [c_1]: 2.773e-05 [parameter_eliminate]: 2.22999e-06 [updatestate_depend_eliminate]: 5.14e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.64e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.257e-05 [tuple_transform]: 6.806e-05, [1] [Cycle 1]: 6.364e-05, [4] [d_1]: 3.818e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.16998e-06 [partial_unused_args_eliminate]: 1.76e-06 [add_recomputation]: 4.353e-05 [cse_after_recomputation]: 2.016e-05, [1] [Cycle 1]: 1.588e-05, [1] [cse]: 1.078e-05 [environ_conv]: 4.98001e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.37999e-06 [label_micro_interleaved_index]: 4.23999e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.18001e-06 [slice_recompute_activation]: 2.46e-06 [micro_interleaved_order_control]: 2.27999e-06 [assign_add_opt]: 1.29e-06 [ForceFp32Comm]: 1.00001e-06 [remove_cast_before_assign_add]: 9.20001e-07 [full_micro_interleaved_order_control]: 2.47001e-06 [reorder_send_recv_between_fp_bp]: 2.59999e-06 [comm_op_add_attrs]: 1.09e-06 [add_comm_op_reuse_tag]: 9.80013e-07 [interleave_split_concat_branches]: 1.17999e-06 [interleave_parallel_branches]: 1.05999e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.55999e-06 [control_data_broadcast_order]: 1.17e-05 [grouped_pairwise_exchange_alltoall]: 1.82001e-06 [offloading_packed_experts]: 3.90998e-06 [overlap_recompute_and_grad_model_parallel]: 4.45999e-06 [overlap_grad_matmul_and_grad_allreduce]: 
1.52001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.07001e-06 [overlap_grad_ring_attention]: 4.33999e-06 [overlap_grad_flash_sp]: 1.643e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 1.85001e-06 [handle_group_info]: 1.33002e-06 [symbol_engine_optimizer]: 6.812e-05, [1] [Cycle 1]: 6.408e-05, [6] [build]: 2.46e-06 [elim_shapecalc]: 8.13999e-06 [elim_not_effective]: 1.131e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 8.79e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.69e-06 [pipeline_parallel_scheduler]: 1.49998e-06 [auto_monad_reorder]: 1.516e-05 [get_jit_bprop_graph]: 9.89996e-07 [rewriter_after_jit_bprop_graph]: 3.31001e-06 [opt_after_jit_grad]: 0.00044796 [validate]: 2.978e-05 [backend_pass]: 9.69972e-07 [task_emit]: 0.0971602 [execute]: 8.90001e-06 Sums bootstrap : 0.000472s : 0.45% type_inference : 0.004378s : 4.15% event_method : 0.000010s : 0.01% auto_monad : 0.000050s : 0.05% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.01% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000021s : 0.02% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.01% optimize.rewriter_before_opt_a : 0.000038s : 0.04% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000030s : 0.03% optimize.opt_a.loop_unroll : 0.000019s : 0.02% optimize.opt_a.a_1 : 0.000416s : 0.39% optimize.opt_a.with_stream_mark : 0.000022s : 0.02% optimize.opt_a.recompute_prepare : 0.000013s : 0.01% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000141s : 0.13% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.01% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.01% optimize.opt_a.merge_send_recv : 0.000012s : 0.01% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000022s : 0.02% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.01% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.01% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.01% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.02% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.02% optimize.opt_a.a_after_grad : 0.000016s : 0.02% optimize.opt_a.renormalize : 0.000344s : 0.33% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.02% optimize.opt_a.cse : 0.000038s : 0.04% optimize.opt_a.a_3 : 0.000072s : 0.07% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.03% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000476s : 0.45% optimize.opt_b.b_1 : 0.000108s : 0.10% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.01% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.02% optimize.loop_unroll : 0.000418s : 0.40% optimize.opt_after_cconv.c_1 : 0.000028s : 0.03% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.01% optimize.tuple_transform.d_1 : 0.000038s : 0.04% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000044s : 0.04% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.00% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.01% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000016s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.01% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000448s : 0.42% validate : 0.000030s : 0.03% backend_pass : 0.000001s : 0.00% task_emit : 0.097160s : 92.06% execute : 0.000009s : 0.01% Time group info: ------[substitution.] 0.000121 26 17.71% : 0.000021s : 4: substitution.arithmetic_simplify 1.45% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.37% : 0.000005s : 4: substitution.graph_param_transform 65.92% : 0.000080s : 2: substitution.inline 2.46% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.79% : 0.000005s : 4: substitution.remove_not_recompute_node 3.27% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004338 2 91.68% : 0.003977s : 1: type_inference.infer 8.32% : 0.000361s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.03% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.42% : 0.000003s : 17: predicate.arithmetic_simplify 0.75% : 0.000001s : 9: predicate.cast_eliminate 0.82% : 0.000001s : 8: predicate.check_bprop_eliminate 0.63% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.83% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.86% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.35% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.09% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.90% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.96% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.89% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 0.98% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.84% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.62% : 0.000001s : 8: predicate.incorporate_call_switch 5.88% : 0.000008s : 44: predicate.inline 0.95% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.02% : 0.000001s : 8: predicate.less_batch_normalization 1.62% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.14% : 0.000003s : 26: predicate.load_eliminater 1.30% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.78% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 1.04% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.32% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.27% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 1.00% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.77% : 0.000001s : 8: predicate.remove_not_recompute_node 1.36% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.76% : 0.000001s : 9: predicate.reshape_eliminate 0.92% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.95% : 0.000001s : 8: predicate.same_eliminate 0.56% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.06% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.95% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.92% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.50% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.83% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.18% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.33% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.68% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.72% : 0.000001s : 4: predicate.value_based_eliminate 0.85% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 43.12% : 0.000111s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.88% : 0.000147s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.117389 196 0.00% : 0.000004s : 1: ForceFp32Comm 2.54% : 0.002981s : 1: add_attr 2.53% : 0.002972s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.04% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.05% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.43% : 0.000508s : 1: bootstrap 0.02% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.01% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000006s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.01% : 0.000016s : 1: event_method 0.01% : 0.000016s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.36% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.41% : 0.000485s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 0.65% : 0.000762s : 78: opt.transform.opt_a 0.02% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.08% : 0.000090s : 28: opt.transform.opt_b 0.04% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.03% : 0.000031s : 4: opt.transform.symbol_engine_opt 1.57% : 0.001846s : 1: opt_a 0.08% : 0.000098s : 1: opt_after_cconv 0.39% : 0.000457s : 1: opt_after_jit_grad 0.16% : 0.000185s : 1: opt_b 3.14% : 0.003682s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.02% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000019s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.01% : 0.000016s : 1: remove_dup_value 0.16% : 0.000188s : 1: renormalize.infer 0.13% : 0.000150s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.03% : 0.000035s : 1: rewriter_after_opt_a 0.04% : 0.000042s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.06% : 0.000071s : 1: symbol_engine_optimizer 82.79% : 0.097181s : 1: task_emit 0.06% : 0.000071s : 1: 
tuple_transform 3.74% : 0.004391s : 1: type_inference 0.04% : 0.000050s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x5-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x5-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x6-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x6-pynative],max_mem:68.0M TotalTime = 0.0210623, [24] [bootstrap]: 0.00055409 [type_inference]: 0.00618454 [event_method]: 1.436e-05 [auto_monad]: 5.651e-05 [graph_reusing]: 5.96e-06 [inline]: 1.72999e-06 [add_attr]: 0.003383, [1] [add_attr_with_inline]: 0.00337233, [1] [Cycle 1]: 4.421e-05, [2] [tag_attr]: 1.564e-05 [meta_addattr_fg_expand]: 4.07e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.728e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 7.80012e-07 [dataset_repeat_opt]: 1.74e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00400192, [53] [py_interpret_to_execute]: 2.055e-05 [rewriter_before_opt_a]: 5.683e-05 [opt_a]: 0.00212818, [2] [Cycle 1]: 0.00151636, [45] [expand_dump_flag]: 2.66e-06 [switch_simplify]: 3.167e-05 [loop_unroll]: 2.04e-05 [a_1]: 0.00045314 [with_stream_mark]: 1.37e-05 [recompute_prepare]: 8.17e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.49001e-06 [updatestate_loads_eliminate]: 2.79999e-06 [parameter_eliminate]: 2.07999e-06 [a_2]: 7.587e-05 [accelerated_algorithm]: 6.34001e-06 [shard]: 2.01998e-06 [meta_shard_fg_expand]: 1.59998e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 7.31999e-06 [auto_parallel]: 5.67001e-06 [parallel]: 2.151e-05 [flash_sp]: 6.93e-06 [merge_comm]: 3.55e-06 [allreduce_fusion]: 3.28e-06 [matmul_add_comm_reduction]: 9.40001e-06 [allreduce_slice_to_reducescatter]: 5.49975e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.17999e-06 
[get_grad_eliminate_]: 5.63002e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.61001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.05001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.36002e-06 [before_grad]: 9.44e-06 [set_forward_comm_id_for_comm_node_pass]: 3.3e-06 [meta_fg_expand]: 2.10002e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.68e-06 [after_resolve]: 1.045e-05 [a_after_grad]: 8.92e-06 [renormalize]: 0.00042194 [add_forward_monad_depend]: 4.33999e-06 [auto_monad_grad]: 1.93997e-06 [auto_monad_eliminator]: 1.293e-05 [cse]: 2.713e-05 [a_3]: 4.09e-05 [Cycle 2]: 0.00060273, [45] [expand_dump_flag]: 9.00007e-07 [switch_simplify]: 7.05e-06 [loop_unroll]: 5.51002e-06 [a_1]: 0.00012715 [with_stream_mark]: 9.61e-06 [recompute_prepare]: 5.52999e-06 [updatestate_depend_eliminate]: 2.74001e-06 [updatestate_assign_eliminate]: 2.17001e-06 [updatestate_loads_eliminate]: 2.31e-06 [parameter_eliminate]: 9.00007e-07 [a_2]: 6.817e-05 [accelerated_algorithm]: 5.59e-06 [shard]: 1.12e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.52001e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 5.27999e-06 [parallel]: 5.85002e-06 [flash_sp]: 2.99999e-06 [merge_comm]: 3.35e-06 [allreduce_fusion]: 2.81e-06 [matmul_add_comm_reduction]: 5.03002e-06 [allreduce_slice_to_reducescatter]: 3.10014e-07 [virtual_shard_identity]: 6.48e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 6.39001e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 6.23002e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.048e-05 [merge_recompute_call_nodes]: 7.40023e-07 [before_grad]: 8.45001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47002e-06 [meta_fg_expand]: 1.69998e-06 [flash_sp_send_recv_attached]: 8.50006e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 8.94998e-06 [a_after_grad]: 8.22003e-06 [renormalize]: 9.00181e-08 
[add_forward_monad_depend]: 1.01002e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.08002e-06 [cse]: 1.327e-05 [a_3]: 3.182e-05 [py_interpret_to_execute_after_opt_a]: 7.43999e-06 [slice_cell_reuse_recomputed_activation]: 1.79e-06 [rewriter_after_opt_a]: 3.095e-05 [convert_after_rewriter]: 7.48e-06 [order_py_execute_after_rewriter]: 4.92e-06 [mutable_eliminate]: 0.00045108 [opt_b]: 0.00020778, [1] [Cycle 1]: 0.00020166, [7] [b_1]: 0.00013276 [b_2]: 7.22997e-06 [updatestate_depend_eliminate]: 5.39e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.31e-06 [renormalize]: 3.4002e-07 [cse]: 1.641e-05 [optimize_parallel_all_gather_comm]: 1.634e-05 [overlap_param_gather]: 1.91e-06 [cconv]: 2.216e-05 [loop_unroll]: 0.00042086 [opt_after_cconv]: 9.596e-05, [1] [Cycle 1]: 8.975e-05, [7] [c_1]: 2.784e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.09998e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.19001e-06 [cse]: 1.62e-05 [renormalize]: 3.69997e-07 [remove_dup_value]: 1.253e-05 [tuple_transform]: 6.871e-05, [1] [Cycle 1]: 6.457e-05, [4] [d_1]: 3.863e-05 [none_parameter_eliminate]: 1.49998e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.97001e-06 [add_recomputation]: 4.807e-05 [cse_after_recomputation]: 2.07e-05, [1] [Cycle 1]: 1.636e-05, [1] [cse]: 1.109e-05 [environ_conv]: 4.58001e-06 [swap_dp_allreduce_reducescatter]: 5.03002e-06 [bias_add_comm_swap]: 2.84999e-06 [label_micro_interleaved_index]: 4.37e-06 [label_fine_grained_interleaved_index]: 2.93998e-06 [merge_cast_opt]: 1.12999e-06 [slice_recompute_activation]: 2.24001e-06 [micro_interleaved_order_control]: 2.34999e-06 [assign_add_opt]: 1.27e-06 [ForceFp32Comm]: 7.09988e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.16002e-06 [add_comm_op_reuse_tag]: 
9.00007e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.10001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.87001e-06 [control_data_broadcast_order]: 1.115e-05 [grouped_pairwise_exchange_alltoall]: 1.72999e-06 [offloading_packed_experts]: 3.75998e-06 [overlap_recompute_and_grad_model_parallel]: 4.82998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.36998e-06 [overlap_grad_ring_attention]: 4.35e-06 [overlap_grad_flash_sp]: 1.697e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 2.04e-06 [handle_group_info]: 9.89996e-07 [symbol_engine_optimizer]: 6.847e-05, [1] [Cycle 1]: 6.452e-05, [6] [build]: 2.87002e-06 [elim_shapecalc]: 8.75001e-06 [elim_not_effective]: 1.164e-05 [opt_reshape]: 5.82999e-06 [fold_const_symbol]: 8.80999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.45999e-06 [pipeline_parallel_scheduler]: 1.47999e-06 [auto_monad_reorder]: 1.52e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00045216 [validate]: 3.118e-05 [backend_pass]: 9.5999e-07 [task_emit]: 0.00610969 [execute]: 7.5e-06 Sums bootstrap : 0.000554s : 3.32% type_inference : 0.006185s : 37.01% event_method : 0.000014s : 0.09% auto_monad : 0.000057s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000027s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.12% optimize.rewriter_before_opt_a : 0.000057s : 0.34% 
optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000580s : 3.47% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.13% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.11% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.02% 
optimize.opt_a.after_resolve : 0.000019s : 0.12% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.03% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.11% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.70% optimize.opt_b.b_1 : 0.000133s : 0.79% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.10% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000421s : 2.52% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 
0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000048s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.03% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000452s : 2.71% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006110s : 36.56% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000162 30 15.04% : 0.000024s : 5: substitution.arithmetic_simplify 1.29% : 0.000002s : 2: substitution.elim_not_effective 0.81% : 0.000001s : 2: substitution.fold_const_symbol 3.04% : 0.000005s : 4: substitution.graph_param_transform 66.21% : 0.000108s : 3: substitution.inline 1.67% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.83% : 0.000005s : 4: substitution.remove_not_recompute_node 2.41% : 0.000004s : 4: substitution.replace_old_param 6.71% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006141 2 91.02% : 0.005590s : 1: type_inference.infer 8.98% : 0.000551s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.36% : 0.000027s : 3: replace.inline 29.64% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.41% : 0.000106s : 3: match.inline 8.59% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.80% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.43% : 0.000004s : 19: predicate.arithmetic_simplify 0.92% : 0.000001s : 11: predicate.cast_eliminate 0.66% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000001s : 11: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.27% : 0.000000s : 4: predicate.elim_not_effective 0.44% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.14% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.29% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.20% : 0.000003s : 16: predicate.float_depend_g_call 0.62% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.24% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.21% : 0.000010s : 51: predicate.inline 0.88% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.71% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 0.96% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.11% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.59% : 0.000002s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.05% : 0.000002s : 4: predicate.mutable_eliminate 0.37% : 0.000001s : 4: predicate.opt_reshape 0.36% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.66% : 0.000001s : 8: predicate.reduce_all_const_elim 1.26% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.53% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.33% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.41% : 0.000001s : 4: predicate.row_tensor_eliminate 0.76% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.88% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.80% : 0.000001s : 8: predicate.specialize_transform 0.92% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.37% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.41% : 0.000002s : 16: predicate.switch_defer_inline 2.02% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.85% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.50% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.49% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.66% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.38% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000388 8 53.87% : 0.000209s : 3: func_graph_cloner_run.FuncGraphClonerGraph 46.13% : 0.000179s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029993 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.29% : 0.003387s : 1: add_attr 11.26% : 0.003376s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000052s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000062s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.97% : 0.000592s : 1: bootstrap 0.09% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000011s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.43% : 0.000429s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000012s : 1: opt.transform.mutable_eliminate 3.16% : 0.000948s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.38% : 0.000115s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.11% : 0.002131s : 1: opt_a 0.33% : 0.000099s : 1: opt_after_cconv 1.54% : 0.000461s : 1: opt_after_jit_grad 0.70% : 0.000211s : 1: opt_b 13.36% : 0.004006s : 1: optimize 0.07% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000003s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000212s : 1: renormalize.infer 0.67% : 0.000202s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.40% : 0.006120s : 1: task_emit 0.24% : 0.000072s : 1: 
tuple_transform 20.67% : 0.006198s : 1: type_inference 0.20% : 0.000060s : 1: validate TotalTime = 0.0180513, [24] [bootstrap]: 0.00047167 [type_inference]: 0.00434922 [event_method]: 1.01e-05 [auto_monad]: 5.066e-05 [graph_reusing]: 5.87001e-06 [inline]: 1.65001e-06 [add_attr]: 0.00297174, [1] [add_attr_with_inline]: 0.00296436, [1] [Cycle 1]: 4.478e-05, [2] [tag_attr]: 1.171e-05 [meta_addattr_fg_expand]: 3.41001e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.011e-05 [insert-virtual-dataset]: 2.34999e-06 [parallel-infer-symbol-second]: 6.69999e-07 [dataset_repeat_opt]: 1.72999e-06 [pipeline_split]: 1.60999e-06 [optimize]: 0.00366947, [53] [py_interpret_to_execute]: 1.474e-05 [rewriter_before_opt_a]: 3.799e-05 [opt_a]: 0.00183728, [2] [Cycle 1]: 0.00123738, [45] [expand_dump_flag]: 2.94001e-06 [switch_simplify]: 2.401e-05 [loop_unroll]: 1.336e-05 [a_1]: 0.0002892 [with_stream_mark]: 1.368e-05 [recompute_prepare]: 7.27997e-06 [updatestate_depend_eliminate]: 3.34001e-06 [updatestate_assign_eliminate]: 3.13998e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 1.68002e-06 [a_2]: 7.639e-05 [accelerated_algorithm]: 6.16998e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.79999e-06 [merge_send_recv]: 7.66001e-06 [auto_parallel]: 5.61e-06 [parallel]: 1.655e-05 [flash_sp]: 6.78998e-06 [merge_comm]: 3.43e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.64e-06 [allreduce_slice_to_reducescatter]: 5.59987e-07 [virtual_shard_identity]: 6.82002e-06 [virtual_dataset]: 5.78002e-06 [get_grad_eliminate_]: 5.54e-06 [virtual_output]: 5.34998e-06 [merge_forward]: 3.64002e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 9.44e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.092e-05 [merge_recompute_call_nodes]: 1.40001e-06 [before_grad]: 8.91002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45003e-06 [meta_fg_expand]: 2.07999e-06 [flash_sp_send_recv_attached]: 2.41e-06 [receive_attached]: 2.66e-06 
[after_resolve]: 1.044e-05 [a_after_grad]: 8.39002e-06 [renormalize]: 0.00033892 [add_forward_monad_depend]: 4.45999e-06 [auto_monad_grad]: 1.94e-06 [auto_monad_eliminator]: 1.311e-05 [cse]: 2.648e-05 [a_3]: 4.008e-05 [Cycle 2]: 0.00059088, [45] [expand_dump_flag]: 9.70002e-07 [switch_simplify]: 6.75998e-06 [loop_unroll]: 5.80002e-06 [a_1]: 0.00012627 [with_stream_mark]: 1.096e-05 [recompute_prepare]: 5.61e-06 [updatestate_depend_eliminate]: 2.81999e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.39001e-06 [parameter_eliminate]: 8.09989e-07 [a_2]: 6.714e-05 [accelerated_algorithm]: 5.45001e-06 [shard]: 1.05001e-06 [meta_shard_fg_expand]: 1.09998e-06 [shard_inline]: 5.39e-06 [merge_send_recv]: 4.28001e-06 [auto_parallel]: 5.35999e-06 [parallel]: 3.96001e-06 [flash_sp]: 3.47002e-06 [merge_comm]: 3.30003e-06 [allreduce_fusion]: 2.84999e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.16e-06 [virtual_dataset]: 5.21998e-06 [get_grad_eliminate_]: 5.20999e-06 [virtual_output]: 4.90999e-06 [merge_forward]: 2.63998e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.84e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.89001e-06 [merge_recompute_call_nodes]: 7.09988e-07 [before_grad]: 7.88999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11999e-06 [meta_fg_expand]: 1.54998e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.85999e-06 [a_after_grad]: 8.03999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.10001e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.12001e-06 [cse]: 1.16e-05 [a_3]: 3.148e-05 [py_interpret_to_execute_after_opt_a]: 7.5e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 2.993e-05 [convert_after_rewriter]: 7.12002e-06 [order_py_execute_after_rewriter]: 4.98001e-06 [mutable_eliminate]: 0.00045165 [opt_b]: 0.00017951, [1] [Cycle 1]: 
0.00017335, [7] [b_1]: 0.00010549 [b_2]: 7.07002e-06 [updatestate_depend_eliminate]: 5.71998e-06 [updatestate_assign_eliminate]: 2.43e-06 [updatestate_loads_eliminate]: 2.37001e-06 [renormalize]: 3.50003e-07 [cse]: 1.617e-05 [optimize_parallel_all_gather_comm]: 1.521e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 2.167e-05 [loop_unroll]: 0.00041635 [opt_after_cconv]: 9.193e-05, [1] [Cycle 1]: 8.633e-05, [7] [c_1]: 2.703e-05 [parameter_eliminate]: 2.24001e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.43998e-06 [updatestate_loads_eliminate]: 2.21e-06 [cse]: 1.534e-05 [renormalize]: 3.30008e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.824e-05, [1] [Cycle 1]: 6.407e-05, [4] [d_1]: 3.869e-05 [none_parameter_eliminate]: 1.54e-06 [renormalize]: 2.10013e-07 [switch_simplify]: 6.24001e-06 [partial_unused_args_eliminate]: 1.67001e-06 [add_recomputation]: 4.171e-05 [cse_after_recomputation]: 1.996e-05, [1] [Cycle 1]: 1.558e-05, [1] [cse]: 1.038e-05 [environ_conv]: 4.4e-06 [swap_dp_allreduce_reducescatter]: 5.35001e-06 [bias_add_comm_swap]: 2.54001e-06 [label_micro_interleaved_index]: 3.97002e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.35999e-06 [slice_recompute_activation]: 2.02999e-06 [micro_interleaved_order_control]: 2.14e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 9.19972e-07 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.29001e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.10001e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.52999e-06 [control_data_broadcast_order]: 1.127e-05 [grouped_pairwise_exchange_alltoall]: 1.72001e-06 [offloading_packed_experts]: 3.87002e-06 [overlap_recompute_and_grad_model_parallel]: 4.53999e-06 
[overlap_grad_matmul_and_grad_allreduce]: 1.19e-06 [overlap_recompute_allgather_and_fa_grad]: 1.29e-06 [overlap_recompute_comm]: 2.02999e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 5.072e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.22999e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.35999e-06 [symbol_engine_optimizer]: 6.917e-05, [1] [Cycle 1]: 6.488e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.64e-06 [elim_not_effective]: 1.166e-05 [opt_reshape]: 6.16998e-06 [fold_const_symbol]: 8.63001e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.49e-06 [pipeline_parallel_scheduler]: 1.65001e-06 [auto_monad_reorder]: 1.49e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.3e-06 [opt_after_jit_grad]: 0.00044724 [validate]: 3.186e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00579199 [execute]: 6.88e-06 Sums bootstrap : 0.000472s : 3.34% type_inference : 0.004349s : 30.77% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000020s : 0.14% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.10% optimize.rewriter_before_opt_a : 0.000038s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.94% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 
0.04% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000004s : 0.03% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000016s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.40% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000038s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000452s : 3.20% optimize.opt_b.b_1 : 0.000105s : 0.75% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.02% optimize.cconv : 0.000022s : 0.15% optimize.loop_unroll : 0.000416s : 2.95% optimize.opt_after_cconv.c_1 : 0.000027s : 0.19% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000042s : 0.30% optimize.cse_after_recomputation.cse : 0.000010s : 0.07% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000002s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000051s : 0.36% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000001s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000447s : 3.16% validate : 0.000032s : 0.23% backend_pass : 0.000001s : 0.01% task_emit : 0.005792s : 40.98% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000116 26 18.76% : 0.000022s : 4: substitution.arithmetic_simplify 1.56% : 0.000002s : 2: substitution.elim_not_effective 1.02% : 0.000001s : 2: substitution.fold_const_symbol 4.43% : 0.000005s : 4: substitution.graph_param_transform 65.03% : 0.000076s : 2: substitution.inline 2.24% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.67% : 0.000004s : 4: substitution.remove_not_recompute_node 3.29% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004309 2 92.02% : 0.003965s : 1: type_inference.infer 7.98% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000074 2 100.00% : 0.000074s : 2: match.inline ------[predicate.] 
0.000135 984 0.84% : 0.000001s : 9: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 9: predicate.addn_zero_filter 0.73% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.20% : 0.000003s : 17: predicate.arithmetic_simplify 0.90% : 0.000001s : 9: predicate.cast_eliminate 0.76% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.87% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.24% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_depend_swap 1.92% : 0.000003s : 21: predicate.environ_get_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.14% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.81% : 0.000001s : 8: predicate.get_grad_eliminate 0.30% : 0.000000s : 4: predicate.graph_param_transform 0.76% : 0.000001s : 8: predicate.incorporate_call 0.68% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 0.96% : 0.000001s : 8: predicate.inline_without_move 0.47% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.91% : 0.000001s : 8: predicate.less_batch_normalization 1.56% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.25% : 0.000003s : 26: predicate.load_eliminater 1.26% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.76% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.82% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.37% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.41% : 0.000001s : 4: predicate.parallel_virtual_node 1.26% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.81% : 0.000001s : 9: predicate.print_const_string_wrapper 0.79% : 0.000001s : 8: predicate.reduce_all_const_elim 1.12% : 0.000002s : 9: predicate.reduce_eliminate 2.43% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.82% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.75% : 0.000001s : 8: predicate.replace_old_param 0.42% : 0.000001s : 4: predicate.reset_defer_inline 0.78% : 0.000001s : 9: predicate.reshape_eliminate 0.76% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.44% : 0.000001s : 4: predicate.row_tensor_eliminate 0.96% : 0.000001s : 8: predicate.same_eliminate 0.61% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.00% : 0.000001s : 8: predicate.shard_identity_eliminate 0.88% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.50% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.74% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.51% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.61% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.20% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.32% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.58% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.03% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.44% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.41% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.59% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000238 6 42.92% : 0.000102s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.08% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025947 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.47% : 0.002976s : 1: add_attr 11.44% : 0.002968s : 1: add_attr_with_inline 0.01% : 0.000003s : 1: add_comm_op_reuse_tag 0.18% : 0.000046s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000005s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.95% : 0.000507s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.04% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.64% : 0.000424s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.78% : 0.000461s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.93% : 0.000760s : 78: opt.transform.opt_a 0.10% : 0.000026s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.34% : 0.000089s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.09% : 0.001840s : 1: opt_a 0.37% : 0.000095s : 1: opt_after_cconv 1.76% : 0.000457s : 1: opt_after_jit_grad 0.70% : 0.000183s : 1: opt_b 14.16% : 0.003673s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.21% : 0.000055s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000008s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.09% : 0.000024s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.02% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000187s : 1: renormalize.infer 0.56% : 0.000145s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000042s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000072s : 1: symbol_engine_optimizer 22.36% : 0.005802s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.81% : 0.004362s : 1: type_inference 0.22% : 0.000057s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x6-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x6-kbk],max_mem:68.0M TotalTime = 0.0788422, [24] [bootstrap]: 0.00057193 [type_inference]: 0.00616477 [event_method]: 1.363e-05 [auto_monad]: 5.77e-05 [graph_reusing]: 5.44e-06 [inline]: 1.91998e-06 [add_attr]: 0.00336874, [1] [add_attr_with_inline]: 0.00335803, [1] [Cycle 1]: 4.399e-05, [2] [tag_attr]: 1.436e-05 [meta_addattr_fg_expand]: 3.97e-06 [parallel-infer-symbol]: 2.78e-06 [pre_auto_parallel]: 2.694e-05 [insert-virtual-dataset]: 2.34001e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00398156, [53] [py_interpret_to_execute]: 1.928e-05 [rewriter_before_opt_a]: 5.716e-05 [opt_a]: 0.00214192, [2] [Cycle 1]: 0.00153777, [45] [expand_dump_flag]: 3.26999e-06 [switch_simplify]: 3.166e-05 [loop_unroll]: 2.088e-05 [a_1]: 0.00048564 [with_stream_mark]: 1.298e-05 [recompute_prepare]: 7.9e-06 [updatestate_depend_eliminate]: 3.70998e-06 [updatestate_assign_eliminate]: 3.42002e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.67999e-06 [a_2]: 7.503e-05 [accelerated_algorithm]: 6.29999e-06 [shard]: 2.31e-06 [meta_shard_fg_expand]: 1.54e-06 [shard_inline]: 5.78002e-06 [merge_send_recv]: 7.93001e-06 [auto_parallel]: 5.82001e-06 [parallel]: 2.226e-05 [flash_sp]: 7.33999e-06 [merge_comm]: 3.5e-06 [allreduce_fusion]: 3.46999e-06 [matmul_add_comm_reduction]: 8.69e-06 [allreduce_slice_to_reducescatter]: 5.89993e-07 [virtual_shard_identity]: 7.61999e-06 [virtual_dataset]: 6.07999e-06 [get_grad_eliminate_]: 5.74e-06 [virtual_output]: 5.82001e-06 [merge_forward]: 3.6e-06 [cell_reuse_recompute_pass]: 1.17999e-06 [offload_activation]: 9.02999e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.052e-05 
[merge_recompute_call_nodes]: 1.29998e-06 [before_grad]: 8.93002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45998e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 2.71999e-06 [after_resolve]: 1.007e-05 [a_after_grad]: 8.85001e-06 [renormalize]: 0.00040784 [add_forward_monad_depend]: 4.95001e-06 [auto_monad_grad]: 1.77001e-06 [auto_monad_eliminator]: 1.289e-05 [cse]: 2.827e-05 [a_3]: 4.02e-05 [Cycle 2]: 0.0005948, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 6.85998e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012535 [with_stream_mark]: 9.15999e-06 [recompute_prepare]: 5.66003e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.19999e-06 [updatestate_loads_eliminate]: 2.52001e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 6.818e-05 [accelerated_algorithm]: 5.51e-06 [shard]: 9.89996e-07 [meta_shard_fg_expand]: 1.07e-06 [shard_inline]: 5.62001e-06 [merge_send_recv]: 4.42e-06 [auto_parallel]: 5.41002e-06 [parallel]: 4.05e-06 [flash_sp]: 3.08e-06 [merge_comm]: 3.14001e-06 [allreduce_fusion]: 2.76999e-06 [matmul_add_comm_reduction]: 4.68001e-06 [allreduce_slice_to_reducescatter]: 2.60014e-07 [virtual_shard_identity]: 6.17999e-06 [virtual_dataset]: 5.39e-06 [get_grad_eliminate_]: 5.39e-06 [virtual_output]: 4.92999e-06 [merge_forward]: 2.64001e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 5.81998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.36e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.83001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.54998e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 9.81e-06 [a_after_grad]: 8.67998e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.70001e-07 [auto_monad_eliminator]: 6.12999e-06 [cse]: 1.343e-05 [a_3]: 3.244e-05 [py_interpret_to_execute_after_opt_a]: 7.48e-06 
[slice_cell_reuse_recomputed_activation]: 1.92001e-06 [rewriter_after_opt_a]: 3.011e-05 [convert_after_rewriter]: 6.86999e-06 [order_py_execute_after_rewriter]: 5.51e-06 [mutable_eliminate]: 0.00045293 [opt_b]: 0.00018103, [1] [Cycle 1]: 0.00017525, [7] [b_1]: 0.00010768 [b_2]: 7.01001e-06 [updatestate_depend_eliminate]: 5.37001e-06 [updatestate_assign_eliminate]: 2.41e-06 [updatestate_loads_eliminate]: 2.29999e-06 [renormalize]: 3.89991e-07 [cse]: 1.612e-05 [optimize_parallel_all_gather_comm]: 1.542e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.324e-05 [loop_unroll]: 0.0004181 [opt_after_cconv]: 9.409e-05, [1] [Cycle 1]: 8.843e-05, [7] [c_1]: 2.779e-05 [parameter_eliminate]: 2.20002e-06 [updatestate_depend_eliminate]: 5.14998e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.26e-06 [cse]: 1.58e-05 [renormalize]: 3.00002e-07 [remove_dup_value]: 1.235e-05 [tuple_transform]: 6.891e-05, [1] [Cycle 1]: 6.449e-05, [4] [d_1]: 3.884e-05 [none_parameter_eliminate]: 1.51998e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.66998e-06 [add_recomputation]: 4.56e-05 [cse_after_recomputation]: 1.96e-05, [1] [Cycle 1]: 1.543e-05, [1] [cse]: 1.032e-05 [environ_conv]: 4.27e-06 [swap_dp_allreduce_reducescatter]: 5.00999e-06 [bias_add_comm_swap]: 2.33002e-06 [label_micro_interleaved_index]: 3.7e-06 [label_fine_grained_interleaved_index]: 2.48002e-06 [merge_cast_opt]: 1.39e-06 [slice_recompute_activation]: 2.37999e-06 [micro_interleaved_order_control]: 2.06e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 1.06002e-06 [remove_cast_before_assign_add]: 9.10019e-07 [full_micro_interleaved_order_control]: 2.34999e-06 [reorder_send_recv_between_fp_bp]: 2.74001e-06 [comm_op_add_attrs]: 1.16997e-06 [add_comm_op_reuse_tag]: 1.00999e-06 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.02998e-06 [overlap_opt_shard_in_pipeline]: 1.11002e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.62001e-06 [control_data_broadcast_order]: 1.173e-05 [grouped_pairwise_exchange_alltoall]: 1.89999e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.50999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.44e-06 [overlap_recompute_comm]: 2.16e-06 [overlap_grad_ring_attention]: 3.9e-06 [overlap_grad_flash_sp]: 1.746e-05 [begin_end_overlap_inline]: 5.50004e-07 [split_matmul_comm_elemetwise]: 2.00002e-06 [split_layernorm_comm]: 1.88997e-06 [handle_group_info]: 1.02998e-06 [symbol_engine_optimizer]: 7.009e-05, [1] [Cycle 1]: 6.591e-05, [6] [build]: 2.80002e-06 [elim_shapecalc]: 8.75999e-06 [elim_not_effective]: 1.172e-05 [opt_reshape]: 6.31e-06 [fold_const_symbol]: 9.22001e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.60001e-06 [pipeline_parallel_scheduler]: 1.88002e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 9.99979e-07 [rewriter_after_jit_bprop_graph]: 3.48e-06 [opt_after_jit_grad]: 0.00048434 [validate]: 3.093e-05 [backend_pass]: 8.50006e-07 [task_emit]: 0.0638847 [execute]: 8.11002e-06 Sums bootstrap : 0.000572s : 0.77% type_inference : 0.006165s : 8.28% event_method : 0.000014s : 0.02% auto_monad : 0.000058s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000014s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000019s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000026s : 0.04% optimize.opt_a.a_1 : 0.000611s : 0.82% 
optimize.opt_a.with_stream_mark : 0.000022s : 0.03% optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000026s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000018s : 0.02% optimize.opt_a.renormalize : 0.000408s : 0.55% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000073s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000006s : 0.01% optimize.mutable_eliminate : 0.000453s : 0.61% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000418s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv 
: 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000484s : 0.65% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063885s : 85.75% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000167 30 14.41% : 0.000024s : 5: substitution.arithmetic_simplify 1.02% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.08% : 0.000005s : 4: substitution.graph_param_transform 67.68% : 0.000113s : 3: substitution.inline 1.59% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.54% : 0.000004s : 4: substitution.remove_not_recompute_node 2.49% : 0.000004s : 4: substitution.replace_old_param 6.45% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006121 2 90.70% : 0.005551s : 1: type_inference.infer 9.30% : 0.000569s : 1: type_inference.specialize ------[replace.] 0.000038 5 70.28% : 0.000027s : 3: replace.inline 29.72% : 0.000011s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000121 5 91.93% : 0.000111s : 3: match.inline 8.07% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.91% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.86% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.26% : 0.000004s : 19: predicate.arithmetic_simplify 0.89% : 0.000001s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.59% : 0.000001s : 8: predicate.compare_switch_simplify 0.24% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 1.01% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.86% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.10% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.41% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.13% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.12% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.30% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.24% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.73% : 0.000001s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.73% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000009s : 51: predicate.inline 0.91% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.90% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 1.04% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.09% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.68% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.02% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.72% : 0.000003s : 16: predicate.partial_defer_inline 1.45% : 0.000002s : 17: predicate.partial_eliminate 0.89% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.05% : 0.000002s : 11: predicate.reduce_eliminate 2.48% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.42% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.81% : 0.000001s : 8: predicate.same_eliminate 0.52% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.76% : 0.000001s : 8: predicate.specialize_transform 0.95% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.85% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.36% : 0.000002s : 16: predicate.switch_defer_inline 1.97% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.99% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.98% : 0.000002s : 11: predicate.transpose_eliminate 1.52% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.48% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.54% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.39% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.31% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.63% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.36% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.50% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.40% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000353 8 47.35% : 0.000167s : 3: func_graph_cloner_run.FuncGraphClonerGraph 52.65% : 0.000186s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087728 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.84% : 0.003373s : 1: add_attr 3.83% : 0.003362s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000063s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.70% : 0.000611s : 1: bootstrap 0.03% : 0.000027s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000006s : 1: label_micro_interleaved_index 0.49% : 0.000427s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000462s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.11% : 0.000977s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000033s : 4: opt.transform.symbol_engine_opt 2.44% : 0.002145s : 1: opt_a 0.11% : 0.000097s : 1: opt_after_cconv 0.56% : 0.000494s : 1: opt_after_jit_grad 0.21% : 0.000184s : 1: opt_b 4.54% : 0.003985s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000009s : 1: order_py_execute_after_rewriter 0.02% : 0.000021s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000031s : 1: pre_auto_parallel 0.03% : 0.000023s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000210s : 1: renormalize.infer 0.22% : 0.000191s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000073s : 1: symbol_engine_optimizer 72.84% : 0.063902s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 7.04% : 0.006178s : 1: type_inference 0.07% : 0.000057s : 1: validate TotalTime = 0.0702989, [24] [bootstrap]: 0.00047242 [type_inference]: 0.00438845 [event_method]: 1.031e-05 [auto_monad]: 5.055e-05 [graph_reusing]: 5.31002e-06 [inline]: 2.26e-06 [add_attr]: 0.00292934, [1] [add_attr_with_inline]: 0.00292172, [1] [Cycle 1]: 4.541e-05, [2] [tag_attr]: 1.186e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.96001e-06 [pre_auto_parallel]: 2.181e-05 [insert-virtual-dataset]: 2.40002e-06 [parallel-infer-symbol-second]: 7.39994e-07 [dataset_repeat_opt]: 1.75001e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00369865, [53] [py_interpret_to_execute]: 1.452e-05 [rewriter_before_opt_a]: 3.94e-05 [opt_a]: 0.00189711, [2] [Cycle 1]: 0.00129503, [45] [expand_dump_flag]: 2.75997e-06 [switch_simplify]: 2.385e-05 [loop_unroll]: 1.452e-05 [a_1]: 0.0002892 [with_stream_mark]: 1.379e-05 [recompute_prepare]: 7.05e-06 [updatestate_depend_eliminate]: 3.41999e-06 [updatestate_assign_eliminate]: 3.33998e-06 [updatestate_loads_eliminate]: 2.66999e-06 [parameter_eliminate]: 1.66e-06 [a_2]: 7.628e-05 [accelerated_algorithm]: 6.29001e-06 [shard]: 2.34001e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 5.76e-06 [merge_send_recv]: 7.77002e-06 [auto_parallel]: 6.02999e-06 [parallel]: 1.756e-05 [flash_sp]: 7.19001e-06 [merge_comm]: 3.76001e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 9.51e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 6.91001e-06 [virtual_dataset]: 5.74e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.51002e-06 [merge_forward]: 3.64002e-06 [cell_reuse_recompute_pass]: 1.00001e-06 [offload_activation]: 9.46e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.104e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 8.97e-06 [set_forward_comm_id_for_comm_node_pass]: 3.35998e-06 [meta_fg_expand]: 2.13998e-06 [flash_sp_send_recv_attached]: 2.37999e-06 [receive_attached]: 
2.17999e-06 [after_resolve]: 1.053e-05 [a_after_grad]: 5.784e-05 [renormalize]: 0.00034221 [add_forward_monad_depend]: 4.74002e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.241e-05 [cse]: 2.569e-05 [a_3]: 3.978e-05 [Cycle 2]: 0.00059266, [45] [expand_dump_flag]: 8.40024e-07 [switch_simplify]: 6.85002e-06 [loop_unroll]: 5.50001e-06 [a_1]: 0.00012488 [with_stream_mark]: 1.055e-05 [recompute_prepare]: 5.97999e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.34001e-06 [updatestate_loads_eliminate]: 2.38998e-06 [parameter_eliminate]: 8.40024e-07 [a_2]: 6.748e-05 [accelerated_algorithm]: 5.56e-06 [shard]: 1.15999e-06 [meta_shard_fg_expand]: 1.14003e-06 [shard_inline]: 5.74999e-06 [merge_send_recv]: 4.29002e-06 [auto_parallel]: 5.36002e-06 [parallel]: 4.21001e-06 [flash_sp]: 3.06001e-06 [merge_comm]: 3.04999e-06 [allreduce_fusion]: 3.44001e-06 [matmul_add_comm_reduction]: 5.04003e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.35997e-06 [virtual_dataset]: 5.32001e-06 [get_grad_eliminate_]: 4.98001e-06 [virtual_output]: 5.09e-06 [merge_forward]: 2.91e-06 [cell_reuse_recompute_pass]: 1.24e-06 [offload_activation]: 5.78997e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47999e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.32e-06 [set_forward_comm_id_for_comm_node_pass]: 3.2e-06 [meta_fg_expand]: 1.68002e-06 [flash_sp_send_recv_attached]: 7.59988e-07 [receive_attached]: 9.50007e-07 [after_resolve]: 9.22999e-06 [a_after_grad]: 7.88999e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.04998e-06 [auto_monad_grad]: 8.50006e-07 [auto_monad_eliminator]: 6.38e-06 [cse]: 1.283e-05 [a_3]: 3.178e-05 [py_interpret_to_execute_after_opt_a]: 7.45e-06 [slice_cell_reuse_recomputed_activation]: 2.15002e-06 [rewriter_after_opt_a]: 3.073e-05 [convert_after_rewriter]: 6.71e-06 [order_py_execute_after_rewriter]: 4.95001e-06 [mutable_eliminate]: 0.00044911 [opt_b]: 0.00018164, [1] [Cycle 1]: 
0.00017557, [7] [b_1]: 0.00010748 [b_2]: 6.98e-06 [updatestate_depend_eliminate]: 5.17e-06 [updatestate_assign_eliminate]: 2.46e-06 [updatestate_loads_eliminate]: 2.32001e-06 [renormalize]: 5.40022e-07 [cse]: 1.589e-05 [optimize_parallel_all_gather_comm]: 1.478e-05 [overlap_param_gather]: 2.31e-06 [cconv]: 2.18e-05 [loop_unroll]: 0.00041556 [opt_after_cconv]: 9.587e-05, [1] [Cycle 1]: 8.983e-05, [7] [c_1]: 2.771e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 4.85999e-06 [updatestate_assign_eliminate]: 2.64999e-06 [updatestate_loads_eliminate]: 2.24001e-06 [cse]: 1.649e-05 [renormalize]: 3.19997e-07 [remove_dup_value]: 1.234e-05 [tuple_transform]: 6.785e-05, [1] [Cycle 1]: 6.362e-05, [4] [d_1]: 3.85e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.06998e-06 [partial_unused_args_eliminate]: 2.08998e-06 [add_recomputation]: 4.231e-05 [cse_after_recomputation]: 1.963e-05, [1] [Cycle 1]: 1.558e-05, [1] [cse]: 1.049e-05 [environ_conv]: 4.59002e-06 [swap_dp_allreduce_reducescatter]: 4.94003e-06 [bias_add_comm_swap]: 2.41998e-06 [label_micro_interleaved_index]: 4.35e-06 [label_fine_grained_interleaved_index]: 2.61e-06 [merge_cast_opt]: 1.19998e-06 [slice_recompute_activation]: 1.99e-06 [micro_interleaved_order_control]: 2.50002e-06 [assign_add_opt]: 1.25001e-06 [ForceFp32Comm]: 7.2e-07 [remove_cast_before_assign_add]: 1.05999e-06 [full_micro_interleaved_order_control]: 2.04999e-06 [reorder_send_recv_between_fp_bp]: 2.82002e-06 [comm_op_add_attrs]: 9.80013e-07 [add_comm_op_reuse_tag]: 1.17e-06 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.02998e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84e-06 [control_data_broadcast_order]: 1.141e-05 [grouped_pairwise_exchange_alltoall]: 1.52001e-06 [offloading_packed_experts]: 3.55e-06 [overlap_recompute_and_grad_model_parallel]: 4.42e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.12999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.21e-06 [overlap_grad_ring_attention]: 3.90998e-06 [overlap_grad_flash_sp]: 1.676e-05 [begin_end_overlap_inline]: 8.09989e-07 [split_matmul_comm_elemetwise]: 2.19001e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 9.39996e-07 [symbol_engine_optimizer]: 6.863e-05, [1] [Cycle 1]: 6.446e-05, [6] [build]: 2.19999e-06 [elim_shapecalc]: 8.71002e-06 [elim_not_effective]: 1.175e-05 [opt_reshape]: 5.91e-06 [fold_const_symbol]: 8.72e-06 [renormalize]: 2.40019e-07 [detach_backward]: 1.83002e-06 [pipeline_parallel_scheduler]: 1.39998e-06 [auto_monad_reorder]: 1.529e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00045169 [validate]: 3.143e-05 [backend_pass]: 8.10018e-07 [task_emit]: 0.0579997 [execute]: 8.10999e-06 Sums bootstrap : 0.000472s : 0.71% type_inference : 0.004388s : 6.61% event_method : 0.000010s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000022s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000020s : 0.03% optimize.opt_a.a_1 : 0.000414s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000007s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000066s : 0.10% optimize.opt_a.renormalize : 0.000342s : 0.52% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000449s : 0.68% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000416s : 0.63% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000042s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000003s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000452s : 0.68% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058000s : 87.33% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000119 26 18.97% : 0.000022s : 4: substitution.arithmetic_simplify 1.53% : 0.000002s : 2: substitution.elim_not_effective 1.05% : 0.000001s : 2: substitution.fold_const_symbol 4.41% : 0.000005s : 4: substitution.graph_param_transform 64.70% : 0.000077s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.64% : 0.000004s : 4: substitution.remove_not_recompute_node 3.36% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004347 2 91.43% : 0.003975s : 1: type_inference.infer 8.57% : 0.000372s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000138 984 0.79% : 0.000001s : 9: predicate.accumulaten_eliminater 1.00% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.65% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.69% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.59% : 0.000004s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.69% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.91% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.25% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.38% : 0.000001s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_depend_swap 1.89% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.97% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 1.01% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.89% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 6.05% : 0.000008s : 44: predicate.inline 1.06% : 0.000001s : 8: predicate.inline_without_move 0.46% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.14% : 0.000002s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.16% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.71% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.79% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.27% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.17% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.77% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 1.01% : 0.000001s : 9: predicate.reduce_eliminate 2.23% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.91% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.78% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000000s : 4: predicate.reset_defer_inline 0.81% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.93% : 0.000001s : 8: predicate.same_eliminate 0.66% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.90% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.47% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.04% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.45% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.53% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.61% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.24% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.39% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.07% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.85% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.55% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000258 6 44.31% : 0.000114s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.69% : 0.000144s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078239 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.75% : 0.002934s : 1: add_attr 3.74% : 0.002925s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000046s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000508s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000006s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.59% : 0.000458s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.04% : 0.000813s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000090s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.43% : 0.001900s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.59% : 0.000461s : 1: opt_after_jit_grad 0.24% : 0.000185s : 1: opt_b 4.73% : 0.003702s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.03% : 0.000026s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000187s : 1: renormalize.infer 0.19% : 0.000149s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.15% : 0.058015s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.63% : 0.004402s : 1: type_inference 0.07% : 0.000053s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x6-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x6-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x7-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x7-pynative],max_mem:68.0M TotalTime = 0.0211686, [24] [bootstrap]: 0.00058941 [type_inference]: 0.00617055 [event_method]: 1.468e-05 [auto_monad]: 5.514e-05 [graph_reusing]: 5.27001e-06 [inline]: 2.02001e-06 [add_attr]: 0.00336967, [1] [add_attr_with_inline]: 0.00335893, [1] [Cycle 1]: 4.31e-05, [2] [tag_attr]: 1.479e-05 [meta_addattr_fg_expand]: 3.9e-06 [parallel-infer-symbol]: 2.58003e-06 [pre_auto_parallel]: 2.772e-05 [insert-virtual-dataset]: 2.32999e-06 [parallel-infer-symbol-second]: 6.59988e-07 [dataset_repeat_opt]: 1.94e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.00399891, [53] [py_interpret_to_execute]: 2.037e-05 [rewriter_before_opt_a]: 5.744e-05 [opt_a]: 0.00216004, [2] [Cycle 1]: 0.00151079, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 3.177e-05 [loop_unroll]: 2.098e-05 [a_1]: 0.00044658 [with_stream_mark]: 1.281e-05 [recompute_prepare]: 7.38e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.53e-06 [updatestate_loads_eliminate]: 2.92002e-06 [parameter_eliminate]: 1.76e-06 [a_2]: 7.484e-05 [accelerated_algorithm]: 6.24001e-06 [shard]: 1.87999e-06 [meta_shard_fg_expand]: 1.57999e-06 [shard_inline]: 6.02999e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 5.54e-06 [parallel]: 2.314e-05 [flash_sp]: 7.01999e-06 [merge_comm]: 4.2e-06 [allreduce_fusion]: 3.21001e-06 [matmul_add_comm_reduction]: 8.86002e-06 [allreduce_slice_to_reducescatter]: 6.30011e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 6.06998e-06 
[get_grad_eliminate_]: 5.57999e-06 [virtual_output]: 5.67999e-06 [merge_forward]: 3.53999e-06 [cell_reuse_recompute_pass]: 1.10999e-06 [offload_activation]: 9.19e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.095e-05 [merge_recompute_call_nodes]: 1.59998e-06 [before_grad]: 9.16002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.32999e-06 [flash_sp_send_recv_attached]: 2.31e-06 [receive_attached]: 2.29999e-06 [after_resolve]: 1.018e-05 [a_after_grad]: 8.72e-06 [renormalize]: 0.00042216 [add_forward_monad_depend]: 5.19998e-06 [auto_monad_grad]: 1.89999e-06 [auto_monad_eliminator]: 1.339e-05 [cse]: 2.7e-05 [a_3]: 4.105e-05 [Cycle 2]: 0.00063995, [45] [expand_dump_flag]: 1.04e-06 [switch_simplify]: 7.12002e-06 [loop_unroll]: 5.67001e-06 [a_1]: 0.00016246 [with_stream_mark]: 1.044e-05 [recompute_prepare]: 5.92001e-06 [updatestate_depend_eliminate]: 2.83e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.21e-06 [parameter_eliminate]: 9.09989e-07 [a_2]: 6.966e-05 [accelerated_algorithm]: 5.89e-06 [shard]: 1.10999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.45001e-06 [merge_send_recv]: 4.38999e-06 [auto_parallel]: 4.91002e-06 [parallel]: 6.01998e-06 [flash_sp]: 3.46999e-06 [merge_comm]: 2.84999e-06 [allreduce_fusion]: 2.79999e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 3.00002e-07 [virtual_shard_identity]: 6.06998e-06 [virtual_dataset]: 5.58997e-06 [get_grad_eliminate_]: 5.00999e-06 [virtual_output]: 6.26e-06 [merge_forward]: 2.51998e-06 [cell_reuse_recompute_pass]: 1.32999e-06 [offload_activation]: 6.03998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.98002e-06 [merge_recompute_call_nodes]: 6.69999e-07 [before_grad]: 8.25999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.09001e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 8.09989e-07 [receive_attached]: 1.12e-06 [after_resolve]: 9.15999e-06 [a_after_grad]: 8.09997e-06 [renormalize]: 
1.00001e-07 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 8.89995e-07 [auto_monad_eliminator]: 6.71e-06 [cse]: 1.311e-05 [a_3]: 3.169e-05 [py_interpret_to_execute_after_opt_a]: 7.7e-06 [slice_cell_reuse_recomputed_activation]: 2.19001e-06 [rewriter_after_opt_a]: 3.176e-05 [convert_after_rewriter]: 6.88e-06 [order_py_execute_after_rewriter]: 5.07e-06 [mutable_eliminate]: 0.0004509 [opt_b]: 0.00018074, [1] [Cycle 1]: 0.00017466, [7] [b_1]: 0.00010709 [b_2]: 6.99001e-06 [updatestate_depend_eliminate]: 5.26002e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.24001e-06 [renormalize]: 4.30009e-07 [cse]: 1.625e-05 [optimize_parallel_all_gather_comm]: 1.518e-05 [overlap_param_gather]: 1.79998e-06 [cconv]: 2.156e-05 [loop_unroll]: 0.00041931 [opt_after_cconv]: 9.419e-05, [1] [Cycle 1]: 8.879e-05, [7] [c_1]: 2.776e-05 [parameter_eliminate]: 2.09999e-06 [updatestate_depend_eliminate]: 4.84e-06 [updatestate_assign_eliminate]: 2.50002e-06 [updatestate_loads_eliminate]: 2.27999e-06 [cse]: 1.655e-05 [renormalize]: 3.59985e-07 [remove_dup_value]: 1.289e-05 [tuple_transform]: 6.823e-05, [1] [Cycle 1]: 6.403e-05, [4] [d_1]: 3.833e-05 [none_parameter_eliminate]: 1.52001e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.22001e-06 [partial_unused_args_eliminate]: 1.67999e-06 [add_recomputation]: 4.856e-05 [cse_after_recomputation]: 2.055e-05, [1] [Cycle 1]: 1.609e-05, [1] [cse]: 1.075e-05 [environ_conv]: 4.67e-06 [swap_dp_allreduce_reducescatter]: 5.10999e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.09997e-06 [label_fine_grained_interleaved_index]: 2.47001e-06 [merge_cast_opt]: 1.17e-06 [slice_recompute_activation]: 2.11e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.18001e-06 [ForceFp32Comm]: 7.89994e-07 [remove_cast_before_assign_add]: 1.23002e-06 [full_micro_interleaved_order_control]: 2.11e-06 [reorder_send_recv_between_fp_bp]: 2.57001e-06 [comm_op_add_attrs]: 1.24998e-06 
[add_comm_op_reuse_tag]: 9.50007e-07 [interleave_split_concat_branches]: 1.16002e-06 [interleave_parallel_branches]: 1.00001e-06 [overlap_opt_shard_in_pipeline]: 1.14e-06 [overlap_opt_shard_grad_in_pipeline]: 1.47001e-06 [control_data_broadcast_order]: 1.199e-05 [grouped_pairwise_exchange_alltoall]: 1.67001e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.50001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.36002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34003e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 3.95998e-06 [overlap_grad_flash_sp]: 1.643e-05 [begin_end_overlap_inline]: 7.80012e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.77001e-06 [handle_group_info]: 1.00001e-06 [symbol_engine_optimizer]: 6.792e-05, [1] [Cycle 1]: 6.385e-05, [6] [build]: 2.61e-06 [elim_shapecalc]: 8.37e-06 [elim_not_effective]: 1.137e-05 [opt_reshape]: 6.17001e-06 [fold_const_symbol]: 8.62998e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.76e-06 [pipeline_parallel_scheduler]: 1.62999e-06 [auto_monad_reorder]: 1.558e-05 [get_jit_bprop_graph]: 1.05001e-06 [rewriter_after_jit_bprop_graph]: 3.54002e-06 [opt_after_jit_grad]: 0.00045117 [validate]: 3.094e-05 [backend_pass]: 9.70002e-07 [task_emit]: 0.00621472 [execute]: 7.19001e-06 Sums bootstrap : 0.000589s : 3.50% type_inference : 0.006171s : 36.65% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.16% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000020s : 0.12% optimize.rewriter_before_opt_a : 
0.000057s : 0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000609s : 3.62% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000013s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000144s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.07% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000010s : 0.06% optimize.opt_a.parallel : 0.000029s : 0.17% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.06% optimize.opt_a.virtual_output : 0.000012s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000422s : 2.51% optimize.opt_a.add_forward_monad_depend : 0.000007s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.12% optimize.opt_a.cse : 0.000040s : 0.24% optimize.opt_a.a_3 : 0.000073s : 0.43% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000032s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000451s : 2.68% optimize.opt_b.b_1 : 0.000107s : 0.64% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000419s : 2.49% optimize.opt_after_cconv.c_1 : 0.000028s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.08% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000004s : 0.02% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.01% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000001s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000016s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000003s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000004s : 0.02% opt_after_jit_grad : 0.000451s : 2.68% validate : 0.000031s : 0.18% backend_pass : 0.000001s : 0.01% task_emit : 0.006215s : 36.91% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000161 30 15.29% : 0.000025s : 5: substitution.arithmetic_simplify 1.08% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.26% : 0.000005s : 4: substitution.graph_param_transform 66.37% : 0.000107s : 3: substitution.inline 1.82% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.73% : 0.000004s : 4: substitution.remove_not_recompute_node 2.42% : 0.000004s : 4: substitution.replace_old_param 6.27% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006129 2 90.57% : 0.005551s : 1: type_inference.infer 9.43% : 0.000578s : 1: type_inference.specialize ------[replace.] 0.000037 5 69.22% : 0.000026s : 3: replace.inline 30.78% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 92.01% : 0.000105s : 3: match.inline 7.99% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.96% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.61% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.35% : 0.000004s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.90% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.95% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.93% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_depend_swap 1.80% : 0.000003s : 23: predicate.environ_get_eliminate 1.10% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.27% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.28% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.22% : 0.000000s : 4: predicate.fold_const_symbol 0.79% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.74% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.10% : 0.000010s : 51: predicate.inline 0.84% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.84% : 0.000001s : 8: predicate.less_batch_normalization 1.73% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.35% : 0.000004s : 32: predicate.load_eliminater 1.08% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.22% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.67% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.61% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.69% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.03% : 0.000002s : 4: predicate.mutable_eliminate 0.35% : 0.000001s : 4: predicate.opt_reshape 0.40% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.41% : 0.000002s : 17: predicate.partial_eliminate 0.83% : 0.000001s : 11: predicate.print_const_string_wrapper 0.62% : 0.000001s : 8: predicate.reduce_all_const_elim 1.25% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.46% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.84% : 0.000001s : 11: predicate.reshape_eliminate 0.71% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.50% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.84% : 0.000001s : 8: predicate.shard_identity_eliminate 0.77% : 0.000001s : 8: predicate.special_op_eliminate 0.75% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.77% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.39% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.01% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.14% : 0.000008s : 54: predicate.switch_simplify 0.84% : 
0.000001s : 11: predicate.tile_eliminate 0.88% : 0.000001s : 11: predicate.transpose_eliminate 1.55% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.30% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.40% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.12% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.38% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.14% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.41% : 0.000001s : 4: predicate.value_based_eliminate 0.74% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.71% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000366 8 46.84% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.16% : 0.000194s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.030089 196 0.01% : 0.000003s : 1: ForceFp32Comm 11.21% : 0.003374s : 1: add_attr 11.17% : 0.003362s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000004s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 2.08% : 0.000625s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.42% : 0.000427s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000460s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.25% : 0.000977s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000089s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.10% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.19% : 0.002163s : 1: opt_a 0.32% : 0.000098s : 1: opt_after_cconv 1.53% : 0.000460s : 1: opt_after_jit_grad 0.61% : 0.000184s : 1: opt_b 13.30% : 0.004003s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000024s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.71% : 0.000213s : 1: renormalize.infer 0.67% : 0.000202s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.23% : 0.000071s : 1: symbol_engine_optimizer 20.69% : 0.006225s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.55% : 0.006184s : 1: type_inference 0.20% : 0.000061s : 1: validate TotalTime = 0.0181152, [24] [bootstrap]: 0.0004684 [type_inference]: 0.00436159 [event_method]: 1.005e-05 [auto_monad]: 5.059e-05 [graph_reusing]: 5.04e-06 [inline]: 2.26998e-06 [add_attr]: 0.00298309, [1] [add_attr_with_inline]: 0.00297502, [1] [Cycle 1]: 4.446e-05, [2] [tag_attr]: 1.236e-05 [meta_addattr_fg_expand]: 3.75e-06 [parallel-infer-symbol]: 2.59999e-06 [pre_auto_parallel]: 2.107e-05 [insert-virtual-dataset]: 2.29001e-06 [parallel-infer-symbol-second]: 7.10017e-07 [dataset_repeat_opt]: 1.71e-06 [pipeline_split]: 1.58002e-06 [optimize]: 0.00367423, [53] [py_interpret_to_execute]: 1.534e-05 [rewriter_before_opt_a]: 3.882e-05 [opt_a]: 0.00184108, [2] [Cycle 1]: 0.00124148, [45] [expand_dump_flag]: 2.78e-06 [switch_simplify]: 2.354e-05 [loop_unroll]: 1.389e-05 [a_1]: 0.00028789 [with_stream_mark]: 1.306e-05 [recompute_prepare]: 7.4e-06 [updatestate_depend_eliminate]: 3.61999e-06 [updatestate_assign_eliminate]: 3.42997e-06 [updatestate_loads_eliminate]: 3.26999e-06 [parameter_eliminate]: 1.58002e-06 [a_2]: 7.634e-05 [accelerated_algorithm]: 6.16e-06 [shard]: 2.49999e-06 [meta_shard_fg_expand]: 1.39e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 7.55e-06 [auto_parallel]: 6.04999e-06 [parallel]: 1.793e-05 [flash_sp]: 7.22002e-06 [merge_comm]: 3.65998e-06 [allreduce_fusion]: 3.45e-06 [matmul_add_comm_reduction]: 9.25999e-06 [allreduce_slice_to_reducescatter]: 5.40022e-07 [virtual_shard_identity]: 7.4e-06 [virtual_dataset]: 5.69e-06 [get_grad_eliminate_]: 5.34e-06 [virtual_output]: 5.37999e-06 [merge_forward]: 3.65998e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 8.97e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.064e-05 [merge_recompute_call_nodes]: 1.37999e-06 [before_grad]: 9.58997e-06 [set_forward_comm_id_for_comm_node_pass]: 3.51001e-06 [meta_fg_expand]: 2.13002e-06 [flash_sp_send_recv_attached]: 2.39999e-06 [receive_attached]: 2.17999e-06 
[after_resolve]: 1.037e-05 [a_after_grad]: 8.89e-06 [renormalize]: 0.00033893 [add_forward_monad_depend]: 4.60999e-06 [auto_monad_grad]: 1.72001e-06 [auto_monad_eliminator]: 1.291e-05 [cse]: 2.728e-05 [a_3]: 3.912e-05 [Cycle 2]: 0.00059026, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 7.30998e-06 [loop_unroll]: 5.64e-06 [a_1]: 0.00012482 [with_stream_mark]: 8.99e-06 [recompute_prepare]: 5.77001e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.37001e-06 [parameter_eliminate]: 7.29982e-07 [a_2]: 6.758e-05 [accelerated_algorithm]: 5.56002e-06 [shard]: 9.70002e-07 [meta_shard_fg_expand]: 1.14998e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.25e-06 [auto_parallel]: 5.00999e-06 [parallel]: 3.93001e-06 [flash_sp]: 3.07002e-06 [merge_comm]: 3.31999e-06 [allreduce_fusion]: 2.66999e-06 [matmul_add_comm_reduction]: 4.89e-06 [allreduce_slice_to_reducescatter]: 2.59985e-07 [virtual_shard_identity]: 6.26998e-06 [virtual_dataset]: 5.22e-06 [get_grad_eliminate_]: 5.04e-06 [virtual_output]: 4.90001e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.42e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.028e-05 [merge_recompute_call_nodes]: 7.10017e-07 [before_grad]: 8.01001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.56998e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.00999e-06 [after_resolve]: 8.92e-06 [a_after_grad]: 8.12e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.02e-06 [auto_monad_grad]: 8.30012e-07 [auto_monad_eliminator]: 6.28e-06 [cse]: 1.185e-05 [a_3]: 3.218e-05 [py_interpret_to_execute_after_opt_a]: 7.18998e-06 [slice_cell_reuse_recomputed_activation]: 2.11998e-06 [rewriter_after_opt_a]: 3.054e-05 [convert_after_rewriter]: 6.69001e-06 [order_py_execute_after_rewriter]: 5.00001e-06 [mutable_eliminate]: 0.00044867 [opt_b]: 0.00018136, [1] [Cycle 1]: 0.00017516, [7] [b_1]: 
0.00010805 [b_2]: 6.84999e-06 [updatestate_depend_eliminate]: 5.35999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.21e-06 [renormalize]: 2.30008e-07 [cse]: 1.6e-05 [optimize_parallel_all_gather_comm]: 1.618e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.265e-05 [loop_unroll]: 0.00041396 [opt_after_cconv]: 9.292e-05, [1] [Cycle 1]: 8.751e-05, [7] [c_1]: 2.775e-05 [parameter_eliminate]: 2.16e-06 [updatestate_depend_eliminate]: 5.23002e-06 [updatestate_assign_eliminate]: 2.56998e-06 [updatestate_loads_eliminate]: 2.12999e-06 [cse]: 1.506e-05 [renormalize]: 3.50003e-07 [remove_dup_value]: 1.232e-05 [tuple_transform]: 6.968e-05, [1] [Cycle 1]: 6.512e-05, [4] [d_1]: 3.887e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.37001e-06 [partial_unused_args_eliminate]: 1.64e-06 [add_recomputation]: 4.316e-05 [cse_after_recomputation]: 2.009e-05, [1] [Cycle 1]: 1.587e-05, [1] [cse]: 1.079e-05 [environ_conv]: 4.33001e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.12999e-06 [label_micro_interleaved_index]: 4.52e-06 [label_fine_grained_interleaved_index]: 2.91999e-06 [merge_cast_opt]: 1.48002e-06 [slice_recompute_activation]: 2.59001e-06 [micro_interleaved_order_control]: 2.07001e-06 [assign_add_opt]: 1.54e-06 [ForceFp32Comm]: 8.29983e-07 [remove_cast_before_assign_add]: 9.60019e-07 [full_micro_interleaved_order_control]: 2.36e-06 [reorder_send_recv_between_fp_bp]: 2.56e-06 [comm_op_add_attrs]: 1.01002e-06 [add_comm_op_reuse_tag]: 8.99978e-07 [interleave_split_concat_branches]: 1.12999e-06 [interleave_parallel_branches]: 1.00999e-06 [overlap_opt_shard_in_pipeline]: 1.07e-06 [overlap_opt_shard_grad_in_pipeline]: 1.72001e-06 [control_data_broadcast_order]: 1.143e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.56001e-06 [overlap_recompute_and_grad_model_parallel]: 4.1e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30999e-06 [overlap_recompute_comm]: 1.99e-06 [overlap_grad_ring_attention]: 3.75e-06 [overlap_grad_flash_sp]: 1.636e-05 [begin_end_overlap_inline]: 4.7998e-07 [split_matmul_comm_elemetwise]: 2.34001e-06 [split_layernorm_comm]: 1.71e-06 [handle_group_info]: 9.79984e-07 [symbol_engine_optimizer]: 0.0001016, [1] [Cycle 1]: 9.746e-05, [6] [build]: 2.49001e-06 [elim_shapecalc]: 8.20999e-06 [elim_not_effective]: 4.281e-05 [opt_reshape]: 6.43998e-06 [fold_const_symbol]: 9.73002e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.72999e-06 [pipeline_parallel_scheduler]: 1.73002e-06 [auto_monad_reorder]: 1.521e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.41001e-06 [opt_after_jit_grad]: 0.00044851 [validate]: 2.964e-05 [backend_pass]: 9.39996e-07 [task_emit]: 0.00582727 [execute]: 6.85998e-06 Sums bootstrap : 0.000468s : 3.30% type_inference : 0.004362s : 30.76% event_method : 0.000010s : 0.07% auto_monad : 0.000051s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.02% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.03% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000002s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.01% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000015s : 0.11% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000020s : 0.14% optimize.opt_a.a_1 : 0.000413s : 2.91% optimize.opt_a.with_stream_mark : 0.000022s : 0.16% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.07% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000010s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000003s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000018s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000339s : 2.39% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000071s : 0.50% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.04% optimize.mutable_eliminate : 0.000449s : 3.16% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000414s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.cse : 0.000015s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000004s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000005s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000016s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.02% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000043s : 0.30% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.05% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000010s : 0.07% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.16% validate : 0.000030s : 0.21% backend_pass : 0.000001s : 0.01% task_emit : 0.005827s : 41.10% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000117 26 18.31% : 0.000021s : 4: substitution.arithmetic_simplify 1.67% : 0.000002s : 2: substitution.elim_not_effective 1.23% : 0.000001s : 2: substitution.fold_const_symbol 4.20% : 0.000005s : 4: substitution.graph_param_transform 65.24% : 0.000076s : 2: substitution.inline 2.26% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.70% : 0.000004s : 4: substitution.remove_not_recompute_node 3.39% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004322 2 92.00% : 0.003976s : 1: type_inference.infer 8.00% : 0.000346s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000136 984 0.80% : 0.000001s : 9: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.72% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.50% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.80% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.73% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 1.00% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.83% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.36% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.34% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.23% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.08% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.07% : 0.000001s : 13: predicate.environ_get_depend_swap 1.94% : 0.000003s : 21: predicate.environ_get_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 2.12% : 0.000003s : 11: predicate.float_depend_g_call 0.69% : 0.000001s : 8: predicate.float_environ_get_switch 0.97% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.28% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.65% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 1.02% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.94% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.15% : 0.000003s : 26: predicate.load_eliminater 1.18% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.77% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.73% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.74% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.77% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.69% : 0.000001s : 9: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.46% : 0.000001s : 4: predicate.opt_reshape 0.44% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.28% : 0.000002s : 13: predicate.partial_eliminate 0.74% : 0.000001s : 9: predicate.print_const_string_wrapper 0.80% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.21% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.80% : 0.000001s : 8: predicate.remove_not_recompute_node 1.40% : 0.000002s : 17: predicate.replace_applicator 0.83% : 0.000001s : 8: predicate.replace_old_param 0.45% : 0.000001s : 4: predicate.reset_defer_inline 0.82% : 0.000001s : 9: predicate.reshape_eliminate 0.79% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.67% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.09% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.11% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 1.00% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.46% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.06% : 0.000001s : 11: predicate.switch_defer_inline 1.76% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.44% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.77% : 0.000001s : 9: predicate.transpose_eliminate 1.49% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.66% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.24% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.53% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.05% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.00% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.45% : 0.000001s : 4: predicate.value_based_eliminate 0.84% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.81% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.56% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000287 6 51.61% : 0.000148s : 2: func_graph_cloner_run.FuncGraphClonerGraph 48.39% : 0.000139s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.026064 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.46% : 0.002988s : 1: add_attr 11.43% : 0.002978s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.94% : 0.000505s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000007s : 1: environ_conv 0.06% : 0.000015s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000423s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000458s : 1: mutable_eliminate 0.02% : 0.000006s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000012s : 1: opt.transform.mutable_eliminate 2.93% : 0.000763s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.25% : 0.000064s : 4: opt.transform.symbol_engine_opt 7.07% : 0.001844s : 1: opt_a 0.37% : 0.000096s : 1: opt_after_cconv 1.76% : 0.000458s : 1: opt_after_jit_grad 0.71% : 0.000185s : 1: opt_b 14.11% : 0.003678s : 1: optimize 0.08% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.07% : 0.000019s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000016s : 1: remove_dup_value 0.72% : 0.000186s : 1: renormalize.infer 0.56% : 0.000146s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.40% : 0.000104s : 1: symbol_engine_optimizer 22.39% : 0.005837s : 1: task_emit 0.28% : 0.000072s : 1: 
tuple_transform 16.79% : 0.004375s : 1: type_inference 0.21% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x7-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x7-kbk],max_mem:68.0M TotalTime = 0.0784007, [24] [bootstrap]: 0.00054371 [type_inference]: 0.00606742 [event_method]: 1.403e-05 [auto_monad]: 0.00012168 [graph_reusing]: 5.34e-06 [inline]: 1.69e-06 [add_attr]: 0.00334545, [1] [add_attr_with_inline]: 0.0033346, [1] [Cycle 1]: 4.536e-05, [2] [tag_attr]: 1.503e-05 [meta_addattr_fg_expand]: 4.03001e-06 [parallel-infer-symbol]: 2.82002e-06 [pre_auto_parallel]: 2.724e-05 [insert-virtual-dataset]: 2.58e-06 [parallel-infer-symbol-second]: 7.00005e-07 [dataset_repeat_opt]: 1.77999e-06 [pipeline_split]: 1.49e-06 [optimize]: 0.00397162, [53] [py_interpret_to_execute]: 2.021e-05 [rewriter_before_opt_a]: 5.731e-05 [opt_a]: 0.00213322, [2] [Cycle 1]: 0.00153435, [45] [expand_dump_flag]: 2.61999e-06 [switch_simplify]: 3.111e-05 [loop_unroll]: 2.125e-05 [a_1]: 0.00047627 [with_stream_mark]: 1.362e-05 [recompute_prepare]: 7.92998e-06 [updatestate_depend_eliminate]: 3.76001e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.19001e-06 [parameter_eliminate]: 1.83002e-06 [a_2]: 7.565e-05 [accelerated_algorithm]: 6.57002e-06 [shard]: 2.22999e-06 [meta_shard_fg_expand]: 1.47001e-06 [shard_inline]: 6.16e-06 [merge_send_recv]: 7.3e-06 [auto_parallel]: 5.91003e-06 [parallel]: 2.384e-05 [flash_sp]: 7.09001e-06 [merge_comm]: 3.69002e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.65999e-06 [allreduce_slice_to_reducescatter]: 5.50004e-07 [virtual_shard_identity]: 7.58999e-06 [virtual_dataset]: 6.11e-06 [get_grad_eliminate_]: 5.67001e-06 [virtual_output]: 5.78002e-06 [merge_forward]: 3.75e-06 [cell_reuse_recompute_pass]: 1.15001e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.077e-05 
[merge_recompute_call_nodes]: 1.42e-06 [before_grad]: 9.22999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36999e-06 [meta_fg_expand]: 2.17001e-06 [flash_sp_send_recv_attached]: 3.04001e-06 [receive_attached]: 2.69999e-06 [after_resolve]: 1.082e-05 [a_after_grad]: 8.95001e-06 [renormalize]: 0.00040971 [add_forward_monad_depend]: 4.47998e-06 [auto_monad_grad]: 2.10002e-06 [auto_monad_eliminator]: 1.37e-05 [cse]: 2.705e-05 [a_3]: 4.026e-05 [Cycle 2]: 0.00058964, [45] [expand_dump_flag]: 1.07e-06 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.54e-06 [a_1]: 0.00012429 [with_stream_mark]: 9.69e-06 [recompute_prepare]: 5.79999e-06 [updatestate_depend_eliminate]: 2.84999e-06 [updatestate_assign_eliminate]: 2.22001e-06 [updatestate_loads_eliminate]: 2.68e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.761e-05 [accelerated_algorithm]: 5.57001e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.20999e-06 [shard_inline]: 5.47001e-06 [merge_send_recv]: 4.20999e-06 [auto_parallel]: 5.42001e-06 [parallel]: 3.93999e-06 [flash_sp]: 3.18e-06 [merge_comm]: 3.08e-06 [allreduce_fusion]: 2.79001e-06 [matmul_add_comm_reduction]: 5.49e-06 [allreduce_slice_to_reducescatter]: 2.99973e-07 [virtual_shard_identity]: 6.54999e-06 [virtual_dataset]: 5.55001e-06 [get_grad_eliminate_]: 5.02e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.94001e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.66998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.44e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 7.85e-06 [set_forward_comm_id_for_comm_node_pass]: 2.99999e-06 [meta_fg_expand]: 1.59998e-06 [flash_sp_send_recv_attached]: 8.00006e-07 [receive_attached]: 1.02e-06 [after_resolve]: 8.85001e-06 [a_after_grad]: 7.95e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 9.60019e-07 [auto_monad_grad]: 8.29983e-07 [auto_monad_eliminator]: 6.01e-06 [cse]: 1.597e-05 [a_3]: 3.198e-05 [py_interpret_to_execute_after_opt_a]: 7.85e-06 [slice_cell_reuse_recomputed_activation]: 
1.86e-06 [rewriter_after_opt_a]: 3.233e-05 [convert_after_rewriter]: 7.15e-06 [order_py_execute_after_rewriter]: 5.05999e-06 [mutable_eliminate]: 0.00045173 [opt_b]: 0.0001802, [1] [Cycle 1]: 0.00017417, [7] [b_1]: 0.00010738 [b_2]: 6.81001e-06 [updatestate_depend_eliminate]: 5.22e-06 [updatestate_assign_eliminate]: 2.47001e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 3.39991e-07 [cse]: 1.613e-05 [optimize_parallel_all_gather_comm]: 1.591e-05 [overlap_param_gather]: 1.69e-06 [cconv]: 2.15e-05 [loop_unroll]: 0.0004173 [opt_after_cconv]: 9.487e-05, [1] [Cycle 1]: 8.935e-05, [7] [c_1]: 2.763e-05 [parameter_eliminate]: 2.17999e-06 [updatestate_depend_eliminate]: 5.07999e-06 [updatestate_assign_eliminate]: 2.60002e-06 [updatestate_loads_eliminate]: 2.32999e-06 [cse]: 1.624e-05 [renormalize]: 3.9002e-07 [remove_dup_value]: 1.218e-05 [tuple_transform]: 6.915e-05, [1] [Cycle 1]: 6.498e-05, [4] [d_1]: 3.97e-05 [none_parameter_eliminate]: 1.37999e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.14001e-06 [partial_unused_args_eliminate]: 1.89e-06 [add_recomputation]: 4.584e-05 [cse_after_recomputation]: 2.052e-05, [1] [Cycle 1]: 1.606e-05, [1] [cse]: 1.09e-05 [environ_conv]: 4.62998e-06 [swap_dp_allreduce_reducescatter]: 5.02e-06 [bias_add_comm_swap]: 2.66e-06 [label_micro_interleaved_index]: 4.09002e-06 [label_fine_grained_interleaved_index]: 2.42001e-06 [merge_cast_opt]: 1.11002e-06 [slice_recompute_activation]: 2.19001e-06 [micro_interleaved_order_control]: 2.28998e-06 [assign_add_opt]: 1.50999e-06 [ForceFp32Comm]: 1.02e-06 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.00002e-06 [reorder_send_recv_between_fp_bp]: 2.55002e-06 [comm_op_add_attrs]: 1.07998e-06 [add_comm_op_reuse_tag]: 8.60018e-07 [interleave_split_concat_branches]: 1.13001e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.84998e-06 [control_data_broadcast_order]: 
1.125e-05 [grouped_pairwise_exchange_alltoall]: 1.73002e-06 [offloading_packed_experts]: 3.25e-06 [overlap_recompute_and_grad_model_parallel]: 4.17e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.42e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 1.82001e-06 [overlap_grad_ring_attention]: 3.82998e-06 [overlap_grad_flash_sp]: 1.662e-05 [begin_end_overlap_inline]: 5.69999e-07 [split_matmul_comm_elemetwise]: 2.06e-06 [split_layernorm_comm]: 1.60999e-06 [handle_group_info]: 1.14e-06 [symbol_engine_optimizer]: 6.947e-05, [1] [Cycle 1]: 6.534e-05, [6] [build]: 2.43e-06 [elim_shapecalc]: 8.75999e-06 [elim_not_effective]: 1.19e-05 [opt_reshape]: 6.13002e-06 [fold_const_symbol]: 8.87999e-06 [renormalize]: 2.49973e-07 [detach_backward]: 2.09999e-06 [pipeline_parallel_scheduler]: 1.40999e-06 [auto_monad_reorder]: 1.542e-05 [get_jit_bprop_graph]: 1.04e-06 [rewriter_after_jit_bprop_graph]: 3.55998e-06 [opt_after_jit_grad]: 0.00046413 [validate]: 3.002e-05 [backend_pass]: 9.60019e-07 [task_emit]: 0.0635627 [execute]: 8.01001e-06 Sums bootstrap : 0.000544s : 0.73% type_inference : 0.006067s : 8.19% event_method : 0.000014s : 0.02% auto_monad : 0.000122s : 0.16% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000001s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000057s : 0.08% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000038s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.04% optimize.opt_a.a_1 : 0.000601s : 0.81% optimize.opt_a.with_stream_mark : 0.000023s : 0.03% 
optimize.opt_a.recompute_prepare : 0.000014s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000143s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000028s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000012s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000007s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.01% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000410s : 0.55% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.03% optimize.opt_a.cse : 0.000043s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.10% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000452s : 0.61% optimize.opt_b.b_1 : 0.000107s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000417s : 0.56% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000011s : 0.01% optimize.environ_conv : 0.000005s : 0.01% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000002s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000464s : 0.63% validate : 0.000030s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.063563s : 85.79% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000163 30 14.51% : 0.000024s : 5: substitution.arithmetic_simplify 1.36% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.45% : 0.000006s : 4: substitution.graph_param_transform 66.90% : 0.000109s : 3: substitution.inline 1.70% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.62% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.39% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006023 2 91.00% : 0.005481s : 1: type_inference.infer 9.00% : 0.000542s : 1: type_inference.specialize ------[replace.] 0.000040 5 69.89% : 0.000028s : 3: replace.inline 30.11% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.96% : 0.000107s : 3: match.inline 8.04% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000160 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.56% : 0.000001s : 8: predicate.addn_check_dump 0.83% : 0.000001s : 11: predicate.addn_zero_filter 0.77% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.11% : 0.000003s : 19: predicate.arithmetic_simplify 1.13% : 0.000002s : 11: predicate.cast_eliminate 0.68% : 0.000001s : 8: predicate.check_bprop_eliminate 0.56% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.59% : 0.000001s : 8: predicate.depend_value_elim 0.87% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.94% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.05% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.29% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_depend_swap 1.93% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.35% : 0.000004s : 16: predicate.float_depend_g_call 0.57% : 0.000001s : 8: predicate.float_environ_get_switch 0.94% : 0.000002s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.75% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.39% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.69% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.43% : 0.000004s : 32: predicate.load_eliminater 1.01% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.17% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.78% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.60% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.68% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.08% : 0.000002s : 4: predicate.mutable_eliminate 0.33% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.42% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.10% : 0.000002s : 11: predicate.reduce_eliminate 2.47% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.53% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 21: predicate.replace_applicator 0.69% : 0.000001s : 8: predicate.replace_old_param 0.35% : 0.000001s : 4: predicate.reset_defer_inline 0.86% : 0.000001s : 11: predicate.reshape_eliminate 0.69% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.39% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000002s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.85% : 0.000001s : 8: predicate.specialize_transform 0.99% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.82% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.79% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.49% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.70% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.31% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.38% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.32% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.34% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.31% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.51% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 45.94% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.06% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.087244 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.84% : 0.003350s : 1: add_attr 3.83% : 0.003338s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.15% : 0.000127s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000006s : 1: bias_add_comm_swap 0.67% : 0.000580s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.01% : 0.000011s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.49% : 0.000426s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.53% : 0.000461s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000012s : 1: opt.transform.mutable_eliminate 1.11% : 0.000966s : 78: opt.transform.opt_a 0.03% : 0.000026s : 1: opt.transform.opt_after_cconv 0.02% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.45% : 0.002136s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.54% : 0.000473s : 1: opt_after_jit_grad 0.21% : 0.000184s : 1: opt_b 4.56% : 0.003975s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.00% : 0.000004s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000024s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000209s : 1: renormalize.infer 0.22% : 0.000194s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000061s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 72.88% : 0.063579s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.97% : 0.006081s : 1: type_inference 0.06% : 0.000055s : 1: validate TotalTime = 0.0701777, [24] [bootstrap]: 0.00049088 [type_inference]: 0.00438548 [event_method]: 1.096e-05 [auto_monad]: 5.04e-05 [graph_reusing]: 5.62001e-06 [inline]: 1.59998e-06 [add_attr]: 0.00295662, [1] [add_attr_with_inline]: 0.00294878, [1] [Cycle 1]: 4.386e-05, [2] [tag_attr]: 1.151e-05 [meta_addattr_fg_expand]: 2.89001e-06 [parallel-infer-symbol]: 2.69001e-06 [pre_auto_parallel]: 2.046e-05 [insert-virtual-dataset]: 2.33002e-06 [parallel-infer-symbol-second]: 6.80011e-07 [dataset_repeat_opt]: 1.86e-06 [pipeline_split]: 1.54e-06 [optimize]: 0.00368328, [53] [py_interpret_to_execute]: 1.451e-05 [rewriter_before_opt_a]: 3.811e-05 [opt_a]: 0.00186299, [2] [Cycle 1]: 0.00125719, [45] [expand_dump_flag]: 2.83003e-06 [switch_simplify]: 2.501e-05 [loop_unroll]: 1.389e-05 [a_1]: 0.00029223 [with_stream_mark]: 1.328e-05 [recompute_prepare]: 7.25e-06 [updatestate_depend_eliminate]: 3.58e-06 [updatestate_assign_eliminate]: 3.3e-06 [updatestate_loads_eliminate]: 2.93e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.679e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.11e-06 [meta_shard_fg_expand]: 1.52001e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.87e-06 [auto_parallel]: 6.06e-06 [parallel]: 1.695e-05 [flash_sp]: 6.94999e-06 [merge_comm]: 3.53999e-06 [allreduce_fusion]: 3.26999e-06 [matmul_add_comm_reduction]: 8.57e-06 [allreduce_slice_to_reducescatter]: 6.10016e-07 [virtual_shard_identity]: 7.12002e-06 [virtual_dataset]: 5.83002e-06 [get_grad_eliminate_]: 5.84e-06 [virtual_output]: 5.56e-06 [merge_forward]: 3.73999e-06 [cell_reuse_recompute_pass]: 9.99979e-07 [offload_activation]: 8.77e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.082e-05 [merge_recompute_call_nodes]: 1.67001e-06 [before_grad]: 9.74999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.55e-06 [meta_fg_expand]: 2.19001e-06 [flash_sp_send_recv_attached]: 2.42001e-06 [receive_attached]: 2.54999e-06 
[after_resolve]: 1.052e-05 [a_after_grad]: 9.12001e-06 [renormalize]: 0.00034878 [add_forward_monad_depend]: 4.38001e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.288e-05 [cse]: 2.588e-05 [a_3]: 4.039e-05 [Cycle 2]: 0.00059644, [45] [expand_dump_flag]: 9.5999e-07 [switch_simplify]: 6.79001e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012564 [with_stream_mark]: 1.087e-05 [recompute_prepare]: 5.96998e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.21e-06 [updatestate_loads_eliminate]: 2.58e-06 [parameter_eliminate]: 8.00006e-07 [a_2]: 6.823e-05 [accelerated_algorithm]: 5.60001e-06 [shard]: 1.17999e-06 [meta_shard_fg_expand]: 1.13001e-06 [shard_inline]: 5.24e-06 [merge_send_recv]: 4.38001e-06 [auto_parallel]: 5.69999e-06 [parallel]: 4.15999e-06 [flash_sp]: 2.92002e-06 [merge_comm]: 3.01999e-06 [allreduce_fusion]: 2.67001e-06 [matmul_add_comm_reduction]: 5.09e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.12001e-06 [virtual_dataset]: 5.29e-06 [get_grad_eliminate_]: 5.07999e-06 [virtual_output]: 5.24998e-06 [merge_forward]: 2.62001e-06 [cell_reuse_recompute_pass]: 1.20999e-06 [offload_activation]: 5.96e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.67999e-06 [merge_recompute_call_nodes]: 6.99976e-07 [before_grad]: 8.32998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.53e-06 [meta_fg_expand]: 1.70001e-06 [flash_sp_send_recv_attached]: 7.89994e-07 [receive_attached]: 1.02e-06 [after_resolve]: 9.12999e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 7.99773e-08 [add_forward_monad_depend]: 1.09e-06 [auto_monad_grad]: 7.89994e-07 [auto_monad_eliminator]: 5.97999e-06 [cse]: 1.332e-05 [a_3]: 3.205e-05 [py_interpret_to_execute_after_opt_a]: 7.31999e-06 [slice_cell_reuse_recomputed_activation]: 1.84e-06 [rewriter_after_opt_a]: 2.983e-05 [convert_after_rewriter]: 6.79001e-06 [order_py_execute_after_rewriter]: 4.79e-06 [mutable_eliminate]: 0.00047217 [opt_b]: 0.0001815, [1] [Cycle 1]: 0.00017557, [7] 
[b_1]: 0.00010743 [b_2]: 7.65e-06 [updatestate_depend_eliminate]: 5.12e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.25002e-06 [renormalize]: 6.89994e-07 [cse]: 1.656e-05 [optimize_parallel_all_gather_comm]: 1.504e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.179e-05 [loop_unroll]: 0.00041149 [opt_after_cconv]: 9.543e-05, [1] [Cycle 1]: 8.971e-05, [7] [c_1]: 2.802e-05 [parameter_eliminate]: 2.34999e-06 [updatestate_depend_eliminate]: 5.04e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.31e-06 [cse]: 1.576e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.201e-05 [tuple_transform]: 6.908e-05, [1] [Cycle 1]: 6.485e-05, [4] [d_1]: 3.909e-05 [none_parameter_eliminate]: 1.53997e-06 [renormalize]: 1.50001e-07 [switch_simplify]: 6.26e-06 [partial_unused_args_eliminate]: 1.58002e-06 [add_recomputation]: 4.29e-05 [cse_after_recomputation]: 1.969e-05, [1] [Cycle 1]: 1.539e-05, [1] [cse]: 1.018e-05 [environ_conv]: 4.72e-06 [swap_dp_allreduce_reducescatter]: 4.96002e-06 [bias_add_comm_swap]: 2.29001e-06 [label_micro_interleaved_index]: 4.14002e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.44e-06 [slice_recompute_activation]: 2.06e-06 [micro_interleaved_order_control]: 2.37001e-06 [assign_add_opt]: 1.19e-06 [ForceFp32Comm]: 7.60017e-07 [remove_cast_before_assign_add]: 1.09003e-06 [full_micro_interleaved_order_control]: 2.18002e-06 [reorder_send_recv_between_fp_bp]: 2.78998e-06 [comm_op_add_attrs]: 9.70002e-07 [add_comm_op_reuse_tag]: 9.5999e-07 [interleave_split_concat_branches]: 1.11997e-06 [interleave_parallel_branches]: 1.07e-06 [overlap_opt_shard_in_pipeline]: 1.13001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.60001e-06 [control_data_broadcast_order]: 1.187e-05 [grouped_pairwise_exchange_alltoall]: 1.45001e-06 [offloading_packed_experts]: 3.86999e-06 [overlap_recompute_and_grad_model_parallel]: 4.22e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.15999e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.71e-06 [overlap_grad_ring_attention]: 3.95998e-06 [overlap_grad_flash_sp]: 1.679e-05 [begin_end_overlap_inline]: 5.29981e-07 [split_matmul_comm_elemetwise]: 1.84e-06 [split_layernorm_comm]: 2.05002e-06 [handle_group_info]: 9.80013e-07 [symbol_engine_optimizer]: 6.872e-05, [1] [Cycle 1]: 6.467e-05, [6] [build]: 2.17001e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.169e-05 [opt_reshape]: 6.41e-06 [fold_const_symbol]: 9.14e-06 [renormalize]: 1.50001e-07 [detach_backward]: 1.54e-06 [pipeline_parallel_scheduler]: 1.46002e-06 [auto_monad_reorder]: 1.487e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.13e-06 [opt_after_jit_grad]: 0.00044701 [validate]: 3.118e-05 [backend_pass]: 8.40024e-07 [task_emit]: 0.0578582 [execute]: 7.95e-06 Sums bootstrap : 0.000491s : 0.74% type_inference : 0.004385s : 6.62% event_method : 0.000011s : 0.02% auto_monad : 0.000050s : 0.08% graph_reusing : 0.000006s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000020s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000038s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000032s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000418s : 0.63% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000006s : 0.01% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.22% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000012s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000018s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000349s : 0.53% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000039s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000472s : 0.71% optimize.opt_b.b_1 : 0.000107s : 0.16% optimize.opt_b.b_2 : 0.000008s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000017s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.03% optimize.loop_unroll : 0.000411s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000003s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.03% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000447s : 0.67% validate : 0.000031s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.057858s : 87.32% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000121 26 18.38% : 0.000022s : 4: substitution.arithmetic_simplify 1.38% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.40% : 0.000005s : 4: substitution.graph_param_transform 65.55% : 0.000079s : 2: substitution.inline 2.34% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.59% : 0.000004s : 4: substitution.remove_not_recompute_node 3.30% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004345 2 91.90% : 0.003993s : 1: type_inference.infer 8.10% : 0.000352s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000077 2 100.00% : 0.000077s : 2: match.inline ------[predicate.] 
0.000137 984 0.88% : 0.000001s : 9: predicate.accumulaten_eliminater 1.09% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.74% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.71% : 0.000004s : 17: predicate.arithmetic_simplify 0.76% : 0.000001s : 9: predicate.cast_eliminate 0.85% : 0.000001s : 8: predicate.check_bprop_eliminate 0.66% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.70% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.92% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.23% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.33% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_depend_swap 1.82% : 0.000002s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.80% : 0.000002s : 11: predicate.float_depend_g_call 0.67% : 0.000001s : 8: predicate.float_environ_get_switch 0.99% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.32% : 0.000000s : 4: predicate.graph_param_transform 0.78% : 0.000001s : 8: predicate.incorporate_call 0.60% : 0.000001s : 8: predicate.incorporate_call_switch 5.97% : 0.000008s : 44: predicate.inline 1.13% : 0.000002s : 8: predicate.inline_without_move 0.57% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.99% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.19% : 0.000003s : 26: predicate.load_eliminater 1.29% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.67% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.81% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.83% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.82% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.45% : 0.000002s : 4: predicate.mutable_eliminate 0.43% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.20% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.73% : 0.000001s : 8: predicate.reduce_all_const_elim 1.08% : 0.000001s : 9: predicate.reduce_eliminate 2.08% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.35% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.41% : 0.000001s : 4: predicate.reset_defer_inline 0.72% : 0.000001s : 9: predicate.reshape_eliminate 0.86% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.47% : 0.000001s : 4: predicate.row_tensor_eliminate 0.98% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.96% : 0.000001s : 8: predicate.shard_identity_eliminate 0.91% : 0.000001s : 8: predicate.special_op_eliminate 0.95% : 0.000001s : 8: predicate.specialize_transform 1.15% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 1.03% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.82% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.59% : 0.000006s : 41: predicate.switch_simplify 0.83% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.48% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.51% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.34% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.15% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.43% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.50% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.09% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.80% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.76% : 0.000001s : 8: predicate.virtual_output_eliminate 0.42% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.65% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000255 6 44.38% : 0.000113s : 2: func_graph_cloner_run.FuncGraphClonerGraph 55.62% : 0.000142s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078096 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.79% : 0.002961s : 1: add_attr 3.78% : 0.002952s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.07% : 0.000055s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000005s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.67% : 0.000527s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000420s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.62% : 0.000481s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.99% : 0.000771s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.12% : 0.000091s : 28: opt.transform.opt_b 0.06% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.39% : 0.001866s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000457s : 1: opt_after_jit_grad 0.24% : 0.000185s : 1: opt_b 4.72% : 0.003687s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000006s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000025s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000015s : 1: remove_dup_value 0.24% : 0.000190s : 1: renormalize.infer 0.19% : 0.000152s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000034s : 1: rewriter_after_opt_a 0.05% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000004s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000071s : 1: symbol_engine_optimizer 74.11% : 0.057874s : 1: task_emit 0.09% : 0.000072s : 1: 
tuple_transform 5.63% : 0.004400s : 1: type_inference 0.07% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x7-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x7-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x8-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x8-pynative],max_mem:68.0M TotalTime = 0.0209979, [24] [bootstrap]: 0.00054562 [type_inference]: 0.00614042 [event_method]: 1.436e-05 [auto_monad]: 5.642e-05 [graph_reusing]: 5.84e-06 [inline]: 1.60999e-06 [add_attr]: 0.0033555, [1] [add_attr_with_inline]: 0.00334503, [1] [Cycle 1]: 4.232e-05, [2] [tag_attr]: 1.453e-05 [meta_addattr_fg_expand]: 3.97998e-06 [parallel-infer-symbol]: 2.64001e-06 [pre_auto_parallel]: 2.793e-05 [insert-virtual-dataset]: 2.53003e-06 [parallel-infer-symbol-second]: 7.59988e-07 [dataset_repeat_opt]: 1.95001e-06 [pipeline_split]: 1.50001e-06 [optimize]: 0.0040035, [53] [py_interpret_to_execute]: 2.15e-05 [rewriter_before_opt_a]: 5.822e-05 [opt_a]: 0.00213584, [2] [Cycle 1]: 0.00152476, [45] [expand_dump_flag]: 2.58e-06 [switch_simplify]: 3.162e-05 [loop_unroll]: 2.101e-05 [a_1]: 0.00045233 [with_stream_mark]: 1.391e-05 [recompute_prepare]: 7.92e-06 [updatestate_depend_eliminate]: 3.76999e-06 [updatestate_assign_eliminate]: 3.26001e-06 [updatestate_loads_eliminate]: 3.08e-06 [parameter_eliminate]: 1.65001e-06 [a_2]: 7.569e-05 [accelerated_algorithm]: 6.56999e-06 [shard]: 2.08002e-06 [meta_shard_fg_expand]: 1.45001e-06 [shard_inline]: 6.12001e-06 [merge_send_recv]: 8.38999e-06 [auto_parallel]: 6.22001e-06 [parallel]: 2.201e-05 [flash_sp]: 7.23e-06 [merge_comm]: 3.71001e-06 [allreduce_fusion]: 3.65998e-06 [matmul_add_comm_reduction]: 8.45999e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.55e-06 [virtual_dataset]: 
5.81e-06 [get_grad_eliminate_]: 5.36002e-06 [virtual_output]: 5.62001e-06 [merge_forward]: 3.48e-06 [cell_reuse_recompute_pass]: 1.05001e-06 [offload_activation]: 9.12001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.04e-05 [merge_recompute_call_nodes]: 1.30999e-06 [before_grad]: 9.44998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.62002e-06 [meta_fg_expand]: 2.09999e-06 [flash_sp_send_recv_attached]: 2.46998e-06 [receive_attached]: 2.49001e-06 [after_resolve]: 9.94001e-06 [a_after_grad]: 8.87e-06 [renormalize]: 0.00042669 [add_forward_monad_depend]: 5.07e-06 [auto_monad_grad]: 1.74998e-06 [auto_monad_eliminator]: 1.348e-05 [cse]: 2.754e-05 [a_3]: 4.18e-05 [Cycle 2]: 0.0006017, [45] [expand_dump_flag]: 1.00999e-06 [switch_simplify]: 6.98e-06 [loop_unroll]: 5.56998e-06 [a_1]: 0.00012476 [with_stream_mark]: 9.39998e-06 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.73e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.24001e-06 [parameter_eliminate]: 1.04e-06 [a_2]: 6.98e-05 [accelerated_algorithm]: 5.81e-06 [shard]: 1.25999e-06 [meta_shard_fg_expand]: 1.19998e-06 [shard_inline]: 5.46998e-06 [merge_send_recv]: 4.58999e-06 [auto_parallel]: 5.66e-06 [parallel]: 3.81001e-06 [flash_sp]: 2.98e-06 [merge_comm]: 3.17997e-06 [allreduce_fusion]: 2.73e-06 [matmul_add_comm_reduction]: 6.07999e-06 [allreduce_slice_to_reducescatter]: 3.49974e-07 [virtual_shard_identity]: 6.34001e-06 [virtual_dataset]: 5.77001e-06 [get_grad_eliminate_]: 5.71998e-06 [virtual_output]: 5.29e-06 [merge_forward]: 2.87002e-06 [cell_reuse_recompute_pass]: 1.22999e-06 [offload_activation]: 6.07001e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.47001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.98001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.18e-06 [meta_fg_expand]: 1.59e-06 [flash_sp_send_recv_attached]: 8.39995e-07 [receive_attached]: 1.02998e-06 [after_resolve]: 8.80001e-06 [a_after_grad]: 8.12e-06 [renormalize]: 
9.00181e-08 [add_forward_monad_depend]: 1.15001e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 7.14001e-06 [cse]: 1.337e-05 [a_3]: 3.238e-05 [py_interpret_to_execute_after_opt_a]: 7.56001e-06 [slice_cell_reuse_recomputed_activation]: 1.69e-06 [rewriter_after_opt_a]: 3.009e-05 [convert_after_rewriter]: 6.96001e-06 [order_py_execute_after_rewriter]: 5.07e-06 [mutable_eliminate]: 0.00044723 [opt_b]: 0.00018193, [1] [Cycle 1]: 0.00017577, [7] [b_1]: 0.00010754 [b_2]: 7.05998e-06 [updatestate_depend_eliminate]: 5.27999e-06 [updatestate_assign_eliminate]: 2.44999e-06 [updatestate_loads_eliminate]: 2.39999e-06 [renormalize]: 4.09986e-07 [cse]: 1.629e-05 [optimize_parallel_all_gather_comm]: 1.566e-05 [overlap_param_gather]: 1.66e-06 [cconv]: 2.112e-05 [loop_unroll]: 0.00044731 [opt_after_cconv]: 9.459e-05, [1] [Cycle 1]: 8.879e-05, [7] [c_1]: 2.748e-05 [parameter_eliminate]: 2.30002e-06 [updatestate_depend_eliminate]: 5.15999e-06 [updatestate_assign_eliminate]: 2.51998e-06 [updatestate_loads_eliminate]: 2.23002e-06 [cse]: 1.633e-05 [renormalize]: 2.00002e-07 [remove_dup_value]: 1.206e-05 [tuple_transform]: 6.818e-05, [1] [Cycle 1]: 6.386e-05, [4] [d_1]: 3.865e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 1.60013e-07 [switch_simplify]: 6.12001e-06 [partial_unused_args_eliminate]: 1.96e-06 [add_recomputation]: 4.694e-05 [cse_after_recomputation]: 2.054e-05, [1] [Cycle 1]: 1.606e-05, [1] [cse]: 1.08e-05 [environ_conv]: 5.12999e-06 [swap_dp_allreduce_reducescatter]: 5.09998e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.17998e-06 [label_fine_grained_interleaved_index]: 2.80002e-06 [merge_cast_opt]: 1.38002e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.02001e-06 [assign_add_opt]: 1.27999e-06 [ForceFp32Comm]: 9.70002e-07 [remove_cast_before_assign_add]: 9.89996e-07 [full_micro_interleaved_order_control]: 2.55997e-06 [reorder_send_recv_between_fp_bp]: 2.54999e-06 [comm_op_add_attrs]: 1.10001e-06 
[add_comm_op_reuse_tag]: 9.70002e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 9.89996e-07 [overlap_opt_shard_in_pipeline]: 1.45999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.89999e-06 [control_data_broadcast_order]: 1.161e-05 [grouped_pairwise_exchange_alltoall]: 1.41998e-06 [offloading_packed_experts]: 3.71001e-06 [overlap_recompute_and_grad_model_parallel]: 4.25999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.39e-06 [overlap_recompute_allgather_and_fa_grad]: 1.27e-06 [overlap_recompute_comm]: 2.22999e-06 [overlap_grad_ring_attention]: 4.25999e-06 [overlap_grad_flash_sp]: 1.681e-05 [begin_end_overlap_inline]: 5.00004e-07 [split_matmul_comm_elemetwise]: 2.04999e-06 [split_layernorm_comm]: 1.71998e-06 [handle_group_info]: 1.35001e-06 [symbol_engine_optimizer]: 6.875e-05, [1] [Cycle 1]: 6.459e-05, [6] [build]: 2.22001e-06 [elim_shapecalc]: 8.55999e-06 [elim_not_effective]: 1.162e-05 [opt_reshape]: 6.02001e-06 [fold_const_symbol]: 8.88002e-06 [renormalize]: 2.09984e-07 [detach_backward]: 1.50999e-06 [pipeline_parallel_scheduler]: 1.57001e-06 [auto_monad_reorder]: 1.563e-05 [get_jit_bprop_graph]: 1.15001e-06 [rewriter_after_jit_bprop_graph]: 3.5e-06 [opt_after_jit_grad]: 0.0004494 [validate]: 3.146e-05 [backend_pass]: 9.30013e-07 [task_emit]: 0.00612368 [execute]: 6.79001e-06 Sums bootstrap : 0.000546s : 3.27% type_inference : 0.006140s : 36.84% event_method : 0.000014s : 0.09% auto_monad : 0.000056s : 0.34% graph_reusing : 0.000006s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 
0.000058s : 0.35% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000039s : 0.23% optimize.opt_a.loop_unroll : 0.000027s : 0.16% optimize.opt_a.a_1 : 0.000577s : 3.46% optimize.opt_a.with_stream_mark : 0.000023s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000145s : 0.87% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.07% optimize.opt_a.merge_send_recv : 0.000013s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.07% optimize.opt_a.parallel : 0.000026s : 0.15% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000015s : 0.09% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000012s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.07% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000015s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.02% optimize.opt_a.receive_attached : 
0.000004s : 0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000427s : 2.56% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000021s : 0.12% optimize.opt_a.cse : 0.000041s : 0.25% optimize.opt_a.a_3 : 0.000074s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.18% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.68% optimize.opt_b.b_1 : 0.000108s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000021s : 0.13% optimize.loop_unroll : 0.000447s : 2.68% optimize.opt_after_cconv.c_1 : 0.000027s : 0.16% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000039s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify 
: 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000047s : 0.28% optimize.cse_after_recomputation.cse : 0.000011s : 0.06% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000003s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.10% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000002s : 0.01% auto_monad_reorder : 0.000016s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.70% validate : 0.000031s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006124s : 36.74% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.78% : 0.000024s : 5: substitution.arithmetic_simplify 1.13% : 0.000002s : 2: substitution.elim_not_effective 0.74% : 0.000001s : 2: substitution.fold_const_symbol 3.28% : 0.000005s : 4: substitution.graph_param_transform 66.87% : 0.000109s : 3: substitution.inline 1.66% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.58% : 0.000004s : 4: substitution.remove_not_recompute_node 2.22% : 0.000004s : 4: substitution.replace_old_param 6.74% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006098 2 90.74% : 0.005533s : 1: type_inference.infer 9.26% : 0.000564s : 1: type_inference.specialize ------[replace.] 0.000038 5 68.97% : 0.000026s : 3: replace.inline 31.03% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000117 5 91.52% : 0.000107s : 3: match.inline 8.48% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.87% : 0.000001s : 11: predicate.accumulaten_eliminater 0.92% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.60% : 0.000001s : 8: predicate.addn_check_dump 0.82% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.19% : 0.000003s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.58% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.07% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.18% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.85% : 0.000003s : 23: predicate.environ_get_eliminate 1.06% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.59% : 0.000001s : 8: predicate.float_environ_get_switch 0.87% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.20% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.25% : 0.000000s : 4: predicate.graph_param_transform 0.67% : 0.000001s : 8: predicate.incorporate_call 0.57% : 0.000001s : 8: predicate.incorporate_call_switch 6.11% : 0.000010s : 51: predicate.inline 0.86% : 0.000001s : 8: predicate.inline_without_move 0.43% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.89% : 0.000001s : 8: predicate.less_batch_normalization 1.97% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.33% : 0.000004s : 32: predicate.load_eliminater 1.07% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.15% : 0.000003s : 26: predicate.loop_unroll_before_grad 1.70% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.78% : 0.000001s : 11: predicate.minmaximum_grad 1.15% : 0.000002s : 4: predicate.mutable_eliminate 0.41% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.62% : 0.000003s : 16: predicate.partial_defer_inline 1.44% : 0.000002s : 17: predicate.partial_eliminate 0.87% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.42% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.44% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.70% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.78% : 0.000001s : 8: predicate.same_eliminate 0.51% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.86% : 0.000001s : 8: predicate.shard_identity_eliminate 0.70% : 0.000001s : 8: predicate.special_op_eliminate 0.77% : 0.000001s : 8: predicate.specialize_transform 1.07% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.40% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.38% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.04% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.87% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.29% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.15% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.37% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.15% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.36% : 0.000001s : 4: predicate.value_based_eliminate 0.77% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.74% : 0.000001s : 8: predicate.virtual_output_eliminate 0.33% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.47% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000365 8 47.00% : 0.000171s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.00% : 0.000193s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029884 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.24% : 0.003360s : 1: add_attr 11.21% : 0.003349s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.17% : 0.000051s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.21% : 0.000061s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.96% : 0.000585s : 1: bootstrap 0.08% : 0.000024s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000010s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.53% : 0.000457s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000456s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.17% : 0.000946s : 78: opt.transform.opt_a 0.09% : 0.000026s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.16% : 0.002139s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.53% : 0.000459s : 1: opt_after_jit_grad 0.62% : 0.000185s : 1: opt_b 13.41% : 0.004007s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000005s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.08% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000016s : 1: remove_dup_value 0.72% : 0.000214s : 1: renormalize.infer 0.69% : 0.000205s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.11% : 0.000034s : 1: rewriter_after_opt_a 0.21% : 0.000062s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000004s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.53% : 0.006134s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.59% : 0.006154s : 1: type_inference 0.21% : 0.000062s : 1: validate TotalTime = 0.0180783, [24] [bootstrap]: 0.00045096 [type_inference]: 0.00434961 [event_method]: 1.055e-05 [auto_monad]: 5.166e-05 [graph_reusing]: 5.35001e-06 [inline]: 1.84e-06 [add_attr]: 0.00295763, [1] [add_attr_with_inline]: 0.00294922, [1] [Cycle 1]: 4.518e-05, [2] [tag_attr]: 1.176e-05 [meta_addattr_fg_expand]: 3.15998e-06 [parallel-infer-symbol]: 2.98998e-06 [pre_auto_parallel]: 2.167e-05 [insert-virtual-dataset]: 2.61e-06 [parallel-infer-symbol-second]: 6.60017e-07 [dataset_repeat_opt]: 1.82999e-06 [pipeline_split]: 1.49998e-06 [optimize]: 0.00368338, [53] [py_interpret_to_execute]: 1.437e-05 [rewriter_before_opt_a]: 3.888e-05 [opt_a]: 0.00188341, [2] [Cycle 1]: 0.00123887, [45] [expand_dump_flag]: 2.56e-06 [switch_simplify]: 2.39e-05 [loop_unroll]: 1.38e-05 [a_1]: 0.00028793 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 7.41999e-06 [updatestate_depend_eliminate]: 3.45e-06 [updatestate_assign_eliminate]: 2.98e-06 [updatestate_loads_eliminate]: 2.66e-06 [parameter_eliminate]: 1.57001e-06 [a_2]: 7.72e-05 [accelerated_algorithm]: 6.32001e-06 [shard]: 2.39999e-06 [meta_shard_fg_expand]: 1.60001e-06 [shard_inline]: 5.92001e-06 [merge_send_recv]: 7.48999e-06 [auto_parallel]: 6.24001e-06 [parallel]: 1.694e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.30998e-06 [matmul_add_comm_reduction]: 9.12001e-06 [allreduce_slice_to_reducescatter]: 6.60017e-07 [virtual_shard_identity]: 7.01001e-06 [virtual_dataset]: 5.57001e-06 [get_grad_eliminate_]: 5.35999e-06 [virtual_output]: 5.91998e-06 [merge_forward]: 3.55003e-06 [cell_reuse_recompute_pass]: 1.03001e-06 [offload_activation]: 9.58997e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 8.95999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.38999e-06 [meta_fg_expand]: 2.11e-06 [flash_sp_send_recv_attached]: 2.77002e-06 [receive_attached]: 
2.15002e-06 [after_resolve]: 1.034e-05 [a_after_grad]: 8.84998e-06 [renormalize]: 0.00033645 [add_forward_monad_depend]: 4.2e-06 [auto_monad_grad]: 1.64998e-06 [auto_monad_eliminator]: 1.323e-05 [cse]: 2.647e-05 [a_3]: 4.134e-05 [Cycle 2]: 0.00063529, [45] [expand_dump_flag]: 9.09989e-07 [switch_simplify]: 6.64999e-06 [loop_unroll]: 5.57999e-06 [a_1]: 0.00012636 [with_stream_mark]: 1.061e-05 [recompute_prepare]: 5.64e-06 [updatestate_depend_eliminate]: 2.66999e-06 [updatestate_assign_eliminate]: 2.15002e-06 [updatestate_loads_eliminate]: 2.51e-06 [parameter_eliminate]: 7.7e-07 [a_2]: 0.00010591 [accelerated_algorithm]: 5.56e-06 [shard]: 1.08001e-06 [meta_shard_fg_expand]: 1.25999e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.43999e-06 [auto_parallel]: 5.29e-06 [parallel]: 4.04002e-06 [flash_sp]: 3.42997e-06 [merge_comm]: 3.11999e-06 [allreduce_fusion]: 2.74001e-06 [matmul_add_comm_reduction]: 4.80001e-06 [allreduce_slice_to_reducescatter]: 3.89991e-07 [virtual_shard_identity]: 6.11998e-06 [virtual_dataset]: 5.42999e-06 [get_grad_eliminate_]: 5.12e-06 [virtual_output]: 4.95001e-06 [merge_forward]: 2.58e-06 [cell_reuse_recompute_pass]: 1.34e-06 [offload_activation]: 5.86e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.50001e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.19998e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.10018e-07 [receive_attached]: 1.09e-06 [after_resolve]: 9.86998e-06 [a_after_grad]: 8.44002e-06 [renormalize]: 8.00064e-08 [add_forward_monad_depend]: 1.32e-06 [auto_monad_grad]: 8.60018e-07 [auto_monad_eliminator]: 6.49999e-06 [cse]: 1.284e-05 [a_3]: 3.223e-05 [py_interpret_to_execute_after_opt_a]: 7.39002e-06 [slice_cell_reuse_recomputed_activation]: 1.87001e-06 [rewriter_after_opt_a]: 3.052e-05 [convert_after_rewriter]: 6.93e-06 [order_py_execute_after_rewriter]: 4.85001e-06 [mutable_eliminate]: 0.0004477 [opt_b]: 0.00018096, [1] [Cycle 1]: 
0.00017513, [7] [b_1]: 0.00010813 [b_2]: 7.08e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.29001e-06 [updatestate_loads_eliminate]: 2.23998e-06 [renormalize]: 4.69998e-07 [cse]: 1.61e-05 [optimize_parallel_all_gather_comm]: 1.528e-05 [overlap_param_gather]: 1.97999e-06 [cconv]: 2.252e-05 [loop_unroll]: 0.00041344 [opt_after_cconv]: 9.412e-05, [1] [Cycle 1]: 8.833e-05, [7] [c_1]: 2.764e-05 [parameter_eliminate]: 2.34001e-06 [updatestate_depend_eliminate]: 4.93001e-06 [updatestate_assign_eliminate]: 2.49999e-06 [updatestate_loads_eliminate]: 2.11998e-06 [cse]: 1.607e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.205e-05 [tuple_transform]: 6.839e-05, [1] [Cycle 1]: 6.424e-05, [4] [d_1]: 3.868e-05 [none_parameter_eliminate]: 1.44e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.34999e-06 [partial_unused_args_eliminate]: 2.04e-06 [add_recomputation]: 4.285e-05 [cse_after_recomputation]: 2.046e-05, [1] [Cycle 1]: 1.627e-05, [1] [cse]: 1.09e-05 [environ_conv]: 4.84e-06 [swap_dp_allreduce_reducescatter]: 5.32001e-06 [bias_add_comm_swap]: 2.67001e-06 [label_micro_interleaved_index]: 3.9e-06 [label_fine_grained_interleaved_index]: 2.51e-06 [merge_cast_opt]: 1.45999e-06 [slice_recompute_activation]: 2.22001e-06 [micro_interleaved_order_control]: 2.32999e-06 [assign_add_opt]: 1.43002e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.14e-06 [full_micro_interleaved_order_control]: 2.14999e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 1.12e-06 [add_comm_op_reuse_tag]: 9.99979e-07 [interleave_split_concat_branches]: 1.15999e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.05999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.54998e-06 [control_data_broadcast_order]: 1.16e-05 [grouped_pairwise_exchange_alltoall]: 1.94999e-06 [offloading_packed_experts]: 4.1e-06 [overlap_recompute_and_grad_model_parallel]: 4.21001e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.50001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.31998e-06 [overlap_recompute_comm]: 2.31e-06 [overlap_grad_ring_attention]: 4.07003e-06 [overlap_grad_flash_sp]: 1.688e-05 [begin_end_overlap_inline]: 4.69998e-07 [split_matmul_comm_elemetwise]: 1.97001e-06 [split_layernorm_comm]: 1.87001e-06 [handle_group_info]: 1.39003e-06 [symbol_engine_optimizer]: 6.891e-05, [1] [Cycle 1]: 6.479e-05, [6] [build]: 2.17999e-06 [elim_shapecalc]: 8.32998e-06 [elim_not_effective]: 1.186e-05 [opt_reshape]: 6.10002e-06 [fold_const_symbol]: 8.97e-06 [renormalize]: 1.90019e-07 [detach_backward]: 1.85001e-06 [pipeline_parallel_scheduler]: 1.35999e-06 [auto_monad_reorder]: 1.526e-05 [get_jit_bprop_graph]: 1.01002e-06 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00044886 [validate]: 3.05e-05 [backend_pass]: 1.03001e-06 [task_emit]: 0.00583266 [execute]: 6.74001e-06 Sums bootstrap : 0.000451s : 3.18% type_inference : 0.004350s : 30.70% event_method : 0.000011s : 0.07% auto_monad : 0.000052s : 0.36% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000022s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000001s : 0.01% optimize.py_interpret_to_execute : 0.000014s : 0.10% optimize.rewriter_before_opt_a : 0.000039s : 0.27% optimize.opt_a.expand_dump_flag : 0.000003s : 0.02% optimize.opt_a.switch_simplify : 0.000031s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000414s : 2.92% optimize.opt_a.with_stream_mark : 0.000024s : 0.17% optimize.opt_a.recompute_prepare : 0.000013s : 0.09% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000183s : 1.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000012s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000012s : 0.08% optimize.opt_a.parallel : 0.000021s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.09% optimize.opt_a.virtual_dataset : 0.000011s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.14% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.03% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000337s : 2.38% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000003s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.28% optimize.opt_a.a_3 : 0.000074s : 0.52% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.22% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.16% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000023s : 0.16% optimize.loop_unroll : 0.000413s : 2.92% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.11% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.09% optimize.tuple_transform.d_1 : 0.000039s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000011s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.04% optimize.bias_add_comm_swap : 0.000003s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.02% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000002s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.02% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.11% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 3.17% validate : 0.000030s : 0.22% backend_pass : 0.000001s : 0.01% task_emit : 0.005833s : 41.17% execute : 0.000007s : 0.05% Time group info: ------[substitution.] 0.000117 26 18.77% : 0.000022s : 4: substitution.arithmetic_simplify 1.49% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.31% : 0.000005s : 4: substitution.graph_param_transform 65.29% : 0.000076s : 2: substitution.inline 2.32% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.45% : 0.000004s : 4: substitution.remove_not_recompute_node 3.32% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004309 2 92.01% : 0.003965s : 1: type_inference.infer 7.99% : 0.000344s : 1: type_inference.specialize ------[replace.] 0.000018 2 100.00% : 0.000018s : 2: replace.inline ------[match.] 0.000075 2 100.00% : 0.000075s : 2: match.inline ------[predicate.] 
0.000139 984 0.81% : 0.000001s : 9: predicate.accumulaten_eliminater 1.29% : 0.000002s : 4: predicate.ad_related_special_op_eliminate 0.64% : 0.000001s : 8: predicate.addn_check_dump 0.68% : 0.000001s : 9: predicate.addn_zero_filter 0.68% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.55% : 0.000004s : 17: predicate.arithmetic_simplify 0.79% : 0.000001s : 9: predicate.cast_eliminate 0.78% : 0.000001s : 8: predicate.check_bprop_eliminate 0.64% : 0.000001s : 8: predicate.compare_switch_simplify 0.32% : 0.000000s : 4: predicate.const_output_eliminate 0.78% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.79% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.29% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.16% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.10% : 0.000002s : 13: predicate.environ_get_depend_swap 1.80% : 0.000002s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.94% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.87% : 0.000003s : 11: predicate.float_depend_g_call 0.66% : 0.000001s : 8: predicate.float_environ_get_switch 1.00% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.82% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.85% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.99% : 0.000008s : 44: predicate.inline 1.07% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.09% : 0.000002s : 8: predicate.less_batch_normalization 1.73% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.06% : 0.000003s : 26: predicate.load_eliminater 1.22% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.73% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.79% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.69% : 0.000001s : 8: predicate.merge_addn 0.99% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.81% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.42% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.42% : 0.000001s : 4: predicate.parallel_virtual_node 1.28% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.75% : 0.000001s : 8: predicate.reduce_all_const_elim 0.97% : 0.000001s : 9: predicate.reduce_eliminate 2.14% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.94% : 0.000001s : 8: predicate.remove_not_recompute_node 1.34% : 0.000002s : 17: predicate.replace_applicator 0.72% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.71% : 0.000001s : 9: predicate.reshape_eliminate 0.81% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.43% : 0.000001s : 4: predicate.row_tensor_eliminate 0.89% : 0.000001s : 8: predicate.same_eliminate 0.59% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.97% : 0.000001s : 8: predicate.shard_identity_eliminate 1.05% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.43% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.43% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.01% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.40% : 0.000006s : 41: predicate.switch_simplify 0.81% : 
0.000001s : 9: predicate.tile_eliminate 0.78% : 0.000001s : 9: predicate.transpose_eliminate 1.51% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.55% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.28% : 0.000005s : 25: predicate.tuple_list_get_item_eliminator 1.49% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.25% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.48% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.08% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.01% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.43% : 0.000001s : 4: predicate.value_based_eliminate 0.81% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.79% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.52% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000237 6 42.75% : 0.000101s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.25% : 0.000136s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025985 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.40% : 0.002962s : 1: add_attr 11.36% : 0.002953s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.22% : 0.000057s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000006s : 1: bias_add_comm_swap 1.87% : 0.000487s : 1: bootstrap 0.10% : 0.000026s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000015s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000023s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.02% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000005s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.62% : 0.000422s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.95% : 0.000767s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000091s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.26% : 0.001886s : 1: opt_a 0.38% : 0.000097s : 1: opt_after_cconv 1.76% : 0.000458s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.19% : 0.003687s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.02% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.02% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.03% : 0.000007s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000026s : 1: pre_auto_parallel 0.07% : 0.000018s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000015s : 1: remove_dup_value 0.72% : 0.000187s : 1: renormalize.infer 0.55% : 0.000143s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.16% : 0.000043s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.28% : 0.000071s : 1: symbol_engine_optimizer 22.48% : 0.005842s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.79% : 0.004364s : 1: type_inference 0.22% : 0.000056s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x8-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x8-kbk],max_mem:68.0M TotalTime = 0.0810909, [24] [bootstrap]: 0.00055439 [type_inference]: 0.00600347 [event_method]: 1.401e-05 [auto_monad]: 5.497e-05 [graph_reusing]: 5.46002e-06 [inline]: 2.02001e-06 [add_attr]: 0.00336865, [1] [add_attr_with_inline]: 0.00335792, [1] [Cycle 1]: 4.427e-05, [2] [tag_attr]: 1.533e-05 [meta_addattr_fg_expand]: 3.85e-06 [parallel-infer-symbol]: 3.05002e-06 [pre_auto_parallel]: 2.729e-05 [insert-virtual-dataset]: 2.65002e-06 [parallel-infer-symbol-second]: 7.40023e-07 [dataset_repeat_opt]: 2.14e-06 [pipeline_split]: 1.70001e-06 [optimize]: 0.00400208, [53] [py_interpret_to_execute]: 2.049e-05 [rewriter_before_opt_a]: 5.858e-05 [opt_a]: 0.00217128, [2] [Cycle 1]: 0.00157164, [45] [expand_dump_flag]: 2.45002e-06 [switch_simplify]: 3.222e-05 [loop_unroll]: 2.096e-05 [a_1]: 0.00045662 [with_stream_mark]: 1.353e-05 [recompute_prepare]: 7.83001e-06 [updatestate_depend_eliminate]: 3.6e-06 [updatestate_assign_eliminate]: 3.4e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.64e-06 [a_2]: 7.652e-05 [accelerated_algorithm]: 6.19001e-06 [shard]: 2.18998e-06 [meta_shard_fg_expand]: 1.51998e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.92999e-06 [parallel]: 2.217e-05 [flash_sp]: 7.16001e-06 [merge_comm]: 3.88001e-06 [allreduce_fusion]: 3.23e-06 [matmul_add_comm_reduction]: 8.45001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.87e-06 [virtual_dataset]: 6.01998e-06 [get_grad_eliminate_]: 5.47999e-06 [virtual_output]: 5.81e-06 [merge_forward]: 3.91999e-06 [cell_reuse_recompute_pass]: 1.09e-06 [offload_activation]: 9.24e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.083e-05 
[merge_recompute_call_nodes]: 1.58997e-06 [before_grad]: 9.57001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.58e-06 [meta_fg_expand]: 2.43002e-06 [flash_sp_send_recv_attached]: 2.31998e-06 [receive_attached]: 2.39001e-06 [after_resolve]: 1.024e-05 [a_after_grad]: 8.78001e-06 [renormalize]: 0.00046828 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 1.63002e-06 [auto_monad_eliminator]: 1.35e-05 [cse]: 2.646e-05 [a_3]: 4.023e-05 [Cycle 2]: 0.00059047, [45] [expand_dump_flag]: 9.20001e-07 [switch_simplify]: 6.84001e-06 [loop_unroll]: 5.61e-06 [a_1]: 0.00012651 [with_stream_mark]: 9.52999e-06 [recompute_prepare]: 5.57999e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.48e-06 [parameter_eliminate]: 9.30013e-07 [a_2]: 6.851e-05 [accelerated_algorithm]: 5.55001e-06 [shard]: 1.21002e-06 [meta_shard_fg_expand]: 1.10999e-06 [shard_inline]: 5.79e-06 [merge_send_recv]: 4.17e-06 [auto_parallel]: 5.19998e-06 [parallel]: 4.15e-06 [flash_sp]: 2.94001e-06 [merge_comm]: 2.99001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 5.06002e-06 [allreduce_slice_to_reducescatter]: 3.80009e-07 [virtual_shard_identity]: 6.14999e-06 [virtual_dataset]: 5.29998e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.08002e-06 [merge_forward]: 2.47001e-06 [cell_reuse_recompute_pass]: 1.19998e-06 [offload_activation]: 6.00002e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.32001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.79002e-06 [set_forward_comm_id_for_comm_node_pass]: 3.04001e-06 [meta_fg_expand]: 1.60001e-06 [flash_sp_send_recv_attached]: 7.39994e-07 [receive_attached]: 1.03001e-06 [after_resolve]: 8.94998e-06 [a_after_grad]: 8.00999e-06 [renormalize]: 1.00001e-07 [add_forward_monad_depend]: 8.50006e-07 [auto_monad_grad]: 7.60017e-07 [auto_monad_eliminator]: 5.89e-06 [cse]: 1.595e-05 [a_3]: 3.225e-05 [py_interpret_to_execute_after_opt_a]: 7.89002e-06 
[slice_cell_reuse_recomputed_activation]: 2.14e-06 [rewriter_after_opt_a]: 3.188e-05 [convert_after_rewriter]: 6.68e-06 [order_py_execute_after_rewriter]: 4.95999e-06 [mutable_eliminate]: 0.00044778 [opt_b]: 0.00018188, [1] [Cycle 1]: 0.000176, [7] [b_1]: 0.00010844 [b_2]: 7.33999e-06 [updatestate_depend_eliminate]: 5.12999e-06 [updatestate_assign_eliminate]: 2.36998e-06 [updatestate_loads_eliminate]: 2.29001e-06 [renormalize]: 4.39992e-07 [cse]: 1.607e-05 [optimize_parallel_all_gather_comm]: 1.496e-05 [overlap_param_gather]: 1.70001e-06 [cconv]: 2.267e-05 [loop_unroll]: 0.00040937 [opt_after_cconv]: 9.424e-05, [1] [Cycle 1]: 8.871e-05, [7] [c_1]: 2.806e-05 [parameter_eliminate]: 2.58e-06 [updatestate_depend_eliminate]: 4.98001e-06 [updatestate_assign_eliminate]: 2.44001e-06 [updatestate_loads_eliminate]: 2.07999e-06 [cse]: 1.564e-05 [renormalize]: 4.19997e-07 [remove_dup_value]: 1.239e-05 [tuple_transform]: 6.956e-05, [1] [Cycle 1]: 6.514e-05, [4] [d_1]: 3.958e-05 [none_parameter_eliminate]: 1.57999e-06 [renormalize]: 2.00002e-07 [switch_simplify]: 6.07999e-06 [partial_unused_args_eliminate]: 1.72001e-06 [add_recomputation]: 4.638e-05 [cse_after_recomputation]: 1.942e-05, [1] [Cycle 1]: 1.521e-05, [1] [cse]: 1.013e-05 [environ_conv]: 4.58999e-06 [swap_dp_allreduce_reducescatter]: 5.07e-06 [bias_add_comm_swap]: 2.52001e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.52001e-06 [merge_cast_opt]: 1.47001e-06 [slice_recompute_activation]: 2.31998e-06 [micro_interleaved_order_control]: 2.43e-06 [assign_add_opt]: 1.17999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.27999e-06 [full_micro_interleaved_order_control]: 2.17999e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.02998e-06 [add_comm_op_reuse_tag]: 9.20001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.01002e-06 [overlap_opt_shard_in_pipeline]: 1.29998e-06 [overlap_opt_shard_grad_in_pipeline]: 
1.62001e-06 [control_data_broadcast_order]: 1.195e-05 [grouped_pairwise_exchange_alltoall]: 1.97001e-06 [offloading_packed_experts]: 3.45e-06 [overlap_recompute_and_grad_model_parallel]: 4.38999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.20001e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.19001e-06 [overlap_grad_ring_attention]: 3.98999e-06 [overlap_grad_flash_sp]: 1.707e-05 [begin_end_overlap_inline]: 7.49977e-07 [split_matmul_comm_elemetwise]: 1.96998e-06 [split_layernorm_comm]: 1.60001e-06 [handle_group_info]: 1.10001e-06 [symbol_engine_optimizer]: 6.924e-05, [1] [Cycle 1]: 6.504e-05, [6] [build]: 2.53998e-06 [elim_shapecalc]: 8.59e-06 [elim_not_effective]: 1.139e-05 [opt_reshape]: 6.13002e-06 [fold_const_symbol]: 9.48997e-06 [renormalize]: 2.19996e-07 [detach_backward]: 1.71998e-06 [pipeline_parallel_scheduler]: 1.55999e-06 [auto_monad_reorder]: 1.568e-05 [get_jit_bprop_graph]: 9.5999e-07 [rewriter_after_jit_bprop_graph]: 3.63e-06 [opt_after_jit_grad]: 0.00044955 [validate]: 3.068e-05 [backend_pass]: 1.02e-06 [task_emit]: 0.066331 [execute]: 7.92003e-06 Sums bootstrap : 0.000554s : 0.72% type_inference : 0.006003s : 7.82% event_method : 0.000014s : 0.02% auto_monad : 0.000055s : 0.07% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000027s : 0.04% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.03% optimize.rewriter_before_opt_a : 0.000059s : 0.08% optimize.opt_a.expand_dump_flag : 0.000003s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.05% optimize.opt_a.loop_unroll : 0.000027s : 0.03% optimize.opt_a.a_1 : 0.000583s : 0.76% 
optimize.opt_a.with_stream_mark : 0.000023s : 0.03% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000145s : 0.19% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.01% optimize.opt_a.parallel : 0.000026s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.01% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.01% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.01% optimize.opt_a.virtual_output : 0.000011s : 0.01% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.02% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000019s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.02% optimize.opt_a.renormalize : 0.000468s : 0.61% optimize.opt_a.add_forward_monad_depend : 
0.000006s : 0.01% optimize.opt_a.auto_monad_grad : 0.000002s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000042s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.09% optimize.py_interpret_to_execute_after_opt_a : 0.000008s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.04% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000448s : 0.58% optimize.opt_b.b_1 : 0.000108s : 0.14% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000023s : 0.03% optimize.loop_unroll : 0.000409s : 0.53% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.02% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.02% optimize.tuple_transform.d_1 : 0.000040s : 0.05% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000046s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.01% optimize.environ_conv 
: 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000003s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000009s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.01% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.00% opt_after_jit_grad : 0.000450s : 0.59% validate : 0.000031s : 0.04% backend_pass : 0.000001s : 0.00% task_emit : 0.066331s : 86.42% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000165 30 14.67% : 0.000024s : 5: substitution.arithmetic_simplify 1.06% : 0.000002s : 2: substitution.elim_not_effective 0.88% : 0.000001s : 2: substitution.fold_const_symbol 3.24% : 0.000005s : 4: substitution.graph_param_transform 67.02% : 0.000110s : 3: substitution.inline 1.76% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.47% : 0.000004s : 4: substitution.remove_not_recompute_node 2.30% : 0.000004s : 4: substitution.replace_old_param 6.61% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005961 2 90.78% : 0.005411s : 1: type_inference.infer 9.22% : 0.000550s : 1: type_inference.specialize ------[replace.] 0.000039 5 69.75% : 0.000027s : 3: replace.inline 30.25% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000118 5 91.73% : 0.000108s : 3: match.inline 8.27% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000159 1131 0.89% : 0.000001s : 11: predicate.accumulaten_eliminater 0.90% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.58% : 0.000001s : 8: predicate.addn_check_dump 0.79% : 0.000001s : 11: predicate.addn_zero_filter 0.82% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.24% : 0.000004s : 19: predicate.arithmetic_simplify 0.83% : 0.000001s : 11: predicate.cast_eliminate 0.67% : 0.000001s : 8: predicate.check_bprop_eliminate 0.62% : 0.000001s : 8: predicate.compare_switch_simplify 0.23% : 0.000000s : 4: predicate.const_output_eliminate 0.60% : 0.000001s : 8: predicate.depend_value_elim 0.85% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.89% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.14% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.31% : 0.000001s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.13% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_depend_swap 1.77% : 0.000003s : 23: predicate.environ_get_eliminate 1.11% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.19% : 0.000003s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.84% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.26% : 0.000000s : 4: predicate.fold_const_symbol 0.70% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.64% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.01% : 0.000010s : 51: predicate.inline 0.82% : 0.000001s : 8: predicate.inline_without_move 0.40% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.79% : 0.000001s : 8: predicate.less_batch_normalization 1.70% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.38% : 0.000004s : 32: predicate.load_eliminater 0.94% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.21% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.74% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.80% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.72% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.77% : 0.000001s : 11: predicate.minmaximum_grad 1.10% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.43% : 0.000001s : 4: predicate.parallel_virtual_node 1.67% : 0.000003s : 16: predicate.partial_defer_inline 1.47% : 0.000002s : 17: predicate.partial_eliminate 0.85% : 0.000001s : 11: predicate.print_const_string_wrapper 0.65% : 0.000001s : 8: predicate.reduce_all_const_elim 1.04% : 0.000002s : 11: predicate.reduce_eliminate 2.37% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.52% : 0.000001s : 8: predicate.remove_not_recompute_node 1.41% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.94% : 0.000002s : 11: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.79% : 0.000001s : 8: predicate.same_eliminate 0.55% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.94% : 0.000001s : 8: predicate.shard_identity_eliminate 0.83% : 0.000001s : 8: predicate.special_op_eliminate 0.84% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.81% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.41% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 2.00% : 0.000003s : 24: predicate.switch_layer_defer_inline 4.96% : 0.000008s : 54: predicate.switch_simplify 0.80% : 
0.000001s : 11: predicate.tile_eliminate 0.90% : 0.000001s : 11: predicate.transpose_eliminate 1.58% : 0.000003s : 19: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000003s : 19: predicate.tuple_list_get_item_const_eliminator 1.43% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.53% : 0.000006s : 29: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.22% : 0.000004s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.29% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.12% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.76% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.72% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.48% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 45.63% : 0.000155s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.37% : 0.000185s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.090035 196 0.00% : 0.000003s : 1: ForceFp32Comm 3.75% : 0.003373s : 1: add_attr 3.73% : 0.003362s : 1: add_attr_with_inline 0.00% : 0.000003s : 1: add_comm_op_reuse_tag 0.06% : 0.000050s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000060s : 1: auto_monad 0.02% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000004s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.66% : 0.000591s : 1: bootstrap 0.03% : 0.000026s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.02% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000019s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.46% : 0.000418s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.51% : 0.000456s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.01% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.01% : 0.000013s : 1: opt.transform.mutable_eliminate 1.06% : 0.000951s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.02% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.10% : 0.000091s : 28: opt.transform.opt_b 0.05% : 0.000044s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000032s : 4: opt.transform.symbol_engine_opt 2.41% : 0.002174s : 1: opt_a 0.11% : 0.000098s : 1: opt_after_cconv 0.51% : 0.000459s : 1: opt_after_jit_grad 0.21% : 0.000185s : 1: opt_b 4.45% : 0.004006s : 1: optimize 0.02% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.02% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000032s : 1: pre_auto_parallel 0.03% : 0.000025s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000214s : 1: renormalize.infer 0.27% : 0.000248s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000036s : 1: rewriter_after_opt_a 0.07% : 0.000063s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000004s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.08% : 0.000072s : 1: symbol_engine_optimizer 73.69% : 0.066347s : 1: task_emit 0.08% : 0.000072s : 1: 
tuple_transform 6.68% : 0.006017s : 1: type_inference 0.06% : 0.000057s : 1: validate TotalTime = 0.0706569, [24] [bootstrap]: 0.00047604 [type_inference]: 0.00438023 [event_method]: 1.086e-05 [auto_monad]: 5.063e-05 [graph_reusing]: 5.49e-06 [inline]: 1.85001e-06 [add_attr]: 0.0029995, [1] [add_attr_with_inline]: 0.0029913, [1] [Cycle 1]: 4.561e-05, [2] [tag_attr]: 1.173e-05 [meta_addattr_fg_expand]: 3.67002e-06 [parallel-infer-symbol]: 2.79001e-06 [pre_auto_parallel]: 2.043e-05 [insert-virtual-dataset]: 2.31998e-06 [parallel-infer-symbol-second]: 6.59988e-07 [dataset_repeat_opt]: 1.84e-06 [pipeline_split]: 1.52001e-06 [optimize]: 0.00363001, [53] [py_interpret_to_execute]: 1.469e-05 [rewriter_before_opt_a]: 3.924e-05 [opt_a]: 0.00183835, [2] [Cycle 1]: 0.00123993, [45] [expand_dump_flag]: 2.71999e-06 [switch_simplify]: 2.471e-05 [loop_unroll]: 1.359e-05 [a_1]: 0.00028629 [with_stream_mark]: 1.331e-05 [recompute_prepare]: 7.09001e-06 [updatestate_depend_eliminate]: 3.60998e-06 [updatestate_assign_eliminate]: 2.98998e-06 [updatestate_loads_eliminate]: 2.88e-06 [parameter_eliminate]: 1.64998e-06 [a_2]: 7.524e-05 [accelerated_algorithm]: 6.46e-06 [shard]: 2.16998e-06 [meta_shard_fg_expand]: 1.49998e-06 [shard_inline]: 5.73997e-06 [merge_send_recv]: 8.05999e-06 [auto_parallel]: 5.86998e-06 [parallel]: 1.836e-05 [flash_sp]: 7.28e-06 [merge_comm]: 3.38e-06 [allreduce_fusion]: 3.4e-06 [matmul_add_comm_reduction]: 8.60001e-06 [allreduce_slice_to_reducescatter]: 6.09987e-07 [virtual_shard_identity]: 7.19001e-06 [virtual_dataset]: 5.76e-06 [get_grad_eliminate_]: 5.42001e-06 [virtual_output]: 5.42999e-06 [merge_forward]: 3.63e-06 [cell_reuse_recompute_pass]: 1.02998e-06 [offload_activation]: 9.32001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.075e-05 [merge_recompute_call_nodes]: 1.32e-06 [before_grad]: 9.22999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.36001e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.27999e-06 [receive_attached]: 2.66e-06 
[after_resolve]: 1.02e-05 [a_after_grad]: 8.43001e-06 [renormalize]: 0.00033963 [add_forward_monad_depend]: 4.12998e-06 [auto_monad_grad]: 1.91e-06 [auto_monad_eliminator]: 1.253e-05 [cse]: 2.788e-05 [a_3]: 3.976e-05 [Cycle 2]: 0.00058968, [45] [expand_dump_flag]: 8.89995e-07 [switch_simplify]: 6.68e-06 [loop_unroll]: 5.32001e-06 [a_1]: 0.0001244 [with_stream_mark]: 1.026e-05 [recompute_prepare]: 5.77999e-06 [updatestate_depend_eliminate]: 2.79999e-06 [updatestate_assign_eliminate]: 2.31e-06 [updatestate_loads_eliminate]: 2.47001e-06 [parameter_eliminate]: 8.59989e-07 [a_2]: 6.721e-05 [accelerated_algorithm]: 5.47001e-06 [shard]: 1.07e-06 [meta_shard_fg_expand]: 1.18001e-06 [shard_inline]: 5.30001e-06 [merge_send_recv]: 4.21001e-06 [auto_parallel]: 5.22e-06 [parallel]: 3.77998e-06 [flash_sp]: 2.94001e-06 [merge_comm]: 3.13998e-06 [allreduce_fusion]: 2.77002e-06 [matmul_add_comm_reduction]: 5.17e-06 [allreduce_slice_to_reducescatter]: 3.50003e-07 [virtual_shard_identity]: 6.21e-06 [virtual_dataset]: 5.37001e-06 [get_grad_eliminate_]: 5.04998e-06 [virtual_output]: 4.99e-06 [merge_forward]: 2.51e-06 [cell_reuse_recompute_pass]: 1.33002e-06 [offload_activation]: 5.99999e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.52001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 8.08999e-06 [set_forward_comm_id_for_comm_node_pass]: 2.89999e-06 [meta_fg_expand]: 1.69e-06 [flash_sp_send_recv_attached]: 7.29982e-07 [receive_attached]: 1.27999e-06 [after_resolve]: 9.82001e-06 [a_after_grad]: 8.28999e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.06002e-06 [auto_monad_grad]: 8.09989e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.223e-05 [a_3]: 3.184e-05 [py_interpret_to_execute_after_opt_a]: 7.26001e-06 [slice_cell_reuse_recomputed_activation]: 1.74e-06 [rewriter_after_opt_a]: 3.079e-05 [convert_after_rewriter]: 6.89001e-06 [order_py_execute_after_rewriter]: 4.97999e-06 [mutable_eliminate]: 0.00044439 [opt_b]: 0.00017917, [1] [Cycle 1]: 0.00017314, 
[7] [b_1]: 0.00010628 [b_2]: 6.83e-06 [updatestate_depend_eliminate]: 5.05001e-06 [updatestate_assign_eliminate]: 2.45002e-06 [updatestate_loads_eliminate]: 2.37999e-06 [renormalize]: 5.8001e-07 [cse]: 1.616e-05 [optimize_parallel_all_gather_comm]: 1.474e-05 [overlap_param_gather]: 1.71e-06 [cconv]: 2.107e-05 [loop_unroll]: 0.00041293 [opt_after_cconv]: 9.58e-05, [1] [Cycle 1]: 9.022e-05, [7] [c_1]: 2.783e-05 [parameter_eliminate]: 2.27999e-06 [updatestate_depend_eliminate]: 5.17999e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.17001e-06 [cse]: 1.715e-05 [renormalize]: 4.70027e-07 [remove_dup_value]: 1.274e-05 [tuple_transform]: 6.825e-05, [1] [Cycle 1]: 6.41e-05, [4] [d_1]: 3.859e-05 [none_parameter_eliminate]: 1.49e-06 [renormalize]: 1.90019e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.298e-05 [cse_after_recomputation]: 1.962e-05, [1] [Cycle 1]: 1.536e-05, [1] [cse]: 1.046e-05 [environ_conv]: 4.35e-06 [swap_dp_allreduce_reducescatter]: 5.24e-06 [bias_add_comm_swap]: 2.64001e-06 [label_micro_interleaved_index]: 4.24002e-06 [label_fine_grained_interleaved_index]: 2.68e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.39001e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.22e-06 [ForceFp32Comm]: 9.29984e-07 [remove_cast_before_assign_add]: 1.40999e-06 [full_micro_interleaved_order_control]: 2.29001e-06 [reorder_send_recv_between_fp_bp]: 2.73e-06 [comm_op_add_attrs]: 1.22999e-06 [add_comm_op_reuse_tag]: 1.24998e-06 [interleave_split_concat_branches]: 1.09998e-06 [interleave_parallel_branches]: 1.04e-06 [overlap_opt_shard_in_pipeline]: 1.02e-06 [overlap_opt_shard_grad_in_pipeline]: 1.81e-06 [control_data_broadcast_order]: 1.147e-05 [grouped_pairwise_exchange_alltoall]: 1.57001e-06 [offloading_packed_experts]: 3.65e-06 [overlap_recompute_and_grad_model_parallel]: 4.54998e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.30001e-06 [overlap_recompute_comm]: 2.09999e-06 [overlap_grad_ring_attention]: 4.28999e-06 [overlap_grad_flash_sp]: 1.661e-05 [begin_end_overlap_inline]: 4.89992e-07 [split_matmul_comm_elemetwise]: 1.99e-06 [split_layernorm_comm]: 1.96e-06 [handle_group_info]: 1.37999e-06 [symbol_engine_optimizer]: 6.687e-05, [1] [Cycle 1]: 6.292e-05, [6] [build]: 2.31e-06 [elim_shapecalc]: 8.36002e-06 [elim_not_effective]: 1.127e-05 [opt_reshape]: 5.93002e-06 [fold_const_symbol]: 8.53001e-06 [renormalize]: 2.10013e-07 [detach_backward]: 1.54998e-06 [pipeline_parallel_scheduler]: 1.54e-06 [auto_monad_reorder]: 1.568e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 4.13001e-06 [opt_after_jit_grad]: 0.0004473 [validate]: 3.044e-05 [backend_pass]: 8.79983e-07 [task_emit]: 0.0583413 [execute]: 8.33001e-06 Sums bootstrap : 0.000476s : 0.71% type_inference : 0.004380s : 6.57% event_method : 0.000011s : 0.02% auto_monad : 0.000051s : 0.08% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.01% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000020s : 0.03% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.02% optimize.rewriter_before_opt_a : 0.000039s : 0.06% optimize.opt_a.expand_dump_flag : 0.000004s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.05% optimize.opt_a.loop_unroll : 0.000019s : 0.03% optimize.opt_a.a_1 : 0.000411s : 0.62% optimize.opt_a.with_stream_mark : 0.000024s : 0.04% optimize.opt_a.recompute_prepare : 0.000013s : 0.02% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.01% 
optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.00% optimize.opt_a.a_2 : 0.000142s : 0.21% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000011s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000022s : 0.03% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000007s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.02% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.02% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000010s : 0.02% optimize.opt_a.virtual_output : 0.000010s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.02% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.03% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000004s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.03% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000340s : 0.51% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.03% optimize.opt_a.cse : 0.000040s : 0.06% optimize.opt_a.a_3 : 0.000072s : 0.11% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000031s : 0.05% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000444s : 0.67% optimize.opt_b.b_1 : 0.000106s : 0.16% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000001s : 0.00% optimize.opt_b.cse : 0.000016s : 0.02% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.02% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000021s : 0.03% optimize.loop_unroll : 0.000413s : 0.62% optimize.opt_after_cconv.c_1 : 0.000028s : 0.04% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000017s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.02% optimize.tuple_transform.d_1 : 0.000039s : 0.06% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.06% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000004s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000017s : 0.02% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.01% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.01% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000002s : 0.00% auto_monad_reorder : 0.000016s : 0.02% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000004s : 0.01% opt_after_jit_grad : 0.000447s : 0.67% validate : 0.000030s : 0.05% backend_pass : 0.000001s : 0.00% task_emit : 0.058341s : 87.49% execute : 0.000008s : 0.01% Time group info: ------[substitution.] 0.000117 26 18.76% : 0.000022s : 4: substitution.arithmetic_simplify 1.50% : 0.000002s : 2: substitution.elim_not_effective 1.06% : 0.000001s : 2: substitution.fold_const_symbol 4.32% : 0.000005s : 4: substitution.graph_param_transform 65.06% : 0.000076s : 2: substitution.inline 2.43% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.61% : 0.000004s : 4: substitution.remove_not_recompute_node 3.26% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004341 2 91.79% : 0.003984s : 1: type_inference.infer 8.21% : 0.000357s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000074 2 100.00% : 0.000074s : 2: match.inline ------[predicate.] 
0.000136 984 0.85% : 0.000001s : 9: predicate.accumulaten_eliminater 1.01% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.78% : 0.000001s : 9: predicate.addn_zero_filter 0.74% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.39% : 0.000003s : 17: predicate.arithmetic_simplify 0.78% : 0.000001s : 9: predicate.cast_eliminate 0.84% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.29% : 0.000000s : 4: predicate.const_output_eliminate 0.72% : 0.000001s : 8: predicate.depend_value_elim 0.79% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.85% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.82% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.48% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.07% : 0.000001s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.98% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.99% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.03% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.24% : 0.000000s : 4: predicate.fold_const_symbol 0.96% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.77% : 0.000001s : 8: predicate.incorporate_call 0.63% : 0.000001s : 8: predicate.incorporate_call_switch 6.04% : 0.000008s : 44: predicate.inline 0.98% : 0.000001s : 8: predicate.inline_without_move 0.48% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.11% : 0.000002s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.23% : 0.000003s : 26: predicate.load_eliminater 1.28% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.74% : 0.000002s : 18: predicate.loop_unroll_before_grad 1.75% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.72% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.71% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.75% : 0.000001s : 9: predicate.minmaximum_grad 1.49% : 0.000002s : 4: predicate.mutable_eliminate 0.40% : 0.000001s : 4: predicate.opt_reshape 0.45% : 0.000001s : 4: predicate.parallel_virtual_node 1.22% : 0.000002s : 11: predicate.partial_defer_inline 1.24% : 0.000002s : 13: predicate.partial_eliminate 0.78% : 0.000001s : 9: predicate.print_const_string_wrapper 0.71% : 0.000001s : 8: predicate.reduce_all_const_elim 0.98% : 0.000001s : 9: predicate.reduce_eliminate 2.19% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.78% : 0.000001s : 8: predicate.remove_not_recompute_node 1.38% : 0.000002s : 17: predicate.replace_applicator 0.82% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000000s : 4: predicate.reset_defer_inline 0.90% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.46% : 0.000001s : 4: predicate.row_tensor_eliminate 0.90% : 0.000001s : 8: predicate.same_eliminate 0.62% : 0.000001s : 8: predicate.set_cell_output_no_recompute 1.03% : 0.000001s : 8: predicate.shard_identity_eliminate 0.80% : 0.000001s : 8: predicate.special_op_eliminate 0.97% : 0.000001s : 8: predicate.specialize_transform 1.13% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.02% : 0.000001s : 11: predicate.switch_defer_inline 1.77% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.33% : 0.000006s : 41: predicate.switch_simplify 0.77% : 
0.000001s : 9: predicate.tile_eliminate 0.81% : 0.000001s : 9: predicate.transpose_eliminate 1.60% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.36% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.25% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.45% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.51% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.11% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.16% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.88% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.88% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000000s : 4: predicate.virtual_view_grad_eliminate 0.57% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000274 6 46.60% : 0.000128s : 2: func_graph_cloner_run.FuncGraphClonerGraph 53.40% : 0.000146s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.078519 196 0.00% : 0.000004s : 1: ForceFp32Comm 3.83% : 0.003004s : 1: add_attr 3.81% : 0.002995s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.06% : 0.000047s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.07% : 0.000056s : 1: auto_monad 0.03% : 0.000020s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.65% : 0.000511s : 1: bootstrap 0.03% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000015s : 1: control_data_broadcast_order 0.01% : 0.000010s : 1: convert_after_rewriter 0.03% : 0.000022s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000007s : 1: environ_conv 0.02% : 0.000016s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000006s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.54% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.58% : 0.000453s : 1: mutable_eliminate 0.01% : 0.000007s : 1: offloading_packed_experts 0.02% : 0.000013s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000013s : 1: opt.transform.mutable_eliminate 0.97% : 0.000759s : 78: opt.transform.opt_a 0.03% : 0.000027s : 1: opt.transform.opt_after_cconv 0.03% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.11% : 0.000089s : 28: opt.transform.opt_b 0.05% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.04% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.34% : 0.001841s : 1: opt_a 0.13% : 0.000099s : 1: opt_after_cconv 0.58% : 0.000457s : 1: opt_after_jit_grad 0.23% : 0.000182s : 1: opt_b 4.63% : 0.003634s : 1: optimize 0.02% : 0.000018s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000004s : 1: pipeline_split 0.03% : 0.000024s : 1: pre_auto_parallel 0.02% : 0.000018s : 1: py_interpret_to_execute 0.01% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.02% : 0.000016s : 1: remove_dup_value 0.24% : 0.000185s : 1: renormalize.infer 0.19% : 0.000148s : 1: renormalize.specialize 0.01% : 0.000006s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.04% : 0.000035s : 1: rewriter_after_opt_a 0.06% : 0.000043s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.09% : 0.000070s : 1: symbol_engine_optimizer 74.32% : 0.058358s : 1: task_emit 0.09% : 0.000071s : 1: 
tuple_transform 5.60% : 0.004394s : 1: type_inference 0.07% : 0.000052s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x8-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x8-ge],max_mem:68.0M . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x9-pynative] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x9-pynative],max_mem:68.0M TotalTime = 0.0209748, [24] [bootstrap]: 0.00055335 [type_inference]: 0.00609074 [event_method]: 1.467e-05 [auto_monad]: 5.522e-05 [graph_reusing]: 5.37001e-06 [inline]: 1.81e-06 [add_attr]: 0.00335084, [1] [add_attr_with_inline]: 0.00333999, [1] [Cycle 1]: 4.425e-05, [2] [tag_attr]: 1.542e-05 [meta_addattr_fg_expand]: 4.15e-06 [parallel-infer-symbol]: 2.72001e-06 [pre_auto_parallel]: 2.756e-05 [insert-virtual-dataset]: 2.27999e-06 [parallel-infer-symbol-second]: 7.7e-07 [dataset_repeat_opt]: 1.93002e-06 [pipeline_split]: 1.50999e-06 [optimize]: 0.00400019, [53] [py_interpret_to_execute]: 2.104e-05 [rewriter_before_opt_a]: 5.647e-05 [opt_a]: 0.00216368, [2] [Cycle 1]: 0.00155724, [45] [expand_dump_flag]: 2.93e-06 [switch_simplify]: 3.153e-05 [loop_unroll]: 2.1e-05 [a_1]: 0.0004535 [with_stream_mark]: 1.328e-05 [recompute_prepare]: 7.95e-06 [updatestate_depend_eliminate]: 3.48e-06 [updatestate_assign_eliminate]: 3.20998e-06 [updatestate_loads_eliminate]: 3.06999e-06 [parameter_eliminate]: 1.52999e-06 [a_2]: 7.552e-05 [accelerated_algorithm]: 6.41e-06 [shard]: 1.94e-06 [meta_shard_fg_expand]: 1.55999e-06 [shard_inline]: 5.99999e-06 [merge_send_recv]: 7.53999e-06 [auto_parallel]: 5.97999e-06 [parallel]: 2.266e-05 [flash_sp]: 6.88e-06 [merge_comm]: 3.53e-06 [allreduce_fusion]: 3.22002e-06 [matmul_add_comm_reduction]: 9.27001e-06 [allreduce_slice_to_reducescatter]: 6.79982e-07 [virtual_shard_identity]: 6.83e-06 [virtual_dataset]: 5.86e-06 
[get_grad_eliminate_]: 5.44998e-06 [virtual_output]: 5.60001e-06 [merge_forward]: 3.49001e-06 [cell_reuse_recompute_pass]: 1.02e-06 [offload_activation]: 9.49e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.046e-05 [merge_recompute_call_nodes]: 1.31002e-06 [before_grad]: 9.19e-06 [set_forward_comm_id_for_comm_node_pass]: 3.41999e-06 [meta_fg_expand]: 2.12001e-06 [flash_sp_send_recv_attached]: 2.84999e-06 [receive_attached]: 2.21e-06 [after_resolve]: 9.76e-06 [a_after_grad]: 8.69e-06 [renormalize]: 0.00041565 [add_forward_monad_depend]: 4.55001e-06 [auto_monad_grad]: 1.62999e-06 [auto_monad_eliminator]: 1.27e-05 [cse]: 7.475e-05 [a_3]: 4.112e-05 [Cycle 2]: 0.00059682, [45] [expand_dump_flag]: 1.00001e-06 [switch_simplify]: 6.86999e-06 [loop_unroll]: 5.40999e-06 [a_1]: 0.00012509 [with_stream_mark]: 1.025e-05 [recompute_prepare]: 5.73002e-06 [updatestate_depend_eliminate]: 2.78e-06 [updatestate_assign_eliminate]: 2.13998e-06 [updatestate_loads_eliminate]: 2.21e-06 [parameter_eliminate]: 1.05999e-06 [a_2]: 6.693e-05 [accelerated_algorithm]: 5.34998e-06 [shard]: 1.02e-06 [meta_shard_fg_expand]: 1.07998e-06 [shard_inline]: 7.26001e-06 [merge_send_recv]: 4.2e-06 [auto_parallel]: 5.37001e-06 [parallel]: 4.22e-06 [flash_sp]: 3.31001e-06 [merge_comm]: 3.14001e-06 [allreduce_fusion]: 2.69001e-06 [matmul_add_comm_reduction]: 4.84e-06 [allreduce_slice_to_reducescatter]: 3.19997e-07 [virtual_shard_identity]: 7.06001e-06 [virtual_dataset]: 5.35001e-06 [get_grad_eliminate_]: 5.55001e-06 [virtual_output]: 5.21002e-06 [merge_forward]: 2.84999e-06 [cell_reuse_recompute_pass]: 1.24998e-06 [offload_activation]: 6.06e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.72001e-06 [merge_recompute_call_nodes]: 7.00005e-07 [before_grad]: 7.9e-06 [set_forward_comm_id_for_comm_node_pass]: 3.23e-06 [meta_fg_expand]: 1.60999e-06 [flash_sp_send_recv_attached]: 7.7e-07 [receive_attached]: 1.22999e-06 [after_resolve]: 8.87e-06 [a_after_grad]: 8.31002e-06 [renormalize]: 7.99773e-08 
[add_forward_monad_depend]: 1.20001e-06 [auto_monad_grad]: 8.49977e-07 [auto_monad_eliminator]: 6.54001e-06 [cse]: 1.373e-05 [a_3]: 3.185e-05 [py_interpret_to_execute_after_opt_a]: 7.37002e-06 [slice_cell_reuse_recomputed_activation]: 1.89e-06 [rewriter_after_opt_a]: 3.106e-05 [convert_after_rewriter]: 7.18e-06 [order_py_execute_after_rewriter]: 5.52999e-06 [mutable_eliminate]: 0.00044746 [opt_b]: 0.00018228, [1] [Cycle 1]: 0.00017635, [7] [b_1]: 0.00010778 [b_2]: 7.41999e-06 [updatestate_depend_eliminate]: 4.99998e-06 [updatestate_assign_eliminate]: 2.29999e-06 [updatestate_loads_eliminate]: 2.26e-06 [renormalize]: 4.80009e-07 [cse]: 1.695e-05 [optimize_parallel_all_gather_comm]: 1.572e-05 [overlap_param_gather]: 1.81e-06 [cconv]: 2.183e-05 [loop_unroll]: 0.00041204 [opt_after_cconv]: 9.514e-05, [1] [Cycle 1]: 8.951e-05, [7] [c_1]: 2.774e-05 [parameter_eliminate]: 2.41e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.61999e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.703e-05 [renormalize]: 3.4002e-07 [remove_dup_value]: 1.215e-05 [tuple_transform]: 6.853e-05, [1] [Cycle 1]: 6.418e-05, [4] [d_1]: 3.833e-05 [none_parameter_eliminate]: 1.78002e-06 [renormalize]: 1.39989e-07 [switch_simplify]: 6.11998e-06 [partial_unused_args_eliminate]: 1.57001e-06 [add_recomputation]: 4.869e-05 [cse_after_recomputation]: 2.113e-05, [1] [Cycle 1]: 1.667e-05, [1] [cse]: 1.144e-05 [environ_conv]: 4.97999e-06 [swap_dp_allreduce_reducescatter]: 5.41002e-06 [bias_add_comm_swap]: 2.39001e-06 [label_micro_interleaved_index]: 4.51002e-06 [label_fine_grained_interleaved_index]: 2.68998e-06 [merge_cast_opt]: 1.20999e-06 [slice_recompute_activation]: 2.37001e-06 [micro_interleaved_order_control]: 2.56e-06 [assign_add_opt]: 1.21002e-06 [ForceFp32Comm]: 7.00005e-07 [remove_cast_before_assign_add]: 1.07998e-06 [full_micro_interleaved_order_control]: 2.45002e-06 [reorder_send_recv_between_fp_bp]: 2.48e-06 [comm_op_add_attrs]: 9.80013e-07 
[add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.40001e-06 [interleave_parallel_branches]: 1.02e-06 [overlap_opt_shard_in_pipeline]: 1.12999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.57999e-06 [control_data_broadcast_order]: 1.215e-05 [grouped_pairwise_exchange_alltoall]: 1.80001e-06 [offloading_packed_experts]: 3.83001e-06 [overlap_recompute_and_grad_model_parallel]: 4.59002e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.43002e-06 [overlap_recompute_allgather_and_fa_grad]: 1.34e-06 [overlap_recompute_comm]: 2.12999e-06 [overlap_grad_ring_attention]: 3.88001e-06 [overlap_grad_flash_sp]: 1.757e-05 [begin_end_overlap_inline]: 4.80009e-07 [split_matmul_comm_elemetwise]: 2.07001e-06 [split_layernorm_comm]: 1.94e-06 [handle_group_info]: 9.50007e-07 [symbol_engine_optimizer]: 6.834e-05, [1] [Cycle 1]: 6.407e-05, [6] [build]: 2.37001e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.111e-05 [opt_reshape]: 6.33002e-06 [fold_const_symbol]: 9.02999e-06 [renormalize]: 1.8999e-07 [detach_backward]: 2.05002e-06 [pipeline_parallel_scheduler]: 1.40001e-06 [auto_monad_reorder]: 1.534e-05 [get_jit_bprop_graph]: 1.00001e-06 [rewriter_after_jit_bprop_graph]: 3.4e-06 [opt_after_jit_grad]: 0.00044912 [validate]: 3.159e-05 [backend_pass]: 9.00007e-07 [task_emit]: 0.0061558 [execute]: 6.83e-06 Sums bootstrap : 0.000553s : 3.32% type_inference : 0.006091s : 36.56% event_method : 0.000015s : 0.09% auto_monad : 0.000055s : 0.33% graph_reusing : 0.000005s : 0.03% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000015s : 0.09% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000028s : 0.17% insert-virtual-dataset : 0.000002s : 0.01% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000021s : 0.13% optimize.rewriter_before_opt_a : 0.000056s : 
0.34% optimize.opt_a.expand_dump_flag : 0.000004s : 0.02% optimize.opt_a.switch_simplify : 0.000038s : 0.23% optimize.opt_a.loop_unroll : 0.000026s : 0.16% optimize.opt_a.a_1 : 0.000579s : 3.47% optimize.opt_a.with_stream_mark : 0.000024s : 0.14% optimize.opt_a.recompute_prepare : 0.000014s : 0.08% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.04% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.03% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.03% optimize.opt_a.parameter_eliminate : 0.000003s : 0.02% optimize.opt_a.a_2 : 0.000142s : 0.86% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.07% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000013s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.07% optimize.opt_a.auto_parallel : 0.000011s : 0.07% optimize.opt_a.parallel : 0.000027s : 0.16% optimize.opt_a.flash_sp : 0.000010s : 0.06% optimize.opt_a.merge_comm : 0.000007s : 0.04% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.08% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.08% optimize.opt_a.virtual_dataset : 0.000011s : 0.07% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.07% optimize.opt_a.virtual_output : 0.000011s : 0.06% optimize.opt_a.merge_forward : 0.000006s : 0.04% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.01% optimize.opt_a.offload_activation : 0.000016s : 0.09% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.12% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.10% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.04% optimize.opt_a.meta_fg_expand : 0.000004s : 0.02% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 
0.02% optimize.opt_a.after_resolve : 0.000019s : 0.11% optimize.opt_a.a_after_grad : 0.000017s : 0.10% optimize.opt_a.renormalize : 0.000416s : 2.50% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.03% optimize.opt_a.auto_monad_grad : 0.000002s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000019s : 0.12% optimize.opt_a.cse : 0.000088s : 0.53% optimize.opt_a.a_3 : 0.000073s : 0.44% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.04% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000031s : 0.19% optimize.convert_after_rewriter : 0.000007s : 0.04% optimize.order_py_execute_after_rewriter : 0.000006s : 0.03% optimize.mutable_eliminate : 0.000447s : 2.69% optimize.opt_b.b_1 : 0.000108s : 0.65% optimize.opt_b.b_2 : 0.000007s : 0.04% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.01% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.10% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.09% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.13% optimize.loop_unroll : 0.000412s : 2.47% optimize.opt_after_cconv.c_1 : 0.000028s : 0.17% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000017s : 0.10% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000012s : 0.07% optimize.tuple_transform.d_1 : 0.000038s : 0.23% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s 
: 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000049s : 0.29% optimize.cse_after_recomputation.cse : 0.000011s : 0.07% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.01% optimize.label_micro_interleaved_index : 0.000005s : 0.03% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000002s : 0.01% optimize.micro_interleaved_order_control : 0.000003s : 0.02% optimize.assign_add_opt : 0.000001s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000012s : 0.07% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.02% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.01% optimize.overlap_grad_ring_attention : 0.000004s : 0.02% optimize.overlap_grad_flash_sp : 0.000018s : 0.11% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% 
optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.05% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.07% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.05% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.01% pipeline_parallel_scheduler : 0.000001s : 0.01% auto_monad_reorder : 0.000015s : 0.09% get_jit_bprop_graph : 0.000001s : 0.01% rewriter_after_jit_bprop_graph : 0.000003s : 0.02% opt_after_jit_grad : 0.000449s : 2.70% validate : 0.000032s : 0.19% backend_pass : 0.000001s : 0.01% task_emit : 0.006156s : 36.95% execute : 0.000007s : 0.04% Time group info: ------[substitution.] 0.000163 30 14.94% : 0.000024s : 5: substitution.arithmetic_simplify 1.09% : 0.000002s : 2: substitution.elim_not_effective 0.77% : 0.000001s : 2: substitution.fold_const_symbol 3.22% : 0.000005s : 4: substitution.graph_param_transform 66.59% : 0.000109s : 3: substitution.inline 1.69% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.65% : 0.000004s : 4: substitution.remove_not_recompute_node 2.51% : 0.000004s : 4: substitution.replace_old_param 6.54% : 0.000011s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.006049 2 90.46% : 0.005472s : 1: type_inference.infer 9.54% : 0.000577s : 1: type_inference.specialize ------[replace.] 0.000038 5 69.52% : 0.000027s : 3: replace.inline 30.48% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000116 5 91.71% : 0.000107s : 3: match.inline 8.29% : 0.000010s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000158 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.78% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.85% : 0.000001s : 11: predicate.addn_zero_filter 0.78% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.13% : 0.000003s : 19: predicate.arithmetic_simplify 0.96% : 0.000002s : 11: predicate.cast_eliminate 0.71% : 0.000001s : 8: predicate.check_bprop_eliminate 0.60% : 0.000001s : 8: predicate.compare_switch_simplify 0.27% : 0.000000s : 4: predicate.const_output_eliminate 0.61% : 0.000001s : 8: predicate.depend_value_elim 0.89% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.15% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.40% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.05% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_depend_swap 1.82% : 0.000003s : 23: predicate.environ_get_eliminate 1.15% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.26% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.37% : 0.000004s : 16: predicate.float_depend_g_call 0.58% : 0.000001s : 8: predicate.float_environ_get_switch 0.89% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 0.77% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.66% : 0.000001s : 8: predicate.incorporate_call 0.56% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000009s : 51: predicate.inline 0.90% : 0.000001s : 8: predicate.inline_without_move 0.42% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.88% : 0.000001s : 8: predicate.less_batch_normalization 1.67% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.39% : 0.000004s : 32: predicate.load_eliminater 1.02% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.25% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.71% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.59% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.65% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.80% : 0.000001s : 11: predicate.minmaximum_grad 1.17% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.39% : 0.000001s : 4: predicate.parallel_virtual_node 1.63% : 0.000003s : 16: predicate.partial_defer_inline 1.52% : 0.000002s : 17: predicate.partial_eliminate 0.84% : 0.000001s : 11: predicate.print_const_string_wrapper 0.60% : 0.000001s : 8: predicate.reduce_all_const_elim 1.03% : 0.000002s : 11: predicate.reduce_eliminate 2.39% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.49% : 0.000001s : 8: predicate.remove_not_recompute_node 1.49% : 0.000002s : 21: predicate.replace_applicator 0.64% : 0.000001s : 8: predicate.replace_old_param 0.32% : 0.000001s : 4: predicate.reset_defer_inline 0.87% : 0.000001s : 11: predicate.reshape_eliminate 0.73% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.38% : 0.000001s : 4: predicate.row_tensor_eliminate 0.84% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.82% : 0.000001s : 8: predicate.shard_identity_eliminate 0.75% : 0.000001s : 8: predicate.special_op_eliminate 0.81% : 0.000001s : 8: predicate.specialize_transform 1.04% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.84% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.35% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.09% : 0.000008s : 54: predicate.switch_simplify 0.82% : 
0.000001s : 11: predicate.tile_eliminate 0.85% : 0.000001s : 11: predicate.transpose_eliminate 1.53% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.53% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.41% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.44% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.18% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.61% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.56% : 0.000006s : 40: predicate.updatestate_useless_node_eliminater 0.39% : 0.000001s : 4: predicate.value_based_eliminate 0.75% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.73% : 0.000001s : 8: predicate.virtual_output_eliminate 0.34% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000370 8 46.46% : 0.000172s : 3: func_graph_cloner_run.FuncGraphClonerGraph 53.54% : 0.000198s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.029840 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.24% : 0.003355s : 1: add_attr 11.21% : 0.003344s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000053s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.20% : 0.000060s : 1: auto_monad 0.06% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.98% : 0.000592s : 1: bootstrap 0.08% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.05% : 0.000015s : 1: control_data_broadcast_order 0.03% : 0.000010s : 1: convert_after_rewriter 0.08% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.07% : 0.000020s : 1: event_method 0.04% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000009s : 1: graph_reusing 0.02% : 0.000005s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.02% : 0.000007s : 1: label_micro_interleaved_index 1.41% : 0.000421s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.53% : 0.000457s : 1: mutable_eliminate 0.02% : 0.000007s : 1: offloading_packed_experts 0.04% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.04% : 0.000013s : 1: opt.transform.mutable_eliminate 3.16% : 0.000944s : 78: opt.transform.opt_a 0.09% : 0.000027s : 1: opt.transform.opt_after_cconv 0.07% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.30% : 0.000090s : 28: opt.transform.opt_b 0.14% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.11% : 0.000032s : 4: opt.transform.symbol_engine_opt 7.26% : 0.002167s : 1: opt_a 0.33% : 0.000098s : 1: opt_after_cconv 1.54% : 0.000458s : 1: opt_after_jit_grad 0.62% : 0.000186s : 1: opt_b 13.42% : 0.004004s : 1: optimize 0.06% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.07% : 0.000021s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.02% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.02% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000004s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000005s : 1: pipeline_split 0.11% : 0.000032s : 1: pre_auto_parallel 0.09% : 0.000025s : 1: py_interpret_to_execute 0.04% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.05% : 0.000015s : 1: remove_dup_value 0.70% : 0.000208s : 1: renormalize.infer 0.67% : 0.000201s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.02% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.12% : 0.000035s : 1: rewriter_after_opt_a 0.20% : 0.000061s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.24% : 0.000071s : 1: symbol_engine_optimizer 20.66% : 0.006166s : 1: task_emit 0.24% : 0.000071s : 1: 
tuple_transform 20.46% : 0.006104s : 1: type_inference 0.21% : 0.000062s : 1: validate TotalTime = 0.0180262, [24] [bootstrap]: 0.00046144 [type_inference]: 0.0043339 [event_method]: 1.07e-05 [auto_monad]: 4.96e-05 [graph_reusing]: 5.01002e-06 [inline]: 1.67999e-06 [add_attr]: 0.0029236, [1] [add_attr_with_inline]: 0.00291578, [1] [Cycle 1]: 4.408e-05, [2] [tag_attr]: 1.164e-05 [meta_addattr_fg_expand]: 3.11001e-06 [parallel-infer-symbol]: 2.65002e-06 [pre_auto_parallel]: 2.103e-05 [insert-virtual-dataset]: 2.76999e-06 [parallel-infer-symbol-second]: 6.59988e-07 [dataset_repeat_opt]: 1.89e-06 [pipeline_split]: 1.57999e-06 [optimize]: 0.00367563, [53] [py_interpret_to_execute]: 5.211e-05 [rewriter_before_opt_a]: 3.989e-05 [opt_a]: 0.00184251, [2] [Cycle 1]: 0.00123757, [45] [expand_dump_flag]: 2.86e-06 [switch_simplify]: 2.451e-05 [loop_unroll]: 1.375e-05 [a_1]: 0.00028962 [with_stream_mark]: 1.395e-05 [recompute_prepare]: 7.68001e-06 [updatestate_depend_eliminate]: 3.56001e-06 [updatestate_assign_eliminate]: 3.2e-06 [updatestate_loads_eliminate]: 3.20998e-06 [parameter_eliminate]: 1.55999e-06 [a_2]: 7.657e-05 [accelerated_algorithm]: 6.27001e-06 [shard]: 2.36e-06 [meta_shard_fg_expand]: 1.40999e-06 [shard_inline]: 5.83997e-06 [merge_send_recv]: 7.71999e-06 [auto_parallel]: 5.90002e-06 [parallel]: 1.744e-05 [flash_sp]: 7.31001e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.31999e-06 [matmul_add_comm_reduction]: 8.59e-06 [allreduce_slice_to_reducescatter]: 6.19999e-07 [virtual_shard_identity]: 7.08e-06 [virtual_dataset]: 6.18998e-06 [get_grad_eliminate_]: 5.60001e-06 [virtual_output]: 5.96003e-06 [merge_forward]: 3.68e-06 [cell_reuse_recompute_pass]: 1.10001e-06 [offload_activation]: 9.56e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.089e-05 [merge_recompute_call_nodes]: 1.35999e-06 [before_grad]: 9.10999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.45e-06 [meta_fg_expand]: 2.41e-06 [flash_sp_send_recv_attached]: 2.68003e-06 [receive_attached]: 2.21e-06 
[after_resolve]: 1.034e-05 [a_after_grad]: 8.72998e-06 [renormalize]: 0.00033092 [add_forward_monad_depend]: 4.77e-06 [auto_monad_grad]: 1.66002e-06 [auto_monad_eliminator]: 1.318e-05 [cse]: 2.683e-05 [a_3]: 3.973e-05 [Cycle 2]: 0.0005956, [45] [expand_dump_flag]: 1.01002e-06 [switch_simplify]: 7.01999e-06 [loop_unroll]: 5.42999e-06 [a_1]: 0.00012519 [with_stream_mark]: 1.066e-05 [recompute_prepare]: 5.85002e-06 [updatestate_depend_eliminate]: 2.86999e-06 [updatestate_assign_eliminate]: 2.24001e-06 [updatestate_loads_eliminate]: 2.62001e-06 [parameter_eliminate]: 8.50006e-07 [a_2]: 6.753e-05 [accelerated_algorithm]: 5.59998e-06 [shard]: 1.03001e-06 [meta_shard_fg_expand]: 1.10001e-06 [shard_inline]: 5.66e-06 [merge_send_recv]: 4.26001e-06 [auto_parallel]: 5.30001e-06 [parallel]: 4.12998e-06 [flash_sp]: 3.5e-06 [merge_comm]: 3.20998e-06 [allreduce_fusion]: 2.83998e-06 [matmul_add_comm_reduction]: 5.13002e-06 [allreduce_slice_to_reducescatter]: 3.69997e-07 [virtual_shard_identity]: 6.63e-06 [virtual_dataset]: 5.40001e-06 [get_grad_eliminate_]: 5.15001e-06 [virtual_output]: 5.17e-06 [merge_forward]: 2.88e-06 [cell_reuse_recompute_pass]: 1.29e-06 [offload_activation]: 5.91e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.64e-06 [merge_recompute_call_nodes]: 6.80011e-07 [before_grad]: 8.38001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.11001e-06 [meta_fg_expand]: 1.67001e-06 [flash_sp_send_recv_attached]: 8.2e-07 [receive_attached]: 9.80013e-07 [after_resolve]: 9.67999e-06 [a_after_grad]: 8.51002e-06 [renormalize]: 8.9989e-08 [add_forward_monad_depend]: 1.21997e-06 [auto_monad_grad]: 7.50006e-07 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.205e-05 [a_3]: 3.218e-05 [py_interpret_to_execute_after_opt_a]: 7.13e-06 [slice_cell_reuse_recomputed_activation]: 1.96003e-06 [rewriter_after_opt_a]: 3.01e-05 [convert_after_rewriter]: 6.53e-06 [order_py_execute_after_rewriter]: 4.80001e-06 [mutable_eliminate]: 0.00044771 [opt_b]: 0.00018011, [1] [Cycle 1]: 0.00017416, [7] [b_1]: 
0.00010754 [b_2]: 7.25e-06 [updatestate_depend_eliminate]: 5.10001e-06 [updatestate_assign_eliminate]: 2.37999e-06 [updatestate_loads_eliminate]: 2.22999e-06 [renormalize]: 3.29979e-07 [cse]: 1.611e-05 [optimize_parallel_all_gather_comm]: 1.548e-05 [overlap_param_gather]: 1.80001e-06 [cconv]: 2.194e-05 [loop_unroll]: 0.00040855 [opt_after_cconv]: 9.481e-05, [1] [Cycle 1]: 8.898e-05, [7] [c_1]: 2.809e-05 [parameter_eliminate]: 2.04e-06 [updatestate_depend_eliminate]: 4.80999e-06 [updatestate_assign_eliminate]: 2.42001e-06 [updatestate_loads_eliminate]: 2.02001e-06 [cse]: 1.634e-05 [renormalize]: 2.50002e-07 [remove_dup_value]: 1.315e-05 [tuple_transform]: 6.816e-05, [1] [Cycle 1]: 6.387e-05, [4] [d_1]: 3.847e-05 [none_parameter_eliminate]: 1.55001e-06 [renormalize]: 1.69995e-07 [switch_simplify]: 6.28002e-06 [partial_unused_args_eliminate]: 1.62001e-06 [add_recomputation]: 4.305e-05 [cse_after_recomputation]: 2.128e-05, [1] [Cycle 1]: 1.695e-05, [1] [cse]: 1.162e-05 [environ_conv]: 4.56002e-06 [swap_dp_allreduce_reducescatter]: 4.82998e-06 [bias_add_comm_swap]: 2.16e-06 [label_micro_interleaved_index]: 4.22e-06 [label_fine_grained_interleaved_index]: 3.08998e-06 [merge_cast_opt]: 1.21002e-06 [slice_recompute_activation]: 2.58e-06 [micro_interleaved_order_control]: 2.12001e-06 [assign_add_opt]: 1.64e-06 [ForceFp32Comm]: 8.59989e-07 [remove_cast_before_assign_add]: 1.00001e-06 [full_micro_interleaved_order_control]: 2.07001e-06 [reorder_send_recv_between_fp_bp]: 2.51e-06 [comm_op_add_attrs]: 9.5999e-07 [add_comm_op_reuse_tag]: 9.30013e-07 [interleave_split_concat_branches]: 1.14e-06 [interleave_parallel_branches]: 1.05001e-06 [overlap_opt_shard_in_pipeline]: 1.03001e-06 [overlap_opt_shard_grad_in_pipeline]: 1.62999e-06 [control_data_broadcast_order]: 1.139e-05 [grouped_pairwise_exchange_alltoall]: 1.68002e-06 [offloading_packed_experts]: 3.61999e-06 [overlap_recompute_and_grad_model_parallel]: 4.43999e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.13001e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32e-06 [overlap_recompute_comm]: 2.21998e-06 [overlap_grad_ring_attention]: 3.97002e-06 [overlap_grad_flash_sp]: 1.716e-05 [begin_end_overlap_inline]: 4.60015e-07 [split_matmul_comm_elemetwise]: 2.04e-06 [split_layernorm_comm]: 1.79e-06 [handle_group_info]: 1.15999e-06 [symbol_engine_optimizer]: 6.779e-05, [1] [Cycle 1]: 6.371e-05, [6] [build]: 2.07999e-06 [elim_shapecalc]: 8.38999e-06 [elim_not_effective]: 1.13e-05 [opt_reshape]: 6.24999e-06 [fold_const_symbol]: 8.61002e-06 [renormalize]: 1.69995e-07 [detach_backward]: 1.64998e-06 [pipeline_parallel_scheduler]: 1.44998e-06 [auto_monad_reorder]: 1.527e-05 [get_jit_bprop_graph]: 1.02998e-06 [rewriter_after_jit_bprop_graph]: 3.58e-06 [opt_after_jit_grad]: 0.00045985 [validate]: 2.937e-05 [backend_pass]: 1.00001e-06 [task_emit]: 0.00582153 [execute]: 6.89999e-06 Sums bootstrap : 0.000461s : 3.26% type_inference : 0.004334s : 30.63% event_method : 0.000011s : 0.08% auto_monad : 0.000050s : 0.35% graph_reusing : 0.000005s : 0.04% inline : 0.000002s : 0.01% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.08% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.02% parallel-infer-symbol : 0.000003s : 0.02% pre_auto_parallel : 0.000021s : 0.15% insert-virtual-dataset : 0.000003s : 0.02% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.01% pipeline_split : 0.000002s : 0.01% optimize.py_interpret_to_execute : 0.000052s : 0.37% optimize.rewriter_before_opt_a : 0.000040s : 0.28% optimize.opt_a.expand_dump_flag : 0.000004s : 0.03% optimize.opt_a.switch_simplify : 0.000032s : 0.22% optimize.opt_a.loop_unroll : 0.000019s : 0.14% optimize.opt_a.a_1 : 0.000415s : 2.93% optimize.opt_a.with_stream_mark : 0.000025s : 0.17% optimize.opt_a.recompute_prepare : 0.000014s : 0.10% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.05% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.04% 
optimize.opt_a.updatestate_loads_eliminate : 0.000006s : 0.04% optimize.opt_a.parameter_eliminate : 0.000002s : 0.02% optimize.opt_a.a_2 : 0.000144s : 1.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.08% optimize.opt_a.shard : 0.000003s : 0.02% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.02% optimize.opt_a.shard_inline : 0.000011s : 0.08% optimize.opt_a.merge_send_recv : 0.000012s : 0.08% optimize.opt_a.auto_parallel : 0.000011s : 0.08% optimize.opt_a.parallel : 0.000022s : 0.15% optimize.opt_a.flash_sp : 0.000011s : 0.08% optimize.opt_a.merge_comm : 0.000007s : 0.05% optimize.opt_a.allreduce_fusion : 0.000006s : 0.04% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.10% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.01% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.10% optimize.opt_a.virtual_dataset : 0.000012s : 0.08% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.08% optimize.opt_a.virtual_output : 0.000011s : 0.08% optimize.opt_a.merge_forward : 0.000007s : 0.05% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.02% optimize.opt_a.offload_activation : 0.000015s : 0.11% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000021s : 0.15% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.01% optimize.opt_a.before_grad : 0.000017s : 0.12% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.05% optimize.opt_a.meta_fg_expand : 0.000004s : 0.03% optimize.opt_a.flash_sp_send_recv_attached : 0.000004s : 0.02% optimize.opt_a.receive_attached : 0.000003s : 0.02% optimize.opt_a.after_resolve : 0.000020s : 0.14% optimize.opt_a.a_after_grad : 0.000017s : 0.12% optimize.opt_a.renormalize : 0.000331s : 2.34% optimize.opt_a.add_forward_monad_depend : 0.000006s : 0.04% optimize.opt_a.auto_monad_grad : 0.000002s : 0.02% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.14% optimize.opt_a.cse : 0.000039s : 0.27% optimize.opt_a.a_3 : 0.000072s : 0.51% 
optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.05% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.01% optimize.rewriter_after_opt_a : 0.000030s : 0.21% optimize.convert_after_rewriter : 0.000007s : 0.05% optimize.order_py_execute_after_rewriter : 0.000005s : 0.03% optimize.mutable_eliminate : 0.000448s : 3.16% optimize.opt_b.b_1 : 0.000108s : 0.76% optimize.opt_b.b_2 : 0.000007s : 0.05% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.04% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.02% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000016s : 0.11% optimize.optimize_parallel_all_gather_comm : 0.000015s : 0.11% optimize.overlap_param_gather : 0.000002s : 0.01% optimize.cconv : 0.000022s : 0.16% optimize.loop_unroll : 0.000409s : 2.89% optimize.opt_after_cconv.c_1 : 0.000028s : 0.20% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.03% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000002s : 0.02% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.01% optimize.opt_after_cconv.cse : 0.000016s : 0.12% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.09% optimize.tuple_transform.d_1 : 0.000038s : 0.27% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.01% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.04% optimize.partial_unused_args_eliminate : 0.000002s : 0.01% optimize.add_recomputation : 0.000043s : 0.30% optimize.cse_after_recomputation.cse : 0.000012s : 0.08% optimize.environ_conv : 0.000005s : 0.03% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.03% optimize.bias_add_comm_swap : 0.000002s : 0.02% optimize.label_micro_interleaved_index : 0.000004s : 0.03% 
optimize.label_fine_grained_interleaved_index : 0.000003s : 0.02% optimize.merge_cast_opt : 0.000001s : 0.01% optimize.slice_recompute_activation : 0.000003s : 0.02% optimize.micro_interleaved_order_control : 0.000002s : 0.01% optimize.assign_add_opt : 0.000002s : 0.01% optimize.ForceFp32Comm : 0.000001s : 0.01% optimize.remove_cast_before_assign_add : 0.000001s : 0.01% optimize.full_micro_interleaved_order_control : 0.000002s : 0.01% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.02% optimize.comm_op_add_attrs : 0.000001s : 0.01% optimize.add_comm_op_reuse_tag : 0.000001s : 0.01% optimize.interleave_split_concat_branches : 0.000001s : 0.01% optimize.interleave_parallel_branches : 0.000001s : 0.01% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.01% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.01% optimize.control_data_broadcast_order : 0.000011s : 0.08% optimize.grouped_pairwise_exchange_alltoall : 0.000002s : 0.01% optimize.offloading_packed_experts : 0.000004s : 0.03% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.03% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.01% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.01% optimize.overlap_recompute_comm : 0.000002s : 0.02% optimize.overlap_grad_ring_attention : 0.000004s : 0.03% optimize.overlap_grad_flash_sp : 0.000017s : 0.12% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.01% optimize.split_layernorm_comm : 0.000002s : 0.01% optimize.handle_group_info : 0.000001s : 0.01% optimize.symbol_engine_optimizer.build : 0.000002s : 0.01% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.06% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.08% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.04% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.06% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% 
detach_backward : 0.000002s : 0.01%
pipeline_parallel_scheduler : 0.000001s : 0.01%
auto_monad_reorder : 0.000015s : 0.11%
get_jit_bprop_graph : 0.000001s : 0.01%
rewriter_after_jit_bprop_graph : 0.000004s : 0.03%
opt_after_jit_grad : 0.000460s : 3.25%
validate : 0.000029s : 0.21%
backend_pass : 0.000001s : 0.01%
task_emit : 0.005822s : 41.14%
execute : 0.000007s : 0.05%
Time group info:
------[substitution.] 0.000118 26
17.77% : 0.000021s : 4: substitution.arithmetic_simplify
1.54% : 0.000002s : 2: substitution.elim_not_effective
0.99% : 0.000001s : 2: substitution.fold_const_symbol
4.33% : 0.000005s : 4: substitution.graph_param_transform
65.89% : 0.000078s : 2: substitution.inline
2.50% : 0.000003s : 4: substitution.j_node_and_user_rematch
3.70% : 0.000004s : 4: substitution.remove_not_recompute_node
3.29% : 0.000004s : 4: substitution.replace_old_param
------[type_inference.] 0.004292 2
92.01% : 0.003949s : 1: type_inference.infer
7.99% : 0.000343s : 1: type_inference.specialize
------[replace.] 0.000019 2
100.00% : 0.000019s : 2: replace.inline
------[match.] 0.000076 2
100.00% : 0.000076s : 2: match.inline
------[predicate.]
0.000136 984 0.78% : 0.000001s : 9: predicate.accumulaten_eliminater 0.93% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.67% : 0.000001s : 8: predicate.addn_check_dump 0.75% : 0.000001s : 9: predicate.addn_zero_filter 0.70% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.47% : 0.000003s : 17: predicate.arithmetic_simplify 0.85% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.68% : 0.000001s : 8: predicate.compare_switch_simplify 0.26% : 0.000000s : 4: predicate.const_output_eliminate 0.71% : 0.000001s : 8: predicate.depend_value_elim 0.81% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.87% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.81% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.35% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.32% : 0.000000s : 4: predicate.elim_not_effective 0.47% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.11% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.04% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_depend_swap 1.88% : 0.000003s : 21: predicate.environ_get_eliminate 1.06% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.85% : 0.000003s : 11: predicate.float_depend_g_call 0.68% : 0.000001s : 8: predicate.float_environ_get_switch 1.04% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.28% : 0.000000s : 4: predicate.fold_const_symbol 0.83% : 0.000001s : 8: predicate.get_grad_eliminate 0.29% : 0.000000s : 4: predicate.graph_param_transform 0.84% : 0.000001s : 8: predicate.incorporate_call 0.67% : 0.000001s : 8: predicate.incorporate_call_switch 5.93% : 0.000008s : 44: predicate.inline 1.03% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 1.04% : 0.000001s : 8: predicate.less_batch_normalization 1.63% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.17% : 0.000003s : 26: predicate.load_eliminater 1.14% : 0.000002s : 4: predicate.loop_unroll_after_grad 1.84% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.69% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.70% : 0.000001s : 8: predicate.merge_addn 0.73% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.83% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.71% : 0.000001s : 9: predicate.minmaximum_grad 1.35% : 0.000002s : 4: predicate.mutable_eliminate 0.45% : 0.000001s : 4: predicate.opt_reshape 0.54% : 0.000001s : 4: predicate.parallel_virtual_node 1.29% : 0.000002s : 11: predicate.partial_defer_inline 1.22% : 0.000002s : 13: predicate.partial_eliminate 0.76% : 0.000001s : 9: predicate.print_const_string_wrapper 0.84% : 0.000001s : 8: predicate.reduce_all_const_elim 0.96% : 0.000001s : 9: predicate.reduce_eliminate 2.18% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.73% : 0.000001s : 8: predicate.remove_not_recompute_node 1.31% : 0.000002s : 17: predicate.replace_applicator 0.80% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000000s : 4: predicate.reset_defer_inline 0.99% : 0.000001s : 9: predicate.reshape_eliminate 0.83% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.59% : 0.000001s : 4: predicate.row_tensor_eliminate 0.94% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.98% : 0.000001s : 8: predicate.shard_identity_eliminate 0.86% : 0.000001s : 8: predicate.special_op_eliminate 0.94% : 0.000001s : 8: predicate.specialize_transform 1.16% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.94% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.45% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.03% : 0.000001s : 11: predicate.switch_defer_inline 1.72% : 0.000002s : 19: predicate.switch_layer_defer_inline 4.46% : 0.000006s : 41: predicate.switch_simplify 0.74% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.57% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.64% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.38% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.08% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.42% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.30% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.54% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.09% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 3.07% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.51% : 0.000001s : 4: predicate.value_based_eliminate 0.82% : 0.000001s : 8: predicate.virtual_dataset_eliminate 1.09% : 0.000001s : 8: predicate.virtual_output_eliminate 0.37% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.60% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000234 6 42.58% : 0.000099s : 2: func_graph_cloner_run.FuncGraphClonerGraph 57.42% : 0.000134s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.025885 196 0.01% : 0.000004s : 1: ForceFp32Comm 11.31% : 0.002928s : 1: add_attr 11.28% : 0.002919s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.18% : 0.000047s : 1: add_recomputation 0.02% : 0.000004s : 1: assign_add_opt 0.21% : 0.000055s : 1: auto_monad 0.07% : 0.000019s : 1: auto_monad_reorder 0.02% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.02% : 0.000005s : 1: bias_add_comm_swap 1.92% : 0.000497s : 1: bootstrap 0.10% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.06% : 0.000014s : 1: control_data_broadcast_order 0.04% : 0.000010s : 1: convert_after_rewriter 0.09% : 0.000024s : 1: cse_after_recomputation 0.02% : 0.000005s : 1: dataset_repeat_opt 0.02% : 0.000005s : 1: detach_backward 0.03% : 0.000008s : 1: environ_conv 0.06% : 0.000016s : 1: event_method 0.05% : 0.000012s : 1: execute 0.02% : 0.000005s : 1: full_micro_interleaved_order_control 0.02% : 0.000004s : 1: get_jit_bprop_graph 0.03% : 0.000008s : 1: graph_reusing 0.02% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.02% : 0.000005s : 1: inline 0.02% : 0.000006s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.02% : 0.000004s : 1: interleave_split_concat_branches 0.02% : 0.000006s : 1: label_fine_grained_interleaved_index 0.03% : 0.000007s : 1: label_micro_interleaved_index 1.61% : 0.000417s : 1: loop_unroll 0.02% : 0.000004s : 1: merge_cast_opt 0.02% : 0.000005s : 1: micro_interleaved_order_control 1.76% : 0.000457s : 1: mutable_eliminate 0.03% : 0.000007s : 1: offloading_packed_experts 0.05% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.05% : 0.000013s : 1: opt.transform.mutable_eliminate 2.97% : 0.000769s : 78: opt.transform.opt_a 0.10% : 0.000027s : 1: opt.transform.opt_after_cconv 0.08% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.35% : 0.000090s : 28: opt.transform.opt_b 0.17% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.12% : 0.000031s : 4: opt.transform.symbol_engine_opt 7.13% : 0.001845s : 1: opt_a 0.38% : 0.000098s : 1: opt_after_cconv 1.82% : 0.000470s : 1: opt_after_jit_grad 0.71% : 0.000184s : 1: opt_b 14.21% : 0.003679s : 1: optimize 0.07% : 0.000019s : 1: optimize_parallel_all_gather_comm 0.03% : 0.000008s : 1: order_py_execute_after_rewriter 0.08% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.03% : 0.000007s : 1: overlap_grad_ring_attention 0.02% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.02% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.03% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.02% : 0.000005s : 1: overlap_recompute_comm 0.02% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.02% : 0.000005s : 1: partial_unused_args_eliminate 0.02% : 0.000005s : 1: pipeline_parallel_scheduler 0.02% : 0.000004s : 1: pipeline_split 0.10% : 0.000025s : 1: pre_auto_parallel 0.22% : 0.000057s : 1: py_interpret_to_execute 0.04% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.06% : 0.000017s : 1: remove_dup_value 0.70% : 0.000182s : 1: renormalize.infer 0.55% : 0.000142s : 1: renormalize.specialize 0.02% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.03% : 0.000007s : 1: rewriter_after_jit_bprop_graph 0.13% : 0.000034s : 1: rewriter_after_opt_a 0.17% : 0.000044s : 1: rewriter_before_opt_a 0.02% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.02% : 0.000005s : 1: slice_recompute_activation 0.02% : 0.000005s : 1: split_layernorm_comm 0.02% : 0.000005s : 1: split_matmul_comm_elemetwise 0.03% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.27% : 0.000070s : 1: symbol_engine_optimizer 22.53% : 0.005832s : 1: task_emit 0.27% : 0.000071s : 1: 
tuple_transform 16.80% : 0.004347s : 1: type_inference 0.21% : 0.000055s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x9-kbk] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x9-kbk],max_mem:68.0M TotalTime = 0.919649, [24] [bootstrap]: 0.00053931 [type_inference]: 0.00596954 [event_method]: 1.39e-05 [auto_monad]: 6.079e-05 [graph_reusing]: 5.27999e-06 [inline]: 1.69e-06 [add_attr]: 0.00336156, [1] [add_attr_with_inline]: 0.00335099, [1] [Cycle 1]: 4.508e-05, [2] [tag_attr]: 1.576e-05 [meta_addattr_fg_expand]: 4.12998e-06 [parallel-infer-symbol]: 2.74999e-06 [pre_auto_parallel]: 2.89e-05 [insert-virtual-dataset]: 2.76999e-06 [parallel-infer-symbol-second]: 7.50006e-07 [dataset_repeat_opt]: 1.91e-06 [pipeline_split]: 1.76e-06 [optimize]: 0.00396561, [53] [py_interpret_to_execute]: 1.996e-05 [rewriter_before_opt_a]: 5.803e-05 [opt_a]: 0.0021307, [2] [Cycle 1]: 0.00153263, [45] [expand_dump_flag]: 2.71e-06 [switch_simplify]: 3.2e-05 [loop_unroll]: 2.212e-05 [a_1]: 0.00045234 [with_stream_mark]: 1.279e-05 [recompute_prepare]: 7.65e-06 [updatestate_depend_eliminate]: 3.61001e-06 [updatestate_assign_eliminate]: 3.14999e-06 [updatestate_loads_eliminate]: 2.53998e-06 [parameter_eliminate]: 1.59e-06 [a_2]: 7.574e-05 [accelerated_algorithm]: 6.68e-06 [shard]: 2.09e-06 [meta_shard_fg_expand]: 1.45999e-06 [shard_inline]: 6.01e-06 [merge_send_recv]: 8.05e-06 [auto_parallel]: 5.73002e-06 [parallel]: 2.314e-05 [flash_sp]: 7.35998e-06 [merge_comm]: 3.63e-06 [allreduce_fusion]: 3.32002e-06 [matmul_add_comm_reduction]: 9.11998e-06 [allreduce_slice_to_reducescatter]: 7.09988e-07 [virtual_shard_identity]: 7.28e-06 [virtual_dataset]: 6.31e-06 [get_grad_eliminate_]: 5.56e-06 [virtual_output]: 5.69999e-06 [merge_forward]: 3.74002e-06 [cell_reuse_recompute_pass]: 9.70002e-07 [offload_activation]: 9.04e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.081e-05 
[merge_recompute_call_nodes]: 1.59e-06 [before_grad]: 9.20001e-06 [set_forward_comm_id_for_comm_node_pass]: 3.5e-06 [meta_fg_expand]: 2.34999e-06 [flash_sp_send_recv_attached]: 2.47001e-06 [receive_attached]: 2.32999e-06 [after_resolve]: 1.095e-05 [a_after_grad]: 8.59e-06 [renormalize]: 0.00040722 [add_forward_monad_depend]: 4.28999e-06 [auto_monad_grad]: 1.80001e-06 [auto_monad_eliminator]: 1.375e-05 [cse]: 2.716e-05 [a_3]: 4.052e-05 [Cycle 2]: 0.00058901, [45] [expand_dump_flag]: 9.50007e-07 [switch_simplify]: 6.93e-06 [loop_unroll]: 5.49e-06 [a_1]: 0.00012602 [with_stream_mark]: 9.19e-06 [recompute_prepare]: 5.71e-06 [updatestate_depend_eliminate]: 2.89999e-06 [updatestate_assign_eliminate]: 2.26998e-06 [updatestate_loads_eliminate]: 2.68998e-06 [parameter_eliminate]: 8.39995e-07 [a_2]: 6.787e-05 [accelerated_algorithm]: 5.52999e-06 [shard]: 1.17e-06 [meta_shard_fg_expand]: 1.17999e-06 [shard_inline]: 5.69e-06 [merge_send_recv]: 4.67e-06 [auto_parallel]: 5.00999e-06 [parallel]: 4.01001e-06 [flash_sp]: 3.45e-06 [merge_comm]: 3.06999e-06 [allreduce_fusion]: 2.74999e-06 [matmul_add_comm_reduction]: 5.12999e-06 [allreduce_slice_to_reducescatter]: 3.09985e-07 [virtual_shard_identity]: 6.23e-06 [virtual_dataset]: 5.27001e-06 [get_grad_eliminate_]: 5.07e-06 [virtual_output]: 5.24998e-06 [merge_forward]: 2.73e-06 [cell_reuse_recompute_pass]: 1.32e-06 [offload_activation]: 5.64998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.15001e-06 [merge_recompute_call_nodes]: 7.29982e-07 [before_grad]: 7.7e-06 [set_forward_comm_id_for_comm_node_pass]: 3.06001e-06 [meta_fg_expand]: 1.65001e-06 [flash_sp_send_recv_attached]: 7.60017e-07 [receive_attached]: 1.08001e-06 [after_resolve]: 9.07001e-06 [a_after_grad]: 7.55998e-06 [renormalize]: 7.00238e-08 [add_forward_monad_depend]: 1.04e-06 [auto_monad_grad]: 8.00006e-07 [auto_monad_eliminator]: 6.10002e-06 [cse]: 1.496e-05 [a_3]: 3.206e-05 [py_interpret_to_execute_after_opt_a]: 7.36001e-06 [slice_cell_reuse_recomputed_activation]: 
2.07999e-06 [rewriter_after_opt_a]: 3.17e-05 [convert_after_rewriter]: 6.63e-06 [order_py_execute_after_rewriter]: 4.79e-06 [mutable_eliminate]: 0.00044793 [opt_b]: 0.0001819, [1] [Cycle 1]: 0.00017593, [7] [b_1]: 0.00010767 [b_2]: 7.16001e-06 [updatestate_depend_eliminate]: 5.38002e-06 [updatestate_assign_eliminate]: 2.37001e-06 [updatestate_loads_eliminate]: 2.47001e-06 [renormalize]: 4.90021e-07 [cse]: 1.656e-05 [optimize_parallel_all_gather_comm]: 1.609e-05 [overlap_param_gather]: 2.01e-06 [cconv]: 2.185e-05 [loop_unroll]: 0.00041506 [opt_after_cconv]: 9.519e-05, [1] [Cycle 1]: 8.949e-05, [7] [c_1]: 2.791e-05 [parameter_eliminate]: 2.36998e-06 [updatestate_depend_eliminate]: 5.00001e-06 [updatestate_assign_eliminate]: 2.51e-06 [updatestate_loads_eliminate]: 2.17999e-06 [cse]: 1.635e-05 [renormalize]: 2.19996e-07 [remove_dup_value]: 1.25e-05 [tuple_transform]: 6.879e-05, [1] [Cycle 1]: 6.455e-05, [4] [d_1]: 3.896e-05 [none_parameter_eliminate]: 1.50001e-06 [renormalize]: 1.80007e-07 [switch_simplify]: 6.29001e-06 [partial_unused_args_eliminate]: 1.59998e-06 [add_recomputation]: 4.732e-05 [cse_after_recomputation]: 2.012e-05, [1] [Cycle 1]: 1.57e-05, [1] [cse]: 1.056e-05 [environ_conv]: 4.63999e-06 [swap_dp_allreduce_reducescatter]: 5.25001e-06 [bias_add_comm_swap]: 2.53e-06 [label_micro_interleaved_index]: 4.09002e-06 [label_fine_grained_interleaved_index]: 2.63e-06 [merge_cast_opt]: 1.22e-06 [slice_recompute_activation]: 2.07999e-06 [micro_interleaved_order_control]: 2.07999e-06 [assign_add_opt]: 1.20999e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 1.57999e-06 [full_micro_interleaved_order_control]: 2.63e-06 [reorder_send_recv_between_fp_bp]: 2.49001e-06 [comm_op_add_attrs]: 1.04998e-06 [add_comm_op_reuse_tag]: 9.39996e-07 [interleave_split_concat_branches]: 1.17e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.25999e-06 [overlap_opt_shard_grad_in_pipeline]: 1.64e-06 [control_data_broadcast_order]: 
1.179e-05 [grouped_pairwise_exchange_alltoall]: 1.49e-06 [offloading_packed_experts]: 3.25998e-06 [overlap_recompute_and_grad_model_parallel]: 4.52e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.17999e-06 [overlap_recompute_allgather_and_fa_grad]: 1.35999e-06 [overlap_recompute_comm]: 1.89e-06 [overlap_grad_ring_attention]: 3.97998e-06 [overlap_grad_flash_sp]: 1.68e-05 [begin_end_overlap_inline]: 5.09986e-07 [split_matmul_comm_elemetwise]: 2.06998e-06 [split_layernorm_comm]: 1.91e-06 [handle_group_info]: 9.5999e-07 [symbol_engine_optimizer]: 6.904e-05, [1] [Cycle 1]: 6.487e-05, [6] [build]: 2.29001e-06 [elim_shapecalc]: 8.47998e-06 [elim_not_effective]: 1.177e-05 [opt_reshape]: 6.16e-06 [fold_const_symbol]: 9.33002e-06 [renormalize]: 1.8999e-07 [detach_backward]: 1.78002e-06 [pipeline_parallel_scheduler]: 1.39003e-06 [auto_monad_reorder]: 1.508e-05 [get_jit_bprop_graph]: 9.80013e-07 [rewriter_after_jit_bprop_graph]: 3.35e-06 [opt_after_jit_grad]: 0.00044895 [validate]: 2.988e-05 [backend_pass]: 9.79984e-07 [task_emit]: 0.904942 [execute]: 9.22999e-06 Sums bootstrap : 0.000539s : 0.06% type_inference : 0.005970s : 0.65% event_method : 0.000014s : 0.00% auto_monad : 0.000061s : 0.01% graph_reusing : 0.000005s : 0.00% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000016s : 0.00% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000004s : 0.00% parallel-infer-symbol : 0.000003s : 0.00% pre_auto_parallel : 0.000029s : 0.00% insert-virtual-dataset : 0.000003s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000020s : 0.00% optimize.rewriter_before_opt_a : 0.000058s : 0.01% optimize.opt_a.expand_dump_flag : 0.000004s : 0.00% optimize.opt_a.switch_simplify : 0.000039s : 0.00% optimize.opt_a.loop_unroll : 0.000028s : 0.00% optimize.opt_a.a_1 : 0.000578s : 0.06% optimize.opt_a.with_stream_mark : 0.000022s : 0.00% 
optimize.opt_a.recompute_prepare : 0.000013s : 0.00% optimize.opt_a.updatestate_depend_eliminate : 0.000007s : 0.00% optimize.opt_a.updatestate_assign_eliminate : 0.000005s : 0.00% optimize.opt_a.updatestate_loads_eliminate : 0.000005s : 0.00% optimize.opt_a.parameter_eliminate : 0.000002s : 0.00% optimize.opt_a.a_2 : 0.000144s : 0.02% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.00% optimize.opt_a.shard : 0.000003s : 0.00% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.00% optimize.opt_a.shard_inline : 0.000012s : 0.00% optimize.opt_a.merge_send_recv : 0.000013s : 0.00% optimize.opt_a.auto_parallel : 0.000011s : 0.00% optimize.opt_a.parallel : 0.000027s : 0.00% optimize.opt_a.flash_sp : 0.000011s : 0.00% optimize.opt_a.merge_comm : 0.000007s : 0.00% optimize.opt_a.allreduce_fusion : 0.000006s : 0.00% optimize.opt_a.matmul_add_comm_reduction : 0.000014s : 0.00% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000014s : 0.00% optimize.opt_a.virtual_dataset : 0.000012s : 0.00% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.00% optimize.opt_a.virtual_output : 0.000011s : 0.00% optimize.opt_a.merge_forward : 0.000006s : 0.00% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.00% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.00% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.00% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000007s : 0.00% optimize.opt_a.meta_fg_expand : 0.000004s : 0.00% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.00% optimize.opt_a.receive_attached : 0.000003s : 0.00% optimize.opt_a.after_resolve : 0.000020s : 0.00% optimize.opt_a.a_after_grad : 0.000016s : 0.00% optimize.opt_a.renormalize : 0.000407s : 0.04% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.00% optimize.opt_a.auto_monad_grad : 
0.000003s : 0.00% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.00% optimize.opt_a.cse : 0.000042s : 0.00% optimize.opt_a.a_3 : 0.000073s : 0.01% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.00% optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000032s : 0.00% optimize.convert_after_rewriter : 0.000007s : 0.00% optimize.order_py_execute_after_rewriter : 0.000005s : 0.00% optimize.mutable_eliminate : 0.000448s : 0.05% optimize.opt_b.b_1 : 0.000108s : 0.01% optimize.opt_b.b_2 : 0.000007s : 0.00% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.00% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.00% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.00% optimize.loop_unroll : 0.000415s : 0.05% optimize.opt_after_cconv.c_1 : 0.000028s : 0.00% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.00% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.00% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.00% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.00% optimize.tuple_transform.d_1 : 0.000039s : 0.00% optimize.tuple_transform.none_parameter_eliminate : 0.000002s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.00% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000047s : 0.01% optimize.cse_after_recomputation.cse : 0.000011s : 0.00% optimize.environ_conv : 0.000005s : 0.00% 
optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.00% optimize.bias_add_comm_swap : 0.000003s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.00% optimize.label_fine_grained_interleaved_index : 0.000003s : 0.00% optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000002s : 0.00% optimize.full_micro_interleaved_order_control : 0.000003s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000002s : 0.00% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000012s : 0.00% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000003s : 0.00% optimize.overlap_recompute_and_grad_model_parallel : 0.000005s : 0.00% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.00% optimize.overlap_grad_flash_sp : 0.000017s : 0.00% optimize.begin_end_overlap_inline : 0.000001s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.00% optimize.symbol_engine_optimizer.elim_not_effective : 0.000012s : 0.00% 
optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.00% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.00% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.00% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.00% opt_after_jit_grad : 0.000449s : 0.05% validate : 0.000030s : 0.00% backend_pass : 0.000001s : 0.00% task_emit : 0.904942s : 98.87% execute : 0.000009s : 0.00% Time group info: ------[substitution.] 0.000162 30 15.15% : 0.000025s : 5: substitution.arithmetic_simplify 1.19% : 0.000002s : 2: substitution.elim_not_effective 0.90% : 0.000001s : 2: substitution.fold_const_symbol 3.37% : 0.000005s : 4: substitution.graph_param_transform 65.80% : 0.000107s : 3: substitution.inline 1.73% : 0.000003s : 4: substitution.j_node_and_user_rematch 2.68% : 0.000004s : 4: substitution.remove_not_recompute_node 2.77% : 0.000005s : 4: substitution.replace_old_param 6.40% : 0.000010s : 2: substitution.tuple_list_get_item_eliminator ------[type_inference.] 0.005926 2 90.73% : 0.005376s : 1: type_inference.infer 9.27% : 0.000549s : 1: type_inference.specialize ------[replace.] 0.000039 5 70.52% : 0.000028s : 3: replace.inline 29.48% : 0.000012s : 2: replace.tuple_list_get_item_eliminator ------[match.] 0.000114 5 91.79% : 0.000105s : 3: match.inline 8.21% : 0.000009s : 2: match.tuple_list_get_item_eliminator ------[predicate.] 
0.000157 1131 0.88% : 0.000001s : 11: predicate.accumulaten_eliminater 0.81% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.57% : 0.000001s : 8: predicate.addn_check_dump 0.81% : 0.000001s : 11: predicate.addn_zero_filter 0.79% : 0.000001s : 11: predicate.adjust_all_reduce_mul_add 2.16% : 0.000003s : 19: predicate.arithmetic_simplify 0.87% : 0.000001s : 11: predicate.cast_eliminate 0.69% : 0.000001s : 8: predicate.check_bprop_eliminate 0.57% : 0.000001s : 8: predicate.compare_switch_simplify 0.25% : 0.000000s : 4: predicate.const_output_eliminate 0.62% : 0.000001s : 8: predicate.depend_value_elim 0.86% : 0.000001s : 11: predicate.dict_get_item_const_eliminator 0.97% : 0.000002s : 11: predicate.dict_get_item_eliminator 0.88% : 0.000001s : 11: predicate.dict_set_item_eliminator 1.13% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.28% : 0.000000s : 4: predicate.elim_not_effective 0.38% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.09% : 0.000002s : 15: predicate.environ_add_const_eliminate 1.09% : 0.000002s : 15: predicate.environ_get_add_eliminate 1.07% : 0.000002s : 15: predicate.environ_get_depend_swap 1.78% : 0.000003s : 23: predicate.environ_get_eliminate 1.08% : 0.000002s : 15: predicate.environ_get_set_eliminate 1.28% : 0.000002s : 16: predicate.exchange_switch_depend_value 2.29% : 0.000004s : 16: predicate.float_depend_g_call 0.60% : 0.000001s : 8: predicate.float_environ_get_switch 0.88% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.25% : 0.000000s : 4: predicate.fold_const_symbol 0.72% : 0.000001s : 8: predicate.get_grad_eliminate 0.27% : 0.000000s : 4: predicate.graph_param_transform 0.70% : 0.000001s : 8: predicate.incorporate_call 0.55% : 0.000001s : 8: predicate.incorporate_call_switch 6.06% : 0.000010s : 51: predicate.inline 0.87% : 0.000001s : 8: predicate.inline_without_move 0.41% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.83% : 0.000001s : 8: predicate.less_batch_normalization 1.72% : 0.000003s : 
21: predicate.list_to_tuple_eliminator_ 2.40% : 0.000004s : 32: predicate.load_eliminater 0.90% : 0.000001s : 4: predicate.loop_unroll_after_grad 2.48% : 0.000004s : 26: predicate.loop_unroll_before_grad 1.63% : 0.000003s : 19: predicate.make_slice_get_slice_eliminator 0.62% : 0.000001s : 8: predicate.merge_addn 0.63% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.66% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.79% : 0.000001s : 11: predicate.minmaximum_grad 1.11% : 0.000002s : 4: predicate.mutable_eliminate 0.38% : 0.000001s : 4: predicate.opt_reshape 0.49% : 0.000001s : 4: predicate.parallel_virtual_node 1.64% : 0.000003s : 16: predicate.partial_defer_inline 1.46% : 0.000002s : 17: predicate.partial_eliminate 1.05% : 0.000002s : 11: predicate.print_const_string_wrapper 0.64% : 0.000001s : 8: predicate.reduce_all_const_elim 1.20% : 0.000002s : 11: predicate.reduce_eliminate 2.38% : 0.000004s : 32: predicate.redundant_stop_gradient_eliminater 0.50% : 0.000001s : 8: predicate.remove_not_recompute_node 1.45% : 0.000002s : 21: predicate.replace_applicator 0.68% : 0.000001s : 8: predicate.replace_old_param 0.34% : 0.000001s : 4: predicate.reset_defer_inline 0.83% : 0.000001s : 11: predicate.reshape_eliminate 0.72% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.40% : 0.000001s : 4: predicate.row_tensor_eliminate 0.75% : 0.000001s : 8: predicate.same_eliminate 0.53% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.85% : 0.000001s : 8: predicate.shard_identity_eliminate 0.74% : 0.000001s : 8: predicate.special_op_eliminate 0.74% : 0.000001s : 8: predicate.specialize_transform 0.96% : 0.000002s : 8: predicate.split_environ_get_set_with_tuple_value 0.80% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.38% : 0.000001s : 4: predicate.switch_call_monad_eliminater 1.37% : 0.000002s : 16: predicate.switch_defer_inline 1.98% : 0.000003s : 24: predicate.switch_layer_defer_inline 5.10% : 0.000008s : 54: predicate.switch_simplify 0.90% : 
0.000001s : 11: predicate.tile_eliminate 0.86% : 0.000001s : 11: predicate.transpose_eliminate 1.54% : 0.000002s : 19: predicate.tuple_list_convert_item_index_to_positive 1.58% : 0.000002s : 19: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 19: predicate.tuple_list_get_item_depend_reorder 3.40% : 0.000005s : 29: predicate.tuple_list_get_item_eliminator 1.50% : 0.000002s : 19: predicate.tuple_list_get_set_item_eliminator 2.17% : 0.000003s : 27: predicate.tuple_list_set_item_eliminator 1.65% : 0.000003s : 21: predicate.tuple_to_list_eliminator_ 2.35% : 0.000004s : 32: predicate.updatestate_pure_node_eliminater 3.22% : 0.000005s : 40: predicate.updatestate_useless_node_eliminater 0.42% : 0.000001s : 4: predicate.value_based_eliminate 0.79% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.78% : 0.000001s : 8: predicate.virtual_output_eliminate 0.35% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.46% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000340 8 45.87% : 0.000156s : 3: func_graph_cloner_run.FuncGraphClonerGraph 54.13% : 0.000184s : 5: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.928480 196 0.00% : 0.000003s : 1: ForceFp32Comm 0.36% : 0.003366s : 1: add_attr 0.36% : 0.003355s : 1: add_attr_with_inline 0.00% : 0.000004s : 1: add_comm_op_reuse_tag 0.01% : 0.000052s : 1: add_recomputation 0.00% : 0.000004s : 1: assign_add_opt 0.01% : 0.000066s : 1: auto_monad 0.00% : 0.000019s : 1: auto_monad_reorder 0.00% : 0.000006s : 1: backend_pass 0.00% : 0.000003s : 1: begin_end_overlap_inline 0.00% : 0.000005s : 1: bias_add_comm_swap 0.06% : 0.000578s : 1: bootstrap 0.00% : 0.000025s : 1: cconv 0.00% : 0.000004s : 1: comm_op_add_attrs 0.00% : 0.000015s : 1: control_data_broadcast_order 0.00% : 0.000010s : 1: convert_after_rewriter 0.00% : 0.000023s : 1: cse_after_recomputation 0.00% : 0.000005s : 1: dataset_repeat_opt 0.00% : 0.000005s : 1: detach_backward 0.00% : 0.000008s : 1: environ_conv 0.00% : 0.000019s : 1: event_method 0.00% : 0.000017s : 1: execute 0.00% : 0.000005s : 1: full_micro_interleaved_order_control 0.00% : 0.000004s : 1: get_jit_bprop_graph 0.00% : 0.000009s : 1: graph_reusing 0.00% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.00% : 0.000004s : 1: handle_group_info 0.00% : 0.000005s : 1: inline 0.00% : 0.000006s : 1: insert-virtual-dataset 0.00% : 0.000004s : 1: interleave_parallel_branches 0.00% : 0.000004s : 1: interleave_split_concat_branches 0.00% : 0.000006s : 1: label_fine_grained_interleaved_index 0.00% : 0.000007s : 1: label_micro_interleaved_index 0.05% : 0.000423s : 1: loop_unroll 0.00% : 0.000004s : 1: merge_cast_opt 0.00% : 0.000005s : 1: micro_interleaved_order_control 0.05% : 0.000457s : 1: mutable_eliminate 0.00% : 0.000006s : 1: offloading_packed_experts 0.00% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.00% : 0.000013s : 1: opt.transform.mutable_eliminate 0.10% : 0.000946s : 78: opt.transform.opt_a 0.00% : 0.000027s : 1: opt.transform.opt_after_cconv 0.00% : 0.000021s : 1: opt.transform.opt_after_jit_grad 0.01% : 0.000090s : 28: opt.transform.opt_b 0.00% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.00% : 0.000033s : 4: opt.transform.symbol_engine_opt 0.23% : 0.002134s : 1: opt_a 0.01% : 0.000098s : 1: opt_after_cconv 0.05% : 0.000458s : 1: opt_after_jit_grad 0.02% : 0.000185s : 1: opt_b 0.43% : 0.003970s : 1: optimize 0.00% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.00% : 0.000008s : 1: order_py_execute_after_rewriter 0.00% : 0.000020s : 1: overlap_grad_flash_sp 0.00% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.00% : 0.000007s : 1: overlap_grad_ring_attention 0.00% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.00% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.00% : 0.000005s : 1: overlap_param_gather 0.00% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.00% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.00% : 0.000005s : 1: overlap_recompute_comm 0.00% : 0.000007s : 1: parallel-infer-symbol 0.00% : 0.000004s : 1: parallel-infer-symbol-second 0.00% : 0.000005s : 1: partial_unused_args_eliminate 0.00% : 0.000004s : 1: pipeline_parallel_scheduler 0.00% : 0.000005s : 1: pipeline_split 0.00% : 0.000033s : 1: pre_auto_parallel 0.00% : 0.000024s : 1: py_interpret_to_execute 0.00% : 0.000011s : 1: py_interpret_to_execute_after_opt_a 0.00% : 0.000004s : 1: remove_cast_before_assign_add 0.00% : 0.000016s : 1: remove_dup_value 0.02% : 0.000210s : 1: renormalize.infer 0.02% : 0.000191s : 1: renormalize.specialize 0.00% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.00% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.00% : 0.000036s : 1: rewriter_after_opt_a 0.01% : 0.000062s : 1: rewriter_before_opt_a 0.00% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.00% : 0.000005s : 1: slice_recompute_activation 0.00% : 0.000005s : 1: split_layernorm_comm 0.00% : 0.000005s : 1: split_matmul_comm_elemetwise 0.00% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.01% : 0.000072s : 1: symbol_engine_optimizer 97.47% : 0.904987s : 1: task_emit 0.01% : 0.000072s : 1: 
tuple_transform 0.64% : 0.005984s : 1: type_inference 0.01% : 0.000055s : 1: validate TotalTime = 0.0543656, [24] [bootstrap]: 0.00047142 [type_inference]: 0.00438135 [event_method]: 1.025e-05 [auto_monad]: 5.043e-05 [graph_reusing]: 5.05001e-06 [inline]: 1.91e-06 [add_attr]: 0.00305262, [1] [add_attr_with_inline]: 0.00304494, [1] [Cycle 1]: 4.604e-05, [2] [tag_attr]: 1.209e-05 [meta_addattr_fg_expand]: 3.38999e-06 [parallel-infer-symbol]: 2.84001e-06 [pre_auto_parallel]: 2.071e-05 [insert-virtual-dataset]: 2.27001e-06 [parallel-infer-symbol-second]: 6.39993e-07 [dataset_repeat_opt]: 2.14999e-06 [pipeline_split]: 1.62999e-06 [optimize]: 0.00366222, [53] [py_interpret_to_execute]: 1.518e-05 [rewriter_before_opt_a]: 3.753e-05 [opt_a]: 0.00183989, [2] [Cycle 1]: 0.00123924, [45] [expand_dump_flag]: 2.42001e-06 [switch_simplify]: 2.377e-05 [loop_unroll]: 1.377e-05 [a_1]: 0.00028982 [with_stream_mark]: 1.369e-05 [recompute_prepare]: 7e-06 [updatestate_depend_eliminate]: 3.40003e-06 [updatestate_assign_eliminate]: 3.46001e-06 [updatestate_loads_eliminate]: 2.78e-06 [parameter_eliminate]: 2.32999e-06 [a_2]: 7.629e-05 [accelerated_algorithm]: 6.33e-06 [shard]: 2.09999e-06 [meta_shard_fg_expand]: 1.43002e-06 [shard_inline]: 6.06e-06 [merge_send_recv]: 7.18998e-06 [auto_parallel]: 5.77001e-06 [parallel]: 1.685e-05 [flash_sp]: 6.98998e-06 [merge_comm]: 3.37997e-06 [allreduce_fusion]: 3.30998e-06 [matmul_add_comm_reduction]: 8.45001e-06 [allreduce_slice_to_reducescatter]: 5.69999e-07 [virtual_shard_identity]: 6.99001e-06 [virtual_dataset]: 5.73997e-06 [get_grad_eliminate_]: 5.84e-06 [virtual_output]: 5.49e-06 [merge_forward]: 3.68999e-06 [cell_reuse_recompute_pass]: 1.19e-06 [offload_activation]: 9.15001e-06 [cell_reuse_handle_not_recompute_node_pass]: 1.123e-05 [merge_recompute_call_nodes]: 1.34998e-06 [before_grad]: 9.67999e-06 [set_forward_comm_id_for_comm_node_pass]: 3.47002e-06 [meta_fg_expand]: 2.17999e-06 [flash_sp_send_recv_attached]: 2.43e-06 [receive_attached]: 
2.14e-06 [after_resolve]: 1.061e-05 [a_after_grad]: 8.82e-06 [renormalize]: 0.00033755 [add_forward_monad_depend]: 4.12003e-06 [auto_monad_grad]: 1.72999e-06 [auto_monad_eliminator]: 1.334e-05 [cse]: 2.667e-05 [a_3]: 3.987e-05 [Cycle 2]: 0.00059167, [45] [expand_dump_flag]: 7.89994e-07 [switch_simplify]: 7.31001e-06 [loop_unroll]: 5.51998e-06 [a_1]: 0.00012453 [with_stream_mark]: 9.29998e-06 [recompute_prepare]: 5.66e-06 [updatestate_depend_eliminate]: 2.71e-06 [updatestate_assign_eliminate]: 2.39001e-06 [updatestate_loads_eliminate]: 2.51998e-06 [parameter_eliminate]: 8.99978e-07 [a_2]: 6.807e-05 [accelerated_algorithm]: 5.69e-06 [shard]: 1.05999e-06 [meta_shard_fg_expand]: 1.12e-06 [shard_inline]: 5.63002e-06 [merge_send_recv]: 4.44002e-06 [auto_parallel]: 5.15999e-06 [parallel]: 3.86999e-06 [flash_sp]: 3.41999e-06 [merge_comm]: 3.09001e-06 [allreduce_fusion]: 2.68e-06 [matmul_add_comm_reduction]: 4.95999e-06 [allreduce_slice_to_reducescatter]: 2.89991e-07 [virtual_shard_identity]: 6.31e-06 [virtual_dataset]: 5.21998e-06 [get_grad_eliminate_]: 5.10001e-06 [virtual_output]: 5.10999e-06 [merge_forward]: 2.74999e-06 [cell_reuse_recompute_pass]: 1.27e-06 [offload_activation]: 6.11998e-06 [cell_reuse_handle_not_recompute_node_pass]: 9.25001e-06 [merge_recompute_call_nodes]: 7.2e-07 [before_grad]: 7.69002e-06 [set_forward_comm_id_for_comm_node_pass]: 2.94999e-06 [meta_fg_expand]: 1.66e-06 [flash_sp_send_recv_attached]: 7.80012e-07 [receive_attached]: 1.05999e-06 [after_resolve]: 9.58997e-06 [a_after_grad]: 8.23001e-06 [renormalize]: 9.00181e-08 [add_forward_monad_depend]: 1.08001e-06 [auto_monad_grad]: 8.79983e-07 [auto_monad_eliminator]: 6.56e-06 [cse]: 1.19e-05 [a_3]: 3.211e-05 [py_interpret_to_execute_after_opt_a]: 7.03998e-06 [slice_cell_reuse_recomputed_activation]: 1.94e-06 [rewriter_after_opt_a]: 3.036e-05 [convert_after_rewriter]: 7.01999e-06 [order_py_execute_after_rewriter]: 5.03002e-06 [mutable_eliminate]: 0.00044501 [opt_b]: 0.0002068, [1] [Cycle 1]: 
0.00020095, [7] [b_1]: 0.00013195 [b_2]: 7.28999e-06 [updatestate_depend_eliminate]: 5.34998e-06 [updatestate_assign_eliminate]: 2.48e-06 [updatestate_loads_eliminate]: 2.39001e-06 [renormalize]: 3.00002e-07 [cse]: 1.666e-05 [optimize_parallel_all_gather_comm]: 1.606e-05 [overlap_param_gather]: 2.16e-06 [cconv]: 2.165e-05 [loop_unroll]: 0.00041557 [opt_after_cconv]: 9.445e-05, [1] [Cycle 1]: 8.881e-05, [7] [c_1]: 2.799e-05 [parameter_eliminate]: 2.29001e-06 [updatestate_depend_eliminate]: 5.11002e-06 [updatestate_assign_eliminate]: 2.54001e-06 [updatestate_loads_eliminate]: 2.36e-06 [cse]: 1.554e-05 [renormalize]: 4.20026e-07 [remove_dup_value]: 1.287e-05 [tuple_transform]: 6.888e-05, [1] [Cycle 1]: 6.467e-05, [4] [d_1]: 3.921e-05 [none_parameter_eliminate]: 1.47001e-06 [renormalize]: 1.8999e-07 [switch_simplify]: 6.29999e-06 [partial_unused_args_eliminate]: 1.92999e-06 [add_recomputation]: 4.274e-05 [cse_after_recomputation]: 1.965e-05, [1] [Cycle 1]: 1.542e-05, [1] [cse]: 1.045e-05 [environ_conv]: 4.85001e-06 [swap_dp_allreduce_reducescatter]: 5.27999e-06 [bias_add_comm_swap]: 2.26998e-06 [label_micro_interleaved_index]: 4e-06 [label_fine_grained_interleaved_index]: 2.49001e-06 [merge_cast_opt]: 1.24e-06 [slice_recompute_activation]: 2.36e-06 [micro_interleaved_order_control]: 2.27001e-06 [assign_add_opt]: 1.28002e-06 [ForceFp32Comm]: 7.30011e-07 [remove_cast_before_assign_add]: 9.00007e-07 [full_micro_interleaved_order_control]: 2.07999e-06 [reorder_send_recv_between_fp_bp]: 2.71e-06 [comm_op_add_attrs]: 9.89996e-07 [add_comm_op_reuse_tag]: 8.70001e-07 [interleave_split_concat_branches]: 1.12e-06 [interleave_parallel_branches]: 1.03001e-06 [overlap_opt_shard_in_pipeline]: 1.09e-06 [overlap_opt_shard_grad_in_pipeline]: 1.86e-06 [control_data_broadcast_order]: 1.132e-05 [grouped_pairwise_exchange_alltoall]: 1.42e-06 [offloading_packed_experts]: 3.60998e-06 [overlap_recompute_and_grad_model_parallel]: 4.4e-06 [overlap_grad_matmul_and_grad_allreduce]: 1.09998e-06 
[overlap_recompute_allgather_and_fa_grad]: 1.32999e-06 [overlap_recompute_comm]: 2.43998e-06 [overlap_grad_ring_attention]: 3.97e-06 [overlap_grad_flash_sp]: 1.65e-05 [begin_end_overlap_inline]: 4.90021e-07 [split_matmul_comm_elemetwise]: 1.92001e-06 [split_layernorm_comm]: 2.16e-06 [handle_group_info]: 1.29e-06 [symbol_engine_optimizer]: 6.811e-05, [1] [Cycle 1]: 6.39e-05, [6] [build]: 2.34999e-06 [elim_shapecalc]: 8.42e-06 [elim_not_effective]: 1.128e-05 [opt_reshape]: 6.09999e-06 [fold_const_symbol]: 8.59002e-06 [renormalize]: 2.30008e-07 [detach_backward]: 1.66e-06 [pipeline_parallel_scheduler]: 1.49e-06 [auto_monad_reorder]: 1.544e-05 [get_jit_bprop_graph]: 1.03001e-06 [rewriter_after_jit_bprop_graph]: 3.21999e-06 [opt_after_jit_grad]: 0.00044646 [validate]: 3.038e-05 [backend_pass]: 1.17e-06 [task_emit]: 0.0419956 [execute]: 7.87e-06 Sums bootstrap : 0.000471s : 0.94% type_inference : 0.004381s : 8.70% event_method : 0.000010s : 0.02% auto_monad : 0.000050s : 0.10% graph_reusing : 0.000005s : 0.01% inline : 0.000002s : 0.00% add_attr.add_attr_with_inline.tag_attr : 0.000012s : 0.02% add_attr.add_attr_with_inline.meta_addattr_fg_expand : 0.000003s : 0.01% parallel-infer-symbol : 0.000003s : 0.01% pre_auto_parallel : 0.000021s : 0.04% insert-virtual-dataset : 0.000002s : 0.00% parallel-infer-symbol-second : 0.000001s : 0.00% dataset_repeat_opt : 0.000002s : 0.00% pipeline_split : 0.000002s : 0.00% optimize.py_interpret_to_execute : 0.000015s : 0.03% optimize.rewriter_before_opt_a : 0.000038s : 0.07% optimize.opt_a.expand_dump_flag : 0.000003s : 0.01% optimize.opt_a.switch_simplify : 0.000031s : 0.06% optimize.opt_a.loop_unroll : 0.000019s : 0.04% optimize.opt_a.a_1 : 0.000414s : 0.82% optimize.opt_a.with_stream_mark : 0.000023s : 0.05% optimize.opt_a.recompute_prepare : 0.000013s : 0.03% optimize.opt_a.updatestate_depend_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_assign_eliminate : 0.000006s : 0.01% optimize.opt_a.updatestate_loads_eliminate : 
0.000005s : 0.01% optimize.opt_a.parameter_eliminate : 0.000003s : 0.01% optimize.opt_a.a_2 : 0.000144s : 0.29% optimize.opt_a.accelerated_algorithm : 0.000012s : 0.02% optimize.opt_a.shard : 0.000003s : 0.01% optimize.opt_a.meta_shard_fg_expand : 0.000003s : 0.01% optimize.opt_a.shard_inline : 0.000012s : 0.02% optimize.opt_a.merge_send_recv : 0.000012s : 0.02% optimize.opt_a.auto_parallel : 0.000011s : 0.02% optimize.opt_a.parallel : 0.000021s : 0.04% optimize.opt_a.flash_sp : 0.000010s : 0.02% optimize.opt_a.merge_comm : 0.000006s : 0.01% optimize.opt_a.allreduce_fusion : 0.000006s : 0.01% optimize.opt_a.matmul_add_comm_reduction : 0.000013s : 0.03% optimize.opt_a.allreduce_slice_to_reducescatter : 0.000001s : 0.00% optimize.opt_a.virtual_shard_identity : 0.000013s : 0.03% optimize.opt_a.virtual_dataset : 0.000011s : 0.02% optimize.opt_a.get_grad_eliminate_ : 0.000011s : 0.02% optimize.opt_a.virtual_output : 0.000011s : 0.02% optimize.opt_a.merge_forward : 0.000006s : 0.01% optimize.opt_a.cell_reuse_recompute_pass : 0.000002s : 0.00% optimize.opt_a.offload_activation : 0.000015s : 0.03% optimize.opt_a.cell_reuse_handle_not_recompute_node_pass : 0.000020s : 0.04% optimize.opt_a.merge_recompute_call_nodes : 0.000002s : 0.00% optimize.opt_a.before_grad : 0.000017s : 0.03% optimize.opt_a.set_forward_comm_id_for_comm_node_pass : 0.000006s : 0.01% optimize.opt_a.meta_fg_expand : 0.000004s : 0.01% optimize.opt_a.flash_sp_send_recv_attached : 0.000003s : 0.01% optimize.opt_a.receive_attached : 0.000003s : 0.01% optimize.opt_a.after_resolve : 0.000020s : 0.04% optimize.opt_a.a_after_grad : 0.000017s : 0.03% optimize.opt_a.renormalize : 0.000338s : 0.67% optimize.opt_a.add_forward_monad_depend : 0.000005s : 0.01% optimize.opt_a.auto_monad_grad : 0.000003s : 0.01% optimize.opt_a.auto_monad_eliminator : 0.000020s : 0.04% optimize.opt_a.cse : 0.000039s : 0.08% optimize.opt_a.a_3 : 0.000072s : 0.14% optimize.py_interpret_to_execute_after_opt_a : 0.000007s : 0.01% 
optimize.slice_cell_reuse_recomputed_activation : 0.000002s : 0.00% optimize.rewriter_after_opt_a : 0.000030s : 0.06% optimize.convert_after_rewriter : 0.000007s : 0.01% optimize.order_py_execute_after_rewriter : 0.000005s : 0.01% optimize.mutable_eliminate : 0.000445s : 0.88% optimize.opt_b.b_1 : 0.000132s : 0.26% optimize.opt_b.b_2 : 0.000007s : 0.01% optimize.opt_b.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_b.updatestate_assign_eliminate : 0.000002s : 0.00% optimize.opt_b.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_b.renormalize : 0.000000s : 0.00% optimize.opt_b.cse : 0.000017s : 0.03% optimize.optimize_parallel_all_gather_comm : 0.000016s : 0.03% optimize.overlap_param_gather : 0.000002s : 0.00% optimize.cconv : 0.000022s : 0.04% optimize.loop_unroll : 0.000416s : 0.83% optimize.opt_after_cconv.c_1 : 0.000028s : 0.06% optimize.opt_after_cconv.parameter_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.updatestate_depend_eliminate : 0.000005s : 0.01% optimize.opt_after_cconv.updatestate_assign_eliminate : 0.000003s : 0.01% optimize.opt_after_cconv.updatestate_loads_eliminate : 0.000002s : 0.00% optimize.opt_after_cconv.cse : 0.000016s : 0.03% optimize.opt_after_cconv.renormalize : 0.000000s : 0.00% optimize.remove_dup_value : 0.000013s : 0.03% optimize.tuple_transform.d_1 : 0.000039s : 0.08% optimize.tuple_transform.none_parameter_eliminate : 0.000001s : 0.00% optimize.tuple_transform.renormalize : 0.000000s : 0.00% optimize.tuple_transform.switch_simplify : 0.000006s : 0.01% optimize.partial_unused_args_eliminate : 0.000002s : 0.00% optimize.add_recomputation : 0.000043s : 0.08% optimize.cse_after_recomputation.cse : 0.000010s : 0.02% optimize.environ_conv : 0.000005s : 0.01% optimize.swap_dp_allreduce_reducescatter : 0.000005s : 0.01% optimize.bias_add_comm_swap : 0.000002s : 0.00% optimize.label_micro_interleaved_index : 0.000004s : 0.01% optimize.label_fine_grained_interleaved_index : 0.000002s : 0.00% 
optimize.merge_cast_opt : 0.000001s : 0.00% optimize.slice_recompute_activation : 0.000002s : 0.00% optimize.micro_interleaved_order_control : 0.000002s : 0.00% optimize.assign_add_opt : 0.000001s : 0.00% optimize.ForceFp32Comm : 0.000001s : 0.00% optimize.remove_cast_before_assign_add : 0.000001s : 0.00% optimize.full_micro_interleaved_order_control : 0.000002s : 0.00% optimize.reorder_send_recv_between_fp_bp : 0.000003s : 0.01% optimize.comm_op_add_attrs : 0.000001s : 0.00% optimize.add_comm_op_reuse_tag : 0.000001s : 0.00% optimize.interleave_split_concat_branches : 0.000001s : 0.00% optimize.interleave_parallel_branches : 0.000001s : 0.00% optimize.overlap_opt_shard_in_pipeline : 0.000001s : 0.00% optimize.overlap_opt_shard_grad_in_pipeline : 0.000002s : 0.00% optimize.control_data_broadcast_order : 0.000011s : 0.02% optimize.grouped_pairwise_exchange_alltoall : 0.000001s : 0.00% optimize.offloading_packed_experts : 0.000004s : 0.01% optimize.overlap_recompute_and_grad_model_parallel : 0.000004s : 0.01% optimize.overlap_grad_matmul_and_grad_allreduce : 0.000001s : 0.00% optimize.overlap_recompute_allgather_and_fa_grad : 0.000001s : 0.00% optimize.overlap_recompute_comm : 0.000002s : 0.00% optimize.overlap_grad_ring_attention : 0.000004s : 0.01% optimize.overlap_grad_flash_sp : 0.000016s : 0.03% optimize.begin_end_overlap_inline : 0.000000s : 0.00% optimize.split_matmul_comm_elemetwise : 0.000002s : 0.00% optimize.split_layernorm_comm : 0.000002s : 0.00% optimize.handle_group_info : 0.000001s : 0.00% optimize.symbol_engine_optimizer.build : 0.000002s : 0.00% optimize.symbol_engine_optimizer.elim_shapecalc : 0.000008s : 0.02% optimize.symbol_engine_optimizer.elim_not_effective : 0.000011s : 0.02% optimize.symbol_engine_optimizer.opt_reshape : 0.000006s : 0.01% optimize.symbol_engine_optimizer.fold_const_symbol : 0.000009s : 0.02% optimize.symbol_engine_optimizer.renormalize : 0.000000s : 0.00% detach_backward : 0.000002s : 0.00% pipeline_parallel_scheduler : 
0.000001s : 0.00% auto_monad_reorder : 0.000015s : 0.03% get_jit_bprop_graph : 0.000001s : 0.00% rewriter_after_jit_bprop_graph : 0.000003s : 0.01% opt_after_jit_grad : 0.000446s : 0.89% validate : 0.000030s : 0.06% backend_pass : 0.000001s : 0.00% task_emit : 0.041996s : 83.39% execute : 0.000008s : 0.02% Time group info: ------[substitution.] 0.000121 26 18.12% : 0.000022s : 4: substitution.arithmetic_simplify 1.47% : 0.000002s : 2: substitution.elim_not_effective 1.04% : 0.000001s : 2: substitution.fold_const_symbol 4.58% : 0.000006s : 4: substitution.graph_param_transform 65.39% : 0.000079s : 2: substitution.inline 2.58% : 0.000003s : 4: substitution.j_node_and_user_rematch 3.64% : 0.000004s : 4: substitution.remove_not_recompute_node 3.18% : 0.000004s : 4: substitution.replace_old_param ------[type_inference.] 0.004341 2 91.95% : 0.003991s : 1: type_inference.infer 8.05% : 0.000349s : 1: type_inference.specialize ------[replace.] 0.000019 2 100.00% : 0.000019s : 2: replace.inline ------[match.] 0.000078 2 100.00% : 0.000078s : 2: match.inline ------[predicate.] 
0.000137 984 1.07% : 0.000001s : 9: predicate.accumulaten_eliminater 0.94% : 0.000001s : 4: predicate.ad_related_special_op_eliminate 0.68% : 0.000001s : 8: predicate.addn_check_dump 0.76% : 0.000001s : 9: predicate.addn_zero_filter 0.71% : 0.000001s : 9: predicate.adjust_all_reduce_mul_add 2.53% : 0.000003s : 17: predicate.arithmetic_simplify 0.82% : 0.000001s : 9: predicate.cast_eliminate 0.79% : 0.000001s : 8: predicate.check_bprop_eliminate 0.70% : 0.000001s : 8: predicate.compare_switch_simplify 0.28% : 0.000000s : 4: predicate.const_output_eliminate 0.69% : 0.000001s : 8: predicate.depend_value_elim 0.82% : 0.000001s : 9: predicate.dict_get_item_const_eliminator 0.88% : 0.000001s : 9: predicate.dict_get_item_eliminator 0.80% : 0.000001s : 9: predicate.dict_set_item_eliminator 1.32% : 0.000002s : 8: predicate.dumpgradient_eliminate 0.30% : 0.000000s : 4: predicate.elim_not_effective 0.45% : 0.000001s : 4: predicate.elim_shapecalc_of_broadcastargs 1.14% : 0.000002s : 13: predicate.environ_add_const_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_add_eliminate 1.03% : 0.000001s : 13: predicate.environ_get_depend_swap 1.86% : 0.000003s : 21: predicate.environ_get_eliminate 1.05% : 0.000001s : 13: predicate.environ_get_set_eliminate 0.95% : 0.000001s : 11: predicate.exchange_switch_depend_value 1.73% : 0.000002s : 11: predicate.float_depend_g_call 0.65% : 0.000001s : 8: predicate.float_environ_get_switch 1.05% : 0.000001s : 12: predicate.float_tuple_getitem_switch 0.23% : 0.000000s : 4: predicate.fold_const_symbol 1.11% : 0.000002s : 8: predicate.get_grad_eliminate 0.26% : 0.000000s : 4: predicate.graph_param_transform 0.80% : 0.000001s : 8: predicate.incorporate_call 0.66% : 0.000001s : 8: predicate.incorporate_call_switch 5.85% : 0.000008s : 44: predicate.inline 1.00% : 0.000001s : 8: predicate.inline_without_move 0.45% : 0.000001s : 8: predicate.j_node_and_user_rematch 0.92% : 0.000001s : 8: predicate.less_batch_normalization 1.59% : 0.000002s : 17: 
predicate.list_to_tuple_eliminator_ 2.21% : 0.000003s : 26: predicate.load_eliminater 1.15% : 0.000002s : 4: predicate.loop_unroll_after_grad 2.00% : 0.000003s : 18: predicate.loop_unroll_before_grad 1.70% : 0.000002s : 17: predicate.make_slice_get_slice_eliminator 0.71% : 0.000001s : 8: predicate.merge_addn 0.75% : 0.000001s : 8: predicate.micro_step_allgather_replace 0.78% : 0.000001s : 8: predicate.mini_step_allgather_replace 0.72% : 0.000001s : 9: predicate.minmaximum_grad 1.21% : 0.000002s : 4: predicate.mutable_eliminate 0.47% : 0.000001s : 4: predicate.opt_reshape 0.58% : 0.000001s : 4: predicate.parallel_virtual_node 1.21% : 0.000002s : 11: predicate.partial_defer_inline 1.23% : 0.000002s : 13: predicate.partial_eliminate 0.79% : 0.000001s : 9: predicate.print_const_string_wrapper 0.77% : 0.000001s : 8: predicate.reduce_all_const_elim 0.99% : 0.000001s : 9: predicate.reduce_eliminate 2.17% : 0.000003s : 26: predicate.redundant_stop_gradient_eliminater 0.94% : 0.000001s : 8: predicate.remove_not_recompute_node 1.37% : 0.000002s : 17: predicate.replace_applicator 1.07% : 0.000001s : 8: predicate.replace_old_param 0.37% : 0.000001s : 4: predicate.reset_defer_inline 0.77% : 0.000001s : 9: predicate.reshape_eliminate 0.80% : 0.000001s : 8: predicate.row_tensor_add_zeros_like 0.45% : 0.000001s : 4: predicate.row_tensor_eliminate 0.91% : 0.000001s : 8: predicate.same_eliminate 0.60% : 0.000001s : 8: predicate.set_cell_output_no_recompute 0.99% : 0.000001s : 8: predicate.shard_identity_eliminate 0.99% : 0.000001s : 8: predicate.special_op_eliminate 0.96% : 0.000001s : 8: predicate.specialize_transform 1.00% : 0.000001s : 8: predicate.split_environ_get_set_with_tuple_value 0.92% : 0.000001s : 8: predicate.stack_unstack_eliminate 0.48% : 0.000001s : 4: predicate.switch_call_monad_eliminater 0.99% : 0.000001s : 11: predicate.switch_defer_inline 1.91% : 0.000003s : 19: predicate.switch_layer_defer_inline 4.54% : 0.000006s : 41: predicate.switch_simplify 0.75% : 
0.000001s : 9: predicate.tile_eliminate 0.80% : 0.000001s : 9: predicate.transpose_eliminate 1.52% : 0.000002s : 17: predicate.tuple_list_convert_item_index_to_positive 1.59% : 0.000002s : 17: predicate.tuple_list_get_item_const_eliminator 1.37% : 0.000002s : 17: predicate.tuple_list_get_item_depend_reorder 3.04% : 0.000004s : 25: predicate.tuple_list_get_item_eliminator 1.47% : 0.000002s : 17: predicate.tuple_list_get_set_item_eliminator 2.27% : 0.000003s : 25: predicate.tuple_list_set_item_eliminator 1.57% : 0.000002s : 17: predicate.tuple_to_list_eliminator_ 2.16% : 0.000003s : 26: predicate.updatestate_pure_node_eliminater 2.99% : 0.000004s : 34: predicate.updatestate_useless_node_eliminater 0.46% : 0.000001s : 4: predicate.value_based_eliminate 0.83% : 0.000001s : 8: predicate.virtual_dataset_eliminate 0.82% : 0.000001s : 8: predicate.virtual_output_eliminate 0.39% : 0.000001s : 4: predicate.virtual_view_grad_eliminate 0.54% : 0.000001s : 4: predicate.zero_like_fill_zero ------[func_graph_cloner_run.] 0.000247 6 43.27% : 0.000107s : 2: func_graph_cloner_run.FuncGraphClonerGraph 56.73% : 0.000140s : 4: func_graph_cloner_run.FuncGraphSpecializer ------[meta_graph.] 0.000000 0 ------[manager.] 0.000000 0 ------[pynative] 0.000000 0 ------[others.] 
0.062365 196 0.01% : 0.000004s : 1: ForceFp32Comm 4.90% : 0.003057s : 1: add_attr 4.89% : 0.003048s : 1: add_attr_with_inline 0.01% : 0.000004s : 1: add_comm_op_reuse_tag 0.07% : 0.000047s : 1: add_recomputation 0.01% : 0.000004s : 1: assign_add_opt 0.09% : 0.000055s : 1: auto_monad 0.03% : 0.000019s : 1: auto_monad_reorder 0.01% : 0.000006s : 1: backend_pass 0.01% : 0.000003s : 1: begin_end_overlap_inline 0.01% : 0.000005s : 1: bias_add_comm_swap 0.81% : 0.000507s : 1: bootstrap 0.04% : 0.000025s : 1: cconv 0.01% : 0.000004s : 1: comm_op_add_attrs 0.02% : 0.000014s : 1: control_data_broadcast_order 0.02% : 0.000010s : 1: convert_after_rewriter 0.04% : 0.000023s : 1: cse_after_recomputation 0.01% : 0.000005s : 1: dataset_repeat_opt 0.01% : 0.000005s : 1: detach_backward 0.01% : 0.000008s : 1: environ_conv 0.02% : 0.000015s : 1: event_method 0.02% : 0.000014s : 1: execute 0.01% : 0.000005s : 1: full_micro_interleaved_order_control 0.01% : 0.000004s : 1: get_jit_bprop_graph 0.01% : 0.000009s : 1: graph_reusing 0.01% : 0.000004s : 1: grouped_pairwise_exchange_alltoall 0.01% : 0.000004s : 1: handle_group_info 0.01% : 0.000005s : 1: inline 0.01% : 0.000005s : 1: insert-virtual-dataset 0.01% : 0.000004s : 1: interleave_parallel_branches 0.01% : 0.000004s : 1: interleave_split_concat_branches 0.01% : 0.000005s : 1: label_fine_grained_interleaved_index 0.01% : 0.000007s : 1: label_micro_interleaved_index 0.68% : 0.000424s : 1: loop_unroll 0.01% : 0.000004s : 1: merge_cast_opt 0.01% : 0.000005s : 1: micro_interleaved_order_control 0.73% : 0.000454s : 1: mutable_eliminate 0.01% : 0.000006s : 1: offloading_packed_experts 0.02% : 0.000012s : 1: opt.transform.loop_unroll_optimizer 0.02% : 0.000012s : 1: opt.transform.mutable_eliminate 1.23% : 0.000767s : 78: opt.transform.opt_a 0.04% : 0.000027s : 1: opt.transform.opt_after_cconv 0.04% : 0.000022s : 1: opt.transform.opt_after_jit_grad 0.18% : 0.000114s : 28: opt.transform.opt_b 0.07% : 0.000043s : 2: 
opt.transform.opt_trans_graph 0.05% : 0.000031s : 4: opt.transform.symbol_engine_opt 2.95% : 0.001843s : 1: opt_a 0.16% : 0.000098s : 1: opt_after_cconv 0.73% : 0.000456s : 1: opt_after_jit_grad 0.34% : 0.000210s : 1: opt_b 5.88% : 0.003666s : 1: optimize 0.03% : 0.000020s : 1: optimize_parallel_all_gather_comm 0.01% : 0.000008s : 1: order_py_execute_after_rewriter 0.03% : 0.000020s : 1: overlap_grad_flash_sp 0.01% : 0.000004s : 1: overlap_grad_matmul_and_grad_allreduce 0.01% : 0.000007s : 1: overlap_grad_ring_attention 0.01% : 0.000004s : 1: overlap_opt_shard_grad_in_pipeline 0.01% : 0.000004s : 1: overlap_opt_shard_in_pipeline 0.01% : 0.000005s : 1: overlap_param_gather 0.01% : 0.000004s : 1: overlap_recompute_allgather_and_fa_grad 0.01% : 0.000007s : 1: overlap_recompute_and_grad_model_parallel 0.01% : 0.000005s : 1: overlap_recompute_comm 0.01% : 0.000006s : 1: parallel-infer-symbol 0.01% : 0.000004s : 1: parallel-infer-symbol-second 0.01% : 0.000005s : 1: partial_unused_args_eliminate 0.01% : 0.000005s : 1: pipeline_parallel_scheduler 0.01% : 0.000005s : 1: pipeline_split 0.04% : 0.000025s : 1: pre_auto_parallel 0.03% : 0.000019s : 1: py_interpret_to_execute 0.02% : 0.000010s : 1: py_interpret_to_execute_after_opt_a 0.01% : 0.000004s : 1: remove_cast_before_assign_add 0.03% : 0.000016s : 1: remove_dup_value 0.30% : 0.000184s : 1: renormalize.infer 0.24% : 0.000147s : 1: renormalize.specialize 0.01% : 0.000005s : 1: reorder_send_recv_between_fp_bp 0.01% : 0.000006s : 1: rewriter_after_jit_bprop_graph 0.05% : 0.000034s : 1: rewriter_after_opt_a 0.07% : 0.000042s : 1: rewriter_before_opt_a 0.01% : 0.000005s : 1: slice_cell_reuse_recomputed_activation 0.01% : 0.000005s : 1: slice_recompute_activation 0.01% : 0.000005s : 1: split_layernorm_comm 0.01% : 0.000005s : 1: split_matmul_comm_elemetwise 0.01% : 0.000008s : 1: swap_dp_allreduce_reducescatter 0.11% : 0.000071s : 1: symbol_engine_optimizer 67.36% : 0.042011s : 1: task_emit 0.11% : 0.000072s : 1: 
tuple_transform 7.05% : 0.004395s : 1: type_inference 0.08% : 0.000051s : 1: validate . [hook] pytest_runtest_teardown:test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x9-ge] tests/st/mint/test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y9-dtype_x9-ge],max_mem:68.0M =============================== warnings summary =============================== ../../../../../../../../usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/classifier/transdata/transdata_classifier.py:222 /usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/classifier/transdata/transdata_classifier.py:222: DeprecationWarning: invalid escape sequence \B """ ../../../../../../../../usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:143 /usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:143: DeprecationWarning: invalid escape sequence \c """ ../../../../../../../../usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:170 /usr/local/Ascend/cann-8.5.0/python/site-packages/tbe/dsl/unify_schedule/vector/transdata/common/graph/transdata_graph_info.py:170: DeprecationWarning: invalid escape sequence \c """ ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. setattr(self, word, getattr(machar, word).flat[0]) ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. 
return self._float_to_str(self.smallest_subnormal) ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:549: UserWarning: The value of the smallest subnormal for type is zero. setattr(self, word, getattr(machar, word).flat[0]) ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/numpy/core/getlimits.py:89: UserWarning: The value of the smallest subnormal for type is zero. return self._float_to_str(self.smallest_subnormal) ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2.py:57 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2.py:57: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("batchnorm_fold2") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad.py:56 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad.py:56: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("batchnorm_fold2_grad") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad_reduce.py:48 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/batchnorm_fold2_grad_reduce.py:48: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("batchnorm_fold2_grad_reduce") 
../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul.py:51 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul.py:51: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("correction_mul") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:51 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:51: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("correction_mul_grad") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:143 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/correction_mul_grad.py:143: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("correction_mul_grad_reduce") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer.py:50 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_learned_scale_quant_perlayer") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad.py:92 
/home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad.py:92: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_learned_scale_quant_perlayer_grad_d") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad_reduce.py:49 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perlayer_grad_reduce.py:49: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_learned_scale_quant_perlayer_grad_d_reduce") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel.py:50 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_learned_scale_quant_perchannel") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad.py:91 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad.py:91: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_learned_scale_quant_perchannel_grad_d") 
../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad_reduce.py:48 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_learned_scale_quant_perchannel_grad_reduce.py:48: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_learned_scale_quant_perchannel_grad_d_reduce") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel.py:52 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel.py:52: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_quant_perchannel") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel_grad.py:81 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perchannel_grad.py:81: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_quant_perchannel_grad") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer.py:54 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer.py:54: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_quant_per_layer") 
../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer_grad.py:81 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/fake_quant_perlayer_grad.py:81: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("fake_quant_per_layer_grad") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perchannel.py:50 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perchannel.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("minmax_update_perchannel") ../../../../../../anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perlayer.py:50 /home/jenkins/anaconda3/envs/ci39/lib/python3.9/site-packages/mindspore/ops/_op_impl/_custom_op/minmax_update_perlayer.py:50: DeprecationWarning: te_fusion.fusion_manager.fusion_manager.register is deprecated,please replace it with tbe.common.register.register_op_compute @fusion_manager.register("minmax_update_perlayer") test_functional_mul.py::test_mint_mul_mixed_precision_combinations[dtype_y0-dtype_x0-ge] /usr/local/Ascend/cann-8.5.0/python/site-packages/asc_op_compile_base/asc_op_compiler/ascendc_compile_gen_code.py:161: DeprecationWarning: invalid escape sequence \w match = re.search(f'{option}=(\w+)', ' '.join(compile_options)) -- Docs: https://docs.pytest.org/en/stable/warnings.html ================ 300 passed, 26 warnings in 1048.52s (0:17:28) =================